Open
Description
Given a table:
mysql> describe demo.customers;
+------------+-------------+------+-----+-------------------+-----------------------------------------------+
| Field | Type | Null | Key | Default | Extra |
+------------+-------------+------+-----+-------------------+-----------------------------------------------+
| id | int(11) | NO | PRI | NULL | |
| first_name | varchar(50) | YES | | NULL | |
| last_name | varchar(50) | YES | | NULL | |
| email | varchar(50) | YES | | NULL | |
| gender | varchar(50) | YES | | NULL | |
| comments | varchar(90) | YES | | NULL | |
| UPDATE_TS | timestamp | YES | | CURRENT_TIMESTAMP | DEFAULT_GENERATED on update CURRENT_TIMESTAMP |
+------------+-------------+------+-----+-------------------+-----------------------------------------------+
7 rows in set (0.00 sec)
It is valid to run a query with the predicate column not present in the SELECT
criteria:
mysql> SELECT FIRST_NAME, LAST_NAME, GENDER, COMMENTS FROM demo.customers WHERE UPDATE_TS > CURRENT_TIMESTAMP - 100000 ;
+------------+-----------+--------+------------------------------------------------------+
| FIRST_NAME | LAST_NAME | GENDER | COMMENTS |
+------------+-----------+--------+------------------------------------------------------+
| Bibby | Argabrite | Female | Reactive exuding productivity |
| Auberon | Sulland | Male | Organized context-sensitive Graphical User Interface |
| Marv | Dalrymple | Male | Versatile didactic pricing structure |
| Nolana | Yeeles | Female | Adaptive real-time archive |
| Modestia | Coltart | Female | Reverse-engineered non-volatile success |
+------------+-----------+--------+------------------------------------------------------+
5 rows in set (0.00 sec)
However the JDBC connector throws an error if the timestamp.column.name
is not included in the SELECT
.
Config:
curl -X POST http://localhost:8083/connectors -H "Content-Type: application/json" -d '{
"name": "jdbc_source_mysql_zz",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"connection.url": "jdbc:mysql://mysql:3306/demo",
"connection.user": "connect_user",
"connection.password": "asgard",
"topic.prefix": "mysql-zz-",
"mode":"timestamp",
"query":"SELECT FIRST_NAME, LAST_NAME, GENDER, COMMENTS FROM demo.customers",
"timestamp.column.name": "UPDATE_TS",
"validate.non.null": false
}
}'
Error:
[2019-01-04 17:15:20,005] ERROR WorkerSourceTask{id=jdbc_source_mysql_zz-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
org.apache.kafka.connect.errors.DataException: UPDATE_TS is not a valid field name
at org.apache.kafka.connect.data.Struct.lookupField(Struct.java:254)
at org.apache.kafka.connect.data.Struct.get(Struct.java:74)
at io.confluent.connect.jdbc.source.TimestampIncrementingCriteria.extractOffsetTimestamp(TimestampIncrementingCriteria.java:211)
at io.confluent.connect.jdbc.source.TimestampIncrementingCriteria.extractValues(TimestampIncrementingCriteria.java:181)
at io.confluent.connect.jdbc.source.TimestampIncrementingTableQuerier.extractRecord(TimestampIncrementingTableQuerier.java:185)
at io.confluent.connect.jdbc.source.JdbcSourceTask.poll(JdbcSourceTask.java:309)
at org.apache.kafka.connect.runtime.WorkerSourceTask.poll(WorkerSourceTask.java:244)
at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:220)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
The JDBC source connector should not mandate that the field is present, if the query can successfully execute without it.
Activity
rhauch commentedon Jan 4, 2019
@rmoff the timestamp value must be included in the SELECT so that we know the timestamp value of each record. I agree that the error message should be a lot more clear about this requirement.
aliasbadwolf commentedon Mar 6, 2019
@rmoff The final query that connector would fire will be:
SELECT FIRST_NAME, LAST_NAME, GENDER, COMMENTS FROM demo.customers where UPDATE_TS > '<last_saved_offset_timestamp> || 1970-01-01 00:00:00'
The important aspect is that the timestamp column or increment column actually build the WHERE clause. See this class for Timestamp+Incrementing mode (last 3 methods). For other modes there are other classes.
https://github.com/confluentinc/kafka-connect-jdbc/blob/master/src/main/java/io/confluent/connect/jdbc/source/TimestampIncrementingCriteria.java
dinabogdan commentedon Dec 11, 2019
Hi! A possible workaround can be to add the timestamp column in the query, but to exclude from the final result using a
transform
statement like this:where
created_at
is the timestamp column.