Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Primary key issues with MySQL #3308

Open
1 of 2 tasks
CodyPin opened this issue May 8, 2024 · 6 comments
Open
1 of 2 tasks

[Bug] Primary key issues with MySQL #3308

CodyPin opened this issue May 8, 2024 · 6 comments
Assignees
Labels
bug Something isn't working

Comments

@CodyPin
Copy link

CodyPin commented May 8, 2024

Search before asking

  • I searched in the issues and found nothing similar.

Paimon version

0.7.0

Compute Engine

Flink 1.17.2

Minimal reproduce step

Not sure. The problem is inconsistent, on some MySQL server it won't happen, on some it will filter out most of the tables

What doesn't meet your expectations?

To load all the tables mentioned in the argument 'including_tables'

Anything else?

Some details:
I have checked and tried most of the results online, check for database permission, table permission, etc.
I have also tried loading all tables, but it will also says it 'filtered out of table capturing'. I have looked into Flink, Ververica, and Paimon's code but I still have no idea. And I don't seem to be able to find anyone else with this problem.

Below I am trying to access some tables from the database 'littlemall', those tables are:

  • order
  • order_detail
  • payment_record
  • store
  • user
  • order_detail_product_item
  • order_detail_delivery_fee_item
  • order_audit_log
  • order_detail_promotion_item
  • sku_list
  • payment_refund_record
  • mall_category
  • store_category

But the end result is that Flink was only able to capture two tables, namely sku_list and user
firefox_7pI30zDhi6

I have cropped some logs and screenshots to provide more information, please let me know if you need anything additional or if you have any idea at all!

Args:
2024-05-08 08:44:25,714 INFO org.apache.paimon.flink.action.ActionFactory [] - mysql_sync_database job args: --warehouse hdfs://hktv-cluster/core/data-stream/flink/data/paimon/warehouse --database littlemall --ignore_incompatible true --mysql_conf hostname=<hostname> --mysql_conf port=3306 --mysql_conf username=flink-cdc --mysql_conf password=**** --mysql_conf database-name=littlemall --mysql_conf start-server-id=7400 --mysql_conf server-time-zone=Asia/Hong_Kong --including_tables order|order_detail|payment_record|store|user|order_detail_product_item|order_detail_delivery_fee_item|order_audit_log|order_detail_promotion_item|sku_list|payment_refund_record|mall_category|store_category --catalog_conf metastore=filesystem --catalog_conf case-sensitive=false --table_conf bucket=4 --table_conf changelog-producer=input

MySQL CLI to confirm user have permission to read and are base table:
firefox_8pBEYpUKLP
firefox_JHhDG7b6fv

firefox_BxtleAqmC8

Tables being filtered out:

2024-05-08 08:44:32,440 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - Read list of available tables in each database
2024-05-08 08:44:32,443 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.TASK_EXECUTION' is filtered out of table capturing
2024-05-08 08:44:32,443 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.TASK_EXECUTION_PARAMS' is filtered out of table capturing
2024-05-08 08:44:32,443 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.TASK_LOCK' is filtered out of table capturing
2024-05-08 08:44:32,443 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.TASK_SEQ' is filtered out of table capturing
2024-05-08 08:44:32,443 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.TASK_TASK_BATCH' is filtered out of table capturing
2024-05-08 08:44:32,443 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.address' is filtered out of table capturing
2024-05-08 08:44:32,443 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.address_area' is filtered out of table capturing
2024-05-08 08:44:32,443 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.address_district' is filtered out of table capturing
2024-05-08 08:44:32,443 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.address_estate' is filtered out of table capturing
2024-05-08 08:44:32,444 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.address_estate_flat' is filtered out of table capturing
2024-05-08 08:44:32,444 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.address_estate_floor' is filtered out of table capturing
2024-05-08 08:44:32,444 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.address_street' is filtered out of table capturing
2024-05-08 08:44:32,444 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.address_street_number' is filtered out of table capturing
2024-05-08 08:44:32,444 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.address_street_number_2_estate' is filtered out of table capturing
2024-05-08 08:44:32,444 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.delivery_provider' is filtered out of table capturing
2024-05-08 08:44:32,444 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.delivery_provider_store_delivery' is filtered out of table capturing
2024-05-08 08:44:32,444 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.delivery_provider_store_pickup_point' is filtered out of table capturing
2024-05-08 08:44:32,444 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.delivery_provider_third_party' is filtered out of table capturing
2024-05-08 08:44:32,444 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.mall_category' is filtered out of table capturing
2024-05-08 08:44:32,444 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.mall_landing_component' is filtered out of table capturing
2024-05-08 08:44:32,444 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.oauth2_authorization' is filtered out of table capturing
2024-05-08 08:44:32,444 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.oauth2_authorization_consent' is filtered out of table capturing
2024-05-08 08:44:32,444 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.oauth2_registered_client' is filtered out of table capturing
2024-05-08 08:44:32,444 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.order' is filtered out of table capturing
2024-05-08 08:44:32,444 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.order_audit_log' is filtered out of table capturing
2024-05-08 08:44:32,444 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Resolved ResourceManager address, beginning registration
2024-05-08 08:44:32,444 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.order_detail' is filtered out of table capturing
2024-05-08 08:44:32,444 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.order_detail_delivery_fee_item' is filtered out of table capturing
2024-05-08 08:44:32,444 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.order_detail_product_item' is filtered out of table capturing
2024-05-08 08:44:32,444 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.order_detail_promotion_item' is filtered out of table capturing
2024-05-08 08:44:32,444 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.payment_record' is filtered out of table capturing
2024-05-08 08:44:32,444 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.payment_refund_record' is filtered out of table capturing
2024-05-08 08:44:32,444 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.pending_refund_and_refill_stock' is filtered out of table capturing
2024-05-08 08:44:32,445 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.shipment' is filtered out of table capturing
2024-05-08 08:44:32,445 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.shipment_status_log' is filtered out of table capturing
2024-05-08 08:44:32,445 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.shipment_third_party' is filtered out of table capturing
2024-05-08 08:44:32,445 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 including table 'littlemall.sku_list' for further processing
2024-05-08 08:44:32,445 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.store' is filtered out of table capturing
2024-05-08 08:44:32,445 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.store_category' is filtered out of table capturing
2024-05-08 08:44:32,445 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.store_delivery_integration' is filtered out of table capturing
2024-05-08 08:44:32,445 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.store_landing_component' is filtered out of table capturing
2024-05-08 08:44:32,445 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.store_mall_category_mapping' is filtered out of table capturing
2024-05-08 08:44:32,445 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.store_static_page' is filtered out of table capturing
2024-05-08 08:44:32,445 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 including table 'littlemall.user' for further processing
2024-05-08 08:44:32,445 INFO  com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - 	 'littlemall.user_membership' is filtered out of table capturing
2024-05-08 08:44:32,512 INFO  io.debezium.jdbc.JdbcConnection                              [] - Connection gracefully closed

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@CodyPin CodyPin added the bug Something isn't working label May 8, 2024
@CodyPin
Copy link
Author

CodyPin commented May 9, 2024

I have also tried paimon-flink-action-0.8, specifically paimon-flink-action-0.8-20240507.002037-81.jar from https://repository.apache.org/content/groups/snapshots/org/apache/paimon/paimon-flink-action/0.8-SNAPSHOT/paimon-flink-action-0.8-20240507.002037-81.jar but it just have the same effects. Maybe its something to do with table's metadata? Just guessing at this point

@MOBIN-F
Copy link
Contributor

MOBIN-F commented May 9, 2024

Does the ignored table have a schema change? You can set --ignore-incompatible false to check whether the table has schema changes, or check the log for "This table will be ignored".
paimon Cdc Ingestion supports a limited number of schema changes
@CodyPin

@CodyPin
Copy link
Author

CodyPin commented May 9, 2024

@MOBIN-F Thanks for the reply, this is my first try loading this database, so I guess it wouldn't matter if any of the tables has a schema change? I am not sure, in any case those table's schema haven't been touched since they were created. But I set the --ignore-incompatible for testing just to see if there would be any difference, but no, removing --ignore-incompatible or setting it to true doesn't have any difference.

And just to give more background, initially I was doing this with Flink and sinking into Iceberg, which would be able to get all the tables required, thus leading me to think it might be a Paimon issue either than Flink

@CodyPin
Copy link
Author

CodyPin commented May 9, 2024

On futher investigation, using the query
show KEYS from <tablename> where Key_name = 'PRIMARY';
on the tables that got included, namely user and sku_list shows this
firefox_aDVDCyceji
firefox_tzV4e1xere

But on those who got filtered out, for example store and payment_record, it reutrns nothing:
firefox_oPP2Vc5Xn7
firefox_ff3m4hgh1S

Even tho when we query describe store shows that the table does indeed have a primary key:
firefox_bLKW9TK5n0

Turns out, when the tables were created, it didn't specify the primary key, but was instead later added using the query
ALTER TABLE store ADD PRIMARY KEY (id); . This leads to the problem that it might appears that the table does not have a primary key, like when using the query show create table store; will show this:
firefox_SpXNHLZxDb
showing that it does not have a primary key.

With that said, althought the tables clearly have a primary key, it might appares as it does not depending on how we are checking for its existence, and of course Paimon only synchronizes tables with primary keys. So I think this is a bug in the wording of the documents, but would be a really nice feature if it would include tables that have a primary key no matter if its created with or altered later on.
@MOBIN-F what do you think?

@MOBIN-F
Copy link
Contributor

MOBIN-F commented May 10, 2024

I reproduced the problem, and the reason is as stated in the official documentation: If you do not have a PRIMARY KEY and an application asks for the PRIMARY KEY in your tables, MySQL returns the first UNIQUE index that has no NULL columns as the PRIMARY KEY.

CREATE TABLE test_implicit_pk (
id bigint(20) NOT null,
name varchar(255) NOT NULL,
school varchar(50) DEFAULT NULL,
UNIQUE key uk_sta_id (id,name),
UNIQUE KEY uk_sta (school)
) ENGINE=InnoDB

desc test_implicit_pk

Field |Type |Null|Key|Default|Extra|
------+------------+----+---+-------+-----+
id |bigint(20) |NO |PRI| | |
name |varchar(255)|NO |PRI| | |
school|varchar(50) |YES |UNI| | |

actual primary key information exists

show create table test_implicit_pk

CREATE TABLE `test_implicit_pk` (
   `id` bigint(20) NOT NULL,
   `name` varchar(255) NOT NULL,
   `school` varchar(50) DEFAULT NULL,
   UNIQUE KEY `uk_sta_id` (`id`,`name`),
   UNIQUE KEY `uk_sta` (`school`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

No primary key information is displayed

In this case, the primary key information cannot be obtained using metaData.getPrimaryKeys. Instead, we need to use metaData.getIndexInfo and metaData.getColumns to determine the primary key
what do you think? @JingsongLi @yuzelin

@CodyPin CodyPin changed the title [Bug] Wrongly filtering out some tables with mysql_sync_database when using Flink CDC from MySQL [Bug] Primary key issues with MySQL May 10, 2024
@yuzelin
Copy link
Contributor

yuzelin commented May 11, 2024

In this case, the primary key information cannot be obtained using metaData.getPrimaryKeys. Instead, we need to use metaData.getIndexInfo and metaData.getColumns to determine the primary key

Thanks for your suggestion @MOBIN-F , I will check it later.

@yuzelin yuzelin self-assigned this May 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants