Gtid_log_event thread_id missing in binlog when pseudo_thread_id is 0 #3215

bnestere · 2024-04-18T18:26:06Z

If pseudo_thread_id is set to 0 (or implicitly set to 0 because an attempt was made to set it to a negative value), the thread_id attribute of the Gtid_log_event is not written at all, whereas the Query_log_event still writes thread_id=0. The underlying problem was that FL_EXTRA_THREAD_ID was only added if thread_id was greater than 0.

This patch fixes this by changing the condition that sets FL_EXTRA_THREAD_ID when generating a GTID log event: on the primary the flag is now always written, and on the slave the rpl_group_info's gtid_ev_flags_extra is checked to see whether the flag was present when the event was read in.
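The change in condition can be sketched roughly as follows. This is only an illustration of the logic described above, not the server code: the flag value, the function names and the parameters are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bit for this sketch only; not copied from the server sources.
constexpr uint8_t SKETCH_FL_EXTRA_THREAD_ID = 0x10;

// Condition as described for the old behaviour: the flag is set only for a
// non-zero thread id, so thread_id=0 is silently dropped from the event.
inline uint8_t flags_before_fix(uint64_t thread_id) {
  return thread_id > 0 ? SKETCH_FL_EXTRA_THREAD_ID : 0;
}

// Condition as described for the fix: the primary always sets the flag; a
// replica keeps the flag only if the event it read in carried it.
inline uint8_t flags_after_fix(bool on_replica, uint8_t incoming_flags_extra) {
  if (!on_replica)
    return SKETCH_FL_EXTRA_THREAD_ID;
  return static_cast<uint8_t>(incoming_flags_extra & SKETCH_FL_EXTRA_THREAD_ID);
}
```

With this shape, a thread id of 0 no longer decides whether the attribute is written; only the origin of the event does.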

Under terms of MDEV 27490 we'll add support for non-BMP identifiers and upgrade casefolding information to Unicode version 14.0.0. In Unicode-14.0.0 conversion to lower and upper cases can increase octet length of the string, so conversion won't be possible in-place any more.

This patch removes virtual functions performing in-place casefolding:
- my_charset_handler_st::casedn_str()
- my_charset_handler_st::caseup_str()

and fixes the code to use the non-inplace functions instead:
- my_charset_handler_st::casedn()
- my_charset_handler_st::caseup()
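To illustrate why in-place conversion stops working, here is a small sketch that compares UTF-8 lengths of a character and its lowercase variant. The helper utf8_encode_bmp() is hypothetical and not part of the server. The example pair is U+023A (LATIN CAPITAL LETTER A WITH STROKE), whose lowercase mapping in current Unicode data is U+2C65.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: encode one BMP code point (below U+10000) as UTF-8.
// Surrogates and supplementary characters are not handled in this sketch.
static std::string utf8_encode_bmp(uint32_t cp) {
  std::string out;
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return out;
}

// U+023A occupies 2 bytes, its lowercase U+2C65 occupies 3 bytes, so the
// lowercased string no longer fits into the buffer of the original.
static bool lowercase_grows() {
  return utf8_encode_bmp(0x2C65).size() > utf8_encode_bmp(0x023A).size();
}
```

A non-inplace function with separate source and destination buffers, like casedn() and caseup(), avoids this problem because the destination can be sized independently.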

…r U+0700..U+07FF

New tests display additional information about characters from the BMP range:
- A summary with a COUNT(*) for all distinct combinations of properties telling how the "=" and the "LIKE" predicates compare characters to their LOWER() and UPPER() variants.
- A detailed list of tricky characters for which the "=" and the "LIKE" predicates compare LOWER(c)/UPPER(c) variants as not equal to just "c".

Tricky characters include:
- Turkish letters: ı - small dotless letter i
- Croatian letters: precombined contractions for Dž, Dz, Lj, Nj
- Units of measurement: Ω,K,Å (Ohm, Kelvin, Angstrom). These look very similar to Greek letter Omega, Latin letter Kra, Swedish/Finnish letter A with a ring above.

…mp_unicode_impl()

This is a refactoring patch, it does not change the behaviour. The MTR tests are being added only to cover the LIKE predicate better (these tests should have been added earlier under terms of MDEV 9711). This patch does not need its own specific MTR tests.

Moving the duplicate code into a new shared file ctype-wildcmp.inl and including it from multiple places, to define the following functions:

- my_wildcmp_uca_impl(), in ctype-uca.c
  For utf8mb3, utf8mb4, ucs2, utf16, utf32, using cs->cset->mb_wc().
  For UCA based collations.

- my_wildcmp_mb2_or_mb4_general_ci_impl(), in ctype-ucs2.c
  For ucs2, utf16, utf32, using cs->cset->mb_wc().
  For general_ci-style collations:
  - xxx_general_ci
  - xxx_general_mysql500_ci
  - xxx_general_nopad_ci

- my_wildcmp_mb2_or_mb4_bin_impl(), in ctype-ucs2.c
  For ucs2, utf16, utf32, using cs->cset->mb_wc().
  For _bin collations:
  - xxx_bin
  - xxx_nopad_bin

- my_wildcmp_utf8mb3_general_ci_impl(), in ctype-utf8.c
  Optimized for utf8mb3, using my_mb_wc_utf8mb3_quick().
  For general_ci-style collations:
  - utf8mb3_general_ci
  - utf8mb3_general_mysql500_ci
  - utf8mb3_general_nopad_ci

- my_wildcmp_utf8mb4_general_ci_impl(), in ctype-utf8.c
  Optimized for utf8mb4, using my_mb_wc_utf8mb4_quick().
  For general_ci-style collations:
  - utf8mb4_general_ci
  - utf8mb4_general_nopad_ci

This commit removes the WITH_SSL=<custom_location_of_openssl> option, leaving only -DWITH_SSL=bundled/system.

The rationale behind this removal is as follows:

1. The WITH_SSL=<custom_location_of_openssl> option is obscure and not widely used.
2. There is no added value in this option compared to using OPENSSL_ROOT_DIR. In fact, the availability of "helpful" MySQL options might discourage users from exploring proper CMake options independently.
3. Users may incorrectly assume full MySQL compatibility even with this option, including undocumented behaviors such as MySQL's preference for static libraries with WITH_SSL=<custom_location_of_openssl>.

This change simplifies the configuration options and encourages users to adopt more standardized and documented practices.

It was wrong to derive Item_func_uuid from Item_func_sys_guid, because the former is a function returning the UUID data type, while the latter is a string function returning VARCHAR.

As a result of the wrong hierarchy, Item_func_uuid erroneously derived Item_str_func::fix_fields(), which contains this code:

    /*
      In Item_str_func::check_well_formed_result() we may set null_value
      flag on the same condition as in test() below.
    */
    if (thd->is_strict_mode())
      set_maybe_null();

This code is not relevant to UUID() at all.

A simple fix would be to set_maybe_null(false) in Item_func_uuid::fix_length_and_dec(). However, it'd fix only exactly this single consequence of the wrong class hierarchy, and similar bugs could appear again in the future. Moreover, we're going to add functions UUIDv4() and UUIDv7() soon (in 11.6). So it's better to fix the class hierarchy in the right way before adding these new functions.

Fix:
- Adding a new abstract class Item_fbt_func in the template in sql_type_fixedbin.h
- Deriving Item_typecast_fbt from Item_fbt_func
- Deriving Item_func_uuid from Item_fbt_func
- Adding a new helper class UUIDv1. It derives from UUID, and additionally initializes the value to "UUID version 1" right in the constructor. Note, the upcoming SQL functions UUIDv4() and UUIDv7() will also have corresponding classes UUIDv4 and UUIDv7.

So now UUID() is a pure "returning UUID" function, like CAST(expr AS UUID) used to be, without any unintentional artifacts of functions returning VARCHAR/TEXT.

Cleanup:
- Removing the member Item_func_sys_guid::with_dashes, as it's not needed any more:
  * Item_func_sys_guid now does not have any descendants any more
  * Item_func_sys_guid::val_str() itself always displays without dashes

We add an extra condition that makes the inequality testing in SEQUENCE::increment_value() mathematically watertight, and we cast to and from unsigned in additions and subtractions that may underflow or overflow, to avoid undefined behaviour.

Let's start by distinguishing between c++ expressions and mathematical expressions. By c++ expression I mean an expression whose outcome is determined by the compiler/runtime. By mathematical expression I mean an expression whose value is mathematically determined. So a c++ expression -9223372036854775806 - 1000 at worst can evaluate to any value due to underflow. A mathematical expression -9223372036854775806 - 1000 evaluates to -9223372036854776806.

The problem boils down to how to write a c++ expression equivalent to a mathematical expression x + y < z where x and z can take any values of long long int, and y < 0 is also a long long int. Ideally we want to avoid underflow, but I'm not sure how this can be done.

The correct c++ form should be (x + y < z || x < z - y || x < z). Let M=9223372036854775808 i.e. LONGLONG_MAX + 1. We have

    -M <= x <= M - 1
    -M <= y < 0
    -M <= z <= M - 1

Let's consider the case where x + y < z is true as a mathematical expression.

If the first disjunct does not underflow, it evals to true and we are done. Suppose it underflows, i.e. x + y < -M as a mathematical expression. If the arbitrary value resulting from the underflow causes the c++ expression to hold too, then we are done. Otherwise we move on to the next expression x < z - y. If there's no overflow in z - y then we are done. If there's overflow, i.e. z - y > M - 1, and the c++ expression evals to false, then we are on to x < z. There's no overflow or underflow here, and it will eval to true. To see this, note that

    x + y < -M means x < -M - y <= -M - (-M) = 0
    z - y > M - 1 means z > y + M - 1 >= -M + M - 1 = -1

so x < z.

Now let's consider the case where x + y < z is false as a mathematical expression.

The first disjunct will not underflow in this case, so it evals to false and we move to (x < z - y). This will not overflow. To see this, note that

    x + y >= z means z - y <= x <= M - 1

So it evals to false too. And the third disjunct x < z also evals to false because x >= z - y > z.

I suspect that in either case the expression x < z does not determine the final value of the disjunction in the vast majority of cases, which is why we leave it as the final one, for the rare cases where both an underflow and an overflow happen.

Here's an example where both underflow and overflow happen and the added inequality x < z saves the day:

    x = - M / 2
    y = - M / 2 - 1
    z = M / 2

x + y evals to M - 1 which is > z
z - y evals to - M + 1 which is < x

We can do the same to test x + y > z where the increment y is positive:

    (x > z - y || x + y > z || x > z)

And the same analysis applies to unsigned cases.
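The three-disjunct test for a negative increment can be sketched as below. This is a minimal illustration rather than the code in sql_sequence.h: the helper names are hypothetical, and the wrap-around is produced by casting through uint64_t, which assumes the usual two's complement conversion back to int64_t (guaranteed from C++20, implementation-defined but universal in practice before that).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: addition and subtraction that wrap around modulo 2^64
// instead of invoking signed-overflow undefined behaviour.
static int64_t wrap_add(int64_t a, int64_t b) {
  return static_cast<int64_t>(static_cast<uint64_t>(a) + static_cast<uint64_t>(b));
}

static int64_t wrap_sub(int64_t a, int64_t b) {
  return static_cast<int64_t>(static_cast<uint64_t>(a) - static_cast<uint64_t>(b));
}

// For y < 0: returns true exactly when x + y < z holds mathematically.
// The third disjunct covers the case where both wrapped results mislead.
static bool sum_less_than(int64_t x, int64_t y, int64_t z) {
  return wrap_add(x, y) < z || x < wrap_sub(z, y) || x < z;
}
```

For the example values x = -M/2, y = -M/2 - 1, z = M/2, the first two disjuncts are false after wrapping and only x < z makes the result true.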

Values of all session tracking system variables will be sent in the first ok packet upon connection after successful authentication. Also updated mtr to print session track info on connection (h/t Sergei Golubchik) so that we can write mtr tests for this change.

Support index condition pushdown within partitioned tables.

- ha_partition will pass the pushed index condition into all of the used partitions.
- We require all of the partitions to handle the pushed index condition in the same way.
- When using ICP, one may read rows (e.g. call h->index_read_map(buf, ...)) only to buf= table->record[0], for two reasons:
  * Pushed index condition's Item_field objects point into record[0]
  * InnoDB requires this: it calls offset() which assumes record[0].

So, when using ICP, ha_partition will read partition records into table->record[0] and will then copy the record away if it needs it to be elsewhere.

Add assertions about limitations one has when using Index Condition Pushdown:
- add handler::assert_icp_limitations()
- call this function from functions that may attempt violations.

Verified that assert_icp_limitations() as well as calls to it are compiled away in release build.

This patch also fixes:

MDEV-33050 Build-in schemas like oracle_schema are accent insensitive
MDEV-33084 LASTVAL(t1) and LASTVAL(T1) do not work well with lower-case-table-names=0
MDEV-33085 Tables T1 and t1 do not work well with ENGINE=CSV and lower-case-table-names=0
MDEV-33086 SHOW OPEN TABLES IN DB1 -- is case insensitive with lower-case-table-names=0
MDEV-33088 Cannot create triggers in the database `MYSQL`
MDEV-33103 LOCK TABLE t1 AS t2 -- alias is not case sensitive with lower-case-table-names=0
MDEV-33109 DROP DATABASE MYSQL -- does not drop SP with lower-case-table-names=0
MDEV-33110 HANDLER commands are case insensitive with lower-case-table-names=0
MDEV-33119 User is case insensitive in INFORMATION_SCHEMA.VIEWS
MDEV-33120 System log table names are case insensitive with lower-cast-table-names=0

- Removing the virtual function strcasecmp() from MY_COLLATION_HANDLER

- Adding a wrapper function CHARSET_INFO::streq(), to compare two strings for equality. For now it calls strnncoll() internally. In the future it will turn into a virtual function.

- Adding new accent sensitive case insensitive collations:
  - utf8mb4_general1400_as_ci
  - utf8mb3_general1400_as_ci
  They implement accent sensitive case insensitive comparison. The weight of a character is equal to the code point of its upper case variant. These collations use Unicode-14.0.0 casefolding data.

  The result of my_charset_utf8mb3_general1400_as_ci.strcoll() is very close to the former my_charset_utf8mb3_general_ci.strcasecmp(). There is only a difference in a couple dozen rare characters, because of:
  - the switch from "tolower" to "toupper" comparison, to make utf8mb3_general1400_as_ci closer to utf8mb3_general_ci
  - the switch from Unicode-3.0.0 to Unicode-14.0.0
  This difference should be tolerable. See the list of affected characters in the MDEV description.

  Note, utf8mb4_general1400_as_ci correctly handles non-BMP characters! Unlike utf8mb4_general_ci, it does not treat all non-BMP characters as equal.

- Adding classes representing names of the file based database objects:

    Lex_ident_db
    Lex_ident_table
    Lex_ident_trigger

  Their comparison collation depends on the underlying file system case sensitivity and on --lower-case-table-names and can be either my_charset_bin or my_charset_utf8mb3_general1400_as_ci.

- Adding classes representing names of other database objects, whose names have case insensitive comparison style, using my_charset_utf8mb3_general1400_as_ci:

    Lex_ident_column
    Lex_ident_sys_var
    Lex_ident_user_var
    Lex_ident_sp_var
    Lex_ident_ps
    Lex_ident_i_s_table
    Lex_ident_window
    Lex_ident_func
    Lex_ident_partition
    Lex_ident_with_element
    Lex_ident_rpl_filter
    Lex_ident_master_info
    Lex_ident_host
    Lex_ident_locale
    Lex_ident_plugin
    Lex_ident_engine
    Lex_ident_server
    Lex_ident_savepoint
    Lex_ident_charset
    engine_option_value::Name

- All the mentioned Lex_ident_xxx classes implement a method streq():

    if (ident1.streq(ident2))
      do_equal();

  This method works as a wrapper for CHARSET_INFO::streq().

- Changing a lot of "LEX_CSTRING name" to "Lex_ident_xxx name" in class members and in function/method parameters.

- Replacing all calls like
    system_charset_info->coll->strcasecmp(ident1, ident2)
  with
    ident1.streq(ident2)

- Taking advantage of the c++11 user defined literal operator for LEX_CSTRING (see m_strings.h) and Lex_ident_xxx (see lex_ident.h) data types. Use example:

    const Lex_ident_column primary_key_name= "PRIMARY"_Lex_ident_column;

  is now a shorter version of:

    const Lex_ident_column primary_key_name=
      Lex_ident_column({STRING_WITH_LEN("PRIMARY")});
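The user defined literal mechanism mentioned above can be sketched in isolation as follows. The type Sketch_ident and the suffix _sketch_ident are hypothetical stand-ins, not the definitions in m_strings.h or lex_ident.h. The sketch only shows how a literal operator receives the pointer and the length of the string literal, so no STRING_WITH_LEN-style macro is needed at the call site.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an identifier type: a pointer plus a length.
struct Sketch_ident {
  const char *str;
  std::size_t length;
};

// The compiler supplies the literal's length as the second argument, so the
// value can be built as a constant expression without calling strlen().
constexpr Sketch_ident operator""_sketch_ident(const char *s, std::size_t n) {
  return Sketch_ident{s, n};
}

// Usage in the style of the commit message's example.
static constexpr Sketch_ident primary_key_name = "PRIMARY"_sketch_ident;
```

Because the operator is constexpr, such identifiers can be initialized statically, which is convenient for names known at compile time.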

it's a pointer into the net buffer, so it might be overwritten by the next read or write. And the next plugin switch (in multi-auth) will try to compare it (in send_plugin_request_packet) which is normally harmless but fails the assert with Lex_ident::is_valid_ident()

The new option works just like --tab with respect to output (an sql file for the table definition and a tab-separated file for data, with the same options, e.g. --parallel). Compared to --tab, it allows --databases and --all-databases. When --dir is used, it creates a directory structure in the output directory pointed to by --dir. For every database to be dumped, there will be a directory with the database name. All options that --tab supports are also supported by --dir, in particular --parallel.

This patch augments Gtid_log_event with the user thread-id. In particular that compensates for the loss of this info in Rows_log_events.

Gtid_log_event::thread_id gets visible in mysqlbinlog output like

    #231025 16:21:45 server id 1 end_log_pos 537 CRC32 0x1cf1d963 GTID 0-1-2 ddl thread_id=10

as 64 bit unsigned integer.

While the size of Gtid event has grown by 8-9 bytes, replication from OLD <-> NEW is not affected by it.

This patch also slightly changes the logic to convert Gtid events to Query events for older replicas which don't support Gtid. Instead of hard-coding the padding of the sys var section of the generated Query event, the length to pad is dynamically calculated based on the length of the Gtid event.

This work was started by the late Sujatha Sivakumar. Brandon Nesterenko took it over, reviewed initial patches and extended the work.

Reviewed-by:
=============
Andrei Elkin <andrei.elkin@mariadb.com>
Kristian Nielsen <knielsen@knielsen-hq.org>

…et to 0 or negative

If pseudo_thread_id is set to 0 (or implicitly set to 0 because it was attempted to be set to a negative value), then the thread_id attribute of the Gtid_log_event will not be written at all, whereas the Query_log_event still will write thread_id=0. The underlying problem was that the FL_EXTRA_THREAD_ID would only be added if thread_id was greater than 0.

This patch fixes this by changing the condition to mark the FL_EXTRA_THREAD_ID when generating a GTID log event, such that it will always be written on the primary, and if on the slave, the rpl_group_info's gtid_ev_flags_extra is checked to see if the flag was present when reading it in.

Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>

CLAassistant · 2024-04-23T15:32:33Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
4 out of 9 committers have signed the CLA.

✅ vaintroub
✅ tgross35
✅ spetrunia
✅ bnestere
❌ abarkov
❌ vuvova
❌ DaveGosselin-MariaDB
❌ montywi
❌ mariadb-YuchenPei
You have signed the CLA already but the status is still pending? Let us recheck it.

abarkov and others added 21 commits February 28, 2024 22:20

main.information_schema-big is neither big nor not_embedded

2be3b8e

MDEV-28152 fix UBSAN error

473ee85

sql_sequence.h:233:19: runtime error: signed integer overflow: -9223372036854775808 + -1 cannot be represented in type 'long long int' followup for 374783c

wsrep.wsrep_provider_plugin_defaults update to 26.4.17

d0c47cd

MDEV-31531 fix wsrep tests after 929c2e0

f10805f

MDEV-33696 main.dyncol and ctype_unicode_casefold_bmp.inc in --view

71d9272

temporarily disable view protocol

re-enable WITH_SSL as a backward-compatibility shortcut

fcd7ae7

also, don't require -DWITH_SSL=system if OPENSSL_ROOT_DIR is specified

MDEV-31789 Deprecate spider_casual_read

593392b

The corresponding table param was deprecated as part of MDEV-28861

MDEV-33739 Check field type of the first field in check_sequence_fiel…

aba03ee

…ds() This avoids non-integral types breaking the call of sequence_structure().

Update doxygen annotations in plugin.h

0c0db46

Add doxygen markup so comments get picked up. Also fix minor typos and expand documentation where relevant. Signed-off-by: Trevor Gross <tmgross@umich.edu>

MDEV-31609 mtr: only print session tracking from the last statement

e32736e

We do this by checking server status. By doing this we avoid printing session tracking info from previous (but not the last) statement. The change is from Sergei Golubchik

bnestere added the MariaDB Corporation label Apr 18, 2024

bnestere requested review from knielsen and andrelkin April 18, 2024 18:26

bnestere assigned bnestere and andrelkin Apr 18, 2024

vaintroub and others added 4 commits April 18, 2024 21:07

MDEV-30366 Permit bulk implementation to return ALL individual results

ea6975b

New COM_STMT_BULK_STMT flag telling the server to return all unitary results

fix SSL tests for the new C/C 3.4

594bd86

it no longer supports TLSv1.0

small cleanup: mysqltest

5d74e43

vuvova and others added 5 commits April 22, 2024 14:59

mysqltest: support MARIADB_OPT_RESTRICTED_AUTH

3f91821

C/C 3.4 disables mysql_old_password by default, so add an option for the `connect` command to support specifying allowed authentication plugins (MARIADB_OPT_RESTRICTED_AUTH). use it to enable mysql_old_password when needed for testing

Fixed bug in semi-sync that caused rpl.rpl_semi_sync_slave_reply_fail…

fe42938

… to fail This was caused by the patch for MDEV-32567 Remove thr_alarm from server codebase

MDEV-33924: Gtid_log_event thread_id should be 32-bit

489ad3f

In accordance with MDEV-15089, and to be consistent with Query_log_event::thread_id, the Gtid_log_event::thread_id should also be 32-bit when writing/reading to the binary log.

andrelkin force-pushed the 11.5-MDEV-33924 branch from 347ad1f to 489ad3f Compare April 23, 2024 15:32
