Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gtid_log_event thread_id missing in binlog when pseudo_thread_id is 0 #3215

Open
wants to merge 30 commits into
base: bb-11.5-MDEV-7850-preview
Choose a base branch
from

Conversation

bnestere
Copy link
Contributor

If pseudo_thread_id is set to 0 (or implicitly set to 0 because it was attempted to be set to a negative value), then the thread_id attribute of the Gtid_log_event will not be written at all, whereas the Query_log_event still will write thread_id=0. The underlying problem was that the
FL_EXTRA_THREAD_ID would only be added if thread_id was greater than 0.

This patch fixes this by changing the condition to mark the FL_EXTRA_THREAD_ID when generating a GTID log event, such that it will always be written on the primary, and if on the slave, the rpl_group_info's gtid_ev_flags_extra is checked to see if the flag was present when reading it in.

abarkov and others added 21 commits February 28, 2024 22:20
Under terms of MDEV 27490 we'll add support for non-BMP identifiers
and upgrade casefolding information to Unicode version 14.0.0.
In Unicode-14.0.0 conversion to lower and upper cases can increase octet length
of the string, so conversion won't be possible in-place any more.

This patch removes virtual functions performing in-place casefolding:
  - my_charset_handler_st::casedn_str()
  - my_charset_handler_st::caseup_str()
and fixes the code to use the non-inplace functions instead:
  - my_charset_handler_st::casedn()
  - my_charset_handler_st::caseup()
…r U+0700..U+07FF

New tests display additional information about characters from the BMP range:

- A summary with a COUNT(*) for all distinct combinations of properties
  telling how the "=" and the "LIKE" predicates compare characters to their
  LOWER() and UPPER() variants.

- A detailed list of trciky characters
  for which the "=" and the "LIKE" predicates compare
  LOWER(c)/UPPER(c) variants as not equal to just "c".

Tricky characters include:
 - Turkish letters: ı - small dotless letter i
 - Croatian letters: precombined contractions for Dž, Dz, Lj, Nj
 - Units of measurement: Ω,K,Å (Ohm, Kelvin, Angstrom)
   These ones look very similar to Greek letter Omega,
   Latin letter Kra, Swedish/Finnish letter A with a ring above.
…mp_unicode_impl()

This is a refactoring patch, it does not change the behaviour.
The MTR tests are being added only to cover the LIKE predicate better.
(these tests should have been added earlier under terms of MDEV 9711).
This patch does not need its own specific MTR tests.

Moving the duplicate code into a new shared file ctype-wildcmp.inl
and including it from multiple places, to define the following functions:

- my_wildcmp_uca_impl(), in ctype-uca.c

  For utf8mb3, utf8mb4, ucs2, utf16, utf32, using cs->cset->mb_wc().
  For UCA based collations.

- my_wildcmp_mb2_or_mb4_general_ci_impl(), in ctype-ucs2.c:

  For ucs2, utf16, utf32, using cs->cset->mb_wc().
  For general_ci-style collations:
      - xxx_general_ci
      - xxx_general_mysql500_ci
      - xxx_general_nopad_ci

- my_wildcmp_mb2_or_mb4_bin_impl(), in ctype-ucs2.c:

  For ucs2, utf16, utf32, using cs->cset->mb_wc().
  For _bin collations:
      - xxx_bin
      - xxx_nopad_bin

- my_wildcmp_utf8mb3_general_ci_impl(), in ctype-utf8.c

  Optimized for utf8mb3, using my_mb_wc_utf8mb3_quick().

  For general_ci-style collations:
      - utf8mb3_general_ci
      - utf8mb3_general_mysql500_ci
      - utf8mb3_general_nopad_ci

- my_wildcmp_utf8mb4_general_ci_impl(), in ctype-utf8.c

  Optimized for utf8mb4, using my_mb_wc_utf8mb4_quick().

  For general_ci-style collations:
      - utf8mb4_general_ci
      - utf8mb4_general_nopad_ci
This commit removes the WITH_SSL=<custom_location_of_openssl> option,
leaving only -DWITH_SSL=bundled/system.

The rationale behind this removal is as follows:

1. The WITH_SSL=<custom_location_of_openssl> option is obscure
and not widely used.

2. There is no added value in this option compared to using
OPENSSL_ROOT_DIR. In fact, the availability of "helpful" MySQL options
might discourage users from exploring proper CMake options independently.

3. Users may incorrectly assume full MySQL compatibility even with this
option, including undocumented behaviors such as MySQL's preference for static libraries
with WITH_SSL=<custom_location_of_openssl>.

This change simplifies the configuration options and encourages users to
adopt more standardized and documented practices.
sql_sequence.h:233:19: runtime error: signed integer overflow: -9223372036854775808 + -1 cannot be represented in type 'long long int'

followup for 374783c
also, don't require -DWITH_SSL=system if OPENSSL_ROOT_DIR is specified
The corresponding table param was deprecated as part of MDEV-28861
…ds()

This avoids non-integral types breaking the call of
sequence_structure().
Add doxygen markup so comments get picked up. Also fix minor typos and
expand documentation where relevant.

Signed-off-by: Trevor Gross <tmgross@umich.edu>
It was wrong to derive Item_func_uuid from Item_func_sys_guid,
because the former is a function returning the UUID data type,
while the latter is a string function returning VARCHAR.

As a result of the wrong hierarchy, Item_func_uuid erroneously derived
Item_str_func::fix_fields(), which contains this code:

  /*
    In Item_str_func::check_well_formed_result() we may set null_value
    flag on the same condition as in test() below.
  */
  if (thd->is_strict_mode())
    set_maybe_null();

This code is not relevant to UUID() at all.
A simple fix would be to set_maybe_null(false) in
Item_func_uuid::fix_length_and_dec(). However,
it'd fix only exactly this single consequence of the wrong
class hierarchy, and similar bugs could appear again in
the future. Moreover, we're going to add functions UUIDv4()
and UUIDv7() soon (in 11.6). So it's better to fix the class hierarchy
in the right way before adding these new functions.

Fix:

- Adding a new abstract class Item_fbt_func in the template
  in sql_type_fixedbin.h
- Deriving Item_typecast_fbt from Item_fbt_func
- Deriving Item_func_uuid from Item_fbt_func
- Adding a new helper class UUIDv1. It derives from UUID, and additionally
  initializes the value to "UUID version 1" right in the constructor.
  Note, the new coming soon SQL functions UUIDv4() and UUIDv7()
  will also have corresponding classes UUIDv4 and UUIDv7.

So now UUID() is a pure "returning UUID" function,
like CAST(expr AS UUID) used to be, without any unintentional
artifacts of functions returning VARCHAR/TEXT.

Cleanup:
- Removing the member Item_func_sys_guid::with_dashes,
  as it's not needed any more:
  * Item_func_sys_guid now does not have any descendants any more
  * Item_func_sys_guid::val_str() itself always displays without dashes
We add an extra condition that makes the inequality testing in
SEQUENCE::increment_value() mathematically watertight, and we cast to
and from unsigned in potential underflow and overflow addition and
subtractions to avoid undefined behaviour.

Let's start by distinguishing between c++ expressions and mathematical
expressions. by c++ expression I mean an expression with the outcome
determined by the compiler/runtime. by mathematical expression I mean
an expression whose value is mathematically determined. So a c++
expression -9223372036854775806 - 1000 at worst can evaluate to any
value due to underflow. A mathematical expression -9223372036854775806
- 1000 evaluates to -9223372036854776806.

The problem boils down to how to write a c++ expression equivalent to
an mathematical expression x + y < z where x and z can take any values
of long long int, and y < 0 is also a long long int. Ideally we want
to avoid underflow, but I'm not sure how this can be done.

The correct c++ form should be (x + y < z || x < z - y || x < z).
Let M=9223372036854775808 i.e. LONGLONG_MAX + 1. We have

-M < x < M - 1
-M < y < 0
-M < z < M - 1

Let's consider the case where x + y < z is true as a mathematical
expression.

If the first disjunct underflows, i.e. the mathematical expression x
+ y < -M. If the arbitrary value resulting from the underflow causes
the c++ expression to hold too, then we are done. Otherwise we move
onto the next expression x < z - y. If there's no overflow in z
- y then we are done. If there's overflow i.e. z - y > M - 1,
and the c++ expression evals to false, then we are onto x < z.
There's no over or underflow here, and it will eval to true. To see
this, note that

x + y < -M means x < -M - y < -M - (-M) = 0
z - y > M - 1 means z > y + M - 1 > - M + M - 1 = -1
so x < z.

Now let's consider the case where x + y < z is false as a mathematical
expression.

The first disjunct will not underflow in this case, so we move to (x <
z - y). This will not overflow. To see this, note that

x + y >= z means z - y <= x < M - 1

So it evals to false too. And the third disjunct x < z also evals to
false because x >= z - y > z.

I suspect that in either case the expression x < z does not determine
the final value of the disjunction in the vast majority cases, which
is why we leave it as the final one in case of the rare cases of both
an underflow and an overflow happening.

Here's an example of both underflow and overflow happening and the
added inequality x < z saves the day:

x = - M / 2
y = - M / 2 - 1
z = M / 2

x + y evals to M - 1 which is > z
z - y evals to - M + 1 which is < x

We can do the same to test x + y > z where the increment y is positive:

(x > z - y || x + y > z || x > z)

And the same analysis applies to unsigned cases.
We do this by checking server status. By doing this we avoid printing
session tracking info from previous (but not the last) statement.

The change is from Sergei Golubchik
Values of all session tracking system variables will be sent in the
first ok packet upon connection after successful authentication.

Also updated mtr to print session track info on connection (h/t Sergei
Golubchik) so that we can write mtr tests for this change.
Support index condition pushdown within partitioned tables.
- ha_partition will pass the pushed index condition into all of the used
  partitions.
  - We require that all of the partitions to handle the pushed index
    condition in the same way.
- When using ICP, one may read rows (e.g. call h->index_read_map(buf, ...)
  only to buf= table->record[0], for two reasons:
  * Pushed index condition's Item_field objects point into record[0]
  * InnoDB requires this: it calls offset() which assumes record[0].
  So, when using ICP, ha_partition will read partition records to
  table->record[0] and then will copy record away if it needs it to be
  elsewhere.
Add assertions about limitations one has when using Index Condition
Pushdown:
- add handler::assert_icp_limitations()
- call this function from functions that may attempt violations.

Verified that assert_icp_limitations() as well as calls to it are
compiled away in release build.
This patch also fixes:
  MDEV-33050 Build-in schemas like oracle_schema are accent insensitive
  MDEV-33084 LASTVAL(t1) and LASTVAL(T1) do not work well with lower-case-table-names=0
  MDEV-33085 Tables T1 and t1 do not work well with ENGINE=CSV and lower-case-table-names=0
  MDEV-33086 SHOW OPEN TABLES IN DB1 -- is case insensitive with lower-case-table-names=0
  MDEV-33088 Cannot create triggers in the database `MYSQL`
  MDEV-33103 LOCK TABLE t1 AS t2 -- alias is not case sensitive with lower-case-table-names=0
  MDEV-33109 DROP DATABASE MYSQL -- does not drop SP with lower-case-table-names=0
  MDEV-33110 HANDLER commands are case insensitive with lower-case-table-names=0
  MDEV-33119 User is case insensitive in INFORMATION_SCHEMA.VIEWS
  MDEV-33120 System log table names are case insensitive with lower-cast-table-names=0

- Removing the virtual function strnncoll() from MY_COLLATION_HANDLER

- Adding a wrapper function CHARSET_INFO::streq(), to compare
  two strings for equality. For now it calls strnncoll() internally.
  In the future it will turn into a virtual function.

- Adding new accent sensitive case insensitive collations:
    - utf8mb4_general1400_as_ci
    - utf8mb3_general1400_as_ci
  They implement accent sensitive case insensitive comparison.
  The weight of a character is equal to the code point of its
  upper case variant. These collations use Unicode-14.0.0 casefolding data.

  The result of
     my_charset_utf8mb3_general1400_as_ci.strcoll()
  is very close to the former
     my_charset_utf8mb3_general_ci.strcasecmp()

  There is only a difference in a couple dozen rare characters, because:
    - the switch from "tolower" to "toupper" comparison, to make
      utf8mb3_general1400_as_ci closer to utf8mb3_general_ci
    - the switch from Unicode-3.0.0 to Unicode-14.0.0
  This difference should be tolarable. See the list of affected
  characters in the MDEV description.

  Note, utf8mb4_general1400_as_ci correctly handles non-BMP characters!
  Unlike utf8mb4_general_ci, it does not treat all BMP characters
  as equal.

- Adding classes representing names of the file based database objects:

    Lex_ident_db
    Lex_ident_table
    Lex_ident_trigger

  Their comparison collation depends on the underlying
  file system case sensitivity and on --lower-case-table-names
  and can be either my_charset_bin or my_charset_utf8mb3_general1400_as_ci.

- Adding classes representing names of other database objects,
  whose names have case insensitive comparison style,
  using my_charset_utf8mb3_general1400_as_ci:

  Lex_ident_column
  Lex_ident_sys_var
  Lex_ident_user_var
  Lex_ident_sp_var
  Lex_ident_ps
  Lex_ident_i_s_table
  Lex_ident_window
  Lex_ident_func
  Lex_ident_partition
  Lex_ident_with_element
  Lex_ident_rpl_filter
  Lex_ident_master_info
  Lex_ident_host
  Lex_ident_locale
  Lex_ident_plugin
  Lex_ident_engine
  Lex_ident_server
  Lex_ident_savepoint
  Lex_ident_charset
  engine_option_value::Name

- All the mentioned Lex_ident_xxx classes implement a method streq():

  if (ident1.streq(ident2))
     do_equal();

  This method works as a wrapper for CHARSET_INFO::streq().

- Changing a lot of "LEX_CSTRING name" to "Lex_ident_xxx name"
  in class members and in function/method parameters.

- Replacing all calls like
    system_charset_info->coll->strcasecmp(ident1, ident2)
  to
    ident1.streq(ident2)

- Taking advantage of the c++11 user defined literal operator
  for LEX_CSTRING (see m_strings.h) and Lex_ident_xxx (see lex_ident.h)
  data types. Use example:

  const Lex_ident_column primary_key_name= "PRIMARY"_Lex_ident_column;

  is now a shorter version of:

  const Lex_ident_column primary_key_name=
    Lex_ident_column({STRING_WITH_LEN("PRIMARY")});
it's a pointer into the net buffer, so it might be overwritten by the
next read or write. And the next plugin switch (in multi-auth) will
try to compare it (in send_plugin_request_packet) which is normally
harmless but fails the assert with Lex_ident::is_valid_ident()
vaintroub and others added 4 commits April 18, 2024 21:07
New option works just like --tab, wrt output (sql file for table definition
and tab-separated for data, same options, e.g --parallel)

Compared to --tab it allows --databases and --all-databases.
When --dir is used , it creates directory structure in the output directory,
pointed to by --dir. For every database to be dumped, there will be a
directory with database name.

All options that --tab supports, are also supported by --dir, in particular
--parallel
COM_STMT_BULK_STMT new flag to server to returns all unitary results
it no longer supports TLSv1.0
vuvova and others added 5 commits April 22, 2024 14:59
C/C 3.4 disables mysql_old_password by default, so

add an option for the `connect` command to support specifying
allowed authentication plugins (MARIADB_OPT_RESTRICTED_AUTH).

use it to enable mysql_old_password when needed for testing
… to fail

This was caused by the patch for
MDEV-32567 Remove thr_alarm from server codebase
This patch augments Gtid_log_event with the user thread-id.
In particular that compensates for the loss of this info in
Rows_log_events.

Gtid_log_event::thread_id gets visible in mysqlbinlog output like

  #231025 16:21:45 server id 1  end_log_pos 537 CRC32 0x1cf1d963  GTID 0-1-2 ddl thread_id=10

as 64 bit unsigned integer.

While the size of Gtid event has grown by 8-9 bytes
replication from OLD <-> NEW is not affected by it. This patch
also slightly changes the logic to convert Gtid events to Query
events for older replicas which don't support Gtid. Instead of
hard-coding the padding of the sys var section of the generated
Query event, the length to pad is dynamically calculated based
on the length of the Gtid event.

This work was started by the late Sujatha Sivakumar.
Brandon Nesterenko took it over, reviewed initial patches and extended
the work.

Reviewed-by:
=============
Andrei Elkin <andrei.elkin@mariadb.com>
Kristian Nielsen <knielsen@knielsen-hq.org>
…et to 0 or negative

If pseudo_thread_id is set to 0 (or implicitly set to 0 because
it was attempted to be set to a negative value), then the
thread_id attribute of the Gtid_log_event will not be written
at all, whereas the Query_log_event still will write
thread_id=0.  The underlying problem was that the
FL_EXTRA_THREAD_ID would only be added if thread_id was greater
than 0.

This patch fixes this by changing the condition to mark the
FL_EXTRA_THREAD_ID when generating a GTID log event, such that
it will always be written on the primary, and if on the slave,
the rpl_group_info's gtid_ev_flags_extra is checked to see if
the flag was present when reading it in.

Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
In accordance with MDEV-15089, and to be consistent with
Query_log_event::thread_id, the Gtid_log_event::thread_id
should also be 32-bit when writing/reading to the binary
log.
@CLAassistant
Copy link

CLAassistant commented Apr 23, 2024

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
4 out of 9 committers have signed the CLA.

✅ vaintroub
✅ tgross35
✅ spetrunia
✅ bnestere
❌ abarkov
❌ vuvova
❌ DaveGosselin-MariaDB
❌ montywi
❌ mariadb-YuchenPei
You have signed the CLA already but the status is still pending? Let us recheck it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment