-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MDEV-34129 mariadb-install-db appears to hang on macOS #3253
base: 10.5
Are you sure you want to change the base?
Conversation
|
60e2be0
to
45279cb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the fix. What do you mean by "we could fail to initialize the signal handler but mark it as active anyway"? How could it have happened?
static void wait_for_signal_thread_to_end() | ||
{ | ||
uint i, n_waits= DBUG_EVALUATE("force_sighup_processing_timeout", 5, 100); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you removed "force_sighup_processing_timeout", but it's used in the test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed that part of the test, too, because as far as I can see it wasn't necessary; it was only needed because we had to wait, but we had to wait because of the bug that was in the signal_hand
function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should add that by using pthread_join
, we'll wait for all SIGHUP
processing to finish, so there's no need for a special timeout. Joining pthreads is considered a best practice for the pthread library (unless the thread is created in a detached state, which isn't the case here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rpl.rpl_shutdown_sighup
continues to succeed with these changes, because the SIGHUP
processing concludes as expected, without the unnecessary extra waiting semantics. Those were brittle because sleep
s (and equivalent) are not reliable for multithreaded coordination.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I asked specifically because you did not remove that part of the test. It's line 95
https://github.com/MariaDB/server/pull/3253/files#diff-5186f012c3459b3333465ba9ba04df6972c376d56f8d591967e4fc13022b9498R95
The signal handler will fail to initialize if |
static void wait_for_signal_thread_to_end() | ||
{ | ||
uint i, n_waits= DBUG_EVALUATE("force_sighup_processing_timeout", 5, 100); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I asked specifically because you did not remove that part of the test. It's line 95
https://github.com/MariaDB/server/pull/3253/files#diff-5186f012c3459b3333465ba9ba04df6972c376d56f8d591967e4fc13022b9498R95
signal_thread_in_use= 1; | ||
|
||
// initialize this newly created thread | ||
if (my_thread_init()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how can this fail? what's the realistic scenario when it fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can fail only in one of two ways:
- calloc fails (extremely unlikely)
- !my_thread_global_init_done is true at start of my_thread_init
I originally wrote this on 11.4 and ported it back to 10.5 before posting. Now running on latest 11.4, there are new macOS test failures. I cannot be sure that there was a real world failure, and the slow startup and 'hanging' install are both solved without this change. From the code perspective, calling my_end
(which calls my_thread_global_end
and sets my_thread_global_init_done
to 0) will cause my_thread_init
to return with an error, leaving the thread started but not initialized. In the server, we call my_end
only once but in tests we call it in multiple places. During initial development, I had an abort()
in the body of this conditional to see if we had a case where the init would fail, and I'm certain I saw it, but now cannot replicate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With all that said, IMHO we should ship this fragment with the rest of the PR because there technically is a problem here without it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- my_thread_global_init_done is true at start of my_thread_init
is false, you mean. And I don't see how it can happen. The server calls my_init()
at the beginning of mysqld_main()
May be it's set back to false because of some early shutdown? This sounds more plausible, but I still couldn't find how it could happen
45279cb
to
adc58ac
Compare
Ahh... this was an oversight on my part, I removed that in the latest diff. |
An overcomplicated implementation of wait_for_signal_handler_to_end wouldn't join the signal_hand thread, causing the server to exit with an error. Additionally, let's immediately close down the signal handler loop when we decide to break connections--it's the start of process termination anyway, and there's no need to wait once we've invoked break_connections.
adc58ac
to
94595c0
Compare
An overcomplicated implementation of
wait_for_signal_handler_to_end
wouldn't join thesignal_hand
thread, causing the server to exit with an error.Additionally, let's immediately close down the signal handler loop when we decide to break connections--it's the start of process termination anyway, and there's no need to wait once we've invoked break_connections.