Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve scheduler entrypoint #846

Closed
wants to merge 4 commits into from

Conversation

ppenguin
Copy link

Fixes poorly defined behaviour when the scheduler main process crashes.

The original code uses wait $pid to wait until the main.py process exits. This means that if /var/run/bunkerweb/scheduler.pid still exists after wait $pid returns, the contained pid therein is per definition invalid. I.e. if we should assume that a clean exit of main.py means the pid file should have been cleaned up, its mere existence at this point indicates a crash of the process.

This PR fixes the logic issues and makes crashes visible in the logs. It additionally tries to restart a crashed process after a delay.

Possibly usage of the /var/tmp/bunkerweb/scheduler.healthy file can also be removed (if its only function was to check for crashes), and the entrypoint could be made a bit leaner still.

Bonus info: the scheduler process crashed silently (in a podman container) when memory was limited to 100M, with >= 150M it appears to work.

Fixes poorly defined behaviour when the scheduler main process crashes.
@ppenguin
Copy link
Author

(I just noticed that the bw/entrypoint.sh suffers from the same issue. I.e. it can end up in an endless loop for an invalid pid file)

@ppenguin
Copy link
Author

... but the "fix" for the BW entrypoint doesn't seem to do the right thing: from the nginx main processes POV it is ok, but the problem seems to be that if BW restarts a crashed nginx process by itself rather than being restarted by the scheduler, the scheduler doesn't know it should then immediately initiate the config requests to the BW API and the new BW process is then only running in its default config.

So let's keep the scope of this PR limited to the scheduler's entrypoint, and solve the bw/entrypoint later.

@TheophileDiot TheophileDiot self-assigned this Jan 12, 2024
@TheophileDiot TheophileDiot changed the base branch from master to dev January 12, 2024 15:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants