
Add exception catching for when Redis server is down #1153

Closed
corynezin opened this issue Oct 25, 2019 · 3 comments · Fixed by #1387 · May be fixed by #1261

Comments

@corynezin
Contributor

After my Redis server went down briefly, it seems all of my workers were killed as a result of this error:

10:33:55 Worker rq:worker:7870f7200ab64bdfae843ce5f4dc5ae0: found an unhandled exception, quitting...
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/rq/worker.py", line 470, in work
    result = self.dequeue_job_and_maintain_ttl(timeout)
  File "/usr/local/lib/python3.6/site-packages/rq/worker.py", line 521, in dequeue_job_and_maintain_ttl
    job_class=self.job_class)
  File "/usr/local/lib/python3.6/site-packages/rq/queue.py", line 469, in dequeue_any
    result = cls.lpop(queue_keys, timeout, connection=connection)
  File "/usr/local/lib/python3.6/site-packages/rq/queue.py", line 441, in lpop
    result = connection.blpop(queue_keys, timeout)
  File "/usr/local/lib/python3.6/site-packages/redis/client.py", line 1618, in blpop
    return self.execute_command('BLPOP', *keys)
  File "/usr/local/lib/python3.6/site-packages/redis/client.py", line 839, in execute_command
    return self.parse_response(conn, command_name, **options)
  File "/usr/local/lib/python3.6/site-packages/redis/client.py", line 853, in parse_response
    response = connection.read_response()
  File "/usr/local/lib/python3.6/site-packages/redis/sentinel.py", line 55, in read_response
    return super(SentinelManagedConnection, self).read_response()
  File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 699, in read_response
    response = self._parser.read_response()
  File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 309, in read_response
    response = self._buffer.readline()
  File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 241, in readline
    self._read_from_socket()
  File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 186, in _read_from_socket
    raise ConnectionError(SERVER_CLOSED_CONNECTION_ERROR)

I would like them to stay alive, even if they are not working.

Could we add a try/except block around lines like this one?

rq/queue.py, line 441 (at 75644ba):

    result = connection.blpop(queue_keys, timeout)
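A minimal sketch of what such guarding might look like (blpop_with_retry is a hypothetical helper, and the 5-second delay is an illustrative choice, not rq's actual behavior):

    import time

    import redis

    def blpop_with_retry(connection, queue_keys, timeout, retry_delay=5):
        # Poll the queues the way rq does, but survive a temporary Redis
        # outage by sleeping and retrying instead of letting the
        # ConnectionError escape and kill the worker.
        while True:
            try:
                return connection.blpop(queue_keys, timeout)
            except redis.exceptions.ConnectionError:
                time.sleep(retry_delay)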

@selwin
Collaborator

selwin commented Dec 8, 2019

Using a process manager like systemd would help you with this.

I think changing the behavior to have workers sleep for a few seconds before retrying would also be good. Please open a PR for this.
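In the meantime, a worker can be kept alive without changing rq itself by wrapping the work loop from the outside (a rough sketch; run_worker_forever and the 5-second interval are assumptions, not rq's API):

    import time

    import redis

    def run_worker_forever(worker, retry_interval=5):
        # Re-enter Worker.work() whenever the Redis connection drops,
        # instead of letting the worker process exit.
        while True:
            try:
                worker.work()
                break  # work() returned normally, i.e. a clean shutdown
            except redis.exceptions.ConnectionError:
                time.sleep(retry_interval)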

@corynezin
Contributor Author

I did consider systemd, but it was too much of a pain to set up on Docker, where my application is running. I will look into opening a PR.

@mdawar
Contributor

mdawar commented Jun 23, 2020

> I did consider systemd, but it was too much of a pain to set up on Docker, where my application is running. I will look into opening a PR.

If you're using Docker, why not rely on Docker's restart policy? For example, --restart on-failure or --restart unless-stopped; this way the worker containers will be restarted by Docker on failure.
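For instance (the image and container names here are placeholders):

    docker run -d --name rq-worker --restart unless-stopped my-rq-worker-image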

Asrst added a commit to Asrst/rq that referenced this issue Dec 3, 2020
solves issues rq#1153, rq#998
rq workers not auto-reconnecting to the Redis server in case it is down/restarted.