Skip to content

CRITICAL WORKER TIMEOUT when running Flask app #1801

Closed
@bigunyak

Description

@bigunyak

It seems there have been already several reports related to [CRITICAL] WORKER TIMEOUT error but it just keeps popping up. Here is my issue.

I'm running this Flask hello world application.

from flask import Flask
application = Flask(__name__)

@application.route('/')
def hello_world():
    return 'Hello, World!'

The gunicorn command is this one:

gunicorn -b 0.0.0.0:5000 --log-level=debug hello

And this is the console output:

[2018-06-05 14:56:21 +0200] [11229] [INFO] Starting gunicorn 19.8.1
[2018-06-05 14:56:21 +0200] [11229] [DEBUG] Arbiter booted
[2018-06-05 14:56:21 +0200] [11229] [INFO] Listening at: http://0.0.0.0:5000 (11229)
[2018-06-05 14:56:21 +0200] [11229] [INFO] Using worker: sync
[2018-06-05 14:56:21 +0200] [11232] [INFO] Booting worker with pid: 11232
[2018-06-05 14:56:21 +0200] [11229] [DEBUG] 1 workers
[2018-06-05 14:56:32 +0200] [11232] [DEBUG] GET /
[2018-06-05 14:56:57 +0200] [11232] [DEBUG] Closing connection. 
[2018-06-05 14:57:16 +0200] [11232] [DEBUG] GET /
[2018-06-05 14:57:47 +0200] [11229] [CRITICAL] WORKER TIMEOUT (pid:11232)
[2018-06-05 14:57:47 +0200] [11232] [INFO] Worker exiting (pid: 11232)
[2018-06-05 14:57:47 +0200] [11324] [INFO] Booting worker with pid: 11324

Can you please clearly explain why do I get the error and if it's expected in this example? How do I fix it or if it's an expected behavior why critical error then?

Activity

tilgovi

tilgovi commented on Jun 5, 2018

@tilgovi
Collaborator

The error is not expected, but there is nothing from your example that shows why it happens. Tell us more about your environment.

  • What client is used to connect to Gunicorn?
  • What reverse proxy, if any, is used to connect to Gunicorn?
bigunyak

bigunyak commented on Jun 6, 2018

@bigunyak
Author

Tell us more about your environment.

  • What client is used to connect to Gunicorn?
    => I just send request from Chromium: http://localhost:5000/
  • What reverse proxy, if any, is used to connect to Gunicorn?
    => No proxy, just Gunicorn + Flask

I've just reproduced the problem on a completely fresh setup, here are the steps:

mkdir gunicorn
cd gunicorn/
pipenv --python 3.6
pipenv install flask
pipenv install gunicorn
vim hello.py
pipenv shell
gunicorn -b 0.0.0.0:5000 --log-level=debug hello

The hello.py is exactly the same Flask application as I posted in the initial report.
Below is the complete log.

~$ gunicorn -b 0.0.0.0:5000 --log-level=debug hello
[2018-06-06 09:16:21 +0200] [19829] [DEBUG] Current configuration:
  config: None
  bind: ['0.0.0.0:5000']
  backlog: 2048
  workers: 1
  worker_class: sync
  threads: 1
  worker_connections: 1000
  max_requests: 0
  max_requests_jitter: 0
  timeout: 30
  graceful_timeout: 30
  keepalive: 2
  limit_request_line: 4094
  limit_request_fields: 100
  limit_request_field_size: 8190
  reload: False
  reload_engine: auto
  reload_extra_files: []
  spew: False
  check_config: False
  preload_app: False
  sendfile: None
  reuse_port: False
  chdir: /home/dima/work/gunicorn
  daemon: False
  raw_env: []
  pidfile: None
  worker_tmp_dir: None
  user: 1000
  group: 985
  umask: 0
  initgroups: False
  tmp_upload_dir: None
  secure_scheme_headers: {'X-FORWARDED-PROTOCOL': 'ssl', 'X-FORWARDED-PROTO': 'https', 'X-FORWARDED-SSL': 'on'}
  forwarded_allow_ips: ['127.0.0.1']
  accesslog: None
  disable_redirect_access_to_syslog: False
  access_log_format: %(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s"
  errorlog: -
  loglevel: debug
  capture_output: False
  logger_class: gunicorn.glogging.Logger
  logconfig: None
  logconfig_dict: {}
  syslog_addr: udp://localhost:514
  syslog: False
  syslog_prefix: None
  syslog_facility: user
  enable_stdio_inheritance: False
  statsd_host: None
  statsd_prefix: 
  proc_name: None
  default_proc_name: hello
  pythonpath: None
  paste: None
  on_starting: <function OnStarting.on_starting at 0x7f9757112d08>
  on_reload: <function OnReload.on_reload at 0x7f9757112e18>
  when_ready: <function WhenReady.when_ready at 0x7f9757112f28>
  pre_fork: <function Prefork.pre_fork at 0x7f9756c230d0>
  post_fork: <function Postfork.post_fork at 0x7f9756c231e0>
  post_worker_init: <function PostWorkerInit.post_worker_init at 0x7f9756c232f0>
  worker_int: <function WorkerInt.worker_int at 0x7f9756c23400>
  worker_abort: <function WorkerAbort.worker_abort at 0x7f9756c23510>
  pre_exec: <function PreExec.pre_exec at 0x7f9756c23620>
  pre_request: <function PreRequest.pre_request at 0x7f9756c23730>
  post_request: <function PostRequest.post_request at 0x7f9756c237b8>
  child_exit: <function ChildExit.child_exit at 0x7f9756c238c8>
  worker_exit: <function WorkerExit.worker_exit at 0x7f9756c239d8>
  nworkers_changed: <function NumWorkersChanged.nworkers_changed at 0x7f9756c23ae8>
  on_exit: <function OnExit.on_exit at 0x7f9756c23bf8>
  proxy_protocol: False
  proxy_allow_ips: ['127.0.0.1']
  keyfile: None
  certfile: None
  ssl_version: 2
  cert_reqs: 0
  ca_certs: None
  suppress_ragged_eofs: True
  do_handshake_on_connect: False
  ciphers: TLSv1
  raw_paste_global_conf: []
[2018-06-06 09:16:21 +0200] [19829] [INFO] Starting gunicorn 19.8.1
[2018-06-06 09:16:21 +0200] [19829] [DEBUG] Arbiter booted
[2018-06-06 09:16:21 +0200] [19829] [INFO] Listening at: http://0.0.0.0:5000 (19829)
[2018-06-06 09:16:21 +0200] [19829] [INFO] Using worker: sync
[2018-06-06 09:16:21 +0200] [19832] [INFO] Booting worker with pid: 19832
[2018-06-06 09:16:22 +0200] [19829] [DEBUG] 1 workers
[2018-06-06 09:16:48 +0200] [19832] [DEBUG] GET /
[2018-06-06 09:17:19 +0200] [19829] [CRITICAL] WORKER TIMEOUT (pid:19832)
[2018-06-06 09:17:19 +0200] [19832] [INFO] Worker exiting (pid: 19832)
[2018-06-06 09:17:19 +0200] [19872] [INFO] Booting worker with pid: 19872
^C[2018-06-06 09:17:26 +0200] [19829] [INFO] Handling signal: int
[2018-06-06 09:17:26 +0200] [19872] [INFO] Worker exiting (pid: 19872)
[2018-06-06 09:17:26 +0200] [19829] [INFO] Shutting down: Master
~$ pip list
Package      Version
------------ -------
click        6.7    
Flask        1.0.2  
gunicorn     19.8.1 
itsdangerous 0.24   
Jinja2       2.10   
MarkupSafe   1.0    
pip          10.0.1 
setuptools   39.2.0 
Werkzeug     0.14.1 
wheel        0.31.1
yunstanford

yunstanford commented on Jun 8, 2018

@yunstanford
Contributor

@bigunyak I think it's because of the default timeout, your worker has been silent for 30s. http://docs.gunicorn.org/en/stable/settings.html#timeout

From your log,

[2018-06-05 14:57:16 +0200] [11232] [DEBUG] GET /
[2018-06-05 14:57:47 +0200] [11229] [CRITICAL] WORKER TIMEOUT (pid:11232)
[2018-06-06 09:16:48 +0200] [19832] [DEBUG] GET /
[2018-06-06 09:17:19 +0200] [19829] [CRITICAL] WORKER TIMEOUT (pid:19832)
yunstanford

yunstanford commented on Jun 8, 2018

@yunstanford
Contributor

I'm seeing the same thing: workers are timing out even when serving no requests, with sync worker.
To that sense, the critical level log is quite confusing.

Try use Gevent worker could solve this.

bigunyak

bigunyak commented on Jun 8, 2018

@bigunyak
Author

To that sense, the critical level log is quite confusing.

Exactly, that was my original question: if it's an expected behavior why critical error then?
Would be also nice to get some background on why workers need to be restarted, maybe this could be added to the Design document.

berkerpeksag

berkerpeksag commented on Jun 8, 2018

@berkerpeksag
Collaborator

I'm seeing this as well (reproduced by using examples/test.py with gunicorn test:app -b localhost:9595 --log-level=debug --timeout=5) and I agree that the critical level is a bit confusing. I'd be OK change it to debug level. @benoitc @tilgovi what do you think?

yunstanford

yunstanford commented on Jun 9, 2018

@yunstanford
Contributor

I'm thinking info level might be a little bit better.

neocolor

neocolor commented on Jun 14, 2018

@neocolor

I had the same with MSYS2 on Win10 but finally could solved.

in notify() of ...\gunicorn\workers\workertmp.py, os.fchmod is used originally. But it does not work in MSYS. Instead of os.fchmod, I used os.utime. The code is followed. I think it could work for all platform.

    def notify(self):
        try:
            self.spinner = (self.spinner + 1) % 2
            os.fchmod(self._tmp.fileno(), self.spinner)
            if PLATFORM.startswith('MSYS') :
                os.utime(self._tmp.fileno(), None)
        except AttributeError:
            # python < 2.6
            self._tmp.truncate(0)
            os.write(self._tmp.fileno(), b"X")
benoitc

benoitc commented on Jun 20, 2018

@benoitc
Owner

@berkerpeksag I wouldn't expect that the worker exit because no requests happen. This error should only happen if the worker has been kept busy for a time > to the timeout. So the error is critical. Imo we should improve the documentation to provide more use cases and responses to such errors.

If the error still happen when the worker is not kept busy then there is something else happening and we have probably a bug.

self-assigned this
on Jul 4, 2018
kozlek

kozlek commented on Jul 6, 2018

@kozlek

[EDIT]
Same bug for me.
With Django 1.10 / gunicorn 19.6.0 / Python 2.7.15 in python2.7-alpine on Debian 8.8 and stock kernel 3.16, all was working fine.
After updating to Django 1.11 and gunicorn 19.8.1, the workers keep failing at boot with [CRITICAL WORKER TIMEOUT].
Downgrading gunicorn to 19.6.0 doesn't fix the problem.
We updated the host kernel to 4.9.0 (it was scheduled), and the workers successfully booted without errors.
But:

  • we use 4 workers
  • after exactly 4 calls to the Django app, next calls will timeout, showing [CRITICAL WORKER TIMEOUT] in the logs
  • the linux top command shows 4 gunicorn processes stuck with pretty high CPU consumption.

We're going to try gunicorn gevent to see if we'are able to get our app back online.

kozlek

kozlek commented on Jul 6, 2018

@kozlek

Using gunicorn with gevent didn't fixed the bug.

baeriswyld

baeriswyld commented on Aug 2, 2018

@baeriswyld

any update on this issue ?

92 remaining items

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

Projects

No projects

Milestone

No milestone

Relationships

None yet

    Development

    No branches or pull requests

      Participants

      @benoitc@bobf@tilgovi@berkerpeksag@bigunyak

      Issue actions

        CRITICAL WORKER TIMEOUT when running Flask app · Issue #1801 · benoitc/gunicorn