Bind GPUs and workers #2511


vigneshwaran-nv-10329

Distribute workers among multiple GPUs.

tilgovi (Collaborator) commented Feb 16, 2021

You should implement this kind of custom logic with Server Hooks. You can probably use pre_fork or post_fork and child_exit for what you want.
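
For context, these hooks are ordinary functions defined in the Gunicorn config file, and the key detail for this use case is which process each one runs in. A minimal skeleton (the hook bodies are placeholders, not a working implementation):

# gunicorn.conf.py (skeleton only)

def pre_fork(server, worker):
    # Runs in the arbiter (master), just before a worker process is forked.
    pass

def post_fork(server, worker):
    # Runs in the newly forked worker process (before the app is loaded,
    # unless preload_app is enabled).
    pass

def child_exit(server, worker):
    # Runs in the arbiter, just after a worker process has exited.
    pass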

tilgovi closed this Feb 16, 2021
tilgovi (Collaborator) commented Feb 16, 2021

Thank you for taking the time to make a PR, but I don't think this is a generic feature that Gunicorn needs to implement.

benoitc (Owner) commented Feb 16, 2021

@tilgovi I think that should be discussed. I'm curious to hear about the current usage of it. @vignesh34v, can you elaborate?

benoitc reopened this Feb 16, 2021
benoitc (Owner) commented Feb 16, 2021

I think supporting GPUs may be useful, but it would probably be better not to mix the types of processes in any case.

tilgovi (Collaborator) commented Feb 16, 2021

Am I correct that this is just giving each worker an environment variable telling it which GPU to use, and ensuring that this stays balanced as workers start and stop?

benoitc (Owner) commented Feb 16, 2021 via email

Oh, I misread the code then, so yes it can probably be closed. Though I'm still interested in the use case it is trying to solve, if it's a generic pattern.

vigneshwaran-nv-10329 (Author)

Am I correct that this is just giving each worker an environment variable telling it which GPU to use, and ensuring that this stays balanced as workers start and stop?

Yes, it is.

vigneshwaran-nv-10329 (Author)


Use case: Deploying a model across multiple GPUs with multiple workers.

mathematicalmichael

+1 for the use case. Minimal image/video processing apps that can distribute workloads across multiple GPUs are niche for now, but potentially very useful.

tilgovi (Collaborator) commented Aug 20, 2021

As I explained above, it should be possible to use server hooks to implement something like this. Perhaps someone can post an example.

If it's possible to articulate a more general description of the problem, maybe we could implement it in core.

This implementation uses some environment variables that specifically refer to GPUs, but the feature is actually just a balancing feature. It could easily be implemented just by giving each worker an ordinal number.

Does someone want to make an attempt? I'm thinking a minheap / priority queue of numbers [0, num workers). That's maybe not sufficient because I don't think our current strategy for replacing workers when the worker count changes would prioritize killing the highest numbered workers. There's probably other nuance I'm missing.

Anyway, someone should try to implement this with server hooks so that people who want this can have it now without it being in core. And if someone wants to articulate the problem statement and algorithm more generally, that would be great.
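
A rough, untested sketch of that ordinal-number idea, keeping the heap in arbiter memory via the hooks above (it ignores the worker-count-change caveat, and the WORKER_ORDINAL variable name is just an example, not an existing convention):

# gunicorn.conf.py (untested sketch of the ordinal-number idea)
import heapq
import os

workers = 4  # the regular Gunicorn worker-count setting; reused for the heap size

# Min-heap of free ordinals in [0, workers). Lives in the arbiter's memory.
_free_ordinals = list(range(workers))
heapq.heapify(_free_ordinals)

def pre_fork(server, worker):
    # Arbiter: hand the lowest free ordinal to the worker about to be forked.
    worker.ordinal = heapq.heappop(_free_ordinals) if _free_ordinals else None

def post_fork(server, worker):
    # Worker: expose the ordinal to the application, e.g. via the environment.
    if worker.ordinal is not None:
        os.environ["WORKER_ORDINAL"] = str(worker.ordinal)

def child_exit(server, worker):
    # Arbiter: the ordinal becomes free again once the worker has exited.
    ordinal = getattr(worker, "ordinal", None)
    if ordinal is not None:
        heapq.heappush(_free_ordinals, ordinal)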


benoitc closed this May 7, 2023
mathematicalmichael commented May 25, 2023

FWIW, I took a stab at this with ChatGPT. Here's its unedited response:


The Gunicorn maintainer is proposing to use server hooks to balance GPU assignment among workers, rather than hard-coding GPU identifiers into environment variables. This approach leverages the lifecycle of Gunicorn workers to achieve load balancing.

Here is a basic example of how you might use Gunicorn server hooks to balance GPU assignment among workers:

import os
import heapq

gpu_heap = list(range(0, num_gpus))  # replace num_gpus with the number of GPUs

def pre_fork(server, worker):
    # Grab a GPU identifier from the heap before a worker is forked
    if gpu_heap:
        gpu_id = heapq.heappop(gpu_heap)
        worker.gpu_id = gpu_id

def post_fork(server, worker):
    # Set the GPU identifier as an environment variable after a worker is forked
    os.environ['CUDA_VISIBLE_DEVICES'] = str(worker.gpu_id)

def worker_exit(server, worker):
    # When a worker exits, return its GPU identifier to the heap
    heapq.heappush(gpu_heap, worker.gpu_id)
    del os.environ['CUDA_VISIBLE_DEVICES']  # Optionally clear the environment variable

# Then in your Gunicorn config file, add:
# worker_class = 'path.to.your.worker.class'
# on_starting = pre_fork
# post_fork = post_fork
# worker_exit = worker_exit

This script will make use of Python's built-in heap data structure (implemented as a binary heap), which is essentially a priority queue that will always give the minimum value (in this case, the lowest GPU identifier) when popped.

Please note that the workers do not directly share memory, so the gpu_heap in this example would need to be shared across processes or stored in a database that is accessible by all worker processes.

Moreover, this is a high-level example and your real-world use case may require additional complexity. For instance, handling cases when there are more workers than GPU devices, managing GPU memory usage more closely, or handling errors in GPU assignment.

Finally, please be aware that os.environ changes will not propagate to child processes after they have been forked, so setting the CUDA_VISIBLE_DEVICES variable in the post_fork function will only affect the worker process in which it is set. It's usually more straightforward to set this environment variable before starting the Gunicorn server. The example provided is for illustrating the concept.


So I asked it to try addressing those points...


The major challenge is that Gunicorn workers don't directly share memory, and Python multiprocessing utilities like multiprocessing.Value and multiprocessing.Array aren't designed for non-numeric and non-array data structures, like heap.

We can address this limitation by using a third-party library such as Redis, which is an in-memory data structure store. It will act as our shared memory for the GPU heap. Here's an improved script that uses Redis as the shared memory and provides handling for more edge cases:

import os
import redis
from heapq import heappush, heappop

# Connect to Redis server. Make sure the Redis server is running.
r = redis.Redis(host='localhost', port=6379, db=0)

# Initialize GPU heap in Redis.
num_gpus = 4  # replace this with the number of your GPUs
gpu_heap_key = 'gpu_heap'
if not r.exists(gpu_heap_key):
    r.rpush(gpu_heap_key, *range(num_gpus))

def pre_fork(server, worker):
    # Grab a GPU identifier from the heap before a worker is forked.
    # If no GPU is available, assign None.
    worker.gpu_id = None
    if r.llen(gpu_heap_key) > 0:
        worker.gpu_id = int(r.lpop(gpu_heap_key))

def post_fork(server, worker):
    # Set the GPU identifier as an environment variable after a worker is forked.
    # If no GPU is available, this will not set the environment variable.
    if worker.gpu_id is not None:
        os.environ['CUDA_VISIBLE_DEVICES'] = str(worker.gpu_id)

def worker_exit(server, worker):
    # When a worker exits, return its GPU identifier to the heap.
    # If the worker did not have a GPU, this does nothing.
    if worker.gpu_id is not None:
        r.rpush(gpu_heap_key, worker.gpu_id)
        del os.environ['CUDA_VISIBLE_DEVICES']  # Optionally clear the environment variable

# Then in your Gunicorn config file, add:
# worker_class = 'path.to.your.worker.class'
# on_starting = pre_fork
# post_fork = post_fork
# worker_exit = worker_exit

This script connects to a local Redis server and uses a Redis list as the GPU heap. It pops a GPU ID from the list when a worker is forked, and pushes the GPU ID back into the list when the worker exits.

If there are no GPUs available in the heap, the worker's gpu_id attribute is set to None, and the CUDA_VISIBLE_DEVICES environment variable is not set.

Please note that you need to install the redis Python package (with pip install redis or conda install redis) and have a running Redis server to use this script.

In addition, the workers might need additional configuration to use the specific GPU based on the 'CUDA_VISIBLE_DEVICES' variable in their own context.

Again, this is still a simplified example. In a production environment, you might need to handle additional complexities, such as GPU memory management, GPU utilization balancing, error handling and more.


Hopefully this is helpful for anyone coming across this in the future. I certainly learned quite a lot looking back on this PR and trying to understand Server Hooks.

tilgovi (Collaborator) commented Dec 28, 2023

Hah. That's very cool! I think the first example is maybe correct, and ChatGPT expressed too much doubt here. Using the pre_fork hook means the priority queue is in arbiter memory. Passing the value on the worker instance itself is a great way to get the information from the arbiter into the worker. I think the environment variable is never set in the arbiter, so doesn't need to be cleared, but otherwise ChatGPT got it right 😆.
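
Pulling those notes together, here is a consolidated, untested sketch. It keeps all bookkeeping in arbiter memory, returns the GPU slot in child_exit (which runs in the arbiter, unlike worker_exit), skips the environment-variable cleanup, and handles more workers than GPUs by tracking per-GPU load. The num_gpus value is a placeholder.

# gunicorn.conf.py (untested sketch; num_gpus is a placeholder)
import os

num_gpus = 2  # set to the number of GPUs on the host

# Number of live workers currently assigned to each GPU (arbiter memory).
_gpu_load = {gpu: 0 for gpu in range(num_gpus)}

def pre_fork(server, worker):
    # Arbiter: pick the least-loaded GPU for the worker about to be forked.
    gpu = min(_gpu_load, key=_gpu_load.get)
    _gpu_load[gpu] += 1
    worker.gpu_id = gpu

def post_fork(server, worker):
    # Worker: pin this process to its GPU. Assumes the app (and CUDA) is
    # loaded per worker, i.e. preload_app is not enabled.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(worker.gpu_id)

def child_exit(server, worker):
    # Arbiter: release the GPU slot when the worker exits, so replacement
    # workers stay balanced.
    gpu = getattr(worker, "gpu_id", None)
    if gpu is not None:
        _gpu_load[gpu] -= 1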

mathematicalmichael

😆 What a wild ride it's been, eh?
I think this was one of the earliest examples of me genuinely learning something new with it, as this issue stuck around in my mind for years and I re-encountered it while serving a stable-diffusion app.
