Bind GPUs and workers #2511
Conversation
You should implement this kind of custom logic with Server Hooks. |
Thank you for taking the time to make a PR, but I don't think this is a generic feature that Gunicorn needs to implement. |
@tilgovi I think that should be discussed. I'm curious to hear about the current usage of it. @vignesh34v can you elaborate? |
I think supporting GPUs may be useful, but it would probably be better not to mix the types of processes in any case. |
Am I correct that this is just giving each worker an environment variable telling it which GPU to use, and ensuring that this stays balanced as workers start and stop? |
Oh, I misread the code then. So yes, it can probably be closed, though I'm still interested in the use case it's trying to solve, if it's a generic pattern.
|
Yes, it is. |
Use case: Deploying a model across multiple GPUs with multiple workers. |
+1 for the use case. Minimal image/video processing apps that can distribute workloads to multiple GPUs are niche for now, but potentially very useful. |
As I explained above, it should be possible to use server hooks to implement something like this. Perhaps someone can post an example. If it's possible to articulate a more general description of the problem, maybe we could implement it in core. This implementation uses some environment variables that specifically refer to GPUs, but it is really just a balancing feature: it could be implemented simply by giving each worker an ordinal number. Does someone want to make an attempt? I'm thinking a min-heap / priority queue of numbers in [0, num workers). That's maybe not sufficient, because I don't think our current strategy for replacing workers when the worker count changes would prioritize killing the highest-numbered workers. There's probably other nuance I'm missing. Anyway, someone should try to implement this with server hooks so that people who want this can have it now without it being in core. And if someone wants to articulate the problem statement and algorithm more generally, that would be great. |
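As a concrete sketch of the ordinal idea described above (assuming Gunicorn's documented `pre_fork`/`post_fork`/`child_exit` hooks; the `WORKER_ORDINAL` variable name and the fixed worker count are made up for illustration, not part of this PR):

```python
# gunicorn_conf.py -- sketch: give each worker the lowest free ordinal
import heapq
import os

workers = 4  # Gunicorn's worker-count setting, reused as the ordinal range
_free_ordinals = list(range(workers))  # a sorted list is a valid min-heap

def pre_fork(server, worker):
    # Runs in the master: reserve the smallest free ordinal for this worker.
    worker.ordinal = heapq.heappop(_free_ordinals) if _free_ordinals else None

def post_fork(server, worker):
    # Runs in the worker: expose the ordinal (an app could map it to a GPU,
    # e.g. gpu = ordinal % num_gpus).
    if worker.ordinal is not None:
        os.environ["WORKER_ORDINAL"] = str(worker.ordinal)

def child_exit(server, worker):
    # Runs in the master after a worker exits: recycle its ordinal.
    if getattr(worker, "ordinal", None) is not None:
        heapq.heappush(_free_ordinals, worker.ordinal)
```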
fwiw, I took a stab at this with ChatGPT. Here's its response: The Gunicorn maintainer is proposing to use server hooks to balance GPU assignment among workers, rather than hard-coding GPU identifiers into environment variables. This approach leverages the lifecycle of Gunicorn workers to achieve load balancing. Here is a basic example of how you might use Gunicorn server hooks to balance GPU assignment among workers:

```python
import os
import heapq

num_gpus = 4  # replace with the actual number of GPUs
gpu_heap = list(range(num_gpus))  # a sorted list is already a valid min-heap

def pre_fork(server, worker):
    # Runs in the master process: grab the smallest free GPU identifier
    # before the worker is forked, or None if all GPUs are taken.
    worker.gpu_id = heapq.heappop(gpu_heap) if gpu_heap else None

def post_fork(server, worker):
    # Runs in the worker process: expose the assigned GPU identifier.
    if worker.gpu_id is not None:
        os.environ['CUDA_VISIBLE_DEVICES'] = str(worker.gpu_id)

def child_exit(server, worker):
    # Runs in the master process after a worker exits: return its GPU
    # identifier to the heap. (worker_exit runs in the worker process,
    # where the master's heap is not visible.)
    if worker.gpu_id is not None:
        heapq.heappush(gpu_heap, worker.gpu_id)

# If this script is your Gunicorn config file, the hook functions above
# are picked up automatically by their names.
```

This script makes use of Python's built-in heap data structure (implemented as a binary heap), which is essentially a priority queue that will always give the minimum value (in this case, the lowest GPU identifier) when popped. Please note that the workers do not directly share memory, so the `gpu_heap` list can only be safely updated from hooks that run in the master process. Moreover, this is a high-level example and your real-world use case may require additional complexity: for instance, handling cases when there are more workers than GPU devices, managing GPU memory usage more closely, or handling errors in GPU assignment.

So I asked it to try addressing those points...

The major challenge is that Gunicorn workers don't directly share memory, so a plain in-process data structure like the heap above won't be shared across them. We can address this limitation by using a third-party library such as Redis, which is an in-memory data structure store. It will act as our shared memory for the pool of free GPUs. Here's an improved script that uses Redis as the shared memory and provides handling for more edge cases:

```python
import os
import redis
# Connect to Redis server. Make sure the Redis server is running.
r = redis.Redis(host='localhost', port=6379, db=0)
# Initialize the pool of free GPU ids in Redis (a plain list, used FIFO).
num_gpus = 4  # replace this with the number of your GPUs
gpu_heap_key = 'gpu_heap'
if not r.exists(gpu_heap_key):
    r.rpush(gpu_heap_key, *range(num_gpus))
def pre_fork(server, worker):
    # Runs in the master: grab a GPU identifier from the pool before
    # the worker is forked. If no GPU is available, assign None.
    worker.gpu_id = None
    if r.llen(gpu_heap_key) > 0:
        worker.gpu_id = int(r.lpop(gpu_heap_key))
def post_fork(server, worker):
    # Runs in the worker: set the GPU identifier as an environment
    # variable. If no GPU was available, the variable is left unset.
    if worker.gpu_id is not None:
        os.environ['CUDA_VISIBLE_DEVICES'] = str(worker.gpu_id)
def worker_exit(server, worker):
    # Runs in the exiting worker: return its GPU identifier to the pool.
    # If the worker did not have a GPU, this does nothing. (child_exit,
    # which runs in the master, is more robust if workers can be killed
    # abruptly and never reach this hook.)
    if worker.gpu_id is not None:
        r.rpush(gpu_heap_key, worker.gpu_id)
        os.environ.pop('CUDA_VISIBLE_DEVICES', None)  # optionally clear
# If this script is your Gunicorn config file, the hook functions above
# are picked up automatically by their names.
```

This script connects to a local Redis server and uses a Redis list as the pool of free GPU IDs. It pops a GPU ID from the list when a worker is forked, and pushes the GPU ID back onto the list when the worker exits. If there are no GPUs available in the pool, the worker's `gpu_id` remains `None`. Please note that you need to install the `redis` Python package and have a Redis server running. In addition, the workers might need additional configuration to actually use the specific GPU indicated by the `CUDA_VISIBLE_DEVICES` variable in their own context. Again, this is still a simplified example; in a production environment, you might need to handle additional complexities such as GPU memory management, GPU utilization balancing, error handling, and more.

Hopefully this is helpful for anyone coming across this in the future. I certainly learned quite a lot looking back on this PR and trying to understand Server Hooks. |
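For completeness, here is a minimal worker-side sketch (not from the PR; the `app.py` name is hypothetical) showing a WSGI app that reports which GPU its worker was assigned by the hooks above, runnable with e.g. `gunicorn -c gunicorn_conf.py app:app`:

```python
# app.py -- hypothetical worker-side view of the assignment made in post_fork
import os

def app(environ, start_response):
    # CUDA_VISIBLE_DEVICES was set (or left unset) per worker by the hooks
    gpu = os.environ.get("CUDA_VISIBLE_DEVICES", "none")
    body = ("served by worker on GPU %s\n" % gpu).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```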
Hah. That's very cool! I think the first example is maybe correct, and ChatGPT expressed too much doubt here. |
😆 what a wild ride it's been, eh? |
Distribute workers among multiple GPUs.