Support Posix Shared Memory across containers in a pod #28272
Also ref #24588 (comment), in which we also discussed whether we should expose shmsize in pod configuration.
I am not sure I can find that discussion about exposing ShmSize on the Kubernetes API in that issue :( As I understand it, that discussion is about how to use the Docker API after it introduced the ShmSize attribute.
I should say "in which we also mentioned whether we should expose shmsize in container configuration."
@Random-Liu All right, thank you! I missed that point.
@janosi @CsatariGergely - is the 64m default not enough? What would be the best way to make it configurable for your use? (pass a parameter on the kubelet command line?)
@dims Or maybe it is too much, and a waste? ;)
We have a chance to work on this issue now. I would like to align on the design before applying it to the code. Your comments are welcome! The change to the versioned API would be a new field in type PodSpec:

```
// Optional: Docker "--shm-size" support. Defines the size of /dev/shm in a Docker-managed container.
// If not defined here, Docker uses a default value.
// Cannot be updated.
ShmSize *resource.Quantity `json:"shmSize,omitempty"`
```
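For illustration only, a pod manifest using this proposed field might have looked like the sketch below. The `shmSize` field was a proposal in this thread and was never merged into the Kubernetes API, so this is hypothetical:

```
apiVersion: v1
kind: Pod
metadata:
  name: shm-demo
spec:
  shmSize: 256Mi       # hypothetical field from this proposal; not part of the real API
  containers:
  - name: app
    image: nginx       # illustrative image
```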
@janosi Did you have a patch for this? We currently hit this issue running a DB on k8s, and would like to have the shm size configurable.
@ddysher We are working on it. We will send the PRs in the next few weeks.
Just want to chime in that we are hitting this problem as well.
Hi, is there any known workaround for this problem? I need to increase the shm size to at least 2GB, and I have no idea how.
@ddysher @wstrange @gjcarneiro Please share your use cases with @vishh and @derekwaynecarr on the pull request #34928. They have concerns about extending the API with this shmsize option, and they have different solution proposals. They would like to understand whether users really require this on the API, or whether the shm size could be adjusted automatically by k8s to some calculated value.
My use case is a big shared memory database, typically on the order of 1 GiB, but we usually reserve 3 GiB of shared memory space in case it grows. This data is constantly being updated by a writer (a process), and must be made available to readers (other processes). Previously we tried a redis server for this, but the performance of that solution was not great, so shared memory it is. My current workaround is (1) mount a tmpfs volume in /dev/shm, as in this openshift article, and (2) make the writer and reader processes all run in the same container.
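For reference, a minimal sketch of that workaround; the pod name, image, and size are illustrative, and the emptyDir is tmpfs-backed because of `medium: Memory`:

```
apiVersion: v1
kind: Pod
metadata:
  name: shm-writer-reader
spec:
  containers:
  - name: app              # writer and reader processes share this one container
    image: my-shm-app      # illustrative image name
    volumeMounts:
    - mountPath: /dev/shm  # overrides the 64 MB default shm mount
      name: dshm
  volumes:
  - name: dshm
    emptyDir:
      medium: Memory       # tmpfs-backed emptyDir
      sizeLimit: 3Gi       # matches the ~3 GiB reservation described above
```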
My use case is an Apache policy agent plugin that allocates a very large (2GB) cache. I worked around it by setting a very low shm value. This is OK for development, but I need a solution for production. Adjusting the shm size dynamically seems tricky. From my perspective, declaring it as a container resource would be fine.
My use case is running a database application on top of Kubernetes that needs at least 2GB of shared memory. Right now, we just set a large default; it would be nice to have a configurable option.
@ddysher @wstrange @gjcarneiro Do your applications dynamically adjust their behavior based on the shm size? Will they be able to function if the default size is >= the pod's memory limit?
The shm size is configurable only when the application starts (i.e., you can say "only use this much shm"). It cannot be adjusted dynamically.
@wstrange Thanks for clarifying.
Same for me, shm size is a constant in a configuration file.
Great. In that case, kubelet can set the default size of /dev/shm to the pod's memory limit.
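For context, the pod memory limit referenced here is declared as below (names and values are illustrative); under this idea, /dev/shm would default to the `memory` limit:

```
apiVersion: v1
kind: Pod
metadata:
  name: limited-pod
spec:
  containers:
  - name: app
    image: nginx         # illustrative image
    resources:
      limits:
        memory: 2Gi      # /dev/shm would default to this value under the proposal
```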
@vishh What if there is no memory limit imposed on the application? For reference, it looks like Linux defaults /dev/shm to half of the total RAM.
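A quick way to check the effective /dev/shm size from inside a running container (pod and container names are placeholders):

```
kubectl exec my-pod -c my-container -- df -h /dev/shm
```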
@vishh you can close the PR if you think so.
/remove-lifecycle stale It would be nice to get confirmation that this is fixed.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/remove-lifecycle stale |
/remove-lifecycle stale Is this fixed? It seems to be an important issue, since many folks need additional shared memory for running LLM inference: huggingface/accelerate#412 (comment)
For posix, the default posix shared memory volume in openshift is 64 MB. We could solve this problem by defining a larger volume and mounting it at /dev/shm. Or, we can just use sysv with a large enough shmall/shmmax, which we already have. Since sysv is easier to implement, we're doing that here. The details below show how to check the shared memory types and sizes, and possible solutions for both posix and sysv.

We originally had posix for dynamic shared memory:

```
postgres=# show shared_buffers;
 shared_buffers
----------------
 1GB
(1 row)

postgres=# show shared_memory_type;
 shared_memory_type
--------------------
 mmap
(1 row)

postgres=# show dynamic_shared_memory_type;
 dynamic_shared_memory_type
----------------------------
 posix
(1 row)
```

with /dev/shm at only the 64 MB default:

```
sh-4.4$ df /dev/shm
Filesystem     1K-blocks  Used Available Use% Mounted on
shm                65536 10000     55536  16% /dev/shm
```

According to enterprisedb, you can solve this for each dynamic_shared_memory_type:

a) posix: specify a larger volume for "posix" (the default of 64 MB is too small). Add something like this:

```
    volumeMounts:
    - mountPath: /dev/shm
      name: shm
  volumes:
  - name: shm
    emptyDir:
      medium: Memory
      sizeLimit: 1Gi
```

b) sysv: if your shmall/shmmax is large enough in the container, you can use "sysv" for your dynamic_shared_memory_type and you don't need to worry about the volume.

https://www.enterprisedb.com/docs/postgres_for_kubernetes/latest/postgresql_conf/
CrunchyData/postgres-operator#2783
kubernetes/kubernetes#28272 (posix shared memory was implemented in kubernetes here)

Since we had similarly enormous shmall values, we tried "sysv":

```
sh-4.4$ ipcs -lm

------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 18014398509465599
max total shared memory (kbytes) = 18014398509481980
min seg size (bytes) = 1

sh-4.4$ cat /proc/sys/kernel/shmall
18446744073692774399
sh-4.4$ cat /proc/sys/kernel/shmmax
18446744073692774399
```

To do this manually on an existing podified installation, we edited the postgresql-configs ConfigMap and added a new file:

```
data:
  001_yolo_overrides.conf: >
    #------------------------------------------------------------------------------
    dynamic_shared_memory_type = sysv
    #------------------------------------------------------------------------------
  01_miq_overrides.conf: >
    ...
```
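If you apply the sysv override, you can confirm it took effect once the postgres pods restart; the output below is what we would expect, not captured from the thread above:

```
postgres=# show dynamic_shared_memory_type;
 dynamic_shared_memory_type
----------------------------
 sysv
(1 row)
```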
This issue has not been updated in over 1 year, and should be re-triaged. You can:
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/ /remove-triage accepted
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/lifecycle frozen
Docker implemented a modifiable shm size (see 1) in version 1.9. It should be possible to define the shm size of a pod on the API, and Kubernetes should pass this information to Docker.
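For reference, a minimal example of the Docker flag in question; the image, size, and command are illustrative:

```
docker run --rm --shm-size=1g ubuntu df -h /dev/shm
```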