Support Posix Shared Memory across containers in a pod #28272

Open
CsatariGergely opened this issue Jun 30, 2016 · 95 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. sig/node Categorizes an issue or PR as relevant to SIG Node. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@CsatariGergely

Docker implemented a modifiable shm size (see 1) in version 1.9. It should be possible to define the shm size of a pod in the API, and Kubernetes shall pass this information to Docker.
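For reference, this is the Docker-level knob being discussed; a minimal sketch, with the 2g value purely illustrative:

```sh
# Run a container with /dev/shm sized to 2 GiB instead of Docker's 64 MB default,
# then print the size of /dev/shm as seen inside the container.
docker run --rm --shm-size=2g busybox df -h /dev/shm
```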

@Random-Liu
Member

Random-Liu commented Jun 30, 2016

Also ref #24588 (comment), in which we also discussed whether we should expose shmsize in pod configuration.

@janosi
Contributor

janosi commented Jun 30, 2016

I am not sure I can see that discussion about exposing ShmSize on the Kubernetes API in that issue :( As I understand it, that discussion is about how to use the Docker API after it introduced the ShmSize attribute.

@Random-Liu
Member

> I would like kube to set an explicit default ShmSize using the option 1 proposed by @Random-Liu and I wonder if we should look to expose ShmSize as a per container option in the future.

I should say "in which we also mentioned whether we should expose shmsize in container configuration."

@janosi
Contributor

janosi commented Jun 30, 2016

@Random-Liu All right, thank you! I missed that point.

@j3ffml j3ffml added sig/node Categorizes an issue or PR as relevant to SIG Node. team/ux labels Jun 30, 2016
@dims
Member

dims commented Jul 8, 2016

@janosi @CsatariGergely - is the 64m default not enough? What would be the best way to make it configurable for your use? (Pass a parameter on the kubelet command line?)

@janosi
Contributor

janosi commented Jul 11, 2016

@dims Or maybe it is too much to waste? ;)
But yes, sometimes 64m is not enough.
We would prefer a new optional attribute for the pod in PodSpec in the API, e.g. "shmSize".
As shm is shared among containers in the pod, PodSpec would be the appropriate place, I think.

@pwittrock pwittrock removed the team/ux label Jul 18, 2016
@janosi
Contributor

janosi commented Sep 2, 2016

We have a chance to work on this issue now. I would like to align on the design before applying it to the code. Your comments are welcome!

The change to the versioned API would be a new field in type PodSpec:

// Optional: Docker "--shm-size" support. Defines the size of /dev/shm in a Docker-managed container.
// If not defined here, Docker uses a default value.
// Cannot be updated.
ShmSize *resource.Quantity `json:"shmSize,omitempty"`
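For illustration only, here is a hypothetical pod manifest using the field proposed above. This assumes the proposal exactly as written; shmSize is not an existing Kubernetes API field.

```yaml
# Hypothetical manifest assuming the proposed PodSpec field above; shmSize is NOT a real Kubernetes field.
apiVersion: v1
kind: Pod
metadata:
  name: shm-demo
spec:
  shmSize: 1Gi          # the proposed pod-level shm size
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "df -h /dev/shm && sleep 3600"]
```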

@ddysher
Contributor

ddysher commented Sep 21, 2016

@janosi Did you have a patch for this? We currently hit this issue running a DB on k8s and would like to have the shm size configurable.

@janosi
Contributor

janosi commented Sep 22, 2016

@ddysher We are working on it. We will send the PRs in the next few weeks.

@wstrange
Contributor

wstrange commented Oct 5, 2016

Just want to chime in that we are hitting this problem as well

@gjcarneiro

Hi, is there any known workaround for this problem? I need to increase the shmem size to at least 2GB, and I have no idea how.

@janosi
Contributor

janosi commented Nov 11, 2016

@ddysher @wstrange @gjcarneiro Please share your use cases with @vishh and @derekwaynecarr on pull request #34928. They have concerns about extending the API with this shmsize option, and they have different solution proposals. They would like to understand whether users really require this on the API, or whether the shm size could be adjusted automatically by k8s to some calculated value.

@gjcarneiro

My use case is a big shared-memory database, typically on the order of 1 GiB, but we usually reserve 3 GiB of shared memory space in case it grows. This data is constantly being updated by a writer (a process) and must be made available to readers (other processes). Previously we tried a Redis server for this, but the performance of that solution was not great, so shared memory it is.

My current workaround is to (1) mount a tmpfs volume at /dev/shm, as in this OpenShift article, and (2) run the writer and reader processes all in the same container.
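A minimal sketch of that workaround as a pod manifest; the image name and size are illustrative, and the size matches the ~3 GiB reservation mentioned above:

```yaml
# Memory-backed emptyDir mounted at /dev/shm, with writer and readers in the same container.
apiVersion: v1
kind: Pod
metadata:
  name: shm-workaround
spec:
  containers:
  - name: writer-and-readers
    image: example/shm-app:latest   # illustrative image name
    volumeMounts:
    - name: dshm
      mountPath: /dev/shm
  volumes:
  - name: dshm
    emptyDir:
      medium: Memory
      sizeLimit: 3Gi
```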

@wstrange
Contributor

My use case is an Apache policy agent plugin that allocates a very large (2GB) cache. I worked around it by setting a very low shm value. This is OK for development, but I need a solution for production.

Adjusting shm size dynamically seems tricky. From my perspective, declaring it as a container resource would be fine.

@ddysher
Contributor

ddysher commented Nov 11, 2016

My use case is running a database application on top of Kubernetes that needs at least 2GB of shared memory. Right now, we just set a large default; it would be nice to have a configurable option.

@vishh
Contributor

vishh commented Nov 11, 2016

@ddysher @wstrange @gjcarneiro Do your applications dynamically adjust their behavior based on the shm size? Will they be able to function if the default size is >= the pod's memory limit?

@wstrange
Contributor

The shm size is configurable only when the application starts (i.e., you can say "only use this much shm").

It cannot be adjusted dynamically.

@vishh
Contributor

vishh commented Nov 11, 2016

@wstrange Thanks for clarifying.

@ddysher
Contributor

ddysher commented Nov 13, 2016

@vishh We have the same case as @wstrange. shm size doesn't need to be adjusted dynamically.

@gjcarneiro

Same for me, shm size is a constant in a configuration file.

@vishh
Contributor

vishh commented Nov 14, 2016

Great. In that case, kubelet can set the default size of /dev/shm to be that of the pod's memory limit. Apps will have to be configured to use a value for shm that is less than the pod's memory limit.
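A sketch of what that would look like from the user's side, assuming the kubelet behaviour proposed here; the 2Gi figure is illustrative:

```yaml
# Under this proposal, /dev/shm would default to the pod's memory limit (2Gi below).
# The application itself must then be configured to use less than 2Gi of shm.
apiVersion: v1
kind: Pod
metadata:
  name: shm-via-limit
spec:
  containers:
  - name: app
    image: example/db:latest   # illustrative image name
    resources:
      limits:
        memory: 2Gi
```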

@vishh vishh self-assigned this Nov 14, 2016
@vishh vishh added this to the v1.6 milestone Nov 14, 2016
@elyscape
Contributor

@vishh What if there is no memory limit imposed on the application? For reference, it looks like Linux defaults to half of the total RAM.
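For reference, an easy way to see that default on a Linux host; the output varies by machine:

```sh
# tmpfs at /dev/shm defaults to roughly 50% of physical RAM on a typical Linux host.
df -h /dev/shm
```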

@janosi
Contributor

janosi commented Nov 21, 2016

@vishh you can close the PR if you think so.

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 7, 2021
@remram44

remram44 commented Oct 7, 2021

/remove-lifecycle stale

I didn't think accepted issues would go stale 🤔

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 7, 2021
@swapkh91

swapkh91 commented Dec 7, 2021

Any update on this?
I'm trying to deploy Triton Inference Server using KServe and need to change the shm size.

@counter2015

I tried this approach:

    volumeMounts:
    - mountPath: /dev/shm
      name: shm
  volumes:
    - name: shm
      emptyDir:
        medium: Memory
        sizeLimit: 5120Mi

@kebyn thanks, it worked for me.
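If it helps others, a quick way to confirm the new size from inside the container; the pod and container names below are illustrative:

```sh
# Verify that /dev/shm reflects the emptyDir sizeLimit rather than the 64 MB default.
kubectl exec -it my-pod -c my-container -- df -h /dev/shm
```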

@remram44

Is this fixed now that #94444 is merged?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 27, 2022
@mitar
Contributor

mitar commented Apr 6, 2022

/remove-lifecycle stale

It would be nice to get a confirmation that this is fixed.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 6, 2022
@k8s-triage-robot


/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 5, 2022
@mitar
Contributor

mitar commented Jul 5, 2022

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 5, 2022
@k8s-triage-robot


/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 3, 2022
@palonsoro

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 4, 2022
@k8s-triage-robot


/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 2, 2023
@remram44

remram44 commented Jan 2, 2023

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 2, 2023
@wardnath

wardnath commented May 3, 2023

/remove-lifecycle stale

Is this fixed? This seems to be an important issue, since many folks need additional shared memory for running LLM inference: huggingface/accelerate#412 (comment)

jrafanie added a commit to jrafanie/manageiq-pods that referenced this issue Jun 9, 2023
For posix, the default posix shared memory volume in OpenShift is 64 MB.
We could solve this problem by creating a larger volume and mounting it at /dev/shm.

Or, we can just use sysv with a large enough shmall/shmmax, which we already have.

Since sysv is easier to implement, we're doing this here.

The details below show how to check the shared memory types, sizes, and possible
solutions for both posix and sysv.

We originally had posix for dynamic shared memory:

```
postgres=# show shared_buffers;
 shared_buffers
----------------
 1GB
(1 row)

postgres=# show shared_memory_type;
 shared_memory_type
--------------------
 mmap
(1 row)

postgres=# show dynamic_shared_memory_type;
 dynamic_shared_memory_type
----------------------------
 posix
(1 row)
```

with /dev/shm of only the default: 64 MB:

```
sh-4.4$ df /dev/shm
Filesystem     1K-blocks  Used Available Use% Mounted on
shm                65536 10000     55536  16% /dev/shm
```

According to enterprisedb, you can solve this for each dynamic_shared_memory_type:

a) posix: by specifying a larger volume for "posix" (the default of 64 MB is too small)

   Add something like this:
        volumeMounts:
        - mountPath: /dev/shm
          name: shm
      volumes:
        - name: shm
          emptyDir:
            medium: Memory
            sizeLimit: 1Gi

b) sysv: or if your shmall/shmmax is large enough in the container, you can use "sysv" for your dynamic_shared_memory_type and you don't need to worry about the volume.

https://www.enterprisedb.com/docs/postgres_for_kubernetes/latest/postgresql_conf/
CrunchyData/postgres-operator#2783
kubernetes/kubernetes#28272 (posix shared memory was implemented in kubernetes here)

Since we had similarly enormous shmall values, we tried "sysv"

```
sh-4.4$ ipcs -lm

------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 18014398509465599
max total shared memory (kbytes) = 18014398509481980
min seg size (bytes) = 1

sh-4.4$ cat /proc/sys/kernel/shmall
18446744073692774399
sh-4.4$ cat /proc/sys/kernel/shmmax
18446744073692774399
```

To do this manually on an existing podified installation, we edited the postgresql-configs ConfigMap.

We added a new file:

```
data:
  001_yolo_overrides.conf: >
    #------------------------------------------------------------------------------

    dynamic_shared_memory_type = sysv

    #------------------------------------------------------------------------------
  01_miq_overrides.conf: >
...
```