Support Posix Shared Memory across containers in a pod #28272

Open
CsatariGergely opened this issue Jun 30, 2016 · 95 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. sig/node Categorizes an issue or PR as relevant to SIG Node. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@CsatariGergely

Docker implemented a modifiable shm size (see 1) in version 1.9. It should be possible to define the shm size of a pod in the API, and Kubernetes shall pass this information to Docker.
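For reference, this is the Docker-level knob being discussed; a minimal sketch, with the 2g value purely illustrative:

```sh
# Run a container with /dev/shm sized to 2 GiB instead of Docker's 64 MB default,
# then print the size of /dev/shm as seen inside the container.
docker run --rm --shm-size=2g busybox df -h /dev/shm
```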

@Random-Liu
Member

Random-Liu commented Jun 30, 2016

Also ref #24588 (comment), in which we also discussed whether we should expose shmsize in pod configuration.

@janosi
Contributor

janosi commented Jun 30, 2016

I am not sure I can see that discussion about exposing ShmSize on the Kubernetes API in that issue :( As I understand it, that discussion is about how to use the Docker API after it introduced the ShmSize attribute.

@Random-Liu
Member

> I would like kube to set an explicit default ShmSize using the option 1 proposed by @Random-Liu and I wonder if we should look to expose ShmSize as a per container option in the future.

I should say "in which we also mentioned whether we should expose shmsize in container configuration."

@janosi
Contributor

janosi commented Jun 30, 2016

@Random-Liu All right, thank you! I missed that point.

@j3ffml j3ffml added sig/node Categorizes an issue or PR as relevant to SIG Node. team/ux labels Jun 30, 2016
@dims
Member

dims commented Jul 8, 2016

@janosi @CsatariGergely - is the 64m default not enough? What would be the best way to make it configurable for your use? (Pass a parameter on the kubelet command line?)

@janosi
Contributor

janosi commented Jul 11, 2016

@dims Or maybe it is too much to waste? ;)
But yes, sometimes 64m is not enough.
We would prefer a new optional attribute for the pod in PodSpec in the API, e.g. "shmSize".
As shm is shared among containers in the pod, PodSpec would be the appropriate place, I think.

@pwittrock pwittrock removed the team/ux label Jul 18, 2016
@janosi
Contributor

janosi commented Sep 2, 2016

We have a chance to work on this issue now. I would like to align on the design before applying it to the code. Your comments are welcome!

The change to the versioned API would be a new field in type PodSpec:

// Optional: Docker "--shm-size" support. Defines the size of /dev/shm in a Docker-managed container.
// If not defined here, Docker uses a default value.
// Cannot be updated.
ShmSize *resource.Quantity `json:"shmSize,omitempty"`
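For illustration only, here is a hypothetical pod manifest using the field proposed above. This assumes the proposal exactly as written; shmSize is not an existing Kubernetes API field.

```yaml
# Hypothetical manifest assuming the proposed PodSpec field above; shmSize is NOT a real Kubernetes field.
apiVersion: v1
kind: Pod
metadata:
  name: shm-demo
spec:
  shmSize: 1Gi          # the proposed pod-level shm size
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "df -h /dev/shm && sleep 3600"]
```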

@ddysher
Contributor

ddysher commented Sep 21, 2016

@janosi Did you have a patch for this? We currently hit this issue running a DB on k8s and would like to have the shm size configurable.

@janosi
Contributor

janosi commented Sep 22, 2016

@ddysher We are working on it. We will send the PRs in the next few weeks.

@wstrange
Contributor

wstrange commented Oct 5, 2016

Just want to chime in that we are hitting this problem as well

@gjcarneiro

Hi, is there any known workaround for this problem? I need to increase the shmem size to at least 2GB, and I have no idea how.

@janosi
Contributor

janosi commented Nov 11, 2016

@ddysher @wstrange @gjcarneiro Please share your use cases with @vishh and @derekwaynecarr on pull request #34928. They have concerns about extending the API with this shmsize option, and they have different solution proposals. They would like to understand whether users really require this on the API, or whether the shm size could be adjusted automatically by k8s to some calculated value.

@gjcarneiro

My use case is a big shared-memory database, typically on the order of 1 GiB, but we usually reserve 3 GiB of shared memory space in case it grows. This data is constantly being updated by a writer (a process) and must be made available to readers (other processes). Previously we tried a Redis server for this, but the performance of that solution was not great, so shared memory it is.

My current workaround is to (1) mount a tmpfs volume at /dev/shm, as in this OpenShift article, and (2) run the writer and reader processes all in the same container.
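A minimal sketch of that workaround as a pod manifest; the image name and size are illustrative, and the size matches the ~3 GiB reservation mentioned above:

```yaml
# Memory-backed emptyDir mounted at /dev/shm, with writer and readers in the same container.
apiVersion: v1
kind: Pod
metadata:
  name: shm-workaround
spec:
  containers:
  - name: writer-and-readers
    image: example/shm-app:latest   # illustrative image name
    volumeMounts:
    - name: dshm
      mountPath: /dev/shm
  volumes:
  - name: dshm
    emptyDir:
      medium: Memory
      sizeLimit: 3Gi
```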

@wstrange
Contributor

My use case is an Apache policy agent plugin that allocates a very large (2GB) cache. I worked around it by setting a very low shm value. This is OK for development, but I need a solution for production.

Adjusting shm size dynamically seems tricky. From my perspective, declaring it as a container resource would be fine.

@ddysher
Contributor

ddysher commented Nov 11, 2016

My use case is running a database application on top of Kubernetes that needs at least 2GB of shared memory. Right now, we just set a large default; it would be nice to have a configurable option.

@vishh
Contributor

vishh commented Nov 11, 2016

@ddysher @wstrange @gjcarneiro Do your applications dynamically adjust their behavior based on the shm size? Will they be able to function if the default size is >= the pod's memory limit?

@wstrange
Contributor

The shm size is configurable only when the application starts (i.e., you can say "only use this much shm").

It cannot be adjusted dynamically.

@vishh
Contributor

vishh commented Nov 11, 2016

@wstrange Thanks for clarifying.

@ddysher
Contributor

ddysher commented Nov 13, 2016

@vishh We have the same case as @wstrange. shm size doesn't need to be adjusted dynamically.

@gjcarneiro

Same for me, shm size is a constant in a configuration file.

@vishh
Contributor

vishh commented Nov 14, 2016

Great. In that case, kubelet can set the default size of /dev/shm to be that of the pod's memory limit. Apps will have to be configured to use a value for shm that is less than the pod's memory limit.
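A sketch of what that would look like from the user's side, assuming the kubelet behaviour proposed here; the 2Gi figure is illustrative:

```yaml
# Under this proposal, /dev/shm would default to the pod's memory limit (2Gi below).
# The application itself must then be configured to use less than 2Gi of shm.
apiVersion: v1
kind: Pod
metadata:
  name: shm-via-limit
spec:
  containers:
  - name: app
    image: example/db:latest   # illustrative image name
    resources:
      limits:
        memory: 2Gi
```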

@vishh vishh self-assigned this Nov 14, 2016
@vishh vishh added this to the v1.6 milestone Nov 14, 2016
@elyscape
Contributor

@vishh What if there is no memory limit imposed on the application? For reference, it looks like Linux defaults to half of the total RAM.
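For reference, an easy way to see that default on a Linux host; the output varies by machine:

```sh
# tmpfs at /dev/shm defaults to roughly 50% of physical RAM on a typical Linux host.
df -h /dev/shm
```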

@janosi
Contributor

janosi commented Nov 21, 2016

@vishh you can close the PR if you think so.

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 7, 2021
@remram44

remram44 commented Oct 7, 2021

/remove-lifecycle stale

I didn't think accepted issues would go stale 🤔

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 7, 2021
@swapkh91

swapkh91 commented Dec 7, 2021

Any update on this?
I'm trying to deploy Triton Inference Server using KServe and need to change the shm size.

@counter2015

I tried this approach:

    volumeMounts:
    - mountPath: /dev/shm
      name: shm
  volumes:
    - name: shm
      emptyDir:
        medium: Memory
        sizeLimit: 5120Mi

@kebyn thanks, it worked for me.
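If it helps others, a quick way to confirm the new size from inside the container; the pod and container names below are illustrative:

```sh
# Verify that /dev/shm reflects the emptyDir sizeLimit rather than the 64 MB default.
kubectl exec -it my-pod -c my-container -- df -h /dev/shm
```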

@remram44

Is this fixed now that #94444 is merged?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 27, 2022
@mitar
Contributor

mitar commented Apr 6, 2022

/remove-lifecycle stale

It would be nice to get a confirmation that this is fixed.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 6, 2022
@k8s-triage-robot


/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 5, 2022
@mitar
Contributor

mitar commented Jul 5, 2022

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 5, 2022
@k8s-triage-robot


/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 3, 2022
@palonsoro

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 4, 2022
@k8s-triage-robot


/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 2, 2023
@remram44

remram44 commented Jan 2, 2023

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 2, 2023
@wardnath

wardnath commented May 3, 2023

/remove-lifecycle stale

Is this fixed? This seems to be an important issue, since many folks need additional shared memory for running LLM inference: huggingface/accelerate#412 (comment)

jrafanie added a commit to jrafanie/manageiq-pods that referenced this issue Jun 9, 2023
For posix, the default posix shared memory volume in OpenShift is 64 MB.
We could solve this problem by creating a larger volume and mounting it at /dev/shm.

Or, we can just use sysv with a large enough shmall/shmmax, which we already have.

Since sysv is easier to implement, we're doing this here.

The details below show how to check the shared memory types, sizes, and possible
solutions for both posix and sysv.

We originally had posix for dynamic shared memory:

```
postgres=# show shared_buffers;
 shared_buffers
----------------
 1GB
(1 row)

postgres=# show shared_memory_type;
 shared_memory_type
--------------------
 mmap
(1 row)

postgres=# show dynamic_shared_memory_type;
 dynamic_shared_memory_type
----------------------------
 posix
(1 row)
```

with /dev/shm of only the default: 64 MB:

```
sh-4.4$ df /dev/shm
Filesystem     1K-blocks  Used Available Use% Mounted on
shm                65536 10000     55536  16% /dev/shm
```

According to enterprisedb, you can solve this for each dynamic_shared_memory_type:

a) posix: by specifying a larger volume for "posix" (the default of 64 MB is too small)

   Add something like this:
        volumeMounts:
        - mountPath: /dev/shm
          name: shm
      volumes:
        - name: shm
          emptyDir:
            medium: Memory
            sizeLimit: 1Gi

b) sysv: or if your shmall/shmmax is large enough in the container, you can use "sysv" for your dynamic_shared_memory_type and you don't need to worry about the volume.

https://www.enterprisedb.com/docs/postgres_for_kubernetes/latest/postgresql_conf/
CrunchyData/postgres-operator#2783
kubernetes/kubernetes#28272 (posix shared memory was implemented in kubernetes here)

Since we had similarly enormous shmall values, we tried "sysv"

```
sh-4.4$ ipcs -lm

------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 18014398509465599
max total shared memory (kbytes) = 18014398509481980
min seg size (bytes) = 1

sh-4.4$ cat /proc/sys/kernel/shmall
18446744073692774399
sh-4.4$ cat /proc/sys/kernel/shmmax
18446744073692774399
```

To do this manually on an existing podified installation, we edited the postgresql-configs ConfigMap.

We added a new file:

```
data:
  001_yolo_overrides.conf: >
    #------------------------------------------------------------------------------

    dynamic_shared_memory_type = sysv

    #------------------------------------------------------------------------------
  01_miq_overrides.conf: >
...
```