
Container with multiple processes not terminated when OOM #50632


Closed
kellycampbell opened this issue Aug 14, 2017 · 22 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@kellycampbell

/kind bug

What happened:

A pod container reached its memory limit. Then the oom-killer killed only one process within the container. The container runs a uwsgi Python server, which logged this error:

DAMN ! worker 1 (pid: 1432) died, killed by signal 9 :( trying respawn ...
Respawned uWSGI worker 1 (new pid: 1473)

The only errors I could find in k8s were in the syslog on the node:

Aug 14 12:06:33 ip-172-20-157-22 kernel: [1620280.105281] uwsgi invoked oom-killer: gfp_mask=0x24000c0, order=0, oom_score_adj=-998
Aug 14 12:06:33 ip-172-20-157-22 kernel: [1620280.109569] uwsgi cpuset=05d27aafc4e80e117506eb5da77dea2d881129d8db17466d31c0cc8ad8e13c52 mems_allowed=0
Aug 14 12:06:33 ip-172-20-157-22 kernel: [1620280.115236] CPU: 0 PID: 13965 Comm: uwsgi Tainted: G            E   4.4.65-k8s #1
Aug 14 12:06:33 ip-172-20-157-22 kernel: [1620280.118330] Hardware name: Xen HVM domU, BIOS 4.2.amazon 02/16/2017
Aug 14 12:06:33 ip-172-20-157-22 kernel: [1620280.120445]  0000000000000286 00000000a2a9f130 ffffffff812f67b5 ffff880011423e20
Aug 14 12:06:33 ip-172-20-157-22 kernel: [1620280.120445]  ffff8801fd092800 ffffffff811d8855 ffffffff81826173 ffff8801fd092800
Aug 14 12:06:33 ip-172-20-157-22 kernel: [1620280.120445]  ffffffff81a6b740 0000000000000206 0000000000000002 ffff8800e9f10ab8
Aug 14 12:06:33 ip-172-20-157-22 kernel: [1620280.120445] Call Trace:
Aug 14 12:06:33 ip-172-20-157-22 kernel: [1620280.120445]  [<ffffffff812f67b5>] ? dump_stack+0x5c/0x77
Aug 14 12:06:33 ip-172-20-157-22 kernel: [1620280.120445]  [<ffffffff811d8855>] ? dump_header+0x62/0x1d7
Aug 14 12:06:33 ip-172-20-157-22 kernel: [1620280.120445]  [<ffffffff8116ded1>] ? oom_kill_process+0x211/0x3d0
Aug 14 12:06:33 ip-172-20-157-22 kernel: [1620280.120445]  [<ffffffff811d0f4f>] ? mem_cgroup_iter+0x1cf/0x360
Aug 14 12:06:33 ip-172-20-157-22 kernel: [1620280.120445]  [<ffffffff811d2de3>] ? mem_cgroup_out_of_memory+0x283/0x2c0
Aug 14 12:06:33 ip-172-20-157-22 kernel: [1620280.120445]  [<ffffffff811d3abd>] ? mem_cgroup_oom_synchronize+0x32d/0x340
Aug 14 12:06:33 ip-172-20-157-22 kernel: [1620280.120445]  [<ffffffff811cf170>] ? mem_cgroup_begin_page_stat+0x90/0x90
Aug 14 12:06:33 ip-172-20-157-22 kernel: [1620280.120445]  [<ffffffff8116e5b4>] ? pagefault_out_of_memory+0x44/0xc0
Aug 14 12:06:33 ip-172-20-157-22 kernel: [1620280.120445]  [<ffffffff815a65b8>] ? page_fault+0x28/0x30
Aug 14 12:06:33 ip-172-20-157-22 kernel: [1620280.200077] Task in /kubepods/pod71cf1407-73c1-11e7-8d6e-063b53e2a39f/05d27aafc4e80e117506eb5da77dea2d881129d8db17466d31c0cc8ad8e13c52 killed as a result of limit of /kubepods/pod71cf1407-73c1-11e7-8d6e-063b53e2a39f
Aug 14 12:06:33 ip-172-20-157-22 kernel: [1620280.209412] memory: usage 256000kB, limit 256000kB, failcnt 379
Aug 14 12:06:33 ip-172-20-157-22 kernel: [1620280.212293] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
Aug 14 12:06:33 ip-172-20-157-22 kernel: [1620280.215517] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Aug 14 12:06:33 ip-172-20-157-22 kernel: [1620280.219331] Memory cgroup stats for /kubepods/pod71cf1407-73c1-11e7-8d6e-063b53e2a39f: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Aug 14 12:06:33 ip-172-20-157-22 kernel: [1620280.231158] Memory cgroup stats for /kubepods/pod71cf1407-73c1-11e7-8d6e-063b53e2a39f/d836fcdfc1ab1f1d4ec7a49ba763b8770015248f3b7da43bdb0948faee5d6163: cache:0KB rss:36KB rss_huge:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:36KB inactive_file:0KB active_file:0KB unevictable:0KB
Aug 14 12:06:33 ip-172-20-157-22 kernel: [1620280.245937] Memory cgroup stats for /kubepods/pod71cf1407-73c1-11e7-8d6e-063b53e2a39f/05d27aafc4e80e117506eb5da77dea2d881129d8db17466d31c0cc8ad8e13c52: cache:696KB rss:255268KB rss_huge:0KB mapped_file:696KB dirty:0KB writeback:0KB inactive_anon:300KB active_anon:255652KB inactive_file:0KB active_file:0KB unevictable:0KB
Aug 14 12:06:33 ip-172-20-157-22 kernel: [1620280.260703] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Aug 14 12:06:33 ip-172-20-157-22 kernel: [1620280.265697] [ 6726]     0  6726      257        1       4       2        0          -998 pause
Aug 14 12:06:33 ip-172-20-157-22 kernel: [1620280.270433] [ 6857]     1  6857    29473     2393      29       3        0          -998 uwsgi
Aug 14 12:06:33 ip-172-20-157-22 kernel: [1620280.274906] [13957]     1 13957   385176    66344     270       5        0          -998 uwsgi
Aug 14 12:06:33 ip-172-20-157-22 kernel: [1620280.279180] Memory cgroup out of memory: Kill process 13957 (uwsgi) score 42 or sacrifice child
Aug 14 12:06:33 ip-172-20-157-22 kernel: [1620280.283460] Killed process 13957 (uwsgi) total-vm:1540704kB, anon-rss:252808kB, file-rss:12568kB

What you expected to happen:

I expected the whole container/pod to be terminated (and then restarted by the ReplicaSet controller). I also expected the "Restarts" count on the pod to go above 0, and to see events on the pod or ReplicaSet.

According to documentation at https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/#exceed-a-containers-memory-limit the whole container should be terminated:

If a container allocates more memory than its limit, the Container becomes a candidate for termination. If the Container continues to consume memory beyond its limit, the Container is terminated. If a terminated Container is restartable, the kubelet will restart it, as with any other type of runtime failure.

How to reproduce it (as minimally and precisely as possible):

Set up a multi-process server in a pod, e.g. uwsgi and Django, where uwsgi is the main process started in the container by k8s. Then have a child process use more memory than the container limit.
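
For illustration, a minimal reproduction could look like the pod below (image, names, and sizes are my own assumptions, not from the original report): the main process stays small while a child allocates past the limit, so the kernel OOM-kills only the child and the pod's restart count stays at 0.

# Hypothetical reproduction sketch (not from the report). The shell is the
# container's main process; the forked python child allocates ~128Mi inside
# a 64Mi memory cgroup, so the OOM killer kills only the child.
apiVersion: v1
kind: Pod
metadata:
  name: oom-child-demo
spec:
  containers:
  - name: parent
    image: python:3-alpine        # assumed image
    resources:
      limits:
        memory: "64Mi"
    command: ["/bin/sh", "-c"]
    args:
      - |
        while true; do
          python -c 'x = bytearray(128 * 1024 * 1024); import time; time.sleep(60)' &
          wait $!
          echo "child exited (likely OOM-killed), respawning"
          sleep 5
        done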

Anything else we need to know?:

Another nice-to-have: when a container reaches its memory limit, it should immediately become not-ready so its endpoints are removed from Services until it passes health checks again. Because of the hard SIGKILL, we can't handle this condition gracefully and client connections get dropped. I saw the workaround in #40157, so we will try that.
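
For reference, a failing readiness probe is the existing mechanism that takes a pod out of Service endpoints without restarting it; a rough sketch (path, port, and thresholds are assumptions):

# Hypothetical container-spec fragment: while /healthz fails (e.g. right after
# a worker was OOM-killed and is respawning), the endpoint is removed from the
# Service; it is added back once the probe passes again.
readinessProbe:
  httpGet:
    path: /healthz
    port: 8000
  periodSeconds: 5
  failureThreshold: 1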

Environment:

  • Kubernetes version: v1.6.4
  • Cloud provider or hardware configuration: AWS
  • OS: Debian GNU/Linux 8 (jessie)
  • Kernel: 4.4.65-k8s
  • Install tools: kops
  • Others:
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Aug 14, 2017
@k8s-github-robot

@kellycampbell
There are no sig labels on this issue. Please add a sig label by:

  1. mentioning a sig: @kubernetes/sig-<group-name>-<group-suffix>
    e.g., @kubernetes/sig-contributor-experience-<group-suffix> to notify the contributor experience sig, OR

  2. specifying the label manually: /sig <label>
    e.g., /sig scalability to apply the sig/scalability label

Note: Method 1 will trigger an email to the group. You can find the group list here and the label list here.
The <group-suffix> in method 1 has to be replaced with one of: bugs, feature-requests, pr-reviews, test-failures, proposals.

@k8s-github-robot k8s-github-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Aug 14, 2017
@dims
Member

dims commented Aug 14, 2017

@kellycampbell - Did you try the die-on-term option in the uwsgi configuration?

unbit/uwsgi#849 (comment)

Thanks,
Dims

@kellycampbell
Author

Yes, we already use die-on-term. That works when the main process (uwsgi) receives SIGTERM, e.g. during a rolling update. This problem isn't a SIGTERM, though, and the kill signal goes to the child process (the uwsgi Python worker).

Here's our uwsgi config if it helps:

[uwsgi]
uid = 1
http-socket = :8000
wsgi-file = myapp/wsgi.py
# lazy-apps required for prometheus metrics on separate port
lazy-apps = true
enable-threads = true
# one process per kubernetes pod
processes = 1
threads = 32
listen = 10
stats = :9191
master-fifo = /tmp/uwsgi-fifo
# Alarm when the listen queue gets full. http://uwsgi-docs.readthedocs.io/en/latest/AlarmSubsystem.html
alarm = queue_full signal:3
alarm-backlog = queue_full
# Shutdown on sigterm
die-on-term = true

@kellycampbell
Author

The only thing on the sig list that looks applicable is resource-management.

/sig wg-resource-management

@dims
Member

dims commented Aug 14, 2017

@kellycampbell I don't see any support in uwsgi for the parent to get notified when a child gets killed. I can see folks struggling with this with plain Docker too - google for "oom-killer docker kill all processes"

Is master=false an option then?

@kellycampbell
Author

uwsgi reaps the child process just fine and starts another worker in its place. The problem, from my point of view, is that this doesn't match the k8s documentation about memory limits quoted in my first post, and the fact that the infrastructure is killing a process isn't surfaced anywhere easy to notice, e.g. in the pod's event list.

Ideally, there would also be a way to gracefully handle the interruption of the process being killed as feature request #40157 requests.

@xiangpengzhao
Contributor

A pod container reached its memory limit. Then the oom-killer killed only one process within the container.

IIUC, the process which consumes the most memory will be oom-killed in this case. The container won't terminate unless the killed process is the main process within the container.

/sig node
@vishh

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Aug 15, 2017
@k8s-github-robot k8s-github-robot removed the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Aug 15, 2017
@dims
Member

dims commented Aug 16, 2017

@xiangpengzhao yes, that's why I was looking at options for @kellycampbell where the main process is the only one in the container. Guess we need a big disclaimer about child process(es) being oom-killed.

@kellycampbell
Author

I think maybe it's not clear how the resource limits are enforced. After troubleshooting this issue, I discovered my own misunderstanding of the split of responsibilities between k8s, the container runtime, and Linux cgroups.

I found this documentation helpful in understanding what is happening: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-memory.html

This other page in the k8s docs could have better info under the "How Pods with resource limits are run" section: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
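
Concretely, a limit like the one below (an illustrative fragment; the value is chosen to match the kernel log above) ends up as the memory limit on the container's cgroup, and it is the kernel, not the kubelet, that enforces it by OOM-killing a process inside that cgroup:

# Illustrative container resources fragment. The limit is written to the
# container cgroup's memory limit (the 256000kB shown in the kernel log above);
# the kernel memcg OOM killer then picks a victim process, not the whole container.
resources:
  limits:
    memory: "250Mi"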

Outside of documentation changes, the two other things that I think should be considered long-term for k8s are:

a) how to surface the event when a particular container breached its memory limit and had processes killed (if the pod doesn't terminate itself from the oom-killed process) so admins know why things aren't working.

b) a way to more gracefully handle containers reaching their memory limits, e.g. with a signal to the pod (#40157)

@timothysc
Member

/cc @kubernetes/sig-node-bugs

@vishh
Contributor

vishh commented Aug 16, 2017

Containers are marked as OOM killed only when the init pid gets killed by the kernel OOM killer. There are apps that can tolerate OOM kills of non-init processes, so we chose not to track non-init process OOM kills.
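
For contrast, when the init process is the one that gets killed, it does surface in the pod's container status, roughly like this illustrative fragment (values made up):

# Illustrative fragment of pod status after the container's init process was
# OOM-killed. When only a non-init child process is killed, nothing like this
# is recorded.
status:
  containerStatuses:
  - name: myapp
    restartCount: 1
    lastState:
      terminated:
        reason: OOMKilled
        exitCode: 137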

@vishh
Contributor

vishh commented Aug 16, 2017

I'd say this is Working as Intended.

@vishh vishh closed this as completed Aug 16, 2017
@chrissound

I agree with @kellycampbell that this behavior is not very well documented...

@discordianfish
Contributor

I just ran into this issue too and I agree that this isn't well documented. I can see how one would assume that k8s enforces the memory limit and communicates this via the api/events/metrics.

The real problem, IMO, is the lack of visibility when this happens. You can get it from the kernel log, and more recent kernels expose a counter in vmstat (surfaced by the node-exporter as node_vmstat_oom_kill), but neither can be correlated to a pod.

@desaintmartin
Member

Hello,
This behavior is quite misleading as it actually delegates the termination of the Pod to... the container itself.

This can lead to misbehaving or non-optimal Pods which still pass the healthchecks but should be destroyed anyway.

I actually had a case where the same process was being killed over and over (~2000 times over 1 hour) but kept being re-spawned by its init process. Then the init process got OOMKilled and the container restarted.

I suppose this issue is more a Docker issue than a Kubernetes one.

@marcellodesales

How can we evict a pod in which one container (out of several) keeps restarting?

@kellycampbell
Author

This project may be useful: https://github.com/ricardomaraschini/oomhero

@hmeerlo

hmeerlo commented Jan 3, 2020

Ok, this has bitten me in the a** bigtime. It cost me a day to find out that one of my Python child processes was OOM-killed. I would absolutely vote for an oom-killer that always kills the parent process, no matter what. That would at least make the behaviour consistent. You assign a resource limit to the pod (as an entity), and the pod clearly went over that limit, so it should have been restarted.

@ringerc

ringerc commented Jun 10, 2022

It's vital not to unconditionally kill whole pods. Some apps rely on being able to handle child process OOMs themselves and do not want the whole pod recursively killed. PostgreSQL, for example.

@qixiaobo

The same problem, but it caused some serious consequences:
Node.js processes were killed by the OOM killer, but the pod stayed alive.
The node processes kept restarting, causing heavy I/O.
Eventually the kubelet process hung on the slow I/O,
and the node became a zombie……

Nov 30 10:16:29 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 787161 (node), UID 1000, total-vm:612960kB, anon-rss:77084kB, file-rss:13696kB, shmem-rss:0kB
Nov 30 10:16:29 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 787213 (node), UID 1000, total-vm:612960kB, anon-rss:77456kB, file-rss:13704kB, shmem-rss:0kB
Nov 30 10:16:30 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 787445 (node), UID 1000, total-vm:622236kB, anon-rss:81348kB, file-rss:13700kB, shmem-rss:0kB
Nov 30 10:16:31 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 787331 (node), UID 1000, total-vm:634944kB, anon-rss:93752kB, file-rss:13692kB, shmem-rss:0kB
Nov 30 10:16:32 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 787282 (node), UID 1000, total-vm:625864kB, anon-rss:90644kB, file-rss:13688kB, shmem-rss:0kB
Nov 30 10:16:33 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 787368 (node), UID 1000, total-vm:630184kB, anon-rss:94864kB, file-rss:13700kB, shmem-rss:0kB
Nov 30 10:16:35 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 789403 (node), UID 1000, total-vm:629600kB, anon-rss:91844kB, file-rss:13688kB, shmem-rss:0kB
Nov 30 10:16:35 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 789484 (node), UID 1000, total-vm:629600kB, anon-rss:92264kB, file-rss:13700kB, shmem-rss:0kB
Nov 30 10:16:35 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 786912 (node), UID 1000, total-vm:613060kB, anon-rss:76188kB, file-rss:13688kB, shmem-rss:0kB
Nov 30 10:16:37 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 790141 (node), UID 1000, total-vm:621860kB, anon-rss:84052kB, file-rss:13688kB, shmem-rss:0kB
Nov 30 10:16:37 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 790397 (node), UID 1000, total-vm:621860kB, anon-rss:84372kB, file-rss:13688kB, shmem-rss:0kB
Nov 30 10:16:37 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 790604 (node), UID 1000, total-vm:618780kB, anon-rss:76388kB, file-rss:13672kB, shmem-rss:0kB
Nov 30 10:16:38 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 791803 (node), UID 1000, total-vm:625180kB, anon-rss:83660kB, file-rss:13584kB, shmem-rss:0kB
Nov 30 10:16:38 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 791087 (node), UID 1000, total-vm:619096kB, anon-rss:82968kB, file-rss:13672kB, shmem-rss:0kB
Nov 30 10:16:40 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 791551 (node), UID 1000, total-vm:631160kB, anon-rss:88824kB, file-rss:13664kB, shmem-rss:0kB
Nov 30 10:16:41 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 787033 (node), UID 1000, total-vm:613676kB, anon-rss:77892kB, file-rss:13732kB, shmem-rss:0kB
Nov 30 10:16:42 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 789644 (node), UID 1000, total-vm:613020kB, anon-rss:75412kB, file-rss:13720kB, shmem-rss:0kB
Nov 30 10:16:42 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 793234 (node), UID 1000, total-vm:629472kB, anon-rss:87216kB, file-rss:13660kB, shmem-rss:0kB
Nov 30 10:16:43 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 792834 (node), UID 1000, total-vm:630172kB, anon-rss:91256kB, file-rss:13696kB, shmem-rss:0kB
Nov 30 10:16:44 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 793084 (node), UID 1000, total-vm:624736kB, anon-rss:87972kB, file-rss:13692kB, shmem-rss:0kB
Nov 30 10:16:44 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 793165 (node), UID 1000, total-vm:624736kB, anon-rss:88412kB, file-rss:13700kB, shmem-rss:0kB
Nov 30 10:16:45 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 792611 (node), UID 1000, total-vm:620940kB, anon-rss:85168kB, file-rss:13696kB, shmem-rss:0kB
Nov 30 10:16:46 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 794218 (node), UID 1000, total-vm:631760kB, anon-rss:93532kB, file-rss:13712kB, shmem-rss:0kB
Nov 30 10:16:47 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 795260 (node), UID 1000, total-vm:641608kB, anon-rss:101024kB, file-rss:13608kB, shmem-rss:0kB
Nov 30 10:16:48 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 793711 (node), UID 1000, total-vm:636192kB, anon-rss:100896kB, file-rss:13712kB, shmem-rss:0kB
Nov 30 10:16:48 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 794502 (node), UID 1000, total-vm:612028kB, anon-rss:75516kB, file-rss:13684kB, shmem-rss:0kB
Nov 30 10:16:51 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 795881 (node), UID 1000, total-vm:620108kB, anon-rss:84004kB, file-rss:13680kB, shmem-rss:0kB
Nov 30 10:16:51 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 795630 (node), UID 1000, total-vm:621120kB, anon-rss:84620kB, file-rss:13688kB, shmem-rss:0kB
Nov 30 10:16:51 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 797281 (node), UID 1000, total-vm:618644kB, anon-rss:78188kB, file-rss:13640kB, shmem-rss:0kB
Nov 30 10:16:52 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 794836 (node), UID 1000, total-vm:620152kB, anon-rss:84244kB, file-rss:13700kB, shmem-rss:0kB
Nov 30 10:16:53 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 796206 (node), UID 1000, total-vm:620540kB, anon-rss:84448kB, file-rss:13700kB, shmem-rss:0kB
Nov 30 10:16:53 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 797444 (node), UID 1000, total-vm:614776kB, anon-rss:78336kB, file-rss:13680kB, shmem-rss:0kB
Nov 30 10:16:54 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 798189 (node), UID 1000, total-vm:663612kB, anon-rss:123244kB, file-rss:13496kB, shmem-rss:0kB
Nov 30 10:16:55 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 798492 (node), UID 1000, total-vm:613696kB, anon-rss:76020kB, file-rss:13560kB, shmem-rss:0kB
Nov 30 10:16:56 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 798355 (node), UID 1000, total-vm:655992kB, anon-rss:114776kB, file-rss:13672kB, shmem-rss:0kB
Nov 30 10:16:57 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 797606 (node), UID 1000, total-vm:624180kB, anon-rss:81596kB, file-rss:13696kB, shmem-rss:0kB
Nov 30 10:16:57 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 799153 (node), UID 1000, total-vm:606096kB, anon-rss:68308kB, file-rss:13648kB, shmem-rss:0kB
Nov 30 10:16:58 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 799402 (node), UID 1000, total-vm:642596kB, anon-rss:102060kB, file-rss:13664kB, shmem-rss:0kB
Nov 30 10:16:59 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 799889 (node), UID 1000, total-vm:650784kB, anon-rss:110296kB, file-rss:13516kB, shmem-rss:0kB
Nov 30 10:17:00 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 799707 (node), UID 1000, total-vm:613964kB, anon-rss:78132kB, file-rss:13696kB, shmem-rss:0kB
Nov 30 10:17:00 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 799791 (node), UID 1000, total-vm:613964kB, anon-rss:78776kB, file-rss:13704kB, shmem-rss:0kB
Nov 30 10:17:01 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 800660 (node), UID 1000, total-vm:611820kB, anon-rss:74736kB, file-rss:13592kB, shmem-rss:0kB
Nov 30 10:17:01 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 800329 (node), UID 1000, total-vm:623756kB, anon-rss:84900kB, file-rss:13660kB, shmem-rss:0kB
Nov 30 10:17:02 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 801542 (node), UID 1000, total-vm:643292kB, anon-rss:103412kB, file-rss:13516kB, shmem-rss:0kB
Nov 30 10:17:02 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 798918 (node), UID 1000, total-vm:620628kB, anon-rss:84496kB, file-rss:13708kB, shmem-rss:0kB
Nov 30 10:17:02 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 801084 (node), UID 1000, total-vm:621564kB, anon-rss:80728kB, file-rss:13652kB, shmem-rss:0kB
Nov 30 10:17:03 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 801822 (node), UID 1000, total-vm:630504kB, anon-rss:90168kB, file-rss:13572kB, shmem-rss:0kB
Nov 30 10:17:04 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 802158 (node), UID 1000, total-vm:655340kB, anon-rss:116072kB, file-rss:13500kB, shmem-rss:0kB
Nov 30 10:17:04 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 802277 (node), UID 1000, total-vm:655340kB, anon-rss:116660kB, file-rss:13524kB, shmem-rss:0kB
Nov 30 10:17:06 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 800868 (node), UID 1000, total-vm:629336kB, anon-rss:92424kB, file-rss:13700kB, shmem-rss:0kB
Nov 30 10:17:06 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 802415 (node), UID 1000, total-vm:625528kB, anon-rss:88576kB, file-rss:13712kB, shmem-rss:0kB
Nov 30 10:17:06 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 802527 (node), UID 1000, total-vm:625528kB, anon-rss:89120kB, file-rss:13720kB, shmem-rss:0kB
Nov 30 10:17:06 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 802746 (node), UID 1000, total-vm:612188kB, anon-rss:71620kB, file-rss:13664kB, shmem-rss:0kB
Nov 30 10:17:07 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 803286 (node), UID 1000, total-vm:655704kB, anon-rss:116168kB, file-rss:13520kB, shmem-rss:0kB
Nov 30 10:17:07 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 802966 (node), UID 1000, total-vm:632524kB, anon-rss:93188kB, file-rss:13684kB, shmem-rss:0kB
Nov 30 10:17:09 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 803116 (node), UID 1000, total-vm:622044kB, anon-rss:80488kB, file-rss:13672kB, shmem-rss:0kB
Nov 30 10:17:09 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 803190 (node), UID 1000, total-vm:622044kB, anon-rss:80884kB, file-rss:13680kB, shmem-rss:0kB
Nov 30 10:17:09 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 803791 (node), UID 1000, total-vm:621216kB, anon-rss:79468kB, file-rss:13620kB, shmem-rss:0kB
Nov 30 10:17:11 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 804294 (node), UID 1000, total-vm:622260kB, anon-rss:85460kB, file-rss:13692kB, shmem-rss:0kB
Nov 30 10:17:11 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 804536 (node), UID 1000, total-vm:618792kB, anon-rss:82528kB, file-rss:13696kB, shmem-rss:0kB
Nov 30 10:17:12 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 804741 (node), UID 1000, total-vm:618772kB, anon-rss:81692kB, file-rss:13664kB, shmem-rss:0kB
Nov 30 10:17:13 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 804970 (node), UID 1000, total-vm:619540kB, anon-rss:81808kB, file-rss:13684kB, shmem-rss:0kB
Nov 30 10:17:13 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 805001 (node), UID 1000, total-vm:619540kB, anon-rss:82136kB, file-rss:13692kB, shmem-rss:0kB
Nov 30 10:17:14 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 805705 (node), UID 1000, total-vm:629396kB, anon-rss:89368kB, file-rss:13620kB, shmem-rss:0kB
Nov 30 10:17:15 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 805098 (node), UID 1000, total-vm:657336kB, anon-rss:116804kB, file-rss:13688kB, shmem-rss:0kB
Nov 30 10:17:15 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 806272 (node), UID 1000, total-vm:604344kB, anon-rss:65916kB, file-rss:13688kB, shmem-rss:0kB
Nov 30 10:17:16 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 806571 (node), UID 1000, total-vm:611556kB, anon-rss:72348kB, file-rss:13644kB, shmem-rss:0kB
Nov 30 10:17:16 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 807531 (node), UID 1000, total-vm:621552kB, anon-rss:82676kB, file-rss:13508kB, shmem-rss:0kB
Nov 30 10:17:17 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 807038 (node), UID 1000, total-vm:632804kB, anon-rss:93684kB, file-rss:13560kB, shmem-rss:0kB
Nov 30 10:17:18 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 805551 (node), UID 1000, total-vm:646604kB, anon-rss:109800kB, file-rss:13720kB, shmem-rss:0kB
Nov 30 10:17:18 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 807860 (node), UID 1000, total-vm:620456kB, anon-rss:83200kB, file-rss:13588kB, shmem-rss:0kB
Nov 30 10:17:19 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 808067 (node), UID 1000, total-vm:626324kB, anon-rss:85180kB, file-rss:13652kB, shmem-rss:0kB
Nov 30 10:17:20 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 808340 (node), UID 1000, total-vm:621760kB, anon-rss:84704kB, file-rss:13584kB, shmem-rss:0kB
Nov 30 10:17:21 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 808732 (node), UID 1000, total-vm:621080kB, anon-rss:80024kB, file-rss:13608kB, shmem-rss:0kB
Nov 30 10:17:22 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 807234 (node), UID 1000, total-vm:621276kB, anon-rss:83724kB, file-rss:13712kB, shmem-rss:0kB
Nov 30 10:17:22 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 809316 (node), UID 1000, total-vm:616708kB, anon-rss:80284kB, file-rss:13672kB, shmem-rss:0kB
Nov 30 10:17:23 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 809484 (node), UID 1000, total-vm:621612kB, anon-rss:80548kB, file-rss:13716kB, shmem-rss:0kB
Nov 30 10:17:23 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 809722 (node), UID 1000, total-vm:622156kB, anon-rss:80408kB, file-rss:13468kB, shmem-rss:0kB
Nov 30 10:17:24 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 810194 (node), UID 1000, total-vm:653432kB, anon-rss:113496kB, file-rss:13656kB, shmem-rss:0kB
Nov 30 10:17:25 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 809100 (node), UID 1000, total-vm:647216kB, anon-rss:109912kB, file-rss:13708kB, shmem-rss:0kB
Nov 30 10:17:26 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 810558 (node), UID 1000, total-vm:623776kB, anon-rss:87124kB, file-rss:13688kB, shmem-rss:0kB
Nov 30 10:17:26 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 810956 (node), UID 1000, total-vm:622368kB, anon-rss:80384kB, file-rss:13684kB, shmem-rss:0kB
Nov 30 10:17:27 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 811188 (node), UID 1000, total-vm:642488kB, anon-rss:100196kB, file-rss:13600kB, shmem-rss:0kB
Nov 30 10:17:28 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 811835 (node), UID 1000, total-vm:631488kB, anon-rss:94216kB, file-rss:13636kB, shmem-rss:0kB
Nov 30 10:17:29 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 811542 (node), UID 1000, total-vm:619564kB, anon-rss:82360kB, file-rss:13668kB, shmem-rss:0kB
Nov 30 10:17:30 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 811328 (node), UID 1000, total-vm:614972kB, anon-rss:78764kB, file-rss:13680kB, shmem-rss:0kB
Nov 30 10:17:30 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 811461 (node), UID 1000, total-vm:614972kB, anon-rss:79112kB, file-rss:13688kB, shmem-rss:0kB
Nov 30 10:17:30 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 812238 (node), UID 1000, total-vm:612720kB, anon-rss:75324kB, file-rss:13668kB, shmem-rss:0kB
Nov 30 10:17:31 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 812561 (node), UID 1000, total-vm:626664kB, anon-rss:88200kB, file-rss:13692kB, shmem-rss:0kB
Nov 30 10:17:32 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 812867 (node), UID 1000, total-vm:626992kB, anon-rss:88864kB, file-rss:13684kB, shmem-rss:0kB
Nov 30 10:17:33 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 814390 (node), UID 1000, total-vm:627744kB, anon-rss:86892kB, file-rss:13480kB, shmem-rss:0kB
Nov 30 10:17:33 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 813396 (node), UID 1000, total-vm:610408kB, anon-rss:69252kB, file-rss:13672kB, shmem-rss:0kB
Nov 30 10:17:34 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 814095 (node), UID 1000, total-vm:623492kB, anon-rss:87424kB, file-rss:13576kB, shmem-rss:0kB
Nov 30 10:17:35 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 813033 (node), UID 1000, total-vm:616936kB, anon-rss:78952kB, file-rss:13692kB, shmem-rss:0kB
Nov 30 10:17:35 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 815224 (node), UID 1000, total-vm:631584kB, anon-rss:91684kB, file-rss:13496kB, shmem-rss:0kB
Nov 30 10:17:36 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 813848 (node), UID 1000, total-vm:649924kB, anon-rss:109880kB, file-rss:13688kB, shmem-rss:0kB
Nov 30 10:17:37 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 815308 (node), UID 1000, total-vm:613916kB, anon-rss:75748kB, file-rss:13640kB, shmem-rss:0kB
Nov 30 10:17:37 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 815428 (node), UID 1000, total-vm:612584kB, anon-rss:72416kB, file-rss:13688kB, shmem-rss:0kB
Nov 30 10:17:38 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 814995 (node), UID 1000, total-vm:613968kB, anon-rss:78128kB, file-rss:13716kB, shmem-rss:0kB
Nov 30 10:17:38 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 815119 (node), UID 1000, total-vm:613968kB, anon-rss:79028kB, file-rss:13724kB, shmem-rss:0kB
Nov 30 10:17:39 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 815625 (node), UID 1000, total-vm:623472kB, anon-rss:87244kB, file-rss:13672kB, shmem-rss:0kB
Nov 30 10:17:39 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 815747 (node), UID 1000, total-vm:623472kB, anon-rss:87764kB, file-rss:13684kB, shmem-rss:0kB
Nov 30 10:17:40 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 815883 (node), UID 1000, total-vm:628180kB, anon-rss:91680kB, file-rss:13692kB, shmem-rss:0kB
Nov 30 10:17:40 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 816683 (node), UID 1000, total-vm:607504kB, anon-rss:68844kB, file-rss:13676kB, shmem-rss:0kB
Nov 30 10:17:41 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 817782 (node), UID 1000, total-vm:619324kB, anon-rss:77732kB, file-rss:13624kB, shmem-rss:0kB
Nov 30 10:17:41 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 817933 (node), UID 1000, total-vm:615452kB, anon-rss:75656kB, file-rss:13532kB, shmem-rss:0kB
Nov 30 10:17:42 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 816420 (node), UID 1000, total-vm:631444kB, anon-rss:93600kB, file-rss:13696kB, shmem-rss:0kB
Nov 30 10:17:43 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 817037 (node), UID 1000, total-vm:636704kB, anon-rss:96300kB, file-rss:13672kB, shmem-rss:0kB
Nov 30 10:17:44 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 817194 (node), UID 1000, total-vm:674616kB, anon-rss:133380kB, file-rss:13648kB, shmem-rss:0kB
Nov 30 10:17:44 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 818294 (node), UID 1000, total-vm:620872kB, anon-rss:83360kB, file-rss:13576kB, shmem-rss:0kB
Nov 30 10:17:45 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 818809 (node), UID 1000, total-vm:618796kB, anon-rss:82076kB, file-rss:13672kB, shmem-rss:0kB
Nov 30 10:17:46 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 818550 (node), UID 1000, total-vm:639348kB, anon-rss:97916kB, file-rss:13628kB, shmem-rss:0kB
Nov 30 10:17:47 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 819466 (node), UID 1000, total-vm:619184kB, anon-rss:78112kB, file-rss:13672kB, shmem-rss:0kB
Nov 30 10:17:47 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 819825 (node), UID 1000, total-vm:631544kB, anon-rss:90732kB, file-rss:13504kB, shmem-rss:0kB
Nov 30 10:17:48 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 819991 (node), UID 1000, total-vm:653172kB, anon-rss:112444kB, file-rss:13628kB, shmem-rss:0kB
Nov 30 10:17:50 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 819228 (node), UID 1000, total-vm:631344kB, anon-rss:95232kB, file-rss:13716kB, shmem-rss:0kB
Nov 30 10:17:50 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 820153 (node), UID 1000, total-vm:626168kB, anon-rss:91088kB, file-rss:13620kB, shmem-rss:0kB
Nov 30 10:17:51 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 820759 (node), UID 1000, total-vm:620332kB, anon-rss:83120kB, file-rss:13696kB, shmem-rss:0kB
Nov 30 10:17:51 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 821210 (node), UID 1000, total-vm:629680kB, anon-rss:67164kB, file-rss:13616kB, shmem-rss:0kB
Nov 30 10:17:52 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 821658 (node), UID 1000, total-vm:622004kB, anon-rss:82000kB, file-rss:13584kB, shmem-rss:0kB
Nov 30 10:17:53 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 822313 (node), UID 1000, total-vm:627980kB, anon-rss:89480kB, file-rss:13556kB, shmem-rss:0kB
Nov 30 10:17:53 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 821419 (node), UID 1000, total-vm:621340kB, anon-rss:85168kB, file-rss:13676kB, shmem-rss:0kB
Nov 30 10:17:54 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 821970 (node), UID 1000, total-vm:639940kB, anon-rss:103656kB, file-rss:13688kB, shmem-rss:0kB
Nov 30 10:17:55 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 822705 (node), UID 1000, total-vm:644200kB, anon-rss:105104kB, file-rss:13500kB, shmem-rss:0kB
Nov 30 10:17:57 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 823554 (node), UID 1000, total-vm:617756kB, anon-rss:79964kB, file-rss:13564kB, shmem-rss:0kB
Nov 30 10:17:57 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 822153 (node), UID 1000, total-vm:610156kB, anon-rss:74544kB, file-rss:13692kB, shmem-rss:0kB
Nov 30 10:17:59 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 823455 (node), UID 1000, total-vm:636784kB, anon-rss:101756kB, file-rss:13696kB, shmem-rss:0kB
Nov 30 10:18:01 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 823830 (node), UID 1000, total-vm:643164kB, anon-rss:106528kB, file-rss:13708kB, shmem-rss:0kB
Nov 30 10:18:12 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 824903 (node), UID 1000, total-vm:743664kB, anon-rss:212060kB, file-rss:13692kB, shmem-rss:0kB
Nov 30 10:18:15 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 825226 (node), UID 1000, total-vm:767008kB, anon-rss:235068kB, file-rss:13684kB, shmem-rss:0kB
Nov 30 10:18:15 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 825301 (node), UID 1000, total-vm:767008kB, anon-rss:235396kB, file-rss:13700kB, shmem-rss:0kB
Nov 30 10:18:23 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 826166 (node), UID 1000, total-vm:721192kB, anon-rss:189156kB, file-rss:13728kB, shmem-rss:0kB
Nov 30 10:18:26 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 824150 (node), UID 1000, total-vm:808352kB, anon-rss:276524kB, file-rss:13740kB, shmem-rss:0kB
Nov 30 10:18:30 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 825425 (node), UID 1000, total-vm:781264kB, anon-rss:248056kB, file-rss:13736kB, shmem-rss:0kB
Nov 30 10:18:35 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 835824 (node), UID 1000, total-vm:794580kB, anon-rss:259332kB, file-rss:9060kB, shmem-rss:0kB
Nov 30 10:18:38 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 827050 (node), UID 1000, total-vm:799756kB, anon-rss:270804kB, file-rss:8896kB, shmem-rss:0kB
Nov 30 10:18:38 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 824608 (node), UID 1000, total-vm:754444kB, anon-rss:226136kB, file-rss:8916kB, shmem-rss:0kB
Nov 30 10:18:47 iZuf601dofntkxgp7lyfi9Z kernel: Killed process 832547 (node), UID 1000, total-vm:956752kB, anon-rss:430040kB, file-rss:5944kB, shmem-rss:0kB

@alexandru-lazarev

alexandru-lazarev commented Apr 5, 2024

It's vital not to unconditionally kill whole pods. Some apps rely on being able to handle child process OOMs themselves and do not want the whole pod recursively killed. PostgreSQL, for example.

And recently in PROD the OOM killer killed the bg_writer process inside PG, so our PG got into an inconsistent state, along with some of its logical and streaming replicas. (At least that's what I observed and deduced from the logs.)

@ringerc

ringerc commented Jun 12, 2024

@alexandru-lazarev If a Pg process gets OOM-killed the postmaster does an emergency shutdown and restart. It's inconvenient, but should never cause any data issues unless you're doing very unsafe things like running with the fsync = off setting. A bug is always possible, but Pg is pretty crash-safe and I'd be surprised if a bgwriter crash or OOM kill corrupted anything. If you have enough info, consider raising the issue you encountered with the postgres mailing lists.
