
failed to create target - too many open files, ulimit -n 1048576 #1153

Closed
cameronbraid opened this issue Oct 14, 2019 · 28 comments

Comments

@cameronbraid

cameronbraid commented Oct 14, 2019

I am getting errors in promtail:

level=error ts=2019-10-14T16:21:22.143910463Z caller=filetargetmanager.go:261 msg="Failed to create target" key="{container_name=\"sentinel\", deployment=\"redis-ha-3\", deploymentconfig=\"redis-ha\", instance=\"redis-ha-3-vmxdz\", job=\"project/redis-ha\", name=\"redis-ha\", namespace=\"project\", redis_sentinel=\"true\"}" error="filetarget.fsnotify.NewWatcher: too many open files"

on the host

ulimit -n
1024000

in the promtail container

root@promtail-22r6h:/# ulimit -n
1048576

Any hints on how to solve this?

cameronbraid changed the title from "to create target - too many open files, ulimit -n 1048576" to "failed to create target - too many open files, ulimit -n 1048576" on Oct 14, 2019
@cyriltovena
Contributor

Hello @cameronbraid, do you have more details?

Is this happening on every node, or only on certain nodes? Can you check which files are open?
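
For example, something along these lines on an affected node (a sketch; the promtail command name used to match the process is an assumption about how it appears on the host):

# Count file descriptors held by processes whose command starts with "promtail".
lsof -c promtail | wc -l
# Rough per-process breakdown of open files across the whole node.
lsof | awk '{print $1}' | sort | uniq -c | sort -rn | head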

@cameronbraid
Author

Sure, it's an OpenShift cluster v3.9; the Docker log driver is json-file.

There are three nodes that are working fine, and three that have this issue.

On one of the problematic nodes, the file counts are:

> find /var/log/pods | wc -l
200
> find /var/lib/docker/containers | wc -l
2304
> sysctl fs.file-max
fs.file-max = 13059628
> sysctl fs.file-nr
fs.file-nr = 62688      0       13059628

@cyriltovena
Contributor

cyriltovena commented Oct 15, 2019

I think something on those nodes is already using a lot of file descriptors, and it is not promtail (unless you really have more containers running on those nodes).

You should check what that is; if it is expected, then the only option is to raise that limit.

@cameronbraid
Author

cameronbraid commented Oct 16, 2019

Yes, there are lots of file descriptors in use, but that's not the issue, as nothing else is complaining about it. It's only promtail.

And you can see from the stats: 62688 open before launching promtail, with a max of 13059628, leaving room for 12996940 more. The targets that promtail creates are all in /var/log/pods and /var/lib/docker/containers, which total about 2500.

@cyriltovena
Contributor

Could you check ulimit from within the promtail container? It should see the same ulimit as the host, but that doesn't seem to be the case here.
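
One way to do that from outside the pod, as a sketch (the pod name is taken from the earlier output; the namespace is a placeholder, substitute your own):

# Check the open-file limit inside the running promtail container.
kubectl exec -n logging promtail-22r6h -- sh -c 'ulimit -n'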

@cameronbraid
Author

in promtail container

# ulimit -n
1048576

@cameronbraid
Author

Also in the container:

> cat /proc/sys/fs/file-nr
85248   0       13059968

@cyriltovena
Contributor

Have you tried checking promtail's /service-discovery and /targets pages to see how many targets you have? I'm wondering whether this is a promtail or OpenShift issue.

@cyriltovena
Contributor

Can you also activate debug logging in promtail and share it with us? We will see the paths and targets found.

@cyriltovena
Contributor

You can also check the exposed metric promtail_files_active_total.
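
For example (port 9080 is promtail's default HTTP listen port; adjust if your deployment overrides it):

# Filter promtail's own metrics for the active file/target counters.
curl -s http://localhost:9080/metrics | grep -E 'promtail_(files|targets)_active_total'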

@cyriltovena
Contributor

One last thing: if your log files are rotating but old log files are not deleted over time, promtail will keep watching them. This can easily build up. Any chance you have a ton of log files that are no longer used?
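
A quick way to check that on a node, as a sketch (the path and pattern assume the json-file log driver's default layout):

# Count container log files, including rotated ones...
find /var/lib/docker/containers -name '*.log*' | wc -l
# ...and those untouched for more than a week.
find /var/lib/docker/containers -name '*.log*' -mtime +7 | wc -l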

@cyriltovena
Contributor

https://access.redhat.com/solutions/2334181 Do you have a max open files limit configured for Docker?
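
On a systemd-based host this can be checked roughly like so (a sketch; the /etc/sysconfig/docker location is an assumption for RHEL/OpenShift installs):

# File-descriptor limit applied to the Docker daemon itself.
systemctl show docker --property=LimitNOFILE
grep -i nofile /etc/sysconfig/docker 2>/dev/null
awk '/open files/' "/proc/$(pidof dockerd)/limits"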

@cameronbraid
Author

/metrics:
promtail_files_active_total 0.0
promtail_targets_active_total 99.0

/service-discovery:
kubernetes-pods-app (17/794 active targets)
kubernetes-pods-direct-controllers (42/819 active targets)
kubernetes-pods-indirect-controller (0/777 active targets)
kubernetes-pods-name (40/817 active targets)
kubernetes-pods-static (0/777 active targets)

/targets:
kubernetes-pods-app (0/17 ready)
kubernetes-pods-direct-controllers (0/42 ready)
kubernetes-pods-indirect-controller (0/0 ready)
kubernetes-pods-name (0/40 ready)
kubernetes-pods-static (0/0 ready)

Re #1153 (comment):
I don't think this is the case, as:
find /var/lib/docker/containers -name "*.log" | wc -l
185

@cameronbraid
Author

debug-log.txt

@cameronbraid
Author

Also, by default OpenShift 3.9 uses journald for logging; however, I changed it to use json-file and enabled log rotation as well.

@cyriltovena
Contributor

From the log file, you are not tailing a single file, yet creating a new watcher still failed.

Are those master nodes, or anything special?

@cameronbraid
Author

The node I am using is compute, infra, and master, though I am not sure how that would impact anything.

@cyriltovena
Contributor

Have you tried this on the host?

sysctl -a | grep inotify
lsof | grep inotify | wc -l

@cameronbraid
Author

fs.inotify.max_queued_events = 16384
fs.inotify.max_user_instances = 128
fs.inotify.max_user_watches = 65536
lsof | grep inotify | wc -l
19547

I upped the inotify limits and am no longer getting that error. However, I still have some issues; I will open a separate ticket for those.

Thanks heaps for your help.

@cyriltovena
Contributor

You should take a look at who is using inotify; if an application is leaking, that could be interesting to know. I don't think it's promtail, since from the logs it was not even able to get that far.
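
A rough way to attribute inotify instances per process, as a sketch (run as root; it just walks /proc):

# Count inotify file descriptors per process and show the biggest users.
for p in /proc/[0-9]*; do
  n=$(ls -l "$p/fd" 2>/dev/null | grep -c inotify)
  [ "$n" -gt 0 ] && echo "$n $(cat "$p/comm" 2>/dev/null)"
done | sort -rn | head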

@miklezzzz

@cyriltovena Hi, I observe something similar to the original poster.
Before promtail is running:

lsof | grep inotify | wc -l
490

After:

lsof | grep inotify | grep promtail | wc -l
15429

I wouldn't have noticed a problem, but kubectl logs for pods on a certain node started to fail with "too many open files".

Is it by design?

@miklezzzz

At the same time, there are no more than about 160 log files to tail.

cat /run/promtail/positions.yaml | wc -l 
165

Some of them are marked with "?", though...

@miklezzzz

miklezzzz commented Dec 12, 2019

I've updated /proc/sys/fs/inotify/max_user_instances to 512 instead of the default 128, and the problem's gone.

@roidelapluie

roidelapluie commented Dec 12, 2019

Good catch... wondering if promtail should check this.
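
A minimal sketch of what such a startup check could look like as a shell preflight (the 90% threshold is arbitrary, and this counts instances globally while the limit is per user, so it is illustrative only):

#!/bin/sh
# Warn if inotify instances are close to the configured limit before starting promtail.
limit=$(cat /proc/sys/fs/inotify/max_user_instances)
used=$(find /proc/[0-9]*/fd -lname 'anon_inode:inotify' 2>/dev/null | wc -l)
if [ "$used" -ge $((limit * 9 / 10)) ]; then
  echo "WARNING: $used of $limit inotify instances in use; consider raising fs.inotify.max_user_instances" >&2
fi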

@adamcharnock

adamcharnock commented Mar 8, 2024

I found this command helpful to get a list of inotify counts by process. Note that it is important to run it as root (via sudo here), otherwise you'll only get a partial count:

sudo lsof | grep inotify | tr -s ' ' | cut -f1,2 -d" " | sort | uniq -c | sort

Example tail of the output:

     39 container 38751
     39 container 455668
     51 cilium-ag 3306
     60 container 4848
     78 kubelet 455304
     90 kube-apis 8649
     92 container 454706
    312 promtail 4314

Edit: Also, thank you to @miklezzzz for your answer. That sorted it out for me. I went with 1024 in my case.

@younsl

younsl commented Aug 24, 2024

Same as miklezzzz's solution. On a target instance with promtail v2.9.8 installed, I increased the fs.inotify.max_user_instances kernel parameter from 128 (the default) to 1024 using the following procedure, and the filetarget.fsnotify.NewWatcher: too many open files errors disappeared.

# Environment: promtail 2.9.8 on amd64 EC2
echo "fs.inotify.max_user_instances = 1024" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
sysctl fs.inotify

@StianOvrevage

StianOvrevage commented Jan 22, 2025

This can be done with the init container that is suggested in the Helm chart values:

initContainer:
  # -- Specifies whether the init container for setting inotify max user instances is to be enabled
  - name: init
    # -- Docker registry, image and tag for the init container image
    image: docker.io/busybox:1.33
    # -- Docker image pull policy for the init container image
    imagePullPolicy: IfNotPresent
    # -- The inotify max user instances to configure
    command:
      - sh
      - -c
      - sysctl -w fs.inotify.max_user_instances=512
    securityContext:
      privileged: true

https://github.com/grafana/helm-charts/blob/promtail-6.16.6/charts/promtail/values.yaml#L82

Can confirm that this works:

root@promtail-v2-g82w9:/# cat /proc/sys/fs/inotify/max_user_instances
512

@rtomik

rtomik commented Feb 18, 2025

@StianOvrevage thanks it solved the issue for me
