
failed to create target - too many open files, ulimit -n 1048576 #1153

Closed
cameronbraid opened this issue Oct 14, 2019 · 28 comments

Comments

@cameronbraid

cameronbraid commented Oct 14, 2019

I am getting errors in promtail:

level=error ts=2019-10-14T16:21:22.143910463Z caller=filetargetmanager.go:261 msg="Failed to create target" key="{container_name=\"sentinel\", deployment=\"redis-ha-3\", deploymentconfig=\"redis-ha\", instance=\"redis-ha-3-vmxdz\", job=\"project/redis-ha\", name=\"redis-ha\", namespace=\"project\", redis_sentinel=\"true\"}" error="filetarget.fsnotify.NewWatcher: too many open files"

on the host

ulimit -n
1024000

in the promtail container

root@promtail-22r6h:/# ulimit -n
1048576

Any hints on how to solve this?

cameronbraid changed the title from "to create target - too many open files, ulimit -n 1048576" to "failed to create target - too many open files, ulimit -n 1048576" on Oct 14, 2019
@cyriltovena
Contributor

Hello @cameronbraid, do you have more details?

Is this happening on every node, or only on certain nodes? Can you check which files are open?
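
For example, something along these lines on an affected node (a sketch; the promtail command name used to match the process is an assumption about how it appears on the host):

# Count file descriptors held by processes whose command starts with "promtail".
lsof -c promtail | wc -l
# Rough per-process breakdown of open files across the whole node.
lsof | awk '{print $1}' | sort | uniq -c | sort -rn | head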

@cameronbraid
Author

Sure, it's an OpenShift cluster v3.9; the Docker log driver is json-file.

There are three nodes that are working fine, and three that have this issue.

On one of the problematic nodes, the file counts are:

> find /var/log/pods | wc -l
200
> find /var/lib/docker/containers | wc -l
2304
> sysctl fs.file-max
fs.file-max = 13059628
> sysctl fs.file-nr
fs.file-nr = 62688      0       13059628

@cyriltovena
Contributor

cyriltovena commented Oct 15, 2019

I think something on those nodes is already using a lot of file descriptors, and it is not promtail (unless you really have more containers running on those nodes).

You should check what that is; if it is expected, then the only option is to raise that limit.

@cameronbraid
Author

cameronbraid commented Oct 16, 2019

Yes, there are lots of file descriptors in use, but that's not the issue, as nothing else is complaining about it. It's only promtail.

And you can see from the stats: 62688 open before launching promtail, with a max of 13059628, leaving room for 12996940 more. The targets that promtail creates are all in /var/log/pods and /var/lib/docker/containers, which total about 2500.

@cyriltovena
Contributor

Could you check ulimit from within the promtail container? It should see the same ulimit as the host, but that doesn't seem to be the case here.
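
One way to do that from outside the pod, as a sketch (the pod name is taken from the earlier output; the namespace is a placeholder, substitute your own):

# Check the open-file limit inside the running promtail container.
kubectl exec -n logging promtail-22r6h -- sh -c 'ulimit -n'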

@cameronbraid
Author

in promtail container

# ulimit -n
1048576

@cameronbraid
Author

Also in the container:

> cat /proc/sys/fs/file-nr
85248   0       13059968

@cyriltovena
Contributor

Have you tried checking promtail's /service-discovery and /targets pages to see how many targets you have? I'm wondering whether this is a promtail or OpenShift issue.

@cyriltovena
Contributor

Can you also activate debug logging in promtail and share it with us? We will see the paths and targets found.

@cyriltovena
Contributor

You can also check the exposed metric promtail_files_active_total.
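
For example (port 9080 is promtail's default HTTP listen port; adjust if your deployment overrides it):

# Filter promtail's own metrics for the active file/target counters.
curl -s http://localhost:9080/metrics | grep -E 'promtail_(files|targets)_active_total'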

@cyriltovena
Contributor

One last thing: if your log files are rotating but old log files are not deleted over time, promtail will keep watching them. This can easily build up. Any chance you have a ton of log files that are no longer used?
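
A quick way to check that on a node, as a sketch (the path and pattern assume the json-file log driver's default layout):

# Count container log files, including rotated ones...
find /var/lib/docker/containers -name '*.log*' | wc -l
# ...and those untouched for more than a week.
find /var/lib/docker/containers -name '*.log*' -mtime +7 | wc -l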

@cyriltovena
Contributor

https://access.redhat.com/solutions/2334181 Do you have a max open files limit configured for Docker?
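
On a systemd-based host this can be checked roughly like so (a sketch; the /etc/sysconfig/docker location is an assumption for RHEL/OpenShift installs):

# File-descriptor limit applied to the Docker daemon itself.
systemctl show docker --property=LimitNOFILE
grep -i nofile /etc/sysconfig/docker 2>/dev/null
awk '/open files/' "/proc/$(pidof dockerd)/limits"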

@cameronbraid
Author

/metrics:
promtail_files_active_total 0.0
promtail_targets_active_total 99.0

/service-discovery:
kubernetes-pods-app (17/794 active targets)
kubernetes-pods-direct-controllers (42/819 active targets)
kubernetes-pods-indirect-controller (0/777 active targets)
kubernetes-pods-name (40/817 active targets)
kubernetes-pods-static (0/777 active targets)

/targets:
kubernetes-pods-app (0/17 ready)
kubernetes-pods-direct-controllers (0/42 ready)
kubernetes-pods-indirect-controller (0/0 ready)
kubernetes-pods-name (0/40 ready)
kubernetes-pods-static (0/0 ready)

Re #1153 (comment):
I don't think this is the case, as:
find /var/lib/docker/containers -name "*.log" | wc -l
185

@cameronbraid
Author

debug-log.txt

@cameronbraid
Author

Also, by default OpenShift 3.9 uses journald for logging; however, I changed it to use json-file and enabled log rotation as well.

@cyriltovena
Contributor

From the log file, you are not tailing a single file, yet creating a new watcher still failed.

Are those master nodes, or anything special?

@cameronbraid
Author

The node I am using is compute, infra, and master, though I am not sure how that would impact anything.

@cyriltovena
Contributor

Have you tried this on the host?

sysctl -a | grep inotify
lsof | grep inotify | wc -l

@cameronbraid
Author

fs.inotify.max_queued_events = 16384
fs.inotify.max_user_instances = 128
fs.inotify.max_user_watches = 65536
lsof | grep inotify | wc -l
19547

I upped the inotify limits and am no longer getting that error. However, I still have some issues; I will open a separate ticket for those.

Thanks heaps for your help.

@cyriltovena
Contributor

You should take a look at who is using inotify; if an application is leaking, that could be interesting to know. I don't think it's promtail, since from the logs it was not even able to get that far.
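
A rough way to attribute inotify instances per process, as a sketch (run as root; it just walks /proc):

# Count inotify file descriptors per process and show the biggest users.
for p in /proc/[0-9]*; do
  n=$(ls -l "$p/fd" 2>/dev/null | grep -c inotify)
  [ "$n" -gt 0 ] && echo "$n $(cat "$p/comm" 2>/dev/null)"
done | sort -rn | head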

@miklezzzz

@cyriltovena Hi, I observe something similar to the original poster.
Before promtail is running:

lsof | grep inotify | wc -l
490

After:

lsof | grep inotify | grep promtail | wc -l
15429

I wouldn't have noticed a problem, but kubectl logs for pods on a certain node started to fail with "too many open files".

Is it by design?

@miklezzzz

At the same time, there are no more than about 160 log files to tail.

cat /run/promtail/positions.yaml | wc -l 
165

Some of them are marked with "?", though...

@miklezzzz

miklezzzz commented Dec 12, 2019

I've updated /proc/sys/fs/inotify/max_user_instances to 512 instead of the default 128, and the problem's gone.

@roidelapluie

roidelapluie commented Dec 12, 2019

Good catch... wondering if promtail should check this.
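
A minimal sketch of what such a startup check could look like as a shell preflight (the 90% threshold is arbitrary, and this counts instances globally while the limit is per user, so it is illustrative only):

#!/bin/sh
# Warn if inotify instances are close to the configured limit before starting promtail.
limit=$(cat /proc/sys/fs/inotify/max_user_instances)
used=$(find /proc/[0-9]*/fd -lname 'anon_inode:inotify' 2>/dev/null | wc -l)
if [ "$used" -ge $((limit * 9 / 10)) ]; then
  echo "WARNING: $used of $limit inotify instances in use; consider raising fs.inotify.max_user_instances" >&2
fi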

@adamcharnock

adamcharnock commented Mar 8, 2024

I found this command helpful to get a list of inotify counts by process. Note that it is important to run it as root (via sudo here), otherwise you'll only get a partial count:

sudo lsof | grep inotify | tr -s ' ' | cut -f1,2 -d" " | sort | uniq -c | sort

Example tail of the output:

     39 container 38751
     39 container 455668
     51 cilium-ag 3306
     60 container 4848
     78 kubelet 455304
     90 kube-apis 8649
     92 container 454706
    312 promtail 4314

Edit: Also, thank you to @miklezzzz for your answer. That sorted it out for me. I went with 1024 in my case.

@younsl

younsl commented Aug 24, 2024

Same as miklezzzz's solution. On a target instance with promtail v2.9.8 installed, I increased the fs.inotify.max_user_instances kernel parameter from 128 (the default) to 1024 using the following procedure, and the filetarget.fsnotify.NewWatcher: too many open files errors disappeared.

# Environment: promtail 2.9.8 on amd64 EC2
echo "fs.inotify.max_user_instances = 1024" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
sysctl fs.inotify

@StianOvrevage

StianOvrevage commented Jan 22, 2025

This can be done with the init container that is suggested in the Helm chart values:

initContainer:
  # -- Specifies whether the init container for setting inotify max user instances is to be enabled
  - name: init
    # -- Docker registry, image and tag for the init container image
    image: docker.io/busybox:1.33
    # -- Docker image pull policy for the init container image
    imagePullPolicy: IfNotPresent
    # -- The inotify max user instances to configure
    command:
      - sh
      - -c
      - sysctl -w fs.inotify.max_user_instances=512
    securityContext:
      privileged: true

https://github.com/grafana/helm-charts/blob/promtail-6.16.6/charts/promtail/values.yaml#L82

Can confirm that this works:

root@promtail-v2-g82w9:/# cat /proc/sys/fs/inotify/max_user_instances
512

@rtomik

rtomik commented Feb 18, 2025

@StianOvrevage thanks it solved the issue for me
