
cri-o container stdout/stderr default log format (text) is unstructured and can't be parsed by monitoring tools #3181

Closed
ganga1980 opened this issue Feb 2, 2020 · 12 comments

Comments

@ganga1980

ganga1980 commented Feb 2, 2020

Description

The default format of the stdout/stderr container logs under /var/log/containers is not uniform across all container logs. This breaks integration with monitoring tools such as fluentd and fluent-bit that tail these logs. Reviewing the cri-o code, the default log format is logrus text, but logrus recommends enabling the JSON formatter so that monitoring tools can consume these container logs: https://github.com/sirupsen/logrus.
See this statement from the logrus README:
With log.SetFormatter(&log.JSONFormatter{}), for easy parsing by logstash or Splunk.
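
For reference, a minimal Go sketch of that logrus recommendation (this configures logrus's own output; whether it applies to the container log files is discussed below):

```go
package main

import (
	log "github.com/sirupsen/logrus"
)

func main() {
	// Switch logrus from its default text formatter to JSON, as the
	// logrus README recommends for easy parsing by logstash or Splunk.
	log.SetFormatter(&log.JSONFormatter{})

	// Each entry is then emitted as a single JSON object per line.
	log.WithFields(log.Fields{"component": "example"}).Info("structured log line")
}
```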


Here are the stdout/stderr logs emitted by CRI-O; as you can see below, there is no uniformity in the format of the log lines:

[root@myocp42-pdqlw-worker-eastus2-thg2w ~]# cd /var/log/containers
[root@myocp42-pdqlw-worker-eastus2-thg2w containers]# ls
alertmanager-main-0_openshift-monitoring_alertmanager-1cfc778377b1069df75ca29bc8d077ca3719eb27ee0d5d027ca452fe4547e2a9.log
alertmanager-main-0_openshift-monitoring_alertmanager-proxy-162795b5ba062f1d5c6940c161301907006288bf238b681155d70bdab93f48bb.log
alertmanager-main-0_openshift-monitoring_config-reloader-988780c6e6994a825088e5ba3eff7c1d6eb4ae9fc99555d3c99eca6b4fdabd9d.log
certified-operators-7c885d47cf-mxxf9_openshift-marketplace_certified-operators-afb8c027deff07e2940724f8204e04c12f987eef249c0f74d652356a66ee7695.log
community-operators-848b856d7b-8lfx5_openshift-marketplace_community-operators-350ea68a0c0f28c3a68d56394e85f6c9ce613098e52f84b895d0a847c6e8e34e.log
datadog-1580419294-mnfsh_default_datadog-be72c822f63b2e1c1819d63665b5597abfd1a8ec380873f7b45657e97f687e99.log
dns-default-62qqw_openshift-dns_dns-f5a033b61263fd04fb66c665b2f930c5e9460f252403787600d5b2726e8811de.log
dns-default-62qqw_openshift-dns_dns-node-resolver-02328c28458aaab74cea3e9927ffbdc28a84ead636406ba6a60ccc4fe99e8b3d.log
machine-config-daemon-v99f6_openshift-machine-config-operator_machine-config-daemon-5653a6fd09c7c20948b0f10f1e8221c64a9b88920550b9ad985e282d9704755d.log
multus-bgxjp_openshift-multus_cni-plugins-e0dcaaedfce59e9847c36b04cd8e9f6116fa90233c2f356216b95c8b4a6d88df.log
multus-bgxjp_openshift-multus_kube-multus-de75edd820bcf881c58c8c5f351c9b0e6b99464afccb061aee307b3a0f2cbc51.log
node-admin_default_node-admin-75bf12fae91d53345de5c8854bd5b4bc4fcf820cb5c5047624a7cba43a0932a4.log
node-ca-ps8n5_openshift-image-registry_node-ca-6cab09ab1bec05fcf3530c74a0b0194a04bb12c3f1bc2020a3c775f21ff78d61.log
node-exporter-lvl5v_openshift-monitoring_init-textfile-88dcf6ffd2cea05e453ff8eea26af7e19c28faf3af9f059b054069937ae91c8a.log
node-exporter-lvl5v_openshift-monitoring_kube-rbac-proxy-af40092efb6f0c4e80013e74a013bb0e9f553d7a766904172ecf7c843904b51a.log
node-exporter-lvl5v_openshift-monitoring_node-exporter-efb2497a3509b71687dc85cc05313172cb0aabac5532aaa042d409fe06ff75ea.log
omsagent-5p78p_kube-system_omsagent-ae15859879a3cd86b43804fb146b0c60b8111f4d21074f4a397eed4029d6069d.log
ovs-7gtdd_openshift-sdn_openvswitch-f23a4ccfbbb205bf803cce5ab764ff220da781605ce2f662ea22d7ec7a7dc1dc.log
prometheus-adapter-67b8d456bd-swc7z_openshift-monitoring_prometheus-adapter-2ad8ecccd492c69f47fadb5ecf2585e405089cec56c1a7e0c3f93db7693f26b4.log
prometheus-k8s-0_openshift-monitoring_kube-rbac-proxy-9df49d1b40b807a493c9a4682a80c06c0b7dcb612e5f504d4cc51dbe4a810b82.log
prometheus-k8s-0_openshift-monitoring_prometheus-38035db7a2e42b5fd03b12d1e1a03d9fb28d17a22aad7cd29cf55fe6f4816446.log
prometheus-k8s-0_openshift-monitoring_prometheus-config-reloader-f948209d5776739ddcc8e45ce70b85634f6300ebc4b30282cac7aa36b17df692.log
prometheus-k8s-0_openshift-monitoring_prometheus-df3f022def20b03c2049065d68dabad08d8814ac12b3dff4b1605d78afb56382.log
prometheus-k8s-0_openshift-monitoring_prometheus-proxy-78e5b00a805cca65fe80b2a842a53efe44cdd4062d67c024c2cab5952e3ca950.log
prometheus-k8s-0_openshift-monitoring_prom-label-proxy-ca130e4e672a2b9a8433b890e80c4a643582b78c74c6e9d76cd07256ade0e614.log
prometheus-k8s-0_openshift-monitoring_rules-configmap-reloader-f6a2c4b484cace66d3efe719d51b0ae429df95c9250af117504b21735f1a47ec.log
sdn-8wdc2_openshift-sdn_install-cni-plugins-f87e182ed8823a6bbc2ebf652805cbccdb42544710e38e41b82843bd9af8bab7.log
sdn-8wdc2_openshift-sdn_sdn-05801ed16ac6024f7293bceeb782121f944e20e241f849418fd704da37431d88.log
tuned-nqz52_openshift-cluster-node-tuning-operator_tuned-fbcf86b92048c1b5f8a51a9387636c79aff8f002c908565846421e2feb9f0a73.log
[root@myocp42-pdqlw-worker-eastus2-thg2w containers]# head -5 alertmanager-main-0_openshift-monitoring_alertmanager-proxy-162795b5ba062f1d5c6940c161301907006288bf238b681155d70bdab93f48bb.log
2020-01-26T03:17:29.346494690+00:00 stderr F 2020/01/26 03:17:29 provider.go:117: Defaulting client-id to system:serviceaccount:openshift-monitoring:alertmanager-main
2020-01-26T03:17:29.346494690+00:00 stderr F 2020/01/26 03:17:29 provider.go:122: Defaulting client-secret to service account token /var/run/secrets/kubernetes.io/serviceaccount/token
2020-01-26T03:17:29.346494690+00:00 stderr F 2020/01/26 03:17:29 provider.go:310: Delegation of authentication and authorization to OpenShift is enabled for bearer tokens and client certificates.
2020-01-26T03:17:29.377462695+00:00 stderr F 2020/01/26 03:17:29 oauthproxy.go:200: mapping path "/" => upstream "http://localhost:9093/"
2020-01-26T03:17:29.377462695+00:00 stderr F 2020/01/26 03:17:29 oauthproxy.go:221: compiled skip-auth-regex => "^/metrics"
[root@myocp42-pdqlw-worker-eastus2-thg2w containers]# head -5 datadog-1580419294-mnfsh_default_datadog-be72c822f63b2e1c1819d63665b5597abfd1a8ec380873f7b45657e97f687e99.log
2020-01-30T21:22:26.087215489+00:00 stdout P [s6-init] making user provided files available at /var/run/s6/etc...
2020-01-30T21:22:26.096765379+00:00 stdout F exited 0.
2020-01-30T21:22:26.097538051+00:00 stdout P [s6-init] ensuring user provided files have correct perms...
2020-01-30T21:22:26.145585425+00:00 stdout F exited 0.
2020-01-30T21:22:26.146722031+00:00 stdout F [fix-attrs.d] applying ownership & permissions fixes...
[root@myocp42-pdqlw-worker-eastus2-thg2w containers]# head -5 node-admin_default_node-admin-75bf12fae91d53345de5c8854bd5b4bc4fcf820cb5c5047624a7cba43a0932a4.log
2020-02-02T05:45:45.735791473+00:00 stdout F Failed to connect to bus: No data available
2020-02-02T05:45:45.744456380+00:00 stdout P
2020-02-02T05:45:45.745337662+00:00 stdout P [root@myocp42-pdqlw-worker-eastus2-thg2w /]#
[root@myocp42-pdqlw-worker-eastus2-thg2w /]#
2020-02-02T05:45:58.193206752+00:00 stdout P c
[root@myocp42-pdqlw-worker-eastus2-thg2w containers]# head -5 omsagent-5p78p_kube-system_omsagent-ae15859879a3cd86b43804fb146b0c60b8111f4d21074f4a397eed4029d6069d.log
2020-02-02T03:45:58.799991063+00:00 stdout F getting gid for docker.sock
2020-02-02T03:45:58.801683320+00:00 stdout F creating a local docker group
2020-02-02T03:45:58.892908216+00:00 stdout F adding omsagent user to local docker group
2020-02-02T03:45:59.096509878+00:00 stderr F curl: (7) Couldn't connect to server
2020-02-02T03:45:59.097687388+00:00 stderr F Traceback (most recent call last):
[root@myocp42-pdqlw-worker-eastus2-thg2w containers]#

Steps to reproduce the issue:
1.
2.
3.

Describe the results you received:

The cri-o container stdout/stderr default log format is unstructured and cannot be parsed by monitoring tools.
Describe the results you expected:
As described in https://github.com/sirupsen/logrus, enable JSON as the default format so that monitoring tools can parse and consume the logs. The default text format is not usable by monitoring tools.

Additional information you deem important (e.g. issue happens only occasionally):

Output of crictl version:

[root@myocp42-pdqlw-worker-eastus2-thg2w containers]# crictl version
Version:  0.1.0
RuntimeName:  cri-o
RuntimeVersion:  1.14.11-6.dev.rhaos4.2.git627b85c.el8
RuntimeApiVersion:  v1alpha1

Additional environment details (AWS, VirtualBox, physical, etc.):

@ganga1980 ganga1980 changed the title cri-o container stdout/stderr default log format (text) broken. cri-o container stdout/stderr default log format (text) is unstructured and can't be parsed by monitoring tools Feb 3, 2020
@haircommander
Member

haircommander commented Feb 3, 2020

Note: there's a distinction between CRI-O logs (logged by logrus) and container logs (the container's std{out,err}). Changing the formatter for CRI-O's logs doesn't sound like what you want. What you seem to want is containers/conmon#37, as conmon is where container logs are processed in CRI-O.

Also note: CRI-O has to use the current format of logging for the CRI (which we call k8s-file), so any different format or file you want conmon to write the logs to will have to be done in addition to k8s-file.

The ask is not unrealistic, but it has fallen by the wayside a bit because CRI-O must always write in the k8s-file format, and conmon's I/O handling needs to be refactored to better handle writing to multiple places upon receiving output from the container (to make sure the container's performance isn't blocked on conmon reading from the pipe).

@ganga1980
Author

ganga1980 commented Feb 3, 2020

Hi @haircommander, thanks for looking into this. Yes, I am referring to the container's std{out,err} format. As the sample CRI-O container logs I shared show, the current format is unstructured and can't be parsed and consumed by any monitoring tools (like fluentd/fluent-bit). I couldn't find much information about what the k8s-file format is and how Kubernetes parses it. Can you point me to the k8s-file format? If the CRI-O container logs used the same JSON format as Docker container logs, that would make integration with monitoring tools seamless; or at least emit them in JSON format.
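
For comparison, Docker's json-file driver writes one JSON object per line with log, stream, and time fields; a minimal Go sketch of decoding such an entry (the sample line is illustrative):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dockerJSONLog mirrors the fields written by Docker's json-file log driver:
// one JSON object per line with "log", "stream", and "time" keys.
type dockerJSONLog struct {
	Log    string `json:"log"`
	Stream string `json:"stream"`
	Time   string `json:"time"`
}

func main() {
	// Illustrative sample line in Docker's json-file format.
	line := `{"log":"getting gid for docker.sock\n","stream":"stdout","time":"2020-02-02T03:45:58.799991063Z"}`

	var entry dockerJSONLog
	if err := json.Unmarshal([]byte(line), &entry); err != nil {
		panic(err)
	}
	fmt.Printf("%s %s %q\n", entry.Time, entry.Stream, entry.Log)
}
```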

@haircommander
Member

The format is actually called the CRI logging format, and is parsed here:
https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kuberuntime/logs/logs.go#L125-L169
It seems you are having similar feelings to whoever reported kubernetes/kubernetes#70539.

While adding the docker json format to cri-o is on my list of things to do, it is unfortunately not very high. So unless another contributor or anyone from the community wants to take a stab at it, it probably won't get done for a while 😞

@ganga1980
Author

ganga1980 commented Feb 3, 2020

Hi @haircommander, thanks for pointing me to the parseCRILog function. As you know, container std{out,err} logs are a very useful source of monitoring information for troubleshooting container issues. Many monitoring tools out there are built with support for Docker JSON logs, so enabling the Docker JSON format in CRI-O would make the monitoring tools and ecosystem work seamlessly, including Kubernetes logs (kubectl logs).

As I pointed out in the container log examples above, the current CRI log format is unstructured and inconsistent across containers, which makes it harder to parse and consume these logs.

Reviewing the code at https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kuberuntime/logs/logs.go#L125-L169, wouldn't removing the parseCRILog function and using the existing parseDockerJSONLog suffice to address this issue? Of course, we would need to do perf testing to make sure JSON decoding has no impact on CRI-O performance.

Since you have much more knowledge of CRI-O than I do, I would appreciate it if you could prioritize this work so that the entire CRI-O community can benefit.

@haircommander
Member

It is not that easy, I'm afraid. The kubelet speaks CRI to CRI-O, and what you're asking for is a change to the CRI. That involves consensus among the CRI implementations, consensus in Kubernetes SIG Node, and multiple releases to transition JSON output from a new feature into one that can be relied on.

Also, note that the Kubernetes world is moving away from the Docker shim for running containers. From my perspective, the future is either CRI-O or cri-containerd. I don't know what cri-containerd does with logs, but it needs to log in the CRI format too. Is it possible for you to propose that those tools start using the CRI, so all logging infrastructure speaks the language the Kubernetes world is moving to (CRI)?

Finally, I personally cannot see a lack of uniformity. Can you point out some inconsistencies you're seeing? It follows the format of timestamp stdout/stderr partial/full-line log-line... If a log line is longer than the fixed-size buffer, its chunks are written as partial (P) lines, continued until a newline is reached, at which point the final chunk is written as a full (F) line. The newline handling is perhaps the oddest part, but even then it's pretty regimented.
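
A minimal Go sketch of parsing one line in that format (an approximation of the parseCRILog logic linked above, assuming space-separated fields; not the actual CRI-O/kubelet code):

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// criLogLine holds the four space-separated fields of a CRI log line:
// timestamp, stream (stdout/stderr), tag (P=partial, F=full), and the message.
type criLogLine struct {
	Timestamp time.Time
	Stream    string
	Partial   bool
	Message   string
}

func parseCRILine(line string) (*criLogLine, error) {
	parts := strings.SplitN(line, " ", 4)
	if len(parts) < 3 {
		return nil, fmt.Errorf("malformed CRI log line: %q", line)
	}
	ts, err := time.Parse(time.RFC3339Nano, parts[0])
	if err != nil {
		return nil, err
	}
	msg := ""
	if len(parts) == 4 {
		msg = parts[3]
	}
	return &criLogLine{
		Timestamp: ts,
		Stream:    parts[1],
		Partial:   parts[2] == "P",
		Message:   msg,
	}, nil
}

func main() {
	l, err := parseCRILine("2020-01-30T21:22:26.096765379+00:00 stdout F exited 0.")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s %s partial=%v %q\n", l.Timestamp.Format(time.RFC3339Nano), l.Stream, l.Partial, l.Message)
}
```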

@haircommander
Member

CRI-O could implement a sidecar-style logger, where it duplicates the logs in JSON format, without consensus from the CRI and SIG Node, but again that's inefficient in runtime and in disk space, so I'm not personally compelled to implement it unless there is a serious need. PRs welcome as always though 😄

@ganga1980
Author

Hi @haircommander,
Here are the std{out,err} logs of a few containers in CRI-O. Is everything after the P/F the log line? In particular, the format of the log line itself seems arbitrary. What is the delimiter between the standard fields (timestamp stdout/stderr partial/full-line log-line)? Is it tab separated?

[root@myocp42-pdqlw-worker-eastus2-thg2w containers]# head -5 alertmanager-main-0_openshift-monitoring_alertmanager-proxy-162795b5ba062f1d5c6940c161301907006288bf238b681155d70bdab93f48bb.log
2020-01-26T03:17:29.346494690+00:00 stderr F 2020/01/26 03:17:29 provider.go:117: Defaulting client-id to system:serviceaccount:openshift-monitoring:alertmanager-main
2020-01-26T03:17:29.346494690+00:00 stderr F 2020/01/26 03:17:29 provider.go:122: Defaulting client-secret to service account token /var/run/secrets/kubernetes.io/serviceaccount/token
2020-01-26T03:17:29.346494690+00:00 stderr F 2020/01/26 03:17:29 provider.go:310: Delegation of authentication and authorization to OpenShift is enabled for bearer tokens and client certificates.
2020-01-26T03:17:29.377462695+00:00 stderr F 2020/01/26 03:17:29 oauthproxy.go:200: mapping path "/" => upstream "http://localhost:9093/"
2020-01-26T03:17:29.377462695+00:00 stderr F 2020/01/26 03:17:29 oauthproxy.go:221: compiled skip-auth-regex => "^/metrics"

[root@myocp42-pdqlw-worker-eastus2-thg2w containers]# head -5 datadog-1580419294-mnfsh_default_datadog-be72c822f63b2e1c1819d63665b5597abfd1a8ec380873f7b45657e97f687e99.log
2020-01-30T21:22:26.087215489+00:00 stdout P [s6-init] making user provided files available at /var/run/s6/etc...
2020-01-30T21:22:26.096765379+00:00 stdout F exited 0.
2020-01-30T21:22:26.097538051+00:00 stdout P [s6-init] ensuring user provided files have correct perms...
2020-01-30T21:22:26.145585425+00:00 stdout F exited 0.
2020-01-30T21:22:26.146722031+00:00 stdout F [fix-attrs.d] applying ownership & permissions fixes...

[root@myocp42-pdqlw-worker-eastus2-thg2w containers]# head -5 node-admin_default_node-admin-75bf12fae91d53345de5c8854bd5b4bc4fcf820cb5c5047624a7cba43a0932a4.log
2020-02-02T05:45:45.735791473+00:00 stdout F Failed to connect to bus: No data available
2020-02-02T05:45:45.744456380+00:00 stdout P
2020-02-02T05:45:45.745337662+00:00 stdout P [root@myocp42-pdqlw-worker-eastus2-thg2w /]#

[root@myocp42-pdqlw-worker-eastus2-thg2w containers]# head -5 omsagent-5p78p_kube-system_omsagent-ae15859879a3cd86b43804fb146b0c60b8111f4d21074f4a397eed4029d6069d.log
2020-02-02T03:45:58.799991063+00:00 stdout F getting gid for docker.sock
2020-02-02T03:45:58.801683320+00:00 stdout F creating a local docker group
2020-02-02T03:45:58.892908216+00:00 stdout F adding omsagent user to local docker group
2020-02-02T03:45:59.096509878+00:00 stderr F curl: (7) Couldn't connect to server
2020-02-02T03:45:59.097687388+00:00 stderr F Traceback (most recent call last):

@haircommander
Member

Space separated, and yes, everything after the P or F is the log line.

@ganga1980
Author

Space separated, and yes, everything after the P or F is the log line.

Thanks. It looks like fluent-bit has parser support for the CRI-O log format, which might unblock me: fluent/fluent-bit#522. Testing this.
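
For anyone else hitting this, the shape of regex such a CRI parser uses looks roughly like the following; a minimal Go sketch, not fluent-bit's exact configuration:

```go
package main

import (
	"fmt"
	"regexp"
)

// A CRI-style parser regex: timestamp, stream, P/F tag, then the message.
// Go uses (?P<name>...) for named groups; fluentd/fluent-bit parsers express
// a similar pattern in their own config syntax.
var criRegex = regexp.MustCompile(`^(?P<time>[^ ]+) (?P<stream>stdout|stderr) (?P<logtag>[PF])(?: (?P<message>.*))?$`)

func main() {
	line := "2020-02-02T03:45:59.096509878+00:00 stderr F curl: (7) Couldn't connect to server"
	m := criRegex.FindStringSubmatch(line)
	if m == nil {
		fmt.Println("no match")
		return
	}
	for i, name := range criRegex.SubexpNames() {
		if i > 0 && name != "" {
			fmt.Printf("%s=%q\n", name, m[i])
		}
	}
}
```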

@haircommander
Member

excellent to hear!

@ganga1980
Author

Verified that parsing works with the fluent-bit cri-o parser, so I am good on this. I think I can close this issue now.

@megastef

Please note that @sematext Logagent has support for the CRI-O log format as well: https://sematext.com/docs/logagent/installation-docker/#kubernetes-with-containerd-and-ibm-cloud

Plugin docs: https://sematext.com/docs/logagent/input-filter-containerd/
