Allow timeout for metrics #19

Closed
@matthiasr

Description

@matthiasr

In some scenarios, a client will stop pushing metrics because it has gone away. Currently every node needs to be deleted explicitly or the last value will stick around forever. It would be good to be able to configure a timeout after which a metric is considered stale and removed. I think it would be best if the client could specify this.

Activity

self-assigned this
on Feb 3, 2015
brian-brazil

brian-brazil commented on Feb 3, 2015

@brian-brazil
Contributor

You may be interested in the textfile module of the node_exporter. It lets you export information from the local filesystem, and because it runs on the node, its metrics go away when the node does.
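
As a rough illustration of the textfile approach, here is a minimal Python sketch that writes one gauge sample to a `.prom` file for the node_exporter textfile collector to pick up. The function name, file name, and metric name are hypothetical, and the directory is assumed to be whichever one the collector is configured to read. The file is written to a temporary name and then renamed, so the collector should never see a half-written file.

```python
import os
import tempfile


def write_textfile_metric(directory, filename, name, value, help_text=""):
    """Atomically write one gauge sample to <directory>/<filename>.prom.

    All names passed in are illustrative; nothing here is prescribed by
    node_exporter beyond the text exposition format and the .prom suffix.
    """
    lines = []
    if help_text:
        lines.append(f"# HELP {name} {help_text}")
    lines.append(f"# TYPE {name} gauge")
    lines.append(f"{name} {value}")
    body = "\n".join(lines) + "\n"

    # Create the temp file in the target directory so the final rename
    # stays on one filesystem and is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(body)
        # mkstemp creates the file as 0600; loosen it so a collector
        # running as another user can read it (assumption about the setup).
        os.chmod(tmp_path, 0o644)
        final_path = os.path.join(directory, filename + ".prom")
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path
```

A configuration-management run could call this at the end of each run, for example with the current Unix time as the value.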

juliusv

juliusv commented on Feb 3, 2015

@juliusv
Member

@matthiasr Actually this is a great point by @brian-brazil. We should simply move chef-client exporting to the node exporter, since it can be considered a per-node metric. Then the time series will go away automatically if the host is gone.

brian-brazil

brian-brazil commented on Feb 4, 2015

@brian-brazil
Contributor

A use case for this has appeared: it may be a way to allow clients who really, really want to push to do so, while offering some GC.

matthiasr

matthiasr commented on Feb 4, 2015

@matthiasr
Author

@juliusv agree, that side-steps the issue in our case. But I think it's still something needed e.g. for cron jobs – an hourly cronjob may report to pushgateway, but after >1h that metric is no longer valid.

brian-brazil

brian-brazil commented on Feb 4, 2015

@brian-brazil
Contributor

@matthiasr Hourly cronjobs are service-level monitoring of batch jobs, which is the primary use case for the Pushgateway. You'd export that without any expiry, timestamps or other advanced things like that.

matthiasr

matthiasr commented on Feb 4, 2015

@matthiasr
Author

Not necessarily … I'm not necessarily monitoring the job itself, but instead e.g. some complex calculated value from a Hadoop job.

But even when monitoring, say, the runtime of my cronjob, how would I tell whether it just always takes the same time or there has never been a run again? I'd rather have no metric if it didn't run than the metric from the last time it ran. At least in some cases, which is why I think it should be optional.

juliusv

juliusv commented on Feb 4, 2015

@juliusv
Member

@matthiasr To expand on what Brian said, an hourly cronjob would push its last completion timestamp to the pushgateway. That way you can monitor (via time() - last_successful_run_timestamp_seconds) whether your batch job hasn't run for too long. The metric would still be ingested by Prometheus upon every scrape from the pushgateway and get a server-side current timestamp attached.
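
The pattern described above can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names are hypothetical, the job name is assumed to contain no slashes, and the gateway address is supplied by the caller. The first function builds a PUT request that pushes the completion timestamp; the second mirrors the `time() - last_successful_run_timestamp_seconds` expression on the client side.

```python
import time
import urllib.request
from urllib.parse import quote


def build_push_request(gateway_url, job, timestamp=None):
    """Build (but do not send) a request pushing the last-run timestamp.

    The metric name is taken from the comment above; the function itself
    is a hypothetical helper.
    """
    if timestamp is None:
        timestamp = time.time()
    # Text exposition format: one gauge holding a Unix timestamp.
    body = (
        "# TYPE last_successful_run_timestamp_seconds gauge\n"
        f"last_successful_run_timestamp_seconds {timestamp}\n"
    )
    # Assumes a job name without slashes; such names need extra handling
    # that this sketch does not cover.
    url = f"{gateway_url.rstrip('/')}/metrics/job/{quote(job, safe='')}"
    return urllib.request.Request(url, data=body.encode("utf-8"), method="PUT")


def seconds_since_last_run(last_run_timestamp, now=None):
    """Client-side equivalent of time() - last_successful_run_timestamp_seconds."""
    if now is None:
        now = time.time()
    return now - last_run_timestamp
```

A batch job would call `urllib.request.urlopen(build_push_request(...))` only after finishing successfully, so a stale timestamp shows up as a steadily growing age rather than as a missing series.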

added a commit that references this issue on Jun 1, 2016


    Participants

    @matthiasr @juliusv @fvigotti @beorn7 @brian-brazil

      Allow timeout for metrics · Issue #19 · prometheus/pushgateway