Allow timeout for metrics #19

Closed

Closed

Allow timeout for metrics#19

Assignees

Labels

feature requestwontfix

In some scenarios, a client will stop pushing metrics because it has gone away. Currently every node needs to be deleted explicitly or the last value will stick around forever. It would be good to be able to configure a timeout after which a metric is considered stale and removed. I think it would be best if the client could specify this.

self-assigned this

added

Contributor

You may be interested in the textfile module of the node_exporter. It allows you to export information on the local filesystem, and as it's on the node will go away when the node does.

Member

@matthiasr Actually this is a great point by @brian-brazil. We should simply move chef-client exporting to the node exporter, since it can be considered a per-node metric. Then the time series will go away automatically if the host is gone.

Contributor

A use case for this has appeared, it may be a way to allow clients who really really want to push to do so; while offering some GC.

Author

@juliusv agree, that side-steps the issue in our case. But I think it's still something needed e.g. for cron jobs – an hourly cronjob may report to pushgateway, but after >1h that metric is no longer valid.

Contributor

@matthiasr Hourly cronjobs is service-level monitoring of batch jobs which is the primary use case for the Pushgateway, you'd export that without any expiry, timestamps or other advanced things like that..

Author

Not necessarily … I'm not necessarily monitoring the job itself, but instead e.g. some complex calculated value from a Hadoop job.

But even when monitoring, say, the runtime of my cronjob, how would I tell whether it just always takes the same time or there has never been a run again? I'd rather have no metric if it didn't run than the metric from the last time it ran. At least in some cases, which is why I think it should be optional.

Member

@matthiasr To expand on what Brian said, an hourly cronjob would push its last completion timestamp to the pushgateway. That way you can monitor (via time() - last_successful_run_timestamp_seconds) whether your batch job hasn't run for too long. The metric would still be ingested by Prometheus upon every scrape from the pushgateway and get a server-side current timestamp attached.

Member

See also http://prometheus.io/docs/practices/instrumentation/#batch-jobs

Contributor

Also, http://prometheus.io/docs/instrumenting/pushing/#java-batch-job-example

added

feature request

and removed

on May 22, 2015

added a commit that references this issue

Implement lifetime feature

mentioned this

Implement lifetime feature #78

14 remaining items

Loading

to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

beorn7

Labels

feature requestwontfix

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

Add documentation about TTL and GH issuesprometheus/pushgateway

Participants