Description
Today if a node exceeds the disk flood stage watermark, the disk threshold monitor will apply a special read-only index block to any indices that have a shard allocated to the node that exceeded the watermark. This block carries with it a forbidden status code so that if an attempt is made to index into such an index, the client receives an HTTP 403 status code.
Clients assume that a 403 status code is not retryable and they drop data.
This situation is retryable though, as once the disk threshold monitor observes the free disk space go above the appropriate threshold, the index block is automatically removed.
Rather than expecting our clients to all account for this situation (by inspecting the specifics of the exception that led to the 403 status code), we should indicate retryability by using HTTP status code 429. While 429 is often translated as "too many requests", the HTTP specification is liberal about what this means:
Note that this specification does not define how the origin server identifies the user, nor how it counts requests. For example, an origin server that is limiting request rates can do so based upon counts of requests on a per-resource basis, across the entire server, or even among a set of servers.
By making this change, all of our clients can start retrying when faced with an index that was marked read-only due to a flood stage watermark exceeded event.
Similarly, the status codes of other cluster blocks should be reexamined in this context.
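To illustrate the client-side behaviour this change is meant to enable, here is a minimal sketch of a retry loop that backs off on 429 and gives up immediately on anything else, such as 403. It is not taken from any real Elasticsearch client: `index_with_retry`, `send`, and the backoff parameters are hypothetical names chosen for the example, and `send` stands in for whatever call issues the indexing request and returns its HTTP status code.

```python
import time

# Assumption for this sketch: the client treats only 429 as retryable.
RETRYABLE_STATUS = {429}


def index_with_retry(send, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a hypothetical callable that issues one indexing request and
    returns the HTTP status code. Returns a (status, attempts) tuple.
    """
    status = None
    for attempt in range(1, max_attempts + 1):
        status = send()
        if status not in RETRYABLE_STATUS:
            # Success, or an error such as 403 that the client will not retry.
            return status, attempt
        if attempt < max_attempts:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    return status, max_attempts
```

With a 403 the loop returns after the first attempt, which is how data gets dropped today; with a 429 the same loop keeps trying until the block is lifted or the attempts are exhausted.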
Activity
elasticmachine commented on Nov 20, 2019
Pinging @elastic/es-distributed (:Distributed/CRUD)
gaobinlong commented on Dec 2, 2019
Hi @jasontedor, I'm interested in this issue. Should we return a 429 status code if the cluster block is set manually rather than automatically when the flood stage is exceeded?
jasontedor commented on Dec 10, 2019
@gaobinlong I think it's fine to treat them the same. I wish we had an easy way to distinguish when it's automatically set versus when it's not, but we don't really, so let's proceed to treat them as the same.
gaobinlong commented on Dec 10, 2019
@jasontedor ok, I got it.
gaobinlong commented on Dec 13, 2019
Hi @jasontedor, I have opened a PR for this issue. Could you help review the code change?
Return 429 status code on read_only_allow_delete index block (#50166)
Return 429 status code on read_only_allow_delete index block (elastic…
zez3 commented on Mar 27, 2021
#50166
It has been brought to my attention that this PR applies from 7.7 onwards.
DaveCTurner commented on Jul 30, 2021
Closed by #50166.