Task Management #15117

Closed
@imotov

Description

We have identified several potential Elasticsearch features that can spawn long-running tasks and therefore need a common mechanism for managing these tasks. This issue will introduce a task management API that provides a mechanism for communicating with and controlling currently running tasks. The task management API will be built on top of the existing TransportAction framework, which will allow any transport action to become a task.

The tasks will maintain a parent/child relationship between tasks running on the coordinating node and the subtasks that the coordinating node spawns on other nodes.
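
A minimal sketch of that parent/child bookkeeping, written in Python for illustration only. The names `Task` and `TaskRegistry`, the `node:sequence` ID format, and the action names are all hypothetical; nothing here is taken from the Elasticsearch code base.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Task:
    task_id: str                      # hypothetical "<node>:<sequence>" format
    action: str                       # name of the action that became a task
    parent_id: Optional[str] = None   # None for a task on the coordinating node


class TaskRegistry:
    """Hypothetical registry of currently running tasks."""

    def __init__(self) -> None:
        self._tasks: Dict[str, Task] = {}
        self._seq = count(1)

    def register(self, node: str, action: str, parent: Optional[Task] = None) -> Task:
        # Record a new task and link it to its parent, if it has one.
        parent_id = parent.task_id if parent else None
        task = Task(f"{node}:{next(self._seq)}", action, parent_id)
        self._tasks[task.task_id] = task
        return task

    def children(self, parent: Task) -> List[Task]:
        # All subtasks spawned by the given task, in registration order.
        return [t for t in self._tasks.values() if t.parent_id == parent.task_id]

    def unregister(self, task: Task) -> None:
        # Called when a task finishes; unknown IDs are ignored.
        self._tasks.pop(task.task_id, None)


registry = TaskRegistry()
coordinator = registry.register("node-a", "example/action")
shard_1 = registry.register("node-b", "example/action[shard]", parent=coordinator)
shard_2 = registry.register("node-c", "example/action[shard]", parent=coordinator)
child_ids = [t.task_id for t in registry.children(coordinator)]
```

Keeping only the parent ID on each subtask is one possible design: it lets a caller list or act on every subtask of a coordinating task without the parent having to track its children itself.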

The task management will be introduced in several iterations. The first phase will be back-ported to 2.x and the second phase will be only available in 5.0.
Phase I

Phase II

Activity

nik9000 commented on Dec 2, 2015

@nik9000
Member

I wonder if we need a way to store the results of a task until they are fetched? I'm thinking of something like update-by-query which would be a task because it is long running, cancelable, etc. But it wants to return counts of how many documents it updated and things like that. Maybe just write them to an index? Maybe with a ttl?

raf64flo commented on Dec 2, 2015

@raf64flo

Nice remark from @nik9000 about keeping the results of a long task available after it ends, as is already done for snapshots.
But I'd prefer a TTL or/and a dedicated query to drop the result instead of only drop on fetch, which could be problematic in my opinion.

nik9000 commented on Dec 2, 2015

@nik9000
Member

> But I'd prefer a TTL or/and a dedicated query to drop the result instead of only drop on fetch, which could be problematic in my opinion.

Yeah - drop on fetch would be rough.

Not all tasks will want to do this but I think some would like it.

imotov commented on Dec 2, 2015

@imotov
ContributorAuthor

@nik9000 is the goal to make results available after the task finished?

nik9000 commented on Dec 2, 2015

@nik9000
Member

> @nik9000 is the goal to make results available after the task finished?

Yeah. In the case of update-by-query it'd be just to make the status available. The most "convenient" way to do it seems like write it to an index with a ttl - but I think I'm just stuck on that idea because it came to me. The point is that after the task is done you'll want to see what its results were for some period of time. You'd want some place you could fetch the results by task id, some way to clear out results when you've finished with them, some way for them to clear themselves out if you don't read them back soon enough.

I don't think it needs to come at iteration 1, but at some point it'd be nice.

Look at delete-by-query, it makes some effort to build a nice results object. Once it becomes a "task" it'll have nothing to do with the fancy result object.

Another thing that might be useful is to make an API that'd block until the task was finished and return the result of it. Or just fetch the result if it was already finished. This'd be super useful in general but kind of required for the REST tests because they don't have loops and things.
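
A rough sketch of that "block until finished, or just fetch the result if it is already finished" behavior, in Python for illustration. `TaskHandle`, `complete`, and `get` are hypothetical names, not an existing Elasticsearch API, and the result payload is made up.

```python
import threading
from typing import Any, Optional


class TaskHandle:
    """Hypothetical handle for one running task."""

    def __init__(self) -> None:
        self._done = threading.Event()
        self._result: Optional[Any] = None

    def complete(self, result: Any) -> None:
        # Called by the task itself when it finishes.
        self._result = result
        self._done.set()

    def get(self, timeout: Optional[float] = None) -> Any:
        # Returns at once if the task already finished, otherwise blocks
        # for up to `timeout` seconds and raises if it is still running.
        if not self._done.wait(timeout):
            raise TimeoutError("task still running")
        return self._result


handle = TaskHandle()
worker = threading.Timer(0.05, handle.complete, args=[{"updated": 3}])
worker.start()
first = handle.get(timeout=5)    # blocks until the worker completes the task
second = handle.get(timeout=0)   # task already finished, so no waiting
worker.join()
```

A single call covers both cases, which is what a test without loops would need: the caller never has to poll.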

imotov commented on Dec 2, 2015

@imotov
ContributorAuthor

I think traditionally we do that in two places: 1) log files at the per-operation level and 2) stats as a combined metric. I can see how we might want to have a third way, but I think the biggest question here is the lifecycle of this result. Persistence (even temporary persistence) of results is very unclear to me unless the result is associated with some persistent object (such as a snapshot). So, I would rather make it an option to block and get result if you are interested in the result.

nik9000 commented on Dec 2, 2015

@nik9000
Member

> So, I would rather make it an option to block and get result if you are interested in the result

I don't know if that'll be enough in the end though. Imagine a delete-by-query operation that takes 30 minutes to complete. It's too long for any blocking to be reliable: all kinds of HTTP equipment will time you out after 5 minutes, and something is bound to sneak in and get you a connection reset by peer.

So you'd have to build in a retry to the blocking. But if results aren't persisted, at least for a little while, then there is always the possibility that the job will finish between one request timing out and the next one starting. A low possibility but an icky one.

Something like a TTL on the result with explicit commands to read the result and delete it would work. These results wouldn't be huge documents so we could probably keep them in memory, certainly if they were serialized xContent or implemented Accountable or something.
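
A minimal in-memory sketch of that idea, in Python for illustration: results are kept by task ID, reading does not drop them, they can be deleted explicitly, and they expire after a TTL. `ResultStore`, its method names, the task ID, and the payload are hypothetical. The injectable `clock` is only there so that expiry can be demonstrated without sleeping.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class ResultStore:
    """Hypothetical in-memory store for finished task results."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._results: Dict[str, Tuple[float, Any]] = {}  # id -> (deadline, result)

    def put(self, task_id: str, result: Any) -> None:
        # Store the result together with the time at which it expires.
        self._results[task_id] = (self._clock() + self._ttl, result)

    def get(self, task_id: str) -> Optional[Any]:
        # Reading does not remove the result, so a retried request still sees it.
        self._expire()
        entry = self._results.get(task_id)
        return entry[1] if entry else None

    def delete(self, task_id: str) -> bool:
        # Explicit cleanup once the caller has finished with the result.
        return self._results.pop(task_id, None) is not None

    def _expire(self) -> None:
        # Drop every result whose deadline has passed.
        now = self._clock()
        expired = [k for k, (deadline, _) in self._results.items() if deadline <= now]
        for task_id in expired:
            del self._results[task_id]


now = [0.0]  # fake clock, advanced by hand
store = ResultStore(ttl_seconds=60, clock=lambda: now[0])
store.put("node-a:1", {"deleted": 1200})
read_twice = (store.get("node-a:1"), store.get("node-a:1"))
now[0] = 61.0
after_ttl = store.get("node-a:1")
```

Because `get` leaves the entry in place, a client whose blocking request timed out can retry and still find the result, which closes the gap between one request ending and the next one starting for as long as the TTL lasts.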

It's complicated but I can't think of how else to report on tasks that are "do a thing" rather than "make a thing".

niemyjski commented on Dec 3, 2015

@niemyjski
Contributor

+1

Participants

@clintongormley @raf64flo @nik9000 @colings86 @jprante

Task Management · Issue #15117 · elastic/elasticsearch