
Feature Request: Allow processing of async requests in Knative #4522

Closed
nimakaviani opened this issue Jun 25, 2019 · 60 comments
Assignees
Labels
area/API API objects and controllers kind/feature Well-understood/specified features, ready for coding. triage/accepted Issues which should be fixed (post-triage)


@nimakaviani
Contributor

nimakaviani commented Jun 25, 2019

/area API

Describe the feature

As a Knative user, I can request that an HTTP workload be processed asynchronously.

Requirements

  • Same endpoint should allow for both blocking (sync) and non-blocking (async) requests
  • An incoming async request triggers the running of the async job in Knative
  • An async request does not block. Instead, it returns immediately
  • The response will have an identifier that can be used to track progress for the async job
  • A follow-up request to track progress will need to supply the identifier
    • If the async job is still in progress
      • the response will return an in progress status
        (e.g., a 302 Found status code with an empty body)
    • If the async job is finished
      • the response body will have the result from the app processing the async job
      • the status code will be identical to what is returned from the app after the job is done
  • Any instance of the running Knative app should be able to accept / respond to an async query
  • A failed async job will return an HTTP 5xx code
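As a concreteness check, the submit/poll contract above can be sketched in a few lines of Python. Everything here (the AsyncJobStore name, the in-memory dict) is illustrative and not a real Knative API; the status codes simply mirror the requirements list.

```python
import uuid

class AsyncJobStore:
    """Illustrative in-memory store for the submit/poll contract above."""

    def __init__(self):
        self._results = {}  # job id -> (status_code, body) once finished

    def submit(self):
        # An async request does not block: it returns immediately with
        # an identifier that can be used to track the job's progress.
        job_id = str(uuid.uuid4())
        self._results[job_id] = None  # None marks "in progress"
        return 202, job_id

    def complete(self, job_id, status_code, body):
        # Called when the app finishes processing the async job.
        self._results[job_id] = (status_code, body)

    def poll(self, job_id):
        # A follow-up request supplies the identifier. While the job is
        # running we answer e.g. 302 with an empty body; once done, we
        # relay whatever status and body the app returned.
        result = self._results.get(job_id)
        if result is None:
            return 302, b""
        return result

store = AsyncJobStore()
code, job_id = store.submit()
assert code == 202                          # returned immediately
assert store.poll(job_id) == (302, b"")     # still in progress
store.complete(job_id, 200, b"result")
assert store.poll(job_id) == (200, b"result")
```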

Use cases

  • Long-running jobs
    • Notifications (e.g. mobile push notification, SMS notification, mass emails, etc)
    • Database migrations
    • Batch processing (e.g. data transformation)
  • Stream processing
  • Highly parallel jobs (document, image, … processing)
  • Fan-out workloads
  • Serverless Operator
@nimakaviani nimakaviani added the kind/feature Well-understood/specified features, ready for coding. label Jun 25, 2019
@knative-prow-robot knative-prow-robot added the area/API API objects and controllers label Jun 25, 2019
@nimakaviani
Contributor Author

/cc @duglin

@vagababov
Contributor

The question is: why would you want to use Knative for this?
Most of what you described is what your app has to do.
I presume you don't really expect Knative to provide a generic state machine/workflow engine?
Isn't everything you need minScale=1, to permit async processing without scaling to zero?

@markusthoemmes
Contributor

I always thought async workloads are more of a territory of Knative Eventing in conjunction with Knative Serving. After all, the protocol for Knative Serving currently is request/response based and everything revolves around that pretty much.

Any thoughts on why Eventing is not a suitable solution for async workloads like the ones you mentioned?

@mattmoor mattmoor added this to the Ice Box milestone Jun 25, 2019
@mattmoor
Member

As Markus says, this is where eventing comes in, perhaps even eventing delivering events to something other than serving.

cc @evankanderson

@nimakaviani
Contributor Author

@vagababov I think setting minScale=1 is an anti-pattern to support async. Particularly for the case of infrequent async jobs, I don't think the expectation of having resources sitting idle is ideal.

@markusthoemmes good question. I think this conjunction of Knative Eventing and Knative Serving is in fact problematic. Knative's presumption of short-lived requests with a persistent connection to the client goes against what is expected from an async job, regardless of whether the request is initiated by an outside client or an internal CloudEvent. This is where I think @mattmoor 's point regarding having "something other than serving" deal with async requests comes into play. But does an app developer really want to have a separate app to send batch emails or perform data transformations? Or even worse, to support separate code bases or deployment models?

So, to @vagababov's question: I don't really expect Knative to provide a generic state machine or workflow engine, but I do expect it to close the loop on common requirements in a developer's workflow without developers having to go up and down in the stack to have different pieces of their application deployed.

@mattmoor
Member

This is where I think @mattmoor 's point regarding having "something other than serving" deal with async requests comes into play

To be clear, I mean having something other than serving dealing with the async aspect, but that may result in the Eventing system having a synchronous call to Serving. For sufficiently long-running (a la batch) cases, Serving may be ill-suited for this (or the scaling model may not be right), and so it may make sense to compose eventing with other forms of addressable compute that aren't currently covered by knative.

@mikehelmick
Contributor

Another vote for using Eventing to handle this (turning a sync request into an async request).

Also, there is room for serverless products for handling long-running jobs, but currently, I think this is outside the charter of knative/serving.

@duglin

duglin commented Jul 2, 2019

I'm not following the relationship to eventing. The connection between a client and the KnService is not an event. It shouldn't be morphed into a CloudEvent, go through brokers or triggers, be fanned out, etc. I also wouldn't expect the client to change how it interacts with a KnService based on whether it's doing it via sync or async - meaning I would not expect the URL of the KnService to change, which is what I think using eventing for async would require. People may be interacting in both modes at the same time.

I don't see the internal flow or structure needed by async being that different from sync - with the exception of how responses are handled, so it's hard for me to see why this is out of scope for serving (yes I know there's a bit more to it, but I think the response handling is the biggie). Async is a well established model in the serverless space so it would be odd for us to exclude support for it.

@duglin

duglin commented Jul 2, 2019

Thinking more about the tie-in with eventing... if what's meant is that something in the normal flow detects the "run this in async mode" flag and, as a result, redirects the request to some eventing component because it knows how to store responses in some kind of persistence, then that's just an impl detail. But converting it to a CloudEvent seems odd.

In the end though, a KnService is called, and I think the only connection that we need to ensure stays open the entire time is the one between the user container and the queue proxy - and I'm not sure eventing helps with that since it assumes persistent connections from channels, no? Although, if we combined this support with how eventing manages responses (meaning, instead of the channel waiting for a response, the response is sent via a callback managed by the queue proxy) then I think those two worlds might be more aligned than I originally considered. But that's all impl details and the user should be unaware of it.

@mbehrendt

I think this is outside the charter of knative/serving.

@mikehelmick can you pls point me to that charter?

@mbehrendt

re 'handling async via eventing', adding to what @duglin said above: if we somehow magically handled it via eventing, you still have the issue that somehow under the cover the knservice gets called synchronously. I.e. you're bound to the max execution time associated with synchronous calls, and to the resource consumption implied by potentially keeping thousands of HTTP connections open.

@mbehrendt

other forms of addressable compute that aren't currently covered by knative.

@mattmoor can you pls elaborate on which other forms of addressable compute you're referring to? Are there special semantics behind you emphasizing the currently? E.g., is there something cooking behind the scenes?

@nimakaviani
Contributor Author

my understanding of @mattmoor 's suggestion was that it needs to be handled through other Kubernetes controllers (e.g., deployments, etc.). If that's right then, back to my original point, it won't be a great user experience if the developers have to go up and down in the stack to have different pieces of their application deployed.

Another vote for using Eventing to handle this (turning a sync request into an async request).

For the above, or like @markusthoemmes suggested, bringing serving and eventing together somehow, requests will have to come back to the Kn app at some point for processing. With Kn requiring "persistent, time-limited HTTP connections", the problem remains, as @mbehrendt and @duglin mentioned. Unless we modify eventing to support long-running workloads.

@nimakaviani
Contributor Author

nimakaviani commented Jul 2, 2019

Also I updated the original requirements with the following item:

  • Same endpoint should allow for both blocking (sync) and non-blocking (async) requests

@duglin

duglin commented Jul 2, 2019

It might be good to separate out UX from impl details/requirements.

From a UX perspective:

  • the endpoint for a KnService should be the same regardless of whether it is invoked synchronously or asynchronously
  • while it may be possible for a KnService to declare itself as an async service (and therefore all requests are treated as such and a 202 is returned immediately), that is not the case we're most interested in. We're looking for the one where the person invoking the KnService chooses whether the processing is done async or not via some flag (e.g. perhaps a query param or HTTP header)
  • user can then get the response metadata via some mechanism

From an impl perspective:

  • I actually think trying to have eventing and serving leverage shared components makes a lot of sense. My initial push-back was because I didn't see the existing eventing infrastructure being a good fit and I was interpreting the "use eventing" statements as "expose something new to the end user" - and that breaks the first bullet point above
  • in order for this sharing to happen though, eventing would need to do basically what this issue is proposing... modify the serving side of things to support async invocations, then it could leverage that.
  • how we then store the responses and the metadata associated with it could be done via eventing or anything else - that's just an impl detail that (for the most part) isn't seen by the end user
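To make the UX bullets concrete, a single endpoint could branch on a caller-supplied flag. This is a hedged sketch, not the proposal's design: the `async=true` query parameter, the `enqueue` helper, and the `/jobs/` tracking path are all made up for illustration.

```python
import uuid

_pending = {}  # stand-in for wherever queued async work would live

def enqueue(run_job):
    # A worker would pick this up later; here we just record it.
    job_id = str(uuid.uuid4())
    _pending[job_id] = run_job
    return job_id

def handle(params, run_job):
    """One endpoint, two modes, selected by a flag on the request."""
    if params.get("async") == "true":
        job_id = enqueue(run_job)
        # Return immediately with a way to track progress.
        return 202, {"Location": f"/jobs/{job_id}"}, b""
    # Blocking path: unchanged request/response behavior.
    status, body = run_job()
    return status, {}, body

# Blocking call behaves as today.
assert handle({}, lambda: (200, b"ok")) == (200, {}, b"ok")
# Async call returns at once with a tracking location.
status, headers, body = handle({"async": "true"}, lambda: (200, b"ok"))
assert status == 202 and body == b""
assert headers["Location"].startswith("/jobs/")
```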

@dgerd

dgerd commented Jul 2, 2019

Some questions I have about the proposal:

  • At a high level, how does the tracking and management of async tasks not turn into a workflow manager like Apache Airflow ( https://github.com/apache/airflow )? Why should this be built into Knative Serving rather than orchestrated on top?
  • Knative Serving today relies on HTTP information to provide the "serverless magic" of autoscaling both up and down. We rely on the closing or timeout of the HTTP connection to determine that a container has finished. From the comments above it sounds like the desire is for the HTTP connection to be closed. How do you propose we determine that an async container has finished? When finished, how is the response/output of the async job captured by the control plane?
  • How do you imagine the runtime contract for async tasks to differ from synchronous tasks? Is an HTTP endpoint still required?
  • How do you propose our settings of container concurrency work with async requests?

@nimakaviani
Contributor Author

My thoughts in the same order as the questions above:

  • I cannot quite establish the link between the workflow manager and async requests. It's not the flow that matters but instead, the processing of a single request and storing the results for later retrieval without the persistent external connection. How does the flow help? What am I missing?
  • Busyness for a pod, and whether or not a request is internally finished, is determined and tracked by the queue proxy based on open connections to the user container. It can (and should) be kept independent of whether there exists a corresponding external HTTP connection from the client. Queue proxy can track async requests too, and do the corresponding bookkeeping.
  • The runtime contract won't change, nor will the http endpoint.
  • container concurrency is also determined by the queue proxy. Given that queue proxy can track async requests too, there won't be any change to container concurrency settings.
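The bookkeeping argument in the bullets above can be illustrated with a toy counter: the proxy tracks requests in flight to the user container, and nothing in that accounting refers to the external client connection. This is a sketch of the claim only, not queue-proxy code.

```python
import threading

class InFlightCounter:
    """Toy model of per-pod busyness as tracked at the proxy."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = 0

    def started(self):
        # Incremented when a request is forwarded to the user container.
        with self._lock:
            self._in_flight += 1

    def finished(self):
        # Decremented when the container responds -- driven by the
        # container, not by the external client connection.
        with self._lock:
            self._in_flight -= 1

    def value(self):
        with self._lock:
            return self._in_flight

qp = InFlightCounter()
qp.started()                    # async request handed to the container
client_connection_open = False  # client already went away
assert qp.value() == 1          # the pod still counts as busy
qp.finished()                   # container done; concurrency drops
assert qp.value() == 0
```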

@vagababov
Contributor

This contradicts the model we've been operating with.

It can (and should) be kept independent of whether there exists a corresponding external HTTP connection from the client. Queue proxy can track async requests too, and do the corresponding bookkeeping.

and

container concurrency is also determined by the queue proxy. Given that queue proxy can track async requests too, there won't be any change to container concurrency settings.

I don't see how the QP can achieve that.

  1. How would it determine which request is async and which one is sync, since according to your model, async is fire-and-forget? This needs request annotation of sorts.
  2. Currently we reason about the load, presuming each request costs about the same (cost being 1 request in flight). Async requests may/will vary in load generated and even if we somehow teach QP to discern one from another, we can't equalize them. Which means all our Autoscaling logic will be incorrect.

The runtime contract won't change, nor will the http endpoint.

Determining async vs. sync at the QP level would require some flag/header/X to be set -- this is an RTC change.

I cannot quite establish the link between the workflow manager and async requests. It's not the flow that matters but instead, the processing of a single request and storing the results for later retrieval without the persistent external connection.

You have to run a stateful load for that. You might be interested in what the Lightbend folks are doing. As such, just supporting arbitrary stateful loads is probably not what we're aiming for right now...

@nimakaviani
Contributor Author

nimakaviani commented Jul 2, 2019

Determining async vs. sync at the QP level would require some flag/header/X to be set -- this is an RTC change.

correct. like @duglin suggested earlier

(e.g. perhaps a query param or http header)

and sure, there is a change in the RTC too. But it is an additive change and backward compatible.

Currently we reason about the load, presuming each request costs about the same (cost being 1 request in flight). Async requests may/will vary in load generated and even if we somehow teach QP to discern one from another, we can't equalize them. Which means all our Autoscaling logic will be incorrect.

I am not sure I understand the above. KPA works based on the number of requests and, last time I checked, it supports scaling on CPU too. Even now, revision.spec.timeoutSeconds has a default value of 5m, which is 3000x larger than an average 100ms request time. I didn't see any specifics on how Autoscaling presumes each request would cost about the same, or what "the same" implies when request times range from milliseconds to 5 minutes. Even if we set aside the termination grace period, it is still 90 seconds before terminating an instance.

You have to run a stateful load for that. You might be interested in what the Lightbend folks are doing.

I am not sure a stateful load helps. There's nothing in the load that is stateful. It is more of a stateful response, if anything.

This contradicts the model we've been operating with.

The only place where I see it impacting the model is the assumption of having an external connection from the client. I am not sure loosening that assumption would count as a contradiction. Particularly if the QP continues to do proper bookkeeping of connections to the user container.

@vagababov
Contributor

KPA never scaled based on CPU. Only on concurrency.

@steren
Contributor

steren commented Jul 3, 2019

Long processing times are a feature request that I've heard from many customers.
This is likely due to the fact that Cloud Run and Cloud Run on GKE have a maximum request timeout of 15 minutes, which can be limiting. As mentioned in the first comment, the use cases are often about compute-intensive tasks and data transformation.

Assuming that our goal is to allow customers to do long processing (O(a few hours)), it makes sense to me to explore another developer experience and variations to our runtime container contract. Indeed, I doubt the synchronous request/response model is the right one, as it forces clients to keep the connection open.

I also agree that going async should not mean getting rid of the other benefits of knative/serving:

  • I still want to think in terms of "Service"
  • I still want one stable endpoint
  • I still want to manage my revisions and traffic split between them.
  • I still want to provide a container, specify env vars...

I am supportive of exploring an alternate container contract tailored to async use cases.
I am not sure that the one suggested in the first comment is what we should adopt (I could also see the developer having to call an internal endpoint to signal the end of the processing).

@mattmoor
Member

mattmoor commented Jul 3, 2019

@mattmoor can you pls elaborate on which other forms of addressable compute you're referring to? Are there special semantics behind you emphasizing the currently? E.g., is there something cooking behind the scenes?

Nothing behind the scenes; the whole point of Addressable is to enable Eventing to compose with things other than Serving. Today Channel is Addressable and can deliver events to a K8s Service over a Deployment. I have also built several other PoCs (all OSS on github) that implement Addressable and compose with Eventing, but nothing in this category (nor secret).

I honestly think it would take a day to prototype this on top of Serving and Eventing, assuming I understand the desired requirements.

I'd have the Service receive the request, wrap it in a cloud event with a uuid in the payload, post it to a channel and return the uuid. The channel would have a subscription that delivered the event to some long-timeout Service for processing (the max timeout is a knob in config-defaults now), and the subscription would send the response from that Service to something that persisted it associated with that uuid for later retrieval.

There are other variants of this that would also work that involve delegating the compute to K8s Jobs / Pods. Am I missing something?
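The prototype flow described two paragraphs up can be roughly simulated in-process, with a plain queue standing in for the Channel and a dict for the result persistence. None of these names are Knative Eventing APIs; this only shows the shape of the composition.

```python
import queue
import uuid

channel = queue.Queue()  # stand-in for a Channel
results = {}             # stand-in for persistence keyed by uuid

def front_door(request_body):
    # Wrap the request in an "event" carrying a uuid, post it to the
    # channel, and hand the uuid back to the caller right away.
    event_id = str(uuid.uuid4())
    channel.put({"id": event_id, "data": request_body})
    return event_id

def subscription_worker(process):
    # The subscription delivers the event to the long-timeout "Service"
    # (here just a callable) and persists its response under the uuid.
    event = channel.get()
    results[event["id"]] = process(event["data"])

eid = front_door(b"input")                      # caller gets the uuid
subscription_worker(lambda data: data.upper())  # processing happens later
assert results[eid] == b"INPUT"                 # retrievable by uuid
```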


The scope creep of this request is considerable, and among other things entails Serving taking on responsibility for durably(?) storing results (how much? how long?), and providing access to those results later (how is that authenticated?). If nothing else, it is a big jump in the complexity of Serving, which I don't take lightly.

Have you looked at what it would take to implement this on-top of Serving and Eventing? If so, what's missing? If not, then what is undesirable about a new layer for this abstraction?

While we don't have other forms of directly addressable compute in Knative today, it doesn't mean we won't. Perhaps put together a PoC and we should talk about knative/asyncing?

@duglin

duglin commented Jul 3, 2019

Many of the questions you ask can be asked of just about anything in Kn. For example, your question about the durability of responses applies to the durability of events in brokers/channels. We solve this via configuration and pluggability of components so people can choose how to answer the question for their own needs. We don't have to have a one-size-fits-all answer for everything.

While I think serving support for async directly would be the best option for the project, I don't necessarily think supporting it on top of serving is a horrible option. However, in order to do that there would still need to be changes made to serving. For example, determining "busyness" of an instance based on the presence of a connection back to the client.

I think part of this issue (and #4098) comes down to whether Knative is a platform on which to build multiple offerings where each can have their own differentiating value proposition (while still having a core consistency), or whether Kn is going to be parental/prescriptive and only allow for one view of how things should work (even if that differs from what many similar offerings do today).

re: PoC - we do have one and @nimakaviani can demo it if people would like.

@vagababov
Contributor

I do concur that this is more like a different product, "async batch" or whatever you want to call it.

Besides durability there are questions like checkpointing, restarting/retrying, etc.

This all just feels like a different product in general.

@duglin

duglin commented Jul 3, 2019

I'm wondering why things like batch, checkpoints, restarting... are mentioned as features when they are not part of the proposal. If a pod running async dies, it has the same semantics as a pod running sync - it dies and life goes on. If features like workflow or orchestration are a concern, then those questions should be asked of the eventing side of Kn, since pipelines and broker/channels w/ responses are all much closer to those features than what this issue is proposing.

This proposal is asking for a much more focused design request... allow for long-running function calls. Something many FaaS platforms support today.

@vagababov
Contributor

vagababov commented Jul 3, 2019 via email

@knative-housekeeping-robot

Stale issues rot after 30 days of inactivity.
Mark the issue as fresh by adding the comment /remove-lifecycle rotten.
Rotten issues close after an additional 30 days of inactivity.
If this issue is safe to close now please do so by adding the comment /close.

Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.

/lifecycle rotten

@knative-prow-robot knative-prow-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 9, 2020
@duglin

duglin commented Feb 9, 2020

/remove-lifecycle rotten

@knative-prow-robot knative-prow-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Feb 9, 2020
@markusthoemmes
Contributor

FWIW, there seems to even be a standard header to indicate from the client that the server shall handle things asynchronously: https://tools.ietf.org/html/rfc7240#page-8
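For reference, a server honoring that header only needs to inspect the Prefer field, which per RFC 7240 may carry a comma-separated list of preferences. A minimal sketch of the check (not tied to any Knative component):

```python
def wants_async(headers):
    """True if the client sent Prefer: respond-async (RFC 7240)."""
    prefer = headers.get("Prefer", "")
    # Prefer can list several preferences, e.g. "wait=10, respond-async".
    return any(p.strip().lower() == "respond-async"
               for p in prefer.split(","))

assert wants_async({"Prefer": "respond-async"})
assert wants_async({"Prefer": "wait=10, respond-async"})
assert not wants_async({"Prefer": "wait=10"})
assert not wants_async({})
```

A server accepting the preference would then answer 202 Accepted, optionally echoing Preference-Applied.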

@knative-housekeeping-robot

Issues go stale after 90 days of inactivity.
Mark the issue as fresh by adding the comment /remove-lifecycle stale.
Stale issues rot after an additional 30 days of inactivity and eventually close.
If this issue is safe to close now please do so by adding the comment /close.

Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.

/lifecycle stale

@knative-prow-robot knative-prow-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 26, 2020
@duglin

duglin commented Jun 26, 2020

/remove-lifecycle stale

/cc @beemarie

@knative-prow-robot knative-prow-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 26, 2020
@lukasheinrich

There has been a similar discussion in OpenFaaS:

openfaas/faas#657

I agree with @duglin that the complexity of code can change over time, and having to switch deployment methods isn't a great UX.

@mattmoor
Member

mattmoor commented Jul 7, 2020

FYI we created #async-requests on slack.knative.dev for further discussion and scheduling of follow-ups.

@bvennam
Contributor

bvennam commented Aug 10, 2020

Following the process for feature requests, we created a feature proposal document to capture the decisions & direction so far:
https://docs.google.com/document/d/1a8f6mVlqQsr0VttWTRLcFT1PtnOHF9dKZXZEos9NSBA/edit?usp=sharing

@pjcubero

I have an integration with several Knative services, where every ksvc is connected to the next via an InMemoryChannel. Imagine the first ksvc sends a message via the channel to the second ksvc. If the second ksvc is down at that moment, will the message still be processed once it comes back up?

@mattmoor
Member

@pjcubero yes it should be. This particular issue isn't related to that sort of asynchronous processing. If you have deeper questions about the in-memory channel's delivery/durability semantics, I'd suggest raising them in knative/eventing.

@github-actions

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 27, 2020
@duglin

duglin commented Nov 27, 2020

/remove-lifecycle stale

@knative-prow-robot knative-prow-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 27, 2020
@junneyang

junneyang commented Nov 27, 2020

You could refer to the OpenFaaS project:
it allows calling functions asynchronously:
curl http://gateway/async-function/xxxxxx
and synchronously:
curl http://gateway/function/xxxxxx

OpenFaaS documentation: https://docs.openfaas.com/reference/async/

@github-actions

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 26, 2021
@evankanderson
Member

Note that Knative services support a broad range of HTTP requests, including the entire HTTP query path space and multiple verbs.

I believe @beemarie is working on this in https://github.com/knative-sandbox/async-component; it might be worth looking at that for an overview of one design.

@evankanderson
Member

/triage accepted

/assign @beemarie

(We may want to close this in favor of directing people to the async-component repo)

@knative-prow-robot knative-prow-robot added the triage/accepted Issues which should be fixed (post-triage) label Mar 22, 2021
@github-actions github-actions bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 23, 2021
@bvennam
Contributor

bvennam commented Jun 22, 2021

@evankanderson Agree with closing this & pointing folks to async repo.

@dprotaso dprotaso removed this from the Ice Box milestone Oct 6, 2021