Streamed events arrive all at once, _after_ generation is all done #132

Open
sevaseva opened this issue Feb 5, 2024 · 13 comments

sevaseva commented Feb 5, 2024

I observe that with any response, long or short, events don't actually arrive as the response is generated; instead, they all arrive at once after generation is done. I expect events to arrive more or less evenly as generation takes place.

Is that just a bug in the TypeScript client, or...? I know this issue doesn't exist when one uses the Python client.

    import { CohereClient } from "cohere-ai";
 
    const cohere = new CohereClient({ token: "xxx" });
 
    process.stdout.write(`[Before] ${new Date().toISOString()} \n`);
 
    const stream = await cohere.chatStream({
        model: "command",
        message: "Name all US presidents",
        maxTokens: 200,
    });
 
    process.stdout.write(`[After] ${new Date().toISOString()} \n`);
 
    for await (const event of stream) {
        process.stdout.write(`[${event.eventType}] ${
            new Date().toISOString()}\n`);
 
        if (event.eventType === "text-generation") {
            process.stdout.write(`${event.text} \n\n`);
        }
    }

In the output I see that, in this case, generation seems to have taken 6 seconds, and then all events got processed within 20 milliseconds:

[Before] 2024-02-05T01:44:02.644Z 
[After] 2024-02-05T01:44:08.564Z 
[stream-start] 2024-02-05T01:44:08.565Z
[text-generation] 2024-02-05T01:44:08.565Z
There 

[text-generation] 2024-02-05T01:44:08.566Z
 have 

[text-generation] 2024-02-05T01:44:08.566Z
 been 

[text-generation] 2024-02-05T01:44:08.566Z
 46 

[text-generation] 2024-02-05T01:44:08.566Z
 people 

[text-generation] 2024-02-05T01:44:08.566Z
 elected 

// skipping some 170 events ...

[text-generation] 2024-02-05T01:44:08.584Z
 Andrew 

[text-generation] 2024-02-05T01:44:08.584Z
 Johnson 

[text-generation] 2024-02-05T01:44:08.584Z
 ( 

[text-generation] 2024-02-05T01:44:08.584Z
1865 

[stream-end] 2024-02-05T01:44:08.585Z

dsinghvi commented Feb 5, 2024

Hey @sevaseva thanks for filing this issue -- what version of the TypeScript client are you using?


sevaseva commented Feb 5, 2024

        "node_modules/cohere-ai": {
            "version": "7.7.5",


dsinghvi commented Feb 5, 2024

What version of node are you on? The script seems to work just fine for me:

[Before] 2024-02-05T02:39:44.350Z 
[After] 2024-02-05T02:39:44.681Z 
[stream-start] 2024-02-05T02:39:44.683Z
[text-generation] 2024-02-05T02:39:44.950Z
There 

[text-generation] 2024-02-05T02:39:44.979Z
 have 

[text-generation] 2024-02-05T02:39:45.006Z
 been 

[text-generation] 2024-02-05T02:39:45.032Z
 46 


sevaseva commented Feb 5, 2024

@dsinghvi, thanks for the instant reaction!

What version of node are you on?

$ node --version
v21.5.0

What about you?

I am playing with this from an API endpoint handler of a Next.js app in local dev mode. Let me perhaps try it in a plainer environment... or first try an older Node.js...


dsinghvi commented Feb 5, 2024

I'm on node v18.9.0 and just trying it in a simple node app! Sounds good, let us know if you're able to create a consistent repro and we can dive deeper.


sevaseva commented Feb 5, 2024 via email


sevaseva commented Feb 5, 2024

Alright.

I confirmed this does not reproduce in a plain Node.js program (Node v21.x).

It also does not reproduce from a Next.js API route (GET) handler when the latter is configured with export const dynamic = "force-dynamic".

I checked in various combinations of conditions, such as the local next dev server vs. deployed on Vercel as an edge function (export const runtime = "edge") vs. a serverless function, etc.

The caching strategy (export const dynamic = "force-dynamic" vs. "auto" vs. other options) does make the difference, and I did not find other conditions (although there may be some) that also do.

Update: confirmed with current Next.js v14.1.

Now, Next.js caching is its own can of worms, with the Full Route Cache, the Data Cache, and various other levels and kinds of caching, each with its own peculiarities and bugs. Arguably, none of that is relevant to Cohere, Fern, this API, or this TypeScript client. We could just close this issue and forget about it.

However.

  1. the caching strategy that reproduces the issue is the default and common one (for GET route handlers at least, in certain/many situations), so many people will stumble into this and find "streaming not working" as I did, and that will slow them down or deter them (bad for Cohere's business),
  2. the actual behavior when the issue does reproduce is quite peculiar and may possibly
    a) reproduce in a wider variety of situations outside of Next.js, and/or
    b) actually be a bug in the Cohere TypeScript client (and/or Fern).

If you'd like to debug this a bit and find out exactly what's going on in that "streaming not working" scenario, here is the repro code. It can be added to a hello-world Next.js app that uses the App Router:

src/app/api/route.js

import { CohereClient } from "cohere-ai";

export const runtime = "edge";

// Uncomment next line to "fix" the issue:
// export const dynamic = "force-dynamic";

const cohere = new CohereClient({
    token: process.env.COHERE_API_KEY,
});

export async function GET() {
    console.log(`[Before] ${new Date().toISOString()}\n`);
    const events = await cohere.chatStream({
        model: "command",
        message: "Tell me brief biographies of all US presidents",
        maxTokens: 400,
    });
    console.log(`[After] ${new Date().toISOString()}\n`);

    return new Response(events.stream);
}

To be clear, this code as-is reproduces the issue (in next dev and when deployed to Vercel). Uncommenting export const dynamic = "force-dynamic"; on line 6 "fixes" the issue.
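
For what it's worth, a small standalone script like the following sketch (hypothetical helper, not part of the repro; it assumes next dev serves the route at http://localhost:3000/api) makes it easy to see whether chunks actually arrive incrementally or all in one burst:

    // check-stream.mjs (hypothetical helper), run with: node check-stream.mjs
    // Assumes the repro route above is served at http://localhost:3000/api.
    const res = await fetch("http://localhost:3000/api");
    const reader = res.body.getReader();

    // Log the arrival time of every chunk; when streaming is broken,
    // all chunks show up together right after generation finishes.
    while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        console.log(new Date().toISOString(), value.length, "bytes");
    }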

If you'd like me to run some experiments to get to the bottom of this, let me know what I can do.


dsinghvi commented Feb 5, 2024

@sevaseva this is really useful, thank you! For context, Cohere uses Fern to generate SDKs. I've filed an issue in our repo to track this.

billytrend-cohere commented

Hey @sevaseva and @dsinghvi, thanks for the detail here! I will take a look into the issue. I'd imagine your hypothesis about caching is correct. Perhaps Next.js needs to read the whole body of the response before it can make caching decisions? That would obviously break streaming, because the whole body is only available once the stream has finished, and it would also fit the behaviour you're seeing.

I will do some more research to see if this is a limitation of Next.js or whether we can control this caching behaviour with some headers or something.

We are keen on ensuring good support for Next.js since we use it internally and, as you say, it's good for business 📈


billytrend-cohere commented Feb 5, 2024

Yeah, without repro'ing, I think what is happening is that Next.js is revalidating the streamed request by reading the whole body and checking whether it is still the same, which breaks streaming. You are opting out of data caching by setting const dynamic = 'force-dynamic', which fixes that issue. A few follow-ups I can think of:

  1. We should send cache-control: no-cache with our streaming response, which overall makes sense to me for a mostly stochastic endpoint. This should disable Next's revalidation.
  2. revalidate = 0 might also fix the issue for you right now with slightly less impact (it would be great if you could try that and report back; see the sketch below).
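
A minimal sketch of option 2 applied to the repro route from the comment above (same file layout assumed, untested here):

    // src/app/api/route.js (variant of the repro, sketch only)
    import { CohereClient } from "cohere-ai";

    export const runtime = "edge";

    // Opt this route's data out of revalidation entirely (follow-up 2),
    // leaving `dynamic` at its default value.
    export const revalidate = 0;

    const cohere = new CohereClient({
        token: process.env.COHERE_API_KEY,
    });

    export async function GET() {
        const events = await cohere.chatStream({
            model: "command",
            message: "Tell me brief biographies of all US presidents",
            maxTokens: 400,
        });
        return new Response(events.stream);
    }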

Let me know if you have any more suggestions


sevaseva commented Feb 6, 2024

Confirmed: export const revalidate = 0; at the route level does also "fix" the issue.

I didn't try fetch('https://...', { next: { revalidate: 0 } }); I guess that's out of scope for this issue (it's only relevant when one uses the Cohere API endpoints directly, without this SDK), and we kind of know what would happen there anyway.


Perhaps Next.js needs to read the whole body of the response before it can make caching decisions?

That did at first make sense to me, but then... there aren't too many "caching decisions" left to make once the request for upstream data has already been sent. If the caching settings are such that the request (i.e. the fetch call the user made) isn't to be cached, there's nothing to worry about: we just send the upstream network request and we're done. If the user's fetch call is cacheable AND there was a cache miss AND we therefore send the network request, then we know we will just put its result into the cache when it comes back (assuming a 2xx status, no no-cache headers, etc.). We (Next.js) can update the cache after all is said and done (I do not see why not, at least); i.e. we don't need to alter the way the result of the network request is delivered/streamed to the user (the user being the application developer who wrote that fetch() call in their API route handler or elsewhere). We will have the whole result eventually, and at that later point in time (if all is well) we can update the cache.

So that's the first point: it is potentially feasible, and cleaner, for Next.js to still stream results to the user even if it plans to put the results of that fetch into the cache, updating the cache, if needed, after the response is eventually consumed. We might want to file an issue with Next.js to find out.
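
To illustrate the idea (just a sketch, not actual Next.js internals; cache and cacheKey are hypothetical placeholders): a framework could tee the upstream body, pipe one branch to the client immediately, and buffer the other branch for a cache write once it completes:

    // Sketch only: stream to the client now, populate the cache later.
    // `cache` and `cacheKey` are hypothetical placeholders.
    async function streamAndCacheLater(upstream, cacheKey, cache) {
        const [forClient, forCache] = upstream.body.tee();

        // Consume the second branch in the background and write the
        // fully-buffered body to the cache once it is complete.
        (async () => {
            const buffered = await new Response(forCache).arrayBuffer();
            await cache.set(cacheKey, buffered);
        })();

        // The caller (and ultimately the client) starts receiving
        // bytes from the first branch right away.
        return new Response(forClient, { headers: upstream.headers });
    }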


Second point: I think the following is correct regardless of what Next.js does:

We should send cache-control: no-cache with our streaming response which overall makes sense to me for a mostly stochastic endpoint.

except that, I think, cache-control: no-cache should be sent with non-streaming responses, too, for the same reason.
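
If it helps, here is a quick sketch for checking what the API actually sends back; the endpoint URL and request body shape below are assumptions about the v1 chat API, not something confirmed in this thread:

    // Hypothetical check of the Cache-Control header on a chat response.
    // The URL and body fields are assumptions; adjust to the real API.
    const res = await fetch("https://api.cohere.ai/v1/chat", {
        method: "POST",
        headers: {
            Authorization: `Bearer ${process.env.COHERE_API_KEY}`,
            "Content-Type": "application/json",
        },
        body: JSON.stringify({
            model: "command",
            message: "Hello",
            stream: false, // flip to true to inspect the streaming response
        }),
    });
    console.log("cache-control:", res.headers.get("cache-control"));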

billytrend-cohere commented

Hey @sevaseva, we now send a no-cache header in our chat streaming response! Would love to hear if it solved your problem. Thanks!

sevaseva commented

Great!

I have been using the workaround we found. I will try to remove it, confirm it's no longer necessary, and report back. But that may take me a couple of weeks (or longer, you know how that goes) to get to it. I don't mind if you close the issue.

Question: the non-streaming endpoints don't send no-cache?
