Streamed events arrive all at once, _after_ generation is all done #132
I observe that with any response, long or short, events don't actually arrive as the response is generated; instead, they all arrive at once after generation is done. I expect events to arrive more or less evenly as generation takes place.

Is that just a bug in the TypeScript client, or..? I know this issue doesn't exist when one uses the Python client.

In the output I see that, in this case, generation seems to have taken 6 seconds, and then all events got processed in 20 milliseconds.
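A minimal harness along these lines shows the arrival pattern (a sketch; it assumes a current `cohere-ai` client whose stream is async-iterable, and the exact event fields may differ):

```js
// timing.mjs -- run with `node timing.mjs`; logs when each streamed event arrives
import { CohereClient } from "cohere-ai";

const cohere = new CohereClient({ token: process.env.COHERE_API_KEY });

const start = Date.now();
const stream = await cohere.chatStream({
  model: "command",
  message: "Tell me brief biographies of all US presidents",
  maxTokens: 400,
});

for await (const event of stream) {
  // With working streaming these timestamps spread out over generation;
  // with the bug they all cluster after generation finishes.
  console.log(`+${Date.now() - start}ms`, event.eventType);
}
```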
Comments
Hey @sevaseva thanks for filing this issue -- what version of the TypeScript client are you using?

What version of node are you on? The script seems to work just fine for me.
@dsinghvi, thanks for the instant reaction!

```
$ node --version
v21.5.0
```

What about you? I am playing with this from an API endpoint handler of a Next.js app in local dev mode. Let me perhaps try it in a plainer environment... or first with an older Node.js...
I'm on node v18.9.0 and just trying it in a simple node app! Sounds good, let us know if you're able to create a consistent repro and we can dive deeper.
Yep; will try that in a bit and update here.

Also... the fact that I am using a trial Cohere API key cannot possibly cause this behavior, can it? (It doesn't with the Python client.)
Alright. I confirmed this does not reproduce in a plain Node.js program (node v21.x). It also does not reproduce from a Next.js API route (GET) handler when the latter is configured with `export const dynamic = "force-dynamic"`. I checked various combinations of conditions (local dev mode, production build, and so on); the caching strategy appears to be the deciding factor. Upd: confirmed with current Next.js v14.1.

Now, Next.js caching is its own can of worms, with the Full Route Cache, the Data Cache, and various other levels and kinds of caching, each with its own peculiarities and bugs. Arguably, none of that is relevant to Cohere, Fern, this API, or this TypeScript client. We could just close this issue and forget it. However.
If you'd like to debug this a bit and find out exactly what's going on in that "streaming not working" scenario, here is the repro code. It can be added to a hello-world Next.js app that uses the App Router, as `src/app/api/route.js`:

```js
import { CohereClient } from "cohere-ai";

export const runtime = "edge";
// Uncomment next line to "fix" the issue:
// export const dynamic = "force-dynamic";

const cohere = new CohereClient({
  token: process.env.COHERE_API_KEY,
});

export async function GET() {
  console.log(`[Before] ${new Date().toISOString()}\n`);
  const events = await cohere.chatStream({
    model: "command",
    message: "Tell me brief biographies of all US presidents",
    maxTokens: 400,
  });
  console.log(`[After] ${new Date().toISOString()}\n`);
  return new Response(events.stream);
}
```

To be clear, this code as-is reproduces the issue (in local `next dev`, at least). If you'd like me to run some experiments to get to the bottom of this, let me know what I can do.
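For what it's worth, a small client-side script like this makes the buffering visible from outside (a sketch; the URL assumes the default `next dev` port):

```js
// check-stream.mjs -- run with `node check-stream.mjs` while `next dev` is up
const start = Date.now();
const res = await fetch("http://localhost:3000/api");

for await (const chunk of res.body) {
  // If streaming works, timestamps spread over the ~6s of generation;
  // if the response was buffered, they all land within milliseconds.
  console.log(`+${Date.now() - start}ms  ${chunk.length} bytes`);
}
```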
Hey @sevaseva and @dsinghvi thanks for the detail here! I will take a look into the issue. I'd imagine your hypothesis about caching is correct. Perhaps Next.js needs to read the whole body of the response before it can make caching decisions? This would obviously break streaming, because the whole body is only available once the stream has finished, and it would also fit the behaviour you're seeing. I will do some more research to see if this is a limitation of Next.js or if we can control this caching behaviour with some headers or something. We are keen on ensuring good support for Next.js since we use it internally and, as you say, it's good for business 📈
Yeah, without repro'ing, I think what is happening is that Next.js is revalidating the streamed request by reading the whole body and checking whether it is still the same, which breaks streaming. You are opting out of data caching by setting `export const dynamic = "force-dynamic"`.

Let me know if you have any more suggestions.
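To make that hypothesis concrete, here is a toy model (not Next.js internals, just an illustration of the suspected failure mode):

```js
// Streaming path: hand the upstream body straight to the client as it arrives.
function passThrough(upstream) {
  return new Response(upstream);
}

// Suspected caching path: drain the whole body first (e.g. to hash or store it),
// then re-emit everything at once. The client sees nothing until generation
// has completely finished -- exactly the reported behaviour.
async function bufferForCache(upstream) {
  const chunks = [];
  for await (const chunk of upstream) chunks.push(chunk);
  return new Response(new Blob(chunks).stream());
}
```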
Confirmed; I didn't try ...

That did at first make sense to me, but then... You see, there aren't too many "caching decisions" left to make once the request for upstream data has already been made. If the caching settings are such that the request (i.e. the fetch call the user made) isn't to be cached, then there's nothing to worry about: we just send the upstream network request and we're done. If the user's fetch call is cache-able AND there was a cache miss AND we therefore send the network request, then we know we will just put its result into the cache when it comes back (assuming a 2xx status, no `no-store` response headers, and so on).

So that's the first point: potentially it is feasible, and cleaner, for Next.js to still stream results to the user even if it plans to update the cache with the results of that fetch, and to update the cache, if needed, after the response is eventually consumed. We might want to file an issue with Next.js to find out.

Second point, which I think is correct regardless of what Next.js does: the chat streaming responses should be sent with a no-cache header, so that caches don't try to buffer or reuse them (except, I think, ...).
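In route-handler terms, the second point would look something like this (a sketch mirroring the repro above; whether the header alone defeats the Next.js caches is exactly what would need testing):

```js
export async function GET() {
  const events = await cohere.chatStream({
    model: "command",
    message: "Tell me brief biographies of all US presidents",
    maxTokens: 400,
  });
  // Ask every cache along the way (including Next.js) not to store or
  // buffer this response, so chunks can flow through as they are generated.
  return new Response(events.stream, {
    headers: { "Cache-Control": "no-cache, no-store" },
  });
}
```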
Hey @sevaseva we now send a no-cache header in our chat streaming response! Would love to hear if it solved your problem. Thanks
Great! I have been using the workaround we found. I will try to remove the workaround, confirm it's no longer necessary, and report back. But that may take me a couple of weeks (or longer, you know how that goes) to get to it. I don't mind if you close the issue. Question: the non-streaming endpoints don't send no-cache?