Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[3/5] [Run Timeline] Add hourly data cache #21934

Merged
merged 37 commits into from
May 22, 2024
Merged

[3/5] [Run Timeline] Add hourly data cache #21934

merged 37 commits into from
May 22, 2024

Conversation

salazarm
Copy link
Contributor

@salazarm salazarm commented May 17, 2024

Summary & Motivation

Add an hourly data cache to the Run Timeline.
We only cache completed run data and we only request data for time ranges that we don't already have data for.

  • how do we handle settling ongoing runs in to the completed bucket?
    • when an ongoing run goes into completed it should have an updatedTimestamp that belongs to a range we havent fetched yet

Updated the completed run bucketing query logic to fetch updatedAfter -> updatedBefore timestamps to avoid the overfetching that the current updatedAfter -> createdBefore buckets make.

image

updatedAfter -> updatedBefore buckets only capture range 3 and 4 boxes though:
image

To capture range 2 and range 1 we need runs that were createdBefore the right boundary and updated
after the right boundary. Instead of querying for these runs specifically we rely on the fact that we have already queried for all the future buckets since we always paginate backwards.

So we subscribe to the cache for that data and then filter for runs that started before the right boundary of the time range

How I Tested These Changes

Jest tests.

Left the page open for half an hour and saw in progress runs transition to completed.

Switched between 1hr / 6hr / 12hr / 24hr views and saw we only request time ranges that are missing.

@salazarm salazarm changed the title [3/3] [Run Timeline] Add hourly data cache backed by indexedb [3/3] [Run Timeline] Add hourly data cache May 17, 2024
@salazarm salazarm changed the title [3/3] [Run Timeline] Add hourly data cache [3/4] [Run Timeline] Add hourly data cache May 17, 2024
@salazarm salazarm changed the title [3/4] [Run Timeline] Add hourly data cache [3/5] [Run Timeline] Add hourly data cache May 17, 2024
@salazarm
Copy link
Contributor Author

Going to be adding more tests and changing strategy a bit here. Will update PR soon

@salazarm salazarm dismissed bengotow’s stale review May 20, 2024 14:49

Going to add more tests + change strategy a little

Copy link
Member

@alangenfeld alangenfeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to your queue for ^ planned changes

}
});
Object.entries(buckets)
.sort((bucketA, bucketB) => COMMON_COLLATOR.compare(bucketA[0], bucketB[0]))
Copy link
Contributor Author

@salazarm salazarm May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this stops the run timeline rows from shuffling around

@salazarm salazarm dismissed alangenfeld’s stale review May 21, 2024 01:10

ready for review

@@ -14,7 +14,7 @@ import {WorkspaceContext} from '../workspace/WorkspaceContext';
import {buildRepoAddress} from '../workspace/buildRepoAddress';
const LOOKAHEAD_HOURS = 1;
const ONE_HOUR = 60 * 60 * 1000;
const POLL_INTERVAL = 60 * 1000;
const POLL_INTERVAL = 5 * 1000;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lowered the poll interval since we only fetch for the missing ranges.
Increased the poll interview for FutureTicks / Ongoing runs below.

@salazarm
Copy link
Contributor Author

Buildkite failures unrelated

Copy link
Member

@alangenfeld alangenfeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks solid, probably still worth @bengotow taking a peak since my typescript isn't the strongest

@salazarm salazarm merged commit 25e1783 into master May 22, 2024
0 of 2 checks passed
@salazarm salazarm deleted the salazarm/part3 branch May 22, 2024 02:27
nikomancy pushed a commit that referenced this pull request May 22, 2024
## Summary & Motivation

Add an hourly data cache to the Run Timeline. 
We only cache completed run data and we only request data for time
ranges that we don't already have data for.


- how do we handle settling ongoing runs in to the completed bucket?
- - when an ongoing run goes into completed it should have an
updatedTimestamp that belongs to a range we havent fetched yet

Updated the completed run bucketing query logic to fetch updatedAfter ->
updatedBefore timestamps to avoid the overfetching that the current
updatedAfter -> createdBefore buckets make.


![image](https://github.com/dagster-io/dagster/assets/2286579/af0b2c02-f189-4332-95a2-5cacba95a6d3)


updatedAfter -> updatedBefore buckets only capture range 3 and 4 boxes
though:

![image](https://github.com/dagster-io/dagster/assets/2286579/c566547a-43b9-4d83-bd07-5e5b9e1285be)


To capture range 2 and range 1 we need runs that were createdBefore the
right boundary and updated
after the right boundary. Instead of querying for these runs
specifically we rely on the fact that we have already queried for all
the future buckets since we always paginate backwards.

So we subscribe to the cache for that data and then filter for runs that
started before the right boundary of the time range


## How I Tested These Changes
Jest tests.

Left the page open for half an hour and saw in progress runs transition
to completed.

Switched between 1hr / 6hr / 12hr / 24hr views and saw we only request
time ranges that are missing.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants