-
Notifications
You must be signed in to change notification settings - Fork 651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
capsules/alarm: left-justify 32 bit ticks, re-architect alarm, add unit tests #3975
base: master
Are you sure you want to change the base?
capsules/alarm: left-justify 32 bit ticks, re-architect alarm, add unit tests #3975
Conversation
04bad49
to
c8e4da7
Compare
capsules/core/src/alarm.rs
Outdated
// TODO: document | ||
fn earliest_alarm<UD, R, F: FnOnce(Expiration<A::Ticks>, &UD) -> Option<R>>( | ||
now: A::Ticks, | ||
expirations: impl Iterator<Item = (Expiration<A::Ticks>, UD, F)>, | ||
// expired_handler: &mut impl FnMut(&Expiration<A::Ticks>, &UD) -> Option<R>, | ||
) -> Result<Option<(Expiration<A::Ticks>, UD)>, (Expiration<A::Ticks>, UD, R)> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function does much of the heavy lifting of re-arming the timer, as it's generic. This also makes it hard to understand. It is in desperate need of documentation. expired_handler
as a parameter has been integrated into expirations
and should be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this misses a corner case where:
now = 1000
expirations[0].fire_at = 1050
expirations[1].fire_at = 950
if the expirations[1].expired_handler
runs for >50 ticks, then by the time expirations[0]
is processed as the "earliest alarm" in the block at line 193, its dt
will look far in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might well be wrong, but I don't think that this is a concern here.
First of all, this function takes now
as a parameter. We do this primarily to avoid performing a volatile read of the underlying timer's register multiple times, but it also ensures that this function and all of the expired-callbacks are executed from a consistent reference time. This means that, regardless of how long each expired handler takes, we always identify the earliest alarm from this reference time:
tock/capsules/core/src/alarm.rs
Lines 139 to 190 in c8e4da7
// Ask the clock about a current reference once. This can incur a | |
// volatile read, and this may not be optimized if done in a loop: | |
let now = self.alarm.now(); | |
let expired_handler = |expired: Expiration<A::Ticks>, process_id: &ProcessId| { | |
// This closure is run on every expired alarm, _after_ the `enter()` | |
// closure on the Grant iterator has returned. We are thus not | |
// risking reentrancy here. | |
// Enter the app's grant again: | |
let _ = self.app_alarms.enter(*process_id, |alarm_state, upcalls| { | |
// Reset this app's alarm: | |
alarm_state.expiration = None; | |
// Keep track of the number of armed alarms, to allow for the | |
// early-exit at the top of this function: | |
self.num_armed.set(self.num_armed.get() - 1); | |
// Deliver the upcall: | |
upcalls | |
.schedule_upcall( | |
ALARM_CALLBACK_NUM, | |
( | |
now.into_u32_left_justified() as usize, | |
expired.reference.wrapping_add(expired.dt).into_usize(), | |
0, | |
), | |
) | |
.ok(); | |
}); | |
// Proceed iteration across expirations: | |
None::<()> | |
}; | |
// Compute the earliest alarm, and invoke the `expired_handler` for | |
// every expired alarm. This will issue a callback and reset the alarms | |
// respectively. | |
let res = Self::earliest_alarm( | |
now, | |
// Pass an interator of all non-None expirations: | |
self.app_alarms.iter().filter_map(|app| { | |
let process_id = app.processid(); | |
app.enter(|alarm_state, _upcalls| { | |
if let Some(exp) = alarm_state.expiration { | |
Some((exp, process_id, expired_handler)) | |
} else { | |
None | |
} | |
}) | |
}), | |
); |
Apart from that, we also don't just save the fire_at
value, but reference
and dt
, where fire_at = reference.wrapping_add(dt)
. This provides us with significant margin of error, provided that dt
is significantly smaller than Ticks::max()
. Even if we were to take long, once now
surpasses reference + dt
and did not tick another Ticks::max() - dt
, we'll still treat it as expired. We also don't risk mistaking a previous timer as the "earliest", as when now
is not within [reference; dt)
, we treat is as expired instead.
This does raise another question though: I believe that all of this falls apart when Ticks::max() - dt
is very small. In this case, we'd run a realistic risk of never firing a timer, as we'd never trigger the case where now
is within [reference; dt)
.
Should we limit the maximum value of dt
, perhaps to Ticks::max() / 2
, to ensure that we always reliably detect when it has expired and have enough time to schedule the callback?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, okay, I think lost in my reading through the web UI diff view was where the reference, dt
was coming from that is ultimately passed to set_alarm
.
The concern does come back up in set_alarm
implementations, as they all do do a fresh read of now
. I looked through several (chosen arbitrarily as example, the rp2040 alarm implementation does this right) and all seem to handle the case where now
isn't in the [ref,dt] range. Most seem to also account for minimum_dt()
in the set_alarm
implementation, but it looks like esp32 might be missing that nuance. Not a bug for this PR though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not the person to really dig into the alarm capsule and reason about the correctness of this change.
But, I did look over the code and it generally makes sense to me and seems more clear.
Can we merge just the hil changes now? Those are the same between the two PRs, right? I say we just merge the other PR with the quick fix, and leave this open until we can get more eyes on it. |
@bradjc The reason this PR exists is precisely because I'm having a really hard time reasoning whether the changes in the other PR lead to "sane" behavior (i.e. the same behavior that this PR exhibits). It does, however, seem to resolve the issue we're facing currently, so I say let's merge it! Ideally, neither PR introduces breaking changes for userspace. And indeed, the HIL changes are the same. |
The merge-base changed after approval.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To the extent that I'm capable of reading and reasoning through timer code, this looks good to me.
The merge-base changed after approval.
…d tests Before this change, and in particular with the introduction of tock#3973, it is very hard to reason about the wrapping logic in the alarm driver with > 32 bit timers. This is because the alarm driver used to deal only with 32-bit values internally. However, it would not record whether those values were potentially stemming from a timer of a smaller timer. This then causes issues when it will set a new alarm in its wrapping logic. Apart from that, the fact that it first converted to 32 bit, and then back again to the underlying timer width, up- and down-scaling as necessary, is very confusing. The code interleaves iterating over timers, computing wrapping, and controlling the hardware in a way that makes it hard to reason about, and virtually impossible to unit-test. I refactored the driver to clearly separate out this logic: - `Expiration` now always refers to a set alarm. `Option<Expiration>` refers to an alarm that can be disabled. Distinguishing between these on a type level (versus one single `enum Expiration` for both) allows us to get rid of a many checks. - `earliest_alarm` iterates over a list of alarms, invoking a callback on each expired one, and returns the earliest non-expired one. This is very easily unit-testable and allows us to combine callback- and re-arm operations into a single handler. - `process_rearm_or_callback` is this handler. It is a thin wrapper around the above, and controls the hardware (this part is not easily testable). - `rearm_u32_left_justified_expiration` does all the magic wrapping / shifting logic. It is not clobbered with any details about the underlying alarm, but instead just converts a 32-bit left-justified ticks value into any underlying A::Ticks, regardless of width. It does appropriate wrapping for smaller and larger timers. This is extensively unit-tested. The logic in the `Driver` implementation is simplified as well.
c8e4da7
to
8a4f730
Compare
I rebased and added comments to the magic I also removed the To support these optimizations and other de-initialization of peripherals, for instance to turn them into a low-power state again after an app fault, we'd want to have an additional handler for this in capsules. Historically there have been good reasons why we'd want to avoid such "broadcast" events, in particular over the dyn-traits that are our drivers. However, not supporting this does seem to limit our ability to build expressive and optimized drivers in practice. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me
@alevy Thanks for fixing the tests! Forgot to carry the removal of the |
Pull Request Overview
This pull request is a second revision of #3973, with a full rewrite of the alarm driver. See below for why I deem this the best strategy forward.
Original Motivation
This change introduces new APIs on the
Ticks
trait ofkernel/src/hil/time.rs
which allow consumers of this type to know the underlying timer's actual ticks width (may be less than 32 bit), and to right-pad Ticks values for it to wrap at exactly(2 ** 32) - 1
ticks.By exposing this shifted Ticks value and an appropriately scaled tick frequency to userspace, Tock finally conforms to assumptions made in
libtock-c
long ago:This predictable timer interface can allow userspace to build timer infrastructure that can schedule alarms or sleep for longer than
2 ** Timer Width
ticks.Apart from now conforming to libtock-c's expectations, by scaling both timer and frequency this change should not break any existing userspace applications (apart from triggering an otherwise unrelated bug in libtock-c's virtual alarms). It does modify code for which a relevant TRD (105) exists, but that TRD is still in draft state.
Alarm Driver Rewrite
The wrapping logic in the alarm driver was likely still broken in #3973 with > 32 bit timers. This is because the alarm driver used to deal only with 32-bit values internally. However, it would not record whether those values were potentially stemming from a timer of a smaller timer. This then causes issues when it will set a new alarm in its wrapping logic.
Apart from that, the fact that it first converted to 32 bit, and then back again to the underlying timer width, up- and down-scaling as necessary, is very confusing. The code interleaves iterating over timers, computing wrapping, and controlling the hardware in a way that makes it hard to reason about, and virtually impossible to unit-test.
I refactored the driver to clearly separate out this logic:
Expiration
now always refers to a set alarm.Option<Expiration>
refers to an alarm that can be disabled. Distinguishing between these on a type level (versus one singleenum Expiration
for both) allows us to get rid of a many checks.earliest_alarm
iterates over a list of alarms, invoking a callback on each expired one, and returns the earliest non-expired one. This is very easily unit-testable and allows us to combine callback- and re-arm operations into a single handler.process_rearm_or_callback
is this handler. It is a thin wrapper around the above, and controls the hardware (this part is not easily testable).rearm_u32_left_justified_expiration
does all the magic wrapping / shifting logic. It is not clobbered with any details about the underlying alarm, but instead just converts a 32-bit left-justified ticks value into any underlying A::Ticks, regardless of width. It does appropriate wrapping for smaller and larger timers. This is extensively unit-tested.The logic in the
Driver
implementation is simplified as well.I'm not happy with the fact that this is a full re-write, but I am much more confident that the implementation is correct and has less edge-cases than with the old architecture. I hope others find the new code structure equally readable.
Testing Strategy
LiteX Sim CI and unit tests introduced in this PR:
TODO or Help Wanted
Updates to the TRD.
Documentation Updated
/docs
, or no updates are required.Formatting
make prepush
.