-
Notifications
You must be signed in to change notification settings - Fork 552
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mvlog: log basics for segment rolls #18521
Conversation
8858a98
to
e81cc70
Compare
/ci-repeat |
bc6a94c
to
4651b15
Compare
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/49240#018f82d0-6c12-4eb4-912b-8ba00e7a9aa8 ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/49255#018f83cd-a3a8-4ab2-af72-f7b390239ff8 ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/49388#018f9d29-bd08-4032-844c-caafabf055fb ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/49438#018fa139-a10b-4a06-bb8c-5f0d63bd5c92 |
This will be used to uniquely identify new segments.
4651b15
to
92c6b5f
Compare
: segment_file(std::move(f)) | ||
, appender(std::make_unique<segment_appender>(segment_file.get())) | ||
, readable_seg(std::make_unique<readable_segment>(segment_file.get())) | ||
, construct_time(ss::lowres_clock::now()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would construction time be attached to the file instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean whether construction time should be something tracked in the mvlog::file
? I think probably not -- the construction time is only really useful for now in determining if we should apply segment.ms
Or are you asking whether it should be attached to something other than the file construction (e.g. first data write)? I think that wouldn't be a bad idea -- maybe it'd be important to ensure no "empty" segments e.g. if a segment contains only truncate entries maybe we wouldn't want it being rolled? Though I'm not sold that such empty segments are a problem
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that ctime
is a normal posix file property, so associating it with the actual file made some sense. but maybe it's just a naming collision? we probably want to track our own creation timestamp in mvlog if that timestamp is going to be used for retention purposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah gotcha. Yeah it probably makes sense for retention. Will ponder a bit on how to resolve the naming conflict. I agree it might be confusing down the line
src/v/storage/mvlog/versioned_log.cc
Outdated
auto ro_seg = std::make_unique<readonly_segment>(std::move(active_seg_)); | ||
segs_.emplace_back(std::move(ro_seg)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool. i wonder if we should call this like extend
rather than roll? i mean, it doesn't really matter, but I am reading correctly that this roll is not like the roll we do in disk_log_impl--it's a logical roll?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, a relatively self-describing name might be close_active_seg_unlocked()
or something. You're right in disk_log_impl "roll" typically refers to creating an additional segment, which this doesn't do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a huge fan of extend
because it might be mistaken for adding data to the log, which this isn't doing, unless you're referring to the create_unlocked()
method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it. i think it's fine as-is. i'm getting used to the naming
src/v/storage/mvlog/versioned_log.cc
Outdated
vassert( | ||
!active_segment_lock_.ready(), | ||
"create_unlocked() must be called with active segment lock held"); | ||
vassert(active_seg_ == nullptr, "Expected no active segment"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe there isn't, but it would be nice if this were impossible by construction. sort of like, a log always has an active segment--it's the last one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree, though I haven't landed yet on an impl that I found easy to reason about. It's possible that everything covered by the active segment lock could be encapsulated into some active_segment_manager
class that handles rolling and appends and such. I suspect it'll make things harder to reason about on the read path, but I wasn't sure so I opted to keep this a little more raw
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sgtm
log.info, | ||
"Rolling segment file {}", | ||
active_seg_->segment_file->filepath().c_str()); | ||
co_await active_seg_->segment_file->flush(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wonder if the flush is necessary? if we extend the log and keep writing to the next segment at some point the "log" will be flushed, at which point we could flush all unflushed segments. is there a particular reason for flushing here before rolling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I felt it seemed natural for applying segment.ms for a flush to occur because it indicates we have some data that's otherwise just sitting in memory. But for segment.bytes maybe it's more worth it to be lazy about flushing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeh. we can always measure it and see if it is worth keeping or removing.
Generally, the change looks good. I'm curious about the versioning. Previously, we had a lot of complexity created by use of locking. This code also relies on locking instead of versioning. Or maybe versioning will be used for truncation/readers? |
Good observation. I think the complexity of segment locking is that it allows for there to be many combinations of orderings of lock acquisitions, which makes it very easy to deadlock. I'm hoping that by more narrowly scoping the areas that we have locks that we'll reduce complexity. And that's right, we'll use versioning on the read path and in handling truncations. |
92c6b5f
to
ce88be0
Compare
src/v/storage/mvlog/versioned_log.cc
Outdated
} | ||
|
||
ss::future<> versioned_log::apply_segment_ms() { | ||
auto lock = active_segment_lock_.get_units(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing co_await? i thought this would fail 🤔 or we never added this diagnostic? llvm/llvm-project#76101
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yikes! Fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a test for this as well.
d27b749
to
8b291b3
Compare
"Rolling segment file {}", | ||
active_seg_->segment_file->filepath().c_str()); | ||
co_await active_seg_->segment_file->flush(); | ||
auto ro_seg = std::make_unique<readonly_segment>(std::move(active_seg_)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Is auto ro_seg = std::move(active_seg_)
on its own sufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The active segment and readonly segments have different types, so it doesn't seem like it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops. Eyes deceived me on this one.
return log_.get(); | ||
} | ||
|
||
ss::future<ss::circular_buffer<model::record_batch>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: For future proofing, maybe this should use the version_log::segments_t
type (you'll have to make it public
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a group of batches, not segments
Introduces the beginnings of a log implementation that is able to roll new segments with segment.bytes or segment.ms. The log, similar to the disk_log_impl, is backed by a circular buffer of segments, with the noteable difference that the active segment is managed explicitly and separately from the others. I'm not fully sold on this approach (e.g. vs only managing the segment appender separately but keeping the underlying readable segment with the rest of the segments), but for now it makes reasoning about segment rolling concurrency a bit more straightforward, and down the line it will make it easier to reason about segment removals (e.g. the active segment cannot be removed). This implementation is still missing quite a bit (e.g. it doesn't flush, doesn't implement the exact log interface), but at least begins to introduce some ideas that can back the new log implementation.
8b291b3
to
22e39f0
Compare
Introduces the beginnings of a log implementation that is able to roll
new segments with segment.bytes or segment.ms.
The log, similar to the disk_log_impl, is backed by a circular buffer of
segments, with the noteable difference that the active segment is
managed explicitly and separately from the others.
I'm not fully sold on this approach (e.g. vs only managing the segment
appender separately but keeping the underlying readable segment with the
rest of the segments), but for now it makes reasoning about segment
rolling concurrency a bit more straightforward, and down the line it
will make it easier to reason about segment removals (e.g. the active
segment cannot be removed).
This implementation is still missing quite a bit (e.g. it doesn't flush,
doesn't implement the exact log interface), but at least begins to
introduce some ideas that can back the new log implementation.
Backports Required
Release Notes