Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LibAudio: Fix EOF handling for FLAC files with unknown sample count #24335

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

abuneri
Copy link
Contributor

@abuneri abuneri commented May 15, 2024

The original solution to get this test to pass was done in 1ee2091, but it looks like there has been a regression in the stream infrastructure since?

What I'm doing does seem to just be working around a potentially larger issue with the seekable stream infrastructure, but I didn't want to venture into that land at the moment, and it is nice to have all FLAC spec tests passing again.

(FYI for others who didn't know, to download the data required for TestFLACSpec tests, you need to add the -DINCLUDE_FLAC_SPEC_TESTS=ON option when configuring)

Before my fix:

flactest-before

After my fix:

flactest-after

@github-actions github-actions bot added the 👀 pr-needs-review PR needs review from a maintainer or community member label May 15, 2024
@@ -354,7 +354,15 @@ ErrorOr<Vector<FixedArray<Sample>>, LoaderError> FlacLoaderPlugin::load_chunks(s

size_t sample_index = 0;

while (!m_stream->is_eof() && sample_index < samples_to_read) {
// FIXME: To prevent this workaround that allows the test for a FLAC file with an unknown amount of samples to pass,
// is_eof() should return true at the same time as tell() == size() returns true
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timschumi explains is_eof() behavior on discord sometimes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too often. :^)

I have a suspicion for what's causing this, I'll start a bisect later.

Copy link
Contributor Author

@abuneri abuneri May 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see some comments about when EOF should actually be set, and that is after the final read that returns 0, even if we know we're about to return 0 (e.g. the position is the size of the stream), EOF shouldn't be set yet.

That was the behaviour I noticed when poking around Stream::read_until_filled() for this particular test failure, and does make sense. But the problem is, at least with this FLAC infrastructure, by the time EOF is actually set it is too late and we're already in the middle of trying to read another frame that doesn't exist.

Maybe the PluginLoader infrastructure needs to have a way to bail out and assume/account for this EOF behaviour?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't know that we are about to return 0, we'd have to explicitly ask each and every time that we'd want to get an accurate is_eof reading (thrice even, because there is no POSIX API to check the EOF condition in particular, so we'd have to seek multiple times to figure out where we are and where the end of the file is). We could offer that as is_eof_slow, but that would only work for seekable streams.

The unfortunate thing is that this can affect any user of Stream, as long as it was nice enough to read until the exact end of the data, and as long as it can only know if the file ends by trying to read the next set of data.

I'll finish up the bisect (which should be soon, because I have a few likely candidates in mind), and then I'll check whether we can substitute the whole thing with something like poll.

@timschumi
Copy link
Member

timschumi commented May 15, 2024

Turns out it's neither of the commits that I suspected, bisect points to these commits (one of which is unbuildable):

fc70d88
8258618

EDIT: The build error was trivial to fix, fc70d88 is the commit that "introduced" this issue. However, audio formats are not my area of expertise. cc @kleinesfilmroellchen

@kleinesfilmroellchen
Copy link
Collaborator

@timschumi Had a quick look over the code:

The first condition seems responsible, but also I don't see how the logic here is wrong: If we have an EOF, it's true regardless, and it yields no more samples. The plugin_at_end_of_stream logic in combination with that may be incorrect.

I don't currently have spoons to investigate this further. Thanks for bisecting.

@timschumi
Copy link
Member

is_eof is almost certainly at fault, what I'm most confused about is how that patch managed to make the decoder run into an EOF error. The only case I can see why that would happen is that the previous version of the remaining_samples condition prevented any kind of decoding from ocurring.

@abuneri
Copy link
Contributor Author

abuneri commented May 17, 2024

is_eof is almost certainly at fault, what I'm most confused about is how that patch managed to make the decoder run into an EOF error. The only case I can see why that would happen is that the previous version of the remaining_samples condition prevented any kind of decoding from ocurring.

@timschumi
I'll mention one of my initial fixes got the test to pass; however, when I inspected further I noticed that it was only passing because it was never trying to fetch any frames at all (test completed in 0ms vs my final fix which made the test complete in ~50ms).

So maybe the test only first started passing in that initial commit I linked because it never made it as far as attempting to read frames, therefore hiding the the eof() issues, then the commit you bisected made changes that caused us to actually read frames, thus exposing the eof problem that has always existed?

@abuneri
Copy link
Contributor Author

abuneri commented May 18, 2024

@kleinesfilmroellchen I've updated to a more efficient fix that doesn't require us to seek the stream and reset, and more accurately captures how FLAC files with unknown sample counts can interact with the Stream infrastructure

This fixes a regression in flac_spec_test_45 where we end up trying to
read from one extra frame that does not exist.
In the `Stream::read_until_filled()` call nread returns 0 and then EOF
is true, that does seem like a valid state, but the FLAC consumer was
expecting a single byte to be read, so the EOF error ends up being
returned instead. Since this is an expected state for FLAC files with
unknown sample sizes, we can simply exit out of the frame read without
propagating the error, as we now have all the samples
// infrastructure does not set EOF until the number of bytes read from the stream is 0. So in the case of an
// unknown sample size, we should expect the final frame read to fail, because we only find out the EOF is set
// during next_frame()
if (m_total_samples == NumericLimits<decltype(m_total_samples)>::max() && m_stream->is_eof())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be nicer to just have an extra boolean member variable m_unknown_sample_count, or even better a helper method called is_unknown_sample_count() that just does return m_total_samples == NumericLimits<decltype(m_total_samples)>::max()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
👀 pr-needs-review PR needs review from a maintainer or community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants