perf: improve the performance of ValueRowDeserializer #16763

Open · fuyufjh opened this issue May 15, 2024 · 2 comments
Labels: component/storage, help wanted, type/feature, type/perf · Milestone: release-1.10

fuyufjh (Contributor) commented May 15, 2024

The current implementation of ValueRowDeserializer is inefficient: it doesn't fully exploit the properties of our row encoding format.

```rust
fn deserialize(&self, mut encoded_bytes: &[u8]) -> Result<Vec<Datum>> {
    // Layout: flag (1 B) | datum_num (u32 LE) | sorted column ids (4 B each)
    //         | offsets (offset_bytes each) | concatenated datum payloads
    let flag = Flag::from_bits(encoded_bytes.get_u8()).expect("should be a valid flag");
    let offset_bytes = match flag - Flag::EMPTY {
        Flag::OFFSET8 => 1,
        Flag::OFFSET16 => 2,
        Flag::OFFSET32 => 4,
        _ => return Err(ValueEncodingError::InvalidFlag(flag.bits())),
    };
    let datum_num = encoded_bytes.get_u32_le() as usize;
    let offsets_start_idx = 4 * datum_num;
    let data_start_idx = offsets_start_idx + datum_num * offset_bytes;
    let offsets = &encoded_bytes[offsets_start_idx..data_start_idx];
    let data = &encoded_bytes[data_start_idx..];
    let mut datums: Vec<Option<Datum>> = vec![None; self.schema.len()];
    let mut contained_indices = BTreeSet::new();
    // Linear scan over *all* encoded columns, probing a map and inserting
    // into a BTreeSet for each one.
    for i in 0..datum_num {
        let this_id = encoded_bytes.get_i32_le();
        if let Some(&decoded_idx) = self.needed_column_ids.get(&this_id) {
            contained_indices.insert(decoded_idx);
            let this_offset_start_idx = i * offset_bytes;
            let mut this_offset_slice =
                &offsets[this_offset_start_idx..(this_offset_start_idx + offset_bytes)];
            let this_offset = deserialize_width(offset_bytes, &mut this_offset_slice);
            // An empty range between two adjacent offsets encodes NULL.
            let data = if i + 1 < datum_num {
                let mut next_offset_slice = &offsets[(this_offset_start_idx + offset_bytes)
                    ..(this_offset_start_idx + 2 * offset_bytes)];
                let next_offset = deserialize_width(offset_bytes, &mut next_offset_slice);
                if this_offset == next_offset {
                    None
                } else {
                    let mut data_slice = &data[this_offset..next_offset];
                    Some(deserialize_value(
                        &self.schema[decoded_idx],
                        &mut data_slice,
                    )?)
                }
            } else if this_offset == data.len() {
                None
            } else {
                let mut data_slice = &data[this_offset..];
                Some(deserialize_value(
                    &self.schema[decoded_idx],
                    &mut data_slice,
                )?)
            };
            datums[decoded_idx] = Some(data);
        }
    }
    // Fill in defaults for columns that are missing from this row.
    for (id, datum) in &self.default_column_values {
        if !contained_indices.contains(id) {
            datums[*id].get_or_insert(datum.clone());
        }
    }
    Ok(datums.into_iter().map(|d| d.unwrap_or(None)).collect())
}
```

As shown in the profile below, most of the time is spent on container operations (e.g., the per-column `needed_column_ids` lookup and the `contained_indices` inserts).

This encoding is rather friendly for random access, isn't it? The sorted column ids are obviously intended to facilitate binary search.

[profiling figure: time dominated by container operations]

https://github.com/risingwavelabs/rfcs/blob/75091f0c7f197f718b8343eb121932df5530ddb1/rfcs/0090-table-schema-change.md#column-aware-row-encoding
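
For illustration, here is a minimal sketch of such a binary-search point lookup, assuming the id, offset, and data sections are sliced out as in the snippet above. All names (`read_offset`, `find_datum`) are hypothetical, not the current API:

```rust
use std::cmp::Ordering;

const ID_BYTES: usize = 4; // each column id is an i32, little-endian

/// Decode the `idx`-th entry of the offset section (hypothetical helper).
fn read_offset(offsets: &[u8], idx: usize, offset_bytes: usize) -> usize {
    let start = idx * offset_bytes;
    let mut buf = [0u8; 8];
    buf[..offset_bytes].copy_from_slice(&offsets[start..start + offset_bytes]);
    u64::from_le_bytes(buf) as usize
}

/// Binary-search the sorted id section for `column_id` and return the byte
/// slice of its datum; `None` means the column is absent from this row or
/// its datum is empty (NULL). No map probe, no set insert, no allocation.
fn find_datum<'a>(
    ids: &[u8],          // id section: `datum_num` sorted i32 LE ids
    offsets: &[u8],      // offset section: `datum_num` entries
    data: &'a [u8],      // concatenated datum payloads
    offset_bytes: usize, // 1, 2, or 4, from the flag byte
    column_id: i32,
) -> Option<&'a [u8]> {
    let datum_num = ids.len() / ID_BYTES;
    let (mut lo, mut hi) = (0, datum_num);
    let idx = loop {
        if lo >= hi {
            return None; // column id not present in this row
        }
        let mid = (lo + hi) / 2;
        let id =
            i32::from_le_bytes(ids[mid * ID_BYTES..(mid + 1) * ID_BYTES].try_into().unwrap());
        match id.cmp(&column_id) {
            Ordering::Less => lo = mid + 1,
            Ordering::Greater => hi = mid,
            Ordering::Equal => break mid,
        }
    };
    let start = read_offset(offsets, idx, offset_bytes);
    let end = if idx + 1 < datum_num {
        read_offset(offsets, idx + 1, offset_bytes)
    } else {
        data.len()
    };
    (start != end).then(|| &data[start..end])
}
```

With this, projecting k columns costs O(k log n) id comparisons instead of a full O(n) scan with a map probe per encoded column.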

github-actions bot added this to the release-1.10 milestone May 15, 2024
fuyufjh added the help wanted (Issues that need help from contributors), component/storage (Storage), and type/perf labels May 15, 2024
BugenZhao (Member) commented

Would you please share some background? For example, in which case does this become the bottleneck of a query or a job?

zwang28 (Contributor) commented May 22, 2024

> Would you please share some background? For example, in which case does this become the bottleneck of a query or a job?

My case is a `count(*)` over a table with 100 int columns and 10 million rows.

The bottleneck is that StorageTable always has ValueRowDeserializer deserialize the full row. This can be optimized to deserialize only the needed columns (perhaps by leveraging `needed_column_ids`).

I'll post a comparison after optimizing it.

Update: with this minor change, the query latency drops from 60 seconds to 15 seconds.
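
For illustration, a sketch of that idea, under the assumption that the needed column ids are kept sorted (the encoding already guarantees this for the ids stored in the row). `decode_datum` stands in for `deserialize_value`, and all names are hypothetical:

```rust
use std::cmp::Ordering;

/// Merge-scan the row's sorted id section against the sorted ids of the
/// columns the query touches, decoding only on a match and stopping as
/// soon as all needed columns are consumed (hypothetical sketch).
fn deserialize_needed(
    ids: &[i32],    // decoded, sorted column ids present in this row
    needed: &[i32], // sorted ids of the columns the query needs
    mut decode_datum: impl FnMut(usize, usize), // (index in row, index in output)
) {
    let (mut i, mut j) = (0, 0);
    // The loop ends once `needed` is exhausted, so trailing unneeded
    // columns are never decoded; for `count(*)` (`needed` empty) the
    // body never runs at all.
    while i < ids.len() && j < needed.len() {
        match ids[i].cmp(&needed[j]) {
            Ordering::Less => i += 1,    // column not needed: skip its datum
            Ordering::Greater => j += 1, // needed column absent: default/NULL
            Ordering::Equal => {
                decode_datum(i, j);
                i += 1;
                j += 1;
            }
        }
    }
}
```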
