Abstract RowInTable logic #108696
This moves the logic for finding the offset in a table that we will use in `LOOKUP` out of a method on `BlockHash` and out of some complex building logic in `HashLookupOperator`. Now it lives in a `RowInTable` interface: both a static builder method and some implementations. There are three implementations:

1. One that talks to `BlockHash` just like `HashLookupOperator` used to. Right now it talks to `PackedValuesBlockHash` because it's the only one whose `lookup` method returns the offset in the original row, but we'll fix that eventually.
2. A `RowInTable` that works with increasing sequences of integers, say, `1, 2, 3, 4, 5`. This one is fairly simple: it checks that the input is between `1` and `5` and, if it is, subtracts `1`. Easy, obvious, and very fast. A good simple example.
3. A `RowInTable` that handles empty tables. This just makes writing the rest of the code simpler. It always returns `null`.
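A minimal sketch of the shape described above, assuming a single `int` key column and plain Java arrays instead of `Page`s and `Block`s. `RowLookupSketch`, `EmptyLookup`, `AscendingSequenceLookup` and `HashLookup` are hypothetical names, and the `HashMap` only stands in for `PackedValuesBlockHash`; it is not how the PR implements the general case. A `null` offset means "no matching row".

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical simplified model: one int key column; a null offset means "no matching row". */
sealed interface RowLookupSketch permits EmptyLookup, AscendingSequenceLookup, HashLookup {
    /** Returns the offset of key in the table, or null if the table doesn't contain it. */
    Integer lookup(int key);

    /** Static builder that picks the cheapest implementation for the table's keys. */
    static RowLookupSketch build(int[] keys) {
        if (keys.length == 0) {
            return new EmptyLookup();
        }
        boolean ascending = true;
        for (int i = 1; i < keys.length; i++) {
            if (keys[i] != keys[i - 1] + 1) {
                ascending = false;
                break;
            }
        }
        if (ascending) {
            return new AscendingSequenceLookup(keys[0], keys[keys.length - 1]);
        }
        return HashLookup.of(keys);
    }
}

/** Empty table: every lookup misses. */
record EmptyLookup() implements RowLookupSketch {
    public Integer lookup(int key) {
        return null;
    }
}

/** Keys are min, min + 1, ..., max: range check, then subtract min. */
record AscendingSequenceLookup(int min, int max) implements RowLookupSketch {
    public Integer lookup(int key) {
        return key >= min && key <= max ? key - min : null;
    }
}

/** General case: a hash map from key to offset, standing in for BlockHash. */
record HashLookup(Map<Integer, Integer> offsets) implements RowLookupSketch {
    static HashLookup of(int[] keys) {
        Map<Integer, Integer> offsets = new HashMap<>();
        for (int i = 0; i < keys.length; i++) {
            offsets.putIfAbsent(keys[i], i); // assumption: keep the first offset if a key repeats
        }
        return new HashLookup(offsets);
    }

    public Integer lookup(int key) {
        return offsets.get(key);
    }
}
```

With `build(new int[] {1, 2, 3, 4, 5})` the builder picks the ascending-sequence implementation, so `lookup(3)` returns `2` and `lookup(6)` returns `null` without touching a hash table.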
Pinging @elastic/es-analytical-engine (Team:Analytics)
Nice, thanks Nik! I have some optional comments, but feel free to merge as is.
```java
"keys must have the same number of positions but [" + positions + "] != [" + keys[k].getPositionCount() + "]"
);
}
for (int p = 0; p < keys[k].getPositionCount(); p++) {
```
Maybe a quick check with `Block#mayHaveMultivaluedFields()`, then double-check every position.
👍 No need to check if it can't have it.
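A sketch of the suggested pattern: consult the cheap, conservative hint first and only scan positions when the hint says multivalued fields are possible. `KeyColumn`, `ArrayKeyColumn` and `KeyChecks` are hypothetical stand-ins for `Block`, and the assumption that the per-position loop rejects multivalued keys is ours; the diff excerpt doesn't show the loop body.

```java
/** Hypothetical stand-in for the Block methods the review comment relies on. */
interface KeyColumn {
    int getPositionCount();

    /** Number of values at a position; more than 1 means the position is multivalued. */
    int getValueCount(int position);

    /** Conservative hint: false guarantees that no position is multivalued. */
    boolean mayHaveMultivaluedFields();
}

/** Hypothetical array-backed column, only here so the check can be exercised. */
record ArrayKeyColumn(int[] valueCounts, boolean mayHaveMultivaluedFields) implements KeyColumn {
    public int getPositionCount() {
        return valueCounts.length;
    }

    public int getValueCount(int position) {
        return valueCounts[position];
    }
}

final class KeyChecks {
    private KeyChecks() {}

    /** Throws if any position holds more than one value; skips the scan when the hint rules that out. */
    static void ensureSingleValued(KeyColumn keys) {
        if (keys.mayHaveMultivaluedFields() == false) {
            return; // fast path: the column can't contain multivalued positions
        }
        for (int p = 0; p < keys.getPositionCount(); p++) {
            if (keys.getValueCount(p) > 1) {
                throw new IllegalArgumentException(
                    "multivalued keys aren't supported but position [" + p + "] has [" + keys.getValueCount(p) + "] values"
                );
            }
        }
    }
}
```

The hint can be `true` even when every position is single-valued, which is why the per-position scan is still needed in that case.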
```java
);
boolean success = false;
try {
    final int[] lastOrd = new int[] { -1 };
```
nit: maybe move `lastOrd` inside the `AddInput` and change it to an `int`?
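The one-element array exists only so an anonymous class or lambda can mutate a captured local. A sketch of the suggestion, assuming a hypothetical `AddInput` interface that receives `int[]` group ids and assuming `lastOrd` is used to reject duplicate keys (a key seen before gets an ordinal that isn't new). `UniqueKeyAddInput` is a hypothetical name, not the PR's class.

```java
/** Hypothetical callback shaped like the one the table builder passes to BlockHash. */
interface AddInput {
    void add(int positionOffset, int[] groupIds);
}

/**
 * lastOrd is a plain int field on the AddInput instead of an int[] captured from the
 * enclosing method. Assumption: a brand-new key gets a larger ordinal than any before it,
 * so an ordinal <= lastOrd means the key was already added.
 */
final class UniqueKeyAddInput implements AddInput {
    private int lastOrd = -1;

    @Override
    public void add(int positionOffset, int[] groupIds) {
        for (int i = 0; i < groupIds.length; i++) {
            int ord = groupIds[i];
            if (ord <= lastOrd) {
                throw new IllegalArgumentException("found a duplicate row at position [" + (positionOffset + i) + "]");
            }
            lastOrd = ord;
        }
    }

    int lastOrd() {
        return lastOrd;
    }
}
```

Keeping the state on the object also makes it readable after the build (via `lastOrd()`) without reaching into an array.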
```java
}

private final List<String> keys;
private final RowInTable lookup;
```
Should we call this `rowInTable` or `table` instead?
++ - old names didn't get changed.
LGTM, nice abstraction!
```java
 * Consumes {@link Page}s and looks up each row in a pre-built table, and returns the
 * offsets of each row in the table.
 */
public abstract sealed class RowInTable implements Releasable permits EmptyRowInTable, AscendingSequenceRowInTable, BlockHashRowInTable {
```
nit: the name suggests this models a row (in a table), but it really represents looking up a row.
Suggested change:

```diff
-public abstract sealed class RowInTable implements Releasable permits EmptyRowInTable, AscendingSequenceRowInTable, BlockHashRowInTable {
+public abstract sealed class RowInTableLookup implements Releasable permits EmptyRowInTable, AscendingSequenceRowInTable, BlockHashRowInTable {
```
Oh yeah, that's a better name!
I've renamed this thing like 3 times already.
```java
private IntVector lookupVector(IntVector vector) {
    try (IntVector.Builder builder = blockFactory.newIntVectorFixedBuilder(vector.getPositionCount())) {
        for (int i = 0; i < vector.getPositionCount(); i++) {
            builder.appendInt(vector.getInt(i) - min);
        }
        return builder.build();
    }
}

private IntBlock lookupBlock(IntVector vector) {
```
nit: names are a bit confusing.
Suggested change:

```diff
-private IntVector lookupVector(IntVector vector) {
+private IntVector lookupVectorInRange(IntVector vector) {
     try (IntVector.Builder builder = blockFactory.newIntVectorFixedBuilder(vector.getPositionCount())) {
         for (int i = 0; i < vector.getPositionCount(); i++) {
             builder.appendInt(vector.getInt(i) - min);
         }
         return builder.build();
     }
 }
-private IntBlock lookupBlock(IntVector vector) {
+private IntBlock lookupVector(IntVector vector) {
```
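The suggested names separate a dense fast path from a general path that can produce nulls. A sketch of that split, assuming arrays in place of `IntVector`/`IntBlock` (a `null` entry stands for a null position) and a hypothetical `allInRange` check to decide which helper applies; the PR's actual dispatch logic isn't shown in this excerpt. `AscendingSequenceSketch` is a hypothetical name.

```java
/** Hypothetical array-based model of the two helpers after the suggested rename. */
final class AscendingSequenceSketch {
    private final int min;
    private final int max;

    AscendingSequenceSketch(int min, int max) {
        this.min = min;
        this.max = max;
    }

    /** True if every key is in [min, max], so the result can't contain null positions. */
    boolean allInRange(int[] keys) {
        for (int k : keys) {
            if (k < min || k > max) {
                return false;
            }
        }
        return true;
    }

    /** Fast path (an IntVector in the PR): only valid when allInRange returned true. */
    int[] lookupVectorInRange(int[] keys) {
        int[] offsets = new int[keys.length];
        for (int i = 0; i < keys.length; i++) {
            offsets[i] = keys[i] - min;
        }
        return offsets;
    }

    /** General path (an IntBlock in the PR): keys outside [min, max] become null positions. */
    Integer[] lookupVector(int[] keys) {
        Integer[] offsets = new Integer[keys.length];
        for (int i = 0; i < keys.length; i++) {
            int k = keys[i];
            offsets[i] = k >= min && k <= max ? k - min : null;
        }
        return offsets;
    }
}
```

For the table `1, 2, 3, 4, 5`, the keys `{1, 5, 3}` take the fast path and map to `{0, 4, 2}`, while `{0, 2, 6}` needs the general path and maps to `{null, 1, null}`.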
```java
@Override
public String toString() {
    return "DirectLookup[" + min + "-" + max + "]";
```
Shouldn't the `toString` match the class name? That could be confusing during debugging. Applies in general to the classes added in this PR.
Boo. Yeah. Old `toString`.
It doesn't have to be the name of the class; here, for example, I'll call it `AscendingSequence`. But, yeah, I'll double-check them. It's what I get when I rename a bunch of stuff as I go.
```java
if (v != null) {
    values.add(v);
}
```
I'm a bit confused why a `null` value for `v` doesn't translate into a `null` added to the builder. Won't the builders get misaligned? Could it be that currently nulls don't occur in the keys?
Let me go poke the tests some more. `null` is a valid key and should get mapped to whatever row has the `null`. And you can look it up. That's how aggs work, because that's how PostgreSQL and friends work. Let me double-check it.
What I've got is actually correct, but it's quite tricky. Tricky in ways that certainly deserve a block comment. Adding one.
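The key semantic from this thread, that `null` is an ordinary key mapped to whichever row holds it, can be sketched with a `HashMap`, which permits a single `null` key. `NullableKeyTable` is hypothetical and says nothing about how the PR's builders handle `null` internally; the `null` return value here means "no matching row", which is distinct from "the row whose key is `null`".

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical single-column table where null is a normal key, like the null group in an aggregation. */
final class NullableKeyTable {
    private final Map<Integer, Integer> offsets = new HashMap<>(); // HashMap allows one null key

    NullableKeyTable(Integer[] keys) {
        for (int i = 0; i < keys.length; i++) {
            // putIfAbsent returns the existing offset if the key (possibly null) was already present
            if (offsets.putIfAbsent(keys[i], i) != null) {
                throw new IllegalArgumentException("duplicate key [" + keys[i] + "]");
            }
        }
    }

    /** Offset of the row holding key (which may itself be null), or null if no row matches. */
    Integer lookup(Integer key) {
        return offsets.get(key);
    }
}
```

Because offsets are never `null`, a `null` result from `lookup` can only mean the key is absent, so looking up `null` in a table that has a `null` row returns that row's offset.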