GH-37378: [C++] Add A Dictionary Compaction Function For DictionaryArray#37418
Conversation
|
|
| dict_used[current_index] = true; | ||
| dict_used_count++; | ||
|
|
||
| if (dict_used_count == dict_length) { |
There was a problem hiding this comment.
Checking here enables skipping the rest of the dictionary, which is good. However I think it'd also be useful to detect usage of only a slice of the dictionary. If you'd prefer not to handle that in this PR, please write a follow up issue
There was a problem hiding this comment.
Hi, @bkietz, do you mean if we find it just use only a slice of dictionay, we use slice() instead of Take? I prefer to leave it as an new issue. Since we have another PR which is wating for this PR to be merged.
There was a problem hiding this comment.
Can we just use Slice outside the Compact to handling this?
There was a problem hiding this comment.
I write a follow up issue #38247 to handle the optimization
Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
|
|
||
| type = boolean(); | ||
| dict_type = dictionary(index_type, type); | ||
|
|
There was a problem hiding this comment.
Should we add a test for invalid input type?
There was a problem hiding this comment.
No. We can't create a DictionaryArray with an invalid index type.
There was a problem hiding this comment.
Oh I see a
if (data->type->id() != Type::DICTIONARY) {
return Status::TypeError("Expected dictionary type");
}Here, but seems it cannot be called
bkietz
left a comment
There was a problem hiding this comment.
LGTM, thanks for working on this!
|
Strange. All C++ tests are canceled in CI. |
|
CI failures look unrelated. Thanks! |
|
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 73454b7. There were 4 benchmark results indicating a performance regression:
The full Conbench report has more details. It also includes information about 4 possible false positives for unstable benchmarks that are known to sometimes produce them. |
…naryArray (apache#37418) ### Rationale for this change A Dictionary Compaction Function for DictionaryArray is supported. ### What changes are included in this PR? Add a Function for Dictionary Compaction ### Are these changes tested? Yes Are there any user-facing changes? No * Closes: apache#37378 Lead-authored-by: Junming Chen <junming.chen.r@outlook.com> Co-authored-by: Ben Harkins <60872452+benibus@users.noreply.github.com> Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com> Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
…naryArray (apache#37418) ### Rationale for this change A Dictionary Compaction Function for DictionaryArray is supported. ### What changes are included in this PR? Add a Function for Dictionary Compaction ### Are these changes tested? Yes Are there any user-facing changes? No * Closes: apache#37378 Lead-authored-by: Junming Chen <junming.chen.r@outlook.com> Co-authored-by: Ben Harkins <60872452+benibus@users.noreply.github.com> Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com> Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
Rationale for this change
A Dictionary Compaction Function for DictionaryArray is supported.
What changes are included in this PR?
Add a Function for Dictionary Compaction
Are these changes tested?
Yes
Are there any user-facing changes?
No