Support decompressing concatenated gzip members (stream) #38271

@amassalha

Describe the enhancement requested

The current gzip decompression calls 'inflate' until it returns 'Z_STREAM_END' or an error, but according to the gzip (zlib) documentation, this might not be enough:

"inflate() will not automatically decode concatenated gzip members. inflate() will return Z_STREAM_END at the end of the gzip member. The state would need to be reset to continue decoding a subsequent gzip member. This must be done if there is more data after a gzip member, in order for the decompression to be compliant with the gzip standard (RFC 1952)." (https://www.zlib.net/manual.html)
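
To illustrate the behavior described in the quote, here is a minimal sketch using Python's standard `zlib` module as a stand-in for the C API. It is not Arrow's implementation, and the function name `decompress_all_members` is hypothetical. It assumes the input holds only back-to-back gzip members with no trailing padding. In C zlib terms, the loop roughly corresponds to calling `inflateReset()` after `Z_STREAM_END` whenever input bytes remain.

```python
import gzip
import zlib

# 16 + MAX_WBITS tells zlib to expect gzip framing (header and CRC32 trailer).
GZIP_WBITS = zlib.MAX_WBITS | 16


def decompress_all_members(data: bytes) -> bytes:
    """Decompress every concatenated gzip member in `data` (hypothetical helper)."""
    out = []
    while data:
        # A fresh decompressor per member plays the role of resetting the state.
        d = zlib.decompressobj(GZIP_WBITS)
        out.append(d.decompress(data))
        out.append(d.flush())
        if not d.eof:
            raise ValueError("truncated gzip member")
        # Bytes left after the member's trailer belong to the next member.
        data = d.unused_data
    return b"".join(out)


# Two gzip members written back to back, as in the attached example file.
two_members = gzip.compress(b"hello, ") + gzip.compress(b"world")

# A single decompressor stops at the end of the first member.
single = zlib.decompressobj(GZIP_WBITS)
first_only = single.decompress(two_members)
leftover = single.unused_data  # the second member, still compressed

everything = decompress_all_members(two_members)
```

In this sketch, `first_only` holds only the first member's payload, which is the symptom reported here, while `everything` holds the payload of both members.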

This PR is for supporting reading Parquet files that contain more than one gzip member (example file attached).
concatenated_gzip_members.zip

Component(s)

C++, Parquet
