Commit 803c0e2

Fix BaseSQLToGCSOperator approx_max_file_size_bytes (#25469)
* Fix BaseSQLToGCSOperator approx_max_file_size_bytes

With the parquet file_format, `tmp_file_handle.tell()` always points to the beginning of the file after the data has been saved, so it is not a good indicator of the file's current size.

Save the current file pointer position, then seek to the end of the file (`os.SEEK_END`). file_size is set to the resulting position, and the file pointer is then moved back to the saved position.

Currently, after a parquet write operation the pointer is set to 0, so simply calling `tmp_file_handle.tell()` is not sufficient to determine the current size. This sequence is added to allow file splitting when the export format is set to parquet.
1 parent d004841 commit 803c0e2

1 file changed: +8 -1 lines changed

airflow/providers/google/cloud/transfers/sql_to_gcs.py

Lines changed: 8 additions & 1 deletion
@@ -198,6 +198,8 @@ def _write_local_data_files(self, cursor):
             names in GCS, and values are file handles to local files that
             contain the data for the GCS objects.
         """
+        import os
+
         org_schema = list(map(lambda schema_tuple: schema_tuple[0], cursor.description))
         schema = [column for column in org_schema if column not in self.exclude_columns]
 
@@ -250,7 +252,12 @@ def _write_local_data_files(self, cursor):
                 tmp_file_handle.write(b'\n')
 
             # Stop if the file exceeds the file size limit.
-            if tmp_file_handle.tell() >= self.approx_max_file_size_bytes:
+            fppos = tmp_file_handle.tell()
+            tmp_file_handle.seek(0, os.SEEK_END)
+            file_size = tmp_file_handle.tell()
+            tmp_file_handle.seek(fppos, os.SEEK_SET)
+
+            if file_size >= self.approx_max_file_size_bytes:
                 file_no += 1
 
                 if self.export_format == 'parquet':
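
The save/seek/tell/restore sequence described in the commit message can be tried in isolation. The following is a minimal sketch, not part of the commit: the helper name `current_file_size` is hypothetical, and the explicit `seek(0)` only simulates a writer that leaves the pointer at the start of the file, as the commit message reports for parquet writes.

```python
import os
import tempfile


def current_file_size(handle):
    """Return the size in bytes of an open binary file, leaving its pointer where it was."""
    saved_pos = handle.tell()            # remember the caller's position
    handle.seek(0, os.SEEK_END)          # move to the end of the file
    size = handle.tell()                 # offset at the end equals the size in bytes
    handle.seek(saved_pos, os.SEEK_SET)  # restore the original position
    return size


with tempfile.TemporaryFile() as tmp_file_handle:
    tmp_file_handle.write(b"x" * 1024)
    tmp_file_handle.seek(0)  # assumption: stands in for a writer that rewinds the pointer

    # tell() alone reports 0 here, even though 1024 bytes were written.
    assert tmp_file_handle.tell() == 0
    assert current_file_size(tmp_file_handle) == 1024
    # The pointer is back where it started, so later writes are unaffected.
    assert tmp_file_handle.tell() == 0
```

Restoring the saved position is what keeps this check safe for the other export formats, where the pointer already sits at the end of the file and subsequent writes must continue from there.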
