Add GizmoSQL support to @malloydata/db-duckdb#2582
Conversation
89d74fd to
274cf31
Compare
Adds GizmoSQL connection support following the established pattern for DuckDB variants (similar to DuckDBWasmConnection). Key features: - GizmoSQLConnection class extending DuckDBCommon - Python ADBC bridge for Arrow Flight SQL protocol communication - SQL injection protection for catalog identifiers - Efficient data transfer via Apache Arrow IPC format Architecture: GizmoSQL is DuckDB served over Arrow Flight SQL with the same SQL dialect. Following the project's convention, this implementation lives in db-duckdb alongside other DuckDB transport mechanisms. Dependencies: - Python: pyarrow, adbc-driver-flightsql - Node: apache-arrow (already a dependency) Signed-off-by: Schyler Ruhland <schyler@summation.com>
Adds integration tests following the repository's standard pattern: - Uses describeIfDatabaseAvailable for conditional test execution - Tests connection, queries, streaming, schema introspection - Tests error handling and edge cases - Automatically skipped in CI when credentials unavailable All 8 tests pass with real GizmoSQL server. Signed-off-by: Schyler Ruhland <schyler@summation.com>
dc732c4 to
d71a27e
Compare
| rowLimit ??= defaultOptions.rowLimit; | ||
| await this.setup(); | ||
|
|
||
| const statements = sql.split('-- hack: split on this'); |
There was a problem hiding this comment.
Can you remove the -- hack: split on this delimiter. I believe this isn’t a valid or stable way to split SQL statements and GizmoSQL doesn’t define or rely on this marker anywhere
Comment based splitting is risky
There was a problem hiding this comment.
Can do but fyi this was borrowed from duckdb_*.ts implementations
There was a problem hiding this comment.
Ah I see. Then it's fine :)
| const pythonPath = this.config.pythonPath || 'python3'; | ||
| const bridgeScript = join(__dirname, '../python/gizmosql_bridge.py'); | ||
|
|
||
| return new Promise<Table>((resolve, reject) => { |
There was a problem hiding this comment.
I believe that the Gizmo has no default query timeout. Can you consider adding a timeout to prevent hanging on long-running queries?
Gizmo recently added a timeout arg (https://www.linkedin.com/posts/gizmodata_release-v1129-gizmodatagizmosql-activity-7394010812920582145-EIwM/)
| stdout += data.toString(); | ||
| }); | ||
|
|
||
| python.stderr.on('data', (data) => { |
There was a problem hiding this comment.
Are only the errors written here? Or even the warnings too?
If there's a chance of warnings being written into stdder then what about the scenario when python may write warnings to stderr even on success?
| const uri = process.env['GIZMOSQL_URI']; | ||
| const username = process.env['GIZMOSQL_USERNAME']; | ||
| const password = process.env['GIZMOSQL_PASSWORD']; | ||
| const catalog = process.env['GIZMOSQL_CATALOG'] || 'main'; |
There was a problem hiding this comment.
Can u remove 'main' from here and update the Error message below?
There was a problem hiding this comment.
I created an account and was able to run the follwoing query: SELECT * FROM customer LIMIT 10;
Here's what I found in the Network Tab:
{
"action": "query",
"sql": "SELECT * FROM customer LIMIT 10;",
"parameters": []
}
I didn't have any catalog specified. I believe catalog is optional
Here's another query:
{
"action": "query",
"sql": "\n SELECT column_name, data_type, is_nullable\n FROM information_schema.columns\n WHERE table_schema = ?\n AND table_name = ?\n ORDER BY ordinal_position\n ",
"parameters": [
"main",
"nation"
]
}
|
|
||
| it('should execute query with multiple rows', async () => { | ||
| const result = await connection.runSQL( | ||
| 'SELECT * FROM (VALUES (1, \'a\'), (2, \'b\'), (3, \'c\')) AS t(id, name)' |
| this.uri = options.gizmosqlUri; | ||
| this.username = options.gizmosqlUsername; | ||
| this.password = options.gizmosqlPassword; | ||
| this.catalog = options.gizmosqlCatalog || 'main'; |
Summary
Adds GizmoSQL connection support to the
@malloydata/db-duckdbpackage, following the established pattern for DuckDB variants (similar to MotherDuck and WASM).Key changes:
GizmoSQLConnectionclass extendingDuckDBCommonArchitecture
GizmoSQL is DuckDB served over Arrow Flight SQL with the same SQL dialect. Following the project's convention, this implementation belongs in
db-duckdbalongside other DuckDB transport mechanisms:Implementation Details
Connection Layer:
DuckDBCommonfor shared dialect and schema introspectionSecurity:
Dependencies:
pyarrow,adbc-driver-flightsqlapache-arrow(already a dependency)Testing
Connection tested successfully with: