JayDeBeApiArrow

A high-performance Python DB-API 2.0 bridge to databases using Java JDBC drivers, accelerated with Apache Arrow.

JayDeBeApiArrow converts JDBC result sets to Arrow record batches directly within the JVM, then streams them to Python via the Arrow C Data Interface, achieving up to a 24x speedup over the original jaydebeapi for large datasets.

This is a fork of JayDeBeApi.

Quick Start

pip install JayDeBeApiArrow

import jaydebeapiarrow

conn = jaydebeapiarrow.connect(
    "org.postgresql.Driver",
    "jdbc:postgresql://localhost:5432/mydb",
    ["user", "password"],
    "/path/to/pgjdbc.jar"
)

with conn.cursor() as curs:
    curs.execute("SELECT * FROM large_table")
    table = curs.fetch_arrow_table()  # zero-copy Arrow table
    df = table.to_pandas()            # or convert to pandas

conn.close()

Key Features

DB-API 2.0 compliant - drop-in replacement for any jaydebeapi-based code
Apache Arrow fast path - fetch_arrow_table(), fetch_arrow_batches(), fetch_df()
Native Python types - datetime, Decimal, bytes instead of strings
Works with any JDBC driver - PostgreSQL, MySQL, Oracle, SQL Server, SQLite, DB2, Teradata, and more
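
Because the interface follows DB-API 2.0, helper code written against the generic connection and cursor methods should not need to know which driver sits underneath. The sketch below illustrates that idea. It uses the standard library's sqlite3 module as a stand-in connection so that it runs without a JVM or JDBC driver, and the helper name fetch_rows_as_dicts is hypothetical, not part of JayDeBeApiArrow.

```python
import sqlite3
from contextlib import closing


def fetch_rows_as_dicts(conn, sql, params=()):
    """Run a query on a DB-API 2.0 connection and return the rows as dicts.

    Hypothetical helper. It relies only on names defined by PEP 249:
    conn.cursor(), cursor.execute(), cursor.description, cursor.fetchall()
    and cursor.close().
    """
    with closing(conn.cursor()) as curs:
        curs.execute(sql, params)
        # description is a sequence of 7-item tuples; item 0 is the column name.
        columns = [col[0] for col in curs.description]
        return [dict(zip(columns, row)) for row in curs.fetchall()]


# Stand-in connection: sqlite3 is used here only so the sketch runs as-is.
conn = sqlite3.connect(":memory:")
with closing(conn.cursor()) as setup:
    setup.execute("CREATE TABLE t (id INTEGER, name TEXT)")
    setup.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b")])

rows = fetch_rows_as_dicts(conn, "SELECT id, name FROM t WHERE id >= ? ORDER BY id", (1,))
conn.close()
```

Under the drop-in assumption, the same helper would receive the connection returned by jaydebeapiarrow.connect(...) in place of the sqlite3 one. For large result sets, the Arrow methods listed above remain the faster path, since this row-by-row helper does not use them.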

Sections

Design - architecture and data flow
Usage - connection, cursors, Arrow API, parameter binding
Data Mapping - JDBC type mappings and known limitations
Benchmarks - performance results and methodology
Differences - changes from the parent JayDeBeApi project