Important
KDB-X General Availability (GA) Now Live
The KDB-X Public Preview period has ended. Please note that this Public Preview website is no longer updated. Visit the new KDB-X GA site for the latest documentation, downloads, and updates:
Go to the KDB-X GA site
Parquet Overview in KDB-X
This page provides a high-level overview of Parquet support in KDB-X. It explains why Parquet is valuable within KDB-X, outlines the main benefits, and describes common scenarios where it applies. Use this page as a starting point before diving into detailed concepts, architecture, and limitations in the Introduction.
Parquet is a columnar storage format designed for efficient storage and retrieval.
KDB-X supports reading and writing Parquet files through the pq module,
making it easy to query large datasets and interoperate with other data platforms.
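As a rough sketch of that read/write workflow: the function names `.pq.write` and `.pq.read` below are assumptions inferred from the module name, not the confirmed API, so check the pq module reference for the exact calls.

```q
/ hypothetical sketch -- .pq.write and .pq.read are assumed names;
/ consult the pq module reference for the actual API
t:([] sym:`AAPL`MSFT`GOOG; price:101.5 202.3 303.1)

/ write the in-memory table to a Parquet file
.pq.write[`:trades.parquet; t]

/ read it back and query it with ordinary qSQL
t2:.pq.read `:trades.parquet
select from t2 where price>200
```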
Why Parquet with KDB-X?
- Fast analytics at scale: Query large Parquet datasets efficiently with row group pruning and virtual tables.
- Interoperability: Exchange data seamlessly with ecosystems like Spark, Pandas, Hive, or Arrow.
- Reduced storage costs: Take advantage of columnar compression (for example, snappy, zstd) while keeping data queryable.
- Seamless integration: Use q or SQL queries directly on Parquet files, alongside in-memory or partitioned tables.
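To illustrate the compression point above, a write call might accept a codec option. This is a sketch under assumptions: both the `.pq.write` name and the `compression` option key are hypothetical, so verify them against the pq module reference.

```q
/ hypothetical sketch -- .pq.write and the `compression option key are
/ assumed names; check the pq module reference for the real signature
t:([] ts:.z.p+til 3; px:100.1 100.2 100.3)

/ write with a columnar codec to reduce on-disk size
.pq.write[`:prices.parquet; t; enlist[`compression]!enlist `zstd]
```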
Use cases
- Data interchange: Share Parquet datasets between KDB-X and tools like Spark, Pandas, and Hive.
- Efficient analytics: Run SQL or q queries directly against Parquet files with row group pruning.
- Archival storage: Keep large historical datasets compressed but queryable.
- Hybrid queries: Join or aggregate across in-memory tables and Parquet-backed virtual tables in one query.
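To sketch the hybrid-query idea, the snippet below joins an ordinary in-memory table against a Parquet-backed virtual table in one q expression. The `.pq.vt` name is an assumption introduced here for illustration only; see the pq module documentation for the actual call that maps a Parquet file as a virtual table.

```q
/ hypothetical sketch -- .pq.vt (map a Parquet file as a virtual table)
/ is an assumed name; see the pq module docs for the actual call
quotes:.pq.vt `:quotes.parquet            / Parquet-backed virtual table
trades:([] sym:`AAPL`MSFT; size:100 200)  / ordinary in-memory table

/ left join the latest bid per symbol onto the in-memory trades
trades lj select last bid by sym from quotes
```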
Next steps
Check out the Quickstart guide for more details on how to get started with Parquet in KDB-X.