Important

KDB-X General Availability (GA) Now Live

The KDB-X Public Preview period has ended. Please note that this Public Preview website is no longer updated. Visit the new KDB-X GA site for the latest documentation, downloads, and updates:

Go to the KDB-X GA site

Parquet Overview in KDB-X

This page provides a high-level overview of Parquet support in KDB-X. It explains why Parquet is valuable within KDB-X, outlines the main benefits, and describes common scenarios where it applies. Use this page as a starting point before diving into detailed concepts, architecture, and limitations in the Introduction.

Parquet is a columnar storage format designed for efficient storage and retrieval. KDB-X supports reading and writing Parquet files through the pq module, making it easy to query large datasets and interoperate with other data platforms.
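As a rough illustration of the read/write round trip described above, the sketch below writes an in-memory table to a Parquet file and reads it back. The function names `.pq.write` and `.pq.read` are assumptions for illustration only; consult the pq module reference for the exact API.

```q
// Hypothetical sketch: .pq.write / .pq.read are assumed names,
// not confirmed pq module functions.
trades:([] time:3#.z.p; sym:`AAPL`MSFT`GOOG; price:190.1 410.5 2805.3)

// Write the in-memory table out as a Parquet file
.pq.write["trades.parquet"; trades]

// Read the Parquet file back as a q table
t:.pq.read["trades.parquet"]
```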

Why Parquet with KDB-X?

  • Fast analytics at scale: Query large Parquet datasets efficiently with row group pruning and virtual tables.
  • Interoperability: Exchange data seamlessly with ecosystems like Spark, Pandas, Hive, or Arrow.
  • Reduced storage costs: Take advantage of columnar compression (for example, snappy, zstd) while keeping data queryable.
  • Seamless integration: Use q or SQL queries directly on Parquet files, alongside in-memory or partitioned tables.
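To give a flavour of the last point, a qSQL query can run against a table backed by a Parquet file just as it would against an in-memory table. Again, `.pq.read` is a hypothetical name used for illustration:

```q
// Assumed reader name for illustration; see the pq module docs for the real API
t:.pq.read["trades.parquet"]

// Standard qSQL aggregation over the Parquet-backed table
select avg price by sym from t
```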

Use cases

  • Data interchange: Share Parquet datasets between KDB-X and tools like Spark, Pandas, and Hive.
  • Efficient analytics: Run SQL or q queries directly against Parquet files with row group pruning.
  • Archival storage: Keep large historical datasets compressed but queryable.
  • Hybrid queries: Join or aggregate across in-memory tables and Parquet-backed virtual tables in one query.
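The hybrid-query case above might look like the following sketch, which joins an in-memory reference table against a Parquet-backed table in a single query (`.pq.read` remains an assumed, illustrative name):

```q
// In-memory keyed reference table
ref:([sym:`AAPL`MSFT] sector:`tech`tech)

// Parquet-backed table (hypothetical reader name)
t:.pq.read["trades.parquet"]

// Left-join the reference data, then aggregate across both sources
select avg price by sector from t lj ref
```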

Next steps

Check out the Quickstart guide for more details on how to get started with Parquet in KDB-X: