Databases in PyKX

This page explains the concept of databases in PyKX, including the creation and management of databases.

What's a PyKX database?

In PyKX, the term database refers to a kdb+ database which can hold a set of splayed and partitioned tables.

Splayed tables

A splayed kdb+ database consists of a single table stored on disk, with each column saved as a separate file rather than the whole table in a single file. Medium-sized tables (fewer than 100 million rows) with many columns are good candidates for splaying, particularly when queries often access only a small subset of the columns.

quotes
 ├── .d
 ├── price
 ├── sym
 └── time
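
As a rough illustration of the layout above, the following sketch lists the column files of a splayed table using only the Python standard library. The helper name `splayed_columns` is hypothetical and is not part of PyKX, and the demo directory holds empty placeholder files rather than real kdb+ data. The listing is purely name-based, so treat it as a way to inspect a directory, not as a way to read a table.

```python
from pathlib import Path
import tempfile


def splayed_columns(table_dir):
    """Return the names of the column files in a splayed table directory.

    Hypothetical helper: the .d file is skipped because it records the
    column order rather than holding column data.
    """
    return sorted(
        p.name
        for p in Path(table_dir).iterdir()
        if p.is_file() and p.name != ".d"
    )


# Build an empty stand-in for the "quotes" layout shown above.
# These are placeholder files only, not a loadable kdb+ table.
root = Path(tempfile.mkdtemp())
quotes = root / "quotes"
quotes.mkdir()
for name in (".d", "price", "sym", "time"):
    (quotes / name).touch()

columns = splayed_columns(quotes)
print(columns)
```

Because each column is its own file, a query that touches only `price` and `time` needs to read just those two files.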

Note

The splayed database format used by PyKX has been used in production environments for decades. As such, there is a significant amount of information available on creating and using these databases. Below are some articles.

Partitioned databases

A partitioned kdb+ database consists of one or more tables saved on disk and split into separate folders called partitions. Partitions are most often based on a temporal field in the dataset, such as date or month. Every table in the database must follow the same partition structure.

A database containing two tables (trades and quotes) partitioned by date looks as follows, where price, sym and time in the quotes folder are columns of that table:

db
├── 2020.10.04
│   ├── quotes
│   │   ├── .d
│   │   ├── price
│   │   ├── sym
│   │   └── time
│   └── trades
│       ├── .d
│       ├── price
│       ├── sym
│       ├── time
│       └── vol
├── 2020.10.06
│   ├── quotes
..
└── sym
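
The structure above can be explored with a short standard-library sketch that maps each date partition to its tables and column files. The function `describe_partitioned_db` is a hypothetical helper, not a PyKX API. The demo tree uses empty placeholder files, and it assumes that the 2020.10.06 partition (elided in the listing) holds the same two tables as 2020.10.04.

```python
from datetime import datetime
from pathlib import Path
import tempfile


def describe_partitioned_db(db_dir):
    """Map each date partition to its tables and their column files.

    Hypothetical helper: only directory and file names are inspected.
    """
    layout = {}
    for part in sorted(Path(db_dir).iterdir()):
        if not part.is_dir():
            continue  # skips files such as the top-level sym file
        try:
            datetime.strptime(part.name, "%Y.%m.%d")
        except ValueError:
            continue  # folder name is not a date partition
        layout[part.name] = {
            table.name: sorted(c.name for c in table.iterdir() if c.name != ".d")
            for table in sorted(part.iterdir())
            if table.is_dir()
        }
    return layout


# Build an empty stand-in for the tree shown above (placeholder files only).
db = Path(tempfile.mkdtemp()) / "db"
spec = {
    "quotes": ["price", "sym", "time"],
    "trades": ["price", "sym", "time", "vol"],
}
for date in ("2020.10.04", "2020.10.06"):
    for table, cols in spec.items():
        folder = db / date / table
        folder.mkdir(parents=True)
        for name in [".d", *cols]:
            (folder / name).touch()
(db / "sym").touch()

layout = describe_partitioned_db(db)

# Every partition should expose the same set of tables.
table_sets = {frozenset(tables) for tables in layout.values()}
print(layout)
print("consistent partitions:", len(table_sets) == 1)
```

The final check mirrors the rule that every table must follow the same partition structure: if one partition were missing a table, `table_sets` would contain more than one entry.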

Note

The partitioned database format used by PyKX has been used in production environments for decades in many of the world's best-performing tier-1 investment banks. Today, there is a significant amount of information available on the creation and maintenance of these databases. Below are some articles related to their creation and querying.

How to use databases in PyKX

Creating and managing databases is crucial for handling large amounts of data. The pykx.DB module makes these tasks simpler, more Pythonic, and more user-friendly.

The PyKX Database API supports the following operations:

Operation   Description
Generate    Learn how to generate a new historical database using data from Python/q and expand it over time.
Load        Learn how to load existing databases and fix some common issues with databases.
Manage      Copy, change datatypes or names of columns, apply functions to columns, delete columns from a table, rename tables and backfill data.

Check out a full breakdown of the database API.

Next steps