Databases in PyKX
This page explains the concept of databases in PyKX, including how to create and manage them.
What's a PyKX database?
In PyKX, the term "database" refers to a kdb+ database, which can hold splayed and partitioned tables.
Splayed tables
A splayed kdb+ database consists of a single table stored on disk, with each column saved as a separate file rather than the whole table being stored in one file. Medium-sized tables (fewer than 100 million rows) with many columns are good candidates for splaying, particularly when only a small subset of columns is accessed frequently.
```bash
quotes
├── .d
├── price
├── sym
└── time
```
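To make this layout concrete, here is a minimal, hypothetical Python sketch that uses only the standard library, not PyKX. It mimics a splayed table with one file per column plus a `.d` file that records column order. Columns are stored as JSON purely for illustration. Real kdb+ column files use kdb+'s binary format and enumerate `sym` columns, and this sketch reproduces neither. The functions `write_splayed` and `read_columns` are illustrative names and are not part of any library.

```python
import json
import os
import tempfile


def write_splayed(root, name, table):
    """Write a dict of column lists as a toy splayed table: one file per column."""
    table_dir = os.path.join(root, name)
    os.makedirs(table_dir, exist_ok=True)
    # '.d' records the column order, loosely mirroring the .d file in kdb+
    with open(os.path.join(table_dir, ".d"), "w") as f:
        json.dump(list(table), f)
    # Each column is written to its own file named after the column
    for col, values in table.items():
        with open(os.path.join(table_dir, col), "w") as f:
            json.dump(values, f)
    return table_dir


def read_columns(table_dir, columns):
    """Load only the requested columns; other column files are never opened."""
    out = {}
    for col in columns:
        with open(os.path.join(table_dir, col)) as f:
            out[col] = json.load(f)
    return out


root = tempfile.mkdtemp()
quotes = {
    "time": ["09:30:00", "09:30:01", "09:30:02"],
    "sym": ["AAPL", "MSFT", "AAPL"],
    "price": [115.1, 210.4, 115.2],
}
table_dir = write_splayed(root, "quotes", quotes)
print(sorted(os.listdir(table_dir)))  # ['.d', 'price', 'sym', 'time']
print(read_columns(table_dir, ["price"]))  # {'price': [115.1, 210.4, 115.2]}
```

Because each column lives in its own file, reading only `price` never touches the `time` or `sym` files. This is why splaying pays off when queries use a few of many columns.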
Note
The splayed database format used by PyKX has been used in production environments for decades. As a result, a significant amount of information is available on creating and using these databases. Below are some articles.
Partitioned database
A partitioned kdb+ database consists of one or more tables saved on-disk, where they are split into separate folders called partitions. These partitions are most often based on a temporal field within the dataset, such as date or month. Each table within the database must follow the same partition structure.
A visual representation of a database containing two tables (trades and quotes) partitioned by date would be as follows, where price, sym, and time in the quotes folder are columns within the table:
```bash
db
├── 2020.10.04
│   ├── quotes
│   │   ├── .d
│   │   ├── price
│   │   ├── sym
│   │   └── time
│   └── trades
│       ├── .d
│       ├── price
│       ├── sym
│       ├── time
│       └── vol
├── 2020.10.06
│   ├── quotes
..
└── sym
```
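The following hypothetical sketch applies the same idea to a date-partitioned layout. Rows are grouped by their `date` value, and each group is written as a toy splayed table under `<root>/<date>/<table>/`. As in kdb+, the partition column is not stored as a file because the folder name supplies its value. The JSON encoding, the function names, and the sample rows are assumptions made for illustration. The top-level `sym` enumeration file is omitted.

```python
import json
import os
import tempfile
from collections import defaultdict


def write_partitioned(root, name, rows, partition_col):
    """Group rows by partition_col and write each group as a toy splayed table
    under <root>/<partition value>/<name>/."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_col]].append(row)
    for part, part_rows in groups.items():
        table_dir = os.path.join(root, part, name)
        os.makedirs(table_dir, exist_ok=True)
        # The partition column is not written: the folder name supplies its value
        columns = [c for c in part_rows[0] if c != partition_col]
        with open(os.path.join(table_dir, ".d"), "w") as f:
            json.dump(columns, f)
        for col in columns:
            with open(os.path.join(table_dir, col), "w") as f:
                json.dump([r[col] for r in part_rows], f)
    return sorted(groups)


def read_column(root, name, part, col):
    """Read one column from one partition; other partitions are not touched."""
    with open(os.path.join(root, part, name, col)) as f:
        return json.load(f)


root = tempfile.mkdtemp()
trades = [
    {"date": "2020.10.04", "time": "09:30:00", "sym": "AAPL", "price": 115.1, "vol": 100},
    {"date": "2020.10.04", "time": "09:31:00", "sym": "MSFT", "price": 210.4, "vol": 50},
    {"date": "2020.10.06", "time": "09:30:00", "sym": "AAPL", "price": 116.0, "vol": 200},
]
parts = write_partitioned(root, "trades", trades, "date")
print(parts)  # ['2020.10.04', '2020.10.06']
print(sorted(os.listdir(os.path.join(root, "2020.10.04", "trades"))))
# ['.d', 'price', 'sym', 'time', 'vol']
print(read_column(root, "trades", "2020.10.06", "price"))  # [116.0]
```

A query restricted to one date only needs to open files under that date's folder. This is the main reason a temporal field is a natural partitioning key for time-series data.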
Note
The partitioned database format used by PyKX has been used in production environments for decades in many of the world's best-performing tier-1 investment banks. Today, there is a significant amount of information available on the creation and maintenance of these databases. Below are some articles related to their creation and querying.
How to use databases in PyKX
Creating and managing databases is crucial for handling large amounts of data. The pykx.DB class makes these tasks easier, more Pythonic, and more user-friendly.
The PyKX database API supports the following operations:
| Operation | Description |
|---|---|
| Create a database | Learn how to generate a new historical database using data from Python/q and expand it over time. |
| Load a database | Learn how to load existing databases and fix some common issues with them. |
| Manage a database | Copy columns, change column datatypes or names, apply functions to columns, delete columns from a table, rename tables, and backfill data. |
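Many of these management operations amount to applying the same change in every partition. As a hypothetical illustration only, the sketch below renames a column in a toy partitioned layout. It renames the column file and updates the `.d` column list in each partition folder. It does not use PyKX and does not reflect how the pykx.DB class is implemented. The function name `rename_column`, the JSON `.d` format, and the sample layout are assumptions.

```python
import json
import os
import tempfile


def rename_column(root, name, old, new):
    """Rename column `old` to `new` in every partition of table `name`
    by renaming the column file and updating the column list in .d."""
    renamed = 0
    for part in sorted(os.listdir(root)):
        table_dir = os.path.join(root, part, name)
        if not os.path.isdir(table_dir):
            continue  # skip top-level files and partitions without this table
        d_path = os.path.join(table_dir, ".d")
        with open(d_path) as f:
            columns = json.load(f)
        if old not in columns:
            continue  # this partition has no such column
        os.rename(os.path.join(table_dir, old), os.path.join(table_dir, new))
        columns[columns.index(old)] = new
        with open(d_path, "w") as f:
            json.dump(columns, f)
        renamed += 1
    return renamed


# Build a small toy layout with two date partitions of a 'trades' table
root = tempfile.mkdtemp()
for part in ["2020.10.04", "2020.10.06"]:
    table_dir = os.path.join(root, part, "trades")
    os.makedirs(table_dir)
    with open(os.path.join(table_dir, ".d"), "w") as f:
        json.dump(["time", "sym", "price", "vol"], f)
    for col in ["time", "sym", "price", "vol"]:
        with open(os.path.join(table_dir, col), "w") as f:
            json.dump([], f)

count = rename_column(root, "trades", "vol", "size")
print(count)  # 2
```

The sketch keeps column order intact by replacing the name in place within `.d`. It skips partitions that lack the table or the column instead of failing.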
Check out a full breakdown of the database API.
Next steps
- Learn how to create or update a database.
- Learn how to load a database.