Generate and extend a database

This page explains how to create and extend databases using PyKX.

Tip

For the best experience, start by reading about Databases in PyKX. If you already have access to a database and only need to load it, you can skip this page and jump right to Load database.

Before leveraging the performance of PyKX when querying on-disk data, you need to create a persisted databaseClosed A persisted database (or on-disk database) stores data on non-volatile storage like a hard drive or SSD, ensuring the data remains intact even after the application is closed or the system is restarted. In contrast, in-memory databases store data in RAM and lose all data when the system is powered down. Persisted databases are crucial for applications needing long-term data storage and reliability, such as financial systems, customer databases, and many web applications.. In the following sections we complete the following:

  1. Create a new database containing a single table trade and multiple days of data.
  2. Add a new day worth of data for today to the database for the trade table.
  3. On-board a new table (quote) which contains data from today.
  4. Ensure that the new table is queryable.

Note

The below example makes use of randomly-generated data using PyKX, where we use trade or quote tables generated in that manner. You can replace them with an equivalent Pandas/PyArrow table which will be converted to a PyKX table before being persisted.

1. Create database

For more information on database structures, see the linked section on what is a database. With PyKX, use the pykx.DB class for all database interactions in Python. This class lets you create, expand, and maintain on-disk splayed/partitioned databases. First, we need to create a database.

In the next cell, we create a trade table with data from multiple days in the chat.

Python

Copy
>>> import pykx as kx
>>> N = 10000000
>>> trade = kx.Table(data={
...     'date': kx.random.random(N, kx.DateAtom('today') - [1, 2, 3, 4]),
...     'time': kx.q.asc(kx.random.random(N, kx.q('1D'))),
...     'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
...     'price': kx.random.random(N, 10.0)
...     })

Now that we have generated our trade table, we can persist it to disk at the location /tmp/db.

Partitioned database

Splayed database

Python

Copy
>>> db = kx.DB(path='/tmp/db')
>>> db.create(trade, 'trade', 'date')

Python

Copy
>>> db = kx.DB(path='/tmp/splay')
>>> db.create(trade, 'trade', format='splayed')>>> db = kx.DB(path='/tmp/db')
>>> db.create(trade, 'trade', 'date')

That's it, you now have a persisted database. To verify the availability of the database and its tables, we can examine the database object:

Partitioned database

Splayed database

Python

Copy
>>> db.tables
['trade']
>>> type(db.trade)
<class 'pykx.wrappers.PartitionedTable'>

Python

Copy
>>> db.tables
['trade']
>>> type(db.trade)
<class 'pykx.wrappers.SplayedTable'>

The above database persistence uses the default parameters within the create function. If you need to compress/encrypt the tables persisted to the database or need to define a by or specify the symbol enumeration name when persisting a Partitioned Database, you can follow the API documentation here.

2. Add new database partition

The following section outlines functionality only applicable to Partitioned Databases.

Now that you have generated a database, you can add extra partitions using the same database class and the create function. In this example we will add new data for the current day created in the below cell and convert it to a Pandas DataFrame prior to persistence:

Python

Copy
>>> N = 2000000
>>> trade = kx.Table(data={
...     'time': kx.q.asc(kx.random.random(N, kx.q('1D'))),
...     'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
...     'price': kx.random.random(N, 10.0)
...     }).pd()
>>> type(trade)
pandas.core.frame.DataFrame

Note that in comparison to the original database creation logic, we do not have a date column. Instead, we add a date at partition creation. Below we provide a variety of examples of adding new partitions under various conditions:

Generate default partition

Compress data in a partition

Encrypt persisted data

Python

Copy
>>> db.create(trade, 'trade', kx.DateAtom('today'))

In the below example, we compress data within the persisted partition using gzip. For further details on supported compression formats see here or look at the API reference here.

Python

Copy
>>> db.create(trade, 'trade', kx.DateAtom('today'))

In the below example, we encrypt the data persisted for the added partition. For further details on how encryption works within PyKX see here or look at the API reference here.

Python

Copy
>>> encrypt = kx.Encrypt('/path/to/mykey.key', 'mySuperSecretPassword')
>>> db.create(trade, 'trade', kx.DateAtom('today'), encrypt=encrypt)

3. Add new table to database

After onboarding your first table to a database, a common question is “How can I add a new table of related data?”. You can use the DB class and the create function to do this. For instance, let’s add a quote table for the current day:

Python

Copy
>>> N = 1000000
>>> quote = kx.Table(data={
...     'time': kx.q.asc(kx.random.random(N, kx.q('1D'))),
...     'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
...     'ask': kx.random.random(N, 100),
...     'bid': kx.random.random(N, 100)
... })

We can now add this data to your database.

Partitioned database

Splayed database

For the current day we can add the python quote table and see that the table is defined:

Python

Copy
For the current day we can add the python quote table and see that the table is defined:

>>> db.create(quote, 'quote', kx.DateAtom('today'))
>>> db.tables
['quote', 'trade']
>>> type(db.quote)
<class 'pykx.wrappers.PartitionedTable'>

Add the table quote to the database

Python

Copy
>>> db.create(quote, 'quote', format='splayed')
>>> db.tables
['trade', 'quote']
>>> type(db.quote)
<class 'pykx.wrappers.SplayedTable'>

4. Ensure new table is queryable

The following section outlines a restriction only applicable to Partitioned Databases, Splayed Databases should be queryable immediately.

You have now persisted another table to your database, however, you will notice if you access the quote table that the return is surprising:

Python

Copy
>>> db.quote
pykx.PartitionedTable(pykx.q('+`time`sym`ask`bid!`quote'))

The reason for this is that you currently do not have data in each partition of your database for the quote table. To rectify this, run the fill_database method off the database class which adds relevant empty quote data to tables to the partitions from which it's missing:

Python

Copy
>>> db.fill_database()

Now you should be able to access the quote data for query:

Python

Copy
>>> db.quote

Next Steps