Generate and extend a database
This page explains how to create and extend databases using PyKX.
Tip
For the best experience, start by reading about Databases in PyKX. If you already have access to a database and only need to load it, you can skip this page and jump right to Load database.
Before leveraging the performance of PyKX when querying on-disk data, you need to create a persisted database A persisted database (or on-disk database) stores data on non-volatile storage like a hard drive or SSD, ensuring the data remains intact even after the application is closed or the system is restarted. In contrast, in-memory databases store data in RAM and lose all data when the system is powered down. Persisted databases are crucial for applications needing long-term data storage and reliability, such as financial systems, customer databases, and many web applications.. In the following sections we complete the following:
- Create a new database containing a single table
trade
and multiple days of data. - Add a new day worth of data for
today
to the database for thetrade
table. - On-board a new table (
quote
) which contains data fromtoday
. - Ensure that the new table is queryable.
Note
The below example makes use of randomly-generated data using PyKX, where we use trade or quote tables generated in that manner. You can replace them with an equivalent Pandas/PyArrow table which will be converted to a PyKX table before being persisted.
1. Create database
In the next cell, we create a trade
table with data from multiple days in the chat.
Python
>>> import pykx as kx
>>> N = 10000000
>>> trade = kx.Table(data={
... 'date': kx.random.random(N, kx.DateAtom('today') - [1, 2, 3, 4]),
... 'time': kx.q.asc(kx.random.random(N, kx.q('1D'))),
... 'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
... 'price': kx.random.random(N, 10.0)
... })
Now that we have generated our trade table, we can persist it to disk at the location /tmp/db
.
That's it, you now have a persisted database. To verify the availability of the database and its tables, we can examine the database object:
2. Add new database partition
Note that in comparison to the original database creation logic, we do not have a date
column. Instead, we add a date at partition creation. Below we provide a variety of examples of adding new partitions under various conditions:
Generate default partition
Compress data in a partition
Encrypt persisted data
Python
>>> db.create(trade, 'trade', kx.DateAtom('today'))
In the below example, we compress data within the persisted partition using gzip. For further details on supported compression formats see here or look at the API reference here.
Python
>>> db.create(trade, 'trade', kx.DateAtom('today'))
In the below example, we encrypt the data persisted for the added partition. For further details on how encryption works within PyKX see here or look at the API reference here.
Python
>>> encrypt = kx.Encrypt('/path/to/mykey.key', 'mySuperSecretPassword')
>>> db.create(trade, 'trade', kx.DateAtom('today'), encrypt=encrypt)
3. Add new table to database
Python
>>> N = 1000000
>>> quote = kx.Table(data={
... 'time': kx.q.asc(kx.random.random(N, kx.q('1D'))),
... 'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
... 'ask': kx.random.random(N, 100),
... 'bid': kx.random.random(N, 100)
... })
4. Ensure new table is queryable
You have now persisted another table to your database, however, you will notice if you access the quote
table that the return is surprising:
Python
>>> db.quote
pykx.PartitionedTable(pykx.q('+`time`sym`ask`bid!`quote'))
The reason for this is that you currently do not have data in each partition of your database for the quote
table. To rectify this, run the fill_database
method off the database
class which adds relevant empty quote data to tables to the partitions from which it's missing:
Python
>>> db.fill_database()
Now you should be able to access the quote
data for query:
Python
>>> db.quote
Next Steps
- Load an existing database.
- Modify the contents of your database
- Query your database
- Compress/encrypt data for persisting database partitions.