Performance Tuning
This page explains how to efficiently query your data using PyKX.
qSQL
The whitepapers detailed below outline optimizations which can be useful for qSQL queries. However, the core lessons/concepts which apply in the q query case also apply to using the Pythonic query API and SQL modes:
The following provides a tangible example of two impactful optimizations.
Parameter ordering
Assume we have a historical database generated using the functionality here partitioned on date. The query being performed will query for all data in the database based on date and symbol. The following queries align to those completed here
Optimal query
Non-Optimal query
Python
trade.select(where = (kx.Column('date') == kx.DateAtom(2020, 1, 1)) &
(kx.Column('sym') == 'IBM'))
Python
trade.select(where = (kx.Column('sym') == 'IBM') &
(kx.Column('date') == kx.DateAtom(2020, 1, 1)))
The following shows the scaling of queries based on the number of dates within the database
Python
| sym before date | date before sym
dates in | time size | time size
database | (ms) (b) | (ms) (b)
---------|--------------------|----------------------
1 | 470 75,499,920 | 78 75,499,984
5 | 487 75,878,400 | 78 75,499,984
10 | 931 75,880,624 | 78 75,499,984
15 | 1,209 75,882,912 | 78 75,499,984
20 | 1,438 75,885,072 | 78 75,499,984
Applying attributes
The following shows the performance difference between the application of a grouped-attribute on the sym
column of an in-memory table.
Python
rtquote = quote.select(where = kx.Column('date').isin([kx.DateAtom(2020, 1, 1)]).grouped('sym')
rtquote.select(where = kx.Column('sym') == 'IBM')
The following shows the scaling of queries based on the number of rows on an in-memory table using only the sym column.
Python
| no attribute | grouped attribute
rows in | time size | time size
table | (ms) (b) | (ms) (b)
-------------------------------------------------------
25,000,000 | 119 301,990,304 | 8 2,228,848
50,000,000 | 243 603,980,192 | 10 4,457,072
75,000,000 | 326 1,207,959,968 | 14 8,913,520
100,000,000 | 472 1,207,959,968 | 20 8,913,520
125,000,000 | 582 1,207,959,968 | 26 8,913,520
150,000,000 | 711 2,415,919,520 | 30 17,826,416
175,000,000 | 834 2,415,919,520 | 36 17,826,416
200,000,000 | 931 2,415,919,520 | 40 17,826,416
225,000,000 | 1,049 2,415,919,520 | 46 17,826,416
250,000,000 | 1,167 2,415,919,520 | 50 17,826,416
SQL
To optimize frequently called SQL queries you can make use of the prepare and execute functionality to separate SQL parsing from query execution as detailed here.
Next Steps
-
Learn how to query your data using the PyKXPythonic query API.
-
If you don't have a historical database available visit our Databases page.
-
Learn about creating PyKX Table objects from the PyKXIntroduction notebook.