Querying data using the query API with PyKX
This page explains how to query your data with PyKX using the query API.
To begin, we use the following dataset for the rest of this page.
Create a sample table:
Python
>>> import pykx as kx
>>> kx.random.seed(42)
>>> trades = kx.Table(data={
'sym': kx.random.random(100, ['AAPL', 'GOOG', 'MSFT']),
'date': kx.random.random(100, kx.q('2022.01.01') + [0,1,2]),
'price': kx.random.random(100, 1000.0),
'size': kx.random.random(100, 100)
})
# Store the same table in q memory space to be able to demo queries on q variables
>>> kx.q['trades'] = trades
Query basics
The PyKX query API provides a Pythonic way to query kdb+ tables. This API builds qSQL queries in their functional form allowing you to query in-memory and on-disk data.
In the following sections we introduce the functions, their arguments and how you can use them to perform queries of increasing complexity.
Query functions
Query functions describe the operations against in-memory and on-disk data which allow users to retrieve, update or delete data from these tables. Not all operations are supported against all table types, as such the following table provides a guide:
Function |
pykx.Table |
pykx.KeyedTable |
pykx.SplayedTable |
pykx.PartitionedTable |
---|---|---|---|---|
select | ||||
exec | ||||
update | ||||
delete |
For pykx.SplayedTable
objects the denoted operations indicate that while applied queries will return a pykx.Table
object the on-disk data will not be modified directly.
select()
select() builds on qSQL select.
Select should be used to query/filter data returning a pykx.Table or pykx.KeyedTable.
Python
table.select(columns=None, where=None, by=None, inplace=False)
exec()
Exec is used to query tables but unlike Select it does not return tables. Instead this query type will return pykx.Vector, pykx.Atom, or pykx.Dictionary will be returned depending on the query parameters.
For example if querying for data in a single column a vector will be returned, multiple columns will result in a dictionary mapping column name to value and when performing aggregations on a single column you may return an atom.
Python
table.exec(columns=None, where=None, by=None, inplace=False)
update()
update() builds on qSQL update.
Update returns the modified pykx.Table or pykx.KeyedTable.
Python
table.update(columns=None, where=None, by=None, inplace=False)
delete()
delete() builds on qSQL delete.
Delete returns the modified pykx.Table or pykx.KeyedTable.
Python
table.delete(columns=None, where=None, by=None, inplace=False)
Note
The following sections make use of kx.Column
objects which are only enabled in PyKX licensed mode. For unlicensed query examples using str
objects see the query API page.
Query arguments
Querying data using this API refers to the four functions outlined above each which can take as arguments the following keyword parameters:
-
columns
-
where
-
by
* -
inplace
Outlined below these, arguments allow you to manipulate your data to filter for/update specific columns/rows in the case of a where clause, apply some analytics in the case of a columns clause or group data based on supplied conditions when discussing a by clause.
Note
The by
clause is not supported when used with the delete
query type
columns
The columns
keyword provides the ability to access columnar data by name or apply analytics to the content of columns. In the following examples we will use various combinations of the columns
keyword with select
, exec
, update
and delete
operations.
columns
can be passed a single column name without where conditions to retrieve or modify the content of that column:
select
delete
exec
update
Python
>>> trades.select(columns=kx.Column('sym'))
pykx.Table(pykx.q('
sym
----
AAPL
MSFT
MSFT
GOOG
AAPL
..
'))
>>> trades.select(columns=2 * kx.Column('price'))
pykx.Table(pykx.q('
price
--------
291.2518
1067.837
34.35393
1832.257
280.0766
..
'))
>>> trades.select(columns=kx.Column('price').max())
pykx.Table(pykx.q('
price
--------
989.3873
'))
Python
>>> trades.delete(columns=kx.Column('date'))
pykx.Table(pykx.q('
sym price
-------------
AAPL 145.6259
MSFT 533.9187
MSFT 17.17696
GOOG 916.1286
AAPL 140.0383
..
'))
Python
>>> trades.exec(columns=kx.Column('price'))
pykx.FloatVector(pykx.q('145.6259 533.91..'))
>>> trades.exec(columns=kx.Column('price').max())
pykx.FloatAtom(pykx.q('989.3873'))
>>> trades.exec(columns=2 * kx.Column('price'))
pykx.FloatVector(pykx.q('291.2518 1067.83..'))
Python
>>> trades.update(columns=(kx.Column('price') * 2).name('dpx'))
pykx.Table(pykx.q('
sym date price size dpx
--------------------------------------
AAPL 2022.01.01 145.6259 19 291.2518
MSFT 2022.01.02 533.9187 92 1067.837
MSFT 2022.01.02 17.17696 7 34.35393
GOOG 2022.01.03 916.1286 60 1832.257
AAPL 2022.01.02 140.0383 54 280.0766
..
'))
>>> trades.update(columns=kx.Column('dpx', value=kx.Column('price') * 2))
pykx.Table(pykx.q('
sym date price size dpx
--------------------------------------
AAPL 2022.01.01 145.6259 19 291.2518
MSFT 2022.01.02 533.9187 92 1067.837
MSFT 2022.01.02 17.17696 7 34.35393
GOOG 2022.01.03 916.1286 60 1832.257
AAPL 2022.01.02 140.0383 54 280.0766
..
'))
>>> trades.update(columns=kx.Column('price', name='dpx') * 2)
pykx.Table(pykx.q('
sym date price size dpx
--------------------------------------
AAPL 2022.01.01 145.6259 19 291.2518
MSFT 2022.01.02 533.9187 92 1067.837
MSFT 2022.01.02 17.17696 7 34.35393
GOOG 2022.01.03 916.1286 60 1832.257
AAPL 2022.01.02 140.0383 54 280.0766
..
'))
Multiple columns can be modified, retrieved or aggregations applied by using queries can be returned and have aggregations/operation performed on them.
select
delete
exec
Python
>>> trades.select(columns=kx.Column('date') & kx.Column('sym'))
pykx.Table(pykx.q('
date sym
---------------
2022.01.01 AAPL
2022.01.02 MSFT
2022.01.02 MSFT
2022.01.03 GOOG
2022.01.02 AAPL
..
'))
>>> trades.select(columns=kx.Column('price').neg() & kx.Column('date') + 1)
pykx.Table(pykx.q('
price date
--------------------
-145.6259 2022.01.02
-533.9187 2022.01.03
-17.17696 2022.01.03
-916.1286 2022.01.04
-140.0383 2022.01.03
..
'))
>>> trades.select(columns=kx.Column('price').last() & kx.Column('date').last())
pykx.Table(pykx.q('
price date
-------------------
975.5566 2022.01.01
'))
Python
>>> trades.delete(columns=kx.Column('date') & kx.Column('sym'))
pykx.Table(pykx.q('
price size
-------------
145.6259 19
533.9187 92
17.17696 7
916.1286 60
140.0383 54
..
'))
'))
Python
>>> trades.exec(columns=kx.Column('date') & kx.Column('price'))
pykx.Dictionary(pykx.q('
date | 2022.01.01 2022.01.02 2020.0..
price| 145.6259 533.9187 17.176..
'))
Columns can be named/renamed by using the name
method or keyword.
select
exec
update
Python
>>> trades.select(columns=kx.Column('price').max().name('maxPrice'))
pykx.Table(pykx.q('
maxPrice
--------
989.3873
'))
>>> trades.select(columns=kx.Column('price', name='maxPrice').max())
pykx.Table(pykx.q('
maxPrice
--------
989.3873
'))
Python
>>> trades.exec(columns=(2 * kx.Column('price')).name('multiPrice') &
... kx.Column('sym').name('symName'))
pykx.Dictionary(pykx.q('
multiPrice| 291.2518 1067.837 34.35..
symName | AAPL MSFT MSFT ..
'))
>>> trades.exec(columns=(2 * kx.Column('price', name='multiPrice') &
... kx.Column('sym', name='symName'))
pykx.Dictionary(pykx.q('
multiPrice| 291.2518 1067.837 34.35..
symName | AAPL MSFT MSFT ..
'))
In the case of update renaming a column will add a new column with the associated name
Python
>>> trades.update(columns=kx.Column('price').name('priceCol'))
pykx.Table(pykx.q('
sym date price size priceCol
--------------------------------------
AAPL 2022.01.01 145.6259 19 145.6259
MSFT 2022.01.02 533.9187 92 533.9187
MSFT 2022.01.02 17.17696 7 17.17696
GOOG 2022.01.03 916.1286 60 916.1286
AAPL 2022.01.02 140.0383 54 140.0383
..
'))
>>> trades.update(columns=kx.Column('price', name='priceCol'))
pykx.Table(pykx.q('
sym date price size priceCol
--------------------------------------
AAPL 2022.01.01 145.6259 19 145.6259
MSFT 2022.01.02 533.9187 92 533.9187
MSFT 2022.01.02 17.17696 7 17.17696
GOOG 2022.01.03 916.1286 60 916.1286
AAPL 2022.01.02 140.0383 54 140.0383
..
'))
As an alternative approach for renaming, a dictionary can be used to control names of returned columns.
Python
>>> trades.select(columns={'maxPrice':kx.Column('price').max()})
pykx.Table(pykx.q('
maxPrice
--------
993.6284
'))
You can also use the value
keyword to pass data when generating a new column. For example
Python
>>> trades.update(kx.Column('newCol', value = kx.random.random(100, 10.0))).head(5)
pykx.Table(pykx.q('
sym date price size newCol
---------------------------------------
AAPL 2022.01.01 145.6259 19 3.489322
MSFT 2022.01.02 533.9187 92 4.731594
MSFT 2022.01.02 17.17696 7 8.769994
GOOG 2022.01.03 916.1286 60 0.3928818
AAPL 2022.01.02 140.0383 54 4.937273
'))
where
The where phrase allows you to filter data to retrieve, update, delete or apply functions on rows of a table which meet the specified conditions
By default this parameter has a value None
which is equivalent to not filtering the data. This parameter is supported for all query types.
Filter data meeting a specified criteria on one column
select
delete
update
exec
Python
>>> trades.select(where=kx.Column('price') > 500)
pykx.Table(pykx.q('
sym date price size
-----------------------------
MSFT 2022.01.02 533.9187 92
GOOG 2022.01.03 916.1286 60
AAPL 2022.01.02 876.0921 37
AAPL 2022.01.03 952.2597 53
MSFT 2022.01.02 603.3717 6
..
'))
>>> trades.select(where=kx.Column('price') < kx.Column('size'))
pykx.Table(pykx.q('
sym date price size
-----------------------------
MSFT 2022.01.03 46.11964 93
GOOG 2022.01.02 16.11913 81
AAPL 2022.01.03 28.98133 97
AAPL 2022.01.02 44.09906 91
GOOG 2022.01.01 12.58364 33
'))
>>> trades.select(where=kx.Column('price') == kx.Column('price').max())
pykx.Table(pykx.q('
sym date price size
-----------------------------
MSFT 2022.01.01 989.3873 42
'))
Python
>>> trades.delete(where=kx.Column('price') > 500)
pykx.Table(pykx.q('
sym date price size
-----------------------------
AAPL 2022.01.01 145.6259 19
MSFT 2022.01.02 17.17696 7
AAPL 2022.01.02 140.0383 54
MSFT 2022.01.03 282.4291 98
MSFT 2022.01.03 46.11964 93
..
'))
>>> trades.delete(where=kx.Column('price') > kx.Column('size'))
pykx.Table(pykx.q('
sym date price size
-----------------------------
MSFT 2022.01.03 46.11964 93
GOOG 2022.01.02 16.11913 81
AAPL 2022.01.03 28.98133 97
AAPL 2022.01.02 44.09906 91
GOOG 2022.01.01 12.58364 33
'))
Python
>>> trades.update(columns = 2 * kx.Column('price'),
... where=kx.Column('price') > 500)
pykx.Table(pykx.q('
sym date price size
-----------------------------
AAPL 2022.01.01 145.6259 19
MSFT 2022.01.02 1067.837 92
MSFT 2022.01.02 17.17696 7
GOOG 2022.01.03 1832.257 60
AAPL 2022.01.02 140.0383 54
..
'))
Python
>>> trades.exec(columns = kx.Column('size'), where = kx.Column('price') > 900)
pykx.LongVector(pykx.q('60 53 61 41 98 12 41 12 23 42 18 76 73 55'))
Using &
or passing a list of pykx.Column
objects will allow multiple filters to be passed
select
delete
update
exec
Python
>>> trades.select(where=(kx.Column('sym') == 'GOOG') & (kx.Column('date') == datetime.date(2022, 1, 1)))
pykx.Table(pykx.q('
sym date price
------------------------
GOOG 2022.01.01 480.9078
GOOG 2022.01.01 454.5668
GOOG 2022.01.01 790.2208
GOOG 2022.01.01 296.6022
GOOG 2022.01.01 727.6113
..
'))
>>> trades.select(where=[
... kx.Column('sym') == 'GOOG',
... kx.Column('date') == datetime.date(2022, 1, 1)
... ])
>>> from datetime import date
>>> trades.select(columns=kx.Column('price').wavg(kx.Column('size')),
... where=(kx.Column('sym') == 'GOOG') & (kx.Column('date') == date(2022, 1, 1)))
pykx.Table(pykx.q('
price
-------
44.7002
'))
Python
>>> from datetime import date
>>> trades.delete(where=(kx.Column('sym') == 'AAPL') & (kx.Column('date') == date(2022, 1, 1)))
pykx.Table(pykx.q('
sym date price size
-----------------------------
MSFT 2022.01.02 533.9187 92
MSFT 2022.01.02 17.17696 7
GOOG 2022.01.03 916.1286 60
AAPL 2022.01.02 140.0383 54
MSFT 2022.01.03 282.4291 98
..
'))
Python
>>> from datetime import date
>>> trades.update(
... columns=2*kx.Column('price'),
... where=(kx.Column('sym') == 'AAPL') & (kx.Column('date') == date(2022, 1, 1)))
pykx.Table(pykx.q('
sym date price size
-----------------------------
AAPL 2022.01.01 291.2518 19
MSFT 2022.01.02 533.9187 92
MSFT 2022.01.02 17.17696 7
GOOG 2022.01.03 916.1286 60
AAPL 2022.01.02 140.0383 54
..
'))
Python
>>> from datetime import date
>>> trades.exec(
... columns=kx.Column('price') & kx.Column('date'),
... where=(kx.Column('sym') == 'AAPL') & (kx.Column('date') == date(2022, 1, 1)))
pykx.Dictionary(pykx.q('
price| 145.6259 636.4009 8..
date | 2022.01.01 2022.01.01 2..
'))
by
The by phrase allows you to apply aggregations or manipulate data grouping the data by
specific conditions.
By default this parameter has a value None
which is equivalent to not grouping your data. This parameter is supported for select
, exec
and update
type queries.
When both a columns
and by
clause are passed to a select query without use of an aggregation function then each row contains vectors of data related to the by columns.
Python
>>> trades.select(columns=kx.Column('price'), by=kx.Column('date') & kx.Column('sym'))
pykx.KeyedTable(pykx.q('
date sym | price ..
---------------| ------------------------------------------------------------..
2022.01.01 AAPL| 131.6095 236.3145 140.4332 839.3869 843.3531 641.2171 104.81..
2022.01.01 GOOG| 480.9078 454.5668 790.2208 296.6022 727.6113 341.9665 609.77..
2022.01.01 MSFT| 556.9152 755.6175 865.9657 714.9804 179.5444 149.734 67.0821..
2022.01.02 AAPL| 441.8975 379.1373 659.8286 531.1731 975.3188 613.6512 603.99..
2022.01.02 GOOG| 446.898 664.8273 648.3929 240.1062 119.6 774.3718 449.4149 8..
2022.01.02 MSFT| 699.0336 387.7172 588.2985 725.8795 842.5805 646.37 593.7708..
2022.01.03 AAPL| 793.2503 621.7243 570.4403 626.2866 263.992 153.475 123.7397..
2022.01.03 GOOG| 586.263 777.3633 834.1404 906.9809 617.6205 179.6328 100.041..
2022.01.03 MSFT| 633.3324 39.47309 682.9453 867.1843 483.0873 851.2139 318.93..
'))
Adding an aggregation function allows this aggregation to be run on a column within the by phrase.
Python
>>> trades.select(columns=kx.Column('price').max(), by=kx.Column('date') & kx.Column('sym'))
pykx.KeyedTable(pykx.q('
date sym | price
---------------| --------
2022.01.01 AAPL| 843.3531
2022.01.01 GOOG| 790.2208
2022.01.01 MSFT| 865.9657
2022.01.02 AAPL| 975.3188
2022.01.02 GOOG| 886.0093
2022.01.02 MSFT| 993.6284
2022.01.03 AAPL| 843.9354
2022.01.03 GOOG| 914.6929
2022.01.03 MSFT| 867.1843
'))
Using a by
clause within an update allows you to modify the values of the table conditionally based on your grouped criteria, for example:
>>> trades.update(columns=kx.Column('price').wavg(kx.Column('size')).name('vwap'),
... by=kx.Column('sym'))
pykx.Table(pykx.q('
sym date price size vwap
--------------------------------------
AAPL 2022.01.01 145.6259 19 56.09317
MSFT 2022.01.02 533.9187 92 40.46716
MSFT 2022.01.02 17.17696 7 40.46716
GOOG 2022.01.03 916.1286 60 52.721
AAPL 2022.01.02 140.0383 54 56.09317
..
'))
Note
Using by
without an associated columns
clause will return the last row in the table for each column in the by
phrase.
Python
>>> trades.select(by=kx.Column('sym'))
pykx.KeyedTable(pykx.q('
sym | date price
----| -------------------
AAPL| 2022.01.02 955.4843
GOOG| 2022.01.02 886.0093
MSFT| 2022.01.01 719.9879
'))
inplace
The inplace
keyword provides the ability for a user to overwrite the representation of the object which they are querying.
This functionality is set to False
by default but will operate effectively on in-memory table objects for the select
, update
and delete
query types.
If set to True
the input table can be overwritten as follows
Python
>>> trades.delete(where=kx.Column('sym').isin(['AAPL']), inplace=True)
pykx.Table(pykx.q('
sym date price size
-----------------------------
MSFT 2022.01.02 533.9187 92
MSFT 2022.01.02 17.17696 7
GOOG 2022.01.03 916.1286 60
MSFT 2022.01.03 282.4291 98
MSFT 2022.01.03 46.11964 93
..
'))
>>> trades
pykx.Table(pykx.q('
sym date price size
-----------------------------
MSFT 2022.01.02 533.9187 92
MSFT 2022.01.02 17.17696 7
GOOG 2022.01.03 916.1286 60
MSFT 2022.01.03 282.4291 98
MSFT 2022.01.03 46.11964 93
..
'))
Query types
While this page discusses primarily the Pythonic API for querying kdb+ tables locally. The following describes some of the other ways that queries can be completed
Local queries
qSQL equivalent query for comparison:
Python
>>> kx.q('select from trades where price=max price')
pykx.Table(pykx.q('
sym date price
------------------------
AAPL 2022.01.01 983.0794
'))
Access query API off the table object:
Python
>>> trades.select(where=kx.Column('price') == kx.Column('price').max())
pykx.Table(pykx.q('
sym date price
------------------------
AAPL 2022.01.01 983.0794
'))
Direct use of the kx.q.qsql
query APIs taking the table as a parameter:
Python
>>> kx.q.qsql.select(trades, where=kx.Column('price') == kx.Column('price').max())
pykx.Table(pykx.q('
sym date price
------------------------
AAPL 2022.01.01 983.0794
'))
Passing a string will query the table of that name in q memory:
Python
>>> kx.q.qsql.select('trades', where=kx.Column('price') == kx.Column('price').max())
pykx.Table(pykx.q('
sym date price
------------------------
AAPL 2022.01.01 983.0794
'))
Remote Queries
Queries can also be performed over IPC to remote servers.
Python
>>> conn = kx.SyncQConnection(port = 5000)
>>> conn.qsql.select('trades', where=kx.Column('price') == kx.Column('price').max())
pykx.Table(pykx.q('
sym date price
------------------------
AAPL 2022.01.01 983.0794
'))
Query classes
Column
See pykx.Column for full documentation on this class.
And operator &
Using &
on two Column
objects will return a QueryPhrase
which describes the underlying construct which is used to query your table.
Python
>>> qp =(kx.Column('sym') == 'GOOG') & (kx.Column('price') > 500)
>>> type(qp)
<class 'pykx.wrappers.QueryPhrase'>
>>> qp._phrase
[[pykx.Operator(pykx.q('=')), 'sym', [pykx.SymbolAtom(pykx.q('`GOOG'))]], [pykx.Operator(pykx.q('>')), 'price', pykx.LongAtom(pykx.q('500'))]]
>>> trades.select(where=qp)
pykx.Table(pykx.q('
sym date price
------------------------
GOOG 2022.01.03 976.1246
GOOG 2022.01.02 716.2858
GOOG 2022.01.03 872.5027
GOOG 2022.01.02 962.5156
GOOG 2022.01.01 589.7202
..
'))
Additional Column
objects can &
off a QueryPhrase
to further build up more complex queries.
Or operator |
Using |
on two Column
objects will return a Column
object.
Python
>>> c =(kx.Column('price') < 100) | (kx.Column('price') > 500)
>>> type(c)
<class 'pykx.wrappers.Column'>
>>> c._value
[pykx.Operator(pykx.q('|')), [pykx.Operator(pykx.q('<')), 'price', pykx.LongAtom(pykx.q('100'))], [pykx.Operator(pykx.q('>')), 'price', pykx.LongAtom(pykx.q('500'))]]
>>> trades.select(where=c)
pykx.Table(pykx.q('
sym date price
------------------------
AAPL 2022.01.01 542.6371
AAPL 2022.01.01 77.57332
MSFT 2022.01.01 637.4637
GOOG 2022.01.03 976.1246
MSFT 2022.01.03 539.6816
..
'))
Note
Column
objects can not apply |
off a QueryPhrase
. Presently these are restricted only to operations on two kx.Column
objects.
Python operators
The following Python operators can be used with the Column class to perform analysis on your data
Python operator |
q operation |
Magic method |
---|---|---|
+
|
+
|
__add__
|
-
|
-
|
__sub__
|
-
|
-
|
__rsub__
|
*
|
*
|
__mul__
|
/
|
%
|
__truediv__
|
/
|
%
|
__rtruediv__
|
//
|
div
|
__floordiv__
|
//
|
div
|
__rfloordiv__
|
%
|
mod
|
___mod__
|
**
|
xexp
|
__pow__
|
==
|
=
|
__eq__
|
!=
|
<>
|
__ne__
|
>
|
>
|
__gt__
|
>=
|
>=
|
__ge__
|
<
|
<
|
__lt__
|
<=
|
<=
|
__le__
|
pos
|
abs
|
__pos__
|
neg
|
neg
|
__neg__
|
floor
|
floor
|
__floor__
|
ceil
|
ceiling
|
__ceil__
|
abs
|
abs
|
__abs__
|
The following are a few examples of this various operations in use
-
Finding rows where price is greater than or equal to half the average price:
Python
Copy>>> trades.select(where=kx.Column('price') >= kx.Column('price').avg() / 2)
pykx.Table(pykx.q('
sym date price
------------------------
AAPL 2022.01.01 542.6371
MSFT 2022.01.01 637.4637
GOOG 2022.01.03 976.1246
MSFT 2022.01.03 539.6816
GOOG 2022.01.02 716.2858
..
')) -
Apply the math libraries floor operation on the column price updating its value:
Python
Copy>>> from math import floor
>>> trades.update(floor(kx.Column('price')))
pykx.Table(pykx.q('
sym date price size
--------------------------
AAPL 2022.01.01 145 19
MSFT 2022.01.02 533 92
MSFT 2022.01.02 17 7
GOOG 2022.01.03 916 60
AAPL 2022.01.02 140 54
..
'))
PyKX methods
In addition to support for the Python operators outlined above PyKX provides a number of analytic methods and properties for the kx.Column
objects. In total there are more than 100 analytic methods supported ranging from a basic method to retrieve the maximum value of a column, to more complex analytics for the calculation of the weighted average between two vectors.
The following drop-down provides a list of the supported methods, with full details on the API page here.
Note
Supported methods
abs
,
acos
,
asc
,
asin
,
atan
,
avg
,
avgs
,
ceiling
,
cor
,
cos
,
count
,
cov
,
cross
,
deltas
,
desc
,
dev
,
differ
,
distinct
,
div
,
exp
,
fills
,
first
,
floor
,
null
,
iasc
,
idesc
,
inter
,
isin
,
last
,
like
,
log
,
lower
,
ltrim
,
mavg
,
max
,
maxs
,
mcount
,
md5
,
mdev
,
med
,
min
,
mins
,
mmax
,
mmin
,
mod
,
msum
,
neg
,
prd
,
prds
,
prev
,
rank
,
ratios
,
reciprocal
,
reverse
,
rotate
,
rtrim
,
scov
,
sdev
,
signum
,
sin
,
sqrt
,
string
,
sum
,
sums
,
svar
,
tan
,
trim
,
union
,
upper
,
var
,
wavg
,
within
,
wsum
,
xbar
,
xexp
,
xlog
,
xprev
,
hour
,
minute
,
date
,
year
,
month
,
day
,
second
,
add
,
name
,
average
,
cast
,
correlation
,
covariance
,
divide
,
drop
,
fill
,
index_sort
,
join
,
len
,
modulus
,
multiply
,
next_item
,
previous_item
,
product
,
products
,
sort
,
subract
,
take
,
value
and
variance
.
The following provides a complex example of a user generated query to calculate trade statistics and time-weighted average spread information associated with a Trade and Quote tables making use of the following methods.
Python
def generate_twap(trade, quote, start_time, end_time, syms = None):
if syms is None:
syms = trade.exec(kx.Column('sym').distinct())
quote_metrics = quote.select(
columns = (kx.Column('ask') - kx.Column('bid')).avg().name('avg_spread') &
(kx.Column('time').next_item() - kx.Column('time')).wavg(kx.Column('ask') - kx.Column('bid')).name('twa_spread') &
((kx.Column('asize') + kx.Column('bsize')).avg().name('avg_size') * 0.5) &
(kx.Column('time').next_item() - kx.Column('time')).avg().name('avg_duration'),
by = kx.Column('sym'),
where = kx.Column('sym').isin(syms) & kx.Column('time').within(start_time, end_time)
)
trade_metrics = trade.select(
columns = (2 * kx.Column('price').dev()).name('std_dev') &
(kx.Column('time').next_item() - kx.Column('time')).wavg(kx.Column('price')).name('std_dev') &
kx.Column('price').max().name('max_price') &
kx.Column('price').min().name('min_price') &
kx.Column('size').wavg(kx.Column('price')).name('vwap'),
by = kx.Column('sym'),
where = kx.Column('sym').isin(syms) & kx.Column('time').within(start_time, end_time)
)
return kx.q.uj(quote_metrics, trade_metrics)
Variable
See pykx.Variable for full documentation on this class.
In some cases when operating at the interface of q and Python analytics you may wish to perform a comparison or analytic which makes use of a named variable from q
.
The following example shows this in action
Python
>>> kx.q['filter']='GOOG'
>>> trades.select(where=kx.Column('sym') == kx.Variable('filter'))
pykx.Table(pykx.q('
sym date price
------------------------
GOOG 2022.01.03 976.1246
GOOG 2022.01.02 716.2858
GOOG 2022.01.03 872.5027
GOOG 2022.01.02 962.5156
GOOG 2022.01.01 589.7202
..
'))
Advanced features
Custom functions
While there is an extensive list of functions/analytics that are supported by the API it does not cover all analytics that you, or users of an extension you are writing may need.
To facilitate this you have access to the pykx.register.column_function
, this function provides the ability to define methods off your defined pykx.Column
objects. This function should take the column on which the function is being performed as it's first argument and the call
method should be used to apply your analytic.
The call method takes as it's first argument the function you wish to apply and can take multiple positional arguments.
For example, let's consider the following cases:
-
Define a function applying a min-max scaling against the price column of a table
Python
Copy>>> def min_max_scaler(column):
... return column.call('{(x-minData)%max[x]-minData:min x}')
>>> kx.register.column_function('minmax', min_max_scaler)
>>> trades.update(kx.Column('price').minmax().name('scaled_price'))
pykx.Table(pykx.q('
sym date price size scaled_price
------------------------------------------
MSFT 2022.01.02 533.9187 92 0.5337153
MSFT 2022.01.02 17.17696 7 0.004702399
GOOG 2022.01.03 916.1286 60 0.9250016
MSFT 2022.01.03 282.4291 98 0.2762535
MSFT 2022.01.03 46.11964 93 0.03433238
..
')) -
Define a function which multiplies two columns together and calculates the log returning the result
Python
Copy>>> def log_multiply(column1, column2):
... return column1.call('{log x*y}', column2)
>>> kx.register.column_function('log_multiply', log_multiply)
>>> trades.select(kx.Column('price').log_multiply(kx.Column('size')))
pykx.Table(pykx.q('
price
--------
10.80203
4.789479
10.9145
10.22839
8.363838
..
'))
fby queries
Complex queries often require the application of a function on data grouped by some condition, in many cases the application of a by
clause is sufficient to get the information you need, however you will run into cases where you need to filter-by a certain condition.
Let's assume you want to find the stock information by symbol, where the price is the maximum price:
Python
>>> trades.select(where=kx.Column('price') == kx.Column.fby(kx.Column('sym'), kx.q.max, kx.Column('price')))
pykx.Table(pykx.q('
sym date price size
-----------------------------
MSFT 2022.01.03 977.1655 92
AAPL 2022.01.02 996.8898 20
GOOG 2022.01.03 971.9498 47
'))
Using iterators
Not all analytics that you may wish to run on your table expect to take the full content of a column(s) as input, for example in some cases you may wish to apply an analytic on each row of a column. While operations which rely on iterators may be slower than purely vectorized, operations they may be necessary.
PyKX supports the following iterators, a number of examples are provided below
Iterator |
Type |
Link |
---|---|---|
each
|
map | Each |
peach
|
map | Peach |
\:
|
map | Each Left |
/:
|
map | Each Right |
\:/:
|
map | Each Left-Each Right |
/:\:
|
map | Each Right-Each Left |
'
|
map | Case |
':
|
map | Each Prior |
/
|
accumulator | Over |
\
|
accumulator | Scan |
Example 1
Calculate the maximum value of each row of a column x:
Python
>>> table = kx.Table(data={'x': [[10, 5, 4], [20, 30, 50], [1, 2, 3]]})
pykx.Table(pykx.q('
x
--------
10 5 4
20 30 50
1 2 3
'))
>>> table.select(kx.Column('x').max(iterator='each'))
pykx.Table(pykx.q('
x
--
10
50
3
'))
Example 2
Join the characters associated from two columns row wise using the ' iterator:
Python
>>> table = kx.Table(data={'x': b'abc', 'y': b'def'})
pykx.Table(pykx.q('
x y
---
a d
b e
c f
'))
>>> table.select(kx.Column('x').join(kx.Column('y'), iterator="'"))
pykx.Table(pykx.q('
x
----
"ad"
"be"
"cf"
'))
Example 3
Join the characters "_xy" to all rows in a column x:
Python
>>> table = kx.Table(data={'x': b'abc', 'y': b'def'})
pykx.Table(pykx.q('
x y
---
a d
b e
c f
'))
>>> table.select(kx.Column('x').join(b"_xy", iterator='\\:'))
pykx.Table(pykx.q('
x
------
"a_xy"
"b_xy"
"c_xy"
'))
Next steps
After learning to query data with the Pythonic API, explore other querying methods:
-
Query your data using SQL.
-
If you want to upskill and learn how to query directly using q.
-
Make your queries more performant by following a few guidelines.
For some further reading, here are some related topics:
-
If you don't have a historical database available visit our Databases page.
-
Learn about creating PyKX Table objects from the PyKXIntroduction notebook.