Querying data using the query API with PyKX

This page explains how to query your data with PyKX using the query API.

To begin, we use the following dataset for the rest of this page.

Create a sample table:

Python

Copy
>>> import pykx as kx
>>> kx.random.seed(42)
>>> trades = kx.Table(data={
        'sym': kx.random.random(100, ['AAPL', 'GOOG', 'MSFT']),
        'date': kx.random.random(100, kx.q('2022.01.01') + [0,1,2]),
        'price': kx.random.random(100, 1000.0),
        'size': kx.random.random(100, 100
    })

# Store the same table in q memory space to be able to demo queries on q variables
>>> kx.q['trades'] = trades

Query basics

The PyKX query API provides a Pythonic way to query kdb+ tables. This API builds qSQL queries in their functional form allowing you to query in-memory and on-disk data.

In the following sections we introduce the functions, their arguments and how you can use them to perform queries of increasing complexity.

Query functions

Query functions describe the operations against in-memory and on-disk data which allow users to retrieve, update or delete data from these tables. Not all operations are supported against all table types, as such the following table provides a guide:

Function

pykx.Table

pykx.KeyedTable

pykx.SplayedTable

pykx.PartitionedTable

select
exec
update
delete

For pykx.SplayedTable objects the denoted operations indicate that while applied queries will return a pykx.Table object the on-disk data will not be modified directly.

select()

select() builds on qSQL select.

Select should be used to query/filter data returning a pykx.Table or pykx.KeyedTable.

Python

Copy
table.select(columns=None, where=None, by=None, inplace=False)

exec()

exec() builds on qSQL exec.

Exec is used to query tables but unlike Select it does not return tables. Instead this query type will return pykx.Vector, pykx.Atom, or pykx.Dictionary will be returned depending on the query parameters.

For example if querying for data in a single column a vector will be returned, multiple columns will result in a dictionary mapping column name to value and when performing aggregations on a single column you may return an atom.

Python

Copy
table.exec(columns=None, where=None, by=None, inplace=False)

update()

update() builds on qSQL update.

Update returns the modified pykx.Table or pykx.KeyedTable.

Python

Copy
table.update(columns=None, where=None, by=None, inplace=False)

delete()

delete() builds on qSQL delete.

Delete returns the modified pykx.Table or pykx.KeyedTable.

Python

Copy
table.delete(columns=None, where=None, by=None, inplace=False)

Note

The following sections make use of kx.Column objects which are only enabled in PyKX licensed mode. For unlicensed query examples using str objects see the query API page.

Query arguments

Querying data using this API refers to the four functions outlined above each which can take as arguments the following keyword parameters:

  • columns

  • where

  • by*

  • inplace

Outlined below these, arguments allow you to manipulate your data to filter for/update specific columns/rows in the case of a where clause, apply some analytics in the case of a columns clause or group data based on supplied conditions when discussing a by clause.

Note

The by clause is not supported when used with the delete query type

columns

The columns keyword provides the ability to access columnar data by name or apply analytics to the content of columns. In the following examples we will use various combinations of the columns keyword with select, exec, update and delete operations.

columns can be passed a single column name without where conditions to retrieve or modify the content of that column:

select

delete

exec

update

Python

Copy
>>> trades.select(columns=kx.Column('sym'))
pykx.Table(pykx.q('
sym 
----
AAPL
MSFT
MSFT
GOOG
AAPL
..
'))
>>> trades.select(columns=2 * kx.Column('price'))
pykx.Table(pykx.q('
price   
--------
291.2518
1067.837
34.35393
1832.257
280.0766
..
'))
>>> trades.select(columns=kx.Column('price').max())
pykx.Table(pykx.q('
price   
--------
989.3873
'))

Python

Copy
>>> trades.delete(columns=kx.Column('date'))
pykx.Table(pykx.q('
sym  price   
-------------
AAPL 145.6259
MSFT 533.9187
MSFT 17.17696
GOOG 916.1286
AAPL 140.0383
..
'))

Python

Copy
>>> trades.exec(columns=kx.Column('price'))
pykx.FloatVector(pykx.q('145.6259 533.91..'))
>>> trades.exec(columns=kx.Column('price').max())
pykx.FloatAtom(pykx.q('989.3873'))
>>> trades.exec(columns=2 * kx.Column('price'))
pykx.FloatVector(pykx.q('291.2518 1067.83..'))

Python

Copy
>>> trades.update(columns=(kx.Column('price') * 2).name('dpx'))
    pykx.Table(pykx.q('
    sym  date       price    size dpx
    --------------------------------------
    AAPL 2022.01.01 145.6259 19   291.2518
    MSFT 2022.01.02 533.9187 92   1067.837
    MSFT 2022.01.02 17.17696 7    34.35393
    GOOG 2022.01.03 916.1286 60   1832.257
    AAPL 2022.01.02 140.0383 54   280.0766
    ..
    '))
    >>> trades.update(columns=kx.Column('dpx', value=kx.Column('price') * 2))
    pykx.Table(pykx.q('
    sym  date       price    size dpx
    --------------------------------------
    AAPL 2022.01.01 145.6259 19   291.2518
    MSFT 2022.01.02 533.9187 92   1067.837
    MSFT 2022.01.02 17.17696 7    34.35393
    GOOG 2022.01.03 916.1286 60   1832.257
    AAPL 2022.01.02 140.0383 54   280.0766
    ..
    '))
>>> trades.update(columns=kx.Column('price', name='dpx') * 2)
pykx.Table(pykx.q('
sym  date       price    size dpx
--------------------------------------
AAPL 2022.01.01 145.6259 19   291.2518
MSFT 2022.01.02 533.9187 92   1067.837
MSFT 2022.01.02 17.17696 7    34.35393
GOOG 2022.01.03 916.1286 60   1832.257
AAPL 2022.01.02 140.0383 54   280.0766
..
'))

Multiple columns can be modified, retrieved or aggregations applied by using queries can be returned and have aggregations/operation performed on them.

select

delete

exec

Python

Copy
>>> trades.select(columns=kx.Column('date') & kx.Column('sym'))
    pykx.Table(pykx.q('
    date       sym 
    ---------------
    2022.01.01 AAPL
    2022.01.02 MSFT
    2022.01.02 MSFT
    2022.01.03 GOOG
    2022.01.02 AAPL
    ..
    '))
    >>> trades.select(columns=kx.Column('price').neg() & kx.Column('date') + 1)
    pykx.Table(pykx.q('
    price     date      
    --------------------
    -145.6259 2022.01.02
    -533.9187 2022.01.03
    -17.17696 2022.01.03
    -916.1286 2022.01.04
    -140.0383 2022.01.03
    ..
    '))
    >>> trades.select(columns=kx.Column('price').last() & kx.Column('date').last())
    pykx.Table(pykx.q('
    price    date      
    -------------------
    975.5566 2022.01.01
    '))

Python

Copy
>>> trades.delete(columns=kx.Column('date') & kx.Column('sym'))
    pykx.Table(pykx.q('
    price    size
    -------------
    145.6259 19  
    533.9187 92  
    17.17696 7   
    916.1286 60  
    140.0383 54
    ..
    '))
    '))

Python

Copy
>>> trades.exec(columns=kx.Column('date') & kx.Column('price'))
    pykx.Dictionary(pykx.q('
    date | 2022.01.01 2022.01.02 2020.0..
    price| 145.6259   533.9187   17.176..
    '))

Columns can be named/renamed by using the name method or keyword.

select

exec

update

Python

Copy
>>> trades.select(columns=kx.Column('price').max().name('maxPrice'))
    pykx.Table(pykx.q('
    maxPrice
    --------
    989.3873
    '))
    >>> trades.select(columns=kx.Column('price', name='maxPrice').max())
    pykx.Table(pykx.q('
    maxPrice
    --------
    989.3873
    '))

Python

Copy
>>> trades.exec(columns=(2 * kx.Column('price')).name('multiPrice') &
...                     kx.Column('sym').name('symName'))
pykx.Dictionary(pykx.q('
multiPrice| 291.2518 1067.837 34.35..
symName   | AAPL     MSFT     MSFT ..
'))
    >>> trades.exec(columns=(2 * kx.Column('price', name='multiPrice') &
    ...                     kx.Column('sym', name='symName'))
    pykx.Dictionary(pykx.q('
    multiPrice| 291.2518 1067.837 34.35..
    symName   | AAPL     MSFT     MSFT ..
    '))

In the case of update renaming a column will add a new column with the associated name

Python

Copy
>>> trades.update(columns=kx.Column('price').name('priceCol'))
pykx.Table(pykx.q('
sym  date       price    size priceCol
--------------------------------------
AAPL 2022.01.01 145.6259 19   145.6259
MSFT 2022.01.02 533.9187 92   533.9187
MSFT 2022.01.02 17.17696 7    17.17696
GOOG 2022.01.03 916.1286 60   916.1286
AAPL 2022.01.02 140.0383 54   140.0383
..
'))
>>> trades.update(columns=kx.Column('price', name='priceCol'))
pykx.Table(pykx.q('
sym  date       price    size priceCol
--------------------------------------
AAPL 2022.01.01 145.6259 19   145.6259
MSFT 2022.01.02 533.9187 92   533.9187
MSFT 2022.01.02 17.17696 7    17.17696
GOOG 2022.01.03 916.1286 60   916.1286
AAPL 2022.01.02 140.0383 54   140.0383
..
'))

As an alternative approach for renaming, a dictionary can be used to control names of returned columns.

Python

Copy
>>> trades.select(columns={'maxPrice':kx.Column('price').max()})
pykx.Table(pykx.q('
maxPrice
--------
993.6284
'))

You can also use the value keyword to pass data when generating a new column. For example

Python

Copy
>>> trades.update(kx.Column('newCol', value = kx.random.random(100, 10.0))).head(5)
    pykx.Table(pykx.q('
    sym  date       price    size newCol   
    ---------------------------------------
    AAPL 2022.01.01 145.6259 19   3.489322 
    MSFT 2022.01.02 533.9187 92   4.731594 
    MSFT 2022.01.02 17.17696 7    8.769994 
    GOOG 2022.01.03 916.1286 60   0.3928818
    AAPL 2022.01.02 140.0383 54   4.937273 
    '))

where

The where phrase allows you to filter data to retrieve, update, delete or apply functions on rows of a table which meet the specified conditions

By default this parameter has a value None which is equivalent to not filtering the data. This parameter is supported for all query types.

Filter data meeting a specified criteria on one column

select

delete

update

exec

Python

Copy
>>> trades.select(where=kx.Column('price') > 500)
pykx.Table(pykx.q('
sym  date       price    size
-----------------------------
MSFT 2022.01.02 533.9187 92  
GOOG 2022.01.03 916.1286 60  
AAPL 2022.01.02 876.0921 37  
AAPL 2022.01.03 952.2597 53  
MSFT 2022.01.02 603.3717 6   
..
'))
>>> trades.select(where=kx.Column('price') < kx.Column('size'))
pykx.Table(pykx.q('
sym  date       price    size
-----------------------------
MSFT 2022.01.03 46.11964 93  
GOOG 2022.01.02 16.11913 81  
AAPL 2022.01.03 28.98133 97  
AAPL 2022.01.02 44.09906 91  
GOOG 2022.01.01 12.58364 33  
'))
>>> trades.select(where=kx.Column('price') == kx.Column('price').max())
pykx.Table(pykx.q('
sym  date       price    size
-----------------------------
MSFT 2022.01.01 989.3873 42  
'))

Python

Copy
>>> trades.delete(where=kx.Column('price') > 500)
pykx.Table(pykx.q('
sym  date       price    size
-----------------------------
AAPL 2022.01.01 145.6259 19  
MSFT 2022.01.02 17.17696 7   
AAPL 2022.01.02 140.0383 54  
MSFT 2022.01.03 282.4291 98  
MSFT 2022.01.03 46.11964 93 
..
'))
>>> trades.delete(where=kx.Column('price') > kx.Column('size'))
pykx.Table(pykx.q('
sym  date       price    size
-----------------------------
MSFT 2022.01.03 46.11964 93  
GOOG 2022.01.02 16.11913 81  
AAPL 2022.01.03 28.98133 97  
AAPL 2022.01.02 44.09906 91  
GOOG 2022.01.01 12.58364 33  
'))

Python

Copy
>>> trades.update(columns = 2 * kx.Column('price'),
        ...               where=kx.Column('price') > 500)
    pykx.Table(pykx.q('
    sym  date       price    size
    -----------------------------
    AAPL 2022.01.01 145.6259 19  
    MSFT 2022.01.02 1067.837 92  
    MSFT 2022.01.02 17.17696 7   
    GOOG 2022.01.03 1832.257 60  
    AAPL 2022.01.02 140.0383 54  
    ..
    '))

Python

Copy
>>> trades.exec(columns = kx.Column('size'), where = kx.Column('price') > 900)
pykx.LongVector(pykx.q('60 53 61 41 98 12 41 12 23 42 18 76 73 55'))

Using & or passing a list of pykx.Column objects will allow multiple filters to be passed

select

delete

update

exec

Python

Copy
>>> trades.select(where=(kx.Column('sym') == 'GOOG') & (kx.Column('date') == datetime.date(2022, 1, 1)))
pykx.Table(pykx.q('
sym  date       price   
------------------------
GOOG 2022.01.01 480.9078
GOOG 2022.01.01 454.5668
GOOG 2022.01.01 790.2208
GOOG 2022.01.01 296.6022
GOOG 2022.01.01 727.6113
..
'))
>>> trades.select(where=[
...     kx.Column('sym') == 'GOOG',
    ...     kx.Column('date') == datetime.date(2022, 1, 1)
...     ])
>>> from datetime import date
>>> trades.select(columns=kx.Column('price').wavg(kx.Column('size')),
...               where=(kx.Column('sym') == 'GOOG') & (kx.Column('date') == date(2022, 1, 1)))
pykx.Table(pykx.q('
price  
-------
44.7002
'))

Python

Copy
>>> from datetime import date
>>> trades.delete(where=(kx.Column('sym') == 'AAPL') & (kx.Column('date') == date(2022, 1, 1)))
pykx.Table(pykx.q('
sym  date       price    size
-----------------------------
MSFT 2022.01.02 533.9187 92  
MSFT 2022.01.02 17.17696 7   
GOOG 2022.01.03 916.1286 60  
AAPL 2022.01.02 140.0383 54  
MSFT 2022.01.03 282.4291 98
..
'))

Python

Copy
>>> from datetime import date
>>> trades.update(
...     columns=2*kx.Column('price'),
...     where=(kx.Column('sym') == 'AAPL') & (kx.Column('date') == date(2022, 1, 1)))
pykx.Table(pykx.q('
sym  date       price    size
-----------------------------
AAPL 2022.01.01 291.2518 19  
MSFT 2022.01.02 533.9187 92  
MSFT 2022.01.02 17.17696 7   
GOOG 2022.01.03 916.1286 60  
AAPL 2022.01.02 140.0383 54  
..
'))

Python

Copy
>>> from datetime import date
>>> trades.exec(
...     columns=kx.Column('price') & kx.Column('date'),
...     where=(kx.Column('sym') == 'AAPL') & (kx.Column('date') == date(2022, 1, 1)))
pykx.Dictionary(pykx.q('
price| 145.6259   636.4009   8..
date | 2022.01.01 2022.01.01 2..
'))

by

The by phrase allows you to apply aggregations or manipulate data grouping the data by specific conditions.

By default this parameter has a value None which is equivalent to not grouping your data. This parameter is supported for select, exec and update type queries.

When both a columns and by clause are passed to a select query without use of an aggregation function then each row contains vectors of data related to the by columns.

Python

Copy
>>> trades.select(columns=kx.Column('price'), by=kx.Column('date') & kx.Column('sym'))
pykx.KeyedTable(pykx.q('
date       sym | price                                                       ..
---------------| ------------------------------------------------------------..
2022.01.01 AAPL| 131.6095 236.3145 140.4332 839.3869 843.3531 641.2171 104.81..
2022.01.01 GOOG| 480.9078 454.5668 790.2208 296.6022 727.6113 341.9665 609.77..
2022.01.01 MSFT| 556.9152 755.6175 865.9657 714.9804 179.5444 149.734 67.0821..
2022.01.02 AAPL| 441.8975 379.1373 659.8286 531.1731 975.3188 613.6512 603.99..
2022.01.02 GOOG| 446.898 664.8273 648.3929 240.1062 119.6 774.3718 449.4149 8..
2022.01.02 MSFT| 699.0336 387.7172 588.2985 725.8795 842.5805 646.37 593.7708..
2022.01.03 AAPL| 793.2503 621.7243 570.4403 626.2866 263.992 153.475 123.7397..
2022.01.03 GOOG| 586.263 777.3633 834.1404 906.9809 617.6205 179.6328 100.041..
2022.01.03 MSFT| 633.3324 39.47309 682.9453 867.1843 483.0873 851.2139 318.93..
'))

Adding an aggregation function allows this aggregation to be run on a column within the by phrase.

Python

Copy
>>> trades.select(columns=kx.Column('price').max(), by=kx.Column('date') & kx.Column('sym'))
pykx.KeyedTable(pykx.q('
date       sym | price   
---------------| --------
2022.01.01 AAPL| 843.3531
2022.01.01 GOOG| 790.2208
2022.01.01 MSFT| 865.9657
2022.01.02 AAPL| 975.3188
2022.01.02 GOOG| 886.0093
2022.01.02 MSFT| 993.6284
2022.01.03 AAPL| 843.9354
2022.01.03 GOOG| 914.6929
2022.01.03 MSFT| 867.1843
'))

Using a by clause within an update allows you to modify the values of the table conditionally based on your grouped criteria, for example:

Copy
>>> trades.update(columns=kx.Column('price').wavg(kx.Column('size')).name('vwap'),
...               by=kx.Column('sym'))
pykx.Table(pykx.q('
sym  date       price    size vwap    
--------------------------------------
AAPL 2022.01.01 145.6259 19   56.09317
MSFT 2022.01.02 533.9187 92   40.46716
MSFT 2022.01.02 17.17696 7    40.46716
GOOG 2022.01.03 916.1286 60   52.721  
AAPL 2022.01.02 140.0383 54   56.09317
..
'))

Note

Using by without an associated columns clause will return the last row in the table for each column in the by phrase.

Python

Copy
>>> trades.select(by=kx.Column('sym'))
pykx.KeyedTable(pykx.q('
sym | date       price   
----| -------------------
AAPL| 2022.01.02 955.4843
GOOG| 2022.01.02 886.0093
MSFT| 2022.01.01 719.9879
')) 

inplace

The inplace keyword provides the ability for a user to overwrite the representation of the object which they are querying.

This functionality is set to False by default but will operate effectively on in-memory table objects for the select, update and delete query types.

If set to True the input table can be overwritten as follows

Python

Copy
>>> trades.delete(where=kx.Column('sym').isin(['AAPL']), inplace=True)
pykx.Table(pykx.q('
sym  date       price    size
-----------------------------
MSFT 2022.01.02 533.9187 92  
MSFT 2022.01.02 17.17696 7   
GOOG 2022.01.03 916.1286 60  
MSFT 2022.01.03 282.4291 98  
MSFT 2022.01.03 46.11964 93 
..
'))
>>> trades
pykx.Table(pykx.q('
sym  date       price    size
-----------------------------
MSFT 2022.01.02 533.9187 92
MSFT 2022.01.02 17.17696 7
GOOG 2022.01.03 916.1286 60
MSFT 2022.01.03 282.4291 98
MSFT 2022.01.03 46.11964 93
..
'))

Query types

While this page discusses primarily the Pythonic API for querying kdb+ tables locally. The following describes some of the other ways that queries can be completed

Local queries

qSQL equivalent query for comparison:

Python

Copy
>>> kx.q('select from trades where price=max price')
pykx.Table(pykx.q('
sym  date       price   
------------------------
AAPL 2022.01.01 983.0794
'))

Access query API off the table object:

Python

Copy
>>> trades.select(where=kx.Column('price') == kx.Column('price').max())
pykx.Table(pykx.q('
sym  date       price   
------------------------
AAPL 2022.01.01 983.0794
'))

Direct use of the kx.q.qsql query APIs taking the table as a parameter:

Python

Copy
>>> kx.q.qsql.select(trades, where=kx.Column('price') == kx.Column('price').max())
pykx.Table(pykx.q('
sym  date       price   
------------------------
AAPL 2022.01.01 983.0794
'))

Passing a string will query the table of that name in q memory:

Python

Copy
>>> kx.q.qsql.select('trades', where=kx.Column('price') == kx.Column('price').max())
pykx.Table(pykx.q('
sym  date       price   
------------------------
AAPL 2022.01.01 983.0794
'))

Remote Queries

Queries can also be performed over IPC to remote servers.

Python

Copy
>>> conn = kx.SyncQConnection(port = 5000)
>>> conn.qsql.select('trades', where=kx.Column('price') == kx.Column('price').max())
pykx.Table(pykx.q('
sym  date       price   
------------------------
AAPL 2022.01.01 983.0794
'))

Query classes

Column

See pykx.Column for full documentation on this class.

And operator &

Using & on two Column objects will return a QueryPhrase which describes the underlying construct which is used to query your table.

Python

Copy
>>> qp =(kx.Column('sym') == 'GOOG') & (kx.Column('price') > 500)
>>> type(qp)
<class 'pykx.wrappers.QueryPhrase'>
>>> qp._phrase
[[pykx.Operator(pykx.q('=')), 'sym', [pykx.SymbolAtom(pykx.q('`GOOG'))]], [pykx.Operator(pykx.q('>')), 'price', pykx.LongAtom(pykx.q('500'))]]
>>> trades.select(where=qp)
pykx.Table(pykx.q('
sym  date       price   
------------------------
GOOG 2022.01.03 976.1246
GOOG 2022.01.02 716.2858
GOOG 2022.01.03 872.5027
GOOG 2022.01.02 962.5156
GOOG 2022.01.01 589.7202
..
'))

Additional Column objects can & off a QueryPhrase to further build up more complex queries.

Or operator |

Using | on two Column objects will return a Column object.

Python

Copy
>>> c =(kx.Column('price') < 100) | (kx.Column('price') > 500)
>>> type(c)
<class 'pykx.wrappers.Column'>
>>> c._value
[pykx.Operator(pykx.q('|')), [pykx.Operator(pykx.q('<')), 'price', pykx.LongAtom(pykx.q('100'))], [pykx.Operator(pykx.q('>')), 'price', pykx.LongAtom(pykx.q('500'))]]
>>> trades.select(where=c)
pykx.Table(pykx.q('
sym  date       price   
------------------------
AAPL 2022.01.01 542.6371
AAPL 2022.01.01 77.57332
MSFT 2022.01.01 637.4637
GOOG 2022.01.03 976.1246
MSFT 2022.01.03 539.6816
..
'))

Note

Column objects can not apply | off a QueryPhrase. Presently these are restricted only to operations on two kx.Column objects.

Python operators

The following Python operators can be used with the Column class to perform analysis on your data

Python operator

q operation

Magic method

+ + __add__
- - __sub__
- - __rsub__
* * __mul__
/ % __truediv__
/ % __rtruediv__
// div __floordiv__
// div __rfloordiv__
% mod ___mod__
** xexp __pow__
== = __eq__
!= <> __ne__
> > __gt__
>= >= __ge__
< < __lt__
<= <= __le__
pos abs __pos__
neg neg __neg__
floor floor __floor__
ceil ceiling __ceil__
abs abs __abs__

The following are a few examples of this various operations in use

  • Finding rows where price is greater than or equal to half the average price:

    Python

    Copy
    >>> trades.select(where=kx.Column('price') >= kx.Column('price').avg() / 2)
    pykx.Table(pykx.q('
    sym  date       price   
    ------------------------
    AAPL 2022.01.01 542.6371
    MSFT 2022.01.01 637.4637
    GOOG 2022.01.03 976.1246
    MSFT 2022.01.03 539.6816
    GOOG 2022.01.02 716.2858
    ..
    '))
  • Apply the math libraries floor operation on the column price updating its value:

    Python

    Copy
    >>> from math import floor
    >>> trades.update(floor(kx.Column('price')))
    pykx.Table(pykx.q('
    sym  date       price size
    --------------------------
    AAPL 2022.01.01 145   19  
    MSFT 2022.01.02 533   92  
    MSFT 2022.01.02 17    7   
    GOOG 2022.01.03 916   60  
    AAPL 2022.01.02 140   54  
    ..
    '))

PyKX methods

In addition to support for the Python operators outlined above PyKX provides a number of analytic methods and properties for the kx.Column objects. In total there are more than 100 analytic methods supported ranging from a basic method to retrieve the maximum value of a column, to more complex analytics for the calculation of the weighted average between two vectors.

The following drop-down provides a list of the supported methods, with full details on the API page here.

The following provides a complex example of a user generated query to calculate trade statistics and time-weighted average spread information associated with a Trade and Quote tables making use of the following methods.

Python

Copy
def generate_twap(trade, quote, start_time, end_time, syms = None):
    if syms is None:
        syms = trade.exec(kx.Column('sym').distinct())

    quote_metrics = quote.select(
        columns = (kx.Column('ask') - kx.Column('bid')).avg().name('avg_spread') &
                  (kx.Column('time').next_item() - kx.Column('time')).wavg(kx.Column('ask') - kx.Column('bid')).name('twa_spread') &
                  ((kx.Column('asize') + kx.Column('bsize')).avg().name('avg_size') * 0.5) &
                  (kx.Column('time').next_item() - kx.Column('time')).avg().name('avg_duration'),
        by = kx.Column('sym'),
        where = kx.Column('sym').isin(syms) & kx.Column('time').within(start_time, end_time)
        )

    trade_metrics = trade.select(
        columns = (2 * kx.Column('price').dev()).name('std_dev') &
                  (kx.Column('time').next_item() - kx.Column('time')).wavg(kx.Column('price')).name('std_dev') &
                  kx.Column('price').max().name('max_price') &
                  kx.Column('price').min().name('min_price') &
                  kx.Column('size').wavg(kx.Column('price')).name('vwap'),
        by = kx.Column('sym'),
        where = kx.Column('sym').isin(syms) & kx.Column('time').within(start_time, end_time)
        )

    return kx.q.uj(quote_metrics, trade_metrics)

Variable

See pykx.Variable for full documentation on this class.

In some cases when operating at the interface of q and Python analytics you may wish to perform a comparison or analytic which makes use of a named variable from q.

The following example shows this in action

Python

Copy
>>> kx.q['filter']='GOOG'
>>> trades.select(where=kx.Column('sym') == kx.Variable('filter'))
pykx.Table(pykx.q('
sym  date       price
------------------------
GOOG 2022.01.03 976.1246
GOOG 2022.01.02 716.2858
GOOG 2022.01.03 872.5027
GOOG 2022.01.02 962.5156
GOOG 2022.01.01 589.7202
..
'))

Advanced features

Custom functions

While there is an extensive list of functions/analytics that are supported by the API it does not cover all analytics that you, or users of an extension you are writing may need.

To facilitate this you have access to the pykx.register.column_function, this function provides the ability to define methods off your defined pykx.Column objects. This function should take the column on which the function is being performed as it's first argument and the call method should be used to apply your analytic.

The call method takes as it's first argument the function you wish to apply and can take multiple positional arguments.

For example, let's consider the following cases:

  • Define a function applying a min-max scaling against the price column of a table

    Python

    Copy
    >>> def min_max_scaler(column):
    ...     return column.call('{(x-minData)%max[x]-minData:min x}')
    >>> kx.register.column_function('minmax', min_max_scaler)
    >>> trades.update(kx.Column('price').minmax().name('scaled_price'))
    pykx.Table(pykx.q('
    sym  date       price    size scaled_price
    ------------------------------------------
    MSFT 2022.01.02 533.9187 92   0.5337153   
    MSFT 2022.01.02 17.17696 7    0.004702399 
    GOOG 2022.01.03 916.1286 60   0.9250016   
    MSFT 2022.01.03 282.4291 98   0.2762535   
    MSFT 2022.01.03 46.11964 93   0.03433238
    ..
    '))
  • Define a function which multiplies two columns together and calculates the log returning the result

    Python

    Copy
    >>> def log_multiply(column1, column2):
    ...     return column1.call('{log x*y}', column2)
    >>> kx.register.column_function('log_multiply', log_multiply)
    >>> trades.select(kx.Column('price').log_multiply(kx.Column('size')))
    pykx.Table(pykx.q('
    price   
    --------
    10.80203
    4.789479
    10.9145 
    10.22839
    8.363838
    ..
    '))

fby queries

Complex queries often require the application of a function on data grouped by some condition, in many cases the application of a by clause is sufficient to get the information you need, however you will run into cases where you need to filter-by a certain condition.

Let's assume you want to find the stock information by symbol, where the price is the maximum price:

Python

Copy
>>> trades.select(where=kx.Column('price') == kx.Column.fby(kx.Column('sym'), kx.q.max, kx.Column('price')))
pykx.Table(pykx.q('
sym  date       price    size
-----------------------------
MSFT 2022.01.03 977.1655 92  
AAPL 2022.01.02 996.8898 20  
GOOG 2022.01.03 971.9498 47  
'))

Using iterators

Not all analytics that you may wish to run on your table expect to take the full content of a column(s) as input, for example in some cases you may wish to apply an analytic on each row of a column. While operations which rely on iterators may be slower than purely vectorized, operations they may be necessary.

PyKX supports the following iterators, a number of examples are provided below

Iterator

Type

Link

each map Each
peach map Peach
\: map Each Left
/: map Each Right
\:/: map Each Left-Each Right
/:\: map Each Right-Each Left
' map Case
': map Each Prior
/ accumulator Over
\ accumulator Scan

Example 1

Calculate the maximum value of each row of a column x:

Python

Copy
>>> table = kx.Table(data={'x': [[10, 5, 4], [20, 30, 50], [1, 2, 3]]})
pykx.Table(pykx.q('
x       
--------
10 5  4 
20 30 50
1  2  3 
'))
>>> table.select(kx.Column('x').max(iterator='each'))
pykx.Table(pykx.q('

--
10
50
3 
'))

Example 2

Join the characters associated from two columns row wise using the ' iterator:

Python

Copy
>>> table = kx.Table(data={'x': b'abc', 'y': b'def'})
pykx.Table(pykx.q('
x y
---
a d
b e
c f
'))
>>> table.select(kx.Column('x').join(kx.Column('y'), iterator="'"))
pykx.Table(pykx.q('
x   
----
"ad"
"be"
"cf"
'))

Example 3

Join the characters "_xy" to all rows in a column x:

Python

Copy
>>> table = kx.Table(data={'x': b'abc', 'y': b'def'})
pykx.Table(pykx.q('
x y
---
a d
b e
c f
'))
>>> table.select(kx.Column('x').join(b"_xy", iterator='\\:'))
pykx.Table(pykx.q('
x     
------
"a_xy"
"b_xy"
"c_xy"
'))

Next steps

After learning to query data with the Pythonic API, explore other querying methods:

  • Query your data using SQL.

  • If you want to upskill and learn how to query directly using q.

  • Make your queries more performant by following a few guidelines.

For some further reading, here are some related topics:

  • If you don't have a historical database available visit our Databases page.

  • Learn about creating PyKX Table objects from the PyKXIntroduction notebook.