PyKX conversion considerations
This page provides details on data types and conversions in PyKX.
PyKX attempts to make conversions between q and Python as seamless as possible. However, due to differences in their underlying implementations, there are cases where one-to-one mappings are not possible.
Data types and conversions
The key PyKX APIs around data types and conversions are outlined under:
- Convert Pythonic data to PyKX
- PyKX type wrappers
- PyKX to Pythonic data type mapping
- Registering custom conversions
Text representation in PyKX
Handling and converting text in PyKX requires consideration, as there are some key differences between the Symbol and Char data types.
Nulls and infinities
Most q data types have the concepts of null, negative infinity, and infinity. Python has no equivalent infinities for most of these types (only floats have them), and its null behavior differs in implementation. The page handling nulls and infinities details the needed considerations when dealing with these special values.
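As a rough illustration of why these values need care, the sketch below uses only the Python standard library. It assumes the commonly documented q convention that long nulls and infinities are stored as reserved 64-bit integer values. The constant and function names are hypothetical and are not part of the PyKX API.

```python
import math

# Assumption: q represents long null and infinities as reserved int64 values.
# These names are illustrative only; they are not PyKX identifiers.
Q_LONG_NULL = -2**63             # q: 0N
Q_LONG_INF = 2**63 - 1           # q: 0W
Q_LONG_NEG_INF = -(2**63 - 1)    # q: -0W

# Python floats can represent infinity and NaN natively.
assert math.isinf(float("inf"))
assert math.isnan(float("nan"))


def describe_long(value: int) -> str:
    """Classify a raw int64 value under the assumed q long conventions."""
    if value == Q_LONG_NULL:
        return "null"
    if value == Q_LONG_INF:
        return "infinity"
    if value == Q_LONG_NEG_INF:
        return "negative infinity"
    return "value"
```

Python integers are unbounded and have no infinity of their own, so a reserved value such as `2**63 - 1` looks like an ordinary large number once it has been converted. This is one reason a conversion layer has to decide explicitly how to treat these values.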
Temporal data types
Converting temporal data types in PyKX involves handling timestamp/datetime types and duration types, each with specific considerations due to differences in how Python and q (the language used by kdb+) represent these data types.
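One such difference can be sketched with the standard library alone. The example assumes that q timestamps count nanoseconds from 2000-01-01, while Python's datetime tooling is oriented around the 1970-01-01 Unix epoch and resolves only to microseconds. The helper name is hypothetical and is not a PyKX function.

```python
from datetime import datetime, timedelta

# Assumption: q timestamps are nanoseconds since 2000-01-01T00:00:00.
Q_EPOCH = datetime(2000, 1, 1)
UNIX_EPOCH = datetime(1970, 1, 1)

# Whole seconds between the two epochs, expressed in nanoseconds.
EPOCH_OFFSET_NS = ((Q_EPOCH - UNIX_EPOCH) // timedelta(seconds=1)) * 1_000_000_000


def q_nanos_to_datetime(q_nanos: int) -> datetime:
    """Convert nanoseconds since the q epoch to a naive Python datetime.

    datetime only has microsecond resolution, so the sub-microsecond part
    of the input is discarded by the floor division below.
    """
    return Q_EPOCH + timedelta(microseconds=q_nanos // 1000)
```

The loss of the final three digits of precision in this sketch is the kind of detail that makes NumPy's nanosecond-capable `datetime64` a more natural target than `datetime.datetime` for timestamp data.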
List conversion considerations
By default, the library converts generic PyKX List objects (pykx.List) to NumPy as an array of NumPy arrays. This conversion is chosen because it allows the most flexible representation of data: ragged arrays and mixed lists of objects can be converted easily. However, this representation can be difficult to work with when dealing with multi-dimensional numeric data, as is common in machine learning tasks.
As an example, consider the conversion of a 3-dimensional, regularly shaped pykx.List object to a NumPy array:
Python
>>> import pykx as kx
>>> qlist = kx.random.random([2, 2, 2], 5.0)
>>> qlist
pykx.List(pykx.q('
3.453383 3.388243 0.8355005 4.325851
0.6168138 3.450051 3.849182 2.360245
'))
>>> qlist.np()
array([array([array([3.45338272, 3.3882429 ]), array([0.83550046, 4.32585143])],
dtype=object) ,
array([array([0.61681376, 3.45005117]), array([3.84918233, 2.36024517])],
dtype=object) ],
dtype=object)
This representation is clearly more difficult to handle than you might expect for a regularly shaped numeric dataset of a single type. The reshape keyword argument is provided to give a better converted representation of these single-typed N-dimensional lists, for example:
Python
>>> import pykx as kx
>>> qlist = kx.random.random([2, 2, 2], 5.0)
>>> qlist.np(reshape=True)
array([[[3.45338272, 3.3882429 ],
[0.83550046, 4.32585143]],
[[0.61681376, 3.45005117],
[3.84918233, 2.36024517]]])
Setting the reshape keyword to True checks that the input list is "rectangular" and contains only one data type, then converts it to a single NumPy array by 'razing' the data to a single array and reshaping it in NumPy after conversion.
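The rectangularity check and the 'raze' step can be pictured with a small pure-Python sketch over nested lists. This is not PyKX's implementation. The function names are hypothetical, and the sketch checks only the shape, not that all elements share one data type.

```python
from typing import Optional, Tuple


def regular_shape(nested) -> Optional[Tuple[int, ...]]:
    """Return the shape of a nested list if it is rectangular, else None."""
    if not isinstance(nested, (list, tuple)):
        return ()  # a scalar has an empty shape
    if len(nested) == 0:
        return (0,)
    first = regular_shape(nested[0])
    if first is None:
        return None
    # Every sibling must have exactly the same shape as the first element.
    for item in nested[1:]:
        if regular_shape(item) != first:
            return None
    return (len(nested),) + first


def raze(nested) -> list:
    """Flatten a nested list completely into a single flat list."""
    if not isinstance(nested, (list, tuple)):
        return [nested]
    flat = []
    for item in nested:
        flat.extend(raze(item))
    return flat
```

For a regular input such as `[[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]`, `regular_shape` yields `(2, 2, 2)` and `raze` yields the eight values in order, which could then be reshaped to that shape. Note that the check visits every element of the input, so its cost grows with the total size of the list.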
This check can be slow for deeply nested arrays or lists with many elements. If you know the shape of the data ahead of time, you can pass it to the reshape keyword like this:
Python
>>> import pykx as kx
>>> qlist = kx.random.random([10000, 100, 10], 10.0)
>>> qlist.np(reshape=[10000, 100, 10])
array([[[4.99088645, 9.20164969, 3.3486574 , ..., 9.28529354,
7.78650336, 0.9355585 ],
[9.49664481, 0.79703755, 8.41364461, ..., 5.28080439,
7.3933825 , 7.40476901],
[6.03204263, 9.40702084, 6.75116092, ..., 2.43375089,
9.33645056, 8.56930709],
...
The performance boost from knowing the shape ahead of time is significant. In the timings below, passing the shape is roughly 12 times faster (81.2 ms versus 974 ms):
Python
import pykx as kx
qlist = kx.random.random([10000, 100, 10], 10.0)
%timeit qlist.np(reshape=True)
# 974 ms ± 34.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit qlist.np(reshape=[10000, 100, 10])
# 81.2 ms ± 2.69 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)