How to Work with Files
This guide shows you how to work with files in KDB-X: read/write text and binary, use the native format, stream large inputs, and apply compression/encryption. It also covers key file/system commands and external interfaces (ODBC, Arrow/Parquet, HDF5, Python).
In this guide, you will learn how to:
-
Use file handles and resolve file paths in q
-
Read and write text files (CSV, fixed-width)
-
Stream large inputs by line, size, or pipe
-
Read and write binary files (raw, fixed-width) and work with offsets and lengths
-
Load and save KDB-X formatted data (
load
,rload
,get
,save
,rsave
,set
) -
Choose and apply compression and encryption (including defaults through
.z.zd
)
By the end, you will know how to: work with mapped lists (anymap
) and splayed/partitioned data (dsave
), use .Q
utilities for enumeration and database layout, inspect file sizes and directories, run OS commands from q, and connect to external interfaces such as ODBC, Arrow/Parquet, HDF5, and Python.
Overview
File handles
In q, you use a symbol in the form of a file handle to identify files or directories. A file handle is a symbol that starts with a colon. For example, this is a file handle (symbol data type) for the file new.txt
:
q
`:new.txt
The format of a file handle can start with an optional relative or absolute file path, followed by the file name. Use hsym
to convert an existing symbol to a symbol that follows the symbol handle format:
q
q)hsym `$"/tmp/test.txt"
`:/tmp/test.txt
File paths
q reads files from relative or absolute paths.
When q resolves a relative file path, it searches locations in the following order:
-
the current directory
-
the directory specified by the
QHOME
environment variable -
the directory specified by the
QLIC
environment variable
For example, if a file new.txt
containing "Im stored in QHOME!!"
exists in the directory specified by the QHOME
environment variable, trying to read new.txt
while in a different directory resolves to using QHOME
:
q
q)getenv `QHOME / location where new.txt stored
"/Users/myuser/Development/q"
q)\pwd / current directory (does not contain new.txt)
"/tmp"
q)read0 `:new.txt / read new.txt
"Im stored in QHOME!!"
Creating another new.txt
file containing "Im stored in the current directory!!!"
in the current directory and retrying will read that instead:
q
q)\pwd / location where new.txt stored
"/tmp"
q)read0 `:new.txt
"Im stored in the current directory!!!"
Text files
Text conversion
In addition to the text-handling facilities built into the q language (such as ssr
,sv
, and vs
), q also provides functionality through its core namespaces.
General conversions
The .h (markup) namespace
offers utilities for converting data to and from different formats.
Note
You usually do not need the CSV conversion utilities in .h
, because q already handles CSV directly in its read and write functions.
The .Q
namespace includes additional text utilities such as .Q.s1
, which produces a string representation of an object.
q
q).Q.s til 20
"0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19\n"
JSON conversion
The .j (JSON) namespace
converts data to and from JSON (JavaScript Object Notation). For example, you can convert a dictionary to a JSON text string:
q
q)d:`a`b!(0 1;("hello";"world"))
q)show d
a| 0 1
b| "hello" "world"
q)show j:.j.j d
"{\"a\":[0,1],\"b\":[\"hello\",\"world\"]}"
You can then convert the JSON string back into a dictionary:
q
q).j.k j
a| 0 1
b| "hello" "world"
Because q represents JSON as a string, you can also use it with the built-in KDB-X facilities for reading and writing text files.
Read text files
Read raw text
Use read0
to read the contents of a text file as a list of strings.
For example, if the file test.txt
contains:
txt
hello
world
you can read it with:
q
q)read0 `:test.txt
"hello"
"world"
If you want to read the file without converting each line into a separate string, refer to the Read text as binary data section.
You can also read from a file using an offset and a length. In the following example, q starts 6 characters from the beginning of the test.txt
file and reads the next 5 characters:
q
q)first read0 (`:test.txt;6;5)
"world"
Deliminated files (CSV)
Use 0:
to load files that contain deliminated field values, such as CSV (comma-separated value) files. q parses the file into records based on the KDB-X data type you specify for each field (or whether to skip a field). Refer to the column type reference for details.
For example, given a file test.csv
:
csv
DOB,NAME,ID
20130315,Jim,3404
19760113,Sam,2311
33120212,Ian,2211
you can parse it with:
q
("DSJ";enlist ",") 0: `:test.csv
DOB NAME ID
--------------------
2013.03.15 Jim 3404
1976.01.13 Sam 2311
3312.02.12 Ian 2211
Because you can specify the delimiter, you can also parse files that use other separators. For example, a file that uses tab characters between fields:
q
q)("DSJ";enlist "\t") 0: `:tab.txt
DOB NAME ID
--------------------
2013.03.15 Jim 3404
1976.01.13 Sam 2311
3312.02.12 Ian 2211
If a file does not include a header row, such as:
csv
20130315,Jim,3404
19760113,Sam,2311
33120212,Ian,2211
parse it by omitting enlist
when specifying the delimiter:
q
q)("DSJ";",") 0: `:test.csv
2013.03.15 1976.01.13 3312.02.12
Jim Sam Ian
3404 2311 2211
Multithreaded load
For large files, you can improve performance by specifying additional threads.
Auto CSV type mapping
The utility script KxSystems/kdb/utils/csvguess.q
automatically generates CSV loader scripts. This is especially useful for very wide or long CSV files, where specifying the correct type for each column manually would be time-consuming.
The script also includes:
-
an optimized on-disk sorter
-
the ability to create a loader that enumerates all symbol columns and loads them quickly (parallel loading processes only need to read the sym file)
You can find the range of command-line options for the script at the start of its source file.
Fixed width
Use 0:
to load text files that contain fixed-length records. q converts each field into a suitable data type for easier analysis. You parse the file into records by defining the corresponding kKDB-X data type of each field and its length. Refer to the column types and formats documentation for details. For example, given a file users.txt
containing the following:
txt
1001 198.00James STAFF 1997.01.01
1002 142.00Sandra STAFF 1976.01.12
1003 144.12Fred STAFF 2000.01.23
The example file starts with a 4 character long (type J
), followed by 8 characters which contain a float (type F
) and so on. Use the space type ' '
to ignore a column. In this example, the parser ignores the field that always contains STAFF
.
The size of each field can be smaller (due to a smaller range) or larger (if padded with whitespace) than the typical range for that data type. You parse the file with the following instruction:
q
q)("JFS D";4 8 7 6 10) 0: `users.txt
1001 1002 1003
198 142 144.12
James Sandra Fred
1997.01.01 1976.01.12 2000.01.23
Because each record has a fixed length, you can also specify an offset and a length to read a particular range of records. In the previous example file, each record is 36-characters long (including the return character). To read 2 records after the first record, use an offset of 36
and a length of 72
:
q
q)("JFS D";4 8 7 6 10) 0: (`users.txt;36;72)
1002 1003
142 144.12
Sandra Fred
1976.01.12 2000.01.23
Multithreaded load
For large files, you can improve performance by specifying additional threads.
Stream data from files
When a file is large, loading the entire file into memory can be prohibitive. Streaming lets you load and process a subsection of the file, in sequence, reducing the memory footprint.
Streaming by line
Use .Q.fs
to stream an input file line by line, or in chunks of multiple lines. You can add a user-defined function to control how each chunk of data is processed. For example, given a file data.csv
:
csv
2019-10-03, 24.5, 24.51, 23.79, 24.13, 19087300, AMD
2019-10-03, 27.37, 27.48, 27.21, 27.37, 39386200, MSFT
2019-10-04, 24.1, 25.1, 23.95, 25.03, 17869600, AMD
2019-10-04, 27.39, 27.96, 27.37, 27.94, 82191200, MSFT
2019-10-05, 24.8, 25.24, 24.6, 25.11, 17304500, AMD
2019-10-05, 27.92, 28.11, 27.78, 27.92, 81967200, MSFT
2019-10-06, 24.66, 24.8, 23.96, 24.01, 17299800, AMD
2019-10-06, 27.76, 28, 27.65, 27.87, 36452200, MSFT
can be read with .Q.fs
, which processes a specified number of lines at a time, which are used as the parameter to a single-parameter function (in this example, 0N!
displays the value).
q
q).Q.fs[{0N!x}]`:data.csv
("2019-10-03, 24.5, 24.51, 23.79, 24.13, 19087300, AMD";"2019-10-03, 27.37, 27.48, 27.21, 27.37, 39386200, MSFT";"2019-10-04, 24.1, 25.1, 23.95, 25.03, 17869600, AMD";"2019-10-04, 27.39, 27.96, 27.37, 27.94, 82191200, MSFT";"2019-10-05, 24.8, 25.24, 24.6, 25.11, 17304500, AMD";"2019-10-05, 27.92, 28.11, 27.78, 27.92, 81967200, MSFT";"2019-10-06, 24.66, 24.8, 23.96, 24.01, 17299800, AMD";"2019-10-06, 27.76, 28, 27.65, 27.87, 36452200, MSFT")
436
Given the file is a CSV file, you can use 0:
to parse each line into their appropriate data types.
q
q).Q.fs[{0N!("DFFFFIS";",")0:x}]`:data.csv
(2019.10.03 2019.10.03 2019.10.04 2019.10.04 2019.10.05 2019.10.05 2019.10.06 2019.10.06;24.5 27.37 24.1 27.39 24.8 27.92 24.66 27.76;24.51 27.48 25.1 27.96 25.24 28.11 24.8 28;23.79 27.21 23.95 27.37 24.6 27.78 23.96 27.65;24.13 27.37 25.03 27.94 25.11 27.92 24.01 27.87;19087300 39386200 17869600 82191200 17304500 81967200 17299800 36452200i;`AMD`MSFT`AMD`MSFT`AMD`MSFT`AMD`MSFT)
436
In this example, the file has 9 360 lines. For each chunk, q creates a table (using flip
) and appends it to the table t
with the join operator (,
). This creates a table with 9 360 rows. Next, use first
to inspect the first row.
q
q).Q.fs[{t,:flip `date`open`high`low`close`volume`sym!("DFFFFIS";",")0:x}]`:data.csv
q)count t
9360
q)first t
date | 2019.10.03
open | 24.5
high | 24.51
low | 23.79
close | 24.13
volume| 19087300i
sym | `AMD
Streaming by size
Use .Q.fsn
to stream an input file, for a given bytes size over each line. You can pass a user-defined function on how to process each chunk of data. For example, given an input file numbers
with the following contents:
txt
0123456789
0123456789
can be read in chunks of 3 bytes, which are used as the parameter to a single-parameter function (here just using `0N!`
to display the value):
q
q).Q.fsn[{0N!x};`:numbers;3i]
,"012"
,"345"
,"678"
,,"9"
,"012"
,"345"
,"678"
,,"9"
22
Streaming from pipe
.Q.fps
and .Q.fpn
provide the ability to stream data from a fifo/named pipe. This is useful for many applications, such as streaming data directly from a compressed file without having to decompress the contents to disk. For example, using a CSV file (t.csv
) with the contents:
csv
MSFT,12:01:10.000,A,O,300,55.60
APPL,12:01:20.000,B,O,500,67.70
IBM,12:01:20.100,A,O,100,61.11
MSFT,12:01:10.100,A,O,300,55.60
APPL,12:01:20.100,B,O,500,67.70
IBM,12:01:20.200,A,O,100,61.11
MSFT,12:01:10.200,A,O,300,55.60
APPL,12:01:20.200,B,O,500,67.70
IBM,12:01:20.200,A,O,100,61.11
MSFT,12:01:10.300,A,O,300,55.60
APPL,12:01:20.400,B,O,500,67.70
IBM,12:01:20.500,A,O,100,61.11
MSFT,12:01:10.500,A,O,300,55.60
APPL,12:01:20.600,B,O,500,67.70
IBM,12:01:20.600,A,O,100,61.11
MSFT,12:01:10.700,A,O,300,55.60
APPL,12:01:20.700,B,O,500,67.70
IBM,12:01:20.800,A,O,100,61.11
MSFT,12:01:10.900,A,O,300,55.60
APPL,12:01:20.900,B,O,500,67.70
IBM,12:01:20.990,A,O,100,61.11
If the file is compressed into a ZIP archive (t.zip
), the system command unzip
has the option to uncompress to stdout
, which can be combined with a fifo. The following loads the CSV file through a FIFO without having the intermediary step of creating the unzipped file:
q
q)system"rm -f fifo && mkfifo fifo"
q)trade:flip `sym`time`ex`cond`size`price!"STCCFF"$\:()
q)system"unzip -p t.zip > fifo &"
q).Q.fps[{`trade insert ("STCCFF";",")0:x}]`:fifo
q)trade
Alternatively, if the file was compressed using gzip
(t.gz
), use the system command gunzip
:
q
q)system"rm -f fifo && mkfifo fifo"
q)trade:flip `sym`time`ex`cond`size`price!"STCCFF"$\:()
q)system"gunzip -cf t.gz > fifo &"
q).Q.fps[{`trade insert ("STCCFF";",")0:x}]`:fifo
q)trade
Write text files
Write raw text
You can write strings to a file as text by using 0:
. For example, the following
q
q)`:test.txt 0: ("hello";"world")
`:test.txt
creates a file called test.txt
which contains:
txt
hello
world
Use enlist
to write a single string:
q
q)`:test.txt 0: enlist "hello"
creates a file called test.txt
which contains:
txt
hello
Write using hopen
Use hopen
to create or append to a text file. Remember to always call hclose
when you finish writing. The following example creates a file /tmp/new.txt
and returns the handle to h
.
q
q)h:hopen `:/tmp/new.txt
To append the text, call the file handle and pass the string you want to append:
q
q)h "one";
q)read0 `:/tmp/new.txt
"one"
q)h "two";
q)read0`:/tmp/new.txt
"onetwo"
Negating the file handle adds a new line character to the end of the string (character vector). Passing a vector of strings adds new line characters to the end of each string.
q
q)neg[h] "three";
q)neg[h] "four";
q)read0 `:/tmp/new.txt
"onetwothree"
"four"
q)neg[h] ("five";"six";"seven");
q)read0 `:/tmp/new.txt
"onetwothree"
"four"
"five"
"six"
"seven"
q)hclose h
Write and format
save
formats different KDB-X data types into a range of text-based formats. For example, saving the table t
as csv
:
q
q)t:([]a:1 2 3;b:4 5 6)
q)save `t.csv
`:t.csv
results in a t.csv
file containing:
csv
a,b
1,4
2,5
3,6
whereas saving as xml
q
q)t:([]a:1 2 3;b:4 5 6)
q)save `t.xml
`:t.xml
results in a t.xml
containing:
xml
<R>
<r><a>1</a><b>4</b></r>
<r><a>2</a><b>5</b></r>
<r><a>3</a><b>6</b></r>
</R>
Supported file extensions include: .csv
, .txt
, .xls
, .xml
, .json
, and no extension for saving in q binary format.
Binary files
Read binary files
Read raw binary
read1
reads the contents of a file as a byte vector. The following example reads a 40-byte file that contains 10 integers in the range 0
to 9
:
q
q)read1 `:integers.dat
0x00000000010000000200000003000000040000000500000006000000070000000800000009000000
You can specify an offset and length to read a subsection of a file. In this example, q reads the same file starting at a 4-byte offset and loads 36 bytes from that point:
q
q)read1 (`:hopen.dat;4;36)
0x010000000200000003000000040000000500000006000000070000000800000009000000
Read text as binary data
You can also use read1
to read files as text, by casting the resulting bytes to characters. This approach lets you process newline or carriage-return characters and returns the file contents of the file as a single string. For example, a file test.txt
containing:
txt
hello
world
can be read with:
q
q)"c"$read1 `:test.txt
"hello\nworld\n"
Fixed width binary
1:
provides the ability to read binary data by specifying the data types, widths, and endianness. Refer to the range of types documentation for more details. It can read an entire file by repeatedly parsing each fixed-width record that you define.
For example, the following instruction reads each record of two 4-byte integers from a file test.dat
that contains 10
4-byte integers (0...9)
q
q)("ii";4 4)1:`:test.dat
0 2 4 6 8
1 3 5 7 9
You can also specify an offset and length to read a subsection of the file. Reading the same file, but starting at a 4-byte offset, and only reading 8 bytes from that point:
q
q)("ii";4 4)1:(`:test.dat;4;8)
1
2
This is useful when a file contains a header or footer that doesn’t fit the specification of the other fixed-width record within the file.
Multithreaded load
For large files, you can improve performance by specifying additional threads.
Write binary files
Write raw binary
To write raw binary data use 1:
. For example, the following creates file.txt
containing "hello"
, as the bytes represent the word in ASCII (for example, 0x68 is h
, 0x65 is e
).
q
q)`:file.txt 1: 0x68656c6c6f
The following example uses til
to create a 400-byte file that contains a hundred 4-byte integers, ranging from 0
to 99
:
q
q)`:file.txt 1:"i"$til 100
Note
Writing the vector above uses the binary representation of the values, which is different from saving in the KDB-X file format. To use the KDB-X file format, you must apply the suite of save options described in KDB-X formatted files.
For example, using the binary write to save the a
vector with or without the sorted attribute applied results in the same file. The md5
keyword hashes the contents of each resulting file, to check contents for comparison. The attribute is not saved.
q
q)a:til 10
q)b:`s#til 10
q)a
0 1 2 3 4 5 6 7 8 9
q)b
`s#0 1 2 3 4 5 6 7 8 9
q)`:a 1: a
`:a
q)`:b 1: b
`:b
q)md5 "c"$read1 `:a
0xcdff2051b710c66f3373e26c901ee505
q)md5 "c"$read1 `:b
0xcdff2051b710c66f3373e26c901ee505
Write compressed data
You can compress binary data at the time of writing by choosing from a range of compression options. Use -21!
on the compressed file to display compression details and statistics.
q
q)(`:file;17;2;9) 1:"i"$til 1000000
`:file
q)-21!`:file
compressedLength | 1383683
uncompressedLength| 4000000
algorithm | 2i
logicalBlockSize | 17i
zipLevel | 9i
Use .z.zd
to set global defaults for writing compressed files. This prevents you from having to specify compression parameters on every write.
q
q).z.zd:(17;2;9)
q)`:file 1:"i"$til 1000000
`:file
q)-21!`:file
compressedLength | 1383683
uncompressedLength| 4000000
algorithm | 2i
logicalBlockSize | 17i
zipLevel | 9i
q)`:another_file 1:til 10000
`:another_file
q)-21!`:another_file
compressedLength | 15194
uncompressedLength| 80000
algorithm | 2i
logicalBlockSize | 17i
zipLevel | 9i
Write using hopen
Use hopen
to create or append to a binary file. Always call hclose
when you finish with the file handle:
q
q)h:hopen `:test.dat
q)h "a"
6i
q)h "bc"
6i
q)h 0x21
6i
q)hclose h
This produces a file test.dat
containing abc!
with no newlines, because no newline characters were written. The last byte written is the ASCII hex code for the exclamation mark, which is why it appears as !
when viewed in a text editor. These are raw binary files, not KDB-X data files, as you can notice if you're trying to use get
to read a KDB-X formatted data file.
q
q)get `:test.dat
'test.dat
[0] get `:test.dat
^
KDB-X formatted files
The KDB-X file format is a proprietary format. It adds a small amount of extra bytes to record metadata such as data types, attributes, and lengths. This structure allows q to load data back into memory quickly and efficiently, and to save data with the same benefits.
Read KDB-X data
Using load
You can read KDB-X files with load
. This reads a file (or a directory of files) and creates a variable using the name provided. For example, loading a sorted list of a million longs:
q
q)load `a
`a
q)count a
1000000
or loading directory t
that contains files a
and b
:
q
q)load `:t
`t
q)t
a| 1 2 3
b| 4 5 6
Using rload
rload
loads a splayed table from a given directory. For example, to load a splayed table stored in directory t
, run:
q
q)rload `t
`t
q)t
a b
---
1 4
2 5
3 6
Using get
get
loads data from a file or folder (for example, a folder containing a splayed table):
q
q)t:get `:orig/
q)first t
a| 3
b| 3
c| 5
d| 8
Memory mapping
When you load a database from a folder (for example, a splayed table), q uses memory differently depending on whether you include a trailing slash at the end of the path.
The two modes are:
-
Immediate (no trailing slash): q memory-maps the directory contents when you load them.
-
Deferred (with trailing slash): q does not memory-map the directory at load time, but memory-maps and unmaps files as you access them.
To demonstrate this, first create a splayed table to a directory:
q
`:/tmp/data/tab/ set ([]100000?100;100000?1000f)
To compare how immediate and deferred memory mapping behave, we use:
-
.Q.w
to view current memory statistics, with a focus on the memory-mapping values for files mapped to memory. -
.Q.s1
to expose the underlying structure by looking at its string representation. -
\t
to measure elapsed time when running a command repeatedly. Higher values indicate slower execution.
First, we load the database without a trailing slash, which puts it into immediate mode. We then check memory statistics and query run-times against the loaded database.
q
q)`used`mmap#.Q.w[]
used| 371616
mmap| 0
q)t:get`:/tmp/data/tab
q)`used`mmap#.Q.w[]
used| 373360
mmap| 1600032
q)\t:100 select from t
0
q)\t:100 select from t
0
q)\t:100 select from t
0
q)`used`mmap#.Q.w[]
used| 373360
mmap| 1600032
q).Q.s1 t
"+`x`x1!(81 96 32 46 99 88 61 54 31 6 25 49 61 76 30 10 37 90 92...
Again, but this time with the trailing slash in deferred mode:
q
q)`used`mmap#.Q.w[]
used| 371616
mmap| 0
q)t:get`:/tmp/data/tab/
q)`used`mmap#.Q.w[]
used| 373344
mmap| 0
q)\t:100 select from t
11
q)\t:100 select from t
10
q)\t:100 select from t
12
q)`used`mmap#.Q.w[]
used| 373344
mmap| 0
q).Q.s1 t
"+`x`x1!`:/tmp/data/tab/"
The example shows there is less memory used over the lifetime of the session in deferred mode, at the expense of load times when accessing the data each time. It also demonstrates that neither example copies data into memory as the used memory does not increase in relation to the size of the contents of the files.
Load system command
The system command \l
loads various style of tables into memory. For example, you can load a partitioned table stored in /tmp/tab
with:
q
q)\l /tmp/tab
`tab
This builds upon get
to load/organize data and is often used to load a partitioned/segmented database that is comprised of multiple directories. For example, a partitioned database with the following file structure:
bash
db
├── 2015.01.01
│ └── t
│ ├── p
│ └── ti
└── 2015.01.02
└── t
├── p
└── ti
can be loaded to populate a single table for subsequent queries:
q
q)\l db
q)t
date ti p
-------------------------
2015.01.01 09:30:00 101
2015.01.01 09:31:00 102
2015.01.02 09:30:00 101.5
2015.01.02 09:31:00 102.5
Using get
results in a different structure:
q
q)t
| t
----------| --------------------------------------
2015.01.01| +`ti`p!(09:30:00 09:31:00;101 102f)
2015.01.02| +`ti`p!(09:30:00 09:31:00;101.5 102.5)
The .Q namespace
adds even more functionality through utility functions such as .Q.lo
or .Q.li
.
Loading through the command line
Pass a directory path on the command line to load data on start-up. This is similar to using the load system command and provided as a convenience to allow instances to start with data already preloaded. For example, with a partitioned database in directory /tmp/t
, the following loads the database on start-up:
q
$ q /tmp/t
q)t
a b
---
1 4
2 5
3 6
Write KDB-X data
KDB-X has a native on-disk data format that lets it efficiently store, load, and process data.
Using save
save
is a simple way to save any global variables to disk. For example, saving a table t
to a single file called t
:
q
q)t:([]a:1 2 3;b: 4 5 6)
q)save `t
`:t
Or saving a million longs (created using til
` to a file called a
q
q)a:til 1000000
q)save `a
`:a
Or another million longs, this time labelled with the sorted attribute which will be recorded along with the data:
q
q)b:`s#til 1000000
q)save `b
`:b
By viewing the file sizes with hcount
notice that the file sizes are slightly larger than the million 8-byte longs. The extra bytes store metadata that KDB-X uses internally. Whether the data has attributes or not, the size remains the same.
q
q)hcount `:a
8000016
q)hcount `:b
8000016
Using rsave
rsave
is similar to using save, but instead of writing a single file, it saves a table as a splayed table (a directory that contains one file per column).
q
q)t:([]a:1 2 3;b: 4 5 6)
q)rsave `t
`:t/
Results in a directory t
containing:
bash
-rw-r--r-- 1 user wheel 12 30 May 10:26 .d
-rw-r--r-- 1 user wheel 40 30 May 10:26 a
-rw-r--r-- 1 user wheel 40 30 May 10:26 b
You can also create splayed tables by using set.
Using set
Use set
to save data to a file or directory. For example, save a table to a file:
q
q)t:([]a:1 2 3;b:4 5 6)
q)`:file set t
`:file
You can also splay to a directory:
q
q)`:dir/ set t
`:dir/
The following output shows the contents of the directory. q creates a separate file for each column, along with the .d
file that KDB-X uses to record the column names.
bash
$ls -al dir
-rw-r--r-- 1 simon wheel 12 2 Jun 13:07 .d
-rw-r--r-- 1 simon wheel 40 2 Jun 13:07 a
-rw-r--r-- 1 simon wheel 40 2 Jun 13:07 b
Writing anymap
Anymap
(or mapped list, type 77h
), introduced in 3.6, improves on existing mapped nested types (77h+t
) by removing the uniformity restriction, for example:
q
q)a:get`:a set (1 2 3;"cde"); b:get`:b set ("abc";"def")
q)77 77h~(type a;type b)
Mapped lists' elements can be of any type, including lists, dictionaries, tables.
q
q)a:get`:a set ((1 2;3 4);`time`price`vol!(.z.p;1.;100i);([]a:1 2;b:("ab";"cd")))
q)77 0h~(type a;type first a)
A write primitive alternative to set, `:a 1: x
, allows mapped lists to nest within other mapped lists. For files written with 1:
, regardless of depth, vectors within all structures remain mapped. q can use them without copying them to the heap.
q
q)a:get`:a 1: ((1 2;3 4);([]time:1000?.z.p;price:1000?100.);([]time:1000?.z.p;price:1000?200))
q)77 77h~(type a;type first a)
q).Q.w[]`used`mmap / 336736 40432
q)p:exec price from a[1]
q).Q.w[]`used`mmap /336736 40432
Saving compound data introduced two additional files:
-
file#
: The underlying storage (file#
) stays mapped as long as there's a reference to any mapped object within. Be careful when working with compressed data as anything decompressed in the file remains in memory until the last reference is gone. -
file##
: q automatically enumerates symbol vectors/atoms againstfile##
and de-enumerates (and therefore, always copies) on access. For example:
q
q)`:file set((`a`b;`b`c);0 1) / symbols cause a 3rd file to be created, file##, which contains the enumeration domain
Write compressed data
Use set
to compress data, choosing from a range of compression options. This reduces disk space requirements at the expense of time to read/write. Using -21!
on the compressed file provides compression details and stats. An example of writing a splayed table with compression is below:
q
m1:1000000
t:([]a:m1?10;b:m1?10;c:m1?10;d:m1?10)
(`:compressed/;17;2;6) set t
When writing a splayed table, you can specify different compression options on a per column basis, for example:
q
show dict:``a`b!(17 5 3;17 2 6;17 2 6)
(`:compressed_multi/;dict) set t
Each compression algorithm has benefits depending on the data pattern it is applied to, so you may gain advantages by using multiple algorithms in a single deployment. Because compressed data stores the format used, you do not need to specify the compression algorithm when reading the data back into KDB-X.
Use .z.zd
to set global defaults for writing compressed files. This removes the need to specify compression parameters on every write.
Write encrypted data
Use set
to write encrypted data and protect files from unauthorized access. For example, create a key by running the following command and entering testtest
as the password when prompted:
bash
openssl rand 32 | openssl aes-256-cbc -md SHA256 -salt -pbkdf2 -iter 50000 -out testkek.key
After loading the key, any encrypted writes will use that key.
q
q)-36!(`:testkek.key;"testtest")
q)t:([]a:1 2 3;b:4 5 6)
q)(`:test.enc;17;16;6) set t
`:test.enc
A user cannot read the file without a key. For example, trying to open the file in a new session without a loaded key results in an error.
q
q)get `:test.enc
'test.enc. no key loaded for encrypted file test.enc
[0] get `:test.enc
^
Providing the key lets you read the file:
q
q)-36!(`:testkek.key;"testtest")
q)get `:test.enc
a b
---
1 4
2 5
3 6
Or, consider combining encryption with compression to perform both tasks when writing.
Using dsave
Similar to set
and rsave
, dsave
writes splayed tables to disk but with additional functionality. It's like combining the behavior of .Q.en
and set
or .Q.dpft
, but in a simpler, more direct form.
This example splays two tables (quote
and trade
) to a directory with their symbols enumerated and applies the parted attribute to the sym
column. When you use dsave
with two tables, q sorts them with xasc
. The sort is applied to both tables by using the each iterator ('
, quote
).
q
q)trade:([]sym:100?`3;price:100?1.0;size:100?10)
q)quote:([]sym:100?`3;bid:100?1.0;ask:100?1.0;bsize:100?10;asize:100?10)
q)meta trade
c | t f a
-----| -----
sym | s
price| f
size | j
q)meta quote
c | t f a
-----| -----
sym | s
bid | f
ask | f
bsize| j
asize| j
q)`:/tmp/db1 dsave`sym xasc'`trade`quote
which results in the following directory structure:
bash
/tmp/db1
├── quote
│ ├── asize
│ ├── ask
│ ├── bid
│ ├── bsize
│ └── sym
├── sym
└── trade
├── price
├── size
└── sym
Loading the directory reinstates the trade
and quote
table, but now with partitioned attribute as saved by dsave
.
q
q)meta trade
c | t f a
-----| -----
sym | s p
price| f
size | j
q)meta quote
c | t f a
-----| -----
sym | s p
bid | f
ask | f
bsize| j
asize| j
You can also use dsave
to save data to a partition by providing the partition name as the second element after the directory name. For example, using the same two tables, the following command saves the tables to the partition 2015.01.01
:
q
q)`:/tmp/db2`2015.01.01 dsave`sym xasc'`trade`quote
which results in the following a directory structure:
bash
/tmp/db2
├── 2015.01.01
│ ├── quote
│ │ ├── asize
│ │ ├── ask
│ │ ├── bid
│ │ ├── bsize
│ │ └── sym
│ └── trade
│ ├── price
│ ├── size
│ └── sym
└── sym
.Q
namespace utilities
The .Q
namespace provides various utilities for creating, maintaining, loading, and saving databases, whether single , segmented, or partitioned. It also helps enumerate symbols within a database.
For example, .Q.en
enumerates symbols in a database. Symbols are often repeated in a historical database, so it is efficient to store them once in a dedicated file and reference them from the relevant tables.
In a partitioned database, the root directory should contain the symbol file (called sym
). The following example creates a database with two partitions, each containing tables t
and q
. Because .Q.en
returns the enumerated table, you can use set
to write the splayed table into the appropriate directory.
q
q)`:/tmp/partitioned_db/2015.01.01/t/ set .Q.en[`:/tmp/partitioned_db/;([] ti:09:30:00 09:31:00; sym:`ibm`msft;p:101 33f)]
`:/tmp/partitioned_db/2015.01.01/t/
q)`:/tmp/partitioned_db/2015.01.02/t/ set .Q.en[`:/tmp/partitioned_db/;([] ti:09:30:00 09:31:00; sym:`ibm`appl;p:101.5 33.5)]
`:/tmp/partitioned_db/2015.01.02/t/
q)`:/tmp/partitioned_db/2015.01.01/q/ set .Q.en[`:/tmp/partitioned_db/;([] ti:09:30:00 09:31:00; sym:`appl`meta;b:100.75 32.75; a:101.25 33.25f)]
`:/tmp/partitioned_db/2015.01.01/q/
q)`:/tmp/partitioned_db/2015.01.02/q/ set .Q.en[`:/tmp/partitioned_db/;([] ti:09:30:00 09:30:00; sym:`meta`msft;b:101.25 33.25; a:101.75 33.75)]
`:/tmp/partitioned_db/2015.01.02/q/
This results in the following directory structure:
bash
/tmp/partitioned_db/
├── 2015.01.01
│ ├── q
│ │ ├── a
│ │ ├── b
│ │ ├── sym
│ │ └── ti
│ └── t
│ ├── p
│ ├── sym
│ └── ti
├── 2015.01.02
│ ├── q
│ │ ├── a
│ │ ├── b
│ │ ├── sym
│ │ └── ti
│ └── t
│ ├── p
│ ├── sym
│ └── ti
└── sym
Loading the partitioned database can be as simple as passing the directory path on the command line:
bash
$ q /tmp/partitioned_db
q)q
date ti sym b a
--------------------------------------
2015.01.01 09:30:00 appl 100.75 101.25
2015.01.01 09:31:00 meta 32.75 33.25
2015.01.02 09:30:00 meta 101.25 101.75
2015.01.02 09:30:00 msft 33.25 33.75
q)t
date ti sym p
------------------------------
2015.01.01 09:30:00 ibm 101
2015.01.01 09:31:00 msft 33
2015.01.02 09:30:00 ibm 101.5
2015.01.02 09:31:00 appl 33.5
The contents of the sym
file can be verified using get
, which shows all unique symbols found in the tables:
q
q)get `:/tmp/partitioned_db/sym
`ibm`msft`appl`meta
File utilities
Size of file
Use hcount
to get the size of file.
q
q)hcount `:test.txt
20
Delete file
To delete a file or folder (if empty), use hdel
.
q
q)hdel `:test.txt
`:test.txt
q)hdel `:test.txt
'test.txt. OS reports: No such file or directory
[0] hdel `:test.txt
^
Files in a folder
Use key
to list files in a folder.
q
q)key `:/tmp/textfiles
`s#`test1.txt`test2.txt
Whether files exist
To test whether files or folders exist, use key
.
q
q)key `:/tmp/textfiles
`s#`emptyfolder`test1.txt`test2.txt
q)key `:/tmp/textfiles/doesnt_exist/ / doesnt exist, returns an empty symbol vector
q)key `:/tmp/textfiles/doesnt_exist.txt / doesnt exist, returns an empty symbol vector
q)()~key `:/tmp/textfiles/doesnt_exist.txt
1b
q)()~key `:/tmp/textfiles/doesnt_exist/
1b
q)key `:/tmp/textfiles/emptyfolder
`s#`symbol$()
q)key `:/tmp/textfiles/test1.txt
`:/tmp/textfiles/test1.txt
Run system commands
Run any command-line program through the system
command. This is especially useful when working with files. For example, calling mv
to move a file:
q
system"mv file1.txt file2.txt"
Note
System commands may be operating-system specific. For example, Windows uses dir
to list files. To handle this, KDB-X provides .z.o
, which detects the current operating system. You can use it to write q code specifically for different operating systems.
External interfaces
ODBC
Depending on the individual driver, ODBC drivers can provide access to files in addition to 3rd party systems. For example, ODBC drivers exist for Microsoft Data Access database files, Parquet, Microsoft Excel, etc. Use the ODBC client driver for KDB-X to open or query files.
Example of reading a MDB file
Install the ODBC client driver for KDB-X. Install Microsoft ODBC driver for Microsoft Access. Using the driver installed, open a MDB file by using the following example command:
q
q)h:.odbc.open "driver=Microsoft Access Driver (*.mdb, *.accdb);dbq=C:\\mydb.mdb"
Note
Note that the name of the driver may differ between versions. The command above should be altered to reflect the driver name installed.
Use .odbc.tables
to list the tables.
q
q).odbc.tables h
`aa`bb`cc`dd`ii`nn
Use .odbc.eval
to evaluate SQL commands through ODBC.
q
q).odbc.eval[h;"select * from aa"]
An alternative to querying through SQL is to load the entire database into KDB-X with the .odbc.load
command and query the data using KDB-X directly.
HDF5
Use the KDB-X HDF5 adapter to process HDF5 files.
Arrow
The Arrow interface allows interaction with Arrow and Parquet file formats.
Python
PyKX integrates with kKDB-X/q to allow data originating in Python to be read by KDB-X and vice-versa. This opens up KDB-X to a wide variety of file formats that are supported by Python.
Creating bespoke interfaces
You can extend KDB-X with C code to allow direct integration with 3rd party formats.
Other interfaces
Refer to this full list of external interfaces.
Summary
In this guide, you:
-
Used file handles and paths to identify and locate files in q
-
Read and wrote text files, including CSV, fixed-width, and streamed data
-
Worked with binary files and explored offsets, lengths, and fixed-width binary records
-
Loaded and saved data in KDB-X native formats using
load
,rload
,get
,save
,rsave
, andset
-
Applied compression and encryption when writing files, including setting global defaults with
.z.zd
-
Created and managed mapped lists (
anymap
), splayed tables, and partitioned databases usingdsave
-
Leveraged .Q namespace utilities for enumeration and database layout
-
Inspected file sizes, listed directory contents, and executed OS commands from within q
-
Connected to external interfaces such as ODBC, Arrow/Parquet, HDF5, and Python
You now have the essential skills to manage files in KDB-X and integrate them with external systems.