Important

KDB-X General Availability (GA) Now Live

The KDB-X Public Preview period has ended. Please note that this Public Preview website is no longer updated. Visit the new KDB-X GA site for the latest documentation, downloads, and updates:

Go to the KDB-X GA site

How to Work with Files

This guide shows you how to work with files in KDB-X: read/write text and binary, use the native format, stream large inputs, and apply compression/encryption. It also covers key file/system commands and external interfaces (ODBC, Arrow/Parquet, HDF5, Python).

In this guide, you will learn how to:

Use file handles and resolve file paths in q
Read and write text files (CSV, fixed-width)
Stream large inputs by line, size, or pipe
Read and write binary files (raw, fixed-width) and work with offsets and lengths
Load and save KDB-X formatted data (load, rload, get, save, rsave, set)
Choose and apply compression and encryption (including defaults through .z.zd)

By the end, you will know how to: work with mapped lists (anymap) and splayed/partitioned data (dsave), use .Q utilities for enumeration and database layout, inspect file sizes and directories, run OS commands from q, and connect to external interfaces such as ODBC, Arrow/Parquet, HDF5, and Python.

File handles

In q, you use a symbol in the form of a file handle to identify files or directories. A file handle is a symbol that starts with a colon. For example, this is a file handle (symbol data type) for the file new.txt:

Copy

`:new.txt

The format of a file handle can start with an optional relative or absolute file path, followed by the file name. Use hsym to convert an existing symbol to a symbol that follows the symbol handle format:

Copy

q)hsym `$"/tmp/test.txt"
`:/tmp/test.txt

File paths

q reads files from relative or absolute paths.

When q resolves a relative file path, it searches locations in the following order:

the current directory
the directory specified by the QHOME environment variable
the directory specified by the QLIC environment variable

For example, if a file new.txt containing "Im stored in QHOME!!" exists in the directory specified by the QHOME environment variable, trying to read new.txt while in a different directory resolves to using QHOME:

Copy

q)getenv `QHOME                       / location where new.txt stored
"/Users/myuser/Development/q"
q)\pwd                                / current directory (does not contain new.txt)
"/tmp"
q)read0 `:new.txt                     / read new.txt
"Im stored in QHOME!!"

Creating another new.txt file containing "Im stored in the current directory!!!" in the current directory and retrying will read that instead:

Copy

q)\pwd                                / location where new.txt stored
"/tmp"
q)read0 `:new.txt
"Im stored in the current directory!!!"

Text files

Text conversion

In addition to the text-handling facilities built into the q language (such as ssr,sv, and vs), q also provides functionality through its core namespaces.

General conversions

The .h (markup) namespace offers utilities for converting data to and from different formats.

Note

You usually do not need the CSV conversion utilities in .h, because q already handles CSV directly in its read and write functions.

The .Q namespace includes additional text utilities such as .Q.s1, which produces a string representation of an object.

Copy

q).Q.s til 20
"0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19\n"

JSON conversion

The .j (JSON) namespace converts data to and from JSON (JavaScript Object Notation). For example, you can convert a dictionary to a JSON text string:

Copy

q)d:`a`b!(0 1;("hello";"world"))
q)show d
a| 0       1
b| "hello" "world"
q)show j:.j.j d
"{\"a\":[0,1],\"b\":[\"hello\",\"world\"]}"

You can then convert the JSON string back into a dictionary:

Copy

q).j.k j
a| 0       1
b| "hello" "world"

Because q represents JSON as a string, you can also use it with the built-in KDB-X facilities for reading and writing text files.

Read text files

Read raw text

Use read0 to read the contents of a text file as a list of strings.

For example, if the file test.txt contains:

txt

Copy

hello
world

you can read it with:

Copy

q)read0 `:test.txt
"hello"
"world"

If you want to read the file without converting each line into a separate string, refer to the Read text as binary data section.

You can also read from a file using an offset and a length. In the following example, q starts 6 characters from the beginning of the test.txt file and reads the next 5 characters:

Copy

q)first read0 (`:test.txt;6;5)
"world"

Deliminated files (CSV)

Use 0: to load files that contain deliminated field values, such as CSV (comma-separated value) files. q parses the file into records based on the KDB-X data type you specify for each field (or whether to skip a field). Refer to the column type reference for details.

For example, given a file test.csv:

csv

Copy

DOB,NAME,ID
20130315,Jim,3404
19760113,Sam,2311
33120212,Ian,2211

you can parse it with:

Copy

("DSJ";enlist ",") 0: `:test.csv
DOB        NAME ID
--------------------
2013.03.15 Jim  3404
1976.01.13 Sam  2311
3312.02.12 Ian  2211

Because you can specify the delimiter, you can also parse files that use other separators. For example, a file that uses tab characters between fields:

Copy

q)("DSJ";enlist "\t") 0: `:tab.txt
DOB        NAME ID
--------------------
2013.03.15 Jim  3404
1976.01.13 Sam  2311
3312.02.12 Ian  2211

If a file does not include a header row, such as:

csv

Copy

20130315,Jim,3404
19760113,Sam,2311
33120212,Ian,2211

parse it by omitting enlist when specifying the delimiter:

Copy

q)("DSJ";",") 0: `:test.csv
2013.03.15 1976.01.13 3312.02.12
Jim        Sam        Ian
3404       2311       2211

Multithreaded load

For large files, you can improve performance by specifying additional threads.

Auto CSV type mapping

The utility script KxSystems/kdb/utils/csvguess.q automatically generates CSV loader scripts. This is especially useful for very wide or long CSV files, where specifying the correct type for each column manually would be time-consuming.

The script also includes:

an optimized on-disk sorter
the ability to create a loader that enumerates all symbol columns and loads them quickly (parallel loading processes only need to read the sym file)

You can find the range of command-line options for the script at the start of its source file.

Fixed width

Use 0: to load text files that contain fixed-length records. q converts each field into a suitable data type for easier analysis. You parse the file into records by defining the corresponding kKDB-X data type of each field and its length. Refer to the column types and formats documentation for details. For example, given a file users.txt containing the following:

txt

Copy

1001  198.00James  STAFF 1997.01.01
1002  142.00Sandra STAFF 1976.01.12
1003  144.12Fred   STAFF 2000.01.23

The example file starts with a 4 character long (type J), followed by 8 characters which contain a float (type F) and so on. Use the space type ' ' to ignore a column. In this example, the parser ignores the field that always contains STAFF.

The size of each field can be smaller (due to a smaller range) or larger (if padded with whitespace) than the typical range for that data type. You parse the file with the following instruction:

Copy

q)("JFS D";4 8 7 6 10) 0: `users.txt
1001       1002       1003
198        142        144.12
James      Sandra     Fred
1997.01.01 1976.01.12 2000.01.23

Because each record has a fixed length, you can also specify an offset and a length to read a particular range of records. In the previous example file, each record is 36-characters long (including the return character). To read 2 records after the first record, use an offset of 36 and a length of 72:

Copy

q)("JFS D";4 8 7 6 10) 0: (`users.txt;36;72)
1002       1003
142        144.12
Sandra     Fred
1976.01.12 2000.01.23

Multithreaded load

For large files, you can improve performance by specifying additional threads.

Stream data from files

When a file is large, loading the entire file into memory can be prohibitive. Streaming lets you load and process a subsection of the file, in sequence, reducing the memory footprint.

Streaming by line

Use .Q.fs to stream an input file line by line, or in chunks of multiple lines. You can add a user-defined function to control how each chunk of data is processed. For example, given a file data.csv:

csv

Copy

2019-10-03, 24.5,  24.51, 23.79, 24.13, 19087300, AMD
2019-10-03, 27.37, 27.48, 27.21, 27.37, 39386200, MSFT
2019-10-04, 24.1,  25.1,  23.95, 25.03, 17869600, AMD
2019-10-04, 27.39, 27.96, 27.37, 27.94, 82191200, MSFT
2019-10-05, 24.8,  25.24, 24.6,  25.11, 17304500, AMD
2019-10-05, 27.92, 28.11, 27.78, 27.92, 81967200, MSFT
2019-10-06, 24.66, 24.8,  23.96, 24.01, 17299800, AMD
2019-10-06, 27.76, 28,    27.65, 27.87, 36452200, MSFT

can be read with .Q.fs, which processes a specified number of lines at a time, which are used as the parameter to a single-parameter function (in this example, 0N! displays the value).

Copy

q).Q.fs[{0N!x}]`:data.csv
("2019-10-03, 24.5,  24.51, 23.79, 24.13, 19087300, AMD";"2019-10-03, 27.37, 27.48, 27.21, 27.37, 39386200, MSFT";"2019-10-04, 24.1,  25.1,  23.95, 25.03, 17869600, AMD";"2019-10-04, 27.39, 27.96, 27.37, 27.94, 82191200, MSFT";"2019-10-05, 24.8,  25.24, 24.6,  25.11, 17304500, AMD";"2019-10-05, 27.92, 28.11, 27.78, 27.92, 81967200, MSFT";"2019-10-06, 24.66, 24.8,  23.96, 24.01, 17299800, AMD";"2019-10-06, 27.76, 28,    27.65, 27.87, 36452200, MSFT")
436

Given the file is a CSV file, you can use 0: to parse each line into their appropriate d ata types.

Copy

q).Q.fs[{0N!("DFFFFIS";",")0:x}]`:data.csv
(2019.10.03 2019.10.03 2019.10.04 2019.10.04 2019.10.05 2019.10.05 2019.10.06 2019.10.06;24.5 27.37 24.1 27.39 24.8 27.92 24.66 27.76;24.51 27.48 25.1 27.96 25.24 28.11 24.8 28;23.79 27.21 23.95 27.37 24.6 27.78 23.96 27.65;24.13 27.37 25.03 27.94 25.11 27.92 24.01 27.87;19087300 39386200 17869600 82191200 17304500 81967200 17299800 36452200i;`AMD`MSFT`AMD`MSFT`AMD`MSFT`AMD`MSFT)
436

In this example, the file has 9 360 lines. For each chunk, q creates a table (using flip) and appends it to the table t with the join operator (,). This creates a table with 9 360 rows. Next, use first to inspect the first row.

Copy

q).Q.fs[{t,:flip `date`open`high`low`close`volume`sym!("DFFFFIS";",")0:x}]`:data.csv
q)count t
9360
q)first t
date  | 2019.10.03
open  | 24.5
high  | 24.51
low   | 23.79
close | 24.13
volume| 19087300i
sym   | `AMD

Streaming by size

Use .Q.fsn to stream an input file, for a given bytes size over each line. You can pass a user-defined function on how to process each chunk of data. For example, given an input file numbers with the following contents:

txt

Copy

0123456789
0123456789

can be read in chunks of 3 bytes, which are used as the parameter to a single-parameter function (here just using `0N!` to display the value):

Copy

q).Q.fsn[{0N!x};`:numbers;3i]
,"012"
,"345"
,"678"
,,"9"
,"012"
,"345"
,"678"
,,"9"
22

Streaming from pipe

.Q.fps and .Q.fpn provide the ability to stream data from a fifo/named pipe. This is useful for many applications, such as streaming data directly from a compressed file without having to decompress the contents to disk. For example, using a CSV file (t.csv) with the contents:

csv

Copy

MSFT,12:01:10.000,A,O,300,55.60
APPL,12:01:20.000,B,O,500,67.70
IBM,12:01:20.100,A,O,100,61.11
MSFT,12:01:10.100,A,O,300,55.60
APPL,12:01:20.100,B,O,500,67.70
IBM,12:01:20.200,A,O,100,61.11
MSFT,12:01:10.200,A,O,300,55.60
APPL,12:01:20.200,B,O,500,67.70
IBM,12:01:20.200,A,O,100,61.11
MSFT,12:01:10.300,A,O,300,55.60
APPL,12:01:20.400,B,O,500,67.70
IBM,12:01:20.500,A,O,100,61.11
MSFT,12:01:10.500,A,O,300,55.60
APPL,12:01:20.600,B,O,500,67.70
IBM,12:01:20.600,A,O,100,61.11
MSFT,12:01:10.700,A,O,300,55.60
APPL,12:01:20.700,B,O,500,67.70
IBM,12:01:20.800,A,O,100,61.11
MSFT,12:01:10.900,A,O,300,55.60
APPL,12:01:20.900,B,O,500,67.70
IBM,12:01:20.990,A,O,100,61.11

If the file is compressed into a ZIP archive (t.zip), the system command unzip has the option to uncompress to stdout, which can be combined with a fifo. The following loads the CSV file through a FIFO without having the intermediary step of creating the unzipped file:

Copy

q)system"rm -f fifo && mkfifo fifo"
q)trade:flip `sym`time`ex`cond`size`price!"STCCFF"$\:()
q)system"unzip -p t.zip > fifo &"
q).Q.fps[{`trade insert ("STCCFF";",")0:x}]`:fifo
q)trade

Alternatively, if the file was compressed using gzip (t.gz), use the system command gunzip:

Copy

q)system"rm -f fifo && mkfifo fifo"
q)trade:flip `sym`time`ex`cond`size`price!"STCCFF"$\:()
q)system"gunzip -cf t.gz > fifo &"
q).Q.fps[{`trade insert ("STCCFF";",")0:x}]`:fifo
q)trade

Write text files

Write raw text

You can write strings to a file as text by using 0:. For example, the following

Copy

q)`:test.txt 0: ("hello";"world")
`:test.txt

creates a file called test.txt which contains:

txt

Copy

hello
world

Use enlist to write a single string:

Copy

q)`:test.txt 0: enlist "hello"

creates a file called test.txt which contains:

txt

Copy

hello

Write using `hopen`

Use hopen to create or append to a text file. Remember to always call hclose when you finish writing. The following example creates a file /tmp/new.txt and returns the handle to h.

Copy

q)h:hopen `:/tmp/new.txt

To append the text, call the file handle and pass the string you want to append:

Copy

q)h "one";
q)read0 `:/tmp/new.txt
"one"
q)h "two";
q)read0`:/tmp/new.txt
"onetwo"

Negating the file handle adds a new line character to the end of the string (character vector). Passing a vector of strings adds new line characters to the end of each string.

Copy

q)neg[h] "three";
q)neg[h] "four";
q)read0 `:/tmp/new.txt
"onetwothree"
"four"
q)neg[h] ("five";"six";"seven");
q)read0 `:/tmp/new.txt
"onetwothree"
"four"
"five"
"six"
"seven"
q)hclose h

Write and format

save formats different KDB-X data types into a range of text-based formats. For example, saving the table t as csv:

Copy

q)t:([]a:1 2 3;b:4 5 6)
q)save `t.csv
`:t.csv

results in a t.csv file containing:

csv

Copy

a,b
1,4
2,5
3,6

whereas saving as xml

Copy

q)t:([]a:1 2 3;b:4 5 6)
q)save `t.xml
`:t.xml

results in a t.xml containing:

xml

Copy

<R>
<r><a>1</a><b>4</b></r>
<r><a>2</a><b>5</b></r>
<r><a>3</a><b>6</b></r>
</R>

Supported file extensions include: .csv, .txt, .xls, .xml, .json, and no extension for saving in q binary format.

Binary files

Read binary files

Read raw binary

read1 reads the contents of a file as a byte vector. The following example reads a 40-byte file that contains 10 integers in the range 0 to 9:

Copy

q)read1 `:integers.dat
0x00000000010000000200000003000000040000000500000006000000070000000800000009000000

You can specify an offset and length to read a subsection of a file. In this example, q reads the same file starting at a 4-byte offset and loads 36 bytes from that point:

Copy

q)read1 (`:hopen.dat;4;36)
0x010000000200000003000000040000000500000006000000070000000800000009000000

Read text as binary data

You can also use read1 to read files as text, by casting the resulting bytes to characters. This approach lets you process newline or carriage-return characters and returns the file contents of the file as a single string. For example, a file test.txt containing:

txt

Copy

hello
world

can be read with:

Copy

q)"c"$read1 `:test.txt
"hello\nworld\n"

Fixed width binary

1: provides the ability to read binary data by specifying the data types, widths, and endianness. Refer to the range of types documentation for more details. It can read an entire file by repeatedly parsing each fixed-width record that you define.

For example, the following instruction reads each record of two 4-byte integers from a file test.dat that contains 10 4-byte integers (0...9)

Copy

q)("ii";4 4)1:`:test.dat
0 2 4 6 8
1 3 5 7 9

You can also specify an offset and length to read a subsection of the file. Reading the same file, but starting at a 4-byte offset, and only reading 8 bytes from that point:

Copy

q)("ii";4 4)1:(`:test.dat;4;8)
1
2

This is useful when a file contains a header or footer that doesn’t fit the specification of the other fixed-width record within the file.

Multithreaded load

For large files, you can improve performance by specifying additional threads.

Write binary files

Write raw binary

To write raw binary data use 1:. For example, the following creates file.txt containing "hello", as the bytes represent the word in ASCII (for example, 0x68 is h, 0x65 is e).

Copy

q)`:file.txt 1: 0x68656c6c6f

The following example uses til to create a 400-byte file that contains a hundred 4-byte integers, ranging from 0 to 99:

Copy

q)`:file.txt 1:"i"$til 100

Note

Writing the vector above uses the binary representation of the values, which is different from saving in the KDB-X file format. To use the KDB-X file format, you must apply the suite of save options described in KDB-X formatted files.

For example, using the binary write to save the a vector with or without the sorted attribute applied results in the same file. The md5 keyword hashes the contents of each resulting file, to check contents for comparison. The attribute is not saved.

Copy

q)a:til 10
q)b:`s#til 10
q)a
0 1 2 3 4 5 6 7 8 9
q)b
`s#0 1 2 3 4 5 6 7 8 9
q)`:a 1: a
`:a
q)`:b 1: b
`:b
q)md5 "c"$read1 `:a
0xcdff2051b710c66f3373e26c901ee505
q)md5 "c"$read1 `:b
0xcdff2051b710c66f3373e26c901ee505

Write compressed data

You can compress binary data at the time of writing by choosing from a range of compression options. Use -21! on the compressed file to display compression details and statistics.

Copy

q)(`:file;17;2;9) 1:"i"$til 1000000
`:file
q)-21!`:file
compressedLength  | 1383683
uncompressedLength| 4000000
algorithm         | 2i
logicalBlockSize  | 17i
zipLevel          | 9i

Use .z.zd to set global defaults for writing compressed files. This prevents you from having to specify compression parameters on every write.

Copy

q).z.zd:(17;2;9)
q)`:file 1:"i"$til 1000000
`:file
q)-21!`:file
compressedLength  | 1383683
uncompressedLength| 4000000
algorithm         | 2i
logicalBlockSize  | 17i
zipLevel          | 9i
q)`:another_file 1:til 10000
`:another_file
q)-21!`:another_file
compressedLength  | 15194
uncompressedLength| 80000
algorithm         | 2i
logicalBlockSize  | 17i
zipLevel          | 9i

Write using `hopen`

Use hopen to create or append to a binary file. Always call hclose when you finish with the file handle:

Copy

q)h:hopen `:test.dat
q)h "a"
6i
q)h "bc"
6i
q)h 0x21
6i
q)hclose h

This produces a file test.dat containing abc! with no newlines, because no newline characters were written. The last byte written is the ASCII hex code for the exclamation mark, which is why it appears as ! when viewed in a text editor. These are raw binary files, not KDB-X data files, as you can notice if you're trying to use get to read a KDB-X formatted data file.

Copy

q)get `:test.dat
'test.dat
[0]  get `:test.dat
^

KDB-X formatted files

The KDB-X file format is a proprietary format. It adds a small amount of extra bytes to record metadata such as data types, attributes, and lengths. This structure allows q to load data back into memory quickly and efficiently, and to save data with the same benefits.

Read KDB-X data

Using `load`

You can read KDB-X files with load. This reads a file (or a directory of files) and creates a variable using the name provided. For example, loading a sorted list of a million longs:

Copy

q)load `a
`a
q)count a
1000000

or loading directory t that contains files a and b:

Copy

q)load `:t
`t
q)t
a| 1 2 3
b| 4 5 6

Using `rload`

rload loads a splayed table from a given directory. For example, to load a splayed table stored in directory t, run:

Copy

q)rload `t
`t
q)t
a b
---
1 4
2 5
3 6

Using `get`

get loads data from a file or folder (for example, a folder containing a splayed table):

Copy

q)t:get `:orig/
q)first t
a| 3
b| 3
c| 5
d| 8

Memory mapping

When you load a database from a folder (for example, a splayed table), q uses memory differently depending on whether you include a trailing slash at the end of the path.

The two modes are:

Immediate (no trailing slash): q memory-maps the directory contents when you load them.
Deferred (with trailing slash): q does not memory-map the directory at load time, but memory-maps and unmaps files as you access them.

To demonstrate this, first create a splayed table to a directory:

Copy

`:/tmp/data/tab/ set ([]100000?100;100000?1000f)

To compare how immediate and deferred memory mapping behave, we use:

.Q.w to view current memory statistics, with a focus on the memory-mapping values for files mapped to memory.
.Q.s1 to expose the underlying structure by looking at its string representation.
\t to measure elapsed time when running a command repeatedly. Higher values indicate slower execution.

First, we load the database without a trailing slash, which puts it into immediate mode. We then check memory statistics and query run-times against the loaded database.

Copy

q)`used`mmap#.Q.w[]
used| 371616
mmap| 0
q)t:get`:/tmp/data/tab
q)`used`mmap#.Q.w[]
used| 373360
mmap| 1600032
q)\t:100 select from t
0
q)\t:100 select from t
0
q)\t:100 select from t
0
q)`used`mmap#.Q.w[]
used| 373360
mmap| 1600032
q).Q.s1 t
"+`x`x1!(81 96 32 46 99 88 61 54 31 6 25 49 61 76 30 10 37 90 92...

Again, but this time with the trailing slash in deferred mode:

Copy

q)`used`mmap#.Q.w[]
used| 371616
mmap| 0
q)t:get`:/tmp/data/tab/
q)`used`mmap#.Q.w[]
used| 373344
mmap| 0
q)\t:100 select from t
11
q)\t:100 select from t
10
q)\t:100 select from t
12
q)`used`mmap#.Q.w[]
used| 373344
mmap| 0
q).Q.s1 t
"+`x`x1!`:/tmp/data/tab/"

The example shows there is less memory used over the lifetime of the session in deferred mode, at the expense of load times when accessing the data each time. It also demonstrates that neither example copies data into memory as the used memory does not increase in relation to the size of the contents of the files.

Load system command

The system command \l loads various style of tables into memory. For example, you can load a partitioned table stored in /tmp/tab with:

Copy

q)\l /tmp/tab
`tab

This builds upon get to load/organize data and is often used to load a partitioned/segmented database that is comprised of multiple directories. For example, a partitioned database with the following file structure:

bash

Copy

db
├── 2015.01.01
│   └── t
│       ├── p
│       └── ti
└── 2015.01.02
└── t
├── p
└── ti

can be loaded to populate a single table for subsequent queries:

Copy

q)\l db
q)t
date       ti       p
-------------------------
2015.01.01 09:30:00 101
2015.01.01 09:31:00 102
2015.01.02 09:30:00 101.5
2015.01.02 09:31:00 102.5

Using get results in a different structure:

Copy

q)t
| t
----------| --------------------------------------
2015.01.01| +`ti`p!(09:30:00 09:31:00;101 102f)
2015.01.02| +`ti`p!(09:30:00 09:31:00;101.5 102.5)

The .Q namespace adds even more functionality through utility functions such as .Q.lo or .Q.li.

Loading through the command line

Pass a directory path on the command line to load data on start-up. This is similar to using the load system command and provided as a convenience to allow instances to start with data already preloaded. For example, with a partitioned database in directory /tmp/t, the following loads the database on start-up:

Copy

$ q /tmp/t
q)t
a b
---
1 4
2 5
3 6

Write KDB-X data

KDB-X has a native on-disk data format that lets it efficiently store, load, and process data.

Using `save`

save is a simple way to save any global variables to disk. For example, saving a table t to a single file called t:

Copy

q)t:([]a:1 2 3;b: 4 5 6)
q)save `t
`:t

Or saving a million longs (created using til` to a file called a

Copy

q)a:til 1000000
q)save `a
`:a

Or another million longs, this time labelled with the sorted attribute which will be recorded along with the data:

Copy

q)b:`s#til 1000000
q)save `b
`:b

By viewing the file sizes with hcount notice that the file sizes are slightly larger than the million 8-byte longs. The extra bytes store metadata that KDB-X uses internally. Whether the data has attributes or not, the size remains the same.

Copy

q)hcount `:a
8000016
q)hcount `:b
8000016

Using `rsave`

rsave is similar to using save, but instead of writing a single file, it saves a table as a splayed table (a directory that contains one file per column).

Copy

q)t:([]a:1 2 3;b: 4 5 6)
q)rsave `t
`:t/

Results in a directory t containing:

bash

Copy

-rw-r--r--   1 user  wheel    12 30 May 10:26 .d
-rw-r--r--   1 user  wheel    40 30 May 10:26 a
-rw-r--r--   1 user  wheel    40 30 May 10:26 b

You can also create splayed tables by using set.

Using `set`

Use set to save data to a file or directory. For example, save a table to a file:

Copy

q)t:([]a:1 2 3;b:4 5 6)
q)`:file set t
`:file

You can also splay to a directory:

Copy

q)`:dir/ set t
`:dir/

The following output shows the contents of the directory. q creates a separate file for each column, along with the .d file that KDB-X uses to record the column names.

bash

Copy

$ls -al dir
-rw-r--r--  1 simon  wheel   12  2 Jun 13:07 .d
-rw-r--r--  1 simon  wheel   40  2 Jun 13:07 a
-rw-r--r--  1 simon  wheel   40  2 Jun 13:07 b

Writing `anymap`

Anymap (or mapped list, type 77h), introduced in 3.6, improves on existing mapped nested types (77h+t) by removing the uniformity restriction, for example:

Copy

q)a:get`:a set (1 2 3;"cde"); b:get`:b set ("abc";"def")
q)77 77h~(type a;type b)

Mapped lists' elements can be of any type, including lists, dictionaries, tables.

Copy

q)a:get`:a set ((1 2;3 4);`time`price`vol!(.z.p;1.;100i);([]a:1 2;b:("ab";"cd")))
q)77 0h~(type a;type first a)

A write primitive alternative to set, `:a 1: x, allows mapped lists to nest within other mapped lists. For files written with 1:, regardless of depth, vectors within all structures remain mapped. q can use them without copying them to the heap.

Copy

q)a:get`:a 1: ((1 2;3 4);([]time:1000?.z.p;price:1000?100.);([]time:1000?.z.p;price:1000?200))
q)77 77h~(type a;type first a)
q).Q.w[]`used`mmap / 336736 40432
q)p:exec price from a[1]
q).Q.w[]`used`mmap /336736 40432

Saving compound data introduced two additional files:

file#: The underlying storage (file#) stays mapped as long as there's a reference to any mapped object within. Be careful when working with compressed data as anything decompressed in the file remains in memory until the last reference is gone.
file##: q automatically enumerates symbol vectors/atoms against file## and de-enumerates (and therefore, always copies) on access. For example:

Copy

q)`:file set((`a`b;`b`c);0 1) / symbols cause a 3rd file to be created, file##, which contains the enumeration domain

Write compressed data

Use set to compress data, choosing from a range of compression options. This reduces disk space requirements at the expense of time to read/write. Using -21! on the compressed file provides compression details and stats. An example of writing a splayed table with compression is below:

Copy

m1:1000000
t:([]a:m1?10;b:m1?10;c:m1?10;d:m1?10)
(`:compressed/;17;2;6) set t

When writing a splayed table, you can specify different compression options on a per column basis, for example:

Copy

show dict:``a`b!(17 5 3;17 2 6;17 2 6)
(`:compressed_multi/;dict) set t

Each compression algorithm has benefits depending on the data pattern it is applied to, so you may gain advantages by using multiple algorithms in a single deployment. Because compressed data stores the format used, you do not need to specify the compression algorithm when reading the data back into KDB-X.

Use .z.zd to set global defaults for writing compressed files. This removes the need to specify compression parameters on every write.

Write encrypted data

Use set to write encrypted data and protect files from unauthorized access. For example, create a key by running the following command and entering testtest as the password when prompted:

bash

Copy

openssl rand 32 | openssl aes-256-cbc -md SHA256 -salt -pbkdf2 -iter 50000 -out testkek.key

After loading the key, any encrypted writes will use that key.

Copy

q)-36!(`:testkek.key;"testtest")
q)t:([]a:1 2 3;b:4 5 6)
q)(`:test.enc;17;16;6) set t
`:test.enc

A user cannot read the file without a key. For example, trying to open the file in a new session without a loaded key results in an error.

Copy

q)get `:test.enc
'test.enc. no key loaded for encrypted file test.enc
[0]  get `:test.enc
^

Providing the key lets you read the file:

Copy

q)-36!(`:testkek.key;"testtest")
q)get `:test.enc
a b
---
1 4
2 5
3 6

Or, consider combining encryption with compression to perform both tasks when writing.

Using `dsave`

Similar to set and rsave, dsave writes splayed tables to disk but with additional functionality. It's like combining the behavior of .Q.en and set or .Q.dpft, but in a simpler, more direct form.

This example splays two tables (quote and trade) to a directory with their symbols enumerated and applies the parted attribute to the sym column. When you use dsave with two tables, q sorts them with xasc. The sort is applied to both tables by using the each iterator (', quote).

Copy

q)trade:([]sym:100?`3;price:100?1.0;size:100?10)
q)quote:([]sym:100?`3;bid:100?1.0;ask:100?1.0;bsize:100?10;asize:100?10)
q)meta trade
c    | t f a
-----| -----
sym  | s
price| f
size | j
q)meta quote
c    | t f a
-----| -----
sym  | s
bid  | f
ask  | f
bsize| j
asize| j
q)`:/tmp/db1 dsave`sym xasc'`trade`quote

which results in the following directory structure:

bash

Copy

/tmp/db1
├── quote
│   ├── asize
│   ├── ask
│   ├── bid
│   ├── bsize
│   └── sym
├── sym
└── trade
├── price
├── size
└── sym

Loading the directory reinstates the trade and quote table, but now with partitioned attribute as saved by dsave.

Copy

q)meta trade
c    | t f a
-----| -----
sym  | s   p
price| f
size | j
q)meta quote
c    | t f a
-----| -----
sym  | s   p
bid  | f
ask  | f
bsize| j
asize| j

You can also use dsave to save data to a partition by providing the partition name as the second element after the directory name. For example, using the same two tables, the following command saves the tables to the partition 2015.01.01:

Copy

q)`:/tmp/db2`2015.01.01 dsave`sym xasc'`trade`quote

which results in the following a directory structure:

bash

Copy

/tmp/db2
├── 2015.01.01
│   ├── quote
│   │   ├── asize
│   │   ├── ask
│   │   ├── bid
│   │   ├── bsize
│   │   └── sym
│   └── trade
│       ├── price
│       ├── size
│       └── sym
└── sym

`.Q` namespace utilities

The .Q namespace provides various utilities for creating, maintaining, loading, and saving databases, whether single , segmented, or partitioned. It also helps enumerate symbols within a database.

For example, .Q.en enumerates symbols in a database. Symbols are often repeated in a historical database, so it is efficient to store them once in a dedicated file and reference them from the relevant tables.

In a partitioned database, the root directory should contain the symbol file (called sym). The following example creates a database with two partitions, each containing tables t and q. Because .Q.en returns the enumerated table, you can use set to write the splayed table into the appropriate directory.

Copy

q)`:/tmp/partitioned_db/2015.01.01/t/ set .Q.en[`:/tmp/partitioned_db/;([] ti:09:30:00 09:31:00; sym:`ibm`msft;p:101 33f)]
`:/tmp/partitioned_db/2015.01.01/t/
q)`:/tmp/partitioned_db/2015.01.02/t/ set .Q.en[`:/tmp/partitioned_db/;([] ti:09:30:00 09:31:00; sym:`ibm`appl;p:101.5 33.5)]
`:/tmp/partitioned_db/2015.01.02/t/
q)`:/tmp/partitioned_db/2015.01.01/q/ set .Q.en[`:/tmp/partitioned_db/;([] ti:09:30:00 09:31:00; sym:`appl`meta;b:100.75 32.75; a:101.25 33.25f)]
`:/tmp/partitioned_db/2015.01.01/q/
q)`:/tmp/partitioned_db/2015.01.02/q/ set .Q.en[`:/tmp/partitioned_db/;([] ti:09:30:00 09:30:00; sym:`meta`msft;b:101.25 33.25; a:101.75 33.75)]
`:/tmp/partitioned_db/2015.01.02/q/

This results in the following directory structure:

bash

Copy

/tmp/partitioned_db/
├── 2015.01.01
│   ├── q
│   │   ├── a
│   │   ├── b
│   │   ├── sym
│   │   └── ti
│   └── t
│       ├── p
│       ├── sym
│       └── ti
├── 2015.01.02
│   ├── q
│   │   ├── a
│   │   ├── b
│   │   ├── sym
│   │   └── ti
│   └── t
│       ├── p
│       ├── sym
│       └── ti
└── sym

Loading the partitioned database can be as simple as passing the directory path on the command line:

bash

Copy

$ q /tmp/partitioned_db
q)q
date       ti       sym  b      a
--------------------------------------
2015.01.01 09:30:00 appl 100.75 101.25
2015.01.01 09:31:00 meta 32.75  33.25
2015.01.02 09:30:00 meta 101.25 101.75
2015.01.02 09:30:00 msft 33.25  33.75
q)t
date       ti       sym  p
------------------------------
2015.01.01 09:30:00 ibm  101
2015.01.01 09:31:00 msft 33
2015.01.02 09:30:00 ibm  101.5
2015.01.02 09:31:00 appl 33.5

The contents of the sym file can be verified using get, which shows all unique symbols found in the tables:

Copy

q)get `:/tmp/partitioned_db/sym
`ibm`msft`appl`meta

File utilities

Size of file

Use hcount to get the size of file.

Copy

q)hcount `:test.txt
20

Delete file

To delete a file or folder (if empty), use hdel.

Copy

q)hdel `:test.txt
`:test.txt
q)hdel `:test.txt
'test.txt. OS reports: No such file or directory
[0]  hdel `:test.txt
^

Files in a folder

Use key to list files in a folder.

Copy

q)key `:/tmp/textfiles
`s#`test1.txt`test2.txt

Whether files exist

To test whether files or folders exist, use key.

Copy

q)key `:/tmp/textfiles
`s#`emptyfolder`test1.txt`test2.txt
q)key `:/tmp/textfiles/doesnt_exist/        / doesnt exist, returns an empty symbol vector
q)key `:/tmp/textfiles/doesnt_exist.txt     / doesnt exist, returns an empty symbol vector
q)()~key `:/tmp/textfiles/doesnt_exist.txt
1b
q)()~key `:/tmp/textfiles/doesnt_exist/
1b
q)key `:/tmp/textfiles/emptyfolder
`s#`symbol$()
q)key `:/tmp/textfiles/test1.txt
`:/tmp/textfiles/test1.txt

Run system commands

Run any command-line program through the system command. This is especially useful when working with files. For example, calling mv to move a file:

Copy

system"mv file1.txt file2.txt"

Note

System commands may be operating-system specific. For example, Windows uses dir to list files. To handle this, KDB-X provides .z.o, which detects the current operating system. You can use it to write q code specifically for different operating systems.

External interfaces

ODBC

Depending on the individual driver, ODBC drivers can provide access to files in addition to 3rd party systems. For example, ODBC drivers exist for Microsoft Data Access database files, Parquet, Microsoft Excel, etc. Use the ODBC client driver for KDB-X to open or query files.

Example of reading a MDB file

Install the ODBC client driver for KDB-X. Install Microsoft ODBC driver for Microsoft Access. Using the driver installed, open a MDB file by using the following example command:

Copy

q)h:.odbc.open "driver=Microsoft Access Driver (*.mdb, *.accdb);dbq=C:\\mydb.mdb"

Note

Note that the name of the driver may differ between versions. The command above should be altered to reflect the driver name installed.

Use .odbc.tables to list the tables.

Copy

q).odbc.tables  h
`aa`bb`cc`dd`ii`nn

Use .odbc.eval to evaluate SQL commands through ODBC.

Copy

q).odbc.eval[h;"select * from aa"]

An alternative to querying through SQL is to load the entire database into KDB-X with the .odbc.loadcommand and query the data using KDB-X directly.

HDF5

Use the KDB-X HDF5 adapter to process HDF5 files.

Arrow

The Arrow interface allows interaction with Arrow and Parquet file formats.

Python

PyKX integrates with kKDB-X/q to allow data originating in Python to be read by KDB-X and vice-versa. This opens up KDB-X to a wide variety of file formats that are supported by Python.

Creating bespoke interfaces

You can extend KDB-X with C code to allow direct integration with 3rd party formats.

Other interfaces

Refer to this full list of external interfaces.

Summary

In this guide, you:

Used file handles and paths to identify and locate files in q
Read and wrote text files, including CSV, fixed-width, and streamed data
Worked with binary files and explored offsets, lengths, and fixed-width binary records
Loaded and saved data in KDB-X native formats using load, rload, get, save, rsave, and set
Applied compression and encryption when writing files, including setting global defaults with .z.zd
Created and managed mapped lists (anymap), splayed tables, and partitioned databases using dsave
Leveraged .Q namespace utilities for enumeration and database layout
Inspected file sizes, listed directory contents, and executed OS commands from within q
Connected to external interfaces such as ODBC, Arrow/Parquet, HDF5, and Python

You now have the essential skills to manage files in KDB-X and integrate them with external systems.

KDB-X General Availability (GA) Now Live

How to Work with Files

Overview

File handles

File paths

Text files

Text conversion

General conversions

JSON conversion

Read text files

Read raw text

Deliminated files (CSV)

Multithreaded load

Auto CSV type mapping

Fixed width

Multithreaded load

Stream data from files

Streaming by line

Streaming by size

Streaming from pipe

Write text files

Write raw text

Write using hopen

Write and format

Binary files

Read binary files

Read raw binary

Read text as binary data

Fixed width binary

Multithreaded load

Write binary files

Write raw binary

Write compressed data

Write using hopen

KDB-X formatted files

Read KDB-X data

Using load

Using rload

Using get

Memory mapping

Load system command

Loading through the command line

Write KDB-X data

Using save

Using rsave

Using set

Writing anymap

Write compressed data

Write encrypted data

Using dsave

.Q namespace utilities

File utilities

Size of file

Delete file

Files in a folder

Whether files exist

Run system commands

External interfaces

ODBC

Example of reading a MDB file

HDF5

Arrow

Python

Creating bespoke interfaces

Other interfaces

Summary

Write using `hopen`

Write using `hopen`

Using `load`

Using `rload`

Using `get`

Using `save`

Using `rsave`

Using `set`

Writing `anymap`

Using `dsave`

`.Q` namespace utilities