How to Work with Data

This guide shows you how to create, save, load, and manage data in kdb+.

In this guide, you will learn how to:

  • Create in-memory tables, both keyed and unkeyed

  • Save and load tables in different formats (single file, text, splayed, partitioned, segmented)

  • Understand how q stores and handles datatypes such as timestamp, symbol, long, and float

  • Use linked columns to reference values between tables

  • Apply database utilities in the .Q namespace to manage partitions and missing tables

By the end, you will know how to persist tables to disk, manage large datasets efficiently, and prepare data structures that support growth over time.

Overview

Create table

Note: Datatypes

The examples in this section use four basic datatypes: timestamp (nanosecond precision), symbol (internal character data), long (8-byte signed integer), and float (double precision floating point number). For a full list, including specific types for temporal data, refer to q datatypes.

Define a table using the following syntax:

q

Copy
tab:([]colexp1;..;colexpN)

To create a keyed table, specify one or more column expressions within the square brackets:

q

Copy
keytab:([keyexp1;..;keyexpM]colexp1;..;colexpN)

Initialize a table with data by defining columns as lists in the keyexp or colexp expressions:

q

Copy
trade:([]time:asc 1000?.z.P;sym:1000?`AAPL`IBM`MSFT;size:100*1+1000?10;price:1000?100f)

Alternatively, create an empty schema by defining columns as (optionally typed) empty lists:

q

Copy
trade:([]time:`timestamp$();sym:`symbol$();size:`long$();price:`float$())

Note: Character data

q stores character-based data (strings) in two main datatypes: symbols and character vectors. q stores symbols in an internal hash table and represents each instance as a reference to that table. This design speeds up comparisons (no string comparison is required) and reduces the memory and disk space needed to store repeated values.

However, because the symbol pool never shrinks in memory or on disk, avoid using symbols for non-repeating data. Use symbols for values that repeat, such as stock tickers or product IDs. For values that do not repeat (like order IDs or freeform text), store them as nested lists of character vectors.

Save data

The tables you created in the examples above live only in memory. They exist for the lifetime of the process that created them. To persist them, you need to save them to disk.

You can choose from three on-disk formats. The right format depends on the size of the table and how you expect it to grow in the future:

  • Single file – Use this for smaller tables. When you load this type of table, q loads the entire table into memory. Only use this format if the available memory can hold the full table.

  • Splayed table – Use this for medium-sized tables, typically up to around 100 million rows, when you do not expect the table to grow significantly larger.

  • Partitioned table – Use this for the largest tables, typically 100's of millions of rows, especially when you expect the table to grow over time.

Single files

The simplest way to save a table is as a single file. Use the save keyword with the table name as the argument:

q

Copy
q)cities:([]city:`Tokyo`Delhi`Shanghai;pop:37435191 29399141 26317104)
q)save`cities
`:cities        

You can then load the table into a new process using the load keyword:

q

Copy
q)cities / not present yet
'cities
[0]  cities
q)load`cities
q)cities
city     pop
-----------------
Tokyo    37435191
Delhi    29399141
Shanghai 26317104  

Save in text formats

The save keyword also supports text formats such as .csv, tab-delimited, or .xls. Add the appropriate file extension to the argument to save:

q

Copy
q)save`cities.csv
`:cities.csv
q)read0`:cities.csv / read back the saved text
"city,pop"
"Tokyo,37435191"
"Delhi,29399141"
"Shanghai,26317104"        

Instead of save and load, you can also use the more general set and get.

  • set takes 2 arguments: the first argument is the name of the file to save to (as a handle), the second is the object to save;

  • get takes 1 argument: the name of the file to load as a handle, the result is returned and can be separately assigned to a variable

q

Copy
q)`:foo/bar/bigcities set cities
`:foo/bar/bigcities
q)newcities:get`:foo/bar/bigcities
q)newcities
city     pop
-----------------
Tokyo    37435191
Delhi    29399141
Shanghai 26317104         

Splayed tables

The second on-disk table format is a splayed table, which you typically use for medium-sized tables (around 100 million rows). In this format, q stores each column in a separate file, rather than storing the entire table in a single file. q infers the table name from the name of the directory that contains these files.

Save a splayed table by using the set keyword with a trailing slash (/) to indicate that you want to save the data to a directory. For example:

q

Copy
q)n:1000000
q)trade:`sym`time xasc ([]time:asc n?.z.P;sym:n?`AAPL`IBM`MSDT;size:100*1+n?10;price:n?100f)
q)`:trade/ set select time, size, price from trade
`:trade/
q)key`:trade/                                      / list the directory contents
`.d`price`size`time
q)select from `:trade/                             / we can select directory from the directory           

In this example, you will also notice an additional hidden file named .d inside the directory. This file contains the list of columns in the table in the order they should appear.

Enumerate symbol columns

In the example above, q did not save the sym column because it is of type symbol and you must enumerate this type of column before saving it to a splayed or partitioned table. The function .Q.en enumerates symbol columns against a vector called sym and saves the enumeration on disk.

q

Copy
q)`:trade/ set trade                                / error as there are unenumerated symbol type columns
'type
[0]  `:trade/ set trade
q)`:trade/ set .Q.en[`:.]trade                      / enumerate using .Q.en and save as before

Set attributes

You can set attributes such as s (sorted), u (unique), p (parted), or g (grouped) on a column of a splayed table on disk. Use amend to apply these attributes. For example:

q

Copy
q)@[`:trade,`sym;`p#]    

Note: Important Limitation

You can save only unkeyed tables as splayed or partitioned tables.

Partitioned tables

A partitioned table is a splayed table that you split further into groups of rows with the same value in a partition field. In q, you can use date, month, year, or int as partition fields.

Saving data in partitioned tables works the same way as with splayed tables, the difference being that each partition (splay) is saved in a subdirectory named after the partition value. For example, the following commands save two copies of the trade table in separate partitions:

q

Copy
/ save 2 copies of trade as separate partitions
q)`:./2025.06.01/trade/ set .Q.en[`:.]trade
`:./2025.06.01/trade/
q)`:./2025.06.02/trade/ set .Q.en[`:.]trade
`:./2025.06.02/trade/         

This results in a directory structure like:

q

Copy
├── 2025.06.01
│   └── trade
│       ├── .d
│       ├── price
│       ├── size
│       ├── sym
│       └── time
├── 2025.06.02
│   └── trade
│       ├── .d
│       ├── price
│       ├── size
│       ├── sym
│       └── time
└── sym           

Load the database with \l to see the partitions:

q

Copy
q)\l .                                                           / reload the database to see the changes
q)date
2025.06.01 2025.06.02
q)trade
date       time                          sym  size price
------------------------------------------------------------
2025.06.01 2000.01.01D00:21:48.836608570 AAPL 100  9.055363
2025.06.01 2000.01.01D00:25:37.859626281 AAPL 500  72.43675
2025.06.01 2000.01.01D00:51:29.565415399 AAPL 1000 58.5308
2025.06.01 2000.01.01D01:27:07.862021939 AAPL 400  32.52928
2025.06.01 2000.01.01D02:40:42.542642756 AAPL 300  23.25753
2025.06.01 2000.01.01D03:01:13.541362954 AAPL 500  30.259
2025.06.01 2000.01.01D03:10:51.150885564 AAPL 500  20.77152
2025.06.01 2000.01.01D03:31:25.704701625 AAPL 500  99.97752
2025.06.01 2000.01.01D03:34:46.848283357 AAPL 500  86.45456
2025.06.01 2000.01.01D03:48:53.148209099 AAPL 300  42.4489
2025.06.01 2000.01.01D04:36:16.289348089 AAPL 400  21.32947
2025.06.01 2000.01.01D05:30:42.486894795 AAPL 600  75.78067
2025.06.01 2000.01.01D06:09:21.906280047 AAPL 200  88.49448
2025.06.01 2000.01.01D06:34:39.370882693 AAPL 100  48.22444
..

Note: Date columns in partitioned tables

When you reload a partitioned database, q creates a global variable date that lists the partitions. The trade table also includes a virtual date column. q materializes the values in this column only when you query them.

Segmented databases

You can distribute partitioned tables across multiple storage devices or directory locations to:

  • give them more space

  • support parallelization

  • reduce the size of individual partitions, potentially reducing database ingest, and maintenance time.

At the top level, a segmented database contains only:

  • single file tables and objects

  • enumeration files

  • splayed tables

  • a text file par.txt that lists the directories containing the data for the partitions in the database.

Here’s the structure of a segmented database with 2 partitions split across 2 segments, each containing a single table:

q

Copy
├── db
│   ├── sym
│   └── par.txt
├── 1
│   └── 2025.06.01
│       └── trade
│           ├── time
│           ├── sym
│           ├── size
│           ├── price
│           └── .d
└── 0
└── 2025.06.02
└── trade
├── time
├── sym
├── size
├── price
└── .d   

The root of this database is db. The directories 0 and 1 are the segments. The par.txt file contains paths to the segments, either absolute or relative to its location.

q

Copy
../0
../1    

Save data to segmented databases

Saving data to a segmented database works much like saving to a non-segmented one. The difference is that you must distribute the data across the available segments.

q provides the function .Q.par to round-robin partitions over the segments listed in par.txt. For example, using the par.txt file above:

q

Copy
q).Q.par[`:db;2025.06.03;`$"trade/"]
`:../1/2025.06.03/trade/
q).Q.par[`:db;2025.06.04;`$"trade/"]
`:../0/2025.06.04/trade/      

.Q.par simply returns a directory location. You must still write the data to this location using set, just as you do with partitioned tables:

q

Copy
q).Q.par[`:db;2025.06.03;`$"trade/"]set .Q.en[`:db]trade
`:../1/2025.06.03/trade/
q).Q.par[`:db;2025.06.04;`$"trade/"]set .Q.en[`:db]trade
`:../0/2025.06.04/trade/

Linked columns

Linked columns store indices into another table. This design lets you reference values from the second table easily when you write queries. Linked columns do not enforce referential integrity — they simply act as indices into a second table along with a label. For example:

q

Copy
q)cities:([]city:`Tokyo`Delhi`Shanghai;pop:37435191 29399141 26317104;country:`japan`india`china)
q)countries:([]country:`india`japan`china;pop:1463865525 123103479 1416096094)
q)update country:`countries!countries.country?country from `cities;
q)select city, pop, country.country, countrypop:country.pop from cities 

Linked columns play an important role in on-disk data. q does not require the target table of a link to be a keyed table, and in fact you cannot key splayed or partitioned tables.

Database utilities

Each partitioned table is expected to appear in every partition of a database, even when some partitions contain no data. If a table is missing from a partition, q may return errors. This situation often arises when you add a new table but have not yet backfilled it across the database. q provides utilities in the .Q namespace to help manage this.

.Q.chk fill database

Use .Q.chk to scan the database for tables that are missing from partitions. The function takes a handle to the database as its single argument.

.Q.chk checks the most recent partition to determine the complete list of tables. When it finds a missing table in other partitions, it places an empty prototype in that partition.

Because .Q.chk writes to the database, you must run it as a user with write permissions to the database directory on disk.

.Q.bv build virtual prototypes

Use .Q.bv to scan for missing tables without writing to disk. Instead, it builds a map of missing table/partition pairs and creates an empty prototype for each missing table.

At runtime, q supplies these prototypes to queries that attempt to access missing data. This lets you continue querying the database without errors, even if you have not backfilled the new table.

Summary

In this guide, you:

  • Defined unkeyed and keyed tables, and initialized tables with data or empty schemas.

  • Saved and loaded tables as single files, in text formats, or with set/get.

  • Stored data in splayed tables for medium-sized datasets, including symbol enumeration and column attributes.

  • Used partitioned tables for large and growing datasets, and reloaded them with \l to access partitions and virtual columns.

  • Distributed partitions across multiple devices using segmented databases.

  • Linked columns between tables for efficient lookups in on-disk data

  • Applied .Q.chk to fill missing tables with empty prototypes and .Q.bv to build virtual prototypes at runtime.

You now have the essential skills to create, save, and manage data in kdb+ across multiple storage formats and database structures.