How to Work with Data
This guide shows you how to create, save, load, and manage data in kdb+.
In this guide, you will learn how to:
-
Create in-memory tables, both keyed and unkeyed
-
Save and load tables in different formats (single file, text, splayed, partitioned, segmented)
-
Understand how q stores and handles datatypes such as
timestamp
,symbol
,long
, andfloat
-
Use linked columns to reference values between tables
-
Apply database utilities in the
.Q
namespace to manage partitions and missing tables
By the end, you will know how to persist tables to disk, manage large datasets efficiently, and prepare data structures that support growth over time.
Overview
Create table
Note: Datatypes
The examples in this section use four basic datatypes: timestamp
(nanosecond precision), symbol
(internal character data), long
(8-byte signed integer), and float
(double precision floating point number). For a full list, including specific types for temporal data, refer to q datatypes.
Define a table using the following syntax:
q
tab:([]colexp1;..;colexpN)
To create a keyed table, specify one or more column expressions within the square brackets:
q
keytab:([keyexp1;..;keyexpM]colexp1;..;colexpN)
Initialize a table with data by defining columns as lists in the keyexp
or colexp
expressions:
q
trade:([]time:asc 1000?.z.P;sym:1000?`AAPL`IBM`MSFT;size:100*1+1000?10;price:1000?100f)
Alternatively, create an empty schema by defining columns as (optionally typed) empty lists:
q
trade:([]time:`timestamp$();sym:`symbol$();size:`long$();price:`float$())
Note: Character data
q stores character-based data (strings) in two main datatypes: symbols and character vectors. q stores symbols in an internal hash table and represents each instance as a reference to that table. This design speeds up comparisons (no string comparison is required) and reduces the memory and disk space needed to store repeated values.
However, because the symbol pool never shrinks in memory or on disk, avoid using symbols for non-repeating data. Use symbols for values that repeat, such as stock tickers or product IDs. For values that do not repeat (like order IDs or freeform text), store them as nested lists of character vectors.
Save data
The tables you created in the examples above live only in memory. They exist for the lifetime of the process that created them. To persist them, you need to save them to disk.
You can choose from three on-disk formats. The right format depends on the size of the table and how you expect it to grow in the future:
-
Single file – Use this for smaller tables. When you load this type of table, q loads the entire table into memory. Only use this format if the available memory can hold the full table.
-
Splayed table – Use this for medium-sized tables, typically up to around 100 million rows, when you do not expect the table to grow significantly larger.
-
Partitioned table – Use this for the largest tables, typically 100's of millions of rows, especially when you expect the table to grow over time.
Single files
The simplest way to save a table is as a single file. Use the save
keyword with the table name as the argument:
q
q)cities:([]city:`Tokyo`Delhi`Shanghai;pop:37435191 29399141 26317104)
q)save`cities
`:cities
You can then load the table into a new process using the load
keyword:
q
q)cities / not present yet
'cities
[0] cities
q)load`cities
q)cities
city pop
-----------------
Tokyo 37435191
Delhi 29399141
Shanghai 26317104
Save
in text formats
The save
keyword also supports text formats such as .csv
, tab-delimited
, or .xls
. Add the appropriate file extension to the argument to save
:
q
q)save`cities.csv
`:cities.csv
q)read0`:cities.csv / read back the saved text
"city,pop"
"Tokyo,37435191"
"Delhi,29399141"
"Shanghai,26317104"
Instead of save
and load
, you can also use the more general set
and get
.
-
set
takes 2 arguments: the first argument is the name of the file to save to (as a handle), the second is the object to save; -
get
takes 1 argument: the name of the file to load as a handle, the result is returned and can be separately assigned to a variable
q
q)`:foo/bar/bigcities set cities
`:foo/bar/bigcities
q)newcities:get`:foo/bar/bigcities
q)newcities
city pop
-----------------
Tokyo 37435191
Delhi 29399141
Shanghai 26317104
Splayed tables
The second on-disk table format is a splayed table, which you typically use for medium-sized tables (around 100 million rows). In this format, q stores each column in a separate file, rather than storing the entire table in a single file. q infers the table name from the name of the directory that contains these files.
Save a splayed table by using the set
keyword with a trailing slash (/
) to indicate that you want to save the data to a directory. For example:
q
q)n:1000000
q)trade:`sym`time xasc ([]time:asc n?.z.P;sym:n?`AAPL`IBM`MSDT;size:100*1+n?10;price:n?100f)
q)`:trade/ set select time, size, price from trade
`:trade/
q)key`:trade/ / list the directory contents
`.d`price`size`time
q)select from `:trade/ / we can select directory from the directory
In this example, you will also notice an additional hidden file named .d
inside the directory. This file contains the list of columns in the table in the order they should appear.
Enumerate symbol columns
In the example above, q did not save the sym column because it is of type symbol
and you must enumerate this type of column before saving it to a splayed or partitioned table. The function .Q.en
enumerates symbol columns against a vector called sym
and saves the enumeration on disk.
q
q)`:trade/ set trade / error as there are unenumerated symbol type columns
'type
[0] `:trade/ set trade
q)`:trade/ set .Q.en[`:.]trade / enumerate using .Q.en and save as before
Set attributes
You can set attributes such as s
(sorted), u
(unique), p
(parted), or g
(grouped) on a column of a splayed table on disk. Use amend
to apply these attributes. For example:
q
q)@[`:trade,`sym;`p#]
Note: Important Limitation
You can save only unkeyed tables as splayed or partitioned tables.
Partitioned tables
A partitioned table is a splayed table that you split further into groups of rows with the same value in a partition field. In q, you can use date
, month
, year
, or int
as partition fields.
Saving data in partitioned tables works the same way as with splayed tables, the difference being that each partition (splay) is saved in a subdirectory named after the partition value. For example, the following commands save two copies of the trade
table in separate partitions:
q
/ save 2 copies of trade as separate partitions
q)`:./2025.06.01/trade/ set .Q.en[`:.]trade
`:./2025.06.01/trade/
q)`:./2025.06.02/trade/ set .Q.en[`:.]trade
`:./2025.06.02/trade/
This results in a directory structure like:
q
├── 2025.06.01
│ └── trade
│ ├── .d
│ ├── price
│ ├── size
│ ├── sym
│ └── time
├── 2025.06.02
│ └── trade
│ ├── .d
│ ├── price
│ ├── size
│ ├── sym
│ └── time
└── sym
Load the database with \l
to see the partitions:
q
q)\l . / reload the database to see the changes
q)date
2025.06.01 2025.06.02
q)trade
date time sym size price
------------------------------------------------------------
2025.06.01 2000.01.01D00:21:48.836608570 AAPL 100 9.055363
2025.06.01 2000.01.01D00:25:37.859626281 AAPL 500 72.43675
2025.06.01 2000.01.01D00:51:29.565415399 AAPL 1000 58.5308
2025.06.01 2000.01.01D01:27:07.862021939 AAPL 400 32.52928
2025.06.01 2000.01.01D02:40:42.542642756 AAPL 300 23.25753
2025.06.01 2000.01.01D03:01:13.541362954 AAPL 500 30.259
2025.06.01 2000.01.01D03:10:51.150885564 AAPL 500 20.77152
2025.06.01 2000.01.01D03:31:25.704701625 AAPL 500 99.97752
2025.06.01 2000.01.01D03:34:46.848283357 AAPL 500 86.45456
2025.06.01 2000.01.01D03:48:53.148209099 AAPL 300 42.4489
2025.06.01 2000.01.01D04:36:16.289348089 AAPL 400 21.32947
2025.06.01 2000.01.01D05:30:42.486894795 AAPL 600 75.78067
2025.06.01 2000.01.01D06:09:21.906280047 AAPL 200 88.49448
2025.06.01 2000.01.01D06:34:39.370882693 AAPL 100 48.22444
..
Note: Date columns in partitioned tables
When you reload a partitioned database, q creates a global variable date that lists the partitions. The trade
table also includes a virtual date
column. q materializes the values in this column only when you query them.
Segmented databases
You can distribute partitioned tables across multiple storage devices or directory locations to:
-
give them more space
-
support parallelization
-
reduce the size of individual partitions, potentially reducing database ingest, and maintenance time.
At the top level, a segmented database contains only:
-
single file tables and objects
-
enumeration files
-
splayed tables
-
a text file
par.txt
that lists the directories containing the data for the partitions in the database.
Here’s the structure of a segmented database with 2 partitions split across 2 segments, each containing a single table:
q
├── db
│ ├── sym
│ └── par.txt
├── 1
│ └── 2025.06.01
│ └── trade
│ ├── time
│ ├── sym
│ ├── size
│ ├── price
│ └── .d
└── 0
└── 2025.06.02
└── trade
├── time
├── sym
├── size
├── price
└── .d
The root of this database is db
. The directories 0
and 1
are the segments. The par.txt
file contains paths to the segments, either absolute or relative to its location.
q
../0
../1
Save data to segmented databases
Saving data to a segmented database works much like saving to a non-segmented one. The difference is that you must distribute the data across the available segments.
q provides the function .Q.par
to round-robin partitions over the segments listed in par.txt
. For example, using the par.txt
file above:
q
q).Q.par[`:db;2025.06.03;`$"trade/"]
`:../1/2025.06.03/trade/
q).Q.par[`:db;2025.06.04;`$"trade/"]
`:../0/2025.06.04/trade/
.Q.par
simply returns a directory location. You must still write the data to this location using set
, just as you do with partitioned tables:
q
q).Q.par[`:db;2025.06.03;`$"trade/"]set .Q.en[`:db]trade
`:../1/2025.06.03/trade/
q).Q.par[`:db;2025.06.04;`$"trade/"]set .Q.en[`:db]trade
`:../0/2025.06.04/trade/
Linked columns
Linked columns store indices into another table. This design lets you reference values from the second table easily when you write queries. Linked columns do not enforce referential integrity — they simply act as indices into a second table along with a label. For example:
q
q)cities:([]city:`Tokyo`Delhi`Shanghai;pop:37435191 29399141 26317104;country:`japan`india`china)
q)countries:([]country:`india`japan`china;pop:1463865525 123103479 1416096094)
q)update country:`countries!countries.country?country from `cities;
q)select city, pop, country.country, countrypop:country.pop from cities
Linked columns play an important role in on-disk data. q does not require the target table of a link to be a keyed table, and in fact you cannot key splayed or partitioned tables.
Database utilities
Each partitioned table is expected to appear in every partition of a database, even when some partitions contain no data. If a table is missing from a partition, q may return errors. This situation often arises when you add a new table but have not yet backfilled it across the database. q provides utilities in the .Q
namespace to help manage this.
.Q.chk
fill database
Use .Q.chk to scan the database for tables that are missing from partitions. The function takes a handle to the database as its single argument.
.Q.chk
checks the most recent partition to determine the complete list of tables. When it finds a missing table in other partitions, it places an empty prototype in that partition.
Because .Q.chk
writes to the database, you must run it as a user with write permissions to the database directory on disk.
.Q.bv
build virtual prototypes
Use .Q.bv
to scan for missing tables without writing to disk. Instead, it builds a map of missing table/partition pairs and creates an empty prototype for each missing table.
At runtime, q supplies these prototypes to queries that attempt to access missing data. This lets you continue querying the database without errors, even if you have not backfilled the new table.
Summary
In this guide, you:
-
Defined unkeyed and keyed tables, and initialized tables with data or empty schemas.
-
Saved and loaded tables as single files, in text formats, or with
set
/get
. -
Stored data in splayed tables for medium-sized datasets, including symbol enumeration and column attributes.
-
Used partitioned tables for large and growing datasets, and reloaded them with
\l
to access partitions and virtual columns. -
Distributed partitions across multiple devices using segmented databases.
-
Linked columns between tables for efficient lookups in on-disk data
-
Applied
.Q.chk
to fill missing tables with empty prototypes and.Q.bv
to build virtual prototypes at runtime.
You now have the essential skills to create, save, and manage data in kdb+ across multiple storage formats and database structures.