Table

This page explains the concepts of table (splayed and partitioned), columnar storage, and memory mapping in KDB.AI.

In KDB.AI, tables are the main structures for storing and organizing data. They come in two types, each suited to different use cases and performance requirements.

Table types

Below is an overview of the types of tables supported in KDB.AI: splayed and partitioned:

Table Type

Splayed

Partitioned

Description

Stores each column as a separate file, improving performance for large datasets.

A splayed table further divided into partitions based on a column with special types like date, month, or year.

Use cases

Best for medium-sized tables (up to 100 million rows) where queries often access a subset of columns.

Suitable for very large datasets (over 100 million rows) or when data grows over time and queries can be limited to specific partitions.

Additional information

Requires more disk space and management but offers faster reading times for specific columns.

Enhances performance by reducing the amount of data scanned during queries.

Splayed tables

Splayed tables store each column in a separate file. This structure allows for more efficient queries, especially when only a subset of columns is needed. Splayed tables are particularly useful for medium-sized datasets where the overhead of managing multiple files is outweighed by the performance benefits.

Partitioned tables

Partitioned tables take the concept of splayed tables further by dividing the data into partitions based on a specific column, such as date. Partitioning allows for even more efficient queries by limiting the data scanned to relevant partitions. Partitioned tables are ideal for very large datasets or datasets that grow over time, such as time-series data.

Examples in q:

Splayed table

Partitioned table

q

Copy
// Example of creating a splayed table for trade data
trades: ([] 
date: `date$(), 
time: `time$(), 
sym: `symbol$(), 
price: `real$(), 
size: `int$(), 
cond: `char$()
)

// Insert sample data
`trades insert (2024.11.01; 10:03:54.347; `AAPL; 150.25; 1000; "N")
`trades insert (2024.11.01; 10:04:05.827; `GOOG; 2750.50; 500; "B")

// Save the table as splayed
`:trades/ set trades

q

Copy
// Example of creating a partitioned table for trade data
trades: ([] 
date: `date$(), 
time: `time$(), 
sym: `symbol$(), 
price: `real$(), 
size: `int$(), 
cond: `char$()
)

// Insert sample data
`trades insert (2024.11.01; 10:03:54.347; `AAPL; 150.25; 1000; "N")
`trades insert (2024.11.01; 10:04:05.827; `GOOG; 2750.50; 500; "B")

// Save the table as partitioned by date
`:trades/2024.11.01/ set trades

Columnar storage

Columnar storage is a method of storing tables by column rather than by row. This approach significantly improves the performance of read-heavy operations, such as analytical queries, because it allows for more efficient data compression and faster access to the relevant columns. In KDB.AI, columnar storage is particularly beneficial for large datasets where only a few columns are queried at a time.

Memory mapping

Memory mapping is a technique used to access data stored on disk as if it were in memory. This method allows KDB.AI to handle large datasets efficiently by mapping files directly into the virtual memory space of a process. Memory mapping reduces the overhead of copying data between disk and memory, leading to faster data retrieval and improved performance.

Next steps