Object Storage

This page explains what the Object Storage module is and how it works on KDB-X.

The Object Storage (objstor) module allows KDB-X processes to read data directly from cloud storage as if they were local files.

It separates compute from storage, so large datasets can be queried at scale without being kept on block storage.

Use cases

Common use cases for the Object Storage module include:

  • Storing historical HDB data in cloud storage while retaining query access.

  • Reducing storage costs by offloading infrequently accessed data.

  • Disaster recovery and archival.

  • Hybrid deployments: combining block storage with cloud object storage.

Cloud providers

KX supports the major cloud providers:

  • Amazon Web Services (AWS)

  • Microsoft Azure

  • Google Cloud Platform (GCP)

S3-compatible stores

In addition to the major cloud providers, the module supports S3-compatible object stores such as MinIO.

These can be used for testing locally or in hybrid deployments.

Refer to the examples in the Quickstart guide for a complete MinIO walkthrough.

Key concepts

Understanding these concepts helps you work effectively with object storage in KDB-X.

  • URI prefixes: :s3:// (AWS), :ms:// (Azure), :gs:// (GCP)

  • Authentication: Integrated with kurl (or inline credentials in simple cases)

  • Inventory files: JSON describing bucket contents for faster metadata loading

  • Hybrid HDBs: combine cloud and block storage paths in par.txt
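
As a rough illustration of the hybrid HDB idea, the sketch below writes a par.txt that lists one block-storage segment and one cloud segment, one path per line. The directory names and the bucket path are hypothetical, and the exact form of the cloud path inside par.txt is an assumption; check the Quickstart guide for the syntax your version expects.

```shell
# Hypothetical locations: adjust to your environment.
hdb_root="$(mktemp -d)"              # stand-in for the HDB root directory
local_segment="/data/hdb/recent"     # hypothetical block-storage segment
cloud_segment="s3://mybucket/db"     # hypothetical bucket path (assumed syntax)

# par.txt lists one segment path per line.
printf '%s\n' "$local_segment" "$cloud_segment" > "$hdb_root/par.txt"

# Show the resulting file.
cat "$hdb_root/par.txt"
```

A process that loads this HDB root would then see partitions from both segments, with recent data served from block storage and older data read from the bucket.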

Read only

The objstor module is read-only: it cannot create or modify objects. To place data in cloud storage, copy it from block storage using the cloud vendor’s standard CLI tooling. For example:

Amazon Web Services (AWS)

aws s3 cp "/path/to/file.txt" s3://kxs-prd-cxt-twg-roinsightsdemo/kxinsights-marketplace-data/

Microsoft Azure

azcopy cp "/path/to/file.txt" "https://[account].blob.core.windows.net/[container]/[path/to/blob]"

Google Cloud Platform (GCP)

gsutil cp -r "/path/to/file.txt" gs://kxinsights-marketplace-data/

Metadata in object storage

When querying data in cloud object storage, metadata refers to the list of objects (keys) within a bucket, along with their sizes.

Queries cannot run until this metadata is loaded, because it tells KDB-X which partitions and files are available.
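
To make the shape of this metadata concrete, the sketch below prints "key size" pairs for a small local directory laid out like a date-partitioned bucket. It is only an illustration: the directory, table, and file names are hypothetical, and it does not call KDB-X or any cloud API.

```shell
# Build a small local tree that mimics a partitioned layout (hypothetical names).
bucket_root="$(mktemp -d)"
mkdir -p "$bucket_root/db/2024.01.01/trade"
printf 'abc' > "$bucket_root/db/2024.01.01/trade/sym"
printf 'abcdef' > "$bucket_root/db/2024.01.01/trade/price"

# Print one "key size" pair per file: the key is the path relative to
# the root, and the size is the file length in bytes.
list_keys() {
  ( cd "$1" && find . -type f | sort | while IFS= read -r f; do
      printf '%s %s\n' "${f#./}" "$(wc -c < "$f" | tr -d ' ')"
    done )
}

listing="$(list_keys "$bucket_root")"
printf '%s\n' "$listing"
```

The partition name appears inside each key, which is why a full key listing is enough to tell which partitions and files exist. An inventory file records the same kind of information as JSON.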

How metadata is loaded

  • On first access, the module retrieves and caches the list of all keys in the bucket.

  • This cache remains in memory to avoid repeated lookups.

  • To force a refresh of the metadata, append /_ to the bucket path:

    / AWS
    q)key`:s3://mybucket/_
    / Azure
    q)key`:ms://mycontainer/_
    / GCP
    q)key`:gs://mybucket/_

Performance

Compared to block storage, cloud object storage has higher latency and lower bandwidth, so queries can be slower, especially when scanning large buckets. KDB-X provides features such as caching, secondary threads, compression, and inventory files to help mitigate this.

Next steps

Check out the Quickstart guide for details on getting started with the Object Storage module.