Query Configuration

This page explains how to configure and scale query processing components in kdb Insights Enterprise to optimize performance and resource utilization.

The kdb Insights Enterprise database uses Data Access Processes (DAPs) to serve data for queries. DAPs are split into tiers based on the age of the data they serve; typically these are a real-time database (RDB), an intra-day database (IDB), and a historical database (HDB). The data for each tier is maintained by the mount (or mounts) for that tier. DAPs are configured under a daps key in the shard file of a package, and different scaling configurations are possible depending on how many tiers each DAP serves.

Tip

DAPs are accessed using the routing layer. Routing configuration is set at install time across all packages.

Tip

This guide discusses configuration using YAML files. You can also configure your system using the web interface.

Configuration

Configuration for the Data Access Process is nested under a daps key within the shard file.

```yaml
# Other fields ...
daps:
  instances:
    da:
      mountList: [rdb, idb, hdb]
```

| Name | Type | Required | Description |
|---|---|---|---|
| mountName | string | No | Name of the mount this DAP uses to surface data. Either mountName or mountList (but not both) must be provided, depending on the desired scaling mode. Providing mountName selects single mount mode. |
| mountList | string[] | No | Names of the mounts this DAP uses to surface data. Either mountList or mountName (but not both) must be provided, depending on the desired scaling mode. Providing mountList selects multiple mount mode. |
| pctMemThreshold | float | No | Fraction of the DAP's memory limit that can be used before the DAP triggers a cache flush. The value is a decimal between 0 and 1 and is multiplied by the DAP's memory limit to give the effective memory threshold. The memory limit is determined, in order of priority, from the -w parameter, the KXI_MEM_LIMIT environment variable, twice the KXI_MEM_REQUEST environment variable, or the cgroup memory limit. When the threshold is exceeded, the DAP enters low memory mode until the next writedown interval completes. See best practices. |
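As a sketch of how pctMemThreshold fits into an instance definition, the example below adds it to the mountList instance shown earlier. It assumes the setting sits alongside mountList under the instance; the memory figures in the comments are hypothetical and only illustrate the arithmetic described in the table.

```yaml
daps:
  instances:
    da:
      mountList: [rdb, idb, hdb]
      # Fraction (0 to 1) of the DAP's memory limit that may be used
      # before the DAP triggers a cache flush and enters low memory mode.
      pctMemThreshold: 0.6
      # Hypothetical arithmetic: if neither -w nor KXI_MEM_LIMIT is set and
      # KXI_MEM_REQUEST corresponds to 4 GiB, the memory limit is
      # 2 x 4 GiB = 8 GiB, so the effective threshold is 0.6 x 8 GiB = 4.8 GiB.
```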

Scaling configurations

DAPs can be configured in two modes, depending on the anticipated query requirements: scaled independently per data tier, or scaled uniformly across all data tiers.

Scaling independently

Scaling independently lets you run more RDBs than IDBs or HDBs, or vice versa, so you can tailor your setup to the anticipated query distribution across the data tiers and maximize query throughput. In this mode, each tier runs in its own container and consumes its own set of resources. To use this mode, configure your DAPs with the mountName option.

```yaml
daps:
  instances:
    rdb:
      mountName: rdb
    idb:
      mountName: idb
    hdb:
      mountName: hdb
```

Scaling uniformly

To share container resources, you can scale your DAPs uniformly in a single container. This mode is typically referred to as single DAP mode: the RDB, IDB, and HDB tiers all run within one container.

In this mode, adding another instance adds another copy of all configured tiers. To use this mode, configure your DAP with the mountList option.

```yaml
daps:
  instances:
    db:
      mountList: [rdb, idb, hdb]
```

Environment variables

Advanced configuration can be supplied to a DAP using environment variables. How these variables are set depends on the deployment method, but in all cases their values are strings.

In a package, environment variables are set on the daps.instances.<name> element. They are supplied under an env key as a list of objects, each containing a name and a value.

```yaml
daps:
  instances:
    da:
      env:
        - name: KXI_NAME
          value: "da"
```

| Name | Description |
|---|---|
| KXI_NAME | Process name. |
| KXI_SC | Service class for data access (for example, RDB, IDB, HDB). Must match the value in daps.instances of the shard file. |
| KXI_ASSEMBLY_FILE | Assembly YAML file (for internal use only). |
| KXI_PORT | Port. |
| KXI_CUSTOM_FILE | File containing custom code to load in DA processes. |
| KXI_DAP_SANDBOX | Whether this DAP is a query environment (default: false). |
| KXI_SBX_MAX_ROWS | Maximum number of rows, per partitioned table, to store in memory. |
| KXI_ALLOWED_SBX_APIS | Comma-delimited list of query environment APIs to allow in non-query-environment DAPs (for example, .kxi.sql,.kxi.qsql,.kxi.sql2). |
| KXI_DA_RELOAD_STAGGER | Time in seconds between reloads of DAPs of the same class after an EOX (default: 30). |
| KXI_DA_USE_REAPER | Whether to use KX Reaper and the object storage cache (default: false). |
| KXI_HB_FREQ | Interval in milliseconds between heartbeats to connected processes (default: 30000). |
| KXI_HB_TOL | Number of heartbeat intervals a process can miss before being disconnected (default: 3). |
| KXI_GC_FREQ | Interval in milliseconds at which garbage collection runs on a timer (default: 600000; set to 0 to disable). |
| KXI_ENABLE_FLUSH | Set to "true" to enable async flush on messages from DA to Agg (default: false). |
| KXI_RT_EVENT_FATAL | If "true", RT badtail and badmsg events are treated as fatal: SM crashes and ingestion stops. If "false" or unset, these events are logged and ingestion continues. Reset events are never treated as fatal. |
| KXI_SG_RC_ADDR | URL of an explicit Resource Coordinator for this DAP instance to connect to. Must be a fully qualified host name and port. If not set, the DAP falls back to Kubernetes label discovery. |
| KX_OBJSTR_INVENTORY_FILE | Path, relative to the root of the bucket, of an inventory file to use. |
| KXI_LATE_DATA | If "true", the DAP runs with late data mode on. Takes precedence over the daps.instances.*.lateData setting. |
| KXI_MAX_CONN_RETRY | Number of connection retry attempts before the process restarts (default: 20). |
| KXI_MEM_LIMIT | If set and -w is not set, determines the memory limit used to calculate the threshold for triggering an emergency EOI (see the pctMemThreshold setting). |
| KXI_MEM_REQUEST | If set and neither -w nor KXI_MEM_LIMIT is set, twice this value is used as the memory limit when calculating the threshold for triggering an emergency EOI (see the pctMemThreshold setting). |
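The example below sketches how several of the variables above could be combined on one instance. Every value is quoted because environment variables are always strings, including numbers and booleans. It assumes that daps.instances.<name>.lateData accepts a boolean; because KXI_LATE_DATA takes precedence over that setting, late data mode is on here even though lateData is false. The values are illustrative, not recommendations.

```yaml
daps:
  instances:
    da:
      mountList: [rdb, idb, hdb]
      # Assumed boolean field; overridden by KXI_LATE_DATA below.
      lateData: false
      env:
        # All values are strings, so numbers and booleans are quoted.
        - name: KXI_LATE_DATA
          value: "true"                 # takes precedence over lateData above
        - name: KXI_HB_FREQ
          value: "10000"                # heartbeat every 10 seconds
        - name: KXI_HB_TOL
          value: "5"                    # disconnect after 5 missed heartbeats
        - name: KXI_GC_FREQ
          value: "0"                    # disable timed garbage collection
        - name: KXI_ALLOWED_SBX_APIS
          value: ".kxi.sql,.kxi.qsql"   # comma-delimited list of APIs
```

With these values, a connected process that stops responding is disconnected after roughly 5 x 10 seconds = 50 seconds of missed heartbeats.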

In addition, the following environment variables apply to the sidecar and DAP images. The Container column shows which image each variable applies to; ALL means both.

| Name | Container | Description |
|---|---|---|
| KXI_CONFIG_FILE | sidecar | Metrics configuration file. |
| KXI_LOG_FORMAT | ALL | Log message format. |
| KXI_LOG_DEST | ALL | Log endpoints. |
| KXI_LOG_LEVELS | ALL | Component routing. |
| KXI_LOG_CONFIG | ALL | Alternative logging configuration: replaces KXI_LOG_FORMAT, KXI_LOG_DEST, and KXI_LOG_LEVELS. |

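In kdb Insights Enterprise, environment variables can also be supplied through the web interface, under the advanced query settings option. The variables available there are the same as those listed in the tables above.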


Query size limitations

IPC queries routed through the Service Gateway using SQL, getData, or user-defined analytics (UDAs) have no size limit when using version 6 of the q IPC protocol.

In the response path, queries are streamed through the Gateway when the response size exceeds KXI_SG_STREAM_THRESHOLD bytes.

For RESTful queries, responses are not streamed to the client; results are uncompressed and limited to 2 GB.