kdb Insights Grafana Dashboard Reference

kdb Insights Grafana dashboards provide visualizations to help you monitor the performance and status of a kdb Insights system. The dashboards can be automatically deployed alongside each kdb Insights Enterprise instance.

Getting started

  1. Go to the Grafana homepage, select the Toggle menu and choose Dashboards.

  2. Open the folder named after the namespace into which kdb Insights Enterprise is deployed. This reference covers the following dashboards in that folder:

     • kdb Insights logging summary
     • kdb Insights Enterprise Database
     • kdb Insights detail

Note

References to databases, here and in the dashboards, cover both assemblies deployed via the kdb Insights CLI and databases created from the kdb Insights Enterprise UI.
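If you want to check from a script which dashboards sit in the namespace folder, one option is Grafana's HTTP search endpoint (`/api/search`). The sketch below is not part of kdb Insights. It only filters an already-parsed search response, and the folder name `insights` and the sample entries are hypothetical.

```python
from typing import Dict, List


def dashboards_in_folder(search_results: List[Dict], folder_title: str) -> List[str]:
    """Return the sorted titles of dashboards that belong to the given folder.

    `search_results` is assumed to be the parsed JSON list returned by
    Grafana's /api/search endpoint, where dashboard entries carry
    "type": "dash-db" and a "folderTitle" field.
    """
    return sorted(
        item["title"]
        for item in search_results
        if item.get("type") == "dash-db" and item.get("folderTitle") == folder_title
    )


# Hypothetical response for a deployment in a namespace called "insights".
sample = [
    {"type": "dash-folder", "title": "insights"},
    {"type": "dash-db", "title": "kdb Insights detail", "folderTitle": "insights"},
    {"type": "dash-db", "title": "kdb Insights logging summary", "folderTitle": "insights"},
    {"type": "dash-db", "title": "Node overview", "folderTitle": "General"},
]
print(dashboards_in_folder(sample, "insights"))
```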

kdb Insights logging summary

The panels on this dashboard let you drill down into log messages to identify issues. The dashboard displays the number of log messages by type and lets you group those messages by any label that Kubernetes uses to distinguish the components logging them. For example, you can display only the error messages and group them by database to see which databases are raising errors. You can then filter on a specific database to list the messages for that database.

Variables

At the top of the dashboard there is a set of variables that allows you to filter some of the panels to show messages that have particular properties.

| Variable | Description |
| --- | --- |
| Log Status | Filters the panels in the second and third rows. The states are: FATAL, ERROR, WARN, INFO, DEBUG and TRACE. |
| Group by | Groups the second row by any label associated with the messages. This allows you to find components that are logging the messages. For example, choosing insights_kx_com_app groups the messages by database. |
| Include Messages | Filters all rows to only include messages that contain the specified text. |
| + | Built-in Grafana option to filter on any label that is included in the messages. |

Log messages by type

Count of the number of messages per type. On the left there is the total number of messages per type in the selected time range. On the right is a line chart showing the total number of messages per type over time.

Note

The 'Include Messages' and '+' variables are the only ones that filter this row.

Log type details

Number of messages grouped by the label selected in the Group by variable. On the left is the number of messages per label value across the selected time range. On the right is a line chart showing the number of messages per label value over time.

This allows you to find which databases or components are logging the errors and warnings to assist you in determining the root cause of an issue.

Note

All filters are applied to this row.
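The way the Log Status, Include Messages and Group by variables combine on this row can be pictured as a filter followed by a count per label value. The following is only an illustrative sketch, not the dashboard's actual query. The record layout (`level`, `message`, `labels`) and the sample data are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional


def count_by_label(
    logs: Iterable[Dict],
    group_by: str,
    statuses: Optional[List[str]] = None,
    include_text: str = "",
) -> Counter:
    """Count log messages per value of the `group_by` label after filtering."""
    counts: Counter = Counter()
    for entry in logs:
        if statuses and entry["level"] not in statuses:
            continue  # mimics the Log Status variable
        if include_text and include_text not in entry["message"]:
            continue  # mimics the Include Messages variable
        # mimics the Group by variable; records without the label share one bucket
        counts[entry["labels"].get(group_by, "<none>")] += 1
    return counts


# Hypothetical log records; the field names are illustrative only.
logs = [
    {"level": "ERROR", "message": "write failed", "labels": {"insights_kx_com_app": "db-a"}},
    {"level": "ERROR", "message": "write failed", "labels": {"insights_kx_com_app": "db-a"}},
    {"level": "WARN", "message": "slow query", "labels": {"insights_kx_com_app": "db-b"}},
    {"level": "ERROR", "message": "timeout", "labels": {"insights_kx_com_app": "db-b"}},
]
print(count_by_label(logs, "insights_kx_com_app", statuses=["ERROR"]))
```

Grouping errors by `insights_kx_com_app` in this way corresponds to the "which databases are raising error messages" example described for this dashboard.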

Messages

Detailed list of all the messages that match the variables selected.

Click on the '>' arrow to drill into a specific log message.

Note

All filters are applied to this row.

Kubernetes Events

List of Kubernetes events raised in the namespace in the time range.

| Metric | Description |
| --- | --- |
| Time | Time of event |
| Reason | Reason for the event |
| Object | Object raising the event |
| Message | Message details |

Note

The 'Include Messages' filter is the only variable that is applied to this row.

kdb Insights Enterprise Database

This dashboard is intended to assist in monitoring the CPU, memory and disk of each database as well as giving details on the logs and alerts associated with the whole namespace.

Variables

At the top of the dashboard there is a set of variables that allows you to filter some of the panels to show records that have particular properties.

| Variable | Description |
| --- | --- |
| Database | A list of all deployed databases. Filters the panels in every row except the first. |
| Filters + | Built-in Grafana option to filter on any label that is included in the records. |

Alerts and logs summary

This row shows a high-level overview of all the alerts and logs for the whole namespace. This allows you to view information for all databases and for components that are shared between the databases, for example, the Service Gateway and kdb Insights CLI.

| Panel | Description |
| --- | --- |
| Critical Alerts | Total number of critical alerts that have occurred in the time range |
| Warning Alerts | Total number of warning alerts that have occurred in the time range |
| Info Alerts | Total number of information alerts that have occurred in the time range |
| Logs | Total number of log messages per type that have occurred in the time range |
| Alerts | Detailed list of all the alerts that match the variables selected. Click on the '>' arrow to drill into a specific alert. |
| Database Status | Status of each database, either Ready or NotReady. If the database is not ready, a reason is included. |

Overview

This row shows a high-level overview of the database selected in the Database variable above.

| Panel | Description |
| --- | --- |
| HDB Size | Current size of the HDB |
| Stream Ingestion | Rate of ingestion of data into each stream associated with the database |
| Pods CPU above Requested | Number of pods with CPU above their requested values * |
| Pods CPU above Limit | Number of pods with CPU above their limit values * |
| Memory CPU above Requested | Number of pods with memory above their requested values * |
| Memory CPU above Limit | Number of pods with memory above their limit values * |

* On the dashboard, the CPU and Memory rows provide details of each pod that has breached these limits.
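The four "above Requested" and "above Limit" counters can be read as a count of pods whose current usage exceeds the corresponding bound. A minimal sketch follows, assuming a strict greater-than comparison and a hypothetical per-pod record layout. Neither is taken from the dashboard definition.

```python
from typing import Dict, List


def count_pods_above(pods: List[Dict], resource: str, bound: str) -> int:
    """Count pods whose usage of `resource` exceeds `bound` ("requested" or "limit")."""
    return sum(1 for pod in pods if pod[resource]["usage"] > pod[resource][bound])


# Hypothetical pods: CPU in cores, memory in MB.
pods = [
    {"name": "rdb-0",
     "cpu": {"usage": 0.9, "requested": 0.5, "limit": 1.0},
     "memory": {"usage": 900, "requested": 1024, "limit": 2048}},
    {"name": "hdb-0",
     "cpu": {"usage": 1.2, "requested": 0.5, "limit": 1.0},
     "memory": {"usage": 2100, "requested": 1024, "limit": 2048}},
]

for resource in ("cpu", "memory"):
    for bound in ("requested", "limit"):
        print(resource, "above", bound, "=", count_pods_above(pods, resource, bound))
```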

CPU

This row shows the CPU details of each pod in the selected database and a chart that is populated with the details over time for the selected pod. To select a pod, click on the pod name in the grid.

| Metric | Description | Color thresholds |
| --- | --- | --- |
| CPU Usage | CPU utilization in seconds | |
| CPU Requested | CPU seconds requested | |
| CPU Req % | Percentage of requested CPU currently being used | Yellow: 80%, Orange: 90%, Red: 100% |
| CPU Limit | CPU seconds limit | |
| CPU Limit % | Percentage of CPU limit currently being used | Yellow: 80%, Orange: 90%, Red: 100% |
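The Req % and Limit % values and their color thresholds can be reproduced with a small calculation. This sketch assumes each threshold is an inclusive lower bound (80% and above is yellow, and so on) and that values below 80% are uncolored. The function names are illustrative.

```python
from typing import Optional


def percent_of(usage: float, bound: float) -> float:
    """Usage expressed as a percentage of a request or limit value."""
    if bound <= 0:
        raise ValueError("bound must be positive")
    return 100.0 * usage / bound


def threshold_color(percent: float) -> Optional[str]:
    """Map a percentage to the documented color bands (assumed inclusive)."""
    if percent >= 100:
        return "red"
    if percent >= 90:
        return "orange"
    if percent >= 80:
        return "yellow"
    return None  # below every threshold


# A pod using 450 millicores against a 500 millicore request.
req_pct = percent_of(450, 500)
print(req_pct, threshold_color(req_pct))
```

The same two functions apply unchanged to the memory and PVC percentages, which use the same three thresholds.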

Memory

This row shows the memory details of each pod in the selected database and a chart that is populated with the details of the selected pod over time. To select a pod, click on the pod name in the grid.

| Metric | Description | Color thresholds |
| --- | --- | --- |
| Memory Usage (MB) | Memory utilization in MBs | |
| Memory Requested (MB) | Memory requested in MBs | |
| Memory Req (%) | Percentage of requested memory currently being used | Yellow: 80%, Orange: 90%, Red: 100% |
| Memory Limit (MB) | Memory limit in MBs | |
| Memory Limit (%) | Percentage of memory limit currently being used | Yellow: 80%, Orange: 90%, Red: 100% |

Disk

This row shows the persistent volume claim (PVC) disk usage of each PVC in the selected database and a chart that is populated with the details of the selected PVC over time. To select a PVC, click on the PVC name in the grid.

| Metric | Description | Color thresholds |
| --- | --- | --- |
| PVC (GB) | PVC size | |
| PVC Used (GB) | Amount of the PVC being used | |
| Used % | Percentage of the PVC being used | Yellow: 80%, Orange: 90%, Red: 100% |
| 1 Day Growth (GB) | Growth in the last 24 hours | |
| 2 Day Growth (GB) | Growth in the last 48 hours | |
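The 1 Day and 2 Day Growth figures can be understood as the difference between the space used now and the space used 24 or 48 hours earlier. The sketch below illustrates that reading with hypothetical samples. How the dashboard actually selects its samples is an assumption here.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Sample = Tuple[datetime, float]  # (timestamp, used space in GB)


def growth_gb(samples: List[Sample], now: datetime, window: timedelta) -> float:
    """Used space at `now` minus used space at or before `now - window`."""
    ordered = sorted(samples)
    current = [used for ts, used in ordered if ts <= now]
    past = [used for ts, used in ordered if ts <= now - window]
    if not current or not past:
        raise ValueError("not enough samples to cover the window")
    return current[-1] - past[-1]


# Hypothetical PVC usage samples taken once a day.
now = datetime(2024, 1, 3, 12, 0)
samples = [
    (datetime(2024, 1, 1, 12, 0), 40.0),
    (datetime(2024, 1, 2, 12, 0), 42.5),
    (datetime(2024, 1, 3, 12, 0), 46.0),
]
print(growth_gb(samples, now, timedelta(days=1)))
print(growth_gb(samples, now, timedelta(days=2)))
```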

kdb Insights detail

This dashboard is intended to assist in monitoring the whole of your kdb Insights deployment. It provides in-depth details on the components, and gives information about the logs and alerts associated with the namespace.

Alerts

This row shows all the alerts raised in the whole namespace.

| Panel | Description |
| --- | --- |
| Critical Alerts | Total number of critical alerts that have occurred in the time range |
| Warning Alerts | Total number of warning alerts that have occurred in the time range |
| Info Alerts | Total number of information alerts that have occurred in the time range |
| Alerts | Detailed list of all the alerts. Click on the '>' arrow to drill into a specific alert. The alerts list is ordered alphabetically. |

Base infrastructure

This row shows general information about the status of the databases and pods.

Deployment status

This panel provides a list of databases / assemblies and reasons why they are not ready. Each query environment has its own record.

| Metric | Description |
| --- | --- |
| Database | Name of database |
| Ready | true if the database is ready |
| Not Ready | true if the database is not ready |
| Reason | The reason the database is not ready |

License status

This panel allows you to see if any of your pod licenses are expiring.

| Metric | Description |
| --- | --- |
| Pod | Pod linked to the license |
| Process Cores | Number of CPU cores running in the cluster |
| Release Date | Date when the license was issued |
| Release Version | Version the license was released on |
| License Expiry | Date when the license expires |
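A simple way to act on the License Expiry column is to flag pods whose license expires within some warning window. The following sketch assumes a 30-day window and a hypothetical record layout. Both are illustrative choices, not values from the dashboard.

```python
from datetime import date
from typing import Dict, List


def expiring_licenses(licenses: List[Dict], today: date, within_days: int = 30) -> List[str]:
    """Return pods whose license has expired or expires within `within_days`."""
    return [
        lic["pod"]
        for lic in licenses
        if (lic["expiry"] - today).days <= within_days
    ]


# Hypothetical rows from the License status panel.
licenses = [
    {"pod": "sm-0", "expiry": date(2024, 6, 15)},
    {"pod": "rdb-0", "expiry": date(2025, 1, 1)},
]
print(expiring_licenses(licenses, today=date(2024, 6, 1)))
```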

StatefulSet status

StatefulSets are workload API objects used to manage stateful applications. They manage the deployment and scaling of a set of pods that are based on an identical container and provide guarantees about the ordering and uniqueness of these pods.

This panel shows StatefulSets that may not have all the requested replicas available.

| Metric | Description |
| --- | --- |
| StatefulSet | Name of the StatefulSet |
| Requested | The number of replicas requested |
| Available | The number of replicas available |
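The condition this panel highlights, fewer available replicas than requested, is straightforward to express. The sketch below uses hypothetical workload records and applies equally to the Deployment panel that follows, since it reports the same Requested and Available columns.

```python
from typing import Dict, List, Tuple


def under_replicated(workloads: List[Dict]) -> List[Tuple[str, int]]:
    """Return (name, missing replicas) for workloads short of their requested count."""
    return [
        (w["name"], w["requested"] - w["available"])
        for w in workloads
        if w["available"] < w["requested"]
    ]


# Hypothetical StatefulSet rows.
workloads = [
    {"name": "db-a-sm", "requested": 1, "available": 1},
    {"name": "db-a-dap-rdb", "requested": 3, "available": 1},
]
print(under_replicated(workloads))
```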

Deployment status

Deployments provide declarative updates for pods and ReplicaSets.

This panel shows deployments that may not have all the requested replicas available.

| Metric | Description |
| --- | --- |
| Deployment | Name of the resource object responsible for keeping a set of pods running |
| Requested | The number of replicas requested |
| Available | The number of replicas available |

Pods not available

This panel shows details of all the pods that are not available and the reason.

| Metric | Description |
| --- | --- |
| Pod | Pod identifier name |
| Ready | Readiness of the pod. 0 means the pod is not ready. |
| Restarts | Number of times the pod has restarted, trying to successfully become ready. |
| Reason 1 | Short summary on the reason why the pod is not available |
| Reason 2 | Detailed technical reason why the pod is not available |

Persistent volume claim usage

This panel shows details of all disk usage for all PVCs.

| Metric | Description |
| --- | --- |
| PVC | Name of the persistent volume claim |
| Used (GB) | Disk space used |
| Capacity (GB) | Disk space available |
| Used (%) | Percentage of the disk space used |

Ingest

This row shows details of each pod involved in data ingestion and how much data they are processing.

RT Services

This panel shows details of the messages being ingested by each RT pod.

| Metric | Description |
| --- | --- |
| RT Pod | Name of the specific reliable transport pod |
| Leader | Leadership status of the pod. There should always be one leader per RT service. |
| Node Index | The node index from the hostname |
| In Msg/s | Incoming messages per second * |
| Message Queue Size | Number of messages in the queue * |
| In Bytes/s | Incoming bytes per second * |

* These metrics are only recorded for the leader node.
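Because each RT service is expected to have exactly one leader, a quick consistency check is to count leaders per service and report any service where the count is not one. This is a sketch only. The `service` field used to group pods is a hypothetical addition, as the panel itself lists pods and node indexes.

```python
from collections import defaultdict
from typing import Dict, List


def leader_counts_not_one(pods: List[Dict]) -> Dict[str, int]:
    """Map each RT service to its leader count when that count is not exactly 1."""
    leaders: Dict[str, int] = defaultdict(int)
    for pod in pods:
        leaders[pod["service"]] += 1 if pod["leader"] else 0
    return {service: count for service, count in leaders.items() if count != 1}


# Hypothetical RT pods: rt-north is healthy, rt-south has lost its leader.
rt_pods = [
    {"pod": "rt-north-0", "service": "rt-north", "leader": True},
    {"pod": "rt-north-1", "service": "rt-north", "leader": False},
    {"pod": "rt-north-2", "service": "rt-north", "leader": False},
    {"pod": "rt-south-0", "service": "rt-south", "leader": False},
    {"pod": "rt-south-1", "service": "rt-south", "leader": False},
]
print(leader_counts_not_one(rt_pods))
```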

RT Publishers Messages In

This panel shows details of the messages being ingested by each RT pod per publisher.

| Metric | Description |
| --- | --- |
| RT Pod | Name of the specific reliable transport pod |
| Publisher | Name of the directory the publisher is publishing to |
| In Bytes/s | Incoming bytes per second from the publisher |

RT Publishers Messages Out

This panel shows details of the messages being sent by each RT pod to each subscriber.

| Metric | Description |
| --- | --- |
| RT Pod | Name of the specific reliable transport pod |
| Publisher | Name of the directory the subscriber is subscribing to |
| Out Msg/s | Outgoing messages per second to the subscriber |

DAP ingest

This panel shows details of the DAPs including their purview time range, their ingestion rate and how many records they retain after a purge.

| Metric | Description |
| --- | --- |
| Pod | DAP pod identifier |
| Instance Type | Type of Data Access Processor instance (rdb, idb, hdb) |
| Purview Start | Start timestamp of the Data Access Processor's purview |
| Purview End | End timestamp of the Data Access Processor's purview |
| Records/s | Inbound records received by the Data Access Processor per second |
| Stream Pos | Current subscriber stream position |
| Records Post Purge | Number of records left in the Data Access Processor after purge |

Storage Manager ingest

This panel shows details of the Storage Manager clients, ingestion and EOI and EOD status.

| Metric | Description |
| --- | --- |
| Pod | Storage Manager pod identifier |
| Connected Clients | Number of connected clients |
| Stream Records | Number of records held by the stream |
| Stream Msgs | Number of messages streamed by the stream |
| EOI Stream position | End of interval stream position |
| EODs Pending | Number of end of day requests pending |

Data persistence

This row shows details of each pod storing symbols and the symbol growth rate.

Symbols

| Metric | Description |
| --- | --- |
| Pod | Pod identifier name |
| Symbols | Number of symbols for the component container |
| Sym growth (1d) | Daily growth of symbols for the component container |
| Sym growth (7d) | Weekly growth of symbols for the component container |

EOI by shard

| Metric | Description |
| --- | --- |
| Pod | Pod identifier name |
| Last EOI duration (s) | Number of seconds the last end of interval lasted |
| Last EOI records written | Number of records written during the last end of interval |
| Pending EOIs | Number of EOI requests awaiting completion |

EOD by shard

| Metric | Description |
| --- | --- |
| Pod | Pod identifier name |
| Last EOD duration (s) | Number of seconds the last end of day lasted |
| Last EOD records written | Number of records written into the HDB at end of day |
| HDB Partitions | Number of partitions in the historical database |
| HDB Size (MB) | Size in MB of the historical database |
| Pending EODs | Number of EOD requests awaiting completion |

Query

Gateway query status

| Metric | Description |
| --- | --- |
| Pod | Pod identifier name |
| Service | Service identifier name |
| Pending Queries | Number of pending queries (both HTTP and IPC) |
| IPC Requests/s | Number of incoming IPC requests per second |
| Connected Clients | Number of connected clients |
| Connected Aggs | Number of connected aggregators |
| Connected DAPs | Number of connected Data Access Processors |

Resource coordinator query status

| Metric | Description |
| --- | --- |
| Service | Service identifier name |
| Pod | Pod identifier name |
| Queue size | Length of the outstanding request queue |
| Avg Response (ms) | Average response time in milliseconds |
| Requests/s | Number of incoming requests per second |
| Success Query/s | Number of successful queries per second |
| Retry Rate/s | Number of retries per second |
| Connected Aggs | Number of connected aggregators |
| Connected DAPs | Number of connected Data Access Processors |

Agg Query Status

| Metric | Description |
| --- | --- |
| Pod | Pod identifier name |
| Request/s | Number of incoming requests per second |
| Errors/s | Number of errors received per second |
| Timeouts/s | Number of timeouts per second |
| Active Queries | Number of queries being executed now |
| Avg Response (ms) | Average response time in milliseconds |

DAP Request Status

| Metric | Description |
| --- | --- |
| Pod | Pod identifier name |
| Endpoint | Database type that the Data Access Processor points at |
| Success Query/s | Number of successful queries per second |
| Failed Query/s | Number of failed queries per second |
| Failure (%) | Percentage of queries that failed |
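One plausible reading of Failure (%) is the failed query rate divided by the total query rate (successful plus failed). The sketch below follows that reading. The formula is an assumption, not taken from the dashboard definition, and an idle processor is reported as 0%.

```python
def failure_percent(success_per_s: float, failed_per_s: float) -> float:
    """Failed queries as a percentage of all queries (assumed formula)."""
    total = success_per_s + failed_per_s
    if total == 0:
        return 0.0  # no traffic, so nothing has failed
    return 100.0 * failed_per_s / total


# A DAP serving 45 successful and 5 failed queries per second.
print(failure_percent(45, 5))
```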

Kubernetes Events

List of Kubernetes events raised in the namespace in the time range.

| Metric | Description |
| --- | --- |
| Time | Time of event |
| Reason | Reason for the event |
| Object | Object raising the event |
| Message | Message details |