Alerts Reference
kdb Insights Enterprise provides a set of pre-configured alerts to help you monitor and maintain the health of your deployment.
This page lists the pre-packaged alerts generated by kdb Insights Enterprise.
Note
All alert names are prefixed with "NAMESPACE-kxi-".
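As an illustration of the naming convention in the note above, the following minimal sketch converts between the short names listed on this page and the full, prefixed names. It assumes that `NAMESPACE` is a placeholder for the namespace of your deployment; the function names and the `prod` namespace are hypothetical and are not part of kdb Insights Enterprise.

```python
def full_alert_name(namespace: str, short_name: str) -> str:
    """Build the prefixed alert name, e.g. 'prod-kxi-PodNotReady'.

    Assumption: NAMESPACE in the documented prefix is replaced by the
    namespace of the deployment.
    """
    return f"{namespace}-kxi-{short_name}"


def short_alert_name(namespace: str, full_name: str) -> str:
    """Strip the 'NAMESPACE-kxi-' prefix to recover the name used in the table."""
    prefix = f"{namespace}-kxi-"
    if not full_name.startswith(prefix):
        raise ValueError(f"{full_name!r} does not start with {prefix!r}")
    return full_name[len(prefix):]


if __name__ == "__main__":
    # 'prod' is a made-up namespace used only for this example.
    print(full_alert_name("prod", "PodNotReady"))
    print(short_alert_name("prod", "prod-kxi-HighAggErrors"))
```

A helper like this can be useful when matching alert names from a notification against the rows of the table on this page.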
| name | level | description | threshold | context |
|---|---|---|---|---|
| CriticalNodeCPUUsage | critical | Node CPU utilization % for the last 5 minutes | > 90% | node |
| CriticalNodeMemoryUsage | critical | Node memory usage for the last 5 minutes | > 90% | node |
| CriticalPVCDiskUsage | critical | Percentage disk utilization of a non-RT PVC connected to one or more pods | > 80% | pvc |
| CriticalRookCephDiskUsage | critical | Percentage rook-ceph disk utilization across the cluster | > 80% | cluster |
| CriticalRTPVCDiskUsage | warning | Percentage disk utilization of an RT PVC connected to one or more pods | > 95% | pvc |
| DAPIsNotReceivingData | warning | A previously active DAP has not received any data in the last minute | | pod |
| DAPPurgeIncomplete | warning | At the last EOI the DAP purged less than 50% of the records written to the Storage Manager | > 50% | assembly |
| HighAggErrors | warning | Aggregator errors for the last minute | > 20 | pod |
| HighAggQueueSize | warning | Aggregator request queue size for the last minute | > 20 | pod |
| HighCPUThrottling | warning | CPU throttling issues for a process in a container for the last minute | | container |
| HighNodeCPUUsage | warning | Node CPU utilization % for the last 5 minutes | > 80% | node |
| HighNodeMemoryUsage | warning | Node memory usage for the last 5 minutes | > 80% | node |
| HighPVCDiskUsage | warning | Percentage disk utilization of a non-RT PVC connected to one or more pods | > 60% | pvc |
| HighRCQueueSize | warning | Resource Coordinator queue size | > 20 | pod |
| HighRCRetries | warning | Resource Coordinator request retries | > 20 | pod |
| HighRookCephDiskUsage | warning | Percentage rook-ceph disk utilization across the cluster | > 90% | cluster |
| HighSGPendingQueries | warning | Service Gateway pending queries for the last minute | > 20 | container |
| HighSMEODTime | info | Time taken for an EOD | > 4h | database |
| HighSymFileGrowth | info | Daily sym file growth as a percentage of the total sym file size, where the sym file is larger than 50 MB | > 25% | pod |
| KeycloakContainerFailed | warning | Pod responsible for Keycloak failing to restart in the last 5 minutes | | pod |
| NoAggsPresent | warning | At least one assembly is deployed, but no Resource Coordinator Aggregators exist | | container |
| NoDAPsPresent | warning | At least one assembly is deployed, but no Resource Coordinator DAPs exist | | container |
| NodeNotInReadyState | warning | A node is not in a ready state | | node |
| NoRDBGrowth | warning | Rate of RDB growth is 0% | = 0% | pod |
| NoRTLeader | critical | There is no leader for the stream, so no messages are merged or made available to subscribers | | RT stream |
| PodCrashing | critical | Pod in a CrashLoopBackOff for the last minute | | pod |
| PodCrashLoopBackOff | warning | Pod failing to restart for the last minute | | pod |
| PodInFailedState | warning | Pod in Failed state for the last minute | | pod |
| PodInUnknownState | warning | Pod in Unknown state for the last minute | | pod |
| PodNotReady | warning | Pod in NotReady state for the last minute | | pod |
| PodOOMKilled | warning | Container was out-of-memory (OOM) killed and is restarting | | container |
| PodTargetDown | warning | A target is down | | pod |
| PostgreSQLContainerFailed | warning | PostgreSQL container which supports Keycloak is no longer running | | pod |
| RCsWithoutDAPs | critical | Resource Coordinators have connected clients but there are no Data Access Processes connected to them | | container |
| RTContainerDown | warning | A Reliable Transport container has either failed or been stopped manually | | container |
| RookCephLimitedDiskAvailable | warning | Limited Rook-Ceph disk storage available, in MB | < 2000 MB | node |
| SGWithoutAggs | warning | Service Gateway has connected clients but there are no Aggregators connected | | container |
| SMContainerDown | warning | Storage Manager container has either failed or been stopped manually | | container |
| SMNoRecordsWrittenDuringEOI | warning | An End of Interval ran but no records were written | | pod |
| SMPendingEOIs | warning | Storage Manager has pending End of Interval requests | | pod |
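
Several alerts in the table come in warning/critical pairs that watch the same measurement with different thresholds. As a rough sketch of how such a pair relates, the snippet below checks a node CPU reading against the two node CPU rows (`HighNodeCPUUsage` at > 80% and `CriticalNodeCPUUsage` at > 90%). The function and the data structure are hypothetical and do not reflect how kdb Insights Enterprise evaluates its alerts; in particular, whether the warning is suppressed once the critical alert fires is not stated on this page, so the sketch simply reports every threshold that is exceeded.

```python
from typing import List, Tuple

# (alert name, level, threshold in percent), copied from the node CPU rows
# of the table. The comparison is strictly "greater than", as in the table.
NODE_CPU_ALERTS: List[Tuple[str, str, float]] = [
    ("CriticalNodeCPUUsage", "critical", 90.0),
    ("HighNodeCPUUsage", "warning", 80.0),
]


def exceeded_node_cpu_alerts(cpu_percent: float) -> List[Tuple[str, str]]:
    """Return (name, level) for every node CPU alert whose threshold is exceeded.

    Hypothetical helper for illustration only.
    """
    return [
        (name, level)
        for name, level, threshold in NODE_CPU_ALERTS
        if cpu_percent > threshold
    ]


if __name__ == "__main__":
    for reading in (75.0, 85.0, 95.0):
        print(reading, exceeded_node_cpu_alerts(reading))
```

Under these assumptions, a reading of exactly 80% exceeds neither threshold, a reading of 85% exceeds only the warning threshold, and a reading of 95% exceeds both.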