Alerts Reference
kdb Insights Enterprise provides a set of pre-configured alerts to help you monitor and maintain the health of your deployment.
This page lists these pre-packaged alerts, with the level, threshold, and context of each.
Note
All alert names are prefixed with "NAMESPACE-kxi-".
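
As a minimal sketch of the naming rule in the note above, the following Python helper builds the full alert name from a short name in the table. It assumes that `NAMESPACE` is a placeholder for the namespace of your deployment; the helper name `full_alert_name` and the example namespace `insights` are hypothetical and used only for illustration.

```python
ALERT_PREFIX_TEMPLATE = "{namespace}-kxi-"


def full_alert_name(namespace: str, short_name: str) -> str:
    """Return the prefixed alert name for a short name listed on this page.

    Assumption: NAMESPACE in the prefix stands for the deployment's namespace.
    """
    return ALERT_PREFIX_TEMPLATE.format(namespace=namespace) + short_name


# Example with a hypothetical namespace called "insights".
print(full_alert_name("insights", "PodNotReady"))
```
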

| name | level | description | threshold | context |
|---|---|---|---|---|
| CriticalNodeCPUUsage | critical | Node CPU utilization % for the last 5 minutes | > 90% | node |
| CriticalNodeMemoryUsage | critical | Node memory usage for the last 5 minutes | > 90% | node |
| CriticalPVCDiskUsage | critical | Percentage disk utilization of non-RT PVC connected to one or multiple pods | > 80% | pvc |
| CriticalRookCephDiskUsage | critical | Percentage Rook-Ceph disk utilization across the cluster | > 80% | cluster |
| CriticalRTPVCDiskUsage | warning | Percentage disk utilization of RT PVC connected to one or multiple pods | > 95% | pvc |
| DAPIsNotReceivingData | warning | A previously active DAP has not received any data in the last minute | | pod |
| DAPPurgeIncomplete | warning | At the last EOI the DAP purged less than 50% of the records written to the Storage Manager | > 50% | assembly |
| HighAggErrors | warning | Aggregator errors for the last minute | > 20 | pod |
| HighAggQueueSize | warning | Aggregator request queue size for the last minute | > 20 | pod |
| HighCPUThrottling | warning | CPU throttling issues for a process in a container for the last minute | | container |
| HighNodeCPUUsage | warning | Node CPU utilization % for the last 5 minutes | > 80% | node |
| HighNodeMemoryUsage | warning | Node memory usage for the last 5 minutes | > 80% | node |
| HighPVCDiskUsage | warning | Percentage disk utilization of non-RT PVC connected to one or multiple pods | > 60% | pvc |
| HighRCQueueSize | warning | Resource Coordinator queue size | > 20 | pod |
| HighRCRetries | warning | Resource Coordinator request retries | > 20 | pod |
| HighRookCephDiskUsage | warning | Percentage Rook-Ceph disk utilization across the cluster | > 90% | cluster |
| HighSGPendingQueries | warning | Service Gateway pending queries for the last minute | > 20 | container |
| HighSMEODTime | info | Time taken for an EOD | > 4h | database |
| HighSymFileGrowth | info | Daily sym file growth as a percentage of the total sym file size, where the sym file is larger than 50 MB | > 25% | pod |
| KeycloakContainerFailed | warning | Pod responsible for Keycloak failing to restart in the last 5 minutes | | pod |
| NoAggsPresent | warning | At least one assembly is deployed, but no Resource Coordinator Aggregators exist | | container |
| NoDAPsPresent | warning | At least one assembly is deployed, but no Resource Coordinator DAPs exist | | container |
| NodeNotInReadyState | warning | A node is not in a ready state | | node |
| NoRDBGrowth | warning | Rate of RDB growth is 0% | = 0% | pod |
| NoRTLeader | critical | There is no leader for the stream, so no messages are merged or made available to subscribers | | RT stream |
| PodCrashing | critical | Pod in CrashLoopBackOff for the last minute | | pod |
| PodCrashLoopBackOff | warning | Pod failing to restart for the last minute | | pod |
| PodInFailedState | warning | Pod in Failed state for the last minute | | pod |
| PodInUnknownState | warning | Pod in Unknown state for the last minute | | pod |
| PodNotReady | warning | Pod in NotReady state for the last minute | | pod |
| PodOOMKilled | warning | Container was killed for running out of memory (OOM) and is restarting | | container |
| PodTargetDown | warning | A target is down | | pod |
| PostgreSQLContainerFailed | warning | PostgreSQL container that supports Keycloak is no longer running | | pod |
| RCsWithoutDAPs | critical | Resource Coordinators have connected clients but there are no Data Access Processes connected to them | | container |
| RTContainerDown | warning | A Reliable Transport container has either failed or been stopped manually | | container |
| RookCephLimitedDiskAvailable | warning | Limited Rook-Ceph disk storage available, in MB | < 2000 MB | node |
| SGWithoutAggs | warning | Service Gateway has connected clients but there are no Aggregators connected | | container |
| SMContainerDown | warning | Storage Manager container has either failed or been stopped manually | | container |
| SMNoRecordsWrittenDuringEOI | warning | An End of Interval ran but no records were written | | pod |
| SMPendingEOIs | warning | Storage Manager has pending End of Interval requests | | pod |
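
Several alerts in the table come in High/Critical pairs on the same metric with different thresholds. The Python sketch below illustrates how those pairs relate, using the node CPU, node memory, and non-RT PVC thresholds copied from the table. It is not how kdb Insights Enterprise evaluates alerts: the metric keys, the `THRESHOLDS` mapping, and the `most_severe_alert` helper are hypothetical, and the sketch reports only the most severe matching alert as a simplification.

```python
from typing import Optional, Tuple

# Hypothetical metric keys mapped to (alert name, level, threshold %) entries
# taken from the table. Entries are ordered most severe first.
THRESHOLDS = {
    "node_cpu": [
        ("CriticalNodeCPUUsage", "critical", 90.0),
        ("HighNodeCPUUsage", "warning", 80.0),
    ],
    "node_memory": [
        ("CriticalNodeMemoryUsage", "critical", 90.0),
        ("HighNodeMemoryUsage", "warning", 80.0),
    ],
    "pvc_disk": [
        ("CriticalPVCDiskUsage", "critical", 80.0),
        ("HighPVCDiskUsage", "warning", 60.0),
    ],
}


def most_severe_alert(metric: str, percent: float) -> Optional[Tuple[str, str]]:
    """Return (alert name, level) for the most severe threshold exceeded.

    Thresholds are strict ("> 90%"), so a value equal to a threshold
    does not match it. Returns None when no threshold is exceeded.
    """
    for name, level, threshold in THRESHOLDS[metric]:
        if percent > threshold:
            return name, level
    return None


print(most_severe_alert("node_cpu", 95.0))
print(most_severe_alert("pvc_disk", 70.0))
print(most_severe_alert("node_memory", 80.0))
```

Ordering each list from most to least severe keeps the lookup to a single pass and makes the strict greater-than comparison from the threshold column explicit.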