kdb+ Initial Import Quickstart
This page explains how to import an existing kdb+ database into a new kdb Insights database.
If you have a large existing kdb+ database (the source database), you should use an Initial Import when creating your new kdb Insights database. This is generally faster than using batch ingest for large historical datasets.
It is important to understand the difference between Initial Import and Batch Ingest:
-
Initial Import: used only when creating a new kdb Insights database that has never been initialized and therefore contains no data. Initial Import loads historical data at creation time.
-
Batch Ingest: used to add additional data to an existing kdb Insights database that has already been initialized and may already contain data.
Important
You cannot Initial Import into a kdb Insights database that has already been initialized or has begun receiving data. In that case, use Batch Ingest to load your historical data instead. Before importing any data, ensure you have read the prerequisites to ensure that your kdb+ database is suitable for Initial Import.
Quickstart steps
Create temporary structure
-
First, create a temporary structure.
TEMP=`mktemp -d`
export TEMP # root directory to store data and config for example
export DATA=$TEMP/data/ # directory where test data will be created
export SCHEMA=$TEMP/schema/ # directory where schema will be stored
export INSIGHTS_HOSTNAME=insights.example.com # kdb+ Insights URL
mkdir -pv $DATA $SCHEMA
-
If needed, create some sample data. The following sample creates a kdb+ database containing trade and quote tables.
dbroot:hsym`$getenv`DATA; d:2021.09.01+til 20; {[d;n]sv[`;.Q.par[dbroot;d;`trade],`]set .Q.en[dbroot;([]sym:`$'n?.Q.A;time:d+.z.N+til n;price:n?100f;size:n?50)];}[;10000]each d; {[d;n]sv[`;.Q.par[dbroot;d;`quote],`]set .Q.en[dbroot;([]sym:`$'n?.Q.A;time:d+.z.N+til n;ask:n?100f;asize:n?50;bid:n?100f;bsize:n?50)];}[;10000]each d;This creates the below structure.
/tmp/test/. ├── db │ ├── 2021.09.01 │ ├── 2021.09.02 │ ├── 2021.09.03 │ ├── 2021.09.04 │ ├── 2021.09.05 │ ├── 2021.09.06 │ ├── 2021.09.07 │ ├── 2021.09.08 │ ├── 2021.09.09 │ ├── 2021.09.10 │ ├── 2021.09.11 │ ├── 2021.09.12 │ ├── 2021.09.13 │ ├── 2021.09.14 │ ├── 2021.09.15 │ ├── 2021.09.16 │ ├── 2021.09.17 │ ├── 2021.09.18 │ ├── 2021.09.19 │ └── 2021.09.20 └── sym
Copy the data
-
Next, copy the data using the following steps. Create a database tier(s) storage to store the data.
Note
You may need to change the storage class for the PVC depending on what is available on your cluster.
YAML to create a PVC
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: $KX_DATABASE_NAME-hdb
spec:
storageClassName: rook-cephfs
accessModes:
- ReadWriteMany
resources:
requests:
storage: 20Gi
EOF
-
Create a pod, which mounts the local storage tier.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: $KX_DATABASE_NAME-copy-pod
spec:
securityContext:
fsGroup: 2000
containers:
- name: $KX_DATABASE_NAME-copy-container
image: amazonlinux:2
stdin: true
command: [ "/bin/bash", "-c" ]
args:
- |
yum install -y rsync
mkdir /data/db/hdb/data/
while true; do sleep 30; done;
volumeMounts:
- mountPath: /data/db/hdb
name: $KX_DATABASE_NAME-storage
volumes:
- name: $KX_DATABASE_NAME-storage
persistentVolumeClaim:
claimName: $KX_DATABASE_NAME-hdb
EOF
-
Copy kdb+ data to the tiers created.
bash krsync.sh $DATA $KX_DATABASE_NAME-copy-pod /data/db/hdb/data/
Deploy the database
Now the data is imported, you can deploy your database.
-
Create a kdb+ Insights schema for the database to be imported to.
q create-schema.q -db $DATA -fmt package -out $SCHEMA
Edit the schema of the trade and quote tables to set the prtnCol.
-
Create a kdb+ Insights database with the new database configured.
The following example shows a database configuration with a schema:
Database Assembly
name: Trade data
labels:
region: amer
dbSettings:
encryption:
encryptAll: false
tables:
quote:
type: "partitioned"
columns:
- name: "sym"
type: "symbol"
- name: "time"
type: "timestamp"
- name: "ask"
type: "float"
- name: "asize"
type: "long"
- name: "bid"
type: "float"
- name: "bsize"
type: "long"
prtnCol: "time"
trade:
type: "partitioned"
columns:
- name: "sym"
type: "symbol"
- name: "time"
type: "timestamp"
- name: "price"
type: "float"
- name: "size"
type: "long"
prtnCol: "time"
mounts:
rdb:
type: stream
partition: none
baseURI: none
idb:
type: local
partition: ordinal
baseURI: file:///data/db/idb
hdb:
type: local
partition: date
baseURI: file:///data/db/hdb
elements:
dap:
instances:
rdb:
mountName: rdb
idb:
mountName: idb
hdb:
mountName: hdb
sm:
source: stream
tiers:
- name: rdb
mount: rdb
- name: idb
mount: idb
schedule:
freq: 0D00:10:00 # every 10 minutes
- name: hdb
mount: hdb
schedule:
freq: 1D00:00:00 # every day
snap: 01:35:00 # at 1:35 AM
retain:
time: 2 days
If a warning is displayed indicating that kxi package is deprecated, you can use kxi pm instead.
The deployment mechanism depends on which SDK microservices are being used. Some examples are available here.
-
Inspect status API and logs to verify that the import worked.
kxi --realm insights --hostname $INSIGHTS_HOSTNAME auth login
export INSIGHTS_TOKEN=`kxi auth print-token` && curl -L "https://${INSIGHTS_HOSTNAME}/servicegateway/api/v1/database/$KX_DATABASE_NAME/status" -H "Authorization: Bearer $INSIGHTS_TOKEN"
{
"state": "normal",
"encryption": "encrypted",
"progress": {
"cmdCurrent": "",
"cmdIndex": null,
"cmdTotal": null,
"subCurrent": "",
"subIndex": null,
"subTotal": null,
"completeETA": ""
},
"memory": {
"usage": 151252992,
"limit": 2147483648
}
}
curl -X POST --header "Content-Type: application/json"\
--header "Accepted: application/json"\
--header "Authorization: Bearer $INSIGHTS_TOKEN"\
--data "{\"table\":\"trade\",\"startTS\":\"2021-09-01T00:00:00.000000000\",\"endTS\":\"2024-11-12T13:27:00.000000000\",\"agg\":[[\"cnt\",\"count\",\"time\"]]}"\
"https://${INSIGHTS_HOSTNAME}/servicegateway/kxi/getData"
{
"header": {
"auditID": "e13b1e27-a82a-41e3-b6c9-cc08d4678367",
"rcvTS": "2024-11-12T20:00:15.588000000",
"corr": "aff5c6a1-ac83-4895-9df5-221b66c6e64d",
"logCorr": "e13b1e27-a82a-41e3-b6c9-cc08d4678367",
"http": "json",
"api": ".kxi.getData",
"agg": ":10.6.142.183:5070",
"refVintage": -9223372036854776000,
"rc": 0,
"ac": 0,
"ai": "",
"limitApplied": false
},
"payload": [
{
"cnt": 200000
}
]
}
Next steps
Now you have tried this process with the quickstart, you can attempt following the initial import process. If you experience any issues, refer to our troubleshooting page for help.