Quickstart Guide

This guide outlines the essential steps for using KDB.AI. Before proceeding, ensure your environment is set up as described in Prerequisites and that you have the information needed to connect to KDB.AI Cloud or KDB.AI Server.

Tip

If you prefer to start in GitHub or Google Colab, refer to the Quickstart sample project.

Note

The following instructions apply to both KDB.AI Cloud and KDB.AI Server users.

Required imports

Python

# Prerequisites
# Before running this script, complete the prerequisites in the cloud setup guide.
# You need a valid KDB.AI endpoint and API key, which you can obtain by following the setup process at https://cloud.kdb.ai/
import sys
import numpy as np
import pandas as pd
import kdbai_client as kdbai
from logging import Logger

Python

# Endpoints created in setup guide
# Cloud instance
# You can get your <INSTANCE_ID> and <API_KEY> from the 'Connection Details' page in the Cloud UI
# session = kdbai.Session(endpoint="https://cloud.kdb.ai/instance/<INSTANCE_ID>", api_key="<API_KEY>")

# Local server
session = kdbai.Session(endpoint='http://localhost:8082')

# Get the database connection. Default database name is 'default'
database = session.database('default')
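
To avoid hard-coding connection details in a script, they can be read from the environment instead. The sketch below is one possible approach, not part of kdbai_client: the helper name session_kwargs and the environment variable names KDBAI_ENDPOINT and KDBAI_API_KEY are assumptions made for illustration. It only builds the keyword arguments that the Session constructor shown above accepts.

```python
import os


def session_kwargs(env=None):
    """Build keyword arguments for kdbai.Session from environment variables.

    KDBAI_ENDPOINT and KDBAI_API_KEY are hypothetical variable names chosen
    for this sketch; the default endpoint is the local server used above.
    """
    env = os.environ if env is None else env
    kwargs = {"endpoint": env.get("KDBAI_ENDPOINT", "http://localhost:8082")}
    api_key = env.get("KDBAI_API_KEY")
    if api_key:  # a local server needs no key, so only pass one when it is set
        kwargs["api_key"] = api_key
    return kwargs


print(session_kwargs())

# With the client installed, the result can be passed straight to the constructor:
# session = kdbai.Session(**session_kwargs())
```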

Create a new table

Before creating a table, you must define its schema. In Python, the schema is a list of dictionaries, one per column, and each column must specify a name and a type. Full schema definition specifications are available in the manage tables section.

Python

# Set up the schema and index for the KDB.AI table, specifying embeddings column with 8 dimensions, Euclidean Distance, and flat index
schema = [
    {"name": "id", "type": "str"},
    {"name": "vectors", "type": "float32s"}
]

# Define the index
index = [
    {
        "name": "flat_index",
        "type": "flat",
        "column": "vectors",
        "params": {"dims": 8, "metric": "L2"},
    }
]
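
Before calling create_table, the definitions above can be sanity-checked locally. The following is a minimal sketch: check_definitions is a hypothetical helper, not part of kdbai_client, and it verifies only two things stated in this guide, namely that every column has a name and a type and that every index points at a column present in the schema.

```python
# Hypothetical local check; not part of the kdbai_client API.
schema = [
    {"name": "id", "type": "str"},
    {"name": "vectors", "type": "float32s"},
]
index = [
    {
        "name": "flat_index",
        "type": "flat",
        "column": "vectors",
        "params": {"dims": 8, "metric": "L2"},
    }
]


def check_definitions(schema, indexes):
    """Return a list of problems found; an empty list means the basic checks passed."""
    problems = []
    for position, column in enumerate(schema):
        for key in ("name", "type"):
            if key not in column:
                problems.append(f"schema[{position}] is missing '{key}'")
    column_names = {column.get("name") for column in schema}
    for idx in indexes:
        if idx.get("column") not in column_names:
            problems.append(
                f"index '{idx.get('name')}' refers to unknown column '{idx.get('column')}'"
            )
    return problems


problems = check_definitions(schema, index)
print(problems or "schema and index definitions look consistent")
```

A check like this catches typos in column names early, before any request reaches the server.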

Python

# Create a test table called 'quickstartkdbai'
table_name = 'quickstartkdbai'

# First ensure the table does not already exist
try:
    database.table(table_name).drop()
except kdbai.KDBAIException:
    pass

try:
    table = database.create_table(table_name, schema=schema, indexes=index)
    # Check table was created successfully
    if table in database.tables:
        print(f"Table '{table_name}' was created successfully")
    else:
        print(f"Could not create the table '{table_name}'", file=sys.stderr)
except Exception as e:
    print(f"Exception {e} occurred creating the table '{table_name}'. "
          "Check the endpoint and key, and that the table does not already exist.", file=sys.stderr)
    sys.exit(1)

Shell

# Using a local server
KDBAI_ENDPOINT='http://localhost:8081'
KDBAI_TOKEN=''

curl -X "POST" -H "Content-Type: application/json" -H "X-Api-Key: $KDBAI_TOKEN" -s $KDBAI_ENDPOINT/api/v2/databases/default/tables -d @table.json

The request body is in a file called table.json:

JSON

{
    "table": "quickstartkdbai",
    "schema": [
        {"name": "id", "type": "char"},
        {"name": "vectors", "type": "floats"}
    ],
    "indexes": [
        {
            "name": "vectorIndex",
            "type": "flat",
            "column": "vectors",
            "params": {"dims": 8, "metric": "L2"}
        }
    ]
}
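
If you would rather not write table.json by hand, it can be generated from a script. This is a minimal sketch using only the Python standard library; the field values are copied from the request body above, and the output file name is the one the curl command reads.

```python
import json

# Request body copied from the table.json shown above.
table_definition = {
    "table": "quickstartkdbai",
    "schema": [
        {"name": "id", "type": "char"},
        {"name": "vectors", "type": "floats"},
    ],
    "indexes": [
        {
            "name": "vectorIndex",
            "type": "flat",
            "column": "vectors",
            "params": {"dims": 8, "metric": "L2"},
        }
    ],
}

# Write the file that `curl ... -d @table.json` reads.
with open("table.json", "w", encoding="utf-8") as f:
    json.dump(table_definition, f, indent=4)
```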

Retrieve a list of tables

Display a list of tables, including your recently created table, using the following command:

Python

print(database.tables)
# This should print a list containing the table: [KDBAI table "quickstartkdbai"]

Shell

# Using a local server
KDBAI_ENDPOINT='http://localhost:8081'
KDBAI_TOKEN=''

curl -X "GET" -H "Content-Type: application/json" -H "X-Api-Key: $KDBAI_TOKEN" -s $KDBAI_ENDPOINT/api/v2/databases/default/tables

Note

If you pipe the result to a tool such as jq, the output is pretty-printed:

JSON

curl -s localhost:8081/api/v2/databases | jq
{
  "result": {
    "database": "default",
    "tables": [
      "quickstartkdbai"
    ]
  }
}

Add data to your table

Generate an array of five 8-dimensional vectors to use as the vector embeddings, then add them to a pandas DataFrame whose column names and types match the table schema:

Python

# Insert five rows of data with sample vectors
try:
    ids = ['h', 'e', 'l', 'l', 'o']  # Example ID values
    vectors = np.random.rand(40).astype(np.float32).reshape(5, 8)
    df = pd.DataFrame({"id": ids, "vectors": list(vectors)})
    table.insert(df)
    # Check that all five rows were inserted
    if len(table.query().values) == 5:
        print(f"Rows with IDs {ids} were inserted into table {table_name} successfully")
    else:
        print(f"Could not insert rows with IDs {ids} into table {table_name}.", file=sys.stderr)
except Exception as e:
    print(f"Exception {e} occurred trying to insert rows into table {table_name}.", file=sys.stderr)
    sys.exit(1)

JSON data can be generated in any language. The example below uses .j.j from q to generate sample data and write it to the file insert.json.

q

# start the kdb+ binary with the q command
q
# run the following on the 'q)' prompt
`insert.json 0: enlist .j.j enlist[`payload]!enlist ([] id:"hello";vectors:(5;8)#(5*8)?1e)
\\

The generated file should contain data like the following (the vector values are random, so yours will differ):

JSON

$ cat insert.json
{
    "payload": [
        {"id":"h","vectors":[0.3927524,0.5170911,0.5159796,0.4066642,0.1780839,0.3017723,0.785033,0.5347096]},
        {"id":"e","vectors":[0.7111716,0.411597,0.4931835,0.5785203,0.08388858,0.1959907,0.375638,0.6137452]},
        {"id":"l","vectors":[0.5294808,0.6916099,0.2296615,0.6919531,0.4707882,0.6346716,0.9672399,0.2306385]},
        {"id":"l","vectors":[0.949975,0.439081,0.5759051,0.5919004,0.8481566,0.389056,0.391543,0.08123546]},
        {"id":"o","vectors":[0.9367504,0.2782122,0.2392341,0.1508133,0.1567317,0.9785,0.7043314,0.9441671]}
    ]
}
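
Since the request body is plain JSON, the same file can be produced without q. The sketch below uses only the Python standard library and assumes the payload layout shown above; the vector values are random, so they will differ from the listing.

```python
import json
import random

DIMS = 8  # must match the "dims" value of the table's index

# One row per character of "hello", each with a random 8-dimensional vector.
rows = [
    {"id": char, "vectors": [round(random.random(), 7) for _ in range(DIMS)]}
    for char in "hello"
]

# Write the file that `curl ... -d @insert.json` reads.
with open("insert.json", "w", encoding="utf-8") as f:
    json.dump({"payload": rows}, f)
```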

Note

As above, piping the file through a tool like jq pretty-prints it.

JSON

$ cat insert.json | jq
{
  "payload": [
    {
      "id": "h",
      "vectors": [
        0.3927524,
        0.5170911,
        0.5159796,
        0.4066642,
        0.1780839,
        0.3017723,
        0.785033,
        0.5347096
      ]
    },
...

Shell

# Using a local server
KDBAI_ENDPOINT='http://localhost:8081'
KDBAI_TOKEN=''

# insert data from file insert.json
curl -X "POST" -H "Content-Type: application/json" -H "X-Api-Key: $KDBAI_TOKEN" -s $KDBAI_ENDPOINT/api/v2/databases/default/tables/quickstartkdbai/insert  -d @insert.json

Query the table

Use the following command to query data from the table.

Note

The query function accepts a wide range of arguments to make it easy to filter, aggregate, and sort. Run help(table.query), or ?table.query in IPython or Jupyter, to see them all.

Python

data = table.query()
print(f'Table data:\n {data}')

Shell

# Using a local server
KDBAI_ENDPOINT='http://localhost:8081'
KDBAI_TOKEN=''

curl -H 'Content-Type: application/json' -H "X-Api-Key: $KDBAI_TOKEN" $KDBAI_ENDPOINT/api/v2/databases/default/tables/quickstartkdbai/query -d '{"filter" : []}'

Note

As above, piping the result to a tool like jq pretty-prints it.

Shell

curl -H 'Content-Type: application/json' -H "X-Api-Key: $KDBAI_TOKEN" $KDBAI_ENDPOINT/api/v2/databases/default/tables/quickstartkdbai/query -d '{"filter" : []}' | jq
{
  "result": [
    {
      "id": "h",
      "vectors": [
        0.6594225,
        0.5260468,
        0.2424757,
        0.2224251,
        0.6360764,
        0.05000889,
        0.2665702,
        0.9261618
      ]
    },
...
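
The same query can be issued from Python without the client library. The following is a sketch using only the standard library; it assumes the local endpoint, empty token, and request body used in the curl command above, and build_query_request is a name chosen here for illustration. The request is sent only when urlopen is called, so building it is safe even with no server running.

```python
import json
import urllib.request

KDBAI_ENDPOINT = "http://localhost:8081"  # local server, as in the curl example
KDBAI_TOKEN = ""                          # empty for a local server


def build_query_request(table_name, filters=None):
    """Build (but do not send) the POST request for the table query endpoint."""
    url = f"{KDBAI_ENDPOINT}/api/v2/databases/default/tables/{table_name}/query"
    body = json.dumps({"filter": filters or []}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "X-Api-Key": KDBAI_TOKEN},
        method="POST",
    )


request = build_query_request("quickstartkdbai")

# Uncomment to send the request to a running server and pretty-print the reply:
# with urllib.request.urlopen(request) as response:
#     print(json.dumps(json.load(response), indent=2))
```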

Run similarity search

Search for the nearest neighbors using the following command:

Note

The dimension of each input query vector must match the vector embedding dimension of the table (8 in this guide), as set by dims in the index definition above.

Python

# Run a similarity search
results = table.search(vectors={'flat_index':[[0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]]}, n=3)
print(f'Similarity search results: {results}\n\n')
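
Because query vectors must match the index dimension, it can help to check them before calling search. This is a minimal sketch: validate_query_vectors is a hypothetical helper, not part of kdbai_client, and the expected dimension of 8 comes from the index definition used in this guide.

```python
def validate_query_vectors(vectors, dims):
    """Raise ValueError if any query vector does not have exactly `dims` entries."""
    for position, vector in enumerate(vectors):
        if len(vector) != dims:
            raise ValueError(
                f"query vector {position} has {len(vector)} dimensions, expected {dims}"
            )
    return vectors


query = [[0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]]
validate_query_vectors(query, dims=8)  # passes silently

# The checked vectors can then be passed on, for example:
# results = table.search(vectors={"flat_index": query}, n=3)
```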

Shell

# Using a local server
KDBAI_ENDPOINT='http://localhost:8081'
KDBAI_TOKEN=''

curl -s -H "Content-Type: application/json" -H "X-Api-Key: $KDBAI_TOKEN" $KDBAI_ENDPOINT/api/v2/databases/default/tables/quickstartkdbai/search -d '{"vectors":{"vectorIndex": [[0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]]}, "n":3,"options":{"distanceColumn":"dist"}}'

Note

As above, piping the result to a tool like jq pretty-prints it.

Shell

KDBAI_ENDPOINT='http://localhost:8081'
KDBAI_TOKEN=''

curl -s -H "Content-Type: application/json" -H "X-Api-Key: $KDBAI_TOKEN" $KDBAI_ENDPOINT/api/v2/databases/default/tables/quickstartkdbai/search -d '{"vectors":{"vectorIndex": [[0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]]}, "n":3,"options":{"distanceColumn":"dist"}}' | jq
{
  "result": [
    {
      "id": "h",
      "vectors": [
        0.6594225,
        0.5260468,
        0.2424757,
        0.2224251,
        0.6360764,
        0.05000889,
        0.2665702,
        0.9261618
      ]
    },
...

The closest matching neighbors for the query vector are returned, along with their L2 (Euclidean) distance from the query vector.
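
To make the result concrete: with a flat index and the L2 metric, a search is conceptually an exhaustive comparison of the query against every stored vector. The sketch below reproduces that idea in plain Python on made-up 2-dimensional data. It illustrates the metric only and is not KDB.AI's implementation; whether the service reports the distance or its square is not covered in this guide.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(query, rows, n):
    """Return (distance, id) pairs for the n rows closest to `query`."""
    scored = [(l2_distance(query, row["vectors"]), row["id"]) for row in rows]
    return sorted(scored)[:n]


# Made-up 2-dimensional rows, for illustration only.
rows = [
    {"id": "a", "vectors": [0.0, 0.0]},
    {"id": "b", "vectors": [3.0, 4.0]},
    {"id": "c", "vectors": [1.0, 0.0]},
]

print(nearest_neighbors([0.0, 0.0], rows, n=2))  # [(0.0, 'a'), (1.0, 'c')]
```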

Note

The search API supports batch querying and filtered search.

Delete table

Use the following command when you want to delete a table:

Python

# Clean up our table
if table in database.tables:
    database.table(table_name).drop()

Shell

# Using a local server
KDBAI_ENDPOINT='http://localhost:8081'
KDBAI_TOKEN=''

curl -X "DELETE" -H "Content-Type: application/json" -H "X-Api-Key: $KDBAI_TOKEN" -s $KDBAI_ENDPOINT/api/v2/databases/default/tables/quickstartkdbai

WARNING

Deleting a table is permanent: once a table is deleted, you cannot use it again.

In KDB.AI, when you delete a table, the associated index is also removed.

Next steps

Now that you have successfully created an index with KDB.AI, you can start inserting your own data and analysing it:

Samples

You can also explore the following: