Parallel Processing in KDB.AI

This page explores two primary methods of parallel processing: worker processes and multithreading.

Parallel processing can improve the performance of computational tasks by running multiple operations concurrently. The two key approaches are:

- Worker processes

- Multithreading

What is a worker

A worker in KDB.AI is a process that performs tasks such as executing queries or handling data operations. Each worker can utilize multiple threads to perform its tasks more efficiently.

Multi-worker setup

You can configure the number of workers and the number of threads per worker when you start KDB.AI Server. A multi-worker setup can be beneficial in several scenarios. Before implementing it, consider the points below:

Key considerations

  1. Thread contention:

    • Base the number of workers and threads on user concurrency requirements.

    • General rule: number_of_workers * number_of_threads <= number_of_cores.

  2. Memory utilization: each worker performs tasks that may involve data insertion or database loading, so adding workers increases memory utilization.

  • Syntax: NUM_WRK=<number_of_workers>

  • Example usage: in the server setup, you can specify the number of workers as shown below. By default, NUM_WRK is set to 1.

Bash

# On the `docker run` command:
-e NUM_WRK=1

By considering these points, you can optimize the performance and resource utilization of your KDB.AI Server setup.
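
The general rule above can be checked with simple arithmetic before you start the server. The following is a minimal sketch, not part of KDB.AI: `fits_core_budget` is a hypothetical helper name, and it does nothing more than test number_of_workers * number_of_threads <= number_of_cores for the values you pass in.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of KDB.AI): checks the general rule
#   number_of_workers * number_of_threads <= number_of_cores
# Arguments: workers, threads per worker, cores.
# Returns 0 (success) when the configuration fits, 1 otherwise.
fits_core_budget() {
  local workers=$1 threads=$2 cores=$3
  [ $((workers * threads)) -le "$cores" ]
}

# Example: 2 workers with 4 threads each on an 8-core machine.
if fits_core_budget 2 4 8; then
  echo "ok: 2 workers x 4 threads fits in 8 cores"
else
  echo "warning: more threads than cores; expect thread contention"
fi
```

A check like this could run in a startup script before the `docker run` command, so that an oversubscribed configuration is flagged early.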

Tip

Multi-worker use cases:

  1. Parallel insert to different tables:

    • Insert data into different tables in parallel.

    • Note: a multi-worker setup does not support parallel inserts into a single table. If a server starts with multiple workers and you send parallel insert requests for the same table, the server uses only one worker to handle all of those inserts.

  2. Parallel searches/queries:

    • Run search operations in parallel across different databases or within the same database.
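
To illustrate the note on single-table inserts: because inserts for the same table are handled by one worker, a batch of insert requests can proceed in parallel on at most as many workers as there are distinct target tables. The sketch below only performs that counting. `max_parallel_inserts` is a hypothetical helper, the table names are made up, and the bound (the smaller of the worker count and the distinct-table count) is an assumption drawn from the note above, not a documented KDB.AI formula.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough upper bound on concurrent inserts for a batch.
# Usage: max_parallel_inserts <workers> <table> [<table> ...]
# Each <table> is the target table of one pending insert request.
max_parallel_inserts() {
  local workers=$1
  shift
  # Count distinct table names: same-table inserts share one worker.
  local distinct
  distinct=$(printf '%s\n' "$@" | sort -u | wc -l)
  distinct=$((distinct))   # normalize wc output (some platforms pad it)
  if [ "$distinct" -lt "$workers" ]; then
    echo "$distinct"
  else
    echo "$workers"
  fi
}

# Four requests, two of them for the same (illustrative) table,
# against a server started with 4 workers.
max_parallel_inserts 4 trades quotes trades orders
```

If the bound is lower than NUM_WRK, the extra workers are unlikely to help that insert workload, although they may still serve parallel searches.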

Multithreading

Multithreading lets a single worker run parts of an operation concurrently to improve performance. In KDB.AI, you configure multithreading with the THREADS environment variable.

What is a multithreaded operation

A multithreaded operation is one that can be divided into smaller tasks, which are then executed simultaneously across multiple threads. This parallel execution can significantly reduce processing time, especially for large datasets or complex computations.

Setting the THREADS variable

The THREADS environment variable determines the number of threads each worker uses during multithreaded operations. A single option therefore controls threading for every worker.

  • Syntax: THREADS=<number_of_threads>

  • Example usage: in the server setup, you can specify the number of threads as shown below. As a starting point, we recommend setting THREADS to the number of CPU cores available on the machine running KDB.AI Server.

Bash

# On the `docker run` command:
-e THREADS="8"
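
As a sketch of how NUM_WRK and THREADS can be derived together, the snippet below divides the host's cores among the workers and prints the matching `-e` flags. It assumes `nproc` (GNU coreutils) is available on the host. `threads_per_worker` is a hypothetical helper, and the even split is one possible policy consistent with the general rule, not a KDB.AI requirement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split the cores evenly among the workers,
# rounding down. It returns at least 1, so the general rule can still
# be exceeded when there are more workers than cores.
threads_per_worker() {
  local cores=$1 workers=$2
  local threads=$((cores / workers))
  if [ "$threads" -lt 1 ]; then
    threads=1
  fi
  echo "$threads"
}

NUM_WRK=${NUM_WRK:-1}                  # documented default is 1
CORES=$(nproc 2>/dev/null || echo 1)   # assumption: fall back to 1 if nproc is missing
THREADS=$(threads_per_worker "$CORES" "$NUM_WRK")

# Flags to add to the `docker run` command:
printf '%s\n' "-e NUM_WRK=$NUM_WRK -e THREADS=$THREADS"
```

With a single worker this reproduces the recommendation above (THREADS equal to the core count). With more workers it keeps the product of workers and threads within the core count whenever there are at least as many cores as workers.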

Benefits of using threads

  • Improved performance: Setting the THREADS variable to suit your hardware can significantly improve the performance of multithreaded operations.

  • Flexibility: You can easily adjust the number of threads to match your server's resources and workload requirements.

Tip

Use cases: qHNSW insert, qFlat/qHNSW searches across partitions, and TSS search across splayed and partitioned tables.

Next steps

Now that you're familiar with multithreading, you can improve the performance of the following actions: