Decoders

This page explains how to set up decoder operators for kdb Insights Enterprise pipelines using the Web Interface.

Decoding converts data from an external format into one that can be processed directly within the Stream Processor. Use a decoder when ingesting data from an external data format, before performing other transformations.

Tip

Both q and Python interfaces can be used to build pipelines programmatically.

The pipeline builder uses a drag-and-drop interface to link together operations within a pipeline. For details on how to wire together a transformation, see the building a pipeline guide.

These decoders are used by the Import Wizard to configure a decoder for pipelines created using the wizard.

Arrow

(Beta Feature) The Arrow operator decodes Arrow encoded data.

Note

Beta - For evaluation and trial use only

This feature is currently in beta.

  • Refer here to the standard terms related to beta features

  • We invite you to use this beta feature and to provide feedback using the Ideas portal

Note

See q and Python API for more details.

Required Parameters:

| name | description | default |
| --- | --- | --- |
| As List | If checked, the decoded result is a list of arrays, corresponding only to the Arrow stream data. Otherwise, the decoded result is a table corresponding to both the schema and the data in the Arrow stream. | No |
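In kdb+ terms, a table pairs column names with column data, so the As List option simply drops the schema half of that pairing. A plain-Python sketch of the difference (the column names and values here are hypothetical, not from any real stream):

```python
# Hypothetical decoded Arrow stream: a schema (column names) plus column data.
cols = ["time", "price"]
data = [[1, 2, 3], [10.0, 10.5, 10.2]]

# Default: a table pairing schema with data (kdb+ tables are column dictionaries).
as_table = dict(zip(cols, data))

# As List checked: only the data arrays, with no schema attached.
as_list = data
```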

Avro

The Avro decoder processes messages in either binary or JSON Avro format, producing a kdb+ dictionary that conforms to the configured Avro schema. The schema must be supplied as a JSON string.
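For example, a minimal (hypothetical) Avro schema for trade records might be supplied as the following JSON string; a binary or JSON message conforming to it decodes to a kdb+ dictionary with keys sym, price, and size:

```json
{
  "type": "record",
  "name": "Trade",
  "fields": [
    {"name": "sym",   "type": "string"},
    {"name": "price", "type": "double"},
    {"name": "size",  "type": "long"}
  ]
}
```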

Note

Beta - For evaluation and trial use only

This feature is currently in beta.

  • Refer here to the standard terms related to beta features

  • We invite you to use this beta feature and to provide feedback using the Ideas portal

Note

See q and Python API for more details.

Required Parameters:

| name | description | default |
| --- | --- | --- |
| Schema | The schema used by the incoming Avro messages, in JSON format. | |

Optional Parameters:

| name | description | default |
| --- | --- | --- |
| Encoding | Encoding of incoming Avro messages, either Binary or JSON. | Binary |
| Offset | Offset to begin decoding each message from, in bytes. Useful for trimming magic bytes or schema registry IDs. | 0 |
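The Offset parameter is typically used with messages produced through a schema registry. Confluent's wire format, for instance, prepends a zero magic byte and a 4-byte big-endian schema ID, so an offset of 5 skips straight to the Avro payload. A plain-Python sketch of that trimming (illustrative only, not the operator's implementation):

```python
import struct

def strip_registry_header(msg: bytes, offset: int = 5):
    """Split a Confluent-framed message into (schema_id, avro_payload)."""
    magic = msg[0]                                 # always 0 in this framing
    schema_id = struct.unpack(">I", msg[1:5])[0]   # 4-byte big-endian schema ID
    return schema_id, msg[offset:]

# A message carrying schema ID 7 ahead of the raw Avro bytes b"payload":
framed = b"\x00" + struct.pack(">I", 7) + b"payload"
```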

CSV

This operator parses CSV data to a table.

Note

See q and Python API for more details.

Required Parameters:

| name | description | default |
| --- | --- | --- |
| Delimiter | Field separator for the records in the encoded data. | , |

Optional Parameters:

| name | description | default |
| --- | --- | --- |
| Header | Defines whether the source CSV data has a header row: Always, First, or Never. Always treats the first record of every batch of data as a header, which is useful when decoding a stream (for example, Kafka) where each message has a header. First indicates that only the first batch of data has a CSV header; use this when processing files that have a header row. Never indicates that there is no header row in the data. | First |
| Schema | A table with the desired output schema. | |
| Columns to Exclude | A list of columns to exclude from the output. | |
| Encoding Format | How the data is expected to be encoded when being consumed. Currently supports UTF8 and ASCII. | UTF8 |
| Newlines | Indicates whether newlines may be embedded in strings. Can impact performance when enabled. | 0b |
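The difference between the Header modes shows up when data arrives in batches. A plain-Python sketch of how each mode treats batch boundaries (for intuition only; this is not the operator itself, and the column names are hypothetical):

```python
import csv
import io

def decode_csv(batches, header="First", delimiter=","):
    """Parse batches of CSV text, honouring the Always/First/Never header modes."""
    cols, rows = None, []
    for i, batch in enumerate(batches):
        recs = list(csv.reader(io.StringIO(batch), delimiter=delimiter))
        if header == "Always" or (header == "First" and i == 0):
            cols, recs = recs[0], recs[1:]   # strip the header record
        rows.extend(recs)
    return cols, rows

# Two batches from one file: only the first carries a header row.
batches = ["sym,price\nAAPL,10.0\n", "MSFT,11.5\n"]
```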

Expected type formats

The parse option allows for string representations to be converted to typed values. For numeric values to be parsed correctly, they must be provided in the expected format. String values in unexpected formats may be processed incorrectly.

  • Strings representing bytes are expected as exactly two base 16 digits, e.g. "ff"

  • Strings representing integers are expected to be decimal, e.g. "255"

  • Strings representing boolean values have a number of supported options, e.g. "t", "1"

    • More information on the available formats.
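As an illustration only (this is not the operator's parser), the expected formats above correspond to conversions like:

```python
def parse_field(s: str, typ: str):
    """Hypothetical illustration of the expected string formats."""
    if typ == "byte":
        return int(s, 16)        # exactly two base-16 digits, e.g. "ff"
    if typ == "int":
        return int(s, 10)        # decimal digits, e.g. "255"
    if typ == "boolean":
        return s in ("t", "T", "1", "true", "y")  # several accepted spellings
    return s
```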

GZIP

(Beta Feature) The GZIP operator inflates (decompresses) gzipped data.

Beta Features

Beta features are included for early feedback and for specific use cases. They are intended to work but have not been marked ready for production use. To learn more and enable beta features, see enabling beta features.

Note

See q and Python API for more details.

Warning

Fault tolerance

GZIP decoding is not fully fault tolerant, which is why it is marked as a beta feature: the decoder is fault tolerant only when you are streaming data with independently encoded messages. In any other failure scenario, the incoming data must be entirely reprocessed from the start.

GZIP requires no additional configuration. On each batch of data, this operator decodes as much data as it can, passing it down the pipeline and buffering any data that cannot yet be decoded until the next batch arrives.
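Python's zlib module supports the same decode-and-buffer pattern; here is a sketch of incremental gzip inflation (illustrative only, not the operator's code):

```python
import gzip
import zlib

def stream_inflate(chunks):
    """Yield decompressed data as soon as each arriving chunk allows."""
    d = zlib.decompressobj(wbits=31)   # 31 selects gzip framing
    for chunk in chunks:
        out = d.decompress(chunk)      # emits what can be decoded now;
        if out:                        # partial data stays buffered internally
            yield out
    tail = d.flush()
    if tail:
        yield tail

# Feed a gzipped payload in small pieces, as a streaming source would.
payload = b"time,price\n" * 1000
blob = gzip.compress(payload)
chunks = [blob[i:i + 64] for i in range(0, len(blob), 64)]
```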

JSON

The JSON operator parses JSON data.

Note

See q and Python API for more details.

Required Parameters:

| name | description | default |
| --- | --- | --- |
| Decode Each | By default, messages passed to the decoder are treated as a single JSON object. Setting Decode Each to true indicates that parsing must be done on each value of a message. This is useful when decoding data that has objects separated by newlines, and it allows the pipeline to process partial sets of the JSON file without requiring the entire block to be in memory. | No |
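With Decode Each enabled, newline-delimited JSON can be handled one object at a time. A minimal Python sketch of the behaviour (the field names are hypothetical):

```python
import json

# Two objects separated by a newline: not valid as a single JSON document.
batch = '{"sym": "AAPL", "price": 10.0}\n{"sym": "MSFT", "price": 11.5}\n'

# Decode Each = Yes: each line is parsed independently,
# so partial batches can flow through without the whole file in memory.
records = [json.loads(line) for line in batch.splitlines() if line.strip()]
```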

Pcap

The Pcap operator decodes pcap packet capture data.

Note

See q and Python API for more details.

Required Parameters:

| name | description | default |
| --- | --- | --- |
| Columns to Include | The columns to include in the output. If none of the options are selected, the output includes every available column. | |

Protocol Buffers

The Protocol Buffers operator decodes Protocol Buffer encoded data.

Note

See q and Python API for more details.

Required Parameters:

| name | description | default |
| --- | --- | --- |
| Message Name | The name of the Protocol Buffer message type to decode. | |
| Message Definition | A .proto definition containing the expected schema of the data to decode. This definition must include a definition of the Message Name referenced above. | |

Optional Parameters:

| name | description | default |
| --- | --- | --- |
| As List | If checked, the decoded result is a list of arrays, corresponding only to the Protocol Buffer stream data. Otherwise, the decoded result is a table corresponding to both the schema and the data in the stream. | No |

Definition Example:

In this example, the operator is configured to read the Person message and decode data with the defined fields. Because As List is unchecked, the resulting data is a table with the columns name, id and email.

```proto
message Person {
    string name = 1;
    int32 id = 2;
    string email = 3;
}
```
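For intuition on what the decoder does with the incoming bytes, here is a hand-rolled sketch of Protocol Buffer wire decoding for this Person message (for illustration only; real deployments rely on generated parsers, and the sample values are hypothetical):

```python
def read_varint(buf: bytes, i: int):
    """Read a base-128 varint starting at index i; return (value, next_index)."""
    shift = value = 0
    while True:
        b = buf[i]
        i += 1
        value |= (b & 0x7F) << shift
        if not b & 0x80:
            return value, i
        shift += 7

def decode_person(buf: bytes) -> dict:
    """Decode the Person message above: name = 1, id = 2, email = 3."""
    fields, i = {}, 0
    while i < len(buf):
        key, i = read_varint(buf, i)
        tag, wire = key >> 3, key & 7
        if wire == 0:                      # varint (the int32 id field)
            val, i = read_varint(buf, i)
        elif wire == 2:                    # length-delimited (the string fields)
            n, i = read_varint(buf, i)
            val, i = buf[i:i + n], i + n
        fields[tag] = val
    return {"name": fields[1].decode(), "id": fields[2], "email": fields[3].decode()}

# Wire bytes for Person{name: "kdb", id: 255, email: "a@b.c"}:
msg = b"\x0a\x03kdb\x10\xff\x01\x1a\x05a@b.c"
```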

Further reading