Best Matching 25 (BM25)

This page describes the functions of the Best Matching 25 (BM25) ranking algorithm in the AI libs, along with their parameters and return values.

.ai.bm25.psearch

The .ai.bm25.psearch function returns the top k documents for a query, searching the BM25 indexes stored across the partitions of a partitioned database.

Parameters

Name       Type(s)                     Description
indexName  symbol                      Name of the loaded BM25 partitioned table
q          dict | long[]               Query as a sparse object (token IDs, or counts grouped by token ID)
k          long | int                  Number of top-scoring documents to return
ck         real | float                Term-frequency saturation (k1 in standard BM25 notation)
cb         real | float                Strength of document-length normalization (b in standard BM25 notation)
parts      long[] | date[] | month[]   Partitions to query

Returns

Type               Description
(real[]; long[])   BM25 scores and the matching document indices, highest score first

Example

q

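 sparse:(100;100)#10000?200j;                  / 100 documents of 100 random token IDs each
 index:.ai.bm25.put[()!();1.25e;0.75e;sparse]; / build the index in memory
 .ai.bm25.search[index;first sparse;5;1.25e;0.75e];
 dates:reverse .z.D-til 3;                     / today and the two previous days, ascending
 path:`:db;
 paths:` sv/:path,/:`$string dates;            / one partition directory per date
 indexName:`test;
 .ai.bm25.write[;index;indexName] each paths;  / write the same index into every partition
 .Q.lo[`:db;0;0];                              / load the database without changing directory or running scripts
 .ai.bm25.psearch[indexName;first sparse;5;1.25e;0.75e;dates]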
 
 82.95999 82.95999 82.95999 43.1496 43.1496
 0        100      200      25      125
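In the example above, the same 100-document index is written to three partitions, and the returned indices 0, 100 and 200 share the top score. This suggests that psearch reports each partition's local document index shifted by the number of documents in the partitions before it, then keeps the overall top k. The q sketch below illustrates only that merge step. mergeTopk is a hypothetical helper, and whether psearch works exactly this way (for example, how it treats term statistics across partitions) is an assumption.

```q
/ Hypothetical helper, not part of .ai.bm25: merge per-partition top-k results
/ res:   list of (scores;indices) pairs, one per partition
/ sizes: number of documents in each partition
mergeTopk:{[res;sizes;k]
  offs:0,-1_sums sizes;           / global offset of each partition's first document
  s:raze res[;0];                 / all candidate scores
  i:raze offs+'res[;1];           / local indices shifted to global indices
  j:k sublist idesc s;            / positions of the k best scores
  (s j;i j)
  }

mergeTopk[((2 1f;0 5);(3 0.5;1 2));10 10;3]   / returns (3 2 1f;11 0 5)
```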

.ai.bm25.put

The .ai.bm25.put function inserts sparse vectors (tokenized documents) into a BM25 index and returns the updated index.

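Parameters

Name     Type            Description
index    dict            BM25 object to insert into; the example below starts from an empty dictionary, ()!()
ck       real | float    Term-frequency saturation (k1 in standard BM25 notation)
cb       real | float    Strength of document-length normalization (b in standard BM25 notation)
sparse   dict | long[]   Tokenizer input IDs, either as a list or as counts grouped by token ID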

Returns

Type           Description
table | dict   Updated BM25 object

Example

q

 sparse:(100;100)#10000?200j;
 index:.ai.bm25.put[()!();1.25e;0.75e;sparse]
 
 token   | +`token`document`occurs`noccurs!(`g#0 1 2 3 4 5 6 7 8 9 10 11 12 13..
 document| +`dlen`denoms!(100 100 100 100 100 100 100 100 100 100 100 100 100 ..
 stats   | +`ck`cb!(,1.25e;,0.75e)
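The sparse argument can also be given in counted form. For a single document, q's count each group turns a list of token IDs into a dictionary from each ID to its number of occurrences, as sketched below. The token IDs are made up, and applying this per document (for example {count each group x} each sparse) to produce input for .ai.bm25.put is an assumption about the expected dict form, not something this page specifies.

```q
/ Made-up token IDs for one document
ids:101 7 101 42 7 101
/ Map each distinct ID (in first-seen order) to its number of occurrences
counts:count each group ids    / 101 7 42!3 2 1
```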

.ai.bm25.score

The .ai.bm25.score function calculates the BM25 score of every document in an index against a query.

Parameters

Name    Type            Description
index   dict | symbol   BM25 object, or the name of an index saved on disk
q       dict | long[]   Query as a sparse object
ck      real | float    Term-frequency saturation (k1 in standard BM25 notation)
cb      real | float    Strength of document-length normalization (b in standard BM25 notation)
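For reference, the standard Okapi BM25 scoring function is shown below. Reading ck as k_1 and cb as b is an assumption based on the parameter descriptions, and the exact IDF variant used by the AI libs is not documented on this page. Here f(t, D) is the count of term t in document D, |D| is the length of D, avgdl is the average document length, N is the number of documents, and n(t) is the number of documents containing t.

```latex
\mathrm{score}(D,Q) = \sum_{t \in Q} \mathrm{IDF}(t)\,
  \frac{f(t,D)\,(k_1 + 1)}{f(t,D) + k_1\left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)},
\qquad
\mathrm{IDF}(t) = \ln\!\left(1 + \frac{N - n(t) + 0.5}{n(t) + 0.5}\right)
```

A larger k_1 lets repeated occurrences of a term keep raising the score for longer before it saturates; b = 0 ignores document length, and b = 1 applies full length normalization.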

Returns

Type     Description
real[]   BM25 score of each document in the index

Example

q

 sparse:(100;100)#10000?200j;
 index:.ai.bm25.put[()!();1.25e;0.75e;sparse];
 .ai.bm25.score[index;first sparse;1.25e;0.75e]

 81.89743 28.61314 33.78403 33.34681 34.98316 24.59848 36.47417 27.12549 29.11884 33.15578 30.1717..
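To make the roles of ck and cb concrete, the sketch below computes standard Okapi BM25 scores in plain q for a list of tokenized documents. bm25scores is a hypothetical helper, not the .ai.bm25 implementation: it treats ck as k1 and cb as b, uses the Lucene-style IDF, and counts each distinct query term once. These are assumptions, so its values are not expected to match .ai.bm25.score exactly.

```q
/ Hypothetical pure-q BM25 scorer for illustration; NOT the .ai.bm25 implementation
/ docs: list of token-ID lists; query: token-ID list; k1 plays the role of ck, b of cb
bm25scores:{[docs;query;k1;b]
  terms:distinct query;
  tf:{count each group x} each docs;    / per-document term-count dictionaries
  f:0^tf@\:terms;                       / count of each query term per document, 0 if absent
  n:count docs;
  df:sum f>0;                           / number of documents containing each term
  idf:log 1+((n-df)+0.5)%df+0.5;        / Lucene-style non-negative IDF
  dl:count each docs;                   / document lengths
  dn:k1*(1-b)+b*dl%avg dl;              / per-document length normalization
  {[idf;k1;fd;nd] sum idf*(fd*k1+1)%fd+nd}[idf;k1]'[f;dn]
  }

bm25scores[(1 2 2 3;2 4 5;6 7);1 2;1.25;0.75]
```

The third document contains neither query term, so its score is 0; the other two score above 0.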

.ai.bm25.search

The .ai.bm25.search function returns the k highest-scoring documents in an index for a sparse query.

Parameters

Name    Type            Description
index   dict | symbol   BM25 object, or the name of an index saved on disk
q       dict | long[]   Query as a sparse object
k       long | int      Number of top-scoring documents to return
ck      real | float    Term-frequency saturation (k1 in standard BM25 notation)
cb      real | float    Strength of document-length normalization (b in standard BM25 notation)

Returns

Type               Description
(real[]; long[])   BM25 scores and the matching document indices, highest score first

Example

q

 sparse:(100;100)#10000?200j;
 index:.ai.bm25.put[()!();1.25e;0.75e;sparse];
 .ai.bm25.search[index;first sparse;5;1.25e;0.75e]

 81.73318 44.41976 41.04497 40.98081 39.71086
 0        54       66       29       21
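Conceptually, a top-k search is a full scoring pass followed by a selection of the k highest scores. The sketch below shows that selection in plain q and returns the same (scores; indices) shape as .ai.bm25.search. topk is a hypothetical helper, and whether .ai.bm25.search is implemented this way is an assumption.

```q
/ Hypothetical helper, not part of .ai.bm25: the k highest scores and their positions
topk:{[scores;k] i:k sublist idesc scores; (scores i;i)}

topk[3.1 0.2 5.4 1.7;2]    / returns (5.4 3.1;2 0)
```

Using sublist rather than take (#) means a k larger than the number of documents returns every document once instead of cycling through repeated entries.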

.ai.bm25.write

The .ai.bm25.write function saves a BM25 index to disk as three tables: stats, token and document.

Parameters

Name        Type     Description
path        symbol   File handle of the directory to save to
index       dict     BM25 index to save
indexName   symbol   Name to save the index under on disk

Returns

Type       Description
symbol[]   File handles of the three saved tables

Example

q

 sparse:(100;100)#10000?200j;
 index:.ai.bm25.put[()!();1.25e;0.75e;sparse];
 .ai.bm25.write[`:db;index;`test]

 `:db/teststats/`:db/testtoken/`:db/testdocument/
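For a partitioned database, as in the .ai.bm25.psearch example, .ai.bm25.write is called once per partition directory. The sketch below isolates the path-building step from that example; the dates and the `:db root are illustrative values only.

```q
/ Illustrative database root and partition dates
path:`:db
dates:2024.01.01 2024.01.02
/ Join the root and each date into a partition directory handle
paths:` sv/:path,/:`$string dates    / `:db/2024.01.01`:db/2024.01.02
```

Each element of paths can then be passed as the path argument, as in .ai.bm25.write[;index;indexName] each paths.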