Best Matching 25 (BM25)
This page describes the functions and parameters of the Best Matching 25 (BM25) ranking algorithm in AI libs.
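The term saturation and document length impact parameters that appear throughout this page correspond, we assume, to $k_1$ and $b$ in the standard Okapi BM25 formula. The `.ai.bm25.put` example shows them stored in the index's `stats` table as `ck` and `cb`. The textbook form below is for orientation only: this page does not document which IDF variant AI libs uses or how repeated query terms are weighted.

$$
\operatorname{score}(D,Q)=\sum_{t\in Q}\operatorname{IDF}(t)\,\frac{f(t,D)\,(k_1+1)}{f(t,D)+k_1\left(1-b+b\,\dfrac{|D|}{\operatorname{avgdl}}\right)},
\qquad
\operatorname{IDF}(t)=\ln\!\left(1+\frac{N-n(t)+0.5}{n(t)+0.5}\right)
$$

Here $f(t,D)$ is the number of times token $t$ occurs in document $D$, $|D|$ is the document length, $\operatorname{avgdl}$ is the average document length, $N$ is the number of documents, and $n(t)$ is the number of documents containing $t$. A larger $k_1$ lets repeated occurrences keep raising the score before it saturates; $b=0$ ignores document length and $b=1$ normalizes by it fully.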
.ai.bm25.psearch
The `.ai.bm25.psearch` function returns the top k documents by BM25 score, searching the indexes stored across the partitions of a database.
Parameters
Position | Type(s) | Description |
---|---|---|
1 | symbol | Name of the loaded BM25 partitioned table |
2 | dict or long[] | Query sparse object |
3 | long or int | Number of nearest neighbors (k) to retrieve |
4 | real or float | Term saturation |
5 | real or float | Impact of document length on relevance |
6 | long[], date[], or month[] | Partitions to query |
Returns
Type | Description |
---|---|
(real[]; long[]) | BM25 scores of the top k documents and their indices |
Example
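```q
sparse:(100;100)#10000?200j;                       / 100 documents of 100 random token IDs
index:.ai.bm25.put[()!();1.25e;0.75e;sparse];      / build an in-memory index
.ai.bm25.search[index;first sparse;5;1.25e;0.75e]; / in-memory search (result discarded here)
dates:reverse .z.D-til 3;                          / today and the two previous days
path:`:db;
paths:` sv/:path,/:`$string dates;                 / one partition directory per date
indexName:`test;
.ai.bm25.write[;index;indexName] each paths;       / write the same index to each partition
.Q.lo[`:db;0;0];                                   / load the database in `:db
.ai.bm25.psearch[`test;first sparse;5;1.25e;0.75e;dates]
82.95999 82.95999 82.95999 43.1496 43.1496
0 100 200 25 125
```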
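In the example output, the top score appears at indices 0, 100, and 200. Each of the three partitions holds the same 100-document index, so the returned indices appear to be global positions: a partition's local index offset by the number of documents in the partitions before it. The sketch below merges per-partition results under that assumption. `mergeTopK` is a hypothetical helper, and it does not model how (or whether) `.ai.bm25.psearch` combines document statistics across partitions.

```q
/ Hypothetical helper (an assumption inferred from the example output, not the AI libs code):
/ merge per-partition (scores;localIndices) pairs into one global top k.
/ res: list of (scores;localIndices) pairs, in partition order
/ n:   number of documents in each partition
/ k:   number of results to keep
mergeTopK:{[res;n;k]
  off:0,-1_sums n;                 / global position of each partition's first document
  s:raze res[;0];                  / all scores, concatenated
  i:raze off+res[;1];              / local indices shifted to global positions
  j:k sublist idesc s;             / positions of the k highest scores
  (s j;i j)}
mergeTopK[((9 5e;0 3);(7 6e;1 2));10 10;3]    / returns (9 7 6e;0 11 12)
```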
.ai.bm25.put
The `.ai.bm25.put` function inserts sparse vectors into a BM25 index.
Parameters
Position | Type | Description |
---|---|---|
1 | dict | BM25 object to insert into; `()!()` starts a new index, as in the example |
2 | real or float | Term saturation |
3 | real or float | Impact of document length on relevance |
4 | dict or long[] | Documents to insert, as lists of tokenizer input IDs or as counted, grouped tokenizer input IDs |
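The last parameter accepts either lists of tokenizer input IDs or "counted grouped" input IDs. Our reading, which this page does not spell out, is that the counted grouped form maps each distinct token ID to its number of occurrences, which q's `group` and `count each` produce directly. The documents and the names `docs` and `docCounts` below are illustrative, and we have not confirmed that `.ai.bm25.put` accepts a list of such dictionaries.

```q
/ Illustrative tokenized documents (lists of token IDs)
docs:(5 3 5 9 3 5;1 1 2)
/ Assumed "counted grouped" form: distinct token ID -> number of occurrences
docCounts:{count each group x} each docs
docCounts 0    / equals 5 3 9!3 2 1
```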
Returns
Type | Description |
---|---|
table or dict | Updated BM25 object |
Example
```q
sparse:(100;100)#10000?200j;                  / 100 documents of 100 random token IDs
index:.ai.bm25.put[()!();1.25e;0.75e;sparse]  / ()!() starts a new, empty index
token   | +`token`document`occurs`noccurs!(`g#0 1 2 3 4 5 6 7 8 9 10 11 12 13..
document| +`dlen`denoms!(100 100 100 100 100 100 100 100 100 100 100 100 100 ..
stats   | +`ck`cb!(,1.25e;,0.75e)
```
.ai.bm25.score
The `.ai.bm25.score` function calculates the BM25 score of a query against every document in an index.
Parameters
Position | Type | Description |
---|---|---|
1 | dict or symbol | BM25 object, or the name of an on-disk index |
2 | dict or long[] | Query sparse object |
3 | real or float | Term saturation |
4 | real or float | Impact of document length on relevance |
Returns
Type | Description |
---|---|
real[] | BM25 score of each document in the index |
Example
```q
sparse:(100;100)#10000?200j;
index:.ai.bm25.put[()!();1.25e;0.75e;sparse];
.ai.bm25.score[index;first sparse;1.25e;0.75e]   / query with the first document: one score per document
81.89743 28.61314 33.78403 33.34681 34.98316 24.59848 36.47417 27.12549 29.11884 33.15578 30.1717..
```
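To make the role of the two tuning parameters concrete, here is a minimal sketch of Okapi BM25 scoring in plain q, using the smoothed IDF `ln(1+(N-n+0.5)%(n+0.5))` and scoring each distinct query term once. It works on raw token-ID lists rather than an AI libs index, and the names `bm25` and `qry` are hypothetical; `.ai.bm25.score` may differ, for example in its IDF variant or in how repeated query terms are weighted.

```q
/ Sketch of Okapi BM25 in plain q (an assumption, not the AI libs implementation)
/ docs: list of token-ID lists; qry: query token IDs
/ k1: term saturation; b: impact of document length
bm25:{[docs;qry;k1;b]
  N:count docs;                                  / number of documents
  dl:count each docs;                            / length of each document
  lnorm:k1*(1-b)+b*dl%avg dl;                    / k1*(1-b+b*|D|/avgdl) per document
  contrib:{[docs;N;k1;lnorm;t]
    f:sum each docs=\:t;                         / occurrences of t in each document
    n:sum f>0;                                   / number of documents containing t
    idf:log 1+(0.5+N-n)%n+0.5;                   / smoothed inverse document frequency
    idf*(f*k1+1)%f+lnorm}[docs;N;k1;lnorm] each distinct qry;
  sum contrib}                                   / add up the per-term contributions
bm25[(1 2 2;2 3;enlist 4);enlist 2;1.2;0.75]
```

A document without the query term scores 0, and the first document, which contains token 2 twice, outscores the second, which contains it once.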
.ai.bm25.search
The `.ai.bm25.search` function returns the top k nearest neighbors of a sparse query, ranked by BM25 score.
Parameters
Position | Type | Description |
---|---|---|
1 | dict or symbol | BM25 object, or the name of an on-disk index |
2 | dict or long[] | Query sparse object |
3 | long or int | Number of nearest neighbors (k) to retrieve |
4 | real or float | Term saturation |
5 | real or float | Impact of document length on relevance |
Returns
Type | Description |
---|---|
(real[]; long[]) | BM25 scores of the top k documents and their indices |
Example
```q
sparse:(100;100)#10000?200j;
index:.ai.bm25.put[()!();1.25e;0.75e;sparse];
.ai.bm25.search[index;first sparse;5;1.25e;0.75e]   / top 5 scores, then their indices
81.73318 44.41976 41.04497 40.98081 39.71086
0 54 66 29 21
```
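`.ai.bm25.score` returns one score per document, while `.ai.bm25.search` returns a (scores; indices) pair for the best k, so a top-k selection over a score vector is a natural way to picture how the two relate. The helper `topk` below is a hypothetical plain-q illustration of that selection, not the library's implementation; tie ordering in `.ai.bm25.search` is not documented here.

```q
/ Hypothetical helper: the k highest scores and their document indices,
/ in the same (scores; indices) shape that .ai.bm25.search returns
topk:{[s;k] i:k sublist idesc s; (s i;i)}
topk[3 9 1 7e;2]    / returns (9 7e;1 3)
```

If `.ai.bm25.search` ranks by the same scores as `.ai.bm25.score` (an assumption we have not checked), `topk[.ai.bm25.score[index;first sparse;1.25e;0.75e];5]` would be expected to match the search result for the same index and query.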
.ai.bm25.write
The `.ai.bm25.write` function saves a BM25 index to disk as three tables: stats, token, and document.
Parameters
Position | Type | Description |
---|---|---|
1 | symbol | File handle of the directory to save to |
2 | dict | BM25 index to save |
3 | symbol | Name to save the index under on disk |
Returns
Type | Description |
---|---|
symbol[] | File handles of the saved component tables |
Example
```q
sparse:(100;100)#10000?200j;
index:.ai.bm25.put[()!();1.25e;0.75e;sparse];
.ai.bm25.write[`:db;index;`test]   / save the stats, token, and document tables under `:db
`:db/teststats/`:db/testtoken/`:db/testdocument/
```
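The example output suggests a naming scheme: each component is saved as a splayed table (note the trailing slash) whose name is the index name followed by `stats`, `token`, or `document`. The hypothetical helper `bm25Paths` below reproduces those handles with `sv`, which may help when checking where an index will land before writing it. It is inferred from the single example above, not from a documented rule.

```q
/ Hypothetical helper: predict the handles .ai.bm25.write appears to return,
/ inferred from the example output (not a documented rule)
bm25Paths:{[root;name]
  tabs:`$(string name),/:("stats";"token";"document");   / e.g. `teststats
  {` sv x,y,`}[root] each tabs}                          / trailing ` adds the final "/"
bm25Paths[`:db;`test]    / `:db/teststats/`:db/testtoken/`:db/testdocument/
```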