LIFTI v7: Anchoring Queries to Field Boundaries

LIFTI is an open-source full-text indexing library for .NET. You can check it out on GitHub.

LIFTI v7 introduces anchor operators that let you constrain matches to the start or end of a field. The feature was requested for exact-match scenarios, such as finding records where a field contains exactly one specific value, or where the text starts or ends with particular words.

The syntax

The anchor operators are simple:

  • <<term - matches when “term” appears as the first token in a field
  • term>> - matches when “term” appears as the last token in a field
  • <<term>> - matches when “term” is both the first and last token (i.e., the entire field content)

Here’s how you might use them:

// Find cars where the brand is exactly "Skoda"
index.Search("Brand=<<Skoda>>");

// Find documents where *any* field's text starts with "The"
index.Search("<<The");

// Find descriptions ending with "excellent"
index.Search("Description=excellent>>");
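
To make the three forms concrete, here is a minimal sketch of how a raw term could be split into the bare word plus two anchor flags. AnchorSyntaxSketch and Parse are hypothetical names used purely for illustration; this is not LIFTI's actual query parser, and it deliberately ignores field prefixes, wildcards and fuzzy markers.

```csharp
using System;

// Hypothetical helper, for illustration only; not part of LIFTI.
public static class AnchorSyntaxSketch
{
    // Splits a raw term such as "<<Skoda>>" into the bare word and
    // flags saying whether a start and/or end anchor was present.
    public static (string Word, bool RequireStart, bool RequireEnd) Parse(string term)
    {
        // A leading "<<" requests a start-of-field anchor.
        var requireStart = term.StartsWith("<<", StringComparison.Ordinal);
        if (requireStart)
        {
            term = term[2..];
        }

        // A trailing ">>" requests an end-of-field anchor.
        var requireEnd = term.EndsWith(">>", StringComparison.Ordinal);
        if (requireEnd)
        {
            term = term[..^2];
        }

        return (term, requireStart, requireEnd);
    }
}
```

With this sketch, Parse("<<Skoda>>") yields ("Skoda", true, true), Parse("<<The") yields ("The", true, false), and Parse("excellent>>") yields ("excellent", false, true).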

Token positions are the key

To implement this feature, LIFTI needed to track the position of the last token in each field. Previously, we only tracked token counts. This required changes to the serialization format and a new FieldStatistics record:

public record FieldStatistics
{
    public int TokenCount { get; init; }
    public int LastTokenIndex { get; init; }  // NEW in V7
}

During indexing, we now capture both the total token count and the index of the final token for each field. This metadata enables precise positional matching.

Implementation details

The AnchoredWordQueryPart class handles the evaluation. First, it performs a normal exact word match, then filters the results based on the anchor requirements:

public override IntermediateQueryResult Evaluate(
    Func<IIndexNavigator> navigatorCreator,
    QueryContext queryContext)
{
    var timing = queryContext.ExecutionTimings.Start(this, queryContext);
    using var navigator = navigatorCreator();
    navigator.Process(this.Word.AsSpan());
    var results = navigator.GetExactMatches(queryContext, this.ScoreBoost ?? 1D);

    // Filter results based on anchor requirements
    var filteredResults = this.FilterByAnchors(results, navigator.Snapshot.Metadata);

    return timing.Complete(filteredResults);
}

The filtering logic examines each match location against the field’s last token index:

private bool IsAnchorMatch(int minTokenIndex, int maxTokenIndex, int lastTokenIndex)
{
    if (this.RequireStart)
    {
        if (this.RequireEnd)
        {
            // Both anchors: must include index 0 AND the last token
            return minTokenIndex == 0 && maxTokenIndex == lastTokenIndex;
        }

        // Start anchor only: must include index 0
        return minTokenIndex == 0;
    }

    // End anchor only: must include the last token index
    return maxTokenIndex == lastTokenIndex;
}

This approach handles both single tokens and composite locations (like those from sequential text queries). For a match to be valid, the token position range must include the required boundary position(s).
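
To see the range check in action outside the library, the following standalone sketch restates the same rule with the anchor flags passed in as parameters. AnchorCheckSketch is a hypothetical helper, not part of LIFTI, and unlike the method above it returns true when neither anchor is requested.

```csharp
// Hypothetical helper, for illustration only; not part of LIFTI.
public static class AnchorCheckSketch
{
    // minTokenIndex/maxTokenIndex describe the token range covered by a match
    // (equal for a single word, different for a multi-word phrase).
    // lastTokenIndex is the index of the final token in the field.
    public static bool IsAnchorMatch(
        int minTokenIndex,
        int maxTokenIndex,
        int lastTokenIndex,
        bool requireStart,
        bool requireEnd)
    {
        // A start anchor is satisfied only if the match begins at token 0.
        var startOk = !requireStart || minTokenIndex == 0;

        // An end anchor is satisfied only if the match ends on the last token.
        var endOk = !requireEnd || maxTokenIndex == lastTokenIndex;

        return startOk && endOk;
    }
}
```

Take a field with five tokens, so its last token index is 4. A two-word phrase covering tokens 0 to 1 passes with only a start anchor (IsAnchorMatch(0, 1, 4, true, false) is true) but fails with only an end anchor (IsAnchorMatch(0, 1, 4, false, true) is false). A match covering tokens 0 to 4 passes with both anchors, as does a single token in a one-token field (IsAnchorMatch(0, 0, 0, true, true)).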

Combining with other operators

Anchor operators work seamlessly with LIFTI’s other query features:

// Wildcard matching at field start
index.Search("Brand=<<Sk*");  // Matches "Skoda", "Škoda", "Ski"

// Fuzzy matching for exact fields
index.Search("Status=<<?Activ>>");  // Matches "Active" with typo tolerance

// Multiple conditions
index.Search("Title=<<The & Description=excellent>>");

// Sequential text anchoring
index.Search("\"<<The West\"");  // Phrase must start the field

Why not just use field filters?

You might wonder why you can't just use a plain field filter like Brand=Skoda, without the anchors. The difference is that a normal field filter matches the term anywhere in the field:

// Without anchors - matches "New Skoda" and "Skoda Octavia"
index.Search("Brand=Skoda");

// With start anchor - only matches "Skoda Octavia", not "New Skoda"
index.Search("Brand=<<Skoda");

// Both anchors - only matches exact "Skoda"
index.Search("Brand=<<Skoda>>");

This precision is particularly useful for categorical fields or when you need to distinguish between “title starts with X” versus “title contains X somewhere”.

Deserializing older indexes

LIFTI maintains backwards compatibility with v6 indexes. When an older index is deserialized, the LastTokenIndex values are calculated on the fly from the existing token locations. To avoid repeating that calculation each time the index is loaded, it’s recommended to re-index your data with v7 so that these statistics are persisted.
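
As a rough illustration of that on-the-fly calculation, the last token index for each field can be derived by taking the highest token index seen in that field. The sketch below assumes a simplified input of (field id, token index) pairs for a single document. LegacyStatisticsSketch and its input shape are hypothetical and do not reflect LIFTI's actual deserializer.

```csharp
using System.Collections.Generic;

// Hypothetical helper, for illustration only; not part of LIFTI.
public static class LegacyStatisticsSketch
{
    // Given every (field id, token index) pair recorded for one document,
    // returns the highest token index seen in each field.
    public static Dictionary<byte, int> ComputeLastTokenIndexes(
        IEnumerable<(byte FieldId, int TokenIndex)> tokenLocations)
    {
        var lastTokenIndexes = new Dictionary<byte, int>();

        foreach (var (fieldId, tokenIndex) in tokenLocations)
        {
            // Record the index if this is the first location seen for the
            // field, or if it is later than the one recorded so far.
            if (!lastTokenIndexes.TryGetValue(fieldId, out var current) || tokenIndex > current)
            {
                lastTokenIndexes[fieldId] = tokenIndex;
            }
        }

        return lastTokenIndexes;
    }
}
```

For example, the locations (1, 0), (1, 4), (1, 2) and (2, 0) produce a last token index of 4 for field 1 and 0 for field 2. The order in which locations arrive does not matter, which suits an index that stores locations grouped by token rather than by position.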

Testing the edge cases

The implementation needed careful testing for various scenarios. Here’s a snippet from the test suite showing the “exact match” case:

[Fact]
public void EvaluatingNavigationInIndex_WithBothAnchors_ShouldOnlyReturnExactMatches()
{
    var sut = new AnchoredWordQueryPart("test", requireStart: true, requireEnd: true);

    var matches = new[]
    {
        ScoredToken(1, ScoredFieldMatch(1D, 1, 0)),     // Single token (MATCH)
        ScoredToken(2, ScoredFieldMatch(1D, 1, 0)),     // Token at 0, field has 5 tokens (NOT MATCH)
        ScoredToken(3, ScoredFieldMatch(1D, 1, 0))      // Single token (MATCH)
    };

    var metadata = new FakeIndexMetadata<int>(
        3,
        documentMetadata:
        [
            (1, DocumentMetadata.ForLooseText(1, 1,
                new(new Dictionary<byte, FieldStatistics> { { 1, new(1, 0) } }, 1))),
            (2, DocumentMetadata.ForLooseText(2, 2,
                new(new Dictionary<byte, FieldStatistics> { { 1, new(5, 4) } }, 5))),
            (3, DocumentMetadata.ForLooseText(3, 3,
                new(new Dictionary<byte, FieldStatistics> { { 1, new(1, 0) } }, 1)))
        ]);

    // ... test continues
}

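Notice how documents 1 and 3 have a LastTokenIndex of 0 (single-token fields), while document 2 has a LastTokenIndex of 4 (a five-token field). All three matches sit at token index 0, but only in the single-token fields is index 0 also the last token, so only documents 1 and 3 pass the filter.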

Summary

Anchor operators are particularly useful for:

  • Exact categorical field matching
  • Title/heading prefix matching
  • Sentence ending detection
  • Record validation queries

The combination of anchor operators with wildcards and fuzzy matching opens up some interesting query possibilities. Give them a try and see how they can make your searches more precise!