Custom query parsers in LIFTI

LIFTI is an open-source full-text indexing library for .NET - you can check it out on GitHub.

The Visual Studio Go To Symbol feature is smart enough that if you leave out letters it can still find the symbol you meant with pretty good accuracy.

Now that LIFTI supports wildcard matching (and fuzzy matching too!), I thought this would be a good little demonstration of how you could customize LIFTI to behave in a similar way.

First, let’s define a really simple index that stores the name of each symbol as both the key and the indexed text. We’ll index the text case-insensitively.

var index = new FullTextIndexBuilder<string>()
    .WithDefaultTokenization(o => o.CaseInsensitive())
    .Build();

index.BeginBatchChange();
await index.AddAsync("QueryPart", "QueryPart");
await index.AddAsync("ExactWordQueryPart", "ExactWordQueryPart");
await index.AddAsync("FuzzyMatchQueryPart", "FuzzyMatchQueryPart");
await index.AddAsync("FullTextIndex", "FullTextIndex");
await index.AddAsync("IFullTextIndex", "IFullTextIndex");
await index.CommitBatchChangeAsync();

Now we need to think about how we can query the index. Let’s say we wanted to search for fti - we’d expect FullTextIndex and IFullTextIndex to be returned because those letters appear in that order in both.
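
To make that expectation concrete, here is a minimal plain C# sketch of the "letters appear in that order" rule, written without LIFTI at all. SubsequenceMatcher and IsMatch are hypothetical names made up for this illustration, and this is not how LIFTI evaluates queries internally; it just describes the results we want.

```csharp
using System;

public static class SubsequenceMatcher
{
    // Hypothetical helper (not part of LIFTI): returns true when every
    // character of query appears in candidate in the same order,
    // ignoring case. Other characters may sit between the matches.
    public static bool IsMatch(string candidate, string query)
    {
        var position = 0;
        foreach (var letter in query)
        {
            // Look for the next query letter at or after the current position
            position = candidate.IndexOf(
                letter.ToString(), position, StringComparison.OrdinalIgnoreCase);

            if (position < 0)
            {
                return false;
            }

            // Continue searching after the character we just matched
            position++;
        }

        return true;
    }
}

// SubsequenceMatcher.IsMatch("FullTextIndex", "fti")       -> true
// SubsequenceMatcher.IsMatch("IFullTextIndex", "fti")      -> true
// SubsequenceMatcher.IsMatch("FuzzyMatchQueryPart", "fti") -> false
```

Of the five names in the index, only FullTextIndex and IFullTextIndex pass this check for fti, which is the result we want the index to give us.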

In terms of a wildcard query, that would be *f*t*i*, so we could use the standard LIFTI query parser to do just that:

foreach (var item in index.Search("*f*t*i*"))
{
    Console.WriteLine(item.Key);
}

// Prints:
// FullTextIndex
// IFullTextIndex
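
If you wanted to stay with the default query parser, the pattern text could be generated from whatever the user typed rather than written by hand. The sketch below is plain C# string handling; WildcardPattern and FromLetters are hypothetical names, and it assumes the input contains only ordinary letters and digits, since it makes no attempt to escape characters that mean something in LIFTI's query syntax.

```csharp
using System;
using System.Text;

public static class WildcardPattern
{
    // Hypothetical helper (not part of LIFTI): surrounds every character
    // of the input with '*' so that "fti" becomes "*f*t*i*".
    // Assumption: text holds only plain letters/digits; query syntax
    // characters are not escaped here.
    public static string FromLetters(string text)
    {
        // Start with the leading multi-character wildcard
        var builder = new StringBuilder("*");

        // Append each letter followed by a trailing wildcard
        foreach (var letter in text)
        {
            builder.Append(letter).Append('*');
        }

        return builder.ToString();
    }
}

// WildcardPattern.FromLetters("fti") -> "*f*t*i*"
```

The result could then be passed to index.Search in place of the hard-coded pattern. Note that an empty input produces a lone *, so you may want to guard against that before searching.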

Alternatively we can skip the query parser and build our own Query object (note that because we’re skipping the query parser, we need to uppercase the search text to match the indexed characters):

var query = new Query(
    new WildcardQueryPart(
        WildcardQueryFragment.MultiCharacter,
        WildcardQueryFragment.CreateText("F"),
        WildcardQueryFragment.MultiCharacter,
        WildcardQueryFragment.CreateText("T"),
        WildcardQueryFragment.MultiCharacter,
        WildcardQueryFragment.CreateText("I"),
        WildcardQueryFragment.MultiCharacter));

foreach (var item in index.Search(query))
{
    Console.WriteLine(item.Key);
}

// Prints:
// FullTextIndex
// IFullTextIndex

But we can do better. We can write our own IQueryParser implementation for the index to automatically build the wildcard matching for us:

public class CustomWildcardQueryParser : IQueryParser
{
    public IQuery Parse(IIndexedFieldLookup fieldLookup, string queryText, ITokenizer tokenizer)
    {
        // Use the default tokenizer to normalize the text so it's the same as in the index
        queryText = tokenizer.Normalize(queryText);

        var queryFragments = new List<WildcardQueryFragment>();

        // Add the leading multi-character match
        queryFragments.Add(WildcardQueryFragment.MultiCharacter);

        // Add each character in the query text, with a trailing multi-character match
        foreach (var letter in queryText)
        {
            queryFragments.Add(WildcardQueryFragment.CreateText(letter.ToString()));
            queryFragments.Add(WildcardQueryFragment.MultiCharacter);
        }

        // Compose the final query
        return new Query(new WildcardQueryPart(queryFragments));
    }
}

Then we just configure it as the index's query parser and search using only the text we originally wanted to type, fti:

var index = new FullTextIndexBuilder<string>()
    .WithDefaultTokenization(o => o.CaseInsensitive())
    .WithQueryParser(new CustomWildcardQueryParser())
    .Build();

index.BeginBatchChange();
await index.AddAsync("QueryPart", "QueryPart");
await index.AddAsync("ExactWordQueryPart", "ExactWordQueryPart");
await index.AddAsync("FuzzyMatchQueryPart", "FuzzyMatchQueryPart");
await index.AddAsync("FullTextIndex", "FullTextIndex");
await index.AddAsync("IFullTextIndex", "IFullTextIndex");
await index.CommitBatchChangeAsync();

// Now just use the simple search terms
foreach (var item in index.Search("fti"))
{
    Console.WriteLine(item.Key);
}

// Yep, it still prints:
// FullTextIndex
// IFullTextIndex

I’ve put together a .NET Fiddle with the final example over here for you to play with.

That’s it - hopefully this shows how easy it can be to swap out the default query parser for something that meets your needs.