Custom query parsers in LIFTI
LIFTI is an open source full text index library for .NET. You can check it out on GitHub.
The Visual Studio Go To Symbol feature is smart enough that if you miss out letters it can still match with pretty good accuracy.
Now that LIFTI supports wildcard matching (and fuzzy matching too!) I thought this would be a good little demonstration of how you could customize LIFTI to behave in a similar way.
First, let’s define a really simple index that’s going to store the name of a token both as the key and the text in the index. We’ll store the text case insensitive.
var index = new FullTextIndexBuilder<string>()
    .WithDefaultTokenization(o => o.CaseInsensitive())
    .Build();

index.BeginBatchChange();
await index.AddAsync("QueryPart", "QueryPart");
await index.AddAsync("ExactWordQueryPart", "ExactWordQueryPart");
await index.AddAsync("FuzzyMatchQueryPart", "FuzzyMatchQueryPart");
await index.AddAsync("FullTextIndex", "FullTextIndex");
await index.AddAsync("IFullTextIndex", "IFullTextIndex");
await index.CommitBatchChangeAsync();
Now we need to think about how we can query the index. Let’s say we wanted to search for fti: we’d expect FullTextIndex and IFullTextIndex to be returned because those letters appear in that order in both.
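To make "those letters appear in that order" concrete, here is a minimal sketch of the matching rule in plain C#, with no LIFTI involved. The class and method names are made up for illustration; this only models the behavior we want, it is not how LIFTI evaluates queries internally.

```csharp
public static class SubsequenceMatcher
{
    // Hypothetical helper (not part of LIFTI): returns true when every
    // character of 'search' appears in 'candidate' in the same order,
    // ignoring case. Other characters may sit between the matches.
    public static bool IsSubsequence(string search, string candidate)
    {
        var matched = 0;
        foreach (var c in candidate)
        {
            if (matched < search.Length &&
                char.ToUpperInvariant(c) == char.ToUpperInvariant(search[matched]))
            {
                matched++;
            }
        }

        return matched == search.Length;
    }
}
```

Run against the five names in the index, only FullTextIndex and IFullTextIndex pass for fti. FuzzyMatchQueryPart gets as far as the f and the t but has no i after them, and the other two contain no f at all.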
Expressed as a wildcard query, a search for fti would be *f*t*i*, so we could use the standard LIFTI query parser to do just that:
foreach (var item in index.Search("*f*t*i*"))
{
    Console.WriteLine(item.Key);
}

// Prints:
// FullTextIndex
// IFullTextIndex
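The wildcard pattern follows a simple rule: one * before the first letter and one after every letter. If you wanted to stay with the standard query parser, you could generate the pattern with a small helper. The sketch below is plain string handling; the class and method names are hypothetical and not part of LIFTI, and it assumes the search text contains no characters that have special meaning in the LIFTI query syntax.

```csharp
using System.Text;

public static class WildcardPatterns
{
    // Hypothetical helper (not part of LIFTI): surrounds every character of
    // the search text with '*' wildcards, so "fti" becomes "*f*t*i*".
    // No escaping is done; special query characters are assumed to be absent.
    public static string ToSubsequencePattern(string searchText)
    {
        var pattern = new StringBuilder("*");
        foreach (var letter in searchText)
        {
            pattern.Append(letter).Append('*');
        }

        return pattern.ToString();
    }
}
```

The result could then be passed straight to the search call used above, for example index.Search(WildcardPatterns.ToSubsequencePattern("fti")). The catch is that every caller has to remember to go through the helper.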
Alternatively we can skip the query parser and build our own Query object. Note that because we’re skipping the query parser, nothing normalizes the search text for us, so we need to uppercase it ourselves to match the indexed characters:
var query = new Query(
    new WildcardQueryPart(
        WildcardQueryFragment.MultiCharacter,
        WildcardQueryFragment.CreateText("F"),
        WildcardQueryFragment.MultiCharacter,
        WildcardQueryFragment.CreateText("T"),
        WildcardQueryFragment.MultiCharacter,
        WildcardQueryFragment.CreateText("I"),
        WildcardQueryFragment.MultiCharacter));

foreach (var item in index.Search(query))
{
    Console.WriteLine(item.Key);
}

// Prints:
// FullTextIndex
// IFullTextIndex
But we can do better. We can write our own IQueryParser implementation for the index to automatically build the wildcard matching for us:
using System.Collections.Generic;
using Lifti;
using Lifti.Querying;
using Lifti.Querying.QueryParts;
using Lifti.Tokenization;

public class CustomWildcardQueryParser : IQueryParser
{
    public IQuery Parse(IIndexedFieldLookup fieldLookup, string queryText, ITokenizer tokenizer)
    {
        // Use the default tokenizer to normalize the text so it's the same as in the index
        queryText = tokenizer.Normalize(queryText);

        var queryFragments = new List<WildcardQueryFragment>();

        // Add the leading multi-character match
        queryFragments.Add(WildcardQueryFragment.MultiCharacter);

        // Add each character in the query text, with a trailing multi-character match
        foreach (var letter in queryText)
        {
            queryFragments.Add(WildcardQueryFragment.CreateText(letter.ToString()));
            queryFragments.Add(WildcardQueryFragment.MultiCharacter);
        }

        // Compose the final query
        return new Query(new WildcardQueryPart(queryFragments));
    }
}
Then we just configure it as the query parser for the index and search using only the text we wanted to type in the first place, fti:
var index = new FullTextIndexBuilder<string>()
    .WithDefaultTokenization(o => o.CaseInsensitive())
    .WithQueryParser(new CustomWildcardQueryParser())
    .Build();

index.BeginBatchChange();
await index.AddAsync("QueryPart", "QueryPart");
await index.AddAsync("ExactWordQueryPart", "ExactWordQueryPart");
await index.AddAsync("FuzzyMatchQueryPart", "FuzzyMatchQueryPart");
await index.AddAsync("FullTextIndex", "FullTextIndex");
await index.AddAsync("IFullTextIndex", "IFullTextIndex");
await index.CommitBatchChangeAsync();

// Now just use the simple search terms
foreach (var item in index.Search("fti"))
{
    Console.WriteLine(item.Key);
}

// Yep, it still prints:
// FullTextIndex
// IFullTextIndex
I’ve put together a .NET Fiddle with the final example over here for you to play with.
That’s it! Hopefully this shows how easy it can be to swap out the default query parser with something that meets your needs.