Fast and Easy Levenshtein Distance Calculation with Quickenshtein in .NET

Fast and Easy Levenshtein Distance Calculation with Quickenshtein in .NET

High-performance string comparison in one line of C# code

Tpyos happen. Users are nearly guaranteed to make small mistakes or type something very close to what you were expecting, but not exactly correct. When working in natural language processing (NLP) applications such as chatbots or text-based games, being able to cope with small differences in user input is critical. One of the most popular ways of measuring differences between two strings is something called Levenshtein Distance.

In this article we’ll explore Levenshtein distance and the open source Quickenshtein library that provides a fast and efficient .NET implementation of it.

What is Levenshtein Distance?

Put simply, Levenshtein distance is the number of edits needed to one of the two strings you are comparing to make the two strings identical. These edits could be adding, removing, or replacing characters within the string.

Levenshtein Distance Illustrated

The mathematical details of Levenshtein distance can be a little tricky to grasp so I’ll spare the full implementation details here.

As anecdotal evidence to the complexity, last summer I walked by a group of my students that had spent the past few hours diagramming this algorithm and really understanding it and were planning on implementing it themselves.

Students Analyzing Levenshtein Distance

While I was very proud of them - and still am - I told them that it would probably be more efficient to find a library out there that solves this problem for them in a performant and reliable manner instead of trying to create their own during their final capstone project. They followed my advice and came to a very good result as a result of it.

The great news for us is that many such libraries exist for Levenshtein distance, which has fostered a bit of competition and innovation in the community. In the next part of this article, I’ll walk through getting started with the best one I’ve found for .NET developers: Quickenshtein.

What is Quickenshtein?

According to its Github repository, Quickenshtein is a quick and memory efficient Levenshtein Distance calculator for .NET and achieves its high performance via SSE2, SSE4.1, and AVX2 hardware intrinsics when those are available.

The creator has obviously put a lot of thought and documentation into the performance aspect of Quickenshtein and the library is available under the MIT license, which happens to be the same license .NET is available under.

Installing and Using Quickenshtein

You can install Quickenshtein via searching for it in NuGet package manager in Visual Studio or by using the .NET CLI:

dotnet add package Quickenshtein

Once Quickenshtein is installed, you can reference it with a using statement:

using Quickenshtein;

After that, it’s just a single line of code to compare two strings and get the Levenshtein distance between them:

int distance = Levenshtein.GetDistance("Hello There!", "General Kenobi");

That’s it. That’s all you have to do to compare any two strings using the built-in high-performance optimizations that the developer spent so much time developing that the optimization process was featured in .NET Conf 2020.

There are more advanced details available. For example, if you wanted to compare very large bodies of text, Quickenshtein provides an overload that takes in CalculationOptions to enable multithreading over a certain character count. There are also a few overloads that work with ReadOnlySpan<char> sources for more extreme performance scenarios, but for most people the simple implementation will be more than enough.

Conclusion

If you want to build a rich natural language processing application and wanted to use Levenshtein distance, all you need to do is add a reference to Quickenshtein and then make a single call to GetDistance. This call will take in the two strings to compare and returns an integer representing the difference between the two strings.

Using this simple, efficient, and performant library you can fuel a number of features such as:

  • Chatbots and other conversational AI solutions
  • Spell Checkers
  • Mapping user text to known intents or commands, even when typos are present
  • Language comparisons
  • Auditing optical character recognition algorithms

Quickenshtein reduces the barriers to entry in getting started with Levenshtein distance by providing an intuitive, efficient, and fast open source implementation and has become my default Levenshtein distance recommendation for those looking into NLP solutions on .NET.