A while back I announced an early release of Similarity on GitHub in a blog post. Similarity wraps SQL Server functions around the SimMetrics approximate string matching library, making the library’s functions available in SQL Server. Version 1.1.0 has now been released and is available on GitHub.
Advantages of Similarity
- this library gives you approximate string matching for free. You don’t have to resort to expensive SQL Server additions like SSIS.
- the functions are inline in your SQL code. You don’t have to pipe your data through external tools. This is great for a Guerrilla Analytics environment where you prefer to do everything through code that can be version controlled.
- there is a wide variety of functions to choose from. Some functions are more appropriate for certain types of matches in certain problem domains, e.g. comparing URLs or comparing common names.
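To make "approximate string matching" concrete, here is a minimal sketch in Python of one well-known metric, Levenshtein edit distance, normalised to a score between 0 and 1. This is an illustration only: it is not the SimMetrics or Similarity code, and the function names `levenshtein_distance` and `levenshtein_similarity` are made up for this example.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes or
    substitutions needed to turn string a into string b."""
    # Keep the shorter string in b so each row of the table is as small as possible.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Scale the distance to [0, 1], where 1.0 means identical strings.
    (Assumed normalisation: divide by the length of the longer string.)"""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein_distance(a, b) / longest


if __name__ == "__main__":
    print(levenshtein_similarity("Jon Smith", "John Smith"))
```

Here "Jon Smith" and "John Smith" differ by one inserted character out of ten, so they score close to 1. A metric like this suits typos in names; other metrics weight prefixes or tokens differently, which is why having a variety to choose from matters.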
Version 1.1.0 sees several improvements aimed at making the library easier to install and use and making it easier for others to contribute.
- the entire project is now driven by an Apache Ant build file. This covers similarity as well as the original C# SimMetrics code.
- the project uses semantic versioning.
- there is a small set of SQL install scripts if you just want the functions.
- the project now follows git-flow development conventions to make it easy to contribute.