Transliterate is a utility library designed to seamlessly convert accented characters into their ASCII equivalents.
$ dotnet add package PlayfulSparkle.TransliterateLibrary
Playful Sparkle: Transliterate Library is a high-precision, extensible C# library targeting .NET Standard 2.0, purpose-built for deterministic Unicode transliteration and normalization. It transforms input strings by decomposing composite characters, replacing multi-character graphemes and emoji sequences with pre-defined or user-defined ASCII-compatible mappings, and re-normalizing the result using a standard Unicode normalization form (NFD, NFC, NFKD, or NFKC).
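The decompose, map, and re-normalize flow described above can be approximated with nothing but the .NET base class library. The following is a minimal sketch under stated assumptions, not the library's implementation: the class name TransliterationSketch, the method ToAscii, and the two sample mappings are hypothetical and exist only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

// Illustrative sketch only; this is NOT the library's implementation.
public static class TransliterationSketch
{
    // Hypothetical sample mappings, applied before normalization.
    private static readonly Dictionary<string, string> SampleMapping =
        new Dictionary<string, string>
        {
            { "ß", "ss" },
            { "🙂", "slightly smiling face" },
        };

    public static string ToAscii(string text)
    {
        var mapped = new StringBuilder();
        int maxKeyLength = SampleMapping.Keys.Max(k => k.Length);
        int i = 0;

        while (i < text.Length)
        {
            bool matched = false;

            // Try the longest candidate first so a longer key wins over a shorter prefix.
            for (int len = Math.Min(maxKeyLength, text.Length - i); len > 0; len--)
            {
                if (SampleMapping.TryGetValue(text.Substring(i, len), out string replacement))
                {
                    mapped.Append(replacement);
                    i += len;
                    matched = true;
                    break;
                }
            }

            if (!matched)
            {
                mapped.Append(text[i]);
                i++;
            }
        }

        // Decompose (NFD) so accents become separate combining marks, then drop those marks.
        string decomposed = mapped.ToString().Normalize(NormalizationForm.FormD);
        var stripped = new StringBuilder();
        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                stripped.Append(c);
            }
        }

        // Re-normalize the remaining text to a composed form (NFC).
        return stripped.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

With these sample mappings, TransliterationSketch.ToAscii("Crème brûlée") returns "Creme brulee". Matching the longest key first keeps a multi-character sequence from being replaced piecemeal by a shorter key that happens to be its prefix.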
The library is designed to process complex Unicode input in a consistent and idempotent manner. It supports:
- Unicode code points written in U+XXXX notation, converting them into valid character sequences for processing
- Built-in mapping tables (emojiUnicodeMappings and defaultUnicodeMappings) for efficient storage and lookup of pre-defined transliteration rules
It is well suited to applications requiring language-agnostic preprocessing such as SEO sanitization, search indexing, canonical form comparison, filename/path generation, and legacy system compatibility.
The Decompose method transliterates and normalizes an input string according to a specified Unicode normalization form. It applies custom character mappings (if provided) and the default mappings for complex characters (e.g., emoji and Unicode sequences). It can decompose composed characters into their base forms or apply other normalization forms such as composition or compatibility normalization.
public static string Decompose(string text, Normalization normalization, bool useDefaultMapping = true, Dictionary<string, string> customMapping = null)
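Parameters:
- text: The input string to transliterate and normalize.
- normalization: The Unicode normalization form to apply.
- useDefaultMapping: Whether to apply the built-in default mapping (defaults to true).
- customMapping: Optional user-defined character mappings (defaults to null).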
Returns:
The transliterated and normalized string.
Exceptions:
- ArgumentException: Thrown if the input text is null or empty.
- ArgumentOutOfRangeException: Thrown if the input text contains invalid Unicode characters.
Example:
string input = "Some text with 🙂 and complex characters!";
string result = Transliterate.Decompose(input, Transliterate.Normalization.Decompose);
Console.WriteLine(result); // Result: Some text with slightly smiling face and complex characters!
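One common form of invalid Unicode input is a lone surrogate: a high surrogate without a following low surrogate, or a low surrogate without a preceding high one. The sketch below shows how such input could be detected before processing. It is an assumption-laden illustration, not the library's own validation code; the names UnicodeValidationSketch and IsWellFormedUtf16 are hypothetical.

```csharp
using System;

// Illustrative sketch only; this is NOT the library's validation routine.
public static class UnicodeValidationSketch
{
    // Returns false if the UTF-16 string contains a lone high or low surrogate.
    public static bool IsWellFormedUtf16(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];

            if (char.IsHighSurrogate(c))
            {
                // A high surrogate must be immediately followed by a low surrogate.
                if (i + 1 >= text.Length || !char.IsLowSurrogate(text[i + 1]))
                {
                    return false;
                }

                i++; // Skip the low surrogate that completes the pair.
            }
            else if (char.IsLowSurrogate(c))
            {
                // A low surrogate that was not consumed as part of a pair is invalid.
                return false;
            }
        }

        return true;
    }
}
```

A caller could run a check like this up front and reject the input with an exception when it returns false, which is one way to arrive at the behaviour listed under Exceptions.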
The DecomposeAsync method is an asynchronous version of the Decompose method. It runs the transliteration and normalization process in a separate task to avoid blocking the calling thread, which is useful in scenarios where you need to process large strings or perform the operation without affecting the responsiveness of your application.
public static async Task<string> DecomposeAsync(string text, Normalization normalization, bool useDefaultMapping = true, Dictionary<string, string> customMapping = null)
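Parameters:
- text: The input string to transliterate and normalize.
- normalization: The Unicode normalization form to apply.
- useDefaultMapping: Whether to apply the built-in default mapping (defaults to true).
- customMapping: Optional user-defined character mappings (defaults to null).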
Returns:
A Task<string> representing the asynchronous operation, containing the transliterated and normalized string.
Exceptions:
- ArgumentException: Thrown if the input text is null or empty.
- ArgumentOutOfRangeException: Thrown if the input text contains invalid Unicode characters.
Example:
string input = "Some text with 🙂 and complex characters!";
string result = await Transliterate.DecomposeAsync(input, Transliterate.Normalization.Decompose);
Console.WriteLine(result); // Result: Some text with slightly smiling face and complex characters!
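Running a synchronous text transformation in a separate task, as described for DecomposeAsync, is commonly done with Task.Run. The sketch below shows that general pattern using a stand-in transformation from the base class library. It is not the library's source; AsyncWrapperSketch, NormalizeSync, and NormalizeAsync are hypothetical names.

```csharp
using System;
using System.Text;
using System.Threading.Tasks;

// Illustrative sketch only; this is NOT how the library is necessarily implemented.
public static class AsyncWrapperSketch
{
    // Stand-in for any synchronous, CPU-bound text transformation.
    public static string NormalizeSync(string text)
    {
        return text.Normalize(NormalizationForm.FormD);
    }

    // Validates the argument on the calling thread, then offloads the work to the thread pool.
    public static Task<string> NormalizeAsync(string text)
    {
        if (text == null)
        {
            throw new ArgumentNullException(nameof(text));
        }

        return Task.Run(() => NormalizeSync(text));
    }
}
```

Offloading this way does not make the transformation itself faster; it only frees the calling thread (for example, a UI thread) while the work runs elsewhere.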
- Mappings now use int[] arrays of Unicode code points as dictionary keys instead of string sequences (e.g., "U+XXXX"). This significantly improves the speed and efficiency of lookup operations during transliteration.
- Refactored the Decompose method to use a new fast-path sequence matcher and a more efficient fallback for surrogate-aware single-character replacement. This results in considerably faster processing of text with complex mappings and Unicode sequences.
- Added GetMaxKeyLength to calculate the maximum key length for mappings, resolving potential issues with shorter, incorrect matches.
- useDefaultMapping: Added a useDefaultMapping option (defaults to true) to enable or disable the built-in default mapping.
- The PrepareDictionary method is now internal static for improved code maintainability.
- The IsValidUnicodeString method has been replaced with a more comprehensive implementation.
- The IsValidUnicodeString method now accurately validates user input for valid Unicode sequences by correctly identifying and rejecting strings containing lone high or low surrogate characters.
- PreprocessDictionary is no longer needed for the user-defined character mapping; the mapping can now be passed directly to the Decompose method.
- Added an asynchronous version of the Decompose method for non-blocking operations.
- Updated the Decompose method to handle surrogate pairs and multi-codepoint sequences.
For any inquiries, bug reports, or feature requests related to the Playful Sparkle: Transliterate Library, please use the following channels:
- Email: support@playfulsparkle.com. Please allow a reasonable timeframe for a response.
We encourage users to use the GitHub Issues page for bug reports and feature requests, as it helps with organizing and tracking the library's development.
This library is licensed under the BSD-3-Clause License. See the LICENSE file for complete details.
Hi! We're the team behind Playful Sparkle, a creative agency from Slovakia. We got started way back in 2004 and have been having fun building digital solutions ever since. Whether it's crafting a brand, designing a website, developing an app, or anything in between, we're all about delivering great results with a smile. We hope you enjoy using our library!