Unicode Normalizer
Normalize Unicode text to NFC, NFD, NFKC, or NFKD form instantly online. Fix encoding inconsistencies and ensure text compatibility. Free — runs entirely in your browser.
Unicode defines four normalization forms that resolve the problem of multiple canonical representations for the same text. The core issue: some characters exist as a single precomposed codepoint (é = U+00E9) and also as a base letter plus combining accent (e + U+0301). These two sequences look identical and are semantically equivalent, but they are not equal as byte sequences — causing string comparison failures and database inconsistencies.
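The mismatch described above can be reproduced in a few lines. This is a minimal sketch; the helper name codePoints is made up for illustration, and only the standard String.prototype.normalize() method is assumed.

```javascript
// Two spellings of "é": one precomposed code point vs. base letter + combining mark.
const precomposed = "\u00E9";   // é as a single code point (U+00E9)
const decomposed = "e\u0301";   // e followed by COMBINING ACUTE ACCENT (U+0301)

// Raw comparison looks at code units, so the two spellings differ.
const rawEqual = precomposed === decomposed;

// After normalizing both sides to the same form, they compare equal.
const nfcEqual = precomposed.normalize("NFC") === decomposed.normalize("NFC");

// Hypothetical helper: list the code points of a string as U+XXXX labels.
function codePoints(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

console.log(rawEqual, nfcEqual);
console.log(codePoints(precomposed), codePoints(decomposed));
```

Printing the code points is often the quickest way to confirm that a comparison failure is a normalization problem rather than a typo.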
The four normalization forms differ in two dimensions: decomposition type (canonical vs. compatibility) and whether to recompose after decomposing. NFC (Canonical Decomposition, then Canonical Composition) is the dominant form on the web and in most APIs; it produces precomposed characters wherever they exist and is the form the W3C recommends for web content. NFD (Canonical Decomposition) expands characters to a base letter plus combining marks and is used by some filesystems, notably macOS HFS+, which stores filenames in a variant of NFD. NFKC and NFKD additionally fold compatibility variants such as ligatures (ﬁ → fi), fullwidth letters (ａ → a), and fractions (½ → 1⁄2).
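To see how the four forms differ on concrete input, the sketch below runs a few sample characters through each one. The helper allForms and the choice of samples are illustrative assumptions, not part of this tool.

```javascript
const FORMS = ["NFC", "NFD", "NFKC", "NFKD"];

// Hypothetical helper: normalize one string into every form.
function allForms(text) {
  const result = {};
  for (const form of FORMS) {
    result[form] = text.normalize(form);
  }
  return result;
}

const accented = allForms("\u00E9");   // é: affected by canonical decomposition
const ligature = allForms("\uFB01");   // ﬁ ligature: a compatibility character
const fullwidth = allForms("\uFF41");  // ａ fullwidth letter: a compatibility character
const half = allForms("\u00BD");       // ½ vulgar fraction: a compatibility character

// Canonical forms (NFC/NFD) leave compatibility characters alone;
// only NFKC/NFKD rewrite them.
console.log(accented.NFD.length);          // 2 (e + U+0301)
console.log(ligature.NFC === "\uFB01");    // true
console.log(ligature.NFKC);                // "fi"
console.log(fullwidth.NFKC);               // "a"
console.log(half.NFKC === "1\u20442");     // true (1, FRACTION SLASH, 2)
```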
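This tool normalizes text using the browser's native String.prototype.normalize() method with the selected form. Use NFC for API submissions, database storage, and web content to ensure consistent string comparison. Use NFD before diacritic removal, so that accents become separate combining marks that can be stripped. Use NFKC for search normalization, username uniqueness checks, and security-sensitive text comparison where compatibility lookalikes such as fullwidth letters could be abused. Note that NFKC does not unify cross-script homoglyphs (Latin a vs. Cyrillic а); those need a separate confusables check.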
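The "NFD before diacritic removal" advice can be sketched as follows. The function name stripDiacritics is hypothetical, and the character range is an assumption: it covers only the Combining Diacritical Marks block (U+0300 to U+036F), which handles common Latin accents but not every combining mark in Unicode.

```javascript
// Hypothetical helper: remove common accents by decomposing first.
function stripDiacritics(text) {
  return text
    .normalize("NFD")                 // split é into e + U+0301
    .replace(/[\u0300-\u036f]/g, "")  // drop marks in the Combining Diacritical Marks block
    .normalize("NFC");                // recompose anything that was left untouched
}

console.log(stripDiacritics("Cr\u00E8me br\u00FBl\u00E9e")); // "Creme brulee"
console.log(stripDiacritics("re\u0301sume\u0301"));          // "resume"
```

Letters with no canonical decomposition, such as ø, pass through unchanged, so this is not a general transliteration step.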
Common Use Cases
Fixing string comparison failures across platforms
macOS HFS+ stores filenames in a decomposed form (a variant of NFD), while most Linux filesystems and Windows NTFS store names exactly as given, which in practice is usually NFC. A file synced from macOS to Linux may therefore carry a different byte sequence than a locally created file with the same visible name, causing comparison failures in build tools, git, and deployment scripts. Normalizing both strings to NFC before comparison resolves the mismatch.
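A minimal sketch of that comparison, plus a check for names in a listing that collide once normalized. The function names sameFilename and findCollisions and the sample filenames are assumptions for illustration; a real script would read the names from the filesystem.

```javascript
// Hypothetical helper: compare two filenames after normalizing both to NFC.
function sameFilename(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}

// Hypothetical helper: return the NFC names that appear under more than
// one raw spelling in a listing.
function findCollisions(names) {
  const groups = new Map();
  for (const name of names) {
    const key = name.normalize("NFC");
    if (!groups.has(key)) groups.set(key, new Set());
    groups.get(key).add(name);
  }
  return [...groups.entries()]
    .filter(([, variants]) => variants.size > 1)
    .map(([key]) => key);
}

const fromMac = "re\u0301sume\u0301.txt";  // decomposed, as an NFD-style filesystem would store it
const local = "r\u00E9sum\u00E9.txt";      // precomposed, as typically typed elsewhere

console.log(fromMac === local);             // false
console.log(sameFilename(fromMac, local));  // true
console.log(findCollisions([fromMac, local, "notes.txt"]).length); // 1
```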
Preventing duplicate usernames in authentication systems
Identity systems, whether hosted providers such as Auth0 and Okta or custom OAuth servers, need usernames and email addresses normalized before uniqueness checks. Without normalization, 'Résumé@example.com' typed with precomposed é (U+00E9) and the same address typed with e + U+0301 could register as two distinct accounts despite being visually identical. NFKC normalization is a common pre-processing step before username deduplication.
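One way the uniqueness check might look is sketched below. The class UsernameRegistry is hypothetical and keeps names in memory; it applies NFKC only, and a production system would likely add case folding and further validation that are out of scope here.

```javascript
// Hypothetical in-memory registry that deduplicates on the NFKC form.
class UsernameRegistry {
  constructor() {
    this.taken = new Set();
  }

  // Returns true if the name was registered, false if an equivalent
  // spelling is already taken.
  register(raw) {
    const key = raw.normalize("NFKC");
    if (this.taken.has(key)) return false;
    this.taken.add(key);
    return true;
  }
}

const registry = new UsernameRegistry();
const first = registry.register("R\u00E9sum\u00E9@example.com");      // precomposed é
const second = registry.register("Re\u0301sume\u0301@example.com");   // e + U+0301
const third = registry.register("\uFF32\u00E9sum\u00E9@example.com"); // fullwidth Ｒ

console.log(first, second, third); // true false false
```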
Preparing text for NLP and machine learning pipelines
Language model tokenizers (Hugging Face's BPE, SentencePiece, tiktoken) perform best on consistently normalized text. If training data contains mixed NFC and NFD representations of the same word, the tokenizer may assign different token IDs to visually identical strings, creating vocabulary bloat and reducing model quality. Normalizing all text to NFC before tokenization eliminates this inconsistency.
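The vocabulary effect can be illustrated with a toy example. The whitespace splitter below is only a stand-in for a real tokenizer (it is not BPE or SentencePiece), and the function name vocabulary and the two-line corpus are assumptions made for the sketch.

```javascript
// Hypothetical helper: collect distinct whitespace-separated tokens,
// optionally normalizing each line first.
function vocabulary(lines, form) {
  const vocab = new Set();
  for (const line of lines) {
    const text = form ? line.normalize(form) : line;
    for (const token of text.split(/\s+/).filter(Boolean)) {
      vocab.add(token);
    }
  }
  return vocab;
}

// The same sentence twice: once with precomposed é, once with e + U+0301.
const corpus = ["caf\u00E9 au lait", "cafe\u0301 au lait"];

const rawVocab = vocabulary(corpus);         // keeps both spellings of "café"
const nfcVocab = vocabulary(corpus, "NFC");  // collapses them into one entry

console.log(rawVocab.size, nfcVocab.size); // 4 3
```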