Unicode Normalizer

Normalize Unicode text to NFC, NFD, NFKC, or NFKD form instantly online. Fix encoding inconsistencies and ensure text compatibility. Free — runs entirely in your browser.

Unicode defines four normalization forms that resolve the problem of multiple canonical representations for the same text. The core issue: some characters exist as a single precomposed codepoint (é = U+00E9) and also as a base letter plus combining accent (e + U+0301). These two sequences look identical and are semantically equivalent, but they are not equal as byte sequences — causing string comparison failures and database inconsistencies.
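The mismatch described above can be reproduced in a few lines. This is a minimal sketch using only the standard String.prototype.normalize() and TextEncoder; the variable names are illustrative.

```javascript
// Two spellings of "é" that render identically.
const precomposed = "\u00E9";  // single code point U+00E9
const decomposed = "e\u0301";  // "e" followed by COMBINING ACUTE ACCENT

// Helper (illustrative): number of bytes in the UTF-8 encoding of a string.
const utf8Length = (s) => new TextEncoder().encode(s).length;

// Raw comparison fails because the underlying sequences differ.
const rawEqual = precomposed === decomposed;

// After normalizing both sides to the same form, the comparison succeeds.
const nfcEqual = precomposed.normalize("NFC") === decomposed.normalize("NFC");

console.log(rawEqual, nfcEqual);                               // false true
console.log(utf8Length(precomposed), utf8Length(decomposed));  // 2 3
```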

The four normalization forms differ in two dimensions: decomposition type (canonical vs. compatibility) and whether to recompose after decomposing. NFC (Canonical Decomposition + Composition) is the dominant form used on the web and in most APIs; it produces precomposed characters wherever they exist and is the form the W3C recommends for web content. NFD (Canonical Decomposition) expands characters to base+marks and is used by some filesystems, notably macOS HFS+, which stores filenames in a variant of NFD. NFKC and NFKD additionally normalize compatibility variants like ligatures (ﬁ U+FB01 → fi), width variants (fullwidth ａ U+FF41 → a), and fractions (½ → 1⁄2).
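To illustrate the canonical vs. compatibility distinction, the sketch below runs the three compatibility examples through NFC and NFKC. The sample characters are written as escapes so they survive copy and paste; the variable names are illustrative.

```javascript
// Compatibility characters: "fi" ligature, fullwidth "a", vulgar fraction one half.
const samples = ["\uFB01", "\uFF41", "\u00BD"];

// Canonical normalization leaves compatibility characters untouched.
const canonical = samples.map((s) => s.normalize("NFC"));

// Compatibility normalization replaces them with plain equivalents.
const compat = samples.map((s) => s.normalize("NFKC"));

console.log(canonical.every((s, i) => s === samples[i])); // true
console.log(compat); // [ 'fi', 'a', '1⁄2' ]  (the slash is U+2044 FRACTION SLASH)
```

Because the compatibility forms discard these distinctions, they are lossy: the original ligature or fullwidth letter cannot be recovered from the output.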

This tool normalizes text using the browser's native String.prototype.normalize() method with the selected form. Use NFC for API submissions, database storage, and web content to ensure consistent string comparison. Use NFD before diacritic removal. Use NFKC for search normalization, username uniqueness checks, and security-sensitive text comparison where lookalike characters could be abused.
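As an example of the "NFD before diacritic removal" advice, here is a minimal sketch. The function name stripDiacritics is hypothetical, and the approach assumes that dropping every nonspacing mark (Unicode category Mn) is acceptable for your text, which is not true for every script.

```javascript
// Hypothetical helper: remove accents by decomposing, then dropping combining marks.
function stripDiacritics(text) {
  return text
    .normalize("NFD")          // split letters into base + combining marks
    .replace(/\p{Mn}/gu, "")   // drop nonspacing marks (accents, diaereses, ...)
    .normalize("NFC");         // recompose anything that remains
}

console.log(stripDiacritics("Cr\u00E8me br\u00FBl\u00E9e")); // Creme brulee
```

Letters that have no canonical decomposition, such as ø (U+00F8), pass through unchanged and need a separate mapping if they must be folded as well.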

Common Use Cases

Fixing string comparison failures across platforms

macOS HFS+ stores filenames in a variant of NFD, while most Linux filesystems and Windows NTFS store names exactly as given without normalizing them, which in practice usually means NFC because that is what keyboards and most applications produce. A file synced from macOS to Linux may therefore appear with a different byte sequence than a locally created file with the same visible name, causing comparison failures in build tools, git, and deployment scripts. Normalizing both strings to NFC before comparison resolves the mismatch.
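A comparison helper along these lines can be dropped into a build or sync script. This is a sketch only: sameFileName and the two sample names are made up for illustration, and it compares names, not paths or file contents.

```javascript
// Hypothetical helper: compare two file names regardless of normalization form.
function sameFileName(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}

// The same visible name as it might arrive from two different systems.
const decomposedName = "re\u0301sume\u0301.txt";  // base letters + combining accents
const precomposedName = "r\u00E9sum\u00E9.txt";   // precomposed letters

console.log(decomposedName === precomposedName);            // false
console.log(sameFileName(decomposedName, precomposedName)); // true
```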

Preventing duplicate usernames in authentication systems

Security-conscious identity systems (Auth0, Okta, custom OAuth servers) typically normalize usernames and email addresses before uniqueness checks. Without normalization, 'Résumé@example.com' typed with precomposed é (U+00E9) and the same address typed with e + combining acute accent (U+0301) could register as two distinct accounts despite being visually identical. NFKC normalization is a common pre-processing step before username deduplication.
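One way to apply this is to derive a comparison key from each username and check uniqueness on the key rather than on the raw input. The sketch below is an assumption about how such a check could look, not a description of any particular provider; usernameKey, register, and the in-memory Set are all hypothetical, and lowercasing is an extra policy choice layered on top of NFKC.

```javascript
// Hypothetical helper: key used only for uniqueness checks, not for display.
function usernameKey(name) {
  return name.normalize("NFKC").toLowerCase();
}

// In-memory stand-in for a unique index in a real user store.
const registered = new Set();

// Returns true if the name was accepted, false if its key is already taken.
function register(name) {
  const key = usernameKey(name);
  if (registered.has(key)) return false;
  registered.add(key);
  return true;
}

const first = register("R\u00E9sum\u00E9");        // precomposed accents
const second = register("Re\u0301sume\u0301");     // decomposed accents
const third = register("\uFF32\u00E9sum\u00E9");   // fullwidth R
console.log(first, second, third);                 // true false false
```

NFKC folds canonical and compatibility variants only. It does not merge lookalikes from different scripts, such as Latin a and Cyrillic а (U+0430), so those need a separate confusables check.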

Preparing text for NLP and machine learning pipelines

Language model tokenizers (Hugging Face's BPE, SentencePiece, tiktoken) perform best on consistently normalized text. If training data contains mixed NFC and NFD representations of the same word, the tokenizer may assign different token IDs to visually identical strings, creating vocabulary bloat and reducing model quality. Normalizing all text to NFC before tokenization eliminates this inconsistency.
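The effect on vocabulary size can be seen even without a tokenizer by counting distinct strings before and after normalization. The four-word corpus below is invented for illustration; a real pipeline would apply the same map step to every document before tokenization.

```javascript
// Toy corpus: two words, each present in precomposed and decomposed spelling.
const corpus = ["caf\u00E9", "cafe\u0301", "na\u00EFve", "nai\u0308ve"];

// Distinct strings as a tokenizer would see them without normalization.
const rawTypes = new Set(corpus).size;

// Distinct strings after normalizing every entry to NFC.
const nfcTypes = new Set(corpus.map((w) => w.normalize("NFC"))).size;

console.log(rawTypes, nfcTypes); // 4 2
```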

Normalization Forms

NFC — Canonical Decomposition followed by Canonical Composition. The most common form used on the web and in most APIs.
NFD — Canonical Decomposition. Characters are decomposed into base characters and combining marks (e.g. é → e + combining acute accent).
NFKC — Compatibility Decomposition followed by Canonical Composition. Replaces compatibility variants with their plain equivalents (e.g. the ﬁ ligature U+FB01 → fi).
NFKD — Compatibility Decomposition. Fully decomposes characters including compatibility equivalents.
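To compare the four forms side by side, the sketch below lists the code points each form produces for one input containing a precomposed é and an fi ligature. The codePoints helper is illustrative, not part of any library.

```javascript
// Illustrative helper: normalize text, then list its code points as U+XXXX.
function codePoints(text, form) {
  return Array.from(text.normalize(form), (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const forms = ["NFC", "NFD", "NFKC", "NFKD"];
const input = "\u00E9\uFB01"; // precomposed é followed by the fi ligature

// Map each form name to the code points it yields for the same input.
const table = Object.fromEntries(forms.map((f) => [f, codePoints(input, f)]));

for (const f of forms) console.log(f.padEnd(5), table[f].join(" "));
// NFC   U+00E9 U+FB01
// NFD   U+0065 U+0301 U+FB01
// NFKC  U+00E9 U+0066 U+0069
// NFKD  U+0065 U+0301 U+0066 U+0069
```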