# Ftfy — fixes text for you
An Elixir port of the Python [ftfy](https://github.com/rspeer/python-ftfy)
library (version 6.3.1). It takes in broken Unicode text and makes it less
broken — most importantly, it detects and fixes *mojibake* (text that was
decoded in the wrong encoding).
```elixir
iex> Ftfy.fix_text("âœ” No problems")
"✔ No problems"
iex> Ftfy.fix_text("Broken textâ€¦ itâ€™s flubberific!")
"Broken text… it's flubberific!"
iex> Ftfy.fix_text("ＬＯＵＤ　ＮＯＩＳＥＳ")
"LOUD NOISES"
iex> Ftfy.fix_encoding_and_explain("sÃ³")
{"só", [{"encode", "latin-1"}, {"decode", "utf-8"}]}
```
## What it does
`Ftfy.fix_text/2` runs a sequence of fixes, each individually configurable via
`Ftfy.TextFixerConfig`:
- **fix_encoding** — detect mojibake and undo it by re-encoding and re-decoding
through the right pair of encodings (the heart of ftfy), including the
sub-fixes `restore_byte_a0`, `replace_lossy_sequences`,
`decode_inconsistent_utf8`, and `fix_c1_controls`
- **unescape_html** — decode HTML entities (`&amp;`, `&eacute;`, `&rsquo;`, …)
- **remove_terminal_escapes** — strip ANSI color codes
- **fix_latin_ligatures** — `ﬁ` → `fi`
- **fix_character_width** — fullwidth/halfwidth → standard width
- **uncurl_quotes** — curly quotes → straight quotes
- **fix_line_breaks** — CRLF, CR, LS, PS, NEL → `\n`
- **fix_surrogates** — repair UTF-16 surrogate pairs
- **remove_control_chars** — strip useless control characters
- Unicode **normalization** (NFC by default)

Other entry points mirror the Python API: `fix_and_explain/2`,
`fix_encoding/2`, `fix_encoding_and_explain/2`, `fix_text_segment/2`,
`apply_plan/2`, `guess_bytes/1`, `fix_file/2`, and `explain_unicode/1`. The
`Ftfy.Fixes`, `Ftfy.Badness`, `Ftfy.Chardata`, `Ftfy.Codecs`, and
`Ftfy.Formatting` modules expose the lower-level building blocks.
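
The re-encode and re-decode idea behind `fix_encoding` can be illustrated with the standard library alone. This is a minimal sketch, not the port's implementation: `MojibakeSketch.undo_latin1/1` is a hypothetical helper that hard-codes a single encoding pair (Latin-1 and UTF-8), whereas ftfy picks among many pairs using heuristics.

```elixir
defmodule MojibakeSketch do
  # Hypothetical helper, not part of Ftfy: reverse "UTF-8 bytes that were
  # decoded as Latin-1" by undoing the two steps in order.
  def undo_latin1(text) when is_binary(text) do
    # Step 1 ("encode", "latin-1"): map every codepoint back to a single byte.
    case :unicode.characters_to_binary(text, :utf8, :latin1) do
      bytes when is_binary(bytes) ->
        # Step 2 ("decode", "utf-8"): accept the bytes only if they form valid UTF-8.
        if String.valid?(bytes), do: {:ok, bytes}, else: :error

      # Codepoints above U+00FF cannot be encoded as Latin-1.
      _error_or_incomplete ->
        :error
    end
  end
end

IO.inspect(MojibakeSketch.undo_latin1("sÃ³"))
# prints {:ok, "só"}
```

Correctly decoded text such as `"só"` yields `:error` in this sketch, because its Latin-1 bytes are not valid UTF-8. A real fixer also has to decide whether a successful round trip is actually an improvement, which this sketch does not attempt.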
## Configuration
Pass a keyword list or a `%Ftfy.TextFixerConfig{}`:
```elixir
Ftfy.fix_text(text, uncurl_quotes: false)
Ftfy.fix_text(text, %Ftfy.TextFixerConfig{normalization: "NFKC"})
```
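
To get a feel for what switching `normalization` from NFC to NFKC changes, the two forms can be compared with Elixir's own `String.normalize/2`. This is only an illustration and assumes the port's `"NFC"` and `"NFKC"` settings correspond to the standard Unicode normalization forms; it does not call Ftfy at all.

```elixir
# "e" followed by a combining acute accent, then a superscript two.
text = "cafe\u0301 x²"

# NFC composes the accent but keeps compatibility characters such as "²".
nfc = String.normalize(text, :nfc)

# NFKC also replaces compatibility characters with their plain equivalents.
nfkc = String.normalize(text, :nfkc)

IO.inspect(nfc)   # prints "café x²"
IO.inspect(nfkc)  # prints "café x2"
```

NFKC is the lossier choice: it discards distinctions such as superscripts that may carry meaning.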
## Command line
Build the escript and fix text from a file or stdin:
```sh
mix escript.build
echo '✔ No problems' | ./ftfy
./ftfy -e latin-1 broken.txt -o fixed.txt
```
## Installation
Add `ftfy` to your dependencies in `mix.exs`:
```elixir
def deps do
  [
    {:ftfy, "~> 0.1.0"}
  ]
end
```
## Notes on the port
- The encoding-detection data tables (HTML entities, the single-byte charmap
encodings, the fullwidth/halfwidth map, the `wcwidth` width tables) and the
two large heuristic regexes are generated from the reference implementation by
`scripts/gen_data.py` into the `Ftfy.Data` module (internal, undocumented). The
reference package is vendored as a git submodule at `vendor/python-ftfy`
(pinned to the `v6.3.1` tag); run `git submodule update --init` before
regenerating.
- `Ftfy.Codecs` reimplements Python's `bad_codecs`: the `sloppy-windows-*` and
related charmap encodings, and the `utf-8-variants` (CESU-8 / Java modified
UTF-8) decoder, including incremental decoding.
- The behavioral test corpus is read directly from the pinned
`vendor/python-ftfy` submodule (`tests/test_cases.json`); the unit tests are
ported from python-ftfy. All 151 "pass" cases and 10 "known failure" cases
match the reference. (Running the tests therefore needs the submodule:
`git submodule update --init`.)
- Two deliberate differences: Elixir strings are UTF-8 binaries, which cannot
contain lone UTF-16 surrogate codepoints, so `Ftfy.Fixes.fix_surrogates/1` is
effectively a no-op on valid strings, and `explain_unicode/1` omits the Unicode
character *name* (the BEAM has no names database).
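
The surrogate and CESU-8 points above can be made concrete with a short sketch. It is not the code in `Ftfy.Codecs`: `SurrogateSketch` and its functions are hypothetical names, and the decoder handles exactly one six-byte surrogate pair rather than a whole stream.

```elixir
defmodule SurrogateSketch do
  import Bitwise

  # A surrogate encoded in UTF-8 style is rejected by the standard library,
  # which is why a valid Elixir string never contains one.
  def valid_utf8?(bin), do: String.valid?(bin)

  # Decode one CESU-8 pair: a high surrogate (0xED 0xA0..0xAF xx) followed by
  # a low surrogate (0xED 0xB0..0xBF xx), into a single UTF-8 character.
  def decode_pair(
        <<0xED, 0b1010::4, h1::4, 0b10::2, h2::6,
          0xED, 0b1011::4, l1::4, 0b10::2, l2::6>>
      ) do
    high = (h1 <<< 6) ||| h2   # low 10 bits of the high surrogate
    low = (l1 <<< 6) ||| l2    # low 10 bits of the low surrogate
    codepoint = 0x10000 + (high <<< 10) + low
    {:ok, <<codepoint::utf8>>}
  end

  def decode_pair(_other), do: :error
end

# U+1F600 as the surrogate pair D83D DE00, each half encoded in three bytes.
cesu8 = <<0xED, 0xA0, 0xBD, 0xED, 0xB8, 0x80>>

IO.inspect(SurrogateSketch.valid_utf8?(cesu8))   # prints false
IO.inspect(SurrogateSketch.decode_pair(cesu8))   # prints {:ok, "😀"}
```

Because the undecoded bytes are not a valid string, surrogate repair in this setting naturally happens while decoding bytes rather than as a pass over already-decoded text.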
## License and credits
This library is a port of [ftfy](https://github.com/rspeer/python-ftfy)
("fixes text for you"), created by **Robyn Speer**. ftfy is the result of years
of careful work on the messy reality of broken Unicode, and this Elixir port
exists only because of it — our deepest thanks to Robyn Speer for building and
maintaining the original, and for releasing it under a permissive license.
- Original ftfy: Copyright 2023 Robyn Speer, licensed under the Apache
License, Version 2.0 — <https://github.com/rspeer/python-ftfy>
- This Elixir port: Copyright 2026 FashionUnited, also licensed under the
Apache License, Version 2.0.

The data tables and test corpus in this repository are generated or ported
directly from python-ftfy 6.3.1 and remain the work of the original author.
See [`LICENSE`](https://github.com/fuww/ftfy/blob/main/LICENSE) for the full
license text and [`NOTICE`](https://github.com/fuww/ftfy/blob/main/NOTICE) for
the attribution and change notice required by the Apache License.

If you use ftfy in research, please cite the original author's work as
described at <https://github.com/rspeer/python-ftfy>.