# mimetype
MIME type lookup and magic-number detection for Gleam on Erlang and JavaScript targets.
## Features
- Extension-to-MIME and MIME-to-extensions lookup derived from `mime-db`
- Magic-number detection for common binary formats across archive, document, image, audio, and video families
- Pure Gleam implementation that builds on both targets
## Install
```sh
gleam add mimetype
```
## When to use this
Use `mimetype` when you need a small, cross-target MIME utility in
Gleam:
- Serving files or attachments: resolve `Content-Type` from a filename or extension
- Validating uploads: prefer magic-number detection over user-supplied extensions
- Bridging APIs: map between file extensions and MIME types in both directions
The extension database is generated from `jshttp/mime-db`, which tracks
the IANA media type registry and common ecosystem aliases. Refreshing
the generated table keeps lookups aligned with that upstream source.
## Serving a file: pick a Content-Type from a filename
The most common use is taking the filename your handler already has and
turning it into a wire-ready `Content-Type` value. `filename_to_mime_type`
is case-insensitive and falls back to `application/octet-stream` for
unknown extensions, so the helper is safe to drop into a response path
without extra branching.
```gleam
import mimetype

/// Pick the Content-Type header value to send back when serving
/// `filename` from disk or object storage.
pub fn content_type_for(filename: String) -> String {
  mimetype.filename_to_mime_type(filename)
  |> mimetype.to_string
}

// content_type_for("report.PDF") -> "application/pdf"
// content_type_for("avatar.jpg") -> "image/jpeg"
// content_type_for("archive.tar.gz") -> "application/gzip"
// content_type_for("notes") -> "application/octet-stream"
```
For HTML / CSS / JS responses where browsers expect a charset, put the
parameter you actually serve in the wire string you parse; `to_string`
carries it through:
```gleam
import mimetype

pub fn html_content_type() -> String {
  let assert Ok(html) = mimetype.parse("text/html; charset=utf-8")
  mimetype.to_string(html)
  // -> "text/html; charset=utf-8"
}
```
## Validating an upload: detect from bytes, not the user's extension
Browser-uploaded filenames are user input and can lie. Run the leading
bytes of the upload through `mimetype.detect_strict` to get the actual
format, then enforce an allowlist of the MIME types your endpoint will accept.
```gleam
import mimetype

pub type UploadError {
  EmptyUpload
  Unsupported(detected: String)
}

/// Allow only PNG, JPEG, and WebP uploads. The detected MIME type is
/// derived from magic bytes; the caller's filename is ignored.
pub fn validate_image_upload(
  bytes: BitArray,
) -> Result(mimetype.MimeType, UploadError) {
  case mimetype.detect_strict(bytes) {
    Ok(mime) ->
      case mimetype.is_image(mime) && image_is_allowed(mime) {
        True -> Ok(mime)
        False -> Error(Unsupported(detected: mimetype.to_string(mime)))
      }
    Error(mimetype.EmptyInput) -> Error(EmptyUpload)
    Error(_) -> Error(Unsupported(detected: "application/octet-stream"))
  }
}

fn image_is_allowed(mime: mimetype.MimeType) -> Bool {
  case mimetype.essence_of(mime) {
    "image/png" | "image/jpeg" | "image/webp" -> True
    _ -> False
  }
}
```
The strict variant separates `EmptyInput` (zero-byte upload) from
`NoMatch` (bytes that did not match any signature) so the caller can
return the right HTTP status. When that distinction does not matter,
`mimetype.detect` skips the `Result` and returns
`application/octet-stream` for both cases.
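As a minimal sketch of that status mapping: the `status_for` name and the specific status codes are illustrative choices, not part of the library, and `UploadError` is repeated from the validation example so the block stands alone.

```gleam
/// Same shape as the `UploadError` defined in the validation example.
pub type UploadError {
  EmptyUpload
  Unsupported(detected: String)
}

/// Hypothetical helper: choose an HTTP status for a rejected upload.
pub fn status_for(error: UploadError) -> Int {
  case error {
    // Nothing was sent, so treat the request itself as malformed.
    EmptyUpload -> 400
    // Bytes arrived but are not an allowed format:
    // 415 Unsupported Media Type.
    Unsupported(_) -> 415
  }
}
```

Keeping the mapping in one function means a new `UploadError` variant becomes a compile error here until a status is chosen for it.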
## Other API entry points
The full surface returns an opaque `MimeType`. Use `mimetype.to_string`
to serialise for an HTTP header; use `mimetype.parse` to construct one
from a wire-format string. Inspect with `essence_of`, `parameter_of`,
`charset_of_type`, `is_image`, `is_a`, and the rest of the predicate /
accessor family. The `parameter_of` docstring pins the rules for
duplicate names (first wins), case-insensitive lookup, and value
whitespace handling; consult it before building anything that
round-trips parameters.
```gleam
import gleam/option.{Some}
import mimetype

pub fn main() {
  mimetype.extension_to_mime_type(".json")
  |> mimetype.to_string
  // -> "application/json"

  let assert Ok(jpeg) = mimetype.parse("image/jpeg")
  mimetype.mime_type_to_extensions(jpeg)
  // -> ["jpg", "jpeg", "jpe"]

  mimetype.detect_with_filename(<<0, 1, 2, 3>>, "report.csv")
  |> mimetype.essence_of
  // -> "text/csv"

  let assert Ok(html) = mimetype.parse("text/html; charset=utf-8")
  mimetype.charset_of_type(html)
  // -> Some("utf-8")
}
```
## Capabilities and limitations
This library intentionally stays focused. Knowing where the detector
stops is more useful than discovering it from a surprising result:
- It does perform shallow ZIP-container inspection for a small fixed allowlist: `epub`, OOXML (`docx`/`xlsx`/`pptx`), OpenDocument (`odt`/`ods`/`odp`), `jar`, and `apk`. It does not recurse arbitrarily into nested containers or inspect embedded subformats beyond those targeted signatures.
- It does sniff `text/plain` from printable-ASCII-only payloads (the bounded WHATWG-style binary-vs-text heuristic added in #20) and recognises the UTF-8/16/32 BOM signatures, returning `text/plain; charset=<utf-X>` for the BOM cases. This is the **only** encoding-related sniffing: it does not detect text encodings beyond the BOM marker, and the printable-ASCII fallback emits a bare `text/plain` with no charset parameter.
- Beyond the five BOM-derived `text/plain; charset=utf-*` results (UTF-8, UTF-16 BE/LE, UTF-32 BE/LE), the detector does not parse, validate, or surface MIME-parameter values. Parameters on a wire-format string are handled by `mimetype.parse` instead.
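
The text-sniffing behaviour in the bullets above can be illustrated with a short sketch. The `sniff` helper is a hypothetical name, and the expected values in the comments follow from the behaviour described in this section rather than from a separate guarantee.

```gleam
import mimetype

/// Hypothetical helper: detect from bytes and serialise the result.
pub fn sniff(bytes: BitArray) -> String {
  mimetype.detect(bytes)
  |> mimetype.to_string
}

// UTF-8 BOM followed by text: the BOM signature supplies the charset.
// sniff(<<0xEF, 0xBB, 0xBF, "hello":utf8>>) -> "text/plain; charset=utf-8"

// Printable ASCII without a BOM: bare type, no charset parameter.
// sniff(<<"hello world":utf8>>) -> "text/plain"
```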
## Content negotiation
`mimetype/accept` parses RFC 9110 §12.5 `Accept`-family headers and
picks the best server offer for a given client header.
```gleam
import mimetype
import mimetype/accept

pub fn main() {
  let assert Ok(items) = accept.parse("text/html, application/json;q=0.9")
  let assert Ok(html) = mimetype.parse("text/html")
  let assert Ok(json) = mimetype.parse("application/json")
  accept.negotiate(client_accepts: items, server_offers: [json, html])
  // -> Some(html)
}
```
The same module handles `Accept-Encoding`, `Accept-Charset`, and
`Accept-Language`:
```gleam
import mimetype/accept

pub fn main() {
  let assert Ok(items) =
    accept.parse_encoding("gzip, br;q=1.0, *;q=0.1")
  accept.negotiate_value(client_accepts: items, server_offers: ["br", "gzip"])
  // -> Some("br")
}
```
Notes:
- `q=0` excludes a media range from consideration.
- A bare `*/*` client header returns the server's first offer
(server preference).
- `Specific(MimeType)` matching is essence-only: the parameter-level
"more-specific" matching of RFC 9110 §12.5.1 is currently out of scope.
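
The first two notes can be sketched with a small wrapper. The `pick` name is hypothetical, the sketch assumes `accept.negotiate` returns a `gleam/option` `Option(MimeType)` (as the `Some(html)` result above suggests), and treating an unparseable header as "no match" is a design choice of this example, not library behaviour.

```gleam
import gleam/option.{type Option, None}
import mimetype
import mimetype/accept

/// Hypothetical helper: negotiate `header` against two fixed offers,
/// with JSON listed first as the server's preference.
pub fn pick(header: String) -> Option(mimetype.MimeType) {
  let assert Ok(json) = mimetype.parse("application/json")
  let assert Ok(html) = mimetype.parse("text/html")
  case accept.parse(header) {
    Ok(items) ->
      accept.negotiate(client_accepts: items, server_offers: [json, html])
    // Assumption of this sketch: a malformed header yields no match.
    Error(_) -> None
  }
}

// Expected from the notes above:
// pick("*/*") -> Some(json), the server's first offer
// pick("application/json;q=0, text/html") -> Some(html), JSON is excluded
```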
## Reader-based detection
`detect_reader` and `detect_reader_strict` let callers detect a MIME
type **without buffering the whole input**. They take a synchronous
reader plus a byte budget, and the reader is invoked **at most once**
to fetch up to that many bytes from the start of the source.
### Reader contract
```gleam
pub type Reader(read_error) = fn(Int) -> Result(BitArray, read_error)
```
- The `Int` argument is the maximum number of bytes the detector wants.
- Returning fewer bytes than requested is fine — it is interpreted as
"the source ended early". Detection runs against whatever was
returned.
- The returned `BitArray` should always be the prefix starting at
offset 0 of the source. The detector inspects it from byte 0.
- The error parameter `read_error` is opaque to the library; in the
strict variant it is preserved as `ReaderError(read_error)` so
callers can distinguish IO failures from "no signature matched".
The reader is called **once per detection call**. There is no
streaming or back-and-forth — return enough bytes for the largest
signature you care about (the detector inspects up to a few KB by
default), or pass a custom `limit` argument tuned for your workload.
### In-memory adapter
The simplest case: when the bytes are already in hand, wrap them in a
function that ignores its argument.
```gleam
import mimetype

pub fn main() {
  let png = <<0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A>>
  let reader = fn(_limit) { Ok(png) }
  mimetype.detect_reader(reader, 3072)
  |> mimetype.to_string
  // -> "image/png"
}
```
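
When the buffer in hand may be much larger than the byte budget, a reader that honours its argument keeps to the "at most `limit` bytes from offset 0" contract. The sketch below is one possible shape built on `gleam/bit_array` and `gleam/int` from the standard library; `prefix_reader` is a hypothetical name, not part of `mimetype`.

```gleam
import gleam/bit_array
import gleam/int

/// Hypothetical adapter: build a reader over an in-memory buffer that
/// returns at most `limit` bytes, always starting at offset 0.
pub fn prefix_reader(bytes: BitArray) -> fn(Int) -> Result(BitArray, Nil) {
  fn(limit) {
    // Never ask for more than the buffer holds (or fewer than zero),
    // because `bit_array.slice` fails on out-of-range lengths.
    let take = int.clamp(limit, min: 0, max: bit_array.byte_size(bytes))
    bit_array.slice(from: bytes, at: 0, take: take)
  }
}
```

The result can be passed wherever a reader is expected, for example `mimetype.detect_reader(prefix_reader(upload), 3072)`, where `upload` stands for whatever `BitArray` the caller already holds.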
### BEAM file prefix reader
On the Erlang target, wrap a file-IO library so that one call returns
up to `limit` bytes from the start of the file. Any IO library that
can open a file and read a fixed-size prefix works — the snippet below
sketches the shape using a `read_prefix(path, limit)` helper that
returns `Result(BitArray, your_error)`:
```gleam
import mimetype

// `read_prefix` is the caller-supplied helper described above.
pub fn detect_file(
  path: String,
) -> Result(mimetype.MimeType, mimetype.DetectionError(your_error)) {
  let reader = fn(limit) { read_prefix(path, limit) }
  mimetype.detect_reader_strict(reader, 3072)
}
```
If `read_prefix` returns `Ok(<<>>)` for an empty file, the strict
variant surfaces `Error(EmptyInput)`. If `read_prefix` itself returns
`Error(some_io_error)`, the strict variant surfaces
`Error(ReaderError(some_io_error))` so the caller can distinguish IO
failure from a genuine no-match.
### JavaScript browser adapter
In the browser, `File` / `Blob` / `ReadableStream` reads are
asynchronous, so they cannot satisfy the synchronous `Reader`
contract directly. The intended pattern is:
1. Read the prefix asynchronously (`await blob.slice(0, limit).arrayBuffer()`
or the equivalent on a `ReadableStream`).
2. Pass the resulting bytes to `detect` / `detect_strict`, **not** to
`detect_reader`.
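As a Gleam sketch, assuming the `gleam_javascript` package for its
`Promise` type and a hypothetical FFI helper `read_blob_prefix` (in a
hypothetical `blob_ffi.mjs`) that awaits the slice and resolves to a
`BitArray`:
```gleam
import gleam/javascript/promise.{type Promise}
import mimetype

/// Opaque handle for a browser `Blob` (hypothetical binding).
pub type Blob

// Your FFI: await blob.slice(0, limit).arrayBuffer(), wrapped as a BitArray.
@external(javascript, "./blob_ffi.mjs", "read_blob_prefix")
fn read_blob_prefix(blob: Blob, limit: Int) -> Promise(BitArray)

pub fn detect_blob(blob: Blob) -> Promise(mimetype.MimeType) {
  read_blob_prefix(blob, 3072)
  |> promise.map(mimetype.detect)
}
```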
The reader-based API is most useful when the source is itself
synchronous (BEAM file IO, in-memory buffers, deterministic stream
adapters). For Promise-based sources, awaiting the prefix once and
calling `detect` is the recommended shape.
### Strict variants and error handling
The strict variants return `Result(MimeType, DetectionError(read_error))`,
where `DetectionError` distinguishes:
- `EmptyInput` — the reader returned a zero-byte payload, so no
detection was possible.
- `NoMatch` — the reader returned bytes, but no signature and no
printable-ASCII fallback applied.
- `ReaderError(e)` — the reader itself failed; `e` is preserved
unchanged.
- `UnknownExtension(_)` — only emitted by extension/filename helpers,
not the reader API.
```gleam
import gleam/io
import gleam/string
import mimetype

pub fn classify(reader) {
  case mimetype.detect_reader_strict(reader, 3072) {
    Ok(mime) -> io.println(mimetype.to_string(mime))
    Error(mimetype.EmptyInput) -> io.println("empty source")
    Error(mimetype.NoMatch) -> io.println("unrecognised content")
    // `string.inspect` renders any error value, so every branch returns
    // Nil and the reader's error type stays unconstrained.
    Error(mimetype.ReaderError(reason)) ->
      io.println("read failed: " <> string.inspect(reason))
    Error(mimetype.UnknownExtension(_)) -> Nil
  }
}
```
## Supported magic-number formats
<!-- BEGIN_SUPPORTED_FORMATS -->
`detect/1` recognises the following MIME types from byte-level
signatures or structural sniffs near the start of the input. This
list is generated from `src/mimetype/internal/magic.gleam` by
`scripts/generate_supported_formats.sh` — do not edit it by hand;
re-run `just generate-readme` after adding or removing a signature.
### Application formats
- `application/epub+zip`
- `application/gzip`
- `application/java-archive`
- `application/json`
- `application/msword`
- `application/ogg`
- `application/pdf`
- `application/rtf`
- `application/vnd.android.package-archive`
- `application/vnd.apache.parquet`
- `application/vnd.ms-asf`
- `application/vnd.ms-cab-compressed`
- `application/vnd.ms-excel`
- `application/vnd.ms-fontobject`
- `application/vnd.ms-powerpoint`
- `application/vnd.oasis.opendocument.presentation`
- `application/vnd.oasis.opendocument.spreadsheet`
- `application/vnd.oasis.opendocument.text`
- `application/vnd.openxmlformats-officedocument.presentationml.presentation`
- `application/vnd.openxmlformats-officedocument.spreadsheetml.sheet`
- `application/vnd.openxmlformats-officedocument.wordprocessingml.document`
- `application/vnd.sqlite3`
- `application/wasm`
- `application/x-7z-compressed`
- `application/x-archive`
- `application/x-bzip2`
- `application/x-compress`
- `application/x-deflate`
- `application/x-elf`
- `application/x-lz4`
- `application/x-lzh-compressed`
- `application/x-lzip`
- `application/x-ole-storage`
- `application/x-rar-compressed`
- `application/x-snappy-framed`
- `application/x-tar`
- `application/x-xz`
- `application/zip`
- `application/zstd`
### Audio formats
- `audio/aac`
- `audio/ac3`
- `audio/aiff`
- `audio/amr`
- `audio/amr-wb`
- `audio/flac`
- `audio/midi`
- `audio/mp4`
- `audio/mpeg`
- `audio/wav`
### Font formats
- `font/collection`
- `font/otf`
- `font/ttf`
- `font/woff`
- `font/woff2`
### Image formats
- `image/avif`
- `image/bmp`
- `image/fits`
- `image/gif`
- `image/heic`
- `image/jp2`
- `image/jpeg`
- `image/jxl`
- `image/png`
- `image/svg+xml`
- `image/tiff`
- `image/vnd.adobe.photoshop`
- `image/vnd.ms-dds`
- `image/vnd.radiance`
- `image/webp`
- `image/x-exr`
- `image/x-icon`
- `image/x-qoi`
### Text formats
- `text/html`
- `text/plain`
- `text/plain; charset=utf-16be`
- `text/plain; charset=utf-16le`
- `text/plain; charset=utf-32be`
- `text/plain; charset=utf-32le`
- `text/plain; charset=utf-8`
- `text/xml`
### Video formats
- `video/mp4`
- `video/quicktime`
- `video/webm`
- `video/x-flv`
- `video/x-matroska`
- `video/x-msvideo`
<!-- END_SUPPORTED_FORMATS -->
The detector is intentionally shallow: it looks only at fixed
signatures near the start of the byte stream, plus a small amount of
targeted ZIP local-header inspection for the container formats listed
above. It does not recurse arbitrarily into nested containers.
## Development
```sh
mise install
just ci
```
The generated MIME-DB lookup tables live in
`src/mimetype/internal/mimetype_db_ffi.erl` and
`src/mimetype/internal/db_ffi.mjs`, with a thin Gleam wrapper at
`src/mimetype/internal/db.gleam`. All three files are derived from
`doc/reference/upstream/mime-db/db.json`. Refresh them with:
```sh
just generate-db
```
CI runs the same generator against the pinned upstream commit and fails
the build if the regenerated output drifts from the committed copies.
### Benchmarks
The hot lookup and detection paths have a small reproducible bench
harness under `test/mimetype_bench.gleam`. Run it on either target:
```sh
just bench-erlang
just bench-javascript
just bench # both, in sequence
```
Each run prints a Markdown table of `ns/op` figures. Capture a
baseline from `main` before a refactor
(`just bench-erlang > before.md`), then re-run on the working branch
and diff the two tables to check for material regressions. The
harness is intentionally not wired into PR-time CI gates — it is for
local A/B comparison and ad-hoc investigation, not for blocking
merges on micro-fluctuations.
## Licensing
The data tables under `src/mimetype/internal/` are generated from
`jshttp/mime-db`. The generated FFI source files
(`mimetype_db_ffi.erl` and `db_ffi.mjs`) carry the MIT notice inline;
the same packaged notice is also included in `THIRD_PARTY_NOTICES.md`.