# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [0.1.0] — 2026-07-01
Initial release: pure-Elixir git client for clone, fetch, push over
smart HTTP v2, with lazy partial-clone support and a path-oriented FS
API for agents.
See [README](./README.md) and [BENCHMARKS on the smoketest
repo](https://github.com/ivarvong/exgit_smoketest/blob/main/BENCHMARKS.md).
### Security — credential redaction + ref bounds
- **Telemetry no longer leaks URL-embedded credentials.** A token
embedded in a remote URL (`https://token@host/…`, as git clients
commonly accept) is now redacted to `***` before the URL enters any
`:telemetry` span metadata (`ls_refs`, `fetch`, `push`) or the
`[:exgit, :security, :ref_rejected]` event. Previously such a token
could reach telemetry exporters / log aggregators. Prefer the
`:auth` field regardless — it was already redacted.
- **`ls-refs` responses are capped at 1,000,000 refs.** A hostile or
broken server can no longer stream unbounded refs into client
memory; the transport stream halts once the cap trips and the
caller gets `{:error, {:too_many_refs, cap}}`. Tunable via the
`:max_refs` option on `Exgit.Transport.HTTP.new/2`. Real repos
(linux, esp-idf) sit far below the default.
- **Dependencies bumped to clear HTTP-stack advisories.** `req`
`0.5.17 → 0.6.2` and `mint` `1.7.1 → 1.9.0` resolve the decompression-
bomb DoS (CVE-2026-49755), multipart header injection
(CVE-2026-49756), and HTTP/2 CONTINUATION flood (CVE-2026-49754).
The cross-origin credential-leak test suite was re-run against the
new `req` line. The only remaining `mix hex.audit` advisories are in
`cowlib`, reachable solely through the `only: :test` `bypass`
dependency — never part of the published package or a consumer's
runtime.
### Notes
- The optional `:vfs` integration depends on the pre-1.0 `vfs`
package (`~> 0.1.0`), which itself depends on exgit. The cycle is
handled with `runtime: false` + a compile-time `Code.ensure_loaded?`
guard, but treat the integration as pre-release and pin `vfs` if you
rely on it.
### Added — size-aware reads
- **`Exgit.FS.size/3`** — byte size of a blob at a path *without*
materializing its content. The size-aware companion to
`read_path/4`: gate on it before pulling a blob into memory.
O(1) for the in-memory store; on-disk loose objects inflate only
the header. Resolving the path may fetch trees (small) on a lazy
clone, but never the blob — an un-fetched blob returns
`{:error, :not_local}` instead of triggering a possibly-multi-GB
fetch. Directories return `{:error, :not_a_blob}`; gitlink
(submodule) entries return `{:error, :submodule}` — as do
`read_path/4` lookups on them, while `stat/3` reports
`%{type: :submodule}` without fetching.
- **`Exgit.ObjectStore.object_size/2`** — new protocol callback
backing the above. Memory keeps a parallel `sha => size` index
(no extra decompression); `Promisor` answers from cache or
returns `{:error, :not_local}` without fetching.
### Added — observability + workload bench
- **`Exgit.Profiler`** — structured trace of `:telemetry` span
events emitted during a function call. One-shot `profile/1`
returns `{result, %{total_us, totals, peak_cache_bytes,
events}}`; manual `attach/0` + `read/1` + `detach/1` for
long-running processes. Process-scoped — concurrent
profilers don't interfere.
- **`Exgit.Repository.memory_report/1`** — structured memory
report (object counts by type, cache_bytes, max_cache_bytes,
mode, backend) with consistent shape across all object-store
backends; counts report `:unknown` for backends that can't be
introspected (Disk). Suitable for emission into observability
stacks.
- **`bench/agent_workload.exs`** — realistic agent-session
benchmark: clone + prefetch + ls + grep + reads, with `:cold`
and `:hot` variants. Reports per-op breakdown + peak cache
bytes.
- **`test/exgit/fs_grep_git_parity_test.exs`** — correctness
oracle: `Exgit.FS.grep` output vs `git grep -n` for 7
representative patterns. Tagged `:real_git :slow`; gates
every push via the extended CI tier.
- **`docs/NOTES.md`** — design notes for deferred work (LRU
eviction, decompressed-blob cache, literal-string grep fast
path). Captures enough detail that future implementation
doesn't re-reason from scratch.
### Performance
A perf-focused round triggered by a real-world bug report (partial
clone returning empty packs). Adding real-world fixtures
(`cloudflare/agents`, `anomalyco/opencode`) to the benchmark
immediately surfaced three cascading bugs in the core
"clone + prefetch + read" hot path that pyex was too small to
expose:
- **`FS.walk` threaded the updated repo through stream state**.
Previously discarded the grown promisor from `resolve_tree`,
so every walk on a lazy repo triggered a fresh commit fetch.
On `cloudflare/agents` (1,418 files): 7,700 ms → 2 ms per walk.
~3,800× faster.
- **Promisor cache bytes now tracked as compressed**, not
decompressed. Previous accounting over-counted by 3-10×,
tripping eviction during normal prefetch. Combined with the
evictor only dropping commits (not blobs/trees), this could
drop the single commit we'd just fetched for a streaming walk.
- **`:max_cache_bytes` default changed from 64 MiB to `:infinity`**.
Unbounded is the right default for partial-clone / prefetch
workflows. Callers with real memory envelopes (long-running
daemons, low-memory deployments) set an explicit cap based on
their budget.
- **`:max_resolved_bytes` default raised from 500 MiB to 2 GiB**
(matches `:max_pack_bytes`). The old cap blocked real-world
monorepos; `anomalyco/opencode` resolves to 524 MB.
Plus two real optimizations:
- **Adler32 trailer probe** for pack zlib stream tracking.
Replaces an O(log N) binary-search with one linear scan +
one verify probe. `pack.parse` went from 127 ms → 49 ms on
pyex (2.6× faster). Saves several seconds on large packs.
- **Single-pass grep (`matches_in` rewrite)**. Previously
split every blob into lines via regex before matching; now
`Regex.scan` on whole blob + compute line numbers only for
matched files. 13× faster on the common case (repo with few
matches).
**Explicitly reverted:** initial parallel-grep implementation
using `Task.async_stream`. Measured 22× SLOWER on `cloudflare/agents`
— per-file spawn overhead (~50-100 µs) dominates the microsecond
regex work. Default stays sequential. Callers with substantial
per-file work opt in via `max_concurrency: :schedulers`.
Full benchmark methodology + per-fixture numbers in
[`docs/PERFORMANCE.md`](docs/PERFORMANCE.md). Benchmark harness
in `bench/review_bench.exs`.
### Production-readiness round
A follow-up audit after the staff-engineering review closed the
reviewer's "what I didn't look at" list:
- **Config RCE audit** — `Exgit.Config` is read-only data; no code
path executes values from it (no `core.sshCommand`,
`core.fsmonitor`, `core.hookspath`, `insteadOf`, `includeIf`
expansion). A new structural test
(`test/exgit/security/no_shell_exec_test.exs`) asserts `lib/`
contains zero `System.cmd` / `:os.cmd` / `Port.open` /
`Path.expand` / `Path.absname` calls; failure means someone
introduced a new execution path that needs review against the
threat model.
- **Pack.Writer concurrent-build stress** — 3 tests assert 100
parallel builds of identical input produce byte-identical
output, 100 parallel builds of distinct input round-trip
cleanly, and 1000 sequential builds don't leak zlib ports.
- **Decoder fuzz corpus** — 10 property tests, 500 cases each,
exercise `Blob.decode/1`, `Tree.decode/1`, `Commit.decode/1`,
`Tag.decode/1`, and `Pack.Reader.parse/2` on random bytes.
Every decoder's "never raises on untrusted input" promise is
now explicitly tested.
- **Config fuzz corpus** — 3 property tests, 500 cases each,
cover `Config.parse/1` on random bytes, section-header-like
noise, and roundtrip fixpoint. Includes RCE-shape regression
tests that parse `core.fsmonitor` / `core.sshCommand` /
`includeIf` values and assert they are stored verbatim (not
executed or expanded).
- **Walk cross-check vs real git** — `test/exgit/walk_real_git_test.exs`
constructs 5 DAG shapes (fork, criss-cross, linear, deep-fork,
octopus) with real git, then compares `Exgit.Walk.merge_base/2`
and `merge_base_all/2` against `git merge-base` and
`git merge-base --all`. Found and fixed a nondeterministic
LCA-pick bug (criss-cross merges).
### Fixed
- **`Walk.merge_base/2`** picked from the candidate `MapSet` with
`hd(MapSet.to_list(...))`, whose order depends on insertion
hashing. Multiple-LCA cases (criss-cross merges) returned
different SHAs on different runs. Now sorts candidates by
`{-timestamp, sha}` (newest first, SHA-ascending tiebreak) for
a deterministic pick. Documented divergence from git's exact
tiebreak (traversal-order-dependent) in the docstring.
### Added
- **`Walk.merge_base_all/2`** — returns every valid LCA, matching
`git merge-base --all`. Cross-checked against real git on 5 DAG
shapes.
- **`Diff.trees/4` bounds** — `:max_depth` (default 256),
`:max_changes` (default `nil`), and tree-cycle detection via
the descent-path `seen` set. Hostile trees can no longer
overflow the stack or loop forever during a diff.
- **`Index.parse/2` bounds** — `:max_entries` (default 1M),
`:max_bytes` (default 512 MiB), and SHA-1 checksum verification
(`:verify_checksum`, default `true`). Catches hostile indexes
claiming 4-billion entries, oversized inputs, and bit-rot.
### Changed — **breaking (pre-release API redesign)**
These changes were driven by an API audit after the staff-engineering
review round. Exgit has not yet cut an official release, so we're
taking the opportunity to land the right shapes before v0.1.
- **`Exgit.lazy_clone/2` removed.** Fold into `Exgit.clone/2` via
new options:
- `clone(url)` — full clone (eager; default behavior).
- `clone(url, lazy: true)` — refs only; objects fetched on demand.
Returns `%Repository{mode: :lazy}`.
- `clone(url, filter: {:blob, :none})` — partial clone; commits
and trees eager, blobs on demand.
- `clone(url, filter: ..., lazy: true)` — refs only; everything
on demand.
- `clone(url, path: "...", lazy: true)` — returns
`{:error, :disk_partial_clone_unsupported}` (explicit; no
silent `:path`-ignored footgun).
Matches `git clone`'s single-command mental model.
- **`%Exgit.Repository{}` gained `:mode` field** (`:eager | :lazy`).
Defaults to `:eager` in `Repository.new/3`. `clone(url, lazy: true)`
and `clone(url, filter: ...)` produce `:lazy`. `Repository.materialize/2`
flips `:lazy → :eager`. Streaming FS ops (`FS.walk/2`, `FS.grep/4`)
now pattern-match on `:eager` and raise on `:lazy` with a pointer
at `materialize/2` or `prefetch/3`. Callers of `FS.walk/2`/`FS.grep/4`
on lazy repos get a clear error message; the previous
`ArgumentError` checked struct-internal cache emptiness.
- **`FS.prefetch/3` with `blobs: true` flips `:mode` to `:eager`** on
a previously-lazy repo. After a full prefetch every reachable
object is resident, so streaming ops proceed without a second
conversion step. `blobs: false` (trees-only) leaves `:mode`
unchanged.
- **`Exgit.Transport.ls_refs/2` return shape changed** from
`{:ok, refs}` to `{:ok, refs, meta}`. `refs` is always a list of
`{ref_name, sha}` 2-tuples (the protocol spec never described any
other shape); `meta` is a map carrying protocol-v2 side-channel
data:
- `meta.head` — HEAD's symref target (e.g. `"refs/heads/main"`),
present when the server advertises it via the protocol-v2
`symrefs` argument.
- `meta.peeled` — `%{tag_ref => peeled_target_sha}`, populated
when the server emits `peeled:<sha>` attributes on annotated
tags.
`Exgit.Transport.File.ls_refs/2` surfaces `meta.head` by reading
the on-disk HEAD symref. Every user-defined Transport
implementation must update to the new 3-tuple return shape.
### Added
- **`Exgit.RefName`** — validation of git ref names at the transport
boundary. Ports `git check-ref-format` rules; emits
`[:exgit, :security, :ref_rejected]` telemetry on hostile names.
- **`Exgit.Filter`** — structured partial-clone filter specs
(`{:blob, :none}`, `{:blob, {:limit, n}}`, `{:tree, depth}`,
`{:raw, "spec"}`).
- **`Exgit.Repository.materialize/2`** — convert a Promisor-backed repo
into a plain `ObjectStore.Memory`-backed one in a single call.
- **`Exgit.Transport.HTTP.request_opts/5`** and **`.auth_headers_for/2`** —
exposed for test introspection; host-bound credential check is now
the single enforcement point.
- **`Exgit.Transport.HTTP.capabilities_cached/1`** — memoizing
capabilities accessor. Reduces HTTP round-trips in agent workflows
that issue many fetches against one transport (review #13).
- **`Exgit.Error`** — canonical error struct (`%Exgit.Error{code,
context, message}`). New error paths SHOULD use it; existing ad-hoc
shapes (`{:error, atom}`, `{:error, {atom, details}}`) are preserved
for SemVer. v1.0 may coalesce (review #18).
- **`Exgit.Credentials.bind_to/2`** — pipeline-friendly host-binding:
`Credentials.bearer(token) |> Credentials.bind_to("github.com")`
(review #44).
- **`Exgit.ObjectStore.Promisor.empty?/1`** — stable abstraction
replacing struct-peeking on `%Promisor{cache: %Memory{objects: _}}`
(review #17).
- **`Exgit.ObjectStore.Promisor.resolve_with_fetch/2`** — variant of
`resolve/2` that threads the grown promisor back on the
fetch-but-not-found path so the cache side-effect isn't wasted
(review #33).
- `:max_pack_bytes` (default 2 GiB), `:max_object_bytes` (default
100 MiB), and `:max_resolved_bytes` (default 500 MiB) options on
`Exgit.Pack.Reader.parse/2` bound memory on untrusted input
(review #11/#35).
- `:max_cache_bytes` option on `Exgit.ObjectStore.Promisor.new/2` —
enables FIFO-by-commit eviction so long-running agent loops don't
OOM (review #34).
- `:redirect` option on `Exgit.Transport.HTTP.new/2` — `false`
(default), `:same_origin`, or `:follow`. Host-bound credentials
enforce the cross-origin leak check regardless (review #14).
- Protocol v2 `symrefs` argument on `ls-refs` — `Exgit.clone/2` now
picks the server's actual HEAD target instead of guessing
`main`/`master`/first-advertised (review #9).
- `[:exgit, :security, :ref_rejected]`, `[:exgit, :ref_store,
:write_failed]`, and `[:exgit, :object_store, :haves_sent]`,
`[:exgit, :object_store, :cache_overfull]` telemetry events.
- Peeled-tag parsing in `packed-refs` (review #37). Peeled targets
are threaded through for a future fetch-negotiator; not yet
surfaced in `list_refs/2`.
- Dialyzer and Credo in CI (currently report-only; will gate in a
future release).
### Changed — **breaking**
- **`Exgit.FS.read_path/3`**, **`ls/3`**, **`stat/3`**, **`write_path/4`**
now return `{:ok, result, repo}` to support Promisor cache growth
across calls. Callers must thread the returned `repo` forward to
benefit from the populated cache.
- **`Exgit.Transport.HTTP.new/2`** automatically wraps bare auth tuples
(`{:basic, u, p}`, `{:bearer, t}`, etc.) in a host-bound
`%Exgit.Credentials{}`. Legacy callers are transparently protected
against cross-origin credential leaks. To opt out, wrap the tuple
with `Exgit.Credentials.unbound/1`.
- **`{:callback, fun}` auth** now receives the request URL as its sole
argument (was previously mis-called with zero arguments — crash on
first use).
- **`ObjectStore.Disk.import_objects/2`** returns
`{:error, {:partial_import, [{sha, reason}]}}` on any per-object
failure instead of crashing or silently succeeding.
- **`Exgit.FS.walk/2`** and **`.grep/4`** now raise `ArgumentError` if
called on a Promisor-backed repo whose cache is empty, pointing the
caller at `FS.prefetch/3` or `Repository.materialize/2`. Prefixes
no longer silently return empty results.
- HTTP requests explicitly set `redirect: false` on Req — no longer
depends on Req's default cross-origin auth-stripping behavior.
- **`Exgit.Transport.HTTP.ls_refs/2`** now returns a mix of 2-tuples
`{ref, sha}` and 3-tuples `{ref, sha, meta}` — the 3-tuple shape
carries protocol-v2 attributes like `symref-target` and `peeled`.
Consumers that care only about the `{ref, sha}` pair can use
`elem/2` or run through a tuple-shape-agnostic iteration.
- **`Tree.new/1`** accepts `:strict` option; when `true`, unknown
modes raise `ArgumentError` instead of being silently coerced
(review #10). Default behavior unchanged.
### Fixed
- Pack parser no longer raises `ArgumentError` / `MatchError` on
malformed input. Every decoder returns `{:error, _}`.
- `Pack.Delta.apply/2` validates copy offsets, insert lengths, and
the result-size cap — hostile deltas produce tagged errors.
- `Pack.Common.decode_type_size_varint/1` and
`decode_ofs_varint/1` return `{:error, :truncated}` on empty input
instead of crashing on `FunctionClauseError`.
- Loose-object parser validates the declared size against the
content length and rejects unknown object types with a structured
error.
- `Pack.Index` no longer generates descending `0..-1` ranges on empty
packs (removes Elixir 1.19 deprecation warning).
- **Commit.decode/1** and **Tag.decode/1** validate hex-header
values — a structurally-valid commit with non-hex `tree`/`parent`
bytes is rejected with `{:error, {:invalid_hex_header, name,
value}}` instead of crashing downstream accessors (review #23).
- **Tree.decode/1** validates every entry name against
path-traversal rules — rejects empty, `.`, `..`, any `/`, any NUL,
and case-insensitive `.git`/`.gitmodules` (review #2).
- **RefStore.Disk** validates ref names at every public entry
(`read_ref/2`, `resolve_ref/2`, `write_ref/4`, `delete_ref/2`) and
revalidates symbolic targets read from disk. Hostile targets
return `{:error, :invalid_ref_name}` with telemetry (review #1).
- **ObjectStore.Disk.get_object/2** wraps `:zlib.uncompress/1` in
`try/rescue`, returning `{:error, :zlib_error}` on corrupt/hostile
loose objects instead of raising (review #3).
- **Pack.Reader** zlib tracking uses `:zlib.safeInflate/2` +
`:zlib.inflateEnd/1` probes — no `:zlib.uncompress/1` calls on
hostile input; per-probe output is bounded by `safeInflate`'s
implementation-defined threshold (review #4).
- **Pack.Writer.deflate/1** wraps zlib calls in `try/after` so the
zlib port is freed even when `deflate/3` raises. Previously a
long-running server would slowly leak ports under memory pressure
(review #30).
- **Credentials.host_matches?/2** normalizes both pattern and URL
host: ASCII-case-folded, trailing-dot-stripped. `GITHUB.COM`,
`github.com.`, `GitHub.com.` all match a `"github.com"` binding.
Host-confusion attacks like `evil.comgithub.com` still correctly
fail to match (review #5).
- **Custom `Inspect` impl** for `%Exgit.Credentials{}` — default
Inspect would dump the raw token into crash logs (review #15).
- **Walk.merge_base/2** maintains `stale_in_queue` incrementally;
the early-termination check is now O(1) instead of O(Q) per
iteration. Merge-base on histories with hundreds of shared
ancestors is no longer O(Q²) (review #25).
- **Walk.parse_timestamp/1** uses a module-attribute regex compiled
once at load time instead of per-call (review #27).
- **Config** pre-compiles section-header regexes at module load
(review #29).
- **Config.parse/1** uses `case` instead of an unconditional match
on `parse_key_value/1`'s result — future branches that return
`{:error, _}` cannot crash the parser, matching the moduledoc's
"never raises on untrusted input" contract (review #28).
- **Pack.Reader** bounds `by_sha` + `resolved` memory via
`:max_resolved_bytes` so a pack of many small OFS_DELTA chains
can't balloon heap beyond the per-pack cap (review #11/#35).
- **ObjectStore.Disk** `pread_tail/3` size-probes the pack file and
reads the full object body instead of capping at 128 KiB. Objects
larger than 128 KiB in packs now decode correctly; previously
they silently returned truncated bodies (review #12).
- **Promisor.collect_commit_haves/1** uses a `:gb_trees` priority
queue keyed on recency instead of sorting the full commit map.
O(N log K) where N is the 256-cap, not O(K log K) per miss
(review #32).
- **`Exgit.clone/2`** picks the default branch from the server's
HEAD symref (via protocol-v2 `symrefs` on `ls-refs`) instead of
guessing from advertised refs (review #9).
- **`Exgit.lazy_clone/2`** emits `[:exgit, :ref_store,
:write_failed]` telemetry if a ref-store write fails during
initial seed, instead of silently dropping the ref (review #8).
- **`Exgit.push/3`** emits an empty-but-valid PACK header when
pushing a fast-forward that needs no new objects, matching git's
`send-pack` wire shape; pure-delete pushes still send no pack
(review #6).
- **RefStore.Disk.list_loose_refs/3** caps recursion depth at 16 and
refuses to follow symlinks, defending against symlink loops in
ref directories (review #36).
- **RefStore.Disk** parses peeled-tag lines in `packed-refs` instead
of silently dropping them (review #37).
- **FS.resolve_tree/2** accepts a ref that points directly at a
tree in both the string-ref and raw-SHA branches (review #40).
- **FS.resolve_tree/2** disambiguates 20-byte binary inputs: a
binary of all printable ASCII with non-hex characters is treated
as a ref name, not a SHA (review #41).
- **FS.compile_glob/1** returns a harmless always-false regex on
compilation failure instead of raising (review #20).
### Security
- **CVE-worthy**: remote-controlled ref names can no longer escape
the repo root via `Path.join`. `Exgit.RefName` validates every ref
at the wire perimeter; `RefStore.Disk` re-validates defense-in-depth.
- **CVE-worthy**: hostile trees containing path-traversal entry
names (`..`, `/foo`, `.git`) are rejected at `Tree.decode/1` —
they never reach FS operations or a future checkout.
- **CVE-worthy**: a malformed commit (structurally valid but with
non-hex `tree`/`parent` headers) previously DoS'd every operation
that called a Commit accessor (walk, diff, push, FS). Validation
moved into `decode/1`.
- **CVE-worthy**: credentials set via bare auth tuples are now
host-bound automatically. Cross-origin redirects cannot leak the
token regardless of Req's redirect behavior. Host matching is
ASCII-case-folded and trailing-dot-stripped.
- Pack parser bounded at 2 GiB pack / 100 MiB per-object /
500 MiB resolved-total by default; no hostile server response can
unbounded-allocate the BEAM heap.
- `%Exgit.Credentials{}` has a custom `Inspect` impl that redacts
auth values; crash logs, SASL reports, and IEx sessions do not
leak tokens.
- Loose-object zlib decompression is wrapped in `try/rescue`;
corrupt or tampered objects return tagged errors instead of
crashing.