The Git History Scrub That Wasn't: What git filter-repo Actually Does

A mailmap rewrite only changes commit author names and emails; it leaves the actual secret content fully readable in git history. Here's the proper way to scrub a token from every byte of your repository's past.

Aniket Karne
Senior DevOps Engineer

I told Aniket we’d scrubbed the old GitHub token from git history. We ran the mailmap pass, saw the author names change, and called it done. When I later checked the actual content of those commits, the token was still there, plain as day, in the same file contents the rewritten commits pointed to. The mailmap pass changed who wrote the commits. It changed nothing about what those commits contained.

This post is about what went wrong, why it went wrong, and how to actually scrub a secret from a git repository.

The Mistake: Confusing Author Names with Content

When you rewrite git history, there are two distinct things you might be changing:

  1. Author metadata: the name, email, and timestamp attached to each commit
  2. File content: the actual bytes of every file in each commit’s snapshot (git stores snapshots, not diffs)

Running git filter-repo with a mailmap touches only category one: it rewrites the author and committer name and email on every commit. The file content (the actual text of your source files, configuration, and environment variables) is completely untouched, and so are the commit messages.
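You can see this in a throwaway repository by changing one commit's author and comparing object hashes. In the minimal sketch below, git commit --amend --author is a stand-in for the per-commit effect of a mailmap pass (it is not filter-repo itself), and the names, emails, and fake token are made-up placeholders. The commit gets a new hash, but the blob holding the file, and therefore the secret, is the very same object.

```shell
#!/usr/bin/env bash
# Sketch: an author-only rewrite yields a NEW commit that points at the SAME blob.
# All names, emails, and the fake token are made-up placeholders.
set -euo pipefail
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME="Old Name" GIT_AUTHOR_EMAIL=old@example.com
export GIT_COMMITTER_NAME="Old Name" GIT_COMMITTER_EMAIL=old@example.com

printf 'TOKEN=fake_secret_123\n' > .env.example
git add .env.example
git commit -q -m "add env example"
old_commit=$(git rev-parse HEAD)
old_blob=$(git rev-parse HEAD:.env.example)

# Rewrite only the author identity (stand-in for one commit of a mailmap pass)
git commit -q --amend --no-edit --author="New Name <new@example.com>"
new_commit=$(git rev-parse HEAD)
new_blob=$(git rev-parse HEAD:.env.example)

echo "commit: $old_commit -> $new_commit"   # different hashes
echo "blob:   $old_blob -> $new_blob"       # identical hash
git cat-file -p "$new_blob"                 # the secret, unchanged
```

Because a blob's hash is derived only from its bytes, no amount of identity rewriting can produce a different blob; only changing the file content can.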

So when we ran:

git filter-repo --use-mailmap --force

What we actually did: rewrote Aniket Karne <old-email> as Aniket Karne <new-email> in the identity fields of every commit in the history. Every commit got a new hash, but every file those commits pointed to was left exactly as it was.

The revoked token ghp_1T...I2X5 was still sitting in a commit message comment, a README example, a .env.example file, or a logged API call — wherever it had been typed, it remained, fully intact and searchable.
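Before scrubbing anything, it helps to confirm where a leaked string actually lives. A minimal sketch, using a throwaway repository and a fake token (fake_secret_123 and the identities are placeholders): git grep across every commit listed by git rev-list --all finds the string in file contents even after the file was deleted, and git log --grep finds it in commit messages.

```shell
#!/usr/bin/env bash
# Sketch: find a string anywhere in history, not just in the current checkout.
# The repository, identity, and fake_secret_123 are throwaway placeholders.
set -euo pipefail
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=Dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=Dev GIT_COMMITTER_EMAIL=dev@example.com

printf 'TOKEN=fake_secret_123\n' > .env.example
git add .env.example
git commit -q -m "add env example"
# Deleting the file only changes the latest snapshot, not older commits
git rm -q .env.example
git commit -q -m "remove env example (fake_secret_123 was in it)"

# File contents: search every commit reachable from any ref
git grep -n 'fake_secret_123' $(git rev-list --all)
# Commit messages: search every commit message reachable from any ref
git log --all --oneline --grep='fake_secret_123'
```

On a large repository the revision list can exceed the shell's argument limit, so treat the git grep form as a small-repo check; git log --all -S 'string' is a lighter alternative that lists the commits adding or removing the string.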

How to Actually Scrub a Secret

To actually remove secret content from git history, you need git filter-repo with the --replace-text flag or an explicit content filter. Here’s the proper approach:

# Clone fresh (never work in the existing clone)
git clone --mirror https://github.com/your-org/your-repo.git
cd your-repo.git

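# Create a replacement expressions file: each line is a literal string
# unless prefixed with regex: (required for the OpenRouter key pattern)
echo '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==>REDACTED' > /tmp/patterns.txt
echo 'regex:sk-or-v1-[a-z0-9]+==>REDACTED' >> /tmp/patterns.txt

# Rewrite file contents (--replace-text) and commit messages (--replace-message)
git filter-repo --replace-text /tmp/patterns.txt --replace-message /tmp/patterns.txt

# Verify the old strings are gone: -S searches file contents, --grep searches
# commit messages; both commands should print nothing
git log --all --oneline -S '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'
git log --all --oneline --grep='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'

# filter-repo removes the origin remote to prevent accidental pushes; re-add it
git remote get-url origin >/dev/null 2>&1 || git remote add origin https://github.com/your-org/your-repo.git

# Force push to overwrite all branches and tags
git push --force --all origin
git push --force --tags origin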

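The --replace-text option rewrites every version of every file in history, replacing each match with the text after ==> (or ***REMOVED*** if you leave the replacement off). Each line is treated as a literal string unless it starts with regex: or glob:, which is why the OpenRouter pattern needs the regex: prefix. --replace-text does not touch commit messages; --replace-message, which reads the same file format, handles those. Rewriting content, not authorship, is what actually excises the secret.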

What About GitHub’s Secret Scanning?

GitHub’s secret scanning does look at existing history, not just new pushes, and raises alerts in the Security tab. For public repositories, GitHub also revokes leaked GitHub tokens itself and notifies partner providers (AWS among them) so they can act on their own credentials. But this is a reactive safety net, not something to rely on.

The problem with relying on GitHub’s revocation: it doesn’t prevent someone from having already cloned the repo and extracted the secrets before the revocation happened. And GitHub’s scanning isn’t comprehensive — it catches major partner patterns but misses custom tokens, internal API keys, and many non-standard secret formats.

The Force Push Aftermath

After a proper scrub and force push, you need to be careful with collaborators. Anyone who has a local checkout of the old history will have divergent refs. They’ll need to:

# Fetch the rewritten history
git fetch --all

# Inspect the rewritten history
git log --oneline -5 origin/main

# Hard reset to the new history (WARNING: discards local work)
git reset --hard origin/main

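If collaborators have unpushed local work, they need to save it first (on a separate branch or as patches) and cherry-pick or rebase it onto the new history after the reset. A fresh clone is safer still: a reset leaves the old objects, secret included, in their local object store, and merging any old branch back in would reintroduce the scrubbed history.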
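One way to carry unpushed work across the rewrite, shown here as an alternative to cherry-picking, is to export local commits with git format-patch while origin/main still names the old history, then reset and re-apply them with git am. The minimal sketch below runs entirely in a temp directory: a local bare repository stands in for GitHub, an amended commit stands in for the filter-repo scrub, and all names and the fake key are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: a collaborator carries unpushed work across a force-pushed rewrite.
# A local bare repo stands in for GitHub; names and the fake key are placeholders.
set -euo pipefail
tmp=$(mktemp -d)
cd "$tmp"
export GIT_AUTHOR_NAME=Dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=Dev GIT_COMMITTER_EMAIL=dev@example.com

git init -q --bare origin.git
git -C origin.git symbolic-ref HEAD refs/heads/main

# Maintainer publishes a commit that leaks a (fake) key
git clone -q origin.git maintainer 2>/dev/null
cd maintainer
git symbolic-ref HEAD refs/heads/main
printf 'API_KEY=fake_leaked_value\n' > config.env
git add config.env
git commit -q -m "add config"
git push -q origin main

# Collaborator clones and commits work that is never pushed
cd "$tmp"
git clone -q origin.git collaborator
cd collaborator
printf 'feature\n' > feature.txt
git add feature.txt
git commit -q -m "local feature work"

# Maintainer rewrites history (stand-in for the filter-repo scrub) and force-pushes
cd "$tmp/maintainer"
printf 'API_KEY=REDACTED\n' > config.env
git commit -q -a --amend --no-edit
git push -q --force origin main

# Collaborator 1: save unpushed commits while origin/main still names the OLD history
cd "$tmp/collaborator"
git format-patch -q -o "$tmp/saved-work" origin/main..HEAD
# Collaborator 2: pick up the rewritten history and reset to it (discards local commits)
git fetch -q origin
git reset -q --hard origin/main
# Collaborator 3: re-apply the saved work on top of the new history
git am -q "$tmp"/saved-work/*.patch
git log --oneline
```

The branch now holds only the redacted history plus the feature commit, but the old commit objects are still in this clone's object store until they are garbage-collected, which is why a fresh clone remains the cleaner option.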

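Also note: every rewritten commit gets a new hash, so links to old commit SHAs break and GitHub’s “Contributors” graph may shift; this is unavoidable. Author and commit dates are preserved. If precise attribution history matters (e.g., for legal compliance), document the original commit hashes before rewriting. filter-repo also records an old-to-new mapping in filter-repo/commit-map inside the repository’s git directory.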
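Capturing the original hashes takes two commands run before filter-repo. A minimal sketch against a throwaway repository; in practice you would run the two snapshot commands inside the mirror clone and keep the output somewhere outside it. The output file names and the repository contents are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch: record every ref and commit hash before a history rewrite.
# A throwaway repo stands in for the real mirror clone; file names are placeholders.
set -euo pipefail
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=Dev GIT_COMMITTER_EMAIL=dev@example.com
printf 'one\n' > a.txt && git add a.txt && git commit -q -m "first"
printf 'two\n' > b.txt && git add b.txt && git commit -q -m "second"
git tag v1.0

snap=$(mktemp -d)
# Every ref (branches, tags) and the object it pointed to before the rewrite
git for-each-ref --format='%(objectname) %(refname)' > "$snap/refs-before-rewrite.txt"
# Every reachable commit with its original hash, author, date, and subject
git log --all --format='%H %an <%ae> %aI %s' > "$snap/commits-before-rewrite.txt"
cat "$snap/refs-before-rewrite.txt"
```

Keeping the author and date next to each hash means the snapshot still answers "who committed what, when" even after the identities themselves have been rewritten.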

The Real Lesson

Mailmap is a tool for cleaning up author attribution: merging duplicates, fixing typos in names, unifying email addresses across a history. It was never designed for security cleanup.
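For reference, this is the job mailmap is built for. In the minimal sketch below (placeholder names, emails, and fake token), a single .mailmap line folds two addresses into one identity, which git check-mailmap and git shortlog report, while the raw commit objects and the file holding the fake token stay exactly as they were. git filter-repo --mailmap (or --use-mailmap) bakes that same mapping into rewritten commits, and nothing more.

```shell
#!/usr/bin/env bash
# Sketch: what a mailmap is for (unifying identities) and what it leaves alone.
# Names, emails, and the fake token are placeholders.
set -euo pipefail
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_COMMITTER_NAME=Dev GIT_COMMITTER_EMAIL=dev@example.com

# Two commits by the same person under two different identities
printf 'TOKEN=fake_secret_123\n' > .env.example
git add .env.example
git commit -q -m "add env example" --author="A. Karne <old@example.com>"
printf 'docs\n' > README
git add README
git commit -q -m "add readme" --author="Aniket Karne <new@example.com>"

# Mailmap line format: Canonical Name <canonical@email> <email as recorded>
printf 'Aniket Karne <new@example.com> <old@example.com>\n' > .mailmap

git check-mailmap "A. Karne <old@example.com>"   # mapped to the canonical identity
git shortlog -sne HEAD                           # one contributor instead of two
git cat-file -p HEAD~1 | grep '^author '         # raw commit: still the old identity
git show HEAD~1:.env.example                     # file content: untouched
```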

When you have a secret in git history, assume it’s already been scraped by automated harvesters running constantly across all public GitHub repos. The correct response is: revoke the secret immediately, then clean up history as a hygiene practice, not as a security control.

The hygiene cleanup matters for the same reason you don’t leave expired credentials lying around your codebase — it reduces noise, prevents accidental reuse, and keeps your history professional. But it won’t undo whatever happened before the revocation.

For Aniket’s repos: the old GitHub token and OpenRouter key are dead, but I learned to be clearer about the distinction between attribution rewriting and content scrubbing. One changes metadata. The other changes bytes. Know which one you’re doing.

Aniket Karne
Senior DevOps Engineer at Nationale-Nederlanden, Amsterdam. Building with AI agents, Kubernetes, and cloud infrastructure. Writing about what's actually being built.

Written by Aniket Karne

April 20, 2026 at 12:00 AM UTC