Technology & Innovation
Linux's version control history before Git is really a story about scaling past what any tool was designed to handle. For years the process was manual: developers submitted patches, trusted lieutenants reviewed them, and Linus merged everything by hand. When that stopped working, the team turned to BitKeeper in 2002 because it was genuinely the only tool capable of handling Linux's scale: distributed, fast, and able to juggle hundreds of contributors without a central bottleneck.
The arrangement always had an uneasy feel to it. The open-source community, Richard Stallman especially, hated that the world's flagship free software project depended on proprietary tooling. When Andrew Tridgell, the Samba developer, reverse-engineered BitKeeper's protocol in 2005, the company revoked its free license for Linux developers. It wasn't entirely surprising. It was still a crisis.
The problem wasn't just losing one tool. CVS and Subversion existed, but they were centralized and sluggish, built for a different era of development where one server handled everything. Linux needed something distributed, fast, and honest about data integrity. Nothing on the market fit the bill.
Linus's design choices for Git weren't theoretical; they came directly from watching Linux kernel development break under the wrong assumptions.
The most fundamental decision was making it fully distributed. Every developer gets a complete copy of the repository, full history included. No central server required for day-to-day work. This wasn't idealism; it was pragmatism. A global contributor base working across time zones can't depend on a central server being available when they need to commit.
Most version control systems track changes as diffs, the difference between one file state and the next. Git tracks snapshots of the entire project at each commit. It sounds wasteful, but it makes branching and switching between states near-instant because there's no chain of patches to reconstruct. Git uses compression aggressively enough that the storage cost is rarely a real concern.
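A toy illustration of the difference (hypothetical commit data, not Git's real object layout): with snapshots, materializing any commit is a direct lookup, while a diff-based store has to replay every patch since the beginning.

```python
# Hypothetical data: each commit maps directly to a full project state.
snapshots = {
    "c1": {"README.md": "v1"},
    "c2": {"README.md": "v1", "main.py": "v1"},
    "c3": {"README.md": "v2", "main.py": "v1"},
}

# A diff-based store keeps only per-commit changes...
diffs = [("c1", {"README.md": "v1"}),
         ("c2", {"main.py": "v1"}),
         ("c3", {"README.md": "v2"})]

# ...so reconstructing c3 means replaying the whole chain.
state = {}
for _, patch in diffs:               # O(history length)
    state.update(patch)

assert state == snapshots["c3"]      # same result, very different cost
```

The snapshot lookup stays constant-time no matter how long the history grows, which is what makes switching branches feel instant.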
Every file and commit gets identified by a SHA-1 hash of its content. This isn't just clever: it means any corruption in the repository is immediately detectable, because the hash changes if anything does. The repository becomes a giant key-value store where identical content always maps to the same key, enabling deduplication without any extra logic.
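Git's blob hashing is simple enough to reproduce with the standard library alone: the object type and size are prefixed to the content before hashing, and identical content always yields the same object ID.

```python
import hashlib

def git_blob_hash(content: bytes) -> str:
    # Git hashes "blob <size>\0" + content, not the raw bytes alone.
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()

# Matches what `git hash-object --stdin` prints for the same input:
print(git_blob_hash(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a

# Deduplication falls out for free: identical content, identical key.
assert git_blob_hash(b"hello\n") == git_blob_hash(b"hello\n")
```

Two files with the same bytes, anywhere in the repository's history, occupy one slot in the object store.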
The staging area was a genuinely novel idea. Rather than committing whatever changed in the working directory, Git lets you carefully select exactly what goes into each commit. This sounds like a minor convenience; in practice, it's what makes clean, reviewable commits possible even when you've been hacking messily across five files at once.
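A toy model of the three states involved (working tree, index, HEAD), with hypothetical file contents, shows why a commit can be cleaner than the working directory it came from:

```python
# Hypothetical contents; the point is the flow, not the data.
head    = {"app.py": "v1", "notes.md": "v1"}   # last commit
working = {"app.py": "v2", "notes.md": "wip"}  # both files edited
index   = dict(head)                           # staging area starts at HEAD

index["app.py"] = working["app.py"]  # `git add app.py` stages one file
commit = dict(index)                 # `git commit` snapshots the index

# The messy notes.md edit stays out of the commit entirely.
assert commit == {"app.py": "v2", "notes.md": "v1"}
```

Because the commit is built from the index rather than the working tree, you can carve one coherent change out of a messy session.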
Then there are branches. In Git, a branch is just a named pointer to a commit, so creating one is effectively free. This single decision changed how developers think about their workflow. Feature branches, experiment branches, throwaway branches: none of it carries the weight it did in Subversion, where branching was an expensive, significant act.
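A sketch of why branch creation costs nothing: on disk, a branch is a tiny file under `.git/refs/heads/` holding a commit hash, which the dict below stands in for (the hashes and messages are made up).

```python
# Toy object store: two commits, the second pointing at the first.
commits = {
    "a1b2c3": {"parent": None,     "message": "initial"},
    "d4e5f6": {"parent": "a1b2c3", "message": "add scaffolding"},
}

# Branches are just name -> commit-hash entries; nothing is copied.
refs = {"main": "d4e5f6"}
refs["experiment"] = refs["main"]      # `git branch experiment`: O(1)
assert refs["experiment"] == refs["main"]

# Moving a branch rewrites one pointer without touching any content.
refs["main"] = "a1b2c3"                # e.g. `git reset --hard HEAD~1`
assert refs["experiment"] == "d4e5f6"  # other branches are unaffected
```

Compare Subversion, where a branch was a copied directory tree; here it is one forty-character pointer.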
Linus built all of this optimizing for speed first. The ergonomics were secondary. That decision would cause pain later, but it's also why Git could handle Linux kernel scale from day one.
Git solved Linux's immediate problem within months. But the ripple effects went much further than anyone anticipated.
GitHub launched in 2008 and added a social layer on top of Git's mechanics: pull requests, forks, issues, discussions. That combination turned open-source contribution from a somewhat arcane patch-submission workflow into something closer to a social platform. Today GitHub hosts over 400 million repositories, but the numbers matter less than the behavioral shift: forking a project and proposing changes became something a developer could do in minutes, not hours.
Open-source software scaled accordingly. The JavaScript, Python, Rust, and Go ecosystems all assume Git-based workflows in their package registries and contribution models. Hugging Face, essentially the GitHub of AI models, runs on the same primitives. The infrastructure of modern software development, from npm to PyPI to the entire DevOps toolchain, is built on the assumption that Git is there underneath everything.
None of this means Git is perfect. It just means its problems are now everyone's problems.
The learning curve is genuinely steep. Rebase, reflog, the staging index, detached HEAD states: these concepts are not obvious, and the error messages rarely help. A cottage industry of GUI wrappers and explainer articles has grown up to paper over this, but the underlying complexity hasn't gone away.
Monorepos at extreme scale expose Git's limits. Google's internal codebase runs to roughly two billion lines of code, and Google built its own system (Piper) rather than fight Git's assumptions. Microsoft had similar problems with the Windows codebase and built VFS for Git as a workaround. These aren't edge cases; they're what happens when an organization's codebase grows for a decade under one roof.
SHA-1, the hashing algorithm underpinning Git's integrity guarantees, was demonstrated to be vulnerable to collision attacks in 2017. Git 2.29 added experimental SHA-256 support, but migrating existing repositories is messy and adoption has been slow.
Git's core design is essentially frozen: the object model hasn't changed meaningfully in twenty years, which is both a sign of how right the original design was and of how cautious the project is about breaking compatibility. The action is at the edges: Git LFS for large files, partial clones to avoid downloading entire histories, worktrees for parallel development across branches.
The more interesting shifts are happening at the platform layer. GitHub Copilot is already suggesting code mid-PR. Automated dependency updates, security scanning, AI-assisted conflict resolution, the platforms that sit on top of Git are evolving fast even if Git itself isn't.