Git: please stop squash merging!
Incomplete translation
Intro
It says “19 minute read”.
If you’re thinking “it’s easy, I’ll keep squashing so I can save those 19 minutes of my life”, well, I must say that this is not an excuse: you can stop squashing right now without reading this. But if you still think that squash merging is a good idea, please keep reading. You probably won’t want to squash your commits anymore after reading this, except in some very specific cases (and those cases are probably not what you’re expecting).
This is long and dense, so, no more intro. Here we go!
Goals
- Explain what squash merges actually are, detailing how Git works under the hood;
- Show how the most well-known “pros” of squash merging are conceptually wrong;
- Show situations in which squash merges can be harmful;
- Show what you can (and should) do instead of using squash merge;
Non-goals
- Tell you something that works for me and dictate that it will also work for you;
- Create another “good practice guide” for people to follow without questioning whether it is true. Be skeptical;
Why do you use squash merge?
If you consider squash merging a good practice, have you ever thought about why it is a good practice? Or are you only repeating something that you have heard?
Be honest.
In the past, I also thought it was cool, but as I studied Git a little deeper, I soon figured out that it doesn’t make sense. Not only that, but it is never mentioned as a good practice in the Git documentation, not even as a practice:
- Pro Git (the official Git book) has two chapters and an appendix about merge, and it barely mentions squash in the merge context (it only mentions it here and here, using squash only as an intermediate tool);
- Apart from the git merge manpage and Pro Git, the official Git documentation only mentions squash in gitfaq, focusing on the problems of doing that;
- Last but not least, Git itself doesn’t have a single “squash and merge” command: you have to run two commands to perform it (we’ll discuss that later).
So, before we go deeper, I need to ask you two things:
First, what do you think a squash merge is?
(go ahead, write it, it’s only a textarea without any hidden JavaScript)
Second, why do you think you should do it, given your first answer?
Ok, thanks! We’ll come back here later. Now, before I tell you why you shouldn’t squash your commits (and before you squash your commits again), we need to be clear about what a squash actually is. And I must say that most people who squash do it because they don’t know what a squash merge is. And, again, I must say this is because they have two major misconceptions about Git.
That’s where we start.
The first misconception: what a commit stores
I strongly recommend that you read Pro Git, the official Git book. For our purposes here, section 1.3 and chapter 10 will be enough. But for now, what we need to know from it is that Git commits don’t store changes. They store snapshots.
Repeat after me: “Git commits don’t store changes, they store snapshots”.
And that’s the first misconception. When we’re using commands such as git show, git diff, or external tools such as the Pull Request UI on GitHub, we think that commits are sets of changes between states. But no, each commit is a state plus some metadata.
Under the hood, Git stores the repository data in so-called objects. Each object stores data and is identified by its SHA-1 hash. Roughly speaking, we have three main types of objects:
- Blob: contains file contents;
- Tree: represents a directory. It is a list of files or directories with their names, the hashes of their contents (a blob if it is a file or a tree if it is a directory) and their file permissions;
- Commit: contains the author, the committer, their timestamps, the message, the parents and the snapshot.
It is important to say that a child commit points to its parent or parents, but a parent commit doesn’t point to its children. The first commit doesn’t have any parent, and a merge commit has two or more parents. In the case of merge commits, the parent order matters.
The snapshot of a commit is just the tree object that represents the root directory of the commit when it was created.
Perhaps you are thinking: “that is not possible, that would be a waste of disk space”. But that is not true: a file that hasn’t changed between two commits has the same content in both snapshots, so its content is stored in the same blob. This is also true for directories (trees).
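If you want to see this for yourself, here is a minimal shell sketch. It assumes a POSIX shell and an installed Git, works in a throwaway repository created with mktemp -d, and uses made-up file names and a placeholder identity. It creates two commits in which only one file changes, then inspects them with git cat-file, git ls-tree and git rev-parse.

```shell
set -e
# Throwaway repository in a temporary directory
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# First commit: two files
echo "hello" > unchanged.txt
echo "v1" > changed.txt
git add unchanged.txt changed.txt
git commit -q -m "first"

# Second commit: only changed.txt is modified
echo "v2" > changed.txt
git commit -q -am "second"

# The commit object holds a tree hash, the parent(s) and metadata, not a diff
git cat-file -p HEAD

# Each commit's tree lists every file, not only the modified one
git ls-tree HEAD~1
git ls-tree HEAD

# The unchanged file resolves to the very same blob in both snapshots
git rev-parse HEAD~1:unchanged.txt
git rev-parse HEAD:unchanged.txt
```

The two git rev-parse lines at the end should print the same hash, while the blobs for changed.txt differ between the two commits.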
And if you’re thinking “this is not true, I saw something about deltas, those are the commit changes”, yeah, those deltas are changes, but they are not changes between commits. They are related to a compression that Git does using something called packfiles, but you don’t need to know about them for now. Just keep in mind that Git compresses the objects, so disk space is not something you should worry about too much.
Just one more thing: branches are not objects. They are references, that is, just pointers to commits. So there’s no such thing as “a commit is in a branch”, as a branch is not a set of commits as we may think. But it is very common to say that, even though it is not 100% correct. It is just a shorter way to say “the commit is the one pointed to by the branch or one of its ancestors”.
From now on, I’ll sometimes refer to the commit pointed to by a branch as just “the branch”.
I discussed objects a little more here, but again, I strongly suggest that you read about them in Pro Git for more detailed info.
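A small sketch of the “branches are just pointers” idea, under the same assumptions (POSIX shell, Git installed, throwaway repository; the branch name other and the identity are hypothetical). git update-ref creates a branch by writing a commit hash into a ref, and git merge-base --is-ancestor answers the “is this commit in that branch?” question.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git symbolic-ref HEAD refs/heads/main   # make sure the initial branch is called main

echo one > file.txt
git add file.txt
git commit -q -m "first"
first=$(git rev-parse HEAD)
echo two > file.txt
git commit -q -am "second"

# A branch resolves to exactly one commit hash
git rev-parse main
# HEAD, in turn, is usually a symbolic reference to a branch
git symbolic-ref HEAD

# Creating a branch is just writing a hash into a reference (no new objects)
git update-ref refs/heads/other "$first"
git rev-parse other

# "first is in main" really means "first is reachable from the commit main points to"
if git merge-base --is-ancestor "$first" main; then
  echo "first is an ancestor of main"
fi
# The same question asked the other way around: which branches contain first?
git branch --contains "$first"
```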
How merge works
Now that we know what a commit and a branch are, we can proceed to see how merge works.
In Git, by “merge” we can mean at least three things:
- The command git merge
- A merge commit (i.e. a commit with two or more parents)
- The mechanism used by git merge to join the contents of both branches
When you run git merge <another branch>, you tell Git to join the contents of both branches and create a merge commit. The merge commit will have the commit pointed to by the current HEAD as its first parent and the commit pointed to by the merged branch as its second parent.
When the commit of the current branch is a direct ancestor of the commit of the merged branch, Git will perform a fast-forward. Fast-forwards don’t create merge commits: they only make the current branch point to the same commit as the merged branch, so they are not true merges. You can change this behaviour by using --no-ff. GitHub also provides a feature for merging PRs without fast-forwarding them in those cases. Personally, I don’t like fast-forwards, so I use --no-ff when possible.
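The difference is easy to observe. The sketch below (POSIX shell and Git assumed, throwaway repository, hypothetical branch names) merges the same topic branch twice: once with --ff-only, which only moves the branch pointer, and once with --no-ff, which creates a commit with two parents.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git symbolic-ref HEAD refs/heads/main

echo base > file.txt
git add file.txt
git commit -q -m "base"

# topic is one commit ahead of main, so main is a direct ancestor of topic
git checkout -q -b topic
echo feature > feature.txt
git add feature.txt
git commit -q -m "add feature"

# Fast-forward on a copy of main: the pointer moves, no new commit is created
git checkout -q -b ff-copy main
git merge -q --ff-only topic
echo "ff-copy: $(git rev-parse ff-copy)"
echo "topic:   $(git rev-parse topic)"

# --no-ff on another copy of main: a real merge commit with two parents
git checkout -q -b noff-copy main
git merge -q --no-ff --no-edit topic
git show -s --format='parents of noff-copy: %P' noff-copy
```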
So, by now we know what 1 (git merge) and 2 (merge commit) are, but what about 3? What is the mechanism that Git uses to join the files of both branches? Well, Git performs an algorithm called three-way merge, which uses the snapshots of three commits to figure out what should be kept in the merge commit and what should not:
- the commit pointed to by the current branch (named A);
- the commit pointed to by the branch that we want to merge (named B);
- the most recent common ancestor of A and B, also called the merge base (named O).
When performing a three-way merge, Git will keep everything that is the same in A and B. If there’s something different, it will compare O with A, and O with B. If something didn’t change from O to A but changed from O to B, it keeps the change introduced in B, and vice versa. If O, A and B all differ, then we have a conflict and you need to fix it manually.
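A sketch of this rule at the file level, assuming a POSIX shell and Git (throwaway repository, hypothetical file and branch names): git merge-base finds O, git diff --name-only shows what changed from O to each side, and since each file changed on only one side, the merge keeps both changes without conflicts.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git symbolic-ref HEAD refs/heads/main

printf 'x\n' > a.txt
printf 'y\n' > b.txt
git add a.txt b.txt
git commit -q -m "O: common ancestor"

# B: the topic branch changes only b.txt
git checkout -q -b topic
printf 'y changed on topic\n' > b.txt
git commit -q -am "B: change b.txt"

# A: main changes only a.txt
git checkout -q main
printf 'x changed on main\n' > a.txt
git commit -q -am "A: change a.txt"

# O is the merge base of A (main) and B (topic)
O=$(git merge-base main topic)
git log -1 --format=%s "$O"

# What changed from O to each side
git diff --name-only "$O" main    # a.txt
git diff --name-only "$O" topic   # b.txt

# a.txt is taken from A, b.txt is taken from B
git merge -q --no-edit topic
cat a.txt b.txt
```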
This picture shows how the three-way merge works. Different colors mean different contents of a file:
Note: for plain-text files, if there’s a conflict, Git will perform a three-way merge comparing the lines, so it can merge changes within a file that don’t overlap. If there’s an overlap, then it inserts conflict markers.
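git merge-file runs this line-level three-way merge directly on three plain files, which makes it convenient for experimenting. A minimal sketch, assuming a POSIX shell and Git installed (the files and their contents are made up): the first merge touches different lines and succeeds; the second changes the same line on both sides and produces conflict markers.

```shell
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q

# O: the common ancestor version of the file
printf 'line 1\nline 2\nline 3\nline 4\nline 5\n' > base.txt
# A: our side changes the first line
printf 'line 1 (ours)\nline 2\nline 3\nline 4\nline 5\n' > ours.txt
# B: their side changes the last line
printf 'line 1\nline 2\nline 3\nline 4\nline 5 (theirs)\n' > theirs.txt

# Usage: git merge-file <current> <base> <other>; -p prints instead of overwriting
git merge-file -p ours.txt base.txt theirs.txt > merged.txt
cat merged.txt

# Both sides change line 3 differently: overlapping changes, so a conflict
printf 'line 1\nline 2\nline 3 (ours)\nline 4\nline 5\n' > ours2.txt
printf 'line 1\nline 2\nline 3 (theirs)\nline 4\nline 5\n' > theirs2.txt
# merge-file exits with the number of conflicts, so keep set -e from stopping here
conflicts=0
git merge-file -p ours2.txt base.txt theirs2.txt > merged2.txt || conflicts=$?
echo "conflicts: $conflicts"
cat merged2.txt
```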
This mechanism is also used by other commands, such as cherry-pick, revert, rebase and stash, but this is a topic for another post.
This is a really brief explanation of merge, and there’s a lot more to talk about (e.g. how Git handles renames, file permissions, submodules and so on). I discussed merge a little more here, but for now this is enough.
As an exercise: try to imagine how hard and expensive merging would be if Git stored changes instead of snapshots.
What squash merge actually is
By now, we know what a merge is. But what about a squash merge? I just googled “why squash merge?” and found these definitions of what it is:
- “Instead of each commit on the topic branch being added to the history of the default branch, a squash merge adds all the file changes to a single new commit on the default branch”
- “It turns all your changes into one single commit and then adds that to the main branch”
- “By squashing commits, developers can condense a series of small, incremental changes into a single meaningful change”
- “Squashing retains the changes but discards all the individual commits”
Well, time to revisit your answer. Click here to see it. Is your answer similar to those? I ask you that because there’s something that all those answers have in common: they are all wrong!
And they are wrong because, remember: Git commits don’t store changes, they store snapshots.
Again: Git commits don’t store changes, they store snapshots.
Now, an exercise: try to rewrite those answers (or your answer, if it has the same idea), replacing the wrong idea of commits storing changes with the correct idea of commits storing snapshots. Here’s another textarea, just for your convenience:
Was it hard? Well, this is because that idea of what a squash merge is comes from a wrong idea of what Git is, and it doesn’t make any sense in practice.
I bet that if you regularly squash merge, you do it through the UI of GitHub or similar, using this button, and you have never done it locally on your machine through the CLI:
And I say that because, as I said before, there’s no command for squash merging in Git! In fact, to do the same thing as the “squash and merge” button on GitHub, you’ll need two commands:
git merge --squash another_branch
git commit
Several useful Git commands that are now a single command were originally a set of commands, such as git stash, git subtree and so on, but this one never managed (at least so far) to become a single command. That should raise an alert that something is out of place here.
Let’s check what happens when we perform a squash merge locally:
- The first command is the squash itself. It merges the other branch into the current one, but it doesn’t create a merge commit. Instead, it only merges the files (using the mechanism that we discussed before) in the filesystem (in Git jargon, the working directory) and stages them just like git add does (in Git jargon, it adds them to the index);
- The second command is just a standard git commit. When we run git commit, we create a new commit from the state of the index (the staging area), which contains the merged files and the untouched ones. The previous step creates a commit message containing a list of the messages of the squashed commits, and it is used as a template here.
A commit whose contents are a merge of the two merged branches. Hmmm, sounds familiar? Isn’t it a merge commit?
Oh wait, isn’t it a merge commit????
Are squash merge and standard merge the same??
WHAT???
Please, calm down. They are different. How? Remember that a merge commit has two or more parents? That’s the difference. This commit has only one parent: the commit pointed to by the current branch before merging. This way, there’s no reference to the other branch, and that’s the only difference between the squash merge commit and the true merge commit.
Don’t believe me? What else could change?
- The author, committer, timestamps or message? These are only metadata;
- The snapshot? No, it is the same, as it was generated by the same mechanism.
The only thing left is the parents. A squash merge is the same as a true merge, but with one piece of information missing: the reference to the merged branch.
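This claim can be checked directly. The sketch below (POSIX shell and Git assumed; throwaway repository; the branches true-merge and squash-merge are hypothetical names) merges the same diverged topic branch in both ways, starting from the same commit. Both results should point to the very same tree object, and only the parent lists differ.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git symbolic-ref HEAD refs/heads/main

echo base > file.txt
git add file.txt
git commit -q -m "base"

# Topic branch with two commits
git checkout -q -b topic
echo one > a.txt && git add a.txt && git commit -q -m "add a"
echo two > b.txt && git add b.txt && git commit -q -m "add b"

# Meanwhile main moves on, so the branches diverge
git checkout -q main
echo "main change" >> file.txt
git commit -q -am "change on main"

# Option 1: a true merge, on a throwaway branch starting at main
git checkout -q -b true-merge main
git merge -q --no-ff --no-edit topic

# Option 2: a squash merge, on another throwaway branch starting at main
git checkout -q -b squash-merge main
git merge -q --squash topic
git commit -q -m "Squash topic"

# Same snapshot: both commits point to the same tree object
git rev-parse 'true-merge^{tree}' 'squash-merge^{tree}'

# Different parents: two for the true merge, one for the squash merge
git show -s --format='true merge parents:   %P' true-merge
git show -s --format='squash merge parents: %P' squash-merge
```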
Why is it good?
Is it good?
This leads to the second misconception.
The second misconception: the fallacy of the clean history
I don’t know if you still think that squashing is a good idea after knowing what it really is. But anyway, let’s read again why you think that squash merges are good: click here.
Just like I did before, here are some answers from some of the top results of a quick Google search for “why squash merge”:
- “Squash merging keeps your default branch histories clean”
- “Your base branch remains clean”
- “This helps developers to maintain a clean Git commit history”
- “This [squash merging] helps keep your code history clean”
See? It is very common to say that squash merges keep the history clean. And, as we saw, the only difference between squash merges and true merges is that squash merges don’t preserve the reference to the original branch, so “clean” in this situation means “less information” and “fewer commits”. And I ask you: why is a commit history with fewer commits better? Here’s another textar… (no, enough textareas!)
If that were true, the best commit history would be not committing at all. So, the best way to use Git would be not using Git. Something is wrong here.
That is our second misconception. Cleaner commit histories are better, indeed, but a history with fewer commits is not cleaner. So, what is a good commit history?
A good commit history
A good commit history is a history that contains good commits.
Personally, I’m intrigued that most developers agree that code quality and data quality are important, while it is harder to find someone who really cares about the quality of the code repository. And the repository is the database where we store code.
Let’s recap some practices that we commonly associate with code quality:
- Code that is easy to understand;
- Code that implements simple solutions;
- Code that is easy to debug, when needed;
- Code that is simple;
- Code that can be tested;
- Code that follows the same code style;
- Good documentation.
And about data quality:
- Data that is easy to retrieve;
- Data that is accurate;
- Data that is clean.
I don’t want to write a commit guideline here, but there are a lot of good resources that you can read. I’ll leave some of them at the bottom of this page. I don’t think that you should follow them like a religion (I think that you should follow them as long as they make sense), but there are some practices worth mentioning here:
- Atomic changes: each commit should introduce one indivisible, self-contained change;
- Write good commit messages: use the commit message to describe the change you introduced;
- Don’t commit incomplete work: if it is not ready, then don’t commit it (use the staging area);
And if we do that, we’ll have more commits, but they will be good commits. They will be meaningful and easier to debug with tools such as the ones that I mentioned here. Not only that, it will be easier to cherry-pick and revert specific changes.
There are several commit guidelines. I don’t think that you need to adopt any specific one, but you can take a look and decide whether they make sense to you. Or they can be an inspiration for you to create your own:
- the Linux kernel, which is the project that Git was created for, has a strict guideline for writing patches (commits that are sent by email). I once had a patch rejected because it did too much, and I needed to rewrite my history and send it again;
- the Conventional Commits specification is less strict than the Linux kernel guideline, and several projects follow it;
- Gitmoji is similar to Conventional Commits, but it uses emoji to identify the commit types (for good or for bad).
For example, Marlin, a firmware for 3D printers, uses Gitmoji. Look how easy it is to find commits with bugfixes:
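The screenshot is not reproduced here, but the same idea works from the CLI. A minimal sketch, assuming a POSIX shell and Git, with a throwaway repository and made-up commit subjects in the Conventional Commits style (Marlin’s own history uses Gitmoji, not these messages): git log --grep takes a regular expression matched against each line of the commit message.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical history with typed commit subjects (empty commits keep it short)
for msg in "feat: add login form" "fix: handle empty password" \
           "docs: update README" "fix: trim whitespace in username"; do
  git commit -q --allow-empty -m "$msg"
done

# Only the bug fixes, newest first
git log --format=%s --grep='^fix'
```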
Last but not least: don’t fear rewriting history. Of course it can be dangerous, but you can use it in your favor to create a clean commit history. I suggest you read this. I also suggest that you include reviewing the commit history in your code review process, just like the Linux kernel does (and reject bad commit histories like they did with me!).
When squash can be evil
I think I have given you enough evidence to see that squashing is not exactly something good, and that the arguments for it are based on wrong principles of what Git is and of what a so-called clean history is. But if you are asking whether there’s a case where squash merges can break something: I once saw one silently break a repository with submodules, as a consequence of not having the reference to the merged branch.
At first I started to write about that situation, but I thought it would make this article even longer than it already is, so I wrote another article just about it here.
So, what should I do?
If you are used to performing squash merges, you are probably thinking “what should I do?”, because it still feels comfortable to keep squashing even though it is not the best choice. Let’s do the right thing from now on.
But squashing makes my life so easy…
Let’s imagine you are driving a car with a manual transmission. You have 5 forward gears and one reverse. Do you think it is a good idea to drive your car backwards because you don’t want to shift gears? I hope you don’t, and I hope that you at least suspect that the car has 5 gears instead of 1 because things are not so easy, and pretending they are easy is not the solution (err… unless you drive an automatic).
And it is the same here. We need to understand our tools and how to use them.
Some people say: “if you squash, you have a linear history, only seeing the merge commits”. OK, but you can see that without squashing (and without losing the original commits) by running:
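git log --first-parent
“Squash merges are easy to revert”:
git revert -m 1 <commit>
“Squash merges are easy to cherry-pick”:
git cherry-pick -m 1 <commit>
In both cases, <commit> is the merge commit, and -m 1 tells Git to compute the changes relative to its first parent, that is, the main branch before the merge.
“Squash merges are easy to <insert something>”: read the documentation about how to do it properly! No excuses!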
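Here is a sketch that runs these three commands against a true merge commit, assuming a POSIX shell and Git (throwaway repository; the branches topic and release are hypothetical names). git log --first-parent hides the topic commits, cherry-pick -m 1 applies the whole topic as one commit on another branch, and revert -m 1 undoes the whole topic on main with a single commit.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git symbolic-ref HEAD refs/heads/main

echo base > file.txt
git add file.txt
git commit -q -m "base"

# Topic branch with two commits, merged with a true merge commit
git checkout -q -b topic
echo one > a.txt && git add a.txt && git commit -q -m "add a"
echo two > b.txt && git add b.txt && git commit -q -m "add b"
git checkout -q main
git merge -q --no-ff --no-edit topic
merge=$(git rev-parse HEAD)

# Linear view of main: the merge commit and "base", without "add a"/"add b"
git log --oneline --first-parent

# Cherry-pick the whole topic, as a single commit, onto a branch at "base"
git checkout -q -b release "$merge~1"
git cherry-pick -m 1 "$merge"

# Revert the whole topic on main with a single commit
git checkout -q main
git revert --no-edit -m 1 "$merge"
ls
```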
If you are convinced…
Here are some things that you can do:
- Try to write meaningful commits like we discussed before. It is hard at first, but it will soon become easy and it will pay off;
- In your code reviews, also review the commit history. Check whether it is clean and, if it is not, ask the author of the code to rewrite the history, just like you would do with code;
- If you are a junior developer, listen to the senior developers, but don’t take everything that they say as true. Question everything, and that will make you grow;
- If you are a senior, don’t tell juniors to do something that you aren’t certain is right, just because another senior told you to do it when you were a junior. Don’t be a parrot;
- The same applies here. Question everything that I said. Don’t do anything only because I told you to here.
If you are not convinced…
I tried to do my best.
Is there any good use case for squash merge?
Oh yeah, of course. It is a tool just like many others that Git has, so of course it can be useful! But there are a lot of more useful Git tools that people generally don’t know about, for example, some searching and debugging tools that I explained here.
I stopped to think for about two minutes about a use case for squash. And here is one: suppose that you have an old branch, foo, with only one commit diverging from main. Now you want to merge it, but the main branch has changed so much that the code needs to be adapted to those changes.
One option here is to rebase foo onto main, make the code compatible with the new main, and then git commit --amend. Another is to cherry-pick foo (in practice, it would be the same as rebasing). But you can also do this:
git checkout main
git checkout -b bar
git merge --squash foo
# edit the code in order to make it compatible
git commit
In that situation, we’re using git merge --squash to apply the changes introduced in foo to our files, without committing. Then we make the code compatible and create a new commit.
And that’s it. A simple tool that may be useful in very specific situations.
Conclusion
Thanks for your time! I hope this is useful. If something is wrong, please open an issue.
And for the last time, remember: Git commits don’t store changes, they store snapshots!
Further reading
The references that I mentioned in this text are listed here.
For general Git information:
- Pro Git by Scott Chacon: it’s the official Git book. The most important sections are listed below.
About Git internals:
- Pro Git - What is Git: explains what Git is and compares it with delta-based versioning systems;
- Pro Git - Git Internals: explains how Git works under the hood.
About merge:
- Wikipedia - Merge (version control): explains merge and three-way merge.
About committing:
- How to Write a Git Commit Message by cbeams: advice about what to put in a commit message;
- Git 101 by Matheus Tavares: a general presentation about Git that focuses on commits after slide 29;
- Pro Git - Commit Guidelines: the section of Pro Git about how to commit;
- Philosophy of Linux kernel patches: about patches (commits) in the Linux Kernel;
- Conventional Commits v1.0.0: a convention about commits that is adopted by several projects;
- Gitmoji: similar to conventional commits, but using emoji.
About rewriting history:
- Git: Rewriting History 101 by Matheus Tavares: about when, how and why to rewrite the commit history;
- Pro Git - Rewriting History: the chapter of Pro Git about rewriting history.
About squash merging (note that both discuss the fallacy of the clean history):
- Squash commits considered harmful by Manuel Odendahl. After writing some sections here I found this article that is quite similar;
- Squash merges are evil by L. Holanda. Another good article against squash merge.
Other articles about Git that I wrote: