Sourcetree

Introduction

Git is used everywhere in the industry, and for a good reason: it is an extremely valuable tool, even if you work alone on a one-man project, a hobby project or a draft to learn stuff, and even if everything stays local. Git gives you 5 impressive abilities:

  1. you can supervise all your changes: the ongoing changes, the previous changes and actually the whole history.
  2. the game-changer: you can recover code from any tracked file, even a deleted file or folder.
  3. you can actually go back to any point of the history of changes.
  4. you can develop features independently.
  5. you can share your code.

Git is extremely valuable to:

  1. supervise all your changes
  2. recover code, even deleted files or folders
  3. go back to any point of the history of changes
  4. develop features independently
  5. share your code

Ability n°2 is awesome but cannot work when you are coding in a file you just created, as Git only registers code changes when explicitly told to. Since the file has never been registered, Git has no point of reference to perform a recovery for you. Personally, I got into the habit of staging a file as soon as I create it (I explain the jargon below).

Ability n°2 fails on new files: track them in Git as soon as you create them.

CLI vs GUI

Lots of people swear by the command line but the core of Git is better used in a GUI like Sourcetree for two simple reasons:

  1. all the main concepts are visual in nature: branches, commits, staged files and diffs.
  2. few commands are needed to reap huge benefits and they are simple enough to be done in Sourcetree.

Git is better used in a GUI like Sourcetree:

  1. it is visual in nature (branches, commits, the stage & diffs)
  2. few commands are needed to reap huge benefits

If you need other commands or options, consult the official documentation: it is well done, and there are too many options and syntaxes to remember anyway. This is by design: Git provides a rich set of tools so as not to force any workflow. You may come across projects that use Git in different ways, especially with regard to branching and merging.

Finally, note that Git works at the level of plain text: it does not track code changes per se, as it has no notion of syntax or semantics. As such, nothing it reports carries any meaning about your code. All Git can tell is: “there's this text here and that text there”. For me, that is all the more reason to use a GUI.
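To make this concrete, here is a small sketch using Python's standard difflib module. It is not Git's own diff engine, only the same line-based idea and the same unified output format: renaming a variable, which is one idea to a programmer, shows up as plain lines removed and added. The file name area.py is made up for the example.

```python
import difflib

# Two versions of the same (hypothetical) file: only a variable was renamed.
before = [
    "def area(w, h):\n",
    "    result = w * h\n",
    "    return result\n",
]
after = [
    "def area(w, h):\n",
    "    surface = w * h\n",
    "    return surface\n",
]

# unified_diff compares the two lists line by line, without any
# knowledge of Python syntax, much like a text diff in Git.
diff = list(difflib.unified_diff(before, after, fromfile="a/area.py", tofile="b/area.py"))
print("".join(diff))
# --- a/area.py
# +++ b/area.py
# @@ -1,3 +1,3 @@
#  def area(w, h):
# -    result = w * h
# -    return result
# +    surface = w * h
# +    return surface
```

Nothing in the output says "rename": that interpretation is left to you, which is where a visual diff helps.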

For specific needs, refer to the official documentation.

Git's goals

Git is complex at first, but the reality of what it manages is very mundane:

  • we want people to work together on a project
  • we want people to transfer code changes to one another
  • we have to group related changes together when coding, as Git knows nothing about code
  • changes are:
    • adding some code (or a file or a folder)
    • modifying some code
    • moving/renaming a file or a folder
    • deleting some code (or a file or a folder)
  • since changes are so diverse, we want good control over their grouping

Git's approach

Here is how Git organizes its response to the above problems, from bottom to top.

layer | goal | what it is | what you do
0 Coding changes | develop a feature, fix a bug | many changes of text, files and folders | coding
1 Staged changes | organize related changes (Git knows nothing about code) | groups of code changes | stage / unstage / discard
2 Commit | identify and isolate groups of related changes | named groups of code changes | commit
3 Branch | maintain a history of changes | sequence of commits | merge / rebase / cherry-pick
4 Repository | join branches that are independent | collection of branches | push / pull / init / clone

We can see that every layer has to organize a multiplicity, and that the next layer offers the tools to solve the problem of the previous one. But that next layer runs into the same kind of trouble itself, and so on until we reach the Repository layer.

Commit

Now, here is how Git articulates the tracking of changes in layers 0 to 2, which form this basic loop of programming:

  1. Git detects all the changes you make.
    Your root folder and every subfolder at every level are under its watch. Git calls this whole space the working tree.
  2. you have to explicitly tell Git which change to track.
    Doing so is called staging a change. Git notes down the change in what is called the index. Of course, you can remove a change from the index: this is called unstaging. Interacting with the index is how you manage the grouping of the changes.
  3. you can review the ongoing changes by comparing the content of the index with the latest registered content.
    Of course, at first, no content has been registered.
  4. you can register new content by packing all the changes grouped in the index under a description you provide.
    Doing so is called committing the changes, and the described pack is then called a commit. The index no longer holds any pending change, and the registered content becomes the new point of reference.

Layers 0 to 2 form this basic loop of programming:

  1. Git detects all the changes you make.
  2. you have to explicitly tell Git which change to track.
  3. you can review the ongoing changes by comparing the content of the index with the latest registered content.
  4. you can register new content by packing all the changes grouped in the index under a description you provide.
    ⇒ that is a commit.
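The loop above can be sketched as a toy model in Python. Everything here is an assumption for illustration: ToyRepo and its methods are hypothetical names, files are plain strings in a dict, and only the broad behaviour follows Git (staging copies a file into the index, discarding restores it from the index, committing records the index). It also shows why ability n°2 fails on a file that was never staged.

```python
class ToyRepo:
    """Hypothetical, simplified model of Git's basic loop (layers 0 to 2).
    Files are plain {path: content} dicts; real Git stores objects on disk."""

    def __init__(self):
        self.working_tree = {}  # layer 0: what your editor sees
        self.index = {}         # layer 1: the staged snapshot
        self.commits = []       # layer 2: list of (message, snapshot)

    def head(self):
        """Snapshot of the latest commit (empty before the first commit)."""
        return dict(self.commits[-1][1]) if self.commits else {}

    def stage(self, path):
        """Copy the working-tree version of `path` into the index."""
        if path in self.working_tree:
            self.index[path] = self.working_tree[path]
        else:
            self.index.pop(path, None)  # staging a deletion

    def unstage(self, path):
        """Put the index entry for `path` back to its last committed version."""
        committed = self.head()
        if path in committed:
            self.index[path] = committed[path]
        else:
            self.index.pop(path, None)

    def discard(self, path):
        """Throw away unstaged edits by restoring `path` from the index."""
        if path not in self.index:
            raise KeyError(f"{path} was never staged: nothing to restore it from")
        self.working_tree[path] = self.index[path]

    def staged_changes(self):
        """Paths whose staged content differs from the latest commit."""
        committed = self.head()
        return sorted(p for p in set(committed) | set(self.index)
                      if committed.get(p) != self.index.get(p))

    def commit(self, message):
        """Pack the staged changes under a message; the index then matches the commit."""
        if not self.staged_changes():
            raise ValueError("nothing to commit")
        self.commits.append((message, dict(self.index)))


# Usage: stage, commit, then recover from a bad edit.
repo = ToyRepo()
repo.working_tree["main.py"] = "print('v1')\n"
repo.stage("main.py")
repo.commit("Add main")
repo.working_tree["main.py"] = "print('oops')\n"  # a bad edit
repo.discard("main.py")                             # back to the staged version
restored = repo.working_tree["main.py"]             # "print('v1')\n"

# A brand-new file that was never staged cannot be recovered.
repo.working_tree["draft.py"] = "x = 1\n"
try:
    repo.discard("draft.py")
    recovered_draft = True
except KeyError:
    recovered_draft = False

# Staging and unstaging only move the file in and out of the index.
repo.stage("draft.py")
staged_then = repo.staged_changes()   # ["draft.py"]
repo.unstage("draft.py")
staged_after = repo.staged_changes()  # []
```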

At this point, we may think we are ready to exchange commits. But this would not work well: a commit is a photograph of changes made on an existing state of the code, so it only makes sense in the context of that state, which is the result of your developments so far. And the latest step to reach that existing state is also a commit.

The conclusion is simple: a commit always follows another commit and we cannot lose track of this.
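To illustrate that a commit only makes sense after its parent, here is a toy sketch in Python. It is an assumption-laden model, not Git's object format: the helper names make_commit, record and history are hypothetical. It keeps one real property of Git, though: a commit's identifier is a hash that covers its parent's identifier, so the same change recorded on a different history gets a different identity.

```python
import hashlib


def make_commit(parent_id, message, snapshot):
    """Toy commit whose id is a SHA-1 over its parent id, message and files.
    Hypothetical format: real Git hashes a structured commit object instead."""
    payload = f"parent {parent_id}\nmessage {message}\n"
    for path in sorted(snapshot):
        payload += f"file {path}={snapshot[path]}\n"
    commit_id = hashlib.sha1(payload.encode("utf-8")).hexdigest()
    return {"id": commit_id, "parent": parent_id, "message": message}


def record(commits, parent_id, message, snapshot):
    """Store a new commit in `commits` and return its id."""
    commit = make_commit(parent_id, message, snapshot)
    commits[commit["id"]] = commit
    return commit["id"]


def history(commits, commit_id):
    """Follow parent links from `commit_id` back to the first commit."""
    messages = []
    while commit_id is not None:
        commit = commits[commit_id]
        messages.append(commit["message"])
        commit_id = commit["parent"]
    return messages


commits = {}
root = record(commits, None, "Initial", {"a.txt": "hello"})
edit = record(commits, root, "Edit", {"a.txt": "hello world"})

# Same files and message, but recorded on top of a different parent:
other_root = record(commits, None, "Initial", {"a.txt": "bonjour"})
edit_elsewhere = record(commits, other_root, "Edit", {"a.txt": "hello world"})

print(history(commits, edit))  # ['Edit', 'Initial']
print(edit == edit_elsewhere)  # False: the parent is part of the identity
```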

Branch

A sequence of commits is a branch. This is layer n°3, necessary because a commit only makes sense as the follower of another commit. We thus cannot transfer commits in a carefree manner, disconnected from any history. We have to do all transfers at the level of branches, with a source branch and a target branch.

The key insight is that while branches move forward independently from one another, they stem from a common starting point. Git retraces what happened on both branches since their common ancestor commit.
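Finding that common starting point is a plain graph search over parent links. The sketch below is a hypothetical Python illustration (the function names and commit labels are made up); it follows the definition Git documents for git merge-base, where a best common ancestor is one that is not an ancestor of another common ancestor, without any of Git's optimizations.

```python
from collections import deque


def ancestors(parents, commit):
    """Every commit reachable from `commit` by following parent links (itself included)."""
    seen, queue = set(), deque([commit])
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parents[current])
    return seen


def merge_bases(parents, a, b):
    """Best common ancestors: common ancestors that are not an ancestor of another one."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {c for c in common
            if not any(c != other and c in ancestors(parents, other) for other in common)}


# Hypothetical history: both branches stem from C2.
#   C1 - C2 - A1 - A2   (branch A)
#          \
#           B1          (branch B)
parents = {"C1": [], "C2": ["C1"], "A1": ["C2"], "A2": ["A1"], "B1": ["C2"]}
print(merge_bases(parents, "A2", "B1"))  # {'C2'}
```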

The standard way is to merge a branch A into a branch B. Branch A is supposed to contain your work and branch B is the reference. Git creates a new commit in branch B that applies the changes made in branch A since their common ancestor. If B has not moved since then, this is straightforward: the result is exactly the history told in branch A (Git may even skip the new commit and simply move B forward, which is called a fast-forward).

Things get harder when the history of B has moved on too. Git still combines both sides automatically, except where the same lines of code have been modified on both sides: there, it has to leave the situation in your hands. Such places are called merge conflicts, as both histories conflict during the merge. You manually edit the code and let Git know when you are done, effectively resolving the conflict.
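The rule behind automatic merging and conflicts can be sketched as a three-way comparison against the common ancestor. This Python toy is deliberately simplified and is not Git's algorithm: it assumes the three versions have the same number of lines (lines are only modified, never inserted or deleted), and its conflict markers only imitate the ones Git writes into files. The names merge_lines, ours and theirs are illustrative; here "ours" is the target branch B and "theirs" is the source branch A.

```python
def merge_lines(base, ours, theirs):
    """Toy three-way merge. Assumes equal-length versions (no inserted/deleted lines)."""
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this toy only handles versions with the same number of lines")
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:      # unchanged, or changed identically on both sides
            merged.append(o)
        elif o == b:    # only the source branch (theirs) changed this line
            merged.append(t)
        elif t == b:    # only the target branch (ours) changed this line
            merged.append(o)
        else:           # both sides changed the same line: conflict
            conflicts.append(i)
            merged += ["<<<<<<< ours", o, "=======", t, ">>>>>>> theirs"]
    return merged, conflicts


base = ["x = 1", "y = 2", "z = 3"]        # common ancestor
target_b = ["x = 1", "y = 20", "z = 3"]   # branch B changed y
source_a = ["x = 10", "y = 2", "z = 30"]  # branch A changed x and z

print(merge_lines(base, target_b, source_a))
# (['x = 10', 'y = 20', 'z = 30'], [])

source_a2 = ["x = 1", "y = 200", "z = 3"]  # A also changed y, differently
merged, conflicts = merge_lines(base, target_b, source_a2)
print(conflicts)  # [1]
```

Both branches moved on in the first case, yet no conflict arises because they touched different lines.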

What makes merge the standard is that it respects both histories. The two other ways of transferring commits, rebase and cherry-pick, set the history of the source branch aside: they take its commits and re-apply them on the target branch, rewriting the story as if you had been coding on the target branch all along.

Layer n°3 is the branch, i.e. a sequence of commits. We merge one branch into another, effectively applying what happened in the source branch while being aware of what happened in the target one. It works well because the two branches share a common ancestor commit that Git uses as the starting point. In case the same lines of code have changed on both sides, it is a merge conflict and you have to resolve it manually.

The two other ways of transferring commits, rebase and cherry-pick, take the commits from the source branch and apply them on the target branch.
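Rebase and cherry-pick can be pictured as replaying the changes of each commit on top of another starting point. The Python toy below is only a hedged illustration under strong assumptions: snapshots are dicts of file contents, deletions are ignored, conflicts are detected per file rather than per line, and the names changes and replay are hypothetical. Rebasing replays a whole series of commits; cherry-picking replays just the ones you choose.

```python
def changes(parent_snapshot, snapshot):
    """Files a commit added or modified compared with its parent.
    Deletions are ignored to keep this toy short."""
    return {path: content for path, content in snapshot.items()
            if parent_snapshot.get(path) != content}


def replay(old_base, commits, new_base):
    """Toy rebase / cherry-pick: re-apply each commit's changes on top of new_base.
    `commits` is a list of (message, snapshot) built on old_base; returns new ones."""
    replayed = []
    old_parent, current = old_base, dict(new_base)
    for message, snapshot in commits:
        delta = changes(old_parent, snapshot)
        for path, content in delta.items():
            # File-level conflict: the target also changed this file, differently.
            if current.get(path) not in (old_parent.get(path), content):
                raise ValueError(f"conflict on {path} while replaying {message!r}")
        current = {**current, **delta}
        replayed.append((message, current))
        old_parent = snapshot
    return replayed


old_base = {"app.py": "v0", "README": "r0"}
feature = [
    ("Add login", {"app.py": "v0", "README": "r0", "login.py": "l1"}),
    ("Tweak app", {"app.py": "v1", "README": "r0", "login.py": "l1"}),
]
new_base = {"app.py": "v0", "README": "r1"}  # the target branch moved on

# Rebase: replay the whole series on the new base.
rebased = replay(old_base, feature, new_base)
print(rebased[-1])
# ('Tweak app', {'app.py': 'v1', 'README': 'r1', 'login.py': 'l1'})

# Cherry-pick: replay only "Tweak app" (its original parent is the "Add login" snapshot).
picked = replay(feature[0][1], feature[1:], new_base)

# A commit touching a file the target also changed is refused in this toy.
try:
    replay(old_base, [("Edit README", {"app.py": "v0", "README": "r2"})], new_base)
    conflict_detected = False
except ValueError:
    conflict_detected = True
```

The replayed commits carry the same messages but different snapshots, which is why they are new commits rather than the original ones.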

Clone, pull, push, init

Layer n°4 is the repository, a collection of branches. This is possible because Git can efficiently manage a lot of branches. This makes independent developments possible. And the merge operation joins changes from independent branches and makes it possible to synchronize two repositories on different machines.

This is where the whole “branch” concept gets really smart: you clone a remote repository on your local machine, and you branch out from one of its branches to start coding, independently of what happens on your copy of that branch. That branch is not yours actually, you just have a copy. You may have to pull more commits from the original remote branch to keep it up to date.

But once you are done, you use the merge operation to incorporate your changes in your local clone. And this is awesome: everything is local and safe.

  • you do not need the network to do the merge.
  • you are not messing with somebody else's repository.
  • you can test the result of the merge on your local machine.
  • if the merge fails, you can delete the copy, recreate it from the remote branch and merge again.
    Indeed, your developments are safely waiting on your work branch, since it is separate from the copy.

It is hard to overstate how clean and secure this is. It can be tedious and a bit bothersome though, which just shows how much Git turned the tables here. If the merge is successful, you can push the changes, although it is customary to ask the other person to pull from your branch instead, so that they can work at their own pace.

The last command is init. It sets up a Git repository in one of your folders. There is not much more to it.

Layer n°4 is the repository, a collection of branches, where we send and receive commits between repositories via branches:

  • clone copies a remote repository, with all its branches, into a new local repository.
  • pull updates your local copy of a branch with commits from the remote branch.
  • push is the reverse operation: you send commits from your local copy to the remote branch.
  • init creates a Git repository in a folder.

We synchronize two repositories on different machines with the merge operation.

The idea is to clone the remote repository, derive a new branch from the remote branch of interest and work on that new branch. Then we merge our work back into our local copy of the remote branch, fixing all the conflicts in the process. Once done, this copy is updated and stable: we can send our merged work to the original remote branch.

The beauty of this is that everything is local and safe. The remote branch is safe, as the merge happens on a local copy. Your work is safe, as the merge only copies it onto that local copy. None of the original sources have to be modified. If an issue arises, you just delete the copy and recreate it from the remote branch, and you can try the merge again on a clean slate.
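Merging into your local copy before sending anything matters because, by default, Git only accepts a push that moves the remote branch forward, i.e. when the remote tip is already part of the history you send (a fast-forward). Here is a hypothetical Python toy of that rule. The names push and is_ancestor and the commit labels are made up, and it assumes both repositories already share the commit graph, whereas real Git also transfers the missing commits.

```python
from collections import deque


def is_ancestor(parents, ancestor, commit):
    """True if `ancestor` can be reached from `commit` by following parent links."""
    seen, queue = set(), deque([commit])
    while queue:
        current = queue.popleft()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            queue.extend(parents[current])
    return False


def push(remote_branches, branch, local_tip, parents):
    """Toy push: only accepted if it moves the remote branch forward (fast-forward).
    Assumes both sides already share `parents`; real Git also sends missing commits."""
    remote_tip = remote_branches.get(branch)
    if remote_tip is not None and not is_ancestor(parents, remote_tip, local_tip):
        raise RuntimeError(f"push to {branch} rejected: merge the remote commits first")
    remote_branches[branch] = local_tip


# Someone pushed C2 while you were working on W1, which stems from C1.
parents = {"C1": [], "C2": ["C1"], "W1": ["C1"]}
remote = {"main": "C2"}

try:
    push(remote, "main", "W1", parents)
    first_push_accepted = True
except RuntimeError:
    first_push_accepted = False  # rejected: C2 is not in W1's history

parents["M"] = ["C2", "W1"]         # merge your work into your copy of main
push(remote, "main", "M", parents)  # now a fast-forward: accepted
```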

Solo development

I stage a lot, because I split my work into several small tasks. Once a task is done, I switch to Sourcetree, review the code and stage it. Since I stage often, the diff is concise and the review is quick. And if I mess up the code, I just discard the changes in Sourcetree to restart the task. The changes of the previous tasks are safe, as they are all staged.

When enough tasks have been staged, I commit. Now, take this with a huge grain of salt: I amend the previous commit a lot, so my commits always contain a whole feature at once, even if there is a lot of code. So far, I have never had any use for finer-grained commits on my solo projects, and I have been working this way for a while. I also rarely use branches.

Maybe it is not smart for you to imitate me here.

  • split the work in tasks
  • stage often during a task, always at the end of a task
  • easily discard changes (code of previous tasks is safely staged)
  • commit with amend once too much code has been staged
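Amending does not edit the previous commit in place: Git creates a new commit with the same parent and moves the branch onto it, and the replaced commit simply stops being part of the branch. Here is a hypothetical Python toy of that behaviour; ToyBranch and its methods are made-up names, not Git's API.

```python
class ToyBranch:
    """Hypothetical toy branch: commits are kept by id, the branch is a movable tip."""

    def __init__(self):
        self.commits = {}  # id -> (parent id, message, snapshot)
        self.tip = None
        self._next_id = 1

    def _store(self, parent, message, snapshot):
        commit_id = f"c{self._next_id}"
        self._next_id += 1
        self.commits[commit_id] = (parent, message, dict(snapshot))
        return commit_id

    def commit(self, message, snapshot):
        """New commit on top of the current tip."""
        self.tip = self._store(self.tip, message, snapshot)

    def amend(self, message, snapshot):
        """Replace the tip with a new commit that has the same parent.
        The old commit is not erased, it just stops being part of the branch."""
        if self.tip is None:
            raise ValueError("nothing to amend")
        parent = self.commits[self.tip][0]
        self.tip = self._store(parent, message, snapshot)

    def log(self):
        """Messages from the tip back to the first commit."""
        messages, current = [], self.tip
        while current is not None:
            parent, message, _ = self.commits[current]
            messages.append(message)
            current = parent
        return messages


branch = ToyBranch()
branch.commit("Init", {"main.py": "v0"})
branch.commit("Login feature", {"main.py": "v0", "login.py": "part 1"})
branch.amend("Login feature", {"main.py": "v0", "login.py": "part 1 + part 2"})
print(branch.log())         # ['Login feature', 'Init']
print(len(branch.commits))  # 3: the amended-away commit still exists, unreachable
```

This is why amending keeps the history short: the branch always shows one commit per feature, however many times it was amended.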