Technology changes constantly: old tools fade out and new ones take over. That was the case for us on the Klocwork development team. We decided to migrate our source code management (SCM) system from our old SVN setup to new and pretty Git. While the debate about which SCM is the right one will go on forever, we made the switch for the added benefits in continuous integration, branching, and workflow.
The migration challenge
The first premise of a migration like this is that the SVN server is healthy and well maintained. Ours hadn't been updated in years: we were running SVN 1.4, which is rather old. The second premise is that the repository has no problems in its history or commits. Once again we were proven wrong, as we had issues with the history and a corrupt commit. So now we get to the heart of this blog!
A very old SVN server, a corrupt point in the history, and a large repository containing binaries as well as source meant we were looking at a conversion that would be very large, slow, confusing, and possibly corrupt. So the research began on how to migrate and get our repo down to our target size.
The first test conversion revealed that we would end up with a very large repo, somewhere around 160 GB, and that the conversion would take over a month. This wasn't the way to go. We also tried using tools to correct the corruption so we could first move to a new SVN server, which would shrink the metadata and allow a faster conversion, but that proved problematic and didn't work.
Enter the hacker!
Enter the hacking and slashing to get the repo running the way we wanted. What did we really need from a conversion? The history of trunk! And the tags! But then you ask: what about the branches? We decided to skip their history. We (namely I) decided to copy the branches as snapshots in time, recreating each one in Git as a single commit that matches its state in SVN.
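To make the snapshot idea concrete, here is a minimal sketch of the commands one branch might need. It is not the exact procedure we ran: the function name, the example URL, the `branches/<name>` layout, and the choice of an orphan branch (one with no parent history) are all assumptions for illustration. The function only builds the command list so you can review it before running anything.

```python
from typing import List


def snapshot_branch_plan(svn_url: str, branch: str) -> List[List[str]]:
    """Build the commands that recreate one SVN branch as a single Git commit.

    Assumes a standard SVN layout (<svn_url>/branches/<branch>) and that the
    commands run from the root of the already-converted Git working copy.
    """
    branch_url = f"{svn_url.rstrip('/')}/branches/{branch}"
    return [
        # Start a branch with no parent commits (assumption: snapshots are orphans).
        ["git", "checkout", "--orphan", branch],
        # Clear the files inherited from the previously checked-out branch.
        ["git", "rm", "-rf", "--ignore-unmatch", "."],
        # Export the branch contents from SVN without any .svn metadata.
        ["svn", "export", "--force", branch_url, "."],
        # Record the exported tree as one commit.
        ["git", "add", "--all"],
        ["git", "commit", "-m", f"Snapshot of SVN branch {branch}"],
    ]


if __name__ == "__main__":
    # Hypothetical URL and branch name, printed for review only.
    for command in snapshot_branch_plan("https://svn.example.com/repo", "release-1.0"):
        print(" ".join(command))
```

Each snapshot ends up as exactly one commit, so the branch content is preserved while its SVN history stays behind on the old server.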
That was the plan; now we had to figure out which tools to use. We ran multiple migrations on different machines with different tools, which gave us a little race. We also decided not to convert all 120 000-plus commits. Instead, we would convert from a point far enough back in time that nobody would need to look at the SVN history. This means SVN would eventually be shut down for daily use and would exist only as a read-only archive.
So, this is our story: we used “git svn”. Git svn is a built-in Git command for migrating SVN repos to Git. It also lets you work against an SVN server using Git commands. We didn't need the new repo to point back to the SVN server, so we severed the connection by omitting all the SVN metadata in the initial git svn run. The next step was to turn this SVN-Git mix into a pure Git repo. Cue the Staples “That was easy” button.
How big is our repo now? Around 25 GB, a rather drastic improvement over that initial trial. The conversion took less than 24 hours and produced 27 000 commits.
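For the curious, a git svn invocation along these lines could look like the sketch below. The URL, the starting revision, and the standard `trunk`/`tags` layout are assumptions, not our real values; the flags themselves (`--no-metadata`, `--trunk`, `--tags`, `-r`) are regular git svn options. The helper only assembles the command so it can be inspected first.

```python
import shlex
from typing import List


def git_svn_clone_command(svn_url: str, start_rev: int, dest: str) -> List[str]:
    """Build a `git svn clone` command covering trunk and tags from start_rev on."""
    return [
        "git", "svn", "clone",
        # Leave out the git-svn-id lines, cutting the link back to SVN.
        "--no-metadata",
        # Assumed standard layout; branches are deliberately not requested.
        "--trunk=trunk",
        "--tags=tags",
        # Convert only from the chosen revision up to the latest one.
        "-r", f"{start_rev}:HEAD",
        svn_url,
        dest,
    ]


if __name__ == "__main__":
    # Hypothetical server, revision, and target directory.
    cmd = git_svn_clone_command("https://svn.example.com/repo", 90000, "repo-git")
    print(shlex.join(cmd))
```

Leaving out `--branches` is what keeps branch history out of the conversion, and the revision range is what keeps the commit count down.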
Shrinking the repo
The next challenge was to shrink this repo down. We figured out how to strip out the binary blobs that had been committed to SVN. Ask any DevOps person and they'll tell you that binaries don't go inside the SCM. Ever! We had to correct that mistake. We knew we could always go back to a commit in SVN and grab a binary if we really needed it, so we deleted the binaries from the history. The next step was to remove files and directories that turned out not to be needed in our newly created repo. Check the size, and bingo! We hit our target size, if not 20% of it, and that made us happy. We then put the repo up on the Git server and served it out for testing by other developers, and it ran fast. On a test build we ran to verify, cloning the repo took 10 percent of the time it took with SVN.
So from where we stand, we were able to convert our big SVN repo into a Git repo that's more robust, contains less useless stuff, and keeps our history. But, you say, not really: your branches are gone! Our answer is simply that we copied over the important ones as snapshots. Any history that's needed still exists, just in SVN. These branches get abandoned after a while anyway, so they will most likely end up as orphans with a stop point. The final size is around 2 GB, versus 9 GB on SVN for the same set of files that are actually useful.
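We haven't named the tool we used for the cleanup, so treat the following as one possible approach rather than our exact steps. It sketches a history rewrite with Git's built-in `git filter-branch`; the paths are hypothetical, and the helper just builds the commands for review.

```python
import shlex
from typing import List


def purge_paths_commands(paths: List[str]) -> List[List[str]]:
    """Build commands that drop the given paths from every commit and tag."""
    if not paths:
        raise ValueError("at least one path or pattern is required")
    # The index filter runs once per commit and removes the paths from the index.
    index_filter = "git rm -r --cached --ignore-unmatch -- " + " ".join(
        shlex.quote(p) for p in paths
    )
    return [
        [
            "git", "filter-branch",
            "--index-filter", index_filter,
            "--prune-empty",              # drop commits left with no changes
            "--tag-name-filter", "cat",   # point existing tags at rewritten commits
            "--", "--all",
        ],
        # Expire reflog entries and repack so unreferenced blobs can be dropped.
        ["git", "reflog", "expire", "--expire=now", "--all"],
        ["git", "gc", "--prune=now", "--aggressive"],
    ]


if __name__ == "__main__":
    # Hypothetical binary patterns and directories.
    for command in purge_paths_commands(["*.jar", "third_party/bin"]):
        print(command)
```

Note that filter-branch keeps backup refs under `refs/original`, so the space is only reclaimed once those are removed, or after cloning the rewritten repo fresh.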