Merging two git repositories of the same project, linking file history

I have a project which I started a long time ago, and made a number of commits to. The project was then abandoned for about two years, during which time I forgot I had been using git version control on the project. I picked it up, copying all files to a new machine, and started a new git repo with ~100,000 lines of code and dozens of files, which now has its own lengthy commit history. I recently rediscovered the old repo, and attempted to merge the commit history of both repos together, using the instructions here.

However, the result was incomplete. If I look at the commit history on GitHub, the commits from both the old and the new repository are intact, but the history of each individual file does not extend back into the old repository's commits. Each file still appears to have been created in the first commit of the new repository. A couple of files that were not transferred when I manually copied everything over to start the new repo don't show up at all.

The project's file structure and naming conventions have changed significantly since the end of the old repository's history, and some file associations may not be obvious. If I have to link the old files to the new ones manually, one at a time, I can do that, but an automatic solution would be better.

1 Answer

I assume you followed the steps from the top answer to the question you linked. Those are not the best steps for this situation.

You have two segments of history for your project. If we suppose the first segment had commits

A -- B -- C <--(master)

and the second segment had commits

D -- E -- F <--(master)

then a complete history which behaves as expected would look like

A -- B -- C -- D' -- E' -- F' <--(master)

(A note on notation: I've replaced D with D' in the combined history, etc. The reasons for this are arguably technical and probably not immediately important; in summary, it just means that in terms of commit identity, D' is distinct from D because D' has C as a parent whereas D does not. But the letter is kept the same, to show that D' represents the same state of the code - i.e. the same content or TREE - as D.)

The answer you linked does not accomplish that. It meets the two most basic goals - putting the commits in one repo, and combining them into one graph - but it does not meet the most valuable one: making a coherent history of them. Instead it gives you

   A -- B -- C
              \
D -- E -- F -- f*

where f* is a merge commit (i.e. a commit with multiple parents) whose content matches F, but which also lists C as a parent.

The problem with this is that C is not then recognized as part of D's history. In fact, when you ask for the history of a particular path (e.g. git log -- <path>), git's default history simplification rules will exclude A, B, and C entirely, because from git's point of view the state of the code can be explained without them: f* has the same content as F, so git follows only that parent.
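To see this effect in isolation, here is a small throwaway reproduction. Everything in it is made up for illustration: the temporary directory, the single file.txt, the snap helper, and the Demo identity. I also use merge -s ours as a stand-in for whatever merge steps you actually ran, since all that matters here is that f* ends up with the same content as F.

```shell
set -e
# Throwaway identity so the commits work on any machine (assumption, not your real config).
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)

# Hypothetical helper: commit one version of file.txt with the given message.
snap() {
  echo "$2" > file.txt
  git add file.txt
  git commit -q -m "$1"
}

# Old repository: commits A, B, C.
git init -q "$demo/old" && cd "$demo/old"
snap A v1; snap B v2; snap C v3

# New repository, started from a copy of the files: commits D, E, F.
git init -q "$demo/new" && cd "$demo/new"
snap D v3; snap E v4; snap F v5

# Merge the old history in while keeping the content of F (this creates f*).
git fetch -q "$demo/old" HEAD
git merge -q -s ours --allow-unrelated-histories -m 'f*' FETCH_HEAD

# Default simplification: f* matches F, so only that parent is followed.
default_log=$(git log --format=%s -- file.txt)
# --full-history walks every parent, so the old commits reappear.
full_log=$(git log --full-history --format=%s -- file.txt)
printf 'default:\n%s\n\nfull history:\n%s\n' "$default_log" "$full_log"
```

The default listing stops at D, while the --full-history listing also contains A, B, and C. So the old commits are in the graph; they just aren't treated as the file's history.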

(Most of the current comments on your question, which talk about things like the similarity heuristic, are red herrings. It seems to me those comments were written by people who didn't really look closely at the steps you had followed.)

There are a couple of different ways to get to the desired state. If this is a repo that only you use, or if you can coordinate with all repo users to do a history rewrite, then a "re-parenting" operation would be a good solution. This is a permanent fix that creates a seamless history; but because it changes the history of the current repo's branches (D, E, and F get new commit IDs), coordination with any other users is important. The issue of rewriting shared histories is generally described in the git rebase documentation, in the section "Recovering from upstream rebase".
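One way such a re-parenting could look is sketched below, on a throwaway pair of repositories rather than your real ones. The directory names, file.txt, the snap helper, and the Demo identity are all invented for the example. It uses git replace --graft to declare the new parent and then git filter-branch to bake that link into real commits; treat it as a sketch for a single linear branch, not a recipe for your exact layout.

```shell
set -e
# Throwaway identity so the commits work on any machine (assumption, not your real config).
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
# Skip the advisory pause that newer git versions add to filter-branch.
export FILTER_BRANCH_SQUELCH_WARNING=1
demo=$(mktemp -d)

# Hypothetical helper: commit one version of file.txt with the given message.
snap() {
  echo "$2" > file.txt
  git add file.txt
  git commit -q -m "$1"
}

# Old repository: A, B, C.  New repository: D, E, F.
git init -q "$demo/old" && cd "$demo/old"
snap A v1; snap B v2; snap C v3
git init -q "$demo/new" && cd "$demo/new"
snap D v3; snap E v4; snap F v5

# Bring the old commits into the new repository's object store.
git fetch -q "$demo/old" HEAD
old_tip=$(git rev-parse FETCH_HEAD)               # commit C
new_root=$(git rev-list --max-parents=0 HEAD)     # commit D

# Declare that D should have C as its parent ...
git replace --graft "$new_root" "$old_tip"
# ... rewrite the current branch so the link becomes permanent (D', E', F') ...
git filter-branch -- HEAD
# ... and drop the now-unneeded replacement.
git replace -d "$new_root"

git log --format=%s
```

After this, the branch is one continuous chain from F' back to A with no replacement refs involved. In a real repository you would repeat the rewrite for every branch and tag that descends from D, and everyone with a clone would need to move to the rewritten branches.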

Another alternative is to use git replace. This has the advantage that it is not a history rewrite, but it does have some known issues, and it requires a little special setup in each clone. (If the setup isn't done, it just means that particular clone doesn't see the full history.)
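As a rough illustration of the git replace route, again on made-up throwaway repositories (file.txt, the snap helper, and the Demo identity are assumptions for the example only):

```shell
set -e
# Throwaway identity so the commits work on any machine (assumption, not your real config).
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)

# Hypothetical helper: commit one version of file.txt with the given message.
snap() {
  echo "$2" > file.txt
  git add file.txt
  git commit -q -m "$1"
}

# Old repository: A, B, C.  New repository: D, E, F.
git init -q "$demo/old" && cd "$demo/old"
snap A v1; snap B v2; snap C v3
git init -q "$demo/new" && cd "$demo/new"
snap D v3; snap E v4; snap F v5

# Bring the old commits in, then tell git to treat C as the parent of D.
git fetch -q "$demo/old" HEAD
git replace --graft "$(git rev-list --max-parents=0 HEAD)" FETCH_HEAD

# With the replacement active, history runs from F back to A.
with_replace=$(git rev-list --count HEAD)
# The real commits are untouched; ignoring replacements shows only D, E, F.
without_replace=$(git --no-replace-objects rev-list --count HEAD)
echo "with replacement: $with_replace commits, without: $without_replace commits"
```

The existing commit IDs stay exactly as they were, which is why nobody has to recover from a rewrite. The "special setup" comes from the fact that replacements are stored as refs under refs/replace/, which a default clone or fetch does not copy. Each clone would need to fetch them explicitly, for example with git fetch origin 'refs/replace/*:refs/replace/*', and the old commits themselves have to be reachable from something in the shared repository.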

Here is a post that discusses ways to do each of these: Git: Copy history of file from one repository to another

There are other variations as well, and it's hard to say which would best suit your situation. If you want to more generally explore the possibilities, you might consult the documentation for git filter-branch and git replace.

4 months ago