
I needed to create a GitHub repo in which every commit has an older date. To do this, I changed my local computer's date and time to a date in the past, wrote some code, and then committed and pushed with that date in effect. In my GitHub repo, the commit dates shown are now the fake ones my local computer had when I committed and pushed.

My question is: is there some way a person can verify that I faked those dates? I also deleted some commits and pushes (using git reset) and changed a commit's author name using git commit --amend. Is there any way someone can verify I made these changes?

As far as I know, git log is one of the few ways a person can see the author and date of each commit (which I successfully changed), but is there any way for them to see a change history of the repo?

Mark Johnson

2 Answers


Warning for others reading this question: Changing the git history of a codebase is not without caveats, especially if other developers are using it too. So if you have a good reason to change commit authors/dates, make sure your fellow developers are aligned on this matter.

Regarding git history

If other developers have the same code base checked out locally, and they pull your changes, it is likely they will notice the conflicting histories.

For example, let's say I rewrite the last commit and change its author from someone else to myself:

git commit --amend --reset-author
git push -f

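When another developer pulls, Git will find that their local branch and the remote branch have diverged. Depending on their Git version and configuration, git pull will either stop and tell them to choose how to reconcile the divergent branches, or open Vim (or their other configured editor) so they can write a merge commit message. Either way, the fetch step of the pull prints a line like this to the terminal, flagging the forced update: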

 + ed36af4...fbe3f7f main       -> origin/main  (forced update)

If nobody has the tampered-with branch checked out, it is unlikely that anyone will notice the history change.

Other ways of figuring out tampering

The above glosses over some detective work. As a contrived example, if you change a commit date to before Git existed, people will know the history has been tampered with. As a less contrived example, if you change a commit's author to someone who couldn't have had access to the repo, that would be evidence of tampering. ("Huh, Sara didn't work here in 2017, how could she have accessed this private repo?!")
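The "date before Git existed" check can be automated. Below is a minimal sketch, assuming git is installed: it builds a throwaway repository with a placeholder identity, backdates one commit to 1999, and then uses git log's %at (author date as a Unix timestamp) to flag any commit older than a cutoff. The cutoff of 2005-01-01 UTC is an illustrative choice, deliberately a little earlier than Git's first commit in April 2005; all names and dates here are made up.

```shell
#!/bin/sh
# Sketch: flag commits whose author date predates Git itself.
# Assumes git is installed; identity, dates, and cutoff are illustrative.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name 'A U Thor'
git config user.email 'author@example.com'

git commit -q --allow-empty -m 'plausible' --date='2017-01-01 12:00:00 +0000'
git commit -q --allow-empty -m 'impossible' --date='1999-01-01 12:00:00 +0000'

# 1104537600 is 2005-01-01 00:00:00 UTC, a generous cutoff before Git existed.
cutoff=1104537600

# %at = author date (Unix timestamp), %h = short hash, %s = subject line.
suspicious=$(git log --format='%at %h %s' | awk -v c="$cutoff" '$1 < c {print $2, $3}')
echo "suspicious commits: $suspicious"
```

This only catches blatant fakes; a backdated commit with a plausible date passes, which is exactly why the answer calls this detective work rather than proof.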

Note that besides Git history, there are other ways repos can be protected, monitored, and audited. How to do that is a topic for another question, but a very simple approach is to regularly back up your Git repos.

melvio

Here is what to know:

  • A commit is a numbered entity consisting of data (a source snapshot) and metadata (the author's name, email address, and date-stamp; the same triple for the committer; the parent hash IDs and any other internal headers; then a blank line and the commit message's subject and body).

  • The data, that is, the source snapshot, is actually stored as a single metadata line, tree <hash-ID>. The source snapshot is therefore indirect, which is how two commits that store the same source snapshot avoid storing it twice. For instance, if you make commit C1 with tree T1, then make a new commit C2 with a new tree T2, then revert C2 to make commit C3, the tree in C3 is the same tree T1 as in C1.

  • The hash ID of a commit is a cryptographic checksum of the entire commit object: all of the metadata, including the tree line and any commit signature (which Git stores in an extra gpgsig header). See "How does commit signing work?" for some additional details.

  • Currently, this cryptographic checksum uses SHA-1, against which practical collision attacks have been demonstrated. However, the existing techniques for breaking SHA-1 are expensive (both in dollars, or whatever your local currency may be, and in compute time) and leave obvious traces.
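To make these points concrete, here is a minimal sketch in plain shell that computes object IDs the way Git does, assuming a system with a sha1sum command (for example from GNU coreutils). Git hashes the text "<type> <size>", a NUL byte, and then the raw object content; for a commit, that content is the same text git cat-file commit HEAD prints. The helper names git_hash and commit_text, the "A U Thor" identity, and the timestamps are hypothetical, chosen only for illustration. Changing nothing but the date yields a completely different commit ID.

```shell
#!/bin/sh
# Sketch only: reproduce Git's object hashing with sha1sum.

# git_hash TYPE (hypothetical helper): read object content on stdin and
# print SHA-1 over "<TYPE> <size>\0<content>".
git_hash() {
    tmp=$(mktemp)
    cat > "$tmp"                          # keep exact bytes, trailing newline included
    size=$(wc -c < "$tmp" | tr -d ' ')
    { printf '%s %s\0' "$1" "$size"; cat "$tmp"; } | sha1sum | cut -d' ' -f1
    rm -f "$tmp"
}

# commit_text TREE TIMESTAMP MESSAGE [PARENT] (hypothetical helper):
# print a minimal raw commit object. "A U Thor" is a placeholder identity.
commit_text() {
    printf 'tree %s\n' "$1"
    if [ -n "$4" ]; then printf 'parent %s\n' "$4"; fi
    printf 'author A U Thor <author@example.com> %s +0000\n' "$2"
    printf 'committer A U Thor <author@example.com> %s +0000\n' "$2"
    printf '\n%s\n' "$3"                  # blank line, then the message
}

# The empty tree has the well-known ID 4b825dc642cb6eb9a060e54bf8d69288fbee4904.
empty_tree=$(printf '' | git_hash tree)

# Same snapshot, same author, same message; only the timestamp differs.
honest=$(commit_text "$empty_tree" 1483272000 'Initial commit' | git_hash commit)
backdated=$(commit_text "$empty_tree" 1262347200 'Initial commit' | git_hash commit)

echo "honest:    $honest"
echo "backdated: $backdated"
```

If git is available, piping the same commit text into git hash-object -t commit --stdin should print the same ID as git_hash commit, which is a handy way to check the sketch.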

Whenever someone makes a new commit, they control all of the metadata, so there is no way to tell whether that new commit contains accurate metadata ("yes, my name really is Zaphod Beeblebrox, I had it legally changed this morning") or not ("I lied, my legal name is actually Jean Baptiste Emmanuel Zorg"). But if someone were to take an existing commit and try to modify it, what they will get is not a modified commit, but rather another new commit, with a new and different SHA-1 hash ID. The only way to get a new commit that re-uses the old hash ID is to use the SHA-1-breaking technique, which will leave obvious traces.

Commits themselves are the history, and they form a kind of Merkle tree, because each commit holds the hash ID(s) of its parent commit(s). It's just that we humans, and Git itself, usually find commits by using a branch name, which is defined as a name¹ holding a changeable hash ID. Anyone with permission to store a new and different hash ID in that name can do so, and if we simply trust the name, we'll see whatever hash ID was stored there last.
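The Merkle-tree effect can be observed with Git's plumbing command git commit-tree, which builds a commit from a tree ID, optional -p parent IDs, and a -m message. This sketch assumes git is installed and works in a throwaway repository; the identity and dates come from the standard GIT_AUTHOR_* and GIT_COMMITTER_* environment variables, set here to placeholder values so the output is reproducible. Backdating only the first commit forces a new ID for the second commit too, even though the second commit's own author, date, and message are unchanged.

```shell
#!/bin/sh
# Sketch: backdating an early commit changes the hash ID of every later commit.
# Assumes git is installed; names, emails, and dates are placeholders.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# A fixed identity and fixed dates make git commit-tree's output reproducible.
export GIT_AUTHOR_NAME='A U Thor' GIT_AUTHOR_EMAIL='author@example.com'
export GIT_COMMITTER_NAME='A U Thor' GIT_COMMITTER_EMAIL='author@example.com'
export GIT_AUTHOR_DATE='2017-01-01 12:00:00 +0000'
export GIT_COMMITTER_DATE='2017-01-01 12:00:00 +0000'

# Write the empty tree into the object database and capture its ID.
tree=$(git hash-object -w -t tree /dev/null)

# Original history: c1 <- c2
c1=$(git commit-tree -m 'first' "$tree")
c2=$(git commit-tree -p "$c1" -m 'second' "$tree")

# Rewrite only the first commit with an older date...
c1_fake=$(GIT_AUTHOR_DATE='2010-01-01 12:00:00 +0000' \
          GIT_COMMITTER_DATE='2010-01-01 12:00:00 +0000' \
          git commit-tree -m 'first' "$tree")
# ...then rebuild the second commit, identical except for its parent line.
c2_new=$(git commit-tree -p "$c1_fake" -m 'second' "$tree")

echo "second commit, original:  $c2"
echo "second commit, rewritten: $c2_new"
```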

This in turn means that if you actually pay attention to, and use, the hash IDs rather than simply depending on some changeable branch name, you will notice anyone "fiddling with history", to whatever extent it is physically noticeable. That is, you must have had some commit X in order to see that someone has come up with a different hash ID Y to use in place of X, and has then also had to replace every subsequent commit because of the Merkle-tree property.
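In practice, "paying attention to the hash IDs" can be as simple as saving a commit ID you trust and later asking Git whether it is still an ancestor of the branch tip with git merge-base --is-ancestor, which exits with status 0 if it is and 1 if it is not. This is a minimal sketch, assuming git is installed; it uses a throwaway repository with a placeholder identity and simulates the question's git commit --amend by backdating the last commit.

```shell
#!/bin/sh
# Sketch: detect rewritten history by checking a previously recorded commit ID.
# Assumes git is installed; identity and dates are placeholders.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name 'A U Thor'
git config user.email 'author@example.com'

echo one > file.txt && git add file.txt && git commit -q -m 'first'
echo two >> file.txt && git commit -q -a -m 'second'

recorded=$(git rev-parse HEAD)            # the ID someone saved earlier

# The branch owner rewrites the last commit with a fake author date.
git commit -q --amend --no-edit --date='2010-01-01 12:00:00 +0000'

# Exit status 0: recorded commit is still in the history; 1: it is gone.
if git merge-base --is-ancestor "$recorded" HEAD; then
    status=intact
else
    status=rewritten
fi
echo "history is $status: recorded $recorded, branch now at $(git rev-parse HEAD)"
```

The same check works against a remote-tracking name such as origin/main after a fetch, as long as you kept the old ID somewhere outside the repository being checked.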

Git tags can also be signed (using GPG signatures or equivalent). These digital signature methods, whether used on individual commits or simply on a tag at the end of a series of commits, can use more secure cryptography than SHA-1. Git itself is also being modified to use SHA-256, which, if nothing else, has more bits, making the techniques used to break SHA-1 less effective.
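As a small illustration of the SHA-256 work, Git 2.29 and later can create a repository that uses SHA-256 object IDs via git init --object-format=sha256 (the feature was initially marked experimental). This sketch assumes such a Git version is installed and uses throwaway repositories with a placeholder identity; it only shows that commit IDs grow from 40 to 64 hex digits.

```shell
#!/bin/sh
# Sketch: compare commit ID lengths in SHA-1 and SHA-256 repositories.
# Assumes Git 2.29 or later; identity is a placeholder.
set -e
base=$(mktemp -d)

for fmt in sha1 sha256; do
    git init -q --object-format="$fmt" "$base/$fmt"
    git -C "$base/$fmt" -c user.name='A U Thor' -c user.email='author@example.com' \
        commit -q --allow-empty -m 'first'
done

sha1_id=$(git -C "$base/sha1" rev-parse HEAD)
sha256_id=$(git -C "$base/sha256" rev-parse HEAD)
echo "SHA-1 commit ID   (${#sha1_id} hex digits): $sha1_id"
echo "SHA-256 commit ID (${#sha256_id} hex digits): $sha256_id"
```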


¹ Specifically, a branch name is a name of the form refs/heads/branch. The leading refs/heads/ part makes it a branch name. If the leading part is refs/tags/, the name is a tag name. Other names live in different parts of the refs/ name-space. The special names (HEAD, ORIG_HEAD, MERGE_HEAD, and so on) live outside the otherwise-required top-level refs/ qualifier.
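To see both ideas from this answer in action (branch and tag names are just refs/... names, and a branch name is a changeable pointer), here is a minimal sketch, assuming git is installed. git for-each-ref lists the full ref names with the hash IDs they hold, and git update-ref rewrites a branch name in a single step. The repository, identity, and the tag name v1 are placeholders.

```shell
#!/bin/sh
# Sketch: branch and tag names are refs/... names holding hash IDs.
# Assumes git is installed; identity and names are placeholders.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name 'A U Thor'
git config user.email 'author@example.com'

git commit -q --allow-empty -m 'first'
first=$(git rev-parse HEAD)
git commit -q --allow-empty -m 'second'
git tag v1                                # stored as refs/tags/v1

branch=$(git symbolic-ref HEAD)           # full name, e.g. refs/heads/main
git for-each-ref --format='%(refname) %(objectname)'

# Moving the branch name back is a single write. The 'second' commit still
# exists and is still reachable through the tag; only the name changed.
git update-ref "$branch" "$first"
git for-each-ref --format='%(refname) %(objectname)'
```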

torek