
Chapter_10

Chris McIntosh edited this page Dec 4, 2019 · 4 revisions

10 - Git Internals

1 - Plumbing and Porcelain

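  • Files and folders in .git we care about:
    • HEAD: points to the branch you currently have checked out
    • index: where Git stores the staging area information
    • objects: stores all the content of the database
    • refs: stores pointers into commit objects (branches, tags, remotes)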

2 - Git Objects

1 - Objects

  • Git is a content-addressable filesystem
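  • At its core it is a key-value store: the key is a SHA-1 checksum computed from the content, the value is the content
  • Any type of content can be inserted
  • git hash-object -w writes content to the database and prints its key (without -w it only computes the key)
  • git cat-file -p <sha> prints the content; git cat-file -t <sha> prints the object type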
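
Conceptually, hash-object -w and cat-file -p are the put and get of a key-value store. A minimal Python sketch of that idea, assuming an in-memory dict in place of .git/objects (ObjectStore is a hypothetical name, not part of Git); the key is computed the same way Git computes a blob ID:

```python
import hashlib
from typing import Dict


class ObjectStore:
    """Toy content-addressable store; a hypothetical stand-in for .git/objects."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        # Key = SHA-1 of "blob <size>\0" + content, the ID git hash-object prints
        key = hashlib.sha1(b"blob %d\x00" % len(content) + content).hexdigest()
        self._objects[key] = content  # identical content always lands on the same key
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]


store = ObjectStore()
key = store.put(b"test content\n")  # like: echo 'test content' | git hash-object -w --stdin
print(key)                          # d670460b4b4aece5915caf5c68d12f560a9fe3e4
print(store.get(key))               # b'test content\n'
```

Because the key depends only on the content, storing the same file twice costs nothing extra.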

2 - Tree Objects

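  • A tree object holds a list of entries, each with a mode, a SHA-1, and a file name (git cat-file -p also shows each entry's object type)
  • Entries point to blobs (file contents) or to other trees (subdirectories), so trees store the directory structure and file names while blobs store only content
  • A tree can be as small as a single file entry
  • git cat-file -p master^{tree} shows the tree that the last commit on the master branch points to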
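
To make the entry format concrete, here is a Python sketch that serializes a tree object the way Git lays it out on disk, assuming the raw format "<mode> <name>\0<20-byte SHA-1>" per entry; blob_id, tree_body and tree_id are hypothetical helper names, and the file contents are made up:

```python
import hashlib
from typing import List, Tuple

Entry = Tuple[str, str, bytes]  # (mode, name, raw 20-byte SHA-1)


def blob_id(content: bytes) -> bytes:
    """Raw 20-byte SHA-1 of a blob, computed like git hash-object."""
    return hashlib.sha1(b"blob %d\x00" % len(content) + content).digest()


def tree_body(entries: List[Entry]) -> bytes:
    """Serialize entries as "<mode> <name>\\0<20-byte SHA-1>" records.

    Git orders entries by name, comparing a subtree's name as if it ended in "/".
    """
    def order(entry: Entry) -> bytes:
        mode, name, _ = entry
        return (name + "/" if mode == "40000" else name).encode()

    return b"".join(
        mode.encode() + b" " + name.encode() + b"\x00" + sha
        for mode, name, sha in sorted(entries, key=order)
    )


def tree_id(body: bytes) -> str:
    return hashlib.sha1(b"tree %d\x00" % len(body) + body).hexdigest()


# One regular file (mode 100644) plus one subdirectory (mode 40000) holding a file
sub = tree_body([("100644", "lib.rb", blob_id(b"puts 'hi'\n"))])
root = tree_body([
    ("40000", "lib", bytes.fromhex(tree_id(sub))),
    ("100644", "lib.txt", blob_id(b"notes\n")),
])
print(tree_id(root))
print(tree_id(tree_body([])))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904, the empty tree
```

The subtree is referenced by its own tree ID, which is how nested directories are represented.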

3 - Commit Objects

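  • git commit-tree <tree-sha> creates a commit object that points to the given tree (add -p <parent-sha> to record a parent) and prints the new commit's SHA-1; the tree itself is usually built from the index with git write-tree
  • Commit objects specify the top-level tree for the snapshot, the parent commits (if any), the author and committer information (name, email, timestamp), a blank line, and finally the commit message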
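
The commit layout above is plain text, so it is easy to reproduce. A Python sketch, assuming a hypothetical identity line and using the empty tree as the snapshot; commit_body and object_id are hypothetical helper names:

```python
import hashlib
from typing import List


def commit_body(tree: str, parents: List[str], author: str, committer: str,
                message: str) -> bytes:
    """Lay out a commit object's content the way git cat-file -p shows it.

    author/committer are identity lines: "Name <email> <unix-seconds> <+hhmm>".
    """
    headers = [f"tree {tree}"]
    headers += [f"parent {p}" for p in parents]  # a root commit has no parent line
    headers.append(f"author {author}")
    headers.append(f"committer {committer}")
    # Headers, then one blank line, then the message
    return ("\n".join(headers) + "\n\n" + message).encode()


def object_id(kind: str, body: bytes) -> str:
    return hashlib.sha1(f"{kind} {len(body)}\0".encode() + body).hexdigest()


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
ident = "A U Thor <author@example.com> 1700000000 +0000"  # hypothetical identity
root = commit_body(EMPTY_TREE, [], ident, ident, "First commit\n")
child = commit_body(EMPTY_TREE, [object_id("commit", root)], ident, ident, "Second commit\n")
print(child.decode())
```

The second commit names the first by its ID, which is what links commits into history.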

4 - Object Storage

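  • Each object starts with a header: the object type, a space, the size of the content in bytes, and a null byte; the content follows
  • The SHA-1 is computed deterministically over the header plus the content
  • Finally the object (header and content) is compressed with zlib and written to .git/objects/<first 2 hex digits>/<remaining 38 hex digits>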
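
The storage steps can be sketched in Python, assuming git_dir is a scratch directory standing in for .git; write_object and read_object are hypothetical helpers. Real Git also writes through a temporary file, marks objects read-only, and skips objects that already exist, all omitted here:

```python
import hashlib
import os
import tempfile
import zlib
from typing import Tuple


def write_object(git_dir: str, kind: str, content: bytes) -> str:
    """Store content as a loose object and return its SHA-1 (simplified)."""
    data = f"{kind} {len(content)}\0".encode() + content  # header + content
    sha = hashlib.sha1(data).hexdigest()                  # the ID covers both
    # objects/<first 2 hex digits>/<remaining 38 hex digits>
    path = os.path.join(git_dir, "objects", sha[:2], sha[2:])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(data))                      # zlib-compressed on disk
    return sha


def read_object(git_dir: str, sha: str) -> Tuple[str, bytes]:
    """Inflate a loose object and split off its header."""
    with open(os.path.join(git_dir, "objects", sha[:2], sha[2:]), "rb") as f:
        data = zlib.decompress(f.read())
    header, _, content = data.partition(b"\0")
    kind, size = header.decode().split(" ")
    if int(size) != len(content):
        raise ValueError("corrupt object: size mismatch")
    return kind, content


with tempfile.TemporaryDirectory() as git_dir:
    sha = write_object(git_dir, "blob", b"what is up, doc?")
    print(sha, read_object(git_dir, sha))
```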

3 - Git References

References

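  • Git stores references (simple names for SHA-1 values) as files under .git/refs
  • git update-ref updates a reference safely, e.g. git update-ref refs/heads/master <sha>
  • A branch is just a reference to the latest commit of a line of work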
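
Since a branch is just a file holding a SHA-1, creating one amounts to writing that file. A Python sketch, assuming a scratch directory in place of .git and a made-up commit ID; update_ref and read_ref are hypothetical helpers that skip the locking, old-value check, and reflog entry that git update-ref adds:

```python
import os
import tempfile


def update_ref(git_dir: str, ref: str, sha: str) -> None:
    """Point ref (e.g. "refs/heads/master") at sha by writing the ref file."""
    path = os.path.join(git_dir, *ref.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(sha + "\n")


def read_ref(git_dir: str, ref: str) -> str:
    """Return the SHA-1 a loose ref points to (packed-refs is ignored here)."""
    with open(os.path.join(git_dir, *ref.split("/"))) as f:
        return f.read().strip()


HYPOTHETICAL_SHA = "0123456789abcdef0123456789abcdef01234567"
with tempfile.TemporaryDirectory() as git_dir:
    # The whole "branch" is this one small file
    update_ref(git_dir, "refs/heads/test", HYPOTHETICAL_SHA)
    print(read_ref(git_dir, "refs/heads/test"))
```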

The HEAD

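  • HEAD is a symbolic reference to the current branch: the file normally contains something like ref: refs/heads/master rather than a SHA-1
  • In the "detached HEAD" state it contains a commit SHA-1 directly
  • git symbolic-ref HEAD reads it, and git symbolic-ref HEAD refs/heads/test sets it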
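
Resolving HEAD therefore needs one extra hop in the normal case and none when detached. A Python sketch, assuming a scratch directory in place of .git and a made-up commit ID; resolve_head and write are hypothetical helpers that ignore packed-refs and unborn branches:

```python
import os
import tempfile
from typing import Optional, Tuple


def resolve_head(git_dir: str) -> Tuple[Optional[str], str]:
    """Return (branch ref, commit SHA-1); the ref is None when HEAD is detached."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        head = f.read().strip()
    if head.startswith("ref: "):
        ref = head[len("ref: "):]  # symbolic: follow it to the branch file
        with open(os.path.join(git_dir, *ref.split("/"))) as f:
            return ref, f.read().strip()
    return None, head  # detached: HEAD holds the SHA-1 itself


def write(path: str, text: str) -> None:
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)


SHA = "0123456789abcdef0123456789abcdef01234567"  # hypothetical commit ID
with tempfile.TemporaryDirectory() as git_dir:
    write(os.path.join(git_dir, "refs", "heads", "master"), SHA + "\n")
    write(os.path.join(git_dir, "HEAD"), "ref: refs/heads/master\n")
    print(resolve_head(git_dir))  # branch ref plus the commit it points to
    write(os.path.join(git_dir, "HEAD"), SHA + "\n")  # detach HEAD
    print(resolve_head(git_dir))  # no branch, just the commit
```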

Tags

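  • Lightweight tags are just a reference (a file under refs/tags) that points directly to an object, usually a commit, and never moves
  • Annotated tags create a tag object containing the tagger, date, message, and a pointer to the tagged object; the reference under refs/tags points to that tag object
  • Technically you can tag any object, not only commits
    • For instance, the Git source repository stores a maintainer's GPG public key as a blob and tags it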
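
An annotated tag object has a commit-like text layout. A Python sketch, assuming made-up blob content, tag name, and tagger identity (all hypothetical), tagging a blob to mirror the public-key example; tag_body and object_id are hypothetical helpers:

```python
import hashlib


def tag_body(target: str, target_type: str, name: str, tagger: str,
             message: str) -> bytes:
    """Content of an annotated tag object (tagger: "Name <email> <secs> <tz>")."""
    return (
        f"object {target}\n"
        f"type {target_type}\n"  # usually commit, but blob/tree/tag work too
        f"tag {name}\n"
        f"tagger {tagger}\n"
        f"\n"
        f"{message}"
    ).encode()


def object_id(kind: str, body: bytes) -> str:
    return hashlib.sha1(f"{kind} {len(body)}\0".encode() + body).hexdigest()


# Hypothetical blob (e.g. a public key file) and hypothetical tagger
target = object_id("blob", b"hypothetical public key bytes\n")
body = tag_body(target, "blob", "maintainer-pgp-pub",
                "A U Thor <author@example.com> 1700000000 +0000", "Maintainer key\n")
tag_sha = object_id("tag", body)
print(body.decode())
# refs/tags/maintainer-pgp-pub would hold tag_sha (the tag object), not target
```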

Remotes

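  • Stored under refs/remotes/<remote name>/
  • Record the last known state of each branch on that remote, as of your last fetch or push
  • Treated as read-only: you can check one out, but HEAD never points to it symbolically, so committing never updates it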

4 - Packfiles

1 - Packfiles

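  • git gc packs loose objects into a packfile; Git also packs on its own when there are too many loose objects and when you push to a remote server
  • While packing, Git looks for files that are similarly named and sized, stores one version in full and the others as deltas against it; usually the newest version is kept in full, since it is the one most likely to be read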
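
The delta idea (copy ranges from a base, insert only new bytes) can be sketched in Python with difflib. This is a swapped-in illustration of the concept, not Git's binary delta encoding or its delta-selection heuristics; make_delta and apply_delta are hypothetical names and the file versions are made up:

```python
from difflib import SequenceMatcher
from typing import List, Tuple, Union

Op = Union[Tuple[str, int, int], Tuple[str, bytes]]


def make_delta(base: bytes, target: bytes) -> List[Op]:
    """Describe target as instructions against base: copy a range or insert bytes."""
    ops: List[Op] = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))   # reuse bytes already in base
        elif j2 > j1:                           # "replace"/"insert": new bytes
            ops.append(("insert", target[j1:j2]))
        # "delete": those base bytes are simply never copied
    return ops


def apply_delta(base: bytes, ops: List[Op]) -> bytes:
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


v1 = b"line\n" * 100 + b"end\n"
v2 = b"line\n" * 100 + b"END\n" + b"more\n"
delta = make_delta(v2, v1)  # keep the newest version (v2) whole, store v1 as a delta
print(apply_delta(v2, delta) == v1, len(v1), delta)
```

The delta for v1 is a single large copy plus a few inserted bytes, far smaller than storing both versions in full.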

5 - The Refspec

1 - Refspec

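  • Maps references on the remote to local references; it is stored in .git/config in the [remote "origin"] section
  • Format is an optional +, then <src>:<dst>; the + tells Git to update the reference even if it is not a fast-forward
    • +refs/heads/*:refs/remotes/origin/* is the default fetch refspec for origin
  • Leaving off the source deletes the reference on the remote, e.g. git push origin :topic deletes the topic branch (git push origin --delete topic does the same)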
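
How a single-glob refspec maps names can be sketched in Python; parse_refspec and map_ref are hypothetical helpers that only handle one "*" per side, which covers the default origin refspec but not every refspec Git accepts:

```python
from typing import Optional, Tuple


def parse_refspec(spec: str) -> Tuple[bool, str, str]:
    """Split "[+]<src>:<dst>" into (force, src, dst)."""
    force = spec.startswith("+")
    src, dst = (spec[1:] if force else spec).split(":", 1)
    return force, src, dst


def map_ref(spec: str, ref: str) -> Optional[str]:
    """Return the destination ref that `ref` maps to under spec, or None."""
    _, src, dst = parse_refspec(spec)
    if "*" not in src:
        return dst if ref == src else None
    prefix, suffix = src.split("*", 1)
    if (ref.startswith(prefix) and ref.endswith(suffix)
            and len(ref) >= len(prefix) + len(suffix)):
        matched = ref[len(prefix):len(ref) - len(suffix)]  # the part "*" matched
        return dst.replace("*", matched, 1)
    return None


default = "+refs/heads/*:refs/remotes/origin/*"
print(map_ref(default, "refs/heads/master"))  # refs/remotes/origin/master
print(parse_refspec(":refs/heads/topic"))     # (False, '', 'refs/heads/topic'): empty source
```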

6 - Transfer Protocols

1 - Dumb Protocol

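  • Used when the server only serves the repository as static files, with no Git-specific code on the server; in practice it is used for read-only access such as unauthenticated clones
  • The client starts by fetching info/refs, which is generated by git update-server-info (typically run from a hook after every push)
  • Everything is fetched with plain HTTP GET requests, which is simple but inefficient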
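
The dumb client's first step is easy to sketch in Python, assuming info/refs holds one "<sha1><TAB><refname>" line per ref; parse_info_refs and fetch_info_refs are hypothetical names, repo_url would be a real plain-HTTP repository URL, and the sample IDs are made up:

```python
import urllib.request
from typing import Dict


def parse_info_refs(text: str) -> Dict[str, str]:
    """Map ref name -> SHA-1 from an info/refs file."""
    refs: Dict[str, str] = {}
    for line in text.splitlines():
        if line.strip():
            sha, name = line.split("\t", 1)
            refs[name] = sha
    return refs


def fetch_info_refs(repo_url: str) -> Dict[str, str]:
    """GET <repo_url>/info/refs with a plain HTTP request, as a dumb client does first."""
    with urllib.request.urlopen(repo_url.rstrip("/") + "/info/refs") as resp:
        return parse_info_refs(resp.read().decode())


# Hypothetical file content
sample = ("a" * 40) + "\trefs/heads/master\n" + ("b" * 40) + "\trefs/tags/v1.0\n"
print(parse_info_refs(sample))
```

From there the client keeps issuing GETs for individual objects (or packs) by path, which is why the protocol needs nothing but a static file server.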

2 - Smart Protocol

  • Uploading (push): send-pack runs on the client and connects to receive-pack on the remote
    • Works over SSH and also over HTTP(S)
  • Downloading (fetch/clone): fetch-pack runs on the client and connects to upload-pack on the remote
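
The ref advertisements and commands these processes exchange are framed as pkt-lines: four hex digits giving the total length (including those four bytes), then the payload, with "0000" (flush-pkt) ending a section. A Python sketch of that framing; pkt_line and read_pkt_lines are hypothetical names and the advertised ref is made up:

```python
from typing import List, Optional

FLUSH_PKT = b"0000"  # zero-length packet: "end of this section"
MAX_PKT_LEN = 65520  # largest pkt-line, including the 4-byte length prefix


def pkt_line(payload: bytes) -> bytes:
    """Prefix payload with its total length (payload + 4) as 4 lowercase hex digits."""
    length = len(payload) + 4
    if length > MAX_PKT_LEN:
        raise ValueError("payload too large for a single pkt-line")
    return b"%04x" % length + payload


def read_pkt_lines(stream: bytes) -> List[Optional[bytes]]:
    """Split a stream into payloads; a flush-pkt is returned as None."""
    out: List[Optional[bytes]] = []
    pos = 0
    while pos < len(stream):
        length = int(stream[pos:pos + 4], 16)
        if length < 4:  # flush-pkt (and other special packets) carry no payload
            out.append(None)
            pos += 4
        else:
            out.append(stream[pos + 4:pos + length])
            pos += length
    return out


# Hypothetical ref advertisement: one ref line, then a flush-pkt
msg = pkt_line(b"%s refs/heads/master\n" % (b"a" * 40)) + FLUSH_PKT
print(msg[:4])  # b'003f'
print(read_pkt_lines(msg))
```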

7 - Maintenance and Data Recovery

1 - Maintenance

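  • git gc --auto only does something when there are enough loose objects or packfiles
    • Configurable via gc.auto (default 6700 loose objects) and gc.autoPackLimit (default 50 packfiles)
  • gc gathers loose objects into packfiles, consolidates packfiles into one large packfile, removes unreachable objects that are old enough, and packs refs into .git/packed-refs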
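
A rough Python model of the gc --auto decision, assuming the default thresholds above; count_loose_objects, count_packs and gc_auto_would_run are hypothetical names. It is simplified: Git itself estimates the loose-object count by sampling a single objects/xx directory rather than counting them all, and it ignores packs marked to be kept:

```python
import os
import tempfile

GC_AUTO = 6700           # default gc.auto (loose objects)
GC_AUTO_PACK_LIMIT = 50  # default gc.autoPackLimit (packfiles)


def count_loose_objects(git_dir: str) -> int:
    objects = os.path.join(git_dir, "objects")
    total = 0
    for name in os.listdir(objects):
        subdir = os.path.join(objects, name)
        # Loose objects live in two-hex-digit fan-out directories like objects/d6/
        if len(name) == 2 and os.path.isdir(subdir):
            total += len(os.listdir(subdir))
    return total


def count_packs(git_dir: str) -> int:
    pack_dir = os.path.join(git_dir, "objects", "pack")
    if not os.path.isdir(pack_dir):
        return 0
    return sum(1 for name in os.listdir(pack_dir) if name.endswith(".pack"))


def gc_auto_would_run(git_dir: str, loose_limit: int = GC_AUTO,
                      pack_limit: int = GC_AUTO_PACK_LIMIT) -> bool:
    return (count_loose_objects(git_dir) > loose_limit
            or count_packs(git_dir) > pack_limit)


with tempfile.TemporaryDirectory() as git_dir:
    os.makedirs(os.path.join(git_dir, "objects", "ab"))
    for i in range(3):  # three fake loose objects
        open(os.path.join(git_dir, "objects", "ab", f"{i:038d}"), "w").close()
    print(gc_auto_would_run(git_dir), gc_auto_would_run(git_dir, loose_limit=2))
```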

2 - Data Recovery

  • git reflog shows a local log of every value HEAD has had (each branch has a reflog too)
  • You can use this to find the hash of a lost commit
  • git branch <name> <hash you want to save> creates a branch pointing at the lost commit, making it reachable again
  • If the reflog is gone too (e.g. .git/logs was deleted), git fsck --full lists objects that nothing points to, such as dangling commits
    • Unreachable objects are eventually pruned by gc, so recover lost commits before that happens
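
What fsck reports as a dangling commit can be modeled as a graph question. A toy Python sketch, assuming history is given as a dict of commit -> parent list with short made-up IDs; reachable and dangling_commits are hypothetical helpers, and real fsck also checks trees, blobs and tags and treats reflog entries as references:

```python
from typing import Dict, Iterable, List, Set


def reachable(parents: Dict[str, List[str]], tips: Iterable[str]) -> Set[str]:
    """All commits reachable from the given tips by following parent links."""
    seen: Set[str] = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def dangling_commits(parents: Dict[str, List[str]], tips: Iterable[str]) -> Set[str]:
    """Unreachable commits that no other commit names as a parent (tips of lost work)."""
    lost = set(parents) - reachable(parents, tips)
    named_as_parent = {p for ps in parents.values() for p in ps}
    return lost - named_as_parent


# Hypothetical history: a <- b <- c is on master; d <- e was lost after a reset
history = {"a": [], "b": ["a"], "c": ["b"], "d": ["b"], "e": ["d"]}
print(dangling_commits(history, tips=["c"]))  # {'e'}
```

Recovering "e" with git branch brings "d" back along with it, since "d" is its parent.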

3 - Removing Objects

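  • git filter-branch can rewrite history to remove a file from every commit, e.g. git filter-branch --index-filter 'git rm --ignore-unmatch --cached <file>' -- <first bad commit>^..
  • First you have to find it: git count-objects -v shows how much space is used, git verify-pack -v on the pack's .idx file lists objects with their sizes, git rev-list --objects --all maps the SHA-1 to a file name, and git log --oneline --branches -- <file> shows the commits that touched it, so you know where to start rewriting
  • Afterwards remove the reflog and the refs/original backup refs and run git gc so the object is actually dropped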
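
Picking the largest objects out of git verify-pack -v output can be scripted. A Python sketch, assuming object lines of the form "<sha1> <type> <size> <size-in-pack> <offset> [<depth> <base-sha1>]"; largest_objects is a hypothetical helper and the sample lines are made up:

```python
from typing import List, Tuple


def largest_objects(verify_pack_output: str, n: int = 3) -> List[Tuple[int, str, str]]:
    """Return the n largest objects as (size, sha1, type) from verify-pack -v output."""
    found = []
    for line in verify_pack_output.splitlines():
        fields = line.split()
        # Object lines have at least 5 fields and a known object type;
        # summary lines such as "non delta: 2 objects" are skipped
        if len(fields) >= 5 and fields[1] in ("commit", "tree", "blob", "tag"):
            found.append((int(fields[2]), fields[0], fields[1]))
    return sorted(found, reverse=True)[:n]


# Hypothetical output; real input could come from something like
# subprocess.run(["git", "verify-pack", "-v", idx_path], capture_output=True, text=True).stdout
sample = "\n".join([
    f"{'a' * 40} commit 223 151 12",
    f"{'b' * 40} blob 10 19 163 1 {'a' * 40}",
    f"{'c' * 40} blob 2056716 2056872 5400",
    "non delta: 2 objects",
])
print(largest_objects(sample, n=1))
```

The SHA-1 it returns is what you then feed to git rev-list --objects --all to find the file name.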

8 - Environment Variables

  • Not super useful right now, but bookmark the page for when it comes in handy
  • The debugging variables (GIT_TRACE, GIT_TRACE_PACKET, GIT_TRACE_PERFORMANCE, GIT_TRACE_SETUP, GIT_CURL_VERBOSE) look especially useful
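
A small Python helper for running git from a script with tracing turned on, assuming git is on PATH when it is actually called; traced_env and run_git_traced are hypothetical names. With these variables set to 1, Git writes its trace output to stderr:

```python
import os
import subprocess
from typing import Dict, List, Optional


def traced_env(extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy of the current environment with Git's trace variables switched on."""
    env = dict(os.environ)
    env["GIT_TRACE"] = "1"              # commands, aliases, sub-processes
    env["GIT_TRACE_PACKET"] = "1"       # pkt-lines exchanged over the network
    env["GIT_TRACE_PERFORMANCE"] = "1"  # timing information
    env.update(extra or {})
    return env


def run_git_traced(args: List[str], cwd: str = ".") -> subprocess.CompletedProcess:
    """Run git with tracing; the trace ends up in the result's stderr."""
    return subprocess.run(["git", *args], cwd=cwd, env=traced_env(),
                          capture_output=True, text=True)
```

For example, run_git_traced(["fetch"]).stderr inside an existing repository would show the commands and packets exchanged during the fetch.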
