Unnecessarily large repo size #4

@wang-boyu

The repo is 52.4 MB after cloning. I ran the following script to find the largest objects:

#!/bin/bash
#set -x 

# Shows you the largest objects in your repo's pack file.
# Written for osx.
#
# @see http://stubbisms.wordpress.com/2009/07/10/git-script-to-show-largest-pack-objects-and-trim-your-waist-line/
# @author Antony Stubbs

# set the internal field separator to line break, so that we can iterate easily over the verify-pack output
IFS=$'\n';

# list all objects including their size, sort by size, take top 10
objects=`git verify-pack -v .git/objects/pack/pack-*.idx | grep -v chain | sort -k3nr | head`

echo "All sizes are in kB. The pack column is the size of the object, compressed, inside the pack file."

output="size,pack,SHA,location"
for y in $objects
do
  # extract the size in bytes
  size=$((`echo $y | cut -f 5 -d ' '`/1024))
  # extract the compressed size in bytes
  compressedSize=$((`echo $y | cut -f 6 -d ' '`/1024))
  # extract the SHA
  sha=`echo $y | cut -f 1 -d ' '`
  # find the object's location in the repository tree
  other=`git rev-list --all --objects | grep $sha`
  #lineBreak=`echo -e "\n"`
  output="${output}\n${size},${compressedSize},${other}"
done

echo -e $output | column -t -s ', '

I got this result:

All sizes are in kB. The pack column is the size of the object, compressed, inside the pack file.
size   pack   SHA                                       location
60584  23291  91ea84276e189380734fd7026b3aa0c4085896eb  inst/extdata/test_sample_new.csv
26684  23873  9a2c64b93a8876dd9a86842443d7c738e5844fa8  demo/demo.mov
1560   384    bc97a271465a1631f55fdcc0a20f8676bdba87c2  inst/extdata/test_sample.csv
889    522    ca0506ace8c63adee718c07d04b0eca76a596390  data-raw/MP14_SUBZONE_NO_SEA_PL.shp
160    160    b6a98628c50aabb54d91086fc79c6a1aae30b9ea  data/test_sample.rda
130    106    53f5ac9221c469209435912350585c8597d0f60a  data/pic/sample-result-freq.png
129    106    c3fbb0b68517fcb47cc9ab1c591400daf39f0589  data/pic/sample-result-osna.png
129    106    cd81c57cfa9cc11d99ef198e94c080c0cef40984  data/pic/sample-result-hmlc.png
116    102    22f17f8ff11fe5d0573220694b23b09740b53a4e  data/pic/sample-result-apdm.png
106    19     133851f70190cefa4802455ea871a35c19aefc8d  docs/articles/homelocator.html
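
For reference, a similar listing can probably be produced without parsing the `verify-pack` output. The sketch below is only an assumption about how one might do it: `largest_blobs` is a hypothetical helper name, and it prints uncompressed sizes in bytes (not kB, and not the compressed pack size shown above).

```shell
#!/bin/bash
# Hypothetical helper: list the N largest blobs reachable from any ref.
# Output columns: SHA, uncompressed size in bytes, path.
largest_blobs() {
  local count="${1:-10}"
  # rev-list prints "<sha> <path>"; cat-file looks up each object and
  # echoes the path back through %(rest).
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    sed -n 's/^blob //p' |
    sort -k2,2nr |
    head -n "$count"
}
```

Running `largest_blobs 10` inside the clone should give roughly the same ranking as the script, since it walks the whole history rather than only the current checkout.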

Apparently there are two large files hidden in the commit history (added and later deleted):

  • inst/extdata/test_sample_new.csv
  • demo/demo.mov
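
To double-check that a path really was added and then deleted, something like the following might help. It is a minimal sketch: `path_history` and `in_current_tree` are hypothetical helper names, and the check assumes the current state of interest is `HEAD`.

```shell
#!/bin/bash
# Hypothetical helper: show the commits on any ref that added (A) or
# deleted (D) the given path.
path_history() {
  git log --all --diff-filter=AD --name-status \
    --date=short --format='%h %ad %s' -- "$1"
}

# Hypothetical helper: succeeds only if the path exists in the HEAD commit.
in_current_tree() {
  git cat-file -e "HEAD:$1" 2>/dev/null
}
```

For example, `path_history demo/demo.mov` should list one `A` and one `D` entry, and `in_current_tree demo/demo.mov` should fail if the file now lives only in the history.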

If they are not used anywhere, I would suggest removing them completely with these commands:

git filter-repo --invert-paths --path inst/extdata/test_sample_new.csv --force
git filter-repo --invert-paths --path demo/demo.mov --force
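
The two invocations can likely be combined into a single pass, since `git filter-repo` accepts several `--path` options and `--invert-paths` then drops every listed path. A sketch, assuming `git-filter-repo` is installed and the working directory is the repo to rewrite (`purge_large_files` is a hypothetical wrapper name):

```shell
#!/bin/bash
# Hypothetical wrapper: remove both files from every commit in one rewrite.
# --force is kept from the commands above; it lets filter-repo run on a
# repository that is not a fresh clone.
purge_large_files() {
  git filter-repo --invert-paths \
    --path inst/extdata/test_sample_new.csv \
    --path demo/demo.mov \
    --force
}
```

Trying this on a throwaway clone first seems safer, because the rewrite is not easily undone.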

The repo size gets reduced from 52.4 MB to 3.8 MB.
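
One way to compare the size before and after is `git count-objects`. This is only a sketch: `repo_pack_size` is a hypothetical helper, and the number it reports (packed object size) may differ slightly from the on-disk figures quoted above, depending on how those were measured.

```shell
#!/bin/bash
# Hypothetical helper: print the total size of the pack files in
# human-readable units, as reported by git itself.
repo_pack_size() {
  git count-objects -vH | awk -F': ' '$1 == "size-pack" { print $2 }'
}
```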

However, this rewrites the entire commit history, so the result needs to be force-pushed by a package maintainer.
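
For the maintainer, the publish step could look roughly like this. It is a sketch under assumptions: `republish` is a hypothetical helper, the remote URL is passed in as its only argument, and no `origin` remote is configured at that point (`git filter-repo` removes it as a safety measure).

```shell
#!/bin/bash
# Hypothetical helper: re-add the remote and overwrite its branches and
# tags with the rewritten history.
republish() {
  git remote add origin "$1" &&
    git push --force --all origin &&
    git push --force --tags origin
}
```

Existing clones and forks would still contain the old history, so contributors would probably need to re-clone rather than pull.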
