Optimizing Large Tracked Files in Git
Summary: Author: 张亚飞 | Read Time: 4 minute read | Published: 2020-03-15
Categories: Linux
Tags: Note, Git large-file cleanup
- Fixing an oversized .git/objects/pack directory caused by committing large files
- Consider cleaning up the .git folder to reduce the large repo size
- The .git directory is too large
- tkersey/Remove a file from git repo (when the simple way doesn’t work)
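Preparation
First, make a fresh git clone to serve as a backup of the code, then create local branches for all the remote branches (a clone only checks out master by default). If there are many branches, the following script does it: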
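# Create a local tracking branch for every remote branch except HEAD and master.
# Note: grep -v master also skips any branch whose name contains "master".
for branch in `git branch -a | grep remotes | grep -v HEAD | grep -v master`; do
    git branch --track "${branch#remotes/origin/}" "$branch"
done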
On the master branch, check the repository's pack file statistics:
~/Server/Run/run_s
$ git count-objects -v
count: 4
size: 16
in-pack: 4853
packs: 1
size-pack: 202900
prune-packable: 0
garbage: 0
size-garbage: 0
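The size-pack value is reported in KiB, so the 202900 above is roughly 198 MiB. A small helper can do the conversion. This is only a minimal sketch, and the function name pack_size_mib is made up for illustration.

```shell
# Hypothetical helper: read `git count-objects -v` output on stdin
# and print the size-pack value (reported in KiB) converted to MiB.
pack_size_mib() {
    awk '$1 == "size-pack:" { printf "%.1f\n", $2 / 1024 }'
}

# Usage inside a repository:
#   git count-objects -v | pack_size_mib
```

Running it before and after the cleanup gives a quick way to see how much the pack actually shrank.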
List the 10 largest tracked objects:
~/Server/Run/run_s
$ git verify-pack -v .git/objects/pack/*.idx | sort -k 3 -n | tail -10 | awk '{print$1}'
4f32918e869c88a0ed27c46c049efeeba3a9a22f
81ddc977fa01011591835e02e586130b2cb94c45
5bc41077714de997f1218101aba8c21652652c35
cccbff7856a3e7bb07b1e23fd41fd5d31b0eda51
2f0e524c945a2761474f3056a4c4576b8c47bef9
6693640a258f705e0a438c619a5c34690477fcdf
7fd1e6c16f85dce01467b4cb3aacbeba98048d02
4fd0e5fc180e9449fa360f934776ef2d3de20d14
2819a65f19bd420413b4448db91539c82b4b3a5d
18b637e60418bbdf04e9b3e3defcf2a703fdfb4c
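The hash list alone does not show how large each object is. As a sketch, the same verify-pack output can be filtered down to blob entries and printed together with their sizes. The function name largest_blobs is hypothetical, and it relies on column 3 being the object size in bytes, the same column the sort -k 3 above uses.

```shell
# Hypothetical helper: read `git verify-pack -v` output on stdin and
# print "<size-in-bytes> <hash>" for the N largest blobs (default 10).
largest_blobs() {
    n=${1:-10}
    # Keep only blob lines; trees, commits and summary lines are skipped.
    awk '$2 == "blob" { print $3, $1 }' | sort -n | tail -n "$n"
}

# Usage inside a repository:
#   git verify-pack -v .git/objects/pack/*.idx | largest_blobs 10
```

Filtering on the object type also keeps the summary lines that verify-pack prints at the end out of the sort.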
Map the object hashes to file paths:
~/Server/Run/run_s
$ git rev-list --objects --all | grep "`git verify-pack -v .git/objects/pack/*.idx | sort -k 3 -n | tail -10 | awk '{print$1}'`"
2f0e524c945a2761474f3056a4c4576b8c47bef9 Service/bin/qshell
4f32918e869c88a0ed27c46c049efeeba3a9a22f System/Users/coam/Library/Rime/pinyin_simp.dict.yaml
cccbff7856a3e7bb07b1e23fd41fd5d31b0eda51 Service/bin/Baidu-Login/Baidu-Login
4fd0e5fc180e9449fa360f934776ef2d3de20d14 Service/bin/BaiduPCS-Go
7fd1e6c16f85dce01467b4cb3aacbeba98048d02 docker/_/source/cmake/cmake-3.16.2.tar.gz
2819a65f19bd420413b4448db91539c82b4b3a5d docker/_/source/erlang/otp_src_22.0.tar.gz
18b637e60418bbdf04e9b3e3defcf2a703fdfb4c docker/_/source/erlang/otp_src_22.2.tar.gz
6693640a258f705e0a438c619a5c34690477fcdf docker/_/source/git/v2.24.1.tar.gz
81ddc977fa01011591835e02e586130b2cb94c45 System/Users/coam/Library/Rime/_/build/cangjie5.prism.bin
5bc41077714de997f1218101aba8c21652652c35 System/Users/coam/Library/Rime/_/build/stroke.prism.bin
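The grep in this step relies on a multi-line pattern built from the hash list. An alternative sketch does the same join with awk by loading the hashes from a file first. The function name paths_for_hashes and the file argument are assumptions for illustration.

```shell
# Hypothetical helper: $1 is a file with one object hash per line;
# stdin is `git rev-list --objects --all` output ("<hash> <path>").
# Prints only the lines whose hash appears in the file.
paths_for_hashes() {
    awk 'NR == FNR { want[$1] = 1; next } ($1 in want)' "$1" -
}

# Usage inside a repository:
#   git verify-pack -v .git/objects/pack/*.idx | sort -k 3 -n | tail -10 | awk '{print $1}' > /tmp/hashes.txt
#   git rev-list --objects --all | paths_for_hashes /tmp/hashes.txt
```

Because the match is on the exact first field, a hash that happens to appear inside a path cannot produce a false hit.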
Note: if filtering the output of git rev-list --objects --all turns out to drop many stale records, run git gc --prune=now before the command to clean up invalid local objects.
Extract just the file names:
~/Server/Run/run_s
$ git rev-list --objects --all | grep "`git verify-pack -v .git/objects/pack/*.idx | sort -k 3 -n | tail -10 | awk '{print$1}'`" | awk '{print $2}'
Service/bin/qshell
System/Users/coam/Library/Rime/pinyin_simp.dict.yaml
Service/bin/Baidu-Login/Baidu-Login
Service/bin/BaiduPCS-Go
docker/_/source/cmake/cmake-3.16.2.tar.gz
docker/_/source/erlang/otp_src_22.0.tar.gz
docker/_/source/erlang/otp_src_22.2.tar.gz
docker/_/source/git/v2.24.1.tar.gz
System/Users/coam/Library/Rime/_/build/cangjie5.prism.bin
System/Users/coam/Library/Rime/_/build/stroke.prism.bin
Replace the newlines with spaces to join the names into a single line, and write it to a file listing what to remove:
~/Server/Run/run_s
$ git rev-list --objects --all | grep "`git verify-pack -v .git/objects/pack/*.idx | sort -k 3 -n | tail -5 | awk '{print$1}'`" | awk '{print $2}' | tr '\n' ' ' > _/git-trace-large-files-inline.txt
$ cat _/git-trace-large-files-inline.txt
Service/bin/BaiduPCS-Go docker/_/source/cmake/cmake-3.16.2.tar.gz docker/_/source/erlang/otp_src_22.0.tar.gz docker/_/source/erlang/otp_src_22.2.tar.gz docker/_/source/git/v2.24.1.tar.gz
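Joining the names with spaces only works if none of the paths contains whitespace, because the list is later expanded unquoted on the git rm command line. A minimal pre-check, with the hypothetical name has_unsafe_paths, might look like this:

```shell
# Hypothetical helper: read one path per line on stdin.
# Prints every path that contains whitespace and returns 0 if any was found,
# so the caller can stop before building the space-joined list.
has_unsafe_paths() {
    grep '[[:space:]]'
}

# Usage: feed it the newline-separated names (before the tr step), e.g.
#   ... | awk '{print $2}' | has_unsafe_paths && echo "fix these paths first"
```

None of the paths in this repository contains a space, so the inline list above is safe here.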
Remove all the large files from history:
~/Server/Run/run_s
$ git filter-branch -f --prune-empty --index-filter "git rm -rf --cached --ignore-unmatch `cat _/git-trace-large-files-inline.txt`" --tag-name-filter cat -- --all
...
WARNING: Ref 'refs/heads/master' is unchanged
Ref 'refs/heads/master-2020-03-15' was rewritten
Ref 'refs/heads/os-MacPro.local' was rewritten
Ref 'refs/heads/os-a.us.1' was rewritten
Ref 'refs/heads/os-a.us.2' was rewritten
Ref 'refs/heads/os-m.us.1' was rewritten
Ref 'refs/heads/os-t.cs.1' was rewritten
Ref 'refs/heads/os-t.cs.2' was rewritten
Ref 'refs/heads/os-t.cs.3' was rewritten
Ref 'refs/heads/os-v.cs.1' was rewritten
Ref 'refs/heads/test' was rewritten
WARNING: Ref 'refs/remotes/origin/master' is unchanged
Ref 'refs/remotes/origin/master-2020-03-15' was rewritten
Ref 'refs/remotes/origin/os-MacPro.local' was rewritten
Ref 'refs/remotes/origin/os-a.us.1' was rewritten
Ref 'refs/remotes/origin/os-a.us.2' was rewritten
Ref 'refs/remotes/origin/os-m.us.1' was rewritten
Ref 'refs/remotes/origin/os-t.cs.1' was rewritten
Ref 'refs/remotes/origin/os-t.cs.2' was rewritten
Ref 'refs/remotes/origin/os-t.cs.3' was rewritten
Ref 'refs/remotes/origin/os-v.cs.1' was rewritten
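The WARNING lines mark refs that filter-branch left as they were. To pull those out of a long log, something like the following sketch could be used. It assumes the message format shown above, and the function name unchanged_refs is hypothetical.

```shell
# Hypothetical helper: read git filter-branch output on stdin and print
# the names of the refs reported as unchanged, one per line.
unchanged_refs() {
    sed -n "s/^WARNING: Ref '\(.*\)' is unchanged\$/\1/p"
}

# Usage (2>&1 in case the warnings are written to stderr):
#   git filter-branch ... 2>&1 | unchanged_refs
```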
Force-push to the remote:
$ git push origin --force --all
Enumerating objects: 4783, done.
Counting objects: 100% (4783/4783), done.
Delta compression using up to 8 threads
Compressing objects: 100% (2533/2533), done.
Writing objects: 100% (4783/4783), 56.16 MiB | 3.88 MiB/s, done.
Total 4783 (delta 2755), reused 3394 (delta 1967)
remote: Resolving deltas: 100% (2755/2755), done.
To e.coding.net:coam/Run.run_s.git
+ 06a671e...0c0ee6b master-2020-03-15 -> master-2020-03-15 (forced update)
+ e2ca5d5...0c6eb50 os-MacPro.local -> os-MacPro.local (forced update)
+ 28a6d0e...8d87436 os-a.us.1 -> os-a.us.1 (forced update)
+ 226587c...66e1b22 os-a.us.2 -> os-a.us.2 (forced update)
+ 63f5b55...9e89fa2 os-m.us.1 -> os-m.us.1 (forced update)
+ da981dc...60031f1 os-t.cs.1 -> os-t.cs.1 (forced update)
+ 33c5e41...565c126 os-t.cs.2 -> os-t.cs.2 (forced update)
+ 5bbd5bc...e6d9289 os-t.cs.3 -> os-t.cs.3 (forced update)
+ aeccac9...7219cb6 os-v.cs.1 -> os-v.cs.1 (forced update)
+ 6ffe2fc...10e3655 test -> test (forced update)
After pushing, I tested again with a fresh git clone and found that the project's .git directory was still large. Running the following resolved it:
git push origin --force --all --prune
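With --prune, git push also removes remote branches that have no local counterpart. To preview which branches that would affect, one option is to compare the two branch lists first. This sketch assumes both lists are plain branch names in files, and the function name remote_only_branches is hypothetical.

```shell
# Hypothetical helper: $1 = file of local branch names, $2 = file of remote
# branch names (one per line, without the "origin/" prefix).
# Prints the branches that exist only on the remote.
remote_only_branches() {
    sort "$1" > "$1.sorted"
    sort "$2" > "$2.sorted"
    # comm -13 keeps only the lines unique to the second (remote) list.
    comm -13 "$1.sorted" "$2.sorted"
    rm -f "$1.sorted" "$2.sorted"
}

# Usage inside a repository (remote list reflects the last fetch):
#   git for-each-ref --format='%(refname:lstrip=2)' refs/heads > /tmp/local.txt
#   git for-each-ref --format='%(refname:lstrip=3)' refs/remotes/origin | grep -v '^HEAD$' > /tmp/remote.txt
#   remote_only_branches /tmp/local.txt /tmp/remote.txt
```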
Check the largest files again:
~/Server/Run/run_s
$ git rev-list --objects --all | grep "`git verify-pack -v .git/objects/pack/*.idx | sort -k 3 -n | tail -100 | awk '{print$1}'`"
5f6a4705b9d57572961341a987ad628c59b80154 System/data/home/coam/.dir_colors.ls
7d231b15f895a06f03f85f02f7e3647670308782 test/data/qcs/qcs_test_file.jpg
94cca9a331ed0e7e93a3357868bbd54b8650ed11 System/data/opt/elk/filebeat/fields.yml
c83f88a65c2b031445e46862657e347ce01e4035 System/data/opt/elk/filebeat/filebeat.reference.yml
a243c4ec12b45eb4f95b599f7a35ef97dcef1bae System/data/opt/elk/filebeat/filebeat.yml
b55fefe6d97f9c84a1c9afcf78c630c1c5e4a544 Server/WinScripts/7z.exe
29e42c8b6c5712e02f1aed9e8ea734e9ec9604c3 Server/WinScripts/junction.exe
e9dc44dfe99d067be97125be62f366ba4b579181 Server/WinScripts/unzip.exe
ebfc046e0d2c2facb8b4707b91043a4cc266587f System/Users/coam/Library/Rime/_/sync/f48ce424-c011-465b-bab7-fc44bd523abf/symbols.yaml
3ee2bf6904f90da07895f7fdd779f37e80eaa7de System/Users/coam/Library/Rime/squirrel_config/opencc/emoji_category.txt
0aed42f99dac1ae0d1b6a0e6fcf0f00673fa8a53 System/Users/coam/Library/Rime/squirrel_config/opencc/emoji_word.txt
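Many files were still present. Switching to the other branches and manually removing the tracked file test/data/qcs/qcs_test_file.jpg resolved the leftovers.
Because there are many branches, I wrote a script to automate the cleanup: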
~/Server/Run/run_s
# Note: filter-branch with "-- --all" rewrites every ref on each pass; $branch is only used for logging here.
for branch in `git branch -a | grep remotes | grep -v HEAD | grep -v master`; do
    #git branch --track ${branch#remotes/origin/} $branch
    echo "Processing branch: $branch"
    echo "Querying file list"
    git rev-list --objects --all | grep "`git verify-pack -v .git/objects/pack/*.idx | sort -k 3 -n | tail -100 | awk '{print$1}'`" | awk '{print $2}'
    echo "Exporting file list"
    git rev-list --objects --all | grep "`git verify-pack -v .git/objects/pack/*.idx | sort -k 3 -n | tail -100 | awk '{print$1}'`" | awk '{print $2}' | tr '\n' ' ' > _/git-trace-large-files-inline.txt
    echo "Cleaning file list"
    git filter-branch -f --prune-empty --index-filter "git rm -rf --cached --ignore-unmatch `cat _/git-trace-large-files-inline.txt`" --tag-name-filter cat -- --all
done
After the cleanup, syncing from other checkouts of the project failed:
$ git pull origin master --force
From e.coding.net:coam/Run.run_s
* branch master -> FETCH_HEAD
fatal: refusing to merge unrelated histories
Merging other branches failed with the same error, fatal: refusing to merge unrelated histories. Add the --allow-unrelated-histories option when merging:
git merge master --allow-unrelated-histories
Clean up files
#git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
#git reflog expire --expire=now --all
#git gc --prune=now
Remove all untracked files
#git clean -fdx
List the files tracked by git
#gls=$(git ls-tree -r master --name-only | grep $line)
#gls=$(git ls-tree -r HEAD~1 --name-only | grep $line)
#gls=$(git log --pretty=format: --name-only --diff-filter=A | sort -u | grep $line)
#[ ! -z "$gls" ] && ebc_info "[$line] exists!..." && continue