Optimizing Large Tracked Files in Git

Author: 张亚飞 | Reading time: 4 minutes | Published: 2020-03-15
Categories: Linux | Tags: Note

Cleaning Up Large Files in Git


Preparation

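Make a fresh git clone to serve as a backup of the code, then pull every remote branch into a local branch (the clone is on master by default). If there are many branches, the following script does this: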

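# Create a local tracking branch for every remote branch except HEAD and master.
# Note: `grep -v master` also skips any branch whose name contains "master".
for branch in `git branch -a | grep remotes | grep -v HEAD | grep -v master `; do
    git branch --track "${branch#remotes/origin/}" "$branch"
done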
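
The loop above parses the display output of git branch -a. As a hedged alternative, the sketch below filters full ref names instead, so only HEAD and master themselves are skipped (a substring match such as grep -v master would also skip a branch like master-2020-03-15). The helper name remote_to_local is made up for illustration, and the remote is assumed to be called origin.

```shell
# Hypothetical helper: read full remote ref names on stdin and print the
# local branch names to create, skipping origin/HEAD and origin/master.
remote_to_local() {
    while IFS= read -r ref; do
        case "$ref" in
            refs/remotes/origin/HEAD|refs/remotes/origin/master) continue ;;
            refs/remotes/origin/*) printf '%s\n' "${ref#refs/remotes/origin/}" ;;
        esac
    done
}

# Usage (run inside the clone):
#   git for-each-ref --format='%(refname)' refs/remotes/origin |
#       remote_to_local |
#       while IFS= read -r name; do
#           git branch --track "$name" "origin/$name"
#       done
```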

On the master branch, inspect the repository's pack file statistics

~/Server/Run/run_s

$ git count-objects -v
count: 4
size: 16
in-pack: 4853
packs: 1
size-pack: 202900
prune-packable: 0
garbage: 0
size-garbage: 0
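
git count-objects reports size-pack in KiB, so the 202900 above is roughly 198 MiB of packed data. The sketch below does that conversion; pack_size_mib is an illustrative name, not a Git command. Running git count-objects -vH instead prints human-readable sizes directly.

```shell
# Illustrative helper: read `git count-objects -v` output on stdin and
# print the pack size in MiB (size-pack is reported in KiB).
pack_size_mib() {
    awk '$1 == "size-pack:" { printf "%.1f MiB\n", $2 / 1024 }'
}

# Usage: git count-objects -v | pack_size_mib
```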

Find the 10 largest tracked objects

~/Server/Run/run_s

$ git verify-pack -v .git/objects/pack/*.idx | sort -k 3 -n | tail -10 | awk '{print$1}'
4f32918e869c88a0ed27c46c049efeeba3a9a22f
81ddc977fa01011591835e02e586130b2cb94c45
5bc41077714de997f1218101aba8c21652652c35
cccbff7856a3e7bb07b1e23fd41fd5d31b0eda51
2f0e524c945a2761474f3056a4c4576b8c47bef9
6693640a258f705e0a438c619a5c34690477fcdf
7fd1e6c16f85dce01467b4cb3aacbeba98048d02
4fd0e5fc180e9449fa360f934776ef2d3de20d14
2819a65f19bd420413b4448db91539c82b4b3a5d
18b637e60418bbdf04e9b3e3defcf2a703fdfb4c
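
The pipeline above prints only the hashes. In git verify-pack -v output the third column is the size in bytes, which is the column the sort keys on. The sketch below keeps that size next to each hash and restricts the listing to blobs; largest_blobs is a hypothetical helper name.

```shell
# Illustrative helper: read `git verify-pack -v` output on stdin and print
# "size hash" for the N largest blobs (default 10), smallest first.
largest_blobs() {
    awk '$2 == "blob" { print $3, $1 }' | sort -n -k1,1 | tail -n "${1:-10}"
}

# Usage: git verify-pack -v .git/objects/pack/*.idx | largest_blobs 10
```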

Print the file paths associated with those objects

~/Server/Run/run_s

$ git rev-list --objects --all | grep "`git verify-pack -v .git/objects/pack/*.idx | sort -k 3 -n | tail -10 | awk '{print$1}'`"
2f0e524c945a2761474f3056a4c4576b8c47bef9 Service/bin/qshell
4f32918e869c88a0ed27c46c049efeeba3a9a22f System/Users/coam/Library/Rime/pinyin_simp.dict.yaml
cccbff7856a3e7bb07b1e23fd41fd5d31b0eda51 Service/bin/Baidu-Login/Baidu-Login
4fd0e5fc180e9449fa360f934776ef2d3de20d14 Service/bin/BaiduPCS-Go
7fd1e6c16f85dce01467b4cb3aacbeba98048d02 docker/_/source/cmake/cmake-3.16.2.tar.gz
2819a65f19bd420413b4448db91539c82b4b3a5d docker/_/source/erlang/otp_src_22.0.tar.gz
18b637e60418bbdf04e9b3e3defcf2a703fdfb4c docker/_/source/erlang/otp_src_22.2.tar.gz
6693640a258f705e0a438c619a5c34690477fcdf docker/_/source/git/v2.24.1.tar.gz
81ddc977fa01011591835e02e586130b2cb94c45 System/Users/coam/Library/Rime/_/build/cangjie5.prism.bin
5bc41077714de997f1218101aba8c21652652c35 System/Users/coam/Library/Rime/_/build/stroke.prism.bin
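
The grep above passes the ten hashes as one multi-line pattern. An equivalent sketch using an exact lookup on the first field is shown here. paths_for_hashes is a made-up helper; it assumes a file with one hash per line as its argument and git rev-list --objects output on stdin, and the /tmp file name in the usage lines is likewise hypothetical.

```shell
# Hypothetical helper: $1 is a file with one object hash per line; stdin is
# `git rev-list --objects --all` output. Prints the lines whose hash is listed.
paths_for_hashes() {
    awk -v list="$1" '
        BEGIN { while ((getline h < list) > 0) want[h] = 1 }
        ($1 in want)'
}

# Usage:
#   git verify-pack -v .git/objects/pack/*.idx | sort -k 3 -n | tail -10 |
#       awk '{print $1}' > /tmp/large-hashes.txt
#   git rev-list --objects --all | paths_for_hashes /tmp/large-hashes.txt
```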

Note: if filtering the output of git rev-list --objects --all turns up many stale entries, run git gc --prune=now beforehand to clear unreferenced objects from the local repository.

Extract the file names

~/Server/Run/run_s

$ git rev-list --objects --all | grep "`git verify-pack -v .git/objects/pack/*.idx | sort -k 3 -n | tail -10 | awk '{print$1}'`" | awk '{print $2}'
Service/bin/qshell
System/Users/coam/Library/Rime/pinyin_simp.dict.yaml
Service/bin/Baidu-Login/Baidu-Login
Service/bin/BaiduPCS-Go
docker/_/source/cmake/cmake-3.16.2.tar.gz
docker/_/source/erlang/otp_src_22.0.tar.gz
docker/_/source/erlang/otp_src_22.2.tar.gz
docker/_/source/git/v2.24.1.tar.gz
System/Users/coam/Library/Rime/_/build/cangjie5.prism.bin
System/Users/coam/Library/Rime/_/build/stroke.prism.bin

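Replace newlines with spaces to join the names into a single line, and write it to the list of files to purge (this run keeps only the 5 largest)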

~/Server/Run/run_s

$ git rev-list --objects --all | grep "`git verify-pack -v .git/objects/pack/*.idx | sort -k 3 -n | tail -5 | awk '{print$1}'`" | awk '{print $2}' | tr '\n' ' ' > _/git-trace-large-files-inline.txt
$ cat _/git-trace-large-files-inline.txt
Service/bin/BaiduPCS-Go docker/_/source/cmake/cmake-3.16.2.tar.gz docker/_/source/erlang/otp_src_22.0.tar.gz docker/_/source/erlang/otp_src_22.2.tar.gz docker/_/source/git/v2.24.1.tar.gz
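
Joining with plain spaces works for the paths above, but a path containing a space or a quote would be split apart when the list is later expanded into a git rm command line. One hedged way around that is to shell-quote each name before joining; git filter-branch evaluates its filter strings with the shell, so the quotes would be honored there. quote_paths is an illustrative helper and expects one path per line on stdin.

```shell
# Illustrative helper: read one path per line and print them on a single
# line, each wrapped in single quotes (embedded quotes become '\'').
quote_paths() {
    while IFS= read -r p; do
        escaped=$(printf '%s' "$p" | sed "s/'/'\\\\''/g")
        printf "'%s' " "$escaped"
    done
}

# Usage: printf '%s\n' 'docs/my file.txt' 'Service/bin/qshell' | quote_paths
```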

Remove all the large files from history

~/Server/Run/run_s

$ git filter-branch -f --prune-empty --index-filter "git rm -rf --cached --ignore-unmatch `cat _/git-trace-large-files-inline.txt`" --tag-name-filter cat -- --all
...
WARNING: Ref 'refs/heads/master' is unchanged
Ref 'refs/heads/master-2020-03-15' was rewritten
Ref 'refs/heads/os-MacPro.local' was rewritten
Ref 'refs/heads/os-a.us.1' was rewritten
Ref 'refs/heads/os-a.us.2' was rewritten
Ref 'refs/heads/os-m.us.1' was rewritten
Ref 'refs/heads/os-t.cs.1' was rewritten
Ref 'refs/heads/os-t.cs.2' was rewritten
Ref 'refs/heads/os-t.cs.3' was rewritten
Ref 'refs/heads/os-v.cs.1' was rewritten
Ref 'refs/heads/test' was rewritten
WARNING: Ref 'refs/remotes/origin/master' is unchanged
Ref 'refs/remotes/origin/master-2020-03-15' was rewritten
Ref 'refs/remotes/origin/os-MacPro.local' was rewritten
Ref 'refs/remotes/origin/os-a.us.1' was rewritten
Ref 'refs/remotes/origin/os-a.us.2' was rewritten
Ref 'refs/remotes/origin/os-m.us.1' was rewritten
Ref 'refs/remotes/origin/os-t.cs.1' was rewritten
Ref 'refs/remotes/origin/os-t.cs.2' was rewritten
Ref 'refs/remotes/origin/os-t.cs.3' was rewritten
Ref 'refs/remotes/origin/os-v.cs.1' was rewritten
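
To check that the rewrite removed what was intended, one option is to look for the purged paths among the objects still reachable from the rewritten refs. The sketch below assumes a newline-separated copy of the path list (the file written earlier is space-separated); remaining_paths and the list file name in the usage line are hypothetical. The usage line excludes refs/original, where git filter-branch keeps backups of the pre-rewrite refs, because those backups still reference the old objects.

```shell
# Hypothetical helper: $1 is a file with one purged path per line; stdin is
# `git rev-list --objects` output. Prints every purged path still present.
remaining_paths() {
    awk -v list="$1" '
        BEGIN { while ((getline p < list) > 0) gone[p] = 1 }
        {
            path = $0
            sub(/^[0-9a-f]+ ?/, "", path)   # drop the leading object hash
            if (path != "" && (path in gone)) print path
        }' | sort -u
}

# Usage (no output means none of the listed paths are reachable any more):
#   git rev-list --objects --exclude='refs/original/*' --all |
#       remaining_paths _/git-trace-large-files.txt
```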

Force-push to the remote

$ git push origin --force --all
Enumerating objects: 4783, done.
Counting objects: 100% (4783/4783), done.
Delta compression using up to 8 threads
Compressing objects: 100% (2533/2533), done.
Writing objects: 100% (4783/4783), 56.16 MiB | 3.88 MiB/s, done.
Total 4783 (delta 2755), reused 3394 (delta 1967)
remote: Resolving deltas: 100% (2755/2755), done.
To e.coding.net:coam/Run.run_s.git
 + 06a671e...0c0ee6b master-2020-03-15 -> master-2020-03-15 (forced update)
 + e2ca5d5...0c6eb50 os-MacPro.local -> os-MacPro.local (forced update)
 + 28a6d0e...8d87436 os-a.us.1 -> os-a.us.1 (forced update)
 + 226587c...66e1b22 os-a.us.2 -> os-a.us.2 (forced update)
 + 63f5b55...9e89fa2 os-m.us.1 -> os-m.us.1 (forced update)
 + da981dc...60031f1 os-t.cs.1 -> os-t.cs.1 (forced update)
 + 33c5e41...565c126 os-t.cs.2 -> os-t.cs.2 (forced update)
 + 5bbd5bc...e6d9289 os-t.cs.3 -> os-t.cs.3 (forced update)
 + aeccac9...7219cb6 os-v.cs.1 -> os-v.cs.1 (forced update)
 + 6ffe2fc...10e3655 test -> test (forced update)

After pushing, a fresh git clone of the project still produced a large .git directory. Running the following command fixed it:

git push origin --force --all --prune

Check the largest files again

~/Server/Run/run_s

$ git rev-list --objects --all | grep "`git verify-pack -v .git/objects/pack/*.idx | sort -k 3 -n | tail -100 | awk '{print$1}'`"
5f6a4705b9d57572961341a987ad628c59b80154 System/data/home/coam/.dir_colors.ls
7d231b15f895a06f03f85f02f7e3647670308782 test/data/qcs/qcs_test_file.jpg
94cca9a331ed0e7e93a3357868bbd54b8650ed11 System/data/opt/elk/filebeat/fields.yml
c83f88a65c2b031445e46862657e347ce01e4035 System/data/opt/elk/filebeat/filebeat.reference.yml
a243c4ec12b45eb4f95b599f7a35ef97dcef1bae System/data/opt/elk/filebeat/filebeat.yml
b55fefe6d97f9c84a1c9afcf78c630c1c5e4a544 Server/WinScripts/7z.exe
29e42c8b6c5712e02f1aed9e8ea734e9ec9604c3 Server/WinScripts/junction.exe
e9dc44dfe99d067be97125be62f366ba4b579181 Server/WinScripts/unzip.exe
ebfc046e0d2c2facb8b4707b91043a4cc266587f System/Users/coam/Library/Rime/_/sync/f48ce424-c011-465b-bab7-fc44bd523abf/symbols.yaml
3ee2bf6904f90da07895f7fdd779f37e80eaa7de System/Users/coam/Library/Rime/squirrel_config/opencc/emoji_category.txt
0aed42f99dac1ae0d1b6a0e6fcf0f00673fa8a53 System/Users/coam/Library/Rime/squirrel_config/opencc/emoji_word.txt

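Many files and directories still had not been removed. Switching to the other branches and manually removing the tracked file test/data/qcs/qcs_test_file.jpg resolved the leftovers.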

Because there are many branches, a script can do the cleanup automatically

~/Server/Run/run_s

# Note: the loop never checks out $branch; because of `-- --all`,
# each filter-branch pass rewrites every ref.
for branch in `git branch -a | grep remotes | grep -v HEAD | grep -v master `; do
    #git branch --track ${branch#remotes/origin/} $branch
    echo "Processing branch: $branch"
    echo "Querying file list"
    git rev-list --objects --all | grep "`git verify-pack -v .git/objects/pack/*.idx | sort -k 3 -n | tail -100 | awk '{print$1}'`" | awk '{print $2}'
    echo "Exporting file list"
    git rev-list --objects --all | grep "`git verify-pack -v .git/objects/pack/*.idx | sort -k 3 -n | tail -100 | awk '{print$1}'`" | awk '{print $2}' | tr '\n' ' ' > _/git-trace-large-files-inline.txt
    echo "Cleaning up listed files"
    git filter-branch -f --prune-empty --index-filter "git rm -rf --cached --ignore-unmatch `cat _/git-trace-large-files-inline.txt`" --tag-name-filter cat -- --all
done

After the cleanup, syncing from other clones of the project failed:

$ git pull origin master --force
From e.coding.net:coam/Run.run_s
 * branch              master     -> FETCH_HEAD
fatal: refusing to merge unrelated histories

Merging code from other branches fails with the same error, fatal: refusing to merge unrelated histories. Adding the --allow-unrelated-histories option lets the merge proceed:

git merge master --allow-unrelated-histories
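
Merging with --allow-unrelated-histories joins the pre-rewrite commits onto the rewritten ones, which can bring any large blobs they reference back into the merged history. Where a clone has no local work worth keeping, a hedged alternative is to drop its old branch state and adopt the rewritten branch. This is an assumption about workflow, not part of the original procedure; resync_branch and DRY_RUN are illustrative names.

```shell
# Illustrative helper: make the currently checked-out branch match the
# rewritten remote branch $1. Set DRY_RUN=1 to print the commands only.
# WARNING: `git reset --hard` discards local commits and uncommitted changes.
resync_branch() {
    branch="$1"
    ${DRY_RUN:+echo} git fetch origin &&
        ${DRY_RUN:+echo} git reset --hard "origin/$branch"
}

# Usage: DRY_RUN=1 resync_branch master
```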

Clean up backup refs and unreachable objects

#git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
#git reflog expire --expire=now --all
#git gc --prune=now

Remove all untracked files (-x also removes ignored files)

#git clean -fdx 

List the files tracked by git

#gls=$(git ls-tree -r master --name-only | grep $line)
#gls=$(git ls-tree -r HEAD~1 --name-only | grep $line)
#gls=$(git log --pretty=format: --name-only --diff-filter=A | sort -u | grep $line)
#[ ! -z "$gls" ] && ebc_info "[$line]存在!..." && continue
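
The commented lines above test whether $line appears among the tracked files with an unquoted grep, which treats $line as a regular expression and also matches substrings. A stricter sketch that matches the whole path literally is shown here; is_listed is a hypothetical helper, and $line is the variable from the snippet above.

```shell
# Hypothetical helper: succeed if the exact path $1 appears as a whole
# line on stdin (for example, the output of `git ls-tree -r --name-only`).
is_listed() {
    grep -Fxq -- "$1"
}

# Usage:
#   if git ls-tree -r master --name-only | is_listed "$line"; then
#       echo "[$line] exists"
#   fi
```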
