Friday, May 20, 2022

Preparing to Clean a Repository

Most of the customer-projects I work on use a modular repository system. That is to say, each discrete service-element is developed in a purpose-specific repository. Further, most of the customer-projects I work on use a fork-and-branch workflow. As such, the "upstream" repositories tend to stay fairly "clean" of stale branch content.

Recently, I was moved to a project where the customer uses a (ginormous) mono-repo design for their repository and a simple, wholly branch-based workflow. As team members are assigned Jira tickets, they open a branch (almost always) off the project-trunk. Once they complete the modifications within their branch, they submit a merge-request back and, if accepted, their branch gets deleted …assuming the submitter has ticked the "Delete source branch when merge request is accepted" checkbox in their MR.

Unfortunately, not every branch actually ends up getting merged and, even more frequently, not every MR submitter ticks the delete-on-accept checkbox in their MR. So, as the project goes on, more and more stale branches end up hanging around.

I'm a bit of a neat-freak when it comes to what I like to see in the output of any given tool I use. It's why I prefer it when tools offer good output-filtering …and get cranky when, as part of "improving" an interface, output-filtering is adversely impacted (I'm glaring at you, Amazon, and your "improvements" to the various service-components' web-consoles).

As a result of this, when I join a new project and see a "cluttered" project-root, I like to ask, "can this mess be cleaned up?" A quick way to answer that is to check how old the various branches are as, typically, if a branch has been sitting out there with no activity on it for two or more months, it's probably stale. One way to check for such staleness is a quick shell one-liner:

     (
        for branch in $(
          git branch -r | grep -v HEAD
        )
        do
           echo -e $(
             git show --format="%ci|%cr|%cN|" $branch | head -n 1
           ) $branch
        done | sort
     ) | \
     sed -E -e 's/\|[[:space:]]+/|/g' |
     awk -F '|' '{ printf ("%s\t%-20s\t%-20s\t%s\n",$1,$2,$3,$4) }'

What the above does is iterate across all remote branches, find the most-recent commit in each branch, then format each line of the output into a |-delimited string. Since the commit-date (in git's ISO 8601-like `%ci` format) is the first column of each line, the `sort` causes the output-lines to be displayed oldest-first. The `sed` line is a bit of a shim, ensuring that extraneous white-space after the |-delimiter is removed. The `awk` statement just ensures that the output has a nice, aligned columnar display. For example:

2021-02-16 20:21:23 +0000       1 year, 3 months ago    Billy Madison          origin/PROJ-9388
2021-04-14 13:09:26 -0400       1 year, 1 month ago     Art Donovan            origin/PROJ-11000
2021-06-04 12:57:35 +0000       12 months ago           Billy Madison          origin/PROJ-8538
2021-06-16 11:37:18 -0400       11 months ago           William Gibson         origin/PROJ-11649
2021-06-16 15:43:15 -0400       11 months ago           William Gibson         origin/PROJ-11364
2021-07-16 17:39:23 -0400       10 months ago           William Gibson         origin/PROJ-11767
2021-09-02 16:54:02 -0400       9 months ago            Art Donovan            origin/PROJ-13029
2021-09-22 14:07:16 +0000       8 months ago            David Graham           origin/PROJ-13023
2021-10-04 16:55:57 -0400       8 months ago            David Morgan           origin/PROJ-13259
2021-10-19 16:19:54 +0000       7 months ago            David Graham           origin/PROJ-13214
2021-12-06 08:59:48 -0500       6 months ago            Art Donovan            origin/PROJ-14228
2022-01-05 18:22:36 +0000       5 months ago            David Graham           origin/PROJ-13228
2022-01-25 20:24:59 +0000       4 months ago            David Graham           origin/PROJ-14447
2022-04-05 18:18:48 +0000       6 weeks ago             Susan McDonald         origin/PROJ-15004
2022-04-06 14:18:16 +0000       6 weeks ago             Tracy Morgan           origin/PROJ-16375
2022-05-06 14:49:54 +0000       2 weeks ago             Dee Madison            origin/PROJ-16641
2022-05-06 14:49:54 +0000       2 weeks ago             Dee Madison            origin/PROJ-16666
2022-05-13 18:34:21 +0000       7 days ago              Alexis Veracruz        origin/PROJ-16507
2022-05-13 21:42:16 +0000       7 days ago              Gomez Addams           origin/PROJ-16653
2022-05-18 15:52:04 +0000       2 days ago              Ronald Johnson         origin/PROJ-16740
2022-05-19 17:18:44 +0000       19 hours ago            Thomas H Jones II      origin/master
2022-05-19 17:47:28 +0000       19 hours ago            Thomas H Jones II      origin/PROJ-16874
2022-05-19 20:57:26 +0000       16 hours ago            Kaiya Nubbins          origin/PROJ-16631

As you can see, there's clearly some cleanup to be done in this project.
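
If you only want to see the branches that cross the "two or more months" threshold mentioned earlier, something like the following might work. This is a rough sketch rather than what I actually ran: the function names `list_remote_tips` and `stale_only` are made up for illustration, the 60-day default is just my rule of thumb, and it assumes a git new enough to support the `%(committerdate:unix)` and `%(refname:lstrip=2)` atoms in `git for-each-ref`.

```shell
# Hypothetical helper: print the tip commit of every remote-tracking branch,
# oldest first, as tab-separated fields:
#   epoch-seconds, relative age, committer name, branch name
list_remote_tips() {
  git for-each-ref --sort=committerdate \
    --format='%(committerdate:unix)%09%(committerdate:relative)%09%(committername)%09%(refname:lstrip=2)' \
    refs/remotes | grep -v '/HEAD$'
}

# Hypothetical helper: read "<epoch-seconds><TAB>..." lines on stdin and keep
# only those older than the given number of days (default: 60, an assumption).
# A second argument overrides "now" (epoch seconds) for reproducible runs.
stale_only() {
  days="${1:-60}"
  now="${2:-$(date +%s)}"
  awk -F '\t' -v cutoff="$(( now - days * 86400 ))" '$1 < cutoff'
}
```

Running `list_remote_tips | stale_only 60` from inside the repository should then list only the branches whose most-recent commit is more than 60 days old. Since the first field is epoch-seconds, the comparison doesn't care which timezone each commit was made in.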