Debugging CI release pipelines

It was a normal, quiet day and my team was ready to cut a new release. We triggered the pipeline, everything went green, and all was well, until...

Release 0.0.1 available!

uwotm8? We should be on 1.24.1

Luckily, thanks to semantic versioning, the pressure was off: no previously published versions had been overwritten, and we always publish ahead of our consumers (currently on version 1.24.0). So we had time to stay calm and look at how and why the versioning had gone completely wrong.

As with most Git-based version management systems, when we want to cut a release we create a tag on a particular commit, which is pretty standard practice.

We use a tool called git-conventional-commits (following this standard: https://www.conventionalcommits.org/en/v1.0.0/) that automatically calculates the version number for the next release. It looks at the commit messages since the last tag: if any start with feat it bumps the minor version, and if any start with fix it bumps the patch version.

1.24.0 == major.minor.patch
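
To make the bump rule concrete, here is a minimal sketch of it in shell. This is not the tool's real code: the function name next_version is made up, it only knows the feat and fix rules described above, and leaving the version unchanged when neither prefix appears is an assumption of the sketch, so the real tool may behave differently.

```shell
# Hypothetical helper, NOT git-conventional-commits' actual implementation.
# Usage: next_version <major.minor.patch>, commit subjects on stdin (one per line).
# The body runs in a subshell, so its variables do not leak into the caller.
next_version() (
  major=${1%%.*}        # text before the first dot
  rest=${1#*.}          # text after the first dot
  minor=${rest%%.*}
  patch=${rest#*.}
  bump=none
  while IFS= read -r subject; do
    case "$subject" in
      feat*) bump=minor ;;                                  # any feat wins over fix
      fix*)  if [ "$bump" = none ]; then bump=patch; fi ;;
    esac
  done
  case "$bump" in
    minor) echo "$major.$((minor + 1)).0" ;;
    patch) echo "$major.$minor.$((patch + 1))" ;;
    *)     echo "$major.$minor.$patch" ;;   # assumption: no feat/fix means no bump
  esac
)

printf 'fix: patch a dependency\nchore: tidy up\n' | next_version 1.24.0   # prints 1.24.1
printf 'fix: patch a dependency\nfeat: add a thing\n' | next_version 1.24.0 # prints 1.25.0
```

The important part for this story is the input: the calculation only works if the tool can find the last tag to start from.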

Investigation

My initial thought was that git-conventional-commits was causing the problem: pulling the updated repo down after the bad release and running npm run git-conventional-commits version locally also reported 0.0.1.

However, that was because the CI had already applied the 0.0.1 tag to the repo, so it was technically the "latest tag" in the commit timeline.

Deleting the tag from the remote as well as locally (via git tag -d 0.0.1) fixed that problem. My local run was now working as expected, showing the estimated next tag as 1.24.1.
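
For reference, a minimal sketch of removing a tag in both places. The helper name delete_tag_everywhere is made up, and it assumes the remote is called origin unless you pass another name. Treat the remote half as one common way to do it, not necessarily the exact command used here.

```shell
# Hypothetical helper: delete a tag on the remote first, then in the local clone.
# Usage: delete_tag_everywhere <tag> [remote]
delete_tag_everywhere() {
  tag=$1
  remote=${2:-origin}   # assumption: the remote is named "origin"
  # Pushing an empty source to refs/tags/<tag> deletes that tag on the remote.
  # The local tag is only removed if the remote deletion succeeded.
  git push --quiet "$remote" ":refs/tags/$tag" && git tag -d "$tag"
}
```

Calling delete_tag_everywhere 0.0.1 inside the repo would remove the bad tag from both sides, assuming you have permission to push tag deletions.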

Mismatched versions?

My next thought: was a different version of git-conventional-commits being used locally and in CI? Maybe an update had gone out which caused it to trip up?

However, https://www.npmjs.com/package/git-conventional-commits showed that 2.6.7 was the latest version at the time and that there hadn't been a release in the past 6 months. The package-lock file also showed that 2.6.7 was the version being installed, so a version mismatch was very unlikely.

Digging further into git-conventional-commits to see exactly how it was retrieving the latest tag, I found this line: https://github.com/qoomon/git-conventional-commits/blob/master/lib/git.js#L17. It boiled down to this Git command:

git describe --tags --match='*' --no-abbrev HEAD

Running this locally also correctly showed the latest tag to be 1.24.0.

I wanted to see what the CI would return to help rule out git-conventional-commits, so I ran the same command in the CI pipeline and it returned:

fatal: No names found, cannot describe anything

Problem in Git?

So it was a problem at the Git layer, but not in my local Git, so what was going on? I added an extra command, git fetch --tags, to try and force the CI to fetch the tags as well as the commits. The test showed it was pulling in the tags, but when the describe command ran it still said:

fatal: No names found, cannot describe anything

At this point I realised I didn't know enough about how Git really works.

How does Git work?

A short conversation with ChatGPT revealed that a tag is just a pointer to a particular commit. A branch is also a pointer to a commit, but a branch pointer moves as commits are added to that branch. Commits, on the other hand, chain together like a linked list, each one pointing back to its parent, and that chain ultimately forms the shape of the branch.

So commits are essentially the atomic unit in Git; most things revolve around them.

Looking further into the CI

Looking at the CI job that tries to get the tag, I thought I'd step through the giant terminal log. Luckily, a few lines in I saw this:

Fetching changes with git depth set to 20...

Sounds like the CI was only grabbing the latest 20 commits from the main branch? This felt suspicious.

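If a tag points to a particular commit, but the CI is only getting a reduced number of commits, maybe by (bad) luck we happened to make more than 20 commits since the last tag, and therefore the commit that tag points to wasn't there. That would also explain why fetching the tags on their own didn't help: git describe works backwards from HEAD through the commit history, so a tag on a commit outside that truncated history can't be reached.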

Running this command forced the CI to get all the commits instead:

git fetch --unshallow
# and just for good measure to make sure it gets the tags too
git fetch --tags

It worked!

The following command:

git describe --tags --match='*' --no-abbrev HEAD

finally reported the latest tag as 1.24.0.

git-conventional-commits then did its job properly and calculated 1.24.1 as the next tag to publish.

Diagnosis

As far as I could tell, the CI had always been doing a shallow clone and only pulling the latest 20 commits. It just so happened that we'd never made more than 20 commits between releases.

Recently we'd made a bunch of dependency updates and patches that caused us to add more than 20 commits, so when the CI tried to look for tags across the latest 20 commits it couldn't find any.

When this error happened git-conventional-commits defaulted to 0.0.1 as it assumed that no releases had been made yet.
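
The diagnosis can be reproduced outside of CI. Below is a minimal sketch, assuming git is installed; the temporary paths, the demo identity, the commit messages, and the depth of 2 (instead of 20) are all made up to keep the demo small.

```shell
# Throwaway reproduction; nothing here touches a real repository.
work=$(mktemp -d)
commit() {
  git -C "$work/origin" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "$1"
}

git init -q "$work/origin"
commit "feat: first release"
git -C "$work/origin" tag 1.24.0
# Three more commits, so the tagged commit ends up 4 commits back from HEAD.
commit "fix: one"
commit "fix: two"
commit "fix: three"

# Shallow clone, like a CI runner with a limited git depth.
# The file:// URL matters: --depth is ignored for plain local-path clones.
git clone -q --depth 2 "file://$work/origin" "$work/ci"
cd "$work/ci"

# The tagged commit is outside the fetched history, so describe fails here.
before=$(git describe --tags --match='*' --no-abbrev HEAD 2>&1)
echo "shallow clone: $before"

# Fetch the full history and the tags, then ask again.
git fetch -q --unshallow
git fetch -q --tags
after=$(git describe --tags --match='*' --no-abbrev HEAD)
echo "full history:  $after"
```

The first describe should fail because no tag is reachable from HEAD, and the second should report 1.24.0 once the history is complete.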

The fix

The fix was to add the following to the pipeline job:

git fetch --unshallow
git fetch --tags

1.24.1 Successfully released!
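
One caveat with the fix: git fetch --unshallow exits with an error when the repository already has its full history, which could bite if the runner's clone depth ever changes. A slightly more defensive variant is sketched below. The function name fetch_full_history is made up, and git rev-parse --is-shallow-repository needs a reasonably recent Git (2.15 or later, as far as I know).

```shell
# Hypothetical helper for a pipeline step; run it inside the checked-out repo.
fetch_full_history() {
  if [ "$(git rev-parse --is-shallow-repository)" = "true" ]; then
    # Shallow checkout: download the rest of the commit history.
    git fetch --quiet --unshallow || return 1
  fi
  # Fetch all tags explicitly rather than relying on automatic tag following.
  git fetch --quiet --tags
}

# Example pipeline usage (commented out so the snippet is safe to source):
# fetch_full_history && git describe --tags --match='*' --no-abbrev HEAD
```

Many CI systems also let you configure the clone depth itself, which avoids the extra fetch entirely; the setting depends on the CI product, so check its documentation.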

These are the worst kind of bugs in my opinion: ones where on the surface "nothing's changed", and something that worked the past 30 times all of a sudden doesn't. It was quite fun debugging this one in the end, but we were lucky that nothing catastrophic happened. With added time pressure, debugging is no fun at all.

A plus was that I learnt more about how Git works, as well as the internals of git-conventional-commits, thanks to its easy-to-read code.