Test failure analysis with LLM in CI pipeline

On my current project, I'm the sole test engineer in a team with several developers. We merge pull requests only when regression tests are implemented, so I sometimes find myself under pressure to handle multiple tasks at once - especially when there are failed tests. Without my input, it can be difficult to understand what a test is doing and what exactly is causing a failure: is it a bug or does a test need to be updated?

After reading many posts about AI in testing on softwaretestingweekly.com, I decided to give it a try and integrate Claude into our pipelines. The idea was to feed it test reports, grant it read-only access to the repository and the changes in a PR, and ask it to analyze everything and leave a comment with its findings.

Photo by cottonbro studio

One month later, here are my thoughts.

Switch to GitHub Actions and Packages

To no one's surprise, posting regularly to a blog when you have a job and when you don't are two different things. Nevertheless, I'm trying to reincorporate writing into my daily routine. If you're reading this, then I have succeeded! In the previous entry, I briefly described that I settled on a Go backend and a React web app for this site, explaining how code generation based on the OpenAPI specification simplifies the synchronization of any changes between the two.

Once the code is written and tested, it's time to deploy it to my VPS. I used a couple of tools for that, with the actual commands stored in Makefiles. For the Go service, it looked like this:

Build a new Docker image.
Save it as a tar.gz archive.
Copy it to a remote server using scp.
Load this image to Docker.
Update the running container.

While I believe there's nothing wrong with this approach given the nature of the project, I wanted to switch to GitHub Actions and Packages. That would eradicate the need to define deployment properties on every machine where I code, let alone maintaining them up to date if. Moreover, even though the exact tools differ from company to company, this approach would be much closer to what a real business would do with their product. In other words, besides the personal interest, it is relevant to my job!

The transition was rather simple. I was already building Docker images, so all I had to do was to move this part to GitHub and push images to their registry. Here, I want to document what I did and think about what can be done next.