Test failure analysis with LLM in CI pipeline

On my current project, I'm the sole test engineer in a team with several developers. We merge pull requests only when regression tests are implemented, so I sometimes find myself under pressure to handle multiple tasks at once - especially when there are failed tests. Without my input, it can be difficult to understand what a test is doing and what exactly is causing a failure: is it a bug or does a test need to be updated?

After reading many posts about AI in testing on softwaretestingweekly.com, I decided to give it a try and integrate Claude into our pipelines. The idea was to feed it test reports, grant it read-only access to the repository and the changes in a PR, and ask it to analyze everything and leave a comment with its findings.
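The first part of that idea, turning a test report into something an LLM can read, can be sketched roughly as follows. This is a minimal sketch, assuming the test runner writes a JUnit-style XML report. The names `collect_failures` and `build_prompt` and the prompt wording are hypothetical, made up for illustration rather than taken from the actual pipeline. Sending the prompt to the model and posting the comment on the PR are left out.

```python
import xml.etree.ElementTree as ET


def collect_failures(junit_xml: str) -> list[dict]:
    """Return one entry per failed or errored test case in a JUnit-style report."""
    root = ET.fromstring(junit_xml)
    failures = []
    for case in root.iter("testcase"):
        # A failing case carries a <failure> or <error> child element.
        problem = case.find("failure")
        if problem is None:
            problem = case.find("error")
        if problem is None:
            continue  # the test passed or was skipped
        failures.append({
            "test": f"{case.get('classname', '')}.{case.get('name', '')}",
            "message": problem.get("message", ""),
            "details": (problem.text or "").strip(),
        })
    return failures


def build_prompt(failures: list[dict], pr_diff: str) -> str:
    """Assemble a plain-text prompt (hypothetical wording) from failures and the PR diff."""
    lines = [
        "The following regression tests failed on this pull request.",
        "For each one, say whether it looks like a product bug or an outdated test.",
        "",
    ]
    for failure in failures:
        lines.append(f"- {failure['test']}: {failure['message']}")
        if failure["details"]:
            lines.append(f"  {failure['details']}")
    lines += ["", "Pull request diff:", pr_diff]
    return "\n".join(lines)
```

Keeping the report parsing separate from the prompt text makes it easy to adjust the question asked of the model without touching how failures are collected.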

Detective at an Investigation Board Running Through Leads

Photo by cottonbro studio

One month later, here are my thoughts.

Read more...

My new mug

I got myself a new mug!

A white mug with a logo of Claude AI

The same mug from the other side; the text says: "You're absolutely right!"

I saw a meme on Reddit with a mug like this and found it hilarious, because:

  1. We use Claude Code at work.
  2. It really does say "You're absolutely right!" almost every time.

So, I ordered one from kingitare.ee. They have an online editor where you can upload an image (including an SVG!), choose a font, and preview how the product will look - super convenient. The mug arrived three days after I placed the order.

I'm satisfied with the overall quality, though I haven't tested it in a dishwasher yet. I had a bad experience with an expensive mug from the official Arsenal store - it lost its print after a few cycles.

Happy vibing, everyone!

LLM chatbot as a tool to simplify foreign language texts

I believe one of the best ways to learn a language is to use it every day. However, it can be hard to incorporate a new language into your life at the beginning. You simply don't know it well enough to chat online or read the news. This leaves you with rather boring learning materials. I've tried to solve this problem with LLM chatbots, and so far, the results are rather encouraging!

Read more...

DuckDuckGo AI Chat

DuckDuckGo, a privacy-friendly alternative to Google and other search providers, has recently launched a new product: AI Chat. It offers anonymous access to popular AI models, including GPT-3.5, Claude 3, and the open-source Llama 3 and Mixtral. I cannot reliably assess their anonymity claims, but the service does let you work with these models without registration, which is a good starting point.

I decided to compare them to ChatGPT 4o. There are many ways to do this, but I didn't aim to make a professional and thorough comparison. As a user of these tools, I wanted to see how they could handle my daily requests. Since I am learning German, sometimes I need to clarify certain words, phrases, or how to apply different cases in various situations.

The prompt was inspired by my mistake on Duolingo. To put it simply, I thought that the German "in" was equivalent to the English "to". However, it turned out that "in" can change its meaning depending on the case.

Let's see how various LLMs explained the difference.

Read more...