GenAI in Test Driven Development

Everyone’s moving fast today. In software development we want everything to be lean, agile, automated, configurable, scripted, resilient, scalable… cheap and fast. Now there’s this new thing called Generative AI that is making a lot of promises and is being thrown at every problem, kitchen sink included. Funny story: it can actually be an interesting and useful tool.

I do believe that our traditional software development methodologies can and will be redefined by emerging generative technologies, and one such use case is the integration of GenAI into our coding practices. I’m not talking about the failed and fake Devin, but rather about the way we’ll probably leverage these tools for development.

In short, how about enhancing our software development process by creating unit tests before writing code, and using GenAI to assist in both writing the tests and the subsequent coding? Oh, that’s TDD (Test Driven Development), no?

Kind of… The concept of Test-Driven Development isn’t new, but its adoption has been a struggle, met with varying degrees of enthusiasm. At its core, TDD involves writing unit tests before the actual code is developed. This approach is meant to ensure that the code meets the specified requirements from the outset, leading to more robust software with fewer errors. At least that was the promise.

With GenAI, the tools have advanced to the point where they can assist in writing comprehensive and effective unit tests. It’s no walk in the park, but by providing the GenAI with a clear description of the intended functionality, we can generate a suite of unit tests that covers various edge cases and scenarios. This not only saves time but can also raise test coverage and thoroughness.
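To make that concrete, here is a minimal sketch of the kind of test suite a description could turn into. Everything in it is an assumption for illustration: the description ("turn a title into a lowercase, hyphen-separated slug"), the function name `slugify`, and the chosen edge cases are all hypothetical. In real TDD the tests come first and the function body comes later; a small body is included here only so the snippet runs on its own.

```python
import re
import unittest


def slugify(title: str) -> str:
    """Hypothetical function under test.

    In a tests-first workflow this body would be written (or generated)
    only after the tests below exist.
    """
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


class SlugifyTests(unittest.TestCase):
    """Tests derived from the plain-language description of slugify."""

    def test_lowercases_and_joins_words(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_strips_punctuation(self):
        self.assertEqual(
            slugify("GenAI & TDD: a love story?"), "genai-tdd-a-love-story"
        )

    def test_collapses_extra_whitespace(self):
        self.assertEqual(slugify("  many   spaces  "), "many-spaces")

    def test_empty_title_gives_empty_slug(self):
        self.assertEqual(slugify(""), "")
```

The interesting part is not the code but the list of cases: punctuation, stray whitespace and empty input are exactly the kind of edge cases worth asking the GenAI to enumerate, and worth a human checking afterwards.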

Now, once the unit tests are in place, the next step is to use GenAI coding agents to write the actual code for us… These agents can interpret the requirements outlined in the tests and generate code that aims to pass all the predefined tests, as many times as we want and need. It’s an iterative process: writing code, running tests, and refining the code continues until all tests pass, resulting in a functional and tested piece of software.

Some might say that’s not a good approach, but I do believe it can enhance productivity, by automating the coding so that developers can focus on the more complex and creative task: the tests. It can also improve quality: with GenAI-assisted unit tests in place, the code is subject to more rigorous testing from the start, which should lead to fewer bugs and higher reliability. Finally, it can improve consistency and standardization: GenAI can apply coding standards and best practices uniformly across projects, reducing human error.

Yes, at this point you’re shouting at the screen “and what about hallucination?!” whilst reading this. You’re not the first one:

Or even “TDD never worked!” Again, you’re not alone:

To those, my counterpoints:

TDD is not merely an optimization technique (like hill climbing) but a discipline ensuring steady progress. It fosters better design and understanding of requirements. The iterative nature of TDD helps in continuously refining the solution. So, comparing TDD strictly to hill climbing overlooks its broader benefits.
TDD might not work for all problems, and GenAI can get stuck on local maxima. To improve its ability to generate good solutions, train GenAI systems on a broader range of complex problem scenarios, using diverse datasets and real-world examples, and incorporating expert knowledge into the training data.
Then implement interfaces that allow developers to easily review, modify, and approve GenAI-generated code and tests.
Enhance the AI’s contextual understanding so it better grasps requirements and design principles, reducing reliance on incremental steps alone. Here natural language processing (NLP) can be used to let GenAI understand project documentation and specifications more deeply. It’s a tool, remember?
Implement a learning mechanism where GenAI updates its models based on developer feedback and project outcomes. GitHub Copilot has been doing that from the start.

Here’s the approach: Integrating GenAI in TDD

#1 – Generative AI for Unit Tests: Developers focus on providing clear requirements, and GenAI generates comprehensive unit tests.
#2 – GenAI-Driven Code Development: A GenAI agent writes code iteratively until it passes all tests, adhering to the specified requirements.
#3 – Developer and GenAI Collaboration: Developers review and refine the GenAI-generated outputs, ensuring quality and accuracy.
#4 – Win!

Yes, developers still need to concentrate on creating detailed requirements and unit tests. #Bummer! They are also needed to review the outputs, making sure the GenAI-generated code meets quality and functional standards through regular reviews and refinements (feedback loops).

Integrating these solutions can significantly improve the efficacy of GenAI in the TDD process, addressing the gaps so that the benefits of GenAI for development are fully realised. It’s a hybrid approach that leverages the strengths of GenAI for efficiency and reliability, while maintaining the crucial human element for nuanced decision-making and complex problem-solving, maybe 🙂

Diagram adapted from: https://www.spiceworks.com/tech/devops/articles/what-is-tdd/