If you’ve worked with LLMs for even a short while, you’re probably aware of a few things:
- LLMs are extremely powerful at generating text and retrieving information on a wide variety of topics
- LLMs can perform computations and help you analyse data if armed with the right tools
- LLMs are not yet 100% reliable
This last point is where the system you’re painstakingly building can collapse like a house of cards. And this is where LLM evaluations come in.
This is likely a familiar situation if you were writing software in the pre-LLM world, too.
When you build software, testing is one of the most important parts of the Software Development Life Cycle.