On evaluating LLMs (WIP)

If you’ve worked with LLMs for even a short while, you’re probably aware of a few things by now:

LLMs are extremely good at generating text and retrieving information on a vast variety of topics.
LLMs can perform computations and help you analyse data, if armed with the right tools.
LLMs are not yet 100% reliable.

This last point is where the building you’re painstakingly constructing can rapidly collapse like a house of cards. And this is where LLM evaluations come in.

This situation is likely familiar if you also wrote software in the pre-LLM world.

When you build software, testing is one of the most important parts of the Software Development Life Cycle (SDLC).