evals
also known as: evaluations
evals — Tests that check whether an AI system is actually doing its job well.
A demo only has to look good once. Evals tell you whether the AI keeps working across many inputs, edge cases, and repeated runs. The moment you build anything real with AI, you need them.
"We need evals before shipping." "The evals caught a regression."
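
To make the idea concrete, here is a minimal sketch of an eval harness. It is an illustration under stated assumptions, not a standard tool: `toy_model`, `EvalCase`, `run_evals`, and the substring-match scoring rule are all hypothetical names and choices made up for this example. A real eval would call the actual AI system and would likely use a richer scoring method.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    prompt: str
    expected: str  # substring the answer must contain (assumed scoring rule)


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for the AI system under test."""
    answers = {
        "What is 2 + 2?": "4",
        "Capital of France?": "Paris",
    }
    return answers.get(prompt.strip(), "I don't know")


def run_evals(model: Callable[[str], str],
              cases: List[EvalCase],
              runs_per_case: int = 3) -> float:
    """Run every case several times and return the fraction of passing runs."""
    passed = 0
    total = 0
    for case in cases:
        # Repeated runs matter for real models, whose output can vary.
        for _ in range(runs_per_case):
            total += 1
            if case.expected.lower() in model(case.prompt).lower():
                passed += 1
    return passed / total if total else 0.0


CASES = [
    EvalCase("What is 2 + 2?", "4"),
    EvalCase("Capital of France?", "Paris"),
    EvalCase("  Capital of France?  ", "Paris"),  # edge case: stray whitespace
    EvalCase("", "I don't know"),                 # edge case: empty input
]

if __name__ == "__main__":
    rate = run_evals(toy_model, CASES)
    print(f"pass rate: {rate:.0%}")
```

Running the same cases after every change to the model or prompt, and comparing the pass rate against the previous run, is one simple way an eval can catch a regression before shipping.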