Bring LLM evaluation into your existing pytest workflow. No custom test runners. No new concepts. Just pytest.

tests/test_chatbot.py::test_chatbot PASSED similar 0.94 ≥0.80 ...
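
To make the idea concrete, here is a minimal sketch of a similarity check written as an ordinary pytest test, using the 0.80 threshold from the sample output above. It does not use this project's own API: `chatbot_reply` and `similarity` are hypothetical helpers, and the standard library's `difflib.SequenceMatcher` stands in for whatever similarity metric the project actually computes.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.80  # mirrors the >= 0.80 threshold shown in the sample output


def chatbot_reply(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; returns a canned answer."""
    return "The capital of France is Paris."


def similarity(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1].

    Assumption: a simple stdlib metric, not a semantic (embedding-based) one.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def test_chatbot():
    reply = chatbot_reply("What is the capital of France?")
    expected = "The capital of France is Paris"
    score = similarity(reply, expected)
    # Fail with a readable message when the score drops below the threshold.
    assert score >= THRESHOLD, f"similar {score:.2f} < {THRESHOLD:.2f}"
```

Saved as `tests/test_chatbot.py`, this is collected and run by a plain `pytest` invocation, with no extra runner or configuration.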