Bring LLM evaluation into your existing pytest workflow. No custom test runners. No new concepts. Just pytest.

Sample output:

tests/test_chatbot.py::test_chatbot PASSED    similar 0.94 ≥ 0.80 ...
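The sample output suggests a test that scores a chatbot answer against an expected answer and passes when the similarity reaches a threshold of 0.80. The tool's actual API is not shown here, so the following is only a minimal sketch of that idea in plain pytest. `chatbot`, `similarity`, and `SIMILARITY_THRESHOLD` are hypothetical names. `difflib.SequenceMatcher` is a crude lexical stand-in for whatever scorer (embeddings, an LLM judge) a real evaluation tool would use.

```python
# tests/test_chatbot.py: hypothetical sketch, not the tool's real API
from difflib import SequenceMatcher

# Threshold chosen to mirror the "≥ 0.80" in the sample output (assumption).
SIMILARITY_THRESHOLD = 0.80


def chatbot(prompt: str) -> str:
    """Hypothetical stand-in for the model call under test."""
    canned = {
        "What is the capital of France?": "The capital of France is Paris.",
    }
    return canned.get(prompt, "I don't know.")


def similarity(a: str, b: str) -> float:
    """Lexical similarity in [0, 1]; a placeholder for a semantic scorer."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def test_chatbot():
    answer = chatbot("What is the capital of France?")
    expected = "The capital of France is Paris."
    score = similarity(answer, expected)
    # A plain assert is all pytest needs; the message shows the score on failure.
    assert score >= SIMILARITY_THRESHOLD, f"similar {score:.2f} < {SIMILARITY_THRESHOLD}"
```

Running `pytest tests/test_chatbot.py` collects and runs this like any other test. Keeping the check as an ordinary `assert` is what lets LLM evaluation sit alongside existing unit tests without a separate runner.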