About
Ocarina Labs builds independent safety testing for AI agents. I took BlueDot Impact's AI safety course, built Rideshare-Bench, and caught Claude chasing surge pricing through driver exhaustion in a simulated city. Standard benchmarks missed all of it.
The people building agents shouldn't be the same people grading them. Independent testing exists for drugs, aircraft, and financial instruments. It didn't exist for AI agents. We're building it.
Quaver generates test environments from natural-language descriptions. We run every major model through them and publish the scores. We test from the outside and share everything we find.