Wait, you guys run evals?
The author asks the community how important it is to build task-specific evaluations for AI systems, beyond standard benchmarks, in order to see where a system actually helps and where it fails. They want to hear how others approach creating custom metrics to keep their products rigorous and high quality.