#Evaluation
17 tools tagged “Evaluation”
Athina AI
Build, test, and monitor AI features with collaborative experimentation and production observability.

FoundryAI
Build, evaluate, and improve AI agents for business automation.

Log10
Scale expert LLM evaluation with automated real-time error detection.

Langfuse
An open-source LLM engineering platform offering observability, metrics, evaluations, and prompt management to debug and enhance large language model applica...

Latitude
Build production AI agents from a single prompt with built-in evals and observability.

Quotient AI
Real-time monitoring and evaluation platform for catching AI failures in search, RAG, and agents.

Vijil
Platform to build, evaluate, and operate trustworthy AI agents with reliability and safety guardrails.

LangSmith
A comprehensive platform offering observability, evaluation, and debugging tools for building and optimizing large language model (LLM) applications.

Keywords AI
Observability and debugging platform for shipping reliable LLM-powered applications faster.

Coval
A simulation and evaluation platform that automates testing for AI agents, enhancing reliability across chat, voice, and other modalities.

Phonic Voice AI
End-to-end voice AI platform to build, observe, and evaluate reliable conversational voice agents.

KeywordsAI
Unified developer platform for building, monitoring, and scaling LLM applications.

Phoenix
Open-source observability and evaluation platform for tracing and improving AI applications.

QualiaInterviews
AI-led platform for multilingual, conversational research and evaluation interviews at scale.

Mosaic AI Agent Framework
A suite of tools by Databricks for building, deploying, and evaluating high-quality AI agents and RAG applications.

Humanloop
Enterprise LLM evaluation and prompt management platform for shipping reliable AI features.

LangWatch
LLM optimization studio for monitoring, evaluating, and improving AI applications in production.
