#Evaluation

17 tools tagged “Evaluation”

Athina AI

Build, test, and monitor AI features with collaborative experimentation and production observability.

4.5 (4)

Freemium

FoundryAI

Build, evaluate, and improve AI agents for business automation.

4.8 (4)

Free

Log10

Scale expert LLM evaluation with automated real-time error detection.

4.6 (5)

Freemium

Langfuse

An open-source LLM engineering platform offering observability, metrics, evaluations, and prompt management to debug and enhance large language model applica...

4.2 (6)

Freemium

Latitude

Build production AI agents from a single prompt with built-in evals and observability.

4.8 (5)

Free

Quotient AI

Real-time monitoring and evaluation platform for catching AI failures in search, RAG, and agents.

4.4 (5)

Free

Vijil

Platform to build, evaluate, and operate trustworthy AI agents with reliability and safety guardrails.

4.8 (5)

Free

LangSmith

A comprehensive platform offering observability, evaluation, and debugging tools for building and optimizing large language model (LLM) applications.

4.8 (5)

Freemium

Keywords AI

Observability and debugging platform for shipping reliable LLM-powered applications faster.

4.8 (4)

Paid

#10

Coval

A simulation and evaluation platform that automates testing for AI agents, enhancing reliability across chat, voice, and other modalities.

4.5 (6)

Freemium

#11

Phonic Voice AI

End-to-end voice AI platform to build, observe, and evaluate reliable conversational voice agents.

4.7 (6)

Freemium

#12

KeywordsAI

Unified developer platform for building, monitoring, and scaling LLM applications.

5.0 (6)

Free

#13

Phoenix

Open-source observability and evaluation platform for tracing and improving AI applications.

4.5 (4)

Free

#14

QualiaInterviews

AI-led platform for multilingual, conversational research and evaluation interviews at scale.

4.3 (4)

Freemium

#15

Mosaic AI Agent Framework

A suite of tools by Databricks for building, deploying, and evaluating high-quality AI agents and RAG applications.

4.0 (4)

Contact

#16

Humanloop

Enterprise LLM evaluation and prompt management platform for shipping reliable AI features.

4.5 (4)

Freemium

#17

LangWatch

LLM optimization studio for monitoring, evaluating, and improving AI applications in production.

4.6 (5)

Freemium