# Evaluation

17 tools tagged “Evaluation”



#1

Athina AI

Build, test, and monitor AI features with collaborative experimentation and production observability.

4.5 (4)
Freemium
#2

FoundryAI

Build, evaluate, and improve AI agents for business automation.

4.8 (4)
Free
#3

Log10

Scale expert LLM evaluation with automated real-time error detection.

4.6 (5)
Freemium
#4

Langfuse

An open-source LLM engineering platform offering observability, metrics, evaluations, and prompt management to debug and enhance large language model applica...

4.2 (6)
Freemium
#5

Latitude

Build production AI agents from a single prompt with built-in evals and observability.

4.8 (5)
Free
#6

Quotient AI

Real-time monitoring and evaluation platform for catching AI failures in search, RAG, and agents.

4.4 (5)
Free
#7

Vijil

Platform to build, evaluate, and operate trustworthy AI agents with reliability and safety guardrails.

4.8 (5)
Free
#8

LangSmith

A comprehensive platform offering observability, evaluation, and debugging tools for building and optimizing large language model (LLM) applications.

4.8 (5)
Freemium
#9

Keywords AI

Observability and debugging platform for shipping reliable LLM-powered applications faster.

4.8 (4)
Paid
#10

Coval

A simulation and evaluation platform that automates testing for AI agents, enhancing reliability across chat, voice, and other modalities.

4.5 (6)
Freemium
#11

Phonic Voice AI

End-to-end voice AI platform to build, observe, and evaluate reliable conversational voice agents.

4.7 (6)
Freemium
#12

KeywordsAI

Unified developer platform for building, monitoring, and scaling LLM applications.

5.0 (6)
Free
#13

Phoenix

Open-source observability and evaluation platform for tracing and improving AI applications.

4.5 (4)
Free
#14

QualiaInterviews

AI-led platform for multilingual, conversational research and evaluation interviews at scale.

4.3 (4)
Freemium
#15

Mosaic AI Agent Framework

A suite of tools by Databricks for building, deploying, and evaluating high-quality AI agents and RAG applications.

4.0 (4)
Contact for pricing
#16

Humanloop

Enterprise LLM evaluation and prompt management platform for shipping reliable AI features.

4.5 (4)
Freemium
#17

LangWatch

LLM optimization studio for monitoring, evaluating, and improving AI applications in production.

4.6 (5)
Freemium