Galileo AI Evals vs Phoenix
Side-by-side comparison of features, pricing, and ratings

Galileo AI Evals: AI observability and eval engineering platform that turns evals into production guardrails.
Pricing
Galileo AI Evals: Contact Sales
Phoenix: Freemium
Plans
Galileo AI Evals: $0/month; $100/month (billed yearly, saves 33%); Contact us
Phoenix: $0/mo (open-source); $0/mo (managed cloud); Custom
Popularity
Galileo AI Evals: 6.2k views
Phoenix: 7.0k views
Skill Level
Galileo AI Evals: Intermediate
Phoenix: Intermediate
API Available
Platforms
Galileo AI Evals: Web, API, CLI
Phoenix: Web, API, CLI
Categories
Galileo AI Evals: 💻 Code & Development, 📊 Data & Analytics, 🔒 Security & Privacy
Phoenix: 💻 Code & Development, 📊 Data & Analytics
Features
Eval engineering platform for AI systems
20+ out-of-box evals for RAG, agents, safety, security
Custom evaluators to encode domain expertise
Auto-tune metrics from live feedback
Distill LLM judges into compact Luna models
Production guardrails at 97% lower cost
Pre-production evals become production guardrails
Guardrail policies to block harmful responses
Insights engine for failure mode analysis and prescription
Capture ground truth from synthetic, dev, and production data
Subject matter expert annotations
Eval scores control agent actions, tool access, escalation
Run guardrails on L4 GPUs
Deployment options: SaaS, VPC, On-Premises
Supports millions of signals (models, prompts, functions, context, datasets, traces)
Agent tracing: prompts, retrievals, tool calls, outputs
LLM-as-judge evaluations for scoring outputs
Annotation workflows with human review
Dataset creation from traces
Experiment runner for hypothesis testing
Prompt IDE for iterative prompt optimization
Evaluation on cost, latency, and performance
Self-hosting on Docker or Kubernetes
Cloud instances with no infrastructure setup
Vendor-agnostic: support for any model or framework
OpenTelemetry native support
Integrations
NVIDIA NeMo
NVIDIA NIM
MongoDB
CrewAI
HP AI Studio
LlamaIndex
OpenTelemetry
NVIDIA NeMo Agent Toolkit
Docker
Kubernetes