Galileo AI Evals vs Arize Phoenix
Side-by-side comparison of features, pricing, and ratings

AI observability and eval engineering platform that turns evals into production guardrails.
Pricing
Galileo AI Evals: Contact Sales
Arize Phoenix: Freemium
Plans
Galileo AI Evals: $0/month; $100/month (billed yearly, saves 33%); Contact us
Arize Phoenix: Free; Contact for pricing
Popularity
Galileo AI Evals: 6.2k views
Arize Phoenix: 7.3k views
Skill Level
Galileo AI Evals: Intermediate
Arize Phoenix: Intermediate
API Available
Platforms
Galileo AI Evals: Web, API, CLI
Arize Phoenix: Web, API, CLI
Categories
Galileo AI Evals: 💻 Code & Development, 📊 Data & Analytics, 🔒 Security & Privacy
Arize Phoenix: 💻 Code & Development, 📊 Data & Analytics
Features
Galileo AI Evals:
Eval engineering platform for AI systems
20+ out-of-the-box evals for RAG, agents, safety, and security
Custom evaluators to encode domain expertise
Auto-tune metrics from live feedback
Distill LLM judges into compact Luna models
Production guardrails at 97% lower cost
Pre-production evals become production guardrails
Guardrail policies to block harmful responses
Insights engine for failure mode analysis and prescription
Capture ground truth from synthetic, dev, and production data
Subject matter expert annotations
Eval scores control agent actions, tool access, and escalation
Run guardrails on L4 GPUs
Deployment options: SaaS, VPC, On-Premises
Supports millions of signals (models, prompts, functions, context, datasets, traces)
Arize Phoenix:
Agent tracing with prompts, retrievals, tool calls, and outputs
LLM-as-judge evaluation and human annotation
Dataset creation from traces for experiments
Prompt IDE for iterative prompt optimization
Hypothesis testing with benchmarked experiments
Cost, latency, and performance scoring
Self-hosted deployment (local, Docker, Kubernetes)
Cloud-based free instances with no infrastructure setup
OpenTelemetry native support
Vendor-agnostic: works with any model, framework, or language
Open-source (ELv2) with community contributions
9k+ GitHub stars and 2.5M+ monthly downloads
Integrations
Galileo AI Evals:
NVIDIA NeMo
NVIDIA NIM
MongoDB
CrewAI
HP AI Studio
Arize Phoenix:
LlamaIndex
OpenTelemetry