Galileo AI Evals vs Arize Phoenix

Side-by-side comparison of features, pricing, and ratings

Galileo AI Evals

AI observability and eval engineering platform that turns evals into production guardrails.


Arize Phoenix

Open-source AI agent observability and evaluation platform.


Pricing

Galileo AI Evals: Contact Sales

Arize Phoenix: Freemium

Plans

Galileo AI Evals: $0/month; $100/month (billed yearly, saves 33%)

Arize Phoenix: Free; Contact for pricing

Popularity

Galileo AI Evals: 6.2k views

Arize Phoenix: 7.3k views

Skill Level

Intermediate

API Available

Platforms

Web, API, CLI

Categories

Galileo AI Evals: 💻 Code & Development, 📊 Data & Analytics, 🔒 Security & Privacy

Arize Phoenix: 💻 Code & Development, 📊 Data & Analytics

Features

Galileo AI Evals

Eval engineering platform for AI systems

20+ out-of-the-box evals for RAG, agents, safety, and security

Custom evaluators to encode domain expertise

Auto-tune metrics from live feedback

Distill LLM judges into compact Luna models

Production guardrails at 97% lower cost

Pre-production evals become production guardrails

Guardrail policies to block harmful responses

Insights engine that analyzes failure modes and prescribes fixes

Capture ground truth from synthetic, development, and production data

Subject-matter expert annotations

Eval scores control agent actions, tool access, and escalation

Run guardrails on NVIDIA L4 GPUs

Deployment options: SaaS, VPC, on-premises

Supports millions of signals (models, prompts, functions, context, datasets, traces)

Arize Phoenix

Agent tracing with prompts, retrievals, tool calls, and outputs

LLM-as-judge evaluation and human annotation

Dataset creation from traces for experiments

Prompt IDE for iterative prompt optimization

Hypothesis testing with benchmarked experiments

Cost, latency, and performance scoring

Self-hosted deployment (local, Docker, Kubernetes)

Cloud-based free instances with no infrastructure setup

Native OpenTelemetry support

Vendor-agnostic: works with any model, framework, or language

Open-source (ELv2) with community contributions

9k+ GitHub stars and 2.5M+ monthly downloads
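Several features above describe eval scores acting as production guardrails: policies that block harmful responses and scores that decide whether an agent proceeds, is blocked, or escalates to a human. The plain-Python sketch below illustrates that general idea under stated assumptions. It does not use either product's API. The names `GuardrailPolicy`, `decide`, and `Action`, the score keys, and the 0.3/0.7 thresholds are all hypothetical, and scores are assumed to be normalized to the range 0 to 1, where higher is better.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class GuardrailPolicy:
    """Hypothetical policy; thresholds are illustrative, not vendor defaults."""
    block_below: float = 0.3     # worst score below this: block the response
    escalate_below: float = 0.7  # worst score in [block_below, escalate_below): escalate

    def __post_init__(self) -> None:
        # The block threshold must not exceed the escalation threshold.
        if not 0.0 <= self.block_below <= self.escalate_below <= 1.0:
            raise ValueError("expected 0 <= block_below <= escalate_below <= 1")


def decide(scores: dict[str, float], policy: GuardrailPolicy) -> Action:
    """Gate a response on its worst eval score (e.g. safety, groundedness)."""
    if not scores:
        # No eval signal at all: escalate rather than silently allow.
        return Action.ESCALATE
    worst = min(scores.values())
    if worst < policy.block_below:
        return Action.BLOCK
    if worst < policy.escalate_below:
        return Action.ESCALATE
    return Action.ALLOW


if __name__ == "__main__":
    policy = GuardrailPolicy()
    print(decide({"safety": 0.95, "groundedness": 0.88}, policy).value)  # allow
    print(decide({"safety": 0.95, "groundedness": 0.50}, policy).value)  # escalate
    print(decide({"safety": 0.10, "groundedness": 0.90}, policy).value)  # block
```

Gating on the minimum score is a deliberately conservative choice: one failing evaluator is enough to stop a response. A weighted average would be more permissive, and real platforms may combine scores differently.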

Integrations

NVIDIA NeMo

NVIDIA NIM

MongoDB

CrewAI

HP AI Studio

LlamaIndex

OpenTelemetry