Microsoft library for constrained LLM generation — force structured outputs and enforce syntax.
The canonical open-source structured-generation library. Essential for serious work with local / open models; OpenAI structured outputs make it less critical on hosted APIs.
Last verified: April 2026
Sweet spot: a team running local or open-source LLMs (llama.cpp, vLLM-served open models) that needs guaranteed structured output. OpenAI and Anthropic now ship first-class structured outputs, which took pressure off Guidance for hosted APIs — but for self-hosted models, Guidance and its siblings (Outlines, LMQL) remain the correct answer.

Failure modes: if your stack is entirely hosted OpenAI / Anthropic, the providers' own structured-output features are simpler and just as reliable for most schemas. The template language requires investment — teams expecting a one-line decorator will find it heavier than that. And as a library, Guidance is one piece of a stack, not a full solution; you still need orchestration around it.

What to pilot: take one structured-extraction task on a local model where you currently post-process or retry. Rewrite it in Guidance. Measure failure rate, wall-time per call, and total tokens. If the guaranteed-schema property eliminates retries and speeds up generation, Guidance pays for its learning curve; if post-processing was already fast and rare, skip it.
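The measurement side of that pilot can be sketched as a small harness. This is a hypothetical helper, not part of Guidance: `generate` stands in for whatever prompt-to-text model call you are evaluating, and the harness counts calls, retries, failures, and wall time for the retry-based baseline you would compare against.

```python
import json
import time

def measure(generate, prompts, max_retries=3):
    """Retry-based JSON extraction baseline.

    `generate` is any callable prompt -> text (stub in your model call).
    Returns parsed results plus stats: total calls, retries, hard
    failures, and wall time, so the constrained version has a baseline
    to beat.
    """
    stats = {"calls": 0, "retries": 0, "failures": 0, "wall_time": 0.0}
    results = []
    for prompt in prompts:
        start = time.perf_counter()
        parsed = None
        for attempt in range(max_retries):
            stats["calls"] += 1
            if attempt > 0:
                stats["retries"] += 1
            try:
                parsed = json.loads(generate(prompt))
                break  # schema-valid enough to parse; stop retrying
            except json.JSONDecodeError:
                continue  # malformed output: burn another call
        if parsed is None:
            stats["failures"] += 1
        results.append(parsed)
        stats["wall_time"] += time.perf_counter() - start
    return results, stats
```

Run the same prompts through the Guidance rewrite (where retries should drop to zero by construction) and compare the two stats dictionaries.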
Guidance is a Microsoft-developed Python library for constrained LLM generation. Instead of asking a model to "please return JSON" and hoping, you write a template with explicit structure — fixed text, typed slots, branches, loops — and Guidance constrains the model's token generation so the output is guaranteed to match the template. It works best with local and open-source models served via llama.cpp, Transformers, or vLLM, where Guidance can manipulate logits directly to prevent invalid tokens. It also supports OpenAI and Azure via best-effort constraint reflection (no logit access means weaker guarantees, but templates still help).

Use cases include JSON/XML extraction where schema adherence matters, code generation where syntax must be valid, and multi-step reasoning where you want to force a specific format (chain-of-thought, then answer). Guidance is also notably faster than unconstrained sampling in many cases because it can skip generation for fixed template parts.

Apache-2-licensed, originally released by Microsoft Research, now maintained by a mix of Microsoft and community contributors. It is the reference implementation of the "structured generation" approach that frameworks like Outlines and LMQL also explore.
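The logit-manipulation idea is easy to see in miniature. The toy sketch below is plain Python, not the Guidance API: it masks disallowed next tokens to negative infinity so greedy decoding can only ever emit schema-valid output. A fixed template part corresponds to an allowed set of size one, which is also why constrained decoding can be faster than free sampling.

```python
import math

def mask_logits(logits, vocab, allowed):
    """Set logits of disallowed tokens to -inf so they can never win.

    This is the core trick behind constrained generation: the model's
    next-token distribution is restricted to only schema-valid tokens.
    """
    return [l if t in allowed else -math.inf
            for t, l in zip(vocab, logits)]

def greedy_constrained(step_fn, vocab, allowed_per_pos):
    """Greedy decode where each position has its own allowed-token set.

    A typed slot might allow only digits; fixed template text allows
    exactly one token, so no real "generation" is needed there.
    `step_fn(prefix)` returns the model's raw logits for the next token.
    """
    out = []
    for allowed in allowed_per_pos:
        logits = step_fn(out)
        masked = mask_logits(logits, vocab, allowed)
        out.append(max(zip(masked, vocab))[1])  # highest surviving logit
    return "".join(out)
```

Real engines (Guidance, Outlines) do this at the tokenizer level with automata over token strings rather than single characters, but the masking principle is the same.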
Real power comes with logit access — on OpenAI / Anthropic endpoints, constraints are best-effort. Template syntax has a learning curve distinct from prompts. Some features require llama-cpp-python or Transformers with specific versions. Not an orchestration framework — combine with an agent library for full applications.