Observability & Tracing
Trace, log and monitor LLM and agent calls in production. 32 tools tracked.
AWS-managed trace-level observability for agents via CloudWatch generative AI observability and OTel.
Enterprise AI observability and evaluation platform extending Arize's ML-monitoring heritage to LLM and agent workloads at scale.
Open-source, OpenTelemetry-based tracing and evaluation library that runs locally or self-hosted, serving as the OSS on-ramp to Arize's enterprise platform.
Built-in evaluation, tracing and monitoring for models and agents inside Azure AI Foundry.
LLM and agent tracing inside the Datadog APM suite, attractive to teams already standardized on Datadog for infrastructure monitoring.
GenAI and LLM monitoring within the Dynatrace APM platform, covering tokens, cost and service health for enterprises already on Dynatrace.
Enterprise AI observability vendor from the ML-monitoring era, now offering LLM scoring, guardrails and bias/fairness auditing.
OTel-based GenAI observability solution on Grafana Cloud, built on open-source instrumentation rather than a proprietary SDK.
Helicone
Proxy/gateway-based LLM logging with one-line setup and unified cost and latency visibility across providers; now under Mintlify ownership.
Evaluation and observability platform for production agents with prompt versioning, A/B tests and OTel-based tracing.
Open-source (MIT) LLM engineering platform combining tracing, prompt management, evals and datasets, widely used as the default self-hosted observability stack.
Closed-source tracing, evals and monitoring platform from the LangChain team, with its deepest integration in LangChain/LangGraph but usable via OTel from any stack.
OpenTelemetry-native open-source tracing and metrics for LLM apps and agent frameworks, with a managed cloud option.
Lightweight open-source LLM observability with tracing, analytics, prompt templates and PII masking, formerly known as LLMonitor.
The MLOps standard's GenAI extension: trace logging, LLM evaluation and prompt registry inside open-source MLflow 3.
AI/LLM monitoring layer in the New Relic APM platform tracking model latency, token cost and errors alongside conventional app telemetry.
OpenTelemetry-native open-source platform covering LLM tracing, GPU monitoring, guardrails and a prompt vault with one-line auto-instrumentation of 50+ providers.
Open-source LLM evaluation and tracing platform from Comet, combining trace logging, eval metrics and CI-friendly test suites.
Open-source text analytics on LLM app messages, clustering and scoring conversations to surface what users actually do.
Portkey
AI gateway that routes requests to 1,600+ models with built-in logging, cost tracking, caching and guardrails; observability comes as a side effect of the proxy layer.
LLM cost, latency and trace analytics bolted onto PostHog's product-analytics platform, letting teams join AI telemetry with user behavior data.
OpenTelemetry-based observability service from the Pydantic team with first-class PydanticAI and Python ecosystem integration.
Unified gateway-plus-observability control plane for tracing and evaluating agent behavior, rebranded from Keywords AI.
Production observability for voice agents that captures real calls and converts failures into test cases.
Agent-call tracing and error monitoring inside Sentry, giving app developers LLM visibility in the tool they already use for crash reporting.
Open-source OpenTelemetry APM that handles LLM observability via standard OTel instrumentation rather than an LLM-specific SDK.
Open-source LLMOps stack unifying gateway, observability, evaluations, optimization and experimentation.
Vendor-neutral OpenTelemetry instrumentation for LLM apps (OpenLLMetry) that ships traces to any OTel backend, plus a hosted monitoring platform.
User analytics and feedback tracking for LLM applications to surface real usage patterns.
AI gateway and deployment platform with built-in request logging, cost attribution and rate limiting for enterprise Kubernetes environments.
W&B Weave
LLM tracing and evaluation toolkit from Weights & Biases, integrated with the broader W&B experiment-tracking ecosystem.
Profile-based monitoring (whylogs) plus LangKit LLM metrics that summarize data locally so raw prompts never leave your infrastructure.