  1. DeepEval by Confident AI - The LLM Evaluation Framework

    By the authors of DeepEval, Confident AI is a cloud LLM evaluation platform. It allows you to use DeepEval for team-wide, collaborative AI testing.

  2. Open-Source LLM Evaluation Platform | Opik by Comet

    Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with tracing, eval metrics, and production-ready dashboards. Now with automated agent …

  3. Confident AI - The LLM Evaluation & Observability Platform

    Confident AI provides an opinionated solution to curate datasets, align metrics, and automate LLM testing with tracing. Teams use it to safeguard AI systems to save hundreds of hours a week …

  4. Home - Phoenix

    Arize Phoenix is an open-source LLM tracing & evaluation platform. Seamlessly instrument, experiment, and optimize AI applications in real time—transparent, framework-agnostic, and …

  5. GitHub - Arize-ai/phoenix: AI Observability & Evaluation

    It provides: Tracing - Trace your LLM application's runtime using OpenTelemetry-based instrumentation. Evaluation - Leverage LLMs to benchmark your application's performance …

  6. Langfuse - Open Source LLM Engineering Platform

    Traces, evals, prompt management and metrics to debug and improve your LLM application. Integrates with Langchain, OpenAI, LlamaIndex, LiteLLM, and more.

  7. The LLM Evaluation Landscape: 16 Frameworks by Functionality

    Oct 31, 2025 · Opik is an open-source LLM evaluation and monitoring platform developed by Comet. It provides tools to track, evaluate, and monitor LLM applications throughout their …

  8. Top 7 LLM Evaluation Tools for 2025 - dataaspirant.com

    Apr 30, 2025 · In this comprehensive guide, we will explore the top 7 LLM evaluation tools for 2025, delving deep into their features, use cases, and relevance for businesses and developers.

  9. Best LLM evaluation platforms 2025 - Articles - Braintrust

    Aug 21, 2025 · Compare top LLM evaluation platforms: Braintrust, LangSmith, Langfuse, and Arize.

  10. The Top 10 LLM Evaluation Tools - Analytics Insight

    Nov 5, 2025 · Explore top LLM evaluation tools like Deepchecks, LangSmith, and Humanloop to advance AI performance and reliability.