📶 LLMLagBench - When did your LLM go stale?

Piotr Pęzik, Konrad Kaczyński, Maria Szymańska, Filip Żarnecki, Zuzanna Deckert, Jakub Kwiatkowski, Wojciech Janowski

Large Language Models (LLMs) are pretrained on textual data up to a specific temporal cutoff, creating a strict knowledge boundary beyond which models cannot provide accurate information without querying external sources. More subtly, when this limitation is unknown or ignored, LLMs may inadvertently blend outdated time-sensitive information with general knowledge during reasoning tasks, potentially compromising response accuracy.

LLMLagBench provides a systematic approach for identifying the earliest probable temporal boundaries of an LLM's training data by evaluating its knowledge of recent events. The benchmark comprises 1,700+ curated questions about events sampled from news reports published between 2020 and 2025 (we plan to update the question set regularly). Each question is constructed so that it could not have been answered accurately before the underlying event was reported in the news media. We score model responses with a faithfulness metric on a 0-2 scale and apply the PELT (Pruned Exact Linear Time) changepoint detection algorithm to identify where model performance shows statistically significant drops; these changepoints indicate the models' actual knowledge cutoffs.
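
The changepoint step can be illustrated with a minimal pure-Python PELT sketch run on one model's sequence of monthly average scores. It assumes an L2 (mean-shift) segment cost, a minimum segment length of two months, and a hand-picked penalty; the cost function, penalty, and month-labelling convention that LLMLagBench actually uses are not specified here, and the helper names (`pelt`, `month_labels`, `changepoint_months`) are hypothetical.

```python
from typing import List, Sequence


def pelt(signal: Sequence[float], penalty: float, min_size: int = 2) -> List[int]:
    """Changepoint indices found by PELT with an L2 (mean-shift) cost.

    An index s means a new segment starts at signal[s].
    """
    n = len(signal)
    if n < min_size:
        return []
    # Prefix sums make each segment cost an O(1) computation.
    prefix = [0.0] * (n + 1)
    prefix_sq = [0.0] * (n + 1)
    for i, y in enumerate(signal):
        prefix[i + 1] = prefix[i] + y
        prefix_sq[i + 1] = prefix_sq[i] + y * y

    def cost(s: int, t: int) -> float:
        # Sum of squared deviations of signal[s:t] from its own mean.
        total = prefix[t] - prefix[s]
        return (prefix_sq[t] - prefix_sq[s]) - total * total / (t - s)

    best = [float("inf")] * (n + 1)  # best[t]: minimal penalized cost of signal[:t]
    best[0] = -penalty  # cancels the penalty charged for the first segment
    last = [0] * (n + 1)  # last[t]: start of the final segment in the optimum for signal[:t]
    candidates = [0]  # admissible start positions for the final segment
    for t in range(min_size, n + 1):
        scored = [(best[s] + cost(s, t), s) for s in candidates if t - s >= min_size]
        value, start = min(scored)
        best[t] = value + penalty
        last[t] = start
        # Pruning step: drop starts that can never become optimal again.
        candidates = [
            s for s in candidates
            if t - s < min_size or best[s] + cost(s, t) <= best[t]
        ]
        candidates.append(t)
    # Walk back through the stored segment starts.
    changepoints = []
    t = n
    while t > 0:
        t = last[t]
        if t > 0:
            changepoints.append(t)
    return sorted(changepoints)


def month_labels(first_year: int, first_month: int, count: int) -> List[str]:
    """Build 'YYYY_MM' labels in the style of the leaderboard's monthly columns."""
    labels = []
    year, month = first_year, first_month
    for _ in range(count):
        labels.append(f"{year}_{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return labels


def changepoint_months(scores: Sequence[float], labels: Sequence[str], penalty: float) -> List[str]:
    """Label each changepoint with the first month of the new segment (assumed convention)."""
    return [labels[i] for i in pelt(scores, penalty)]


if __name__ == "__main__":
    # Synthetic series: high accuracy, a partial drop, then near-zero scores.
    scores = [1.5] * 10 + [0.8] * 6 + [0.0] * 8
    labels = month_labels(2021, 1, len(scores))
    print(changepoint_months(scores, labels, penalty=0.5))  # ['2021_11', '2022_05']
```

A larger `penalty` yields fewer changepoints. Real leaderboard rows are noisier than this synthetic series, so the penalty would need tuning, and this sketch should not be expected to reproduce the detected cutoffs in the table exactly. Reporting the first month of each new segment is only one possible convention; reporting the last month before the break would shift every label back by one month.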

Our analysis of major LLMs reveals that knowledge infusion operates differently across training phases, often resulting in multiple partial cutoff points rather than a single sharp boundary. Provider-declared cutoffs and model self-reports frequently diverge from empirically detected boundaries by months or even years, underscoring the necessity of independent empirical validation.
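
Measuring such a divergence requires putting the heterogeneous cutoff strings from the leaderboard (e.g. `2024.07.01`, `2024.03 / 2024.08.22`, `2021.09/2023`, `N/A`) on a common monthly scale. The sketch below assumes the first `YYYY.MM` in each string is the relevant date and ignores year-only values such as `2021`; `parse_year_month` and `cutoff_gap_months` are hypothetical helpers, not part of LLMLagBench.

```python
import re
from typing import Optional, Tuple

# Matches the first "YYYY.MM" in strings such as "2024.07.01" or "2024.03 / 2024.08.22".
_YEAR_MONTH = re.compile(r"(\d{4})\.(\d{2})")


def parse_year_month(value: str) -> Optional[Tuple[int, int]]:
    """Return (year, month) for the first YYYY.MM in `value`, or None if there is none."""
    match = _YEAR_MONTH.search(value)
    if match is None:
        return None  # e.g. "N/A", "", or a year-only value like "2021"
    return int(match.group(1)), int(match.group(2))


def cutoff_gap_months(declared: str, detected: str) -> Optional[int]:
    """Months by which a declared cutoff lies after (positive) or before (negative) a detected one."""
    a = parse_year_month(declared)
    b = parse_year_month(detected)
    if a is None or b is None:
        return None
    return (a[0] - b[0]) * 12 + (a[1] - b[1])


if __name__ == "__main__":
    # Values copied from the claude-3-5-sonnet-20240620 row of the leaderboard.
    provider, first_detected, self_report = "2024.04", "2023.08", "2022.09"
    print(cutoff_gap_months(provider, first_detected))  # 8
    print(cutoff_gap_months(provider, self_report))  # 19
```

For that row, the provider-declared cutoff lies eight months after the first detected drop and 19 months after the model's own reported cutoff. When a provider lists two dates, taking the first one is a simplifying choice; a stricter analysis might compare against both.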

[Figure: Model Comparison with Trend Changepoints (interactive chart, filterable by year)]

The leaderboard below ranks models by their Overall Average faithfulness score (0-2 scale) across all 1,700+ questions spanning 2020-2025. For each model, the table also lists the cutoff declared by its developer (Provider cutoff), the 1st and 2nd cutoffs detected by LLMLagBench's PELT algorithm along with the full list of detected changepoints (trend_changepoints), the cutoff the model reports about itself (Model cutoff), its release date and parameter count, and its average score for each month and year. Large gaps between provider-declared and empirically detected cutoffs mark models whose actual knowledge boundaries differ substantially from official declarations.
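
The summary columns can be recomputed from a row's monthly columns. This sketch assumes the per-year values and the Overall Average are unweighted means of the monthly averages; the benchmark may instead weight by question count. As a spot check, on the claude-3-5-sonnet-20240620 row this assumption reproduces the listed 2021 value (1.59) and Overall Average (1.05) to two decimals. `aggregate_row` and `rank_models` are hypothetical helpers that take the `headers` list and a `data` row in the layout of the table below.

```python
import re
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, List, Sequence

# Monthly score columns look like "2021_01"; yearly columns like "2021" do not match.
MONTH_COLUMN = re.compile(r"^(\d{4})_(\d{2})$")


def aggregate_row(headers: Sequence[str], row: Sequence[Any]) -> Dict[str, float]:
    """Recompute per-year means and the overall mean from a row's monthly columns."""
    by_year: Dict[str, List[float]] = defaultdict(list)
    monthly: List[float] = []
    for name, value in zip(headers, row):
        match = MONTH_COLUMN.match(name)
        if match:
            score = float(value)
            by_year[match.group(1)].append(score)
            monthly.append(score)
    summary = {year: round(mean(scores), 2) for year, scores in sorted(by_year.items())}
    summary["Overall Average"] = round(mean(monthly), 2)
    return summary


def rank_models(headers: Sequence[str], rows: Sequence[Sequence[Any]]) -> List[str]:
    """Order model names by Overall Average, highest first."""
    name_col = list(headers).index("Model")
    score_col = list(headers).index("Overall Average")
    ordered = sorted(rows, key=lambda r: float(r[score_col]), reverse=True)
    return [str(r[name_col]) for r in ordered]


if __name__ == "__main__":
    # Hypothetical three-month table, not real leaderboard data.
    headers = ["Model", "2024_11", "2024_12", "2025_01", "Overall Average"]
    rows = [
        ["model-a", 1.2, 0.8, 0.1, 0.7],
        ["model-b", 1.5, 1.4, 0.9, 1.27],
    ]
    print(aggregate_row(headers, rows[0]))  # {'2024': 1.0, '2025': 0.1, 'Overall Average': 0.7}
    print(rank_models(headers, rows))  # ['model-b', 'model-a']
```

Because the monthly values in the table are already rounded to two decimals, recomputed yearly and overall figures may occasionally differ from the listed ones in the last digit.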

{
  • "headers": [
    • "Model",
    • "1st Detected cutoff",
    • "2nd Detected cutoff",
    • "trend_changepoints",
    • "Provider",
    • "Parameters",
    • "Provider cutoff",
    • "Release date",
    • "Model cutoff",
    • "2021_01",
    • "2021_02",
    • "2021_03",
    • "2021_04",
    • "2021_05",
    • "2021_06",
    • "2021_07",
    • "2021_08",
    • "2021_09",
    • "2021_10",
    • "2021_11",
    • "2021_12",
    • "2022_01",
    • "2022_02",
    • "2022_03",
    • "2022_04",
    • "2022_05",
    • "2022_06",
    • "2022_07",
    • "2022_08",
    • "2022_09",
    • "2022_10",
    • "2022_11",
    • "2022_12",
    • "2023_01",
    • "2023_02",
    • "2023_03",
    • "2023_04",
    • "2023_05",
    • "2023_06",
    • "2023_07",
    • "2023_08",
    • "2023_09",
    • "2023_10",
    • "2023_11",
    • "2023_12",
    • "2024_01",
    • "2024_02",
    • "2024_03",
    • "2024_04",
    • "2024_05",
    • "2024_06",
    • "2024_07",
    • "2024_08",
    • "2024_09",
    • "2024_10",
    • "2024_11",
    • "2024_12",
    • "2025_01",
    • "2025_02",
    • "2025_03",
    • "2025_04",
    • "2025_05",
    • "2025_06",
    • "2025_07",
    • "2025_08",
    • "2025_09",
    • "2021",
    • "2022",
    • "2023",
    • "2024",
    • "2025",
    • "Overall Average"
    ],
  • "data": [
    • [
      • "CohereForAI/c4ai-command-a-03-2025",
      • "2023.02",
      • "2024.04",
      • [
        • "2023_02",
        • "2024_04"
        ],
      • "CohereForAI",
      • "111B",
      • "2024.07.01",
      • "2025.03.13",
      • "2024.06",
      • 1.03,
      • 1.14,
      • 1.11,
      • 1.14,
      • 0.93,
      • 1,
      • 1.29,
      • 0.57,
      • 0.93,
      • 1.11,
      • 0.96,
      • 0.86,
      • 1.23,
      • 1.11,
      • 1.43,
      • 0.89,
      • 0.82,
      • 1.14,
      • 1.06,
      • 1.04,
      • 0.93,
      • 0.69,
      • 1.25,
      • 1.38,
      • 1,
      • 0.71,
      • 0.82,
      • 0.51,
      • 0.5,
      • 0.5,
      • 0.54,
      • 0.5,
      • 0.43,
      • 0.29,
      • 0.18,
      • 0.52,
      • 0.34,
      • 0.5,
      • 0.36,
      • 0.2,
      • 0.14,
      • 0.21,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 1.01,
      • 1.08,
      • 0.54,
      • 0.15,
      • 0,
      • 0.58
      ],
    • [
      • "CohereForAI/c4ai-command-r-plus",
      • "2022.05",
      • "2023.01",
      • [
        • "2022_05",
        • "2023_01"
        ],
      • "CohereForAI",
      • "35B",
      • "2024.07.01",
      • "2024.08",
      • "2023.01",
      • 1.17,
      • 1.36,
      • 1.39,
      • 1.4,
      • 1.36,
      • 1.32,
      • 1.46,
      • 0.75,
      • 1.46,
      • 1.43,
      • 1.18,
      • 1.24,
      • 1.4,
      • 1.11,
      • 1.32,
      • 1.29,
      • 1.18,
      • 1.14,
      • 1,
      • 0.96,
      • 0.86,
      • 0.86,
      • 0.93,
      • 1.07,
      • 0.71,
      • 0.36,
      • 0.25,
      • 0.14,
      • 0.21,
      • 0,
      • 0.06,
      • 0.29,
      • 0.21,
      • 0.06,
      • 0.14,
      • 0.07,
      • 0.17,
      • 0.18,
      • 0.04,
      • 0,
      • 0.07,
      • 0.07,
      • 0.14,
      • 0,
      • 0,
      • 0,
      • 0.11,
      • 0.03,
      • 0.06,
      • 0.1,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0.06,
      • 0.07,
      • 0.14,
      • 1.29,
      • 1.09,
      • 0.21,
      • 0.07,
      • 0.05,
      • 0.57
      ],
    • [
      • "Qwen/Qwen2.5-Omni-7B",
      • "",
      • "",
      • [],
      • "Qwen",
      • "7B",
      • "N/A",
      • "2025.01.03",
      • "2023.12",
      • 0.23,
      • 0.04,
      • 0.07,
      • 0.26,
      • 0,
      • 0.04,
      • 0.06,
      • 0.07,
      • 0.07,
      • 0.11,
      • 0,
      • 0.17,
      • 0.06,
      • 0.04,
      • 0.07,
      • 0.14,
      • 0.14,
      • 0.14,
      • 0.09,
      • 0.04,
      • 0.04,
      • 0.06,
      • 0.14,
      • 0.07,
      • 0.03,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0.14,
      • 0.09,
      • 0.09,
      • 0,
      • 0,
      • 0.02,
      • 0.04
      ],
    • [
      • "ai21/jamba-large-1.7",
      • "2023.02",
      • "2024.02",
      • [
        • "2023_02",
        • "2024_02"
        ],
      • "ai21",
      • "398B",
      • "2024.03 / 2024.08.22",
      • "2024.07.03",
      • "2024.02",
      • 0.8,
      • 1.21,
      • 0.93,
      • 1.03,
      • 0.79,
      • 0.71,
      • 1.14,
      • 0.86,
      • 1.21,
      • 1.2,
      • 0.75,
      • 0.86,
      • 1.23,
      • 1,
      • 1.43,
      • 0.83,
      • 1,
      • 1.07,
      • 0.94,
      • 1.04,
      • 1.07,
      • 1.03,
      • 1.18,
      • 0.9,
      • 1.37,
      • 1.14,
      • 0.79,
      • 0.69,
      • 1.14,
      • 0.61,
      • 0.63,
      • 1,
      • 0.86,
      • 0.71,
      • 0.54,
      • 0.76,
      • 0.6,
      • 0.5,
      • 0.04,
      • 0.06,
      • 0.04,
      • 0,
      • 0.06,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0.07,
      • 0,
      • 0,
      • 0.07,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0.14,
      • 0.96,
      • 1.06,
      • 0.85,
      • 0.11,
      • 0.02,
      • 0.63
      ],
    • [
      • "amazon/nova-lite-v1",
      • "2023.02",
      • "",
      • [
        • "2023_02"
        ],
      • "amazon",
      • "N/A",
      • "N/A",
      • "2025.03.17",
      • "2021.10",
      • 0.57,
      • 0.21,
      • 0.46,
      • 0.54,
      • 0.29,
      • 0.43,
      • 0.34,
      • 0.36,
      • 0.29,
      • 0.17,
      • 0.21,
      • 0.28,
      • 0.4,
      • 0.21,
      • 0.61,
      • 0.37,
      • 0.5,
      • 0.29,
      • 0.11,
      • 0.21,
      • 0.36,
      • 0.34,
      • 0.39,
      • 0.28,
      • 0.46,
      • 0.18,
      • 0.04,
      • 0,
      • 0,
      • 0.14,
      • 0.17,
      • 0.07,
      • 0,
      • 0,
      • 0,
      • 0.21,
      • 0.2,
      • 0.07,
      • 0.07,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0.35,
      • 0.34,
      • 0.11,
      • 0.03,
      • 0,
      • 0.17
      ],
    • [
      • "amazon/nova-pro-v1",
      • "2023.02",
      • "2024.05",
      • [
        • "2023_02",
        • "2024_05"
        ],
      • "amazon",
      • "N/A",
      • "N/A",
      • "2025.03.17",
      • "2023.10",
      • 1.09,
      • 0.96,
      • 1.14,
      • 1.09,
      • 1.18,
      • 0.96,
      • 1.23,
      • 0.82,
      • 1.07,
      • 1.11,
      • 1,
      • 0.79,
      • 1.09,
      • 1.43,
      • 1.29,
      • 1.29,
      • 1,
      • 1.07,
      • 1.23,
      • 0.93,
      • 0.86,
      • 1.03,
      • 1.29,
      • 1.38,
      • 1.31,
      • 1,
      • 0.64,
      • 0.54,
      • 0.64,
      • 0.86,
      • 0.91,
      • 0.96,
      • 0.5,
      • 0.6,
      • 0.57,
      • 0.86,
      • 0.57,
      • 0.79,
      • 0.36,
      • 0.43,
      • 0.29,
      • 0.11,
      • 0.17,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0.06,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0.03,
      • 0,
      • 0,
      • 1.04,
      • 1.16,
      • 0.78,
      • 0.23,
      • 0.01,
      • 0.68
      ],
    • [
      • "anthropic/claude-sonnet-4",
      • "2023.02",
      • "2024.12",
      • [
        • "2023_02",
        • "2024_12"
        ],
      • "anthropic",
      • "N/A",
      • "2025.01",
      • "2025.05.22",
      • "2024.04",
      • 1.17,
      • 1.18,
      • 1.46,
      • 1.06,
      • 1.18,
      • 0.93,
      • 1.34,
      • 0.79,
      • 1.39,
      • 1.26,
      • 1.25,
      • 1.14,
      • 1.43,
      • 1.39,
      • 1.75,
      • 1.11,
      • 1.18,
      • 1.21,
      • 1.2,
      • 1.29,
      • 1.18,
      • 1.2,
      • 1.43,
      • 1.31,
      • 1.4,
      • 1.14,
      • 0.82,
      • 1.06,
      • 1.07,
      • 1.18,
      • 1.03,
      • 1.32,
      • 0.93,
      • 0.94,
      • 0.79,
      • 0.69,
      • 0.89,
      • 1,
      • 0.61,
      • 0.54,
      • 0.64,
      • 0.79,
      • 0.91,
      • 1.07,
      • 1.11,
      • 1,
      • 1.21,
      • 0.73,
      • 0.23,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 1.18,
      • 1.31,
      • 1.03,
      • 0.88,
      • 0.03,
      • 0.93
      ],
    • [
      • "claude-3-5-sonnet-20240620",
      • "2023.08",
      • "2024.03",
      • [
        • "2023_08",
        • "2024_03"
        ],
      • "anthropic",
      • "N/A",
      • "2024.04",
      • "2024.08.21",
      • "2022.09",
      • 1.6,
      • 1.68,
      • 1.71,
      • 1.43,
      • 1.71,
      • 1.46,
      • 1.77,
      • 1.36,
      • 1.46,
      • 1.71,
      • 1.71,
      • 1.52,
      • 1.8,
      • 1.75,
      • 1.79,
      • 1.66,
      • 1.68,
      • 1.68,
      • 1.89,
      • 1.61,
      • 1.71,
      • 1.43,
      • 1.61,
      • 1.55,
      • 1.77,
      • 1.64,
      • 1.57,
      • 1.46,
      • 1.57,
      • 1.57,
      • 1.49,
      • 1.46,
      • 1.11,
      • 1.31,
      • 1.46,
      • 1.07,
      • 1.2,
      • 1.18,
      • 0.54,
      • 0.17,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0.05,
      • 0.07,
      • 0,
      • 0,
      • 0,
      • 0.06,
      • 0,
      • 0,
      • 1.59,
      • 1.68,
      • 1.46,
      • 0.26,
      • 0.02,
      • 1.05
      ],
    • [
      • "claude-3-haiku-2024030",
      • "2023.01",
      • "",
      • [
        • "2023_01"
        ],
      • "anthropic",
      • "N/A",
      • "2023.08",
      • "2024.03.13",
      • "2021",
      • 0.89,
      • 0.86,
      • 0.96,
      • 1.03,
      • 0.68,
      • 0.93,
      • 0.94,
      • 0.46,
      • 1,
      • 0.86,
      • 0.64,
      • 0.59,
      • 1.03,
      • 0.71,
      • 1.25,
      • 1.03,
      • 0.82,
      • 0.89,
      • 0.89,
      • 0.79,
      • 0.79,
      • 0.71,
      • 1.25,
      • 0.66,
      • 0.71,
      • 0.14,
      • 0.36,
      • 0.06,
      • 0,
      • 0,
      • 0.06,
      • 0.07,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0.07,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0.06,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0.06,
      • 0,
      • 0,
      • 0.82,
      • 0.9,
      • 0.12,
      • 0.01,
      • 0.01,
      • 0.39
      ],
    • [
      • "claude-3-opus-20240229",
      • "2023.04",
      • "2023.08",
      • [
        • "2023_04",
        • "2023_08"
        ],
      • "anthropic",
      • "N/A",
      • "2023.08",
      • "2024.03.04",
      • "2021",
      • 1.49,
      • 1.46,
      • 1.54,
      • 1.34,
      • 1.61,
      • 1.29,
      • 1.63,
      • 1.14,
      • 1.5,
      • 1.49,
      • 1.64,
      • 1.52,
      • 1.71,
      • 1.57,
      • 1.61,
      • 1.43,
      • 1.57,
      • 1.61,
      • 1.57,
      • 1.71,
      • 1.61,
      • 1.26,
      • 1.54,
      • 1.62,
      • 1.71,
      • 1.61,
      • 1.29,
      • 1.03,
      • 0.79,
      • 0.71,
      • 0.54,
      • 0.54,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0.07,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0.06,
      • 0,
      • 0,
      • 1.47,
      • 1.57,
      • 0.69,
      • 0.01,
      • 0.01,
      • 0.79
      ],
    • [
      • "deepseek-ai/DeepSeek-V3-0324",
      • "2023.09",
      • "2024.07",
      • [
        • "2023_09",
        • "2024_07"
        ],
      • "deepseek-ai",
      • "685B",
      • "N/A",
      • "2024.12.27",
      • "2023.09",
      • 1.43,
      • 1.43,
      • 1.29,
      • 1.2,
      • 1.29,
      • 1.39,
      • 1.2,
      • 0.89,
      • 1.39,
      • 1.49,
      • 1.29,
      • 1.03,
      • 1.46,
      • 1.18,
      • 1.71,
      • 1.29,
      • 1.14,
      • 1.21,
      • 1.57,
      • 1.07,
      • 1.25,
      • 0.97,
      • 1.5,
      • 1.34,
      • 1.4,
      • 1.21,
      • 1.04,
      • 0.97,
      • 1.25,
      • 1.11,
      • 1.03,
      • 1.43,
      • 1.21,
      • 0.86,
      • 0.93,
      • 0.97,
      • 1.23,
      • 0.75,
      • 0.96,
      • 0.97,
      • 0.54,
      • 0.86,
      • 0.31,
      • 0,
      • 0.06,
      • 0,
      • 0,
      • 0.03,
      • 0.06,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0.06,
      • 0,
      • 0,
      • 1.28,
      • 1.31,
      • 1.12,
      • 0.48,
      • 0.01,
      • 0.88
      ],
    • [
      • "deepseek-ai/DeepSeek-Chat-V3.1",
      • "2024.07",
      • "",
      • [
        • "2024_07"
        ],
      • "deepseek-ai",
      • "671B",
      • "N/A",
      • "2025.08.21",
      • "2023.10",
      • 1.2,
      • 1.32,
      • 1,
      • 1.37,
      • 1.07,
      • 1.21,
      • 1.2,
      • 0.75,
      • 1.11,
      • 1.31,
      • 1.29,
      • 1.14,
      • 1.43,
      • 1.11,
      • 1.39,
      • 1.09,
      • 1,
      • 1.21,
      • 1.29,
      • 0.86,
      • 1.07,
      • 0.91,
      • 1.21,
      • 1.21,
      • 1.31,
      • 1.43,
      • 1.07,
      • 0.94,
      • 1.14,
      • 1.29,
      • 0.86,
      • 1.25,
      • 1.18,
      • 1.14,
      • 1.04,
      • 1.21,
      • 1.34,
      • 1.04,
      • 1.36,
      • 1.11,
      • 1,
      • 1.21,
      • 0.54,
      • 0.07,
      • 0,
      • 0.04,
      • 0,
      • 0.1,
      • 0.11,
      • 0.05,
      • 0.11,
      • 0,
      • 0,
      • 0,
      • 0.06,
      • 0,
      • 0,
      • 1.16,
      • 1.15,
      • 1.16,
      • 0.65,
      • 0.04,
      • 0.87
      ],
    • [
      • "deepseek-ai/DeepSeek-R1-0528",
      • "2024.07",
      • "",
      • [
        • "2024_07"
        ],
      • "deepseek-ai",
      • "685B",
      • "N/A",
      • "2025.01.22",
      • "2024.07",
      • 1.46,
      • 1.64,
      • 1.32,
      • 1.46,
      • 1.46,
      • 1.54,
      • 1.51,
      • 1.14,
      • 1.61,
      • 1.63,
      • 1.39,
      • 1.48,
      • 1.69,
      • 1.79,
      • 1.75,
      • 1.31,
      • 1.43,
      • 1.39,
      • 1.69,
      • 1.57,
      • 1.21,
      • 1.31,
      • 1.54,
      • 1.45,
      • 1.6,
      • 1.39,
      • 1.21,
      • 1.49,
      • 1.79,
      • 1.57,
      • 1.37,
      • 1.57,
      • 1.54,
      • 1.2,
      • 1.29,
      • 1.34,
      • 1.46,
      • 1.5,
      • 1.43,
      • 1.6,
      • 1.5,
      • 1.43,
      • 0.8,
      • 0.21,
      • 0.11,
      • 0.04,
      • 0.07,
      • 0.13,
      • 0.11,
      • 0.05,
      • 0.14,
      • 0.03,
      • 0,
      • 0.04,
      • 0.06,
      • 0,
      • 0.14,
      • 1.47,
      • 1.51,
      • 1.45,
      • 0.86,
      • 0.06,
      • 1.12
      ],
    • [
      • "google/gemini-2.0-flash-001",
      • "2024.06",
      • "",
      • [
        • "2024_06"
        ],
      • "google",
      • "N/A",
      • "2024.06",
      • "2024.12.11",
      • "2021.09/2023",
      • 1.54,
      • 1.5,
      • 1.61,
      • 1.29,
      • 1.46,
      • 1.14,
      • 1.51,
      • 1.18,
      • 1.54,
      • 1.66,
      • 1.64,
      • 1.28,
      • 1.74,
      • 1.71,
      • 1.68,
      • 1.57,
      • 1.29,
      • 1.5,
      • 1.6,
      • 1.86,
      • 1.46,
      • 1.4,
      • 1.64,
      • 1.69,
      • 1.69,
      • 1.61,
      • 1.11,
      • 1.34,
      • 1.54,
      • 1.39,
      • 1.46,
      • 1.54,
      • 1.5,
      • 1.06,
      • 1.18,
      • 1.31,
      • 1.46,
      • 1.36,
      • 1.46,
      • 1.43,
      • 1.29,
      • 1.14,
      • 0.14,
      • 0,
      • 0.14,
      • 0.04,
      • 0.07,
      • 0.07,
      • 0.09,
      • 0,
      • 0.11,
      • 0,
      • 0,
      • 0,
      • 0.06,
      • 0.07,
      • 0,
      • 1.45,
      • 1.59,
      • 1.39,
      • 0.72,
      • 0.04,
      • 1.09
      ],
    • [
      • "google/gemini-2.5-flash",
      • "2024.03",
      • "2024.06",
      • [
        • "2024_03",
        • "2024_06"
        ],
      • "google",
      • "N/A",
      • "2025.01",
      • "2025.07.22",
      • "2024.07",
      • 1.23,
      • 1.21,
      • 1.46,
      • 1.29,
      • 1.18,
      • 1.21,
      • 1.43,
      • 0.79,
      • 1.39,
      • 1.34,
      • 1.14,
      • 1.41,
      • 1.46,
      • 1.43,
      • 1.43,
      • 1.29,
      • 1.21,
      • 1.11,
      • 1.6,
      • 1.43,
      • 1.46,
      • 1.09,
      • 1.43,
      • 1.52,
      • 1.34,
      • 1.46,
      • 0.68,
      • 0.86,
      • 1.36,
      • 1.21,
      • 1.23,
      • 1.39,
      • 1.07,
      • 0.97,
      • 0.93,
      • 1.1,
      • 1.17,
      • 1.25,
      • 1.14,
      • 0.63,
      • 0.61,
      • 0.57,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0.03,
      • 0,
      • 0,
      • 1.26,
      • 1.37,
      • 1.13,
      • 0.45,
      • 0,
      • 0.89
      ],
    • [
      • "google/gemma-3-27b-it",
      • "2023.02",
      • "2024.05",
      • [
        • "2023_02",
        • "2024_05"
        ],
      • "google",
      • "27B",
      • "2024.08",
      • "2025.08.14",
      • "2023.11.09",
      • 1.09,
      • 0.79,
      • 1.18,
      • 1.23,
      • 1.07,
      • 0.82,
      • 1.06,
      • 0.64,
      • 1,
      • 1.17,
      • 0.75,
      • 1.21,
      • 1.26,
      • 1.36,
      • 1.43,
      • 1.11,
      • 0.96,
      • 1.07,
      • 1.26,
      • 0.96,
      • 1.18,
      • 1.06,
      • 1.25,
      • 1.17,
      • 1.43,
      • 1.14,
      • 0.86,
      • 0.66,
      • 1.04,
      • 0.96,
      • 0.77,
      • 1.07,
      • 0.89,
      • 0.91,
      • 0.96,
      • 0.86,
      • 1.06,
      • 0.68,
      • 1,
      • 0.54,
      • 0.79,
      • 0.25,
      • 0.23,
      • 0.07,
      • 0.2,
      • 0.11,
      • 0.18,
      • 0.23,
      • 0.06,
      • 0.19,
      • 0.07,
      • 0.03,
      • 0,
      • 0.11,
      • 0.26,
      • 0.07,
      • 0,
      • 1,
      • 1.17,
      • 0.96,
      • 0.45,
      • 0.09,
      • 0.77
      ],
    • [
      • "google/gemma-3-4b-it",
      • "2024.01",
      • "",
      • [
        • "2024_01"
        ],
      • "google",
      • "4B",
      • "2024.08",
      • "2025.08.14",
      • "2023.11.09",
      • 0.43,
      • 0.21,
      • 0.14,
      • 0.37,
      • 0.25,
      • 0.21,
      • 0.29,
      • 0.25,
      • 0.5,
      • 0.43,
      • 0.18,
      • 0.48,
      • 0.37,
      • 0.5,
      • 0.36,
      • 0.51,
      • 0.68,
      • 0.25,
      • 0.34,
      • 0.32,
      • 0.29,
      • 0.46,
      • 0.46,
      • 0.59,
      • 0.34,
      • 0.25,
      • 0.36,
      • 0.2,
      • 0.18,
      • 0.36,
      • 0.37,
      • 0.43,
      • 0.32,
      • 0.4,
      • 0.39,
      • 0.31,
      • 0.37,
      • 0.18,
      • 0.21,
      • 0.17,
      • 0.18,
      • 0.11,
      • 0.09,
      • 0.11,
      • 0.03,
      • 0.14,
      • 0.14,
      • 0.33,
      • 0.23,
      • 0.24,
      • 0.14,
      • 0.09,
      • 0,
      • 0.04,
      • 0.17,
      • 0.14,
      • 0.29,
      • 0.31,
      • 0.43,
      • 0.33,
      • 0.17,
      • 0.15,
      • 0.28
      ],
    • [
      • "gpt-3.5-turbo-0125",
      • "2021.09",
      • "2022.04",
      • [
        • "2021_09",
        • "2022_04"
        ],
      • "openai",
      • "175B",
      • "2021.09.01",
      • "2022.11.30",
      • "2021.09",
      • 1.4,
      • 1.5,
      • 1.46,
      • 1.46,
      • 1.39,
      • 1.43,
      • 1.43,
      • 0.86,
      • 1.29,
      • 1,
      • 0.96,
      • 1,
      • 0.71,
      • 0.96,
      • 1.11,
      • 0.77,
      • 0.18,
      • 0.32,
      • 0.11,
      • 0.18,
      • 0.21,
      • 0.11,
      • 0.18,
      • 0.24,
      • 0.2,
      • 0.14,
      • 0.11,
      • 0.17,
      • 0.18,
      • 0.07,
      • 0.11,
      • 0.14,
      • 0.07,
      • 0.06,
      • 0.14,
      • 0.1,
      • 0.23,
      • 0.14,
      • 0.11,
      • 0.03,
      • 0.21,
      • 0.04,
      • 0.06,
      • 0.04,
      • 0.17,
      • 0.25,
      • 0.32,
      • 0.2,
      • 0.09,
      • 0.14,
      • 0.07,
      • 0.06,
      • 0.04,
      • 0.18,
      • 0.2,
      • 0.07,
      • 0.64,
      • 1.26,
      • 0.42,
      • 0.12,
      • 0.15,
      • 0.17,
      • 0.44
      ],
    • [
      • "gpt-4o-2024-08-06",
      • "2023.10",
      • "",
      • [
        • "2023_10"
        ],
      • "openai",
      • "N/A",
      • "2023.10.01",
      • "2024.08.06",
      • "2023.10",
      • 1.34,
      • 1.57,
      • 1.46,
      • 1.31,
      • 1.36,
      • 1.32,
      • 1.63,
      • 1.36,
      • 1.57,
      • 1.51,
      • 1.68,
      • 1.59,
      • 1.6,
      • 1.5,
      • 1.71,
      • 1.49,
      • 1.36,
      • 1.5,
      • 1.46,
      • 1.75,
      • 1.89,
      • 1.54,
      • 1.54,
      • 1.45,
      • 1.46,
      • 1.46,
      • 1.21,
      • 1.4,
      • 1.11,
      • 1.54,
      • 1.29,
      • 1.32,
      • 1.18,
      • 0.97,
      • 0.07,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0.06,
      • 0,
      • 0,
      • 1.48,
      • 1.57,
      • 1.08,
      • 0,
      • 0.01,
      • 0.87
      ],
    • [
      • "gpt-4o-mini-2024-07-18",
      • "2023.02",
      • "2023.09",
      • [
        • "2023_02",
        • "2023_09"
        ],
      • "openai",
      • "N/A",
      • "2023.10.01",
      • "2024.07.18",
      • "2021.10",
      • 1.03,
      • 1.18,
      • 1.29,
      • 1.31,
      • 1.04,
      • 1.14,
      • 1,
      • 0.57,
      • 1.25,
      • 1.03,
      • 0.93,
      • 0.97,
      • 1.46,
      • 1.14,
      • 1.21,
      • 1.14,
      • 1,
      • 1,
      • 1.17,
      • 0.57,
      • 1.14,
      • 1.06,
      • 1.29,
      • 1.03,
      • 1.31,
      • 0.86,
      • 0.64,
      • 0.71,
      • 0.93,
      • 0.36,
      • 0.51,
      • 0.86,
      • 0.57,
      • 0.14,
      • 0,
      • 0.07,
      • 0,
      • 0.14,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0.07,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0.04,
      • 0.17,
      • 0,
      • 0,
      • 1.06,
      • 1.1,
      • 0.58,
      • 0.02,
      • 0.02,
      • 0.58
      ],
    • [
      • "meta-llama/Llama-3.1-8B-Instruct",
      • "2023.03",
      • "",
      • [
        • "2023_03"
        ],
      • "meta-llama",
      • "8B",
      • "2023.12",
      • "2024.07.21",
      • "2023.12",
      • 0.8,
      • 0.61,
      • 0.68,
      • 0.8,
      • 0.25,
      • 0.61,
      • 0.4,
      • 0.14,
      • 0.64,
      • 0.57,
      • 0.39,
      • 0.62,
      • 0.69,
      • 0.64,
      • 0.71,
      • 0.69,
      • 0.39,
      • 0.5,
      • 0.6,
      • 0.25,
      • 0.61,
      • 0.57,
      • 0.71,
      • 0.66,
      • 0.57,
      • 0.46,
      • 0.36,
      • 0.11,
      • 0.18,
      • 0,
      • 0.06,
      • 0.07,
      • 0.07,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0.14,
      • 0,
      • 0,
      • 0.54,
      • 0.58,
      • 0.16,
      • 0,
      • 0.02,
      • 0.27
      ],
    • [
      • "meta-llama/Llama-3.3-70B-Instruct",
      • "2023.02",
      • "2023.09",
      • [
        • "2023_02",
        • "2023_09"
        ],
      • "meta-llama",
      • "70B",
      • "2023.12",
      • "2024.12.06",
      • "2023.12",
      • 1.54,
      • 1.64,
      • 1.64,
      • 1.51,
      • 1.43,
      • 1.25,
      • 1.54,
      • 1.07,
      • 1.39,
      • 1.31,
      • 1.46,
      • 1.38,
      • 1.63,
      • 1.36,
      • 1.61,
      • 1.34,
      • 1.21,
      • 1.43,
      • 1.4,
      • 1.36,
      • 1.36,
      • 1.06,
      • 1.43,
      • 1.31,
      • 1.54,
      • 1.18,
      • 0.61,
      • 0.43,
      • 0.57,
      • 0.21,
      • 0.29,
      • 0.5,
      • 0.21,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0.06,
      • 0,
      • 0,
      • 1.43,
      • 1.38,
      • 0.46,
      • 0,
      • 0.01,
      • 0.69
      ],
    • [
      • "meta-llama/Llama-4-Scout-17B-16E-Instruct",
      • "2023.01",
      • "2024.03",
      • [
        • "2023_01",
        • "2024_03"
        ],
      • "meta-llama",
      • "17B",
      • "2024.08",
      • "2025.04.05",
      • "2024.08",
      • 1.03,
      • 0.75,
      • 0.79,
      • 1.06,
      • 0.5,
      • 0.61,
      • 0.77,
      • 0.25,
      • 0.68,
      • 0.66,
      • 0.57,
      • 0.66,
      • 0.91,
      • 1.14,
      • 0.89,
      • 0.8,
      • 1,
      • 0.61,
      • 0.71,
      • 0.57,
      • 0.79,
      • 0.83,
      • 0.93,
      • 0.83,
      • 0.94,
      • 0.39,
      • 0.43,
      • 0.66,
      • 0.32,
      • 0.36,
      • 0.43,
      • 0.79,
      • 0.36,
      • 0.43,
      • 0.46,
      • 0.45,
      • 0.49,
      • 0.36,
      • 0.29,
      • 0.09,
      • 0.18,
      • 0.29,
      • 0.11,
      • 0.14,
      • 0,
      • 0,
      • 0,
      • 0.07,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0.07,
      • 0.09,
      • 0,
      • 0,
      • 0.69,
      • 0.83,
      • 0.5,
      • 0.17,
      • 0.02,
      • 0.47
      ],
    • [
      • "meta-llama/Meta-Llama-3.1-70B-Instruct",
      • "2023.02",
      • "2023.09",
      • [
        • "2023_02",
        • "2023_09"
        ],
      • "meta-llama",
      • "70B",
      • "2023.12",
      • "2024.07.21",
      • "2023.12",
      • 1.4,
      • 1.61,
      • 1.61,
      • 1.46,
      • 1.29,
      • 1.25,
      • 1.51,
      • 0.96,
      • 1.36,
      • 1.57,
      • 1.46,
      • 1.38,
      • 1.66,
      • 1.54,
      • 1.75,
      • 1.46,
      • 1.07,
      • 1.5,
      • 1.4,
      • 1.54,
      • 1.43,
      • 1.11,
      • 1.43,
      • 1.28,
      • 1.66,
      • 1.46,
      • 0.93,
      • 0.74,
      • 0.79,
      • 0.46,
      • 0.17,
      • 0.86,
      • 0.14,
      • 0.14,
      • 0.04,
      • 0,
      • 0,
      • 0.07,
      • 0.04,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0.04,
      • 0,
      • 0,
      • 0,
      • 0.06,
      • 0,
      • 0,
      • 1.41,
      • 1.43,
      • 0.62,
      • 0.01,
      • 0.01,
      • 0.73
      ],
    • [
      • "mistralai/Mixtral-8x22B-Instruct-v0.1",
      • "2022.07",
      • "2023.02",
      • [
        • "2022_07",
        • "2023_02",
        • "2023_06"
        ],
      • "mistralai",
      • "22B",
      • "N/A",
      • "2024.01.08",
      • "2023",
      • 1.34,
      • 1.36,
      • 1.39,
      • 1.43,
      • 1.39,
      • 1.39,
      • 1.51,
      • 1.11,
      • 1.46,
      • 1.51,
      • 1.43,
      • 1.17,
      • 1.37,
      • 1.32,
      • 1.57,
      • 1.29,
      • 1.21,
      • 1.32,
      • 1.23,
      • 1,
      • 1.29,
      • 0.77,
      • 0.96,
      • 1.21,
      • 1.14,
      • 0.57,
      • 0.36,
      • 0.51,
      • 0.57,
      • 0.14,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0.04,
      • 0,
      • 0,
      • 0,
      • 0.07,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0.03,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0.09,
      • 0,
      • 0,
      • 1.37,
      • 1.21,
      • 0.28,
      • 0.01,
      • 0.01,
      • 0.61
      ],
    • [
      • "mistralai/ministral-8b",
      • "2023.01",
      • "",
      • [
        • "2023_01"
        ],
      • "mistralai",
      • "8B",
      • "N/A",
      • "2024.10.09",
      • "2023.10",
      • 0.37,
      • 0.54,
      • 0.11,
      • 0.37,
      • 0.07,
      • 0.36,
      • 0.11,
      • 0.18,
      • 0.18,
      • 0.17,
      • 0.39,
      • 0.28,
      • 0.26,
      • 0.18,
      • 0.61,
      • 0.34,
      • 0.25,
      • 0.32,
      • 0.11,
      • 0.07,
      • 0.39,
      • 0.2,
      • 0.18,
      • 0.28,
      • 0.29,
      • 0.04,
      • 0.14,
      • 0.09,
      • 0.21,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0.07,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0.1,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0.14,
      • 0.26,
      • 0.27,
      • 0.06,
      • 0.01,
      • 0.02,
      • 0.13
      ],
    • [
      • "mistralai/mistral-medium-3",
      • "2023.02",
      • "2024.07",
      • [
        • "2023_02",
        • "2024_07"
        ],
      • "mistralai",
      • "N/A",
      • "N/A",
      • "2025.05.07",
      • "2023.12",
      • 1.2,
      • 1.46,
      • 1.46,
      • 1.31,
      • 1.21,
      • 1.21,
      • 1.2,
      • 0.82,
      • 1.32,
      • 1.4,
      • 1.46,
      • 1.24,
      • 1.51,
      • 1.39,
      • 1.54,
      • 1.2,
      • 1.36,
      • 1.29,
      • 1.37,
      • 1.25,
      • 1.14,
      • 1.06,
      • 1.57,
      • 1.41,
      • 1.57,
      • 1.14,
      • 1,
      • 0.89,
      • 0.96,
      • 0.89,
      • 1.09,
      • 1.21,
      • 0.86,
      • 0.8,
      • 0.96,
      • 0.97,
      • 0.83,
      • 0.75,
      • 0.96,
      • 0.66,
      • 0.93,
      • 0.71,
      • 0.49,
      • 0.14,
      • 0.11,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0.07,
      • 0,
      • 0,
      • 0,
      • 0.06,
      • 0,
      • 0,
      • 1.27,
      • 1.34,
      • 1.03,
      • 0.47,
      • 0.01,
      • 0.87
      ],
    • [
      • "mistralai/mistral-medium-3.1",
      • "2023.02",
      • "2024.07",
      • [
        • "2023_02",
        • "2024_07"
        ],
      • "mistralai",
      • "N/A",
      • "N/A",
      • "2025.08.12",
      • "2024.11",
      • 1.46,
      • 1.5,
      • 1.36,
      • 1.29,
      • 1.32,
      • 1.25,
      • 1.29,
      • 0.96,
      • 1.32,
      • 1.46,
      • 1.57,
      • 1.28,
      • 1.37,
      • 1.43,
      • 1.64,
      • 1.17,
      • 1.18,
      • 1.29,
      • 1.43,
      • 1.29,
      • 1.43,
      • 1.03,
      • 1.5,
      • 1.31,
      • 1.43,
      • 1.36,
      • 1,
      • 1,
      • 1.07,
      • 1.14,
      • 1.2,
      • 1.32,
      • 0.86,
      • 0.89,
      • 0.93,
      • 1.1,
      • 0.89,
      • 0.71,
      • 0.93,
      • 0.74,
      • 0.89,
      • 0.82,
      • 0.49,
      • 0.29,
      • 0.23,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0.07,
      • 0,
      • 0,
      • 0,
      • 0.03,
      • 0,
      • 0,
      • 1.34,
      • 1.34,
      • 1.11,
      • 0.5,
      • 0.01,
      • 0.9
      ],
    • [
      • "moonshotai/kimi-k2",
      • "2023.02",
      • "2025.01",
      • [
        • "2023_02",
        • "2025_01"
        ],
      • "moonshotai",
      • "1T",
      • "2025.07",
      • "2025.08.28",
      • "2025.04",
      • 1.46,
      • 1.79,
      • 1.68,
      • 1.31,
      • 1.5,
      • 1.54,
      • 1.74,
      • 1.32,
      • 1.5,
      • 1.54,
      • 1.75,
      • 1.34,
      • 1.63,
      • 1.57,
      • 1.82,
      • 1.54,
      • 1.36,
      • 1.57,
      • 1.74,
      • 1.64,
      • 1.36,
      • 1.31,
      • 1.57,
      • 1.34,
      • 1.77,
      • 1.64,
      • 1.07,
      • 1.26,
      • 1.5,
      • 1.32,
      • 1.37,
      • 1.46,
      • 1.25,
      • 1.03,
      • 1.14,
      • 1.52,
      • 1.63,
      • 1.43,
      • 1.29,
      • 1.34,
      • 1.25,
      • 1.43,
      • 1.26,
      • 1.18,
      • 1.31,
      • 0.96,
      • 1.07,
      • 1,
      • 0.74,
      • 0.38,
      • 0.25,
      • 0.11,
      • 0,
      • 0.14,
      • 0.09,
      • 0,
      • 0.29,
      • 1.54,
      • 1.54,
      • 1.36,
      • 1.26,
      • 0.22,
      • 1.24
      ],
    • [
      • "openai/gpt-oss-120b",
      • "2023.08",
      • "",
      • [
        • "2023_08"
        ],
      • "openai",
      • "120B",
      • "2024.07",
      • "2025.08.05",
      • "2021.09",
      • 0.8,
      • 0.93,
      • 0.64,
      • 1.06,
      • 0.61,
      • 0.82,
      • 0.74,
      • 0.64,
      • 0.86,
      • 0.69,
      • 0.54,
      • 0.72,
      • 0.71,
      • 0.86,
      • 0.82,
      • 0.69,
      • 1.29,
      • 0.5,
      • 0.8,
      • 0.71,
      • 0.5,
      • 0.8,
      • 0.93,
      • 0.83,
      • 0.91,
      • 0.43,
      • 0.57,
      • 0.49,
      • 0.39,
      • 0.68,
      • 0.4,
      • 0.61,
      • 0.32,
      • 0.31,
      • 0.14,
      • 0.21,
      • 0.34,
      • 0.36,
      • 0.14,
      • 0.31,
      • 0.14,
      • 0.32,
      • 0.17,
      • 0.36,
      • 0.14,
      • 0.18,
      • 0.07,
      • 0.3,
      • 0.11,
      • 0.14,
      • 0.11,
      • 0,
      • 0,
      • 0.25,
      • 0.11,
      • 0.14,
      • 0.07,
      • 0.75,
      • 0.79,
      • 0.46,
      • 0.24,
      • 0.1,
      • 0.49
      ],
    • [
      • "openai/gpt-oss-20b",
      • "2023.09",
      • "",
      • [
        • "2023_09"
        ],
      • "openai",
      • "20B",
      • "2024.07",
      • "2025.08.05",
      • "2021.09",
      • 0.46,
      • 0.36,
      • 0.25,
      • 0.43,
      • 0.21,
      • 0.32,
      • 0.31,
      • 0.11,
      • 0.21,
      • 0.26,
      • 0.32,
      • 0.45,
      • 0.2,
      • 0.43,
      • 0.39,
      • 0.31,
      • 0.75,
      • 0.5,
      • 0.26,
      • 0.14,
      • 0.25,
      • 0.37,
      • 0.32,
      • 0.41,
      • 0.29,
      • 0.39,
      • 0.29,
      • 0.31,
      • 0.46,
      • 0.39,
      • 0.23,
      • 0.25,
      • 0.11,
      • 0.14,
      • 0.18,
      • 0.14,
      • 0.06,
      • 0.18,
      • 0.07,
      • 0.06,
      • 0.04,
      • 0.14,
      • 0.03,
      • 0.11,
      • 0.17,
      • 0.07,
      • 0.07,
      • 0.07,
      • 0.11,
      • 0.1,
      • 0.14,
      • 0.11,
      • 0.11,
      • 0.21,
      • 0.23,
      • 0.14,
      • 0.14,
      • 0.31,
      • 0.36,
      • 0.27,
      • 0.09,
      • 0.14,
      • 0.24
      ],
    • [
      • "openchat/openchat-3.5-1210",
      • "2021.04",
      • "2023.01",
      • [
        • "2021_04",
        • "2023_01"
        ],
      • "openchat",
      • "7B",
      • "N/A",
      • "2024.01.06",
      • "2021.09",
      • 1,
      • 0.57,
      • 0.71,
      • 0.86,
      • 0.32,
      • 0.43,
      • 0.43,
      • 0.36,
      • 0.46,
      • 0.46,
      • 0.36,
      • 0.38,
      • 0.57,
      • 0.43,
      • 0.36,
      • 0.43,
      • 0.36,
      • 0.25,
      • 0.43,
      • 0.14,
      • 0.25,
      • 0.17,
      • 0.29,
      • 0.17,
      • 0.26,
      • 0,
      • 0.21,
      • 0,
      • 0.07,
      • 0.07,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0.07,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0.06,
      • 0,
      • 0,
      • 0.53,
      • 0.32,
      • 0.05,
      • 0.01,
      • 0.01,
      • 0.19
      ],
    • [
      • "x-ai/grok-3",
      • "2024.12",
      • "",
      • [
        • "2024_12"
        ],
      • "x-ai",
      • "N/A",
      • "2024.11",
      • "2025.02.19",
      • "2023.10",
      • 1.46,
      • 1.61,
      • 1.5,
      • 1.63,
      • 1.46,
      • 1.46,
      • 1.63,
      • 1.14,
      • 1.82,
      • 1.8,
      • 1.64,
      • 1.72,
      • 1.77,
      • 1.82,
      • 1.61,
      • 1.31,
      • 1.5,
      • 1.64,
      • 1.74,
      • 1.5,
      • 1.71,
      • 1.4,
      • 1.61,
      • 1.66,
      • 1.74,
      • 1.64,
      • 1.43,
      • 1.66,
      • 1.75,
      • 1.5,
      • 1.63,
      • 1.54,
      • 1.64,
      • 1.6,
      • 1.46,
      • 1.62,
      • 1.66,
      • 1.82,
      • 1.57,
      • 1.71,
      • 1.54,
      • 1.57,
      • 1.66,
      • 1.71,
      • 1.6,
      • 1.71,
      • 1.61,
      • 1.03,
      • 0.11,
      • 0.05,
      • 0.14,
      • 0.03,
      • 0,
      • 0.07,
      • 0.06,
      • 0.04,
      • 0.14,
      • 1.57,
      • 1.61,
      • 1.6,
      • 1.6,
      • 0.07,
      • 1.35
      ],
    • [
      • "x-ai/grok-4",
      • "2024.11",
      • "2025.01",
      • [
        • "2024_11",
        • "2025_01"
        ],
      • "x-ai",
      • "N/A",
      • "2024.11",
      • "2025.08.20",
      • "2023.10",
      • 1.77,
      • 1.86,
      • 1.79,
      • 1.71,
      • 1.79,
      • 1.57,
      • 1.77,
      • 1.36,
      • 1.79,
      • 1.74,
      • 1.54,
      • 1.76,
      • 1.71,
      • 1.86,
      • 1.82,
      • 1.69,
      • 1.46,
      • 1.61,
      • 1.71,
      • 1.61,
      • 1.75,
      • 1.46,
      • 1.75,
      • 1.86,
      • 1.66,
      • 1.96,
      • 1.54,
      • 1.77,
      • 1.68,
      • 1.68,
      • 1.8,
      • 1.54,
      • 1.64,
      • 1.51,
      • 1.68,
      • 1.69,
      • 1.49,
      • 1.68,
      • 1.5,
      • 1.57,
      • 1.71,
      • 1.61,
      • 1.49,
      • 1.64,
      • 1.6,
      • 1.71,
      • 1.25,
      • 1,
      • 0.23,
      • 0,
      • 0.07,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0,
      • 0.14,
      • 1.7,
      • 1.69,
      • 1.68,
      • 1.52,
      • 0.05,
      • 1.4
      ],
    • [
      • "z-ai/glm-4.5",
      • "2024.03",
      • "2024.12",
      • [
        • "2024_03",
        • "2024_12"
        ],
      • "z-ai",
      • "355B",
      • "N/A",
      • "2025.08.08",
      • "2024.06",
      • 1.11,
      • 1.18,
      • 1.18,
      • 1.2,
      • 1.29,
      • 1.25,
      • 1.26,
      • 0.86,
      • 1.39,
      • 1.2,
      • 1.18,
      • 1.03,
      • 1.51,
      • 1.46,
      • 1.71,
      • 1.43,
      • 1.29,
      • 1.5,
      • 1.54,
      • 1.5,
      • 1.36,
      • 1.09,
      • 1.5,
      • 1.59,
      • 1.43,
      • 1.39,
      • 1.04,
      • 1.09,
      • 1.39,
      • 1.39,
      • 1.2,
      • 1.36,
      • 1,
      • 1.23,
      • 1.39,
      • 1.24,
      • 0.97,
      • 1.21,
      • 1,
      • 1,
      • 0.89,
      • 1,
      • 0.94,
      • 0.71,
      • 0.63,
      • 0.5,
      • 0.57,
      • 0.43,
      • 0.23,
      • 0.1,
      • 0.11,
      • 0,
      • 0,
      • 0,
      • 0.06,
      • 0,
      • 0.14,
      • 1.18,
      • 1.46,
      • 1.26,
      • 0.82,
      • 0.07,
      • 1
      ]
    ],
  • "metadata": null
}


Model Comparison: Faithfulness to Ideal Answer with PELT Changepoints

The visualizations below present a detailed per-model analysis. They use the PELT (Pruned Exact Linear Time) changepoint detection algorithm to identify significant shifts in faithfulness scores over time.
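The following is a minimal sketch of how PELT can locate changepoints in one model's date-ordered score series. It assumes a squared-error (mean-shift) segment cost and a fixed penalty per changepoint. This page does not state which cost function or penalty LLMLagBench actually uses, and `pelt` is a hypothetical helper name, not the benchmark's code.

```python
from typing import List, Sequence


def pelt(y: Sequence[float], penalty: float) -> List[int]:
    """Return sorted changepoint indices; each index starts a new segment.

    Assumption: segment cost is the sum of squared deviations from the
    segment mean (a mean-shift model), which allows exact pruning (K = 0).
    """
    n = len(y)
    # Prefix sums give O(1) segment costs.
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(y):
        s[i + 1] = s[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(a: int, b: int) -> float:
        # Squared error of y[a:b] around its own mean.
        seg = s[b] - s[a]
        return (s2[b] - s2[a]) - seg * seg / (b - a)

    F = [0.0] * (n + 1)   # F[t]: optimal penalized cost of y[:t]
    F[0] = -penalty       # so a single segment is not charged a penalty
    last = [0] * (n + 1)  # last[t]: start of the final segment in y[:t]
    candidates = [0]
    for t in range(1, n + 1):
        totals = [F[tau] + cost(tau, t) for tau in candidates]
        best = min(range(len(candidates)), key=lambda i: totals[i])
        F[t] = totals[best] + penalty
        last[t] = candidates[best]
        # Pruning step: a candidate that is already worse than F[t]
        # can never become optimal for any later t.
        candidates = [tau for tau, tot in zip(candidates, totals) if tot <= F[t]]
        candidates.append(t)

    # Backtrack from the end to recover the segment starts.
    changepoints = []
    t = n
    while t > 0:
        tau = last[t]
        if tau > 0:
            changepoints.append(tau)
        t = tau
    return sorted(changepoints)


# Synthetic series with two drops, loosely mimicking a partial cutoff
# followed by a full one.
series = [1.5] * 20 + [0.8] * 20 + [0.0] * 10
print(pelt(series, penalty=1.0))  # [20, 40]
```

A larger penalty yields fewer, more conservative changepoints, and a smaller one splits noisy series more aggressively. For real use, a tested implementation such as the one in the `ruptures` library is preferable to this sketch.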

  • Blue scatter points represent individual faithfulness scores (0-2 scale, left y-axis) for questions ordered by event date.
  • Red horizontal lines indicate mean faithfulness within each segment identified by PELT. Red dashed vertical lines mark the detected changepoints, i.e. possible training boundaries where performance characteristics shift.
  • The green curve shows cumulative average faithfulness over time.
  • The orange line (right y-axis) tracks cumulative refusals, revealing how often models decline to answer questions beyond their knowledge boundaries.
  • Yellow percentage labels indicate refusal rates within each segment. Models instruction-tuned to acknowledge their limitations show sharp increases in refusals after their cutoff dates. Other models keep attempting answers despite lacking relevant training data, which can lead to hallucination.
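The plotted quantities can be derived from three inputs: a date-ordered list of per-question scores, a per-question refusal flag, and the changepoint indices. The sketch below assumes exactly those inputs. The refusal flag and the names `plot_series` and `PlotSeries` are hypothetical and are not taken from the benchmark. It computes the segment means (red lines), the cumulative average (green curve), the cumulative refusals (orange line), and the per-segment refusal rates (yellow labels).

```python
from dataclasses import dataclass
from itertools import accumulate
from typing import List, Sequence


@dataclass
class PlotSeries:
    segment_means: List[float]          # red horizontal lines
    cumulative_average: List[float]     # green curve
    cumulative_refusals: List[int]      # orange line (right y-axis)
    segment_refusal_rates: List[float]  # yellow labels, in percent


def plot_series(scores: Sequence[float], refused: Sequence[bool],
                changepoints: Sequence[int]) -> PlotSeries:
    """Derive the plotted quantities for one model (hypothetical helper).

    scores: faithfulness scores (0-2) for questions ordered by event date.
    refused: whether the model declined each question (assumed flag).
    changepoints: indices where a new segment starts, e.g. PELT output.
    """
    if len(scores) != len(refused):
        raise ValueError("scores and refused must have the same length")
    edges = [0, *changepoints, len(scores)]
    bounds = list(zip(edges[:-1], edges[1:]))

    segment_means = [sum(scores[a:b]) / (b - a) for a, b in bounds]
    cumulative_average = [
        total / (i + 1) for i, total in enumerate(accumulate(scores))
    ]
    cumulative_refusals = list(accumulate(int(r) for r in refused))
    segment_refusal_rates = [100 * sum(refused[a:b]) / (b - a) for a, b in bounds]
    return PlotSeries(segment_means, cumulative_average,
                      cumulative_refusals, segment_refusal_rates)


stats = plot_series(
    scores=[2, 2, 1, 2, 0, 0, 1, 0],
    refused=[False, False, False, False, True, True, False, True],
    changepoints=[4],
)
print(stats.segment_means)          # [1.75, 0.25]
print(stats.segment_refusal_rates)  # [0.0, 75.0]
```

In this toy example, the jump in refusal rate from 0% to 75% after the changepoint is the pattern described above for models tuned to acknowledge their limitations.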

Select models to compare how different architectures and training approaches handle temporal knowledge boundaries. Some models exhibit a single sharp cutoff, while others show multiple partial boundaries, possibly corresponding to distinct pretraining, continued-pretraining, and post-training phases.
