GPT-5.5 Performance Issue
A token clustering anomaly in GPT-5.5's reasoning-token counts may be contributing to degraded performance on complex tasks
Key points
- GPT-5.5 responses cluster at exactly 516 reasoning tokens, with additional spikes at 1034 and 1552 tokens
- This anomaly is model-specific and coincides with lower overall reasoning-token intensity
- An analysis of 390,195 response-level token records from Feb 1 to Jun 27, 2026, found that 82% of exact-516 events come from GPT-5.5
- GPT-5.5 accounts for only 19.3% of all responses, so it is over-represented among exact-516 events by more than a factor of four, which points to a possible thresholded reasoning-budget behavior
- The issue is related to a previous report of GPT-5.5 returning wrong answers when reasoning tokens equal 516
- The anomaly may contribute to degraded performance on complex and high-stakes tasks
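The arithmetic behind the key points can be checked directly. The short Python sketch below uses only the figures quoted above; the variable names are illustrative. It shows that the three spikes are evenly spaced and that GPT-5.5's share of exact-516 events is several times its share of all responses. Even spacing is consistent with a stepped reasoning budget, but it does not prove one.

```python
# Figures quoted in the article; variable names are illustrative.
spikes = [516, 1034, 1552]        # reasoning-token counts where responses cluster
share_of_all_responses = 0.193    # GPT-5.5 share of all responses
share_of_exact_516 = 0.82         # GPT-5.5 share of exact-516 events

# Distance between consecutive spikes.
gaps = [later - earlier for earlier, later in zip(spikes, spikes[1:])]

# How over-represented GPT-5.5 is among exact-516 events.
over_representation = share_of_exact_516 / share_of_all_responses

print(gaps)                           # [518, 518]
print(round(over_representation, 2))  # 4.25
```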
Introduction
GPT-5.5 exhibits a clustering anomaly in its reasoning-token counts, which may be contributing to degraded performance on complex tasks.
Analysis
An analysis of 390,195 response-level token records from February 1 to June 27, 2026, revealed a cluster of responses with exactly 516 reasoning tokens. GPT-5.5 produced 82% of these exact-516 events even though it accounts for only 19.3% of all responses.
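As a rough illustration of how such a tally could be computed, the Python sketch below compares each model's share of all responses with its share of exact-target events. The record schema (pairs of model name and reasoning-token count) and the function name `cluster_shares` are assumptions made for this example; the article does not describe the actual dataset format or the method used.

```python
from collections import Counter


def cluster_shares(records, target=516):
    """Compare each model's share of all responses with its share of
    responses whose reasoning-token count equals `target`.

    `records` is an iterable of (model_name, reasoning_tokens) pairs.
    This schema is hypothetical, chosen only for illustration.
    """
    total_by_model = Counter()
    hits_by_model = Counter()
    for model, tokens in records:
        total_by_model[model] += 1
        if tokens == target:
            hits_by_model[model] += 1

    total = sum(total_by_model.values())
    hits = sum(hits_by_model.values())

    return {
        model: {
            "share_of_responses": count / total,
            # Guard against division by zero when no record hits the target.
            "share_of_hits": hits_by_model[model] / hits if hits else 0.0,
        }
        for model, count in total_by_model.items()
    }


# Toy data, made up for the example; not taken from the analysis.
toy_records = [
    ("model-a", 516), ("model-a", 516),
    ("model-b", 516), ("model-b", 100),
]
shares = cluster_shares(toy_records)
```

A model whose `share_of_hits` is far above its `share_of_responses` is over-represented at the target token count, which is the pattern the analysis reports for GPT-5.5.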
Implications
This anomaly is specific to the GPT-5.5 model and may reflect a thresholded reasoning-budget behavior. If so, it may contribute to the model's degraded performance on complex and high-stakes tasks.
Sources
The WireByte editorial team synthesises technology news from multiple primary sources, verifies the facts, and links every source. Articles are produced with AI assistance and reviewed under our editorial policy.