The Sequence Radar #885: Last Week in AI: Models, Games, and the Future of Evaluation

TL;DR

OpenAI released GPT 5.6 with Sol, Terra, and Luna models, emphasizing tiered intelligence for different market needs and a phased-access strategy focused on safety and control.
Anthropic introduced Claude Tag, a feature that lets users structure prompts and responses with semantic markers, making context easier to track and moving human-AI interaction toward structured collaboration.
General Intuition raised $320M to develop 'large action models' trained on action-labeled gameplay data, viewing video games as a rich substrate for embodied AI.
The LayerLens Stratix Cup demonstrated a new method of AI evaluation through a soccer tournament in which models competed by writing their own strategies and adapting in real time.
The article notes a shift in AI development from chatbots to more organism-like systems that sense, plan, act, fail, and adapt.
Research papers covered include Autodata for synthetic data generation, iLLaDA for large language diffusion models, MEMPROBE for evaluating agent memory systems, Qwen-AgentWorld for general agents, and Tapered Language Models.
Recent AI tech releases include GPT 5.6 Sol, Claude Tag, and Mistral OCR.
Several AI companies received significant funding, including Patronus AI ($50M), General Intuition ($320M), Netris ($15M), and Groq ($650M).
