The Sequence Radar #885: Last Week in AI: Models, Games, and the Future of Evaluation

New model releases, new agents and a soccer cup.

TL;DR

  • OpenAI released GPT 5.6 with Sol, Terra, and Luna models, emphasizing tiered intelligence for different market needs and a phased-access strategy focused on safety and control.
  • Anthropic introduced Claude Tag, a feature that lets users structure prompts and responses with semantic markers, which improves context tracking and moves human-AI interaction toward structured collaboration.
  • General Intuition raised $320M to develop 'large action models' trained on action-labeled gameplay data, viewing video games as a rich substrate for embodied AI.
  • The LayerLens Stratix Cup demonstrated a new method of AI evaluation through a soccer tournament, where models competed by writing their own strategies and adapting in real time.
  • The article notes a shift in AI development from chatbots to more organism-like systems that sense, plan, act, fail, and adapt.
  • Research papers covered include Autodata for synthetic data generation, iLLaDA for large language diffusion models, evaluations of agent memory systems (MEMPROBE), Qwen-AgentWorld for general agents, and Tapered Language Models.
  • Recent AI tech releases include GPT 5.6 Sol, Claude Tag, and Mistral OCR.
  • Several AI companies received significant funding, including Patronus AI ($50M), General Intuition ($320M), Netris ($15M), and Groq ($650M).