SIMA 2

SIMA 2 is a generalist AI agent from Google DeepMind that operates in interactive 3D virtual environments, combining language understanding, reasoning, and action execution. Built on Gemini, it interprets multimodal inputs (text, voice, images) and carries out complex tasks in games and simulated worlds.
The core value of SIMA 2 is that it connects natural-language instructions to dynamic, goal-directed behavior in virtual environments, enabling collaborative problem-solving and open-ended learning.

Advanced Reasoning with Gemini Integration: SIMA 2 uses Gemini’s reasoning capabilities to decompose high-level goals into actionable steps. For example, it can turn an abstract command such as "Build a shelter before nightfall" in a survival game into a concrete sequence of actions. It also explains its decisions in real time, which makes collaboration with users transparent.
Cross-Environment Generalization: The agent applies learned skills (e.g., resource gathering in Minecraft) to novel scenarios (e.g., harvesting in ASKA) without additional training. It achieves 68% task success in unseen environments such as MineDojo, compared with SIMA 1’s 42% on the same benchmarks.
Self-Improvement via Generative Feedback: SIMA 2 iteratively refines its performance using Gemini-generated reward signals and autonomously expands its skill library. In tests, it improved task completion rates by 31% over three self-training cycles in Genie 3-generated environments.
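The benchmark figures above can be put in perspective with a little arithmetic. This is a hedged illustration, not part of SIMA 2. It assumes the 31% improvement is a relative gain that compounds evenly across the three cycles; the source does not say whether it is relative or measured in percentage points. The helper names are hypothetical.

```python
# Worked arithmetic on the benchmark figures quoted above.
# Assumption (not stated in the source): the 31% gain is relative, not in
# percentage points, and it compounds evenly over the three cycles.


def absolute_gain_pp(new_rate: float, old_rate: float) -> float:
    """Difference between two success rates, in percentage points."""
    return (new_rate - old_rate) * 100


def relative_gain(new_rate: float, old_rate: float) -> float:
    """Fractional improvement of new_rate over old_rate."""
    return new_rate / old_rate - 1


def per_cycle_gain(total_gain: float, cycles: int) -> float:
    """Even per-cycle rate that compounds to total_gain over `cycles` cycles."""
    return (1 + total_gain) ** (1 / cycles) - 1


if __name__ == "__main__":
    print(f"{absolute_gain_pp(0.68, 0.42):.0f} pp")  # 26 pp
    print(f"{relative_gain(0.68, 0.42):.1%}")        # 61.9%
    print(f"{per_cycle_gain(0.31, 3):.1%}")          # 9.4%
```

Under that assumption, 68% versus 42% is a 26-point (roughly 62% relative) gain, and a 31% total improvement corresponds to about 9.4% per cycle. If the 31% is instead in percentage points, the per-cycle reading does not apply.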

Limited Adaptability of Narrow AI Agents: Traditional game AI struggles with open-ended tasks and cross-environment transfer. SIMA 2 addresses this with a unified architecture trained on 10+ commercial games and synthetic environments.
Target User Group: Developers of interactive 3D platforms (games, simulations) and researchers studying embodied AI. Early partners include Thunderful Games (ASKA) and Keen Software House (Space Engineers).
Use Case Scenarios: Enabling players to verbally guide AI companions in survival games, training robotics algorithms in procedurally generated worlds, and stress-testing game mechanics through adaptive AI behavior.

Multimodal World Modeling: Unlike single-modality agents, SIMA 2 processes visual input (game pixels), text commands, and hand-drawn sketches together through Gemini’s multimodal architecture. This enables tasks such as following hand-drawn maps, which it did with 89% accuracy in tests.
Hybrid Training Framework: Combines human demonstrations (200K+ labeled gameplay clips) with synthetic data from self-play in Genie 3-generated worlds. This hybrid approach achieves 73% generalization efficiency across 15+ game engines.
Latency-Optimized Execution: Operates with a 150 ms mean response time through integration with a compressed Gemini 2.5 Flash model, enabling real-time collaboration without disrupting gameplay. Maintains compatibility with 60 FPS gameplay across DirectX 11/12 and Vulkan-based environments.
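To see what a 150 ms response time means at 60 FPS, here is a small frame-budget calculation. It is a sketch under our own assumptions: a fixed 60 FPS render loop, and a worst case in which that loop would wait for one agent decision every frame. The function names are hypothetical.

```python
import math

# Frame-budget arithmetic for the latency figures quoted above.
# Assumptions (ours, not from the source): a fixed 60 FPS render loop, and a
# worst case in which the loop waits for one agent decision every frame.


def frame_budget_ms(fps: float) -> float:
    """Milliseconds available to produce one frame at the given frame rate."""
    return 1000.0 / fps


def frames_spanned(latency_ms: float, fps: float) -> int:
    """Whole frames rendered while one agent decision is still in flight."""
    return math.ceil(latency_ms * fps / 1000.0)


def blocking_fps(latency_ms: float, fps: float) -> float:
    """Effective frame rate if every frame also waited latency_ms for the agent."""
    return 1000.0 / (frame_budget_ms(fps) + latency_ms)


if __name__ == "__main__":
    print(f"frame budget at 60 FPS: {frame_budget_ms(60):.2f} ms")       # 16.67 ms
    print(f"frames per 150 ms decision: {frames_spanned(150, 60)}")      # 9
    print(f"FPS if blocking on the agent: {blocking_fps(150, 60):.1f}")  # 6.0
```

At 60 FPS each frame has about 16.7 ms, so a 150 ms decision spans 9 frames, and a loop that blocked on the agent every frame would fall to about 6 FPS. One plausible reading of the 60 FPS claim is therefore that agent decisions run asynchronously alongside rendering rather than inside the frame loop; the source does not describe the mechanism.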

How does SIMA 2 differ from its predecessor? SIMA 2 adds Gemini-powered reasoning chains and self-improvement, letting it handle commands four times as complex as SIMA 1 could (tasks of up to 12 steps). It also expands environment support from 8 to 22 game engines.
Can SIMA 2 learn without human demonstrations? Yes, through its self-improvement cycle: Gemini generates initial task prompts and evaluates SIMA 2’s attempts, and those evaluations are used to update the agent’s policy iteratively. In Genie 3 environments, it achieved 57% autonomous skill acquisition without human data.
What games/platforms does SIMA 2 support? Current integration includes commercial titles (No Man’s Sky, Valheim), research environments (MineDojo), and Genie 3-generated worlds. Full compatibility requires Vulkan/DirectX 11+ graphics APIs and x86/ARM64 CPU architectures.
How is safety ensured for a self-improving agent? SIMA 2 runs as a controlled research preview with activity logging, Gemini-based behavior auditing, and task-constraint layers. All self-generated training data passes through automated toxicity filtering before it is used for model updates.
What are the real-world applications beyond gaming? The architecture is positioned as a foundation for future robotics systems, with demonstrated skill transfer to navigation (87% success in maze environments) and tool manipulation (73% efficiency in simulated labs).
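The self-improvement cycle described in the FAQ (Gemini proposes tasks, scores the agent's attempts, and the policy is updated from those scores) can be sketched as a toy loop. This is not SIMA 2's implementation: `propose_tasks` and `score_attempt` are hypothetical stand-ins for Gemini, the policy is a lookup table of preference weights instead of a neural network, and the update rule is a simple reward-filtered self-training step that reinforces only attempts scored as successful.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field

# Toy self-improvement loop. Everything here is hypothetical: in SIMA 2 the
# task-proposer and scorer roles are played by Gemini, and the policy is a
# neural network rather than a weight table.

ACTIONS = ["gather_wood", "gather_stone", "craft_axe", "build_wall", "idle"]
TASKS = ACTIONS[:-1]  # every action except "idle" can be requested as a task


def propose_tasks(rng: random.Random, n: int) -> list[str]:
    """Stand-in for Gemini proposing tasks: sample n task names."""
    return [rng.choice(TASKS) for _ in range(n)]


def score_attempt(task: str, action: str) -> float:
    """Stand-in for a Gemini-generated reward: 1.0 if the action completes the task."""
    return 1.0 if action == task else 0.0


@dataclass
class TablePolicy:
    # Preference weight per (task, action) pair; missing pairs default to 1.0,
    # so an untrained policy picks actions uniformly at random.
    weights: dict[tuple[str, str], float] = field(default_factory=dict)

    def act(self, task: str, rng: random.Random) -> str:
        w = [self.weights.get((task, a), 1.0) for a in ACTIONS]
        return rng.choices(ACTIONS, weights=w, k=1)[0]

    def update(self, successes: list[tuple[str, str]], boost: float = 2.0) -> None:
        # Reward-filtered self-training: only attempts the scorer accepted are
        # reinforced, by multiplying their preference weight.
        for pair in successes:
            self.weights[pair] = self.weights.get(pair, 1.0) * boost


def self_improve(cycles: int = 3, tasks_per_cycle: int = 50, seed: int = 0) -> list[float]:
    """Run propose -> attempt -> score -> update cycles; return per-cycle success rates."""
    rng = random.Random(seed)
    policy = TablePolicy()
    history = []
    for _ in range(cycles):
        tasks = propose_tasks(rng, tasks_per_cycle)
        attempts = [(t, policy.act(t, rng)) for t in tasks]
        rewards = [score_attempt(t, a) for t, a in attempts]
        history.append(sum(rewards) / len(rewards))  # measured before this cycle's update
        policy.update([pair for pair, r in zip(attempts, rewards) if r >= 1.0])
    return history


if __name__ == "__main__":
    for i, rate in enumerate(self_improve(), start=1):
        print(f"cycle {i}: success rate {rate:.0%}")
```

Each entry in the returned history is one cycle's success rate, measured before that cycle's update. Filtering on the scorer's reward before updating is what lets this toy loop improve with no human demonstrations at all; the loop can only be as good as the reward signal it filters on.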

Google's most capable AI agent for virtual 3D worlds