Sun

Collaborative voice API for agents

2026-06-04

Product Introduction

  1. Definition: Sun is a Collaborative Voice Model (CVM): a real-time, multi-speaker AI model built for voice-first interaction in group settings. It is not a text-to-speech (TTS) layer, voice wrapper, or speech SDK; it is a purpose-built model designed to understand and manage complex, overlapping human conversation as it happens.
  2. Core Value Proposition: Sun targets a limitation of current voice AI products such as ChatGPT Realtime and Gemini Live, which are designed for one-on-one chat. Its primary value is multi-speaker voice collaboration for meetings, group calls, and classrooms, with a 350K-token context window (described as 10x larger than those products offer), near-instant responses, intelligent interruption handling, and proactive follow-up without wake words.

Main Features

  1. Real-Time Multi-Speaker Awareness & Barge-in Prevention: Sun's core architecture processes audio streams from multiple concurrent speakers simultaneously. It uses speaker diarization and speaker identification to track "who is speaking when." How it works: The model continuously analyzes the audio feed, detects active speech, and holds its response while several people are talking or while a speaker is mid-sentence (barge-in prevention). When a user naturally pauses, Sun can respond instantly. If interrupted, it stops speaking immediately to address the new input, mimicking natural human conversational turn-taking.
  2. Large Context Window for Sustained Conversations: The model has a 350K-token context window, enabling it to maintain a coherent, multi-turn conversation over several hours. How it works: Unlike competitors limited to a "few minutes" of sustained context, Sun stores and references a long history of the dialogue within a single session. This allows for deep, ongoing discussions, recall of earlier points without repetition, and complex multi-step reasoning throughout a long meeting or call.
  3. Intelligent Intent Recognition & Wake Word Elimination: Sun differentiates between casual mentions of its name and a direct command. It operates proactively after being activated once. How it works: Advanced natural language understanding (NLU) allows the model to parse conversational nuance. Once in a session, users can ask follow-up questions, correct information, or change topics naturally without repeating a wake word like "Hey Sun." The model stays in a state of "follow-up readiness."
  4. Native Tool and Agent Orchestration via WebSocket API: Sun acts as a collaborative bridge between humans and backend systems. How it works: Through its WebSocket API, it can dynamically call tools, query databases, run live web searches, and interpret structured outputs (JSON) from other AI agents (e.g., a "metrics-dashboard-agent"). It then translates these machine-generated results back into natural, conversational speech for the human participants, enabling actions like live data lookups without leaving the conversation.
  5. Speech Injection & Dynamic Context Management: System events or notifications can be injected into the live session for Sun to announce, and the conversation context itself can be updated dynamically. How it works: APIs allow external systems to push messages (e.g., "Start of a new agenda item") for the AI to vocalize. The "Dynamic Context Memory Edit" feature modifies the conversational context in real time, so the AI's behavior and knowledge base can adapt to changing meeting scenarios on the fly.
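The turn-taking behavior described in feature 1 (hold while people are talking, respond at a natural pause, stop when interrupted) can be illustrated with a small state machine. This is a minimal sketch and not Sun's implementation: the `TurnTaker` class, the `Action` names, and the 0.6-second pause threshold are all assumptions made for illustration, and the set of active speakers is assumed to come from an upstream diarization step.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    HOLD = "hold"          # stay silent: humans have the floor, or nothing to say
    RESPOND = "respond"    # a natural pause was detected: start speaking
    STOP = "stop"          # a human interrupted: stop speaking immediately
    CONTINUE = "continue"  # keep speaking: nobody has interrupted


@dataclass
class TurnTaker:
    """Hypothetical turn-taking policy; the threshold value is an assumption."""
    pause_threshold_s: float = 0.6
    speaking: bool = False   # True while the assistant is talking
    silence_s: float = 0.0   # how long no human has been speaking

    def step(self, active_speakers: set, dt: float, has_pending_reply: bool) -> Action:
        """Advance by dt seconds given the humans currently speaking."""
        if active_speakers:
            # Any human speech resets the silence timer.
            self.silence_s = 0.0
            if self.speaking:
                # Interrupted mid-reply: yield the floor right away.
                self.speaking = False
                return Action.STOP
            # One or more humans are talking: do not barge in.
            return Action.HOLD
        self.silence_s += dt
        if self.speaking:
            return Action.CONTINUE
        if has_pending_reply and self.silence_s >= self.pause_threshold_s:
            self.speaking = True
            return Action.RESPOND
        return Action.HOLD
```

A caller would invoke `step` once per audio frame, passing the speaker labels detected in that frame. A production system would likely also use prosody or semantic cues to tell a finished sentence from a mid-sentence pause; the fixed silence timer here is only a stand-in.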

Problems Solved

  1. Pain Point: The "Voice AI Awkward Pause." Traditional voice models have significant latency (3-5 second delays), killing conversational flow.
  2. Target Audience: Remote Team Leaders, Project Managers, Agile Coaches, Classroom Instructors, Customer Support Supervisors, and Multi-Agent System Developers who need human-in-the-loop orchestration.
  3. Use Cases:
    • Real-Time Meeting Analysis: In a strategy meeting, multiple team members discuss KPIs. Sun listens, tracks individual speakers, and when asked, provides a consolidated summary or live data (e.g., "Current MRR is $48.2K") without disrupting the flow.
    • Collaborative Classroom Tutoring: A teacher and several students discuss a problem. Sun identifies the student asking the question, provides a tailored explanation, and can handle follow-up questions from other students without requiring each to repeat a wake word.
    • Multi-Agent Debugging Session: A team monitors multiple AI agents. Sun interprets status updates from a "deployment-agent," announces them in natural language, and allows the human team to ask follow-up questions which Sun routes back to the appropriate agent via API.
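The agent-facing use cases above involve two steps: turning a structured agent message into a sentence that can be spoken, and routing a human follow-up back to the right agent. The sketch below shows one way this could look. The JSON field names (`agent`, `metric`, `display`, `service`, `status`), the `ROUTES` table, and both function names are hypothetical and are not taken from Sun's WebSocket API. The keyword matching is a deliberately naive stand-in for the model's intent recognition.

```python
import json
from typing import Optional

# Hypothetical routing table: agent name -> keywords that suggest a question is for it.
ROUTES = {
    "deployment-agent": ["deploy", "rollback", "release"],
    "metrics-dashboard-agent": ["mrr", "revenue", "kpi"],
}


def to_speech(raw: str) -> str:
    """Turn an agent's JSON message (assumed shape) into a speakable sentence."""
    msg = json.loads(raw)
    agent = msg["agent"]
    if agent == "metrics-dashboard-agent":
        return f"Current {msg['metric']} is {msg['display']}."
    if agent == "deployment-agent":
        return f"Deployment of {msg['service']} is {msg['status']}."
    # Unknown agents still get announced, just without detail.
    return f"There is an update from {agent}."


def route_follow_up(question: str, routes: dict = ROUTES) -> Optional[str]:
    """Pick the agent whose keywords appear in the question, or None if no match."""
    text = question.lower()
    for agent, keywords in routes.items():
        if any(keyword in text for keyword in keywords):
            return agent
    return None
```

For example, `to_speech('{"agent": "metrics-dashboard-agent", "metric": "MRR", "display": "$48.2K"}')` yields the sentence quoted in the meeting use case, and `route_follow_up("Why did the deploy fail?")` selects `deployment-agent`.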

Unique Advantages

  1. Differentiation: Sun differs from the ChatGPT Realtime API and the Google Gemini Live API in its design philosophy. Where those products offer a conversational layer for a single user, Sun is architected from the ground up for multi-speaker collaboration. Its comparison with those rivals lists "Few Hours" of sustained context versus "Few Minutes," "Collaborative" and "Proactive Follow Up" interaction quality versus "None" or "Limited," and positions Sun as 50% more cost-effective for audio token processing.
  2. Key Innovation: The creation of the Collaborative Voice Model (CVM) category itself is the primary innovation. By combining real-time multi-speaker diarization, massive long-context memory, proactive intent handling, and native API/tool orchestration into a single, low-latency model accessible via WebSocket, Sun moves beyond simple "chatbots with voices" to become a true, real-time conversation operating system for AI-native teams and systems.

Frequently Asked Questions (FAQ)

  1. How is Sun different from ChatGPT's Realtime Voice Mode or Google Gemini Live? Sun is not a general conversational AI like those products, which are optimized for single-user interaction. Sun is a Collaborative Voice Model (CVM) specifically built for multi-speaker environments like meetings and group calls. It features a 10x larger context window (350K tokens), native multi-speaker awareness, intelligent turn-taking without wake words, and is designed for agent-to-human collaboration.
  2. What is the context window of the Sun voice model and why does it matter? Sun has a 350,000-token context window, enabling sustained conversation for several hours. This is crucial for lengthy meetings, workshops, or ongoing collaborative sessions, as the AI can remember and reference the entire history of the discussion without losing coherence or requiring constant repetition from users.
  3. Can Sun handle interruptions and multiple people talking at once? Yes. This is a core feature. Sun uses real-time barge-in detection and speaker diarization. It will pause if you interrupt it and can distinguish between different speakers in a group conversation. It waits for natural pauses in multi-party discussions before responding, mimicking effective human facilitation.
  4. How does Sun integrate with our existing tools and agents? Sun provides a WebSocket API that allows deep integration. It can perform live web searches, access real-time data, and act as a voice interface for other AI agents. You can send structured data (like JSON from a data agent) to Sun, and it will interpret and relay that information as natural speech to human participants in the call.
  5. What are the pricing plans for the Sun Zero API? Sun offers tiered plans: a Free plan with 15 minutes/month for testing; a Starter plan ($19/mo) for prototyping; a recommended Pro plan ($29/mo) for production apps with priority support; a Premium plan ($99/mo) for high-volume use; and custom Enterprise plans for large-scale deployments with dedicated infrastructure and SLAs. All paid plans include unlimited API keys.
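The context-window answer above links 350,000 tokens to "several hours" of conversation. The listing does not say how many tokens one second of multi-speaker audio consumes, so the sketch below only does the arithmetic in both directions: the average token rate a session of a given length can afford, and how long a session lasts at an assumed rate. The function names and the example session lengths are illustrative assumptions.

```python
CONTEXT_TOKENS = 350_000  # context window size stated in the listing


def tokens_per_second_budget(context_tokens: int, hours: float) -> float:
    """Average tokens per second a session of `hours` can use before filling the window."""
    return context_tokens / (hours * 3600)


def sustained_hours(context_tokens: int, tokens_per_second: float) -> float:
    """Hours of conversation that fit in the window at an assumed average token rate."""
    return context_tokens / tokens_per_second / 3600


# Example session lengths are assumptions, chosen to bracket "several hours".
for hours in (2.0, 3.0, 4.0):
    rate = tokens_per_second_budget(CONTEXT_TOKENS, hours)
    print(f"{hours:.0f} h session: about {rate:.1f} tokens/s on average")
```

Under these assumptions, a three-hour session leaves an average budget of roughly 32 tokens per second across all speakers, tool results, and injected context combined.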
