Product Introduction
Definition: Pegasus 1.5 by TwelveLabs is a state-of-the-art Video Language Model (VLM) and video-intelligence infrastructure designed to transform raw, unstructured footage into structured, queryable, timestamped metadata. Unlike standard Large Language Models (LLMs), which rely on text transcripts, Pegasus 1.5 is a video-native model that reasons across the full temporal arc of visual data, interpreting actions, emotions, and causal relationships within a video stream.
Core Value Proposition: Pegasus 1.5 exists to eliminate the "dark data" problem inherent in massive video archives. It turns passive video files into computable assets by letting users define custom schemas for metadata extraction. Through a single API for indexing, searching, and analyzing video up to two hours long, it enables organizations to achieve human-level understanding of their visual data at scale.
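As a minimal sketch of that single-pipeline flow, the snippet below creates an index, ingests one file, and asks an open-ended question against the full video. It assumes the TwelveLabs Python SDK (`pip install twelvelabs`); the engine identifier and exact method names are illustrative and vary by SDK version, so check the current reference before use.

```python
# Minimal sketch: turn a raw video file into queryable, timestamped metadata.
# Engine names and method signatures here are assumptions, not a spec.
from twelvelabs import TwelveLabs

client = TwelveLabs(api_key="YOUR_API_KEY")

# 1. Create an index backed by the Pegasus engine (identifier is illustrative).
index = client.index.create(
    name="broadcast-archive",
    engines=[{"name": "pegasus1.5", "options": ["visual", "conversation"]}],
)

# 2. Ingest a video; indexing runs asynchronously on the TwelveLabs side.
task = client.task.create(index_id=index.id, file="match_footage.mp4")
task.wait_for_done(sleep_interval=10)  # poll until the asset is searchable

# 3. Ask an open-ended question against the entire video in one call.
answer = client.generate.text(
    video_id=task.video_id,
    prompt="List every goal with a timestamp and the scoring player's jersey number.",
)
print(answer.data)
```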
Main Features
Continuous Temporal Reasoning: Unlike traditional models that sample isolated frames and guess at context, Pegasus 1.5 analyzes the entire temporal arc of a video (up to 2 hours). This allows the model to track specific entities, understand narrative flow, and identify complex causation over long durations. It functions as a video-native reasoning engine rather than a simple transcript reader, making it capable of identifying events that have no accompanying audio or text.
Multimodal Indexing and Prompting (Marengo Integration): Pegasus 1.5 leverages the Marengo multimodal embedding model to convert video into spatiotemporal embeddings. This enables "Image-to-Video" queries, in which an uploaded image locates every instance of a specific object, person, or setting across a massive video library. The platform supports 47 languages and reports a composite accuracy of 78.5% in identifying complex visual concepts without manual tagging.
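An Image-to-Video query might look like the sketch below: a still image is submitted as the search query and matching moments come back as timestamped windows. The `query_media_*` parameter names follow the TwelveLabs search API as we understand it, but treat them as assumptions and confirm against the current SDK reference.

```python
# Illustrative sketch of an "Image-to-Video" query over an indexed library.
from twelvelabs import TwelveLabs

client = TwelveLabs(api_key="YOUR_API_KEY")

results = client.search.query(
    index_id="YOUR_INDEX_ID",
    query_media_type="image",
    query_media_file="reference_logo.png",  # still image of the object/person/setting
    options=["visual"],
)

for clip in results.data:
    # Each hit names the source video plus a start/end window for the match.
    print(clip.video_id, clip.start, clip.end, clip.confidence)
```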
High-Speed Infrastructure and API Orchestration: The platform is built for massive scale, capable of ingesting multimodal data at approximately 60x real-time speed. This allows for the indexing of one hour of video in just sixty seconds, with a throughput capacity exceeding 10,000 hours per day. Developers can integrate these capabilities via a robust SDK and Developer Hub, utilizing a single pipeline for ingestion, indexing, and structured data retrieval.
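In practice, the ingestion half of that pipeline reduces to queueing indexing tasks. The helper below is our own sketch of a batch workflow; only the task-based flow itself comes from the product description, and the SDK calls are assumptions subject to version drift.

```python
# Sketch of batch ingestion through the single pipeline: queue every file in a
# folder as an indexing task and wait until each becomes searchable.
from pathlib import Path
from twelvelabs import TwelveLabs

client = TwelveLabs(api_key="YOUR_API_KEY")
INDEX_ID = "YOUR_INDEX_ID"

def ingest_directory(folder: str) -> list[str]:
    """Queue every .mp4 in a folder for indexing; return the resulting video IDs."""
    video_ids = []
    for path in sorted(Path(folder).glob("*.mp4")):
        task = client.task.create(index_id=INDEX_ID, file=str(path))
        task.wait_for_done(sleep_interval=10)  # indexing runs at roughly 60x real time
        video_ids.append(task.video_id)
    return video_ids

print(ingest_directory("./nightly_footage"))
```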
Problems Solved
Pain Point: Unsearchable Video Archives and Manual Logging: Traditional video management requires human researchers to manually tag clips, a process that can take days for even a single shoot. Pegasus 1.5 solves this by providing instant, natural language search across entire libraries, reducing research time from days to seconds. It addresses the lack of "discoverability" in media archives, advertising repositories, and security footage.
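A natural-language search that replaces that manual logging step might look like this sketch: one text query returns timestamped clips from across the library, with no pre-existing tags or transcript. Parameter and field names follow the TwelveLabs Python SDK as of this writing and should be treated as assumptions.

```python
# Sketch: one plain-English query against an entire indexed archive.
from twelvelabs import TwelveLabs

client = TwelveLabs(api_key="YOUR_API_KEY")

results = client.search.query(
    index_id="YOUR_INDEX_ID",
    query_text="a crowd celebrating in the rain at night",
    options=["visual"],  # no transcript or manual tags required
)

for clip in results.data:
    print(f"{clip.video_id}: {clip.start:.1f}s-{clip.end:.1f}s (score {clip.score})")
```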
Target Audience:
- Media & Entertainment Professionals: Producers and editors needing to find specific "hero shots" or emotional beats across years of footage.
- AdTech and Marketing Strategists: Teams requiring contextual targeting to place ads in brand-safe scenes without relying on inaccurate metadata.
- Public Sector and Security Analysts: Government agencies managing evidence, anomaly detection, and incident reporting.
- Machine Learning Engineers and Developers: Technical teams building video-centric applications who require SOC 2 Type II certified, scalable video AI infrastructure.
Use Cases:
- Sports Content Production: Real-time generation of highlights by identifying specific game plays, player emotions, or brand logos.
- Evidence Management: Rapidly scanning hours of bodycam or surveillance footage to locate specific incidents or individuals.
- Contextual Advertising: Automatically identifying brand-safe environments for ad placement based on the actual visual content of a scene.
- Content Compliance: Scanning long-form media for regulatory violations or specific objects/actions that require blurring or removal.
Unique Advantages
Differentiation from General-Purpose LLMs: While models like Gemini 1.5 Pro offer multimodal capabilities, Pegasus 1.5 is specifically architected for video-native perception. Internal benchmarks indicate a +13.1% performance advantage over Gemini 1.5 Pro in multimodal prompting tasks. It does not just "see" frames; it understands the spatiotemporal relationship between them, making it significantly more accurate for complex action recognition.
Key Innovation: Programmable Video Schema: The most significant innovation is the ability for users to define a custom domain-specific schema. Instead of receiving a generic description, a user can tell the API exactly what matters to their business (e.g., "identify every time a specific jersey appears and provide a 5-second buffer"). This turns video into a "computable asset" that fits directly into existing enterprise databases and workflows.
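A programmable schema of that kind can be expressed directly in the prompt, as in the sketch below: the prompt defines the domain-specific fields, and the model returns JSON ready to load into an existing table. The schema is an example of ours, not a TwelveLabs format, and the generate call is hedged per the earlier snippets.

```python
# Sketch of a programmable video schema: the prompt defines the fields to
# extract; the returned JSON drops straight into an enterprise database.
import json
from twelvelabs import TwelveLabs

client = TwelveLabs(api_key="YOUR_API_KEY")

SCHEMA_PROMPT = """
Return only a JSON array. For every appearance of jersey number 10, emit:
{"start_sec": float, "end_sec": float, "action": str, "camera_angle": str}
Pad each window with a 5-second buffer on both sides.
"""

raw = client.generate.text(video_id="YOUR_VIDEO_ID", prompt=SCHEMA_PROMPT)
records = json.loads(raw.data)  # assumes the model honors the JSON-only instruction
print(records[0])
```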
Frequently Asked Questions (FAQ)
How long of a video can Pegasus 1.5 analyze in a single request? Pegasus 1.5 is designed to handle long-form content, supporting single video files up to 2 hours in length. Users can extract structured, timestamped metadata for the entire duration through a single API call, maintaining temporal consistency across the whole asset.
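For a concrete picture of that single-call extraction, the sketch below pulls chapter-style, timestamped structure from one long asset via the summarize endpoint. The endpoint and its `type` values follow the TwelveLabs docs as we understand them but may differ by SDK version.

```python
# Sketch: timestamped chapter structure for a full-length asset in one call.
from twelvelabs import TwelveLabs

client = TwelveLabs(api_key="YOUR_API_KEY")

res = client.generate.summarize(video_id="YOUR_VIDEO_ID", type="chapter")
for chapter in res.chapters:
    print(f"[{chapter.start:.0f}s-{chapter.end:.0f}s] {chapter.chapter_title}")
```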
Does Pegasus 1.5 require existing tags or transcripts to search video? No. Pegasus 1.5 uses spatiotemporal embeddings and video-native reasoning to understand what is happening visually. It does not require manual tags, metadata, or even an audio transcript to locate specific actions, emotions, or objects within your video library.
Is the TwelveLabs platform secure for sensitive government or enterprise data? Yes. TwelveLabs is SOC 2 Type II certified and employs encrypted data handling protocols. The platform is designed for high-security environments, offering flexible deployment options to ensure that the intelligence stack resides where the organization requires it, maintaining strict data sovereignty.
