Product Introduction
- Definition: Edit Mind is a self-hosted, AI-powered video indexing and semantic search platform designed for local deployment. Technically categorized as a privacy-first video intelligence system, it processes personal video libraries using computer vision (OpenCV), machine learning (PyTorch, Whisper), and vector databases (ChromaDB) entirely on-device.
- Core Value Proposition: It solves video discoverability challenges by enabling natural language search across unindexed video archives while guaranteeing zero data leakage – critical for sensitive media like security footage, family archives, or proprietary content.
Main Features
Multi-Modal AI Indexing Engine:
- How it works: Automatically extracts 7 metadata layers: speech-to-text (Whisper), object detection (YOLOv8), face recognition (DeepFace), emotion analysis, scene segmentation, text-in-video (OCR), and EXIF/telemetry parsing.
- Technologies: Utilizes Python-based ML pipelines with PyTorch models, orchestrated via BullMQ job queues. Embeds results in ChromaDB using Xenova Transformers for text/visual/audio vectors (see the indexing sketch below).
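The sketch below illustrates the idea with two of the layers: a Whisper transcription and YOLOv8 object detections written into per-modality ChromaDB collections. The file names, collection names, and single-keyframe shortcut are assumptions for illustration, and ChromaDB's default embedding function stands in for the Xenova Transformers embeddings used in the real pipeline.

```python
# Minimal indexing sketch (illustrative only; not Edit Mind's actual pipeline).
import chromadb
import whisper
from ultralytics import YOLO

VIDEO = "clip.mp4"         # hypothetical source video
KEYFRAME = "keyframe.jpg"  # hypothetical keyframe extracted earlier (e.g. via ffmpeg)

# 1. Speech-to-text layer (Whisper).
transcript = whisper.load_model("base").transcribe(VIDEO)["text"]

# 2. Object-detection layer (YOLOv8) on one representative keyframe.
detections = YOLO("yolov8n.pt")(KEYFRAME)[0]
objects = sorted({detections.names[int(box.cls)] for box in detections.boxes})

# 3. Persist both layers into per-modality ChromaDB collections.
client = chromadb.PersistentClient(path="./editmind-index")
text_col = client.get_or_create_collection("scenes_text", metadata={"hnsw:space": "cosine"})
visual_col = client.get_or_create_collection("scenes_visual", metadata={"hnsw:space": "cosine"})

text_col.add(ids=[f"{VIDEO}:text"], documents=[transcript], metadatas=[{"video": VIDEO}])
visual_col.add(ids=[f"{VIDEO}:visual"], documents=[", ".join(objects)], metadatas=[{"video": VIDEO}])
```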
Local Vector Search Hub:
- How it works: Converts natural language queries ("Mom laughing with dog in backyard") into hybrid semantic searches across transcribed dialogue, visual objects, and audio cues.
- Technologies: Implements cosine similarity search in ChromaDB with query routing to specialized collections (text/visual/audio). Supports Ollama, Gemini, or local GGUF models for query interpretation (see the query sketch below).
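A minimal sketch of such a query, assuming the collections from the indexing sketch above; the routing loop and distance-based merge are illustrative simplifications, not Edit Mind's actual query planner.

```python
# Query sketch: route one natural-language query to two collections and merge by distance.
import chromadb

client = chromadb.PersistentClient(path="./editmind-index")
query = "Mom laughing with dog in backyard"

hits = []
for name in ("scenes_text", "scenes_visual"):
    col = client.get_or_create_collection(name, metadata={"hnsw:space": "cosine"})
    res = col.query(query_texts=[query], n_results=3)
    hits += list(zip(res["ids"][0], res["distances"][0]))

# Lower cosine distance = closer match; rank across modalities.
for scene_id, distance in sorted(hits, key=lambda h: h[1]):
    print(f"{scene_id}: distance={distance:.3f}")
```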
Privacy-Enforced Architecture:
- How it works: All processing occurs in isolated Docker containers (Node.js, Python, PostgreSQL). Video files never leave the host machine, and metadata at rest is encrypted with AES-256 (via OpenSSL).
- Technologies: Docker Compose deployment with bind mounts for media, automatic SSL provisioning, and session secret rotation (an encryption sketch follows below).
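As a rough illustration of the at-rest encryption, the sketch below applies AES-256-GCM using Python's cryptography package; Edit Mind's actual key management, nonce handling, and file layout (and its use of OpenSSL) are not documented here, so everything beyond the AES-256 primitive is an assumption.

```python
# AES-256-GCM sketch for metadata at rest (illustrative; key management is omitted).
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)  # in practice, loaded from a secret store
aead = AESGCM(key)

metadata = b'{"video": "clip.mp4", "faces": ["person_01"], "objects": ["dog"]}'
nonce = os.urandom(12)                     # must be unique per encryption
ciphertext = aead.encrypt(nonce, metadata, None)

# Decrypt on read; the nonce is stored alongside the ciphertext.
assert aead.decrypt(nonce, ciphertext, None) == metadata
```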
Problems Solved
- Pain Point: Eliminates manual scrubbing through hours of footage to find specific moments – reducing search time from hours to seconds for content creators, researchers, and security teams.
- Target Audience:
- Journalists: Verify quotes across interview archives
- Homelab Enthusiasts: Index family videos with facial recognition
- Indie Filmmakers: Locate b-roll by scene composition
- Security Operators: Search surveillance feeds for object/activity patterns
- Use Cases:
- Forensic review of doorbell camera footage via queries like "stranger at door Tuesday 3PM"
- Academic research analysis in recorded lectures using semantic topic search
- Content repurposing by finding all "wide-angle sunset shots" in raw footage
Unique Advantages
- Differentiation: Outperforms cloud alternatives (Google Video AI, AWS Rekognition) with 100% offline operation, avoiding API costs ($15+/hour) and compliance risks. Unlike Plex/Jellyfin, it adds semantic search beyond basic metadata.
- Key Innovation: Patent-pending multi-embedding fusion synchronizes text/visual/audio vectors into unified scene objects, enabling cross-modal queries like "Find scenes where people cheer when a soccer ball appears" (see the illustrative sketch below).
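Since the fusion method itself is not public, the sketch below shows only one generic way such a fusion could work: per-modality vectors projected to a shared dimensionality, L2-normalized, and averaged with fixed weights. The weights, dimensionality, and helper name are assumptions, not the patent-pending algorithm.

```python
# Generic multi-embedding fusion sketch (illustrative; not Edit Mind's method).
import numpy as np

def fuse_scene_vectors(text_vec, visual_vec, audio_vec, weights=(0.4, 0.4, 0.2)):
    """L2-normalize each modality vector, then take a weighted average.
    Assumes all modalities are already projected to the same dimensionality."""
    vectors = [np.asarray(v, dtype=np.float32) for v in (text_vec, visual_vec, audio_vec)]
    normalized = [v / (np.linalg.norm(v) + 1e-9) for v in vectors]
    fused = sum(w * v for w, v in zip(weights, normalized))
    return fused / (np.linalg.norm(fused) + 1e-9)

# A cross-modal query ("people cheer when a soccer ball appears") can then be
# matched against fused scene vectors instead of any single modality.
scene = fuse_scene_vectors(np.random.rand(384), np.random.rand(384), np.random.rand(384))
```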
Frequently Asked Questions (FAQ)
Does Edit Mind work with 4K video files?
Yes, it processes 4K/HDR videos via GPU-accelerated ffmpeg in Docker containers, with adjustable quality presets to optimize hardware usage.
Can I use my own AI models instead of Gemini/Ollama?
Absolutely. The architecture supports custom GGUF/ONNX models through the local model pathway, documented in the configuration templates (an illustrative sketch follows below).
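As a rough illustration, the sketch below runs a local GGUF model with llama-cpp-python to rewrite a search request; the model path, prompt, and runtime choice are assumptions and are independent of Edit Mind's own configuration templates.

```python
# Local GGUF sketch (illustrative; llama-cpp-python is one common GGUF runtime).
from llama_cpp import Llama

llm = Llama(model_path="./models/mistral-7b-instruct.Q4_K_M.gguf", n_ctx=2048)
out = llm(
    "Rewrite this video search request as a short keyword query: "
    "'find the part where mom laughs with the dog'",
    max_tokens=32,
)
print(out["choices"][0]["text"].strip())
```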
How much storage is needed for indexing?
Metadata typically adds 5-15% overhead versus original video size (e.g., 100GB library ≈ 5-15GB for vectors/thumbnails/DB).
Is facial recognition GDPR-compliant?
When self-hosted, compliance is achieved by keeping biometric data on-premises. The system includes privacy toggles to disable face processing.
What hardware specs are required?
Minimum: Quad-core CPU, 16GB RAM, NVIDIA GTX 1650 (4GB VRAM). Recommended: RTX 3060+ for real-time 1080p processing. ARM64 (Raspberry Pi 5) supported for lightweight indexing.
