Product Introduction
Definition: Clicky is an open-source, multimodal AI assistant and interactive teaching application built natively for macOS. It is a screen-aware AI companion that combines vision and voice capabilities to provide real-time, context-aware guidance directly on the user's desktop. It is written in Swift and uses a Cloudflare Workers proxy to reach Large Language Model (LLM) and audio processing APIs.
Core Value Proposition: Clicky exists to bridge the gap between static AI chatbots and the user's active workflow. By functioning as a "buddy" that follows the cursor, Clicky provides a low-friction interface for real-time learning, debugging, and software navigation. Its primary value lies in its screen-capture intelligence and voice-first interaction, enabling a "hands-free" AI experience that understands the user's visual context without requiring manual copy-pasting of screenshots or code snippets.
Main Features
1. Contextual Screen Intelligence (ScreenCaptureKit): Clicky uses Apple’s ScreenCaptureKit framework to capture real-time visual data from the user’s display; the app requires macOS 14.2 or later. When a user activates the push-to-talk feature, the app captures a high-resolution screenshot of the current workspace. This visual context is sent to the Claude 3 model family, allowing the AI to "see" exactly what the user is working on, whether it is a code editor, a design tool, or a complex terminal sequence.
2. Real-Time Voice Synthesis and Transcription: The application features a robust audio pipeline for seamless communication. It utilizes AssemblyAI for real-time streaming transcription (STT), converting user speech into text with minimal latency. Once the AI processes the request, the response is synthesized into high-quality, human-like speech using ElevenLabs TTS. This creates a conversational feedback loop that mimics a real-life teacher or pair programmer.
3. Interactive Cursor Overlay and Spatial Tagging: One of Clicky's most distinctive technical features is the OverlayWindow system. Using a transparent NSPanel that spans multiple monitors, Clicky can visually "point" at UI elements. The AI model can embed spatial tags of the form [POINT:x,y:label:screenN] in its text response. The application parses these tags and moves a blue cursor overlay to the specified coordinates, highlighting buttons, lines of code, or menu items for the user.
4. Secure API Proxying via Cloudflare Workers: To ensure security and scalability, Clicky employs a worker-based architecture. Instead of hardcoding sensitive API keys for Anthropic, AssemblyAI, and ElevenLabs into the client-side binary, the app routes requests through a Cloudflare Worker. This proxy manages secret keys, handles Server-Sent Events (SSE) for streaming responses, and acts as a middleware for token generation, ensuring that the user’s environment remains secure and the application is easy to deploy locally.
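The spatial tag format described in feature 3 can be illustrated with a small parser. This is a minimal sketch in TypeScript, not Clicky's actual Swift implementation: the PointTag shape and the extractPointTags name are hypothetical, and it assumes integer coordinates and labels that contain neither ":" nor "]".

```typescript
// Hypothetical shape for one parsed pointing instruction (field names are assumptions).
interface PointTag {
  x: number;
  y: number;
  label: string;
  screen: number;
}

// Matches tags of the form [POINT:x,y:label:screenN].
// Assumes integer coordinates and a label without ":" or "]".
const POINT_TAG = /\[POINT:(\d+),(\d+):([^:\]]+):screen(\d+)\]/g;

// Splits a model response into the text to speak or display and the points to show.
function extractPointTags(response: string): { text: string; points: PointTag[] } {
  const points: PointTag[] = [];
  const stripped = response.replace(POINT_TAG, (_match, x, y, label, screen) => {
    points.push({
      x: Number(x),
      y: Number(y),
      label: String(label).trim(),
      screen: Number(screen),
    });
    return ""; // remove the tag from the user-facing text
  });
  // Collapse the whitespace left behind where tags were removed.
  return { text: stripped.replace(/\s{2,}/g, " ").trim(), points };
}
```

Separating the tags from the prose in one pass means the same response can drive both the cursor overlay and the speech output without the raw tag syntax being read aloud.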
Problems Solved
1. Context Switching Fatigue: Traditional AI tools require users to move away from their active window to interact with a chatbot. Clicky eliminates this friction by living at the cursor level. It solves the "context-switching" problem where developers or creators lose focus while trying to explain their screen state to an AI.
2. Learning Curve for Complex Software: Users often struggle with feature-rich applications (like Xcode, Blender, or Final Cut Pro). Clicky acts as an on-demand tutor that can see where a user is stuck and point directly to the next step in the workflow, significantly reducing the time required to master new professional tools.
3. Target Audience:
- Software Developers: For pair programming, debugging UI layouts, and explaining complex stack traces.
- Students and Educators: As an interactive tutor that provides guided walkthroughs for digital assignments.
- Accessibility Users: Individuals who benefit from voice-controlled navigation and visual cues on the screen.
- Open-Source Contributors: Developers looking for a template to build multimodal macOS applications.
4. Use Cases:
- Code Review: Asking Clicky to explain a specific block of code currently visible in the IDE.
- UI/UX Audits: Pointing out alignment issues or design inconsistencies on a live web page.
- Tutorial Following: Having the AI walk through a complex installation process or software setup by watching the user's progress.
Unique Advantages
1. Native Performance and Low Latency: Unlike web-based wrappers, Clicky is a native app whose codebase is 95.2% Swift, and it runs on both Apple Silicon and Intel-based Macs. By using native frameworks such as ScreenCaptureKit and streaming protocols (SSE and WebSockets), it is designed to keep the delay between a spoken request and the visual response lower than that of general-purpose AI assistants.
2. Spatial Awareness and Multi-Monitor Support: While most AI assistants are restricted to a text box, Clicky works with the macOS coordinate system across multiple screens. The cursor "buddy" is aware of monitor boundaries and can direct the user’s attention to a specific on-screen location, which lets its guidance be spatial rather than purely textual.
3. Open-Source Hackability: The project is distributed under the MIT License, allowing users to inspect the source code, modify the system prompts (defined in the worker), or integrate different LLM providers. The included CLAUDE.md file gives "Claude Code" and other AI-driven development tools the project context they need, so these tools can be used to extend and improve Clicky itself.
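The multi-monitor pointing described in item 2 requires turning a screen-local coordinate into a position in a shared coordinate space. The sketch below shows one way that mapping could work; it is not taken from the Clicky source. The Display shape and toGlobalPoint name are hypothetical, the screen index is assumed to be 1-based, and the tag coordinates are assumed to be screenshot pixels measured from the top-left corner.

```typescript
// Hypothetical description of one display (all field names are assumptions).
interface Display {
  // Frame in a shared global space with a top-left origin, in points.
  originX: number;
  originY: number;
  width: number;
  height: number;
  // Size of the screenshot captured for this display, in pixels.
  shotWidth: number;
  shotHeight: number;
}

// Maps a point given in screenshot pixels on screen N (assumed 1-based)
// to the shared global space. Returns null for an unknown screen index.
function toGlobalPoint(
  x: number,
  y: number,
  screen: number,
  displays: Display[],
): { x: number; y: number } | null {
  const d = displays[screen - 1];
  if (!d) return null;
  // Scale from screenshot pixels to display points, then clamp to the
  // display bounds so the overlay never lands outside the target monitor.
  const localX = Math.min(Math.max((x * d.width) / d.shotWidth, 0), d.width);
  const localY = Math.min(Math.max((y * d.height) / d.shotHeight, 0), d.height);
  return { x: d.originX + localX, y: d.originY + localY };
}
```

AppKit's global screen coordinates use a bottom-left origin, so a native Swift version of this mapping would also need to flip the Y axis; that step is omitted here.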
Frequently Asked Questions (FAQ)
Is Clicky free to use? The Clicky software itself is free and open-source. However, since it utilizes third-party APIs (Anthropic, AssemblyAI, and ElevenLabs), users must provide their own API keys. Usage costs are determined by these individual providers based on the volume of data processed.
What are the system requirements for Clicky on macOS? Clicky requires macOS 14.2 or later, as it relies on the ScreenCaptureKit framework for screen capture. It is compatible with both Intel and Apple Silicon (M1/M2/M3) architectures, and building it from source requires Xcode 15 or later.
How does Clicky protect my privacy and API keys? Clicky does not ship with hardcoded API keys. Instead, it uses a Cloudflare Worker as a secure proxy. Your keys are stored as secrets within your own Cloudflare account, ensuring that your sensitive credentials never leave your controlled environment or appear in the compiled application binary.
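The proxy pattern in this answer can be sketched as a routing function that a Worker's fetch handler might call. This is an illustration under stated assumptions, not the project's actual worker code: the route paths ("/chat", "/tts/<voiceId>"), the secret binding names, and the buildUpstream helper are all hypothetical, and the AssemblyAI token route is left out. The upstream URLs and authentication header names are those of the public Anthropic and ElevenLabs APIs.

```typescript
// Secrets as they might be bound to the Worker (binding names are assumptions).
interface Env {
  ANTHROPIC_API_KEY: string;
  ELEVENLABS_API_KEY: string;
}

interface UpstreamRequest {
  url: string;
  headers: Record<string, string>;
}

// Maps a client-facing path to the upstream URL and the headers that carry
// the secret key. The client never sees the key; it only knows the path.
function buildUpstream(path: string, env: Env): UpstreamRequest | null {
  if (path === "/chat") {
    return {
      url: "https://api.anthropic.com/v1/messages",
      headers: {
        "x-api-key": env.ANTHROPIC_API_KEY,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
      },
    };
  }
  // Hypothetical route: /tts/<voiceId> forwards to ElevenLabs text-to-speech.
  const tts = path.match(/^\/tts\/([A-Za-z0-9]+)$/);
  if (tts) {
    return {
      url: `https://api.elevenlabs.io/v1/text-to-speech/${tts[1]}`,
      headers: {
        "xi-api-key": env.ELEVENLABS_API_KEY,
        "content-type": "application/json",
      },
    };
  }
  return null; // unknown route: the handler would respond with 404
}
```

In a deployed Worker, the fetch handler would look up the route this way, forward the client's request body with the returned headers, and stream the upstream response back, so the keys stay in the Worker's secret store.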
Can Clicky control my mouse and keyboard? No, Clicky is designed to be an assistant that "points" and "sees." While it requests Accessibility permissions to enable global keyboard shortcuts (like Control + Option for push-to-talk), it does not take control of your hardware. It uses a visual overlay to guide you rather than performing actions on your behalf.
