nCompass AI Assistant

Enabling everyone to write GPU kernels

2026-03-30

Product Introduction

  1. Definition: The nCompass AI Assistant is a specialized Performance Optimization IDE and AI-driven diagnostic agent designed to bridge the gap between source code development and runtime execution analysis. It functions as a sophisticated layer that integrates directly with IDEs like VSCode and Cursor, providing an environment where developers can profile, trace, and analyze software performance without switching contexts.

  2. Core Value Proposition: nCompass exists to solve the "intelligence gap" in modern AI coding tools like Claude Code and Cursor. While standard LLMs excel at generating functional code, they lack the runtime visibility required to diagnose performance regressions or optimize hardware utilization. By reasoning with actual profiling data—including CPU/GPU traces and memory metrics—nCompass enables engineers to automate root-cause analysis of bottlenecks, such as poorly placed synchronization primitives or underutilized GPU kernels. This results in significantly faster optimization cycles, as demonstrated by its ability to generate matrix multiplication kernels that outperform NVIDIA’s hand-tuned CUTLASS libraries.

Main Features

  1. Integrated Profiling and Runtime Tooling: nCompass unifies a fragmented ecosystem of low-level profiling tools into a single interface. It supports native execution of system profilers such as 'perf' for CPU analysis, 'nsys' (NVIDIA Nsight Systems) and the PyTorch profiler for AI workloads, and 'ncu' (NVIDIA Nsight Compute) or 'rocprof' for deep-dive GPU kernel analysis on NVIDIA and AMD hardware, respectively. This integration allows the assistant to capture raw performance data directly from the execution environment.

  2. Advanced Trace Analysis Suite: The platform includes a suite of visualization and comparison tools, including a high-performance Trace Viewer and Flamegraph generator. A standout capability is 'TraceDiff', which allows developers to compare two execution traces side-by-side to identify performance regressions or evaluate the impact of a specific code change. The 'Kernel Analyzer' provides granular stats across the entire execution timeline, identifying specific GPU kernel groups that fail to meet occupancy or throughput targets.

  3. Autonomous nCompass Agent: The core of the product is an expert AI agent that consumes runtime performance data to provide actionable insights. Unlike general-purpose agents, the nCompass agent can read execution timelines and identify complex issues like CPU/GPU synchronization overhead, thread contention in multi-threaded systems, and memory leaks. It provides specific optimization strategies and code modifications that are grounded in actual execution metrics rather than theoretical best practices.

  4. Seamless Coding Agent Augmentation: nCompass is designed to "supercharge" existing coding agents. By feeding the nCompass agent's performance insights back into tools like Claude Code or Cursor, developers create a closed-loop system where the AI not only writes the fix but does so based on an automated understanding of the performance bottleneck, eliminating the manual "profile-analyze-prompt" cycle.
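To make the first feature concrete, the snippet below sketches how the profilers named above are typically invoked from the command line. The flags shown ('perf record -o', 'nsys profile -o', 'ncu -o') are standard usage of those tools; the build_profile_command helper itself is a hypothetical illustration, not part of nCompass.

```python
def build_profile_command(tool: str, app: list[str], out: str) -> list[str]:
    """Return a command line that profiles `app` and writes results to `out`."""
    if tool == "perf":
        # CPU sampling profile; inspect later with `perf report -i <out>`.
        return ["perf", "record", "-o", out, "--"] + app
    if tool == "nsys":
        # System-wide CPU/GPU timeline; produces <out>.nsys-rep.
        return ["nsys", "profile", "-o", out] + app
    if tool == "ncu":
        # Per-kernel deep dive; produces <out>.ncu-rep.
        return ["ncu", "-o", out] + app
    raise ValueError(f"unsupported tool: {tool}")

print(build_profile_command("nsys", ["python", "train.py"], "trace"))
```

An integrated tool can run commands like these behind the scenes, so the developer never has to remember per-profiler flag conventions.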
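The TraceDiff idea from the second feature can be sketched against the Chrome trace-event format, which tools like the PyTorch profiler export. The diff_traces helper and its 1.10x regression threshold are illustrative assumptions, not nCompass's actual implementation.

```python
import json
from collections import defaultdict

def kernel_totals(trace_json: str) -> dict[str, float]:
    """Sum total duration (us) per event name from a Chrome-format trace."""
    totals: dict[str, float] = defaultdict(float)
    for ev in json.loads(trace_json).get("traceEvents", []):
        if ev.get("ph") == "X":  # "complete" events carry a duration
            totals[ev["name"]] += ev.get("dur", 0.0)
    return dict(totals)

def diff_traces(before: str, after: str, threshold: float = 1.10) -> list[tuple[str, float]]:
    """Report events whose total time grew by more than `threshold`x."""
    a, b = kernel_totals(before), kernel_totals(after)
    regressions = []
    for name, t_after in b.items():
        t_before = a.get(name)
        if t_before and t_after / t_before > threshold:
            regressions.append((name, t_after / t_before))
    # Worst regressions first.
    return sorted(regressions, key=lambda r: -r[1])
```

Comparing a trace captured before and after a code change then reduces regression hunting to reading a short, sorted list rather than eyeballing two timelines.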

Problems Solved

  1. The "Root-Cause" Latency Gap: In performance engineering, identifying the cause of a bottleneck often takes four times longer than implementing the fix. nCompass addresses this by automating the data-heavy phase of trace analysis, reducing the time spent on "finding" issues from days to minutes.

  2. Tool Fragmentation and UX Friction: Traditional profiling tools like Nsight Compute or perf are typically standalone applications with steep learning curves and dated interfaces. nCompass eliminates the friction of context-switching by embedding these capabilities directly into the development environment, ensuring that performance data is always linked back to the specific lines of code that generated it.

  3. Target Audience: The product is specifically engineered for Performance Optimization Engineers, Systems Programmers, Machine Learning Engineers (MLEs) focused on model latency, and Software Architects working on high-concurrency or heterogeneous hardware systems (e.g., CUDA, ROCm).

  4. Use Cases:

  • Optimizing Large Language Model (LLM) inference kernels for better GPU throughput.
  • Debugging memory leaks and synchronization bottlenecks in multi-threaded C++/Rust applications.
  • Identifying and resolving CPU-side bottlenecks that lead to GPU starvation in AI training pipelines.
  • Benchmarking and comparing hardware-specific kernel performance against industry standards like NVIDIA CUTLASS.
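The GPU-starvation use case above can be illustrated with a minimal sketch: scan the kernel events on one GPU stream for idle gaps, which usually indicate the CPU is not submitting work fast enough. The event tuples and the 50 microsecond gap threshold are assumptions chosen for illustration.

```python
def find_gpu_gaps(events: list[tuple[float, float]], min_gap_us: float = 50.0) -> list[tuple[float, float]]:
    """Return (gap_start_us, gap_length_us) for idle stretches between kernels.

    `events` are (start_us, duration_us) tuples for kernels on one GPU stream;
    long gaps between the end of one kernel and the start of the next suggest
    CPU-side launch overhead is starving the GPU.
    """
    gaps = []
    events = sorted(events)
    for (s0, d0), (s1, _) in zip(events, events[1:]):
        idle = s1 - (s0 + d0)
        if idle >= min_gap_us:
            gaps.append((s0 + d0, idle))
    return gaps
```

A real analyzer would also correlate each gap with the CPU-side activity occurring at that moment, to name the function responsible for the stall.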

Unique Advantages

  1. Runtime-Aware AI Reasoning: Unlike standard AI coding assistants that rely solely on static code analysis, nCompass possesses "dynamic awareness." It understands how code behaves on specific hardware by analyzing real-time execution traces, making it significantly more accurate at identifying performance-critical bugs.

  2. Measured Performance Gains: In documented test cases, nCompass-augmented sessions produced matrix multiplication kernels that ran 3% faster than NVIDIA's highly optimized CUTLASS kernels, demonstrating that the combination of human oversight and AI-driven profiling can surpass even expert manual optimization.

  3. Hybrid Manual-Automated Control: nCompass offers a non-black-box approach. Developers can choose between a fully automated flow—where the AI profiles, analyzes, and suggests—or a manual flow where the engineer uses nCompass’s advanced TraceDiff and visualization tools to perform their own analysis. This flexibility ensures that the expert remains in control while the AI handles the data-intensive heavy lifting.

Frequently Asked Questions (FAQ)

  1. How does nCompass AI Assistant integrate with Claude Code or Cursor? nCompass acts as a performance-aware companion. It captures and analyzes profiling data (like nsys or torch traces) and converts that raw data into technical insights that can be fed into Claude Code or Cursor. This allows those agents to write code based on actual hardware performance data rather than just syntax and logic.

  2. What hardware and profilers does nCompass support? nCompass supports a wide array of industry-standard profiling tools including perf for Linux systems, nsys and ncu for NVIDIA GPUs, torch profiler for PyTorch-based AI applications, and rocprof for AMD hardware. It is designed to handle both multi-threaded CPU systems and heterogeneous GPU systems.

  3. Can nCompass help with GPU kernel optimization for AI models? Yes, nCompass is specifically designed for this. It includes a Kernel Analyzer and Bottleneck Finder that can identify underutilized GPU resources, memory bandwidth limitations, and inefficient kernel launches, providing specific strategies to optimize CUDA or Triton kernels for maximum performance.
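The hand-off to Claude Code or Cursor described in the first answer amounts to rendering profiler findings as text a coding agent can act on. The summary format below is purely an illustrative assumption; nCompass's actual insight format is not documented here.

```python
def insights_to_prompt(kernel_stats: dict[str, float], total_us: float) -> str:
    """Render per-kernel timings as a prompt fragment for a coding agent."""
    lines = ["Profiling summary (share of GPU time):"]
    for name, dur in sorted(kernel_stats.items(), key=lambda kv: -kv[1]):
        lines.append(f"- {name}: {dur / total_us:.1%}")
    lines.append("Suggest optimizations for the dominant kernels.")
    return "\n".join(lines)
```

Grounding the agent's prompt in measured percentages, rather than a developer's guess about what is slow, is what closes the "profile-analyze-prompt" loop.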
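The underutilized-GPU-resource analysis mentioned in the last answer can be illustrated with a simplified occupancy estimate. The per-SM limits below (64 resident warps, 65,536 registers) match recent NVIDIA architectures but should be treated as assumptions; real tools like ncu account for shared memory, allocation granularity, and more.

```python
def occupancy(threads_per_block: int, regs_per_thread: int,
              max_warps: int = 64, regs_per_sm: int = 65536) -> float:
    """Fraction of resident warps achieved per SM (register-limited, simplified)."""
    warps_per_block = -(-threads_per_block // 32)      # ceil(threads / warp size)
    regs_per_block = threads_per_block * regs_per_thread
    blocks_by_regs = regs_per_sm // regs_per_block     # register-file limit
    blocks_by_warps = max_warps // warps_per_block     # warp-slot limit
    resident_warps = min(blocks_by_regs, blocks_by_warps) * warps_per_block
    return resident_warps / max_warps
```

For example, under these assumed limits a 256-thread block at 32 registers per thread reaches full occupancy, while doubling register usage halves it, which is exactly the kind of trade-off a kernel analyzer surfaces.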
