Product Introduction
- Definition: "What YC Is Really Betting On?" is a data-driven analytical report and interactive dashboard created by Krishna Goyal. It belongs to the category of startup investment intelligence platforms and combines web scraping, natural language processing (NLP), and statistical analysis.
- Core Value Proposition: This product exists to decode Y Combinator’s investment strategy by analyzing 793 startups and 1,625 founder bios across 5 batches (Winter 2025–Winter 2026). It provides empirical, data-backed insights into YC’s AI startup preferences, founder archetypes, competitive overlap, and sector trends, replacing speculation with quantifiable patterns.
Main Features
- AI Depth Classification Engine:
  - How it works: Uses NLP keyword extraction, founder credential analysis (e.g., PhD prevalence), and description clustering to sort AI companies into four tiers: Wrappers (15.2%), Applied AI (51.6%), AI Infrastructure (11.4%), and Deep Tech (21.8%).
  - Technology: TF-IDF vectorization and signal scoring on terms such as "proprietary model," "LLM wrapper," or "robotics," cross-referenced with founder backgrounds.
- Competitive Overlap Analyzer:
  - How it works: Calculates cosine similarity between startup descriptions and tags to identify "near-competitors." Computes a crowding score per vertical (e.g., Robotics: 5.5 near-competitors on average) and flags same-batch competitor pairs that YC funded (11 instances).
  - Technology: Scikit-learn for similarity metrics, Algolia API data ingestion, automated clustering.
- Partner Fingerprint Profiling:
  - How it works: Tracks the investment patterns of 12 YC partners (e.g., Diana Hu: +10.8pp in DevTools; Garry Tan: -18.7pp in B2B). Computes over- and under-indexing, in percentage points (pp), against batch averages for verticals, founder traits (ex-FAANG, PhD), and team dynamics.
  - Technology: Delta calculation against base rates, portfolio tagging, statistical significance testing (p<0.05).
- Founder DNA Correlation Matrix:
  - How it works: Identifies statistically significant links between founder backgrounds (e.g., ex-finance, Ivy League, PhD) and startup verticals (e.g., PhD founders are 1.9x more likely to be in Healthcare).
  - Technology: Cross-tabulation analysis, observed-vs.-expected ratio calculation, background extraction via bio scraping.
- NLP Theme Clustering:
  - How it works: Applies K-Means clustering on TF-IDF vectors of company descriptions to surface 15 thematic groups (e.g., "Agents Cluster," "Healthcare AI Cluster").
  - Technology: Scikit-learn K-Means, dimensionality reduction (PCA) for 2D visualization.
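The signal scoring behind the AI Depth Classification Engine can be illustrated with a minimal keyword-weighting sketch. Only the example terms "proprietary model," "LLM wrapper," and "robotics" come from the report; every other term, every weight, the PhD bonus, and the fallback to Applied AI are hypothetical, and the sketch leaves out the TF-IDF and clustering steps entirely.

```python
from dataclasses import dataclass

# Hypothetical signal lexicon. "llm wrapper", "proprietary model" and "robotics"
# are the example terms named in the report; all other terms and all weights
# are illustrative assumptions.
SIGNALS = {
    "Wrapper": {"llm wrapper": 2.0, "gpt-powered": 1.0, "chatgpt for": 1.0},
    "Applied AI": {"ai-powered": 1.0, "automates": 1.0, "copilot": 1.0},
    "AI Infrastructure": {"inference": 1.5, "gpu": 1.5, "vector database": 1.5},
    "Deep Tech": {"proprietary model": 2.0, "robotics": 2.0, "foundation model": 2.0},
}


@dataclass
class Company:
    description: str
    founder_has_phd: bool = False  # stand-in for the founder-credential signal


def classify(company: Company) -> str:
    """Return the tier whose keyword signals score highest (hypothetical scheme)."""
    text = company.description.lower()
    # Sum the weights of every signal term that appears as a substring.
    scores = {
        tier: sum(weight for term, weight in terms.items() if term in text)
        for tier, terms in SIGNALS.items()
    }
    # Assumed credential adjustment: a PhD founder nudges the score toward Deep Tech.
    if company.founder_has_phd:
        scores["Deep Tech"] += 1.0
    best = max(scores, key=scores.get)
    # With no signal at all, default to Applied AI, the largest tier in the report.
    return best if scores[best] > 0 else "Applied AI"


print(classify(Company("An LLM wrapper that drafts sales emails")))
print(classify(Company("Robotics arms driven by a proprietary model", founder_has_phd=True)))
```

Substring matching keeps the sketch short; a real scorer would more likely weight TF-IDF features than raw string hits.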
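The Competitive Overlap Analyzer's cosine-similarity step and crowding score can be sketched with scikit-learn. This assumes each company is a dict with hypothetical `name`, `vertical`, `batch` and `text` keys (`text` being description plus tags), a hypothetical similarity cutoff of 0.3 for "near-competitor," and a crowding score defined as the mean number of near-competitors per company in a vertical; the report states neither its threshold nor its exact formula.

```python
from collections import defaultdict
from itertools import combinations

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def overlap_report(companies, threshold=0.3):
    """Return (crowding score per vertical, same-batch near-competitor pairs).

    companies: list of dicts with hypothetical keys 'name', 'vertical', 'batch'
    and 'text' (description plus tags). threshold is an assumed cutoff.
    """
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(
        [c["text"] for c in companies]
    )
    sim = cosine_similarity(tfidf)  # dense n x n similarity matrix
    np.fill_diagonal(sim, 0.0)  # a company is not its own competitor
    near = sim >= threshold

    # Crowding score: mean count of near-competitors per company in each vertical.
    counts = defaultdict(list)
    for i, company in enumerate(companies):
        counts[company["vertical"]].append(int(near[i].sum()))
    crowding = {vertical: sum(n) / len(n) for vertical, n in counts.items()}

    # Near-competitor pairs that were funded in the same batch.
    same_batch = [
        (companies[i]["name"], companies[j]["name"])
        for i, j in combinations(range(len(companies)), 2)
        if near[i, j] and companies[i]["batch"] == companies[j]["batch"]
    ]
    return crowding, same_batch


# Invented toy data, not drawn from the report.
demo = [
    {"name": "A", "vertical": "Robotics", "batch": "W25", "text": "robot arm for warehouse picking"},
    {"name": "B", "vertical": "Robotics", "batch": "W25", "text": "robot arm for warehouse packing"},
    {"name": "C", "vertical": "Education", "batch": "W26", "text": "tutoring app for high school math"},
]
print(overlap_report(demo))
```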
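For Partner Fingerprint Profiling, an over- or under-indexing delta in percentage points and a p-value can be computed as below. The report cites p<0.05 but not which test it runs; this sketch assumes a one-sample z-test of the partner's share against the batch base rate (normal approximation), and the counts in the example are invented.

```python
import math


def partner_delta(partner_hits, partner_total, batch_hits, batch_total):
    """Return (delta in percentage points, two-sided p-value).

    Assumed method: one-sample z-test of the partner's share of a trait
    (e.g. DevTools companies) against the batch-wide base rate.
    """
    p_partner = partner_hits / partner_total
    p_base = batch_hits / batch_total
    if not 0 < p_base < 1:
        raise ValueError("base rate must be strictly between 0 and 1")
    delta_pp = 100 * (p_partner - p_base)
    # Standard error of a share over partner_total companies under the base rate.
    se = math.sqrt(p_base * (1 - p_base) / partner_total)
    z = (p_partner - p_base) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return delta_pp, p_value


# Invented counts: 30 of a partner's 100 companies vs. 200 of 1,000 batch-wide.
delta, p = partner_delta(30, 100, 200, 1000)
print(f"{delta:+.1f}pp, p={p:.3f}")
```

The normal approximation is a simplification; for partners with small portfolios an exact binomial test (for example `scipy.stats.binomtest`) would be the safer choice.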
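The Founder DNA Correlation Matrix's observed-vs.-expected ratio is a lift over independence: for background b and vertical v, the expected count is (founders with b) x (founders in v) / N, and a ratio of 1.9 means the pair occurs 1.9 times as often as independence predicts. The sketch below computes only that ratio, not its significance, and assumes one (background, vertical) pair per founder, which simplifies real bios where a founder can be both a PhD and ex-FAANG; the sample counts are invented.

```python
from collections import Counter


def lift_table(pairs):
    """Return {(background, vertical): observed / expected}.

    pairs: one (background, vertical) tuple per founder (simplifying assumption).
    Expected counts assume background and vertical are independent.
    """
    pairs = list(pairs)
    n = len(pairs)
    observed = Counter(pairs)
    by_background = Counter(b for b, _ in pairs)
    by_vertical = Counter(v for _, v in pairs)
    return {
        (b, v): observed[(b, v)] / (by_background[b] * by_vertical[v] / n)
        for b in by_background
        for v in by_vertical
    }


# Invented sample: 2 PhD founders (both Healthcare) among 10 founders.
founders = (
    [("PhD", "Healthcare")] * 2
    + [("Other", "Healthcare")] * 2
    + [("Other", "Fintech")] * 6
)
lifts = lift_table(founders)
print(lifts[("PhD", "Healthcare")])  # observed 2 vs. expected 2 * 4 / 10 = 0.8
```

A chi-square test on the same contingency table (for example `scipy.stats.chi2_contingency`) could supply the significance check the report describes.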
Problems Solved
- Pain Point: Ambiguity in understanding Y Combinator’s actual investment theses beyond public statements.
- Target Audience:
  - Early-Stage Founders: Optimizing YC applications by aligning with partner preferences (e.g., AI infra founders targeting Diana Hu).
  - VC Analysts: Validating market trends (e.g., decline of AI wrappers, surge in deep tech).
  - Tech Researchers: Studying AI startup evolution and founder-vertical fit.
  - Incubator Managers: Benchmarking against YC’s batch composition.
- Use Cases:
  - A founder pivoting from an AI wrapper to applied AI after seeing YC’s 29.3% deep-tech focus in W26.
  - A VC avoiding oversaturated dev tools investments after noting the category's 4.7 crowding score.
  - A PhD researcher targeting healthcare AI after identifying the 1.9x PhD-Healthcare founder correlation.
Unique Advantages
- Differentiation: Unlike generic startup databases (Crunchbase, PitchBook), this product offers granular, batch-specific pattern detection (e.g., "SF startups hire 19% less") and partner-level behavioral insights that aggregated platforms lack.
- Key Innovation: Proprietary "Wrapper vs. Deep Tech" signal scoring combining technical keywords, founder credentials, and monetization models, plus competitive overlap heatmaps revealing YC’s deliberate co-funding of rivals.
Frequently Asked Questions (FAQ)
- How accurate is the "What YC Is Really Betting On?" dataset?
Data is sourced from YC’s public Algolia API and founder bios, covering 793 companies and 1,625 founders across 5 batches. AI/vertical classifications use automated NLP with manual spot-checks for validity.
- Which YC partner invests most in AI infrastructure startups?
Diana Hu over-indexes in AI Infrastructure (+9.5pp) and DevTools (+10.8pp), with 41.9% of her founders being ex-FAANG.
- Are most YC AI companies just thin wrappers?
No. Only 15.2% are classified as wrappers, while 51.6% are Applied AI and 21.8% are Deep Tech. Wrappers declined from 17.1% to 12.9% across batches.
- What industries are least crowded for YC startups?
Education has the lowest crowding score (1.6 near-competitors). Non-AI opportunities persist in regulated sectors like Aerospace/Defense.
- Do solo founders get funded by YC?
Yes (114 solo founders), but duo teams have higher hiring rates (32% vs. solo’s 27.2%). Solo founders have lower FAANG (15.8%) and PhD (2.6%) representation.