Data Tools
Explore the best new Data tools and products curated by the community.
Marmot is an open-source data catalog designed for teams who want powerful data discovery without enterprise complexity. Catalog every data asset, enrich it with the context that matters and make it accessible to your team and your AI tools.
Panorama analyzes your workplace data to recommend AI workflows your team can run together. Instead of building automations from scratch, discover what to automate and execute collaboratively in one place.
DataSieve helps you turn unstructured text into clean, usable data in seconds. Drop in text, files, folders, or even archives, and extract what you need in one pass. Emails, phone numbers, URLs, dates, financial data, and more. Everything runs locally on your device, with no cloud and no tracking.
What you can do:
- Extract multiple data types at once
- Process text, PDFs, EPUBs, CSV, JSON, Word files, and more
- Export results to JSON, XLSX, DOCX, and more
- Define your own custom extractors
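The idea of pulling several data types out of text in one pass can be pictured with a small regular-expression sketch. This is not DataSieve's implementation: the `EXTRACTORS` table, the `extract` function, and the deliberately simple patterns are all assumptions made for illustration.

```python
import re

# Hypothetical, deliberately simple patterns (assumptions for illustration);
# production-grade extractors need far more careful rules.
EXTRACTORS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "url": re.compile(r"https?://[^\s<>\"']+"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}

def extract(text, extractors=EXTRACTORS):
    """Run every extractor over the text and group the matches by type."""
    return {name: pattern.findall(text) for name, pattern in extractors.items()}

sample = "Contact ana@example.com by 2024-05-01, details at https://example.com/docs"
results = extract(sample)
```

In this sketch, a custom extractor would simply be one more named pattern added to the dictionary.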
Context.dev (previously Brand.dev) gives your AI agents and apps real-time access to structured web data, no brittle scraping infrastructure needed. Scrape any URL as clean markdown or HTML, extract brand data (logos, colors, fonts, socials) from any domain, crawl sitemaps, resolve transaction descriptors, and more. Typed SDKs for TypeScript, Python, and Ruby. Trusted by 5,000+ businesses including Mintlify, Daily.dev, Ferndesk.com, and more. Most teams integrate in under 10 minutes.
Most AI agents and complex automations fail because they're operating in the dark. Boost.space provides the persistent context layer that turns siloed LLMs into an integrated business intelligence system. Give your automations and agents a "Shared Brain" so every workflow has the full context of your business, from past interactions to live database states, allowing workflows to compound instead of breaking.
PredictLeads Technographics Dataset provides structured data on what technologies companies use, sourced from company websites, job descriptions, DNS records, cookies, and more. Each detection includes first/last seen timestamps and the signals used, so you can track adoption curves, technology migrations, and competitive shifts over time. Available via API, flat files, and webhooks, with an MCP server for AI agents.
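As a rough sketch of how first/last seen timestamps can be turned into an adoption curve, the snippet below counts how many companies had a technology detected on a given day. The record shape, field names, and sample values are hypothetical and do not reflect the actual PredictLeads schema.

```python
from datetime import date

# Hypothetical detection records; field names and values are assumptions,
# not the PredictLeads schema.
detections = [
    {"company": "a.example", "technology": "Postgres",
     "first_seen": date(2022, 1, 10), "last_seen": date(2024, 6, 1)},
    {"company": "b.example", "technology": "Postgres",
     "first_seen": date(2023, 3, 5), "last_seen": date(2024, 6, 1)},
    {"company": "c.example", "technology": "Postgres",
     "first_seen": date(2021, 7, 1), "last_seen": date(2022, 12, 31)},
]

def active_count(records, technology, on):
    """Count records whose first/last seen window for the technology covers the given day."""
    return sum(
        1
        for r in records
        if r["technology"] == technology and r["first_seen"] <= on <= r["last_seen"]
    )

# Year-end snapshots form a simple adoption curve.
curve = {
    year: active_count(detections, "Postgres", date(year, 12, 31))
    for year in (2021, 2022, 2023)
}
```

A company that drops out of the count while appearing under a different technology is one way a migration might show up in data shaped like this.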
Fundable is a startup, investor, and people dataset (like Crunchbase) with a few improvements: it surfaces new deals before other platforms, provides sources for every datapoint, supports natural-language deal alerts ("coding agent startups in SF looking to hire"), and offers a much better UI at a lower price. The first month is free. Hit us up if you want access to our API, Datafeed, or MCP!
Stacksync powers real-time, bidirectional data synchronization between CRMs (e.g. Salesforce, HubSpot, or SAP) and databases (e.g. Postgres, Google BigQuery, and others). Edits made in your CRM instantly update in your database, and vice versa. To set up a sync, users connect the two chosen apps in one click and select the tables they want to sync, with no code required. Stacksync reduces implementation delays for CRM integration projects from months to minutes.
Firecrawl /agent is a magic API that searches, navigates, and gathers data from even the most complex websites. Describe what data you want and the agent handles the rest. Find information in hard-to-reach places and return single datapoints or entire datasets at scale.
Turn your blood work into actionable insights.
Control real web browsers with a simple API
Querri transforms how teams work with data, making it easy to connect, clean, analyze, and visualize - all in one place. With new integrations and interactive, drag-and-drop dashboards, anyone can now build automated workflows and shareable insights - no technical expertise required.
Floqer lets RevOps and growth teams automate GTM data in seconds. Build multi-step workflows that detect relevant signals, enrich from 80+ sources, and trigger personalized outreach – all without a single line of code.
Updog by Datadog lets you spot issues early, backed by real impact observed across the Datadog customer base. You do not have to wait for status page updates.
DeepSeek-OCR is a model that compresses long text by treating it as an image. This optical compression uses far fewer vision tokens to represent documents, unlocking new levels of efficiency for long-context tasks while delivering powerful OCR capabilities.
Effortlessly export, download, and sync all your Genspark conversations to Markdown files in bulk with just one click! Perfect for Obsidian, Notion, Logseq, and other knowledge management tools.
Tinker is a flexible API for efficiently fine-tuning open-source models with LoRA. It's designed for researchers and developers who want flexibility and full control of their data and algorithms without worrying about infrastructure management.
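For readers unfamiliar with LoRA, the arithmetic below sketches why it is cheap to train: instead of updating a full weight matrix of shape d_out x d_in, it trains two low-rank factors of shapes d_out x rank and rank x d_in. The function names and layer sizes are illustrative assumptions and say nothing about Tinker's own API.

```python
def full_params(d_out, d_in):
    """Parameters in one dense weight matrix of shape d_out x d_in."""
    return d_out * d_in

def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the two low-rank factors: B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)

# Illustrative sizes only (assumptions, not taken from any specific model).
d_out, d_in, rank = 4096, 4096, 8

full = full_params(d_out, d_in)
lora = lora_trainable_params(d_out, d_in, rank)
ratio = lora / full  # fraction of the layer's parameters that LoRA trains
```

With these sizes, the low-rank factors hold 1/256 of the parameters of the full matrix.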
Your data, your choice. Process locally for complete privacy or leverage cloud when you need to collaborate.
Answer questions, create visualizations, and surface insights on all your company's data. Unlock the power of a data analyst with the ease of ChatGPT.
Atla is the only eval tool that helps you automatically discover the underlying issues in your AI agents. Understand step-level errors, prioritize recurring failure patterns, and fix issues fast, before your users ever notice.
A free, all-in-one PDF toolkit. Extract tables and images, edit and reorganize pages (delete, rotate, reorder) and merge PDF files. Secure, private and works entirely in your browser. No uploads required.