
Prompt & Writing Assistance APIs

On-device AI for web apps, built into Microsoft Edge

2025-05-20

Product Introduction

  1. The Prompt & Writing Assistance APIs are experimental web APIs integrated into Microsoft Edge, enabling developers to access a built-in local small language model (Phi-4-mini) for AI-driven text generation and modification. These APIs simplify the integration of AI capabilities into web applications without requiring cloud-based services or specialized machine learning expertise. They operate locally within the browser, ensuring data privacy and reducing operational costs.
  2. The core value lies in providing a privacy-focused, cost-effective alternative to cloud-based AI models by leveraging on-device processing. Developers can implement AI features like text summarization, rewriting, and structured output generation with minimal code while avoiding per-token costs and network dependencies.

Main Features

  1. The Prompt API allows direct interaction with the Phi-4-mini model using simple JavaScript calls, enabling tasks like sentiment analysis, classification, and text generation. For example, developers can prompt the model to score user feedback on a 0-5 scale or generate JSON-structured outputs for programmatic use.
  2. Writing Assistance APIs include specialized interfaces for summarization, rewriting, and content creation: the Summarizer API condenses text with context-aware constraints (e.g., "tl;dr" style), the Rewriter API refines language while preserving intent, and the Writer API generates text in specified tones (e.g., formal inquiries).
  3. Structured Output Constraints let developers define JSON schemas to enforce consistent model responses, reducing variability across executions. This ensures predictable outputs for programmatic workflows, such as extracting numerical ratings from free-text feedback.
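The features above can be sketched in a few lines of JavaScript. This is a hedged sketch, not a definitive implementation: the `LanguageModel` global, the `prompt()` call, and the `responseConstraint` option follow the experimental built-in AI proposal these APIs are based on, and exact names may differ between Edge releases.

```javascript
// Sketch: scoring user feedback 0-5 with the Prompt API and a JSON
// schema constraint. The LanguageModel global and the responseConstraint
// option are assumptions based on the experimental proposal and may
// differ between Edge versions.

// JSON schema forcing the model to return an integer rating from 0 to 5.
const ratingSchema = {
  type: "object",
  required: ["rating"],
  additionalProperties: false,
  properties: {
    rating: { type: "integer", minimum: 0, maximum: 5 },
  },
};

// Local sanity check that a parsed response matches the schema, so
// downstream code can trust the rating field.
function isValidRating(obj) {
  return (
    obj !== null &&
    typeof obj === "object" &&
    Number.isInteger(obj.rating) &&
    obj.rating >= 0 &&
    obj.rating <= 5
  );
}

// Browser-only: create a session and prompt the on-device model.
async function scoreFeedback(feedbackText) {
  if (typeof LanguageModel === "undefined") {
    throw new Error("Prompt API not available in this browser");
  }
  const session = await LanguageModel.create();
  const raw = await session.prompt(
    `Rate the sentiment of this feedback from 0 (negative) to 5 (positive): ${feedbackText}`,
    { responseConstraint: ratingSchema }
  );
  const parsed = JSON.parse(raw);
  if (!isValidRating(parsed)) throw new Error("Unexpected model output");
  return parsed.rating;
}
```

Even with a schema constraint, validating the parsed response locally (as `isValidRating` does) is a cheap safeguard before writing the rating to a database.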

Problems Solved

  1. The APIs eliminate the need for developers to host or manage AI models themselves, reducing the complexity and cost of cloud-based solutions. Traditional approaches either demand WebNN/WebGPU expertise to run models in the browser or incur ongoing expenses for model hosting and per-token API usage.
  2. Target users include web developers seeking to add AI features to applications without compromising user privacy or incurring high operational costs. Extensions and websites requiring real-time text processing (e.g., feedback analysis, content moderation) benefit directly.
  3. Typical use cases include sentiment scoring of customer reviews, summarizing news articles, rewriting user-generated content to remove profanity, and generating formal documents like bank inquiries. For instance, an e-commerce site could automate review moderation using the Rewriter API.
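The review-moderation use case above might look like the following sketch: rewrite a user review in a more neutral tone, then summarize it for a dashboard. The `Rewriter` and `Summarizer` globals and the option values shown (`"tl;dr"`, `"short"`, `"more-formal"`) follow the experimental Writing Assistance proposal and are assumptions that may change between Edge releases.

```javascript
// Sketch: moderating a user review with the Rewriter API, then
// condensing it with the Summarizer API. The globals and option names
// are assumptions based on the experimental proposal.

// Illustrative option objects for each stage.
const rewriterOptions = { tone: "more-formal", format: "plain-text" };
const summarizerOptions = { type: "tl;dr", format: "plain-text", length: "short" };

// Small local helper: collapse whitespace so the model sees clean input.
function normalizeReview(text) {
  return text.replace(/\s+/g, " ").trim();
}

// Browser-only pipeline: rewrite, then summarize.
async function moderateReview(review) {
  if (typeof Rewriter === "undefined" || typeof Summarizer === "undefined") {
    throw new Error("Writing Assistance APIs not available in this browser");
  }
  const rewriter = await Rewriter.create(rewriterOptions);
  const cleaned = await rewriter.rewrite(normalizeReview(review));

  const summarizer = await Summarizer.create(summarizerOptions);
  return summarizer.summarize(cleaned);
}
```

Because both stages run on-device, the raw review text never leaves the user's machine, which is what makes this pattern viable for moderation of private content.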

Unique Advantages

  1. Unlike cloud-based APIs, the built-in Phi-4-mini model processes data locally, so no user data leaves the device. This contrasts with services like OpenAI's GPT models or Google's Gemini, which require network calls and raise privacy concerns.
  2. The structured output feature is innovative, allowing developers to enforce JSON schema constraints on model responses, ensuring compatibility with backend systems. This reduces post-processing effort and improves reliability.
  3. Competitive advantages include zero per-token costs, automatic model updates handled by Edge, and shared model caching across domains to reduce redundant downloads. Edge optimizes the model for specific hardware, ensuring consistent performance across devices.

Frequently Asked Questions (FAQ)

  1. What hardware is required to use these APIs? The Phi-4-mini model runs on devices meeting Edge’s hardware requirements, with automatic download and caching upon first use. Edge handles optimizations for compatible CPUs/GPUs, ensuring broad accessibility.
  2. How are model updates managed? Microsoft Edge automatically updates the Phi-4-mini model in the background, ensuring developers always use the latest version without manual intervention. Updates do not disrupt existing API integrations.
  3. Can these APIs replace cloud-based LLMs like GPT-4? They are designed for specific on-device tasks where privacy and cost matter, but may lack the breadth of cloud models. Use cases like real-time feedback processing or lightweight text generation are ideal.
  4. How do structured outputs work? Developers define a JSON schema (e.g., rating ranges) passed to the Prompt API, forcing the model to return validated JSON. This simplifies integration with databases or analytics tools.
  5. Is user data sent to Microsoft servers? No—the Phi-4-mini model runs entirely locally, ensuring all processing occurs on the user’s device. This aligns with strict data privacy regulations and reduces latency.
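The first-use flow described in the FAQ (check whether the model is present, trigger the one-time download, surface progress) can be sketched as below. The `availability()` method, its status strings, and the `monitor`/`downloadprogress` surface follow the experimental built-in AI proposal and are assumptions that may differ in current Edge builds.

```javascript
// Sketch: checking model availability and reporting download progress
// before creating a Prompt API session. Names follow the experimental
// proposal and may differ between Edge releases.

// Map an availability status to a short user-facing message.
function availabilityMessage(status) {
  switch (status) {
    case "available":
      return "Model ready";
    case "downloadable":
    case "downloading":
      return "Model will be downloaded on first use";
    default:
      return "On-device AI is not supported on this device";
  }
}

// Browser-only: create a session, reporting download progress via a
// callback so the UI can show a progress bar during the one-time fetch.
async function createSessionWithProgress(onProgress) {
  if (typeof LanguageModel === "undefined") {
    throw new Error("Prompt API not available in this browser");
  }
  const status = await LanguageModel.availability();
  console.log(availabilityMessage(status));
  if (status === "unavailable") return null;

  return LanguageModel.create({
    monitor(m) {
      // Fired while the browser fetches and caches the model.
      m.addEventListener("downloadprogress", (e) => onProgress(e.loaded));
    },
  });
}
```

Since the model is cached and shared across domains, the download branch typically runs only once per device; subsequent sessions resolve immediately.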
