Developer guide for the V4 Preview era
The DeepSeek V4 API Preview introduces V4-Pro and V4-Flash for teams that need long-context reasoning, lower-cost inference, agent workflows, and an OpenAI-compatible migration path without rewriting their whole stack.
Overview
DeepSeek V4 API Preview gives developers two new model identifiers to plan around: deepseek-v4-pro for stronger reasoning and coding workloads, and deepseek-v4-flash for fast, efficient production paths. The shift matters because the older aliases are compatibility names rather than precise pointers to new builds, so explicit IDs are the safer way to pin behavior.
The practical attraction is compatibility. Teams already using chat-completions style SDKs can keep the same message structure, set the DeepSeek base URL, change the API key, and select the right V4 model. The harder work is operational: evaluating latency, context cost, prompt behavior, fallbacks, and output quality before moving real traffic.
Use deepseek-v4-pro when the task rewards stronger reasoning, agent planning, code review, tool orchestration, document analysis, or high-value answers where extra latency is acceptable.
Use deepseek-v4-flash when the product needs faster response time, broad throughput, lower unit cost, routine chat, classification, extraction, support automation, or draft generation.
Treat deepseek-chat and deepseek-reasoner as compatibility names, not future-facing brand names. New integrations should prefer explicit V4 model IDs where available, as in the routing sketch below.
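A minimal sketch of what explicit selection can look like, assuming a hypothetical TaskKind label assigned upstream; only the two V4 model IDs come from the preview, everything else is illustrative.

// Hypothetical task label; how requests get classified is up to the caller.
type TaskKind = "reasoning" | "code-review" | "chat" | "extraction";

// Keep the route explicit and in one place so cost and quality
// can be measured per model.
function pickModel(task: TaskKind): "deepseek-v4-pro" | "deepseek-v4-flash" {
  switch (task) {
    case "reasoning":
    case "code-review":
      return "deepseek-v4-pro"; // high-value answers, extra latency acceptable
    default:
      return "deepseek-v4-flash"; // fast, lower unit cost, high volume
  }
}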
Integration
DeepSeek keeps the integration shape familiar: a base URL, bearer token, model name, and chat-style messages. That makes a proof of concept quick, but production adoption still needs deterministic test prompts, cost sampling, rate-limit handling, streaming behavior checks, and rollback logic.
import OpenAI from "openai";

// The OpenAI SDK works against DeepSeek's compatible endpoint:
// point baseURL at api.deepseek.com and supply a DeepSeek key.
const client = new OpenAI({
  apiKey: process.env.DEEPSEEK_API_KEY,
  baseURL: "https://api.deepseek.com"
});

// Same chat-completions shape as before; only the model ID changes.
const response = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [
    { role: "system", content: "Answer with concise technical detail." },
    { role: "user", content: "Summarize this API migration plan." }
  ],
  stream: false
});

console.log(response.choices[0].message.content);
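Streaming deserves a check of its own before launch. A minimal sketch, reusing the client above; the chunk shape follows the OpenAI-compatible streaming format, and the error handling is deliberately simple.

// With stream: true, the SDK returns an async iterable of chunks.
const stream = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [{ role: "user", content: "Summarize this API migration plan." }],
  stream: true
});

let text = "";
try {
  for await (const chunk of stream) {
    // OpenAI-compatible streams carry incremental deltas per chunk.
    text += chunk.choices[0]?.delta?.content ?? "";
  }
} catch (err) {
  // A dropped connection leaves partial text; decide whether to retry
  // or show a user-visible fallback message.
  console.error("stream interrupted:", err);
}
console.log(text);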
Readiness checklist
Route expensive reasoning tasks to V4-Pro and default high-volume tasks to V4-Flash. Keep the route explicit so cost and quality can be measured.
Do not fill the 1M-token window by default. Chunk, rank, and compress context so the model receives the best evidence rather than the largest dump; a selection sketch follows this checklist.
Test stream parsing, timeout windows, partial failures, idempotent retries, and user-visible fallback messages before the feature is released.
Keep API keys in server-side secrets, redact logs, filter tool inputs, and add prompt-injection tests for any workflow that reads untrusted content.
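A minimal sketch of that selection step. The scoring function is naive keyword overlap and the token estimate is rough; both are placeholders for a real retrieval layer, not DeepSeek APIs.

// Naive relevance score: keyword overlap between query and chunk.
// Swap in your retrieval layer's real ranking for production use.
function scoreRelevance(query: string, chunk: string): number {
  const terms = new Set(query.toLowerCase().split(/\s+/));
  return chunk.toLowerCase().split(/\s+/).filter((w) => terms.has(w)).length;
}

// Rough token estimate; fine for budgeting, not exact tokenization.
const approxTokens = (s: string): number => Math.ceil(s.length / 4);

// Rank chunks, then pack the best evidence under a token budget
// instead of dumping everything into the window.
function selectContext(query: string, chunks: string[], budget = 8000): string[] {
  const ranked = [...chunks].sort(
    (a, b) => scoreRelevance(query, b) - scoreRelevance(query, a)
  );
  const picked: string[] = [];
  let used = 0;
  for (const chunk of ranked) {
    const cost = approxTokens(chunk);
    if (used + cost <= budget) {
      picked.push(chunk);
      used += cost;
    }
  }
  return picked;
}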
Build path
Capture representative prompts, expected outputs, latency, token usage, and failure cases from the current model before switching anything.
Measure quality and cost by workload. Do not assume the larger model is the best default for every route.
Use a small traffic slice, compare outputs, track user corrections, and keep a fallback model available until the integration is stable; a rollout sketch follows.
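A minimal sketch of that rollout shape, reusing the client above. CANARY_PERCENT is a hypothetical environment variable, and falling back to deepseek-chat on any error is deliberately coarse; tune both to your traffic.

// Hypothetical rollout knob: what share of traffic tries V4.
const CANARY_PERCENT = Number(process.env.CANARY_PERCENT ?? "5");

type Msg = { role: "system" | "user" | "assistant"; content: string };

async function complete(messages: Msg[]): Promise<string> {
  const canary = Math.random() * 100 < CANARY_PERCENT;
  const model = canary ? "deepseek-v4-flash" : "deepseek-chat";
  try {
    const res = await client.chat.completions.create({ model, messages });
    return res.choices[0].message.content ?? "";
  } catch (err) {
    if (!canary) throw err; // the stable path failed; surface it
    // Canary failed: log for comparison, retry once on the stable model.
    console.error("v4 canary failed, falling back:", err);
    const res = await client.chat.completions.create({
      model: "deepseek-chat",
      messages
    });
    return res.choices[0].message.content ?? "";
  }
}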
Quick answers
Does the /v1 in the base URL mean the V1 model generation? No. In OpenAI-compatible API paths, /v1 is an API compatibility path, not the model generation number. Choose the model by setting the model ID.
Which model should a new integration start with? Start with V4-Flash for common production tasks, then add V4-Pro only where evaluation shows meaningful quality gains.
Does a 1M-token window remove the need for retrieval? No. Large context helps, but retrieval, ranking, deduplication, and context compression still improve cost, speed, and answer focus.