Traditional code review tools — ESLint, SonarQube, CodeClimate — are excellent at catching known patterns: unused variables, style violations, common security anti-patterns. What they can't do is reason about your specific architecture, trace logic flow across 30 files, or understand that this database query is dangerous in the context of how the caller is built.
DeepSeek V4's 1 million-token context makes a different kind of analysis possible: load the entire codebase, ask a question, get a structured answer that understands the whole system. This tutorial shows you exactly how to build it.
Before writing code, it helps to understand what the model is actually doing differently from a linter:
- A query in `utils/db.py` that's only exploitable because of how `api/users.py` calls it
- A `PaymentProcessor.refund()` method that has no tests, but is called from 4 different places

These are things a human senior engineer would catch in a thorough review. With DeepSeek V4, you can run this analysis in 90 seconds.
The API takes text, not files. You need to serialize your codebase into a flat text representation. The simplest approach is to zip your source directory and extract relevant files:
```python
import zipfile
import io

TEXT_EXTENSIONS = {
    ".py", ".js", ".ts", ".tsx", ".jsx", ".go", ".rs",
    ".java", ".c", ".h", ".cpp", ".hpp", ".cs", ".rb",
    ".php", ".swift", ".kt", ".sh", ".yml", ".yaml",
    ".json", ".toml", ".md", ".sql", ".html", ".css",
}

def extract_repo(zip_bytes: bytes, max_file_bytes: int = 500_000) -> str:
    parts = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as z:
        for info in z.infolist():
            if info.is_dir() or info.file_size > max_file_bytes:
                continue
            ext = ("." + info.filename.rsplit(".", 1)[-1].lower()) if "." in info.filename else ""
            if ext not in TEXT_EXTENSIONS:
                continue
            # Skip common non-source directories
            if any(d in info.filename for d in ["node_modules/", ".venv/", "__pycache__/", ".git/"]):
                continue
            try:
                text = z.read(info).decode("utf-8", errors="ignore")
                parts.append(f"=== {info.filename} ===\n{text}")
            except Exception:
                pass
    return "\n\n".join(parts)
```
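If your repository isn't already a zip archive, you can build one in memory before calling `extract_repo`. A minimal sketch; the `zip_directory` helper and the `./my-project` path are illustrative, not part of the tutorial's API:

```python
import io
import pathlib
import zipfile

def zip_directory(root: str) -> bytes:
    """Zip a source tree in memory so it can be passed to extract_repo()."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        for path in pathlib.Path(root).rglob("*"):
            if path.is_file():
                z.write(path, arcname=path.relative_to(root))
    return buf.getvalue()

zip_bytes = zip_directory("./my-project")
repo_text = extract_repo(zip_bytes)
```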
The system prompt is what separates a generic analysis from a useful one. For code review, you want structured output with actionable sections:
```python
CODE_REVIEW_PROMPT = """You are a senior staff engineer conducting a thorough code review.

Analyze the provided repository for the following categories:

**Critical Issues** — Security vulnerabilities, data loss risks, authentication bypasses, SQL injection,
XSS, SSRF, insecure deserialization, hard-coded credentials, exposed secrets.

**Design Issues** — God classes, excessive coupling, circular dependencies, missing abstractions,
inconsistency between stated architecture and actual structure.

**Performance Issues** — N+1 queries, missing indexes, synchronous blocking in async context,
unbounded loops or memory allocations.

**Test Coverage Gaps** — Critical paths with no tests, mocked tests that can't catch real bugs,
missing edge case coverage.

**Quick Wins** — Low-effort improvements with meaningful impact.

For each issue, cite the specific file(s) and approximate line range.
Output well-formatted Markdown. Be direct and precise. Do not summarize what is already obvious."""
```
For large inputs, streaming is essential — a 500k-token analysis can take 3–5 minutes to complete, and streaming shows progress rather than a spinning indicator:
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-deepseek-api-key",
    base_url="https://api.deepseek.com/v1",
)

def review_repo(repo_text: str) -> str:
    stream = client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": CODE_REVIEW_PROMPT},
            {"role": "user", "content": f"Repository to review:\n\n{repo_text}"},
        ],
        temperature=0.2,  # lower = more deterministic, better for analysis
        stream=True,
    )
    result = ""
    for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        result += delta
        print(delta, end="", flush=True)  # real-time streaming
    return result
```
DeepSeek V4 charges $0.27 per million input tokens and $1.10 per million output tokens. As a rough estimate, the 500k-token repository mentioned above works out to about $0.14 of input per review, with output cost depending on the length of the generated report.
You can pre-screen the token count before calling the API to avoid surprises:
```python
def estimate_tokens(text: str) -> int:
    # ~4 characters per token is a reliable approximation for English/code
    return len(text) // 4

repo_text = extract_repo(zip_bytes)
estimated_tokens = estimate_tokens(repo_text)
estimated_cost = (estimated_tokens / 1_000_000) * 0.27
print(f"Estimated: {estimated_tokens:,} tokens, ${estimated_cost:.3f} input cost")
```
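If you want a rough total up front, fold in an assumed report length; the 5,000-token figure below is an assumption to adjust based on your own runs:

```python
ASSUMED_OUTPUT_TOKENS = 5_000  # assumption: a typical review report length, tune to your repos

output_cost = (ASSUMED_OUTPUT_TOKENS / 1_000_000) * 1.10
print(f"Estimated total: ~${estimated_cost + output_cost:.3f} (input + output)")
```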
Everything above applies equally to document analysis. For contract review:
```python
CONTRACT_REVIEW_PROMPT = """You are a paralegal AI reviewing legal contracts.

For each document, extract and summarize:

**Parties** — Full legal names and roles
**Term** — Start date, end date, renewal conditions
**Key Obligations** — What each party must do, by when
**Payment Terms** — Amounts, schedule, late payment penalties
**Termination** — Conditions, notice requirements, consequences
**Liability & Indemnification** — Caps, exclusions, indemnity scope
**Red Flags** — Non-standard clauses, one-sided terms, missing protections
**Action Items** — What needs to happen before this can be signed

Output structured Markdown. Be precise and cite section numbers."""
```
```python
def review_documents(pdf_texts: list[tuple[str, str]]) -> str:
    combined = "\n\n".join(
        f"## Document: {name}\n\n{text}"
        for name, text in pdf_texts
    )
    stream = client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": CONTRACT_REVIEW_PROMPT},
            {"role": "user", "content": combined},
        ],
        temperature=0.1,
        stream=True,
    )
    # ... same streaming loop as review_repo
```
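`review_documents` expects plain text, so the PDFs need a text-extraction pass first. One way to build the `(name, text)` pairs, sketched here with the pypdf library (the file names are placeholders):

```python
from pypdf import PdfReader

def load_pdfs(paths: list[str]) -> list[tuple[str, str]]:
    """Extract plain text from each PDF, keeping the file name for citations."""
    docs = []
    for path in paths:
        reader = PdfReader(path)
        text = "\n".join(page.extract_text() or "" for page in reader.pages)
        docs.append((path, text))
    return docs

review_documents(load_pdfs(["msa.pdf", "statement_of_work.pdf"]))
```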
Large context analyses can take several minutes. Set generous HTTP timeouts — 600 seconds for the read timeout is reasonable. If you're building a web app, use streaming to keep the connection alive and show progress.
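With the OpenAI Python SDK, the timeout is a client-level setting; a minimal sketch using the 600-second figure above:

```python
from openai import OpenAI

client = OpenAI(
    api_key="your-deepseek-api-key",
    base_url="https://api.deepseek.com/v1",
    timeout=600.0,  # seconds; long-context analyses can run for several minutes
)
```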
DeepSeek's API can return 429 (rate limit) or 503 (overload) under high load. Implement exponential backoff with jitter for retries. Don't retry immediately on 413 (input too large) — that requires reducing the input.
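A minimal retry wrapper along those lines, using the OpenAI SDK's exception types (the delay values are illustrative):

```python
import random
import time

import openai

def call_with_retries(fn, max_attempts: int = 5):
    """Retry on 429/503 with exponential backoff plus jitter; never retry 413."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except openai.APIStatusError as e:
            retryable = e.status_code in (429, 503)
            if not retryable or attempt == max_attempts - 1:
                raise  # includes 413: shrink the input instead of retrying
            time.sleep((2 ** attempt) + random.uniform(0, 1))

report = call_with_retries(lambda: review_repo(repo_text))
```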
If you're running code reviews on a repo that hasn't changed between runs, cache the output keyed by the repo's git SHA. DeepSeek V4's analysis of a given codebase is deterministic enough (at temperature=0.2) that caching is valid.
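A sketch of that cache, keyed by the current commit (the `.review_cache` directory name is arbitrary):

```python
import pathlib
import subprocess

CACHE_DIR = pathlib.Path(".review_cache")

def cached_review(repo_path: str, repo_text: str) -> str:
    """Reuse a stored review for the current commit; otherwise run and store one."""
    sha = subprocess.check_output(
        ["git", "rev-parse", "HEAD"], cwd=repo_path, text=True
    ).strip()
    CACHE_DIR.mkdir(exist_ok=True)
    cache_file = CACHE_DIR / f"{sha}.md"
    if cache_file.exists():
        return cache_file.read_text()
    report = review_repo(repo_text)
    cache_file.write_text(report)
    return report
```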
Don't want to build this yourself? Agent Workbench handles extraction, streaming, credits, and structured output — no code required.
LLM-based code review is not a replacement for static analysis tools. It's a complement. ESLint catches syntax errors and style issues faster and cheaper. SonarQube has curated rule sets for common vulnerability patterns validated against thousands of CVEs. DeepSeek V4 adds the layer that static tools can't: contextual reasoning about your specific architecture, business logic, and cross-file design.
The best setup is both: automated static analysis in CI (free, fast, no tokens) plus periodic LLM review for the architecture-level issues that linters miss.