
Technical Showcase · How the AI protects documents

How Redact knows what to replace.

A full walkthrough of the two-pass redaction pipeline, regional pattern recognition, entity classification, output format generation, and privacy architecture.

2 processing passes
4 regional rule sets
3 output formats
2 engine options

Architecture

Two passes. Deterministic first, intelligent second.

The pipeline is intentionally layered — Pass 1 catches everything that can be caught by rules alone; Pass 2 handles context-dependent entities that rules cannot identify.

Pass 1 · PHP / regex

Detect & replace known patterns

A deterministic regex pass runs before any LLM call. It scans the full input for identifiers matching the active regional profile:

  • Norwegian fødselsnummer (11-digit) and D-number
  • Phone numbers in +47 format, Norwegian mobile (4xx/9xx) and landline patterns
  • Email addresses (RFC 5322 simplified)
  • Norwegian postal addresses: street name + number + postnummer + poststed
  • Additional patterns per region (IBAN, CPR, ECHR numbers, SSN, etc.)

Pattern-matched tokens are replaced immediately and marked as already redacted so the LLM pass does not double-process them.
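
As a rough illustration of the replace-and-mark step, here is a minimal sketch in Python (the pipeline itself is described as PHP). The three patterns, the placeholder labels, and the function name `pattern_pass` are assumptions made for this example, not the production rule set. The ID pattern checks the 11-digit shape only and does no checksum validation.

```python
import re

# Hypothetical, simplified Nordic-profile patterns (illustration only).
NORDIC_PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    # 11 digits, optionally split 6 + 5; shape only, no checksum validation.
    ("ID", re.compile(r"(?<!\d)\d{6}\s?\d{5}(?!\d)")),
    # 8-digit mobile number starting with 4 or 9, with optional +47 prefix.
    ("PHONE", re.compile(r"(?<!\d)(?:\+47\s?)?[49]\d{2}\s?\d{2}\s?\d{3}(?!\d)")),
]


def pattern_pass(text):
    """Replace rule-matchable identifiers with bracketed placeholders.

    Returns the redacted text and a list of (label, original) pairs.
    """
    found = []
    for label, pattern in NORDIC_PATTERNS:
        def replace(match, label=label):
            found.append((label, match.group(0)))
            return f"[{label}]"
        text = pattern.sub(replace, text)
    return text, found


redacted, found = pattern_pass(
    "Call Kari on +47 912 34 567 or kari@example.no, ID 01019012345."
)
print(redacted)
```

Because the placeholders contain no digits or `@` signs, later patterns cannot match inside them, which is one simple way to keep a token from being processed twice.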

Pass 2 · gpt-4o-mini / gpt-4o

Sweep for named entities

The LLM reads the full document (with pattern-replaced tokens visible as placeholders). It identifies all remaining named entities:

  • Personal names — first name, surname, or full name; handles Norwegian, Sami, and foreign names
  • Organisations — companies, government agencies, NGOs, religious bodies, sports clubs
  • Places — streets, neighbourhoods, municipalities, counties, countries
  • Date-of-birth and age phrases (only when the Dates entity type is checked)

Each matched entity is classified by type and role (FATHER, MOTHER, JUDGE, SOCIAL WORKER, etc.) and replaced according to the selected output format. The prompt explicitly instructs the model not to infer or guess identities: it replaces only what is explicitly named.
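
One way to picture how the model's findings are applied is the Python sketch below. It assumes the model returns a list of entities with `text`, `type`, and `role` fields; that response shape and the function name `apply_entities` are hypothetical, and no actual model call is shown. Placeholders left by Pass 1 are matched first and passed through unchanged.

```python
import re


def apply_entities(text, entities):
    """Replace each named entity with its tag, leaving existing placeholders alone.

    `entities` is an assumed structure: dicts with "text", "type" and "role".
    """
    by_text = {entity["text"]: entity for entity in entities}
    if not by_text:
        return text
    # Longest names first, so "Kari Hansen" wins over "Kari".
    names = sorted(by_text, key=len, reverse=True)
    pattern = re.compile(
        r"(\[[^\[\]]*\])|\b(" + "|".join(re.escape(n) for n in names) + r")\b"
    )

    def replace(match):
        if match.group(1):  # a Pass 1 placeholder: keep it as it is
            return match.group(1)
        entity = by_text[match.group(2)]
        return f"[{entity.get('role') or entity['type'].upper()}]"

    return pattern.sub(replace, text)


entities = [
    {"text": "Kari Hansen", "type": "person", "role": "MOTHER"},
    {"text": "Kari", "type": "person", "role": "MOTHER"},
    {"text": "Oslo", "type": "place", "role": None},
]
result = apply_entities("Kari Hansen ([PHONE]) said Kari lives in Oslo.", entities)
print(result)
```

Doing all replacements in a single regex pass means a tag that was just inserted is never rescanned, so one entity's replacement cannot be altered by another.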

Stage 3 · PHP post-processor

Post-processing & alias substitution

After both passes, PHP applies final transformations:

  • Officials pass — if Keep official names is checked, named judges, experts, and caseworkers get labelled tags ([JUDGE: Andersen]) instead of generic ones
  • Alias substitution — user-defined aliases are applied as a final regex replacement, overriding whatever the LLM assigned
  • Exempt name protection — any token matching an exempt name is restored to its original value
  • Pseudonym generation — if Pseudonym output is selected, all role tags are replaced with plausible Norwegian names, phone numbers, and addresses drawn from a generation pool
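
The pseudonym step can be sketched as a stable mapping from role tags to pool entries. The Python below is an illustration under assumptions: the name pool, the list of person roles, and the function name `pseudonymise` are hypothetical, and only personal names are covered (not phone numbers or addresses).

```python
import random
import re

# Hypothetical generation pool and role list (illustration only).
NAME_POOL = ["Ola Nordmann", "Kari Nordmann", "Per Hansen", "Anne Berg"]
PERSON_ROLES = ["FATHER", "MOTHER", "JUDGE", "SOCIAL WORKER", "PERSON"]
ROLE_TAG = re.compile(r"\[(" + "|".join(re.escape(r) for r in PERSON_ROLES) + r")\]")


def pseudonymise(text, seed=0):
    """Swap each person role tag for a pool name, the same name every time."""
    pool = NAME_POOL[:]
    random.Random(seed).shuffle(pool)  # seeded, so runs are repeatable
    assigned = {}

    def replace(match):
        role = match.group(1)
        if role not in assigned:
            if not pool:
                raise ValueError("name pool exhausted")
            assigned[role] = pool.pop()
        return assigned[role]

    return ROLE_TAG.sub(replace, text), assigned


output, mapping = pseudonymise("[FATHER] called [MOTHER]. [FATHER] left.", seed=1)
print(output)
```

Keeping the assignment in a dictionary is what makes every occurrence of one role resolve to the same generated name within a document.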

The final output is assembled and returned as plain text. The DOCX export converts this to an OOXML document via PHP ZipArchive.
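
A DOCX file is a ZIP archive containing a few XML parts, which is why a ZIP library is enough to produce one. The sketch below shows the idea in Python with the standard `zipfile` module rather than PHP ZipArchive. It writes the minimum three parts with one paragraph per input line; the function name `build_docx` is an assumption, and real exports would likely add styles and metadata.

```python
import io
import zipfile
from xml.sax.saxutils import escape

CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    "</Types>"
)
RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    "</Relationships>"
)
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def build_docx(text):
    """Return the bytes of a minimal DOCX with one paragraph per line of text."""
    paragraphs = "".join(
        f'<w:p><w:r><w:t xml:space="preserve">{escape(line)}</w:t></w:r></w:p>'
        for line in text.split("\n")
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body>{paragraphs}</w:body></w:document>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", CONTENT_TYPES)
        archive.writestr("_rels/.rels", RELS)
        archive.writestr("word/document.xml", document)
    return buffer.getvalue()


docx_bytes = build_docx("[FATHER] met [JUDGE: Andersen] & counsel.\nSecond line.")
```

Escaping the text before embedding it matters here: redacted output can contain `&` or `<`, which would otherwise produce an invalid document part.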

Regional pattern sets

Four regions. Each adds patterns to the last.

Regional profiles are cumulative — European adds to Nordic, ECHR adds to European, Global adds to ECHR.

| Region | Patterns covered | Notes |
|---|---|---|
| Nordic ★ | Fødselsnummer, D-number, +47 phone, email, Norwegian address | Default for Norwegian documents. All local ID formats. |
| European | + IBAN, Swedish personnummer, Danish CPR, Finnish HETU, UK NI | Cross-border EU documents, Nordic neighbours. |
| ECHR | + ECHR application numbers, DOB phrases, ECtHR case references | Complaints to the European Court of Human Rights. |
| Global | + US SSN, driver's licence formats, generic document numbers | Documents involving non-European parties or jurisdictions. |
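
The cumulative behaviour can be expressed as an ordered list of profiles, where selecting one region activates its rules plus those of every region before it. This Python sketch uses hypothetical rule identifiers that mirror the pattern names listed for each region; the identifiers and the function name `active_rules` are assumptions for illustration.

```python
# Order matters: each region builds on all regions before it.
REGION_ORDER = ["nordic", "european", "echr", "global"]

# Hypothetical rule identifiers, one per pattern named for each region.
REGION_RULES = {
    "nordic": ["fodselsnummer", "d_number", "phone_no", "email", "address_no"],
    "european": ["iban", "personnummer_se", "cpr_dk", "hetu_fi", "ni_uk"],
    "echr": ["echr_application_no", "dob_phrase", "ecthr_case_ref"],
    "global": ["ssn_us", "drivers_licence", "document_no"],
}


def active_rules(region):
    """Return the rules for `region` plus every earlier region's rules."""
    position = REGION_ORDER.index(region)  # raises ValueError if unknown
    rules = []
    for name in REGION_ORDER[: position + 1]:
        rules.extend(REGION_RULES[name])
    return rules


print(active_rules("european"))
```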

Entity classification

What the LLM identifies and how it labels each type.

Named entity types

| Entity | What qualifies | Default output (contextual) |
|---|---|---|
| person | Any personal name: first, last, or full | [ROLE], inferred from context (FATHER, MOTHER, JUDGE, etc.) |
| organisation | Companies, agencies, authorities, institutions, clubs | [ORG: partial name] or generic [ORG] |
| place | Streets, towns, counties, countries, regions | [PLACE] or [CITY] or [ADDRESS] |
| date | Dates of birth, age references, personal date phrases | [DOB] or [AGE: xx] |

Output format comparison

| Format | Person example | Org example |
|---|---|---|
| Contextual ★ | [FATHER] or [JUDGE: Andersen] | [BARNEVERNET: Oslo] or [ORG] |
| Generic | [PERSON] | [ORG] |
| Pseudonym | Ola Nordmann (generated) | Nordnes AS (generated) |
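
The three formats can be read as three renderings of the same classified entity. The following Python sketch covers persons and organisations only; the entity dictionary shape, the set of official roles, and the function name `render` are assumptions for illustration, and the partially named organisation tag from the Contextual row is not modelled.

```python
# Hypothetical role names for officials whose names may be kept.
OFFICIAL_ROLES = {"JUDGE", "EXPERT", "CASEWORKER"}
GENERIC_TAGS = {"person": "PERSON", "organisation": "ORG"}


def render(entity, fmt, keep_officials=False, pseudonyms=None):
    """Render one entity as contextual, generic, or pseudonym output."""
    kind, role, text = entity["type"], entity.get("role"), entity["text"]
    if fmt == "generic":
        return f"[{GENERIC_TAGS[kind]}]"
    if fmt == "pseudonym":
        # `pseudonyms` maps original text to a generated replacement.
        return (pseudonyms or {})[text]
    if fmt == "contextual":
        if kind == "person" and role:
            if keep_officials and role in OFFICIAL_ROLES:
                return f"[{role}: {text}]"
            return f"[{role}]"
        return f"[{GENERIC_TAGS[kind]}]"
    raise ValueError(f"unknown format: {fmt}")


judge = {"type": "person", "role": "JUDGE", "text": "Andersen"}
print(render(judge, "contextual", keep_officials=True))
```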

Engines

Two engines, one redaction schema.

Both engines produce the same redaction output schema. Engine choice affects accuracy on complex documents and credit cost only.

| Engine | Model | Latency | Best for |
|---|---|---|---|
| Azure gpt-4o-mini ★ | gpt-4o-mini (Azure West Europe) | ~15 s | Default. Most documents, single subject, clear formatting. |
| Azure gpt-4o | gpt-4o (Azure West Europe) | ~45 s | Complex documents with many named parties, overlapping roles, or degraded source text. |

Privacy & security

Processed in memory. Saved only when you say so.

Privacy by design

See it work on your documents.

Free for Do Better Norge members.
