Technical Showcase · Hard-RAG · Fine-tuned LLM · 3-pass pipeline

How Korrespond knows what to write.

A full walkthrough of the retrieval-augmented generation pipeline, citation verification system, fine-tuned legal model, and the formal citation refine pass that produces court-ready references.

220K+ passages indexed
8 corpus slices
3 pipeline passes
gpt-4o drafting model

Architecture

Three passes. Each with a distinct job.

The pipeline is intentionally sequential: Pass 1 is cheap and fast (gpt-4o-mini); Pass 2 is more expensive and runs only once the situation is clear enough to draft; Pass 3 is optional and user-triggered.

Pass 1 · gpt-4o-mini

Classify & gap-check

Parses the intake and returns a structured JSON classification:

  • summary — one-sentence case summary
  • parties — identified actors
  • applicable_acts — relevant statute sets
  • missing_facts[] — gaps that would hurt draft quality
  • suggested_goal — inferred goal if none stated

If missing_facts is non-empty, the pipeline emits a clarify gate instead of proceeding to the draft. No credit is deducted until Pass 2 starts.
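The gate can be pictured with a minimal Python sketch. The field names come from the list above; the `Classification` class, the `parse_classification` and `next_step` helpers, and the step names "clarify" and "draft" are illustrative assumptions, not the production code.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Classification:
    """Pass 1 output; field names follow the classification list above."""
    summary: str
    parties: List[str]
    applicable_acts: List[str]
    missing_facts: List[str] = field(default_factory=list)
    suggested_goal: Optional[str] = None


def parse_classification(raw_json: str) -> Classification:
    """Parse the model's JSON reply into a typed object."""
    data = json.loads(raw_json)
    return Classification(
        summary=data["summary"],
        parties=list(data.get("parties", [])),
        applicable_acts=list(data.get("applicable_acts", [])),
        missing_facts=list(data.get("missing_facts", [])),
        suggested_goal=data.get("suggested_goal"),
    )


def next_step(classification: Classification) -> str:
    """Return "clarify" when facts are missing, otherwise "draft".

    The step names are hypothetical labels for this sketch.
    """
    return "clarify" if classification.missing_facts else "draft"
```

Keeping the gate as a pure function of the classification makes it easy to see that Pass 2, and therefore the credit deduction, is only reachable when `missing_facts` is empty.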

Pass 2 · gpt-4o

Retrieve → draft → check → translate

Four sub-steps, each verified before proceeding:

  • Retrieve: hybrid dense + BM25 search across the preset corpus slices; top 8 passages returned with source IDs
  • Draft: gpt-4o generates the letter using [CITE:N] tokens referencing only retrieved source IDs
  • Self-check: strips any [CITE:N] token whose source ID isn't in the retrieved pool; flags deadline/goal/tone compliance
  • Translate: Norwegian draft → working language (single call)

Pass 3 · optional

Formal citation refine

User-triggered (+1 credit). Runs a jurisdiction-scoped retrieval, then rewrites inline citations in formal style and appends a Rettskilder (legal sources) block:

  • Norwegian: jf. forvaltningsloven § 17
  • ECHR: full case name, application number, date, paragraph
  • Both: combined domestic + ECHR grounds

Hard-RAG

Every § citation is verified before it reaches you.

Hard-RAG means the model is constrained to only cite what it retrieved. No § number can appear in the final draft unless a corresponding source passage was actually found and fetched.

User intake + body preset
Corpus slice selection
Hybrid search (dense vector + BM25)
Top 8 passages with source IDs
Passages injected into gpt-4o prompt
Draft with [CITE:N] tokens only
Self-check: verify each [CITE:N] resolves
Strip unverified citations

The self-check pass parses every [CITE:N] token in the draft and looks up source ID N in the retrieved pool. If N is not in the pool, the citation is removed and the paragraph is rewritten without it. The output also flags whether the deadline was addressed, whether the stated goal was achieved, and whether the tone matched the selected chip.
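The token check itself is simple enough to sketch in a few lines of Python. This is an illustration under two assumptions: source IDs are integers, and the function name `strip_unverified` is hypothetical. It covers only the removal of unresolved tokens; the paragraph rewrite and the deadline/goal/tone flags are handled by the model and are not shown.

```python
import re
from typing import List, Set, Tuple

# Matches a [CITE:N] token plus any whitespace directly before it,
# so removing a token does not leave a stray space behind.
CITE_TOKEN = re.compile(r"\s*\[CITE:(\d+)\]")


def strip_unverified(draft: str, retrieved_ids: Set[int]) -> Tuple[str, List[int]]:
    """Drop every [CITE:N] whose N is not in the retrieved pool.

    Returns the cleaned draft and the list of removed source IDs.
    """
    removed: List[int] = []

    def replace(match: "re.Match[str]") -> str:
        source_id = int(match.group(1))
        if source_id in retrieved_ids:
            return match.group(0)  # verified: keep the token as written
        removed.append(source_id)
        return ""  # unverified: remove the token

    return CITE_TOKEN.sub(replace, draft), removed
```

Returning the removed IDs alongside the text lets a caller decide which paragraphs need to be rewritten.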

What happens when no statute fits?

If no corpus passage closely matches the situation, the draft is produced in plain language without § references. A note in the output says: "No cited law sources — draft is plain-language (no § references available from corpus)." This is intentional: a plain-language draft without citations is better than one with fake citations.

Knowledge base

220,000+ passages across 8 corpus slices.

The legal corpus is split into named slices. Each recipient body preset maps to a set of slices, so retrieval is always scoped to the right area of law.

220K+ total indexed passages
8 corpus slices
1,731 FNV tribunal decisions
23 ECHR Norwegian-family cases
Azure AI Search (West Europe)
Hybrid dense vector + BM25
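
Azure AI Search documents that it merges the vector and keyword result lists of a hybrid query with Reciprocal Rank Fusion (RRF). The Python sketch below illustrates the idea only; it is not the service's implementation. The function name `rrf_fuse` and the constant k = 60 (a commonly used default) are assumptions of this sketch, and the top_n of 8 mirrors the "top 8 passages" figure above.

```python
from collections import defaultdict
from typing import Dict, List


def rrf_fuse(dense_ranked: List[str], bm25_ranked: List[str],
             top_n: int = 8, k: int = 60) -> List[str]:
    """Fuse two ranked lists of source IDs with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) per document, with rank starting at 1.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in (dense_ranked, bm25_ranked):
        for rank, source_id in enumerate(ranking, start=1):
            scores[source_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by source ID for determinism.
    return sorted(scores, key=lambda s: (-scores[s], s))[:top_n]
```

A passage that ranks well in both lists outranks one that tops only a single list, which is the behaviour hybrid search is chosen for.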

Corpus slices

child_welfare · echr · family_core · bufdir_guidance · norwegian_courts · broader_legal · dbn_resources · hague

Body preset → slice mapping (examples)

Recipient body | Corpus slices loaded
Barnevernet | child_welfare · echr · family_core
Bufdir | family_core · echr · bufdir_guidance
NAV | broader_legal (NAV-loven)
Skole / Barnehage / SFO | broader_legal (opplæringslova / barnehageloven)
Statsforvalteren | child_welfare · broader_legal
Trygderetten / Tingretten | norwegian_courts · broader_legal
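
Expressed as data, the example mapping might look like the Python sketch below. The preset names and slice names are taken from this page; the `PRESET_SLICES` and `ALL_SLICES` names, the `slices_for` helper, and the choice to raise on an unknown preset are assumptions made for illustration.

```python
from typing import Dict, Tuple

# The eight corpus slices listed above.
ALL_SLICES = {
    "child_welfare", "echr", "family_core", "bufdir_guidance",
    "norwegian_courts", "broader_legal", "dbn_resources", "hague",
}

# Example presets only; the real mapping may contain more entries.
PRESET_SLICES: Dict[str, Tuple[str, ...]] = {
    "Barnevernet": ("child_welfare", "echr", "family_core"),
    "Bufdir": ("family_core", "echr", "bufdir_guidance"),
    "NAV": ("broader_legal",),
    "Skole / Barnehage / SFO": ("broader_legal",),
    "Statsforvalteren": ("child_welfare", "broader_legal"),
    "Trygderetten / Tingretten": ("norwegian_courts", "broader_legal"),
}


def slices_for(preset: str) -> Tuple[str, ...]:
    """Return the corpus slices to search for a recipient body preset."""
    try:
        return PRESET_SLICES[preset]
    except KeyError:
        # Assumed behaviour: fail loudly rather than search the whole corpus.
        raise ValueError(f"Unknown recipient body preset: {preset!r}") from None
```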

Fine-tuned model

dbn-legal-agent: trained on Norwegian legal text.

QLoRA fine-tune

dbn-legal-agent

A QLoRA (Quantized Low-Rank Adaptation) fine-tune trained on Norwegian child-welfare and administrative law text. Unlike a general-purpose LLM, dbn-legal-agent has internalized the procedural vocabulary and reasoning patterns of forvaltningsloven: what triggers a § 17 right to be heard, what a lawful § 24 reasoned decision must contain, how barnevernsloven § 6-3 frames the child's best interest standard.

In the Korrespond pipeline, dbn-legal-agent runs as a domain adapter alongside Azure gpt-4o. The retrieval prompt is constructed using dbn-legal-agent's representation of the intake, while gpt-4o handles the final generation within the Hard-RAG constraint. This separation gives structural clarity (gpt-4o) and domain precision (dbn-legal-agent) in the same pipeline.

QLoRA · forvaltningsloven · barnevernsloven · child-welfare corpus · Norwegian bokmål output · gpt-4o co-pipeline

Model responsibilities in the pipeline

Pass | Model | Role
Pass 1 classify | gpt-4o-mini | Fast structured classification + gap detection
Pass 1 clarify questions | gpt-4o-mini + dbn-legal-agent | Domain-aware question generation
Pass 2 draft | gpt-4o | Full letter generation within Hard-RAG constraints
Pass 2 self-check | gpt-4o-mini | Citation verification + tone/goal/deadline audit
Pass 2 translate | gpt-4o-mini | Norwegian → working language translation
Pass 3 refine | gpt-4o | Formal citation rewrite + Rettskilder block

Pass 3 — Formal citation refine

Court-ready citations in two styles.

The optional third pass does a jurisdiction-scoped retrieval run, then rewrites the draft with formal inline citations and a Rettskilder appendix. Two distinct citation formats are supported:

🇳🇴

Norwegian statute style

Inline citations use jf. (with reference to) and the official statute name + section: jf. forvaltningsloven § 17, jf. opplæringslova § 9 A-4, jf. barnevernsloven § 6-3. Section numbers are verified against the corpus before inclusion.
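
As a rough illustration of "verified before inclusion", the Python sketch below formats a citation only when the act and section pair is present in a set of sections known from the corpus. The function name `format_statute_citation` and the shape of the `verified_sections` set are assumptions; how the real pipeline represents verified sections is not described here.

```python
from typing import Optional, Set, Tuple


def format_statute_citation(act: str, section: str,
                            verified_sections: Set[Tuple[str, str]]) -> Optional[str]:
    """Return "jf. <act> § <section>" if the pair is verified, else None.

    verified_sections is a hypothetical set of (act, section) pairs
    that were found in the retrieved corpus passages.
    """
    if (act, section) not in verified_sections:
        return None  # unverified sections are never cited
    return f"jf. {act} § {section}"
```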

⚖️

ECHR citation style

Full European Court of Human Rights citation format: case name · application number · date · chamber/Grand Chamber · paragraph. Example: Strand Lobben m.fl. mot Norge, EMD-37283/13 (Storkammer, 10.09.2019), § 207. Sources pulled from the ECHR corpus slice and HUDOC.
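
The ECHR format above is regular enough to assemble from structured fields. The Python sketch below reproduces the example citation; the `EchrCase` class and the `format_echr_citation` function are hypothetical names, and the field layout is an assumption about how case metadata could be stored.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class EchrCase:
    """Case metadata needed for a formal citation (illustrative layout)."""
    name: str             # e.g. "Strand Lobben m.fl. mot Norge"
    application_no: str   # e.g. "37283/13"
    formation: str        # chamber or Grand Chamber, e.g. "Storkammer"
    judgment_date: date


def format_echr_citation(case: EchrCase, paragraph: int) -> str:
    """Build: name, EMD-number (formation, dd.mm.yyyy), § paragraph."""
    day = case.judgment_date.strftime("%d.%m.%Y")
    return (f"{case.name}, EMD-{case.application_no} "
            f"({case.formation}, {day}), § {paragraph}")
```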

Example refined output

Refined draft (Norwegian + English) with opplæringslova § 9 A-4 and EMK artikkel 8 inline citations.

Anchor queries for ECHR mode

For Barnevernet and Bufdir cases, the ECHR refine pass runs specific anchor queries targeting the most-cited Norwegian family cases in the HUDOC corpus:

Strand Lobben m.fl. mot Norge · Johansen mot Norge · K.O. og V.M. mot Norge · Aune mot Norge · EMK Art. 8 family life Norway · EMK Art. 6 fair trial

Privacy & security

Your documents never leave your session.

Privacy by design

  • All uploaded files are extracted to text in memory using PHP's in-process file handlers. The raw binary is never written to disk on the server.
  • Session context (your narrative, uploaded text, drafts) is scoped to your authenticated session and discarded when the session ends.
  • Azure OpenAI (gpt-4o, gpt-4o-mini) is deployed in the West Europe region. Data processed via Azure OpenAI is not used for model training under the default enterprise agreement.
  • Azure AI Search (bnl-legal-search) stores only the public legal corpus — statutes, tribunal decisions, ECHR judgments. None of your case information is stored in the search index.
  • Qdrant vector database stores only the public corpus embeddings — no user data.
  • Telemetry logged: tool name, language, output type, pass count, latency, source count. No case text, no names, no case references are logged.
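
One way to make the telemetry rule hard to violate is to give the log record a fixed shape with no free-text field. The Python sketch below assumes this approach; the `TelemetryEvent` class, its field names, and the millisecond unit for latency are illustrative and not taken from the production system.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class TelemetryEvent:
    """Only the six logged fields exist; there is no slot for case text or names."""
    tool_name: str
    language: str
    output_type: str
    pass_count: int
    latency_ms: int      # unit is an assumption of this sketch
    source_count: int


def to_log_record(event: TelemetryEvent) -> Dict[str, Union[str, int]]:
    """Convert the event to a plain dict suitable for a structured logger."""
    return asdict(event)
```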

See it work on your case.

Free for Do Better Norge members. All 3 passes available to every member.