Technical Showcase · Hard-RAG · Fine-tuned LLM · 3-pass pipeline
A full walkthrough of the retrieval-augmented generation pipeline, citation verification system, fine-tuned legal model, and the formal citation refine pass that produces court-ready references.
Architecture
The pipeline is intentionally sequential — Pass 1 is cheap and fast (gpt-4o-mini); Pass 2 is expensive and only runs if the situation is clear enough; Pass 3 is optional and user-triggered.
Parses the intake and returns a structured JSON classification:
summary — one-sentence case summaryparties — identified actorsapplicable_acts — relevant statute setsmissing_facts[] — gaps that would hurt draft qualitysuggested_goal — inferred goal if none statedIf missing_facts is non-empty → emits clarify gate. No credit deducted until Pass 2 starts.
Four sub-steps, each verified before proceeding:
[CITE:N] tokens referencing only retrieved source IDs[CITE:N] token whose source ID isn't in the retrieved pool; flags deadline/goal/tone complianceUser-triggered (+1 credit). Jurisdiction-scoped retrieval, then rewrites inline citations to formal style and appends Rettskilder block:
Hard-RAG
Hard-RAG means the model is constrained to only cite what it retrieved. No § number can appear in the final draft unless a corresponding source passage was actually found and fetched.
The self-check pass parses every [CITE:N] token in the draft and looks up the source ID N in the retrieved pool. If it doesn't match — the citation is removed and the paragraph is rewritten without it. The output also flags whether the deadline was addressed, whether the stated goal was achieved, and whether the tone matched the selected chip.
If no corpus passage closely matches the situation, the draft is produced in plain language without § references. A note in the output says: "No cited law sources — draft is plain-language (no § references available from corpus)." This is the intentional, honest behaviour — a blank draft is better than one with fake citations.
Knowledge base
The legal corpus is split into named slices. Each recipient body preset maps to a set of slices, so retrieval is always scoped to the right area of law.
| Recipient body | Corpus slices loaded |
|---|---|
| Barnevernet | child_welfare · echr · family_core |
| Bufdir | family_core · echr · bufdir_guidance |
| NAV | broader_legal (NAV-loven) |
| Skole / Barnehage / SFO | broader_legal (opplæringslova / barnehageloven) |
| Statsforvalteren | child_welfare · broader_legal |
| Trygderetten / Tingretten | norwegian_courts · broader_legal |
Fine-tuned model
A QLoRA (Quantized Low-Rank Adaptation) fine-tune trained on Norwegian child-welfare and administrative law text. Unlike a general-purpose LLM, dbn-legal-agent has internalized the procedural vocabulary and reasoning patterns of forvaltningsloven: what triggers a § 17 right to be heard, what a lawful § 24 reasoned decision must contain, how barnevernsloven § 6-3 frames the child's best interest standard.
In the Korrespond pipeline, dbn-legal-agent runs as a domain adapter alongside Azure gpt-4o. The retrieval prompt is constructed using dbn-legal-agent's representation of the intake, while gpt-4o handles the final generation within the Hard-RAG constraint. This separation gives structural clarity (gpt-4o) and domain precision (dbn-legal-agent) in the same pipeline.
| Pass | Model | Role |
|---|---|---|
| Pass 1 classify | gpt-4o-mini | Fast structured classification + gap detection |
| Pass 1 clarify questions | gpt-4o-mini + dbn-legal-agent | Domain-aware question generation |
| Pass 2 draft | gpt-4o | Full letter generation within Hard-RAG constraints |
| Pass 2 self-check | gpt-4o-mini | Citation verification + tone/goal/deadline audit |
| Pass 2 translate | gpt-4o-mini | Norwegian → working language translation |
| Pass 3 refine | gpt-4o | Formal citation rewrite + Rettskilder block |
Pass 3 — Formal citation refine
The optional third pass does a jurisdiction-scoped retrieval run, then rewrites the draft with formal inline citations and a Rettskilder appendix. Two distinct citation formats are supported:
Inline citations use jf. (with reference to) and the official statute name + section: jf. forvaltningsloven § 17, jf. opplæringslova § 9 A-4, jf. barnevernsloven § 6-3. Section numbers are verified against the corpus before inclusion.
Full European Court of Human Rights citation format: case name · application number · date · chamber/Grand Chamber · paragraph. Example: Strand Lobben m.fl. mot Norge, EMD-37283/13 (Storkammer, 10.09.2019), § 207. Sources pulled from the ECHR corpus slice and HUDOC.
Refined draft (Norwegian + English) with opplæringslova § 9 A-4 and EMK artikkel 8 inline citations.
For Barnevernet and Bufdir cases, the ECHR refine pass runs specific anchor queries targeting the most-cited Norwegian family cases in the HUDOC corpus:
Privacy & security
Privacy by design
bnl-legal-search) stores only the public legal corpus — statutes, tribunal decisions, ECHR judgments. None of your case information is stored in the search index.Free for Do Better Norge members. All 3 passes available to every member.