Technical Showcase · How the AI reads time
A full walkthrough of the 3-pass extraction pipeline, Norwegian date format recognition, event classification schema, live SSE streaming, and one-click Word export.
Architecture
The pipeline is intentionally sequential — Pass 1 is rule-based and near-instant; Pass 2 is the LLM extraction; Pass 3 post-processes and scores the output.
A deterministic pattern-matching pass runs before any LLM call. It scans the full input for dates matching 12+ Norwegian formats and normalises them to ISO 8601:
dd.mm.yyyy → YYYY-MM-DD
Normalised anchors are injected into the LLM prompt to reduce hallucinated or misread dates.
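As an illustration of the pre-pass described above, here is a minimal Python sketch covering three of the formats (dd.mm.yyyy, dd.mm.yy, and d. månedsnavn yyyy). The production pre-pass is not shown in this document; the function name `find_anchors`, the regular expressions, and the return shape are assumptions made for the example.

```python
import re
from datetime import date

# Norwegian month names mapped to month numbers.
MONTHS = {name: i for i, name in enumerate(
    ["januar", "februar", "mars", "april", "mai", "juni", "juli",
     "august", "september", "oktober", "november", "desember"], start=1)}

# dd.mm.yyyy or dd.mm.yy (hypothetical patterns, not the production ones).
NUMERIC = re.compile(r"\b(\d{1,2})\.(\d{1,2})\.(\d{4}|\d{2})\b")
# d. månedsnavn yyyy, e.g. "3. mars 2024".
WRITTEN = re.compile(
    r"\b(\d{1,2})\.\s*(" + "|".join(MONTHS) + r")\s+(\d{4})\b",
    re.IGNORECASE,
)


def _iso(year, month, day):
    """Return an ISO 8601 date string, or None for impossible dates."""
    try:
        return date(year, month, day).isoformat()
    except ValueError:
        return None


def find_anchors(text):
    """Return (verbatim, iso) pairs in document order."""
    found = []
    for m in NUMERIC.finditer(text):
        year = m[3]
        # Two-digit years are always read as 20YY, as the format table states.
        year = int(year) + 2000 if len(year) == 2 else int(year)
        iso = _iso(year, int(m[2]), int(m[1]))
        if iso:
            found.append((m.start(), m[0], iso))
    for m in WRITTEN.finditer(text):
        iso = _iso(int(m[3]), MONTHS[m[2].lower()], int(m[1]))
        if iso:
            found.append((m.start(), m[0], iso))
    return [(raw, iso) for _, raw, iso in sorted(found)]
```

Impossible dates such as 31.02.2024 are dropped rather than guessed at, so only dates that resolve to a real calendar day would reach the prompt as anchors.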
The LLM reads the full document alongside the pre-pass anchors. For every temporal reference it returns a structured JSON event object:
- `date`: resolved ISO date, or the verbatim string if unresolvable
- `date_type`: absolute | relative | recurring | conditional | period
- `confidence`: high | medium | low
- `actor`: attributed entity (taken from the source text, not inferred)
- `description`: one-sentence event summary
- `source_excerpt`: verbatim text fragment (max 200 chars)
The prompt explicitly instructs the model not to invent dates or actors that are not present in the source. Temperature is set to 0.1 to keep the output as close to deterministic as possible.
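The event object above could be checked on arrival along these lines. This is a hedged Python sketch, not the production validator: it assumes the LLM returns a JSON array of objects, and the check that `source_excerpt` occurs verbatim in the input is an added assumption that follows from the "verbatim" wording. The names `Event` and `parse_events` are hypothetical.

```python
import json
from dataclasses import dataclass, fields

DATE_TYPES = {"absolute", "relative", "recurring", "conditional", "period"}
CONFIDENCE = {"high", "medium", "low"}
MAX_EXCERPT = 200  # character limit stated in the schema


@dataclass
class Event:
    date: str
    date_type: str
    confidence: str
    actor: str
    description: str
    source_excerpt: str


def parse_events(raw_json, source_text):
    """Keep only events that match the schema and quote the source verbatim."""
    events = []
    for item in json.loads(raw_json):
        try:
            ev = Event(**{f.name: item[f.name] for f in fields(Event)})
        except (KeyError, TypeError):
            continue  # missing field or non-object entry
        if ev.date_type not in DATE_TYPES or ev.confidence not in CONFIDENCE:
            continue
        if len(ev.source_excerpt) > MAX_EXCERPT:
            continue
        # Assumption: an excerpt that is not in the source was not copied verbatim.
        if ev.source_excerpt not in source_text:
            continue
        events.append(ev)
    return events
```

Silently dropping invalid events is a choice made for brevity here; a real pipeline might log them or lower their confidence instead.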
The PHP post-processor applies all active filters before returning the result.
The post-processor then assembles the what_remains_uncertain list and the next_practical_step recommendation.
Date recognition
Norwegian legal documents use a wide variety of date notations. The Pass 1 pre-pass recognises all of these deterministically; the LLM handles the rest in Pass 2.
| Format | Example | Notes |
|---|---|---|
| dd.mm.yyyy | 30.07.2015 | Standard Norwegian numeric |
| dd.mm.yy | 09.04.25 | Two-digit year → always 20YY |
| d. månedsnavn yyyy | 3. mars 2024 | Written month in bokmål/nynorsk |
| d. månedsnavn | 15. januar | Year inferred by proximity scanning |
| yyyy-mm-dd | 2024-03-12 | ISO 8601 |
| månedsnavn yyyy | mars 2024 | Month + year only |
| yyyy | 2024 | Year-only reference |
| Season + year | høsten 2023 | Seasonal reference → Q3/Q4 |
| Diary-format line | 18.09.2025: Møte avholdt | Date + colon → auto-tagged as event |
| Relative reference | tre uker etter vedtaket | Anchored to nearest resolved event |
| Recurring pattern | hver mandag | Classified as recurring |
| Period / range | fra mars til juni 2024 | Yields start_date + end_date |
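The period row in the table above can be sketched as follows. This Python example handles only the "fra <month> til <month> <year>" shape and assumes that a month-level period runs from the first day of the start month to the last day of the end month; that boundary rule, the regular expression, and the function name `parse_period` are assumptions for illustration.

```python
import calendar
import re
from datetime import date

MONTHS = {name: i for i, name in enumerate(
    ["januar", "februar", "mars", "april", "mai", "juni", "juli",
     "august", "september", "oktober", "november", "desember"], start=1)}
_ALT = "|".join(MONTHS)

# "fra mars til juni 2024" or "fra november 2023 til februar 2024".
PERIOD = re.compile(
    rf"\bfra\s+({_ALT})(?:\s+(\d{{4}}))?\s+til\s+({_ALT})\s+(\d{{4}})\b",
    re.IGNORECASE,
)


def parse_period(text):
    """Return {'start_date', 'end_date'} as ISO strings, or None."""
    m = PERIOD.search(text)
    if not m:
        return None
    end_year = int(m[4])
    # Assumption: a missing start year is the same as the end year.
    start_year = int(m[2]) if m[2] else end_year
    start_month = MONTHS[m[1].lower()]
    end_month = MONTHS[m[3].lower()]
    start = date(start_year, start_month, 1)
    last_day = calendar.monthrange(end_year, end_month)[1]
    end = date(end_year, end_month, last_day)
    if start > end:
        return None  # left unresolved rather than guessing the year
    return {"start_date": start.isoformat(), "end_date": end.isoformat()}
```

A range that does not resolve cleanly returns `None` in this sketch, which would leave the phrase for the LLM pass to interpret.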
Classification schema
| date_type | Definition | Example |
|---|---|---|
| absolute | A specific, resolvable calendar date | 30.07.2015 → 2015-07-30 |
| relative | A date expressed relative to another event | tre uker etter vedtaket |
| recurring | A pattern that repeats on a schedule | each Monday, every 6 months |
| conditional | A date contingent on a condition being met | if no response within 14 days |
| period | A date range or duration with start and end | fra mars til juni 2024 |
| confidence | Meaning | Visual in timeline |
|---|---|---|
| high | Date is explicitly and unambiguously stated in the source text | Green badge |
| medium | Date is inferred, approximate, or stated with slight ambiguity | Amber badge |
| low | Date is implied, undated, or extracted from a degraded/ambiguous passage | Grey badge |
| Rule | Example |
|---|---|
| Named entity in the same sentence | “Trude [saksbehandler] ringte 14. mars” → actor: Trude |
| Role label without a name | “Barnevernet fattet vedtak” → actor: Barnevernet |
| No clear attribution in sentence | actor: [unattributed] |
| Document-level default | If no per-event actor, defaults to the document sender/issuing body |
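The attribution rules above read as a fallback chain, which the following Python sketch makes explicit. The table does not say how the last two rows interact; this example assumes the document-level default applies whenever a sender is known and that `[unattributed]` is used only when nothing else is available. Detecting the name or role in the sentence is the model's job and is not shown; `resolve_actor` and its parameters are hypothetical.

```python
UNATTRIBUTED = "[unattributed]"


def resolve_actor(named_entity=None, role_label=None, document_sender=None):
    """Pick the first available attribution, in order of specificity."""
    for candidate in (named_entity, role_label, document_sender):
        if candidate and candidate.strip():
            return candidate.strip()
    return UNATTRIBUTED
```

With the table's own examples, `resolve_actor("Trude")` gives "Trude" and `resolve_actor(role_label="Barnevernet")` gives "Barnevernet".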
Engines
Both engines return the same JSON schema — the post-processor handles them identically. Engine choice affects speed, quality, and credit cost only.
| Engine | Model | Latency | Best for |
|---|---|---|---|
| Azure gpt-4o-mini ★ | gpt-4o-mini (Azure West Europe) | ~15 s | Default. Fast, cost-efficient, handles most legal documents well. |
| Azure gpt-4o | gpt-4o (Azure West Europe) | ~45 s | Complex documents, overlapping events, poor-quality or dense source text. |
Live updates & export
Timeline uses Server-Sent Events (SSE) to stream live status messages to the browser as extraction runs. Instead of staring at a spinner for 30–60 seconds, you see "Preparing document…", "Calling gpt-4o-mini…", "Parsing events…" in real time.
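For readers unfamiliar with SSE, the wire format is plain text: optional `event:` and one or more `data:` lines, ended by a blank line, served with the `text/event-stream` content type. The Python sketch below generates frames for the status messages quoted above. The event names `status` and `done` and the JSON payload shape are assumptions; the actual server is not shown in this document.

```python
import json


def sse_frame(data, event=None):
    """Format one Server-Sent Events message."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    # Each line of a multi-line payload needs its own "data:" prefix.
    for line in data.splitlines() or [""]:
        lines.append(f"data: {line}")
    # A blank line terminates the message.
    return "\n".join(lines) + "\n\n"


def status_stream(engine):
    """Yield the status frames in the order a browser would receive them."""
    messages = ("Preparing document…", f"Calling {engine}…", "Parsing events…")
    for msg in messages:
        payload = json.dumps({"message": msg}, ensure_ascii=False)
        yield sse_frame(payload, event="status")
    # Hypothetical final event telling the client to stop listening.
    yield sse_frame(json.dumps({"message": "done"}), event="done")
```

On the browser side, an `EventSource` listener registered for the `status` event would receive each payload as it is flushed.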
Once extraction completes, click Export to Word to download a formatted .docx with every event as a labelled paragraph, source excerpts, and a divider line between events. No third-party DOCX library is used — the file is assembled directly from OOXML via PHP ZipArchive.
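A .docx file is a ZIP archive containing a handful of XML parts, which is why no dedicated library is required. The sketch below shows the same idea in Python with the standard `zipfile` module standing in for PHP's ZipArchive. It writes the three parts a minimal WordprocessingML package needs; the paragraph layout (bold date, italic excerpt, bordered divider) and the function name `build_docx` are assumptions, not the exporter's actual output.

```python
import io
import zipfile
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships '
    'xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" '
    'Target="word/document.xml"/>'
    '</Relationships>'
)

# Empty paragraph with a bottom border, used as the divider between events.
DIVIDER = ('<w:p><w:pPr><w:pBdr><w:bottom w:val="single" w:sz="6" '
           'w:space="1" w:color="auto"/></w:pBdr></w:pPr></w:p>')


def build_docx(events):
    """Return the bytes of a minimal .docx with one block per event."""
    body = []
    for ev in events:
        body.append(
            '<w:p><w:r><w:rPr><w:b/></w:rPr>'
            f'<w:t xml:space="preserve">{escape(ev["date"])}: </w:t></w:r>'
            f'<w:r><w:t>{escape(ev["description"])}</w:t></w:r></w:p>'
        )
        body.append(
            '<w:p><w:r><w:rPr><w:i/></w:rPr>'
            f'<w:t>{escape(ev["source_excerpt"])}</w:t></w:r></w:p>'
        )
        body.append(DIVIDER)
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        + "".join(body) + '</w:body></w:document>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("[Content_Types].xml", CONTENT_TYPES)
        z.writestr("_rels/.rels", RELS)
        z.writestr("word/document.xml", document)
    return buf.getvalue()
```

Escaping every text value before it goes into the XML matters here, since source excerpts from legal documents routinely contain `&`, `<`, and `>`.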
Privacy & security
Privacy by design
Azure OpenAI (gpt-4o, gpt-4o-mini) is configured in the West Europe region. Data processed via Azure OpenAI is not used for model training under the default enterprise agreement.
Free for Do Better Norge members. All engines are available to every member.