Technical Showcase · How the AI reads time

How Timeline knows when things happened.

A full walkthrough of the 3-pass extraction pipeline, Norwegian date format recognition, event classification schema, live SSE streaming, and one-click Word export.

12+ date formats
5 event types
3 pipeline passes
2 engine options

Architecture

Three passes. Each with a distinct job.

The pipeline is intentionally sequential: Pass 1 is rule-based and near-instant, Pass 2 is the LLM extraction that also classifies and scores each event, and Pass 3 filters, sorts, and assembles the output.

Pass 1 · PHP / regex

Detect & normalise known formats

A deterministic pattern-matching pass runs before any LLM call. It scans the full input for dates matching 12+ Norwegian formats and normalises them to ISO 8601:

  • dd.mm.yyyy → YYYY-MM-DD
  • d. månedsnavn yyyy → resolved calendar date
  • Diary-format lines (starting with a date + colon) → auto-tagged as events
  • Two-digit years → always interpreted as 20YY

Normalised anchors are injected into the LLM prompt to reduce hallucinated or misread dates.
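The pre-pass described above can be sketched roughly as follows. This is a minimal illustration in Python rather than the PHP the pipeline actually uses, it covers only three of the formats, and the names `find_anchors`, `NUMERIC`, and `WRITTEN` are hypothetical.

```python
import re
from datetime import date

# Norwegian month names mapped to month numbers.
MONTHS = {
    "januar": 1, "februar": 2, "mars": 3, "april": 4, "mai": 5, "juni": 6,
    "juli": 7, "august": 8, "september": 9, "oktober": 10,
    "november": 11, "desember": 12,
}

# dd.mm.yyyy and dd.mm.yy (hypothetical patterns, not the production regexes)
NUMERIC = re.compile(r"\b(\d{1,2})\.(\d{1,2})\.(\d{4}|\d{2})\b")
# d. månedsnavn yyyy
WRITTEN = re.compile(
    r"\b(\d{1,2})\.\s*(" + "|".join(MONTHS) + r")\s+(\d{4})\b", re.IGNORECASE
)


def to_iso(day, month, year):
    """Return an ISO 8601 string, or None if the calendar date does not exist."""
    try:
        return date(year, month, day).isoformat()
    except ValueError:
        return None


def find_anchors(text):
    """Return (verbatim match, ISO date) pairs for the supported formats."""
    anchors = []
    for m in NUMERIC.finditer(text):
        raw_year = m.group(3)
        # Two-digit years are always read as 20YY, as stated above.
        year = int(raw_year) if len(raw_year) == 4 else 2000 + int(raw_year)
        iso = to_iso(int(m.group(1)), int(m.group(2)), year)
        if iso:
            anchors.append((m.group(0), iso))
    for m in WRITTEN.finditer(text):
        iso = to_iso(int(m.group(1)), MONTHS[m.group(2).lower()], int(m.group(3)))
        if iso:
            anchors.append((m.group(0), iso))
    return anchors
```

Each pair keeps the verbatim match next to its normalised form, which is one plausible shape for the anchors passed to the LLM prompt. Impossible dates such as 31.02.2024 are dropped rather than guessed at.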

Pass 2 · gpt-4o-mini / gpt-4o

Extract, classify & score

The LLM reads the full document alongside the pre-pass anchors. For every temporal reference it returns a structured JSON event object:

  • date: resolved ISO date, or verbatim string if unresolvable
  • date_type: absolute | relative | recurring | conditional | period
  • confidence: high | medium | low
  • actor: attributed entity (from source text, not inferred)
  • description: one-sentence event summary
  • source_excerpt: verbatim text fragment (max 200 chars)

The prompt explicitly instructs the model not to invent dates or actors that are not present in the source. Temperature is set to 0.1 to keep the output as close to deterministic as possible.
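One way to enforce this schema on the model's response is to validate every returned object before it reaches the post-processor. The sketch below is an assumption about how that could look, written in Python rather than the pipeline's PHP. `TimelineEvent` and `parse_event` are hypothetical names, and the check that the excerpt occurs verbatim in the source is an added safeguard, not something the page says the tool does.

```python
from dataclasses import dataclass, fields

DATE_TYPES = {"absolute", "relative", "recurring", "conditional", "period"}
CONFIDENCES = {"high", "medium", "low"}
MAX_EXCERPT_CHARS = 200


@dataclass
class TimelineEvent:
    date: str
    date_type: str
    confidence: str
    actor: str
    description: str
    source_excerpt: str


FIELDS = tuple(f.name for f in fields(TimelineEvent))


def parse_event(raw, source_text):
    """Validate one decoded JSON object and return it as a TimelineEvent."""
    missing = [name for name in FIELDS if name not in raw]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    event = TimelineEvent(**{name: str(raw[name]) for name in FIELDS})
    if event.date_type not in DATE_TYPES:
        raise ValueError(f"unknown date_type: {event.date_type}")
    if event.confidence not in CONFIDENCES:
        raise ValueError(f"unknown confidence: {event.confidence}")
    if len(event.source_excerpt) > MAX_EXCERPT_CHARS:
        raise ValueError("source_excerpt exceeds 200 characters")
    # Assumed extra guard: the excerpt must appear verbatim in the source.
    if event.source_excerpt not in source_text:
        raise ValueError("source_excerpt not found in source text")
    return event
```

Rejecting an event whose excerpt cannot be found in the document is a cheap complement to the prompt instruction, since it catches invented passages regardless of how the model was prompted.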

Pass 3 · PHP post-processor

Filter, sort & assemble

PHP applies all active filters before returning the result:

  • Focus filter — strips events not matching the requested focus mode (deadlines / hearings / CPS)
  • Confidence filter — removes LOW-confidence events if requested
  • Background filter — strips background/narrative events if unchecked
  • Date-type filter — strips relative/recurring events if unchecked

The post-processor then assembles the what_remains_uncertain list and the next_practical_step recommendation.
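The filter stage can be pictured as a single pass over the event list followed by a sort. The following Python sketch is illustrative only (the real post-processor is PHP). The event keys `focus_tags` and `is_background` are hypothetical, since the schema above does not say how focus mode or background status is recorded, and the sort order for unresolved dates is likewise an assumption.

```python
import re

ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def post_process(events, *, focus=None, drop_low=False,
                 keep_background=True, keep_relative_recurring=True):
    """Apply the four filters, then sort resolved dates chronologically."""
    kept = []
    for event in events:
        # Focus filter: hypothetical `focus_tags` list on each event.
        if focus and focus not in event.get("focus_tags", []):
            continue
        # Confidence filter.
        if drop_low and event["confidence"] == "low":
            continue
        # Background filter: hypothetical `is_background` flag.
        if not keep_background and event.get("is_background", False):
            continue
        # Date-type filter.
        if not keep_relative_recurring and event["date_type"] in ("relative", "recurring"):
            continue
        kept.append(event)
    # Resolved ISO dates first in date order; unresolved dates keep their
    # original relative order at the end (list.sort is stable).
    kept.sort(key=lambda e: (0, e["date"]) if ISO_DATE.match(e["date"]) else (1, ""))
    return kept
```

Because ISO 8601 dates sort correctly as plain strings, no date parsing is needed at this stage.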

Date recognition

12+ Norwegian date formats, all recognised.

Norwegian legal documents use a wide variety of date notations. The Pass 1 pre-pass recognises all of these deterministically; the LLM handles the rest in Pass 2.

Format | Example | Notes
dd.mm.yyyy | 30.07.2015 | Standard Norwegian numeric
dd.mm.yy | 09.04.25 | Two-digit year → always 20YY
d. månedsnavn yyyy | 3. mars 2024 | Written month in bokmål/nynorsk
d. månedsnavn | 15. januar | Year inferred by proximity scanning
yyyy-mm-dd | 2024-03-12 | ISO 8601
månedsnavn yyyy | mars 2024 | Month + year only
yyyy | 2024 | Year-only reference
Season + year | høsten 2023 (autumn 2023) | Seasonal reference → Q3/Q4
Diary-format line | 18.09.2025: Møte avholdt (Meeting held) | Date + colon → auto-tagged as event
Relative reference | tre uker etter vedtaket (three weeks after the decision) | Anchored to nearest resolved event
Recurring pattern | hver mandag (every Monday) | Classified as recurring
Period / range | fra mars til juni 2024 (from March to June 2024) | Yields start_date + end_date
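Two of the rows above lend themselves to a short illustration: diary-format lines and year inference for dates written without a year. The Python sketch below is a guess at how these could work, not the tool's actual logic. In particular, "proximity scanning" is interpreted here as "take the four-digit year closest to the date in the text", and `diary_event` and `infer_year` are hypothetical names.

```python
import re
from datetime import date

DIARY_LINE = re.compile(r"^\s*(\d{1,2})\.(\d{1,2})\.(\d{4})\s*:\s*(.+)$")
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def diary_event(line):
    """Return (ISO date, event text) for a 'dd.mm.yyyy: text' line, else None."""
    m = DIARY_LINE.match(line)
    if not m:
        return None
    try:
        iso = date(int(m.group(3)), int(m.group(2)), int(m.group(1))).isoformat()
    except ValueError:
        return None  # not a real calendar date
    return iso, m.group(4).strip()


def infer_year(text, position):
    """Return the four-digit year mentioned closest to `position`, or None."""
    best = None
    for m in YEAR.finditer(text):
        distance = min(abs(m.start() - position), abs(m.end() - position))
        if best is None or distance < best[0]:
            best = (distance, int(m.group(0)))
    return best[1] if best else None
```

An inferred year is a natural candidate for medium rather than high confidence, since the year is not stated alongside the date itself.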

Classification schema

Five event types. Three confidence levels.

date_type values

date_type | Definition | Example
absolute | A specific, resolvable calendar date | 30.07.2015 → 2015-07-30
relative | A date expressed relative to another event | tre uker etter vedtaket (three weeks after the decision)
recurring | A pattern that repeats on a schedule | each Monday, every 6 months
conditional | A date contingent on a condition being met | if no response within 14 days
period | A date range or duration with start and end | fra mars til juni 2024 (from March to June 2024)

confidence levels

confidence | Meaning | Visual in timeline
high | Date is explicitly and unambiguously stated in the source text | Green badge
medium | Date is inferred, approximate, or stated with slight ambiguity | Amber badge
low | Date is implied, undated, or extracted from a degraded/ambiguous passage | Grey badge

Actor attribution rules

Rule | Example
Named entity in the same sentence | “Trude [saksbehandler] ringte 14. mars” (“Trude [caseworker] called on 14 March”) → actor: Trude
Role label without a name | “Barnevernet fattet vedtak” (“The child welfare service made a decision”) → actor: Barnevernet
No clear attribution in sentence | actor: [unattributed]
Document-level default | If no per-event actor, defaults to the document sender/issuing body

Engines

Two engines, one structured output.

Both engines return the same JSON schema — the post-processor handles them identically. Engine choice affects speed, quality, and credit cost only.

Engine | Model | Latency | Best for
Azure gpt-4o-mini ★ | gpt-4o-mini (Azure West Europe) | ~15 s | Default. Fast, cost-efficient, handles most legal documents well.
Azure gpt-4o | gpt-4o (Azure West Europe) | ~45 s | Complex documents, overlapping events, poor-quality or dense source text.

Live updates & export

See progress as it happens. Download in Word.

SSE + DOCX

SSE streaming + DOCX export

Timeline uses Server-Sent Events (SSE) to stream live status messages to the browser as extraction runs. Instead of staring at a spinner for 30–60 seconds, you see "Preparing document…", "Calling gpt-4o-mini…", "Parsing events…" in real time.
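On the wire, each status update is a small text frame in the standard `text/event-stream` format: optional `event:` line, one or more `data:` lines, then a blank line. The sketch below shows that framing in Python. The event name `status` and the function names are hypothetical, and the actual server is PHP.

```python
def sse_message(data, event=None):
    """Format one Server-Sent Events frame, terminated by a blank line."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    # Multi-line payloads need one `data:` line per line of text.
    for line in str(data).splitlines() or [""]:
        lines.append(f"data: {line}")
    return "\n".join(lines) + "\n\n"


def status_stream(engine):
    """Yield the progress frames a client would see during one extraction."""
    for status in ("Preparing document…", f"Calling {engine}…", "Parsing events…"):
        yield sse_message(status, event="status")
```

In the browser, an `EventSource` listening for the `status` event would receive each message as soon as the server flushes it, which is what makes the progress text appear in real time.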

Once extraction completes, click Export to Word to download a formatted .docx with every event as a labelled paragraph, source excerpts, and a divider line between events. No third-party DOCX library is used — the file is assembled directly from OOXML via PHP ZipArchive.
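A .docx file is a ZIP archive containing a handful of XML parts, which is why no DOCX library is required. The sketch below builds a minimal package with Python's `zipfile` in place of PHP ZipArchive. It writes one paragraph for each event's date and description and one for its excerpt, and omits the labels, styling, and divider lines of the real export. `build_docx` and `paragraph` are hypothetical names.

```python
import io
import zipfile
from xml.sax.saxutils import escape

XML_DECL = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Declares the content type of each part in the package.
CONTENT_TYPES = (
    XML_DECL
    + '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    "</Types>"
)

# Points the package at its main document part.
RELS = (
    XML_DECL
    + '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    "</Relationships>"
)


def paragraph(text):
    """Wrap escaped text in a single WordprocessingML paragraph."""
    return f'<w:p><w:r><w:t xml:space="preserve">{escape(text)}</w:t></w:r></w:p>'


def build_docx(events):
    """Return the bytes of a minimal .docx with two paragraphs per event."""
    body = "".join(
        paragraph(f'{e["date"]}: {e["description"]}') + paragraph(e["source_excerpt"])
        for e in events
    )
    document = XML_DECL + f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", CONTENT_TYPES)
        archive.writestr("_rels/.rels", RELS)
        archive.writestr("word/document.xml", document)
    return buffer.getvalue()
```

Escaping the text before embedding it matters here, because legal text routinely contains characters such as `&` and `<` that would otherwise produce an invalid document part.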

Server-Sent Events · OOXML / .docx · ZipArchive · live progress · Save to My Docs

Privacy & security

Your documents never leave your session.

Privacy by design

  • All uploaded files are extracted to text in memory using PHP's in-process file handlers. The raw binary is never written to disk on the server.
  • Session context (pasted text, uploaded content, extracted timeline events) is scoped to your authenticated session and discarded when the session ends.
  • Azure OpenAI (gpt-4o, gpt-4o-mini) is deployed in the West Europe region. Data processed via Azure OpenAI is not used for model training under the default enterprise agreement.
  • Azure OpenAI is called only for the extraction pass. No document content is retained by Azure after the response is returned, per the enterprise data-handling agreement.
  • Telemetry logged: tool name, engine, focus mode, event count, latency. No document text, case references, actor names, or extracted events are logged.
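One simple way to guarantee the last point is to build telemetry records from an explicit allowlist, so that document content cannot reach the log even if it is present in the request context. The Python sketch below illustrates the idea. The field names are hypothetical stand-ins for the five values listed above, and nothing here is taken from the actual logging code.

```python
# Only these keys may ever be written to the telemetry log (hypothetical names).
ALLOWED_FIELDS = ("tool", "engine", "focus_mode", "event_count", "latency_ms")


def telemetry_record(context):
    """Copy allowlisted keys only; everything else is silently dropped."""
    return {key: context[key] for key in ALLOWED_FIELDS if key in context}
```

An allowlist fails safe: a new field added to the request context stays out of the log until someone deliberately adds it to `ALLOWED_FIELDS`.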

See it work on your case.

Free for Do Better Norge members. All engines available to every member.
