Hackathon submission · Track 3 — Multimodal Geospatial Workloads

Technical Report

Architecture, model usage, dataset, reproducibility

Project: Helion · Investigative Console
Submission for: Geospatial Video Intelligence Hackathon, Track 3 — Multimodal Geospatial Workloads
Live deploy: https://helion.metisos.co
Source: https://github.com/metisos/helion

This document covers everything DevPost asks for under "Technical documentation": full-pipeline architecture, how Marengo + Pegasus are used and why, repo + setup, dataset documentation, preprocessing, and reproducibility notes. Companion documents: validation-report.md (quantitative metrics) and mission-impact-brief.md (operational value).


1. Architecture — full pipeline

┌──────────────────────────────────────────────────────────────────────────────┐
│                              BROWSER                                          │
│  ┌────────────────────────────────────────────────────────────────────────┐ │
│  │  Marketing landing (/) · Dashboard (/dashboard) · Wizard (/cases/new)  │ │
│  │  Console shell (3 rails): Header nav │ Helion Agent │ content │ tabs   │ │
│  │  Tabs: Overview · Viewer · Officers · Timeline · Map · Statements ·    │ │
│  │        Witness · Policy · Report                                        │ │
│  └────────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────┘
        │ HTTPS via nginx → :4288 (Next.js prod)
        ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                    NEXT.JS 16 APP (App Router, React 19)                      │
│                                                                               │
│  ┌──────────────────────────────────┐    ┌────────────────────────────────┐ │
│  │ Server components (pages)        │    │ Client components              │ │
│  │  • read store + presign URLs     │◄───│  • viewer (multi-angle sync)   │ │
│  │  • compose case-report markdown  │    │  • map (Mapbox GL)             │ │
│  │  • route handlers (/api/*)       │    │  • agent rail (chat + canvas)  │ │
│  └──────────────────────────────────┘    └────────────────────────────────┘ │
│                                                                               │
│  ┌──────────────────────────────────┐    ┌────────────────────────────────┐ │
│  │ lib/store.ts                     │    │ lib/bedrock.ts                 │ │
│  │  Async case store, S3-backed     │    │  invokePegasus (sync)          │ │
│  │  s3://bucket/store/state.json    │    │  startMarengoEmbed (async)     │ │
│  └──────────────┬───────────────────┘    └──────────────┬─────────────────┘ │
└─────────────────┼─────────────────────────────────────────┼──────────────────┘
                  │                                         │
                  ▼                                         ▼
        ┌──────────────────┐                ┌──────────────────────────────┐
        │      AWS S3      │                │       AWS BEDROCK            │
        │  • case state    │                │  • twelvelabs.pegasus-1-2    │
        │  • video uploads │                │       (sync InvokeModel)     │
        │  • Marengo out   │                │  • twelvelabs.marengo-3-0    │
        │  • per-tenant    │                │       (async StartInvoke)    │
        │    presigned URLs│                │                              │
        └──────────────────┘                └──────────────────────────────┘
                                                          │
                                                          ▼
                                              ┌──────────────────────────────┐
                                              │  Mapbox APIs                 │
                                              │  • GL JS (tiles, basemaps)   │
                                              │  • Directions (road snap)    │
                                              │  • Geocoding (new cases)     │
                                              └──────────────────────────────┘

Pipeline path for a fresh upload (input → output):

  1. Wizard (browser) — investigator names case + classification, drag-drops video file(s)
  2. Presigned PUT to S3 — direct browser → S3, no proxy through Next
  3. POST /api/cases/[id]/process — orchestrator kicks off in parallel:
    • Mapbox Geocoding — address → lat/lng (~300 ms)
    • Pegasus Overview — first feed → 3-paragraph synopsis (~10–15 s)
    • Pegasus Timeline per feed in parallel (~30–60 s each, Promise.all)
    • Pegasus Transcribe per feed in parallel (~30–60 s each, Promise.all)
    • Marengo Embed per feed (async invoke, runs in background)
    • Policy template attach — generic UoF doctrine if classification matches
  4. Wizard polls /api/cases/[id]/process/status every 1.5 s until done: true
  5. Open case → /overview — every tab now has data; agent rail is always-on
  6. Investigator question → /api/agent — transcript-hit search routes to a feed → Pegasus answers grounded in the cited line
  7. Generate report → /api/agent/report — server-composed markdown synthesizing all modalities
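The parallel fan-out in step 3 can be sketched as follows. This is an illustrative Python stand-in for the real TypeScript orchestrator (which uses Promise.all); the two task functions are hypothetical placeholders for the actual Bedrock calls.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-feed tasks standing in for the real Pegasus calls.
def pegasus_timeline(feed):
    return f"timeline:{feed}"

def pegasus_transcribe(feed):
    return f"transcript:{feed}"

def process_case(feeds):
    """Fan timeline + transcript extraction out across all feeds at once,
    mirroring the orchestrator's Promise.all over Pegasus invocations.
    pool.map preserves input order, so results line up with `feeds`."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        timelines = list(pool.map(pegasus_timeline, feeds))
        transcripts = list(pool.map(pegasus_transcribe, feeds))
    return {"timelines": timelines, "transcripts": transcripts}
```

The per-feed calls dominate wall-clock (~30–60 s each), so running them concurrently keeps total ingestion near the slowest single feed rather than the sum.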

2. How Marengo and Pegasus are used (and why)

Pegasus 1.2 (sync, video-language model)

Used for: structured event extraction, transcription with speaker labels + key-statement categorization, executive synopsis generation, single-clause policy regrade, and the agent's grounded Q&A.

Why Pegasus and not a generic LLM + Whisper: Pegasus reasons over a video's audio and visual streams simultaneously. Asking "did the officer give a verbal warning?" requires audio comprehension and visual confirmation that the right person said it at the right moment. A pipeline that splits audio (Whisper) from vision (YOLO) loses that joint reasoning; Pegasus preserves it.

Where in the codebase:

  • lib/bedrock.ts — invokePegasus({ prompt, media: { s3Location }, responseSchema })
  • app/api/cases/[id]/process/route.ts — orchestrator runs timeline + transcript + overview in parallel
  • app/api/policy/evaluate/route.ts — single-clause live regrade with structured response schema
  • app/api/agent/route.ts — Q&A; transcript hits injected into prompt as primary evidence

Prompt-engineering decisions:

  • Always pass responseSchema (JSON Schema) when we want structured output. Pegasus respects it; downstream parsing is reliable.
  • For the agent, when transcript hits exist on the routed feed, we inject them into the prompt with [mm:ss] SPEAKER: "text" so Pegasus quotes the real lines instead of re-inferring from audio.
  • temperature=0 for extraction tasks (timeline, transcription, policy grade); temperature=0.2 for narrative tasks (overview, agent answers).
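The transcript-hit injection described above can be sketched as a small formatter. This is illustrative: the `hits` shape here (a list of `{sec, speaker, text}` dicts) is an assumption, not the store's actual schema.

```python
def format_hits(hits):
    """Render transcript hits as prompt evidence lines in the
    [mm:ss] SPEAKER: "text" shape, so the model quotes real lines
    instead of re-inferring from audio."""
    lines = []
    for h in hits:
        m, s = divmod(int(h["sec"]), 60)
        lines.append(f'[{m:02d}:{s:02d}] {h["speaker"].upper()}: "{h["text"]}"')
    return "\n".join(lines)
```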

Marengo 3.0 (async, multimodal embeddings)

Used for: asset-level visual embeddings (one 512-dim vector per video) and clip-level visual embeddings (every ~2 s) for cross-feed retrieval.

Why Marengo and not text-only embeddings (e.g. Cohere, OpenAI): Marengo embeds the video itself, not just the caption. The asset-level vector represents the content of the entire feed; the clip-level vectors let us locate a moment. Text-only embeddings can't do either without first running an expensive video-to-text pipeline.

Where in the codebase:

  • scripts/embed-houston.py — fans out async StartAsyncInvoke jobs for the 7 Houston feeds
  • lib/bedrock.ts — startMarengoEmbed + getMarengoStatus helpers
  • lib/embeddings.ts — local cosine similarity over pre-pulled asset-level vectors
  • data/houston-embeddings.json — bundled asset-level (1 vector per feed; ~70 KB)
  • data/houston-embeddings-clips.json — clip-level (~84 vectors per feed; ~750 KB; server-only)
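The local similarity pass in lib/embeddings.ts amounts to plain cosine ranking over the bundled asset-level vectors. A Python equivalent (illustrative, not the TypeScript source):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def rank_feeds(query_vec, feed_vecs):
    """Rank feeds by cosine similarity to a query vector, the same
    local pass run over the bundled asset-level Marengo vectors.
    `feed_vecs` maps feed name -> vector."""
    scored = [(name, cosine(query_vec, v)) for name, v in feed_vecs.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)
```

At 7 feeds × 512 dims this runs in microseconds, which is why the vectors can be bundled and searched locally with no vector database.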

Honest limitation: Marengo text embedding on Bedrock is async-only (~30 s wall-clock per query). For the agent's interactive Q&A this would add unacceptable latency, so we route via transcript-hit + keyword search instead. The Marengo vectors are still computed and bundled; the moment Bedrock ships a sync text-embedding endpoint, we can plug that retrieval straight into the agent.
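The interim transcript-hit routing can be sketched like this. The stop-word list and scoring are illustrative, not the production logic; `transcripts` maps feed name to utterance strings.

```python
def route_question(question, transcripts):
    """Pick the feed whose transcript shares the most keywords with
    the question -- a stand-in for the transcript-hit routing used
    while Marengo text-embed remains async-only."""
    stop = {"did", "the", "a", "an", "was", "were", "anyone", "to", "of"}
    terms = {w for w in question.lower().replace("?", "").split() if w not in stop}
    best, best_score = None, -1
    for feed, lines in transcripts.items():
        text = " ".join(lines).lower()
        score = sum(1 for t in terms if t in text)
        if score > best_score:
            best, best_score = feed, score
    return best
```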


3. GitHub repository, README, setup

Source: https://github.com/metisos/helion

git clone https://github.com/metisos/helion.git
cd helion
npm install
cp .env.example .env.local       # fill AWS + Mapbox creds
npm run dev                       # http://localhost:3000 (or 3001 if busy)

Required env vars (in .env.local for dev, in deploy environment for prod):

AWS_DEFAULT_REGION=us-east-1
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
SENTINELVIEW_S3_BUCKET=...
NEXT_PUBLIC_MAPBOX_ACCESS_TOKEN=pk....

IAM policy minimums for the AWS keys:

  • bedrock:InvokeModel, bedrock:StartAsyncInvoke, bedrock:GetAsyncInvoke on the two TwelveLabs model ARNs
  • s3:GetObject, s3:PutObject, s3:DeleteObject, s3:ListBucket on the SentinelView bucket
  • s3:PutBucketCORS if you want to update CORS programmatically
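Assembled into a policy document, those minimums might look like the following sketch. The bucket name is a placeholder and the model-ARN wildcard is an assumption; check the exact TwelveLabs model ARNs for your region and account.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["bedrock:InvokeModel", "bedrock:StartAsyncInvoke", "bedrock:GetAsyncInvoke"],
      "Resource": "arn:aws:bedrock:us-east-1::foundation-model/twelvelabs.*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::YOUR_BUCKET/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::YOUR_BUCKET"
    }
  ]
}
```

Note the object-level actions attach to the `/*` resource while ListBucket attaches to the bucket ARN itself; mixing these up is the most common cause of AccessDenied here.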

Package scripts (package.json):

  • npm run dev — Turbopack dev server
  • npm run build — production build
  • npm run start — start production server (used by helion-frontend.service on the self-host)

Production deploy: self-hosted on a long-lived Node process behind nginx + Let's Encrypt, with a GitHub-webhook-driven push-to-deploy. See docs/self-host-plan.md for the full setup or deploy/webhook.py + the systemd units in /etc/systemd/system/helion-{frontend,deploy}.service.


4. Dataset documentation

Hero demo case: HPD officer-involved shooting at 600 W Mt Houston Rd, 9/10/2022.

Source videos: 7 publicly released body-cam and dashcam files from the Houston Police Department, ~3 min each. They live in S3 at s3://sentinelview-959025079752/cases/CASE-HOU-2022-MTHOUSTON/*.mp4. Local copies sit in videos/ (gitignored — too large for the repo).

Filename                 Source        Officer / camera
Video1.mp4               HPD release   Officer Ready BWC (the discharging officer)
Dashcam-Video2.mp4       HPD release   Patrol dashcam (auto-activated with emergency lights)
OfcEngland-Video3.mp4    HPD release   Officer England BWC
ofcMunoz-Video4.mp4      HPD release   Officer Munoz BWC #1
OfcDuron-Video5.mp4      HPD release   Officer Duron BWC
OfcServise-Video6.mp4    HPD release   Officer Service BWC
OfficerMonoz-Video7.mp4  HPD release   Officer Munoz BWC #2 (post-incident)

Ground-truth document: the HPD public notice issued 9/12/2022 (/public/houston-public-notice.md). Used as the canonical narrative for validation-report §2 (recall against the 9 documented events).

Bundled bake-out artifacts (committed in the repo so the demo always works without re-running Bedrock):

  • data/houston-timelines.json — Pegasus event extraction (63 events, 7 feeds)
  • data/houston-transcripts.json — Pegasus transcription (33 utterances, 26 key statements)
  • data/houston-embeddings.json — Marengo asset-level vectors (7 × 512 dim)
  • data/houston-embeddings-clips.json — Marengo clip-level vectors (server-only, ~750 KB)
  • data/houston-policies.json — 9 hand-seeded HPD GO 600-17 findings calibrated against the public notice

5. Preprocessing steps (one-time, to reproduce the bake-out)

The Houston seeds were generated by four Python scripts. To re-run from scratch:

# 0. Set AWS + S3 creds
source .env.local

# 1. Upload videos to S3
python3 scripts/upload-houston.py
# Writes scripts/houston-uploads.json (S3 URIs per filename)

# 2. Marengo async embeddings (kicks off jobs; pull when complete)
python3 scripts/embed-houston.py
# Writes scripts/houston-embeds.json (invocationArns)
# Wait ~5 min for jobs to finish, then run a small pull script to collect
# output.json files into data/houston-embeddings.json + clips.json

# 3. Pegasus timeline extraction (sync, parallel)
python3 scripts/pegasus-houston.py
# Writes scripts/houston-timelines.json then copy to data/

# 4. Pegasus transcription (sync, parallel via concurrent.futures)
python3 scripts/transcribe-houston.py
# Writes data/houston-transcripts.json directly

Total wall-clock: ~10 minutes for all four passes.
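The "small pull script" referenced in step 2 is not committed. One minimal sketch, assuming each async job leaves a `<feed>/output.json` containing an `embedding` array (the real Marengo output layout may differ):

```python
import json
from pathlib import Path

def merge_embeddings(out_dir, dest):
    """Collect per-feed Marengo output.json files into one bundle
    keyed by feed name, then write it where the app expects the
    baked-out vectors. Layout assumptions noted above."""
    bundle = {}
    for f in sorted(Path(out_dir).glob("*/output.json")):
        bundle[f.parent.name] = json.loads(f.read_text())["embedding"]
    Path(dest).write_text(json.dumps(bundle))
    return bundle
```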

Calibration step (manual): the incidentStartSec offsets in lib/seed.ts were hand-calibrated against the HPD public-notice narrative so the multi-angle viewer aligns the shots-fired moment across all 7 feeds. Pegasus timestamps drift 5–15 s on noisy BWC audio; trusting them directly produces a misaligned reconstruction. Future versions will read GPS metadata from the video files for automatic alignment.
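Once calibrated, the offsets drive a simple mapping from the shared incident clock to per-feed playback positions. A sketch (function name and the clamp-at-zero behavior are illustrative, not the viewer's exact code):

```python
def aligned_positions(incident_clock_sec, offsets):
    """Map a shared incident-clock time to per-feed playback positions.
    `offsets` maps feed -> incidentStartSec, the hand-calibrated second
    at which the anchor moment appears in that feed. Positions before 0
    (feed not yet recording) are clamped here."""
    return {feed: max(0.0, start + incident_clock_sec)
            for feed, start in offsets.items()}
```

Because every feed shares one clock, scrubbing to "shots fired" on one angle lands every other angle on the same moment.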


6. Reproducibility notes

Determinism:

  • Pegasus calls use temperature=0 for extraction tasks (timeline, transcription, policy grade) and temperature=0.2 for narrative (overview, agent). At T=0, repeated calls on the same video produce ~98% identical structured output.
  • The data/agent-cache/ and data/policy-cache/ directories cache Pegasus responses by prompt-hash so demo replays are instant and do not re-burn Bedrock.
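The prompt-hash caching can be sketched as follows (illustrative; the real cache lives in TypeScript, and `call` here stands in for the actual Pegasus invocation):

```python
import hashlib
import json
from pathlib import Path

def cached_call(cache_dir, prompt, call):
    """Cache a model response under the SHA-256 of its prompt, so
    replaying a demo question hits disk instead of re-burning Bedrock."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    path = Path(cache_dir) / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())
    resp = call(prompt)
    path.write_text(json.dumps(resp))
    return resp
```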

Caveats:

  • Pegasus 1.2 is non-deterministic at the token level (sampling jitter even at T=0). Structured-response-schema outputs converge to the same JSON; free-text outputs (agent answers, narratives) vary slightly word-by-word.
  • AWS STS tokens expire (12–36 h). Local development with STS keys breaks once the token expires; production deploys should use long-lived IAM users.

To reproduce the validation report's metrics (§5 of validation-report.md):

# Live policy re-evaluation across all 9 clauses (~73 s wall-clock)
for id in auth-necessity auth-proportional deadly-imminence \
          deadly-deescalation deadly-warning conduct-intervene \
          conduct-aid report-bwc-activation report-notification; do
  curl -X POST https://helion.metisos.co/api/policy/evaluate \
    -H 'content-type: application/json' \
    -d "{\"findingId\":\"$id\"}"
done

# Agent retrieval coverage across the 6 demo questions
for q in "Did anyone call out shots fired?" \
         "Was the suspect armed?" \
         "Did the officer give a verbal warning before discharging?" \
         "Was anyone asking for medical aid?" \
         "What did Officer Munoz observe?" \
         "Did anyone announce their presence as police?"; do
  curl -X POST https://helion.metisos.co/api/agent \
    -H 'content-type: application/json' \
    -d "{\"question\":\"$q\"}"
done

7. Project documentation index

Doc                                   Purpose
README.md                             Top-level project intro + Track 3 alignment table
ROSETTA.md                            Full project spec: architecture, conventions, entry points, gotchas
.rosetta/modules/agent.md             Agent + report templates + Canvas
.rosetta/modules/viewer.md            Multi-angle viewer + incident-clock cascade
.rosetta/modules/process-pipeline.md  New-case ingestion orchestrator
.rosetta/modules/policy.md            HPD GO 600-17 grading + report composition
.rosetta/modules/bedrock.md           Pegasus + Marengo wrappers + S3 grounding
docs/validation-report.md             Quantitative metrics + baseline comparison + cost benchmarks
docs/mission-impact-brief.md          One-page operational value summary
docs/self-host-plan.md                nginx + systemd + webhook deploy pattern
docs/track3-gap-analysis.md           Track 3 criteria checklist (planning artifact)