Project: Helion · Investigative Console
Submission for: Geospatial Video Intelligence Hackathon, Track 3 — Multimodal Geospatial Workloads
Live deploy: https://helion.metisos.co
Source: https://github.com/metisos/helion
This document covers everything DevPost asks for under "Technical documentation": full-pipeline architecture, how Marengo + Pegasus are used and why, repo + setup, dataset documentation, preprocessing, and reproducibility notes. Companion documents: validation-report.md (quantitative metrics) and mission-impact-brief.md (operational value).
1. Architecture — full pipeline
```
┌──────────────────────────────────────────────────────────────────────────────┐
│                                   BROWSER                                    │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │ Marketing landing (/) · Dashboard (/dashboard) · Wizard (/cases/new)   │  │
│  │ Console shell (3 rails): Header nav │ Helion Agent │ content + tabs    │  │
│  │ Tabs: Overview · Viewer · Officers · Timeline · Map · Statements ·     │  │
│  │       Witness · Policy · Report                                        │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────────────┘
                        │ HTTPS via nginx → :4288 (Next.js prod)
                        ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                    NEXT.JS 16 APP (App Router, React 19)                     │
│                                                                              │
│  ┌──────────────────────────────────┐    ┌────────────────────────────────┐  │
│  │ Server components (pages)        │    │ Client components              │  │
│  │ • read store + presign URLs      │◄───│ • viewer (multi-angle sync)    │  │
│  │ • compose case-report markdown   │    │ • map (Mapbox GL)              │  │
│  │ • route handlers (/api/*)        │    │ • agent rail (chat + canvas)   │  │
│  └──────────────────────────────────┘    └────────────────────────────────┘  │
│                                                                              │
│  ┌──────────────────────────────────┐    ┌────────────────────────────────┐  │
│  │ lib/store.ts                     │    │ lib/bedrock.ts                 │  │
│  │ Async case store, S3-backed      │    │ invokePegasus (sync)           │  │
│  │ s3://bucket/store/state.json     │    │ startMarengoEmbed (async)      │  │
│  └──────────────┬───────────────────┘    └──────────────┬─────────────────┘  │
└─────────────────┼─────────────────────────────────────────┼──────────────────┘
                  │                                         │
                  ▼                                         ▼
        ┌──────────────────┐               ┌──────────────────────────────┐
        │ AWS S3           │               │ AWS BEDROCK                  │
        │ • case state     │               │ • twelvelabs.pegasus-1-2     │
        │ • video uploads  │               │   (sync InvokeModel)         │
        │ • Marengo out    │               │ • twelvelabs.marengo-3-0     │
        │ • per-tenant     │               │   (async StartInvoke)        │
        │   presigned URLs │               │                              │
        └──────────────────┘               └──────────────────────────────┘
                  │
                  ▼
        ┌──────────────────────────────┐
        │ Mapbox APIs                  │
        │ • GL JS (tiles, basemaps)    │
        │ • Directions (road snap)     │
        │ • Geocoding (new cases)      │
        └──────────────────────────────┘
```
Pipeline path for a fresh upload (input → output):
1. Wizard (browser) — investigator names the case + classification, drag-drops video file(s).
2. Presigned PUT to S3 — direct browser → S3, no proxy through Next.
3. POST `/api/cases/[id]/process` — the orchestrator kicks off in parallel (sketched below):
   - Mapbox Geocoding — address → lat/lng (~300 ms)
   - Pegasus Overview — first feed → 3-paragraph synopsis (~10–15 s)
   - Pegasus Timeline — per feed in parallel (~30–60 s each, `Promise.all`)
   - Pegasus Transcribe — per feed in parallel (~30–60 s each, `Promise.all`)
   - Marengo Embed — per feed (async invoke, runs in the background)
   - Policy template attach — generic UoF doctrine if the classification matches
4. Wizard polls `/api/cases/[id]/process/status` every 1.5 s until `done: true`.
5. Open case → `/overview` — every tab now has data; the agent rail is always on.
6. Investigator question → `/api/agent` — transcript-hit search routes to a feed → Pegasus answers grounded in the cited line.
7. Generate report → `/api/agent/report` — server-composed markdown synthesizing all modalities.
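A condensed sketch of that step-3 fan-out, assuming the route-handler shape of `app/api/cases/[id]/process/route.ts`; only `invokePegasus` and `startMarengoEmbed` are the repo's real exports — `geocodeAddress`, the `*_PROMPT` constants, and the `@/` import alias are illustrative stand-ins:

```ts
import { invokePegasus, startMarengoEmbed } from "@/lib/bedrock";

// Hypothetical stand-ins for the real helpers and prompts.
declare function geocodeAddress(addr: string): Promise<{ lat: number; lng: number }>;
const OVERVIEW_PROMPT = "…";   // 3-paragraph synopsis
const TIMELINE_PROMPT = "…";   // structured event extraction
const TRANSCRIBE_PROMPT = "…"; // speaker-labeled transcription

export async function processCase(feeds: { s3Uri: string }[], address: string) {
  const [geo, overview, timelines, transcripts] = await Promise.all([
    geocodeAddress(address),                     // Mapbox Geocoding, ~300 ms
    invokePegasus({ prompt: OVERVIEW_PROMPT, media: { s3Location: feeds[0].s3Uri } }),
    Promise.all(feeds.map((f) =>                 // one Pegasus call per feed
      invokePegasus({ prompt: TIMELINE_PROMPT, media: { s3Location: f.s3Uri } }))),
    Promise.all(feeds.map((f) =>
      invokePegasus({ prompt: TRANSCRIBE_PROMPT, media: { s3Location: f.s3Uri } }))),
  ]);
  // Marengo is fire-and-forget: StartAsyncInvoke returns immediately and the
  // embedding job completes in the background; status is polled separately.
  await Promise.all(feeds.map((f) => startMarengoEmbed(f.s3Uri)));
  return { geo, overview, timelines, transcripts };
}
```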
2. How Marengo and Pegasus are used (and why)
Pegasus 1.2 (sync, video-language model)
Used for: structured event extraction, transcription with speaker labels + key-statement categorization, executive synopsis generation, single-clause policy regrade, and the agent's grounded Q&A.
Why Pegasus and not a generic LLM + Whisper: Pegasus reasons over a video's audio and visuals jointly. Answering "did the officer give a verbal warning" requires audio comprehension plus visual confirmation that the right person said it at the right moment. A pipeline that splits audio (Whisper) from vision (YOLO) loses that joint reasoning; Pegasus preserves it.
Where in the codebase:
- `lib/bedrock.ts` — `invokePegasus({ prompt, media: { s3Location }, responseSchema })`
- `app/api/cases/[id]/process/route.ts` — orchestrator runs timeline + transcript + overview in parallel
- `app/api/policy/evaluate/route.ts` — single-clause live regrade with a structured response schema
- `app/api/agent/route.ts` — Q&A; transcript hits injected into the prompt as primary evidence
Prompt-engineering decisions:
- Always pass `responseSchema` (JSON Schema) when we want structured output. Pegasus respects it, so downstream parsing is reliable (see the sketch below).
- For the agent, when transcript hits exist on the routed feed, we inject them into the prompt as `[mm:ss] SPEAKER: "text"` so Pegasus quotes the real lines instead of re-inferring from audio.
- `temperature=0` for extraction tasks (timeline, transcription, policy grade); `temperature=0.2` for narrative tasks (overview, agent answers).
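For concreteness, a minimal structured-extraction call through the wrapper; the schema shown is a trimmed illustration, not the repo's actual timeline schema:

```ts
import { invokePegasus } from "@/lib/bedrock";

// Trimmed, illustrative JSON Schema — the real one carries more fields.
const timelineSchema = {
  type: "object",
  properties: {
    events: {
      type: "array",
      items: {
        type: "object",
        properties: {
          startSec: { type: "number" },     // seconds into the feed
          description: { type: "string" },
        },
        required: ["startSec", "description"],
      },
    },
  },
  required: ["events"],
};

const result = await invokePegasus({
  prompt: "List every discrete event in this feed with a start time in seconds.",
  media: { s3Location: "s3://<bucket>/cases/<case-id>/Video1.mp4" },
  responseSchema: timelineSchema, // Pegasus returns JSON conforming to this
});
```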
Marengo 3.0 (async, multimodal embeddings)
Used for: asset-level visual embeddings (one 512-dim vector per video) and clip-level visual embeddings (every ~2 s) for cross-feed retrieval.
Why Marengo and not text-only embeddings (e.g. Cohere, OpenAI): Marengo embeds the video itself, not just the caption. The asset-level vector represents the content of the entire feed; the clip-level vectors let us locate a moment. Text-only embeddings can't do either without first running an expensive video-to-text pipeline.
Where in the codebase:
- `scripts/embed-houston.py` — fans out async `StartAsyncInvoke` jobs for the 7 Houston feeds
- `lib/bedrock.ts` — `startMarengoEmbed` + `getMarengoStatus` helpers
- `lib/embeddings.ts` — local cosine similarity over pre-pulled asset-level vectors (sketched below)
- `data/houston-embeddings.json` — bundled asset-level vectors (1 per feed; ~70 KB)
- `data/houston-embeddings-clips.json` — clip-level vectors (~84 per feed; ~750 KB; server-only)
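The retrieval step is plain cosine similarity over the bundled vectors; a minimal sketch (type and function names are illustrative, not `lib/embeddings.ts`'s actual exports):

```ts
type FeedEmbedding = { feedId: string; vector: number[] }; // 512-dim Marengo vector

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank all feeds against a query vector, best match first.
function rankFeeds(query: number[], feeds: FeedEmbedding[]): FeedEmbedding[] {
  return [...feeds].sort((x, y) => cosine(query, y.vector) - cosine(query, x.vector));
}
```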
Honest limitation: Marengo text-embedding on Bedrock is async-only (~30 s wall-clock per query). For the agent's interactive Q&A, this would add unacceptable latency, so we route via transcript-hit + keyword search instead. The Marengo vectors are still computed and bundled; the moment Bedrock ships sync text-embed, we plug retrieval into the agent.
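The interim routing path is simple enough to sketch; assuming utterance records shaped like the bundled transcripts (field names illustrative), the agent scores transcript lines against the question's keywords and routes to the best-scoring feed:

```ts
type Utterance = { feedId: string; tSec: number; speaker: string; text: string };

// Score each transcript line by keyword overlap with the question, route to
// the feed holding the best hits, and return those lines as grounded evidence.
function routeQuestion(question: string, transcript: Utterance[]) {
  const terms = question.toLowerCase().split(/\W+/).filter((t) => t.length > 3);
  const scored = transcript
    .map((u) => ({ u, score: terms.filter((t) => u.text.toLowerCase().includes(t)).length }))
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score);
  return { feedId: scored[0]?.u.feedId, hits: scored.slice(0, 5).map((s) => s.u) };
}
```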
3. GitHub repository, README, setup
Source: https://github.com/metisos/helion
```bash
git clone https://github.com/metisos/helion.git
cd helion
npm install
cp .env.example .env.local   # fill AWS + Mapbox creds
npm run dev                  # http://localhost:3000 (or 3001 if busy)
```
Required env vars (in .env.local for dev, in deploy environment for prod):
```bash
AWS_DEFAULT_REGION=us-east-1
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
SENTINELVIEW_S3_BUCKET=...
NEXT_PUBLIC_MAPBOX_ACCESS_TOKEN=pk....
```
IAM policy minimums for the AWS keys:
- `bedrock:InvokeModel`, `bedrock:StartAsyncInvoke`, `bedrock:GetAsyncInvoke` on the two TwelveLabs model ARNs
- `s3:GetObject`, `s3:PutObject`, `s3:DeleteObject`, `s3:ListBucket` on the SentinelView bucket
- `s3:PutBucketCORS` if you want to update CORS programmatically
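A minimal policy sketch; the bucket name and model ARNs are placeholders — copy the exact ARNs from the Bedrock console:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["bedrock:InvokeModel", "bedrock:StartAsyncInvoke", "bedrock:GetAsyncInvoke"],
      "Resource": ["<pegasus-model-arn>", "<marengo-model-arn>"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::<your-bucket>/*"
    },
    { "Effect": "Allow", "Action": "s3:ListBucket", "Resource": "arn:aws:s3:::<your-bucket>" }
  ]
}
```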
Package scripts (package.json):
- `npm run dev` — Turbopack dev server
- `npm run build` — production build
- `npm run start` — start production server (used by `helion-frontend.service` on the self-host)
Production deploy: self-hosted on a long-lived Node process behind nginx + Let's Encrypt, with a GitHub-webhook-driven push-to-deploy. See docs/self-host-plan.md for the full setup or deploy/webhook.py + the systemd units in /etc/systemd/system/helion-{frontend,deploy}.service.
4. Dataset documentation
Hero demo case: HPD officer-involved shooting at 600 W Mt Houston Rd, 9/10/2022.
Source video: 7 publicly released body-cam and dashcam files from the Houston Police Department, ~3 min each. They live in S3 at s3://sentinelview-959025079752/cases/CASE-HOU-2022-MTHOUSTON/*.mp4. Local copies at videos/ (gitignored — too large for the repo).
| Filename | Source | Officer / camera |
|---|---|---|
| `Video1.mp4` | HPD release | Officer Ready BWC (the discharging officer) |
| `Dashcam-Video2.mp4` | HPD release | Patrol dashcam (auto-activated with emergency lights) |
| `OfcEngland-Video3.mp4` | HPD release | Officer England BWC |
| `ofcMunoz-Video4.mp4` | HPD release | Officer Munoz BWC #1 |
| `OfcDuron-Video5.mp4` | HPD release | Officer Duron BWC |
| `OfcServise-Video6.mp4` | HPD release | Officer Service BWC |
| `OfficerMonoz-Video7.mp4` | HPD release | Officer Munoz BWC #2 (post-incident) |
Ground-truth document: the HPD public notice issued 9/12/2022 (/public/houston-public-notice.md). Used as the canonical narrative for validation-report §2 (recall against the 9 documented events).
Bundled bake-out artifacts (committed in the repo so the demo always works without re-running Bedrock):
- `data/houston-timelines.json` — Pegasus event extraction (63 events, 7 feeds)
- `data/houston-transcripts.json` — Pegasus transcription (33 utterances, 26 key statements)
- `data/houston-embeddings.json` — Marengo asset-level vectors (7 × 512 dim)
- `data/houston-embeddings-clips.json` — Marengo clip-level vectors (server-only, ~750 KB)
- `data/houston-policies.json` — 9 hand-seeded HPD GO 600-17 findings calibrated against the public notice
5. Preprocessing steps (one-time, to reproduce the bake-out)
The Houston seeds were generated by four Python scripts. To re-run from scratch:
```bash
# 0. Export AWS + S3 creds into the environment (a plain `source` won't export)
set -a; source .env.local; set +a

# 1. Upload videos to S3
python3 scripts/upload-houston.py
# Writes scripts/houston-uploads.json (S3 URIs per filename)

# 2. Marengo async embeddings (kicks off jobs; pull when complete)
python3 scripts/embed-houston.py
# Writes scripts/houston-embeds.json (invocationArns)
# Wait ~5 min for jobs to finish, then run a small pull script to collect
# output.json files into data/houston-embeddings.json + clips.json

# 3. Pegasus timeline extraction (sync, parallel)
python3 scripts/pegasus-houston.py
# Writes scripts/houston-timelines.json, then copy to data/

# 4. Pegasus transcription (sync, parallel via concurrent.futures)
python3 scripts/transcribe-houston.py
# Writes data/houston-transcripts.json directly
```
Total wall-clock: ~10 minutes for all four passes.
Calibration step (manual): the incidentStartSec offsets in lib/seed.ts were hand-calibrated against the HPD public-notice narrative so the multi-angle viewer aligns the shots-fired moment across all 7 feeds. Pegasus timestamps drift 5–15 s on noisy BWC audio; trusting them directly produces a misaligned reconstruction. Future versions will read GPS metadata from the video files for automatic alignment.
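In viewer terms, the cascade is just a per-feed offset applied to a shared clock; a sketch, with made-up offsets rather than the calibrated values in `lib/seed.ts`:

```ts
// incidentStartSec: the hand-calibrated second at which the shared incident
// clock hits zero inside each recording. Values here are illustrative.
const offsets: Record<string, number> = {
  "Video1.mp4": 42,
  "Dashcam-Video2.mp4": 17,
};

// Seek every feed's <video> element to the same real-world moment.
function seekAllTo(incidentClockSec: number, videos: Map<string, HTMLVideoElement>) {
  for (const [feed, el] of videos) {
    el.currentTime = offsets[feed] + incidentClockSec;
  }
}
```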
6. Reproducibility notes
Determinism:
- Pegasus calls use `temperature=0` for extraction tasks (timeline, transcription, policy grade) and `temperature=0.2` for narrative (overview, agent). At T=0, repeated calls on the same video produce ~98% identical structured output.
- The `data/agent-cache/` and `data/policy-cache/` directories cache Pegasus responses by prompt-hash, so demo replays are instant and do not re-burn Bedrock (sketched below).
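The cache layer is a simple prompt-hash lookup; a minimal sketch (the helper name and exact on-disk layout are illustrative):

```ts
import { createHash } from "node:crypto";
import { existsSync, readFileSync, writeFileSync } from "node:fs";

// Key each Pegasus response by a hash of the full request, so identical demo
// prompts are served from disk instead of re-hitting Bedrock.
async function cachedInvoke(req: object, invoke: () => Promise<string>): Promise<string> {
  const key = createHash("sha256").update(JSON.stringify(req)).digest("hex");
  const path = `data/agent-cache/${key}.json`;
  if (existsSync(path)) return readFileSync(path, "utf8"); // cache hit
  const out = await invoke();                              // cache miss: call Bedrock
  writeFileSync(path, out);
  return out;
}
```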
Caveats:
- Pegasus 1.2 is non-deterministic at the token level (sampling jitter even at T=0). Structured-response-schema outputs converge to the same JSON; free-text outputs (agent answers, narratives) vary slightly word-by-word.
- AWS STS tokens expire (12–36 h). Local development with STS keys breaks once the token expires; production deploys should use long-lived IAM users.
To reproduce the validation report's metrics (§5 of validation-report.md):
```bash
# Live policy re-evaluation across all 9 clauses (~73 s wall-clock)
for id in auth-necessity auth-proportional deadly-imminence \
          deadly-deescalation deadly-warning conduct-intervene \
          conduct-aid report-bwc-activation report-notification; do
  curl -X POST https://helion.metisos.co/api/policy/evaluate \
    -H 'content-type: application/json' \
    -d "{\"findingId\":\"$id\"}"
done

# Agent retrieval coverage across the 6 demo questions
for q in "Did anyone call out shots fired?" \
         "Was the suspect armed?" \
         "Did the officer give a verbal warning before discharging?" \
         "Was anyone asking for medical aid?" \
         "What did Officer Munoz observe?" \
         "Did anyone announce their presence as police?"; do
  curl -X POST https://helion.metisos.co/api/agent \
    -H 'content-type: application/json' \
    -d "{\"question\":\"$q\"}"
done
```
7. Project documentation index
| Doc | Purpose |
|---|---|
| `README.md` | Top-level project intro + Track 3 alignment table |
| `ROSETTA.md` | Full project spec: architecture, conventions, entry points, gotchas |
| `.rosetta/modules/agent.md` | Agent + report templates + Canvas |
| `.rosetta/modules/viewer.md` | Multi-angle viewer + incident-clock cascade |
| `.rosetta/modules/process-pipeline.md` | New-case ingestion orchestrator |
| `.rosetta/modules/policy.md` | HPD GO 600-17 grading + report composition |
| `.rosetta/modules/bedrock.md` | Pegasus + Marengo wrappers + S3 grounding |
| `docs/validation-report.md` | Quantitative metrics + baseline comparison + cost benchmarks |
| `docs/mission-impact-brief.md` | One-page operational value summary |
| `docs/self-host-plan.md` | nginx + systemd + webhook deploy pattern |
| `docs/track3-gap-analysis.md` | Track 3 criteria checklist (planning artifact) |