QUANTARA • QUANTUM-RESISTANT L1

Postmortem template for Quantara Devnet-0

A structured, blameless template for capturing what happened, why it happened, and how we make sure it does not happen again.



Use this template for every Devnet-0 incident that meets the criteria under "When to write a postmortem". The goal is simple: learn quickly, share clearly, and avoid repeating the same mistakes as we grow into public testnet and mainnet.

A good postmortem is not about blame. It's a factual story of what happened, why it happened, how we fixed it, and what we're changing so it's much less likely to happen again.

This page gives you a reusable outline for Devnet-0 incidents — from short, single-node issues to full network events. It pairs with the Incident Response and Rollback & Recovery docs, which focus on what to do during an incident. This template focuses on what we do after.

As we move toward public testnet and mainnet, we'll keep refining this format — but the core sections (summary, impact, timeline, root cause, fix, follow-ups) will stay the same.

Devnet-0 • Postmortem template • QTR • 12 decimals • SS58=73

Postmortems are blameless by design. We focus on systems, code, and processes — not on punishing individuals. If a postmortem feels unsafe to write, we fix that culture bug first.

When to write a postmortem

• Any Sev-1 incident on Devnet-0.
• Most Sev-2 incidents (especially recurring ones).
• Repeated Sev-3 issues with similar causes.
• Any event where we decide to roll back or pin versions.
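The criteria above can be encoded as a small decision helper, for example in an incident bot. This is a minimal sketch under stated assumptions: the function name and the three return values are hypothetical, and "Most Sev-2 incidents" is simplified to "recurring Sev-2 is required, other Sev-2 is recommended", which still leaves room for team judgment.

```python
from enum import Enum


class Severity(Enum):
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3


def postmortem_needed(
    severity: Severity,
    recurring: bool = False,
    rolled_back_or_pinned: bool = False,
) -> str:
    """Return "required", "recommended", or "optional" (hypothetical encoding)."""
    # Any rollback or version pin, and every Sev-1, always gets a postmortem.
    if rolled_back_or_pinned or severity is Severity.SEV1:
        return "required"
    if severity is Severity.SEV2:
        # Recurring Sev-2 incidents are treated as mandatory; others as a strong default.
        return "required" if recurring else "recommended"
    # Sev-3 qualifies only when it repeats with a similar cause.
    return "required" if recurring else "optional"
```

For instance, a one-off Sev-2 returns "recommended", while a single Sev-3 that triggered a rollback returns "required".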

Network: Devnet-0. See /status for canonical incident listings and links to finalized postmortems.

1 • Overview

High-level template structure

Every Quantara postmortem follows the same skeleton, so readers know where to look for each piece of information.

1.1 — Header

• Title and short summary.
• Date and time window.
• Severity and status (detected / mitigated / resolved / in follow-up).
• Primary owner and reviewers.

1.2 — Core sections

• Impact.
• Timeline of events.
• Root cause analysis.
• Remediation & recovery.

1.3 — Follow-ups

• Action items with owners and due dates.
• Longer-term projects / tech debt.
• Links to dashboards, logs, and patches.
• Lessons learned and communication summary.
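If you want to stamp out an empty skeleton rather than copy it by hand, a short script can render the outline above. This is a sketch: the plain-text layout and the `render_skeleton` name are assumptions, not an existing Quantara tool; only the section and field names come from 1.1 to 1.3.

```python
from typing import List, Tuple

# Sections and fields follow 1.1 to 1.3; the plain-text layout is an assumption.
SKELETON: List[Tuple[str, List[str]]] = [
    ("Header", [
        "Title and short summary",
        "Date and time window (UTC)",
        "Severity and status",
        "Primary owner and reviewers",
    ]),
    ("Impact", []),
    ("Timeline of events", []),
    ("Root cause analysis", []),
    ("Remediation & recovery", []),
    ("Follow-ups", [
        "Action items with owners and due dates",
        "Longer-term projects / tech debt",
        "Links to dashboards, logs, and patches",
        "Lessons learned and communication summary",
    ]),
]


def render_skeleton(title: str) -> str:
    """Return an empty, numbered postmortem outline ready to paste into a doc."""
    lines = [f"Postmortem: {title}", ""]
    for number, (section, fields) in enumerate(SKELETON, start=1):
        lines.append(f"{number} • {section}")
        # Sections without predefined fields get a single free-text placeholder.
        for name in fields or ["Notes"]:
            lines.append(f"• {name}: TODO")
        lines.append("")
    return "\n".join(lines)
```

Keeping the section list in one place means the generated outline stays in step with this page if the skeleton is refined later.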

2 • Header & summary

Capture the essentials in the first screen

Someone scanning the top of a postmortem should immediately understand what happened, how bad it was, and whether it’s resolved.

2.1 — Recommended header block

• Title: one-line description of the incident.
• Date: UTC start and end time.
• Severity: Sev-1 / Sev-2 / Sev-3.
• Status: detected / mitigated / resolved / in follow-up.
• Owners: primary incident commander + reviewers.
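One way to keep header blocks consistent is to capture them as a typed record that rejects malformed values early. The sketch below assumes Python 3.8+; the class, field, and enum names are hypothetical, and only the severity levels, status values, and "UTC start and end time" rule come from the list above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import List, Optional


class Status(Enum):
    DETECTED = "detected"
    MITIGATED = "mitigated"
    RESOLVED = "resolved"
    IN_FOLLOW_UP = "in follow-up"


SEVERITIES = ("Sev-1", "Sev-2", "Sev-3")
TS_FORMAT = "%Y-%m-%d %H:%M UTC"


@dataclass
class PostmortemHeader:
    title: str
    start: datetime
    end: Optional[datetime]  # None while the incident is still ongoing
    severity: str
    status: Status
    incident_commander: str
    reviewers: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")
        for ts in (self.start, self.end):
            # Naive datetimes have no offset (utcoffset() is None), so they are rejected too.
            if ts is not None and ts.utcoffset() != timedelta(0):
                raise ValueError("timestamps must be timezone-aware UTC")
        if self.end is not None and self.end < self.start:
            raise ValueError("end time precedes start time")

    def render(self) -> str:
        """Render the header block as plain text, one field per line."""
        end = self.end.strftime(TS_FORMAT) if self.end else "ongoing"
        reviewers = ", ".join(self.reviewers) or "TBD"
        return "\n".join([
            f"Title: {self.title}",
            f"Date: {self.start.strftime(TS_FORMAT)} to {end}",
            f"Severity: {self.severity}",
            f"Status: {self.status.value}",
            f"Owners: {self.incident_commander} (IC); reviewers: {reviewers}",
        ])
```

Rejecting naive timestamps is a deliberate choice: it forces every author to state times in UTC, which keeps headers and timelines comparable across postmortems.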

2.2 — Executive summary (3–6 sentences)

• What broke and how users were affected.
• High-level root cause.
• How we fixed or mitigated it.
• What we’re doing to prevent recurrence.

This is often the only part busy readers see — write it last, but make it crisp.

3 • Impact

Describe who was affected and how

Impact should be factual and specific. Avoid hand-waving: use numbers (durations, percentages, block ranges) wherever you can.

3.1 — Impact questions

• Which networks were affected (Devnet-0, localnets, etc.)?
• Which components (validators, RPCs, explorers, wallet)?
• How long were they degraded or unavailable?
• Were there any data integrity risks?

3.2 — Example phrasing

• “Block production stalled for 17 minutes.”
• “~60% of validators were unable to maintain peers.”
• “Public RPC latency spiked above 10s for 40 minutes.”
• “No evidence of finalized state corruption was found.”
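Numbers like those above are easiest to get right when computed directly from timestamps and counts instead of estimated by eye. A minimal sketch, assuming timestamps are already in UTC and written as `YYYY-MM-DD HH:MM`; the helper names and the sample counts in the usage note are hypothetical.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M"  # assumed UTC, matching the timeline format


def minutes_between(start: str, end: str) -> int:
    """Whole minutes between two UTC timestamps (handles day rollover)."""
    delta = datetime.strptime(end, TS_FORMAT) - datetime.strptime(start, TS_FORMAT)
    return int(delta.total_seconds() // 60)


def percent(affected: int, total: int) -> int:
    """Share of `total` that was affected, rounded to a whole percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * affected / total)
```

For example, `minutes_between("2025-09-12 14:20", "2025-09-12 14:37")` gives 17, and `percent(12, 20)` gives 60, which map directly onto phrasing like "Block production stalled for 17 minutes" and "~60% of validators were unable to maintain peers."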

3.3 — Links & evidence

• Grafana dashboards and snapshots.
• Explorer views or block ranges.
• Key log excerpts (with timestamps and node IDs).
• Any user-facing status page updates.

4 • Timeline

Build a precise, timestamped story

The timeline is the backbone of a good postmortem. It shows what happened, when, and how quickly we reacted.

4.1 — Format

• Use UTC timestamps and a consistent format.
• Include actor (who/what), action, and outcome.
• Highlight key decision points and missteps.
• Keep entries short — one or two sentences each.

4.2 — Example snippet

2025-09-12 14:03 UTC — Alerts fire for stalled finality on Devnet-0.
2025-09-12 14:06 UTC — Validators report low peer counts in #validators.
2025-09-12 14:11 UTC — Incident commander declared; Sev-1 assigned.
2025-09-12 14:24 UTC — Suspect runtime upgrade X as trigger; rollback plan drafted.
2025-09-12 14:37 UTC — Rollback executed; blocks resume and finalize normally.
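Because every entry follows the same `timestamp UTC <dash> text` shape, a timeline can be checked mechanically for ordering and mined for response times. This is a sketch, not an existing tool: it assumes the exact format of the snippet above (with the separator written as `\u2014` in code), and the function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List

SEPARATOR = " \u2014 "  # the dash separator used in the example entries
TS_FORMAT = "%Y-%m-%d %H:%M UTC"


@dataclass
class TimelineEntry:
    at: datetime
    text: str


def parse_timeline(raw: str) -> List[TimelineEntry]:
    """Parse timeline lines and reject entries that are out of chronological order."""
    entries = []
    for line in raw.strip().splitlines():
        stamp, sep, text = line.partition(SEPARATOR)
        if not sep:
            raise ValueError(f"missing separator: {line!r}")
        at = datetime.strptime(stamp.strip(), TS_FORMAT).replace(tzinfo=timezone.utc)
        entries.append(TimelineEntry(at, text.strip()))
    for prev, cur in zip(entries, entries[1:]):
        if cur.at < prev.at:
            raise ValueError(f"out of order: {cur.text!r}")
    return entries


def minutes_from_first(entries: List[TimelineEntry], keyword: str) -> int:
    """Minutes from the first entry to the first entry containing `keyword` (case-insensitive)."""
    for entry in entries:
        if keyword.lower() in entry.text.lower():
            return int((entry.at - entries[0].at).total_seconds() // 60)
    raise KeyError(keyword)
```

Applied to the example snippet, `minutes_from_first(entries, "commander declared")` is 8 and `minutes_from_first(entries, "rollback executed")` is 34: eight minutes from first alert to an incident commander, and 34 minutes to mitigation. Matching on "rollback executed" rather than "rollback" matters here, since the 14:24 entry also mentions a rollback plan.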

If the timeline reveals confusion or slow reaction, that's valuable signal. Capture it honestly — future drills and training will be built around those gaps.

5 • Root cause & remediation

Explain why it happened and how we fixed it

Root cause is about systems, not people. Focus on mechanisms, assumptions, and missing safeguards.

5.1 — Root cause analysis

• Immediate trigger (what broke first).
• Contributing factors (config, code, infra, human).
• Why existing safeguards didn't prevent it.
• Any known similar incidents in the past.

5.2 — Fix & verification

• Steps taken to mitigate or resolve the issue.
• How we confirmed the fix worked (metrics, tests).
• Any temporary workarounds still in place.
• Links to patches, PRs, or configuration changes.
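"Confirmed with metrics" is stronger when the check is written down explicitly. The sketch below assumes you can export per-sample best and finalized block heights for a node (for example from the dashboards linked in the Impact section); the `Sample` fields, the `max_lag` threshold, and the function name are hypothetical and should be adapted to whatever your monitoring actually exposes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    unix_ts: int           # sample time in seconds
    best_height: int       # latest block seen by the node
    finalized_height: int  # latest finalized block


def finality_recovered(samples: List[Sample], max_lag: int) -> bool:
    """True if finalized height never regressed, advanced overall, and lag stayed within max_lag."""
    if len(samples) < 2:
        return False  # one sample cannot show progress
    ordered = sorted(samples, key=lambda s: s.unix_ts)
    never_regressed = all(
        b.finalized_height >= a.finalized_height for a, b in zip(ordered, ordered[1:])
    )
    advanced = ordered[-1].finalized_height > ordered[0].finalized_height
    lag_ok = all(s.best_height - s.finalized_height <= max_lag for s in ordered)
    return never_regressed and advanced and lag_ok
```

Recording the exact window and threshold you used alongside the result makes the "fix worked" claim reproducible by reviewers.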

5.3 — Action items

• Short list of follow-up tasks.
• Each with owner, priority, and target date.
• Flags for tasks that must land before testnet/mainnet.
• Link to tracking tickets where relevant.
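Action items are only useful if someone revisits them. A minimal sketch of a triage helper, assuming items are tracked with the fields listed above; the `P0` to `P2` priority scale, the class and function names, and the "blocks testnet" flag encoding are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class ActionItem:
    title: str
    owner: str
    priority: str                 # hypothetical scale, e.g. "P0" to "P2"
    due: date
    blocks_testnet: bool = False  # must land before public testnet/mainnet
    done: bool = False

    def __post_init__(self) -> None:
        if not self.owner.strip():
            raise ValueError(f"action item {self.title!r} has no owner")


def triage(items: List[ActionItem], today: date) -> Dict[str, List[str]]:
    """List open items that are overdue or block testnet, earliest due date first."""
    open_items = sorted((i for i in items if not i.done), key=lambda i: i.due)
    return {
        "overdue": [i.title for i in open_items if i.due < today],
        "blocking_testnet": [i.title for i in open_items if i.blocks_testnet],
    }
```

Requiring an owner at construction time mirrors the rule that every action item has one; an unowned item fails immediately instead of silently drifting.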

Next steps

Treat every Devnet-0 incident as a rehearsal

By writing disciplined postmortems now, we make mainnet incidents rarer, shorter, and far less surprising.

Copy this template into your preferred doc system for every qualifying incident. Keep the Incident Response, Rollback & Recovery, and Security checklist docs close at hand; they're the operational backdrop to every postmortem.

The operators and builders who consistently write clear, honest postmortems on Devnet-0 are the ones we'll trust most as Quantara grows into public testnet and mainnet.