Quantara • Devnet-0

QUANTARA • QUANTUM-RESISTANT L1

Incident response on Quantara Devnet-0

How we classify incidents, communicate with validators, and move from detection to containment, recovery, and postmortem.

A shared playbook for validators, infra operators, and builders when something looks wrong on Devnet-0 — from the first alert to the final postmortem.

Devnet-0 is where we discover and fix problems early. Incidents are expected. What matters is that we detect them quickly, communicate clearly, and recover in a controlled way.

This document defines how Quantara classifies incidents, how we expect validators and operators to respond, and how this playbook fits with the Rollback & Recovery doc and the Security checklist.

Treat this as the operational counterpart to the Devnet-0 overview and Validator runbook — those explain the steady state; this one covers “when things go sideways.”

Devnet-0 • Incident response • QTR • 12 decimals • SS58=73

When in doubt, assume it might be a real incident. Capture evidence, check /status, and share what you're seeing — even if it later turns out to be benign.

Current network: Devnet-0

Devnet-0 is a non-financial rehearsal network. We intentionally push new runtimes, node versions, and tooling to find issues before public testnet and mainnet.

• RPC: wss://rpc.devnet-0.quantara.xyz

• Explorer: https://explorer.devnet-0.quantara.xyz

Last updated: 2025-11-23 22:00 UTC. The Status page is the canonical source for live incident state, maintenance windows, and advisories.

1 • Classification

How we classify incidents on Devnet-0

A shared severity language keeps everyone aligned on urgency and expectations — even on a devnet.

Sev-1 — Network-wide impact

  • Finality stalled or blocks not being produced.
  • Majority of validators unable to stay in sync.
  • Critical consensus or runtime bug suspected.
  • Requires coordinated response from core team.

Sev-2 — Partial degradation

  • Some validators / RPCs degraded, but chain healthy.
  • Performance issues (high latency, resource spikes).
  • Non-critical bugs impacting a subset of operators.
  • Workarounds exist, but we still want a fix.

Sev-3 — Local / cosmetic

  • Single validator down or misconfigured.
  • Explorer or wallet UI glitch with no chain impact.
  • Intermittent RPC errors that self-resolve quickly.
  • Good candidates for GitHub issues and follow-up.
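
As a rough illustration of the three levels above, the sketch below maps a few observed symptoms to a suggested severity. The `Observation` fields and the `suggest_severity` helper are hypothetical names, not part of any Quantara tooling, and the "more than one degraded node means Sev-2" rule is an assumption made for the example. The core team still assigns the final severity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical summary of what an operator currently sees."""
    finality_stalled: bool = False         # finality stalled or blocks not produced
    validators_out_of_sync: int = 0        # validators unable to stay in sync
    validators_total: int = 1              # size of the validator set being observed
    consensus_bug_suspected: bool = False  # critical consensus or runtime bug suspected
    degraded_nodes: int = 0                # validators / RPCs degraded while chain is healthy


def suggest_severity(obs: Observation) -> str:
    """Return an initial severity suggestion; a starting point, not a verdict."""
    majority_out = obs.validators_out_of_sync * 2 > obs.validators_total
    if obs.finality_stalled or majority_out or obs.consensus_bug_suspected:
        return "Sev-1"
    # Assumption for this sketch: more than one degraded node is "partial degradation".
    if obs.degraded_nodes > 1:
        return "Sev-2"
    return "Sev-3"
```

When unsure between two levels, suggesting the higher one is consistent with the "assume it might be a real incident" guidance earlier on this page.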

2 • First response

First five minutes when you see something weird

Whether you’re a validator or running supporting infra, these are the steps we expect you to take before diving into deep debugging.

2.1 — Confirm the signal

  • Check if /status reports an incident or maintenance.
  • Compare your block height with reference RPC / explorer.
  • Verify that alerts are not just firing from noisy rules.
  • Check a second vantage point (another node, region, or tool).
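
The height comparison above can be sketched as a small helper. It takes your node's height plus heights read from one or more reference sources (for example the reference RPC and the explorer) and reports how far behind you are. The function names and the `max_lag=5` threshold are assumptions for illustration; how you fetch the heights is left out on purpose.

```python
from statistics import median


def height_lag(local_height: int, reference_heights: list[int]) -> int:
    """Blocks by which the local node trails the median reference height (0 if level or ahead)."""
    if not reference_heights:
        raise ValueError("need at least one reference height")
    # The median keeps a single stale reference from skewing the comparison.
    return max(0, int(median(reference_heights)) - local_height)


def signal_confirmed(local_height: int, reference_heights: list[int], max_lag: int = 5) -> bool:
    """True if the lag exceeds max_lag blocks (threshold is an assumed example value)."""
    return height_lag(local_height, reference_heights) > max_lag
```

Using several reference heights is what gives you the "second vantage point": if only one source disagrees with your node, the problem may be with that source.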

2.2 — Capture evidence

  • Note current time, block height, and peer count.
  • Grab a short log snippet around the first error.
  • Capture key metrics (CPU, RAM, disk, network).
  • Save any error messages exactly as they appear.
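
One way to keep this evidence together is a small snapshot record that can be saved as JSON. This is a minimal sketch: the `EvidenceSnapshot` class, its field names, and the metric keys are hypothetical, and the values are whatever you read from your own node and monitoring.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _utc_now() -> str:
    # ISO 8601 timestamp in UTC, second precision.
    return datetime.now(timezone.utc).isoformat(timespec="seconds")


@dataclass
class EvidenceSnapshot:
    block_height: int
    peer_count: int
    log_snippet: str                             # copied verbatim, not paraphrased
    metrics: dict = field(default_factory=dict)  # e.g. CPU, RAM, disk, network readings
    captured_at: str = field(default_factory=_utc_now)

    def to_json(self) -> str:
        """Serialize the snapshot so it can be attached to a report."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

Recording the timestamp in UTC at capture time makes it easier to line up snapshots from operators in different regions.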

2.3 — Share a concise report

  • Post in the validator / ops channel with your findings.
  • Include time, node role (validator / RPC / sentry), and region.
  • Attach logs / metrics or links where safe to do so.
  • Suggest your initial severity (Sev-1/2/3) if you can.

A good first incident message answers: “what changed, when, where, and how bad does it look?” Perfection is not required — clarity is.
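
A report that answers those four questions can be produced from a simple template. The sketch below is illustrative only: `format_report` and its layout are assumptions, not an official Quantara message format.

```python
from typing import Optional

VALID_ROLES = {"validator", "rpc", "sentry"}
VALID_SEVERITIES = {"Sev-1", "Sev-2", "Sev-3"}


def format_report(what: str, when_utc: str, role: str, region: str,
                  severity: Optional[str] = None, evidence: Optional[str] = None) -> str:
    """Build a short first-report message: what, when, where, how bad."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown node role: {role!r}")
    if severity is not None and severity not in VALID_SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    lines = [
        f"What: {what}",
        f"When (UTC): {when_utc}",
        f"Where: {role} node, {region}",
        f"Suggested severity: {severity or 'unsure'}",
    ]
    if evidence:
        # Only attach logs, metrics, or links that are safe to share.
        lines.append(f"Evidence: {evidence}")
    return "\n".join(lines)
```

Severity is optional here because the guidance is to suggest one "if you can"; an honest "unsure" is more useful than a guess.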

3 • Communication

Who says what, where, and when

Even on a devnet, we want predictable communication patterns — both inside Quantara and across the validator set.

3.1 — Channels & sources of truth

  • /status: canonical incident state, timelines, and summaries.
  • Validator / ops channels: real-time coordination and updates.
  • Docs: long-lived guidance, updated post-incident.
  • Social / public feeds: used selectively for larger events.

3.2 — Typical timeline

  1. Detection: alert fires or operator observes an anomaly.
  2. Triage: severity assigned, initial scope determined.
  3. Containment: temporary mitigations applied.
  4. Recovery: permanent fix rolled out and verified.
  5. Review: incident logged and postmortem drafted.
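
If you track incidents in your own tooling, the five phases can be modeled as an ordered enum. This is a sketch under the assumption that phases are visited strictly in order; real incidents may loop back (for example from Recovery to Containment), which this simple helper does not model. `Phase` and `next_phase` are hypothetical names.

```python
from enum import IntEnum


class Phase(IntEnum):
    DETECTION = 1
    TRIAGE = 2
    CONTAINMENT = 3
    RECOVERY = 4
    REVIEW = 5


def next_phase(current: Phase) -> Phase:
    """Return the phase that follows `current` in the typical timeline."""
    if current is Phase.REVIEW:
        raise ValueError("REVIEW is the final phase")
    return Phase(current + 1)
```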

4 • Symptom playbooks

Common incident patterns & first actions

These patterns cover most incidents you’ll see on Devnet-0. Use them as a starting point while we evolve more detailed runbooks.

4.1 — Chain not finalizing / blocks stalled

  • Confirm stall via explorer and multiple RPC endpoints.
  • Check for consensus-related errors in node logs.
  • Verify you're on the canonical chain (hash / spec).
  • Treat as Sev-1 until downgraded by core team.
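
To make "stalled" concrete, the sketch below checks whether the finalized height has stopped advancing across a series of timestamped samples. The `is_stalled` helper and the 60-second window are assumptions for illustration; pick a window that suits the network's actual block and finality timing.

```python
def is_stalled(samples: list[tuple[float, int]], window_s: float = 60.0) -> bool:
    """samples: (unix_time, finalized_height) pairs, oldest first.

    Returns True if the most recent height has been unchanged for at
    least window_s seconds.
    """
    if len(samples) < 2:
        return False  # not enough data to judge
    latest_t, latest_h = samples[-1]
    unchanged_since = latest_t
    # Walk backwards to find when the current height was first observed.
    for t, h in reversed(samples):
        if h != latest_h:
            break
        unchanged_since = t
    return latest_t - unchanged_since >= window_s
```

Running the same check against samples from several RPC endpoints helps distinguish a chain-wide stall from a single lagging endpoint.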

4.2 — Node stuck or constantly behind

  • Compare height with reference nodes and explorer.
  • Check disk, CPU, RAM, and network saturation.
  • Review logs for repeated I/O or DB-related errors.
  • If localized, treat as Sev-2/3 and consider rebuild or snapshot restore.
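
The resource check above can be reduced to comparing utilization readings against limits. In this sketch the metric names and the 90% limits are assumed example values, not Quantara recommendations; feed it whatever your monitoring stack reports.

```python
# Assumed example limits, in percent utilization.
DEFAULT_LIMITS = {"disk_pct": 90.0, "cpu_pct": 90.0, "ram_pct": 90.0, "net_pct": 90.0}


def saturated(metrics: dict[str, float], limits: dict[str, float] = DEFAULT_LIMITS) -> list[str]:
    """Return the names of resources at or above their limit, sorted alphabetically.

    Metrics missing from the input are treated as not saturated.
    """
    return sorted(name for name, limit in limits.items()
                  if metrics.get(name, 0.0) >= limit)
```

An empty result does not rule out a local cause; it only means the problem is probably not plain resource exhaustion, so the logs are the next place to look.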

4.3 — RPC / wallet issues

  • Test multiple methods (health, system, chain RPCs).
  • Determine if issue is specific to one RPC or region.
  • Check for CORS, rate limiting, or TLS errors.
  • Coordinate with other operators to confirm scope.
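
Once several operators have probed several endpoints, the scope question ("one RPC, or one region?") becomes a small grouping problem. The sketch below assumes probe results are collected as a mapping from (endpoint, region) to success; `failing_groups` and the endpoint and region labels in the usage example are hypothetical.

```python
def failing_groups(results: dict[tuple[str, str], bool]) -> dict[str, set[str]]:
    """results maps (endpoint, region) -> True if the probe succeeded.

    Returns the endpoints that failed from every region that probed them,
    and the regions from which every probed endpoint failed.
    """
    def fully_failing(index: int) -> set[str]:
        groups: dict[str, list[bool]] = {}
        for key, ok in results.items():
            groups.setdefault(key[index], []).append(ok)
        # A group is "fully failing" when none of its probes succeeded.
        return {name for name, oks in groups.items() if not any(oks)}

    return {"endpoints": fully_failing(0), "regions": fully_failing(1)}
```

For example, if `rpc-a` fails from both `eu` and `us` while `rpc-b` works from both, the result names `rpc-a` as a failing endpoint and no failing region, which points at that endpoint rather than the network path.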

More detailed flows live in the Rollback & Recovery doc, especially for controlled rollbacks and version pinning.

5 • Rollback & recovery

When we decide to roll back or pin a version

Most incidents resolve forward — but sometimes the safest move is to roll back to a known-good runtime or node binary.

5.1 — When rollback is on the table

  • New runtime causes consensus or finality instability.
  • Node version introduces severe performance regression.
  • Data corruption suspected after a specific upgrade.
  • Recovery forward would be slower / riskier than rollback.

5.2 — Follow the rollback runbook

  • Read the Rollback & Recovery doc before you attempt any coordinated rollback.
  • Never roll back alone if the rest of the network is moving forward.
  • Always record versions, hashes, and timing of rollback steps.
  • After recovery, verify metrics and chain state match expectations.
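
Recording versions, hashes, and timing can be as light as appending one entry per step to a log. This is a minimal sketch: `RollbackLog`, its entry fields, and the version strings you pass in are hypothetical, and the hash is computed over whatever artifact bytes you supply (for example the node binary read from disk).

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class RollbackLog:
    steps: list = field(default_factory=list)

    def record(self, action: str, version: str, artifact: bytes) -> dict:
        """Append one rollback step with its UTC time, version, and artifact hash."""
        entry = {
            "at_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
            "action": action,
            "version": version,
            "sha256": sha256_hex(artifact),
        }
        self.steps.append(entry)
        return entry
```

Keeping the entries in order gives you the timeline for the post-incident record without having to reconstruct it from chat history.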

6 • Post-incident review

Turn every incident into an upgrade

The goal is not zero incidents; it’s zero repeat incidents with the same root cause.

6.1 — Minimum incident record

  • Short description, date, and severity.
  • Impacted components (validators, RPCs, explorers, users).
  • Root cause (once understood) and triggering conditions.
  • Concrete actions taken during response and recovery.
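
The minimum record above lends itself to a simple completeness check before an incident is closed. The field names below are assumptions chosen to mirror the list; since the root cause is only required "once understood", this sketch accepts a placeholder such as "pending" for it.

```python
REQUIRED_FIELDS = (
    "description",
    "date",
    "severity",
    "impacted_components",
    "root_cause",       # a placeholder like "pending" is acceptable early on
    "actions_taken",
)


def missing_fields(record: dict) -> list[str]:
    """Return required fields that are absent or empty, in the order listed above."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]
```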

6.2 — Follow-up actions

  • Update runbooks and checklists with lessons learned.
  • Adjust alerts, dashboards, and thresholds if needed.
  • File or update issues in the relevant code repositories.
  • Share a summary with the validator / ops community.

For longer-lived incidents, we write the postmortem in a shared format based on the Postmortem template.

Next steps

Practice now, so mainnet incidents feel familiar

If Devnet-0 incidents feel routine — not chaotic — you’re in the right place for public testnet and mainnet.

Keep this page close to the Validator runbook, Security checklist and Rollback & Recovery docs. Together they form the core of Quantara's operational handbook for early networks.

The strongest Devnet-0 operators are the ones who treat every incident as a chance to upgrade their systems, tooling, and habits. That mindset is exactly what we're building Quantara with.