Maintenance & SRE · $25–80k/mo retainer

Your on-call team pages a human only when a human matters.

AI sits between the alert and the engineer. Known faults close without a person in the loop. Unknown faults page the right human with diagnostics already gathered. The retainer goes down as the automation rate goes up.

73%

reduction in human pages — ASX-200 retailer, 200 pages a week to 54

8 min

Black Friday CDN fault caught before it reached production — precursor in telemetry, zero human pages

$1.2M

annual run-cost saved at the ASX-200 retail engagement in year one

≤60s

mean time to detect for known fault patterns, versus 4–12 minutes before the triage agents

The structural problem with traditional SRE

Your on-call team is spending 70% of their nights on alerts a machine could close.

Read 12 months of incident logs for a mid-market stack and the same five fault patterns account for the majority of out-of-hours pages. The same alert fires. The same runbook gets pulled. The same engineer types the same commands at 3am. An ASX-200 retailer ran 200 pages a week into a small SRE team. Effektiv read twelve months of their incident logs, extracted the real fix steps from actual resolution data — not from runbooks — and built triage agents around those patterns.

Human pages dropped 73%. Eight minutes before a Black Friday CDN failure would have reached production, the triage agent caught the precursor in telemetry and closed the incident without waking anyone. A vendor priced on seat count has no incentive to reduce the volume of incidents a human handles. Effektiv's retainer is written the opposite way: the bill goes down as the automation rate goes up.

What changes

The same challenge. Two very different outcomes.

Without Effektiv

  • 200 pages a week into a small SRE team
  • Same five fault patterns re-paged every week
  • Engineers type the same commands at 3am, every week
  • Post-mortems in a shared drive nobody re-reads
  • Mean time to detect 4–12 minutes from precursor to alert
  • Vendor priced by seat — no incentive to reduce volume

With Effektiv

  • 27% of pages reach a human — the rest auto-close
  • Triage agents read alerts against 12 months of actual fix data
  • Known faults close behind a rollback gate without a person in the loop
  • Resolution database queryable and extendable by your team
  • Mean time to detect under 60 seconds for known patterns
  • Retainer goes down as the automation rate goes up — incentives aligned

Why incentive alignment matters

Three on-call vendor models.

Dimension | Effektiv agent triage | Vendor priced by seat | Alert-to-Jira automation
Incentive alignment | Bill goes down as automation rises | Bill rises with seats | Per-event pricing
Human-page reduction | 50–70% | 0–10% | 15–25%
Rollback gate per step | Yes, named in Design | None | Manual rollback
Mean time to detect | ≤60s for known patterns | 4–12 minutes | 1–3 minutes

How we deliver

Diagnose. Design. Deliver.

Two weeks of listening before a line of code. The price is fixed at the end of Design — not at kick-off.

Phase 1 · 1–2 weeks

Diagnose

We map your incident log, runbooks, and cost telemetry. We read 12 months of actual incident history — the commands engineers actually ran to resolve each fault, not the runbooks people meant to follow. We identify which patterns are candidates for automation and which need a human in the loop by design.
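As a rough illustration of the Diagnose step, the sketch below ranks fault patterns by how many out-of-hours pages they caused and reports the cumulative share. The record layout (`pattern`, `out_of_hours`), the pattern names, and the sample data are all hypothetical and are not drawn from any engagement.

```python
from collections import Counter

# Hypothetical incident records; field names and values are assumptions.
incidents = [
    {"pattern": "cdn-origin-timeout", "out_of_hours": True},
    {"pattern": "disk-full", "out_of_hours": True},
    {"pattern": "cdn-origin-timeout", "out_of_hours": True},
    {"pattern": "cert-expiry", "out_of_hours": False},
    {"pattern": "disk-full", "out_of_hours": True},
    {"pattern": "cdn-origin-timeout", "out_of_hours": True},
    {"pattern": "queue-backlog", "out_of_hours": True},
]


def rank_patterns(records):
    """Return (pattern, count, cumulative_share) for out-of-hours pages,
    most frequent pattern first."""
    counts = Counter(r["pattern"] for r in records if r["out_of_hours"])
    total = sum(counts.values())
    ranked, running = [], 0
    for pattern, count in counts.most_common():
        running += count
        ranked.append((pattern, count, running / total))
    return ranked


ranking = rank_patterns(incidents)
for pattern, count, share in ranking:
    print(f"{pattern:20s} {count:3d} pages  cumulative {share:.0%}")
```

Patterns near the top of such a ranking would be the natural first candidates for automation; the long tail stays with a human.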

Phase 2 · 1–2 weeks

Design

Triage rig spec, rollback rules, and eval gates. Human-in-the-loop requirements documented. Any fault pattern touching a money write or a record of truth stays gated. All model inference on AWS Bedrock in AU regions, inheriting VPC, IAM, PrivateLink, CloudTrail, and KMS controls.
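The gating rule described above can be sketched as a small routing function. This is a minimal illustration under stated assumptions, not the delivered triage rig: the `Alert` fields, the `KNOWN_PATTERNS` set, and the two return values are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    pattern: str                   # matched fault pattern, or "" if unknown
    touches_money_write: bool      # remediation would write to a payment system
    touches_record_of_truth: bool  # remediation would alter a system of record


# Hypothetical patterns whose remediation was approved in Design.
KNOWN_PATTERNS = {"cdn-origin-timeout", "disk-full"}


def route(alert: Alert) -> str:
    """Return 'auto_close' or 'page_human' for an alert."""
    if alert.pattern not in KNOWN_PATTERNS:
        return "page_human"   # unknown fault: page with diagnostics attached
    if alert.touches_money_write or alert.touches_record_of_truth:
        return "page_human"   # gated by design, even for a known pattern
    return "auto_close"       # known fault: remediate behind the rollback gate


print(route(Alert("disk-full", False, False)))  # auto_close
print(route(Alert("disk-full", True, False)))   # page_human
```

Checking the gate after the pattern match keeps the rule explicit: a pattern being known is never sufficient on its own to skip the human.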

Phase 3 · 4–8 weeks

Deliver

Triage agents are built and tuned in a parallel run alongside your existing on-call process. The switch-over is incremental, not a single cut-over. The outcome contract names the deflect rate and mean-time-to-detect targets, both measured and reported weekly.

What you walk away with

Everything ships to your team at exit. No lock-in.

🛠

Triage agents in production

Trained on 12 months of your incident history. Your repo, your control. Extendable by your team without us.

🧪

Resolution database

Real fix steps from real incidents. Indexed, queryable, and extendable — not a static runbook.

🗄

Eval gates as code

Triage accuracy, mean time to detect, false-positive rate, human-page reduction, incident review completion. Runnable code.

📒

Detection latency board

Mean time to detect tracked weekly against the contract target. Visible to your team and ours.

🎓

On-call handover pack

Roles and protocols documented. Your team extends with new fault patterns without calling us back.

Quality gates

What the eval rig measures.

Every output passes a multi-gate evaluation before it merges or ships. Outputs that fail do not proceed. The eval rig and all gate code are yours at exit.

  • Triage accuracy — correct routing as a percentage of total alerts, threshold agreed in Design
  • Mean time to detect for known fault patterns — target under 60 seconds
  • False-positive rate on AI triage decisions — any drift triggers an eval refresh and a paused-automation period
  • Human-page reduction vs the prior baseline — measured weekly against the contract target
  • Incident review completion rate — AI agent contributes diagnostics on every paged incident
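To make the gates concrete, here is a minimal sketch of how four of them could be computed from one week of figures. The thresholds, the field names, and the definition of a false positive (an auto-close that a reviewer says should have paged) are assumptions for illustration; the real thresholds are agreed in Design. Only the 200 and 54 page counts come from the engagement described on this page. The incident review completion gate is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class WeekStats:
    alerts: int               # total alerts seen by the triage agents
    correctly_routed: int     # routing matched the reviewed outcome
    auto_closed: int          # alerts closed without a human
    wrongly_auto_closed: int  # auto-closes a reviewer says should have paged
    human_pages: int          # pages that reached a human this week
    baseline_pages: int       # weekly human pages before the engagement
    mttd_seconds: float       # mean time to detect for known patterns


# Hypothetical thresholds, not contract values.
THRESHOLDS = {
    "accuracy": 0.95,
    "false_positive": 0.02,
    "page_reduction": 0.50,
    "mttd_seconds": 60.0,
}


def evaluate(s: WeekStats) -> dict[str, bool]:
    """Return PASS (True) or FAIL (False) per gate."""
    accuracy = s.correctly_routed / s.alerts
    false_positive = s.wrongly_auto_closed / s.auto_closed if s.auto_closed else 0.0
    page_reduction = 1 - s.human_pages / s.baseline_pages
    return {
        "accuracy": accuracy >= THRESHOLDS["accuracy"],
        "false_positive": false_positive <= THRESHOLDS["false_positive"],
        "page_reduction": page_reduction >= THRESHOLDS["page_reduction"],
        "mttd_seconds": s.mttd_seconds <= THRESHOLDS["mttd_seconds"],
    }


sample = WeekStats(alerts=200, correctly_routed=194, auto_closed=146,
                   wrongly_auto_closed=2, human_pages=54, baseline_pages=200,
                   mttd_seconds=45.0)
results = evaluate(sample)
for gate, passed in results.items():
    print(f"{gate:16s} {'PASS' if passed else 'FAIL'}")
```

Returning a boolean per gate rather than a single verdict makes it easy to see which gate blocked a release.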

Eval rig · sample run

Triage accuracy: PASS
Mean time to detect for known fault patterns: PASS
False-positive rate on AI triage decisions: PASS
Human-page reduction vs the prior baseline: PASS
Incident review completion rate: PASS

Eval rig source code shipped to your repo at exit.

Sample engagement

An ASX-200 retailer ran a peak-trade stack with a small SRE team and 200 pages a week. Effektiv read twelve months of incident logs, pulled the real fix steps from resolution data, and built triage agents from those patterns over six weeks. Human pages dropped 73%. A CDN precursor was caught eight minutes before it would have reached production on Black Friday. Annual run-cost saved: $1.2M.

Read the full case →

Compliance posture

  • ISO 27001 in progress (Q3 2026)
  • ISO 42001 aligned
  • NIST AI RMF mapped
  • IRAP path Q4 2026

Full governance posture →

The retainer goes down as automation goes up

See what your on-call stack looks like with AI in the alert path.

Show us 12 months of incident logs or your current on-call setup. We diagnose which fault patterns are candidates for automation and price the triage rig on outcomes — your page reduction is the benchmark.