Instrument once.
We handle the rest.

Wrap your workflows with the SDK. Upkeel's platform detects silent failures in every flow you instrument, fires the alerts, escalates to the right people, and shows you exactly what happened, all without touching your application code again.

Start free trial
SDK — Instrumentation

Seconds to install. Nothing else to configure.

The SDK is pure telemetry — it records what happened and when. That's the entire job. No alerting logic, no retry handling, no dashboards to wire up in your application. Install it once and forget it. Everything else happens on our side.

Zero production dependencies; fails silently instead of throwing, so it can't crash your application
Fully async, non-blocking — under 1ms overhead on your hot path
Works in Node.js, edge runtimes, and browser environments
Local dev mode: structured console output, no network calls
Test mode: virtual clock, fully synchronous, no external I/O
terminal
# One command. That's the entire setup.
npm install @upkeel/sdk
payment-flow.ts
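import express from 'express'
import Stripe from 'stripe'
import { upkeel } from '@upkeel/sdk'

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY)
const keel = upkeel.init(process.env.UPKEEL_API_KEY)
const app = express()

// Your existing checkout code, unchanged (order and validateCart are yours)
await validateCart(order)
const intent = await stripe.paymentIntents.create({
  amount: order.total,
  currency: 'usd',
})

// One call: the Stripe webhook must arrive within 30s
keel.expect('payment.succeeded', {
  within: '30s',
  meta: { orderId: order.id, paymentIntentId: intent.id }
})

// In your webhook handler, confirm it
app.post('/webhooks/stripe', express.raw({ type: 'application/json' }), (req, res) => {
  // Verify the signature so only genuine Stripe events are accepted
  const event = stripe.webhooks.constructEvent(
    req.body, req.headers['stripe-signature'], process.env.STRIPE_WEBHOOK_SECRET
  )
  if (event.type === 'payment_intent.succeeded') {
    keel.fulfill('payment.succeeded', { id: event.data.object.id })
  }
  res.sendStatus(200) // acknowledge receipt so Stripe does not retry
})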
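The "fully async, non-blocking" and "fails silently" claims describe a common telemetry pattern. The hot path only appends to an in-memory buffer, and a separate flush step ships batches and swallows network errors. The TypeScript sketch below illustrates that general pattern. It is not the @upkeel/sdk source, and `TelemetryBuffer`, `Sender`, and the `maxQueue`/`batchSize` limits are hypothetical names chosen for the example.

```typescript
// Hypothetical sketch of an in-process telemetry buffer (not the @upkeel/sdk source).
type TelemetryEvent = { name: string; at: number; data?: Record<string, unknown> }
type Sender = (batch: TelemetryEvent[]) => Promise<void>

class TelemetryBuffer {
  private queue: TelemetryEvent[] = []
  private flushing = false
  private readonly send: Sender
  private readonly maxQueue: number
  private readonly batchSize: number
  dropped = 0 // events discarded because the buffer was full

  constructor(send: Sender, maxQueue = 1000, batchSize = 100) {
    this.send = send
    this.maxQueue = maxQueue // caps memory use if the API is unreachable
    this.batchSize = batchSize
  }

  // Hot path: an array push with no I/O and no await.
  record(name: string, data?: Record<string, unknown>): void {
    if (this.queue.length >= this.maxQueue) {
      this.dropped++ // shed load instead of growing without bound
      return
    }
    this.queue.push({ name, at: Date.now(), data })
  }

  get pending(): number {
    return this.queue.length
  }

  // Runs off the hot path (e.g. from a timer). Send errors are swallowed,
  // so a network failure never propagates into application code.
  async flush(): Promise<void> {
    if (this.flushing) return
    this.flushing = true
    try {
      while (this.queue.length > 0) {
        const batch = this.queue.splice(0, this.batchSize)
        try {
          await this.send(batch)
        } catch {
          // Best-effort telemetry: drop this batch and continue.
        }
      }
    } finally {
      this.flushing = false
    }
  }
}
```

In a real client, `flush()` would typically run on a timer and once more at shutdown. Bounding the queue trades a few dropped events for a guarantee that an unreachable API cannot exhaust memory.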
Detection Engine

We watch your expectations so you don't have to.

Every expectation you register runs through our server-side detection pipeline. We check for missing events on multiple schedules — down to the second — using heuristic rules, statistical baselines, and AI-powered pattern analysis. All of it happens on our infrastructure, not yours.

Configurable detection windows from 5 seconds to 30 days
Four check tiers: seconds / minutes / hours / days
Statistical baseline detection catches gradual degradation early
AI layer identifies novel failure patterns automatically
Crash-safe and durable — Postgres-backed, survives any restart
Idempotent event delivery — no duplicate alerts from SDK retries
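Without persistence and scheduling, missed-event detection comes down to deadlines. Each expectation gets a due time, a matching event closes it, and a periodic sweep reports whatever is overdue. Deduplicating on an event ID keeps an SDK retry from counting twice. The following is a simplified in-memory sketch of those ideas. It assumes one expectation per key and a caller-supplied clock. `ExpectationTracker`, `parseDuration`, and the key and event-ID formats are hypothetical, not Upkeel's server code.

```typescript
// Hypothetical in-memory sketch of deadline tracking; Upkeel's real pipeline
// is described as server-side and Postgres-backed.
const UNIT_MS: Record<string, number> = { s: 1000, m: 60000, h: 3600000, d: 86400000 }
const MIN_WINDOW_MS = 5 * 1000 // 5 seconds
const MAX_WINDOW_MS = 30 * 86400000 // 30 days

// Parses '30s', '15m', '2h', '30d' into milliseconds and enforces the window bounds.
function parseDuration(text: string): number {
  const match = /^(\d+)([smhd])$/.exec(text)
  if (!match) throw new Error(`invalid duration: ${text}`)
  const ms = Number(match[1]) * UNIT_MS[match[2]]
  if (ms < MIN_WINDOW_MS || ms > MAX_WINDOW_MS) {
    throw new Error(`window out of range: ${text}`)
  }
  return ms
}

class ExpectationTracker {
  private dueAt = new Map<string, number>() // expectation key -> deadline (ms)
  private seenEventIds = new Set<string>() // for idempotent ingestion

  // expect(): open an expectation that must be fulfilled by now + within.
  expect(key: string, within: string, now: number): void {
    this.dueAt.set(key, now + parseDuration(within))
  }

  // fulfill(): a retried delivery with an already-seen eventId is a no-op.
  // Returns true only when this call closed an open expectation.
  fulfill(key: string, eventId: string): boolean {
    if (this.seenEventIds.has(eventId)) return false
    this.seenEventIds.add(eventId)
    return this.dueAt.delete(key)
  }

  // Periodic sweep: report and close every expectation past its deadline.
  sweep(now: number): string[] {
    const missed: string[] = []
    this.dueAt.forEach((deadline, key) => {
      if (now > deadline) missed.push(key)
    })
    missed.forEach((key) => this.dueAt.delete(key))
    return missed
  }
}
```

Passing `now` explicitly keeps the logic deterministic and easy to test. A production version would also persist deadlines so they survive restarts.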
Dashboard preview · upkeel.dev / flows
Flows tracked: 14 · Checks/day: 48k · Missed (30d): 7
Flow health, last 30 days: payment-processing 99.2% · email-confirmation 94.1% · order-fulfillment 99.8% · research-agent 97.4% · kyc-verification 100%
Alerting & Escalation

We send the alerts. You write zero notification code.

When Upkeel detects a missing event or a degrading flow, we handle the entire notification chain — initial alert, escalation, incident ticket. No webhooks to configure in your application, no Slack bots to build, no on-call routing logic to maintain. Configure your channels once in the dashboard and you're done.

Email alerts with full flow context, timeline, and affected run IDs
Slack messages to any channel with formatted diagnostic detail
PagerDuty pages with severity routing and automatic escalation
Jira and Linear tickets created automatically on detection
Multi-channel routing — critical failures page, degraded flows email
Alert deduplication — one notification per incident, not one per check
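Multi-channel routing and per-incident deduplication can be sketched together. Derive an incident key from the flow and event, notify when the incident is new or its severity rises, and stay quiet for repeated checks of the same incident. The routing table mirrors the example in the list (critical failures page, degraded flows email). `AlertRouter`, the channel names, and the incident-key format are hypothetical and do not reflect Upkeel's configuration.

```typescript
// Hypothetical types and routing table; not Upkeel's configuration format.
type Severity = 'degraded' | 'critical'
type Channel = 'email' | 'slack' | 'pagerduty' | 'ticket'
interface Alert {
  flow: string
  event: string
  severity: Severity
}

// Critical failures page; degraded flows email.
const ROUTES: Record<Severity, Channel[]> = {
  degraded: ['email'],
  critical: ['pagerduty', 'slack', 'ticket'],
}
const RANK: Record<Severity, number> = { degraded: 1, critical: 2 }

class AlertRouter {
  // incident key -> highest severity already notified
  private openIncidents = new Map<string, Severity>()

  // New incidents and escalations notify; repeated checks for an
  // already-notified incident return [] (one notification per incident).
  route(alert: Alert): Channel[] {
    const key = `${alert.flow}:${alert.event}`
    const notified = this.openIncidents.get(key)
    if (notified !== undefined && RANK[notified] >= RANK[alert.severity]) return []
    this.openIncidents.set(key, alert.severity)
    return ROUTES[alert.severity]
  }

  // Resolving the incident lets the next failure notify again.
  resolve(flow: string, event: string): void {
    this.openIncidents.delete(`${flow}:${event}`)
  }
}
```

Tracking the highest severity notified so far, rather than a simple "already alerted" flag, lets a degraded flow that turns critical still page someone.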
Slack · #alerts-payments · just now
🚨 Missing event: payment.succeeded
Expected within the 30s window for flow payment-processing (run abc-4829). Time since last event: 4m 12s. 3 customers potentially affected.
Critical · payment-processing
PagerDuty · payments-oncall · just now
TRIGGERED: Payment pipeline critical
payment.succeeded missing for 3 consecutive runs. Auto-escalated to P1. Acknowledge in PagerDuty to stop escalation.
P1 · Auto-escalated
Jira · OPS-1847 created · just now
[Critical] payment.succeeded missing: payment-processing
Auto-created with the full timeline, 3 affected run IDs, and a link to the Upkeel incident view. Assigned to payments-team.
Dashboard & Insights

Uptime charts, flow health, and AI summaries — out of the box.

Your Upkeel dashboard is your integration control center. Uptime percentages, event volumes, expectation fulfillment rates, step timing trends — all visible immediately with no setup. The AI insight layer surfaces what matters in plain language, so you don't have to dig through data to know what's happening.

Per-flow uptime charts going back to your full retention window
Step timing histograms: catch performance regressions before they break a flow
Per-customer health views for multi-tenant SaaS products
AI-generated weekly summaries with plain-language insights
Full incident timelines: what happened, what triggered it, when resolved
Exportable compliance reports for enterprise SLA evidence
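One simple way to catch a step-timing regression is to compare a high percentile of the current window against a baseline window. The sketch below uses the nearest-rank P95 and a fixed 20% tolerance. Both are illustrative choices, because the page does not say which statistics Upkeel's baseline detection uses. `percentile` and `checkP95Regression` are hypothetical helpers.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples')
  const sorted = [...samples].sort((a, b) => a - b)
  const rank = Math.ceil((p * sorted.length) / 100)
  return sorted[Math.max(rank, 1) - 1]
}

// Compares this window's P95 step latency with a baseline window and flags a
// regression when it grew by more than `tolerance` (0.2 = 20%).
function checkP95Regression(baseline: number[], current: number[], tolerance = 0.2) {
  const before = percentile(baseline, 95)
  const after = percentile(current, 95)
  const change = (after - before) / before
  return { before, after, change, regressed: change > tolerance }
}
```

A percentile is less sensitive than the mean to a single slow outlier, while still moving when the slow tail grows steadily. That steady tail growth is the kind of gradual degradation a baseline comparison is meant to surface.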
Dashboard preview · upkeel.dev / insights / payment-processing
payment-processing, last 14 days (chart with a warning marker on May 2)
✦ AI insight, this week: One 18-minute incident on May 2 from a Stripe webhook delivery delay, resolved automatically. All other days clean. P95 fulfillment latency improved 12% vs. last week.
May 2, 14:22 · payment.succeeded missed · run abc-4829 · alerts sent to Slack, PagerDuty, Jira
May 2, 14:40 · Resolved automatically · 18m incident
Optional — In-App Reactions

For teams who want their app to self-heal.

Most teams are well served by Upkeel's alerting alone. But if you also want your running application to react to integration health (disable checkout, switch email providers, pause a queue), the SDK's optional status polling handles that cleanly. Teams that don't need it can ignore it entirely; nothing about it is required.

SDK polls our status API at a plan-defined interval, from every 60s down to every 1s
No inbound webhooks, no open ports — works behind any proxy or firewall
on() handlers fire on status change: healthy, degraded, or down
onTransition() handlers fire on one specific change between states, such as degraded to down
optional-reactions.ts
// Optional: react to integration health in your UI
// Skip this and Upkeel still alerts your team automatically
// (checkout, banner, and emailProvider stand in for your own app objects)
keel.on('payment.succeeded', 'down', () => {
  checkout.disable()
  banner.show('Payments temporarily unavailable')
})
keel.on('payment.succeeded', 'healthy', () => {
  checkout.enable()
  banner.hide()
})
// Switch to a fallback provider when email delivery degrades
keel.on('email.delivered', 'degraded', () => {
  emailProvider.setFallback('postmark')
})
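Because the SDK pulls status instead of receiving pushes, all network traffic is outbound, which is why no open ports are needed. The sketch below shows how a client could turn polled status snapshots into on() and onTransition() callbacks, using the healthy, degraded, and down statuses listed above. `StatusWatcher`, the snapshot shape, `fetchStatus`, and the onTransition(flow, from, to, handler) signature are assumptions for illustration, not the SDK's actual API.

```typescript
// Hypothetical sketch; not the @upkeel/sdk internals.
type Status = 'healthy' | 'degraded' | 'down'
type Handler = () => void
type Snapshot = Record<string, Status> // flow name -> current status

class StatusWatcher {
  private last = new Map<string, Status>()
  private onHandlers: { flow: string; status: Status; fn: Handler }[] = []
  private transitionHandlers: { flow: string; from: Status; to: Status; fn: Handler }[] = []

  on(flow: string, status: Status, fn: Handler): void {
    this.onHandlers.push({ flow, status, fn })
  }

  onTransition(flow: string, from: Status, to: Status, fn: Handler): void {
    this.transitionHandlers.push({ flow, from, to, fn })
  }

  // Apply one polled snapshot, firing handlers only for flows whose status changed.
  apply(snapshot: Snapshot): void {
    for (const flow of Object.keys(snapshot)) {
      const next = snapshot[flow]
      const prev = this.last.get(flow)
      if (prev === next) continue
      this.last.set(flow, next)
      for (const h of this.onHandlers) {
        if (h.flow === flow && h.status === next) h.fn()
      }
      if (prev === undefined) continue // first observation is not a transition
      for (const t of this.transitionHandlers) {
        if (t.flow === flow && t.from === prev && t.to === next) t.fn()
      }
    }
  }

  // Poll on a fixed interval; a failed request is ignored and retried next tick.
  start(fetchStatus: () => Promise<Snapshot>, intervalMs: number): () => void {
    const timer = setInterval(() => {
      fetchStatus()
        .then((snapshot) => this.apply(snapshot))
        .catch(() => {})
    }, intervalMs)
    return () => clearInterval(timer)
  }
}
```

Handlers fire only when a flow's status differs from the previous snapshot, so a 1s polling interval does not re-run `checkout.disable()` every second. The first snapshot counts as a change for on(), so an app that starts during an outage still reacts, but it does not count as a transition.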
Testing

Test that your flows do what you think they do.

@upkeel/testing brings Upkeel's entire detection model into your test suite. Assert on expectations, simulate known failure scenarios, and advance virtual time in a single line — no real waiting, no network calls, no flakiness.

Jest and Vitest compatible out of the box
Virtual clock — advance 24 hours in microseconds
Fluent assertion API: .wasRegistered().withTimeoutOf().andWasFulfilled()
Pre-built scenarios via @upkeel/scenarios (Stripe, SendGrid, OpenAI)
Snapshot testing catches unintended flow shape changes in CI
payment.test.ts
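import { it } from 'vitest' // with Jest, `it` is a global and this import can be dropped
import { createTestKit } from '@upkeel/testing'
import { checkout } from './checkout' // your own checkout module (path is illustrative)

it('detects a missing Stripe webhook', async () => {
  const kit = createTestKit()
  // Run your real checkout code
  await checkout({ amount: 9900 })
  // The expectation was registered; now simulate time passing
  // with no webhook arriving (31s is one second past the 30s window)
  await kit.advanceTime('31s')
  // Upkeel should have caught it
  kit.expectation('payment.succeeded')
    .wasNotFulfilled()
    .andTriggeredAlert()
})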
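Advancing "24 hours in microseconds" works because a virtual clock never waits. Pending timers are stored with their due times, and advancing the clock fires them in order. The sketch below shows the technique in isolation. `VirtualClock` and its methods are hypothetical, not the @upkeel/testing API. Jest (`jest.advanceTimersByTime`) and Vitest (`vi.advanceTimersByTime`) ship fake timers built on the same idea.

```typescript
// Hypothetical virtual clock; not the @upkeel/testing API.
interface PendingTimer {
  id: number
  dueAt: number
  fn: () => void
}

class VirtualClock {
  private nowMs = 0
  private nextId = 1
  private timers: PendingTimer[] = []

  now(): number {
    return this.nowMs
  }

  setTimeout(fn: () => void, delayMs: number): number {
    const id = this.nextId++
    this.timers.push({ id, dueAt: this.nowMs + delayMs, fn })
    return id
  }

  clearTimeout(id: number): void {
    this.timers = this.timers.filter((t) => t.id !== id)
  }

  // Jump forward by `ms`, running every timer that falls due, earliest first
  // (ties in scheduling order). Timers scheduled by a callback also run if due.
  advance(ms: number): void {
    const target = this.nowMs + ms
    for (;;) {
      const due = this.timers
        .filter((t) => t.dueAt <= target)
        .sort((a, b) => a.dueAt - b.dueAt || a.id - b.id)[0]
      if (!due) break
      this.timers = this.timers.filter((t) => t !== due)
      this.nowMs = due.dueAt // the callback observes its own scheduled time
      due.fn()
    }
    this.nowMs = target
  }
}
```

A 30s deadline registered with `setTimeout` fires only once `advance()` moves past it, which is the behavior the 31s advance in the test relies on.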

Frequently asked questions

How does Upkeel compare to Datadog?
Datadog monitors infrastructure: request rates, latencies, error counts. Upkeel monitors outcomes. It answers "did the payment webhook actually arrive?" and "did the fulfillment workflow complete every step?" Infrastructure monitoring can't answer those questions, because the API call already returned 200 OK. They're complementary tools, not competing ones.
Do I need to build alerting or notification logic in my app?
No, and that's the whole point. Instrument your flows with the SDK, configure your alert channels once in the Upkeel dashboard, and we handle everything from there: no webhooks to build in your application, no notification logic, no on-call routing to maintain. We send the Slack messages, emails, PagerDuty pages, and tickets.
Does the SDK add latency to my application?
No meaningful latency. The SDK's hot path is fully asynchronous and non-blocking: telemetry is queued in-process and flushed to our API without interrupting your application. Measured overhead is under 1ms. If the SDK can't reach our API, it fails silently; it never crashes your application.
How long is data retained?
Retention is configurable within your plan's bounds: Starter 14 days, Base 90 days, Pro 1 year, and Enterprise negotiated, up to 7 years for audit logs. A dry-run preview shows exactly what would be deleted before any policy change takes effect. The audit log is immutable, and its retention can never be shortened ahead of schedule.
Is Upkeel open source?
Partly. Three packages are MIT-licensed: @upkeel/sdk, @upkeel/testing, and @upkeel/scenarios. The backend detection engine, alerting pipeline, and dashboard are proprietary. Because the SDK is open source, you can audit exactly what runs in your production application before you ship it.
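A retention dry run amounts to computing which records would fall outside a proposed window, without deleting anything. The sketch below assumes audit-log entries carry a fixed expiry set when they are written, so a shorter policy can never remove them early. That matches the immutability rule in the retention answer. `StoredRecord`, `previewDeletion`, and the `retainUntil` field are hypothetical.

```typescript
// Hypothetical record shape and dry-run helper.
interface StoredRecord {
  id: string
  kind: 'event' | 'audit'
  createdAt: number // ms since epoch
  retainUntil?: number // audit entries: expiry fixed when written
}

const DAY_MS = 86400000

// Dry run: list the IDs a retention policy of `retentionDays` would delete.
// Nothing is deleted. Audit entries ignore the policy and expire only at retainUntil.
function previewDeletion(records: StoredRecord[], retentionDays: number, now: number): string[] {
  const cutoff = now - retentionDays * DAY_MS
  return records
    .filter((r) =>
      r.kind === 'audit'
        ? r.retainUntil !== undefined && r.retainUntil <= now
        : r.createdAt < cutoff,
    )
    .map((r) => r.id)
}
```

Returning IDs instead of deleting in place means the preview and the real deletion can share a single code path. The policy change then applies exactly the set the user reviewed.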

Ready to see what you've been missing?

Join the waitlist — SDK access rolling out soon.