Use Cases
Cron job monitoring: know when your background jobs stop running
Your billing run didn't execute last night. Customers weren't charged. Nobody noticed for three days. The cron job died silently after a server reboot and there was nothing in the logs because the job never started.
April 10, 2026 · 7 min read
The most dangerous failure is the one that never fires an error
Cron jobs are the backbone of most production systems. They run billing cycles, generate reports, sync data between services, warm caches, clean up old files, and send scheduled emails. They run in the background, on a schedule, with no human watching.
And when they stop running, nothing happens. That is the problem. A web server crash returns a 500. A failed API call throws an exception. But a cron job that never starts produces no output at all. No error. No stack trace. No log entry. Just silence.
You find out days later when a customer asks why their invoice is missing, when a dashboard shows stale numbers, or when a disk fills up because the cleanup job stopped running last Tuesday.
Six ways cron jobs die without telling you
Server rebooted, crontab didn't survive
A kernel update, a cloud provider maintenance window, an accidental reboot. The machine comes back up, but the cron daemon was never enabled to start at boot, or the crontab lived on an ephemeral disk that was rebuilt from the image.
Job hangs and blocks the next run
A database query locks up. The job sits there holding a connection. The next scheduled run sees a lock file and skips. Now nothing runs until someone manually kills the process.
OOM killed by the kernel
The job processes more data than expected. Memory usage spikes. The Linux OOM killer terminates the process. The exit code is 137, but nobody checks cron job exit codes.
Timezone mismatch after migration
You moved from a bare metal server set to America/New_York to a cloud VM defaulting to UTC. The job that should run at 2 AM local time now runs at 2 AM UTC. And a job scheduled inside the hour that a DST transition skips may, depending on the cron implementation, not run at all.
Deployment overwrote the cron configuration
A new container image shipped without the cron entries. The deploy succeeded. Health checks passed. But the background jobs vanished because they weren't part of the health check surface area.
Dependency changed or credential expired
An API token expired. A database password rotated. The job fails immediately on startup, writes an error to a log file nobody reads, and exits. Every run. For weeks.
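Two of the failure modes above, the hung job and the OOM kill, leave evidence in the process's exit status that plain cron discards. As a minimal sketch (not part of Uptrack; `run_job`, `main`, the outcome labels, and the timeout value are illustrative names, and POSIX signals are assumed), a small Python wrapper can put a time limit on the job and classify how it ended:

```python
import signal
import subprocess
import sys


def run_job(cmd, timeout_s):
    """Run cmd with a time limit and classify how it ended."""
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising,
        # so a hung job does not linger and block the next run.
        return "timed_out"
    if result.returncode == 0:
        return "ok"
    if result.returncode == -signal.SIGKILL:
        # Terminated by SIGKILL, the signal the Linux OOM killer sends.
        # A shell reports the same death as exit code 128 + 9 = 137.
        return "killed"
    return "failed"


def main(argv):
    """Hypothetical CLI: run_job.py TIMEOUT_SECONDS COMMAND [ARGS...]"""
    outcome = run_job(argv[2:], timeout_s=float(argv[1]))
    print(f"job outcome: {outcome}", file=sys.stderr)
    return 0 if outcome == "ok" else 1
```

A crontab entry could call a wrapper like this instead of the job itself, so that a hang ends after the timeout rather than blocking every later run.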
The jobs you cannot afford to lose
Every team has a handful of cron jobs that are load-bearing. If they stop, something visibly breaks — but only after the damage is done.
- Billing and invoicing runs. Charges don't go out. Revenue stalls. Customers complain about missing receipts.
- Report generation. The CEO's daily dashboard shows yesterday's numbers. Or last week's. Nobody trusts the data anymore.
- Data syncs between services. Your CRM is out of sync with your billing system. Sales sees stale customer records.
- Cache warming. The first user of the day hits a cold cache. Page load times spike from 200ms to 8 seconds.
- Cleanup and retention tasks. Temp files pile up. Log directories fill the disk. The server goes down at 3 AM on a Saturday.
- Scheduled email sends. Onboarding drip sequences, weekly digests, and renewal reminders just stop. Users churn silently.
The heartbeat pattern: monitoring by absence
Traditional monitoring watches for errors. Heartbeat monitoring watches for silence. The idea is simple:
1. Your job pings a unique URL when it completes
A single HTTP GET at the end of the script. Takes one line of code.
2. The monitoring service expects a ping within a window
If your job runs every hour, the service expects a ping every hour, plus a configurable grace period for jobs that run a few minutes late.
3. No ping within the window triggers an alert
You get notified over Slack, Discord, email, or a webhook that the job missed its expected check-in. Not only when it fails, but also when it never runs at all.
This catches every failure mode listed above. Server reboot? No ping. Job hangs? No completion ping. OOM killed? No ping. Timezone mismatch? Ping arrives at the wrong time, grace period expires, alert fires. It does not matter why the job didn't run. It matters that it didn't.
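The rule behind those three steps fits in a few lines. The following is a sketch of the idea only, not Uptrack's implementation; `is_overdue` and the 5-minute grace period are assumptions made for illustration:

```python
from datetime import datetime, timedelta, timezone


def is_overdue(last_ping, interval, grace, now):
    """Return True once the job has missed its check-in window."""
    deadline = last_ping + interval + grace
    return now > deadline


last_ping = datetime(2026, 4, 10, 2, 0, tzinfo=timezone.utc)
hourly = timedelta(hours=1)
grace = timedelta(minutes=5)

# 64 minutes after the last ping: late, but still inside the grace period.
print(is_overdue(last_ping, hourly, grace, last_ping + timedelta(minutes=64)))  # False
# 66 minutes after the last ping: the window has closed, so an alert would fire.
print(is_overdue(last_ping, hourly, grace, last_ping + timedelta(minutes=66)))  # True
```

Note that the check never asks why the ping is missing. Any cause that prevents the final ping produces the same result.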
Add heartbeat monitoring in one line
Every Uptrack heartbeat monitor gives you a unique URL. Add a ping at the end of your job script so that it only runs if the job completes successfully.
Bash — append to any cron script
#!/bin/bash
# /usr/local/bin/nightly-billing.sh
set -e # Exit on any error
python3 /opt/app/billing/run_invoices.py
python3 /opt/app/billing/send_receipts.py
# Ping Uptrack only if both steps succeeded
curl -fsS -m 10 -o /dev/null \
  https://uptrack.app/api/heartbeat/hb_billing_nightly
Python — after a data sync job
import requests

from myapp.sync import run_full_sync


def main():
    run_full_sync(source="crm", target="warehouse")
    # Signal successful completion to Uptrack
    requests.get(
        "https://uptrack.app/api/heartbeat/hb_crm_sync",
        timeout=10,
    )


if __name__ == "__main__":
    main()
Node.js — after a report generation task
import { generateDailyReport } from "./reports.js";

async function main() {
  await generateDailyReport();
  // Ping Uptrack heartbeat on success
  await fetch(
    "https://uptrack.app/api/heartbeat/hb_daily_report",
    { signal: AbortSignal.timeout(10000) }
  );
}

main().catch((err) => {
  console.error("Report generation failed:", err);
  process.exit(1); // No ping sent, so Uptrack will alert
});
The key insight: the ping is the last line. If the job crashes, hangs, or exits early, the ping never fires. The heartbeat monitor's timer expires and the alert goes out.
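One caveat with putting the ping last: in the Python example, a network error during the ping itself would raise after the real work has already succeeded. Here is a hedged sketch of one way to handle that, using only the standard library. `ping_heartbeat`, `http_get`, and the retry parameters are hypothetical names chosen for illustration, not an Uptrack client:

```python
import sys
import time
import urllib.request


def http_get(url, timeout_s):
    """Default sender: a plain GET using only the standard library."""
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        response.read()


def ping_heartbeat(url, attempts=3, backoff_s=2.0, send=http_get, sleep=time.sleep):
    """Send the completion ping with a few retries; never raise.

    Returns True if a ping got through, False if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            send(url, 10)
            return True
        except OSError as err:
            # URLError, HTTPError, and socket timeouts are all OSError subclasses.
            print(f"heartbeat attempt {attempt} failed: {err}", file=sys.stderr)
            if attempt < attempts:
                sleep(backoff_s * attempt)  # wait a little longer each time
    return False
```

Calling `ping_heartbeat("https://uptrack.app/api/heartbeat/hb_crm_sync")` as the last statement of `main()` keeps the success-only property. If every attempt fails, the monitor simply sees a missed check-in, which errs on the side of alerting.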
How Uptrack compares to other cron monitoring tools
Several services offer heartbeat-style cron job monitoring. Here is how they stack up:
Healthchecks.io is open-source and solid. The free tier gives you 20 monitors with a 20-second minimum period. Self-hosting is an option but requires maintaining a Django app, a PostgreSQL database, and background workers. Good if you want full control.
Cronitor is developer-friendly with a polished UI and good integrations. The free tier is limited to 5 monitors. Paid plans start at $14/month for 20 monitors. Pricing scales up quickly for teams with many jobs.
Dead Man's Snitch (now part of PagerDuty) focuses purely on cron monitoring. No free tier — plans start at $5/month for 1 snitch. Good if you already use PagerDuty for incident management.
Uptrack includes 50 heartbeat monitors on the free tier — 10 at 30-second check intervals, 40 at 1-minute. Alerts go to Slack, Discord, and email out of the box. No credit card required. If you also need uptime monitoring for your web services, it is the same dashboard.
How Uptrack heartbeat monitoring works
When you create a heartbeat monitor in Uptrack, you configure two values:
Expected interval
How often your job runs. Every minute, every hour, every day: it should match your cron schedule.
Grace period
Extra time to allow for jobs that run slightly long. A 5-minute grace period on an hourly job means Uptrack won't alert until 65 minutes have passed since the last ping.
Each heartbeat monitor runs as a lightweight process on Uptrack's servers. When a ping arrives, the timer resets. When the timer expires, the alert fires. When the next ping arrives after a missed window, a recovery notification goes out.
Job completes → pings URL → timer resets → waiting for next ping
    │
    ├── Ping arrives in time ──→ timer resets ──→ ✅ healthy
    │
    └── No ping within window ──→ status: DOWN
            │
            └── Alert: Slack / Discord
                / email / webhook
Job resumes → pings URL → status: UP → recovery alert sent
No polling. No agents to install on your servers. One outbound HTTP request from your job is the entire integration.
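The UP, DOWN, and recovery transitions in that flow can be modeled as a small state machine. This is an illustrative sketch under assumed names (`HeartbeatMonitor`, `ping`, `check`, and the event strings are not Uptrack's API or internals):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class HeartbeatMonitor:
    interval: timedelta
    grace: timedelta
    last_ping: datetime
    status: str = "UP"

    def ping(self, now: datetime) -> Optional[str]:
        """Record a ping; return "recovery" if it ends an outage."""
        self.last_ping = now
        if self.status == "DOWN":
            self.status = "UP"
            return "recovery"
        return None

    def check(self, now: datetime) -> Optional[str]:
        """Return "alert" the first time the window is found expired."""
        expired = now > self.last_ping + self.interval + self.grace
        if expired and self.status == "UP":
            self.status = "DOWN"
            return "alert"
        return None


start = datetime(2026, 4, 10, 2, 0, tzinfo=timezone.utc)
monitor = HeartbeatMonitor(timedelta(hours=1), timedelta(minutes=5), last_ping=start)
events = [
    monitor.ping(start + timedelta(minutes=61)),    # on time, nothing to report
    monitor.check(start + timedelta(minutes=130)),  # 69 minutes of silence
    monitor.check(start + timedelta(minutes=140)),  # already DOWN, no repeat alert
    monitor.ping(start + timedelta(minutes=181)),   # the job is back
]
print(events)  # [None, 'alert', None, 'recovery']
```

Tracking the status is what keeps a single outage from producing a new alert on every check, and what lets the first ping after an outage be reported as a recovery.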
Best practices for cron job heartbeat monitoring
- Ping on completion, not on start. A job that starts but hangs will still send a start ping. You want confirmation that the job finished.
- Use set -e in bash scripts. If any command fails, the script exits before reaching the ping. The absence of the ping is your alert.
- Set a generous grace period at first. If your hourly job usually takes 3 minutes but sometimes takes 15, set a 20-minute grace period. Tighten it later once you understand the variance.
- One monitor per job per server. If the same job runs on three servers, each instance needs its own heartbeat URL. Otherwise a ping from server A masks a failure on server B.
- Name monitors after what they do. Use "nightly-billing", "crm-sync-hourly", "weekly-report", not "cron1", "cron2", "cron3". When the alert fires at 3 AM, the name should tell you what broke.
Stop finding out about dead cron jobs from your customers
50 free heartbeat monitors — 10 at 30-second checks, 40 at 1-minute. Slack, Discord, and email alerts included. No credit card required.