
Cron job monitoring: know when your background jobs stop running

Your billing run didn't execute last night. Customers weren't charged. Nobody noticed for three days. The cron job died silently after a server reboot and there was nothing in the logs because the job never started.

April 10, 2026 · 7 min read

The most dangerous failure is the one that never fires an error

Cron jobs are the backbone of every production system. They run billing cycles, generate reports, sync data between services, warm caches, clean up old files, and send scheduled emails. They run in the background, on a schedule, with no human watching.

And when they stop running, nothing happens. That is the problem. A web server crash returns a 500. A failed API call throws an exception. But a cron job that never starts produces no output at all. No error. No stack trace. No log entry. Just silence.

You find out days later when a customer asks why their invoice is missing, when a dashboard shows stale numbers, or when a disk fills up because the cleanup job stopped running last Tuesday.

Six ways cron jobs die without telling you

Server rebooted, crontab didn't survive

A kernel update, a cloud provider maintenance window, an accidental reboot. The machine comes back up, but the cron daemon was never enabled at boot, or the replacement instance was built from an image that doesn't include the crontab.

Job hangs and blocks the next run

A database query locks up. The job sits there holding a connection. The next scheduled run sees a lock file and skips. Now nothing runs until someone manually kills the process.

OOM killed by the kernel

The job processes more data than expected. Memory usage spikes. The Linux OOM killer terminates the process. The exit code is 137, but nobody checks cron job exit codes.
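The 137 is not arbitrary: shells report a process killed by signal N as exit status 128 + N, and the OOM killer sends SIGKILL (signal 9). A minimal sketch of decoding such a status, where the helper name `describe_exit_status` is purely illustrative:

```python
import signal


def describe_exit_status(status: int) -> str:
    """Translate a shell-style exit status into a readable description."""
    if status == 0:
        return "success"
    if status > 128:
        # Shells encode "killed by signal N" as 128 + N.
        signum = status - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            return f"killed by unknown signal {signum}"
    return f"failed with exit code {status}"


print(describe_exit_status(137))  # killed by SIGKILL
```

Logging this description from whatever launches the job makes an OOM kill visible instead of leaving a bare number nobody looks at.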

Timezone mismatch after migration

You moved from a bare metal server set to America/New_York to a cloud VM defaulting to UTC. The job that should run at 2 AM local time now runs at 2 AM UTC — or during a DST transition, not at all.
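Both effects can be checked with Python's standard `zoneinfo` module (Python 3.9+, assuming the IANA time zone database is installed). This is only a sketch: the helper `exists_locally` is a made-up name, and how a given cron implementation treats a skipped local time varies.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")


def exists_locally(naive: datetime, tz: ZoneInfo) -> bool:
    """Return False when a wall-clock time falls inside a DST gap for tz."""
    local = naive.replace(tzinfo=tz)
    round_trip = local.astimezone(timezone.utc).astimezone(tz)
    return round_trip.replace(tzinfo=None) == naive


# A job scheduled for 02:00 on a UTC machine fires the previous evening in New York.
utc_run = datetime(2026, 1, 15, 2, 0, tzinfo=timezone.utc)
print(utc_run.astimezone(NEW_YORK))  # 2026-01-14 21:00:00-05:00

# On the night clocks spring forward, 02:30 local time never occurs.
print(exists_locally(datetime(2026, 3, 8, 2, 30), NEW_YORK))  # False
print(exists_locally(datetime(2026, 3, 9, 2, 30), NEW_YORK))  # True
```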

Deployment overwrote the cron configuration

A new container image shipped without the cron entries. The deploy succeeded. Health checks passed. But the background jobs vanished because they weren't part of the health check surface area.

Dependency changed or credential expired

An API token expired. A database password rotated. The job fails immediately on startup, writes an error to a log file nobody reads, and exits. Every run. For weeks.

The jobs you cannot afford to lose

Every team has a handful of cron jobs that are load-bearing. If they stop, something visibly breaks — but only after the damage is done.

- Billing and invoicing runs. Charges don't go out. Revenue stalls. Customers complain about missing receipts.
- Report generation. The CEO's daily dashboard shows yesterday's numbers. Or last week's. Nobody trusts the data anymore.
- Data syncs between services. Your CRM is out of sync with your billing system. Sales sees stale customer records.
- Cache warming. The first user of the day hits a cold cache. Page load times spike from 200ms to 8 seconds.
- Cleanup and retention tasks. Temp files pile up. Log directories fill the disk. The server goes down at 3 AM on a Saturday.
- Scheduled email sends. Onboarding drip sequences, weekly digests, renewal reminders — they just stop. Users churn silently.

The heartbeat pattern: monitoring by absence

Traditional monitoring watches for errors. Heartbeat monitoring watches for silence. The idea is simple:

1. Your job pings a unique URL when it completes

A single HTTP GET at the end of the script. Takes one line of code.

2. The monitoring service expects a ping within a window

If your job runs every hour, the service expects a ping every hour, plus a configurable grace period for jobs that run a few minutes late.

3. No ping within the window triggers an alert

Slack, Discord, email, webhook — you get notified that the job missed its expected check-in. Not when it fails. When it doesn't run at all.

This catches the failure modes listed above. Server reboot? No ping. Job hangs? No completion ping. OOM killed? No ping. Crontab missing after a deploy, or a credential expired? No ping. Timezone mismatch? If the shifted or skipped run leaves a gap longer than the expected window, the grace period expires and the alert fires. It does not matter why the job didn't run. It matters that it didn't.

Add heartbeat monitoring in one line

Every Uptrack heartbeat monitor gives you a unique URL. Add a ping at the end of your job script so that it runs only if the job completes successfully.

Bash — append to any cron script

#!/bin/bash
# /usr/local/bin/nightly-billing.sh

set -e  # Exit on any error

python3 /opt/app/billing/run_invoices.py
python3 /opt/app/billing/send_receipts.py

# Ping Uptrack only if both steps succeeded
curl -fsS -m 10 -o /dev/null \
  https://uptrack.app/api/heartbeat/hb_billing_nightly

Python — after a data sync job

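import requests
from myapp.sync import run_full_sync

HEARTBEAT_URL = "https://uptrack.app/api/heartbeat/hb_crm_sync"

def main():
    # If the sync raises, execution never reaches the ping below
    run_full_sync(source="crm", target="warehouse")

    # Signal successful completion to Uptrack
    try:
        requests.get(HEARTBEAT_URL, timeout=10).raise_for_status()
    except requests.RequestException as exc:
        # The sync itself succeeded; don't fail the job over a lost ping
        print(f"Heartbeat ping failed: {exc}")

if __name__ == "__main__":
    main()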

Node.js — after a report generation task

import { generateDailyReport } from "./reports.js";

async function main() {
  await generateDailyReport();

  // Ping Uptrack heartbeat on success
  await fetch(
    "https://uptrack.app/api/heartbeat/hb_daily_report",
    { signal: AbortSignal.timeout(10000) }
  );
}

main().catch((err) => {
  console.error("Report generation failed:", err);
  process.exit(1);  // No ping sent — Uptrack will alert
});

The key insight: the ping is the last line. If the job crashes, hangs, or exits early, the ping never fires. The heartbeat monitor's timer expires and the alert goes out.
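The same rule can be packaged as a small wrapper, so the ping logic lives in one place instead of at the bottom of every script. The sketch below is an illustration, not an Uptrack tool: `run_and_ping`, `http_ping`, and the `max_seconds` limit are all hypothetical names. It also puts a time limit on the job, so a hung run is killed rather than left blocking the next one.

```python
import subprocess
import urllib.request
from typing import Callable, Sequence


def http_ping(url: str) -> None:
    """Send the heartbeat as a plain GET with a 10-second timeout."""
    with urllib.request.urlopen(url, timeout=10):
        pass


def run_and_ping(
    command: Sequence[str],
    heartbeat_url: str,
    max_seconds: float,
    ping: Callable[[str], None] = http_ping,
) -> bool:
    """Run command; ping heartbeat_url only if it exits 0 within max_seconds."""
    try:
        result = subprocess.run(command, timeout=max_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child when the timeout expires,
        # so a hung job does not linger until the next scheduled run.
        return False
    if result.returncode != 0:
        return False
    try:
        ping(heartbeat_url)
    except OSError:
        # The job succeeded; a lost ping simply shows up as a missed heartbeat.
        pass
    return True
```

A cron entry could then call a short script that runs, for example, `run_and_ping(["/usr/local/bin/nightly-billing.sh"], "https://uptrack.app/api/heartbeat/hb_billing_nightly", max_seconds=1800)`; the 30-minute limit here is an arbitrary example value.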

How Uptrack compares to other cron monitoring tools

Several services offer heartbeat-style cron job monitoring. Here is how they stack up:

Healthchecks.io is open-source and solid. The free tier gives you 20 monitors with a 20-second minimum period. Self-hosting is an option but requires maintaining a Django app, a PostgreSQL database, and background workers. Good if you want full control.

Cronitor is developer-friendly with a polished UI and good integrations. The free tier is limited to 5 monitors. Paid plans start at $14/month for 20 monitors. Pricing scales up quickly for teams with many jobs.

Dead Man's Snitch (now part of PagerDuty) focuses purely on cron monitoring. No free tier — plans start at $5/month for 1 snitch. Good if you already use PagerDuty for incident management.

Uptrack includes 50 heartbeat monitors on the free tier — 10 at 30-second check intervals, 40 at 1-minute. Alerts go to Slack, Discord, and email out of the box. No credit card required. If you also need uptime monitoring for your web services, it is the same dashboard.

How Uptrack heartbeat monitoring works

When you create a heartbeat monitor in Uptrack, you configure two values:

Expected interval

How often your job runs. Every minute, every hour, every day — matches your cron schedule.

Grace period

Extra time to allow for jobs that run slightly long. A 5-minute grace period on an hourly job means Uptrack won't alert until 65 minutes have passed since the last ping.
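The 65-minute figure is simply the last ping time plus the interval plus the grace period. A minimal sketch of that arithmetic, with illustrative function names that are not taken from Uptrack's implementation:

```python
from datetime import datetime, timedelta


def alert_deadline(last_ping: datetime, interval: timedelta, grace: timedelta) -> datetime:
    """Latest moment the next ping may arrive before it counts as missed."""
    return last_ping + interval + grace


def is_overdue(now: datetime, last_ping: datetime, interval: timedelta, grace: timedelta) -> bool:
    """True once the deadline for the next ping has passed."""
    return now > alert_deadline(last_ping, interval, grace)


last_ping = datetime(2026, 4, 10, 2, 0)
deadline = alert_deadline(last_ping, timedelta(hours=1), timedelta(minutes=5))
print(deadline)  # 2026-04-10 03:05:00
```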

Each heartbeat monitor runs as a lightweight process on Uptrack's servers. When a ping arrives, the timer resets. When the timer expires, the alert fires. When the next ping arrives after a missed window, a recovery notification goes out.

Job completes → pings URL → timer resets → waiting for next ping
    │
    ├── Ping arrives in time ──→ timer resets ──→ ✅ healthy
    │
    └── No ping within window ──→ status: DOWN
            │
            └── Alert: Slack / Discord / email / webhook

Job resumes → pings URL → status: UP → recovery alert sent
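The transitions in the flow above can be modeled as a tiny state machine. This is a simplified sketch, not Uptrack's actual code: the class and field names are invented, and an explicit `check(now)` call stands in for a real timer so the transitions are easy to follow.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class HeartbeatMonitor:
    interval: timedelta
    grace: timedelta
    last_ping: Optional[datetime] = None
    status: str = "UP"
    notifications: List[str] = field(default_factory=list)

    def ping(self, now: datetime) -> None:
        """Record a ping; announce recovery if the monitor was down."""
        if self.status == "DOWN":
            self.status = "UP"
            self.notifications.append("recovery")
        self.last_ping = now

    def check(self, now: datetime) -> None:
        """Flip to DOWN (once) when the window since the last ping has elapsed."""
        if self.last_ping is None or self.status == "DOWN":
            return
        if now > self.last_ping + self.interval + self.grace:
            self.status = "DOWN"
            self.notifications.append("alert")


monitor = HeartbeatMonitor(interval=timedelta(hours=1), grace=timedelta(minutes=5))
start = datetime(2026, 4, 10, 2, 0)
monitor.ping(start)
monitor.check(start + timedelta(minutes=64))  # still inside the window
monitor.check(start + timedelta(minutes=66))  # window elapsed: alert
monitor.ping(start + timedelta(minutes=90))   # job resumes: recovery
print(monitor.status, monitor.notifications)  # UP ['alert', 'recovery']
```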

No polling. No agents to install on your servers. One outbound HTTP request from your job is the entire integration.

Best practices for cron job heartbeat monitoring

- Ping on completion, not on start. A job that starts but hangs will still send a start ping. You want confirmation that the job finished.
- Use set -e in bash scripts, plus set -o pipefail if the script uses pipes. If any command fails, the script exits before reaching the ping. The absence of the ping is your alert.
- Set a generous grace period at first. If your hourly job usually takes 3 minutes but sometimes takes 15, set a 20-minute grace period. Tighten it later once you understand the variance.
- One monitor per job per server, not one per job name. If the same job runs on three servers, each needs its own heartbeat URL. Otherwise a ping from server A masks a failure on server B.
- Name monitors after what they do. "nightly-billing", "crm-sync-hourly", "weekly-report" — not "cron1", "cron2", "cron3". When the alert fires at 3 AM, the name should tell you what broke.
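For the grace-period advice, one possible heuristic is to take the slowest run you have observed, add some headroom, and round up. The sketch below assumes a 30% headroom and 5-minute rounding; both numbers and the function name are arbitrary choices for illustration, not a recommendation from Uptrack.

```python
import math
from typing import Sequence


def suggest_grace_minutes(durations_min: Sequence[float], headroom: float = 1.3) -> int:
    """Slowest observed run plus headroom, rounded up to the next 5 minutes."""
    if not durations_min:
        raise ValueError("need at least one observed run duration")
    padded = max(durations_min) * headroom
    return int(math.ceil(padded / 5) * 5)


# Usually 3 minutes, occasionally 15: suggests the 20-minute grace period.
print(suggest_grace_minutes([3, 3, 4, 15, 3]))  # 20
```

Re-running it as more run durations accumulate gives a concrete basis for tightening the window later.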

Stop finding out about dead cron jobs from your customers

50 free heartbeat monitors — 10 at 30-second checks, 40 at 1-minute. Slack, Discord, and email alerts included. No credit card required.
