Monitoring Glossary
Plain-English definitions of uptime monitoring, incident management, and site reliability concepts. Built for developers and ops teams.
Alert Fatigue
When a flood of noisy or non-actionable alerts desensitizes a team, so real incidents get missed or ignored.
DNS Propagation
The delay before a DNS record change is visible everywhere, as resolvers around the world expire their cached copies according to the record's TTL.
Downtime
Any period when a service is unavailable to its users.
Error Budget
The amount of downtime a service can accumulate in a given period before its availability target (an SLO or SLA) is violated.
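As a rough illustration of the arithmetic, the sketch below turns an availability target into a downtime allowance. The 99.9% target, the 30-day window, and the function names are assumptions chosen for the example, not values from any particular SLA.

```python
from datetime import timedelta


def error_budget(target_percent: float, window: timedelta) -> timedelta:
    """Downtime the availability target permits over the window."""
    return window * (1 - target_percent / 100)


def budget_remaining(target_percent: float, window: timedelta,
                     downtime_so_far: timedelta) -> timedelta:
    """Budget left after subtracting downtime already incurred.

    A negative result means the target has been missed for this window.
    """
    return error_budget(target_percent, window) - downtime_so_far


# Hypothetical example: 99.9% target over 30 days, 12 minutes already lost.
window = timedelta(days=30)
budget = error_budget(99.9, window)
remaining = budget_remaining(99.9, window, timedelta(minutes=12))
print(f"budget: {budget}, remaining: {remaining}")
```

With these inputs the budget works out to about 43.2 minutes for the window, leaving roughly 31 minutes after the 12 already spent.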
Escalation Policy
Rules for escalating unacknowledged incidents to additional responders.
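One way to picture such rules is as a list of time thresholds, each adding a responder. The step timings and responder names below are hypothetical; real policies vary by team and tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationStep:
    after_minutes: int  # minutes the incident has gone unacknowledged
    notify: str         # who gets paged at this step


# Hypothetical policy: timings and roles are illustrative only.
POLICY = [
    EscalationStep(0, "primary on-call"),
    EscalationStep(10, "secondary on-call"),
    EscalationStep(30, "engineering manager"),
]


def responders_to_notify(policy: list[EscalationStep],
                         minutes_unacknowledged: int) -> list[str]:
    """Return everyone who should have been paged by now."""
    return [step.notify for step in policy
            if minutes_unacknowledged >= step.after_minutes]


print(responders_to_notify(POLICY, 12))
```

Once someone acknowledges the incident, a real system would stop walking the list; that state handling is left out of this sketch.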
Five Nines
99.999% uptime — just 5.26 minutes of downtime per year.
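The figure follows directly from the percentage. The sketch below reproduces it, assuming a 365.25-day year; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes, averaging in leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% -> {minutes:.2f} minutes/year")
```

Each additional nine cuts the allowance by a factor of ten, which is why the step from four nines to five is so demanding in practice.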
Heartbeat Monitoring
Passive monitoring where the service pings the monitor on a schedule; a ping that fails to arrive on time signals a problem.
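The monitor's side of this can be sketched as a simple deadline check. The grace period and function name here are assumptions for illustration; a real monitor would also store pings and send notifications.

```python
from datetime import datetime, timedelta


def is_heartbeat_missed(last_ping: datetime, now: datetime,
                        expected_interval: timedelta,
                        grace: timedelta = timedelta(seconds=30)) -> bool:
    """True when the next ping is overdue by more than the grace period."""
    return now - last_ping > expected_interval + grace


# Hypothetical job that should check in every 5 minutes.
last_ping = datetime(2024, 1, 1, 12, 0, 0)
interval = timedelta(minutes=5)

print(is_heartbeat_missed(last_ping, datetime(2024, 1, 1, 12, 4, 0), interval))
print(is_heartbeat_missed(last_ping, datetime(2024, 1, 1, 12, 6, 0), interval))
```

The grace period absorbs small scheduling jitter so a job that runs a few seconds late does not raise an alert.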
Incident Management
The process of identifying, analyzing, and resolving service disruptions.
Latency
The time delay between sending a request and receiving a response.
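Latency is usually summarized with percentiles rather than an average, because a few slow requests can hide behind a healthy-looking mean. A minimal sketch using the nearest-rank method, with made-up sample values:

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(math.ceil(pct / 100 * len(ordered)), 1)
    return ordered[rank - 1]


# Hypothetical response times in milliseconds, including one slow outlier.
latencies_ms = [120, 95, 110, 480, 105, 98, 102, 130, 99, 101]

print("p50:", percentile(latencies_ms, 50))
print("p95:", percentile(latencies_ms, 95))
print("mean:", sum(latencies_ms) / len(latencies_ms))
```

Here the median stays near 100 ms while the 95th percentile exposes the 480 ms outlier that the mean only partly reflects.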
MTBF
Mean Time Between Failures — average time from one failure to the next.
MTTD
Mean Time To Detect — how long before a failure is noticed.
MTTF
Mean Time To Failure — average operating time before a failure occurs.
MTTR
Mean Time To Repair — average time to restore service after a failure.
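The four metrics above can all be derived from the same incident timeline. The sketch below assumes each incident is recorded as (failed, detected, restored) timestamps and follows the definitions as worded here, with MTTR measured from the moment of failure; the data is invented.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (failed_at, detected_at, restored_at).
incidents = [
    (datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 0, 5), datetime(2024, 1, 1, 0, 35)),
    (datetime(2024, 1, 11, 0, 0), datetime(2024, 1, 11, 0, 3), datetime(2024, 1, 11, 0, 25)),
    (datetime(2024, 1, 31, 0, 0), datetime(2024, 1, 31, 0, 7), datetime(2024, 1, 31, 1, 0)),
]


def mean(deltas: list[timedelta]) -> timedelta:
    return sum(deltas, timedelta()) / len(deltas)


# MTTD: failure until someone (or something) notices.
mttd = mean([detected - failed for failed, detected, _ in incidents])
# MTTR: failure until service is restored.
mttr = mean([restored - failed for failed, _, restored in incidents])
# MTBF: start of one failure to the start of the next.
mtbf = mean([nxt[0] - cur[0] for cur, nxt in zip(incidents, incidents[1:])])
# MTTF: healthy operating time between a restore and the next failure.
mttf = mean([nxt[0] - cur[2] for cur, nxt in zip(incidents, incidents[1:])])

print(f"MTTD={mttd}  MTTR={mttr}  MTBF={mtbf}  MTTF={mttf}")
```

Teams differ on where they start and stop each clock (for example, measuring repair time from detection instead of failure), so it is worth stating the convention alongside any reported number.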
On-Call
A rotation that designates which team member is responsible for responding to incidents at any given time, including outside working hours.
Real User Monitoring
Collecting performance data from actual user sessions.
SLA
Service Level Agreement defining expected availability and consequences for breaches.
SSL Certificate
A digital certificate (strictly speaking a TLS certificate) that verifies a site's identity and enables encrypted HTTPS communication.
Status Page
A public page showing the current health and history of your services.
Synthetic Monitoring
Simulating user requests to proactively test service availability.
Uptime
The percentage of time a service is operational and accessible.