Zabbix Project
Zabbix Backup Monitor Project

Project Description
Vital Systems Engineering delivered DB Monitor, a practical backup visibility project that gives teams a clear picture of Oracle backup health across distributed Linux servers. The focus is simple operations, fast detection, and clean handover, without exposing client details or touching databases directly.
Lightweight collectors written in Bash read Oracle backup logs from defined directories and use a central CSV inventory to map each job by container database, pluggable database, schema, and job type. The parser normalizes results to status, start and end time, duration, and last log write, then sends structured metrics to Zabbix for reliable reporting.
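As a rough illustration of the normalization step, the sketch below is written in Python for readability, whereas the delivered collectors are Bash. The log markers (`START`, `END`, `STATUS`), the timestamp layout, and the inventory column names are all assumptions made for this example, not the project's actual formats.

```python
import csv
import io
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout


def load_inventory(csv_text):
    """Map log file name -> inventory row (column names are hypothetical)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["log_file"]: row for row in reader}


def parse_log(log_text):
    """Normalize one log into status, start, end, and duration in seconds."""
    start = end = None
    status = "not_started"  # default when no STATUS marker is found
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("START "):
            start = datetime.strptime(line[len("START "):], TS_FORMAT)
        elif line.startswith("END "):
            end = datetime.strptime(line[len("END "):], TS_FORMAT)
        elif line.startswith("STATUS "):
            status = line[len("STATUS "):].lower()
    duration = int((end - start).total_seconds()) if start and end else None
    return {"status": status, "start": start, "end": end, "duration_s": duration}


inventory = load_inventory(
    "cdb,pdb,schema,job_type,log_file\n"
    "CDB1,PDB1,HR,export,hr_export.log\n"
)
result = parse_log(
    "START 2024-05-01 01:00:00\n"
    "END 2024-05-01 01:42:10\n"
    "STATUS SUCCESS\n"
)
```

Joining the parsed result with the inventory row for the same log file would give one record per job, ready to be turned into metrics. Log freshness could be read from the file's modification time, which is omitted here for brevity.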
A single Zabbix template drives low-level discovery so jobs are found automatically on each host. Item prototypes track status, last success, duration, and log freshness. Trigger prototypes evaluate success, error, and not-started states with configurable grace windows to avoid noise, and alerts recover automatically after the next successful run.
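The discovery payload and the grace-window idea can be sketched as follows. Zabbix low-level discovery rules accept a JSON array of objects keyed by `{#MACRO}` names (older Zabbix versions expect the array wrapped under a `data` key). The macro names, the job fields, and the `is_not_started` helper are hypothetical, and in the real template the grace logic would live in trigger expressions rather than in Python.

```python
import json


def build_discovery(jobs):
    """Return an LLD JSON array; the macro names are illustrative only."""
    return json.dumps([
        {
            "{#CDB}": job["cdb"],
            "{#PDB}": job["pdb"],
            "{#SCHEMA}": job["schema"],
            "{#JOBTYPE}": job["job_type"],
        }
        for job in jobs
    ])


def is_not_started(now_ts, expected_start_ts, last_start_ts, grace_s):
    """True when the job has not started since its expected start time
    and the grace window has already elapsed (all values in epoch seconds)."""
    started = last_start_ts is not None and last_start_ts >= expected_start_ts
    return (not started) and now_ts > expected_start_ts + grace_s


payload = build_discovery(
    [{"cdb": "CDB1", "pdb": "PDB1", "schema": "HR", "job_type": "export"}]
)
```

With a 900-second grace window, a job expected at time 1000 would only be flagged as not started once the clock passes 1900, which is the kind of delay that keeps slow schedulers from raising false alarms.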
Operations teams use a live dashboard that shows color-coded job states with drilldowns to log excerpts and trend views for duration and success rate. Leadership receives a daily success summary and monthly compliance snapshots so policy breaches are easy to spot. Everything is version controlled and peer reviewed so changes are safe to deploy and easy to roll back.
Security and reliability are built in. Collectors run with least privilege and only read logs. Store-and-forward behavior prevents missed telemetry during temporary network issues. No direct database connection is required, which reduces risk and keeps the footprint small. Handover includes a clear runbook, troubleshooting checklist, and short training so the system is easy to own.
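One way to picture store-and-forward behavior is a local spool file that is only cleared after a successful send. The sketch below is a minimal Python illustration under stated assumptions: the spool path, the function names, and the injected `send` callable are hypothetical. The line layout (`host key timestamp value`) mirrors the input file format that `zabbix_sender` reads with timestamps enabled, but the project's actual transport is not described here.

```python
import os


def spool(spool_path, host, key, timestamp, value):
    """Append one metric to the local spool file."""
    with open(spool_path, "a", encoding="utf-8") as fh:
        fh.write(f"{host} {key} {timestamp} {value}\n")


def flush(spool_path, send):
    """Try to deliver all spooled lines; keep the spool if delivery fails.

    `send` is any callable that takes a list of lines and raises OSError
    (for example ConnectionError) when the server cannot be reached.
    Returns the number of lines delivered.
    """
    if not os.path.exists(spool_path):
        return 0
    with open(spool_path, encoding="utf-8") as fh:
        lines = [ln for ln in fh.read().splitlines() if ln]
    if not lines:
        return 0
    try:
        send(lines)
    except OSError:
        return 0  # network issue: leave the spool for the next run
    os.remove(spool_path)
    return len(lines)
```

Because metrics carry their own timestamps, values delivered late still land at the right point in the history. This sketch does not quote values containing spaces, which a production collector would need to handle.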
Result
The DB Monitor project shows how precise log parsing and disciplined Zabbix modeling can reduce time to detect and resolve backup issues, increase trust in alerts, and make retention compliance visible and actionable for both operators and managers.
Services Provided
- Discovery of existing backup processes and log locations on Linux
- Zabbix template design with discovery, items, triggers, tags, and value maps
- Bash collectors that parse logs and create structured metrics with strict error handling
- Python simulator to create success, error, and not started logs for safe testing
- Alert rules, quiet hours, and escalation flows to email and chat
- Operator dashboards with drilldowns and trend widgets for duration and success rate
- Executive summary views and scheduled daily pass or fail reports
- Administrator guide, operator runbook, and troubleshooting checklist
- Knowledge transfer and training with recordings and slides
- Version control setup with peer review workflow and rollback plan
- Rollout plan for onboarding more databases and hosts
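
The Python simulator listed above could look roughly like the following sketch, which writes one log per scenario so the collectors and triggers can be exercised without touching a real backup. The log markers (`START`, `END`, `STATUS`, `ERROR`), the file names, and the function names are assumptions for illustration and do not reflect real Oracle log output.

```python
import os
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout
SCENARIOS = ("success", "error", "not_started")


def render_log(scenario, start, duration_s=600):
    """Return log text for one of the three simulated scenarios."""
    if scenario not in SCENARIOS:
        raise ValueError(f"unknown scenario: {scenario}")
    if scenario == "not_started":
        return ""  # the job never ran, so the log stays empty
    end = start + timedelta(seconds=duration_s)
    lines = [f"START {start.strftime(TS_FORMAT)}"]
    if scenario == "error":
        lines.append("ERROR simulated failure")
    lines.append(f"END {end.strftime(TS_FORMAT)}")
    lines.append("STATUS " + ("SUCCESS" if scenario == "success" else "ERROR"))
    return "\n".join(lines) + "\n"


def write_logs(directory, start):
    """Write one simulated log per scenario and return their paths."""
    paths = {}
    for scenario in SCENARIOS:
        path = os.path.join(directory, f"sim_{scenario}.log")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(render_log(scenario, start))
        paths[scenario] = path
    return paths
```

Pointing `write_logs` at a scratch directory that the collectors watch would give each trigger state a known input to react to.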