High availability (HA) has been supported in Zabbix since version 6.0, but I had never actually tested it myself.
According to most information available online, Zabbix HA provides redundancy only for the Zabbix server process itself; it does not include database redundancy or VIP-based failover.
Back in the Zabbix 2.0 era I built a Zabbix cluster using Pacemaker and DRBD. This time, I decided to test how the built-in HA feature really works in practice.
Many articles explain how to enable Zabbix HA. However, very few explain what actually happens internally when a failover occurs.
In this article, I focus on the actual behavior observed during testing, rather than configuration steps.
Although Zabbix HA has been available since version 6.0, many explanations still implicitly assume traditional HA concepts such as VIPs or heartbeat communication between nodes.
By isolating the web frontend and closely observing failover behavior, this article clarifies what Zabbix HA actually does — and what it does not do.
Test Environment
- OS: Rocky Linux 9
- Zabbix version: 7.4
- Number of nodes: 2 (EC2 instances)
- Web server deployment: separated from the Zabbix servers (EC2 instance running Apache)
- Database: Amazon Aurora and Amazon RDS (MySQL Community Edition)
Initial Assumptions
Before testing Zabbix HA, I assumed that the Zabbix Web frontend communicates directly with the active Zabbix server. However, after testing, I found that this assumption was incorrect.
Why I Separated the Web Server
In most Zabbix deployments, the web frontend is installed on the same host as the Zabbix server. Because of this, the actual behavior of the HA feature is not always easy to observe.
To better understand how Zabbix HA really works, I decided to separate the web server from the Zabbix servers and test the behavior in this setup.
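For reference, enabling the built-in HA itself took only two parameters per node in /etc/zabbix/zabbix_server.conf. The sketch below uses this test's private addresses; the node names are arbitrary labels I chose:

# /etc/zabbix/zabbix_server.conf on ZABBIX-1
HANodeName=zabbix-node-1
NodeAddress=172.31.32.199:10051

# /etc/zabbix/zabbix_server.conf on ZABBIX-2
HANodeName=zabbix-node-2
NodeAddress=172.31.35.136:10051

NodeAddress is the address and port that the web frontend uses to reach a node while it is active.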
What I Observed
Contrary to my initial assumption, I observed the following:
- The HA feature provides redundancy only for the Zabbix server processes; consequently, a single web server is sufficient if frontend redundancy is not required.
- The Zabbix web frontend retrieves monitoring data directly from the database.
- The web frontend dynamically follows the active Zabbix server when displaying server status.
The following diagram summarizes the actual communication flow observed during testing.
Figure 1: Zabbix HA architecture observed during testing. No VIP or heartbeat is used; coordination is done entirely via the database.
Failover Behavior During Testing
You can check the cluster status under Reports > System Information (the same information is also available as a dashboard widget). The address of the currently active node is displayed there; in the example above, the address of the ZABBIX-1 node is shown because its Zabbix server process is the active one.
Checking the zabbix-server process:
[rocky@ZABBIX-1 ~]$ sudo systemctl status zabbix-server
Note that systemctl reports the service as active (running) on both nodes: the standby node keeps its zabbix-server process running as well, but it stays idle until it is promoted. You can also check the cluster status with the zabbix_server -R ha_status runtime command.
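A few related runtime commands, for reference (the ha_status report is written to the Zabbix server log file, not to stdout):

[rocky@ZABBIX-1 ~]$ sudo zabbix_server -R ha_status
[rocky@ZABBIX-1 ~]$ sudo zabbix_server -R ha_set_failover_delay=30s
[rocky@ZABBIX-1 ~]$ sudo zabbix_server -R ha_remove_node=<node id>

ha_set_failover_delay accepts values between 10 seconds and 15 minutes (the default failover delay is 1 minute), and ha_remove_node takes the node ID that ha_status prints.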
I confirmed that data from the monitored hosts continued to be collected before and after the zabbix-server failover. After the failover, System Information once again showed "Zabbix server is running: Yes", now with the address of the newly active node.
At first, I configured one web server and one Zabbix server without HA. In this case, I had to set the address and port of the Zabbix server in /etc/zabbix/web/zabbix.conf.php:
$ZBX_SERVER = '172.31.32.199';
$ZBX_SERVER_PORT = '10051';
If these are not set, the frontend attempts to connect to a zabbix-server on localhost and reports an error.
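With HA enabled, the documented approach is the opposite: leave these parameters undefined, and the frontend looks up the active node's address (its NodeAddress) in the ha_node table. A minimal sketch of the same file in HA mode:

// /etc/zabbix/web/zabbix.conf.php (HA cluster)
// Leave $ZBX_SERVER and $ZBX_SERVER_PORT unset; the frontend
// discovers the currently active node from the ha_node table.
// $ZBX_SERVER      = '';
// $ZBX_SERVER_PORT = '';

This is precisely why the frontend can follow the active server dynamically, as observed above.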
Zabbix HA determines node availability solely by monitoring updates to the ha_node.lastaccess field in the database. If the active node fails to update its timestamp within the configured failover_delay, a standby node promotes itself to active. No direct node-to-node communication is involved.
Each node independently evaluates the cluster state by reading the ha_node table; there is no heartbeat connection between the nodes themselves. The status column records the state of each node: 0 = standby, 1 = stopped manually, 2 = unavailable, 3 = active.
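You can watch this coordination directly in the database. A quick check from any host that can reach it (a sketch; substitute your own Aurora/RDS endpoint, and note that zabbix is only the default schema and user name):

[rocky@ZABBIX-1 ~]$ mysql -h <db-endpoint> -u zabbix -p zabbix \
    -e "SELECT name, address, port, status, lastaccess FROM ha_node;"

The lastaccess timestamp of the active node keeps advancing, while that of a failed node freezes; this is exactly what the standby nodes watch for.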
Zabbix HA avoids split brain by using the database as the single coordination point. Only the node that successfully updates its HA status in the database can become active. Because all failover decisions and state transitions are serialized through database transactions, simultaneous active nodes cannot occur.
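As a conceptual illustration only (this is not Zabbix's literal SQL), a standby node's promotion can be pictured as a guarded transaction. The 60-second failover delay and the node name zabbix-node-2 below are assumptions taken from this test setup:

-- Conceptual sketch of a standby node's promotion attempt.
BEGIN;
-- Demote the active node only if its heartbeat is stale. InnoDB row
-- locking serializes concurrent standbys here, so at most one
-- transaction finds a row with status = 3 to update.
UPDATE ha_node
   SET status = 2                               -- 2 = unavailable
 WHERE status = 3                               -- 3 = active
   AND lastaccess < UNIX_TIMESTAMP() - 60;      -- failover_delay (assumed 60 s)
-- A node would promote itself only if the statement above actually
-- demoted the old active node (e.g. by checking the affected-row
-- count); nodes that lost the race abort instead.
UPDATE ha_node
   SET status = 3
 WHERE name = 'zabbix-node-2';                  -- this node's HANodeName
COMMIT;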
zabbix-agent
Since there is no VIP, the agent on each monitored host must know the addresses of both HA nodes. In /etc/zabbix/zabbix_agentd.conf:
Server=172.31.32.199,172.31.35.136
ServerActive=172.31.32.199;172.31.35.136
For Server (passive checks), the comma-separated addresses act as an allow-list, so whichever node is active may connect. For ServerActive, note that the Zabbix documentation specifies separating HA cluster nodes with a semicolon rather than a comma; the agent then delivers active-check data to whichever node is currently active.
Conclusion
Zabbix HA does not rely on VIPs or direct node-to-node communication. Instead, availability is coordinated through the database, while the web frontend dynamically follows the active server.
Understanding this behavior makes it clear that Zabbix HA is designed to protect server processes — not to provide full-stack redundancy. This distinction is essential when designing real-world Zabbix architectures.