Sunday, August 3, 2014

Yet Another Zabbix review



This will be yet another Zabbix review, but I feel it is worth documenting and showing some very nice things that I managed to do in my environment in very little time using this tool. Not that they can't be done with similar tools, but Zabbix seems to concentrate the advantages of many of them in one, and everything you want to do feels pretty natural and most of the time won't need documentation to get done. You just figure things out as you need the features.

However, I will try to complement the other reviews I have read and focus on the features I used and how interesting they are. The overview is that it does availability monitoring as well as Nagios would, and it reduces the need for Cacti because nearly anything it monitors can be plotted. It is not as precise as Cacti, though.

I will start by mentioning one of the most interesting features, one that I have not seen in other software: it discovers your network. This is useful if you don't want to register every single node by hand or if you want to detect intruders. It is worth noting that although I have never heard of it in Nagios, it is a very common feature in proprietary tools, as I hear from colleagues working in places that are willing to pay for those.

You can also control how often the discovery runs so your hosts don't become overloaded. This is especially necessary on big enough networks; mine, with 300+ hosts, needed some tuning. In addition, there is an alternative setup in which hosts register themselves: whenever a Zabbix agent starts on a host you want to monitor, the host is automatically added to the groups of interest X, Y or Z, depending on rules you define in the registration action.

This is also possible with normal discovery, but the set of parameters you can play with is smaller, as discovery is based on ICMP or on the open ports of basic services.

The best setup I found was to discover hosts on the public network and to let the hosts in the private network add themselves when their Zabbix agents start, right after Puppet configures them automatically. They are also added to a special host group that already gets the monitoring templates I want.
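
To make the idea of registration rules a bit more concrete, here is a minimal sketch in Python of how such rules map a host to groups. This is not Zabbix code: the `Rule` class, the metadata strings and the group names are all made up for illustration, and in Zabbix itself you define these rules in the web interface.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical rule: if the agent's metadata contains `needle`, add the host to `group`."""
    needle: str
    group: str


def groups_for(metadata: str, rules: list[Rule]) -> list[str]:
    """Return the host groups whose rule matches the metadata string (case-insensitive)."""
    lowered = metadata.lower()
    return [rule.group for rule in rules if rule.needle.lower() in lowered]


# Illustrative rules only; the names are assumptions, not taken from a real setup.
RULES = [
    Rule("transfer", "Transfer servers"),
    Rule("linux", "Linux servers"),
]

print(groups_for("Linux transfer-node", RULES))
```

A host that announces itself with metadata matching several rules simply ends up in several groups, and each group can carry its own templates.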

So this should tell you how easy it is to set up all the hosts you want to monitor in the first place. Now the interesting features. The Zabbix agent already monitors many aspects of your system automatically. It also lets you request additional parameters to be monitored, for example whether a process with a name like X is running, or whether port Y is open, and so on. No SSH checks needed for that :-)
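
The agent has built-in item keys for this kind of check (`proc.num` and `net.tcp.listen`, for instance), so you don't need to write anything yourself. Just to illustrate what a "is port Y open" check boils down to, here is a rough Python equivalent. The function name and the 1/0 return convention are my own choices for the sketch, not the agent's implementation.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> int:
    """Return 1 if a TCP connection to host:port succeeds, 0 otherwise."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return 1
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return 0


if __name__ == "__main__":
    # Example: check whether something is listening on the local SSH port.
    print(port_is_open("127.0.0.1", 22))
```

Note that the check returns a value rather than deciding whether that value is good or bad; the decision belongs to a trigger.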

Like Nagios, it supports SSH checks. It also supports SSH keys with passphrases, which slightly increases the security of your systems.

There is a fundamental difference: while Nagios expects an exit code from the script as the check status, Zabbix expects a value, which can be an integer, a string or a Boolean, and makes a decision about this value later, for example triggering an alarm if the value is bigger than 10.
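
The difference is easier to see side by side. The sketch below is plain Python, not real plugin code for either tool: the function names and the "queue length" metric are invented, and the threshold of 10 is the one from the example above.

```python
def nagios_style_check(queue_length: int) -> int:
    """Nagios style: the check itself decides the status and reports it as an exit code."""
    if queue_length > 10:
        return 2  # CRITICAL in the Nagios exit code convention
    return 0  # OK


def zabbix_style_item(queue_length: int) -> int:
    """Zabbix style: the check only reports the raw value; no decision is made here."""
    return queue_length


def trigger_fires(value: int, threshold: int = 10) -> bool:
    """The decision happens later, against the collected value."""
    return value > threshold


value = zabbix_style_item(12)
print(nagios_style_check(12), value, trigger_fires(value))
```

Because the raw value is kept, the same metric can feed several triggers with different thresholds, and it can be plotted as well.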

I should mention that every single metric it collects from a host can become a historical plot. Correlating these plots is also very easy. For example, if I want to look at how two values correlate, I can plot them together and see how they evolve. I could also plot the two values separately or add them up, and even more complex functions are possible for the end result. You can also create aggregated plots for an entire host group.

A real example: I have a pool of transfer servers and I want to know their total transfer rate. All I do is configure an aggregated plot for the entire host group, over the transfer rate of the public network interfaces.
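
What the aggregated plot computes is simply a sum across the group at each point in time. A small Python sketch of that arithmetic, with host names and numbers that are made up for the example (Zabbix does this for you on the server):

```python
from collections import defaultdict


def aggregate(series_by_host: dict[str, list[tuple[int, float]]]) -> dict[int, float]:
    """Sum the per-host samples that share a timestamp into one group-wide series."""
    totals: dict[int, float] = defaultdict(float)
    for samples in series_by_host.values():
        for timestamp, rate in samples:
            totals[timestamp] += rate
    return dict(sorted(totals.items()))


# Hypothetical transfer rates (MB/s) sampled at t=0s and t=60s.
samples = {
    "transfer01": [(0, 100.0), (60, 120.0)],
    "transfer02": [(0, 80.0), (60, 90.0)],
}

print(aggregate(samples))  # {0: 180.0, 60: 210.0}
```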

It also has a very straightforward proxy architecture for monitoring remote networks that you don't have direct access to. For basic setups, something close to the default configuration will be enough.

The proxy can then run SSH checks on the remote network for you, or receive metrics from all the hosts on that private network and forward them to the Zabbix server.

Another aspect I like is that the template system is really good. You can assign monitoring templates to given host groups and have changes applied to all hosts. You can very quickly create your own custom ones containing:


  • Metrics to monitor (SSH checks included)
  • Triggers (alarms) for these metrics
  • Plots for these metrics
  • Probably other things I'm forgetting now


That's it for now, but I will update the post (or create a new one) with some screenshots to illustrate all I'm saying here.
