We automate as much as we can. Our choice of tools has changed over the years, as we try to use the most modern tools that make it easier to monitor systems, detect problems, and fix them automatically (or notify the appropriate people at the appropriate time).
The switch to Icinga happened in late 2014, when we found that Zabbix, configured via zabconf, was not easy to maintain across automatically deployed clusters. Our choice of Icinga2 has proven to be a good one.
The great part about Icinga2 is that it stores most of its configuration as plain text files, which makes it convenient to configure each service and host from deployment scripts.
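Because the configuration is plain text, a deployment script can generate it directly. The following is only a minimal sketch of that idea in Python, not our actual deployment code: the function name render_host, the host name, the address, and the "role" variable are all made up for illustration, and the "generic-host" template is assumed to exist in the Icinga2 installation. Values are written as-is, without escaping quotes.

```python
def render_host(name, address, templates=("generic-host",), custom_vars=None):
    """Return an Icinga2 `object Host` definition as plain text.

    Hypothetical helper: values are inserted verbatim, so they must not
    contain double quotes.
    """
    lines = [f'object Host "{name}" {{']
    for template in templates:
        # Each template is pulled in with an `import` statement.
        lines.append(f'  import "{template}"')
    lines.append(f'  address = "{address}"')
    # Custom variables are emitted in sorted order for a stable file diff.
    for key, value in sorted((custom_vars or {}).items()):
        lines.append(f'  vars.{key} = "{value}"')
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example values only; a real script would take them from its inventory.
    print(render_host("web01.example.org", "192.0.2.10", custom_vars={"role": "web"}))
```

A deployment script could write the returned text to a file under the Icinga2 configuration directory and then reload the service.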
We monitor our web sites and services using Zabbix. Previously we used Nagios and Cacti together: Nagios checked service state and Cacti drew graphs. Zabbix does all of that and more. The most useful difference is that it lets us set thresholds on graphs, so that we get an alert when the average rate of a measurement crosses a threshold. This gives us everything Nagios and Cacti provided, in a single piece of software. All our customers get their own Zabbix account.
Log in to your Zabbix account here: http://zabbix.ua2web.com
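To illustrate what a threshold on an average means, here is a rough Python sketch. It is not how Zabbix implements triggers; the class name AverageThreshold, the window size, and the limit are assumptions chosen for the example. The point is that a single spike does not raise an alert, while a sustained high average does.

```python
from collections import deque


class AverageThreshold:
    """Alert when the average of the last `window` samples exceeds `limit`."""

    def __init__(self, window, limit):
        # The deque drops the oldest sample automatically once it is full.
        self.samples = deque(maxlen=window)
        self.limit = limit

    def add(self, value):
        """Record a sample; return True if the windowed average is too high."""
        self.samples.append(value)
        # Stay quiet until a full window of samples has been collected.
        full = len(self.samples) == self.samples.maxlen
        return full and sum(self.samples) / len(self.samples) > self.limit


if __name__ == "__main__":
    check = AverageThreshold(window=3, limit=5.0)
    for load in [1, 9, 2, 10, 9]:
        print(load, check.add(load))
```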
Monitoring internal server errors is a very important task. We should see the first signs of any problem before the customer reports it to us or, worse, before end users do.
Sentry is a real-time software health monitor. It lets us monitor and count all the errors and problems in the software. It also solves the 'mail bomb' problem in Django, where a single error repeated many times amounts to a DoS attack on the mail server. Each type of error is saved just once, and a mail notification is sent once per error type.
Every project has its own set of permissions. Error reports are sent via TCP or UDP, and they can be relayed to external Sentry instances.
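The "saved once, notified once" behaviour can be sketched in a few lines of Python. This is an illustration of the idea only and not Sentry's actual grouping algorithm, which is more elaborate; the class ErrorAggregator, the choice of exception type plus code location as the grouping key, and the notify callback are all assumptions made for the example.

```python
import hashlib
from collections import Counter


class ErrorAggregator:
    """Group repeated errors by type and notify only on the first one."""

    def __init__(self, notify):
        self.notify = notify      # callable that delivers one notification
        self.counts = Counter()   # occurrences seen per error group

    @staticmethod
    def fingerprint(exc_type, location):
        # Assumed grouping key: exception type plus code location.
        return hashlib.sha1(f"{exc_type}|{location}".encode("utf-8")).hexdigest()

    def record(self, exc_type, location, message):
        key = self.fingerprint(exc_type, location)
        self.counts[key] += 1
        # Only the first occurrence of a group triggers a notification.
        if self.counts[key] == 1:
            self.notify(f"{exc_type} at {location}: {message}")
        return key


if __name__ == "__main__":
    sent = []
    errors = ErrorAggregator(sent.append)
    for _ in range(1000):
        errors.record("ValueError", "views.py:42", "bad input")
    print(len(sent), "notification(s) for", sum(errors.counts.values()), "errors")
```

With this approach a thousand repeats of the same error produce one mail and a counter, rather than a thousand mails.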
You can contact our sysadmin team at this address: firstname.lastname@example.org (note: as an anti-spam measure, we will randomly change this alias as soon as it starts receiving spam).