I’m trying to knock up an alert that fires if an environment doesn't have the right number of agents running (we run multiple environments from a single Logscape server, separating environments using tags).
Unfortunately, I’m struggling to define a search/alert combo that fires when the number of agents drops. I could search for the log entries used by the ‘Agent Joined Left’ search on the agent audit page. (By the way, it would be good if the series on that page were named ‘<hostname>_<left/joined>’ rather than ‘Host_<hostname>’.) But that would mean the alert only fires once per lost agent, and what I’d like is for it to keep alerting for as long as the agent count is lower than it should be. I’d also like a per-environment alert rather than one alert for all environments. We often have environments going down for maintenance, so a single alert would misfire so often as to be useless.
I can easily enough set up a search that counts unique agents (or rather hosts), e.g. ‘ | _host.countUnique(,agentCount) chart(line-zero)’, which can be run per environment. What I'm struggling with is setting up an alert that fires when this aggregate drops below a certain threshold.
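
To make the requirement concrete, here is a rough Python sketch of the logic I'm after. It is not Logscape syntax and uses no Logscape API. The environment names, expected counts, the `in_maintenance` parameter and the function names are all made up for illustration. The idea is that the check is re-run on every evaluation window, so an environment keeps being reported for as long as its unique host count is below the expected number.

```python
from collections import defaultdict

# Hypothetical expected agent counts per environment tag (illustrative values).
EXPECTED_AGENTS = {"dev": 3, "uat": 2}


def count_unique_hosts(events):
    """Count distinct hosts per environment within one evaluation window.

    `events` is an iterable of (environment, host) pairs, standing in for
    whatever the per-environment unique-host search would return.
    """
    hosts = defaultdict(set)
    for env, host in events:
        hosts[env].add(host)
    return {env: len(seen) for env, seen in hosts.items()}


def environments_below_threshold(events, expected, in_maintenance=()):
    """Return {env: (actual, expected)} for every environment short of agents.

    Called once per window, so a short environment is reported on every call
    until its count recovers. Environments listed in `in_maintenance`
    (a hypothetical opt-out) are skipped.
    """
    counts = count_unique_hosts(events)
    short = {}
    for env, want in expected.items():
        if env in in_maintenance:
            continue
        have = counts.get(env, 0)  # no events at all means zero agents seen
        if have < want:
            short[env] = (have, want)
    return short


# One window of made-up data: "dev" reports only two distinct hosts.
window = [
    ("dev", "host1"), ("dev", "host2"), ("dev", "host1"),
    ("uat", "host7"), ("uat", "host8"),
]
print(environments_below_threshold(window, EXPECTED_AGENTS))
# prints {'dev': (2, 3)}
```

Each environment is checked against its own threshold, so taking one environment down for maintenance doesn't affect the alerts for the others.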