Here you will find more detailed information on how monitors bring you the most relevant notifications and how they update the data on your dashboard and topology. (For a first-look overview of monitors and health, see this topic.)
Agent metrics, availability checks, and AWS metrics are fed into our monitors. Many monitors are created automatically when certain services are detected; the default configuration settings vary depending on which service was auto-detected. You can edit these default monitors or create your own from scratch. Whenever a monitor's criteria are met, a notification is triggered and the displayed topological health of that service is affected.
There are five monitor types:
- Availability (Port)
- Availability (HTTP)
- Violation (Threshold)
- Violation (Moving Average)
- Statistical Anomaly
The fastest way to access the monitor customization menus is to click a service in the topological view, then click "Configure" in the details panel:
The rightmost "Configure Monitors" tab will be selected on the popup that appears. You can select a monitor type to add by clicking on the "Choose Type" box:
Port monitors simply reach out to a specified port with a ping request X number of times in a 30-second window. If the ping failure ratio reaches the designated threshold, the monitor sends a notification via the chosen method. You can add any number of port monitors to your custom service configurations. Keep in mind that port monitors are applied at the host level, to every host underlying the service:
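The mechanics of a port monitor can be sketched as follows. This is an illustrative approximation only: it probes a TCP port a fixed number of times and computes the failure ratio against a threshold. The function name, parameters, and return shape are hypothetical, not the product's API.

```python
import socket

def check_port(host, port, attempts=5, timeout=2.0, failure_threshold=0.5):
    """Probe a TCP port `attempts` times within the monitoring window and
    report unhealthy when the failure ratio reaches `failure_threshold`.
    (All names here are hypothetical, for illustration only.)"""
    failures = 0
    for _ in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass  # connection succeeded; socket closed on exit
        except OSError:
            failures += 1
    ratio = failures / attempts
    return {"healthy": ratio < failure_threshold, "failure_ratio": ratio}
```

In the product, this kind of probe runs against every host backing the service, which is why a single unhealthy host can affect the service's displayed health.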
HTTP monitors are a second way to observe the health of a service: they repeatedly check a specified URL for responses within a 30-second time frame. Again, set the thresholds and notifications as needed:
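A single HTTP probe of this kind might look like the sketch below, which fetches a URL and compares the response status to an expected value. The function name and return shape are assumptions for illustration; the real monitor repeats this check on its own schedule.

```python
from urllib.request import urlopen
from urllib.error import URLError

def check_http(url, expected_status=200, timeout=5.0):
    """Fetch `url` once and report health based on the response status.
    (Hypothetical sketch; the product repeats this within each window.)"""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return {"healthy": resp.status == expected_status,
                    "status": resp.status}
    except URLError as exc:
        # Network or DNS failure counts as an unhealthy check.
        return {"healthy": False, "error": str(exc)}
```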
The metric monitor is a logically qualified observer for your service that will generate health notifications and events based on almost any raw data metric that exists in the system. While many metric monitors are created and configured automatically upon service detection, you also have the option to create your own using this interface:
The first field can have the values MEAN, MIN, MAX, or SUM. The next field contains an exhaustive list of metrics for the particular service and for the system in general. You can then specify how many minutes constitute a window, select >, >=, <=, or <, and enter a number against which to compare the metric. Finally, you can classify the triggered event as critical or merely as a warning:
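The evaluation these fields describe can be sketched as: aggregate the window's samples with the chosen function, compare the result to the threshold with the chosen operator, and emit an event at the chosen severity. The function and field names below are hypothetical, not the product's internals.

```python
import operator
from statistics import mean

# The four aggregators and four comparators offered by the form.
AGGREGATORS = {"MEAN": mean, "MIN": min, "MAX": max, "SUM": sum}
COMPARATORS = {">": operator.gt, ">=": operator.ge,
               "<=": operator.le, "<": operator.lt}

def evaluate_threshold(samples, aggregator, comparator, threshold,
                       severity="warning"):
    """Aggregate one window of metric samples and compare against the
    threshold. Returns a triggered event dict, or None when the condition
    is not met. (Names are hypothetical, for illustration.)"""
    value = AGGREGATORS[aggregator](samples)
    if COMPARATORS[comparator](value, threshold):
        return {"severity": severity, "value": value}
    return None
```

For example, `evaluate_threshold([1, 2, 3], "MEAN", ">", 1.5, severity="critical")` would trigger a critical event because the window mean (2) exceeds 1.5.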
The statistical metric monitor lets you detect a variation in any metric without setting any static thresholds. All you have to do is pick one of the following anomaly detection models that OpsClarity provides.
- General - Latency, RequestCount, QueueSize, ErrorCount, ConnectionCount
- System - DiskFree
- Kafka - MaxLag, MinFetch
- Java - JavaMemory, JavaCollectionCount, JavaCollectionTime
- RabbitMQ - MessagesDelivered, MessagesInQueue, MessagesPublished
- Twemproxy - ServerRequests, ServerResponses, InQueue, OutQueue
- AWSSQS - MessagesReceived, QueueSize, MessagesSent
OpsClarity Statistical Metric Monitors are a special case of statistical metric monitors that ship out of the box for a given integration.
Once triggered, the model must remain in a healthy state for at least one hour before the alert is cleared. This prevents the model from flapping:
How Monitors Affect Health
All the metric data from our agents and AWS integrations is fed into the monitors. The ongoing monitor results are then synthesized to produce an overall health status for the service, as well as an overall health status for the underlying host(s).
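One simple way to picture this synthesis is a worst-severity roll-up: the service's overall status is the most severe status reported by any of its monitors. This is an assumption for illustration; the product's actual roll-up logic is internal.

```python
# Rank statuses from least to most severe (hypothetical sketch).
SEVERITY_RANK = {"healthy": 0, "warning": 1, "critical": 2}

def overall_health(monitor_results):
    """Synthesize individual monitor statuses into one overall status by
    taking the worst severity. An empty result set is treated as healthy."""
    if not monitor_results:
        return "healthy"
    return max(monitor_results, key=lambda status: SEVERITY_RANK[status])
```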
In depth configuration instructions for newly added monitors can be found here.
If you have any questions or comments about this article, feel free to contact us at firstname.lastname@example.org.