ServicePilot SaaS Documentation

Alerting

ServicePilot can alert users as soon as some event of importance occurs. It can also generate alerts proactively if a trend is likely to pass a threshold in the future. Alerts might also be held back if some event is expected to clear itself without requiring intervention.

By default, ServicePilot will present all data via its web interface but no alerts will be generated. To add alerts, new Alert Policies need to be configured. Note that alert policies are all independent of one another. Care is required when creating new alerts in order to avoid generating overlapping alerts that might notify users of the same issue multiple times.

To add alert policies, see the Policies documentation.

Each alert has three components:

  • A Condition defines what will trigger the alert.
  • A Delay indicates if the alert should be held back for a time or a number of similar events.
  • An Action to take when the alert conditions have been met and any delay has been handled.

Alert Condition

For an alert to trigger, certain conditions must be met. These conditions are associated with events that ServicePilot detects.

| Condition Type | Event |
|---|---|
| Resources | A change in status of a resource during a defined time period. |
| Objects | A change in status of an object during a defined time period. The objects triggering the alert can be filtered by name, class, view and if the alarms have been acknowledged for them. |
| Views | A change in status of a view during a defined time period. The views triggering the alert can be filtered by name, class and if the alarms have been acknowledged for them. |
| Indicators | A change in status of an individual indicator during a defined time period. The indicators triggering the alert can be filtered by name, object name, object class, view and if their object's alarms have been acknowledged. |
| SNMP Trap | An SNMP Trap or Notification has been received by ServicePilot during a defined time period. Traps can be categorized using SNMP Trap categorization rules before being filtered here by rule name, rule category, rule message, rule severity, enterprise OID, generic and specific type, sender IP address and agent IP address. |
| Syslog | A syslog message has been received by ServicePilot during a defined time period. Syslogs are filtered here by source IP address, severity, facility, host, description, tag, PID, message ID and data. |

Note: Operators may mark the alert statuses of resources, views and objects as acknowledged. Acknowledged elements can then be included in or excluded from Alert conditions and the Status views.

Alert Delay

Although all conditions of an alert might be met, the alert action will not be taken until the delay type has been considered.

| Delay Type | Use |
|---|---|
| No Delay | The action will be taken as soon as the conditions are met. |
| Action and ignore Condition for x Minutes | The action will be taken as soon as the conditions are met. However, the alert will not trigger again if the condition recurs within the Duration specified. This is useful for conditions that are likely to occur repeatedly in bursts but for which only one alert is needed. |
| Action after x Minutes if Condition still true | The action will be delayed by the Duration specified. Only if the conditions are still true after this delay will the action take place. This is useful for conditions that are expected to occur and recover by themselves. Only if the problem persists will the action be triggered. |
| Action after x Condition Hits during y Minutes | The action will only be triggered if the condition occurs the specified Number of times within the Duration specified. This is useful for things like bad password attempts received by syslog that would indicate a security breach attempt. |
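
The four delay types can be modelled as a small gate that is fed each condition evaluation and answers whether the action should fire. This is only an illustrative sketch of the behaviour described above, not ServicePilot's implementation: the class name, the mode strings and the choice to fire once per persistent problem are all assumptions made for the example. Times are plain numbers of minutes.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Optional


@dataclass
class DelayGate:
    """Hypothetical model of the delay types; all names are illustrative."""
    mode: str = "none"          # "none", "ignore_for", "after_if_still_true", "hits_within"
    minutes: float = 0.0        # the Duration
    hits: int = 1               # the Number of hits (hits_within only)
    last_action: Optional[float] = None
    true_since: Optional[float] = None
    fired: bool = False
    recent: Deque[float] = field(default_factory=deque)

    def observe(self, now: float, condition_met: bool) -> bool:
        """Feed one evaluation at time `now` (minutes); return True if the action fires."""
        if self.mode == "after_if_still_true":
            if not condition_met:
                # The problem cleared by itself: forget it and re-arm.
                self.true_since, self.fired = None, False
                return False
            if self.true_since is None:
                self.true_since = now
            if not self.fired and now - self.true_since >= self.minutes:
                self.fired = True   # assumption: fire once per persistent problem
                return True
            return False
        if not condition_met:
            return False
        if self.mode == "none":
            return True
        if self.mode == "ignore_for":
            if self.last_action is not None and now - self.last_action < self.minutes:
                return False        # still inside the quiet period
            self.last_action = now
            return True
        if self.mode == "hits_within":
            self.recent.append(now)
            # Drop hits that fall outside the sliding window.
            while self.recent and now - self.recent[0] > self.minutes:
                self.recent.popleft()
            if len(self.recent) >= self.hits:
                self.recent.clear()
                return True
            return False
        raise ValueError(f"unknown delay mode: {self.mode}")
```

For example, `DelayGate(mode="hits_within", hits=3, minutes=10)` stays silent for the first two bad password events and fires on the third if all three arrive within ten minutes.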

Alert Action

A number of different actions may be taken.

| Action Type | Description |
|---|---|
| Email | Send an email. |
| Webhook | Send a web GET or POST request. |
| UDP | Send a UDP packet. If the UDP packet is formatted correctly and sent to the correct port, it can serve as a syslog message. |
| Trap | Send an SNMP Trap. |
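
To illustrate what "formatted correctly" can mean for a UDP packet that is meant to be read as syslog, the sketch below builds a message in the traditional BSD syslog layout (RFC 3164), where the leading priority value is `facility * 8 + severity`, and sends it over UDP. Port 514 is the conventional syslog UDP port. The function names, host name and tag are invented for the example, and the sketch does not show how ServicePilot itself builds the packet.

```python
import socket
from datetime import datetime


def build_syslog_message(facility: int, severity: int, host: str,
                         tag: str, text: str, when: datetime) -> str:
    """Return an RFC 3164 style line: <PRI>Mmm dd hh:mm:ss host tag: text."""
    if not 0 <= facility <= 23 or not 0 <= severity <= 7:
        raise ValueError("facility must be 0-23 and severity 0-7")
    priority = facility * 8 + severity
    # RFC 3164 pads single-digit days with a space, not a zero.
    timestamp = f"{when.strftime('%b')} {when.day:2d} {when.strftime('%H:%M:%S')}"
    return f"<{priority}>{timestamp} {host} {tag}: {text}"


def send_udp(message: str, address: str, port: int = 514) -> None:
    """Send the message as a single UDP datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), (address, port))


# Hypothetical usage: facility 16 (local0), severity 2 (critical).
# send_udp(build_syslog_message(16, 2, "sp-server", "servicepilot",
#                               "Ping not responding to router1",
#                               datetime.now()), "192.0.2.10")
```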

Alert variables

When an alert is triggered, information is stored that can then be used in the alert action. An email subject might therefore contain the object name that triggered the alert or a UDP syslog message might include the time at which the event occurred.

Some variables are common to all alert conditions, while others are only available for particular condition types. For example, the value of an indicator that has crossed a threshold is only available for indicator condition alerts.

Common variables

Common information is collected for all alerts.

| Variable | Content |
|---|---|
| {DATE} | Alert date based on the ServicePilot server's local French time |
| {TIME} | Alert time based on the ServicePilot server's local French time |
| {DATEUTC} | Alert date in UTC |
| {TIMEUTC} | Alert time in UTC |
| {BASEURL} | Base URL of ServicePilot |
| {LOCALIP} | IP address of ServicePilot |
| {LOCALWEBPORT} | Web port of ServicePilot |

Variables based on condition

These variables are only available depending on the Alert policy condition.

| Condition | Variable | Content |
|---|---|---|
| Resource, View, Object, Indicator | {RESOURCE} | The resource name |
| Resource, View, Object, Indicator | {PACKAGE} | The package type of the resource |
| Resource, View, Object, Indicator | {STATUS} | The current status of the resource, view or object as a character (?,-,1,2,3,+) |
| Resource, View, Object, Indicator | {STRSTATUS} | The current status of the resource, view or object as text (unknown,unavailable,minor,major,critical,ok) |
| Resource, View, Object, Indicator | {OLDSTATUS} | The previous status of the resource, view or object as a character (?,-,1,2,3,+) |
| Resource, View, Object, Indicator | {STROLDSTATUS} | The previous status of the resource, view or object as text (unknown,unavailable,minor,major,critical,ok) |
| View, Object, Indicator | {CLASS} | The type of view or object |
| View, Object, Indicator | {VIEW} | The view name |
| View, Object, Indicator | {PARENTVIEW} | The view above the view that triggered the alert |
| View, Object, Indicator | {PROBLEMNOTE} | An operator-entered problem note |
| View, Object, Indicator | {OBJECT_1} ... {OBJECT_5} | The content of the view or object constants 1 through 5 |
| View, Object, Indicator | {VIEW_0} ... {VIEW_9} | The name of the views from level 0 to 9 under which this view is found |
| View, Object, Indicator | {DURATION} | The time during which the view or object has been in the current state |
| View, Object | {TEXT} | A text reason for the latest change of state of a view or object |
| Object, Indicator | {OBJ} | The object name |
| Object, Indicator | {IP} | The IP address of the object |
| Object, Indicator | {HOST} | The FQDN or IP address of the object, depending on how the resource was configured |
| Indicator | {INDICATORSTATUS} | The current status of the indicator as a character (?,-,1,2,3,+) |
| Indicator | {INDICATOROLDSTATUS} | The previous status of the indicator as a character (?,-,1,2,3,+) |
| Indicator | {INDICATORNAME} | The name of the indicator |
| Indicator | {INDICATORVALUE} | The current value of the indicator |
| SNMP Trap | {TRAPNAME} | The trap rule name |
| SNMP Trap | {TRAPCATEGORY} | The trap rule associated category |
| SNMP Trap | {TRAPSEVERITY} | The trap rule associated severity |
| SNMP Trap | {TRAPMESSAGE} | The trap rule associated message |
| SNMP Trap | {TRAPIPSENDER} | The IP address of the sender of the trap |
| SNMP Trap | {TRAPIPAGENT} | The IP address of the SNMP Agent that originally sent the trap |
| SNMP Trap | {TRAPALLOIDVALUES} | All content of the trap OID values received |
| SNMP Trap | {TRAPOID1} ... {TRAPOID20} | The trap OID variable name 1 through 20 |
| SNMP Trap | {TRAPVALUE1} ... {TRAPVALUE20} | The trap OID variable value 1 through 20 |
| Syslog | {TIMESTAMP} | The timestamp found in the syslog |
| Syslog | {HOST} | The host found in the syslog |
| Syslog | {IP} | The IP address from which the syslog was received |
| Syslog | {PID} | The PID found in the syslog |
| Syslog | {TAG} | The Tag found in the syslog |
| Syslog | {TEXT} | The text of the syslog |
| Syslog | {DESCRIPTION} | The text of the syslog after all of the named components have been parsed |
| Syslog | {FACILITY} | The syslog Facility |
| Syslog | {SEVERITY} | The syslog Severity |
| Syslog | {MSGID} | The Message ID found in the syslog |
| Syslog | {DATA} | The structured data found in the syslog |
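
The way these variables expand inside an email subject, webhook or UDP message can be pictured as simple placeholder substitution. The sketch below is a stand-alone illustration, not ServicePilot code. The `render` function and `STATUS_TEXT` table are names invented here. The mapping from status characters to status text assumes the two lists above are given in matching order, and leaving an unavailable variable untouched is also an assumption, since this page does not say what ServicePilot does in that case.

```python
import re

# Assumed pairing of {STATUS} characters with {STRSTATUS} text, by position.
STATUS_TEXT = {
    "?": "unknown", "-": "unavailable", "1": "minor",
    "2": "major", "3": "critical", "+": "ok",
}

# Matches placeholders such as {OBJ}, {VIEW_0} or {TRAPOID12}.
PLACEHOLDER = re.compile(r"\{([A-Z0-9_]+)\}")


def render(template: str, variables: dict) -> str:
    """Replace each {NAME} with its value; unknown names are left as-is."""
    def substitute(match):
        return str(variables.get(match.group(1), match.group(0)))
    return PLACEHOLDER.sub(substitute, template)


# Hypothetical values for an object condition alert.
alert = {"OBJ": "router1", "DATE": "05/03/2024", "TIME": "14:07:09", "STATUS": "3"}
alert["STRSTATUS"] = STATUS_TEXT[alert["STATUS"]]

subject = render("(ServicePilot) {STRSTATUS}: Ping not responding to {OBJ}", alert)
body = render("Ping not responding to {OBJ} at {DATE} {TIME}", alert)
```

Because `{INDICATORVALUE}` is not defined for an object condition, `render("{OBJ} at {INDICATORVALUE}", alert)` would keep that placeholder unchanged in this sketch, which makes a misplaced variable easy to spot in the delivered message.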

Acknowledge status changes

When elements in ServicePilot change status and become unavailable or have a performance issue, the affected objects, views and resources will reflect this problem. It is possible to acknowledge the issue so that it may be discounted in the Status views and when matching alerting conditions. Acknowledging an issue will not change its status or hide the problem, but a note will be visible against the acknowledged element.

If the issue is cleared and the elements become available and nominal then the acknowledgement will disappear. This may be a problem for elements that continually change between nominal and a bad status as an acknowledgement will not be maintained. In this case, a Note may be added instead as this will not be removed automatically.
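
The difference between an acknowledgement and a Note for a flapping element can be shown with a toy model. It is only a sketch of the behaviour described above: the `Element` class and its fields are hypothetical, and "ok" stands in for the nominal status.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    """Hypothetical stand-in for a resource, view or object."""
    name: str
    status: str = "ok"
    acknowledged: bool = False
    note: Optional[str] = None

    def set_status(self, new_status: str) -> None:
        self.status = new_status
        if new_status == "ok":
            # Returning to nominal clears the acknowledgement, never the note.
            self.acknowledged = False


link = Element("wan-link")
link.set_status("critical")
link.acknowledged = True
link.note = "Carrier ticket open"

link.set_status("ok")        # issue clears: acknowledgement is dropped
link.set_status("critical")  # issue returns: element is unacknowledged again
```

After the second failure `link.acknowledged` is `False` while `link.note` still holds the operator's text, which is why a Note suits elements that keep changing state.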

Access Acknowledge/Note object from the map

  1. As a user with operator privileges, navigate the Map until the object you wish to acknowledge/note is open
  2. Click on the Acknowledge or Note button

Access Acknowledge/Note view from the map

  1. As a user with operator privileges, navigate the Map until the view you wish to acknowledge/note is open
  2. Click on the View information icon
  3. Click on the Acknowledge or Note button

Access Acknowledge/Note from status lists

  1. As a user with operator privileges, navigate to Status
  2. Select Resource, Object or View from the Status sub-menu depending on the component you wish to acknowledge/note
  3. Select one or more elements to acknowledge or note and click on the green Acknowledge or blue Note button

Filter Acknowledged elements

Once the acknowledgement note has been added you can use the Exclude ManualAck and Only ManualAck filters in the Status views.

When creating Alert policies with Object or View conditions, the Ack field can be set to include or exclude acknowledged elements.

Alerting examples

Receive an email when any Ping does not respond

To receive emails when a ping no longer responds, an alert policy is required.

  1. Add a new policy and set the type to Alert
  2. Set the alert policy name appropriately. For example: alert_ping_no_response_email
  3. Check Apply this Policy to the entire configuration so that this will apply to all Ping objects in the configuration
  4. In the Condition tab, set the Condition type to Object
  5. Set the From status to all colors except red
  6. Set the To status to only red
  7. Set the Filter Classes to Ping
  8. In the Action tab, set the Action type to email
  9. Set the From address and set the To email addresses (semi-colon separated) as required
  10. Set the Subject. For example: (ServicePilot) Ping not responding to {OBJ}
  11. Set the Message. For example: Ping not responding to {OBJ} at {DATE} {TIME}
  12. Save the new policy

To send this alert for only part of the configuration, do not apply the policy to the entire configuration. Instead, apply it to a view or to a number of individual resources.
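
The From status, To status and Filter Classes settings in this example amount to a test on a status transition. The sketch below expresses that test in a few lines. The `ObjectCondition` class is invented for illustration, and the colour set is simply the six colours named on this page, which is assumed here to be the complete list.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Colours mentioned in this document; assumed to be the full set.
COLORS = frozenset({"gray", "green", "blue", "yellow", "purple", "red"})


@dataclass(frozen=True)
class ObjectCondition:
    """Hypothetical model of an Object condition."""
    from_status: FrozenSet[str]
    to_status: FrozenSet[str]
    filter_class: Optional[str] = None

    def matches(self, old: str, new: str, object_class: str) -> bool:
        if self.filter_class is not None and object_class != self.filter_class:
            return False
        return old in self.from_status and new in self.to_status


# From: all colours except red. To: only red. Class: Ping.
ping_down = ObjectCondition(COLORS - {"red"}, frozenset({"red"}), "Ping")
```

With this condition, a Ping object going from green to red matches, while one that was already red does not match again, so an unchanged outage is not re-reported by the condition itself.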

Alert when a hard disk goes over a usage threshold

To obtain notifications when a hard disk volume passes the major or critical space usage threshold, add a new alert policy.

  1. Add a new policy and set the type to Alert
  2. Set the alert policy name appropriately. For example: alert_disk_space_usage_high
  3. Check Apply this Policy to the entire configuration so that this will apply to all Server Disk objects in the configuration
  4. In the Condition tab, set the Condition type to Indicators
  5. Set the From status to gray, green and blue
  6. Set the To status to yellow and purple
  7. Set the Filter Classes to Server Disk
  8. Set the Filter Indicator to Space Usage
  9. Save the new policy

With the Condition set to the Indicators type, the Indicator name and current values can be used in the action. For example: {STRSTATUS} disk alert: {OBJ} usage at {INDICATORVALUE}

Alert when resources on a site become unavailable out of office hours

To obtain an alert outside office hours, start by creating Time periods that define the out-of-office-hours timespans. Then include these Time periods in the new alert policy.

  1. Add a new Time period with name Out of hours 1
  2. Set the Ranges to 00:00 - 09:00 and 18:00 - 23:59 from Monday to Friday
  3. Add a second new Time period with name Out of hours 2
  4. Set the Ranges to 00:00 - 23:59 for Saturday and Sunday
  5. Save the new Time periods
  6. Add a new policy and set the type to Alert
  7. Set the alert policy name appropriately. For example: alert_ooh_site_resource_unavailable
  8. In the Condition tab, set the Condition type to Resources
  9. Set the Alerting time period to Out of hours 1|Out of hours 2
  10. Set the From status to all colors except red
  11. Set the To status to only red
  12. Set the action as required
  13. Save the new policy
  14. Apply this new policy to the view called Sites to affect all resources in this view and sub-views.
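
The two Time periods defined above can be checked against a timestamp as follows. This is a sketch for reasoning about the ranges, not a description of how ServicePilot evaluates them: the data layout and function names are invented, range ends are treated as inclusive to the minute, and the `|` in the Alerting time period field is assumed to mean "either period".

```python
from datetime import datetime, time

# Each entry: (weekdays with Monday=0, range start, range end).
OUT_OF_HOURS_1 = [
    (range(0, 5), time(0, 0), time(9, 0)),
    (range(0, 5), time(18, 0), time(23, 59)),
]
OUT_OF_HOURS_2 = [
    (range(5, 7), time(0, 0), time(23, 59)),
]


def in_period(when: datetime, period) -> bool:
    """True if `when` falls inside any range of the period."""
    # Compare at minute resolution so that 23:59:30 still counts as 23:59.
    minute = when.time().replace(second=0, microsecond=0)
    return any(when.weekday() in days and start <= minute <= end
               for days, start, end in period)


def in_any_period(when: datetime, periods) -> bool:
    return any(in_period(when, period) for period in periods)


out_of_hours = [OUT_OF_HOURS_1, OUT_OF_HOURS_2]
monday_early = in_any_period(datetime(2024, 3, 4, 8, 30), out_of_hours)
monday_noon = in_any_period(datetime(2024, 3, 4, 12, 0), out_of_hours)
saturday_noon = in_any_period(datetime(2024, 3, 9, 12, 0), out_of_hours)
```

Here 4 March 2024 is a Monday, so the 08:30 check falls in Out of hours 1, the 12:00 check falls in neither period, and the Saturday check falls in Out of hours 2.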
