ServicePilot Agents collect data, or are sent data, to be added to the ServicePilot database. The ServicePilot configuration determines what needs to be retained and for how long. Depending on the data collected from the monitored source, further calculated statistics may also be stored.
Example: A server is queried to obtain disk size and disk bytes used. Using this information a disk usage percentage is calculated and stored in the database.
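The calculation in this example can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name `disk_usage_percent` and its parameters are hypothetical and are not part of ServicePilot.

```python
def disk_usage_percent(size_bytes: int, used_bytes: int) -> float:
    """Return the used space as a percentage of the total disk size.

    Hypothetical helper for illustration only. A size of zero (or less)
    yields 0.0 so that an empty or unreadable volume does not raise an error.
    """
    if size_bytes <= 0:
        return 0.0
    return 100.0 * used_bytes / size_bytes


# Two collected values (size and bytes used) produce one calculated statistic.
usage = disk_usage_percent(size_bytes=500_000_000_000, used_bytes=125_000_000_000)
print(f"Disk usage: {usage:.1f}%")
```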
Many of the statistics collected by ServicePilot are obtained by polling devices every minute to obtain their current value or status. The collected statistics are defined in the provisioned packages and are affected by the resource configuration and any policies applied to the resources.
A simple example of polled data is a Ping or ICMP Echo to an IP address. In the ServicePilot configuration, if a resource is added with a remote IP address to ping, then the returned data consists of the Response Time in milliseconds between the ServicePilot Agent and the remote IP address.
Once a minute the ServicePilot Agent will send out the Ping and wait for the response. If the ServicePilot Agent does not get a response then it will try a second time, within the same minute. If it does not get a response to either request then the ServicePilot object will be considered to be in a no response pending state. A second minute of polling will either succeed or fail. Based on the number of times that this needs to be confirmed, the object will transition to a no response status. This will trigger the object to be classified as unavailable.
The frequency of polling, the number of confirmations and the state into which the object is transitioned can all be modified by applying policies.
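The Ping behaviour described above can be pictured as a small state machine. The following Python sketch is an approximation under stated assumptions, not ServicePilot's implementation: the names `poll_cycle`, `PingObject` and `confirmations` are hypothetical, and it assumes that `confirmations` counts the extra failed polling cycles needed after the first failure.

```python
from typing import Callable


def poll_cycle(send_ping: Callable[[], bool], attempts: int = 2) -> bool:
    """Run one polling cycle: try up to `attempts` pings within the same minute.

    Returns True as soon as one attempt is answered, so the second ping is
    only sent when the first one gets no response.
    """
    return any(send_ping() for _ in range(attempts))


class PingObject:
    """Tracks the state of one pinged object across polling cycles (illustrative)."""

    def __init__(self, confirmations: int = 1):
        # Assumption: number of additional failed cycles required to confirm
        # the failure after the first unanswered cycle.
        self.confirmations = confirmations
        self.failed_cycles = 0
        self.state = "ok"

    def record_cycle(self, answered: bool) -> str:
        if answered:
            self.failed_cycles = 0
            self.state = "ok"
        else:
            self.failed_cycles += 1
            if self.failed_cycles > self.confirmations:
                # Confirmed failure: the object is classified as unavailable.
                self.state = "no response"
            else:
                self.state = "no response pending"
        return self.state
```

With `confirmations=1`, the first unanswered minute gives no response pending and the second gives no response. A policy that changes the number of confirmations would correspond to a different `confirmations` value in this sketch.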
Data may be obtained by querying devices using SNMP polling. This is similar to the simple Ping polling but with some differences. The data obtained consists of a number of SNMP OID queries, either for individual values or for a whole table of data.
In general, individual OIDs will be obtained once a minute while tables will be downloaded every 6 hours. The discovery table is downloaded at the discovery frequency and used to see if new equipment has been added. New objects are then created and subsequently monitored once a minute.
Example: A switch is monitored to obtain data from a number of active Ethernet interfaces. Every 6 hours, the list of active interfaces is downloaded and previously inactive interfaces are added to the list of interfaces to be polled once a minute.
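The two polling rhythms in the switch example can be sketched as follows. This is a simplified model in Python: the function names and the minute counter are hypothetical, and the sketch assumes that discovery only adds interfaces, since the text above does not describe how interfaces are removed.

```python
DISCOVERY_INTERVAL_MINUTES = 6 * 60  # discovery table downloaded every 6 hours


def is_discovery_due(minute: int, interval: int = DISCOVERY_INTERVAL_MINUTES) -> bool:
    """Return True on the minutes when the discovery table should be downloaded."""
    return minute % interval == 0


def merge_discovered(polled: set[str], active_now: set[str]) -> set[str]:
    """Add newly active interfaces to the set that is polled once a minute.

    Assumption: interfaces are only added here, never removed.
    """
    return polled | active_now


# Illustrative run over six hours: eth2 becomes active between two discoveries.
polled = {"eth0", "eth1"}
for minute in range(1, 361):
    if is_discovery_due(minute):
        polled = merge_discovered(polled, {"eth0", "eth1", "eth2"})
    # Every minute, each interface in `polled` would have its OIDs queried here.
print(sorted(polled))
```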
If configuration changes are made to resources then the associated discovery scripts are relaunched just after the changes have been applied.
An object obtaining data using SNMP polling will by default pass into a no response pending state if it has not received any data within the minute, and no confirmation is required. This means that the object will go straight to no response status unless it receives data. If no data is received then the object will normally change state to unknown. The reason for using unknown rather than unavailable is that it is customary to have a Ping query going to the same device. Only one of the objects needs to become unavailable, otherwise multiple alerts would be shown for the same issue.
The frequency of polling, the discovery frequency, the number of confirmations and the state into which the object is transitioned can all be modified by applying policies.
Many other methods of obtaining data by querying devices are used by the ServicePilot Agents. For example, Windows WMI queries, TCP checks, SQL queries and web page queries amongst others. In these cases the polling frequency is determined by the package and resource parameters configured.
The shortest polling interval is still 1 minute but it is common to poll elements less often. Note that although the polling frequency may be set, this does not allow you to specify when each poll will take place. For this reason, setting a very long interval between polls does not make much sense, as you will not know when during the day the poll might happen.
For all these other data collection types classified as custom, the no response state for the object will be set after a number of minutes during which no data has been sent between the ServicePilot Agent and ServicePilot.
When the object is determined to be in a no response state, the status of the object will change to unknown or unavailable depending on the type of object as defined in the package.
For example, a web check object will be classified as unavailable if no data is received for an hour.
A server disk object will be classified as unknown if no data is received for 10 minutes.
The no response duration before a timeout is declared and the status to use can be modified by applying policies.
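The timeout behaviour for custom data collection can be sketched as a lookup of a rule per object type. The two rules below use the durations and statuses from the examples above. Everything else (the names `TimeoutRule`, `RULES` and `status_for`, and the object type keys) is hypothetical and only illustrates the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeoutRule:
    timeout_minutes: int  # no response duration before a timeout is declared
    status: str           # status applied once the timeout is reached


# Values taken from the examples in the text; the keys are illustrative names.
RULES = {
    "web_check": TimeoutRule(timeout_minutes=60, status="unavailable"),
    "server_disk": TimeoutRule(timeout_minutes=10, status="unknown"),
}


def status_for(object_type: str, minutes_since_last_data: int, current: str = "ok") -> str:
    """Return the status of an object given how long it has received no data."""
    rule = RULES[object_type]
    if minutes_since_last_data >= rule.timeout_minutes:
        return rule.status
    return current
```

Applying a policy that changes the duration or the status would correspond to replacing an entry in `RULES` in this sketch.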
By default ServicePilot will query and store data all the time. It is possible to modify this by using monitoring policies which include time periods over which to collect data.
Example: Apply a monitoring policy to a view containing all resources on a site. This monitoring policy includes a time period definition indicating that monitoring should only take place during open hours for the business. Outside of this time, the site's resources will not be monitored and their state will be unknown.
It is often useful to define monitoring periods when elements are known to have down time for maintenance or scheduled reboots.
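The effect of a monitoring period can be sketched as a simple time check. In this Python illustration the open hours (Monday to Friday, 08:00 to 18:00) are assumed values chosen for the example, and the function names are hypothetical.

```python
from datetime import datetime, time

# Assumed open hours for the example; a real policy defines its own periods.
OPEN = time(8, 0)
CLOSE = time(18, 0)
OPEN_WEEKDAYS = range(0, 5)  # Monday (0) to Friday (4)


def in_monitoring_period(now: datetime) -> bool:
    """Return True when `now` falls inside the monitored time period."""
    return now.weekday() in OPEN_WEEKDAYS and OPEN <= now.time() < CLOSE


def effective_state(now: datetime, polled_state: str) -> str:
    """Outside the monitoring period the resource state is reported as unknown."""
    return polled_state if in_monitoring_period(now) else "unknown"
```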
Although maintenance or scheduled reboots might happen at known times, ad-hoc management of resources might also be needed to stop alerts for issues that are already being worked on.
Example: A network interface is causing problems and has been taken out of service until the issue is resolved. The ServicePilot alert for this interface should be hidden.
It is possible to unmanage an individual object or a whole part of the monitored hierarchy by selecting a view. Access to the unmanage function is available in the view hierarchy or the status list.
Access Unmanage object from the map
- As a user with operator privileges, navigate the Map until the object you wish to unmanage is open
- Click on the Manage button
Access Unmanage view from the map
- As a user with operator privileges, navigate the Map until the view you wish to unmanage is open
- Click on the View information icon
- Click on the Manage button
Access Unmanage from status lists
- As a user with operator privileges, navigate to Status
- Select Resource, Object or View from the Status sub-menu depending on the component you wish to unmanage
- Select one or more elements to unmanage and click on the gray unmanage icon
Manage or unmanage items
Once you open the management dialog, you can choose to manage (restart monitoring) or unmanage (stop monitoring) the item you selected. If you selected a view then this will affect the view and all sub-elements.
When unmanaging an item, you can also ask ServicePilot to stop storing data for the item in the database. If you only unmanage the item, then monitored indicators will still be retrieved and stored but the status of the item will be set to unknown.
If you want to start the operation in the future or also say when ServicePilot should start monitoring again, then you can fill in the date-time fields. These are optional as the default action is to stop or start the operation immediately.
A note can be added so that users of ServicePilot can understand why this action was performed.
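The options in the management dialog can be modelled as a small data structure. This Python sketch is only a way to reason about the behaviour described above: `UnmanageRequest`, its fields and `handle_sample` are hypothetical names, not a ServicePilot API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class UnmanageRequest:
    start: Optional[datetime] = None  # None: the operation starts immediately
    end: Optional[datetime] = None    # None: stays unmanaged until managed again
    stop_storing: bool = False        # also stop storing data in the database
    note: str = ""                    # explains to other users why this was done

    def is_active(self, now: datetime) -> bool:
        started = self.start is None or now >= self.start
        not_ended = self.end is None or now < self.end
        return started and not_ended


def handle_sample(req: UnmanageRequest, now: datetime, polled_status: str) -> tuple[str, bool]:
    """Return (status to show, whether the indicator sample is stored)."""
    if req.is_active(now):
        # Unmanaged items show as unknown; data is still stored unless asked otherwise.
        return "unknown", not req.stop_storing
    return polled_status, True
```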
Although rarely needed, it is possible to remove objects from the ServicePilot configuration. Only objects that were auto-created by ServicePilot can be removed in this way. To stop monitoring devices, it is usually a ServicePilot administrative user that will delete the resource from the configuration or change the resource parameters to stop monitoring a particular element.
Deleting an object will not remove the historical data associated with this object from the database, so the object will still be shown in dashboards that query a period during which it was present.
Note that if an object is deleted, it might re-appear if the component is still present when the next discovery script runs. In this case the object should be removed by changing the resource parameters or, if that is not possible, by unmanaging the object.
Example: ServicePilot is monitoring a server with multiple disk volumes. One of the disk volumes is removed permanently. The object can be deleted as it will not re-appear later.
Delete an object from the map
- As a user with administrative privileges, navigate the Map until the object you wish to delete is open
- Click on the Delete Object link
Some data that ServicePilot receives might be based on unsolicited events. For example a syslog message or an SNMP Trap is sent to the ServicePilot Agent.
This type of data is associated with the resource that was used to configure the ServicePilot Agent to accept the data. However, the data is not stored as indicator data in objects. Instead, the events are stored in databases based on the type of data (Syslogs, SNMP Traps, VoIP call records). Dashboards are then made available to view this event data in standard ways. Custom queries might be added to filter the data further or show the information in completely new ways.
ServicePilot keeps data for a limited period of time to limit disk space requirements and manage the speed of data queries. Numerical indicator data can be summarized and kept for longer but as averages, minimums and maximums of the real collected data. It is therefore possible to create a graph of an indicator by looking at only the daily averages over a year. If you then zoom in to a smaller time span, you could see hourly averages but only for the last 3 months, quarter-hour averages for the last month, or every minute of data but only for the last week.
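The relationship between the time span of a graph and the finest data still available can be sketched as follows. The retention values follow the figures given in this document (7, 30, 90 and 365 days). The names `TIERS`, `finest_resolution` and `summarize` are hypothetical and this is not how ServicePilot is implemented.

```python
from datetime import timedelta

# (resolution, retention) ordered from finest to coarsest.
TIERS = [
    ("minute", timedelta(days=7)),
    ("quarter-hour", timedelta(days=30)),
    ("hourly", timedelta(days=90)),
    ("daily", timedelta(days=365)),
]


def finest_resolution(age_of_oldest_point: timedelta):
    """Return the finest resolution still retained for data of the given age.

    Returns None when the requested data is older than every retention period.
    """
    for name, retention in TIERS:
        if age_of_oldest_point <= retention:
            return name
    return None


def summarize(samples: list[float]) -> dict[str, float]:
    """Reduce raw samples to the average, minimum and maximum kept in a summary."""
    return {
        "avg": sum(samples) / len(samples),
        "min": min(samples),
        "max": max(samples),
    }
```

For example, a graph reaching back 20 days can show quarter-hour summaries at best, while a graph reaching back 200 days can only show daily summaries.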
Other kinds of data cannot be compressed in this way and so the data is kept for less time. It is also a far more costly operation to query this data, so selecting a shorter time span will return results quicker.
Some data is kept in the database but no history is maintained, for example the current state of all of the objects and the inventory data.
Note: The free ServicePilot monitoring does not store any historical data in the database. Only the current state of the monitored resources is visible. Many dashboards and reports will therefore appear empty.
|Data type|Retention period|
|---|---|
|Indicator Data|7 days|
|Quarter hour summary indicator data|30 days|
|Hourly summary indicator data|90 days|
|Daily summary indicator data|365 days|
|Object availability and performance|90 days|
|Daily summary of object availability and performance|365 days|
|ServicePilot detected events and status changes|90 days|
|SNMP Traps and notifications|60 days|
|VoIP call quality records|90 days|
|IP Flow, IPFIX, NetFlow, sFlow, Jflow|30 days|
|Web application traces|7 days|
|Log data associated with objects|30 days|