AI
Role of AI in ServicePilot
Artificial Intelligence plays a central role in ServicePilot to speed up diagnostics, automatically detect abnormal behavior, and support decision-making. The platform combines several complementary approaches:
ServicePilot’s internal AI:
Based on integrated machine learning algorithms and models (anomaly detection, event correlation, trend analysis, smart search). It operates automatically, without requiring configuration.
External AI (LLM):
Option to connect an external language model (OpenAI, Azure OpenAI, etc.) to take advantage of conversational search and advanced natural language interpretation.
These different AI features are accessible in several sections of the product, each addressing a specific need.
Anomalies
Anomalies in ServicePilot refer to unexpected changes in the status of monitored objects. They are triggered when the status of a metric changes in an unusual way compared to its normal behavior.
A metric’s status may change when it exceeds a threshold (for example: OK → Warning → Critical). This status change becomes an anomaly when it exhibits one of the following characteristics:
- It does not occur regularly (not cyclical, not normal for this object).
- It does not correspond to the object’s usual state.
- It is not consistent with the object’s history.
- It is not expected in the operational context.
The internal AI therefore analyzes the status history, associated events, and the object’s normal behavior to determine whether this change is truly abnormal.
Example:
An object named Server - eth0 changes to the Down state. However, this interface is usually Up at all times.
This status change is therefore considered an anomaly because it does not correspond to its normal behavior.
In this case, the object is marked as being in an anomaly state, and an associated event is generated.
Anomalies appear on the dedicated page under MONITOR > Status > Anomalies.
A drop-down menu at the top allows you to view the number of resource anomalies in the badges.
Problems
Problems in ServicePilot are intelligent groupings of resources that exhibit abnormal states at the same time. They help reduce noise, deduplicate alerts, and provide a consolidated view of incidents affecting multiple related objects. Unlike Anomalies, which refer to an unexpected state change in a single object, Problems group together multiple anomalies that are linked over time and by their technical relationship.
A Problem is created when a resource remains in an abnormal state for more than 3 minutes. Subsequently, other resources can be added to the same Problem if:
- They become abnormal within 90 minutes.
- They are linked to the initial resource (by IP, hostname, dependencies, etc.).
This allows for the automatic grouping of incidents that likely share a common cause. A problem is automatically closed when no resource in the group has remained in an abnormal state for more than 30 minutes.
Problems allow you to:
- Group multiple related anomalies.
- Avoid a proliferation of isolated alerts.
- Identify the likely cause more quickly.
- Visualize the overall impact of an incident across multiple resources.
They constitute an AI layer for temporal and relational correlation based on anomaly detection.
Example:
An anomaly is generated when a host named VMHost1 becomes unavailable. Shortly thereafter, the virtual machines VM1 and VM2 also go into an error state. Since these resources are linked and the errors occur within the same time frame, they are grouped into a single issue.
This grouping makes it easy to quickly understand that the likely cause is the loss of the host, rather than three independent incidents.
The anomalies appear on the dedicated page under MONITOR > Status > Problems.
A drop-down menu at the top allows you to view the number of issues in the badges.
ML Pages
The ML Pages included in every standard ServicePilot dashboard provide advanced metric analysis using machine learning models applied to time series. They focus on how values change over time: spikes, trends and forecasts.
This makes it possible to identify unusual or emerging patterns in metrics, anticipate risks and better understand the dynamics of monitored resources.
The built-in ML widgets perform several types of analysis:
- Spike detection: Automatic identification of abnormally high or low values for a given metric, based on its historical data.
- Trend analysis: Calculation of 24-hour and 30-day trends to visualize how a metric is changing: increase, decrease, or stability.
- Critical threshold forecast: Estimation of the number of days remaining before an indicator reaches a critical threshold, based on the observed trend.
Examples:
- An isolated CPU spike may indicate a one-time load, but a strong upward trend over 30 days may reveal a risk of saturation.
- A forecast indicating that a disk will reach its critical threshold in 12 days allows you to plan for a capacity expansion or data cleanup.
- A downward trend in application traffic may signal a usage or connectivity issue.
ML Pages are therefore a decision-support tool based on the analysis of metric behavior over time.
ML analyses are accessible in a tab on the standard dashboards for each technology family, each package and specific resources.
Internal AI Search
ServicePilot’s internal AI search is an intelligent search engine designed to help users quickly find information within the platform.
It relies on a lightweight, LLM-free internal AI to improve the relevance of results while ensuring speed and consistency. Unlike external LLM search, this search does not interpret natural language in a conversational manner. It optimizes search within product content and documentation, not in free-form text.
The internal AI search covers several key areas of the platform:
- Documentation: help pages, guides, concepts.
- Packages: search by name, technology, use case.
- Dashboards: standard, custom, and resource-based dashboards.
- SQL: queries, views, and data-related elements.
- Data: objects, resources, metrics, and events.
This centralized search feature allows you to quickly navigate the entire ServicePilot ecosystem.
The ServicePilotAI modal can be accessed by clicking the robot icon in the top menu.
External LLM Search
ServicePilot allows you to connect an external LLM to take advantage of conversational search and advanced natural language understanding.
By configuring an LLM (using the OpenAI API), users can:
- Ask questions in natural language.
- Get context-aware answers.
- Request explanations or summaries.
- Generate text or analyses.
Configuration is done in the ServicePilot settings, in the section dedicated to external AI integrations.
The ServicePilotAI modal is accessible via the robot icon in the top menu.