ServicePilot Clustering and Scaling
The following explains ServicePilot Clustering and Scaling configuration for on-premise ServicePilot deployments.
What is a ServicePilot Cluster?
The ServicePilot solution may be clustered for performance or data-partitioning reasons. Configuration, authentication and data are partitioned by tenant: a tenant is a separate instance of ServicePilot with its own users, monitored equipment and data history store. Each ServicePilot Agent is associated with a single tenant.
A single ServicePilot Manager and Database on recent hardware with 8 vCores, 16GB RAM and SSD storage can handle up to:
- 500,000 to 1,000,000 indicators per minute
- 50,000 to 80,000 objects
- 500 to 1,000 Full-Stack Hosts
The ServicePilot solution may be clustered in a number of ways:
- High availability: By deploying additional ServicePilot Managers and Databases in the same cluster that can take over in case of failure or maintenance. This does not increase the capacity of the solution.
- Database scaling: By adding more Database services, with one dedicated to writing data while the others only read data. This improves performance with large datasets and heavy database queries.
- Multi-tenant partitioning and scaling: By deploying a number of clusters of ServicePilot Managers and Databases, each with separate data. Users may be given different rights in each tenant, with single sign-on working between tenants.
- Multi-region clustering: By deploying multiple clusters of ServicePilot Managers and Databases geographically, so that data received is kept within the cluster's region for legal or other requirements. Users may be given access to different regions, with different rights in each region and automatic single sign-on between the regions.
Data is received from one or more ServicePilot Agents, which may be connected directly to the ServicePilot Managers or via ServicePilot Proxies and load-balancers. The cluster of ServicePilot Managers directs each ServicePilot Agent's traffic to the correct ServicePilot Manager and tenant partition.
A ServicePilot Cluster provides service availability detection and a means to replicate configuration across the ServicePilot solution. A ServicePilot Cluster requires 1, 3 or 5 SPCluster services; for high availability, 3 or 5 SPCluster services are needed on different servers. The SPCluster services can be co-hosted with other ServicePilot software.
SPCluster services form a quorum when more than half of them can communicate with each other (for example, at least 2 of 3, or 3 of 5). Other ServicePilot services can then start, and will only continue to operate while the quorum is maintained.
Once the cluster is set up, high availability and database scaling do not require further configuration.
SPCluster service
All SPCluster services in a cluster require the same Cluster ID and other configuration. However, each SPCluster service needs to be configured with a unique Node ID. Other ServicePilot services making use of the cluster use a Node ID unique to the node on which they are installed. The lowest-numbered active node is considered the leader.
Communication between the SPCluster services, and access to them from other ServicePilot services, takes place on secured TCP port 8980. We suggest restricting this port to the remote IP addresses of all cluster services in firewall rules.
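As an illustration, assuming a Linux host managed with iptables, rules such as the following restrict port 8980 to the other cluster nodes. The IP addresses are placeholders for your own cluster services; adapt the commands to your firewall tooling.

```sh
# Sketch only: accept SPCluster traffic (TCP 8980) from the other cluster
# nodes and drop anything else on that port. Replace the placeholder
# addresses with the real IP addresses of your cluster services.
iptables -A INPUT -p tcp --dport 8980 -s 10.0.1.11 -j ACCEPT
iptables -A INPUT -p tcp --dport 8980 -s 10.0.1.12 -j ACCEPT
iptables -A INPUT -p tcp --dport 8980 -j DROP
```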
For Linux cluster services, the following environment variables are used:
Environment Variable | Notes |
---|---|
sp_cluster | A list of pipe-separated URLs, one for each SPCluster service |
sp_node_id | The Node ID of this service, SPCluster or otherwise. For SPCluster services this indicates which URL in the sp_cluster list refers to this node. |
sp_cluster_id | The Cluster ID of all services, SPCluster or otherwise, that are part of the same cluster. Cluster IDs are 1 to 4 alphanumeric characters. |
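For example, assuming the services read these variables from the environment (for instance via a systemd unit or shell profile), a node might be configured as follows; the hostnames and IDs are illustrative:

```sh
# Illustrative values only -- replace the hostnames, Node ID and Cluster ID
# with those of your own deployment.
export sp_cluster='https://spus1.company.com:8980|https://spus2.company.com:8980|https://spus3.company.com:8980'
export sp_node_id=2       # this node corresponds to the second URL in sp_cluster
export sp_cluster_id=USA  # 1 to 4 alphanumeric characters
```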
For Windows cluster services, the ServicePilot Setup Console is used to set cluster configuration. The SPCluster service can be started automatically by the Windows ServicePilot Manager.
Cluster configuration files
Once the per-node ServicePilot Cluster configuration is set up, further cluster configuration is managed in files on Node ID 1. These cluster configuration files are then replicated to other nodes when a configuration reload or restart is performed on node 1.
WorkFolder | Content |
---|---|
cluster.conf | Cluster configuration |
servicepilot.conf | ServicePilot configuration common to all nodes and tenants |
servicepilotdb.conf | Database configuration |
storage.conf | Storage configuration |
Each tenant has the following configuration:
WorkFolder | Content |
---|---|
tenant.conf | Per tenant configuration |
provisioning.conf | Per tenant resources |
Packages, Pictures | Per tenant custom packages and pictures |
ServicePilot Manager redundancy
If multiple ServicePilot Managers are running with the same Cluster ID, then the one with the lowest Node ID will be the leader and take control. Other ServicePilot Managers will redirect traffic to the leader.
Each ServicePilot Manager in a cluster requires a separate ServicePilot license.
ServicePilot Manager access methods
Users of ServicePilot with redundancy configured will need a reliable way to access the ServicePilot web interface when one of the nodes becomes unavailable.
This can be achieved with a load-balancer that will need to be installed separately. A load-balancer can be configured to check the /ping.html URL on each ServicePilot Manager. A 302 return code indicates that the service is not the leader; a 200 return code indicates that the service is the leader.
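For example, a health probe equivalent to the following curl command distinguishes the leader from the other Managers. The hostname is a placeholder, the HTTPS scheme and default port are assumptions about your Manager web configuration, and -k may be dropped if valid certificates are in place.

```sh
# Prints 200 when this Manager is the leader, 302 otherwise
# (curl does not follow redirects by default).
# spus1.company.com is a placeholder for one of your ServicePilot Managers.
curl -k -s -o /dev/null -w '%{http_code}\n' https://spus1.company.com/ping.html
```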
ServicePilot Database redundancy
If multiple ServicePilot Databases are running with the same Cluster ID, then the one with the lowest Node ID will be the leader and handle read/write access to the database files. Other ServicePilot Database services will be on standby until the old leader is no longer available.
Failover to ServicePilot Databases on a second node will provide a degraded experience as new data added to the second node will not be replicated to the first node. When service is restored on the first node, a gap in historical data is to be expected.
Database high availability with shared storage
If nodes containing the ServicePilot Database services share a common file share, then the ServicePilot WorkFolder can be placed on this device. This allows for common historical data storage.
ServicePilot Databases running with the same Cluster ID can be configured so that the one with the lowest Node ID will be the leader and handle read/write access to the database files. Other ServicePilot Database services with the same Cluster ID can provide read-only access to the database files to improve performance.
Note that the shared storage must provide performance equivalent to local SSD storage.
High availability
A minimal ServicePilot Cluster configuration consists of:
- 3 SPCluster services
- 2 SPManager services
- 2 SPDB services
Each node should be in a separate availability zone in the data center. This setup will provide redundancy if one node becomes unavailable or unreachable.
Common Environment Variable | Value |
---|---|
sp_cluster | https://spus1.company.com:8980|https://spus2.company.com:8980|https://spus3.company.com:8980 |

Node | Services configured | sp_node_id | sp_cluster_id |
---|---|---|---|
SP1 | SPCluster,SPManager,SPDB | 1 | USA |
SP2 | SPCluster,SPManager,SPDB | 2 | USA |
SP3 | SPCluster | 3 | USA |
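As a sketch, using the values from the tables above, the per-node environment could be set as follows; how the variables are supplied to the services depends on your installation:

```sh
# Common to all three nodes
export sp_cluster='https://spus1.company.com:8980|https://spus2.company.com:8980|https://spus3.company.com:8980'
export sp_cluster_id=USA

# Per node: a unique Node ID
export sp_node_id=1   # SP1 (SPCluster, SPManager, SPDB)
# sp_node_id=2 on SP2 (SPCluster, SPManager, SPDB)
# sp_node_id=3 on SP3 (SPCluster only)
```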
Database scaling
To scale database performance, configure a second node with a ServicePilot Database service and common file storage between the servers for the database files.
Common Environment Variable | Value |
---|---|
sp_cluster | https://spus1.company.com:8980 |

Node | Services configured | sp_node_id | sp_cluster_id |
---|---|---|---|
SP1 | SPCluster,SPManager,SPDB | 1 | USA |
SP2 | SPDB | 2 | USA |
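A sketch of the two nodes' environment, again taking the values from the tables above; the shared storage mount point mentioned in the comments is a hypothetical example:

```sh
# Node SP1 (SPCluster, SPManager, SPDB) -- single SPCluster service
export sp_cluster='https://spus1.company.com:8980'
export sp_node_id=1
export sp_cluster_id=USA

# Node SP2 (SPDB only): same sp_cluster and sp_cluster_id, but sp_node_id=2.
# Both nodes place their ServicePilot WorkFolder on the shared storage,
# e.g. a mount such as /mnt/servicepilot (hypothetical path).
```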
Multi-tenant partitioning and scaling
To partition or scale a ServicePilot solution, create a cluster of independent ServicePilot Managers and ServicePilot Databases with a minimum of 3 SPCluster services.
Node IDs must be unique across all nodes that use the same SPCluster services, irrespective of the Cluster ID to which they belong.
Common Environment Variable | Value |
---|---|
sp_cluster | https://sp1.company.com:8980|https://sp2.company.com:8980|https://sp3.company.com:8980 |

Node | Services configured | sp_node_id | sp_cluster_id |
---|---|---|---|
SP1 | SPCluster,SPManager,SPDB | 1 | SRV1 |
SP2 | SPCluster,SPManager,SPDB | 2 | SRV2 |
SP3 | SPCluster,SPManager,SPDB | 3 | NET1 |
SP4 | SPManager,SPDB | 4 | NET2 |
SP5 | SPManager,SPDB | 5 | VOIP |
SP# | SPManager,SPDB | # | Tenant# |
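As an illustration, two of the nodes above might be configured as follows; every node shares the same sp_cluster list and uses a unique sp_node_id, while sp_cluster_id selects the tenant:

```sh
# Shared by all nodes
export sp_cluster='https://sp1.company.com:8980|https://sp2.company.com:8980|https://sp3.company.com:8980'

# Node SP1 -- tenant SRV1 (also runs an SPCluster service)
export sp_node_id=1
export sp_cluster_id=SRV1

# Node SP4 -- tenant NET2 (no SPCluster service; Node ID still unique cluster-wide):
#   sp_node_id=4, sp_cluster_id=NET2
```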
Multi-region clustering
The SPCluster services, once deployed, can be used by several clusters of distinct ServicePilot Managers and ServicePilot Databases. In this scenario, each separate cluster of redundant services is defined by its own Cluster ID.
ServicePilot Managers in separate clusters support single sign-on: if a user logs in to a ServicePilot Manager in one cluster, they can open web pages on ServicePilot Managers in other clusters without having to log in again, as long as the user's ID is defined in the other cluster.
Common Environment Variable | Value |
---|---|
sp_cluster | https://spus1.company.com:8980|https://speu3.company.com:8980|https://spaz5.company.com:8980 |

Node | Services configured | sp_node_id | sp_cluster_id |
---|---|---|---|
SP1 | SPCluster,SPManager,SPDB | 1 | USA |
SP2 | SPManager,SPDB | 2 | USA |
SP3 | SPCluster,SPManager,SPDB | 3 | EU |
SP4 | SPManager,SPDB | 4 | EU |
SP5 | SPCluster | 5 | Azure |