
ServicePilot Clustering and Scaling

The following explains how to configure ServicePilot clustering and scaling for on-premises ServicePilot deployments.

Before deploying a clustered ServicePilot configuration, we strongly recommend contacting our technical support team. They can validate that the planned architecture is compatible with ServicePilot.

What is a ServicePilot Cluster?

The ServicePilot solution may be clustered for performance or data partitioning reasons. ServicePilot partitions configuration, authentication and data by tenant. Each tenant is a separate partition of ServicePilot with its own users, monitored equipment and data history store. Each ServicePilot Agent is associated with a single tenant.

A single ServicePilot Manager and Database on recent hardware with 8 vCores, 16GB RAM and SSD storage can handle up to:

  • 500,000 to 1,000,000 indicators per minute
  • 50,000 to 80,000 objects
  • 500 to 1,000 Full-Stack hosts
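As a rough illustration of these figures, the following Python sketch estimates how many active ServicePilot Manager and Database pairs a workload would need, for example one pair per tenant. It makes three assumptions:

  • The load can be split evenly across tenants.
  • The lower bound of each range applies, which keeps the estimate conservative.
  • Standby nodes added for high availability are not counted, because they do not increase capacity.

The function name is hypothetical. Treat the result as a starting point for a discussion with technical support, not as a sizing guarantee.

```python
import math

# Lower bounds of the single-node figures quoted above
# (8 vCores, 16 GB RAM, SSD storage). Using the lower bound keeps the estimate conservative.
INDICATORS_PER_MINUTE = 500_000
OBJECTS = 50_000
FULL_STACK_HOSTS = 500


def min_manager_database_pairs(indicators_per_minute: int, objects: int, full_stack_hosts: int) -> int:
    """Minimum number of active Manager/Database pairs so that no pair exceeds
    any conservative single-node limit (assumes load splits evenly across tenants)."""
    return max(
        1,
        math.ceil(indicators_per_minute / INDICATORS_PER_MINUTE),
        math.ceil(objects / OBJECTS),
        math.ceil(full_stack_hosts / FULL_STACK_HOSTS),
    )


if __name__ == "__main__":
    # 1.8 million indicators per minute, 120,000 objects, 700 Full-Stack hosts
    print(min_manager_database_pairs(1_800_000, 120_000, 700))
```

For this example the script prints 4. The indicator rate is the limiting factor: 1,800,000 / 500,000 = 3.6, rounded up to 4.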

The ServicePilot solution may be clustered in a number of ways:

  • High availability: By deploying additional ServicePilot Managers and Databases in the same cluster that can take over in case of failure or maintenance. This does not increase the capacity of the solution.
  • Database scaling: By adding more Database services, with one dedicated to writing data while the others only read data. This improves performance with large datasets and large database queries.
  • Multi-tenant partitioning and scaling: By deploying a number of clusters of ServicePilot Managers and Databases, each with separate data. Users may be given different rights in each tenant, with single sign-on working between tenants.
  • Multi-region clustering: By deploying multiple clusters of ServicePilot Managers and Databases in different geographic locations, so that data received is kept within the cluster's region for legal or other requirements. Users may be given access to different regions, with different rights in each region and automatic single sign-on between regions.

ServicePilot cluster requirements

A ServicePilot cluster deployment requires a number of further prerequisites on top of standard ServicePilot server requirements:

  1. ServicePilot Cluster (SPCluster) services.
  2. One or more S3-compatible object store buckets, managed by the ServicePilot administrator, to store ServicePilot data.
  3. Round-robin DNS records for the ServicePilot Managers. These handle failover, load sharing and redirection to tenants hosted on other ServicePilot Managers.

SPCluster services

SPCluster services provide availability detection and a means of replicating configuration across the ServicePilot solution. For high availability, 3 or 5 SPCluster services are needed, each on a different server. SPCluster services can be co-hosted with ServicePilot Managers and Databases.

SPCluster services form a quorum when more than half of the SPCluster services can communicate with each other. Other ServicePilot services can then start and will only continue to operate while the quorum is maintained.
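The quorum rule can be expressed in a few lines. This Python sketch computes the strict majority that is needed and how many SPCluster services can fail while the quorum is kept. The function names are illustrative and not part of ServicePilot.

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest number of SPCluster services that is more than half of the cluster."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def has_quorum(reachable: int, cluster_size: int) -> bool:
    """True when the services that can communicate with each other form a strict majority."""
    return reachable >= quorum_size(cluster_size)


def tolerated_failures(cluster_size: int) -> int:
    """How many SPCluster services can be lost while a quorum remains."""
    return cluster_size - quorum_size(cluster_size)


if __name__ == "__main__":
    # Columns: cluster size, quorum size, tolerated failures
    for n in (3, 4, 5):
        print(n, quorum_size(n), tolerated_failures(n))
```

Running it prints `3 2 1`, `4 3 1` and `5 3 2`. A fourth SPCluster service raises the quorum size but does not tolerate any more failures, which is why 3 or 5 services are used.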

SPCluster services communicate with each other on TCP port 8980. Other ServicePilot services also use this port to reach the SPCluster services. The port must be secured so that it only accepts traffic from other SPCluster services and from ServicePilot Managers and Databases.
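To verify the port 8980 rules, you can run a TCP connection test from each node. The following Python sketch is a generic reachability probe, not a ServicePilot tool, and its 3-second timeout is an arbitrary choice. Connections should succeed from SPCluster, Manager and Database nodes and fail from any other host.

```python
import socket

SPCLUSTER_PORT = 8980


def port_reachable(host: str, port: int = SPCLUSTER_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, calling `port_reachable("spc1.company.com")` on each allowed node tests the first SPCluster node from the example DNS configuration. Run the same call from a host outside the allowed list and expect `False`.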

S3 data object storage

The customer is responsible for providing S3-compatible object stores that are accessible from all ServicePilot nodes running ServicePilot Databases. Different S3 servers can be used, but each tenant uses only one store.

Copies of the data can be sent to secondary S3 object stores for backup purposes.
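The storage rules above can be summarised as a small data model. In this Python sketch, each tenant has exactly one primary S3 store and an optional list of backup stores. The model is illustrative and is not the ServicePilot configuration format. The endpoints, bucket names and tenant names are placeholders.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class S3Store:
    endpoint: str  # URL of an S3-compatible server (placeholder values below)
    bucket: str


@dataclass
class TenantStorage:
    """One primary S3 store per tenant, plus optional backup copies."""
    tenant: str
    primary: S3Store
    backups: list[S3Store] = field(default_factory=list)

    def __post_init__(self) -> None:
        # A backup that points at the primary store would not protect anything.
        if self.primary in self.backups:
            raise ValueError(f"{self.tenant}: backup store must differ from the primary store")

    def copy_targets(self) -> list[S3Store]:
        """Stores that receive the tenant's data: the primary first, then the backups."""
        return [self.primary, *self.backups]


# Two tenants on different S3 servers; only "emea" sends a copy to a backup store.
LAYOUT = {
    "emea": TenantStorage(
        "emea",
        S3Store("https://s3-a.company.com", "sp-emea"),
        [S3Store("https://s3-backup.company.com", "sp-emea-copy")],
    ),
    "amer": TenantStorage("amer", S3Store("https://s3-b.company.com", "sp-amer")),
}
```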

DNS configuration

When users and ServicePilot Agents first connect to ServicePilot Managers, they use a cluster-wide FQDN that resolves to all ServicePilot Managers in the cluster. Each ServicePilot Manager also has its own unique FQDN and IP address.

Example DNS configuration

|Component|FQDN|Type|Value|
|---|---|---|---|
|Cluster Node 1|spc1.company.com|A|10.1.1.10|
|Cluster Node 2|spc2.company.com|A|10.1.2.10|
|Cluster Node 3|spc3.company.com|A|10.1.3.10|
|Manager Node 1|sp1.company.com|A|10.1.1.11|
|Manager Node 2|sp2.company.com|A|10.1.2.11|
|ServicePilot client access|sp.company.com|A|10.1.1.11|
|ServicePilot client access|sp.company.com|A|10.1.2.11|
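A quick consistency check of these records can be scripted. The following Python sketch uses only the standard library resolver. It verifies that the cluster FQDN returns the address of every Manager node and no other address. The function names are illustrative. The `resolve` parameter lets the check run against test data instead of live DNS.

```python
import socket
from typing import Callable, Iterable


def resolve_ipv4(name: str) -> set[str]:
    """All IPv4 addresses the system resolver returns for name (empty set if none)."""
    try:
        infos = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return set()
    return {info[4][0] for info in infos}


def check_round_robin(
    cluster_fqdn: str,
    manager_fqdns: Iterable[str],
    resolve: Callable[[str], set[str]] = resolve_ipv4,
) -> list[str]:
    """Return a list of problems; an empty list means the records look consistent."""
    problems = []
    cluster_ips = resolve(cluster_fqdn)
    if not cluster_ips:
        problems.append(f"{cluster_fqdn} does not resolve")
    known_ips: set[str] = set()
    for fqdn in manager_fqdns:
        node_ips = resolve(fqdn)
        known_ips |= node_ips
        if not node_ips:
            problems.append(f"{fqdn} does not resolve")
        elif not node_ips & cluster_ips:
            problems.append(f"{fqdn} ({', '.join(sorted(node_ips))}) is missing from {cluster_fqdn}")
    # Addresses behind the cluster FQDN that belong to no Manager node
    for ip in sorted(cluster_ips - known_ips):
        problems.append(f"{cluster_fqdn} returns {ip}, which is not a Manager node address")
    return problems
```

Against live DNS, `check_round_robin("sp.company.com", ["sp1.company.com", "sp2.company.com"])` should return an empty list.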

Certificates

Each ServicePilot Manager must respond to requests made on the cluster FQDN as well as on its own FQDN. A single ServicePilot Manager certificate is therefore required, valid for the cluster FQDN and for every node FQDN, for example sp.company.com, sp1.company.com and sp2.company.com.
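The following Python sketch uses the standard `ssl` module to read the certificate a Manager presents. It then lists any required FQDN that is missing from the certificate's Subject Alternative Names. Several details are assumptions:

  • Port 443 is an assumed default; use the port your Managers listen on.
  • Wildcard entries are compared literally.
  • If a private CA issued the certificate, that CA must be trusted by the default context, for example via `SSLContext.load_verify_locations`.

```python
import socket
import ssl
from typing import Iterable


def dns_names(peer_cert: dict) -> set[str]:
    """DNS entries from the subjectAltName field returned by SSLSocket.getpeercert()."""
    return {value.lower() for kind, value in peer_cert.get("subjectAltName", ()) if kind == "DNS"}


def missing_names(peer_cert: dict, required: Iterable[str]) -> set[str]:
    """Required FQDNs the certificate does not list exactly (wildcards are not expanded)."""
    return {name.lower() for name in required} - dns_names(peer_cert)


def fetch_peer_cert(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Open a verified TLS connection to host and return its parsed certificate."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()
```

Connect to each Manager through its own FQDN and through the cluster FQDN. For each connection, `missing_names(fetch_peer_cert(host), ["sp.company.com", "sp1.company.com", "sp2.company.com"])` should return an empty set.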

ServicePilot Managers

ServicePilot Managers host tenants, and only one ServicePilot Manager serves a given tenant at a time. If that Manager fails, a backup ServicePilot Manager takes over the tenant, as defined in the cluster configuration.
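The failover behaviour can be pictured as a priority list per tenant. This Python sketch is a simplified model, not ServicePilot's actual election logic. It assumes two things, both hypothetical:

  • The cluster configuration gives each tenant an ordered list of Managers.
  • Health and quorum information comes from the SPCluster services.

```python
from typing import Optional, Sequence


def serving_manager(priority: Sequence[str], healthy: set[str], quorum_ok: bool) -> Optional[str]:
    """Pick the Manager that should serve a tenant.

    priority:  the tenant's Managers, preferred first and backups after (hypothetical config)
    healthy:   Managers currently reported as available
    quorum_ok: whether the SPCluster services currently hold a quorum
    Returns None when no Manager may serve the tenant."""
    if not quorum_ok:
        # Without an SPCluster quorum, ServicePilot services do not keep operating.
        return None
    for node in priority:
        if node in healthy:
            return node
    return None
```

With `priority=["sp1.company.com", "sp2.company.com"]`, the tenant moves to sp2.company.com only while sp1.company.com is unavailable. It moves back when sp1.company.com is healthy again. Whether ServicePilot actually fails back this way is not stated here.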

ServicePilot Databases

Only one ServicePilot Database service is responsible for writing a tenant's data at a time. If the Database service is stopped or fails, then a backup ServicePilot Database will take over writing the tenant's data.

Multiple ServicePilot Database services can be configured to read a tenant's data and share the read load. Data is cached locally by the Database services to speed up later database queries.
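The local caching described above follows the general read-through pattern. The following Python sketch is a generic least-recently-used cache, not the ServicePilot Database implementation. The `fetch` callable is a hypothetical stand-in for a download from the tenant's S3 store, and the capacity is an arbitrary example value.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Keep recently fetched objects locally so repeated queries skip the object store.

    fetch:    downloads an object by key (hypothetical stand-in for an S3 read)
    capacity: number of objects kept; the least recently used object is evicted first
    """

    def __init__(self, fetch: Callable[[str], bytes], capacity: int = 1024) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._fetch = fetch
        self._capacity = capacity
        self._items: OrderedDict[str, bytes] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        data = self._fetch(key)  # only reached on a cache miss
        self._items[key] = data
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently used object
        return data
```

Only the first read of an object reaches the object store. Later reads of the same object are served locally until it is evicted.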

Managing cluster configuration

A quorum of ServicePilot Cluster services is always required to operate a clustered ServicePilot deployment. The initial configuration, which defines the cluster's layout and common settings, is uploaded to the Cluster services. ServicePilot Managers and Databases are then configured to access the Cluster services and are given the cluster name and their node ID within the cluster.
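Before uploading a layout, it can help to check it against the rules on this page. The following Python sketch uses a hypothetical node model. The `Node` fields and role names are assumptions, not the ServicePilot configuration format. The sketch reports:

  • duplicate node IDs
  • SPCluster counts other than 3 or 5
  • SPCluster services that share a server
  • missing Manager or Database nodes

```python
from collections import Counter
from dataclasses import dataclass

ROLES = ("spcluster", "manager", "database")  # hypothetical role names


@dataclass(frozen=True)
class Node:
    node_id: str  # unique within the cluster
    host: str     # server the service runs on
    role: str     # one of ROLES


def validate_layout(cluster_name: str, nodes: list[Node]) -> list[str]:
    """Return problems found in a hypothetical cluster layout (empty list if none)."""
    problems = []
    if not cluster_name:
        problems.append("cluster name is empty")
    id_counts = Counter(n.node_id for n in nodes)
    duplicates = sorted(i for i, count in id_counts.items() if count > 1)
    if duplicates:
        problems.append("duplicate node IDs: " + ", ".join(duplicates))
    unknown = sorted({n.role for n in nodes} - set(ROLES))
    if unknown:
        problems.append("unknown roles: " + ", ".join(unknown))
    spcluster = [n for n in nodes if n.role == "spcluster"]
    if len(spcluster) not in (3, 5):
        problems.append(f"{len(spcluster)} SPCluster services defined; 3 or 5 are needed for high availability")
    if len({n.host for n in spcluster}) != len(spcluster):
        problems.append("SPCluster services must run on different servers")
    for role in ("manager", "database"):
        if not any(n.role == role for n in nodes):
            problems.append(f"no {role} node defined")
    return problems
```

A layout built from the example DNS configuration passes the check when Databases are co-hosted on the Manager servers. That layout has three SPCluster nodes on spc1 to spc3 and Managers on sp1 and sp2.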