APM and distributed traces

ServicePilot's APM and distributed tracing provide real-time analysis of application behavior and performance issues.

Distributed traces record the path that each request takes through the microservices that make up an application. This kind of architecture quickly becomes complex to observe and analyze: when an application is made up of hundreds of microservices running on as many hosts or more, it is no longer possible to rely on a single trace. Moreover, static pages are often served by a CDN, which removes visibility into some parts of the application. Compared with more "classical" monolithic applications, where tracing is trivial, the inherent complexity of modern architectures makes tracing a real challenge.

Distributed Traces

In the following diagram, several requests are sent by a user for a single transaction. These requests then propagate throughout the application. If a problem occurs, identifying its cause without distributed traces is practically impossible.

Modern tools are therefore needed to understand complexity, and this is where ServicePilot Technologies comes in, offering a concrete and simple solution based on powerful Root Cause Analysis.

Distributed traces are automatically correlated with logs, synthetic checks and network and infrastructure metrics.
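
To make the idea of a distributed trace concrete, the following minimal Python sketch models a trace as a set of spans linked by parent identifiers and walks the failing spans down to the deepest one. The Span fields, the service names and the failing_path helper are all hypothetical and are not part of ServicePilot; the sketch only illustrates why following one request across services helps locate the origin of an error.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """One timed operation inside one service (hypothetical, simplified model)."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None for the root span (the user's request)
    service: str
    duration_ms: float
    error: bool = False


def children_of(spans, parent_id):
    """Return the spans whose parent is `parent_id`."""
    return [s for s in spans if s.parent_id == parent_id]


def failing_path(spans):
    """Follow failing spans from the root down to the deepest failing span.

    Returns the service names along that path; the last one is the most
    likely origin of the error.
    """
    path = []
    current = next((s for s in spans if s.parent_id is None), None)
    while current is not None and current.error:
        path.append(current.service)
        failing = [c for c in children_of(spans, current.span_id) if c.error]
        current = failing[0] if failing else None
    return path


# Illustrative data only: one request crossing four made-up services.
TRACE = [
    Span("t1", "a", None, "frontend", 480.0, error=True),
    Span("t1", "b", "a", "cart", 35.0),
    Span("t1", "c", "a", "checkout", 430.0, error=True),
    Span("t1", "d", "c", "payment-db", 410.0, error=True),
]

print(failing_path(TRACE))  # ['frontend', 'checkout', 'payment-db']
```

Without the parent links, the three failing services would appear as three unrelated errors; with them, the failure can be followed down to payment-db.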

APM types

APM & RUM

To get a complete view of the behavior of your application, it is important to know what is going on across the whole of it, on both the client and the server side. This is why RUM and APM go hand in hand and complement each other.
By coupling RUM and APM, it is possible to take full advantage of distributed tracing and obtain an accurate overview of each step of each transaction. Using one without the other loses a large part of the information about the requests and makes troubleshooting much more difficult.

APM & Synthetic Monitoring

Synthetic Monitoring is a monitoring technique that simulates an action or a series of actions that a user might perform on a web site. Because these actions are run continuously, availability, response time and performance metrics can be monitored 24/7.
Adding technologies such as APM to Synthetic Monitoring gives a much deeper insight into the performance and availability of the functionality and services delivered by a website.

APM & Application Traces

To better understand the actions of microservices and hosts distributed within an infrastructure, it is necessary to use APPTrace. This tool allows the ServicePilot user to trace all the requests made by the microservices and to see the connections between clients and servers, in particular who sends what to whom, and how.
Combining APPTrace and APM gives an overview of all the microservices that make up the applications located in the infrastructure.

APM & Web server logs

Web server logs in W3C format can be integrated with ServicePilot APM instrumentation. It is still preferable to instrument web application code with application traces as above. Application traces allow the correlation of sub-application calls e.g. database queries.

W3C logs can be extended with Request Headers if web applications are instrumented for RUM. Add a ServicePilot Agent on the server and a W3C monitoring package to enable web server APM monitoring.

APM & Network Traces

Network Tracing is a monitoring technique that aims to study all data flows in and out of a host or a group of hosts. It is possible with this method to have visual monitoring of the communications within a network and to recover all the data in a pcap file.
When you combine NetTrace and APM, it is possible to obtain a large amount of information about your network and the hosts on it, with different ways of viewing and filtering the data.

Instruments for Tracing

ServicePilot's APM technology correlates several dependent technologies:

Browser: Passive monitoring of all user interactions with an application.
Synthetic monitoring: Active monitoring of the availability and performance of a website.
Application traces: Supervision of all transactions and requests between the different microservices of a monitored application.
Network traces: Supervision of all incoming and outgoing network communications to the monitored hosts.
System metrics: Supervision of systems, such as server CPU load, memory usage, disk I/O...

The choice of application instrumentation is important and can be very simple even with very complex applications:

RUM

Embed scripts to report RUM metrics into web pages. This can be done by:

Automatically, using a plugin that embeds the RUM reporting script in web pages served by Tomcat and Jetty
Configuring web servers or proxies to rewrite served web pages so that they embed the RUM script (for example using the IIS URL Rewrite extension)
Manually adding the reporting script to key web pages

See the ServicePilot RUM instructions under SETUP > Parameters > Basic > APM Instrumentation > RUM Instrumentation

Synthetic monitoring

In order to implement Synthetic monitoring, different packages can be used.

The user-webcheck package monitors the responses of a server using an HTTP(S) request issued by the ServicePilot agent.
It is also possible to use the user-web-scenario package, which monitors a server's response times via a series of HTTP(S) requests issued by the ServicePilot agent.

All requests are issued by the ServicePilot agent at a defined interval, allowing continuous monitoring of the performance and availability of a website.
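
As an illustration of the principle only, the following Python sketch issues a single HTTP(S) request and reports availability and response time. It is not the implementation of the user-webcheck package: the check_once function, its result fields and the "status below 400 means available" rule are assumptions chosen for the example.

```python
import time
import urllib.error
import urllib.request


def check_once(url, timeout=5.0, opener=urllib.request.urlopen):
    """Issue one HTTP(S) request and report availability and response time.

    `opener` is injectable so the function can be exercised without a network.
    """
    start = time.perf_counter()
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        status = exc.code   # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        status = None       # no answer at all: DNS failure, timeout, refused
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return {
        "url": url,
        "status": status,
        # Assumption for this sketch: any status below 400 counts as available.
        "available": status is not None and status < 400,
        "response_ms": elapsed_ms,
    }
```

Calling check_once on a fixed schedule and storing each result gives the continuous availability and response-time series described above; a scenario is the same idea applied to an ordered series of URLs.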

Network traces

In order to collect network traces, all you need to do is use ServicePilot's auto-provisioning. When creating an auto-provisioning rule, be sure to set the discovery type to SP Agent and check the NPM box in the APM/NPM Instrumentation tab.

Application traces

In order to collect application traces, all you need to do is use ServicePilot's auto-provisioning. When creating an auto-provisioning rule, be sure to set the discovery type to SP Agent and check the required boxes in the APM/NPM Instrumentation tab.

System Metrics

In order to collect system metrics, the principle is the same as for network traces. This time, make sure that the System Packages box is checked.

W3C Logs

To collect APM traces from web servers such as IIS or Apache, you can deploy the apptrace-appservice-w3c package.
This package collects logs from the path defined during its configuration.

Application Traces

How to collect distributed traces

Based on the application to be monitored, there are three ways to instrument the application code:

Fully automatic: For Linux, the ServicePilot Agent automatically modifies the program environment variables and command line to insert APM libraries. The program will need to be restarted for this change to take effect.
Automatic: APM monitoring instrumentation is inserted into the application code automatically after the operator modifies the program environment variables and command line and the program is restarted with these new parameters.
Manual: The APM monitoring instrumentation needs to be manually added to the application code.

ServicePilot accepts a number of different Open Source APM instrumentation protocols providing a choice of instrumentation libraries and methods to collect APM data:

Datadog
OpenTelemetry
Zipkin

When open source instrumentation is available

If one of the previously listed instrumentation technologies is already in use, the ServicePilot Agent integrates with it natively.

When instrumentation is yet to be integrated

If APM instrumentation has yet to be added to your application, ServicePilot supports APM integration with a number of languages:

Language	Fully automatic	Automatic	Manual
.NET	X	X	X
Java	X	X	X
Node.js	X	X	X
Python		X	X
PHP		X	X
Ruby		X	X
Go			X

For further information, contact ServicePilot support.

Datadog

ServicePilot Agents can receive APM Traces and metrics from Datadog Tracing Libraries.

Select APM Ports for Datadog collection: 8125, 8126
Select the automatic download of Libraries depending on the language of the application that needs to be instrumented
For libraries that do not support centrally provisioned instrumentation, follow the Datadog documentation to send APM traces to the ServicePilot Agent on ports 8125 and 8126
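
For a manually configured application, pointing a Datadog tracing library at the ServicePilot Agent usually comes down to a few environment variables. The sketch below builds them in Python. DD_AGENT_HOST, DD_TRACE_AGENT_PORT, DD_DOGSTATSD_PORT and DD_SERVICE are standard Datadog variables, with 8126 being the Datadog default for traces and 8125 for DogStatsD metrics. The datadog_tracer_env helper, the host name and the service name are hypothetical.

```python
def datadog_tracer_env(agent_host, service, trace_port=8126, statsd_port=8125):
    """Return the environment variables read by Datadog tracing libraries.

    `agent_host` is the machine running the ServicePilot Agent. Assumption:
    the agent listens on the ports selected in its APM configuration.
    """
    return {
        "DD_AGENT_HOST": agent_host,
        "DD_TRACE_AGENT_PORT": str(trace_port),   # APM traces
        "DD_DOGSTATSD_PORT": str(statsd_port),    # DogStatsD metrics
        "DD_SERVICE": service,
    }


# Hypothetical host and service names, for illustration only.
for name, value in datadog_tracer_env("sp-agent.example.local", "checkout").items():
    print(f"{name}={value}")
```

The resulting variables are then set in the environment of the instrumented process before it starts, for example by merging them into os.environ when launching a Python application through ddtrace-run.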

OpenTelemetry

OpenTelemetry, also known as OTel, is a vendor-neutral open-source observability framework for instrumenting, generating, collecting and exporting telemetry data such as traces, metrics and logs. ServicePilot Agents act as OTel Collectors for traces sent from application code instrumented using OTel automatic instrumentation, instrumented with an OTel library, or sending data manually using the OTLP/HTTP or Zipkin/HTTP protocols.

Automatic OpenTelemetry instrumentation is available for a number of languages, with libraries and code documented on the OpenTelemetry web site.

Select APM Ports for OpenTelemetry collection: 4318
Select the automatic download of Libraries depending on the language of the application that needs to be instrumented
For libraries that do not support centrally provisioned instrumentation, follow the OpenTelemetry documentation to send APM traces to the ServicePilot Agent on port 4318
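
When data is sent manually over OTLP/HTTP, a trace is posted to the /v1/traces path of the collector. The following Python sketch builds a one-span payload using the JSON encoding of OTLP and prepares the request. The helper names, the "manual-example" scope name and the host name are hypothetical, and it is an assumption that the ServicePilot Agent accepts the JSON encoding in addition to binary protobuf; an OpenTelemetry SDK remains the usual way to produce this data.

```python
import json
import secrets
import urllib.request


def otlp_trace_payload(service, span_name, start_ns, end_ns,
                       trace_id=None, span_id=None):
    """Build a minimal OTLP/HTTP JSON body containing a single span."""
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "manual-example"},  # hypothetical scope name
                "spans": [{
                    "traceId": trace_id or secrets.token_hex(16),  # 32 hex chars
                    "spanId": span_id or secrets.token_hex(8),     # 16 hex chars
                    "name": span_name,
                    "kind": 2,  # SPAN_KIND_SERVER
                    # 64-bit integers are carried as strings in OTLP JSON.
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }]
    }


def otlp_request(payload, agent_host, port=4318):
    """Prepare the POST to the standard OTLP/HTTP traces path."""
    return urllib.request.Request(
        f"http://{agent_host}:{port}/v1/traces",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request is sent with urllib.request.urlopen(otlp_request(payload, "sp-agent.example.local"), timeout=5), where the host name stands for the machine running the ServicePilot Agent.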

Zipkin

Zipkin is a distributed tracing system. It helps gather the timing data needed to troubleshoot latency problems in service architectures. ServicePilot Agents act as Zipkin Collectors for traces sent from application code instrumented using Zipkin instrumentation libraries or sending data manually using the Zipkin/HTTP protocol.

The Zipkin "Tracers and Instrumentation" page lists the libraries that support instrumenting application code to send traces to a ServicePilot Agent.

Select APM Ports for Zipkin collection: 9411
Follow the Zipkin documentation to send APM traces to the ServicePilot Agent on port 9411
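
For reference, a manual Zipkin/HTTP submission can be sketched in Python as follows. The span layout follows the Zipkin v2 JSON format and /api/v2/spans is the standard Zipkin v2 collector path; it is an assumption that the ServicePilot Agent exposes that same path on port 9411. The helper names and the host name are hypothetical.

```python
import json
import secrets
import urllib.request


def zipkin_span(service, name, start_us, duration_us, tags=None,
                trace_id=None, parent_id=None):
    """Build one span in the Zipkin v2 JSON format.

    Timestamps and durations are expressed in microseconds.
    """
    span = {
        "traceId": trace_id or secrets.token_hex(16),  # 128-bit id, hex
        "id": secrets.token_hex(8),                    # 64-bit id, hex
        "name": name,
        "kind": "SERVER",
        "timestamp": int(start_us),
        "duration": int(duration_us),
        "localEndpoint": {"serviceName": service},
        # Zipkin tag values must be strings.
        "tags": {key: str(value) for key, value in (tags or {}).items()},
    }
    if parent_id is not None:
        span["parentId"] = parent_id
    return span


def zipkin_request(spans, agent_host, port=9411):
    """Prepare the POST of a list of spans to the Zipkin v2 endpoint."""
    return urllib.request.Request(
        f"http://{agent_host}:{port}/api/v2/spans",
        data=json.dumps(spans).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A list of spans sharing the same traceId, with parentId set on the child spans, is what lets the receiver rebuild the full path of a request. The request is sent with urllib.request.urlopen(zipkin_request(spans, "sp-agent.example.local"), timeout=5).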

How to view APM traces

Monitor topology (live)

Service dependencies

Information is represented in two different forms: vertical dependencies and horizontal dependencies. The vertical dependencies are represented as a layered system in which each layer corresponds to a category: Applications at the top, then Services, then Processes, and so on. The horizontal dependencies represent the communications between the elements of the same layer, shown as links between the different hosts.

Problem identification and Root Cause Analysis (real time)

The Topology page builds a relational display of your different systems, organized by section. It lets you identify the problems that the monitored systems on your network encounter. Using the architecture display, you can then quickly find out which server is the cause of an incident and solve the problem as soon as possible.

1. Layer selection

Seven tabs are used to navigate through the vertical views: Applications, Services, Processes, Hosts, Network, Security and Misc.

Each of these tabs shows an alert-level color code for the elements displayed in the central part of the page. Selecting a color displays only the elements of that color.

2. Horizontal dependencies

Dependency	Explanation
View Dependencies	Display of the different elements of the tabs according to the view level in which they are located in the view hierarchy. For example an element in MAIN > France will find the same path in this dependency.
APP Dependencies	Display of the different links between the elements according to their application traces. An element in communication with another will have a dependency relationship with it.
NET Dependencies	Display of the different links between elements according to their network communication. An element in communication with another will have a dependency relationship with it.
Table	Display in table form. View a real-time table of "status", "resource", "package", "host", "Mbps", "TCPReject", "TcpStartPublic", "RPM", "AvgDuration" and "errors".

Layout	Explanation
Spring layout	Display of dependencies according to a dynamically discovered architecture. The elements are automatically placed according to their links.
Horizontal layout	Display of a view with elements organized from left to right. The communicating elements are represented by a more structured architecture than the Spring layout display.

Tooltip: hovering over an element displays related information such as its type (Linux or Windows for example), its host, its resource and its package.

3. Vertical dependencies

The right-hand side of the screen is split in two. At the top, the vertical dependencies display shows a tree structure representing the different layers.

Below, a menu with three tabs allows you to obtain additional information about the selected server.

Note: the vertical dependencies and details are displayed only when an element is selected.

Section	Content
Status	Details of the selected item
Cause	Causes of the different statuses

APM Metrics dashboards (Historical analysis)

The APM Metrics dashboard pages summarize all the data related to application and network traces that have been collected. To access the APM dashboards, simply go to ANALYSIS > Traces and select APPTrace and NetTrace resources.

1. Choice of data to visualize


The collected data can be consulted globally by selecting the required category.
Data can also be viewed for a specific item in a category.

2. Data

Three categories are presented here, AppService, NetProcess and NetServer.

AppService

This category covers the analysis of traces from monitored applications, which is where APM technology comes in. The data presented offers a precise analysis of the performance and behavior of monitored applications, including the number of requests per minute, user satisfaction and other application metrics.

NetProcess

In the NetProcess category, data related to the network traces of each supervised application process is available. Several metrics are displayed, in particular the quantity of "Retransmitted" and "Rejected" packets, but also the maximum throughput and the volume of data travelling through the supervised processes.

NetServer

The NetServer category displays the same metrics as NetProcess. In this category, however, the data covers each server as a whole and not just the monitored processes.

Transactions page (Historical analysis)

The Transactions page provides an analysis of all transactions that have taken place between clients and machines monitored by ServicePilot APM technology. To access the Transactions page, simply go to ANALYSIS > Traces and then choose the Transactions tab.

Details

Data presented

The Transactions page highlights the various transactions, in particular the HTTP(S) responses, the response times, the hosts involved, the methods used and the paths taken. The data is summarized in graphs that provide a complete view of the transaction flow within an application.
The details of each transaction are also available, with all relevant data (traceID, duration, httppath, httpstatuscode...).

Filter transactions

This page has a filter feature that allows you to refine the data to better understand and analyze multiple transactions.

On the left side of the page, you can select predefined filters based on the recorded transactions. Just click on a filter to apply it. To remove a filter, simply deselect it. For more advanced filters it is possible to use the query bar.

APPTrace page (Real time trace analysis)

The APPTrace tab lets you analyze, in real time, the applications running on servers that are monitored through auto-provisioning or through microservice-labeled packages. This page provides an overview of the monitored applications and helps you understand how they work and how they communicate with the external environment. To access the APPTrace page, simply go to ANALYSIS > Traces and then choose the APPTrace tab.

Explorer

Using the menu on the left, it is possible to navigate through the different hosts and microservices that have APPTraces. It is possible to get information from all hosts at the same time, from only one host and from only one microservice.

Main Menu

In the main interface, 3 types of data are available. For each data type, the details of each application trace are available with information about the hosts involved in the collected trace (client IP, server IP, host name...). In addition, each data type has specific data:

Data type details

Data type	Description
Hosts	Displays application traces by host. Information about the system and the processes responsible for the collected traces is displayed, as well as indications about the requests themselves (HTTP method, HTTP response code, request time...)
Processes	Displays application traces by process. The available information is identical to that described for the traces by host, with the addition of the port used by the process on the server
Conversations	Displays application traces by conversation. Information about the requests is displayed (HTTP method, HTTP response code, request time, request path...)

Several views

Data can be displayed in different ways using the 5 buttons in the upper right part of the interface.

Display mode details

Display mode	Description
Details	Presents the data as a table detailing each of the traces collected
Topo graph	Presents the data as a dynamic graph
Topo top-down	Presents the data as a hierarchical graph from top to bottom
Topo left-right	Presents the data as a hierarchical graph from left to right
Host map	Presents the data on a world map geolocating the hosts (the Host map is mainly useful for microservices using public IP addresses)

It is possible to pause the capture at any time by pressing the pause button, located at the top right of the window. You can also refresh data from the page to replace the currently viewed data.

NetTrace page (Real time trace analysis)

The NetTrace page allows you to view live network traces of resources monitored by ServicePilot agents. To access the NetTrace page, simply go to ANALYSIS > Traces and then choose the NetTrace tab.

1. Data presented

The NetTrace page provides a quick and accurate view of all live traffic in a network. More precisely, NetTrace shows the different connections established between clients and the servers monitored by ServicePilot agents. For each connection detected, several data points are available, such as the IP addresses of the hosts communicating with each other, as well as more precise application data (conversations, blocked and rejected connections, bytes per second...).
Highlighting and resolving problems within the network infrastructure is much easier with NetTrace.

2. Data visualization

Data can be visualized in a more or less global way:


The collected data can be viewed for an entire network by selecting the required network.
The data can also be viewed for a specific host of a network.

After selecting the network or host, it is possible to view the data and the various associated links in several ways, either as a table of more detailed information about each communication, or as graphs for an overview of the status of the selected network.

3. Capture traffic

NetTrace can also capture network traffic at any time and produce a PCAP trace according to filters that you set (IP, ports, protocol...):


1. Open ANALYSIS > Traces
2. Open the NetTrace tab
3. Select a server on which the capture will take place from the left-hand list
4. Click on the Trace button at the top left
5. Modify or add filters on IPs and/or ports
6. Start the trace and let it run for as long as needed
7. Stop the trace. It is then automatically downloaded as a PCAP file
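
The downloaded capture can be opened in any PCAP-aware tool. As a quick sanity check, the following Python sketch reads the 24-byte global header of a classic libpcap file and counts the packet records it contains. It assumes the download uses the classic pcap layout rather than pcapng, which this document does not state; the function names are hypothetical.

```python
import struct

# Magic number bytes as found on disk -> (struct byte order, timestamp resolution)
PCAP_MAGICS = {
    b"\xd4\xc3\xb2\xa1": ("<", "microseconds"),
    b"\xa1\xb2\xc3\xd4": (">", "microseconds"),
    b"\x4d\x3c\xb2\xa1": ("<", "nanoseconds"),
    b"\xa1\xb2\x3c\x4d": (">", "nanoseconds"),
}


def read_pcap_header(data):
    """Parse the global header of a classic pcap file held in `data` (bytes)."""
    if len(data) < 24 or data[:4] not in PCAP_MAGICS:
        raise ValueError("not a classic pcap file")
    endian, resolution = PCAP_MAGICS[data[:4]]
    major, minor, _zone, _sigfigs, snaplen, linktype = struct.unpack(
        endian + "HHiIII", data[4:24]
    )
    return {
        "endian": endian,
        "resolution": resolution,
        "version": (major, minor),
        "snaplen": snaplen,
        "linktype": linktype,  # 1 means Ethernet
    }


def count_packets(data):
    """Count packet records by walking the 16-byte per-packet headers."""
    endian = read_pcap_header(data)["endian"]
    offset, count = 24, 0
    while offset + 16 <= len(data):
        _sec, _frac, captured_len, _orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16]
        )
        offset += 16 + captured_len  # skip the record header and its payload
        count += 1
    return count
```

With the file loaded through open(path, "rb").read(), read_pcap_header confirms the format and link type, and count_packets gives a first idea of the volume captured with the chosen filters.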