Version 9.3.0 (10 aug 2021)

ServicePilot Documentation

APM and distributed traces

ServicePilot's APM and distributed tracing provide real-time analysis of application behavior and performance issues.

A distributed trace is the record of the path that a request takes through the microservices that make up an application. In this kind of architecture, tracing quickly becomes complex to observe and analyze: when an application is made up of hundreds of microservices communicating with as many hosts or more, a single trace is no longer enough to understand its behavior. Moreover, static pages served by a CDN block visibility into some parts of the application. In more "classical", monolithic applications tracing is comparatively simple; the inherent complexity of modern architectures makes it a real challenge.

In the following diagram, a user sends several requests for a single transaction. These requests then propagate throughout the application. If a problem occurs, its cause is very hard to identify without distributed traces.

Distributed Traces
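
As an illustration of the idea, a trace can be modeled as a set of spans that share one trace identifier, each span pointing to the span that called it. The sketch below is not ServicePilot's data model: the `Span` fields, the service names and the durations are all assumptions chosen for the example. It rebuilds the request path as an indented tree.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str             # shared by every span of one request
    span_id: str              # unique per operation
    parent_id: Optional[str]  # None for the entry point of the request
    service: str              # hypothetical service name
    duration_ms: float


def render_trace(spans: list[Span]) -> list[str]:
    """Return one indented line per span, children listed under their parent."""
    children = defaultdict(list)
    for span in spans:
        children[span.parent_id].append(span)

    lines: list[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for span in children.get(parent_id, []):
            lines.append(f"{'  ' * depth}{span.service} ({span.duration_ms} ms)")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


# Hypothetical spans collected for a single request.
spans = [
    Span("t1", "a", None, "frontend", 120.0),
    Span("t1", "b", "a", "cart", 45.0),
    Span("t1", "c", "a", "payment", 60.0),
    Span("t1", "d", "c", "bank-gateway", 40.0),
]
print("\n".join(render_trace(spans)))
```

Because every span carries its parent's identifier, the full path can be rebuilt even when the spans are reported by different hosts and arrive out of order.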

Modern tools are therefore needed to understand this complexity. This is where ServicePilot Technologies comes in, offering a concrete and simple solution based on powerful Root Cause Analysis.

Distributed traces are automatically correlated with logs, synthetic checks and network and infrastructure metrics.

APM types

APM & RUM

To get a complete picture of your application's behavior, it is important to know what is happening across the whole application, on both the client and server sides. This is why RUM (Real User Monitoring) and APM go hand in hand and complement each other.
Coupling RUM and APM makes it possible to take full advantage of distributed tracing and to get an accurate overview of each step of each transaction. Using one without the other loses a large part of the information about requests and makes troubleshooting much more difficult.

APM & Synthetic Monitoring

Synthetic Monitoring is a monitoring technique that simulates an action or a series of actions that a user might perform on a website. Because these actions are run continually, availability, response time and performance metrics can be monitored 24/7.
Adding technologies such as APM to Synthetic Monitoring gives much deeper insight into the performance and availability of the functionality and services delivered by a website.

APM & Application Traces

To better understand the behavior of the microservices and hosts distributed across an infrastructure, ServicePilot provides APPTrace. This tool traces all the requests made by the microservices and shows the connections between clients and servers: who sends what to whom, and how.
Combining APPTrace and APM gives an overview of all the microservices that make up the applications in the infrastructure.

APM & Network Traces

Network Tracing is a monitoring technique that studies all data flows in and out of a host or a group of hosts. This method provides visual monitoring of the communications within a network and lets you recover all the data in a PCAP file.
Combining NetTrace and APM yields a large amount of information about your network and the hosts on it, with different ways of viewing and filtering the data.

Instruments for Tracing

ServicePilot's APM technology correlates several dependent technologies:

  • Browser: Passive monitoring of all user interactions with an application.
  • Synthetic monitoring: Active monitoring of the availability and performance of a website.
  • Application traces: Supervision of all transactions and requests between the different microservices of a monitored application.
  • Network traces: Supervision of all incoming and outgoing network communications to the monitored hosts.
  • System metrics: Supervision of systems, such as server CPU load, memory usage, disk I/O...

The choice of application instrumentation is important and can be very simple even with very complex applications:

Automatically embed scripts to report RUM metrics in Tomcat and Jetty served web pages or manually add the reporting scripts into key web pages.

See the ServicePilot RUM Agent instructions under SETTINGS > Agents > Install an agent > Developer Agents > RUM > Get started

In order to implement Synthetic monitoring, different packages can be used.

  • The user-webcheck package monitors the responses of a server using a HTTP(S) request issued by the ServicePilot agent.
  • It is also possible to use the user-web-scenario package which offers monitoring of a server's response times via a series of HTTP(S) requests issued by the ServicePilot agent.

All requests are issued by the ServicePilot agent at a defined interval, allowing continuous monitoring of the performance and availability of a website.
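
The principle behind both packages can be sketched in a few lines. The code below is not how the ServicePilot agent is implemented; it is a minimal illustration, using only the Python standard library, of a series of HTTP(S) requests whose availability and response time are recorded. The function names, the result fields and the rule that a 2xx or 3xx status means "available" are assumptions made for the example.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Fetch a URL and return its HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 4xx/5xx responses still carry a status code


def run_scenario(urls: list[str],
                 fetch: Callable[[str], int] = http_status) -> list[dict]:
    """Request each URL in order and record availability and response time."""
    results = []
    for url in urls:
        start = time.perf_counter()
        try:
            status = fetch(url)
            available = 200 <= status < 400  # assumed availability rule
        except OSError:  # DNS failure, refused connection, timeout...
            status, available = None, False
        elapsed_ms = (time.perf_counter() - start) * 1000
        results.append({"url": url, "status": status,
                        "available": available, "response_ms": elapsed_ms})
    return results


# Usage sketch (hypothetical URLs and interval), repeated at a fixed interval:
# while True:
#     for result in run_scenario(["https://www.example.com/",
#                                 "https://www.example.com/login"]):
#         print(result)
#     time.sleep(60)
```

A single-URL list corresponds to a simple web check, while a longer list corresponds to a scenario. The `fetch` parameter exists only so that the logic can be exercised without network access.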

There are two ways to set up Application Traces: manually provisioning resources, or using auto-provisioning.

  • Manual provisioning: add resources using one of the APPTrace packages: apptrace-appservice-dotnet, apptrace-appservice-java, apptrace-appservice-nodejs, apptrace-dynatrace-service, apptrace-apphost, apptrace-zipkin.
  • Auto-provisioning: add an auto-provisioning rule with the SP Agent discovery type and, in the Application tab, fill in the APM Ports field with the listening ports separated by ",". Then configure the application trace code to send data to these ports, which are opened by the ServicePilot Agent. See the documentation under Provisioning > Manage resources > Auto-provisioning > Add an auto-provisioning rule
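
As a rough idea of what "application trace code sending data to a port" can look like, the sketch below builds a span in the public Zipkin v2 JSON format and posts it over HTTP. It assumes that the listening port accepts Zipkin v2 JSON, which the apptrace-zipkin package name suggests but this page does not confirm. The host name, the port 9411 (Zipkin's own default), the URL path, the service name and the tags are all hypothetical; check the agent instructions for the real values.

```python
import json
import secrets
import time
import urllib.request


def make_span(service: str, name: str, duration_us: int,
              tags: dict[str, str]) -> dict:
    """Build one span in the Zipkin v2 JSON format."""
    return {
        "traceId": secrets.token_hex(16),  # 128-bit trace ID, 32 hex characters
        "id": secrets.token_hex(8),        # 64-bit span ID, 16 hex characters
        "name": name,
        "kind": "SERVER",
        "timestamp": int(time.time() * 1_000_000),  # epoch microseconds
        "duration": duration_us,                    # microseconds
        "localEndpoint": {"serviceName": service},
        "tags": tags,
    }


def send_spans(spans: list[dict], agent_url: str) -> int:
    """POST a batch of spans as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        agent_url,
        data=json.dumps(spans).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


# Hypothetical service and request.
span = make_span("checkout", "get /cart", 12_500,
                 {"http.method": "GET", "http.path": "/cart"})

# Hypothetical agent address: replace host, port and path with real values.
# send_spans([span], "http://sp-agent.example:9411/api/v2/spans")
```

In practice a tracing library usually produces and sends these spans; the point here is only to show the kind of data that reaches the configured port.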

In order to collect network traces, all you need to do is use ServicePilot's auto-provisioning. When creating an auto-provisioning rule, be sure to set the discovery type to SP Agent and check the Network traces box.
NETTRACE Auto Prov

Collecting system metrics follows the same principle as for network traces; this time, make sure that the System Packages box is checked.
System Auto Prov

How to see APM traces

Monitor topology (live)

Fullstack page

Service dependencies

Information is represented in two forms: vertical dependencies and horizontal dependencies. Vertical dependencies are represented as a layered system in which each layer is a category: Applications at the top, then Services, then Processes, and so on. Horizontal dependencies represent the communications between elements of the same layer, drawn as links between the different hosts.
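
The distinction can be expressed as a simple rule: a link between two elements of the same layer is horizontal, and a link between elements of different layers is vertical. The sketch below applies that rule to a handful of made-up elements. The element names and links are hypothetical, and the code is only an illustration of the concept, not of how the Fullstack page is built.

```python
# Layer of each element; layer names follow the layers described in this section.
LAYER_OF = {
    "shop": "Applications",
    "cart-api": "Services",
    "pay-api": "Services",
    "java-1": "Processes",
    "srv-01": "Hosts",
}

# Hypothetical links between elements.
LINKS = [
    ("shop", "cart-api"),
    ("cart-api", "pay-api"),
    ("cart-api", "java-1"),
    ("java-1", "srv-01"),
]


def classify(source: str, target: str) -> str:
    """A link inside one layer is horizontal; a link across layers is vertical."""
    return "horizontal" if LAYER_OF[source] == LAYER_OF[target] else "vertical"


for source, target in LINKS:
    print(f"{source} -> {target}: {classify(source, target)}")
```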

Problem identification and Root Cause Analysis (real time)

The Topology page builds a relational display of your systems, organized by section. It lets you identify the problems affecting the monitored systems on your network and, thanks to the architecture display, quickly find which server is causing an incident so that it can be resolved as soon as possible.

Seven tabs are used to navigate through the vertical views: Applications, Services, Processes, Hosts, Network, Security and Misc.

In each of these tabs, the elements displayed in the central part of the page are color-coded by alert level. Selecting a color displays only the elements of that color.
Fullstack tab

Dependency displays:

  • View Dependencies: Display of the different elements of the tabs according to the level at which they are located in the view hierarchy. For example, an element in MAIN > France is shown under the same path in this display.
  • APP Dependencies: Display of the links between elements based on their application traces. An element that communicates with another has a dependency relationship with it.
  • NET Dependencies: Display of the links between elements based on their network communication. An element that communicates with another has a dependency relationship with it.
  • Table: Display in table form. Shows a real-time table of "status", "resource", "package", "host", "Mbps", "TCPReject", "TcpStartPublic", "RPM", "AvgDuration" and "errors".

Display layouts:

  • Spring layout: Display of dependencies according to a dynamically discovered architecture. The elements are placed automatically according to their links.
  • Horizontal layout: Display of a view with elements organized from left to right. The communicating elements are arranged in a more structured architecture than in the Spring layout.

Tooltip: hovering over a pad displays related information such as its type (Linux or Windows, for example), its host, its resource and its package.

The right side of the screen is split in two. At the top, the vertical dependencies display shows a tree structure representing the different layers.

Below, a menu with three tabs provides additional information about the selected server.

Note: the vertical dependencies and details are displayed only when an element is selected.

Content of each section:

  • Status: Details of the selected item
  • Cause: Causes of the different statuses
  • Impact: Consequences related to the identified problem
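
To make the notions of cause and impact concrete, the sketch below walks a small dependency graph. It is not ServicePilot's Root Cause Analysis algorithm, which this page does not describe. It assumes an acyclic graph, two made-up status values and hypothetical element names, and uses a deliberately simple rule: a cause is a failing element with no failing dependency of its own, and the impact of an element is everything that depends on it.

```python
# Hypothetical dependency graph: each element lists what it depends on.
depends_on = {
    "shop": ["cart-api", "pay-api"],
    "cart-api": ["srv-01"],
    "pay-api": ["srv-02"],
    "srv-01": [],
    "srv-02": [],
}

# Hypothetical current status of each element.
status = {
    "shop": "critical",
    "cart-api": "critical",
    "pay-api": "ok",
    "srv-01": "critical",
    "srv-02": "ok",
}


def root_causes(element: str) -> set[str]:
    """Failing elements at or below `element` with no failing dependency."""
    failing = [dep for dep in depends_on[element] if status[dep] != "ok"]
    if not failing:
        return {element} if status[element] != "ok" else set()
    causes: set[str] = set()
    for dep in failing:
        causes |= root_causes(dep)
    return causes


def impacted(element: str) -> set[str]:
    """Every element that depends, directly or indirectly, on `element`."""
    result: set[str] = set()
    for parent, deps in depends_on.items():
        if element in deps:
            result |= {parent} | impacted(parent)
    return result


print(root_causes("shop"))  # cause of the application-level problem
print(impacted("srv-01"))   # what a failing server affects
```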

APM Metrics dashboards (Historical analysis)

The APM Metrics dashboard pages summarize all the data related to application and network traces that have been collected. To access the APM dashboards, simply go to ANALYSIS > Metrics and select Apptrace and Nettrace resources.

APM Page

The collected data can be consulted globally by selecting the required category.
Category selection
Data can also be viewed for a specific item in a category.
Unique selection

Three categories are presented here, AppService, NetProcess and NetServer.

AppService

This category covers the analysis of the traces of monitored applications. The data presented offers a precise analysis of the performance and behavior of monitored applications, including the number of requests per minute, user satisfaction and other application metrics.

AppService page
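
For readers who want to see how such metrics can be derived from raw request data, here is a minimal sketch. This page does not say how ServicePilot computes user satisfaction; the example assumes an Apdex-style score, a widely used convention in APM, with a hypothetical threshold of 500 ms. The function names and sample values are illustrative only.

```python
from collections import Counter


def requests_per_minute(timestamps: list[float]) -> dict[int, int]:
    """Count requests in each one-minute bucket (key = minutes since epoch)."""
    return dict(Counter(int(ts // 60) for ts in timestamps))


def apdex(durations_ms: list[float], threshold_ms: float = 500.0) -> float:
    """Apdex-style score between 0 and 1.

    Requests at or under the threshold count as satisfied (1 point),
    requests up to 4x the threshold count as tolerating (0.5 point),
    slower requests count as frustrated (0 point).
    """
    if not durations_ms:
        raise ValueError("at least one request is required")
    satisfied = sum(d <= threshold_ms for d in durations_ms)
    tolerating = sum(threshold_ms < d <= 4 * threshold_ms for d in durations_ms)
    return (satisfied + tolerating / 2) / len(durations_ms)


# Hypothetical samples: request start times (seconds) and durations (ms).
print(requests_per_minute([0, 10, 59, 60, 125]))
print(apdex([100, 400, 900, 3000]))
```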

NetProcess

In the NetProcess category, data related to the network traces of each supervised application process is available. Several metrics are displayed, in particular the quantity of "Retransmitted" and "Rejected" packets, but also the maximum throughput and the volume of data travelling through the supervised processes.

NetProcess page

NetServer

The NetServer category displays the same metrics as NetProcess. However, in this category the data covers each server as a whole and not just the monitored processes.

NetServer page

Transactions page (Historical analysis)

The Transactions page provides an analysis of all transactions that have taken place between clients and machines monitored by ServicePilot APM technology. To access the Transactions page, simply go to ANALYSIS > Traces and then choose the Transactions tab.

Transactions page

Data presented

The Transactions page highlights the various transactions and, in particular, the HTTP(S) responses, the response times, the hosts involved, the methods used and the paths taken. The data is summarized in graphs that provide a complete view of the transaction flow within an application.
It is also possible to obtain the details of each transaction with all relevant data (traceID, duration, httppath, httpstatuscode...):

Transactions details

Filter transactions

This page has a filter feature that allows you to refine the data to better understand and analyze multiple transactions.

On the left side of the page, you can select predefined filters based on the recorded transactions. Just click on a filter to apply it. To remove a filter, simply deselect it. For more advanced filters it is possible to use the query bar.
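
The effect of combining filters can be illustrated outside the interface. The sketch below does not use the ServicePilot query bar syntax, which is not described here. It reuses the field names mentioned on this page (traceID, duration, httppath, httpstatuscode) on hypothetical records, and keeps only the transactions that satisfy every criterion.

```python
# Hypothetical transaction records using the field names listed above.
transactions = [
    {"traceID": "a1", "duration": 35, "httppath": "/cart", "httpstatuscode": 200},
    {"traceID": "b2", "duration": 910, "httppath": "/cart", "httpstatuscode": 500},
    {"traceID": "c3", "duration": 120, "httppath": "/login", "httpstatuscode": 200},
    {"traceID": "d4", "duration": 640, "httppath": "/login", "httpstatuscode": 503},
]


def filter_transactions(records: list[dict], **criteria) -> list[dict]:
    """Keep records matching every criterion (an exact value or a predicate)."""
    def matches(record: dict) -> bool:
        for field, wanted in criteria.items():
            value = record.get(field)
            ok = wanted(value) if callable(wanted) else value == wanted
            if not ok:
                return False
        return True

    return [record for record in records if matches(record)]


# Server errors on /cart only.
errors = filter_transactions(
    transactions,
    httppath="/cart",
    httpstatuscode=lambda code: code >= 500,
)
print([t["traceID"] for t in errors])
```

Each additional criterion narrows the result, in the same way that selecting a second predefined filter narrows the list of transactions.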

APPTrace page (Real time trace analysis)

The APPTrace tab allows real-time analysis of applications running on servers that are monitored through auto-provisioning or through microservice-labeled packages. This page provides an overview of the monitored applications and helps you understand how they work and how they communicate with the external environment. To access the APPTrace page, simply go to ANALYSIS > Traces and then choose the APPTrace tab.

APPTrace Page

Explorer

The menu on the left lets you navigate through the different hosts and microservices that have APPTraces. You can display information for all hosts at once, for a single host or for a single microservice.

Main Menu

In the main interface, 3 types of data are available. For each data type, the details of each application trace are available with information about the hosts involved in the collected trace (client IP, server IP, host name...). In addition, each data type has specific data:

Data types:

  • Hosts: Displays application traces by host. Information about the system and the processes responsible for the collected traces is displayed, as well as indications about the requests themselves (HTTP method, HTTP response code, request time...)
  • Processes: Displays application traces by process. The available information is identical to that described for the traces by host; additionally, the port used by the process located on the server is available
  • Conversations: Displays application traces by conversation. Information about the requests is displayed (HTTP method, HTTP response code, request time, request path...)
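
The three data types amount to grouping the same traces by different keys. The sketch below shows that idea on hypothetical records. The field names and the definition of a conversation as a client IP and server IP pair are assumptions made for the example, not the APPTrace page's internal model.

```python
from collections import defaultdict

# Hypothetical application traces.
traces = [
    {"host": "srv-01", "process": "java", "port": 8080,
     "client_ip": "10.0.0.5", "server_ip": "10.0.0.10",
     "method": "GET", "path": "/cart", "status": 200},
    {"host": "srv-01", "process": "java", "port": 8080,
     "client_ip": "10.0.0.6", "server_ip": "10.0.0.10",
     "method": "POST", "path": "/cart", "status": 500},
    {"host": "srv-02", "process": "node", "port": 3000,
     "client_ip": "10.0.0.5", "server_ip": "10.0.0.11",
     "method": "GET", "path": "/login", "status": 200},
]

# One grouping key per data type.
GROUP_KEYS = {
    "hosts": lambda t: t["host"],
    "processes": lambda t: (t["host"], t["process"], t["port"]),
    "conversations": lambda t: (t["client_ip"], t["server_ip"]),
}


def group_traces(records: list[dict], view: str) -> dict:
    """Group traces by host, by process or by conversation."""
    groups = defaultdict(list)
    for record in records:
        groups[GROUP_KEYS[view](record)].append(record)
    return dict(groups)


for view in GROUP_KEYS:
    counts = {key: len(items) for key, items in group_traces(traces, view).items()}
    print(view, counts)
```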

Several views

Data can be displayed in different ways using the 5 buttons in the upper right part of the interface.

Display modes:

  • Details: Presents the data in the form of a table detailing each of the traces collected
  • Topo graph: Presents the data as a dynamic graph
  • Topo top-down: Presents the data as a hierarchical graph from top to bottom
  • Topo left-right: Presents the data as a hierarchical graph from left to right
  • Host map: Presents the data on a world map geolocating the hosts (the Host map is mainly useful and usable for microservices using public IP addresses)

It is possible to pause the capture at any time by pressing the pause button, located at the top right of the window. You can also refresh data from the page to replace the currently viewed data.

NetTrace page (Real time trace analysis)

The NetTrace page allows you to view live network traces of resources monitored by ServicePilot agents. To access the NetTrace page, simply go to ANALYSIS > Traces and then choose the NetTrace tab.

Nettrace main

The NetTrace page provides a quick and accurate view of all live traffic in a network. More precisely, it shows the connections established between clients and the servers monitored by ServicePilot agents. For each detected connection, several data points are available, such as the IP addresses of the communicating hosts and more detailed application data (conversations, blocked and rejected connections, bytes per second...).
With NetTrace, highlighting and resolving problems within the network infrastructure becomes much easier.

Nettrace data

Data can be visualized in a more or less global way:

The collected data can be viewed for an entire network by selecting the required network.
Nettrace network
The data can also be viewed for a specific host of a network.
Nettrace host

After selecting the network or host, it is possible to view the data and the various associated links in several ways, either as a table of more detailed information about each communication, or as graphs for an overview of the status of the selected network.
Nettrace graphs

NetTrace can also capture network traffic at any time and produce a PCAP trace according to configurable filters (IP, ports, protocol...):

Nettrace pcap

1. Open ANALYSIS > Traces
2. Open the NetTrace tab
3. Select a server on which the capture will take place from the left-hand list
4. Click on the Trace button at the top left
5. Modify or add filters on IPs and/or ports
6. Start the trace for as long as you want
7. Stop the trace; it is then automatically downloaded as a PCAP file
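
Once downloaded, the capture can be opened in any tool that reads PCAP files. As a quick sanity check, the sketch below lists the timestamp and size of each packet using only the Python standard library. It assumes the file uses the classic libpcap format (24-byte global header, 16-byte record headers); if the download is in pcapng format, this code rejects it. The file name in the usage comment is hypothetical.

```python
import struct
from typing import BinaryIO

PCAP_MAGIC_MICRO = 0xA1B2C3D4  # classic pcap, microsecond timestamps
PCAP_MAGIC_NANO = 0xA1B23C4D   # classic pcap, nanosecond timestamps


def read_pcap(stream: BinaryIO) -> list[tuple[float, int]]:
    """Return (timestamp in seconds, original length) for each packet."""
    header = stream.read(24)
    if len(header) < 24:
        raise ValueError("truncated pcap global header")

    # The magic number also reveals the byte order used by the writer.
    magic_le = struct.unpack("<I", header[:4])[0]
    magic_be = struct.unpack(">I", header[:4])[0]
    if magic_le in (PCAP_MAGIC_MICRO, PCAP_MAGIC_NANO):
        endian, magic = "<", magic_le
    elif magic_be in (PCAP_MAGIC_MICRO, PCAP_MAGIC_NANO):
        endian, magic = ">", magic_be
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled here)")
    divisor = 1e6 if magic == PCAP_MAGIC_MICRO else 1e9

    packets = []
    while True:
        record = stream.read(16)
        if len(record) < 16:
            break  # end of file
        ts_sec, ts_frac, incl_len, orig_len = struct.unpack(endian + "IIII", record)
        stream.read(incl_len)  # skip the captured bytes
        packets.append((ts_sec + ts_frac / divisor, orig_len))
    return packets


# Usage sketch (hypothetical file name):
# with open("trace.pcap", "rb") as capture:
#     for timestamp, length in read_pcap(capture):
#         print(timestamp, length)
```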
