APM and distributed traces
ServicePilot's APM and distributed tracing provide real-time analysis of application behavior and performance issues.
Distributed traces record the path that each request takes through the microservices that make up an application. In this kind of architecture, tracing quickly becomes complex to observe and analyze. When an application is made up of hundreds of microservices communicating with as many hosts or more, a single trace is no longer enough. Moreover, static pages are often served by a CDN, which hides some parts of the application from view. Compared with more "classical" monolithic applications, where tracing is comparatively simple, the inherent complexity of modern architectures makes it a real challenge.
In the following diagram, for a single transaction, several requests are sent by a user. These requests then propagate throughout the application. If a problem occurs, it is very hard to identify the cause without distributed traces.
Modern tools are therefore needed to understand complexity, and this is where ServicePilot Technologies comes in, offering a concrete and simple solution based on powerful Root Cause Analysis.
Distributed traces are automatically correlated with logs, synthetic checks and network and infrastructure metrics.
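To make the notion of a distributed trace concrete, the sketch below models one trace as a set of spans that share a trace ID, each span pointing to its parent. The `Span` fields, the function names and the service names are illustrative assumptions, not ServicePilot's data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str             # shared by every span of one request
    span_id: str
    parent_id: Optional[str]  # None for the entry point of the request
    service: str
    duration_ms: float
    error: bool = False


def request_path(spans, span_id):
    """Return the chain of services from the root span down to span_id."""
    by_id = {s.span_id: s for s in spans}
    path = []
    current = by_id.get(span_id)
    while current is not None:
        path.append(current.service)
        current = by_id.get(current.parent_id)
    return list(reversed(path))


def failing_paths(spans):
    """List the service path leading to every span that reported an error."""
    return [request_path(spans, s.span_id) for s in spans if s.error]


# Hypothetical trace: one user request crossing several services.
TRACE = [
    Span("t1", "a", None, "frontend", 120.0),
    Span("t1", "b", "a", "orders", 95.0),
    Span("t1", "c", "b", "database", 80.0, error=True),
    Span("t1", "d", "a", "catalog", 12.0),
]
```

Calling `failing_paths(TRACE)` walks back from the failing span to the entry point, which is the kind of question that cannot be answered from one service's records alone.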
APM & RUM
To get a complete view of how your application behaves, it is important to know what is happening across all of it, on both the client side and the server side. This is why RUM and APM go hand in hand and complement each other.
By coupling RUM and APM, it is possible to take full advantage of the power of distributed tracing in order to have an accurate overview of each step of each transaction. One without the other leads to a big loss of information about the requests and makes troubleshooting much more difficult.
APM & Synthetic Monitoring
Synthetic Monitoring is a monitoring technique that simulates an action or a series of actions that a user might perform on a web site. Because these actions are monitored in a continual manner, availability, response time, and performance metrics can be monitored 24/7.
When adding technologies such as APM to Synthetic Monitoring, it is possible to have a much deeper insight into the performance and availability of functionality and services delivered by a website.
APM & Application Traces
To better understand the actions of microservices and hosts distributed within an infrastructure, it is necessary to use APPTrace. This tool allows the ServicePilot user to trace all the requests made by the microservices and to see the connections between clients and servers: who sends what to whom, and how.
If you put APPTrace and APM together, it is possible to have an overview of all microservices that make up the applications located in the infrastructure.
APM & Web server logs
Web server logs in W3C format can be integrated with ServicePilot APM instrumentation. It is still preferable to instrument web application code with application traces as above. Application traces allow the correlation of sub-application calls e.g. database queries.
W3C logs can be extended with Request Headers if web applications are instrumented for RUM. Add a ServicePilot Agent on the server and a W3C monitoring package to enable web server APM monitoring.
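As an illustration of what a W3C-format web server log contains, the sketch below parses lines in the W3C extended log file format, where a `#Fields:` directive names the columns of the entries that follow. The sample lines, the chosen fields and the assumption that `time-taken` is in milliseconds (as IIS writes it) are illustrative; this is not how the ServicePilot Agent processes logs.

```python
def parse_w3c_log(lines):
    """Yield one dict per request from W3C extended log lines."""
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # The directive lists the column names of the following entries.
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#"):
            continue  # other directives (#Version, #Date, ...)
        values = line.split()
        if len(values) == len(fields):
            yield dict(zip(fields, values))


def slow_or_failed(entries, max_ms=1000):
    """Return the paths of requests that failed (5xx) or exceeded max_ms."""
    return [
        e["cs-uri-stem"]
        for e in entries
        if int(e["sc-status"]) >= 500 or int(e["time-taken"]) > max_ms
    ]


# Hypothetical sample log.
SAMPLE = [
    "#Version: 1.0",
    "#Fields: date time cs-method cs-uri-stem sc-status time-taken",
    "2024-01-15 10:00:01 GET /index.html 200 35",
    "2024-01-15 10:00:02 POST /api/orders 500 1250",
]
```

A log line only shows the request as the web server saw it; it carries no information about the database queries or sub-calls behind it, which is why application traces remain preferable.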
APM & Network Traces
Network Tracing is a monitoring technique that aims to study all data flows in and out of a host or a group of hosts. It is possible with this method to have visual monitoring of the communications within a network and to recover all the data in a pcap file.
When you combine NetTrace and APM, it is possible to obtain a large amount of information about your network and the hosts on it, with different ways of viewing and filtering the data.
Instruments for Tracing
ServicePilot's APM technology correlates several dependent technologies:
- Browser: Passive monitoring of all user interactions with an application.
- Synthetic monitoring: Active monitoring of the availability and performance of a website.
- Application traces: Supervision of all transactions and requests between the different microservices of a monitored application.
- Network traces: Supervision of all incoming and outgoing network communications to the monitored hosts.
- System metrics: Supervision of systems, such as server CPU load, memory usage and disk I/O.
The choice of application instrumentation is important and can be very simple even with very complex applications:
Embed scripts that report RUM metrics into web pages. This can be done by:
- Automatically embedding the script with a plugin in web pages served by Tomcat and Jetty
- Configuring web servers or proxies to rewrite served web pages to embed the RUM script (for example using the IIS URL Rewrite extension)
- Manually adding the reporting script to key web pages
See the ServicePilot RUM Agent instructions under SETTINGS > Agents > Install an agent > Developer Agents > RUM > Get started
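The page-rewriting approach can be pictured with a short sketch: a function that inserts a script tag just before the closing `</head>` of a served page. The script URL and the function name are hypothetical; the actual snippet to embed is the one given in the RUM Agent instructions.

```python
import re

# Hypothetical script location, for illustration only.
RUM_SNIPPET = '<script src="/rum/agent.js" async></script>'


def embed_rum_script(html, snippet=RUM_SNIPPET):
    """Insert snippet once, just before </head>; leave other pages untouched."""
    if snippet in html:
        return html  # already instrumented
    # A function replacement avoids backslash-escape handling in the snippet.
    return re.sub(
        r"</head>",
        lambda match: snippet + match.group(0),
        html,
        count=1,
        flags=re.IGNORECASE,
    )
```

Pages without a `</head>` tag are returned unchanged, and applying the function twice does not duplicate the script.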
In order to implement Synthetic monitoring, different packages can be used.
- The user-webcheck package monitors the responses of a server using an HTTP(S) request issued by the ServicePilot agent.
- It is also possible to use the user-web-scenario package, which monitors a server's response times via a series of HTTP(S) requests issued by the ServicePilot agent.
All requests are issued by the ServicePilot agent at a defined interval, allowing continuous monitoring of the performance and availability of a website.
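The principle of a synthetic check can be sketched with the Python standard library: issue a request, record the status and the response time, and for a scenario run several requests in order. The function names and the result fields are assumptions made for illustration; the ServicePilot packages are configured in the product, not written as code.

```python
import time
import urllib.error
import urllib.request


def web_check(url, timeout=10.0, opener=urllib.request.urlopen):
    """Issue one HTTP(S) request and report status and response time."""
    start = time.perf_counter()
    try:
        with opener(url, timeout=timeout) as response:
            response.read()
            status = response.status
    except urllib.error.HTTPError as exc:
        status = exc.code  # the server answered with a 4xx or 5xx
    except OSError:
        status = None      # no answer: DNS failure, refused connection, timeout
    elapsed_ms = (time.perf_counter() - start) * 1000
    return {
        "url": url,
        "status": status,
        "available": status is not None and status < 400,
        "ms": elapsed_ms,
    }


def web_scenario(urls, **kwargs):
    """Run the checks in order and stop at the first failing step."""
    results = []
    for url in urls:
        result = web_check(url, **kwargs)
        results.append(result)
        if not result["available"]:
            break
    return results
```

A scheduler calling `web_scenario` at a fixed interval would produce the continuous availability and response-time series described above. The `opener` parameter exists only so the check can be exercised without network access.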
There are two different ways to set up Application Traces, one by manually provisioning some resources and the other by using auto-provisioning.
- The packages that can be used by APPTraces are:
- Add an SP Agent discovery type Auto-provisioning rule, with the Application tab APM Ports field completed with listening ports separated by ",". Configure Application trace code to send data to these ports opened by the ServicePilot Agent. See doc Provisioning > Manage resources > Auto-provisioning > Add an auto-provisioning rule
In order to collect network traces, all you need to do is use ServicePilot's auto-provisioning. When creating an auto-provisioning rule, be sure to set the discovery type to SP Agent and check the Network traces box.
In order to collect system metrics, the principle is the same as for network traces. This time, make sure that the System Packages box is checked.
How to collect distributed traces
Based on the application to be monitored, there are three ways to instrument the application code:
- Automatic: APM monitoring instrumentation is inserted into the application code automatically and may require a restart of the application depending on the instrumentation technology.
- Manual: The APM monitoring instrumentation needs to be manually added to the application code.
- SDK: Use Zipkin or OpenTelemetry libraries in the application code to send personalized traces to the ServicePilot Agent APM receiver.
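For the SDK route, the sketch below shows what a custom trace looks like on the wire, using the Zipkin v2 JSON span format and only the Python standard library. In practice a Zipkin or OpenTelemetry library would build and send these spans. The function names are hypothetical, and the receiver URL passed to `send_spans` (host, port and path) is an assumption that depends on how the ServicePilot Agent APM receiver is configured.

```python
import json
import secrets
import time
import urllib.request


def make_span(name, service, duration_us, trace_id=None, parent_id=None, tags=None):
    """Build one span in the Zipkin v2 JSON format."""
    span = {
        "traceId": trace_id or secrets.token_hex(16),  # 128-bit ID, lower-hex
        "id": secrets.token_hex(8),                    # 64-bit ID, lower-hex
        "name": name,
        "kind": "SERVER",
        # Start time and duration are expressed in microseconds.
        "timestamp": int(time.time() * 1_000_000) - duration_us,
        "duration": duration_us,
        "localEndpoint": {"serviceName": service},
        "tags": tags or {},  # Zipkin tag values are strings
    }
    if parent_id:
        span["parentId"] = parent_id
    return span


def send_spans(spans, endpoint):
    """POST a list of spans as JSON to the receiver URL given in endpoint."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(spans).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

A child span reuses the parent's `traceId` and sets `parentId` to the parent's `id`; this is what lets the receiver stitch spans from different services into one distributed trace.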
ServicePilot accepts a number of different Open Source APM instrumentation protocols providing a choice of agents and methods to collect APM data:
When open source instrumentation is available
If one of the previously listed instrumentation technologies is already used, the ServicePilot Agent integrates natively.
When instrumentation is yet to be integrated
If APM instrumentation has yet to be added to your application, ServicePilot supports APM integration with a number of languages:
For further information, contact ServicePilot support.
How to view APM traces
Monitor topology (live)
Information is represented in two different forms: vertical dependencies and horizontal dependencies. The vertical dependencies are represented as a layered system in which each layer has a category: Applications at the top, then Services, then Processes, and so on. The horizontal dependencies represent the communications between elements of the same layer, shown as links between the different hosts.
Problem identification and Root Cause Analysis (real time)
The Topology page builds a relational display of your different systems, organized by section. It helps you identify the problems that the monitored systems on your network might encounter. Using the architecture display, you can then quickly find out which server is causing the incident and resolve the problem as soon as possible.
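The idea of following dependencies down to the element that caused an incident can be sketched with a simple heuristic: among the failing elements, keep those whose own dependencies are all healthy. This is an illustrative simplification with hypothetical element names, not ServicePilot's Root Cause Analysis algorithm.

```python
def root_causes(depends_on, failing):
    """Return the failing elements none of whose direct dependencies are failing."""
    failing = set(failing)
    return sorted(
        node
        for node in failing
        if not failing.intersection(depends_on.get(node, ()))
    )


# Hypothetical vertical dependencies: application -> service -> process -> host.
DEPENDS_ON = {
    "shop-app": ["orders-service", "catalog-service"],
    "orders-service": ["orders-process"],
    "orders-process": ["host-2"],
    "catalog-service": ["catalog-process"],
    "catalog-process": ["host-1"],
}
```

With `shop-app`, `orders-service`, `orders-process` and `host-2` all in alarm, the function returns only `host-2`: the three upper layers fail because something they depend on fails.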
Seven tabs are used to navigate through the vertical views: Applications, Services, Processes, Hosts, Network, Security and Misc.
In each of these tabs, the elements displayed in the central part of the page follow an alert level color code. Selecting a color displays only the elements of that color.
|View Dependencies||Display of the elements of the tabs according to their level in the view hierarchy. For example, an element in MAIN > France appears under the same path in this dependency view.|
|APP Dependencies||Display of the links between elements based on their application traces. An element in communication with another has a dependency relationship with it.|
|NET Dependencies||Display of the links between elements based on their network communication. An element in communication with another has a dependency relationship with it.|
|Table||Display in table form. View a real-time table of "status", "resource", "package", "host", "Mbps", "TCPReject", "TcpStartPublic", "RPM", "AvgDuration" and "errors".|
|Spring layout||Display of dependencies according to a dynamically discovered architecture. The elements are automatically placed according to their links.|
|Horizontal layout||Display of a view with elements organized from left to right. The communicating elements are represented by a more structured architecture than the Spring layout display.|
Tooltip: hovering over a pad displays related information such as its type (Linux or Windows for example), its host, its resource and its package.
The right side of the screen shows a split display. At the top, the vertical dependencies display allows you to see a tree structure representing the different layers.
Below, a menu with three tabs allows you to obtain additional information about the selected server.
Note: the vertical dependencies and details are displayed only when an element is selected.
|Status||Details of the selected item|
|Cause||Causes of the different status|
APM Metrics dashboards (Historical analysis)
The APM Metrics dashboard pages summarize all the data related to application and network traces that have been collected. To access the APM dashboards, simply go to ANALYSIS > Traces and select APPTrace and NetTrace resources.
|The collected data can be consulted globally by selecting the required category.|
|Data can also be viewed for a specific item in a category.|
Three categories are presented here, AppService, NetProcess and NetServer.
The AppService category covers the analysis of traces from monitored applications. The data presented offers a precise analysis of the performance and behavior of monitored applications, including the number of requests per minute, user satisfaction and other application metrics.
In the NetProcess category, data related to the network traces of each supervised application process is available. Several metrics are displayed, in particular the quantity of "Retransmitted" and "Rejected" packets, but also the maximum throughput and the volume of data travelling through the supervised processes.
The NetServer category displays the same metrics as NetProcess. In this category, however, the data covers each server as a whole and not just the monitored processes.
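To show how application metrics of the kind listed for AppService can be derived from raw transactions, the sketch below computes requests per minute, average duration and error count over a time window. User satisfaction is approximated here with an Apdex-style score; that choice, the 500 ms threshold and the record fields are assumptions for illustration, since the way ServicePilot computes its metrics is not described here.

```python
def app_metrics(transactions, window_minutes, apdex_threshold_ms=500):
    """Summarize transactions observed during a window of window_minutes."""
    total = len(transactions)
    if total == 0 or window_minutes <= 0:
        return {"rpm": 0.0, "avg_duration_ms": 0.0, "errors": 0, "apdex": None}
    durations = [t["duration_ms"] for t in transactions]
    errors = sum(1 for t in transactions if t["status"] >= 500)
    # Apdex: satisfied <= T, tolerating <= 4T, the rest frustrated.
    satisfied = sum(1 for d in durations if d <= apdex_threshold_ms)
    tolerating = sum(
        1 for d in durations if apdex_threshold_ms < d <= 4 * apdex_threshold_ms
    )
    return {
        "rpm": total / window_minutes,
        "avg_duration_ms": sum(durations) / total,
        "errors": errors,
        "apdex": (satisfied + tolerating / 2) / total,
    }
```

The score ranges from 0 (every request slower than four times the threshold) to 1 (every request within the threshold).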
Transactions page (Historical analysis)
The Transactions page provides an analysis of all transactions that have taken place between clients and machines monitored by ServicePilot APM technology. To access the Transactions page, simply go to ANALYSIS > Traces and then choose the Transactions tab.
The Transactions page highlights the various transactions, in particular the HTTP(S) responses, the response times, the hosts involved, the methods used and the paths taken. The data is summarized in graphs that provide a complete view of the transaction flow within an application.
It is also possible to obtain the details of each transaction with all relevant data (traceID, duration, httppath, httpstatuscode...):
This page has a filter feature that allows you to refine the data to better understand and analyze multiple transactions.
On the left side of the page, you can select predefined filters based on the recorded transactions. Just click on a filter to apply it. To remove a filter, simply deselect it. For more advanced filters it is possible to use the query bar.
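The effect of combining filters can be sketched as follows: each filter narrows the list, and a transaction is kept only if it matches all of them. The field names follow the transaction details mentioned above (traceID, duration, httppath, httpstatuscode); the sample records, the duration unit and the function are hypothetical, and this is not the syntax of the query bar.

```python
def filter_transactions(transactions, **criteria):
    """Keep transactions matching every criterion (exact value or predicate)."""
    def matches(tx):
        for field, expected in criteria.items():
            value = tx.get(field)
            if callable(expected):
                if value is None or not expected(value):
                    return False
            elif value != expected:
                return False
        return True

    return [tx for tx in transactions if matches(tx)]


# Hypothetical transaction records.
TRANSACTIONS = [
    {"traceID": "t1", "duration": 120, "httppath": "/orders", "httpstatuscode": 200},
    {"traceID": "t2", "duration": 2300, "httppath": "/orders", "httpstatuscode": 500},
    {"traceID": "t3", "duration": 45, "httppath": "/catalog", "httpstatuscode": 200},
]
```

For example, `filter_transactions(TRANSACTIONS, httppath="/orders", duration=lambda d: d > 1000)` keeps only the slow request on `/orders`.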
APPTrace page (Real time trace analysis)
The APPTrace tab allows you to analyze, in real time, applications on servers that are monitored through auto-provisioning or through microservice-labeled packages. This page provides an overview of the monitored applications and helps you understand how they work and how they communicate with the external environment. To access the APPTrace page, simply go to ANALYSIS > Traces and then choose the APPTrace tab.
Using the menu on the left, it is possible to navigate through the different hosts and microservices that have APPTraces. It is possible to get information from all hosts at the same time, from only one host and from only one microservice.
In the main interface, 3 types of data are available. For each data type, the details of each application trace are available with information about the hosts involved in the collected trace (client IP, server IP, host name...). In addition, each data type has specific data:
|- Hosts||Displays application traces by host. Information about the system and the processes responsible for the collected traces is displayed, as well as details about the requests themselves (HTTP method, HTTP response code, request time...)|
|- Processes||Displays application traces by process. The available information is identical to that described for the traces by host, with the addition of the port used by the process on the server|
|- Conversations||Displays application traces by conversation. Information about the requests is displayed (HTTP method, HTTP response code, request time, request path...)|
Data can be displayed in different ways using the 5 buttons in the upper right part of the interface.
|- Details||Presents the data in the form of a table detailing each of the traces collected|
|- Topo graph||Presents data as a dynamic graph|
|- Topo top-down||Presents data as a hierarchical graph from top to bottom|
|- Topo left-right||Presents the data as a hierarchical graph from left to right|
|- Host map||Presents the data on a world map geolocating the hosts (the Host map is mainly useful and usable for microservices using public IP addresses)|
It is possible to pause the capture at any time by pressing the pause button, located at the top right of the window. You can also refresh data from the page to replace the currently viewed data.
NetTrace page (Real time trace analysis)
The NetTrace page allows you to view live network traces of resources monitored by ServicePilot agents. To access the NetTrace page, simply go to ANALYSIS > Traces and then choose the NetTrace tab.
The NetTrace page provides a quick and accurate view of all live traffic in a network. More precisely, NetTrace provides the visualization of the different connections established between servers monitored by ServicePilot agents and clients. For each connection detected, several data points will be available such as the IP of the hosts communicating with each other, but also more precise application data (conversations, blocked and rejected connections, bytes per second...)
Highlighting and resolving problems within the network infrastructure is much easier with NetTrace.
Data can be visualized in a more or less global way:
|The collected data can be viewed for an entire network by selecting the required network.|
|The data can also be viewed for a specific host of a network.|
After selecting the network or host, it is possible to view the data and the various associated links in several ways, either as a table of more detailed information about each communication, or as graphs for an overview of the status of the selected network.
NetTrace also allows you to capture network traffic at any time and produce a PCAP trace according to various filters that can be set (IP, ports, protocol...):
1. Open ANALYSIS > Traces
2. Open the NetTrace tab
3. Select a server on which the capture will take place from the left-hand list
4. Click on the Trace button at the top left
5. Modify or add filters on IPs and/or ports
6. Start the trace and let it run for as long as you want
7. Stop the trace. It is then automatically downloaded as a PCAP file
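Once a capture has been downloaded, it can be opened in any PCAP-aware tool. As a quick sanity check, the sketch below reads the file structure directly with the Python standard library and counts the captured packets. It assumes the classic pcap layout with microsecond timestamps (a 24-byte global header followed by 16-byte record headers); whether the downloaded file uses this layout or another variant such as pcapng is an assumption to verify. The function name is hypothetical.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4  # classic pcap, microsecond timestamps


def pcap_summary(data):
    """Return version, link type, packet count and captured bytes of a pcap."""
    if len(data) < 24:
        raise ValueError("too short for a pcap global header")
    # The magic number also tells which byte order the writer used.
    if struct.unpack("<I", data[:4])[0] == PCAP_MAGIC:
        endian = "<"
    elif struct.unpack(">I", data[:4])[0] == PCAP_MAGIC:
        endian = ">"
    else:
        raise ValueError("not a classic microsecond pcap file")
    # magic, version major/minor, thiszone, sigfigs, snaplen, link type
    _, major, minor, _, _, _, linktype = struct.unpack(endian + "IHHiIII", data[:24])
    offset, packets, captured = 24, 0, 0
    while offset + 16 <= len(data):
        # ts_sec, ts_usec, captured length, original length
        _, _, incl_len, _ = struct.unpack(endian + "IIII", data[offset:offset + 16])
        offset += 16 + incl_len
        if offset > len(data):
            break  # truncated last record
        packets += 1
        captured += incl_len
    return {
        "version": (major, minor),
        "linktype": linktype,
        "packets": packets,
        "bytes": captured,
    }
```

A typical use would be `pcap_summary(open(path, "rb").read())`, where `path` is the location of the downloaded file.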