MTP collects a variety of statistics about each transaction which can be useful for diagnosing network, storage, or CPU problems. When diagnostic logging is enabled, the receiving end of an MTP transaction will report these statistics in the log output at the end of the transaction. Here is an example:
MTP1: 0DE551F7 2 MTP Statistics MTP1: Sequence: 524288000 Received: 524288000 Speed: 282234555/ 926385909 5% MTP1: Wsize: 2703360/5087104 Transit: 1408 Datagram: 1408/1408 MTP1: RTT: 76/74/74/76/99 Wtime: 0.077s RDh: 12 Rnh: 261888 Rpt: 152 Gap: 75 MTP1: Req: 2469 Resp: 372431 Del: 241 Rptd: 241 Dup: 0 MTP1: Wait: 0 Stall: 0 Pause: 0 Slow: 0 Over: 0 Loss: 0.098/0.001 MTP1: BWDP: 1784 Flow: 206426 RTNR: 0 Gaps: 8 Lower: 241 Raise: 372430 MTP1: Write Count: Total: 1202 Low: 525 Mid: 282 High: 9 Peak: 1441792 Avg: 436179
Each MTP update may make changes to the format and meaning of this diagnostic output. This Tech Note provides an overview for MTP version 4.5.0. Contact support for an analysis of your particular logs.
Transaction ID
MTP1: 0DE551F7 2 MTP Statistics
The first line provides the MTP transaction ID. MTP transactions are the communication primitives used to build more complex interactions like uploads and downloads. For example, a typical ExpeDat file transfer consists of two or three MTP transactions, some on the sender and some on the receiver. The transaction of interest will be the one on the receiving side, typically moving a significant amount of data.
For most applications, you can map the MTP transaction ID to a DOC transaction ID by looking for a line like the following at the start:
DOC1: 04AC0D64 Initiated Upload 0DE551F7 (type 3) to 107.170.213.228:48491 @ 88099
Most MTP applications use the DOC ID to identify data transfers in their logs. For example, servedat identified the completion of the above transaction like this:
F 20240814 14:04:01.087 04AC0D64 SEND 107.170.213.228:48491:M dei - "/tmp/test500mb.dat" "" 524288000 (0,0) "11.82 seconds (355 megabits/sec)"
Remember that MTP Statistics are only reported on the receiving side, so upload statistics will appear on the server and download statistics will appear on the client.
Data Transfer
The second line shows how much data transferred and how fast.
MTP1: Sequence: 524288000 Received: 524288000 Speed: 282234555/ 926385909 5%
Sequence counts the number of bytes successfully delivered to the parent application. Received counts the total number of bytes received, including out-of-sequence data which may be buffered but not yet utilized. For most application purposes, Sequence is the important number.
Speed is the data transfer rate in bits per second averaged over about 20 milliseconds. The first value was measured at the end of the transfer and may include time waiting for storage. The second value is the peak. For diagnostic purposes, the peak value offers insight into the capabilities of the infrastructure: you know the path is capable of moving data at least this fast. Both values have an error margin of ± 5%.
Pipeline
MTP1: Wsize: 2703360/5087104 Transit: 1408 Datagram: 1408/1408
Wsize describes the ending and peak numbers of bytes which were pipelined on the network path at any given time. A large difference in values here indicates throughput variation over the life of the transfer.
Transit is the number of bytes pending near the end of the transaction. This should be small unless the transfer ended with an error.
Datagram describes the current and maximum number of payload bytes in each MTP datagram. The full network packet size, including MTP, UDP, and IP headers, is typically 56 bytes larger than this value. Values above 1408 indicate attempts to utilize jumbo datagrams, which requires a compatible data path.
Path Latency
MTP1: RTT: 76/74/74/76/99 Wtime: 0.077s RDh: 12 Rnh: 261888 Rpt: 152 Gap: 75
The path Round Trip Time (RTT) is measured in a variety of ways which reflect changing conditions on the network path:
On saturated paths without QoS management, the RTT highs are often significantly higher than the lows. RTT values which show little variation indicate that the network path is not saturated and other factors are limiting flow.
Wtime is simply the ratio of Wsize to speed and should be close to the average RTT.
RDh and Rnh are measures of MTP's maximum buffer depth, in milliseconds and bytes. Large values here indicate a limitation of CPU capacity.
Rpt is the baseline number of milliseconds MTP will wait for a datagram before declaring it lost. The actual timeout varies depending on circumstances and the baseline may be affected by various MTP performance options.
Gap measures the longest interval without receiving any data. This should be close the minimum RTT. Larger values indicate times when no data is getting through the network. A Gap higher than the maximum RTT suggests connectivity problems.
Error Recovery
MTP1: Req: 2469 Resp: 372431 Del: 241 Rptd: 241 Dup: 0
Req counts the number of request datagrams sent from the receiver to the sender and Resp counts the number of payload datagrams received.
Del counts how many requests did not receive a complete response on time and Rptd counts how many of those were ultimately considered lost and had to be repeated. It is possible for a datagram to be lost and repeated multiple times. Dup counts datagrams which arrive more than once. Ideally, these values should be very low or zero.
Flow Control
MTP1: Wait: 0 Stall: 0 Pause: 0 Slow: 0 Over: 0 Loss: 0.098/0.001
Wait, Stall, and Pause count underflow and overflow events related to dynamic or streaming transactions such as ExpeDat's Streaming Folders or when downloading from Object Storage with CloudDat.
Wait counts the number of times the flow of data was stopped by the sending application due to a lack of data. A value larger than zero or one indicates that the data source could not keep up with the network.
Stall is similar to Wait, counting the number of times the server had some data available, but not as much as the network could handle.
Pause is the opposite of Wait: it counts how many times the receiving application stopped the flow of data because it could not be delivered fast enough.
Slow and Over track the condition of MTP's internal datagram buffering. Slow counts the number of times the sender requested a slowdown due to overflowing buffers. A non-zero value indicates that the sender is overloaded. Over counts the number of time this receiver's buffers became too large. This can happen if MTP is targeting filesystem storage that is unable to keep up with the network.
Loss is actually related to the error recovery statistics on the previous line. The first value is the ratio of Rptd to Req and is often an upper bound for packet loss. The second value is the ratio of bytes lost to bytes received and is often a lower bound for packet loss. Ideally these numbers should be close to zero. A value above 0.02 for the second number would indicate especially high packet loss on the network.
More Flow Control
MTP1: BWDP: 1784 Flow: 206426 RTNR: 0 Gaps: 8 Lower: 241 Raise: 372430
BWDP, Flow, and RTNR count the number of times MTP declined to increase the transfer rate due to various internal statistics.
Gaps counts the number of times no data arrived for more than half an RTT. A large Gaps count indicates an unstable network. When accompanied by high Loss and Gap, it may indicate intermittent loss of connectivity.
Lower and Raise count the number of times MTP considered lowering or raising the rate of low. Raise is often close to the Resp value.
Filesystem Performance
MTP1: Write Count: Total: 1202 Low: 525 Mid: 282 High: 9 Peak: 1441792 Avg: 436179
These values are given when MTP is writing data directly to a file. Streaming Folders, compression, piping, and object handlers write to buffers and so will not show this line.
Total gives the total number of write operations and Avg gives the average number of bytes per write. MTP varies the size of each write depending on relative speeds of the network and storage. When storage is slower, the block sizes may become larger to increase I/O efficiency.
Low counts the number of writes which were smaller than the preferred block size. This may be caused by packet loss, out-of-order packets, and other network problems.
Mid counts the number of writes which are close to the preferred block size. Ideally, this should be close to Total.
High counts the number of writes near the largest permitted block size. Large writes occur when storage is slower than the network and may indicate that throughput would benefit from faster storage.
Peak is the size of the largest write operation. This value may be limited by the operating system and datagram size.
Analysis
These statistics can be used to diagnose a variety of network conditions and identify the causes of performance issues. For example, high Loss values accompanied by high Gaps suggest connectivity problems. High Loss accompanied by flat (very similar) RTT values suggests QoS throttling by a router. Many Wait, Stall, Pause, Slow, or Over events indicate storage problems at the source or destination.
DEI engineers can use these statistics, along with other diagnostic logs, to help isolate bottlenecks in the the network, storage, or CPU. See Tech Note 0033 for instructions on collecting and submitting logs for various DEI applications then contact support for analysis.
Tech Note History
Aug | 24 | 2024 | MTP 4.5.0: New Filesystem Performance statistics |
Mar | 18 | 2021 | MTP 4.3.2 |