Characterizing Supercomputer Traffic Networks Through Link-Level Analysis

Abstract

We present techniques for characterizing the bandwidth and congestion behavior of supercomputer High-Speed Networks (HSNs). By taking a link-level perspective, we gain generality over analyses that are tied to specific topologies. We illustrate these techniques using five months of a Blue Waters production dataset consisting of network utilization and congestion counters. We find that: (i) the execution time of communication-heavy applications is highly correlated with the network stalls observed in the network, and application runtime can increase by as much as 1.7x with only a nominal increase in stalls; (ii) heterogeneity in the available link bandwidth can lead to backpressure and congestion even when the network is not under-provisioned; and (iii) links connected to I/O nodes are no more likely to observe congestion during operational hours than any other link in the system.

Publication
HPCMASPA 2018, CLUSTER 2018