In machine learning, it is often necessary to statistically compare the overall performance of two algorithms (e.g., our proposed algorithm and each compared baseline) based on multiple benchmark ...