| Sign In to gain access to subscriptions and/or personal tools. |
Statistical Models for Empirical Search-Based Performance TuningCOMPUTER SCIENCE DIVISION DEPARTMENT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES UNIVERSITY OF CALIFORNIA AT BERKELEY, BERKELEY, CA 94720, USA
COMPUTER SCIENCE DIVISION DEPARTMENT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES AND DEPARTMENT OF MATHEMATICS UNIVERSITY OF CALIFORNIA AT BERKELEY, BERKELEY, CA 94720, USA
DEPARTMENT OF ELECTRICAL ENGINEERING UNIVERSITY OF WASHINGTON, SEATTLE, WA, USA Achieving peak performance from the computational kernels that dominate application performance often requires extensive machine-dependent tuning by hand. Automatic tuning systems have emerged in response, and they typically operate by (1) generating a large number of possible, reasonable implementations of a kernel, and (2) selecting the fastest implementation by a combination of heuristic modeling, heuristic pruning, and empirical search (i.e. actually running the code). This paper presents quantitative data that motivate the development of such a search-based system, using dense matrix multiply as a case study. The statistical distributions of performance within spaces of reasonable implementations, when observed on a variety of hardware platforms, lead us to pose and address two general problems which arise during the search process. First, we develop a heuristic for stopping an exhaustive compiletime search early if a near-optimal implementation is found. Secondly, we show how to construct run-time decision rules, based on run-time inputs, for selecting from among a subset of the best implementations when the space of inputs can be described by continuously varying features. We address both problems by using statistical modeling techniques that exploit the large amount of performance data collected during the search. We demonstrate these methods on actual performance data collected by the PHiPAC tuning system for dense matrix multiply. We close with a survey of recent projects that use or otherwise advocate an empirical search-based approach to code generation and algorithm selection, whether at the level of computational kernels, compiler and run-time systems, or problem-solving environments. Collectively, these efforts suggest a number of possible software architectures for constructing platform-adapted libraries and applications.
Key Words: algorithm selection automatic performance tuning early stopping feedback-directed optimization matrix multiplication performance distribution performance optimization software engineering support vector method
International Journal of High Performance Computing Applications, Vol. 18, No. 1,
65-94 (2004) |
|||