|
Sign In to gain access to subscriptions and/or personal tools.
|
Automatic Performance Tuning for Fast Fourier Transforms
Dragan Mirkovi\#263;
Lennart Johnsson
DEPARTMENT OF COMPUTER SCIENCE, UNIVERSITY OF HOUSTON, HOUSTON, TX 77204, USA
In this paper we discuss architecture-specific performance tuning for fast Fourier transforms (FFTs) implemented in the UHFFT library. The UHFFT library is an adaptive and portable software library for FFTs developed by the authors. We present the optimization methods used at different levels, starting with the algorithm selection used for the library code generation and ending with the actual implementation and specification of the appropriate compiler optimization options. We report on the performance results for several modern microprocessor architectures.
Key Words: fast Fourier transform (FFT) discrete Fourier transform (DFT) automatic performance tuning software libraries
References
- Cooley, J. W. and Tukey, J. W. 1965. An algorithm for the machine computation of complex Fourier series . Mathematics of Computation 19: 297301 .[CrossRef][Web of Science]
- Duhamel, P. and Hollmann, H. 1984. Split radix FFT algorithms . Electronic Letters 20: 1416 .
- Duhamel, P. and Vetterli, M. 1990. Fast Fourier transforms: a tutorial review and a state of the art . Signal Processing 19: 259299 .[CrossRef]
- Frigo, M. 1999. A fast Fourier transform compiler . In Proceedings of the 1999 ACM SIGPLAN Conference on Programming Language Design and Implementation, Atlanta, GA, May 14, pp. 169180 .
- Frigo, M. and Johnson, S. G. 1997. The fastest Fourier transform in the West. Technical Report MIT-LCS-TR-728, MIT, Cambridge, MA .
- Frigo, M. and Johnson, S. G. 1998. FFTW: an adaptive software architecture for the FFT . In Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Seattle, WA, May 1215, Vol. 3, pp. 13811384 . See http://www.fftw.org.
- Gatlin, K. S. and Carter, L. 2001. Faster FFTs via architecturecognizance . In Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques (PACT00), Philadelphia, PA, October 1618, pp. 249260 .
- Good, I. J. 1958. The interaction algorithm and practical Fourier analysis . Journal of the Royal Statistical Society, Series B 20: 361375 .
- Intel Corporation. 2000. IA-64 Architecture Software Developers Manual Volume 1: IA-64 Application Architecture, Rev. 1.1, Document Number 245470, Intel Corporation.
- Intel Corporation. 2001. IA-32 Intel Architecture Software Developers Manual Volume 1: Basic Architecture, Order Number 245470, Intel Corporation.
- Leroy, X. November 1995. Le système Caml Special Light: modules et compilation efficace en Caml. Technical Report 2721, INRIA.
- Motorola, Inc. 2001. MPC7450 RISC Microprocessor Technical Summary, Order Number MPC7450TS/D, Motorola, Inc.
- Rader, C. M. 1968. Discrete Fourier transforms when the number of data samples is prime . Proceedings of the IEEE 56: 11071108 .
- Selesnick, I. and Burrus, C. S. 1996. Automatic generation of prime length FFT programs . IEEE Transactions on Signal Processing 44(1): 1424 .
- Temperton, C. 1983. A note on prime factor FFT algorithms . Journal of Computational Physics 52: 198204 .
- Thomas, L. H. 1963. Using a computer to solve problems in physics. In Application of Digital Computers, Ginn and Co., Boston, MA .
- Tolimieri, R., An, M., and Lu, C. 1989. Algorithms for Discrete Fourier Transforms and Convolution, 1st edition, Springer-Verlag, New York .
- Van Loan, C. 1992. Computational Frameworks for the Fast Fourier Transform, SIAM, Philadelphia .
International Journal of High Performance Computing Applications, Vol. 18, No. 1,
47-64 (2004)
DOI: 10.1177/1094342004041292

CiteULike Complore Connotea Del.icio.us Digg Reddit Technorati Twitter What's this?
|
|