|
Sign In to gain access to subscriptions and/or personal tools.
|
International Journal of High Performance Computing Applications, Vol. 18, No. 1,
21-45 (2004)
DOI: 10.1177/1094342004041291
Spiral: A Generator for Platform-Adapted Libraries of Signal Processing Alogorithms
Markus Püschel
José M. F. Moura
DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING CARNEGIE MELLON UNIVERSITY, PITTSBURGH, PA 15213-3890, USA
Bryan Singer
716 QUIET POND CT., ODENTON, MD 21113, USA
Jianxin Xiong
33315 DIGITAL COMPUTER LABORATORY, 1304 W SPRINGFIELD AVE, URBANA, IL 61801, USA
Jeremy Johnson
DEPARTMENT OF COMPUTER SCIENCE, DREXEL UNIVERSITY PHILADELPHIA, PA 19104-2875, USA
David Padua
DEPARTMENT OF COMPUTER SCIENCE, UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN 3318 DIGITAL COMPUTER LABORATORY, URBANA, IL 61801, USA
Manuela Veloso
SCHOOL OF COMPUTER SCIENCE, CARNEGIE MELLON UNIVERSITY, PITTSBURGH, PA 15213-3890, USA
Robert W. Johnson
3324 21ST AVE. SOUTH ST. CLOUD, MN 56301, USA
SPIRAL is a generator for libraries of fast software implementations of linear signal processing transforms. These libraries are adapted to the computing platform and can be re-optimized as the hardware is upgraded or replaced. This paper describes the main components of SPIRAL: the mathematical framework that concisely describes signal transforms and their fast algorithms; the formula generator that captures at the algorithmic level the degrees of freedom in expressing a particular signal processing transform; the formula translator that encapsulates the compilation degrees of freedom when translating a specific algorithm into an actual code implementation; and, finally, an intelligent search engine that finds within the large space of alternative formulas and implementations the "best" match to the given computing platform. We present empirical data that demonstrate the high performance of SPIRAL generated code.
Key Words: program generation automatic performance tuning signal processing domain-specific language signal transform Fourier transform DFT FFT seach optimization
References
- Auslander, L., Johnson, J., and Johnson, R. W. 1996. Automatic implementation of FFT algorithms . Technical Report DUMCS-96-01, Deparrtment of Mathematics and Computer Science, Drexel University, Philadelphia, PA. Presented at the DARPA ACMP PI meeting.
- Bilmes, J., Asanovi\#263;, K., Chin, C. W., and Demmel, J. 1997. Optimizing matrix multiply using PHiPAC: a portable, highperformance, ANSI C coding methodology . In Proceedings of Supercomputing San Jose, CA, ACM SIGARC. See http://www.icsi.berkeley.edu/bilmes/phipac.
- Cooley, J. W. and Tukey, J. W. 1965. An algorithm for the machine calculation of complex Fourier series . Mathematics of Computation 19: 297301 .[CrossRef][ISI]
- Egner, S. 1997. Zur algorithmischen Zerlegungstheorie linearer Transformationen mit symmetrie. Ph.D. Thesis, Institut für Informatik, Univ. Karlsruhe, Germany.
- Egner, S. and Püschel, M. 1998. AREP Constructive Representation Theory and Fast Signal Transforms, GAP share package. See http://www.ece.cmu.edu/smart/arep/arep.html.
- Elliott, D. F. and Rao, K. R. 1982. Fast Transforms: Algorithms, Analyses, Applications, Academic Press, New York .
- Franchetti, F., Karner, H., Kral, S., and Ueberhuber, C. W. 2001. Architecture independent short vector FFTs . In Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Salt Lake City, UT, Vol. 2, pp. 11091112 .
- Franchetti, F. and Püschel, M. 2002. A SIMD vectorizing compiler for digital signal processing algorithms . In Proceedings of International Parallel and Distributed Processing Symposium (IPDPS), Fort Lauderdale, FC, pp. 2026 .
- Franchetti, F. and Püschel, M. 2003. Short-vector code generation for the discrete Fourier transform . In Proceedings of International Parallel and Distributed Processing Symposium (IPDPS), Nice, France, pp. 5867 .
- Frigo, M. 1999. A fast Fourier transform compiler . In Proceedings of ACM SIGPLAN99 Conference on Programming Language Design and Implementation (PLDI), Atlanta, GA, pp. 169180 .
- Frigo, M. and Johnson, S. G. 1998. FFTW: an adaptive software architecture for the FFT . In Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Seattle, WA, Vol. 3, pp. 13811384 . See http://www.fftw.org.
- The GAP Team, University of St. Andrews, Scotland. 1997. GAP Groups, Algorithms, and Programming. See http://www-gap.dcs.st-and.ac.uk/gap/.
- Gatlin, K. S. and Carter, L. 1999. Architecture-cognizant divide and conquer algorithms . In Proceedings of Supercomputing Portland, OR.
- Gacbi\#263;, A., Püschel, M., and Moura, J. M. F. 2003. Fast automatic implementations of FIR filters . In Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Hong Kong, Vol. 2, pp. 541544 .
- Goldberg, D. E. 1989. Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, Reading, MA .
- Haentjens, G. 2000. An investigation of recursive FFT implementations. Masters thesis, ECE Department, Carnegie Mellon University.
- Im, E.-J. and Yelick, K. 2001. Optimizing sparse matrix computations for register reuse in SPARSITY . In Proceedings of International Conference on Computational Science (ICCS), San Francisco, CA, pp. 127136 .
- Johnson, H. W. and Burrus, C. S. 1983. The design of optimal DFT algorithms using dynamic programming . IEEE Transactions on Acoustics, Speech, and Signal Processing 31: 378387 .[CrossRef]
- Johnson, J., Johnson, R. W., Rodriguez, D., and Tolimieri, R. 1990. A methodology for designing, modifying, and implementing Fourier transform algorithms on various architectures . IEEE Transactions on Circuits and Systems 9(4): 449500 .
- Johnson, J. and Püschel, M. 2000. Search for the optimal WalshHadamard transform . In Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Istanbul, Turkey, Vol. IV, pp. 33473350 .
- Mirkovi&263;, D. and Johnsson, S. L. 2001. Automatic performance tuning in the UHFFT library . In Proceedings of International-Conference on Computational Science (ICCS), San Francisco, CA, Lecture Notes in Computer Science, Vol. 2073, Springer-Verlag, Berlin, pp. 7180 .
- Moura, J. M. F., Johnson, J., Johnson, R. W., Padua, D., Prasanna, V., Püschel, M., and Veloso, M. M. 1998. SPIRAL: Portable Library of Optimized Signal Processing Algorithms. See http://www.spiral.net.
- Moura, J. M. F. and Püschel, M. 2003. SPIRAL: an overview . In Proceedings of the Workshop on Optimizations for DSP and Embedded Systems, held in conjunction with CGO San Francisco, CA.
- Park, N., Kang, D., Bondalapati, K., and Prasanna, V. K. 2000. Dynamic data layouts for cache-conscious factorization of DFT . In Proceedings of International Parallel and Distributed Processing Symposium (IPDPS), Cancun, Mexico, pp. 693701 .
- Park, N. and Prasanna, V. K. 2001. Cache conscious Walsh Hadamard transform . In Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), San Francisco, CA, Vol. 2, pp. 12051208 .
- Rao, K. R. and Hwang, J. J. 1996. Techniques and Standards for Image, Video and Audio Coding, Prentice-Hall PTR, Englewood Cliffs, NJ .
- Révész, G. E. 1983. Introduction to Formal Languages, McGraw-Hill, New York .
- Sepiashvili, D. 2000. Performance models and search methods for optimal FFT implementations. Masters thesis, ECE Department, Carnegie Mellon University.
- Singer, B. and Veloso, M. M. 2001. Stochastic search for signal processing algorithm optimization . In Proceedings of Supercomputing Denver, CO.
- Singer, B. and Veloso, M. M. 2002. Automating the modeling and optimization of the performance of signal transforms . IEEE Transactions on Signal Processing 50(8): 20032014 .[CrossRef]
- Tolimieri, R., An, M., and Lu, C. 1997. Algorithms for Discrete Fourier Transforms and Convolution, 2nd edition, Springer-Verlag, Berlin .
- Van Loan, C. 1992. Computational Framework of the Fast Fourier Transform, SIAM, Philadelphia .
- Vetterli, M. and Nussbaumer, H. J. 1984. Simple FFT and DCT algorithms with reduced number of operations . Signal Processing 6: 267278 .
- Wang, Z. 1984. Fast algorithms for the discrete W transform and for the discrete Fourier transform . IEEE Transactions on Acoustics, Speech, and Signal Processing 32(4): 803816 .[CrossRef]
- Wang, Z. and Hunt, B. R. 1985. The discrete W transform . Applied Mathematics and Computation 16: 1948 .[CrossRef]
- Whaley, R. C. and Dongarra, J. 1998. Automatically tuned linear algebra software (ATLAS) . In Proceedings of Supercomputing Orlando, FL, http://math-atlas.sourceforge.net/.
- Xiong, J. 2001. Automatic optimization of DSP algorithms. Ph.D. Thesis, Computer Science, University of Illinois at Urbana-Champaign. Also as Technical Report UIUCDCSR-2001-224, University of Illinois.
- Xiong, J., Johnson, J., Johnson, R., and Padua, D. 2001. SPL: a language and compiler for DSP algorithms . In Proceedings of PLDI, Snowbird, UT, pp. 298308 .

CiteULike Connotea Del.icio.us Digg Reddit Technorati What's this?
|