International Journal of High Performance Computing Applications

 

Advanced Search

Journal Navigation

Journal Home

Subscriptions

Archive

Contact Us

Table of Contents

Click here for more information

Sign In to gain access to subscriptions and/or personal tools.
This Article
Right arrow Abstract Freely available
Right arrow Free Full Text (Free PDF) Free
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Alert me to new issues of the journal
Right arrow Add to Saved Citations
Right arrow Download to citation manager
Right arrow Add to My Marked Citations
Citing Articles
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Strout, M. M.
Right arrow Articles by Kreaseck, B.
Right arrow Search for Related Content
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us   Add to Digg   Add to Reddit   Add to Technorati  
What's this?
International Journal of High Performance Computing Applications, Vol. 18, No. 1, 95-113 (2004)
DOI: 10.1177/1094342004041294

Sparse Tiling for Stationary Iterative Methods

Michelle Mills Strout

ARGONNE NATIONAL LABORATORY

Larry Carter

Jeanne Ferrante

UNIVERSITY OF CALIFORNIA, SAN DIEGO

Barbara Kreaseck

LA SIERRA UNIVERSITY

In modern computers, a program’s data locality can affect performance significantly. This paper details full sparse tiling, a run-time reordering transformation that improves the data locality for stationary iterative methods such as Gauss–Seidel operating on sparse matrices. In scientific applications such as finite element analysis, these iterative methods dominate the execution time. Full sparse tiling chooses a permutation of the rows and columns of the sparse matrix, and then an order of execution that achieves better data locality. We prove that full sparsetiled Gauss–Seidel generates a solution that is bitwise identical to traditional Gauss–Seidel on the permuted matrix. We also present measurements of the performance improvements and the overheads of full sparse tiling and of cache blocking for irregular grids, a related technique developed by Douglas et al.

Key Words: tiling • iterative alogorithms • static and dynamic analysis • irregular grids • data locality • sparse matrix • computer architecture

References

  • Adams, M. F. 2001. A distributed memory unstructured Gauss– Seidel algorithm for multigrid smoothers . In Proceedings of 2001 Conference on Supercomputing, Denver, CO, ACM/ IEEE, New York.
  • Ahmed, N., Mateev, N., and Pingali, K. 2000a. Synthesizing transformations for locality enhancement of imperfectlynested loop nests . In Proceedings of the 2000 ACM International Conference on Supercomputing (ICS), pp. 141–152 .
  • Ahmed, N., Mateev, N., and Pingali, K. 2000b. Synthesizing transformations for locality enhancement of imperfectlynested loop nests . In Conference Proceedings of the 2000 International Conference on Supercomputing, Santa Fe, NM, ACM SIGARCH, pp. 141–152 .
  • Barrett, R., Berry, M., Chan, T. F., Demmel, J., Donato, J., Dongarra, J., Eijkhout, V., Pozo, R., Romine, C., and der Vorst, H. V. 1994. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd edition, SIAM, Philadelphia, PA .
  • Bassetti, F., Davis, K., and Quinlan, D. 1998. Optimizing transformations of stencil operations for parallel object-oriented scientific frameworks on cache-based architectures. Lecture Notes in Computer Science, Vol. 1505, Spinger-Verlag, Berlin .
  • Carr, S. and Kennedy, K. 1992. Compiler blockability of numerical algorithms . In Proceedings of 1992 Conference on Supercomputing, Minneapolis, MN, IEEE Computer Society Press, Los Alamitos, CA, pp. 114–124
  • Das, R., Uysal, M., Saltz, J., and Hwang, Y.-S. S. 1994. Communication optimizations for irregular scientific computations on distributed memory architectures . Journal of Parallel and Distributed Computing 22(3): 462–478 .[CrossRef]
  • Ding, C. and Kennedy, K. 1999. Improving cache performance in dynamic applications through data and computation reorganization at run time . In Proceedings of the ACM SIGPLAN 1999 Conference on Programming Language Design and Implementation, Atlanta, GA, ACM SIGPLAN Notices, pp. 229–241 .
  • Douglas, C. C., Hu, J., Kowarschik, M., Rüde, U., and Weiss, C. 2000. Cache o ptimization for structured and unstructured grid multigrid . Electronic Transaction on Numerical Analysis 10: 21–40 .
  • Gannon, D., Jalby, W., and Gallivan, K. 1988. Strategies for cache and local memory management by global program transformation . Journal of Parallel and Distributed Computing 5(5): 587–616 .[CrossRef]
  • Garey, M. R., Johnson, D. S., and Stockmeyer, L. 1976. Some simplified NP-complete graph problems . Theoretical Computer Science 1: 237–267 .[CrossRef]
  • Gatlin, K. S. and Carter, L. 1999. Architecture-cognizant divide and conquer algorithms . In Proceedings of 1999 Conference on Supercomputing, Portland, OR, ACM/IEEE, New York.
  • Han, H. and Tseng, C.-W. 2000. Efficient compiler and runtime support for parallel irregular reductions . Parallel Computing 26(13–14): 1861–1887 .[CrossRef]
  • Hoare, C. A. R. 1969. An axiomatic basis of computer programming . Communications of the ACM 12: 576–580 .[CrossRef][ISI]
  • Im, E.-J. 2000. Optimizing the performance of sparse matrix– vector multiply. Ph.D. Thesis, University of California, Berkeley.
  • Im, E.-J. and Yelick, K. 2001. Optimizing sparse matrix computations for register reuse in sparsity. In Computational Science – ICCS 2001, San Jose, CA, V. N. Alexandrov et al., editors, Lecture Notes in Computer Science, Springer-Verlag, Berlin , pp. 127–136.
  • Irigoin, F. and Triolet, R. 1988. Supernode partitioning . In Proceedings of the 15th Annual ACM SIGPLAN Symposium on Principles of Programming Languages, San Diego, CA, pp. 319–329 .
  • Jin, G., Mellor-Crummey, J., and Fowler, R. 2001. Increasing temporal locality with skewing and recursive blocking . In Proceedings of 2001 ACM/IEEE Conference on Supercomputing, Denver, CO.
  • Karypis, G. and Kumar, V. 1998. Multilevel k-way partitioning scheme for irregular graphs . Journal of Parallel and Distributed Computing 48(1): 96–129 .[CrossRef][ISI]
  • Kelly, W. and Pugh, W. 1994. Finding legal reordering transformations using mappings . In Proceedings of the 7th International Workshop on Languages and Compilers for Parallel Computing, Ithaca, NY, K. Pingali et al., editors, LNCS Vol. 892, Springer-Verlag, Berlin, pp. 107–124 .
  • Kelly, W. and Pugh, W. 1995. A unifying framework for iteration reordering transformations, Technical Report CSTR-3430, University of Maryland, College Park .
  • Kodukula, I., Ahmed, N., and Pingali, K. 1997. Data-centric multi-level blocking . In Proceedings of the 1997 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), Las Vegas, NV, pp. 346–357 .
  • London, K., Dongarra, J., Moore, S., Mucci, P., Seymour, K., and Spencer, T. 2001. End-user tools for application performance analysis using hardware counters . In International Conference on Parallel and Distributed Computing Systems, Kyongju City, Korea.
  • McKinley, K. S., Carr, S., and Tseng, C.-W. 1996. Improving data locality with loop transformations . ACM Transactions on Programming Languages and Systems 18(4): 424–453 .[CrossRef]
  • Mellor-Crummey, J., Whalley, D., and Kennedy, K. 2001. Improving memory hierarchy performance for irregular applications using data and computation reorderings . International Journal of Parallel Programming 29(3): 217–247 .[CrossRef]
  • Mirchandaney, R., Saltz, J. H., Smith, R. M., Nicol, D. M., and Crowley, K. 1988. Principles of run-time support for parallel processors . In Proceedings of the 1988 ACM International Conference on Supercomputing (ICS), Saint Malo, France, pp. 140–152 .
  • Mitchell, N., Carter, L., and Ferrante, J. 1999. Localizing nonaffine array references . In Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques, Newport Beach, CA, pp. 192–202 .
  • Pugh, W. and Rosser, E. 1999. Iteration space slicing for locality . In Proceedings of the 12th Workshop on Languages and Compilers for Parallel Computing (LCPC) San Diego, CA.
  • Pugh, W. and Shpeisman, T. 1998. SIPR: A new framework for generating efficient code for sparse matrix computations . In Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing (LCPC) Chapel Hill, NC.
  • Pugh, B. and Wonnacott, D. 1994. Nonlinear array dependence analysis, Technical Report CS-TR-3372, Department of Computer Science, University of Maryland .
  • Rivera, G. and Tseng, C.-W. 2000. Tiling optimizations for 3D scientific computations . In Proceedings of 2000 Conference on Supercomputing, Dallas, TX, ACM/IEEE, New York, pp. 60–61 .
  • Sellappa, S. and Chatterjee, S. 2001. Cache-efficient multigrid algorithms. In Computational Science – ICCS 2001, San Jose, CA, V. N. Alexandrov et al., editors, Lecture Notes in Computer Science, Springer-Verlag, Berlin , pp. 107–116.
  • Song, Y. and Li, Z. 1999. New tiling techniques to improve cache temporal locality . In Proceedings of the ACM SIGPLAN ’99 Conference on Programming Language Design and Implementation, Atlanta, GA, pp. 215–228 .
  • Strout, M. M. 2003. Performance transformations for irregular applications, Ph.D. Thesis, University of California, San Diego.
  • Strout, M. M., Carter, L., and Ferrante, J. 2001. Rescheduling for locality in sparse matrix computations. In Computational Science – ICCS 2001, San Jose, CA, V. N. Alexandrov et al., editors, Lecture Notes in Computer Science, Springer-Verlag, Berlin .
  • Strout, M. M., Carter, L., Ferrante, J., Freeman, J., and Kreaseck, B. 2002. Combining performance aspects of irregular Gauss–Seidel via sparse tiling . In Proceedings of the 15th Workshop on Languages and Compilers for Parallel Computing (LCPC) College Park, MD.
  • Wolf, M. E. and Lam, M. S. 1991. A data locality optimizing algorithm . In Proceedings of ACM SIGPLAN 1991 Conference on Programming Language Design and Implementation, Toronto, Canada, ACM SIGPLAN Notices, Vol. 26, pp. 30–44 .
  • Wolfe, M. J. 1987. Iteration space tiling for memory hierarchies . In Proceedings of the 3rd Conference on Parallel Processing for Scientific Computing, Los Angeles, CA, SIAM, Philadelphia, PA, pp. 357–361 .
  • Wolfe, M. 1989. More iteration space tiling . In Proceedings of 1989 Conference on Supercomputing, Reno, NV, ACM, New York, pp. 655–664 .
  • Wonnacott, D. 2002. Achieving scalable locality with time skewing . International Journal of Parallel Programming 30(3): 181–221 .
  • Wu, C. H. 1990. A multicolour SOR method for the finite-element method . Journal of Computational and Applied Mathematics 30(3): 283–294 .[CrossRef]

Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us   Add to Digg Digg   Add to Reddit Reddit   Add to Technorati Technorati    What's this?



This Article
Right arrow Abstract Freely available
Right arrow Free Full Text (Free PDF) Free
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Alert me to new issues of the journal
Right arrow Add to Saved Citations
Right arrow Download to citation manager
Right arrow Add to My Marked Citations
Citing Articles
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Strout, M. M.
Right arrow Articles by Kreaseck, B.
Right arrow Search for Related Content
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us   Add to Digg   Add to Reddit   Add to Technorati  
What's this?