Towards an Accurate Model for Collective Communications
Sathish S. Vadhiyar
Graham E. Fagg
Jack J. Dongarra
Computer Science Department, University of Tennessee, Knoxville, USA
The performance of MPI's collective communications is critical in most MPI-based applications. A general algorithm for a given collective communication operation may not perform well on all systems because of differences in architectures, network parameters, and the storage capacity of the underlying MPI implementation. Hence, collective communications have to be tuned for the system on which they will be executed. To determine the optimum parameters of collective communications on a given system in a time-efficient manner, the collective communications need to be modeled efficiently. In this paper, we discuss various techniques for modeling collective communications.
Key Words: MPI, collective communications, tuning, modeling, broadcast
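The abstract argues that a cost model lets one choose collective parameters without exhaustively timing every option. As an illustration only, the sketch below uses the simple Hockney latency-bandwidth model (point-to-point time = alpha + beta * m), which is an assumption here and not necessarily any of the techniques the paper evaluates, to predict broadcast time for three common algorithms (linear, binomial tree, segmented pipelined chain) and pick the cheapest algorithm and segment size. The parameter values `ALPHA` and `BETA` and all function names are hypothetical; a real tuner would measure the parameters on the target system.

```python
import math

# Hypothetical network parameters (assumptions, not measured values).
ALPHA = 50e-6  # per-message latency, seconds
BETA = 1e-8    # per-byte transfer time, seconds (about 100 MB/s)


def p2p(m, alpha=ALPHA, beta=BETA):
    """Hockney point-to-point cost of sending an m-byte message."""
    return alpha + beta * m


def bcast_linear(p, m, **kw):
    """Root sends the whole message to each of the other p-1 ranks in turn."""
    return (p - 1) * p2p(m, **kw)


def bcast_binomial(p, m, **kw):
    """Binomial tree: ceil(log2 p) rounds, each forwarding the full message."""
    return math.ceil(math.log2(p)) * p2p(m, **kw)


def bcast_chain(p, m, seg, **kw):
    """Pipelined chain of p ranks; the message is cut into seg-byte pieces."""
    s = min(seg, m)               # a segment cannot exceed the message
    k = math.ceil(m / s)          # number of segments
    # The first segment needs p-1 hops; each later segment adds one step.
    return (p - 1 + k - 1) * p2p(s, **kw)


def best_bcast(p, m, segments=(1024, 8192, 65536), **kw):
    """Return (algorithm, segment size or None, predicted seconds) with least cost."""
    candidates = [
        ("linear", None, bcast_linear(p, m, **kw)),
        ("binomial", None, bcast_binomial(p, m, **kw)),
    ]
    for seg in segments:
        candidates.append(("chain", seg, bcast_chain(p, m, seg, **kw)))
    return min(candidates, key=lambda c: c[2])


if __name__ == "__main__":
    print(best_bcast(16, 1024))        # small message
    print(best_bcast(16, 8 * 2**20))   # 8 MiB message
```

Under these assumed parameters, the model selects the binomial tree for a 1 KiB broadcast on 16 processes (latency dominates) and the chain with 64 KiB segments for an 8 MiB broadcast (pipelining hides per-hop bandwidth cost). Evaluating a closed-form model like this takes microseconds, which is the kind of time saving over exhaustive experimental search that motivates modeling in the first place.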
International Journal of High Performance Computing Applications, Vol. 18, No. 1, pp. 159-167 (2004)
DOI: 10.1177/1094342004041297
