Fault-Tolerant Scheduling of Fine-Grained Tasks in Grid Environments
DEPT. OF COMPUTER SCIENCE VRIJE UNIVERSITEIT, AMSTERDAM, THE NETHERLANDS
DEPT. OF COMPUTER SCIENCE VRIJE UNIVERSITEIT, AMSTERDAM, THE NETHERLANDS; KIELMANN{at}CS.VU.NL
DEPT. OF COMPUTER SCIENCE VRIJE UNIVERSITEIT, AMSTERDAM, THE NETHERLANDS

Divide-and-conquer is a well-suited programming paradigm for parallel Grid applications. Our Satin system efficiently schedules the fine-grained tasks of a divide-and-conquer application across multiple clusters in a grid. To accommodate long-running applications, we present a fault-tolerance mechanism for Satin that has negligible overhead during normal execution, while minimizing the amount of redundant work done after a crash of one or more nodes. We study the impact of our fault-tolerance mechanism on application efficiency, both on the Dutch DAS-2 system and using the European testbed of the EC-funded project GridLab.
Key Words: fault-tolerance, divide-and-conquer, grid computing, Java
International Journal of High Performance Computing Applications, Vol. 20, No. 1, 103-114 (2006)