<?xml version="1.0" encoding="ISO-8859-1"?>

<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns="http://purl.org/rss/1.0/"
 xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/"
 xmlns:dc="http://purl.org/dc/elements/1.1/"
 xmlns:syn="http://purl.org/rss/1.0/modules/syndication/"
 xmlns:prism="http://purl.org/rss/1.0/modules/prism/"
 xmlns:admin="http://webns.net/mvcb/"
>

<channel rdf:about="http://hpc.sagepub.com">
<title>International Journal of High Performance Computing Applications RSS feed -- OnlineFirst Articles</title>
<link>http://hpc.sagepub.com</link>
<description>International Journal of High Performance Computing Applications RSS feed -- OnlineFirst Articles</description>
<prism:publicationName>International Journal of High Performance Computing Applications</prism:publicationName>
<prism:issn>1094-3420</prism:issn>
<items>
 <rdf:Seq>
  <rdf:li rdf:resource="http://hpc.sagepub.com/cgi/content/abstract/1094342009347892v1?rss=1" />
  <rdf:li rdf:resource="http://hpc.sagepub.com/cgi/content/abstract/1094342009352483v1?rss=1" />
  <rdf:li rdf:resource="http://hpc.sagepub.com/cgi/content/abstract/1094342009350469v1?rss=1" />
  <rdf:li rdf:resource="http://hpc.sagepub.com/cgi/content/abstract/1094342009350886v1?rss=1" />
  <rdf:li rdf:resource="http://hpc.sagepub.com/cgi/content/abstract/1094342009348231v1?rss=1" />
  <rdf:li rdf:resource="http://hpc.sagepub.com/cgi/content/abstract/1094342009347890v1?rss=1" />
  <rdf:li rdf:resource="http://hpc.sagepub.com/cgi/content/abstract/1094342009347891v1?rss=1" />
  <rdf:li rdf:resource="http://hpc.sagepub.com/cgi/content/abstract/1094342009106597v1?rss=1" />
  <rdf:li rdf:resource="http://hpc.sagepub.com/cgi/content/abstract/1094342009106416v1?rss=1" />
  <rdf:li rdf:resource="http://hpc.sagepub.com/cgi/content/abstract/1094342009106293v1?rss=1" />
  <rdf:li rdf:resource="http://hpc.sagepub.com/cgi/content/abstract/1094342009106066v1?rss=1" />
 </rdf:Seq>
</items>
<image rdf:resource="http://hpc.sagepub.com:80/icons/banner/title.gif" />
</channel>

<image rdf:about="http://hpc.sagepub.com:80/icons/banner/title.gif">
<title>International Journal of High Performance Computing Applications</title>
<url>http://hpc.sagepub.com:80/icons/banner/title.gif</url>
<link>http://hpc.sagepub.com</link>
</image>

<item rdf:about="http://hpc.sagepub.com/cgi/content/abstract/1094342009347892v1?rss=1">
<title><![CDATA[Operation Stacking for Ensemble Computations with Variable Convergence]]></title>
<link>http://hpc.sagepub.com/cgi/content/abstract/1094342009347892v1?rss=1</link>
<description><![CDATA[
<P>
Sparse matrix operations achieve only small fractions of peak CPU speeds because of the use of specialized, index-based matrix representations, which degrade cache utilization by imposing irregular memory accesses and increasing the number of overall accesses. Compounding the problem, the small number of floating-point operations in a single sparse iteration leads to low floating-point pipeline utilization. Operation stacking addresses these problems for large ensemble computations that solve multiple systems of linear equations with identical sparsity structure. By combining the data of multiple problems and solving them as one, operation stacking improves locality, reduces cache misses, and increases floating-point pipeline utilization. Operation stacking also requires less memory bandwidth because it involves fewer index array accesses. In this paper we present the Operation Stacking Framework (OSF), an object-oriented framework that provides runtime and code generation support for the development of stacked iterative solvers. OSF's runtime component provides an iteration engine that supports efficient ejection of converged problems from the stack. It separates the specific solver algorithm from the coding conventions and data representations that are necessary to implement stacking. Stacked solvers created with OSF can be used transparently without requiring significant changes to existing applications. Our results show that stacking can provide speedups up to 1.94x with an average of 1.46x, even in scenarios in which the number of iterations required to converge varies widely within a stack of problems. Our evaluation shows that these improvements correlate with better cache utilization, improved floating-point utilization, and reduced memory accesses.
</P>
]]></description>
<dc:creator><![CDATA[Belgin, M., Back, G., Ribbens, C. J.]]></dc:creator>
<dc:date>Tue, 03 Nov 2009 02:06:51 PST</dc:date>
<dc:identifier>info:doi/10.1177/1094342009347892</dc:identifier>
<dc:title><![CDATA[Operation Stacking for Ensemble Computations with Variable Convergence]]></dc:title>
<prism:publicationDate>2009-11-03</prism:publicationDate>
<prism:section>Article</prism:section>
</item>

<item rdf:about="http://hpc.sagepub.com/cgi/content/abstract/1094342009352483v1?rss=1">
<title><![CDATA[High-Performance Quantum Simulation for Coupled Josephson Junctions on the Earth Simulator: a Challenge to Schrodinger Equation on 256^4 Grids]]></title>
<link>http://hpc.sagepub.com/cgi/content/abstract/1094342009352483v1?rss=1</link>
<description><![CDATA[
<P>
In order to explore the quantum dynamics of coupled Josephson junctions, we develop a program that directly solves the time-dependent Schr&ouml;dinger equation by diagonalizing the Hamiltonian matrix and obtaining its ground and multiple low-lying excitation states. The Schr&ouml;dinger equation is defined on <I>m</I><SUP><I>n</I></SUP> grids, in which <I>m</I> is the number of grid points discretized on a characteristic phase space of each junction and <I>n</I> is the number of coupled junctions. In this paper, the largest system computed has <I>m</I> = 256 and <I>n</I> = 4, i.e., the number of degrees of freedom reaches 256<SUP>4</SUP> (= 4,294,967,296). We examine possible effective numerical schemes and apply parallel tuning to optimize the communication on the Earth Simulator. We sustain floating-point operation performance exceeding 20% of the peak on 512 nodes (4,096 PEs). From systematic calculations, we find a new concept: "quantum-assisted synchronization" occurs with downsizing of the junction plane. This is a discovery adding a quantum flavor to the classical concept of "synchronization".
</P>
]]></description>
<dc:creator><![CDATA[Imamura, T., Kano, T., Yamada, S., Okumura, M., Machida, M.]]></dc:creator>
<dc:date>Mon, 02 Nov 2009 03:18:32 PST</dc:date>
<dc:identifier>info:doi/10.1177/1094342009352483</dc:identifier>
<dc:title><![CDATA[High-Performance Quantum Simulation for Coupled Josephson Junctions on the Earth Simulator: a Challenge to Schrodinger Equation on 256^4 Grids]]></dc:title>
<prism:publicationDate>2009-11-02</prism:publicationDate>
<prism:section>Article</prism:section>
</item>

<item rdf:about="http://hpc.sagepub.com/cgi/content/abstract/1094342009350469v1?rss=1">
<title><![CDATA[End-to-End Cache System for Grid Computing: Design and Efficiency Analysis of a High-Throughput Bioinformatic Docking Application]]></title>
<link>http://hpc.sagepub.com/cgi/content/abstract/1094342009350469v1?rss=1</link>
<description><![CDATA[
<P>
Cache techniques are an efficient tool to reduce latency times in transfer operations through Grid systems. Although different approaches to introducing cache facilities into Grid computing have already been studied, they require intrusive modifications of Grid software and hardware. Here, we propose an end-to-end cache system that is implemented over scheduling services. This cache system requires neither changes in the Grid software nor introduction of new software in the Grid resources. Parallel Grid adaptation of many high-throughput computing applications that use the same data intensively could enjoy great benefits from our cache system. The maintenance of cacheable data in the resources of already-executed tasks allows faster executions of future tasks assigned to the same resources. To analyze the performance of our end-to-end cache system, we tested it with a new protein&ndash;protein docking application. The obtained results confirm our cache system's robustness and efficiency gain for this kind of high-throughput application.
</P>
]]></description>
<dc:creator><![CDATA[Garzon, J. I., Huedo, E., Montero, R. S., Llorente, I. M., Chacon, P.]]></dc:creator>
<dc:date>Mon, 02 Nov 2009 03:18:32 PST</dc:date>
<dc:identifier>info:doi/10.1177/1094342009350469</dc:identifier>
<dc:title><![CDATA[End-to-End Cache System for Grid Computing: Design and Efficiency Analysis of a High-Throughput Bioinformatic Docking Application]]></dc:title>
<prism:publicationDate>2009-11-02</prism:publicationDate>
<prism:section>Article</prism:section>
</item>

<item rdf:about="http://hpc.sagepub.com/cgi/content/abstract/1094342009350886v1?rss=1">
<title><![CDATA[A Solution Framework for Environmental Characterization Problems]]></title>
<link>http://hpc.sagepub.com/cgi/content/abstract/1094342009350886v1?rss=1</link>
<description><![CDATA[
<P>
This paper describes experiences developing a grid-enabled framework for solving environmental inverse problems. The solution approach taken here couples environmental simulation models with global search methods and requires readily available computational resources of the grid for computational tractability. The solution framework developed by the authors uses a master&ndash;worker strategy for task distribution and a pool for task mapping. Solution and computational performance results are presented for groundwater source identification and release history reconstruction problems. They indicate that high-quality solutions and significant raw performance improvements were attained for a deployment of the solution framework on the TeraGrid.
</P>
]]></description>
<dc:creator><![CDATA[Tryby, M. E., Mirghani, B. Y., Mahinthakumar, G. K., Ranjithan, S. R.]]></dc:creator>
<dc:date>Mon, 26 Oct 2009 08:17:04 PDT</dc:date>
<dc:identifier>info:doi/10.1177/1094342009350886</dc:identifier>
<dc:title><![CDATA[A Solution Framework for Environmental Characterization Problems]]></dc:title>
<prism:publicationDate>2009-10-26</prism:publicationDate>
<prism:section>Article</prism:section>
</item>

<item rdf:about="http://hpc.sagepub.com/cgi/content/abstract/1094342009348231v1?rss=1">
<title><![CDATA[Distributed Radiotherapy Simulation with the Webcom Workflow System]]></title>
<link>http://hpc.sagepub.com/cgi/content/abstract/1094342009348231v1?rss=1</link>
<description><![CDATA[
<P>
Accurate radiotherapy plans are a vital tool in combating cancer. The verification of such plans is a computationally intensive task, and providing clinical experts with access to sufficient resources to conduct plan verification simulations in a suitable and timely manner is a genuine challenge. In this paper we present a new approach to the problem, incorporating the Monte Carlo method for treatment verification. A fully integrated radiotherapy treatment verification workflow built on the BEAM simulation package has been developed within the scope of this work. The Monte Carlo approach is recognized as being superior to the standard clinical techniques available. To be useful in clinical practice, accurate results must be generated within a short time frame. Consequently, turnaround times must be predictable, and results must be of a consistently high standard. These requirements are the key challenges that drive this work. The development of this application is being conducted within the context of the Webcom project. Webcom is an interpreter for a graph-oriented model of computing, implemented as a distributed virtual machine. This platform has been used to construct a workflow tool suite and a novel methodology for dynamic resource federation. These components are applied to the execution of a Monte Carlo radiotherapy simulation application on heterogeneous dynamically coordinated resources. The Webcom-based model of workflow management facilitates the execution of resource-intensive workflows and provides a basis for the development of scalable services in the heterogeneous environments formed through the dynamic aggregation of mixed autonomous resources. We discuss the motivation behind the project, present the methodology, describe the software design of the current implementation, and demonstrate the utility of the system via experiments conducted in a real and deeply heterogeneous testbed environment.
</P>
]]></description>
<dc:creator><![CDATA[Downes, P., Curran, O., Shearer, A., Cunniffe, J.]]></dc:creator>
<dc:date>Mon, 26 Oct 2009 08:17:05 PDT</dc:date>
<dc:identifier>info:doi/10.1177/1094342009348231</dc:identifier>
<dc:title><![CDATA[Distributed Radiotherapy Simulation with the Webcom Workflow System]]></dc:title>
<prism:publicationDate>2009-10-26</prism:publicationDate>
<prism:section>Article</prism:section>
</item>

<item rdf:about="http://hpc.sagepub.com/cgi/content/abstract/1094342009347890v1?rss=1">
<title><![CDATA[A Scalable Message Passing Interface Implementation of an Ad-Hoc Parallel I/O System]]></title>
<link>http://hpc.sagepub.com/cgi/content/abstract/1094342009347890v1?rss=1</link>
<description><![CDATA[
<P>
In this paper we present the novel design, implementation, and evaluation of an ad-hoc parallel I/O system (AHPIOS). AHPIOS is the first scalable parallel I/O system completely implemented in the Message Passing Interface (MPI). The MPI implementation brings the advantages of portability, scalability and high performance. AHPIOS allows MPI applications to dynamically manage and scale distributed partitions in a convenient way. The configuration of both the MPI-IO and the storage management system is unified and allows for a tight integration of the optimizations of these layers. AHPIOS partitions are elastic: they conveniently scale up and down with the number of resources. We develop two collective I/O strategies, which leverage a two-tiered cooperative cache in order to exploit the spatial locality of data-intensive parallel applications. The file access latency is hidden from the applications through an asynchronous data staging strategy. The two-tiered cooperative cache scales with both the number of processors and storage resources. Our experimental section demonstrates that, with various optimizations, integrated AHPIOS offers a substantial performance benefit over the traditional MPI-IO solutions on both PVFS and Lustre parallel file systems.
</P>
]]></description>
<dc:creator><![CDATA[Isaila, F., Garcia Blas, F. J., Carretero, J., Liao, W.-k., Choudhary, A.]]></dc:creator>
<dc:date>Mon, 05 Oct 2009 01:48:21 PDT</dc:date>
<dc:identifier>info:doi/10.1177/1094342009347890</dc:identifier>
<dc:title><![CDATA[A Scalable Message Passing Interface Implementation of an Ad-Hoc Parallel I/O System]]></dc:title>
<prism:publicationDate>2009-10-05</prism:publicationDate>
<prism:section>Article</prism:section>
</item>

<item rdf:about="http://hpc.sagepub.com/cgi/content/abstract/1094342009347891v1?rss=1">
<title><![CDATA[The Living Application: A Self-Organizing System for Complex Grid Tasks]]></title>
<link>http://hpc.sagepub.com/cgi/content/abstract/1094342009347891v1?rss=1</link>
<description><![CDATA[
<P>
We present the living application, a method to autonomously manage applications on the grid. During its execution on the grid, the living application makes choices on the resources to use in order to complete its tasks. These choices can be based on the internal state, or on autonomously acquired knowledge from external sensors. By giving limited user capabilities to a living application, the living application is able to port itself from one resource topology to another. The application performs these actions at run-time without depending on users or external workflow tools. We demonstrate this new concept in a special case of a living application: the living simulation. Today, many simulations require a wide range of numerical solvers and run most efficiently if specialized nodes are matched to the solvers. The idea of the living simulation is that it decides itself which grid machines to use based on the numerical solver currently in use. In this paper we apply the living simulation to modeling the collision between two galaxies in a test setup with two specialized computers. This simulation switches at run-time between a GPU-enabled computer in the Netherlands and a GRAPE-enabled machine that resides in the United States, using an oct-tree <I>N</I>-body code whenever it runs in the Netherlands and a direct <I>N</I>-body solver in the United States.
</P>
]]></description>
<dc:creator><![CDATA[Groen, D., Harfst, S., Portegies Zwart, S.]]></dc:creator>
<dc:date>Mon, 05 Oct 2009 01:48:21 PDT</dc:date>
<dc:identifier>info:doi/10.1177/1094342009347891</dc:identifier>
<dc:title><![CDATA[The Living Application: A Self-Organizing System for Complex Grid Tasks]]></dc:title>
<prism:publicationDate>2009-10-05</prism:publicationDate>
<prism:section>Article</prism:section>
</item>

<item rdf:about="http://hpc.sagepub.com/cgi/content/abstract/1094342009106597v1?rss=1">
<title><![CDATA[Audio Watermarking based on Advanced Wigner Distribution and Important Frequency Peaks]]></title>
<link>http://hpc.sagepub.com/cgi/content/abstract/1094342009106597v1?rss=1</link>
<description><![CDATA[
<P>
An algorithm in which a multi-bit watermark is embedded into the important frequency peaks of an audio file is presented. In this algorithm, an advanced Wigner distribution method is used to estimate the most significant frequency band of the audio file. This method is based on the short-time Fourier transform (STFT) and the Wigner distribution methods, and has advantages over other methods. The important frequency peaks are selected from the most significant frequency band. Once broadcast, an audio file is subject to many attacks such as compression and quantization. However, the main feature of the audio signal is its important frequency peaks, which are invariant. We exploit this invariance to embed the multi-bit watermark into the important frequency peaks. The simulation results show that the proposed algorithm is robust to strong attacks such as noise addition, filtering, re-sampling and MP3 compression.
</P>
]]></description>
<dc:creator><![CDATA[Tuan, D. V., Chong, U.-P.]]></dc:creator>
<dc:date>Tue, 07 Jul 2009 04:15:57 PDT</dc:date>
<dc:identifier>info:doi/10.1177/1094342009106597</dc:identifier>
<dc:title><![CDATA[Audio Watermarking based on Advanced Wigner Distribution and Important Frequency Peaks]]></dc:title>
<prism:publicationDate>2009-07-07</prism:publicationDate>
<prism:section>Article</prism:section>
</item>

<item rdf:about="http://hpc.sagepub.com/cgi/content/abstract/1094342009106416v1?rss=1">
<title><![CDATA[Increasing the Locality of Iterative Methods and its Application to the Simulation of Semiconductor Devices]]></title>
<link>http://hpc.sagepub.com/cgi/content/abstract/1094342009106416v1?rss=1</link>
<description><![CDATA[
<P>
Irregular codes are present in many scientific applications, such as finite element simulations. In these simulations the solution of large sparse linear equation systems is required, which are often solved using iterative methods. The main kernel of the iterative methods is the sparse matrix&ndash;vector multiplication which frequently demands irregular data accesses. Therefore, techniques that increase the performance of this operation will have a great impact on the global performance of the iterative method and, as a consequence, on the simulations. In this paper a technique for improving the locality of sparse matrix codes is presented. The technique consists of reorganizing the data guided by a locality model instead of restructuring the code or changing the sparse matrix storage format. We have applied our proposal to different iterative methods provided by two standard numerical libraries. Results show an impact on the overall performance of the considered iterative method due to the increase in the locality of the sparse matrix&ndash;vector product. Noticeable reductions in the execution time have been achieved both in sequential and in parallel executions. This positive behavior allows the reordering technique to be successfully applied to real problems. We have focused on the simulation of semiconductor devices and in particular on the BIPS3D simulator. The technique was integrated into the simulator. Both sequential and parallel executions have been analyzed extensively in this paper. Noticeable reductions in the execution time required by the simulations are observed when using our reordered matrices in comparison with the original simulator.
</P>
]]></description>
<dc:creator><![CDATA[Pichel, J C, Heras, D B, Cabaleiro, J C, Garcia-Loureiro, A J, Rivera, F F]]></dc:creator>
<dc:date>Tue, 16 Jun 2009 02:43:02 PDT</dc:date>
<dc:identifier>info:doi/10.1177/1094342009106416</dc:identifier>
<dc:title><![CDATA[Increasing the Locality of Iterative Methods and its Application to the Simulation of Semiconductor Devices]]></dc:title>
<prism:publicationDate>2009-06-16</prism:publicationDate>
<prism:section>Article</prism:section>
</item>

<item rdf:about="http://hpc.sagepub.com/cgi/content/abstract/1094342009106293v1?rss=1">
<title><![CDATA[High Performance Three-Dimensional Image Reconstruction for Molecular Structure Determination]]></title>
<link>http://hpc.sagepub.com/cgi/content/abstract/1094342009106293v1?rss=1</link>
<description><![CDATA[
<P>
We describe an efficient parallel implementation of a reliable iterative reconstruction algorithm for estimating the three-dimensional (3D) density map of a macromolecular complex from a large number of two-dimensional (2D) cryo-electron microscopy (Cryo-EM) images. Our algorithm is based on a hybrid regularization approach first developed by Bj&ouml;rck and O'Leary&ndash;Simmons. Our implementation uses a special data structure to represent the 3D density map to improve data locality in the reconstruction computation. Our parallelization strategy allows both 2D images and 3D data to be distributed on a 2D processor grid. We have used our implementation successfully on several datasets of different sizes, and we are able to achieve scalable parallel performance on a distributed memory cluster using over 15,000 CPUs for the largest dataset.
</P>
]]></description>
<dc:creator><![CDATA[Chung, J., Sternberg, P., Yang, C.]]></dc:creator>
<dc:date>Tue, 16 Jun 2009 02:43:02 PDT</dc:date>
<dc:identifier>info:doi/10.1177/1094342009106293</dc:identifier>
<dc:title><![CDATA[High Performance Three-Dimensional Image Reconstruction for Molecular Structure Determination]]></dc:title>
<prism:publicationDate>2009-06-16</prism:publicationDate>
<prism:section>Article</prism:section>
</item>

<item rdf:about="http://hpc.sagepub.com/cgi/content/abstract/1094342009106066v1?rss=1">
<title><![CDATA[Efficient Parallelization of Stochastic Simulation Algorithm for Chemically Reacting Systems on the Graphics Processing Unit]]></title>
<link>http://hpc.sagepub.com/cgi/content/abstract/1094342009106066v1?rss=1</link>
<description><![CDATA[
<P>
The small number of some reactant molecules in biological systems formed by living cells can result in dynamical behavior which cannot be captured by traditional deterministic models. In such a problem, a more accurate simulation can be obtained with discrete stochastic simulation (Gillespie's stochastic simulation algorithm &ndash; SSA). Many stochastic realizations are required to capture accurate statistical information of the solution. This carries a very high computational cost. The current generation of graphics processing units (GPU) is well-suited to this task. In this paper we describe our implementation and present some computational experiments illustrating the power of this technology for this important and challenging class of problems.
</P>
]]></description>
<dc:creator><![CDATA[Petzold, L., Li, H.]]></dc:creator>
<dc:date>Tue, 16 Jun 2009 02:43:02 PDT</dc:date>
<dc:identifier>info:doi/10.1177/1094342009106066</dc:identifier>
<dc:title><![CDATA[Efficient Parallelization of Stochastic Simulation Algorithm for Chemically Reacting Systems on the Graphics Processing Unit]]></dc:title>
<prism:publicationDate>2009-06-16</prism:publicationDate>
<prism:section>Article</prism:section>
</item>

</rdf:RDF>