


26th ICS 2012: Venice, Italy
- Utpal Banerjee, Kyle A. Gallivan, Gianfranco Bilardi, Manolis Katevenis:
International Conference on Supercomputing, ICS'12, Venice, Italy, June 25-29, 2012. ACM 2012, ISBN 978-1-4503-1316-2
Keynote address 1
- Yale N. Patt:
High performance supercomputers: should the individual processor be more than a brick? 1-2
Micro-architecture 1
- Mengjie Mao, Hong An, Bobin Deng, Tao Sun, Xuechao Wei, Wei Zhou, Wenting Han:
Distributed replay protocol for distributed uniprocessors. 3-14
GPUs, compilers
- Wenhao Jia, Kelly A. Shaw, Margaret Martonosi:
Characterizing and improving the use of demand-fetched caches in GPUs. 15-24
- Ziyu Guo, Bo Wu, Xipeng Shen:
One stone two birds: synchronization relaxation and redundancy removal in GPU-CPU translation. 25-36
- Hongtao Yu, Zhiyuan Li:
Fast loop-level data dependence profiling. 37-46
- Nishkam Ravi, Yi Yang, Tao Bao, Srimat T. Chakradhar:
Apricot: an optimizing compiler and productivity tool for x86-compatible many-core coprocessors. 47-58
Fault tolerance
- Somayeh Sardashti, David A. Wood:
UniFI: leveraging non-volatile memories for a unified fault tolerance and idle power management technique. 59-68
- Manu Shantharam, Sowmyalatha Srinivasmurthy, Padma Raghavan:
Fault tolerant preconditioned conjugate gradient for sparse linear system solution. 69-78
- Wenjing Ma, Sriram Krishnamoorthy:
Data-driven fault tolerance for work stealing computations. 79-90
- Marc Casas-Guix, Bronis R. de Supinski, Greg Bronevetsky, Martin Schulz:
Fault resilience of the algebraic multi-grid solver. 91-100
Micro-architecture 2, interconnection networks
- Janani Mukundan, Saugata Ghose, Robert Karmazin, Engin Ipek, José F. Martínez:
Overcoming single-thread performance hurdles in the core fusion reconfigurable multicore architecture. 101-110
- Mingxing Tan, Xianhua Liu, Tong Tong, Xu Cheng:
CVP: an energy-efficient indirect branch prediction with compiler-guided value pattern. 111-120
- Miao Luo, Dhabaleswar K. Panda, Khaled Z. Ibrahim, Costin Iancu:
Congestion avoidance on manycore high performance computing systems. 121-132
- Yi Xu, Jun Yang, Rami G. Melhem:
Channel borrowing: an energy-efficient nanophotonic crossbar architecture with light-weight arbitration. 133-142
Runtime, dependencies, load balancing
- Liang Han, Xiaowei Jiang, Wei Liu, Youfeng Wu, James Tuck:
HiRe: using hint & release to improve synchronization of speculative threads. 143-152
- Gokcen Kestor, Roberto Gioiosa, Osman S. Unsal, Adrián Cristal, Mateo Valero:
Enhancing the performance of assisted execution runtime systems through hardware/software techniques. 153-162
- Quan Chen, Minyi Guo, Zhiyi Huang:
CATS: cache aware task-stealing based on online profiling in multi-socket multi-core architectures. 163-172
- Tao Sun, Hong An, Tao Wang, Haibo Zhang, Xiufeng Sui:
CRQ-based fair scheduling on composable multicore architectures. 173-184
- Olga Pearce, Todd Gamblin, Bronis R. de Supinski, Martin Schulz, Nancy M. Amato:
Quantifying the effectiveness of load balance algorithms. 185-194
Communication, HPC applications
- John P. Stevenson, Amin Firoozshahian, Alex Solomatnikov, Mark Horowitz, David R. Cheriton:
Sparse matrix-vector multiply on the HICAMP architecture. 195-204
- Kenneth Czechowski, Casey Battaglino, Chris McClanahan, Kartik Iyer, P.-K. Yeung, Richard W. Vuduc:
On the communication complexity of 3D FFTs and its implications for Exascale. 205-214
- Gabriel Ilie Tanase, Gheorghe Almási, Hanhong Xue, Charles Archer:
Composable, non-blocking collective operations on power7 IH. 215-224
- Anshul Mittal, Nikhil Jain, Thomas George, Yogish Sabharwal, Sameer Kumar:
Collective algorithms for sub-communicators. 225-234
- Andrea Pietracaprina, Geppino Pucci, Matteo Riondato, Francesco Silvestri, Eli Upfal:
Space-round tradeoffs for MapReduce computations. 235-244
Keynote address 2
- Michael Gschwind:
Blue Gene/Q: design for sustained multi-petaflop computing. 245-246
Workloads
- Wayne Joubert, Shi-Quan Su:
An analysis of computational workloads for the ORNL Jaguar system. 247-256
Memory hierarchies & interconnects
- Nagendra Dwarakanath Gulur, R. Manikantan, Mahesh Mehendale, R. Govindarajan:
Multiple sub-row buffers in DRAM: unlocking performance and energy improvement opportunities. 257-266
- Yasuo Ishii, Mary Inaba, Kei Hiraki:
Unified memory optimizing architecture: memory subsystem control with a unified predictor. 267-278
- Dongyuan Zhan, Hong Jiang, Sharad C. Seth:
Locality & utility co-optimization for practical capacity management of shared last level caches. 279-290
- Keith D. Underwood, Eric Borch:
Exploiting communication and packaging locality for cost-effective large scale networks. 291-300
GPUs & parallel programming
- Paruj Ratanaworabhan, Martin Burtscher, Darko Kirovski, Benjamin G. Zorn:
Hardware support for enforcing isolation in lock-based parallel programs. 301-310
- Justin Holewinski, Louis-Noël Pouchet, P. Sadayappan:
High-performance code generation for stencil computations on GPU architectures. 311-320
- John W. Romein:
An efficient work-distribution strategy for gridding radio-telescope data on GPUs. 321-330
- Oded Green, Robert McColl, David A. Bader:
GPU merge path: a GPU merging algorithm. 331-340
GPUs, CPUs, & linear algebra
- Jungwon Kim, Sangmin Seo, Jun Lee, Jeongho Nah, Gangwon Jo, Jaejin Lee:
SnuCL: an OpenCL framework for heterogeneous CPU/GPU clusters. 341-352
- Bor-Yiing Su, Kurt Keutzer:
clSpMV: A Cross-Platform OpenCL SpMV Framework on GPUs. 353-364
- Fengguang Song, Stanimire Tomov, Jack J. Dongarra:
Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems. 365-376
- Jiajia Li, Xingjian Li, Guangming Tan, Mingyu Chen, Ninghui Sun:
An optimized large-scale hybrid DGEMM design for CPUs and ATI GPUs. 377-386
