ACM Transactions on Architecture and Code Optimization, Volume 9
Volume 9, Number 1, March 2012
- Walid J. Ghandour, Haitham Akkary, Wes Masri:
Leveraging Strength-Based Dynamic Information Flow Analysis to Enhance Data Value Prediction. 1:1-1:33
- Jaekyu Lee, Hyesoon Kim, Richard W. Vuduc:
When Prefetching Works, When It Doesn't, and Why. 2:1-2:29
- Bita Mazloom, Shashidhar Mysore, Mohit Tiwari, Banit Agrawal, Timothy Sherwood:
Dataflow Tomography: Information Flow Tracking For Understanding and Visualizing Full Systems. 3:1-3:26
- Jung Ho Ahn, Norman P. Jouppi, Christos Kozyrakis, Jacob Leverich, Robert S. Schreiber:
Improving System Energy Efficiency with Memory Rank Subsetting. 4:1-4:28
- Xuejun Yang, Li Wang, Jingling Xue, Qingbo Wu:
Comparability Graph Coloring for Optimizing Utilization of Software-Managed Stream Register Files for Stream Processors. 5:1-5:30
- Abhinandan Majumdar, Srihari Cadambi, Michela Becchi, Srimat T. Chakradhar, Hans Peter Graf:
A Massively Parallel, Energy Efficient Programmable Accelerator for Learning and Classification. 6:1-6:30
Volume 9, Number 2, June 2012
- Stijn Eyerman, Lieven Eeckhout:
Probabilistic modeling for job symbiosis scheduling on SMT processors. 7:1-7:27
- Rachid Seghir, Vincent Loechner, Benoît Meister:
Integer affine transformations of parametric ℤ-polytopes and applications to loop nest optimization. 8:1-8:27
- Yi Yang, Ping Xiang, Jingfei Kong, Mike Mantor, Huiyang Zhou:
A unified optimizing compiler framework for different GPGPU architectures. 9:1-9:33
- Choonki Jang, Jaejin Lee, Bernhard Egger, Soojung Ryu:
Automatic code overlay generation and partially redundant code fetch elimination. 10:1-10:32
- Zahra Abbasi, Georgios Varsamopoulos, Sandeep K. S. Gupta:
TACOMA: Server and workload management in internet data centers considering cooling-computing power trade-off and energy proportionality. 11:1-11:37
- Andreas Lankes, Thomas Wild, Stefan Wallentowitz, Andreas Herkersdorf:
Benefits of selective packet discard in networks-on-chip. 12:1-12:21
Volume 9, Number 3, September 2012
- Yangchun Luo, Antonia Zhai:
Dynamically dispatching speculative threads to improve sequential execution. 13:1-13:31
- Huimin Cui, Jingling Xue, Lei Wang, Yang Yang, Xiaobing Feng, Dongrui Fan:
Extendable pattern-oriented optimization directives. 14:1-14:37
- Adam Wade Lewis, Nian-Feng Tzeng, Soumik Ghosh:
Runtime energy consumption estimation for server workloads based on chaotic time-series approximation. 15:1-15:26
- Alejandro Valero, Julio Sahuquillo, Salvador Petit, Pedro López, José Duato:
Combining recency of information with selective random and a victim cache in last-level caches. 16:1-16:20
- Bin Li, Li-Shiuan Peh, Li Zhao, Ravi R. Iyer:
Dynamic QoS management for chip multiprocessors. 17:1-17:29
- Polychronis Xekalakis, Nikolas Ioannou, Marcelo Cintra:
Mixed speculative multithreaded execution models. 18:1-18:26
- Mageda Sharafeddine, Komal Jothi, Haitham Akkary:
Disjoint out-of-order execution processor. 19:1-19:32
- Diego Andrade, Basilio B. Fraguela, Ramon Doallo:
Static analysis of the worst-case memory performance for irregular codes with indirections. 20:1-20:32
- Yang Chen, Shuangde Fang, Yuanjie Huang, Lieven Eeckhout, Grigori Fursin, Olivier Temam, Chengyong Wu:
Deconstructing iterative optimization. 21:1-21:30
- Apala Guha, Kim M. Hazelwood, Mary Lou Soffa:
Memory optimization of dynamic binary translators for embedded systems. 22:1-22:29
- James R. Geraci, Sharon M. Sacco:
A transpose-free in-place SIMD optimized FFT. 23:1-23:21
Volume 9, Number 4, January 2013
- Bart Coppens, Bjorn De Sutter, Jonas Maebe:
Feedback-driven binary code diversification. 24:1-24:26
- Jeremy Fowers, Greg Brown, John Robert Wernsing, Greg Stitt:
A performance and energy comparison of convolution on GPUs, FPGAs, and multicore processors. 25:1-25:21
- Erven Rohou, Kevin Williams, David Yuste:
Vectorization technology to improve interpreter performance. 26:1-26:22
- Jimmy Cleary, Owen Callanan, Mark Purcell, David Gregg:
Fast asymmetric thread synchronization. 27:1-27:22
- Yong Li, Rami G. Melhem, Alex K. Jones:
PS-TLB: Leveraging page classification information for fast, scalable and efficient translation for future CMPs. 28:1-28:21
- Kristof Du Bois, Stijn Eyerman, Lieven Eeckhout:
Per-thread cycle accounting in multicore processors. 29:1-29:22
- Christian Wimmer, Michael Haupt, Michael L. Van de Vanter, Mick J. Jordan, Laurent Daynès, Doug Simon:
Maxine: An approachable virtual machine for, and in, java. 30:1-30:24
- Malik Murtaza Khan, Protonu Basu, Gabe Rudy, Mary W. Hall, Chun Chen, Jacqueline Chame:
A script-based autotuning compiler system to generate high-performance CUDA code. 31:1-31:25
- Kenzo Van Craeynest, Lieven Eeckhout:
Understanding fundamental design choices in single-ISA heterogeneous multicore architectures. 32:1-32:23
- Samuel Antao, Leonel Sousa:
The CRNS framework and its application to programmable and reconfigurable cryptography. 33:1-33:25
- Boubacar Diouf, Can Hantas, Albert Cohen, Özcan Özturk, Jens Palsberg:
A decoupled local memory allocator. 34:1-34:22
- Huimin Cui, Qing Yi, Jingling Xue, Xiaobing Feng:
Layout-oblivious compiler optimization for matrix computations. 35:1-35:20
- Stephen Dolan, Servesh Muralidharan, David Gregg:
Compiler support for lightweight context switching. 36:1-36:25
- Pablo Abad Fidalgo, Valentin Puente, José-Ángel Gregorio:
LIGERO: A light but efficient router conceived for cache-coherent chip multiprocessors. 37:1-37:21
- Jorge Albericio, Pablo Ibáñez, Víctor Viñals, José María Llabería:
Exploiting reuse locality on inclusive shared last-level caches. 38:1-38:19
- Paraskevas Yiapanis, Demian Rosas-Ham, Gavin Brown, Mikel Luján:
Optimizing software runtime systems for speculative parallelization. 39:1-39:27
- Cedric Nugteren, Pieter Custers, Henk Corporaal:
Algorithmic species: A classification of affine loop nests for parallel programming. 40:1-40:25
- Marco Gerards, Jan Kuper:
Optimal DPM and DVFS for frame-based real-time systems. 41:1-41:23
- Zhichao Yan, Hong Jiang, Yujuan Tan, Dan Feng:
An integrated pseudo-associativity and relaxed-order approach to hardware transactional memory. 42:1-42:26
- Doris Chen, Deshanand P. Singh:
Profile-guided floating- to fixed-point conversion for hybrid FPGA-processor applications. 43:1-43:25
- Yan Cui, Yingxin Wang, Yu Chen, Yuanchun Shi:
Lock-contention-aware scheduler: A scalable and energy-efficient method for addressing scalability collapse on multicore systems. 44:1-44:25
- Kishore Kumar Pusukuri, Rajiv Gupta, Laxmi N. Bhuyan:
ADAPT: A framework for coscheduling multithreaded programs. 45:1-45:24
- Michele Tartara, Stefano Crespi-Reghizzi:
Continuous learning of compiler heuristics. 46:1-46:25
- Grigorios Chrysos, Panagiotis Dagritzikos, Ioannis Papaefstathiou, Apostolos Dollas:
HC-CART: A parallel system implementation of data mining classification and regression tree (CART) algorithm on a multi-FPGA system. 47:1-47:25
- Jongwon Lee, Yohan Ko, Kyoungwoo Lee, Jonghee M. Youn, Yunheung Paek:
Dynamic code duplication with vulnerability awareness for soft error detection on VLIW architectures. 48:1-48:24
- Fabien Coelho, François Irigoin:
API compilation for image hardware accelerators. 49:1-49:25
- Carlos Luque, Miquel Moretó, Francisco J. Cazorla, Mateo Valero:
Fair CPU time accounting in CMP+SMT processors. 50:1-50:25
- Pavlos M. Mattheakis, Ioannis Papaefstathiou:
Significantly reducing MPI intercommunication latency and power overhead in both embedded and HPC systems. 51:1-51:25
- Riyadh Baghdadi, Albert Cohen, Sven Verdoolaege, Konrad Trifunovic:
Improved loop tiling based on the removal of spurious false dependences. 52:1-52:26
- Antoniu Pop, Albert Cohen:
OpenStream: Expressiveness and data-flow compilation of OpenMP streaming programs. 53:1-53:25
- Sven Verdoolaege, Juan Carlos Juega, Albert Cohen, José Ignacio Gómez, Christian Tenllado, Francky Catthoor:
Polyhedral parallel code generation for CUDA. 54:1-54:23
- Yu Du, Miao Zhou, Bruce R. Childers, Rami G. Melhem, Daniel Mossé:
Delta-compressed caching for overcoming the write bandwidth limitation of hybrid main memory. 55:1-55:20
- Suresh Purini, Lakshya Jain:
Finding good optimization sequences covering program space. 56:1-56:23
- Mehmet E. Belviranli, Laxmi N. Bhuyan, Rajiv Gupta:
A dynamic self-scheduling scheme for heterogeneous multiprocessor architectures. 57:1-57:20
- Anurag Negi, J. Rubén Titos Gil:
SCIN-cache: Fast speculative versioning in multithreaded cores. 58:1-58:26
- Thibaut Lutz, Christian Fensch, Murray Cole:
PARTANS: An autotuning framework for stencil computation on multi-GPU systems. 59:1-59:24
- Chunhua Xiao, M.-C. Frank Chang, Jason Cong, Michael Gill, Zhangqin Huang, Chunyue Liu, Glenn Reinman, Hao Wu:
Stream arbitration: Towards efficient bandwidth utilization for emerging on-chip interconnects. 60:1-60:27