School of Informatics - University of Edinburgh
Institute for Computing Systems Architecture
CArD - Compiler and Architecture Design Group

Publications

2017

ATOM: Atomic Durability in Non-volatile Memory through Hardware Support for Logging
Arpit Joshi, Vijay Nagarajan, Marcelo Cintra, Stratis Viglas, To Appear In Proceedings of the 2017 International Symposium on High-Performance Computer Architecture (HPCA'17)

Boomerang: a Metadata-Free Architecture for Control Flow Delivery
Rakesh Kumar, Cheng-Chieh Huang, Boris Grot, Vijay Nagarajan, To Appear In Proceedings of the 2017 International Symposium on High-Performance Computer Architecture (HPCA'17)

Minimizing the cost of iterative compilation with active learning
William Ogilvie, Pavlos Petoumenos, Zheng Wang, Hugh Leather, To Appear In Proceedings of the 2017 International Symposium on Code Generation and Optimization (CGO'17)

Synthesizing Benchmarks for Predictive Modeling
Chris Cummins, Pavlos Petoumenos, Zheng Wang, Hugh Leather, To Appear In Proceedings of the 2017 International Symposium on Code Generation and Optimization (CGO'17)

2016

ALEA: A Fine-grained Energy Profiling Tool
Lev Mukhanov, Pavlos Petoumenos, Zheng Wang, Nikos Parasyris, Dimitrios Nikolopoulos, Bronis de Supinski, Hugh Leather, To Appear In the ACM Transactions on Architecture and Code Optimization (TACO)

Asynchronous Memory Access Chaining
O. Kocberber, Boris Grot, B. Falsafi, In 42nd International Conference on Very Large Data Bases (VLDB'16)

Automatic configuration of ROS applications for near-optimal performance
Jose Cano, Alejandro Bordallo, Vijay Nagarajan, Subramanian Ramamoorthy, Sethu Vijayakumar, To Appear In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'16)

Autotuning OpenCL Workgroup Size for Stencil Patterns
Chris Cummins, Pavlos Petoumenos, Michel Steuwer, Hugh Leather, In Proceedings of the 6th International Workshop on Adaptive Self-tuning Computing Systems (ADAPT'16)

C3D: Mitigating the NUMA Bottleneck via Coherent DRAM Caches
Cheng-Chieh Huang, Rakesh Kumar, Marco Elver, Boris Grot, Vijay Nagarajan, In Proceedings of the 2016 International Symposium on Microarchitecture (MICRO'16)

Characterizing Memory Bottlenecks in GPGPU Workloads
Saumay Dublish, Vijay Nagarajan, Nigel Topham, In Proceedings of the 2016 International Symposium on Workload Characterization (IISWC'16)

Compositional Compilation for Sparse, Irregular Data Parallelism
Adam Harries, Michel Steuwer, Murray Cole, Alan Gray, Christophe Dubach, In Proceedings of the 2016 Workshop on High-Level Programming for Heterogeneous and Hierarchical Parallel Systems (HLPGPU'16)

Cooperative Caching for GPUs
Saumay Dublish, Vijay Nagarajan, Nigel Topham, In the ACM Transactions on Architecture and Code Optimization (TACO)

Diversity: A design goal for heterogeneous processors
Erik Tomusk, Christophe Dubach, Michael O'Boyle, IEEE Computer Architecture Letters (IEEE CAL)

Efficient Asynchronous Interrupt Handling in a Full-System Instruction Set Simulator
Tom Spink, Harry Wagstaff, Björn Franke, In Proceedings of the 2016 Conference on Languages, Compilers and Tools for Embedded Systems (LCTES'16)

Hardware Accelerated Cross-Architecture Full-System Virtualization
Tom Spink, Harry Wagstaff, Björn Franke, In the ACM Transactions on Architecture and Code Optimization (TACO)

Iterative Compilation on Mobile Devices
Paschalis Mpeis, Pavlos Petoumenos, Hugh Leather, In Proceedings of the 6th International Workshop on Adaptive Self-tuning Computing Systems (ADAPT'16)

LLC Dead Block Prediction Considered Not Useful
Priyank Faldu, Boris Grot, In 13th Workshop on Duplicating, Deconstructing and Debunking (WDDD'16)

Matrix Multiplication Beyond Auto-Tuning: Rewrite-based GPU Code Generation
Michel Steuwer, Toomas Remmelg, Christophe Dubach, In Proceedings of the 2016 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES'16)

McVerSi: A Test Generation Framework for Fast Memory Consistency Verification in Simulation
Marco Elver, Vijay Nagarajan, To Appear In Proceedings of the 2016 International Symposium on High-Performance Computer Architecture (HPCA'16)

Multi-Stage Programming for GPUs in Modern C++ using PACXX
Michael Haidl, Michel Steuwer, Tim Humernbrum, Sergei Gorlatch, In Proceedings of the 2016 Workshop on General Purpose Processing on Graphics Processing Units (GPGPU'16)

On the Inference of User Paths from Anonymized Mobility Data
Galini Tsoukaneri, George Theodorakopoulos, Mahesh Marina, Hugh Leather, In Proceedings of the 1st IEEE European Symposium on Security and Privacy (EuroS&P'16)

Performance Portable GPU Code Generation for Matrix Multiplication
Toomas Remmelg, Thibaut Lutz, Michel Steuwer, Christophe Dubach, In Proceedings of the 2016 Workshop on General Purpose Processing on Graphics Processing Units (GPGPU'16)

Predicting and Optimizing Image Compression
Oleksandr Murashko, John Thomson, Hugh Leather, In Proceedings of the 24th ACM International Conference on Multimedia (MM'16)

Quantitative Characterization of the Software Layer of a HW/SW Co-Designed Processor
Jose Cano, Rakesh Kumar, Aleksandar Brankovic, Demos Pavlou, Kyriakos Stavrou, Enric Gibert, Alejandro Martinez, Antonio González, To Appear In Proceedings of the 2016 International Symposium on Workload Characterization (IISWC'16)

SABRes: Atomic Object Reads for Rack-Scale In-Memory Computing
A. Daglis, D. Ustiugov, S. Novakovic, E. Bugnion, B. Falsafi, Boris Grot, In Proceedings of the 2016 International Symposium on Microarchitecture (MICRO'16)

Selecting heterogeneous cores for diversity
Erik Tomusk, Christophe Dubach, Michael O'Boyle, In the ACM Transactions on Architecture and Code Optimization (TACO)

Task Variant Allocation in Distributed Robotics
Jose Cano, David White, Alejandro Bordallo, Ciaran McCreesh, Patrick Prosser, Jeremy Singer, Vijay Nagarajan, To Appear In Proceedings of Robotics: Science and Systems (RSS'16)

The Case for RackOut: Scalable Data Serving Using Rack-Scale Systems
S. Novakovic, A. Daglis, E. Bugnion, B. Falsafi, Boris Grot, In 7th ACM Symposium on Cloud Computing (SOCC'16)

Towards Collaborative Performance Tuning of Algorithmic Skeletons
Chris Cummins, Pavlos Petoumenos, Michel Steuwer, Hugh Leather, In Proceedings of the 2016 Workshop on High-Level Programming for Heterogeneous and Hierarchical Parallel Systems (HLPGPU'16)

accelOS: Portable and Transparent Software Managed Scheduling on Accelerators for Fair Resource Sharing
Christos Margiolas, Michael O'Boyle, In Proceedings of the 2016 International Symposium on Code Generation and Optimization (CGO'16)

2015

Adaptive Parallelism Mapping in Dynamic Environments using Machine Learning
Murali Emani, Ph.D Thesis (PHD)

Application of Domain-aware Binary Fuzzing to Aid Android Virtual Machine Testing
Stephen Kyle, Hugh Leather, Björn Franke, Dave Butcher, Stuart Monteith, In Proceedings of the 2015 International Conference on Virtual Execution Environments (VEE'15)

Automatic and Portable Mapping of Data Parallel Programs to OpenCL for GPU-based Heterogeneous Systems
Zheng Wang, Dominik Grewe, Michael O'Boyle, In the ACM Transactions on Architecture and Code Optimization (TACO)

Celebrating Diversity: A Mixture of Experts Approach for Runtime Mapping in Dynamic Environments
Murali Emani, Michael O'Boyle, In Proceedings of the 2015 Conference on Programming Language Design and Implementation (PLDI'15)

Confluence: Unified Instruction Supply for Scale-Out Servers
C. Kaynak, Boris Grot, B. Falsafi, In Proceedings of the 2015 International Symposium on Microarchitecture (MICRO'15)

Dynamic process migration in heterogeneous ROS-based environments
Jose Cano, Eduardo Molinos, Vijay Nagarajan, Sethu Vijayakumar, International Conference on Advanced Robotics (ICAR'15)

Efficient Dual-ISA Support in a Retargetable Dynamic Binary Translator
Tom Spink, Harry Wagstaff, Björn Franke, Nigel Topham, In Proceedings of the 2015 International Symposium on Systems, Architectures, Modeling, and Simulation (SAMOS'15)

Efficient Persist Barriers for Multicores
Arpit Joshi, Vijay Nagarajan, Marcelo Cintra, Stratis Viglas, In Proceedings of the 2015 International Symposium on Microarchitecture (MICRO'15)

Experiences in Speeding Up Computer Vision Applications on Mobile Computing Platforms
Luna Backes, Alejandro Rico, Björn Franke, In Proceedings of the 2015 International Symposium on Systems, Architectures, Modeling, and Simulation (SAMOS'15)

Four Metrics to Evaluate Heterogeneous Multicores
Erik Tomusk, Christophe Dubach, Michael O'Boyle, In the ACM Transactions on Architecture and Code Optimization 12(4), November 2015 (TACO)

Free Rider: A Tool for Retargeting Platform-Specific Intrinsic Functions
Stanislav Manilov, Björn Franke, Anthony Magrath, Cedric Andrieu, In Proceedings of the 2015 Conference on Languages, Compilers and Tools for Embedded Systems (LCTES'15)

From High Level Architecture Descriptions to Fast Instruction Set Simulators
Harry Wagstaff, Ph.D Thesis (PHD)

Generating Performance Portable Code using Rewrite Rules: From High-Level Functional Expressions to High-Performance OpenCL Code
Michel Steuwer, Christian Fensch, Sam Lindley, Christophe Dubach, In Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming (ICFP'15)

Helium: a transparent inter-kernel optimizer for OpenCL
Thibaut Lutz, Christian Fensch, Murray Cole, In Proceedings of the 2015 Workshop on General Purpose Processing on Graphics Processing Units (GPGPU'15)

Improving the Programmability and Performance Portability on Many-Core Processors (Verbesserung der Programmierbarkeit und Performance-Portabilität von Manycore-Prozessoren)
Michel Steuwer, In Distinguished Dissertations in Informatics 2015 (Ausgezeichnete Informatikdissertationen 2015) - German Informatics Society (DDI'15)

Intelligent Heuristic Construction with Active Learning
William Ogilvie, Pavlos Petoumenos, Zheng Wang, Hugh Leather, In Proceedings of Compilers for Parallel Computing (CPC'15)

Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM
Luigi Nardi, Bruno Bodin, M. Zia, John Mawer, Andy Nisbet, Paul Kelly, Andrew Davison, Mikel Lujan, Michael O'Boyle, Graham Riley, Nigel Topham, Steve Furber, In Proceedings of the 2015 International Conference on Robotics and Automation (ICRA'15)

LIRA: Adaptive Contention-Aware Thread Placement for Parallel Runtime Systems
Alexander Collins, Tim Harris, Murray Cole, Christian Fensch, In Proceedings of the 5th International Workshop on Runtime and Operating Systems for Supercomputers (ROSS'15)

Manycore Network Interfaces for In-Memory Rack-Scale Computing
A. Daglis, S. Novakovic, E. Bugnion, B. Falsafi, Boris Grot, In Proceedings of the 2015 International Symposium on Computer Architecture (ISCA'15)

PALMOS: A Transparent, Multi-tasking Acceleration Layer for Parallel Heterogeneous Systems
Christos Margiolas, Michael O'Boyle, In Proceedings of the 2015 ACM International Conference on Supercomputing (ICS'15)

PSLP: Padded SLP Automatic Vectorization
Vasileios Porpodas, Alberto Magni, Timothy Jones, In Proceedings of the 2015 International Symposium on Code Generation and Optimization (CGO'15)

Patterns and rewrite rules for systematic code generation (from high-level functional patterns to high-performance OpenCL code)
Michel Steuwer, Christian Fensch, Christophe Dubach, arXiv Technical Report arXiv:1502.02389 (ARXIV-TR)

Power Capping: What Works, What Does Not
Pavlos Petoumenos, Lev Mukhanov, Zheng Wang, Hugh Leather, Dimitrios Nikolopoulos, In Proceedings of the 21st IEEE International Conference on Parallel and Distributed Systems (ICPADS'15)

RC3: consistency directed cache coherence for x86-64 with RC extensions
Marco Elver, Vijay Nagarajan, In Proceedings of the 2015 International Conference on Parallel Architectures and Compilation Techniques (PACT'15)

Runtime Code Generation and Data Management for Heterogeneous Computing in Java
Juan Fumero, Toomas Remmelg, Michel Steuwer, Christophe Dubach, In Proceedings of the 2015 International Conference on Principles and Practices of Programming on the Java Platform Virtual Machines, Languages and Tools (PPPJ'15)

2014

A Compiler Framework for Automatically Mapping Data Parallel Programs to Heterogeneous MPSOCs
Kiran Chandramohan, Michael O'Boyle, In Proceedings of the 2014 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES'14)

A Composable Array Function Interface for Heterogeneous Computing in Java
Michel Steuwer, Juan Fumero, Christophe Dubach, Proceedings of ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming (ARRAY'14)

ATCache: Reducing DRAM Cache Latency via a Small SRAM Tag Cache
Cheng-Chieh Huang, Vijay Nagarajan, In Proceedings of the 2014 International Conference on Parallel Architectures and Compilation Techniques (PACT'14)

Automated ISA Branch Coverage Analysis and Test Case Generation for Retargetable Instruction Set Simulators
Harry Wagstaff, Tom Spink, Björn Franke, In Proceedings of the 2014 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES'14)

Automated detection of structured coarse-grained parallelism in sequential legacy applications
Tobias Edler von Koch, Ph.D Thesis (PHD)

Automatic Optimization of Thread-Coarsening for Graphics Processors
Alberto Magni, Christophe Dubach, Michael O'Boyle, In Proceedings of the 2014 International Conference on Parallel Architectures and Compilation Techniques (PACT'14)

Automatic Skeleton-Driven Memory Affinity for Transactional Worklist Applications
Luis Goes, Christiane Ribeiro, Marcio Castro, Jean-Francois Mehaut, Murray Cole, Marcelo Cintra, International Journal of Parallel Programming 42(2):365-382 (IJPP)

Automatic feature generation for machine learning-based optimising compilation
Hugh Leather, Edwin Bonilla, Michael O'Boyle, In the ACM Transactions on Architecture and Code Optimization 11(1), Feb 2014 (TACO)

Autotuning Wavefront Applications for Multicore Multi-GPU Hybrid Architectures
Siddharth Mohanty, Murray Cole, International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM'14)

BuMP: Bulk Memory Page Access Prediction and Streaming
S. Volos, J. Picorel, B. Falsafi, Boris Grot, In Proceedings of the 2014 International Symposium on Microarchitecture (MICRO'14)

Change Detection based Parallelism Mapping: Exploiting Offline Models and Online Adaptation
Murali Emani, Michael O'Boyle, In Proceedings of the 2014 Workshop on Languages and Compilers for Parallel Computing (LCPC'14)

Community-driven reviewing and validation of publications
Grigori Fursin, Christophe Dubach, Proceedings of the 1st ACM SIGPLAN Workshop on Reproducible Research Methodologies and New Publication Models in Computer Engineering (TRUST'14)

Efficient Code Generation in a Region-Based Dynamic Binary Translator
Tom Spink, Harry Wagstaff, Björn Franke, Nigel Topham, In Proceedings of the 2014 Conference on Languages, Compilers and Tools for Embedded Systems (LCTES'14)

Exploitation of GPUs for the Parallelisation of Probably Parallel Legacy Code
Zheng Wang, Daniel Powell, Björn Franke, Michael O'Boyle, In Proceedings of the 2014 International Conference on Compiler Construction (CC'14)

Exploiting Function Similarity for Code Size Reduction
Tobias Edler von Koch, Björn Franke, Pranav Bhandarkar, Anshuman Dasgupta, In Proceedings of the 2014 Conference on Languages, Compilers and Tools for Embedded Systems (LCTES'14)

Exploiting GPU Hardware Saturation for Fast Compiler Optimization
Alberto Magni, Christophe Dubach, Michael O'Boyle, In Proceedings of the 2014 Workshop on General Purpose Processing on Graphics Processing Units (GPGPU'14)

FADE: A Programmable Filtering Accelerator for Instruction-Grain Monitoring
S. Fytraki, E. Vlachos, O. Kocberber, B. Falsafi, Boris Grot, In Proceedings of the 2014 International Symposium on High-Performance Computer Architecture (HPCA'14)

Fast Automatic Heuristic Construction Using Active Learning
William Ogilvie, Pavlos Petoumenos, Zheng Wang, Hugh Leather, In Proceedings of the 2014 Workshop on Languages and Compilers for Parallel Computing (LCPC'14)

Increasing Cache Capacity via Critical-words-Only Cache
Cheng-Chieh Huang, Vijay Nagarajan, IEEE International Conference on Computer Design (ICCD'14)

Integrating Profile-Driven Parallelism Detection and Machine-Learning Based Mapping
Zheng Wang, Georgios Tournavitis, Björn Franke, Michael O'Boyle, In the ACM Transactions on Architecture and Code Optimization Volume 11 Issue 1 (TACO)

LambdaJIT: a dynamic compiler for heterogeneous optimizations of STL algorithms
Thibaut Lutz, Vinod Grover, Proceedings of the 3rd ACM SIGPLAN workshop on Functional high-performance computing (FHPC'14)

Mapping parallel programs to heterogeneous multi-core systems
Dominik Grewe, Ph.D Thesis (PHD)

Measuring Flexibility in Single-ISA Heterogeneous Processors
Erik Tomusk, Christophe Dubach, Michael O'Boyle, In Proceedings of the 2014 International Conference on Parallel Architectures and Compilation Techniques (PACT'14)

Measuring QoE of Interactive Workloads and Characterising Frequency Governors on Mobile Devices (Best Paper)
Volker Seeker, Pavlos Petoumenos, Hugh Leather, Björn Franke, In Proceedings of the 2014 International Symposium on Workload Characterization (IISWC'14)

Partitioning data-parallel programs for heterogeneous MPSoCs: time and energy design space exploration
Kiran Chandramohan, Michael O'Boyle, In Proceedings of the 2014 Conference on Languages, Compilers and Tools for Embedded Systems (LCTES'14)

Portable and Transparent Host-Device Communication Optimization for GPGPU Environments
Christos Margiolas, Michael O'Boyle, In Proceedings of the 2014 International Symposium on Code Generation and Optimization (CGO'14)

Scale-Out NUMA
S. Novakovic, A. Daglis, E. Bugnion, B. Falsafi, Boris Grot, In Proceedings of the 2014 International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'14)

Smart Multi-Task Scheduling for OpenCL Programs on CPU/GPU Heterogeneous Platforms
Yuan Wen, Zheng Wang, Michael O'Boyle, In Proceedings of the 2014 International Conference on High Performance Computing (HiPC'14)

Static Approximation of MPI Communication Graphs for Optimized Process Placement
Andrew McPherson, Vijay Nagarajan, Marcelo Cintra, In Proceedings of the 2014 Workshop on Languages and Compilers for Parallel Computing (LCPC'14)

TSO-CC: Consistency directed cache coherence for TSO
Marco Elver, Vijay Nagarajan, In Proceedings of the 2014 International Symposium on High-Performance Computer Architecture (HPCA'14)

Variability of Data Dependences and Control Flow
Tobias Edler von Koch, Björn Franke, In Proceedings of the 2014 International Symposium on Performance Analysis of Systems and Software (ISPASS'14)

2013

A Large-Scale Cross-Architecture Evaluation of Thread-Coarsening
Alberto Magni, Christophe Dubach, Michael O'Boyle, In Proceedings of the 2013 International Conference for High Performance Computing, Networking, and Storage (SC'13)

A Parallel Dynamic Binary Translator for Efficient Multi-Core Simulation
Oscar Almer, Igor Böhm, Tobias Edler von Koch, Björn Franke, Stephen Kyle, Volker Seeker, Christopher Thompson, Nigel Topham, International Journal of Parallel Programming 41(2), April 2013 (IJPP)

Aligned Scheduling: Cache-efficient Instruction Scheduling for VLIW Processors
Vasileios Porpodas, Marcelo Cintra, In Proceedings of the 2013 Workshop on Languages and Compilers for Parallel Computing (LCPC'13)

CASTED: Core-Adaptive Software Transient Error Detection for Tightly Coupled Cores
Konstantina Mitropoulou, Vasileios Porpodas, Marcelo Cintra, In Proceedings of the 2013 IEEE International Parallel & Distributed Processing Symposium (IPDPS'13)

CAeSaR: unified Cluster-Assignment Scheduling and communication Reuse for clustered VLIW processors
Vasileios Porpodas, Marcelo Cintra, In Proceedings of the 2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES'13)

Conference Proceedings
Björn Franke, Jingling Xue, In Proceedings of the 2013 Conference on Languages, Compilers and Tools for Embedded Systems (LCTES'13)

DRIFT: Decoupled compileR-based Instruction-level Fault-Tolerance
Konstantina Mitropoulou, Vasileios Porpodas, Marcelo Cintra, In Proceedings of the 2013 Workshop on Languages and Compilers for Parallel Computing (LCPC'13)

Designing a Physical Locality Aware Coherence Protocol for Chip-Multiprocessors
Christian Fensch, Nick Barrow-Williams, Robert Mullins, Simon Moore, IEEE Transactions on Computers 62(5), May 2013 (IEEE-TC)

Dynamic microarchitectural adaptation using machine learning
Christophe Dubach, Timothy Jones, Edwin Bonilla, In the ACM Transactions on Architecture and Code Optimization September 2013 (TACO)

Early partial evaluation in a JIT-compiled, retargetable instruction set simulator generated from a high-level architecture description
Harry Wagstaff, Miles Gould, Björn Franke, Nigel Topham, In Proceedings of the 2013 Design Automation Conference (DAC'13)

Energy efficient cache architectures for single, multi and many core processors
Karthik Sundararajan, Ph.D Thesis (PHD)

High speed cycle approximate simulation for cache-incoherent MPSoCs
Christopher Thompson, Miles Gould, Nigel Topham, In Proceedings of the 2013 International Symposium on Systems, Architectures, Modeling, and Simulation (SAMOS'13)

Instruction scheduling optimizations for energy efficient VLIW processors
Vasileios Porpodas, Ph.D Thesis (PHD)

LUCAS: Latency-adaptive Unified Cluster Assignment and instruction Scheduling
Vasileios Porpodas, Marcelo Cintra, In Proceedings of the 2013 Conference on Languages, Compilers and Tools for Embedded Systems (LCTES'13)

Limits of region-based dynamic binary parallelization
Tobias Edler von Koch, Björn Franke, In Proceedings of the 2013 International Conference on Virtual Execution Environments (VEE'13)

MaSiF: Machine Learning Guided Auto-tuning of Parallel Skeletons
Alexander Collins, Christian Fensch, Hugh Leather, Murray Cole, IEEE International Conference on High Performance Computing (HiPC'13)

OpenCL Task Partitioning in the Presence of GPU Contention
Dominik Grewe, Zheng Wang, Michael O'Boyle, In Proceedings of the 2013 Workshop on Languages and Compilers for Parallel Computing (LCPC'13)

PARTANS: An Autotuning Framework for Stencil Computation on Multi-GPU Systems
Thibaut Lutz, Christian Fensch, Murray Cole, In the ACM Transactions on Architecture and Code Optimization 9(4), September 2013 (TACO)

Portable Mapping of Data Parallel Programs to OpenCL for Heterogeneous Systems
Dominik Grewe, Zheng Wang, Michael O'Boyle, In Proceedings of the 2013 International Symposium on Code Generation and Optimization (CGO'13)

Position Paper: Weak heterogeneity as a way of adapting multicores to real workloads (Best Paper)
Erik Tomusk, Michael O'Boyle, In Proceedings of the 2013 International Workshop on Adaptive Self-tuning Computing Systems (ADAPT'13)

Smart, Adaptive Mapping of Parallelism in the Presence of External Workload
Murali Emani, Zheng Wang, Michael O'Boyle, In Proceedings of the 2013 International Symposium on Code Generation and Optimization (CGO'13)

Speeding up dynamic compilation: concurrent and parallel dynamic compilation
Igor Böhm, Ph.D Thesis (PHD)

The Smart Cache: An Energy-Efficient Cache Architecture Through Dynamic Cache Adaptation
Karthik Sundararajan, Timothy Jones, Nigel Topham, International Journal of Parallel Programming 41(2), April 2013 (IJPP)

2012

Auto-Tuning Parallel Skeletons
Alexander Collins, Christian Fensch, Hugh Leather, Parallel Processing Letters 22(2), 2012 (PPL)

Automated application-specific optimisation of interconnects in multi-core systems
Oscar Almer, Ph.D Thesis (PHD)

Automatic skeleton-driven performance optimizations for transactional memory
Luis Goes, Ph.D Thesis (PHD)

Autotuning Wavefront Abstractions for Heterogeneous Architectures
Siddharth Mohanty, Murray Cole, In Proceedings of the 2012 Workshop on Applications for Multi-Core Architectures (WAMCA'12)

Compiler-driven data layout transformations for network applications
Damon Fenacci, Ph.D Thesis (PHD)

Compiling a high-level language for GPUs (via language support for architectures and compilers)
Christophe Dubach, P. Cheng, R. Rabbah, D. Bacon, S. Fink, In Proceedings of the 2012 Conference on Programming Language Design and Implementation (PLDI'12)

Compiling for Automatically Generated Instruction Set Extensions
Alastair Murray, Björn Franke, In Proceedings of the 2012 International Symposium on Code Generation and Optimization (CGO'12)

Complementing user-level coarse-grain parallelism with implicit speculative parallelism
Nikolas Ioannou, Ph.D Thesis (PHD)

Cooperative Partitioning: Energy-Efficient Cache Partitioning for High-Performance CMPs
Karthik Sundararajan, Vasileios Porpodas, Timothy Jones, Nigel Topham, Björn Franke, In Proceedings of the 2012 International Symposium on High-Performance Computer Architecture (HPCA'12)

Customising compilers for customisable processors
Alastair Murray, Ph.D Thesis (PHD)

Design Space Exploration of Hybrid Ultra Low Power Branch Predictors
Matthew Bielby, Miles Gould, Nigel Topham, In Proceedings of the 2012 International Conference on Architecture of Computing Systems (ARCS'12)

Efficiently Parallelizing Instruction Set Simulation of Embedded Multi-Core Processors Using Region-based Just-in-Time Dynamic Binary Translation
Stephen Kyle, Igor Böhm, Björn Franke, Hugh Leather, Nigel Topham, In Proceedings of the 2012 Conference on Languages, Compilers and Tools for Embedded Systems (LCTES'12)

Exploring and predicting the effects of microarchitectural parameters and compiler optimisations on performance and energy
Christophe Dubach, Timothy Jones, Michael O'Boyle, In the 2012 ACM Transactions on Embedded Computing Systems, Special Issue on Software and Compilers for Embedded Systems (TECS'12:SCES)

UCIFF: Unified Cluster Assignment, Instruction Scheduling, and Fast Frequency Selection for Heterogeneous Clustered VLIW Cores
Vasileios Porpodas, Marcelo Cintra, In Proceedings of the 2012 Workshop on Languages and Compilers for Parallel Computing (LCPC'12)

2011

A Learning-Based Approach to the Automated Design of MPSoC Networks
Oscar Almer, Nigel Topham, Björn Franke, In Proceedings of the 2011 International Conference on Architecture of Computing Systems (ARCS'11)

A Machine Learning-Based Approach for Thread Mapping on Transactional Memory Applications
Marcio Castro, Luis Goes, Christiane Ribeiro, Murray Cole, Marcelo Cintra, Jean-Francois Mehaut, In Proceedings of the 2011 International Conference on High Performance Computing (HiPC'11)

A Reconfigurable Cache Architecture for Energy Efficiency
Karthik Sundararajan, Timothy Jones, Nigel Topham, In Proceedings of the 2011 ACM International Conference on Computing Frontiers (CF'11)

A Static Task Partitioning Approach for Heterogeneous Systems Using OpenCL (Best Paper)
Dominik Grewe, Michael O'Boyle, In Proceedings of the 2011 International Conference on Compiler Construction (CC'11)

A Workload-Aware Mapping Approach For Data-Parallel Programs
Dominik Grewe, Zheng Wang, Michael O'Boyle, Transactions on High Performance Embedded Architectures and Compilers (HiPEAC)

An Evaluation of an OS-Based Coherence Scheme for Tiled CMPs
Christian Fensch, Marcelo Cintra, International Journal of Parallel Programming 39(3), June 2011 (IJPP)

An empirical architecture-centric approach to microarchitectural design space exploration
Christophe Dubach, Timothy Jones, Michael O'Boyle, IEEE Transactions on Computers (IEEE-TC)

Automatically Generating and Tuning GPU Code for Sparse Matrix-Vector Multiplication from a High-Level Representation
Dominik Grewe, Anton Lokhmotov, In Proceedings of the 2011 Workshop on General Purpose Processing on Graphics Processing Units (GPGPU'11)

Cycle-Accurate Performance Modelling in an Ultra-Fast Just-In-Time Dynamic Binary Translation Instruction Set Simulator
Igor Böhm, Björn Franke, Nigel Topham, Transactions on High Performance Embedded Architectures and Compilers 5(4), 2011 (HiPEAC)

Generalized Just-In-Time Trace Compilation using a Parallel Task Farm in a Dynamic Binary Translator
Igor Böhm, Tobias Edler von Koch, Stephen Kyle, Björn Franke, Nigel Topham, In Proceedings of the 2011 Conference on Programming Language Design and Implementation (PLDI'11)

Increasing the Energy Efficiency of TLS Systems Using Intermediate Checkpointing
Salman Khan, Nikolas Ioannou, Polychronis Xekalakis, Marcelo Cintra, In Proceedings of the 2011 International Conference on High Performance Computing (HiPC'11)

Increasing the efficacy of automated instruction set extension
Richard Bennett, Ph.D Thesis (PHD)

Machine learning in compilers
Hugh Leather, Ph.D Thesis (PHD)

Phase-Based Application-Driven Power Management on the Single-chip Cloud Computer
Nikolas Ioannou, Matthias Gries, Michael Kauschke, Marcelo Cintra, In Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques (PACT'11)

Profile-driven parallelisation of sequential programs
Georgios Tournavitis, Ph.D Thesis (PHD)

Scalable Multi-Core Simulation Using Parallel Dynamic Binary Translation
Oscar Almer, Igor Böhm, Tobias Edler von Koch, Björn Franke, Stephen Kyle, Volker Seeker, Christopher Thompson, Nigel Topham, In Proceedings of the 2011 International Symposium on Systems, Architectures, Modeling, and Simulation (SAMOS'11)

Smart Cache: A Self Adaptive Cache Architecture for Energy Efficiency
Karthik Sundararajan, Timothy Jones, Nigel Topham, In Proceedings of the 2011 International Symposium on Systems, Architectures, Modeling, and Simulation (SAMOS'11)

2010

A Predictive Model for Dynamic Microarchitectural Adaptivity Control
Christophe Dubach, Timothy Jones, Edwin Bonilla, Michael O'Boyle, In Proceedings of the 2010 International Symposium on Microarchitecture (MICRO'10)

Adaptive Source-Level Data Assignment to Dual Memory Banks
Alastair Murray, Björn Franke, In the 2010 ACM Transactions on Embedded Computing Systems (TECS'10)

Adaptive Statistical Scheduling of Divisible Workloads in Heterogeneous Systems
Horacio Gonzalez-Velez, Murray Cole, Journal of Scheduling 2010 (JoS)

Adaptive Structured Parallelism for Distributed Heterogeneous Architectures: A Methodological Approach
Horacio Gonzalez-Velez, Murray Cole, Concurrency and Computation: Practice and Experience 2010 (CC: P+E)

Compiler-Directed Performance Model Construction for Parallel Programs
Martin Schindewolf, David Kramer, Marcelo Cintra, In Proceedings of the 2010 International Conference on Architecture of Computing Systems (ARCS'10)

Cycle-Accurate Performance Modelling in an Ultra-Fast Just-In-Time Dynamic Binary Translation Instruction Set Simulator (Best Paper)
Igor Böhm, Björn Franke, Nigel Topham, In Proceedings of the 2010 International Symposium on Systems, Architectures, Modeling, and Simulation (IC-SAMOS'10)

Efficient Sequential Consistency Using Conditional Fences (Best Paper)
Changhui Lin, Vijay Nagarajan, Rajiv Gupta, In Proceedings of the 2010 International Conference on Parallel Architectures and Compilation Techniques (PACT'10)

Efficient design-space exploration of custom instruction-set extensions
Marcela Zuluaga, Ph.D Thesis (PHD)

Empirical Evaluation of Data Transformations for Network Infrastructure Applications
Damon Fenacci, Björn Franke, In Proceedings of the 2010 International Symposium on Systems, Architectures, Modeling, and Simulation (IC-SAMOS'10)

Execution Suppression: An Automated Iterative Technique for Locating Memory Errors
Dennis Jeffrey, Vijay Nagarajan, Rajiv Gupta, Neelam Gupta, In the 2010 ACM Transactions on Programming Languages and Systems 32(5), May 2010 (TOPLAS'10)

Exploring the Unified Design-Space of Custom-Instruction Selection and Resource Sharing
Marcela Zuluaga, Nigel Topham, In Proceedings of the 2010 International Symposium on Systems, Architectures, Modeling, and Simulation (IC-SAMOS'10)

Generating Code for Holistic Query Evaluation
Konstantinos Krikellas, Stratis Viglas, Marcelo Cintra, In Proceedings of the 2010 International Conference on Data Engineering (ICDE'10)

Handling Branches in TLS Systems with Multi-Path Execution
Polychronis Xekalakis, Marcelo Cintra, In Proceedings of the 2010 International Symposium on High-Performance Computer Architecture (HPCA'10)

Integrated Instruction Selection and Register Allocation for Compact Code Generation Exploiting Freeform Mixing of 16- and 32-bit Instructions
Tobias Edler von Koch, Igor Böhm, Björn Franke, In Proceedings of the 2010 International Symposium on Code Generation and Optimization (CGO'10)

Best Paper Partitioning Streaming Parallelism for Multi-cores: A Machine Learning Based Approach
Zheng Wang, Michael O'Boyle, In Proceedings of the 2010 International Conference on Parallel Architectures and Compilation Techniques (PACT'10)

Profitability-Based Power Allocation for Speculative Multithreaded Systems
Polychronis Xekalakis, Nikolas Ioannou, Salman Khan, Marcelo Cintra, In Proceedings of the 2010 IEEE International Parallel & Distributed Processing Symposium (IPDPS'10)

Proximity Coherence for Chip Multiprocessors
Nick Barrow-Williams, Christian Fensch, Simon Moore, In Proceedings of the 2010 International Conference on Parallel Architectures and Compilation Techniques (PACT'10)

Semi-Automatic Extraction and Exploitation of Hierarchical Pipeline Parallelism Using Profiling Information
Georgios Tournavitis, Björn Franke, In Proceedings of the 2010 International Conference on Parallel Architectures and Compilation Techniques (PACT'10)

Statistical Performance Modeling in Functional Instruction Set Simulators
Björn Franke, In the 2010 ACM Transactions on Embedded Computing Systems (TECS'10)

Toward a More Accurate Understanding of the Limits of the TLS Execution Paradigm
Nikolas Ioannou, Jeremy Singer, Salman Khan, Polychronis Xekalakis, Paraskevas Yiapanis, Adam Pocock, Gavin Brown, Mikel Lujan, Ian Watson, Marcelo Cintra, In Proceedings of the 2010 International Symposium on Workload Characterization (IISWC'10)

Workload Characterization Supporting the Development of Domain-Specific Compiler Optimizations Using Decision Trees for Data Mining
Damon Fenacci, Björn Franke, John Thomson, In Proceedings of the 2010 International Workshop on Software and Compilers for Embedded Systems (SCOPES'10)

2009

An End-to-End Design Flow for Automated Instruction Set Extension and Complex Instruction Selection based on GCC
Oscar Almer, Richard Bennett, Igor Böhm, Alastair Murray, Xinhao Qu, Marcela Zuluaga, Björn Franke, Nigel Topham, In Proceedings of the 2009 International Workshop on GCC Research Opportunities (GROW'09)

Automatic Feature Generation for Machine Learning Based Optimizing Compilation
Hugh Leather, Edwin Bonilla, Michael O'Boyle, In Proceedings of the 2009 International Symposium on Code Generation and Optimization (CGO'09)

Characterising Effective Resource Analyses for Parallel and Distributed Coordination
P. Trinder, Murray Cole, H-W. Loidl, G. Michaelson, In Proceedings of the 2009 International Workshop on Foundational and Practical Aspects of Resource Analysis (FOPARA'09)

Code Transformation and Instruction Set Extension
Alastair Murray, Richard Bennett, Björn Franke, Nigel Topham, In the 2009 ACM Transactions on Embedded Computing Systems 8(4), 2009 (TECS'09)

Combining Thread Level Speculation, Helper Threads, and Runahead Execution
Polychronis Xekalakis, Nikolas Ioannou, Marcelo Cintra, In Proceedings of the 2009 ACM International Conference on Supercomputing (ICS'09)

Compiler Directed Issue Queue Energy Reduction
Timothy Jones, Michael O'Boyle, Jaume Abella, Antonio González, Transactions on High Performance Embedded Architectures and Compilers 4(1), 2009 (HiPEAC)

Design Space Exploration of Resource Sharing Solutions for Custom Instruction Set Extensions
Marcela Zuluaga, Nigel Topham, In the IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 28(12), December 2009 (TCAD'09)

Distance-Aware Round-Robin Mapping for Large NUCA Caches
Alberto Ros, Marcelo Cintra, Manuel Acacio, Jose Garcia, In Proceedings of the 2009 International Conference on High Performance Computing (HiPC'09)

Energy-Efficient Register Caching with Compiler Assistance
Timothy Jones, Michael O'Boyle, Jaume Abella, Antonio González, Oğuz Ergin, In the ACM Transactions on Architecture and Code Optimization 6(4), October 2009 (TACO)

Exploring the Limits of Early Register Release: Exploiting Compiler Analysis
Timothy Jones, Michael O'Boyle, Jaume Abella, Antonio González, Oğuz Ergin, In the ACM Transactions on Architecture and Code Optimization 6(3), September 2009 (TACO)

High Speed CPU Simulation using LTU Dynamic Binary Translation
Daniel Jones, Nigel Topham, Transactions on High Performance Embedded Architectures and Compilers (HiPEAC)

Introducing Control-Flow Inclusion to Support Pipelining in Custom Instruction Set Extensions
Marcela Zuluaga, T. Kluter, P. Brisk, P. Ienne, Nigel Topham, In Proceedings of the 2009 IEEE Symposium on Application Specific Processors (SASP'09)

Mapping Parallelism to Multi-cores: A Machine Learning Based Approach
Zheng Wang, Michael O'Boyle, In Proceedings of the 2009 Symposium on Principles and Practice of Parallel Programming (PPoPP'09)

Mixed Speculative Multithreaded Execution Models
Polychronis Xekalakis, Ph.D Thesis (PHD)

Portable Compiler Optimization Across Embedded Programs and Microarchitectures using Machine Learning
Christophe Dubach, Timothy Jones, Edwin Bonilla, Grigori Fursin, Michael O'Boyle, In Proceedings of the 2009 International Symposium on Microarchitecture (MICRO'09)

Raced Profiles: Efficient Selection of Competing Compiler Optimizations
Hugh Leather, Michael O'Boyle, Bruce Worton, In Proceedings of the 2009 Conference on Languages, Compilers and Tools for Embedded Systems (LCTES'09)

Rapid Early-Stage Microarchitecture Design Using Predictive Models
Christophe Dubach, Timothy Jones, Michael O'Boyle, IEEE International Conference on Computer Design (ICCD'09)

Reducing Training Time in a One-shot Machine Learning-based Compiler
John Thomson, Michael O'Boyle, Grigori Fursin, Björn Franke, In Proceedings of the 2009 Workshop on Languages and Compilers for Parallel Computing (LCPC'09)

Stream Chaining: Exploiting Multiple Levels of Correlation in Data Prefetching
Pedro Diaz, Marcelo Cintra, In Proceedings of the 2009 International Symposium on Computer Architecture (ISCA'09)

Towards Automatic Profile-Driven Parallelization of Embedded Multimedia Applications
Georgios Tournavitis, Björn Franke, In Proceedings of the 2009 Workshop on Programmability Issues for Multi-Core Computers (MULTIPROG'09)

Towards a Holistic Approach to Auto-Parallelization: Integrating Profile-Driven Parallelism Detection and Machine-Learning Based Mapping
Georgios Tournavitis, Zheng Wang, Björn Franke, Michael O'Boyle, In Proceedings of the 2009 Conference on Programming Language Design and Implementation (PLDI'09)

Using Genetic Programming for Source-Level Data Assignment to Dual Memory Banks
Alastair Murray, Björn Franke, In Proceedings of the 2009 Conference on Statistical and Machine Learning Approaches to Architecture and Compilation (SMART'09)

Using Machine-Learning to Efficiently Explore the Architecture/Compiler Co-Design Space
Christophe Dubach, Ph.D Thesis (PHD)

Using continuous statistical machine learning to enable high-speed performance prediction in hybrid instruction-/cycle-accurate instruction set simulators
Daniel Powell, Björn Franke, In Proceedings of the 2009 International Conference on Hardware/software Codesign and System Synthesis (CODES+ISSS'09)

2008

A Partial Scan Based Test Generation for Asynchronous Circuits
Dilip Vasudevan, Aris Efthymiou, In Proceedings of the 2008 IEEE Workshop on Design and Diagnostics of Electronic Circuits and Systems (DDECS'08)

Adaptive Structured Parallelism
Horacio Gonzalez-Velez, Ph.D Thesis (PHD)

An Adaptive Parallel Pipeline Pattern for Grids
Horacio Gonzalez-Velez, Murray Cole, In Proceedings of the 2008 IEEE International Parallel & Distributed Processing Symposium (IPDPS'08)

An OS-Based Alternative to Full Hardware Coherence on Tiled CMPs
Christian Fensch, Marcelo Cintra, In Proceedings of the 2008 International Symposium on High Performance Computer Architecture (HPCA'08)

An OS-Based Alternative to Full Hardware Coherence on Tiled Chip-Multiprocessors
Christian Fensch, Ph.D Thesis (PHD)

Automatic Code Generation Using Dynamic Programming
Igor Böhm, VDM Verlag Dr. Mueller e.K. (VDM)

Automatic Feature Generation for Setting Compilers Heuristics
Hugh Leather, Elad Yom-Tov, Mircea Namolaru, Ari Freund, In Proceedings of the 2008 Conference on Statistical and Machine Learning Approaches to Architecture and Compilation (SMART'08)

Evaluating the Effects of Compiler Optimisations on AVF
Timothy Jones, Michael O'Boyle, Oğuz Ergin, In Proceedings of the 2008 Workshop on the Interaction between Compilers and Computer Architecture (INTERACT'08)

Exploring and Predicting the Architecture/Optimising Compiler Co-Design Space
Christophe Dubach, Timothy Jones, Michael O'Boyle, In Proceedings of the 2008 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES'08)

Fast Cycle-Approximate Instruction Set Simulation
Björn Franke, In Proceedings of the 2008 International Workshop on Software and Compilers for Embedded Systems (SCOPES'08)

Fast Source-Level Data Assignment to Dual Memory Banks
Alastair Murray, Björn Franke, In Proceedings of the 2008 International Workshop on Software and Compilers for Embedded Systems (SCOPES'08)

Instruction Cache Energy Saving Through Compiler Way-Placement
Timothy Jones, Sandro Bartolini, Bruno De Bus, John Cavazos, Michael O'Boyle, In Proceedings of the 2008 Design, Automation and Test in Europe (DATE'08)

MILEPOST GCC: Machine Learning Based Research Compiler
Grigori Fursin, Cupertino Miranda, Olivier Temam, Mircea Namolaru, Elad Yom-Tov, Ayal Zaks, Bilha Mendelson, Edwin Bonilla, John Thomson, Hugh Leather, Chris Williams, Michael O'Boyle, In Proceedings of the 2008 GCC Summit (GCC'08)

Reactive Scheduling of DAG Applications on Heterogeneous and Dynamic Distributed Computing Systems
Israel Hernandez, Ph.D Thesis (PHD)

Resource Sharing in Custom Instruction Set Extensions
Marcela Zuluaga, Nigel Topham, In Proceedings of the 2008 IEEE Symposium on Application Specific Processors (SASP'08)

Scheduling DAGs on Grids with Copying and Migration
Israel Hernandez, Murray Cole, In Proceedings of the 2007 Conference on Parallel Processing and Applied Mathematics (PPAM'07)

Using Machine Learning to Automate Compiler Optimisation
John Thomson, Ph.D Thesis (PHD)

2007

A Compiler Cost Model for Speculative Parallelization
Jialin Dou, Marcelo Cintra, In the ACM Transactions on Architecture and Code Optimization 4(2), June 2007 (TACO)

A Cost-Aware Parallel Workload Allocation Approach Based on Machine Learning
Shun Long, Grigori Fursin, Björn Franke, In Proceedings of the 2007 International Conference on Network and Parallel Computing (NPC'07)

A Structural Approach for Modelling Performance of Systems using Skeletons
Gagarine Yaikhom, Murray Cole, Stephen Gilmore, Jane Hillston, Electronic Notes in Theoretical Computer Science, pp.167-183, 2007 (ENTCS)

A structural approach for modelling performance of workflow systems
Gagarine Yaikhom, Murray Cole, Stephen Gilmore, Jane Hillston, In Proceedings of the 2007 International Workshop on the Quantitative Aspects of Programming Languages (QAPL'07)

Adaptive Structured Parallelism for Computational Grids
Horacio Gonzalez-Velez, Murray Cole, In Proceedings of the 2007 Symposium on Principles and Practice of Parallel Programming (PPoPP'07)

Combining Source-to-Source Transformations and Processor Instruction Set Extension for the Automated Design-Space Exploration of Embedded Systems
Richard Bennett, Alastair Murray, Björn Franke, Nigel Topham, In Proceedings of the 2007 Conference on Languages, Compilers and Tools for Embedded Systems (LCTES'07)

Designing Efficient Processors Using Compiler-Directed Optimisations
Timothy Jones, Michael O'Boyle, Jaume Abella, Antonio González, Oğuz Ergin, In Proceedings of the 2007 Workshop on the Interaction between Compilers and Computer Architecture (INTERACT'07)

Fast Compiler Optimisation using Code-Feature Based Performance
Christophe Dubach, John Cavazos, Björn Franke, Grigori Fursin, Michael O'Boyle, Olivier Temam, In Proceedings of the 2007 International Conference on Computing Frontiers (ICCF'07)

MiDataSets: Creating the Conditions for a More Realistic Evaluation of Iterative Optimization
Grigori Fursin, John Cavazos, Michael O'Boyle, Olivier Temam, In Proceedings of the 2007 International Conference on High Performance Embedded Architectures & Compilers (HiPEAC'07)

Microarchitectural Design Space Exploration Using An Architecture-Centric Approach
Christophe Dubach, Timothy Jones, Michael O'Boyle, In Proceedings of the 2007 International Symposium on Microarchitecture (MICRO'07)

Quick and Practical Run-Time Evaluation of Multiple Program Optimizations
Grigori Fursin, Albert Cohen, Michael O'Boyle, Olivier Temam, Transactions on High Performance Embedded Architectures and Compilers July 2007 (HiPEAC)

Rapidly Selecting Good Compiler Optimizations using Performance Counters
John Cavazos, Grigori Fursin, Felix Agakov, Edwin Bonilla, Michael O'Boyle, Olivier Temam, In Proceedings of the 2007 International Symposium on Code Generation and Optimization (CGO'07)

Reactive Grid Scheduling of DAG applications
Israel Hernandez, Murray Cole, In Proceedings of the 2007 Conference on Parallel and Distributed Computing and Networks (PDCN'07)

Reliable DAG Scheduling with Rewinding and Migration
Israel Hernandez, Murray Cole, In Proceedings of the 2007 International Conference on Networks for Grid Applications (GRIDNETS'07)

Using Predictive Modeling for Cross-Program Design Space Exploration in Multicore Systems
Salman Khan, Polychronis Xekalakis, John Cavazos, Marcelo Cintra, In Proceedings of the 2007 International Conference on Parallel Architectures and Compilation Techniques (PACT'07)

2006

A Compiler Cost Model for Speculative Multithreading Chip-Multiprocessor Architectures
Jialin Dou, Ph.D Thesis (PHD)

Automatic Performance Model Construction for the Fast Software Exploration of New Hardware Designs
John Cavazos, Christophe Dubach, Felix Agakov, Edwin Bonilla, Michael O'Boyle, Grigori Fursin, Olivier Temam, In Proceedings of the 2006 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES'06)

Combining measurement and stochastic modelling to enhance scheduling decisions for a parallel Mean Value Analysis algorithm
Gagarine Yaikhom, Murray Cole, Stephen Gilmore, In Proceedings of the 2006 International Conference on Computational Science (ICCS'06)

Compiler-Directed Energy Savings in Superscalar Processors
Timothy Jones, Ph.D Thesis (PHD)

Hybrid Optimizations: Which Optimization Algorithm to Use?
John Cavazos, J. Eliot Moss, Michael O'Boyle, In Proceedings of the 2006 International Conference on Compiler Construction (CC'06)

Iterative Collective Loop Fusion
Tom Ashby, Michael O'Boyle, In Proceedings of the 2006 International Conference on Compiler Construction (CC'06)

Message Passing with Communication Structures
Gagarine Yaikhom, Ph.D Thesis (PHD)

Method-Specific Dynamic Compilation using Logistic Regression
John Cavazos, Michael O'Boyle, In Proceedings of the 2006 Conference on Object-Oriented Programming Languages, Systems, and Applications (OOPSLA'06)

Predictive Search Distributions
Edwin Bonilla, Chris Williams, Felix Agakov, John Cavazos, John Thomson, Michael O'Boyle, In Proceedings of the 2006 International Conference on Machine Learning (ICML'06)

Quantifying Uncertainty in Points-To Relations
Constantino Ribeiro, Marcelo Cintra, In Proceedings of the 2006 Workshop on Languages and Compilers for Parallel Computing (LCPC'06)

Self-adaptive skeletal task farm for computational grids
Horacio Gonzalez-Velez, Parallel Computing 32(7-8), pp.479-490, 2006 (PC)

Towards fully adaptive pipeline parallelism for heterogeneous distributed environments
Horacio Gonzalez-Velez, Murray Cole, In Proceedings of the 2006 International Symposium on Parallel and Distributed Processing and Applications (ISPA'06)

Using Machine Learning to Focus Iterative Optimization
Felix Agakov, Edwin Bonilla, John Cavazos, Björn Franke, Grigori Fursin, Michael O'Boyle, John Thomson, Marc Toussaint, Chris Williams, In Proceedings of the 2006 International Symposium on Code Generation and Optimization (CGO'06)

2005

A Complete Compiler Approach to Auto-Parallelizing C Programs for Multi-DSP Systems
Björn Franke, Michael O'Boyle, In the 2005 IEEE Transactions on Parallel and Distributed Systems 16(3), pp.234-245, March 2005 (TPDS'05)

A Practical Method For Quickly Evaluating Program Optimizations
Grigori Fursin, Albert Cohen, Michael O'Boyle, Olivier Temam, In Proceedings of the 2005 International Conference on High Performance Embedded Architectures & Compilers (HiPEAC'05)

A heuristic search algorithm based on Unified Transformation Framework
Shun Long, Grigori Fursin, In Proceedings of the 2005 International Workshop on High Performance Scientific and Engineering Computing (HPSEC'05)

Automatic Tuning of Inlining Heuristics
John Cavazos, Michael O'Boyle, In Proceedings of the 2005 International Conference for High Performance Computing, Networking, and Storage (SC'05)

Compiler Directed Early Register Release
Timothy Jones, Michael O'Boyle, Jaume Abella, Antonio González, Oğuz Ergin, In Proceedings of the 2005 International Conference on Parallel Architectures and Compilation Techniques (PACT'05)

Design Space Exploration of a Software Speculative Parallelization Scheme
Marcelo Cintra, Diego Llanos, In the 2005 IEEE Transactions on Parallel and Distributed Systems 16(6), pp.562-576, June 2005 (TPDS'05)

IATAC: A Smart Predictor to Turn-Off L2 Cache Lines
Jaume Abella, Antonio González, Xavier Vera, Michael O'Boyle, In the ACM Transactions on Architecture and Code Optimization 2(1), pp.55-77, March 2005 (TACO)

Probabilistic Source-Level Optimisation of Embedded Programs
Björn Franke, Michael O'Boyle, John Thomson, Grigori Fursin, In Proceedings of the 2005 Conference on Languages, Compilers and Tools for Embedded Systems (LCTES'05)

Software Directed Issue Queue Power Reduction
Timothy Jones, Michael O'Boyle, Jaume Abella, Antonio González, In Proceedings of the 2005 International Symposium on High Performance Computer Architecture (HPCA'05)

2004

A comparative study of intrinsic parallel programming methodologies
Horacio Gonzalez-Velez, A. de Luca, Virginia Gonzalez-Velez, In Proceedings of the 2004 International Conference on Electrical and Electronics Engineering (ICEEE'04)

Adaptive Java Optimisation using Instance-based Learning.
Shun Long, Michael O'Boyle, In Proceedings of the 2004 ACM International Conference on Supercomputing (ICS'04)

Adaptive Java Optimisation using machine learning techniques
Shun Long, Ph.D Thesis (PHD)

Bringing Skeletons out of the Closet: A Pragmatic Manifesto for Skeletal Parallel Programming
Murray Cole, Parallel Computing 30(3), pp.389-406, 2004 (PC)

Compilation Techniques for High-Performance Embedded Systems with Multiple Processors
Björn Franke, Ph.D Thesis (PHD)

Compiler Estimation of Load Imbalance Overhead in Speculative Parallelization.
Jialin Dou, Marcelo Cintra, In Proceedings of the 2004 International Conference on Parallel Architectures and Compilation Techniques (PACT'04)

Cross Component Optimisation in a High Level Category-Based Language.
Tom Ashby, A. Kennedy, Michael O'Boyle, In Proceedings of the 2004 Euro-Par Parallel Processing Conference (Euro-Par'04)

Evaluating the performance of skeleton-based high level parallel programs
Anne Benoit, Murray Cole, Stephen Gilmore, Jane Hillston, In Proceedings of the 2004 International Conference on Computational Science (ICCS'04)

Fast and Accurate Method for Determining a Lower Bound on Execution Time.
Grigori Fursin, Michael O'Boyle, Olivier Temam, Gregory Watts, Concurrency and Computation: Practice and Experience 16(2-3), pp.271-292, February 2004 (CC: P+E)

Iterative Compilation and Performance Prediction for Numerical Applications
Grigori Fursin, Ph.D Thesis (PHD)

Speculative Parallelization of a Randomized Incremental Convex Hull Algorithm.
Marcelo Cintra, Diego Llanos, Belén Palop, In Proceedings of the 2004 Workshop on Computational Geometry and Applications (CGA'04)

The Effect of Cache Models on Iterative Compilation for Combined Tiling and Unrolling.
Peter Knijnenburg, Toru Kisuki, Kyle Gallivan, Michael O'Boyle, Concurrency and Computation: Practice and Experience 16(2-3), pp.247-270, February 2004 (CC: P+E)

Why Skeletal Parallel Programming Matters
Murray Cole, In Proceedings of the 2004 Euro-Par Parallel Processing Conference (Euro-Par'04)

2003

Array recovery and high-level transformations for DSP applications.
Björn Franke, Michael O'Boyle, In the 2003 ACM Transactions on Embedded Computing Systems 2(2), pp.132-162, May 2003 (TECS'03)

Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation.
Peter Knijnenburg, Toru Kisuki, Michael O'Boyle, Journal of Supercomputing 24(1), pp.43-67, January 2003 (JoSC)

Combining Program Recovery, Auto-parallelisation and Locality Analysis for C programs on Multi-processor Embedded Systems.
Björn Franke, Michael O'Boyle, In Proceedings of the 2003 International Conference on Parallel Architectures and Compilation Techniques (PACT'03)

Compiler Parallelization of C Programs for Multi-Core DSPs with Multiple Address Spaces.
Björn Franke, Michael O'Boyle, In Proceedings of the 2003 International Conference on Hardware/software Codesign and System Synthesis (CODES+ISSS'03)

Toward Efficient and Robust Software Speculative Parallelization in Multiprocessors.
Marcelo Cintra, Diego Llanos, In Proceedings of the 2003 Symposium on Principles and Practice of Parallel Programming (PPoPP'03)

Towards general and exact distributed invalidation.
Michael O'Boyle, Rupert Ford, Elena Stöhr, Journal of Parallel and Distributed Computing 63(11), pp.1123-1137, November 2003 (JoPDC)

2002

Automated Cost Analysis of a Parallel Maximum Segment Sum Program Derivation
Yasushi Hayashi, Murray Cole, Parallel Processing Letters 12(1), pp.95-112, 2002 (PPL)

Compile Time Barrier Synchronisation Minimisation.
Michael O'Boyle, Elena Stöhr, In the 2002 IEEE Transactions on Parallel and Distributed Systems 13(6), pp.529-543, June 2002 (TPDS'02)

Eliminating Squashes Through Learning Cross-Thread Violations in Speculative Parallelization for Multiprocessors.
Marcelo Cintra, Josep Torrellas, In Proceedings of the 2002 International Symposium on High Performance Computer Architecture (HPCA'02)

Integrating Loop and Data Transformations for Global Optimisation.
Michael O'Boyle, Peter Knijnenburg, Journal of Parallel and Distributed Computing 62, pp.563-590, April 2002 (JoPDC)

Iterative Compilation
Grigori Fursin, Michael O'Boyle, Peter Knijnenburg, In Proceedings of the 2002 Workshop on Languages and Compilers for Parallel Computing (LCPC'02)

Static Performance Prediction of Skeletal Programs
Yasushi Hayashi, Murray Cole, Parallel Algorithms and Applications 17(1), pp.59-84, 2002 (PAA)

The Integration of Task and Data Parallel Skeletons
Herbert Kuchen, Murray Cole, Parallel Processing Letters 12(2), pp.141-156, 2002 (PPL)

2001

An Empirical Evaluation of High Level Transformations for Embedded Processors.
Björn Franke, Michael O'Boyle, In Proceedings of the 2001 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES'01)

Compiler Transformation of Pointers to Explicit Array Accesses in DSP Applications.
Björn Franke, Michael O'Boyle, In Proceedings of the 2001 International Conference on Compiler Construction (CC'01)

Coordinating Heterogeneous Parallel Systems with Skeletons and Activity Graphs
Murray Cole, Andrea Zavanella, Journal of Systems Integration 10(2), pp.127-143, 2001 (JoSI)

Towards Automatic Parallelisation for Multi-Processor DSPs.
Björn Franke, Michael O'Boyle, In Proceedings of the 2001 International Workshop on Software and Compilers for Embedded Systems (SCOPES'01)

Towards an Adaptive Java Optimising Compiler: An Empirical Evaluation of Program Transformations.
Shun Long, Michael O'Boyle, In Proceedings of the 2001 Workshop on Java for High Performance Computing (ACM ICS) (ICS-JHPC)

2000

Activity Graphs: A Model-Independent Intermediate Layer for Skeletal Co-ordination
Murray Cole, Andrea Zavanella, In Proceedings of the 2000 ACM Symposium on Applied Computing (SAC'00)

Architectural Support for Scalable Speculative Parallelization in Shared-Memory Multiprocessors.
Marcelo Cintra, José Martinez, Josep Torrellas, In Proceedings of the 2000 International Symposium on Computer Architecture (ISCA'00)

Automatic Array Access Recovery in Pointer based DSP Codes.
Björn Franke, Michael O'Boyle, In Proceedings of the 2000 Workshop on Media Processors and DSPs (MP-DSP'00)

Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation.
Toru Kisuki, Peter Knijnenburg, Michael O'Boyle, In Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques (PACT'00)

Exact Distributed Invalidation
Rupert Ford, Elena Stöhr, Michael O'Boyle, In Proceedings of the 2000 Euro-Par Parallel Processing Conference (Euro-Par'00)

Frame: An Imperative Coordination Language for Parallel Programming
Murray Cole, University of Edinburgh Technical Report EDI-INF-RR0026 (UoE-TR)

The Effect of Cache Models on Iterative Compilation for Combined Tiling and Unrolling.
Peter Knijnenburg, Toru Kisuki, Kyle Gallivan, Michael O'Boyle, In Proceedings of the 2000 Workshop on Feedback Directed and Dynamic Optimization (FDDO'00)

1999

A Feasibility Study in Iterative Compilation.
Toru Kisuki, Peter Knijnenburg, Michael O'Boyle, François Bodin, Harry Wijshoff, In Proceedings of the 1999 International Symposium on High Performance Computing (ISHPC'99)

BSP-based Cost Analysis of Skeletal Programs
Yasushi Hayashi, Murray Cole, In Proceedings of the 1999 Scottish Workshop on Functional Programming (FP'99)

Efficient Parallelization using Combined Loop and Data Transformations.
Michael O'Boyle, Peter Knijnenburg, In Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques (PACT'99)

Excel-NUMA: Toward Programmability, Simplicity, and High Performance.
Zheng Zhang, Marcelo Cintra, Josep Torrellas, IEEE Transactions on Computers, Special Issue on Cache Memory and Related Problems 48(2), pp.256-264, February 1999 (IEEE-TC:CMRR)

Non-singular Data Transformations: Definition, Validity, Applications
Michael O'Boyle, Peter Knijnenburg, International Journal of Parallel Programming 27(3), pp.131-159, June 1999 (IJPP)

OCEANS: Optimizing Compilers for Embedded Applications
Bas Aarts, Michel Barreteau, François Bodin, Peter Brinkhaus, Zbigniew Chamski, Henri-Pierre Charles, Christine Eisenbeis, John Gurd, Jan Hoogerbrugge, Ping Hu, William Jalby, Peter Knijnenburg, Michael O'Boyle, Erven Rohou, Rizos Sakellariou, Henk Schepers, André Seznec, Elena Stöhr, Marco Verhoeven, Harry Wijshoff, In Proceedings of the 1999 Euro-Par Parallel Processing Conference (Euro-Par'99)

1998

First Fast Sink: A compiler algorithm for barrier placement optimisation
Elena Stöhr, Michael O'Boyle, Future Generation Computer Systems 13(4-5), March 1998 (FGCS)

Integrating Loop and Data Transformations for Global Optimisation.
Michael O'Boyle, Peter Knijnenburg, In Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques (PACT'98)

Iterative Compilation in a Non-linear Optimisation Space.
François Bodin, Toru Kisuki, Peter Knijnenburg, Michael O'Boyle, Erven Rohou, In Proceedings of the 1998 Workshop on Profile and Feedback Directed Compilation (PFDC'98)

MARS: A Distributed Memory Approach to Shared Memory Compilation Languages.
Michael O'Boyle, In Proceedings of the 1998 Workshop on Language, Compilers and Runtime Systems for Scalable Computing (LCR'98)

OCEANS: Optimizing Compilers for Embedded Applications
Michel Barreteau, François Bodin, Peter Brinkhaus, Zbigniew Chamski, Henri-Pierre Charles, Christine Eisenbeis, John Gurd, Jan Hoogerbrugge, Ping Hu, William Jalby, Peter Knijnenburg, Michael O'Boyle, Erven Rohou, Rizos Sakellariou, André Seznec, Elena Stöhr, Menno Treffers, Harry Wijshoff, In Proceedings of the 1998 Euro-Par Parallel Processing Conference (Euro-Par'98)

1997

A Graph Based Approach to Minimising Barrier Synchronisation.
Elena Stöhr, Michael O'Boyle, In Proceedings of the 1997 ACM International Conference on Supercomputing (ICS'97)

A Monadic Calculus for Parallel Costing of a Functional Language of Arrays
C. Jay, Murray Cole, M. Sekanina, P. Steckler, In Proceedings of the 1997 Euro-Par Parallel Processing Conference (Euro-Par'97)

Barrier Synchronisation Minimisation
Elena Stöhr, Michael O'Boyle, International Journal of High Performance Computing and Networking (HPCN)

Non-Singular Data Transformations: Definition, Validity and Application.
Michael O'Boyle, Peter Knijnenburg, In Proceedings of the 1997 ACM International Conference on Supercomputing (ICS'97)

OCEANS: Optimizing Compilers for Embedded Applications
Bas Aarts, Michel Barreteau, François Bodin, Peter Brinkhaus, Zbigniew Chamski, Henri-Pierre Charles, Christine Eisenbeis, John Gurd, Jan Hoogerbrugge, Ping Hu, William Jalby, Peter Knijnenburg, Michael O'Boyle, Erven Rohou, Rizos Sakellariou, Henk Schepers, André Seznec, Elena Stöhr, Marco Verhoeven, Harry Wijshoff, In Proceedings of the 1997 Euro-Par Parallel Processing Conference (Euro-Par'97)

On Dividing and Conquering Independently
Murray Cole, In Proceedings of the 1997 Euro-Par Parallel Processing Conference (Euro-Par'97)

Prefetching and Multithreading Performance on a Bus-based Multiprocessor with Petri Nets
Edward Moreno, Marcelo Cintra, Sergio Kofuji, In Proceedings of the 1997 Euro-Par Parallel Processing Conference (Euro-Par'97)

Recursive 3D Mesh Indexing with Improved Locality
George Chochia, Murray Cole, International Journal of High Performance Computing and Networking (HPCN)