
Institute for Computing Systems Architecture

Computer Architecture Simulation & Visualisation


The QCD Computer Simulation Project (EPSRC GR/R/27129)


Project Members


Overall Project Aims

The overall aim of the QCD Computer Simulation project is to gain an understanding of the factors which influence the performance of QCD computers and to influence future high performance system designs. This will be achieved through the construction of a parameterised simulation model, in HASE (a Hierarchical computer Architecture design and Simulation Environment), of the proposed UKQCD computer, which can be used to explore design trade-offs and to evaluate and optimise performance. The project will also involve further development of the HASE environment itself.

HASE

HASE is a Hierarchical computer Architecture design and Simulation Environment developed at the University of Edinburgh. The main goal of the HASE project has been to provide computer architects with a set of tools that allow the rapid development and exploration of system designs. The initial ideas were investigated in a PhD thesis, but HASE has evolved significantly as a result of the requirements of the various projects for which it has been used, and it is now a sophisticated and relatively stable environment. HASE addresses the fourth of the "Grand Challenge Problems in Computer Architecture" identified at the 1992 Purdue Workshop: "to develop sufficient infrastructure to allow rapid prototyping of hardware ideas and the associated software in a way that permits realistic evaluation".

HASE includes facilities to allow a simulation model to be hierarchically structured (to reflect various levels of architectural/problem abstraction) and the visual verification of a model's behaviour via an animation of the simulation design window. Separate windows can be opened to allow the (changing) contents of memory and registers to be viewed.

HASE has been used for a variety of research projects. For example, simulation models of the DASH and DLX architectures have been created and their robustness tested through use in student exercises. HASE itself was largely developed as part of an EPSRC funded project (GR/J4329) "Algorithms, Architectures and Models of Computation", an investigation of parallel architectures to support the HPRAM model of computation, and was subsequently used by an EPSRC funded PhD student to create models of shared memory multiprocessor systems. A Java version (Simjava) was used in an EPSRC funded project (GR/K19716) to evaluate multiprocessor interconnection networks.

Quantum Chromodynamics

Quantum Chromodynamics (QCD) describes theoretically the strong interactions between quarks and gluons. One of the essential features of QCD is that these elementary particles are always bound together, confined inside mesons and baryons, collectively called hadrons. This provides a challenge in relating theoretical and practical results, since the Standard Model of particle physics describes the interactions of the quarks and gluons, not of the experimentally observed hadrons.

To relate the experimental observations to the predictions from the Standard Model thus needs detailed evaluation of the hadronic structure, relating the quark constituents to the observed hadronic properties in a precise way. The only theoretical method to achieve this, with full control of all sources of error, is via large-scale numerical simulation: lattice QCD.

The UK QCD Collaboration

The UKQCD collaboration is one of the leading lattice QCD projects in the world, having pioneered many successful applications to particle physics phenomenology. It has recently been awarded JIF funding to build the fastest computer in the world for simulating strong interactions.

The UKQCD machine will be based on the Columbia QCDOC architecture. QCDOC is a natural evolution of the massively parallel QCDSP machine which won the 1998 IEEE Gordon Bell prize for the best price/performance high-end computer. The individual processing nodes in QCDOC will be PowerPC-based and interconnected in a 4-dimensional mesh with the topology of a torus. Each node in QCDOC will be a single application-specific integrated circuit (ASIC) containing a 500 MHz PowerPC 440 processor core with a 1 Gflops, 64-bit floating point unit and 4 MBytes of on-chip memory, together with a Direct Memory Access (DMA) unit for moving data between on-chip and external memory. It will also contain circuitry to support internode communication and an Ethernet controller for a boot-diagnostic-I/O network.

Each processor will be capable of sending data to and receiving data from each of its eight nearest neighbours in four dimensions at a rate of 500 Mbit/sec. This will provide a total off-node bandwidth of 8 Gbit/sec. Each of these 16 communication channels will have its own DMA capability, allowing autonomous reads/writes from either on-chip or external memory. As in the QCDSP machines, an efficient and low-latency global sum, global max and broadcast capability will be incorporated into the serial communication.
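The node counts above follow from the torus topology: two neighbours per dimension gives eight neighbours, and a send plus a receive channel per neighbour gives sixteen 500 Mbit/sec channels, i.e. 8 Gbit/sec off-node. A minimal sketch of this nearest-neighbour addressing (the coordinates and machine dimensions below are illustrative, not the actual UKQCD configuration):

```python
# Nearest-neighbour addressing on a 4-dimensional torus, as used by the
# QCDOC interconnect. The 4x4x4x4 machine size is an illustrative assumption.

def torus_neighbours(coord, dims):
    """Return the 8 nearest neighbours of a node on a 4-D torus.

    coord -- tuple of 4 coordinates, one per dimension
    dims  -- tuple of 4 torus extents (wrap-around in each dimension)
    """
    result = []
    for d in range(4):
        for step in (-1, +1):
            n = list(coord)
            n[d] = (n[d] + step) % dims[d]   # wrap-around gives the torus topology
            result.append(tuple(n))
    return result

# Each node drives one send and one receive channel per neighbour:
LINK_RATE_MBIT = 500
neighbours = torus_neighbours((0, 0, 0, 0), (4, 4, 4, 4))
channels = 2 * len(neighbours)                     # 16 DMA-driven channels per node
total_gbit = channels * LINK_RATE_MBIT / 1000      # 8 Gbit/sec off-node bandwidth
```

Note how the modulo arithmetic makes node (0, 0, 0, 0) adjacent to (3, 0, 0, 0): the mesh wraps around in every dimension, which is what distinguishes a torus from a plain mesh.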

Research Methodology

A variety of techniques can be used to explore design trade-offs in a computer architecture and to evaluate its performance. Evaluating the performance of an existing system is in principle straightforward, since benchmarks can be run and the execution time measured. However, without extensive instrumentation (possibly involving both hardware and software, which can itself influence the results), this provides little insight into the causes of performance limitations. Furthermore, it offers very little opportunity to measure the effects of varying architectural design parameters. One alternative is to use analytical modelling. This has been done effectively for a variety of multiprocessor system components, but the models are usually driven by workload models rather than benchmarks or real applications, and the results can be unreliable, particularly if attempts are made to model complex systems containing a variety of components which need different kinds of workload model. Simulation has therefore become a popular approach for predicting the performance of complex systems.

The results from a simulation are only meaningful if the model is accurate and the input data representative. The more detailed the model, the more likely it is to be accurate but, unfortunately, the longer it will take to run. Thus in building large complex models it is essential to use more abstract models of the various components. This allows meaningful results to be obtained provided that the more abstract models have been individually checked against the corresponding detailed models. This is the methodology that will be used here. For example, an instruction set model of the PowerPC will be created and sequences of assembly code representing key components of the QCD code run on it to determine their execution times. Of particular interest will be the times to execute sequences of code between communication events. These times will then be used as inputs to a simpler processor model which will simply issue communication events at appropriate intervals.
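The two-level methodology above can be sketched as a small discrete-event loop: compute intervals measured on the detailed instruction-set model are replayed by a simple processor model that does nothing but issue communication events at those intervals. The interval values below are invented placeholders, not measured PowerPC timings.

```python
import heapq

# Abstract processor model: each processor replays a list of measured
# compute times between communication events, and the events from all
# processors are merged into one global, time-ordered stream.

def run_abstract_model(compute_intervals):
    """Replay per-processor compute intervals, returning (time, proc) comm events."""
    events = []                          # min-heap ordered by simulated time
    for proc, intervals in enumerate(compute_intervals):
        t = 0.0
        for dt in intervals:             # dt = measured compute time between comms
            t += dt
            heapq.heappush(events, (t, proc))
    log = []
    while events:
        log.append(heapq.heappop(events))  # drain events in global time order
    return log

# Two processors with (placeholder) measured gaps between communication events:
log = run_abstract_model([[1.0, 2.0], [1.5, 1.5]])
```

Because the abstract model carries only event times, not instruction streams, it can scale to many processors at a cost the detailed model could not sustain.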

This simpler, more abstract model could be trace-driven or distribution-driven, i.e. the information gained from running code on the detailed model could be used to create a trace of events which is then used to drive the more abstract model, or alternatively the abstract model could be one that generates communication events using a built-in event generator with a distribution profile tailored to match the profile of the detailed model. The selection of the most appropriate model will depend on the outcome of the simulation experiments undertaken.
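The two driving styles can be placed side by side in a short sketch. The exponential distribution and its fitted mean below are illustrative assumptions; in practice the distribution profile would be fitted to whatever the detailed model actually produces.

```python
import random

# Trace-driven vs distribution-driven event generation, as described above.

def trace_driven(trace):
    """Replay inter-event gaps exactly as recorded from the detailed model."""
    for gap in trace:
        yield gap

def distribution_driven(mean_gap, n_events, seed=0):
    """Generate inter-event gaps from a fitted distribution profile.

    An exponential profile is an assumption for illustration; the real
    profile would be tailored to match the detailed model's behaviour.
    """
    rng = random.Random(seed)            # seeded for reproducible experiments
    for _ in range(n_events):
        yield rng.expovariate(1.0 / mean_gap)

recorded = [1.0, 2.0, 1.5, 2.5]                  # gaps taken from the detailed model
mean_gap = sum(recorded) / len(recorded)
synthetic = list(distribution_driven(mean_gap, 1000))
```

The trace-driven form reproduces the detailed model's behaviour exactly but only for the runs that were recorded; the distribution-driven form trades that fidelity for the ability to generate arbitrarily long, statistically similar workloads.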

Programme of Work

The first phase of the project will involve the creation of simulation models of the various components of the QCDOC computer. This work will draw on existing HASE model components wherever possible. The Power PC core, for example, is a typical RISC architecture, so the HASE DLX model will be re-coded to create a model of the Power PC architecture. Known Power PC execution times for individual instructions will be included in the model. Other components such as the DMA devices, floating-point units, etc, will be created as required. The model will be tested by running small test code sequences on the model and observing the results via the HASE animation facilities.

The second phase will involve running sequences of assembly code representing key components of the QCD code on a small scale model (with a few processors) to determine their execution times. Of particular interest will be the times to execute sequences of code between communication events, since these, together with the times required to communicate data between processors, will allow more abstract models of the various components to be produced; such models will run more efficiently when large numbers of processors are assembled together.

Each of the components will be parameterised to allow the effect on performance of varying processing speeds, bus times, cache/memory sizes, interconnection network speeds and protocols, etc., to be measured. In particular, the sensitivity of overall performance to any one of the various timing parameters will be assessed by setting each in turn to zero. Some development of the HASE environment to provide enhanced facilities to enable these experiments will also be undertaken.
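The zeroing experiment described above can be sketched in a few lines. The cost model here is a made-up stand-in for a full HASE simulation run, and the parameter names and weights are invented for illustration.

```python
# Sensitivity experiment: zero each timing parameter in turn and re-run
# the model; the drop in total run time measures that parameter's
# contribution to overall performance.

def run_model(params):
    """Toy stand-in for a simulation run: total time as a weighted sum of
    (hypothetical) per-event counts times per-event costs."""
    weights = {"cpu_cycle": 1000, "mem_access": 400, "link_latency": 250}
    return sum(weights[k] * v for k, v in params.items())

def sensitivity(params):
    """Run time reduction when each timing parameter is set to zero in turn."""
    full = run_model(params)
    result = {}
    for name in params:
        zeroed = dict(params, **{name: 0.0})      # zero just this parameter
        result[name] = full - run_model(zeroed)   # its contribution to run time
    return result

baseline = {"cpu_cycle": 2.0, "mem_access": 10.0, "link_latency": 50.0}
contributions = sensitivity(baseline)
```

A real run would replace `run_model` with a HASE simulation of the QCD code; the surrounding loop, which isolates one parameter at a time, is the part the text describes.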

The results of these experiments will be a series of graphs showing the run time of meaningful sections of the QCD code against variations in each of the variable parameters in the model. In addition, the HASE animation facility, which can display the busy/idle state of each component, will be used to gain insight into potential inefficiencies in the utilisation of the various components. This will highlight not only areas of architectural design to be investigated in the third phase of the project, but will also enable performance optimisation of code for the system currently being built to be investigated.

The third phase of the project will use the information gained in the second phase to investigate the effects on overall performance of using alternative architectural configurations, particularly where different software/hardware trade-offs seem appropriate, and/or alternative software structures.


Related Publications


HASE Project
Institute for Computing Systems Architecture, School of Informatics, University of Edinburgh
Last change 3/04/2003



Unless explicitly stated otherwise, all material is copyright © The University of Edinburgh.