
Computer Architecture Simulation & Visualisation
The QCD Computer Simulation Project (EPSRC GR/R/27129)
Overall Project Aims
The overall aim of the QCD Computer Simulation project is to gain an
understanding of the factors which influence the performance of QCD
computers and to influence future high performance system
designs. This will be achieved through the construction of a
parameterised simulation model in HASE (a Hierarchical computer
Architecture Design and Simulation Environment) of the proposed UKQCD
computer, which can be used:
- to investigate the factors which will influence the performance of
this computer
- to explore the design parameter space of the model, investigating
variations in performance against a range of architectural parameters,
in order to inform the design of subsequent generations of such
computers
The project will also involve:
- Developing new visualisation mechanisms within HASE to allow
tailored views of system performance appropriate to computer
architects and compiler writers to be presented
- Developing a WWW interface to HASE to allow external users to
experiment with the model as a Grid resource.
HASE
HASE is a Hierarchical computer Architecture design and Simulation
Environment developed at the University of Edinburgh. The main goal
of the HASE project has been to provide computer architects with a set
of tools that allow the rapid development and exploration of system
designs. The initial ideas were investigated in a PhD thesis, but it
has evolved significantly as a result of the requirements of the
various projects for which it has been used, and it is now a
sophisticated and relatively stable environment. HASE addresses the
fourth of the "Grand Challenge Problems in Computer Architecture"
identified at the 1992 Purdue Workshop "to develop sufficient
infrastructure to allow rapid prototyping of hardware ideas and the
associated software in a way that permits realistic evaluation".
HASE includes facilities to allow a simulation model to be
hierarchically structured (to reflect various levels of
architectural/problem abstraction) and the visual verification of a
model's behaviour via an animation of the simulation displayed in the
design window. Separate windows can be opened to allow the (changing)
contents of memory and registers to be viewed.
HASE has been used for a variety of research projects. For example,
simulation models of the DASH and DLX architectures have been created
and their robustness tested through use in student exercises.
HASE itself was largely developed as part of an EPSRC funded project
(GR/J4329) "Algorithms, Architectures and Models of Computation", an
investigation of parallel architectures to support the HPRAM model of
computation, and was subsequently used by an EPSRC funded PhD student
to create models of shared memory multiprocessor systems. A Java
version (Simjava) was used in an EPSRC funded project (GR/K19716) to
evaluate multiprocessor interconnection networks.
Quantum Chromodynamics
Quantum Chromodynamics (QCD) is the theory that describes the strong
interactions between quarks and gluons. One of the essential features
of QCD is that these elementary particles are always bound together,
confined inside mesons and baryons, collectively called hadrons. This
provides a challenge in relating theoretical and practical results,
since the Standard Model of particle physics describes the
interactions of the quarks and gluons, not of the experimentally
observed hadrons.
To relate the experimental observations to the predictions from the
Standard Model thus requires detailed evaluation of the hadronic
structure, relating the quark constituents to the observed hadronic
properties in a precise way. The only theoretical method to achieve
this, with full control of all sources of error, is via large-scale
numerical simulation: lattice QCD.
The UK QCD Collaboration
The UKQCD collaboration is one of the leading lattice QCD projects in
the world, having pioneered many successful applications to particle
physics phenomenology. It has recently been awarded JIF funding to
build the fastest computer in the world for simulating strong
interactions.
The UKQCD machine will be based on the Columbia QCDOC architecture.
QCDOC is a natural evolution of the massively parallel QCDSP machine
which won the 1998 IEEE Gordon Bell prize for the best
price/performance high-end computer. The individual processing nodes
in QCDOC will be PowerPC-based and interconnected in a 4-dimensional
mesh with the topology of a torus. Each node in QCDOC will be a single
application-specific integrated circuit (ASIC) containing a 500 MHz
PowerPC 440 processor core with a 1 Gflops, 64-bit floating point unit
and 4 MBytes of on-chip memory, together with a Direct Memory Access
(DMA) unit for moving data between on-chip and external memory. It
will also contain circuitry to support internode communication and an
Ethernet controller for a boot-diagnostic-I/O network.
Each processor will be capable of sending and receiving data from each
of its eight nearest neighbours in four dimensions at a rate of 500
Mbit/sec. This will provide a total off-node bandwidth of 8 Gbit/sec.
Each of these 16 communication channels will have its own DMA
capability allowing autonomous reads/writes from either on-chip or
external memory. As in the QCDSP machines, an efficient and
low-latency global sum, global max and broadcast capability will be
incorporated into the serial communication.
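As an illustration of this interconnect, the short C++ sketch below
computes the eight nearest neighbours of a node in a 4-dimensional
torus and checks the quoted off-node bandwidth figure. The machine
extents and node coordinates are invented for the example and are not
taken from the QCDOC design.

    #include <array>
    #include <cstdio>

    // Coordinates of a node in a 4-D torus with extent dims[d] in dimension d.
    using Coord = std::array<int, 4>;

    // Return the 8 nearest neighbours of node c: +1 and -1 in each of the
    // four dimensions, with wrap-around (torus) boundary conditions.
    std::array<Coord, 8> torusNeighbours(const Coord& c, const Coord& dims) {
        std::array<Coord, 8> nbrs{};
        for (int d = 0; d < 4; ++d) {
            Coord plus = c, minus = c;
            plus[d]  = (c[d] + 1) % dims[d];
            minus[d] = (c[d] - 1 + dims[d]) % dims[d];
            nbrs[2 * d]     = plus;
            nbrs[2 * d + 1] = minus;
        }
        return nbrs;
    }

    int main() {
        Coord dims = {4, 4, 4, 4};   // illustrative 4x4x4x4 machine, not the real extents
        Coord node = {0, 0, 0, 0};
        for (const Coord& n : torusNeighbours(node, dims))
            std::printf("(%d,%d,%d,%d)\n", n[0], n[1], n[2], n[3]);

        // 8 neighbours, each with a send and a receive channel at 500 Mbit/s,
        // gives the 16 channels and the 8 Gbit/s off-node total quoted above.
        std::printf("off-node bandwidth: %.0f Gbit/s\n", 16 * 500e6 / 1e9);
    }

Each of the eight neighbours contributes one send and one receive
channel, which is where the 16 DMA-capable channels come from.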
Research Methodology
A variety of techniques can be used to explore design trade-offs in a
computer architecture and to evaluate its performance. Evaluating the
performance of an existing system is in principle straightforward,
since benchmarks can be run and the execution time measured. However,
without extensive instrumentation (possibly involving both hardware
and software, which can itself influence the results), this provides
little insight into the causes of performance limitations.
Furthermore, it offers very little opportunity to measure the effects
of varying architectural design parameters. One alternative is to use
analytical modelling. This has been done effectively for a variety of
multiprocessor system components but the models are usually driven by
workload models rather than benchmarks or real applications and the
results can be unreliable, particularly if attempts are made to model
complex systems containing a variety of components which need different
kinds of workload model. Simulation has therefore become a popular
approach for predicting performance of complex systems.
The results from a simulation are only meaningful if the model is
accurate and the input data is meaningful. The more detailed the
model, the more likely it is to be accurate but, unfortunately, the
longer it will take to run. Thus in building large complex models it
is essential to use more abstract models of the various components.
This allows meaningful results to be obtained provided that the more
abstract models have been individually checked against the
corresponding detailed models. This is the methodology that will be
used here. For example, an instruction set model of the PowerPC will
be created, and sequences of assembly code representing key components
of the QCD code will be run on it to determine their execution times. Of
particular interest will be the times to execute sequences of code
between communication events. These times will then be used as inputs
to a simpler processor model which will simply issue communication
events at appropriate intervals.
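A minimal sketch of such an abstract model, written here in C++ for
concreteness, is given below. It assumes a central time-ordered event
queue and a small number of nodes; the compute intervals between
communication events are invented placeholders standing in for times
measured on the detailed model.

    #include <cstdio>
    #include <queue>
    #include <vector>

    // A communication event issued by an abstract processor node.
    struct Event { double time; int node; };
    struct Later {
        bool operator()(const Event& a, const Event& b) const { return a.time > b.time; }
    };

    int main() {
        // Intervals (in cycles) between communication events, standing in for
        // times measured by running QCD kernels on the detailed model.
        std::vector<double> intervals = {1200, 850, 850, 2400, 850};

        const int nodes = 4;                      // small illustrative machine
        std::vector<std::size_t> next(nodes, 0);  // next interval index per node

        // Central time-ordered event queue, as in any discrete-event simulator.
        std::priority_queue<Event, std::vector<Event>, Later> queue;
        for (int n = 0; n < nodes; ++n)
            queue.push({intervals[0], n});

        while (!queue.empty()) {
            Event e = queue.top();
            queue.pop();
            std::printf("t=%8.0f  node %d issues a communication event\n", e.time, e.node);
            if (++next[e.node] < intervals.size())
                queue.push({e.time + intervals[next[e.node]], e.node});
        }
    }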
This simpler, more abstract model could be trace-driven or
distribution-driven, i.e. the information gained from running code on
the detailed model could be used to create a trace of events which is
then used to drive the more abstract model, or alternatively the
abstract model could be one that generates communication events using a
built-in event generator with a distribution profile tailored to match
the profile of the detailed model. The selection of the most
appropriate model will depend on the outcome of the simulation
experiments undertaken.
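Extending the sketch above, the two alternatives can be captured as
interchangeable interval sources behind a common interface, as in the
C++ sketch below. The lognormal distribution and its parameters are
assumptions made only to give the example concrete values; the actual
profile would be fitted to measurements from the detailed model.

    #include <cstdio>
    #include <random>
    #include <utility>
    #include <vector>

    // Common interface: the gap (in cycles) before the next communication event.
    struct IntervalSource {
        virtual double next() = 0;
        virtual ~IntervalSource() = default;
    };

    // Trace-driven: replay intervals recorded from the detailed model.
    struct TraceSource : IntervalSource {
        std::vector<double> trace;
        std::size_t i = 0;
        explicit TraceSource(std::vector<double> t) : trace(std::move(t)) {}
        double next() override { return trace[i++ % trace.size()]; }
    };

    // Distribution-driven: draw intervals from a distribution whose parameters
    // are fitted to the profile of the detailed model (values assumed here).
    struct DistributionSource : IntervalSource {
        std::mt19937 rng{42};
        std::lognormal_distribution<double> dist{7.0, 0.4};  // assumed fit
        double next() override { return dist(rng); }
    };

    int main() {
        TraceSource trace({1200, 850, 850, 2400, 850});
        DistributionSource fitted;
        for (int k = 0; k < 5; ++k)
            std::printf("trace: %6.0f   distribution: %6.0f\n", trace.next(), fitted.next());
    }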
Programme of Work
The first phase of the project will involve the creation of simulation
models of the various components of the QCDOC computer. This work will
draw on existing HASE model components wherever possible. The PowerPC
core, for example, is a typical RISC architecture, so the HASE DLX
model will be re-coded to create a model of the PowerPC
architecture. Known PowerPC execution times for individual
instructions will be included in the model. Other components such as
the DMA devices, floating-point units, etc, will be created as
required. The model will be tested by running small test code
sequences on the model and observing the results via the HASE
animation facilities.
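To make this concrete, the sketch below shows the core of such a
timing model in C++: a table of per-instruction execution times
applied to a short test sequence. The opcode names and cycle counts
are placeholders rather than published PowerPC 440 timings, and a full
instruction set model would also have to account for pipeline and
memory effects.

    #include <cstdio>
    #include <map>
    #include <string>
    #include <vector>

    int main() {
        // Per-instruction execution times in cycles, as they might be tabulated
        // for the PowerPC model; placeholder values, not published 440 timings.
        std::map<std::string, int> cycles = {
            {"lfd", 2}, {"stfd", 2}, {"fmadd", 1}, {"fmul", 1}, {"add", 1}, {"bc", 2},
        };

        // A small test code sequence of the kind run on the model in phase one.
        std::vector<std::string> sequence = {
            "lfd", "lfd", "fmul", "fmadd", "stfd", "add", "bc",
        };

        long total = 0;
        for (const std::string& op : sequence)
            total += cycles.at(op);
        std::printf("sequence takes %ld cycles\n", total);
    }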
The second phase will involve running sequences of assembly code
representing key components of the QCD code on a small scale model
(with a few processors) to determine their execution times. Of
particular interest will be the times to execute sequences of code
between communication events: by also determining the times required
to communicate data between processors, more abstract models of the
various components can be produced, which will thus run more
efficiently when large numbers of processors are assembled together.
Each of the components will be parameterised to allow the effect on
performance of varying processing speeds, bus times, cache/memory
sizes, interconnection network speeds and protocols, etc., to be
measured. In particular, the sensitivity of overall performance to
any one of the various timing parameters will be assessed by setting
each in turn to zero. Some development of the HASE environment to
provide enhanced facilities to enable these experiments will also
be undertaken.
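The sketch below illustrates the sensitivity experiment. The additive
cost function standing in for a full simulation run, the parameter
names and all the timing values are assumptions made for the example;
in the project itself each evaluation would be a complete HASE
simulation of the QCD code.

    #include <cstdio>
    #include <map>
    #include <string>

    // Stand-in for one full simulation run: predicted run time as a function
    // of the timing parameters. An assumed additive cost model; in practice
    // this would be a complete HASE simulation.
    double simulate(const std::map<std::string, double>& p) {
        return 1000 * p.at("cpu_cycle") + 200 * p.at("mem_access")
             + 50 * p.at("dma_setup") + 120 * p.at("link_transfer");
    }

    int main() {
        // Illustrative timing parameters in ns; names and values are assumptions.
        std::map<std::string, double> base = {
            {"cpu_cycle", 2.0}, {"mem_access", 12.0},
            {"dma_setup", 30.0}, {"link_transfer", 16.0},
        };

        const double ref = simulate(base);
        std::printf("baseline run time: %.0f ns\n", ref);

        // Zero each timing parameter in turn and re-run; the saving shows how
        // sensitive overall performance is to that parameter.
        for (const auto& kv : base) {
            std::map<std::string, double> p = base;
            p[kv.first] = 0.0;
            double t = simulate(p);
            std::printf("%-14s zeroed: %5.0f ns (saving %4.1f%%)\n",
                        kv.first.c_str(), t, 100 * (ref - t) / ref);
        }
    }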
The results of these experiments will be a series of graphs showing
the run time of meaningful sections of the QCD code against variations
in each of the variable parameters in the model. In addition, the
HASE animation facility, which can display the busy/idle state of each
component, will be used to gain insight into potential inefficiencies
in the utilisation of the various components. This will highlight not
only areas of architectural design to be investigated in the third
phase of the project, but will also enable performance optimisation of
code for the system currently being built to be investigated.
The third phase of the project will use the information gained in the
second phase to investigate the effects on overall performance of
using alternative architectural configurations, particularly where
different software/hardware trade-offs seem appropriate, and/or
alternative software structures.
Related Publications
- H.J. Siegel, S. Abraham, et al., "Report of the Purdue Workshop on
Grand Challenges in Computer Architecture for the Support of High
Performance Computing", J. Parallel and Distributed Comput., Vol 16,
pp 199-211, 1992.
- P.S. Coe, F.W. Howell, R.N. Ibbett and L.M. Williams, "A Hierarchical
Computer Architecture Design and Simulation Environment", ACM TOMACS,
Vol 8, No 4, pp 431-446, 1998.
- R.N. Ibbett, "Computer Architecture Visualisation Techniques",
Microprocessors and Microsystems, Elsevier, Vol 23, pp 291-300, 1999.
- R.N. Ibbett, "Computer Architecture Visualisation: the HASE DLX
Simulation Model", IEEE Micro, Vol 20, No 3, pp 57-65, 2000.
- D. Chen, P. Chen, N. Christ, R. Edwards, G. Fleming, A. Gara,
S. Hansen, C. Jung, A. Kaehler, A. Kennedy, G. Kilcup, Y. Luo,
C. Malureanu, R. Mawhinney, J. Parsons, J. Sexton, C.Z. Sui and
P. Vranas, "QCDSP: A teraflop scale massively parallel supercomputer",
Technical Report, SCRI, 1997; ACM/IEEE SC97, 1997.
- D. Chen, P. Chen, N.H. Christ, R.G. Edwards, G. Fleming, A. Gara,
S. Hansen, C. Jung, A. Kahler, S. Kasow, A.D. Kennedy, G. Kilcup,
Y. Luo, C. Malureanu, R.D. Mawhinney, J. Parsons, C. Sui, P. Vranas and
Y. Zhestkov, "QCDSP: Design, Performance and Cost", ACM/IEEE SC97, 1998.
- D. Chen, N. Christ, Z. Dong, A. Gara, R. Mawhinney, S. Ohta and
T. Wettig, "QCDOC Architecture", Internal Report, Columbia University,
May 2000.
HASE Project
Institute for Computing Systems Architecture, School of Informatics,
University of Edinburgh