Computer Architecture Simulation & Visualisation

Stanford DASH Architecture: Cluster Simulation Model

The Stanford DASH architecture was designed to prove the feasibility of building a scaleable high performance machine with multiple coherent caches and a single address space. There are currently two HASE simulation models of parts of the DASH architecture, originally built in 1995/6 by Lawrence Williams as part of his MSc project: one modelling a single node and one modelling a cluster of four nodes. These models were designed to demonstrate the cache coherency protocols used in DASH [1]. The Cluster model demonstrates the snoopy bus protocol.

The files for the HASE DASH Cluster model can be downloaded as a zip file from dashclus_v2.8.zip

Instructions on how to use HASE models can be found at Downloading, Installing and Using HASE.

The Stanford DASH Architecture

The DASH architecture [2, 3] was built in the Computer Systems Laboratory at Stanford University. The main motivation underlying its inception was a desire to prove the feasibility of building a scaleable high performance machine with multiple coherent caches and a single address space. The intention was to produce a parallel architecture offering both ease of programmability (facilitated by the single address space) and very high performance (achieved by using hundreds to thousands of low-cost, high performance processors).

The DASH hardware is organised as a hierarchy in which sets of processing nodes are grouped together in clusters of four, connected together via a common bus. Clusters are then connected together by an interconnection network. The DASH cluster is based upon a modified version of the Silicon Graphics POWER Station 4D/340 [4], in which the major components are:

The simulation model of the processing node consists of the two data caches and a MIPS 'processor'. Rather than attempting to simulate the MIPS instruction set and run programs to generate cache addresses, this entity simply emits a sequence of addresses (each with its read/write status) held in a notional memory.

diagram of DASH node

Figure 1. The Stanford DASH Node

DASH Cluster Simulation Model

The DASH Cluster simulation model is shown in Figure 2. The model has a number of size and timing parameters as described below.

diagram of DASH cluster

Figure 2. The DASH Cluster Simulation Model

MIPS

The MIPS entity in the model contains an array of addresses to be sent to the Primary Cache. At the start of the simulation, the MIPS sends the first of these addresses, together with its Read/Write type, to the Primary Cache. Once the actions in the cache system are complete and the MIPS has received a reply, it sends the next address, and so on until it encounters an address of type z, the termination access type.
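The issue loop described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the HASE source: the names `issue_accesses`, `run_mips` and `send_to_cache` are hypothetical, and the trace is assumed to be a list of (address, type) pairs in which type is `r`, `w` or the terminating `z`.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Access = Tuple[int, str]  # (address, access type): 'r', 'w', or 'z' to stop


def issue_accesses(trace: Iterable[Access]) -> Iterator[Access]:
    """Yield accesses in order, stopping at the first entry of type 'z'."""
    for address, kind in trace:
        if kind == "z":
            return
        if kind not in ("r", "w"):
            raise ValueError(f"unknown access type: {kind!r}")
        yield address, kind


def run_mips(trace: Iterable[Access],
             send_to_cache: Callable[[int, str], str]) -> List[str]:
    """Send one access at a time; the next is sent only after a reply."""
    replies = []
    for address, kind in issue_accesses(trace):
        # send_to_cache stands in for the whole cache system (hypothetical)
        replies.append(send_to_cache(address, kind))
    return replies
```

Because `send_to_cache` returns before the next access is issued, at most one request per processor is outstanding, which matches the behaviour described for the MIPS entity.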

Primary Cache

The primary cache is direct-mapped and operates a write-through/no-write-allocate policy. The line size is fixed at 4 words but the number of lines can be varied from 1 to 256 in multiples of 2 while the delay associated with a cache access can be varied from 1 to 8 clock cycles. As it processes each access, the cache icon displays the result (RH = Read Hit, RM = Read Miss, WH = Write Hit, WM = Write Miss). The data structure central to the operation of this entity is a HASE memory array which represents the cache memory contents via a C++ based array of structs. This structure specifies storage for valid, modified and shared bits as well as the cache entry tag and stored values:
Valid | Modified | Shared | Tag | Block (A0 A1 A2 A3)
This cache line format is shared with the secondary cache unit; the only difference in use is that the primary cache need never use the shared bit. On receipt of an incoming packet a table lookup is performed and validity bit and tag checks are made. If a hit occurs a delay is initiated before sending the result back to the MIPS entity. On a miss the packet is referred (after the miss delay) to the secondary cache entity.
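The line format and lookup can be illustrated with a small Python sketch. It is a simplified stand-in for the HASE C++ entity, not a copy of it: the class and method names are hypothetical, addresses are assumed to be word addresses, the tag/index split is the conventional direct-mapped one, and the hit and miss delays are not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

WORDS_PER_LINE = 4  # the line size is fixed at 4 words


@dataclass
class CacheLine:
    valid: bool = False
    modified: bool = False
    shared: bool = False  # never used by the primary cache
    tag: int = 0
    # A0..A3: only addresses are stored, not memory contents
    block: List[int] = field(default_factory=lambda: [0] * WORDS_PER_LINE)


class PrimaryCache:
    def __init__(self, num_lines: int = 8) -> None:
        self.num_lines = num_lines
        self.lines = [CacheLine() for _ in range(num_lines)]

    def split(self, address: int) -> Tuple[int, int]:
        """Return (tag, index) for a word address (assumed decomposition)."""
        line_number = address // WORDS_PER_LINE
        return line_number // self.num_lines, line_number % self.num_lines

    def access(self, address: int, kind: str) -> str:
        tag, index = self.split(address)
        line = self.lines[index]
        hit = line.valid and line.tag == tag
        if kind == "r":
            if not hit:
                # Read miss: the request goes to the secondary cache and
                # the returned line is installed here.
                base = (address // WORDS_PER_LINE) * WORDS_PER_LINE
                line.valid, line.tag = True, tag
                line.block = [base + i for i in range(WORDS_PER_LINE)]
            return "RH" if hit else "RM"
        if kind == "w":
            # Write-through: the write is always passed to the next level.
            # No-write-allocate: a write miss does not install the line.
            return "WH" if hit else "WM"
        raise ValueError(f"unknown access type: {kind!r}")


cache = PrimaryCache(num_lines=8)
results = [cache.access(a, k)
           for a, k in [(1, "r"), (1, "w"), (0, "r"), (4, "r"), (5, "r")]]
# results == ["RM", "WH", "RH", "RM", "RH"] under the assumptions above
```

With 8 lines of 4 words, addresses 0 to 3 share line 0 and addresses 4 to 7 share line 1, so the second access to each line hits.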

Secondary Cache

The secondary level processor cache is identical to the primary cache except that it operates a write-back/write-allocate policy. As in the Primary Cache, the user can define cache size and latency through the use of entity parameters. A line in the Secondary Cache may be:

The MPBus

In the full model of a cluster the MPBus is one of the most complex entities in the simulation. It is responsible for displaying a large amount of state information detailing the on-going operation of the snoopy-bus protocol as well as carrying out the conventional tasks of bus arbitration, address and data transfer.

The Cluster Memory

The cluster memory is relatively simple in design. Because the simulation is only concerned with modelling the effects of reads and writes throughout the system (and not the contents of memory locations), no actual storage needs to be modelled other than that present in the processor caches (and in these only addresses need be stored). A memory unit cycle therefore consists of receiving an in-bound request, displaying read/write information on-screen and finally transmitting the result packet back onto the MPBus.

Parameters

The model has a number of size and timing parameters that can be varied using the sliders in the Project Inspector panel:

Parameter                   Nominal Value
MIPS_delay                  1
P_cache_delay               1
P_cache_size (256 max)      8
S_cache_delay               2
S_cache_size (1024 max)     16
M_size                      4096
M_delay                     4
Bus_arb_delay               1
Bus_add_delay               3
Bus_data_delay              4

The maximum number of cycles which the model executes can also be varied. This parameter is included so that if the termination access type is missing from one of the processor input files, the simulation will not run for ever.
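For experimenting with these settings outside HASE, the nominal values can be collected in a small Python structure. This is a sketch under stated assumptions: the field names mirror the parameter names in the table, but `ClusterParams`, `bus_transaction_cycles`, `run_with_cycle_limit` and `max_cycles` are hypothetical names, and the assumption that the three bus phases run back to back with no contention is not taken from the model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ClusterParams:
    MIPS_delay: int = 1
    P_cache_delay: int = 1
    P_cache_size: int = 8      # 256 max
    S_cache_delay: int = 2
    S_cache_size: int = 16     # 1024 max
    M_size: int = 4096
    M_delay: int = 4
    Bus_arb_delay: int = 1
    Bus_add_delay: int = 3
    Bus_data_delay: int = 4

    def validate(self) -> None:
        """Check the limits stated in the text and the table."""
        if not 1 <= self.P_cache_size <= 256:
            raise ValueError("P_cache_size must be between 1 and 256")
        if not 1 <= self.S_cache_size <= 1024:
            raise ValueError("S_cache_size must be between 1 and 1024")
        if not 1 <= self.P_cache_delay <= 8:
            raise ValueError("P_cache_delay must be between 1 and 8")


def bus_transaction_cycles(p: ClusterParams) -> int:
    """Cycles for one uncontended bus transaction (assumes serial phases)."""
    return p.Bus_arb_delay + p.Bus_add_delay + p.Bus_data_delay


def run_with_cycle_limit(step: Callable[[], bool], max_cycles: int) -> int:
    """Call step() once per cycle until it returns False or the limit is hit.

    Returns the number of cycles executed, so a trace with no terminating
    access cannot run for ever.
    """
    for cycle in range(max_cycles):
        if not step():
            return cycle + 1
    return max_cycles
```

With the nominal values, `bus_transaction_cycles(ClusterParams())` gives 1 + 3 + 4 = 8 cycles under the serial-phase assumption.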

The MP (Multi-Processor) Bus

The MPBus is one of the most complex entities in the simulation. It is responsible for displaying a large amount of state information detailing the on-going operation of the snoopy-bus protocol as well as carrying out the conventional tasks of bus arbitration, address and data transfer. The DASH Cluster simulation model is shown in Figure 3 with the bus expanded to show its three separate sections. From top to bottom, these correspond to the phases of bus arbitration and snooping, address transfer, and data transfer.
image of DASH cluster bus

Figure 3. The DASH Cluster MP Bus

Bus Arbitration and Snooping

The icon representing the bus arbitration and snooping activities of the multiprocessor is the most complex component of the simulation model. The panel shows the following information:

The snoop display panel can show the following values:

  1. Invalid (I): Block does not contain valid data.
  2. Exclusive-Unmodified (EU): No other cache has this block. The data in the block is therefore consistent with main memory.
  3. Shared-Unmodified (SU): Some other caches may have this block. Data in this block is once again consistent with main memory.
  4. Exclusive-Modified (EM): No other cache has this block. Data in the block has been modified locally and is therefore inconsistent with main memory.
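The four states can be written down as a Python enumeration, together with a sketch of how a line might move between them. Only the state names and their meanings come from the list above. The transition rules are an assumption: they follow the conventional MESI-style snoopy scheme and may differ in detail from what the model implements. All function names are hypothetical.

```python
from enum import Enum


class LineState(Enum):
    I = "Invalid"
    EU = "Exclusive-Unmodified"
    SU = "Shared-Unmodified"
    EM = "Exclusive-Modified"


def consistent_with_memory(state: LineState) -> bool:
    """EU and SU lines hold the same data as main memory."""
    return state in (LineState.EU, LineState.SU)


def after_local_read_miss(others_have_copy: bool) -> LineState:
    """Assumed: a fill is shared if the snoop found another copy."""
    return LineState.SU if others_have_copy else LineState.EU


def after_local_write() -> LineState:
    """Assumed: a local write leaves the line exclusive and modified."""
    return LineState.EM


def after_remote_read(state: LineState) -> LineState:
    """Assumed: a valid line snooped by another cache's read becomes shared."""
    return LineState.I if state is LineState.I else LineState.SU


def after_remote_write(state: LineState) -> LineState:
    """Assumed: another cache's write invalidates any local copy."""
    return LineState.I
```

For example, under these assumed rules a line held as EM by one cache would drop to SU when a second cache reads it, and to I when a second cache writes it.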

Address Transfer

This icon represents the address transfer stage of the bus cycle. Requests to be satisfied via main memory will be seen leaving this entity through the right-hand port, while cache-to-cache transfers will be seen moving out of the bottom port (a request moving across this cache link implies that the value has been obtained from another cache).

Data Transfer

This entity simply sends the results of the transfer back to the requesting entity.

Demonstration

When the model is first loaded, the MIPS entities contain the sequences of accesses shown in the following table:

MIPS0      MIPS1      MIPS2      MIPS3
01 r d     17 r d     38 r d     00 r d
01 w d     17 w d     39 w d     58 r d
00 r d     00 r d     100 r d    57 r d
04 r d     00 w d     101 r d    03 r d
05 r d     18 r d     102 r d    03 w d
00 z d     00 z d     00 z d     00 z d

These sequences can be changed by altering the contents of the files:

References

[1] L.M. Williams and R.N. Ibbett,
"Simulating the DASH architecture in HASE",
29th Annual Simulation Symposium, SCS, pp 137-146, 1996.
[2] D.E. Lenoski,
"The Design and Analysis of DASH: A Scalable Directory-Based Multiprocessor",
TR:CSL-TR-92-507, Computer Systems Laboratory, Stanford University, 1992.
[3] D.E. Lenoski, J. Laudon, T. Joe, D. Nakahira, L. Stevens, A. Gupta and J. Hennessy,
"The DASH Prototype: Implementation and Performance",
19th International Symposium on Computer Architecture, pp 92-103, May 1992.
[4] F. Baskett, T. Jermoluk and D. Solomon,
"The 4D-MP Graphics Superworkstation: Computing + Graphics = 40 MIPS + 40 MFLOPS and 100000 Lighted Polygons per Second",
Proc. Compcon Spring 88, pp 468-471, February 1988.



HASE Project
Institute for Computing Systems Architecture, School of Informatics, University of Edinburgh
Last change 02/01/2023