SIMD Array Processors

Computer Architecture Simulation & Visualisation

HASE SIMD Array Processor Models

The HASE SIMD-1 and SIMD-2 simulation models are designed to show the principles of operation of a 1-dimensional and a 2-dimensional Single Instruction Multiple Data processing system. Each consists of a Memory, an Array Control Unit (ACU) and an SIMD array of simple processing elements (PEs). The models both use the same Memory, ACU and PE entities and differ only in the number of PEs and the configuration of the array. This website describes the architectures of the SIMD-1 and SIMD-2 models and the instruction sets of the ACU and the Processing Elements used in their construction. The models can be used in virtual laboratory exercises in computer architecture or introductory parallel programming; suggested student exercises are included below.

The files for SIMD-1 can be downloaded as a zip file from SIMD-1.zip

The files for SIMD-2 can be downloaded as a zip file from SIMD-2.zip

Instructions on how to use HASE models can be found at Downloading, Installing and Using HASE.

The SIMD-1 Model

Figure 1 shows the initial image seen when the SIMD-1 model is loaded into HASE. The array is opened for viewing by left clicking on its icon. Figure 2 shows an image taken during animation of a simulation.

Figure 1. The SIMD-1 simulation model loaded into HASE

Figure 2. The SIMD-1 simulation model during animation

The SIMD-2 Model

Figure 3 shows the initial image seen when the SIMD-2 model is loaded into HASE. The array is opened for viewing by left clicking on its icon. Figure 4 shows an image taken during animation of a simulation.

Figure 3. The SIMD-2 simulation model loaded into HASE

Figure 4. The SIMD-2 simulation model during animation

System Operation

The system operates on a two-phase clock. In clock cycles in which it is active, each unit executes its internal actions in the first phase of the clock and sends out a result packet in the second phase. The Memory, for example, reads an instruction or operand in the first phase and sends its output to the ACU in the second phase.

The ACU is a simple load/store, register-register arithmetic processor. It has 16 general purpose registers, a Program Counter (PC), a Condition code Register (CC) and an Instruction Register (AC-IR). The Program Counter has two fields: label and offset. The label field is initially set to "main" and the offset to zero. The ACU also uses two other registers, the Processing Element Instruction Register (PE-IR) and the Processing Element Control register (PEC) which are global registers used to communicate with the SIMD Array. The Processing Elements operate in lock step, i.e. each active PE (determined by the state of its PEC bit) obeys the same instruction at the same time. Whenever a PE ACC is updated by a PE instruction, the PE sends the new ACC value to each of its neighbours.

The ACU Instruction Set

Table 1 shows the instruction set of the ACU. It includes absolute jumps (JUMP and JREG), a relative branch (BRANCH), loads (LDI and LDM) and a store (STM), a move instruction (MOV) which allows values to be transferred between the ACU registers and the PE Accumulators, register-register operations (ADD, etc) and register-literal arithmetic operations (ADDI, etc). All Literal (Immediate) operands are treated as 16-bit signed integers, i.e. any literal values which require more than 16 bits for their representation are truncated and then sign extended to the 32-bit representation used by the adders.
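The truncation and sign extension of literals described above can be sketched in Python as follows. This is an illustrative model only, not code taken from HASE; the function names `to_signed16` and `to_word32` are invented for the example.

```python
def to_signed16(value: int) -> int:
    """Keep the low 16 bits of value and interpret them as a signed integer."""
    low = value & 0xFFFF                 # truncate to 16 bits
    return low - 0x10000 if low & 0x8000 else low


def to_word32(value: int) -> int:
    """Return the 32-bit two's-complement bit pattern of a signed value."""
    return value & 0xFFFFFFFF


print(to_signed16(42))                        # 42
print(to_signed16(0x12345))                   # 9029 (0x2345, upper bits lost)
print(to_signed16(0xFFFF))                    # -1
print(hex(to_word32(to_signed16(0x8000))))    # 0xffff8000
```

Under this reading, a literal such as 0xFFFF would be seen by the adders as -1 rather than 65535.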

INSTRUCTION	ACTION
JUMP I	PC = main/Immediate
JREG RS	PC = main/(RS)
SEQ RS1 RS2	SET CC if RS1 = RS2
SNE RS1 RS2	SET CC if RS1 != RS2
SGT RS1 RS2	SET CC if RS1 > RS2
SLT RS1 RS2	SET CC if RS1 < RS2
SGE RS1 RS2	SET CC if RS1 >= RS2
SLE RS1 RS2	SET CC if RS1 <= RS2
BRANCH Label	if CC = 1, PC = Label/0; if CC = 0, PC = PC + 1
LDI RD I	RD = Immediate
LDM RD Rx Address	RD = Memory[Address + (Rx)]
STM RS Rx Address	RS => Memory[Address + (Rx)]
LPEC I	PEC = Immediate
IPEC	Invert PEC bits
ALDR RS	PE ACC = RS
MOV RD RS	RD = RS
MOV RD AS	RD = AS
MOV AD RS	AD = RS
MOV RD P	RD = PEC
MOV P RS	PEC = RS
ADD RD RS1 RS2 e.g. `ADD R1 R2 R3`	RD = RS1 + RS2
ADDI RD RS1 I e.g. `ADDI R1 R2 42`	RD = RS1 + Immediate
SUB RD RS1 RS2	RD = RS1 - RS2
SUBI RD RS1 I	RD = RS1 - Immediate
MUL RD RS1 RS2	RD = RS1 * RS2
MULI RD RS1 I	RD = RS1 * Immediate
DIV RD RS1 RS2	RD = RS1 / RS2
DIVI RD RS1 I	RD = RS1 / Immediate
AND RD RS1 RS2	RD = RS1 & RS2
ANDI RD RS1 I	RD = RS1 & Immediate
OR RD RS1 RS2	RD = RS1 | RS2
ORI RD RS1 I	RD = RS1 | Immediate
XOR RD RS1 RS2	RD = RS1 ^ RS2
XORI RD RS1 I	RD = RS1 ^ Immediate
SLL RD RS1 RS2	RD = RS1 << RS2
SLLI RD RS1 I	RD = RS1 << Immediate
SRL RD RS1 RS2	RD = RS1 >> RS2
SRLI RD RS1 I	RD = RS1 >> Immediate
SRA RD RS1 RS2	RD = RS1 >> RS2
SRAI RD RS1 I	RD = RS1 >> Immediate
NOP	No Operation
STOP	Stops the simulation

Table 1. ACU instruction set

JUMP Literal

JUMP takes a Literal (Immediate) operand which is loaded into the offset field of PC as an absolute instruction address; the label field of PC is set to "main". The value in PC is then sent to the Memory.

JREG RS

JREG uses a single source register (RS), the contents of which are read and loaded into the offset field of PC as an absolute instruction address; the label field of PC is set to "main". The value in PC is then sent to the Memory.

Compare Instructions

Compare instructions (SEQ, SNE, SGT, SLT, SGE, SLE) compare the values in registers RS1 and RS2 and set CC accordingly.

BRANCH

BRANCH takes a label as its operand. If CC = 1, the label is loaded into the label field of PC while the offset field is set to 0; if CC = 0, the offset field of PC is incremented. The value in PC is then sent to the Memory.
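The way JUMP, JREG, the compare instructions and BRANCH update the two-field Program Counter can be sketched in Python. All names here (`ProgramCounter`, `compare`, `jump`, `jreg`, `branch`) are hypothetical, and the sketch assumes that a compare instruction clears CC to 0 when its condition is false, which the text above does not state explicitly.

```python
from dataclasses import dataclass
import operator


@dataclass(frozen=True)
class ProgramCounter:
    label: str = "main"
    offset: int = 0


COMPARISONS = {
    "SEQ": operator.eq, "SNE": operator.ne, "SGT": operator.gt,
    "SLT": operator.lt, "SGE": operator.ge, "SLE": operator.le,
}


def compare(opcode, rs1_value, rs2_value):
    """Return the new CC value (assumption: 0 when the condition is false)."""
    return 1 if COMPARISONS[opcode](rs1_value, rs2_value) else 0


def jump(literal):
    """JUMP: absolute address in the offset field, label set to "main"."""
    return ProgramCounter("main", literal)


def jreg(regs, rs):
    """JREG: as JUMP, but the address is read from register rs."""
    return ProgramCounter("main", regs[rs])


def branch(pc, cc, label):
    """BRANCH: go to label/0 if CC = 1, otherwise step to the next instruction."""
    if cc == 1:
        return ProgramCounter(label, 0)
    return ProgramCounter(pc.label, pc.offset + 1)


pc = ProgramCounter("main", 25)
print(branch(pc, compare("SLT", 2, 4), "loop"))   # ProgramCounter(label='loop', offset=0)
print(branch(pc, compare("SLT", 4, 4), "loop"))   # ProgramCounter(label='main', offset=26)
```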

LDI RD Literal

LDI loads the value in the Literal field into the destination register (RD).

LDM RD Rx Address

LDM adds the value held in register Rx to the value in the Address field, sends the result to the Data Memory and loads the value received from the Memory into the destination register (RD).

STM RS Rx Address

STM adds the value in register Rx to the value in the Address field, then sends this address to Memory together with the value in source register RS.
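The indexed addressing used by the load and store instructions amounts to adding the contents of Rx to the Address field. A minimal Python sketch, with the register file and the Data Memory modelled as plain lists and with the hypothetical function names `ldm` and `stm`:

```python
def ldm(regs, memory, rd, rx, address):
    """LDM RD Rx Address: RD = Memory[Address + (Rx)]."""
    regs[rd] = memory[address + regs[rx]]


def stm(regs, memory, rs, rx, address):
    """STM RS Rx Address: Memory[Address + (Rx)] = RS."""
    memory[address + regs[rx]] = regs[rs]


regs = [0] * 16                 # 16 general purpose registers
memory = [100, 101, 102, 103, 0, 0]
regs[2] = 3                     # R2 is used as the index register
ldm(regs, memory, 1, 2, 0)      # R1 = Memory[0 + 3]
stm(regs, memory, 1, 2, 2)      # Memory[2 + 3] = R1
print(regs[1], memory)          # 103 [100, 101, 102, 103, 0, 103]
```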

LPEC Literal

LPEC loads the value in the Literal field into the PE Control Register. It’s convenient to specify the Literal in hex format in the instruction (e.g. 0xff in SIMD-1 or 0xffff in SIMD-2 to set all the processors active), though it will be displayed in decimal in the Instruction Register (AC-IR).

ALDR RS

The ACU sets the function in PE-IR to ALDI and copies the value in RS to the Immediate field in PE-IR. Those PEs that have their PE control bit set to active then execute the ALDI instruction.

ALDR Address(RS)

The ACU sets the function in PE-IR to ALDM and copies the value of Address plus the value in RS to the Address field in PE-IR. Those PEs that have their PE control bit set to active then execute the ALDM instruction.

MOV

MOV allows values to be moved from one R register to another, between an R register and the Accumulator of one of the Processing Elements in the array, or between an R register and PEC. Other combinations will cause the simulation to stop.

MOV RD RS : moves the value in RS to RD
MOV RD AS : moves the value in the ACC of the PE specified by AS to RD
MOV AD RS : loads the value in RS into the ACC of the PE specified by AD
MOV RD P : loads RD with the value in PEC
MOV P RS : loads PEC with the value in RS

Arithmetic Operations

ADD/ADDI adds the values in RS1 and RS2/Literal and writes the result into RD.

SUB/SUBI subtracts the value in RS2/Literal from the value in RS1 and writes the result into RD.

MUL/MULI multiplies the values in RS1 and RS2/Literal and writes the result into RD.

DIV/DIVI divides the value in RS1 by RS2/Literal and writes the result into RD.

AND/ANDI ands the values in RS1 and RS2/Literal and writes the result into RD.

OR/ORI ors the values in RS1 and RS2/Literal and writes the result into RD.

XOR/XORI xors the values in RS1 and RS2/Literal and writes the result into RD.

Shift Instructions

SLL and SLLI (Shift Left Logical) shift left the value in RS1 by the number of places specified in RS2 (SLL) or the Literal (SLLI) and write the result into RD.

SRL and SRLI (Shift Right Logical) shift right the value in RS1 by the number of places specified in RS2 (SRL) or the Literal (SRLI), inserting zeros into the vacated most significant bit positions; the result is written into RD.

SRA and SRAI (Shift Right Arithmetic) shift right the value in RS1 by the number of places specified in RS2 (SRA) or the Literal (SRAI), copying the sign bit into the vacated most significant bit positions; the result is written into RD.
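The difference between the logical and arithmetic right shifts is easiest to see on a value with its sign bit set. The following Python sketch assumes 32-bit registers (the width quoted above for the adders) and shift amounts between 0 and 31; the function names are invented for the example, and the behaviour of the model for larger shift amounts is not covered.

```python
MASK32 = 0xFFFFFFFF


def sll(value, places):
    """Shift left logical: bits shifted out of the top are lost."""
    return (value << places) & MASK32


def srl(value, places):
    """Shift right logical: zeros enter at the most significant end."""
    return (value & MASK32) >> places


def sra(value, places):
    """Shift right arithmetic: copies of the sign bit enter at the top."""
    value &= MASK32
    signed = value - (1 << 32) if value & 0x80000000 else value
    return (signed >> places) & MASK32


print(hex(sll(0x00000003, 31)))   # 0x80000000
print(hex(srl(0x80000000, 4)))    # 0x8000000
print(hex(sra(0x80000000, 4)))    # 0xf8000000
```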

STOP stops the simulation.

The Array Processing Element Instruction Set

Table 2 shows the instruction set of the Processors in the Array. The processors take their instructions from the Instruction Register (PE-IR), which is loaded by the ACU. Instructions are divided into Arithmetic/Logical instructions and PE Control Setting instructions. The Arithmetic/Logical instructions are further divided into three subgroups with opcodes of the form Opcode, OpcodeI and OpcodeM. All are single address instructions which combine the value in their Accumulator (ACC) with an input Operand and return the result to their own ACC. At the end of an operation which updates the ACC, each active PE sends the value in its Accumulator to its neighbours.

In the SIMD-1 model the source of the Operand for an Opcode instruction can be E0, E1, W0, W1 or P, while in the SIMD-2 model the Operand can be N0, N1, E0, E1, W0, W1, S0, S1 or P, where an

N Operand = value sent by processor above

E Operand = value sent by processor to the right

W Operand = value sent by processor to the left

S Operand = value sent by processor below

P Operand = Processor’s own number within the array (0-7 in SIMD-1, 0-15 in SIMD-2)

and where 0 and 1 indicate the Wraparound condition:

N0/E0/W0/S0 a PE at the end of a row/column uses zero as its operand

N1/E1/W1/S1 a PE at the end of a row/column uses the value sent from the PE at the other end of the row/column
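For the 1-dimensional case, the effect of the wraparound digit can be sketched in Python. The sketch assumes that PE numbers increase from left to right, so that the W neighbour of PE i is PE i-1; the text above does not say this explicitly. The function name `neighbour_operands` is hypothetical, and the 2-dimensional array of SIMD-2 is not modelled.

```python
def neighbour_operands(sent, operand):
    """Return the operand seen by each PE for 'E0', 'E1', 'W0' or 'W1'.

    sent[i] is the ACC value most recently sent by PE i to its neighbours.
    """
    n = len(sent)
    direction, wrap = operand[0], operand[1] == "1"
    result = []
    for i in range(n):
        j = i - 1 if direction == "W" else i + 1   # assumed left-to-right numbering
        if 0 <= j < n:
            result.append(sent[j])
        elif wrap:
            result.append(sent[j % n])             # value from the other end of the row
        else:
            result.append(0)                       # end PE uses zero
    return result


sent = [10, 11, 12, 13, 14, 15, 16, 17]
print(neighbour_operands(sent, "W0"))   # [0, 10, 11, 12, 13, 14, 15, 16]
print(neighbour_operands(sent, "W1"))   # [17, 10, 11, 12, 13, 14, 15, 16]
print(neighbour_operands(sent, "E1"))   # [11, 12, 13, 14, 15, 16, 17, 10]
```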

For an OpcodeI instruction, the Operand is the Immediate value in PE-IR.

For an OpcodeM instruction, the ACU reads the value in Rx from within its own Registers, adds this to the value in the Address field in AC-IR and loads this modified value into the address field of the instruction as it is copied into PE-IR. Each Processing Element accesses its own memory to obtain the operand using the modified address in PE-IR, i.e. it ignores the Rx field.
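The division of work between the ACU and the PEs for an OpcodeM instruction can be sketched in Python, using ALDM as the example. The dictionary layout of PE-IR and the function names `build_pe_ir` and `execute_aldm` are assumptions made for illustration, not a description of the HASE source.

```python
def build_pe_ir(opcode, rx, address, acu_regs):
    """ACU side: add (Rx) to the Address field while copying into PE-IR."""
    return {"function": opcode, "address": address + acu_regs[rx]}


def execute_aldm(pe_ir, pe_memories, accs, active):
    """PE side: each active PE loads its ACC from its own memory."""
    for pe, is_active in enumerate(active):
        if is_active:
            accs[pe] = pe_memories[pe][pe_ir["address"]]


acu_regs = [0] * 16
acu_regs[1] = 2                                   # R1 holds the index
pe_memories = [[10, 11, 12, 13], [20, 21, 22, 23]]
accs = [0, 0]

pe_ir = build_pe_ir("ALDM", 1, 0, acu_regs)       # ALDM R1 0
execute_aldm(pe_ir, pe_memories, accs, [True, True])
print(pe_ir["address"], accs)                     # 2 [12, 22]
```

Every PE uses the same modified address, but each reads from its own memory, so the operands generally differ from PE to PE.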

PE Control Setting Instructions are of the OpcodeI type. When executing these instructions, each PE sets its own control bit in the PE Control Register (PEC) according to the condition selected by the Opcode.

Arithmetic/Logical Instructions
INSTRUCTION			ACTION
ALD	ALDI	ALDM	ACC = Operand
AADD	AADDI	AADDM	ACC = ACC + Operand
ASUB	ASUBI	ASUBM	ACC = ACC - Operand
ABUS	ABUSI	ABUSM	ACC = Operand - ACC
AMUL	AMULI	AMULM	ACC = ACC * Operand
ADIV	ADIVI	ADIVM	ACC = ACC / Operand
AVID	AVIDI	AVIDM	ACC = Operand / ACC
AAND	AANDI	AANDM	ACC = ACC & Operand
AOR	AORI	AORM	ACC = ACC | Operand
AXOR	AXORI	AXORM	ACC = ACC ^ Operand
ASLL	ASLLI	ASLLM	ACC = ACC << Operand
ASRL	ASRLI	ASRLM	ACC = ACC >> Operand
ASRA	ASRAI	ASRAM	ACC = ACC >> Operand
---	---	ASTM	ACC => Operand

PE Control Setting Instructions
INSTRUCTION	CONDITION
ASEQ	ACC = Immediate
ASNE	ACC != Immediate
ASGT	ACC > Immediate
ASLT	ACC < Immediate
ASGE	ACC >= Immediate
ASLE	ACC <= Immediate

Table 2. Array Element instruction set

Using the SIMD-1 Model

When first loaded, the model contains a program (Figure 5) that reverses the order of the values held in memory locations 0 and 2 of the Processing Elements and leaves the results in locations 1 and 3 of each of their memories, i.e. it moves the data in memory locations 0 and 2 of PEs 0-7 into memory locations 1 and 3 of PEs 7-0.

The files for the SIMD-1 model include:

MEMORY.instr_mem.mem MEMORY.data_mem.mem	the initial contents of the ACU Instruction and Data memories contained within the Memory Unit
ACU.main_reg.mem	the initial contents of the ACU registers; this file should not be edited
SIMD1.pe1.pe_mem.mem	the initial contents of the PE memories

Because the SIMD Array of processing elements (SIMD1) is defined as a mesh (MESH1D in the simd-1.edl file), HASE automatically loads the memories in the SIMD1 array from the SIMD1.pe1.pe_mem.mem file. This file has the following form:

0
0
8
0
//$next section
1
0
9
0
//$next section

Thus the first four values are loaded into locations 0-3 of PE0’s memory, the second four into locations 0-3 of PE1’s memory, etc.
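The layout of such a file can be made concrete with a small Python reader that splits the values into one list per PE. This is only a sketch of the format as described here; `parse_pe_mem` is a hypothetical helper, not part of HASE, and the sample values are made up for the example.

```python
def parse_pe_mem(text):
    """Split the text of a pe_mem.mem file into one list of integers per PE."""
    sections, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                         # ignore blank lines
        if line.startswith("//$next section"):
            sections.append(current)         # marker closes the current PE
            current = []
        else:
            current.append(int(line))
    if current:                              # last PE may have no trailing marker
        sections.append(current)
    return sections


sample = "5\n0\n13\n0\n//$next section\n6\n0\n14\n0\n//$next section\n"
print(parse_pe_mem(sample))   # [[5, 0, 13, 0], [6, 0, 14, 0]]
```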

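LDI R1 0	// R1 is used as an index into the PE memory
LDI R2 4	// R2 is used as the limiting index value
loop:	//
ALDM R1 0	// load data into ACCs
ALD W1	// load data from west neighbour
LPEC 0x88	// load PE Control register
ASTM R1 1	// write data to memories of selected PEs
LPEC 0xFF	// make all PEs active again
ALD W1	// load data from west neighbour, twice more
ALD W1	//
LPEC 0x44	// select the next pair of PEs
ASTM R1 1	// write data to memories of selected PEs
LPEC 0xFF	// make all PEs active again
ALDM R1 0	// reload the original data into ACCs
ALD E1	// load data from east neighbour
LPEC 0x11	// select the next pair of PEs
ASTM R1 1	// write data to memories of selected PEs
LPEC 0xFF	// make all PEs active again
ALD E1	// load data from east neighbour, twice more
ALD E1	//
LPEC 0x22	// select the final pair of PEs
ASTM R1 1	// write data to memories of selected PEs
LPEC 0xFF	// make all PEs active again
ALDM R1 1	// load the results into ACCs
ADDI R1 R1 2	// advance the index to the next pair of locations
SLT R1 R2	// test for limiting index value
BRANCH loop	// repeat while R1 < R2
STOP	// stop the simulation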

Figure 5. SIMD-1 Demonstration Program Code
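The data movement performed by the program in Figure 5 can be checked with a short Python model. It rests on two assumptions that are not stated in the text: the most significant bit of the 8-bit PEC value controls PE0, and PE numbers increase from left to right, so that a W1 operand comes from PE i-1 with wraparound. The helper names and the sample data are invented for the example.

```python
N = 8  # number of PEs in SIMD-1


def from_west(acc):
    """ALD W1: every PE takes the value sent by its west neighbour."""
    return [acc[(i - 1) % N] for i in range(N)]


def from_east(acc):
    """ALD E1: every PE takes the value sent by its east neighbour."""
    return [acc[(i + 1) % N] for i in range(N)]


def store(mem, acc, address, mask):
    """LPEC mask; ASTM: only PEs whose control bit is set write to memory."""
    for pe in range(N):
        if mask & (0x80 >> pe):          # assumption: MSB controls PE0
            mem[pe][address] = acc[pe]


def reverse_demo(mem):
    for r1 in (0, 2):                    # R1 = 0, then 2; the loop ends at R1 = 4
        acc = [mem[pe][r1] for pe in range(N)]       # ALDM R1 0
        acc = from_west(acc)
        store(mem, acc, r1 + 1, 0x88)
        acc = from_west(from_west(acc))
        store(mem, acc, r1 + 1, 0x44)
        acc = [mem[pe][r1] for pe in range(N)]       # ALDM R1 0 again
        acc = from_east(acc)
        store(mem, acc, r1 + 1, 0x11)
        acc = from_east(from_east(acc))
        store(mem, acc, r1 + 1, 0x22)
    return mem


mem = [[pe, 0, 8 + pe, 0] for pe in range(N)]        # illustrative data
reverse_demo(mem)
print([row[1] for row in mem])   # [7, 6, 5, 4, 3, 2, 1, 0]
print([row[3] for row in mem])   # [15, 14, 13, 12, 11, 10, 9, 8]
```

Each shift of one or three places brings the value from PE 7-i into exactly two PEs, which is why each LPEC mask in the program has two bits set.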

To run a new program, edit MEMORY.instr_mem.mem and, if appropriate, MEMORY.data_mem.mem and SIMD1.pe1.pe_mem.mem. Then just press run. The new contents of the memories will appear in the parameters panel as the locations are accessed during animation. To get the values to appear immediately, re-load the model after closing the edits.

Suggested Student Exercise

For the SIMD-1 Array Processor model, write an assembly code program, together with appropriate data in the PE memories, that loads 0 into the ACC of processor 0, 1 into the ACC of processor 1, etc, and then forms the set of parallel prefix sums for these numbers. You can load the ACCs from memory or by using the own processor number facility in the instruction set.
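To check the final ACC values produced by your program, the expected results can be computed sequentially. The Python sketch below assumes that inclusive prefix sums are intended, i.e. that PE i should finish with the sum of the values in PEs 0 to i; it is a reference calculation only and says nothing about how to organise the parallel version. The function name is invented for the example.

```python
from itertools import accumulate


def expected_prefix_sums(values):
    """Inclusive prefix sums: element i is the sum of values[0..i]."""
    return list(accumulate(values))


print(expected_prefix_sums(range(8)))   # [0, 1, 3, 6, 10, 15, 21, 28]
```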

Your submission for this part should consist of three documents:

a commented listing of the program
an uncommented listing of the program suitable for copying and pasting into the ACU memory file (so that the instructor can run it)
a text file containing a note of the number of simulated clock cycles your program took to execute and comments on the degree of parallelism during its execution.

Using the SIMD-2 Model

When first loaded, the model contains the program shown in Figure 6. The first instruction loads each PE ACC with the PE’s own number in the array. The second instruction subtracts the value in word 0 of the PE’s memory from its ACC; in this case that value is equal to the processor number, so the result should be 0 in each ACC. The next four instructions load the ACC with the value in word 1, multiply it by the value in word 2, subtract the value in word 3 and then set the PE Control bit if the result is negative. The next instruction (Reverse Subtract) is only executed in PEs with a negative ACC (making their ACC values positive). Finally, the PE control bits are all set to 1 and all ACC values are written to memory word 4.

The files for the SIMD-2 model include:

MEMORY.instr_mem.mem MEMORY.data_mem.mem	the initial contents of the ACU Instruction and Data memories contained within the Memory Unit
ACU.main_reg.mem	the initial contents of the ACU registers; this file should not be edited
SIMD2.pe2.pe_mem.mem	the initial contents of the PE memories

Like the SIMD1 array, the SIMD array of processing elements (SIMD2) is defined as a mesh (MESH2D in the simd-2.edl file), so HASE automatically loads the memories in the SIMD2 array from the SIMD2.pe2.pe_mem.mem file. This file has the same form as SIMD1.pe1.pe_mem.mem, so the first four values are loaded into locations 0-3 of PE0’s memory, the second four into locations 0-3 of PE1’s memory, etc. Note that the PEs are numbered in column order.

ALD P	// load ACC with PE’s own number
ASUBM R0 0	// subtract value in memory word 0 from ACC
ALDM R0 1	// load ACC from memory word 1
AMULM R0 2	// multiply by value in memory word 2
ASUBM R0 3	// subtract value in memory word 3
ASLT 0	// set PE Control bit if ACC -ve
ABUSI 0	// negate value in ACC
LPEC 0xFFFF	// set all PE Control bits
ASTM R0 4	// write ACC value to memory word 4
STOP

Figure 6. SIMD-2 Demonstration Program Code
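The effect of this program on each PE can be followed in a small Python model that applies one instruction at a time to all 16 PEs. The sketch follows the comments in Figure 6 (multiply by word 2, set all control bits before the final store) and assumes that R0 holds zero; the function name and the memory contents are invented for the example.

```python
def run_simd2_demo(mems):
    """Apply the Figure 6 instruction sequence to a list of PE memories."""
    acc = list(range(len(mems)))                           # ALD P
    acc = [a - m[0] for a, m in zip(acc, mems)]            # ASUBM R0 0
    after_check = list(acc)                                # 0 if word 0 = PE number
    acc = [m[1] for m in mems]                             # ALDM R0 1
    acc = [a * m[2] for a, m in zip(acc, mems)]            # multiply by word 2
    acc = [a - m[3] for a, m in zip(acc, mems)]            # ASUBM R0 3
    pec = [a < 0 for a in acc]                             # ASLT 0
    acc = [0 - a if on else a for a, on in zip(acc, pec)]  # ABUSI 0, masked by PEC
    for a, m in zip(acc, mems):                            # all PEs active; ASTM R0 4
        m[4] = a
    return after_check, acc


# Illustrative memories: word 0 = PE number, word 1 = PE number, word 2 = 2, word 3 = 5
mems = [[pe, pe, 2, 5, 0] for pe in range(16)]
check, result = run_simd2_demo(mems)
print(check[:4])     # [0, 0, 0, 0]
print(result[:6])    # [5, 3, 1, 1, 3, 5]
```

The compare followed by a masked Reverse Subtract acts as an absolute value operation: only the PEs whose result was negative change their ACC.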

Suggested Student Exercise

Figure 7 shows a data file suitable for use as a SIMD2.pe2.pe_mem.mem file. Copy this data into the file of this name in your copy of the SIMD-2 model files. For each PE, locations 1-4 and 5-8 contain elements of vectors A and B respectively. Write an assembly code program for SIMD-2 which first forms the scalar product of these two vectors, leaving the result in location 0 of the memory of each PE. The program should then form (a) the sum of all the positive scalar product results (copying this result into ACU Register 3) and (b) the sum of all the negative scalar product results (copying this result into ACU Register 4).

Your submission for this part should consist of four documents:

a commented listing of the program
an uncommented listing of the program suitable for copying and pasting into the ACU memory file (so that the instructor can run it)
a listing of the contents of the ACU registers at the end of the program
a text file containing a note of the number of simulated clock cycles your program took to execute and comments on the degree of parallelism during its execution

0
1
2
3
4
5
6
7
8
0
//$next section
0
9
10
11
12
-13
14
15
-16
0
//$next section
0
17
18
19
20
21
22
23
24
0
//$next section
0
25
26
27
28
29
-30
31
-32
0
//$next section
0
33
34
35
36
37
38
38
40
0
//$next section
0
41
-42
43
44
45
46
47
-48
0
//$next section
0
49
50
51
52
53
54
55
56
0
//$next section
0
-57
58
59
60
61
62
63
-64
0
//$next section
0
65
66
-67
68
69
70
71
72
0
//$next section
0
73
74
75
76
77
78
79
80
0
//$next section
0
81
82
-83
84
85
86
87
-88
0
//$next section
0
89
90
91
92
93
94
95
96
0
//$next section
0
97
98
99
100
101
102
103
104
0
//$next section
0
105
106
107
-108
-109
110
111
112
0
//$next section
0
113
114
115
116
117
-118
119
120
0
//$next section
0
121
122
123
124
125
126
-127
-128
0

Figure 7. Data file for SIMD-2 Exercise

HASE Project
Institute for Computing Systems Architecture, School of Informatics, University of Edinburgh
Last change 20/04/2023
