### Parallel Algorithms

### Lecture 4: Consistency & Protocols

Ronald Veldema

### Generic Parallel Machines

- SPMD model
  - PRAM machine
  - A write by one cpu is immediately seen by all others
- MDMD model
  - SMP machines
  - A write \_can\_ be seen much later

### PRAM model (1)

- 1 program (SPMD programming model)
- N times local memory & cpu
- 1 memory access unit
- 1 global memory









### PRAM model (4)

- Good for proving algorithm properties but not implementable
  - Assumes each processor has constant time access to all memory
  - Concurrent reads/writes to a memory cell
  - Single clock signal for all processors



# PRAM model • Every PRAM can be simulated by every other • P-CRCW by EREW • O(log(p)) • P-CRCW by C-CRCW • O(log(p) / (log(p)log(p))) • P-CRCW by A-CRCW • O(log(p)log(p))

## PRAM modelI = 0 .. NQuestion: what is the timem[I] = 1Question: what is the timeI = 0 .. NO(1)m[I] = 0Question: how manyI = 0 .. N $M^2$ if m[I] == 1 then $N^2$



### What is a consistency model?

• Consistency model = when is it guaranteed that others (cpus/threads) see my writes









### What is a consistency protocol?

- protocol implements a given consistency model
- Examples:
  - Only a single writer at a time to a variable
    For example by letting other writers wait
  - Multiple concurent writers allowed
    - For example by merging modified data from all writers

### Sequential consistency (1)

- From lamport
- · Serializability of operations
  - a series of operations H is equal to \_some\_ series of serial operations.



### Sequential consistency (3)

- All operations of each processor happen in order of the program that specified them.
- Every write operation becomes instantaneously visible throughout the system.

### Sequential consistency (4)

- pro: very much what the programmer wants to see.
- con: slow as no operations can be (observably) reordered
- con: writes must be 'flushed' immediately.
- con: modern (SMP) machines have own caches, perform write reordering etc.
  - An programming language implementor would have to perform a lot of work...

### Release consistency (1)

- paper: Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors (1990)
- · conflicting accesses:
  - two processors read/write the same variable X with at least one of them writting X.
- · synchonization accesses:
  - flags set to inform another processor of an event. (ackquire and release for example).

### Release consistency (2)

- compiler/programmer sets flags to tell if an access is special or ordinary.
  - Special = synchonization or competing.
- before an ordinary access, all previous ackuires must have been performed
- before a release, all ordinary reads+writes must have been performed
- special accesses are sequentially consistent in respect to one another.

### Release consistency (3)

```
example:

// 'flag' = special (synchronized,

conflicting)

// 'X,Y' = ordinary (non-conflicting)

if (flag == true) { // release

X++;

Y++;

}

Note: flag can't be cached

Note: read of flag acts as 'release'

Note: X,Y access can't be moved over flag access

Note: release of 'flag' acts as 'flush thread/cpu cache'

Note: X,Y accesses can be reordered, flag accesses cannot
```

### Eager release consistency

- writes immediately trigger protocol actions: invalidate caches
- release encountered while still have outstanding writes.

### Lazy release consistency

- At release send write notices to processors that hold a copy of the data.
- Those processors can then invalidate, etc.

### Processor consistency

- writes issued by a processor may not be observed in any other order.
- writes from two processors are not neccesarily in order.

$$A = B = 0$$
 $A = 5$ 
 print B ("0")

  $B = 4$ 
 print A ("5")

### Commit-Reconcile & Fences (CRF)

- Each thread/cpu has a cache
  - 'commit' writes a value from the cache to memory
  - 'Reconcile' removes stale values from cache
  - 'Fence' prevents recordering
    - Argument of fence is list of dependent variables
    - Waits until every dependent value is stored in memory
- Other memory models are 'mapped' to CRF
- Compiler/architecture can optimize by removing C,R,F actions...



### Scope consistency

• flush cached items in cache that we're pulled into the cache at when exiting a scope.

| enter_scope();<br>B = 3<br>exit_scope(); // flush B |                          |                         |
|-----------------------------------------------------|--------------------------|-------------------------|
| B = 3<br>exit_scope(); // flush B                   | A = 2;                   |                         |
| exit_scope(); // flush B                            | enter_scope();           |                         |
|                                                     | B = 3                    |                         |
| exit scope(); // flush A                            | exit_scope(); // flush B |                         |
|                                                     | exit_scope(); // flush A |                         |
|                                                     |                          | enter_scope();<br>B = 3 |

### Single Writer Protocols

- Single Writer Protocols
- · update protocols
- · directory based systems
- start/end-read/write protocols
- protocol state diagrams, protocol-provers

### Single writer protocols (2)

write\_lock(X); X++; write\_unlock(X);

read\_lock(X)
 print X
read\_unlock(X);

### Single writer Protocols (3)

- · Invalidation based protocols
  - When writing to a data item, inform all current readers that the data item is
    - Not to be used anymore
    - · Should not remain cached





### Multiple writer protocols (1)

- each thread/cpu may concurently write to a memory location.
- the final result is some form of 'merge'.

### Multiple writer protocols (2)

read\_into\_cache(X) read\_into\_cache(Y) read\_into\_cache(Z) X++; Y++; Z++; flush\_cache();

### Java Memory Model (1)

- · Each thread has own cpu
- · Each thread has own working memory
- · Working memory is initially empty
- Working memory is flushed at each lock/unlock
  - Lock/unlock = entry/exit synchronized block

















### Java Memory Model (8)

- Possible swap
  - Write a precedes read a, read b precedes write b
    Ha=2,hb=2, ma=2,mb=2,ya=2,yb=2
  - Read a precedes write a, write b precedes read b
     Ha=1,hb=1,ma=1,mb=1,ya=1,yb=1
  - Read a precedes write a, read b precedes write b
    - Ha=2,hb=2,ma=1,mb=2,ya=1,yb=1

### Java Memory Model (9)



class Swapper {
 int a = 1, b = 2;
 synchronized void hither() {
 a = b;
 }
 synchronized void yon() {
 b = a;
 }
}







### Exploiting the Java Memory Model

- Java software DSM systems
- Allow variables to be pulled into registers until a synchronization point is reached



### Update Protocols Whenever a piece of data is updated, broadcast modications made Question: broadcast updates or maintain lists of replicas ? Broadcast every write or wait a little and save on communication overheads ?

