## CS 110 Computer Architecture Lecture 24: More I/O: DMA, Disks, Networking

Instructor: Sören Schwertfeger

http://shtech.org/courses/ca/

School of Information Science and Technology SIST

ShanghaiTech University

Slides based on UC Berkeley's CS61C

## **Virtual Memory Review**

### Modern Virtual Memory Systems

Illusion of a large, private, uniform store

#### **Protection & Privacy**

- Many processes, each with their own private address space and one or more shared address spaces

#### **Demand Paging**

- Many processes share DRAM.
- Provides ability to run programs with large address space. Pages that aren't yet allocated or pages that don't fit swap to secondary storage.
- Hides differences in machine configurations

The price is address translation on each memory reference







#### Private (Virtual) Address Space per Program



#### Translation Lookaside Buffers (TLB)

Address translation is very expensive! In a two-level page table, each reference becomes several memory accesses.

Solution: Cache translations in TLB

- TLB hit => Single-Cycle Translation
- TLB miss => Page-Table Walk to refill
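The hit/miss behaviour can be sketched in a few lines of Python. This is a toy model rather than real hardware: the class name `ToyMMU`, the 4 KiB page size, and the 10-bit index per level are assumptions made for illustration, and the TLB here is an unbounded dictionary with no eviction.

```python
# Toy two-level page table with a TLB in front of it (illustrative only).
PAGE_BITS = 12      # assumed 4 KiB pages
INDEX_BITS = 10     # assumed 10-bit index per page-table level


class ToyMMU:
    def __init__(self, root):
        self.root = root          # {l1_index: {l2_index: physical_page_number}}
        self.tlb = {}             # virtual page number -> physical page number
        self.table_accesses = 0   # memory accesses spent walking page tables

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_BITS
        offset = vaddr & ((1 << PAGE_BITS) - 1)
        if vpn not in self.tlb:                    # TLB miss: walk the page table
            l1 = vpn >> INDEX_BITS
            l2 = vpn & ((1 << INDEX_BITS) - 1)
            self.table_accesses += 2               # one memory access per level
            self.tlb[vpn] = self.root[l1][l2]      # refill the TLB
        return (self.tlb[vpn] << PAGE_BITS) | offset


mmu = ToyMMU({0: {1: 0x80}})     # virtual page 1 maps to physical page 0x80
first = mmu.translate(0x1234)    # miss: costs two page-table accesses
second = mmu.translate(0x1238)   # hit: same page, no extra accesses
```

Both translations land in physical page `0x80`, but only the first one pays for the page-table walk.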



#### Page-Based Virtual-Memory Machine

(Hardware Page-Table Walk)



- Page tables held in untranslated physical memory



# Review: I/O

- "Memory mapped I/O": Device control/data registers mapped to CPU address space
- CPU synchronizes with I/O device:
  - Polling
  - Interrupts
- "Programmed I/O":
  - CPU execs lw/sw instructions for all data movement to/from devices
  - CPU spends time doing 2 things:
    - 1. Getting data from device to main memory
    - 2. Using data to compute

## Working with real devices

- "Memory mapped I/O": Device control/data registers mapped to CPU address space
- CPU synchronizes with I/O device:
  - Polling
  - Interrupts
- "Programmed I/O": DMA
  - CPU execs lw/sw instructions for all data movement to/from devices
  - CPU spends time doing <del>2 things</del>:
    - 1. Getting data from device to main memory
    - 2. Using data to compute

## Agenda

- Direct Memory Access (DMA)
- Disks
- Networking

## What's wrong with Programmed I/O?

- Not ideal because ...
  - 1. CPU has to execute all transfers, could be doing other work
  - 2. Device speeds don't align well with CPU speeds
  - 3. Energy cost of using beefy general-purpose CPU where simpler hardware would suffice
- Until now CPU has sole control of main memory

### PIO vs. DMA



## Direct Memory Access (DMA)

- Allows I/O devices to directly read/write main memory
- New Hardware: the <u>DMA Engine</u>
- DMA engine contains registers written by CPU:
  - Memory address to place data
  - # of bytes
  - I/O device #, direction of transfer
  - unit of transfer, amount to transfer per burst

## **Operation of a DMA Transfer**



Figure 5-4. Operation of a DMA transfer.

[From Section 5.1.4 Direct Memory Access in *Modern Operating Systems* by Andrew S. Tanenbaum, Herbert Bos, 2014]

## **DMA: Incoming Data**

- 1. Receive interrupt from device
- 2. CPU takes interrupt, begins transfer
  - Instructs DMA engine/device to place data @ certain address
- 3. Device/DMA engine handle the transfer
  - CPU is free to execute other things
- 4. Upon completion, Device/DMA engine interrupt the CPU again
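The four steps can be mimicked with a small simulation. This is a sketch under stated assumptions: `DMAEngine`, its field names, and the 4-byte burst size are hypothetical, a `bytearray` stands in for main memory, and the completion interrupt is modelled as a returned string.

```python
from dataclasses import dataclass


@dataclass
class DMAEngine:
    """Toy DMA engine; the fields mirror the registers the CPU writes."""
    mem_addr: int = 0      # memory address to place data
    num_bytes: int = 0     # number of bytes to transfer
    burst: int = 4         # bytes moved per burst (assumed value)

    def run(self, device_data, memory):
        """Copy device data into memory burst by burst, then signal completion."""
        moved = 0
        while moved < self.num_bytes:
            n = min(self.burst, self.num_bytes - moved)
            dst = self.mem_addr + moved
            memory[dst:dst + n] = device_data[moved:moved + n]
            moved += n
        return "interrupt: transfer complete"


memory = bytearray(32)                    # stand-in for main memory
packet = b"hello, DMA!"                   # data waiting in the device

# Steps 1-2: the CPU takes the device interrupt and programs the engine.
dma = DMAEngine(mem_addr=8, num_bytes=len(packet))
# Step 3: the engine moves the data; a real CPU would run other code meanwhile.
status = dma.run(packet, memory)
# Step 4: the completion interrupt tells the CPU the data is in memory.
```

The CPU only touches the engine's registers; it never executes a load or store for the payload itself.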

## **DMA: Outgoing Data**

- 1. CPU decides to initiate transfer, confirms that external device is ready
- 2. CPU begins transfer
  - Instructs DMA engine/device that data is available @ certain address
- 3. Device/DMA engine handle the transfer
  - CPU is free to execute other things
- 4. Device/DMA engine interrupt the CPU again to signal completion

## DMA: Some new problems

- Where in the memory hierarchy do we plug in the DMA engine? Two extremes:
  - Between L1 and CPU:
    - Pro: Free coherency
    - Con: Trash the CPU's working set with transferred data
  - Between Last-level cache and main memory:
    - Pro: Don't mess with caches
    - Con: Need to explicitly manage coherency

## DMA: Some new problems

- How do we arbitrate between CPU and DMA Engine/Device access to memory? Three options:
  - Burst Mode
    - Start transfer of data block, CPU cannot access memory in the meantime
  - Cycle Stealing Mode
    - DMA engine transfers a byte, releases control, then repeats; this interleaves processor and DMA engine accesses
  - Transparent Mode
    - DMA transfer only occurs when CPU is not using the system bus
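To compare the three modes, here is a minimal bus-ownership sketch. The function `schedule` and its one-unit-per-cycle model are assumptions for illustration; real bus arbitration is considerably more involved.

```python
def schedule(mode, cpu_wants_bus, dma_units):
    """Return who owns the bus each cycle: 'C' (CPU), 'D' (DMA) or '-' (idle)."""
    owners = []
    last = "C"
    for wants in cpu_wants_bus:
        if dma_units == 0:
            dma_turn = False                  # nothing left to transfer
        elif mode == "burst":
            dma_turn = True                   # DMA holds the bus until it is done
        elif mode == "cycle_stealing":
            # Steal one cycle, then hand the bus back if the CPU wants it.
            dma_turn = last != "D" or not wants
        elif mode == "transparent":
            dma_turn = not wants              # only use cycles the CPU leaves free
        else:
            raise ValueError(f"unknown mode: {mode}")
        if dma_turn:
            dma_units -= 1
            last = "D"
        else:
            last = "C" if wants else "-"
        owners.append(last)
    return "".join(owners)


cpu_wants = [True, True, False, True, True, True]
burst = schedule("burst", cpu_wants, 3)
stealing = schedule("cycle_stealing", cpu_wants, 3)
transparent = schedule("transparent", cpu_wants, 3)
```

With this input, burst mode locks the CPU out for the first three cycles, cycle stealing alternates, and transparent mode finishes only one of its three units because the CPU rarely leaves the bus idle.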

## Agenda

- Direct Memory Access (DMA)
- Disks
- Networking

## **Computer Memory Hierarchy**



## Magnetic Disk – common I/O device

- A kind of computer memory
  - Information stored by magnetizing ferrite material on surface of rotating disk
    - similar to tape recorder except digital rather than analog data
- A type of non-volatile storage
  - retains its value without applying power to disk.
- Two Types of Magnetic Disk
  - 1. Hard Disk Drives (HDD): faster, more dense, non-removable.
  - 2. Floppy disks: slower, less dense, removable (now replaced by USB "flash drive").
- Purpose in computer systems (Hard Drive):
  - 1. Working file system + long-term backup for files
  - 2. Secondary "backing store" for main-memory. Large, inexpensive, slow level in the memory hierarchy (virtual memory)

#### Photo of Disk Head, Arm, Actuator



## **Disk Device Terminology**



- Several platters, with information recorded magnetically on both surfaces (usually)
- Bits recorded in <u>tracks</u>, which in turn divided into <u>sectors</u> (e.g., 512 Bytes)
- <u>Actuator</u> moves <u>head</u> (end of <u>arm</u>) over track (<u>"seek"</u>), waits for <u>sector</u> to rotate under <u>head</u>, then reads or writes

## Hard Drives are Sealed. Why?

- The closer the head to the disk, the smaller the "spot size" and thus the denser the recording.
  - Measured in Gbit/in^2
  - ~900 Gbit/in^2 is state of the art
  - Started out at 2 Kbit/in^2
  - ~450,000,000x improvement in ~60 years
- Disks are sealed to keep the dust out.
  - Heads are designed to "fly" at around 3-20 nm above the surface of the disk.
  - 99.999% of the head/arm weight is supported by the air bearing force (air cushion) developed between the disk and the head.







# Disk Device Performance (1/2)

- Disk Access Time = Seek Time + Rotation Time + Transfer Time + Controller Overhead
  - Seek Time = time to position the head assembly at the proper cylinder
  - Rotation Time = time for the disk to rotate to the point where the first sectors of the block to access reach the head
  - Transfer Time = time taken by the sectors of the block and any gaps between them to rotate past the head

# Disk Device Performance (2/2)

- Average values to plug into the formula:
- Rotation Time: Average distance of sector from head?
  - 1/2 time of a rotation
    - 7200 Revolutions Per Minute => 120 Rev/sec
    - 1 revolution = 1/120 sec => 8.33 milliseconds
    - 1/2 rotation (revolution) => 4.17 ms
- Seek time: Average no. tracks to move arm?
  - Number of tracks/ 3
  - Then, seek time = number of tracks moved × time to move across one track
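The "number of tracks / 3" figure can be justified with a short calculation. As a simplifying assumption (not stated on the slide), treat the current head position $X$ and the target track $Y$ as independent and uniformly distributed over $[0, N]$, where $N$ is the number of tracks. The average distance the arm moves is then:

$$
E\big[|X - Y|\big]
= \frac{1}{N^2}\int_0^N\!\!\int_0^N |x - y|\,dy\,dx
= \frac{2}{N^2}\int_0^N\!\!\int_0^x (x - y)\,dy\,dx
= \frac{2}{N^2}\int_0^N \frac{x^2}{2}\,dx
= \frac{2}{N^2}\cdot\frac{N^3}{6}
= \frac{N}{3}
$$

Real workloads are rarely uniform, so this is only a rough planning value.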

## But wait!

- Performance estimates are different in practice:
- Many disks have on-disk caches, which are completely hidden from the outside world
  - On a cache hit, the previous formula is completely replaced by the on-disk cache access time

## Where does Flash memory come in?

- ~10 years ago: Microdrives and Flash memory (e.g., CompactFlash) went head-to-head
  - Both non-volatile (retains contents without power supply)
  - Flash benefits: lower power, no crashes (no moving parts, need to spin µdrives up/down)
  - Disk cost = fixed cost of motor + arm mechanics, but actual magnetic media cost very low
  - Flash cost = mostly cost/bit of flash chips
  - Over time, cost/bit of flash came down, became cost competitive



[Image: CompactFlash card, 2 GB]

## Flash Memory / SSD Technology



- NMOS transistor with an additional conductor between gate and source/drain which "traps" electrons. The presence/absence of trapped electrons encodes a 1 or 0.
- Memory cells can withstand a limited number of program-erase cycles. Controllers use a technique called *wear leveling* to distribute writes as evenly as possible across all the flash blocks in the SSD.

#### What did Apple put in its iPods?



### Flash Memory in Smart Phones



## Flash Memory in Laptops – Solid State Drive (SSD)

- Capacities up to 1 TB

## HDD vs SSD speed





## Question

- We have the following disk:
  - 15000 Cylinders, 1 ms to cross 1000 Cylinders
  - 15000 RPM = 4 ms per rotation
  - Want to copy 1 MB, transfer rate of 1000 MB/s
  - 1 ms controller processing time
- What is the access time using our model?

Disk Access Time = Seek Time + Rotation Time + Transfer Time + Controller Processing Time

| A       | B    | C      | D       | E     |
|---------|------|--------|---------|-------|
| 10.5 ms | 9 ms | 8.5 ms | 11.4 ms | 12 ms |

## Question

- We have the following disk:
  - 15000 Cylinders, 1 ms to cross 1000 Cylinders
  - 15000 RPM = 4 ms per rotation
  - Want to copy 1 MB, transfer rate of 1000 MB/s
  - 1 ms controller processing time
- What is the access time?

- Seek = (# cylinders / 3) × time per cylinder = (15000 / 3) × (1 ms / 1000 cylinders) = 5 ms
- Rotation = time for $\frac{1}{2}$ rotation = 4 ms / 2 = 2 ms
- Transfer = size / transfer rate = 1 MB / (1000 MB/s) = 1 ms
- Controller = 1 ms
- Total = 5 + 2 + 1 + 1 = 9 ms (answer B)
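The same arithmetic as a small Python helper, so other disk parameters can be tried. The function name `disk_access_time_ms` and its parameter names are made up for this sketch; the model itself is the four-term formula from the slides.

```python
def disk_access_time_ms(cylinders, ms_per_1000_cyl, rpm,
                        size_mb, rate_mb_per_s, controller_ms):
    """Average access time under the lecture's simple model, in milliseconds."""
    seek = (cylinders / 3) * ms_per_1000_cyl / 1000   # average seek: 1/3 of all cylinders
    rotation = 0.5 * 60_000 / rpm                     # half a revolution on average
    transfer = 1000 * size_mb / rate_mb_per_s         # seconds -> milliseconds
    return seek + rotation + transfer + controller_ms


# Parameters from the question: 5 ms seek + 2 ms rotation + 1 ms transfer + 1 ms controller.
total = disk_access_time_ms(cylinders=15000, ms_per_1000_cyl=1, rpm=15000,
                            size_mb=1, rate_mb_per_s=1000, controller_ms=1)
```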

## Agenda

- Direct Memory Access (DMA)
- Disks
- Networking

## Networks: Talking to the Outside World

- Originally sharing I/O devices between computers
  - E.g., printers
- Then communicating between computers
  - E.g., file transfer protocol
- Then communicating between people
  - E.g., e-mail
- Then communicating between networks of computers
  - E.g., file sharing, www, ...

www.computerhistory.org/internet\_history

# The Internet (1962)

#### History

- 1963: JCR Licklider, while at DoD's ARPA, writes a memo describing desire to connect the computers at various research universities: Stanford, Berkeley, UCLA, ...
- 1969: ARPA deploys 4 "nodes" @ UCLA, SRI, Utah, & UCSB
- 1973: Robert Kahn & Vint Cerf invent <u>TCP</u>, now part of the <u>Internet Protocol Suite</u>
- Internet growth rates
  - Exponential since start!



www.greatachievements.org/?id=3736
en.wikipedia.org/wiki/Internet\_Protocol\_Suite

en.wikipedia.org/wiki/History\_of\_the\_World\_Wide\_Web

# The World Wide Web (1989)

- "System of interlinked hypertext documents on the Internet"
- History
  - 1945: Vannevar Bush describes hypertext system called "memex" in article
  - 1989: Sir Tim Berners-Lee proposed and implemented the first successful communication between a Hypertext Transfer Protocol (HTTP) client and server using the internet.
  - ~2000 Dot-com entrepreneurs rushed in, 2001 bubble burst
- Today : Access anywhere!





[Images: Tim Berners-Lee; world's first web server in 1990]

## Shared vs. Switch-Based Networks

- Shared vs. Switched:
  - Shared: 1 at a time (CSMA/CD)
  - Switched: pairs (<u>"point-to-point"</u> connections) communicate at same time
- Aggregate bandwidth (BW) in switched network is many times that of shared:
  - point-to-point faster since no arbitration, simpler interface



### What makes networks work?

- Links connecting switches and/or routers to each other and to computers or devices



- Ability to name the components and to route packets of information - messages - from a source to a destination
- Layering, redundancy, protocols, and encapsulation as means of <u>abstraction</u> (big idea in Computer Architecture)

## Software Protocol to Send and Receive

- SW Send steps
  - 1: Application copies data to OS buffer
  - 2: OS calculates checksum, starts timer
  - 3: OS sends data to network interface HW and says start
- SW Receive steps
  - 3: OS copies data from network interface HW to OS buffer
  - 2: OS calculates checksum, if OK, send ACK; if not, <u>delete message</u> (sender resends when timer expires)
  - 1: If OK, OS copies data to user address space, & signals application to continue

[Figure: packet format. Fields in order: Dest Net ID | Src Net ID | Len | ACK INFO | CMD / Address / Data | Checksum, grouped into Header, Payload, and Trailer]
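The send and receive steps can be sketched as two small functions. Everything here is illustrative: the byte-sum `checksum`, the dictionary packet layout, and the function names are assumptions, the retransmission timer is omitted, and the ACK is modelled as a boolean return value.

```python
def checksum(payload: bytes) -> int:
    """Toy checksum: sum of the bytes modulo 2**16 (not a real protocol's checksum)."""
    return sum(payload) % 65536


def send(payload: bytes) -> dict:
    """Sender side: copy into an 'OS buffer', then add length and checksum."""
    buf = bytes(payload)                      # 1: application data copied to OS buffer
    return {"len": len(buf), "payload": buf,  # 2: checksum computed (timer omitted)
            "checksum": checksum(buf)}        # 3: packet handed to the network interface


def receive(packet: dict):
    """Receiver side: return (ack, data); a bad packet is dropped without an ACK."""
    buf = packet["payload"]                   # 3: copy from network interface to OS buffer
    if checksum(buf) != packet["checksum"]:   # 2: verify checksum
        return False, None                    #    drop; sender resends after its timeout
    return True, buf                          # 1: deliver to the application


good = send(b"lw/sw")
corrupted = dict(good, payload=b"lw/sx")      # one byte changed in transit
```

`receive(good)` acknowledges and delivers the data, while `receive(corrupted)` returns no ACK, which is what lets the sender's timer trigger a resend.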

## **Protocols** for Networks of Networks?

What does it take to send packets across the globe?

- Bits on wire or air
- Packets on wire or air
- Deliver packets within a single physical network
- Deliver packets across multiple networks
- Ensure the destination received the data
- Create data at the sender and make use of the data at the receiver

## Protocol for Networks of Networks?

Lots to do and at multiple levels!

Use <u>abstraction</u> to cope with <u>complexity of communication</u>

- Hierarchy of layers:
  - Application (chat client, game, etc.)
  - Transport (TCP, UDP)
  - Network (IP)
  - Data Link Layer (Ethernet)
  - Physical Link (copper, wireless, etc.)

### **Protocol Family Concept**

- *Protocol*: packet structure and control commands to manage communication
- *Protocol families (suites)*: a set of cooperating protocols that implement the network stack
- Key to protocol families is that communication occurs logically at the same level of the protocol, called peer-to-peer...
  - ...but is implemented via services at the next lower level
- Encapsulation: carry higher level information within lower level "envelope"

## Inspiration...

- CEO A writes letter to CEO B ("Your days are numbered.")
  - Folds letter and hands it to assistant
- Assistant
  - Puts letter in envelope with CEO B's full name
- FedEx Office
  - Puts letter in larger envelope
  - Puts name and street address on FedEx envelope
  - Puts package on FedEx delivery truck
- FedEx delivers to other company

## The Path of the Letter

- "Peers" on each side understand the same things
- No one else needs to
- Lowest level has most packaging



## **Protocol Family Concept**




Each lower level of stack "encapsulates" information from layer above by adding header and trailer.
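A minimal sketch of encapsulation, assuming made-up text tags such as `[IP]` and `[/IP]` in place of real binary headers and trailers. The names `LAYERS`, `encapsulate`, and `decapsulate` are hypothetical, and real TCP and IP add headers only, so the symmetric trailers here are a simplification.

```python
LAYERS = ["TCP", "IP", "ETH"]   # transport, network, data link (top to bottom)


def encapsulate(message: bytes) -> bytes:
    """Going down the stack, each layer wraps what it receives from above."""
    data = message
    for name in LAYERS:
        header = f"[{name}]".encode()
        trailer = f"[/{name}]".encode()
        data = header + data + trailer
    return data


def decapsulate(frame: bytes) -> bytes:
    """Going up the stack, each layer strips only its own header and trailer."""
    data = frame
    for name in reversed(LAYERS):
        header = f"[{name}]".encode()
        trailer = f"[/{name}]".encode()
        assert data.startswith(header) and data.endswith(trailer)
        data = data[len(header):len(data) - len(trailer)]
    return data


frame = encapsulate(b"hello")
```

The lowest layer ends up outermost, matching the "lowest level has most packaging" observation from the letter analogy, and each layer on the receiving side only needs to understand its own wrapper.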

## Most Popular Protocol for Network of Networks

- <u>Transmission Control Protocol/Internet Protocol (TCP/IP)</u>
- This protocol family is the basis of the Internet, a WAN (wide area network) protocol
  - IP makes best effort to deliver
    - Packets can be lost, corrupted
  - TCP guarantees delivery
  - TCP/IP so popular it is used even when communicating locally: even across homogeneous LAN (local area network)

## TCP/IP packet, Ethernet packet, protocols

- Application sends message
- TCP breaks into 64KiB segments, adds 20B header
- IP adds 20B header, sends to network
- If Ethernet, broken into 1500B packets with headers, trailers
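As a rough feel for the numbers, the sketch below counts how many Ethernet packets one message needs under the slide's simplified model. The function name and constants are assumptions for illustration; the count ignores Ethernet's own headers and trailers and the per-fragment IP headers, and real TCP stacks usually pick a segment size that avoids this fragmentation altogether.

```python
SEGMENT_BYTES = 64 * 1024   # TCP segment payload size used on the slide
TCP_HEADER = 20             # bytes
IP_HEADER = 20              # bytes
ETH_PAYLOAD = 1500          # bytes carried per Ethernet packet


def ethernet_packets(message_bytes: int) -> int:
    """Count Ethernet packets needed for one message under the slide's model."""
    total = 0
    remaining = message_bytes
    while remaining > 0:
        segment = min(SEGMENT_BYTES, remaining)
        ip_packet = segment + TCP_HEADER + IP_HEADER
        total += (ip_packet + ETH_PAYLOAD - 1) // ETH_PAYLOAD   # ceiling division
        remaining -= segment
    return total


small = ethernet_packets(1_000)      # fits in a single Ethernet packet
large = ethernet_packets(100_000)    # two TCP segments: 44 + 24 Ethernet packets
```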



## "And in conclusion..."

- I/O gives computers their 5 senses
- I/O speed range is 100-million to one
- Polling vs. Interrupts
- DMA to avoid wasting CPU time on data transfers
- Disks for persistent storage, replaced by flash
- Networks: computer-to-computer I/O
  - Protocol suites allow networking of heterogeneous components. Abstraction !!!