Alon Horev

My stderr about Python, databases and the web.

A Hardware Lesson for Software Developers: From Logical Gates to the Game of Life

| Comments

A week ago I’ve decided to learn VHDL and as an exercise, implement The game of life.
VHDL is a hardware description language. Basically, it’s a programming language used by chip designers.

This post is an introduction to VHDL.

Why learn VHDL?
I thought it’d be interesting to see what it takes to program a chip. Coming from software, I was specifically curious about the programming model and mind set. Think about it, all software eventually translates to CPU instructions. What’s underneath that?..

It’s my belief that every programmer should be familiar with the layers he relies upon.
Not every Python programmer knows C, and not every C programmer knows assembler. But when the shit hits the fan, this knowledge comes in handy.

Prerequisites
I’m going to assume you know nothing about hardware and a little about software.
VHDL is based on Ada so you might even find it familiar.

Intro - Booleans All The Way Down

Most hardware these days is based on boolean logic, here’s why:

  • Boolean algebra has been around for ages - simple boolean operations like ‘and’, ‘or’ and ‘not’ can be composed into basic building blocks used to construct more sophisticated logic (calculators, modems, CPUs!).
  • Passing boolean data is easy in hardware, you pass electricity through a wire to indicate a true state and stop passing electricity to indicate a false state.

Light bulb moment: ever wondered why integers in software are powers of two?
A CPU has a fixed amount of wires (32/64) dedicated to represent data (a.k.a word size).
Each wire represents a bit that can be true or false (1 or 0).
The binary numeral system is used to represent a number in a bunch of 1s and 0s.

Dive Into VHDL - The Life Of A Gate

Let’s start off with an implementation of an ‘and’ gate:

And Gate (and.vhdl) download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
library ieee;
use ieee.std_logic_1164.all;    

entity and_entity is
  port( 
    x, y: in std_logic;
    result: out std_logic
    );
end and_entity;

architecture arch of and_entity is
begin
  result <= x and y;
end arch;

What’s happening in the code line by line:

1-2: Include the module that exports the std_logic type which represents a bit (can be 0 or 1).
4: Define a new entity. An entity is analogous to an object in object oriented programming.
5: The entity interacts with other entities using ports. Input ports are used to receive information and output ports are used to send information.
6: Define two inputs of type std_logic.
7: Define one output of type std_logic.
11: The architecture is the implementation of the object’s behavior. It’s name (arch) is redundant as only one architecture is allowed.
13: The implementation of an ‘and’ gate. Every time an input changes the output is recalculated.

By using a synthesis tool we can convert this code to a hardware schematic:

Think Parallel - A Simple Adder

We’ll implement an adder that takes two 1-bit sized inputs (x and y) and adds them to a 2-bit sized output (sum_0 and sum_1. sum_1 is the carry). The following table demonstrates the adder’s functionality:

x y sum_1 sum_0 sum as binary sum as int
0 0 0 0 0 0
0 1 0 1 1 1
1 0 0 1 1 1
1 1 1 0 10 2

Now let’s look at the code:

Adder (adder.vhdl) download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
library ieee;
use ieee.std_logic_1164.all;

entity adder is
  port (
    x, y: in std_logic;
    sum_0, sum_1: out std_logic
    );
end adder;

architecture arch of adder is
begin
  sum_0 <= x xor y;
  sum_1 <= x and y;
end arch;

Look at lines 13-14. When does this code execute? Is line 13 executed before line 14?
The answer is: these lines execute all the time, in parallel!

A major difference between software written for CPUs and VHDL is:
VHDL is concurrent by default.

VHDL has the notion of concurrent statements vs. sequential statements.
Sequential statements can be used to implement state machines and other sequential logic.

Testing Using Sequential Code

The hardware manufacturing phase of a chip is long and expensive, therefore, VHDL programmers (often called designers) write tests (often called ‘test-benches’ or ‘simulations’).

The following test code will verify the adder’s functionality by driving different inputs and verifying the output (explanation ahead):

Test Adder (test_adder.vhdl) download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
library ieee;
use ieee.std_logic_1164.all;

entity adder_test is
end adder_test;

architecture arch of adder_test is

  component adder is
    port (
      x, y: in std_logic;
      sum_0, sum_1: out std_logic
    );
  end component;

  signal x, y, sum_0, sum_1: std_logic;

  begin
    adder_0: adder port map (
      x => x, y => y,
      sum_0 => sum_0, sum_1 => sum_1
      );

    process
    begin
      x <= '1';
      y <= '1';
      wait for 1 ns;
      assert sum_1 = '1' and sum_0 = '0'
        report "result should be 2" severity error;
      x <= '0';
      y <= '0';
      wait for 1 ns;
      assert sum_1 = '0' and sum_0 = '0'
        report "result should be 0" severity error;
      x <= '1';
      y <= '0';
      wait for 1 ns;
      assert sum_1 = '0' and sum_0 = '1'
        report "result should be 1" severity error;
      x <= '0';
      y <= '1';
      wait for 1 ns;
      assert sum_1 = '0' and sum_0 = '1'
        report "result should be 1" severity error;

      wait;
    end process;
end arch;

4: Define an entity for the test. This entity has no inputs and outputs.
9-14: Repetition of the adder’s interface (VHDL isn’t very DRY)
16: Signals are wires that the test can manipulate. their type is std_logic (a bit)
19-22: A port map connects the signals to the adder’s ports. I’ve named the signals exactly as the adder’s ports.
24: The process definition. A process contains a list of sequential statements. Multiple processes run concurrently.
26-27: Signal assignments.
28: Signals have an interesting behavior where the assignment takes effect only when calling ‘wait’. This is a feature and not a bug as it allows doing atomic changes over multiple signals.
29: Assert the sum changed. In case an assertion fails, the program will stop with an error.
30 Until the end: Test the remaining cases.

We can instruct the VHDL compiler to generate a waveform of the simulation:

Note: Some VHDL statements are not synthesizable, meaning they cannot be converted to hardware and can only be used for simulations.
An example is the wait statement. A 1 nano second sleep cannot be expressed solely by logical gates. The solution for time-aware components is connecting to a high speed clock (Implemented using a Crystal oscillator), we’ll see an example ahead.

Another Note: Not every integer addition requires a custom adder implementation.
VHDL supports integers and arithmetic operations natively.

The Game Of Life

Let’s start with a typical software-based solution:

  1. The board is a matrix of booleans representing cells.
  2. For each cycle: the next board is generated by iterating over the matrix. For each cell: count it’s neighbors and determine if it should live or die.

When trying to apply this design in VHDL I quickly ran into brick walls. A sequential iteration over an entire matrix is not the VHDL ‘way’. My guess is that a sequential design would generate much more logical gates and perhaps even be slower.

Software is often bound by memory and CPU.
Chip design is often bound by the amount of gates and timing issues, which surface when signals travel through large amounts of gates.

VHDL encourages you to build small components and inter-connect them where needed.
each component has it’s own hardware resources and works in parallel to others.

The design I came up with is the following:

  1. Cell - Is an entity. Connected to it’s neighbors via ports. The cell waits for a clock signal, counts his live neighbors and sets it’s alive state.
  2. Board - Is another entity. Contains a matrix of cells. All it does is inter-connects them.

Here’s the cell’s code:

Cell (cell.vhdl) download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
entity cell is

  generic (
    start_alive : integer range 0 to 1 := 0
  );

  port (
    clock, left, right,
    upper_left, upper, upper_right,
    lower_left, lower, lower_right : in integer range 0 to 1;
    alive : inout integer range 0 to 1 := start_alive
  );

end cell;

architecture arch of cell is
begin

  process (clock)
    variable neighbors: integer range 0 to 8;
    begin
      neighbors := upper_left + upper + upper_right + left +
                   right + lower_left + lower + lower_right;
      if (neighbors = 3) or (alive = 1 and (neighbors) = 2) then
        alive <= 1;
      else
        alive <= 0;
      end if;
    end process;

end arch;

3: Generics are variables that can be set upon entity instantiation. Somewhat similar to constructor arguments in object oriented languages.
4-11: The alive bits are defined as integers of one bit. In contrast to std_logic, integers support arithmetic operations, even though both are implemented as a single wire.
19: Notice the process gets an argument? A process can state a list of signals (also called a sensitivity list) that should trigger it’s invocation. That way, the cell will calculate it’s next state every time the clock changes.
20: Variables are exactly what you think they. Notice we limit the integer’s range to 8? that’s because we’ll have a maximum of 8 neighbors. VHDL will dedicate only 4 wires for that variable.

Testing A Cell

The cell’s test is very similar to the adder’s test so here’s just a snippet:

Cell test cell_test.vhdl
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
...

-- 0 ns
clock <= 1 - clock; -- invert the clock!
wait for 1 ns;
assert alive = 0
  report "cell should start as dead" severity error;

-- 1 ns
left <= 1;
right <= 1;
clock <= 1 - clock;
wait for 1 ns;
assert alive = 0
  report "cell should stay dead with only two live neighbors" severity error;

-- 2 ns
lower <= 1;
clock <= 1 - clock;
wait for 1 ns;
assert alive = 1
  report "cell should come to life" severity error;

...

The following wave form can be generated from the test:

Composing A Board From Cells

I’ll spare you of the board’s code as it contains some boilerplate. Instead, I’ll show it’s hardware specification:

What are you seeing:

  • Each cell is a component encapsulating it’s logical gates.
  • Each cell is connected to it’s neighbors through signals. The left cell is connected to the right cell’s alive signal and vice versa.
  • All cells are connected to the same clock, which triggers the calculation of the next state.

Conclusions

Scalability - How does the design scale for a large amount of cells?

The software-based solution has several limits:

  • Memory-wise: limited by the amount of memory.
  • Speed-wise: iteration is done serially and will be directly proportional to the amount of cells, even if multiple cores are utilized.

The hardware-based solution is said to be scalable:

  • Adding more cells results in adding more logical gates.
  • There is no performance overhead and no arbitrary limit on the number of cells in the matrix.

How different is the mind-set?

One of the most challenging concepts in software is concurrency, it’s interesting how fundamental this topic is to hardware. Here’s how I see it:

Software - write serial code, parallelize when need to scale.
Hardware - write concurrent code, write serial code where needed.

Why bother manufacturing new chips and not re-use CPUs?

  1. Performance of ASICs (Application specific integrated circuits - chips created for a specific task) can be better than a CPU’s (That’s why GPUs became popular).
  2. Power consumption of CPUs can render them unusable for many applications.
  3. Price per unit can drop drastically for large amounts of ASICs (compare a GPS chip to an Intel CPU).

References

  • The github repo contains the source and a makefile for compiling, running tests and generating waveforms.
  • ghdl is a VHDL compiler (that even works on mac).
  • gtkwave is used for visualizing simulations.
  • Xilinix is used for schematic generation.

Comments