Lab 1 - Combinational Logic and ALUs

CSE 372 (Spring 2007): Digital Systems Organization and Design Lab

Preliminary Demo by 7pm on Friday, February 9th

Final Demo by 7pm on Thursday, February 15th

Writeup due in-class on Friday, February 16th

This lab is to be done individually.

This lab is worth 25 points.

Overview

In this lab, you will construct the ALU (Arithmetic/Logical Unit) for a P37X ISA processor. Before you can build the ALU, you need to create a few building blocks (4-bit adder, 16-bit adder, 16-bit multiplier, 16-bit shifter) which you will then combine to form an ALU.

Preliminaries

Before you begin, walk through another tutorial: the ModelSim simulation tutorial. It covers simulating designs to verify that they are correct and debugging them when they are not.

Specifics

Design and test the following combinational logic structures.

Important

Before you write any Verilog code, first create a hand-drawn schematic diagram of the circuit with all wires and input/outputs labeled. Why? When designing hardware, even when using Verilog, you need to be thinking explicitly about the structure and interconnectedness of the circuits. Only when the diagram is complete should you write the Verilog code that corresponds to the circuit. As described below, you need to turn in both the hand-drawn schematic and a printout of the Verilog code.

1. 4-bit Adder

Before creating a 16-bit adder, first create a signed 4-bit ripple-carry adder as a basic building block. It has three inputs: two 4-bit signed values and a 1-bit carry in signal. It has two output values: the 4-bit output and a 1-bit carry out signal. You might want to use the 3-input, 2-output single-bit "full adder" you designed in Lab 0 (or an improved version of it) as a building block.
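As a rough aid for checking simulation results, the 4-bit adder described above can be modeled in software. The following Python sketch is only a reference model, not structural Verilog; the function names (full_adder, ripple_add4, to_signed4) are made up for illustration, and the full-adder equations are the standard ones rather than anything taken from your Lab 0 design.

```python
def full_adder(a, b, cin):
    """One-bit full adder built only from XOR, AND, and OR."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add4(a, b, cin):
    """Add two 4-bit patterns plus a carry-in; return (4-bit sum, carry-out)."""
    total = 0
    carry = cin
    for i in range(4):
        # The carry out of bit i feeds the carry in of bit i + 1.
        bit, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= bit << i
    return total, carry


def to_signed4(x):
    """Interpret a 4-bit pattern as a two's-complement value."""
    return x - 16 if x & 0b1000 else x
```

For example, ripple_add4(0b1111, 0b0001, 0) models adding -1 and 1: the sum bits are 0000 and the carry-out is 1.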

Testing: Test the adder both in simulation and on the board. To test the adder on the board, hook your 4-bit adder inputs to two sets of four input switches on the extension board; hook the outputs to five LEDs on the extension board.

2. 16-bit Adder

The 16-bit adder takes in two 16-bit signed values and a single-bit carry-in signal. It has a single 16-bit signed output.

Implementation: For comparison purposes, create three different adder implementations (using the 4-bit adder specified above):

A 16-bit ripple-carry adder made up of 4-bit adders.
A 16-bit two-segment carry-select adder made up of 8-bit select segments.
A 16-bit four-segment carry-select adder made up of 4-bit select segments.

In the lab writeup, compare the delay (in nanoseconds) and area (in terms of lookup tables or "LUTs") of these three different adder implementations.

See the CSE 371 lecture notes for more information on carry-select adders.
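The idea behind the carry-select variants can be sketched as a small Python model: each segment computes its sum for both possible carry-in values, and the real carry then selects one of the two results. This is an illustrative software model under assumed names (ripple_add, carry_select_add16, segment_bits), not the required structural Verilog. A real design would normally not duplicate the lowest segment, since its carry-in is known directly.

```python
def ripple_add(a, b, cin, width):
    """Ripple-carry addition of two width-bit patterns; returns (sum, carry-out)."""
    total, carry = 0, cin
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return total, carry


def carry_select_add16(a, b, cin, segment_bits):
    """16-bit carry-select addition with segments of segment_bits bits each."""
    mask = (1 << segment_bits) - 1
    result = 0
    carry = cin
    for lo in range(0, 16, segment_bits):
        a_seg = (a >> lo) & mask
        b_seg = (b >> lo) & mask
        # Both candidate results exist before the incoming carry is known.
        sum0, cout0 = ripple_add(a_seg, b_seg, 0, segment_bits)
        sum1, cout1 = ripple_add(a_seg, b_seg, 1, segment_bits)
        # The incoming carry acts as the select input of a 2-to-1 mux.
        seg_sum, carry = (sum1, cout1) if carry else (sum0, cout0)
        result |= seg_sum << lo
    return result, carry
```

Calling it with segment_bits=8 models the two-segment version and segment_bits=4 models the four-segment version. Both should agree with the ripple-carry adder on every input; only delay and area differ in hardware.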

Testing: Test the adder both in simulation and on the board. Unfortunately, the extension boards do not have enough switches to represent two 16-bit inputs. As an incomplete workaround, test the adder on the board by sign-extending each of the two sets of four input switches on the extension board to 16 bits; hook the eight low-order bits of the 16-bit output to the eight LEDs on the extension board. This setup will give you partial test coverage (enough to demonstrate the design is basically working).

3. 16-bit Multiplier

The 16-bit multiplier takes in two 16-bit signed values. It has a single 16-bit signed output. The multiplier is single-cycle and fully combinational (in contrast, a sequential multiplier takes multiple cycles and latches intermediate values).

Implementation: The most straightforward implementation chains 15 of the 16-bit adders you just created to add up the 16 partial products. You'll also need to use some multiplexors, ranged bit selection, and/or other combinational logic.

Important

Do not use the shifters described below within your multiplier. Shifting by a known value can be done easily and more efficiently using bit selection and concatenation operations.
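A minimal Python model of the shift-and-add scheme may help when working out expected values. It assumes the usual arrangement in which partial product i is the first input shifted left by the constant i and gated by bit i of the second input; the names (partial_product, add16, multiply16) are hypothetical, and add16 is just a stand-in for whichever 16-bit adder you instantiate.

```python
MASK16 = 0xFFFF


def add16(a, b):
    """Stand-in for the 16-bit adder module; keeps only the low 16 bits."""
    return (a + b) & MASK16


def partial_product(a, b, i):
    """Partial product i: a shifted left by the constant i, gated by bit i of b."""
    # In hardware a constant shift is just wiring: the low (16 - i) bits of a
    # concatenated with i zero bits. No shifter module is involved.
    shifted = (a << i) & MASK16
    return shifted if (b >> i) & 1 else 0


def multiply16(a, b):
    """Sum the 16 partial products with a chain of 15 additions."""
    acc = partial_product(a, b, 0)
    for i in range(1, 16):
        acc = add16(acc, partial_product(a, b, i))
    return acc
```

Because only the low 16 bits of the product are kept, the same bit pattern results whether the inputs are read as signed or unsigned: multiply16(0xFFFF, 0xFFFF) gives 0x0001, which matches -1 times -1.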

For comparison purposes, create three different multiplier implementations:

A 16-bit multiplier using the 16-bit ripple-carry adder from above.
A 16-bit multiplier using the 16-bit two-segment carry-select adder from above.
A 16-bit multiplier using the 16-bit four-segment carry-select adder from above.

Note: As you'll be including the 16-bit adder as a structural component, the textual differences between these multipliers should be minor.

In the lab writeup, explain your general multiplier design and compare its delay using these three adders.

Testing: Test the multiplier much like you tested the 16-bit adder.

4. 16-bit Shifter

The shifter unit has three inputs: a 16-bit value, a 4-bit shift amount, and a 2-bit shift type (00 is left shift, 01 is logical right shift, 10 is arithmetic right shift, 11 is no shift). It has a single 16-bit output.

Implementation: There are several ways to implement this shifter. You could create three different shifters using 2-to-1 muxes at each level, and then use a 4-to-1 mux to select among them at the end. An alternative implementation would use four copies of the 4-to-1 mux to select between the three kinds of shifts and no shift at all at each stage.
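The staged structure can be illustrated with a small Python model. It assumes four stages that shift by 1, 2, 4, and 8 positions, with bit k of the shift amount enabling stage k; that staging and the names (shift_stage, shift16) are assumptions made for this sketch, not something the lab prescribes.

```python
MASK16 = 0xFFFF


def shift_stage(value, amount, shift_type):
    """Shift a 16-bit pattern by a fixed amount (1, 2, 4, or 8)."""
    if shift_type == 0b00:  # left shift, zeros enter on the right
        return (value << amount) & MASK16
    if shift_type == 0b01:  # logical right shift, zeros enter on the left
        return value >> amount
    if shift_type == 0b10:  # arithmetic right shift, copies of the sign bit enter
        sign = (value >> 15) & 1
        fill = (MASK16 << (16 - amount)) & MASK16 if sign else 0
        return (value >> amount) | fill
    return value  # 0b11: no shift


def shift16(value, shamt, shift_type):
    """Apply stages of 1, 2, 4, and 8; bit k of shamt enables stage k."""
    for k in range(4):
        if (shamt >> k) & 1:
            value = shift_stage(value, 1 << k, shift_type)
    return value
```

For instance, shift16(0x8000, 3, 0b10) returns 0xF000, while the same call with type 0b01 returns 0x1000.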

Testing: Test the shifters much like you tested the 16-bit adder. Use an additional two switches to specify the specific shift operation.

5. ALU

The ALU has three inputs: two 16-bit signed values and a 4-bit control signal that determines which operation the ALU should perform. The ALU has a single 16-bit signed output, which is the result of the operation. The ALU can perform ten operations:

Description	Insn	Control
Addition	ADD	0 100
Subtraction	SUB	0 101
Multiplication	MUL	0 110
Bitwise or	OR	1 000
Bitwise not	NOT	1 001
Bitwise and	AND	1 010
Bitwise xor	XOR	1 011
Shift left logical	SLL	1 100
Shift right logical	SRL	1 101
Shift right arithmetic	SRA	1 110

A few notes:

The control signal corresponds directly to the encoding of the P37X ISA for the opcodes 0000 and 0001. The first control bit is the last bit of the 4-bit opcode; the remaining three bits are the last three bits in the instruction.
If any control signal other than those specified is given to the ALU, the ALU should set all 16 output bits to zero.
The NOT operation returns the bitwise inverse of the first ALU input, and it ignores the second input.
The SUB operation computes output = input1 - input2.
For the shift operations, input1 is the value to be shifted and the four lower bits of input2 determine the amount of the shift (0 to 15 bit positions).
The arithmetic shift right performs sign extension; in contrast, the logical shift right performs zero extension.

Implementation: The ALU should instantiate a single 16-bit adder (also used for subtract), a 16-bit multiplier, and a left/right shifter. Use the outputs from these modules and some combinational logic to generate all ten possible values. Finally, use a 16-to-1 multiplexer to select the correct signal.
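To compute expected ALU outputs for test cases, a behavioral reference model along these lines could be used. It is a Python sketch with a hypothetical function name (alu), written from the control table and notes above. SUB is expressed as input1 + ~input2 + 1, one common way to reuse an adder for subtraction (invert the second operand and set the carry-in to 1); treat that as an assumption about your design rather than a requirement.

```python
MASK16 = 0xFFFF


def alu(in1, in2, control):
    """Return the 16-bit ALU result for a 4-bit control value."""
    in1 &= MASK16
    in2 &= MASK16
    shamt = in2 & 0xF  # only the four low bits of input2 set the shift amount
    sign_fill = (MASK16 << (16 - shamt)) & MASK16 if in1 & 0x8000 else 0
    results = {
        0b0100: in1 + in2,                   # ADD
        0b0101: in1 + (~in2 & MASK16) + 1,   # SUB as in1 + ~in2 + 1
        0b0110: in1 * in2,                   # MUL (low 16 bits kept below)
        0b1000: in1 | in2,                   # OR
        0b1001: ~in1,                        # NOT (ignores in2)
        0b1010: in1 & in2,                   # AND
        0b1011: in1 ^ in2,                   # XOR
        0b1100: in1 << shamt,                # SLL
        0b1101: in1 >> shamt,                # SRL
        0b1110: (in1 >> shamt) | sign_fill,  # SRA
    }
    # Any control value not in the table yields all zeros.
    return results.get(control, 0) & MASK16
```

For example, alu(3, 5, 0b0101) returns 0xFFFE (the 16-bit pattern for -2), and alu(1, 2, 0b0000) returns 0 because 0000 is not a defined control value.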

Testing: Test the ALU much like you tested the 16-bit adder, but use an additional four switches (the small switches on the main FPGA board) as the 4-bit control signal.

Lab Logistics

Verilog Restrictions

This lab should be implemented using only low-level structural Verilog and the assign statement. You are not allowed to use the following Verilog operators: +, -, *, /, <<, >>, etc. However, you are allowed to use the following operators: ~, &, |, ^, ==, !=, ?:, {}, etc. If you're not sure if you're allowed to use a certain Verilog construct, just ask (post a message on the newsgroup, send an e-mail, etc.).

LEDs and Switches

We'll be using an extension board that contains additional LEDs and switches. See lab1.v and lab1.ucf for a top-level Verilog module and mappings for the LED and switch pins.

Note

The switches on the extension boards are "active high", but (as described in Lab 0) the LEDs and the switches on the main board are "active low" signals.

Simulation Tutorial

Don't forget to walk through the ModelSim simulation tutorial before you begin.

Testing Verilog Designs for Fun and Profit

In an effort to make the testing process slightly less painful, your friendly CSE 372 TAs have put together a testbench framework using behavioral Verilog. The testbench is available here: lab1_testbench.v.

The code is straightforward: the Unit Under Test (UUT) is instantiated at the top. The testbench code basically just reads a series of values from a file (named, by default, lab1.input.test) to send as input to the UUT. Expected output values are also read from this file, and compared with the UUT's actual output.

To make the testbench work, make sure you have both the testbench Verilog module and the test input file. Both can reside in the main directory of your Xilinx project, and they should work fine for both Xilinx and ModelSim.

Currently, the file lab1.input.test is pretty skimpy - you'll need to flesh it out with your own test cases. For example, you might want to make a testbench for each module or at least make input files to stress each part.

A couple of features:

Lines in the test input file that begin with a "#" (hash) character are ignored as comments (but see "Caveats" below).
Currently, values read from the test input file (other than the control signal for the ALU) must be in decimal.

Caveats: When a comment is read from the test input file, it causes the ALU control signal to dip to 0x0000 temporarily. This is usually not an issue, as 0x0000 is an invalid control signal, but if you write many lines of comments in succession, this can cause some bizarre timing issues.
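One way to flesh out the test file is to generate corner cases with a short script. The Python sketch below builds adder test lines from a list of boundary values. The line layout ("input1 input2 expected", space-separated, signed decimal) is purely an assumption for illustration, as are the names CORNERS, to_signed16, and adder_cases; check lab1_testbench.v and the supplied lab1.input.test for the actual field order and number format before using anything like this.

```python
def to_signed16(x):
    """Wrap an integer to 16 bits and read it as two's complement."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


# Boundary values that exercise carries, sign changes, and alternating bits.
CORNERS = [0, 1, -1, 2, 0x7FFF, -0x8000, 0x00FF, 0x0100, 0x5555, -0x5556]


def adder_cases():
    """Return test lines for the adder in the assumed 'a b expected' layout."""
    # A single leading comment line: consecutive comment lines are avoided
    # because of the timing caveat described above.
    lines = ["# adder corner cases"]
    for a in CORNERS:
        for b in CORNERS:
            lines.append(f"{a} {b} {to_signed16(a + b)}")
    return lines
```

The returned list can be written out with "\n".join(adder_cases()). Note that 32767 plus 1 wraps to -32768 in this model, which is the 16-bit two's-complement result the adder should produce.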

Delay and Resource Usage

The delay and resource usage of your design can be found in various reports:

Synthesis-only Timing: Approximate timing information can be found in the Timing Summary section of the report generated by the "Synthesize - XST -> View Synthesis Report" process.
Post Place and Route Timing: For a more accurate timing report, in the ISE Project Navigator, double-click on the View Design Summary process. Under the Detailed Reports section, click on Static Timing Report.
Timing Tuning/Debugging: To better understand the source of delay in your design, use the "Timing Analyzer" tool by running the process "Implement Design -> Place & Route -> Generate Post-Place & Route Static Timing -> Analyze Post-Place & Route Static Timing" from the Processes hierarchy.

When reporting timing results, use the "Post Place and Route Timing" information.

Demos

For this lab, there is a preliminary and final demo.

Preliminary demo: (1) Demonstrate your 4-bit adder to the TAs both in simulation and by using the switches and LEDs. (2) Show the TAs hand-drawn schematics of the 16-bit adders and 16-bit multiplier.
Final demo: (1) Demonstrate the entire ALU both in simulation and with the boards. (2) Show the TAs hand-drawn schematics for each component of the design.

What to Turn In

For each of the designs, turn in:

Lab analysis (see below)
Hand-drawn schematics. It is okay if these are a little messy, but they should accurately represent the Verilog code you turned in. Do not waste your time making pretty computerized schematics; the whole point of using Verilog is to save the tedium of making picture-perfect schematics. Be sure to label all input and output signals.
Verilog code. Your Verilog code should be well-formatted, easy to understand, and include comments where appropriate (for example, use a comment to describe all the inputs and outputs to your Verilog modules). Some part of the project grade will be dependent on the style and readability of your Verilog, including formatting, comments, good signal names, and proper use of hierarchy.

Please put the lab analysis first; interleave the schematics with the Verilog code for each module.

Lab Analysis

Answer the following questions for your lab report. When reporting timing results, use the information from the "Post Place and Route Timing" report.

Make a table of the delay (in nanoseconds) and resource usage (in LUTs) for the three 16-bit adder designs, the three 16-bit multiplier designs, and the entire ALU using just the fastest adder and multiplier.
How much faster is the fastest adder than the slowest adder? How much more area does it require?
How much faster is the fastest multiplier than the slowest multiplier? How much more area does it require?
Is the difference between these two speedups surprising to you? What might explain why the two speedups are not more similar?
How much larger (in terms of resources consumed) is your multiplier than the corresponding adder? Is this ratio higher or lower than you expected? What might explain the difference from the expected ratio?
What problems, if any, did you encounter while doing this lab?
How many hours did it take you to complete this assignment? On a scale of 1 (least) to 5 (most), how difficult was this assignment?

Note

Because part of your grade is based on your lab writeups, they should be clear, concise, and neat (preferably typed). You could have the greatest design in the world, but if you cannot convey your idea clearly to the graders and convince them that it works, you will not get good marks. Your lab writeups should include a brief explanation of what the circuits are supposed to do and how they do it.

Addendum

[February 6] See Xilinx ISE Tips