Lab 8: Finite State Machines (Matrix Multiply)
TOC

Background

To Do

Implementation Requirements

Testing

Check Off

Report


Objective:                

Use a datapath with finite state machine control to implement a [4x4] * [4x1] = [4x1] matrix multiply.

This lab is worth 50% more than previous labs (150 points instead of 100).

Background

Three-dimensional computer graphics applications require coordinate manipulations such as translation, rotation, and scaling. Three-dimensional coordinates (X,Y,Z) can be transformed into homogeneous coordinates (X',Y',Z',W'), where X=X'/W', Y=Y'/W', and Z=Z'/W'. For most applications, the fourth (W) coordinate is 1. Homogeneous coordinates allow translation, rotation, and scaling to be performed in a consistent manner via matrix multiplication. The matrix multiplication is T * S = S', where T is a [4x4] transformation matrix, S is the original [4x1] coordinate, and S' is the transformed coordinate. Several transformations can be applied to a coordinate by successive matrix multiplications. These operations can be done in software or in specialized hardware, and they are typically performed in floating point.
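To make the homogeneous-coordinate idea concrete, here is a small floating-point sketch (the function names are illustrative, not part of the lab). It shows why W = 1 matters: the fourth column of the matrix only contributes a translation when the fourth coordinate is 1, and successive transformations are just successive T * S products.

```python
# Floating-point illustration of T * S = S' with homogeneous coordinates.
# mat4_vec4, translation and scaling are hypothetical helper names.

def mat4_vec4(t, s):
    """Multiply a 4x4 matrix t (list of 4 rows) by a 4x1 vector s."""
    return [sum(t[r][c] * s[c] for c in range(4)) for r in range(4)]

def translation(dx, dy, dz):
    """Matrix that moves (X, Y, Z) by (dx, dy, dz) when W = 1."""
    return [
        [1.0, 0.0, 0.0, dx],
        [0.0, 1.0, 0.0, dy],
        [0.0, 0.0, 1.0, dz],
        [0.0, 0.0, 0.0, 1.0],
    ]

def scaling(sx, sy, sz):
    """Matrix that scales X, Y, Z independently and leaves W alone."""
    return [
        [sx, 0.0, 0.0, 0.0],
        [0.0, sy, 0.0, 0.0],
        [0.0, 0.0, sz, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]

point = [1.0, 2.0, 3.0, 1.0]  # (X, Y, Z) = (1, 2, 3) with W = 1
moved = mat4_vec4(translation(10.0, 0.0, -1.0), point)
print(moved)  # [11.0, 2.0, 2.0, 1.0]

# Two successive transformations: scale by 2, then translate.
scaled_then_moved = mat4_vec4(translation(10.0, 0.0, -1.0),
                              mat4_vec4(scaling(2.0, 2.0, 2.0), point))
print(scaled_then_moved)  # [12.0, 4.0, 5.0, 1.0]
```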

To Do

You are to implement a [4x4][4x1] matrix multiply using fixed-point arithmetic. The matrix coefficients are 9-bit values, as used for the 'F' variable in the blend equation. The coordinate values are 8-bit unsigned quantities (in actual graphics applications, both the matrix coefficients and the coordinate values would be floating-point numbers).
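A bit-accurate software model of the multiply is handy for predicting what dout should show. The sketch below is only one possible reading of the formats, so treat it as a set of assumptions: coefficients are taken as unsigned 1.8 fixed point (0x100 = 1.0, which is what an identity matrix needs), coordinates as unsigned 0.8 (0xFF is "as close to 1" as the Testing section says), and the product keeps the 8 bits below the binary point by truncation, clamped to 0xFF. If your Lab #3 FMULT rounds or handles overflow differently, change the model to match your hardware.

```python
# Hypothetical model of the 9-bit x 8-bit fixed-point multiply (FMULT).
# ASSUMPTIONS: coef is unsigned 1.8 (0x100 = 1.0), data is unsigned 0.8,
# result = truncated product, clamped to 8 bits if coef > 1.0 overflows.

COEF_BITS = 9
DATA_BITS = 8
ONE = 1 << DATA_BITS  # 0x100 represents 1.0 in the assumed 1.8 format

def fmult(coef: int, data: int) -> int:
    assert 0 <= coef < (1 << COEF_BITS), "coefficient must fit in 9 bits"
    assert 0 <= data < (1 << DATA_BITS), "coordinate must fit in 8 bits"
    product = (coef * data) >> DATA_BITS  # 17-bit product, drop 8 fraction bits
    return min(product, (1 << DATA_BITS) - 1)  # clamp to 0xFF (assumption)

print(fmult(ONE, 0xA5))    # 165: multiplying by 1.0 returns the input
print(fmult(0x080, 0xA5))  # 82: multiplying by 0.5 truncates 82.5 down to 82
print(fmult(0x000, 0xFF))  # 0
```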

The interface of the matrix multiply block is defined as follows:

Inputs     

  • clk, reset - clock and asynchronous reset
  • din[8..0] - data bus for both 9-bit matrix coefficients and 8-bit coordinates  
  • cf_load  - used to load the 16 matrix coefficients
  • cf_dump - used to dump the 16 matrix coefficients to the coeffs output bus.  
  • start  - used to input the 4 coordinate values and start a computation

Outputs

  • coeffs[8..0] - output bus for examining matrix coefficient values
  • dout[7..0] - output bus for transformed coordinates
  • busy  - asserted when either coefficient values are being loaded or a matrix operation is being performed
  • output_rdy - asserted when either coeffs or dout contains valid data

The transformation matrix coefficients are:

T00  T01  T02  T03
T04  T05  T06  T07
T08  T09  T10  T11
T12  T13  T14  T15

The input coordinate is [X, Y, Z, W].  The transformed coordinate is computed as:

X' = X * T00  +  Y * T01 + Z * T02 + W * T03
Y' = X * T04  +  Y * T05 + Z * T06 + W * T07
Z' = X * T08  +  Y * T09 + Z * T10 + W * T11
W' = X * T12  +  Y * T13 + Z * T14 + W * T15

The multiplication is the 9bit * 8bit unsigned multiplication used in the blend equation; the addition is an unsigned saturating addition.  Note that if the matrix is the identity matrix:

1  0  0  0
0  1  0  0
0  0  1  0
0  0  0  1

then X'=X, Y'=Y, Z'=Z, W'=W (this is good test of your matrix multiply).
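The four equations above can be captured in a short golden model that performs exactly one multiply and one saturating add per step, mirroring the single-multiplier, single-adder datapath. It reuses the same assumed formats as before (1.8 coefficients with 0x100 = 1.0, 0.8 coordinates, truncating multiply), so the "1" entries of the identity matrix become 0x100 here. The function names are hypothetical.

```python
# Hypothetical golden model of X', Y', Z', W' = T * [X, Y, Z, W].
# ASSUMPTIONS: 1.8 coefficients (0x100 = 1.0), 0.8 coordinates,
# truncating multiply clamped to 8 bits, 8-bit saturating addition.

MAX8 = 0xFF

def fmult(coef, data):
    return min((coef * data) >> 8, MAX8)

def sat_add(a, b):
    """Unsigned 8-bit saturating addition (clamps at 0xFF)."""
    return min(a + b, MAX8)

def matmul_4x4_4x1(t, s):
    """t: 16 coefficients in row-major order (T00..T15); s: [X, Y, Z, W]."""
    result = []
    for row in range(4):
        acc = 0
        for col in range(4):
            # One multiply chained into one add, as the datapath allows.
            acc = sat_add(acc, fmult(t[4 * row + col], s[col]))
        result.append(acc)
    return result

IDENTITY = [0x100 if r == c else 0 for r in range(4) for c in range(4)]
print(matmul_4x4_4x1(IDENTITY, [0x12, 0x34, 0x56, 0x78]))  # [18, 52, 86, 120]

ALL_ONES = [0x100] * 16  # every row sums four terms, forcing saturation
print(matmul_4x4_4x1(ALL_ONES, [0x80] * 4))  # [255, 255, 255, 255]
```

Because every term is non-negative, saturating after each addition gives the same answer as saturating the full sum once, so the order in which the four products are accumulated does not change the result.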

When the cf_load input is asserted, the coefficients are presented in ROW major order (T00, T01, T02, T03, T04, T05, ... T13, T14, T15). When the cf_dump input is asserted, the coefficients should be written to the coeffs output bus in the same order. When the start input is asserted, the coordinate values are presented in the order X, Y, Z, W.
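Because the coefficients arrive in row-major order, a simple 4-bit address counter can both write them into the RAM and read them back for cf_dump, and the same address equals 4*row + col during the multiply. The sketch below models that idea in software. The class and function names, and the assumption of one coefficient per clock with the counter starting at 0, are illustrative only; the golden waveforms define the actual timing.

```python
# Hypothetical model of the coefficient RAM (standing in for the LPM_RAM_DQ)
# and its address sequencing. ASSUMPTION: one coefficient per clock while
# cf_load or cf_dump is active, using a 4-bit address counter.

class CoefRam:
    def __init__(self):
        self.mem = [0] * 16  # 16 words x 9 bits
        self.addr = 0        # 4-bit address counter

    def reset_counter(self):
        self.addr = 0

    def load_cycle(self, din):
        self.mem[self.addr] = din & 0x1FF  # keep 9 bits
        self.addr = (self.addr + 1) % 16

    def dump_cycle(self):
        word = self.mem[self.addr]
        self.addr = (self.addr + 1) % 16
        return word

def multiply_read_order():
    """(RAM address, coordinate index) pairs for X', then Y', Z', W'."""
    return [(4 * row + col, col) for row in range(4) for col in range(4)]

ram = CoefRam()
coeffs = list(range(0x100, 0x110))  # T00..T15 presented in row-major order
for c in coeffs:
    ram.load_cycle(c)
ram.reset_counter()
dumped = [ram.dump_cycle() for _ in range(16)]
print(dumped == coeffs)             # True: dump order matches load order
print(multiply_read_order()[:5])    # [(0, 0), (1, 1), (2, 2), (3, 3), (4, 0)]
```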

Study the golden files to learn more about the cycle behavior of these inputs and outputs.

Implementation Requirements

  1. You must use an LPM_RAM_DQ to store the matrix coefficients.
  2. You can only use one multiplier (your FMULT from Lab #3) and one adder (your saturating adder from Lab #2). You may connect the output of the multiplier directly to the input of the adder without an intervening register (you can chain the multiply and addition operations).
  3. You may use any number of registers that you wish.
  4. You may not take more than 50 clock cycles to perform the matrix multiply for a coordinate (this does NOT include loading the matrix coefficients).
  5. The operation of output_rdy and busy must be glitch free.
  6. You must use VHDL to implement your finite state machine.  You can use any mixture of VHDL and LPMs to implement the remaining logic.
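One way to reason about the 50-cycle budget before writing VHDL is a cycle-level model of a candidate FSM. The sketch below assumes a hypothetical schedule (4 cycles to capture X, Y, Z, W, 16 chained multiply/add cycles, 4 cycles to present the results) and hypothetical state names; it is not the required design, and the real handshake should follow the golden waveforms. It also shows a Moore-style output decode, where busy and output_rdy depend only on the state register. In hardware, registering these outputs or choosing a state encoding where each output is a single flip-flop is one common way to meet the glitch-free requirement.

```python
# Hypothetical cycle-level sketch of a Moore control FSM for the multiply.
# ASSUMPTIONS: state names and the 4 + 16 + 4 cycle schedule are illustrative.
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    GET_COORD = auto()  # capture X, Y, Z, W into registers
    MAC = auto()        # one fmult + sat_add per cycle, 16 cycles
    OUTPUT = auto()     # drive X', Y', Z', W' on dout, one per cycle

def next_state(state, count, start):
    """Return (next_state, next_count); count is a small cycle counter."""
    if state is State.IDLE:
        return (State.GET_COORD, 0) if start else (State.IDLE, 0)
    if state is State.GET_COORD:
        return (State.MAC, 0) if count == 3 else (state, count + 1)
    if state is State.MAC:
        return (State.OUTPUT, 0) if count == 15 else (state, count + 1)
    return (State.IDLE, 0) if count == 3 else (state, count + 1)

def outputs(state):
    """Moore outputs: a function of the state register only."""
    busy = state is not State.IDLE
    output_rdy = state is State.OUTPUT
    return busy, output_rdy

state, count, busy_cycles = State.IDLE, 0, 0
for cycle in range(40):
    start = (cycle == 1)  # start pulses for one clock
    busy, rdy = outputs(state)
    busy_cycles += int(busy)
    state, count = next_state(state, count, start)
print(busy_cycles)  # 24 with this schedule: 4 capture + 16 MAC + 4 output
```

In this model busy rises on the clock after start, as the Testing section requires. Overlapping the output phase with the next row's accumulation would shorten the schedule further, at the cost of a more involved FSM.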

  

Testing

You can use this schematic (tbmm.gdf) as the testbench. This is a design problem, which means that there are multiple solutions that will meet the specifications. You will NOT be able to compare your output waveform against the 'golden' waveforms that I have provided. The golden waveforms are actually intended to be reference waveforms that show you how the circuit is supposed to behave. The counter included in the testbench keeps track of how long your circuit takes to do a matrix multiply (it works off the start and busy signals).

Reference waveforms:

  1. tbmmg0.scf - illustrates loading matrix coefficients and then dumping them to the coeffs bus. It does not exercise the matrix multiply.
  2. tbmmg1.scf - loads the identity matrix, then performs one matrix multiply. The output coordinate will be equal to the input coordinate since the identity matrix is used.
  3. tbmmg2.scf - loads a matrix, then does one matrix multiply where the input coordinate values are all '1's (actually, all 0xFFs, which is as close to '1' as we can get in 0.8 fixed-point format).
  4. tbmmg3.scf - does two matrix multiplies in succession.

Even though your waveform will not match the reference exactly, because you may take more or fewer cycles to perform the computation, the cycle behavior of your output_rdy and busy signals relative to start and the output data is expected to be the same (i.e., busy is asserted the clock cycle after start is asserted, and output_rdy is asserted whenever valid data is on the coeffs or dout buses).


Check Off

Week 1: You must demonstrate that you are able to load the matrix into your design and then dump it out. You should not be satisfied with just getting this working for week 1; you should also start on the second part. There is no report due at this time, just a checkoff. If you are NOT ready at this time, you can make AT MOST 80% on this lab even if you get everything working at a later date.

Week 2:  You must have everything working.  Your REPORT is due at this time.

Report

  1. You must hand in plots of all schematics and printouts of your VHDL code.
  2. You must have a neatly drawn ASM chart that illustrates your FSM operation.   
  3. You must have a neatly drawn datapath diagram of your design.
  4. Report the NUMBER of clock cycles that it takes for your design to perform the matrix multiply.