NCL Multiplexer Implementation

Design Recap

MUX4

4-Option Example

for Case in 0 to N-1
  [build CaseBits with DATA0's and DATA1's]
  -- CaseBits is a concatenated signal from the iSelector input
  Selectors(Case) <= THNN(CaseBits)

  GatedCase0 <= TH22(Selectors(Case), iOptions(Case).DATA0)
  GatedCase1 <= TH22(Selectors(Case), iOptions(Case).DATA1)
next Case

output.DATA0 <= TH1N(Gated00, Gated10, Gated20, Gated30, ...)
output.DATA1 <= TH1N(Gated01, Gated11, Gated21, Gated31, ...)

Generic pseudo-VHDL

Implementation

Remember the Full Adder’s un-optimized version? If you look at the implementation, you’ll see a chunk of code at the top that generates a one-hot encoding of all cases. We are going to use that for our internal Selectors signal, driven by the iSelector bits:

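-- "case" is a reserved word in VHDL, so the loop variable is named iCase.
-- The fixed width of 3 carries over from the Full Adder and limits this to 8 options.
cases: for iCase in 0 to NumOptions-1 generate
  bits: for iBit in 0 to NumSelectors-1 generate

    -- Bit iBit of this case is 0: watch the DATA0 rail of that selector bit
    Input0Selection: if (to_unsigned(2**iBit, 3) and to_unsigned(iCase, 3)) = 0 generate
      selectorInputs(iCase)(iBit) <= iSelector(iBit).DATA0;
    end generate;

    -- Bit iBit of this case is 1: watch the DATA1 rail of that selector bit
    Input1Selection: if (to_unsigned(2**iBit, 3) and to_unsigned(iCase, 3)) > 0 generate
      selectorInputs(iCase)(iBit) <= iSelector(iBit).DATA1;
    end generate;
  end generate;

  -- Selectors(iCase) is set once every selector bit matches iCase
  CaseSelectorGate: THmn
    generic map(M => NumSelectors, N => NumSelectors)
    port map(inputs => selectorInputs(iCase),
             output => Selectors(iCase));

end generate;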

Next, we need to gate the two lines (DATA0 and DATA1) for each option, which will NULL them if they are not the selected signal:

cases: for iCase in 0 to NumOptions-1 generate
  bits: for iBit in 0 to NumSelectors-1 generate

    Input0Selection: if (to_unsigned(2**iBit, 3) and to_unsigned(iCase, 3)) = 0 generate
      selectorInputs(iCase)(iBit) <= iSelector(iBit).DATA0;
    end generate;

    Input1Selection: if (to_unsigned(2**iBit, 3) and to_unsigned(iCase, 3)) > 0 generate
      selectorInputs(iCase)(iBit) <= iSelector(iBit).DATA1;
    end generate;
  end generate;

  CaseSelectorGate: THmn
    generic map(M => NumSelectors, N => NumSelectors)
    port map(inputs => selectorInputs(iCase),
             output => Selectors(iCase));

  -- Pass this option's DATA0 rail only while its selector is set
  Gated0: THmn
    generic map(M => 2, N => 2)
    port map(inputs(0) => Selectors(iCase),
             inputs(1) => iOptions(iCase).DATA0,
             output => GatedOptions0(iCase));

  -- Pass this option's DATA1 rail only while its selector is set
  Gated1: THmn
    generic map(M => 2, N => 2)
    port map(inputs(0) => Selectors(iCase),
             inputs(1) => iOptions(iCase).DATA1,
             output => GatedOptions1(iCase));

end generate;

Finally, take all those gated options and OR the signals together, so whichever one is selected will drive the line to a 1 if it is set:

cases: for iCase in 0 to NumOptions-1 generate
  bits: for iBit in 0 to NumSelectors-1 generate

    Input0Selection: if (to_unsigned(2**iBit, 3) and to_unsigned(iCase, 3)) = 0 generate
      selectorInputs(iCase)(iBit) <= iSelector(iBit).DATA0;
    end generate;

    Input1Selection: if (to_unsigned(2**iBit, 3) and to_unsigned(iCase, 3)) > 0 generate
      selectorInputs(iCase)(iBit) <= iSelector(iBit).DATA1;
    end generate;
  end generate;

  CaseSelectorGate: THmn
    generic map(M => NumSelectors, N => NumSelectors)
    port map(inputs => selectorInputs(iCase),
             output => Selectors(iCase));

  Gated0: THmn
    generic map(M => 2, N => 2)
    port map(inputs(0) => Selectors(iCase),
             inputs(1) => iOptions(iCase).DATA0,
             output => GatedOptions0(iCase));

  Gated1: THmn
    generic map(M => 2, N => 2)
    port map(inputs(0) => Selectors(iCase),
             inputs(1) => iOptions(iCase).DATA1,
             output => GatedOptions1(iCase));

end generate;

-- The TH1N gates sit outside the loop and take the whole gated vector
o0: THmn
  generic map(M => 1, N => NumOptions)
  port map(inputs => GatedOptions0,
           output => output.DATA0);

o1: THmn
  generic map(M => 1, N => NumOptions)
  port map(inputs => GatedOptions1,
           output => output.DATA1);

That’s all the logic then, but we need to add the wrapping structures (entity declaration, architecture declaration, and internal signal declarations). This module will have one generic parameter (NumOptions), and a constant based on it (NumSelectors). The width of the iSelector input will be the base-2 log of the number of options, rounded up:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.ncl.all;

entity MUX is
  generic(NumOptions : integer := 2);
  port (iSelector : in ncl_pair_vector(0 to clog2(NumOptions)-1);
        iOptions  : in ncl_pair_vector(0 to NumOptions-1);
        output    : out ncl_pair);
end MUX;

architecture structural of MUX is
  constant NumSelectors : integer := clog2(NumOptions);
  signal Selectors : std_logic_vector(0 to NumOptions-1);
  signal GatedOptions0 : std_logic_vector(0 to NumOptions-1);
  signal GatedOptions1 : std_logic_vector(0 to NumOptions-1);

  type SelectorData is array (integer range <>) of std_logic_vector(0 to NumSelectors-1);
  signal selectorInputs : SelectorData(0 to NumOptions-1);
begin
  -- [This part is the same as before]

  cases: for iCase in 0 to NumOptions-1 generate
    bits: for iBit in 0 to NumSelectors-1 generate

      Input0Selection: if (to_unsigned(2**iBit, 3) and to_unsigned(iCase, 3)) = 0 generate
        selectorInputs(iCase)(iBit) <= iSelector(iBit).DATA0;
      end generate;

      Input1Selection: if (to_unsigned(2**iBit, 3) and to_unsigned(iCase, 3)) > 0 generate
        selectorInputs(iCase)(iBit) <= iSelector(iBit).DATA1;
      end generate;
    end generate;

    CaseSelectorGate: THmn
      generic map(M => NumSelectors, N => NumSelectors)
      port map(inputs => selectorInputs(iCase),
               output => Selectors(iCase));

    Gated0: THmn
      generic map(M => 2, N => 2)
      port map(inputs(0) => Selectors(iCase),
               inputs(1) => iOptions(iCase).DATA0,
               output => GatedOptions0(iCase));

    Gated1: THmn
      generic map(M => 2, N => 2)
      port map(inputs(0) => Selectors(iCase),
               inputs(1) => iOptions(iCase).DATA1,
               output => GatedOptions1(iCase));

  end generate;

  o0: THmn
    generic map(M => 1, N => NumOptions)
    port map(inputs => GatedOptions0,
             output => output.DATA0);

  o1: THmn
    generic map(M => 1, N => NumOptions)
    port map(inputs => GatedOptions1,
             output => output.DATA1);

end structural;

Testing

I am testing this module with 2 options for now; in theory it scales, but at some point I should add a 4-option test, and maybe a 5-option test to see how it does with non-power-of-2 values. The test script goes through the input options and tests that each one reaches the output correctly.

When I first ran this, I had an error where the output indexing was in the wrong order. The part of the code near the top was messed up: it used iSelectors(case).DATA0 where it should have used DATA1, and vice versa.
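The test in the post is a TCL script; for readers who prefer to stay in VHDL, here is a minimal self-checking testbench sketch for the 2-option case. It assumes the MUX entity and the ncl package compile as declared in this post (including a clog2 function being visible). The name MUX_tb, the three constants, and the 10 ns settling time are my own assumptions, not taken from the repository.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use work.ncl.all;

entity MUX_tb is
end MUX_tb;

architecture sim of MUX_tb is
  -- Hypothetical helper constants for the three dual-rail states
  constant NULL_PAIR : ncl_pair := (data0 => '0', data1 => '0');
  constant DATA_0    : ncl_pair := (data0 => '1', data1 => '0');
  constant DATA_1    : ncl_pair := (data0 => '0', data1 => '1');

  signal iSelector : ncl_pair_vector(0 to 0) := (others => NULL_PAIR);
  signal iOptions  : ncl_pair_vector(0 to 1) := (others => NULL_PAIR);
  signal output    : ncl_pair;
begin
  dut: entity work.MUX
    generic map(NumOptions => 2)
    port map(iSelector => iSelector, iOptions => iOptions, output => output);

  stimulus: process
  begin
    -- Selector = 0 picks option 0 (a DATA1 here); option 1 holds a DATA0.
    iOptions  <= (0 => DATA_1, 1 => DATA_0);
    iSelector <= (0 => DATA_0);
    wait for 10 ns;  -- assumed long enough for three 1 ns gate layers
    assert output = DATA_1 report "selector 0 should pass option 0" severity error;

    -- Return everything to NULL before the next DATA wavefront.
    iOptions  <= (others => NULL_PAIR);
    iSelector <= (others => NULL_PAIR);
    wait for 10 ns;
    assert output = NULL_PAIR report "output should return to NULL" severity error;

    -- Selector = 1 picks option 1 (a DATA0 here).
    iOptions  <= (0 => DATA_1, 1 => DATA_0);
    iSelector <= (0 => DATA_1);
    wait for 10 ns;
    assert output = DATA_0 report "selector 1 should pass option 1" severity error;

    wait;  -- end of test
  end process;
end sim;
```

Giving the two options different values is deliberate: a swapped DATA0/DATA1 selector rail (the bug described above) would pass the wrong option and trip the first assertion.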

Capture

Commit: b35b729

NCL Multiplexer Design

Theory

Multiplexers are components that let you switch between different options for a signal. They take in some number of option values (usually a power of 2) and a selector. Each data value of the selector corresponds to a particular input, which is fed to the output.

output = iOptions(iSelector)

If there are 2 options (the most basic MUX), then the selector is 1 bit. If there are 3 or 4 options, 2 bits are needed, and so on: the selector width is the base-2 log of the number of options, rounded up.
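The MUX implementation post sizes iSelector with clog2(NumOptions), but the package listing in this series does not show that function. A minimal sketch of how such a helper might look is below; the package name mux_util is hypothetical, and the real project may define clog2 elsewhere or differently.

```vhdl
package mux_util is
  -- Hypothetical helper: smallest k such that 2**k >= n (so clog2(1) = 0).
  function clog2(n : positive) return natural;
end package mux_util;

package body mux_util is
  function clog2(n : positive) return natural is
    variable bits  : natural  := 0;
    variable reach : positive := 1;  -- options addressable with 'bits' selector bits
  begin
    -- Add selector bits until they can address all n options.
    while reach < n loop
      reach := reach * 2;
      bits  := bits + 1;
    end loop;
    return bits;
  end function clog2;
end package body mux_util;
```

With this definition, 2 options need 1 selector bit, 3 or 4 options need 2, and 5 options need 3, matching the rule above.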

Design

This time we’re going to go about this in a more intuitive, less rigorous manner. Let’s consider each ‘row’ separately; each row will correspond to one input option (both *.0 and *.1). For each of these rows, we’ll generate a gating signal from the iSelector bits. This gating signal will be used by two TH22 gates to clear all but the selected signals.

This is very much like having 2 MUXes, one for the *.1s and one for the *.0s.

We’ll be reusing some code from the FullAdder implementation to get the selectors (one wire per input case); each of these will gate the DATA0 and DATA1 lines of the respective input option. The gated values will then be combined with a TH1n gate. An example 4-option case:

Case0 <= TH22(iSel(0).DATA0, iSel(1).DATA0)
Case1 <= TH22(iSel(0).DATA1, iSel(1).DATA0)
Case2 <= TH22(iSel(0).DATA0, iSel(1).DATA1)
Case3 <= TH22(iSel(0).DATA1, iSel(1).DATA1)

GatedA0 <= TH22(iOptions(0).DATA0, Case0)
GatedA1 <= TH22(iOptions(0).DATA1, Case0)

GatedB0 <= TH22(iOptions(1).DATA0, Case1)
GatedB1 <= TH22(iOptions(1).DATA1, Case1)

GatedC0 <= TH22(iOptions(2).DATA0, Case2)
GatedC1 <= TH22(iOptions(2).DATA1, Case2)

GatedD0 <= TH22(iOptions(3).DATA0, Case3)
GatedD1 <= TH22(iOptions(3).DATA1, Case3)

output.DATA0 <= TH14(GatedA0, GatedB0, GatedC0, GatedD0)
output.DATA1 <= TH14(GatedA1, GatedB1, GatedC1, GatedD1)

mux42.png

That’s a bit repetitive, so let’s make it a little more general. The design leaves some parts as ‘magic’ because they are really implementation details.

for Case in 0 to N-1
  [build CaseBits with DATA0's and DATA1's]
      -- CaseBits is a concatenated signal from the iSelector input
  Selectors(Case) <= THNN(CaseBits)

  GatedCase0 <= TH22(Selectors(Case), iOptions(Case).DATA0)
  GatedCase1 <= TH22(Selectors(Case), iOptions(Case).DATA1)
next Case

output.DATA0 <= TH1N(Gated00, Gated10, Gated20, Gated30, ...)
output.DATA1 <= TH1N(Gated01, Gated11, Gated21, Gated31, ...)

Each row generates a selector, gates the option values, and passes them to the output. Any un-selected inputs are NULLed out (Gated#0 and Gated#1 both go to 0), leaving only the selected input to pass through the TH1N gates.

Using an NCL Register

In this post, I described what an NCL register is. I wanted to get a more practical understanding of what the register does and how different pipeline stages interact. To facilitate this, I put the Full Adder between two registers, with their control signals linked:

pipelinedadder.png

In this setup, both registers start with NULL, requesting DATA.

  1. When DATA is fed to the first register, it immediately passes it on to the adder and requests NULL
  2. Once the Adder completes, the second register saves the DATA to the outputs and requests NULL.

The same sequence repeats with the NULL wavefront, then back to DATA, and so on…
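As a rough sketch of that wiring, the unit below places the FullAdder between two RegisterN instances and feeds the second register’s to_prev back into the first register’s from_next. It assumes the RegisterN and FullAdder entities from the other posts in this series. The entity name PipelinedAdder, the ordering of a, b, and carry-in within the vectors, and the choice of the optimized architecture are my assumptions; the actual source in the repository may be organized differently.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use work.ncl.all;

entity PipelinedAdder is
  port(inputs    : in  ncl_pair_vector(0 to 2);  -- a, b, carry-in (ordering is an assumption)
       from_next : in  std_logic;                -- request from the stage after this one
       output    : out ncl_pair_vector(0 to 1);  -- sum, carry-out
       to_prev   : out std_logic);               -- request to the stage before this one
end PipelinedAdder;

architecture structural of PipelinedAdder is
  signal adder_in  : ncl_pair_vector(0 to 2);
  signal adder_out : ncl_pair_vector(0 to 1);
  signal request   : std_logic;  -- second register's to_prev, fed back to the first
begin
  -- First register: passes a wavefront to the adder when the second register asks for it
  in_reg: entity work.RegisterN
    generic map(N => 3)
    port map(inputs => inputs, from_next => request,
             output => adder_in, to_prev => to_prev);

  adder: entity work.FullAdder(optimized)
    port map(a => adder_in(0), b => adder_in(1), iC => adder_in(2),
             oS => adder_out(0), oC => adder_out(1));

  -- Second register: captures the adder result and tells the first register what to send next
  out_reg: entity work.RegisterN
    generic map(N => 2)
    port map(inputs => adder_out, from_next => from_next,
             output => output, to_prev => request);
end structural;
```

The only cross-stage connection besides the data itself is the request signal, which is what makes the DATA and NULL wavefronts alternate.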

We’ve already tested the Adder, but we want to make sure the system works, so we make a separate test for this unit (VHDL source, TCL test script). This test doesn’t actually verify the results of the computation, as we already checked the adder. Essentially, if it runs, the pipelining worked. If it hangs, then something is wrong and wavefronts are not propagating through the circuit.

Pipelined Adder tests

In theory, a loop with 3 registers can be made, but in this case, if the outputs feed back, the result will degrade to 1 eventually, or stay at 0. I may make a 2-bit counter or something in a while.

Commit: 40d96b8

NCL Full Adder Implementation

Design Recap

The circuit from my last post:

FullAdder

oS.1:
TH12(
  TH22(
    TH13(iA.1, iB.1, iC.1), -- 1 <= NumBits
    TH23(iA.0, iB.0, iC.0)), -- NumBits < 2
  TH33(iA.1, iB.1, iC.1)); -- 3 <= NumBits

oS.0:
TH12(
  TH33(iA.0, iB.0, iC.0), -- NumBits < 1
  TH22(
    TH23(iA.1, iB.1, iC.1), -- 2 <= NumBits
    TH13(iA.0, iB.0, iC.0))); -- NumBits < 3

oC.1: TH23(iA.1, iB.1, iC.1) -- 2 <= NumBits

oC.0: TH23(iA.0, iB.0, iC.0) -- NumBits < 2

Optimized

The structural VHDL implementation:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.ncl.all;

entity FullAdder is
  port(iC : in ncl_pair;
       a : in ncl_pair;
       b : in ncl_pair;
       oS : out ncl_pair;
       oC : out ncl_pair);
end FullAdder;

architecture structural of FullAdder is
  type first_layer is array (integer range <>) of std_logic_vector(0 to 2);
  signal first_layer_inputs : first_layer(0 to 7);
  signal intermediate : std_logic_vector(0 to 7);
  signal inputs : ncl_pair_vector(0 to 2);

begin
  inputs(2) <= a;
  inputs(1) <= b;
  inputs(0) <= iC;
  input_layer: for i in 0 to 7 generate
    bits: for ibit in 0 to 2 generate
      Input0Selection: if (to_unsigned(2**iBit, 3) and to_unsigned(i, 3)) = 0 generate
        first_layer_inputs(i)(iBit) <= inputs(iBit).Data0;
      end generate;
      Input1Selection: if (to_unsigned(2**iBit, 3) and to_unsigned(i, 3)) > 0 generate
        first_layer_inputs(i)(iBit) <= inputs(iBit).Data1;
      end generate;
    end generate;
    gate: THmn
            generic map(M => 3, N => 3)
            port map(inputs => first_layer_inputs(i),
                     output => intermediate(i));
  end generate;

  oS0: THmn
         generic map(M => 1, N => 4)
         port map(inputs(0) => intermediate(0),
                  inputs(1) => intermediate(3),
                  inputs(2) => intermediate(5),
                  inputs(3) => intermediate(6),
                  output => oS.DATA0);

  oS1: THmn
         generic map(M => 1, N => 4)
         port map(inputs(0) => intermediate(1),
                  inputs(1) => intermediate(2),
                  inputs(2) => intermediate(4),
                  inputs(3) => intermediate(7),
                  output => oS.DATA1);

  oC0: THmn
         generic map(M => 1, N => 4)
         port map(inputs(0) => intermediate(0),
                  inputs(1) => intermediate(1),
                  inputs(2) => intermediate(2),
                  inputs(3) => intermediate(4),
                  output => oC.DATA0);

  oC1: THmn
         generic map(M => 1, N => 4)
         port map(inputs(0) => intermediate(3),
                  inputs(1) => intermediate(5),
                  inputs(2) => intermediate(6),
                  inputs(3) => intermediate(7),
                  output => oC.DATA1);
end structural;

architecture optimized of FullAdder is
  signal sLT2 : std_logic;
  signal sLT3 : std_logic;
  signal sGE2 : std_logic;
  signal sGE1 : std_logic;
  signal sEQ3 : std_logic;
  signal sEQ2 : std_logic;
  signal sEQ1 : std_logic;
  signal sEQ0 : std_logic;
begin
  LT2: THmn
         generic map(M => 2, N => 3)
         port map(inputs(0) => a.DATA0,
                  inputs(1) => b.DATA0,
                  inputs(2) => iC.DATA0,
                  output => sLT2);

  GE2: THmn
         generic map(M => 2, N => 3)
         port map(inputs(0) => a.DATA1,
                  inputs(1) => b.DATA1,
                  inputs(2) => iC.DATA1,
                  output => sGE2);

  GE1: THmn
         generic map(M => 1, N => 3)
         port map(inputs(0) => a.DATA1,
                  inputs(1) => b.DATA1,
                  inputs(2) => iC.DATA1,
                  output => sGE1);

  EQ1: THmn
         generic map(M => 2, N => 2)
         port map(inputs(0) => sGE1,
                  inputs(1) => sLT2,
                  output => sEQ1);

  EQ3: THmn
         generic map(M => 3, N => 3)
         port map(inputs(0) => a.DATA1,
                  inputs(1) => b.DATA1,
                  inputs(2) => iC.DATA1,
                  output => sEQ3);

  S1: THmn
        generic map(M => 1, N => 2)
        port map(inputs(0) => sEQ1,
                 inputs(1) => sEQ3,
                 output => oS.DATA1);

  EQ0: THmn
         generic map(M => 3, N => 3)
         port map(inputs(0) => a.DATA0,
                  inputs(1) => b.DATA0,
                  inputs(2) => iC.DATA0,
                  output => sEQ0);

  LT3: THmn
         generic map(M => 1, N => 3)
         port map(inputs(0) => a.DATA0,
                  inputs(1) => b.DATA0,
                  inputs(2) => iC.DATA0,
                  output => sLT3);

  EQ2: THmn
         generic map(M => 2, N => 2)
         port map(inputs(0) => sGE2,
                  inputs(1) => sLT3,
                  output => sEQ2);

  S0: THmn
        generic map(M => 1, N => 2)
        port map(inputs(0) => sEQ2,
                 inputs(1) => sEQ0,
                 output => oS.DATA0);

  oC.DATA0 <= sLT2;
  oC.DATA1 <= sGE2;

end optimized;

This VHDL implementation of the un-optimized design uses generate loops to set up the first layer (the first layer uses all combinations of group values). The second layer is set up manually.

The optimized design’s signals are named in terms of relations: Greater or Equal to #, Less Than #, and EQual to #. So sEQ1 is asserted when exactly 1 input group is set to 1, and sGE2 is asserted when at least 2 input groups are set to 1.

Testing

The test script runs through all combinations of inputs. I ran the tests with both versions. Here’s the result, no surprises really.

Full Adder Test

Commit: a7a9dba, d645811

NCL Full Adder Design

Theory

Like the Half Adder, a Full Adder counts its inputs. The Full Adder counts three of them, though, to account for the carry-in from the previous bit.

Truth Table

iA iB iC oS oC
0 0 0 0 0
0 0 1 1 0
0 1 0 1 0
0 1 1 0 1
1 0 0 1 0
1 0 1 0 1
1 1 0 0 1
1 1 1 1 1

Design

Once more, we’ll start with the truth table, derive sum-of-product equations, and circuit-ize.

oS = (iA'*iB'*iC) + (iA'*iB*iC') + (iA*iB'*iC') + (iA*iB*iC)
oC = (iA'*iB*iC) + (iA*iB'*iC) + (iA*iB*iC') + (iA*iB*iC)

Again, we need to convert these into NCL logic (DATA0 and DATA1).

oS.0 = (iA.0*iB.0*iC.0)+(iA.0*iB.1*iC.1)+(iA.1*iB.0*iC.1)+(iA.1*iB.1*iC.0)
oS.1 = (iA.0*iB.0*iC.1)+(iA.0*iB.1*iC.0)+(iA.1*iB.0*iC.0)+(iA.1*iB.1*iC.1)
oC.0 = (iA.0*iB.0*iC.0)+(iA.0*iB.0*iC.1)+(iA.0*iB.1*iC.0)+(iA.1*iB.0*iC.0)
oC.1 = (iA.0*iB.1*iC.1)+(iA.1*iB.0*iC.1)+(iA.1*iB.1*iC.0)+(iA.1*iB.1*iC.1)

Note that each row of the truth table is used exactly twice, once for each output variable. Since 0’s and 1’s are both represented by a high signal, each output variable {oS, oC} has an assignment for each case. Build the AND-plane with TH33 gates, and the OR-plane with TH14 gates (each output rail ORs four product terms):

FullAdder

This design takes up 168 transistors in total. Let’s see if we can make it with fewer.

Optimization

This time, instead of SOP form, I’m going to look at it more intuitively. Since the bits are symmetric (the values of iA can be swapped with iB without any change in the expected output), let’s look at counting them with threshold gates instead of checking individual cases. To check a ‘less than’ relationship for the number of inputs set, just count the number of 0’s.

oS.1: (1 <= NumBits < 2) + (3 <= NumBits) --5 gates
oS.0: (NumBits < 1) + (2 <= NumBits < 3)  --5 gates

oC.1: (2 <= NumBits)                      -- 1 gate (shared)
oC.0: (NumBits < 2)                       -- 1 gate (shared)

Gate version:

oS.1:
TH12(
     TH22(
          TH13(iA.1, iB.1, iC.1),  -- 1 <= NumBits
          TH23(iA.0, iB.0, iC.0)), -- NumBits < 2
     TH33(iA.1, iB.1, iC.1)); -- 3 <= NumBits

oS.0:
TH12(
     TH33(iA.0, iB.0, iC.0), -- NumBits < 1
     TH22(
          TH23(iA.1, iB.1, iC.1),  -- 2 <= NumBits
          TH13(iA.0, iB.0, iC.0))); -- NumBits < 3

oC.1: TH23(iA.1, iB.1, iC.1) -- 2 <= NumBits
oC.0: TH23(iA.0, iB.0, iC.0) -- NumBits < 2

I’m going to call this ‘functional notation’. It treats each gate as a function with other gates as inputs; common expressions are evaluated only once (duplicate gates are only built once, then shared). This uses the following gates (with their transistor counts):

  • 2 TH12 = 2*6  = 12 transistors
  • 2 TH22 = 2*12 = 24 transistors
  • 2 TH13 = 2*8  = 16 transistors
  • 2 TH23 = 2*18 = 36 transistors
  • 2 TH33 = 2*16 = 32 transistors

Total: 120 transistors (-29% from SOP), counting the gates in the functional notation above. The downside is that this is three-layer logic, which generally has a little higher delay for that third layer. On the upside, most of the gates have fewer inputs (2 and 3 inputs instead of 3 and 4 inputs). This reduces the complexity of each gate and may actually reduce the critical path. I won’t do that analysis here.

You might be tempted to obtain the *.0 or *.1 signal by inverting the other. You cannot do this in NCL: you must be able to pass on NULL wavefronts, which require both to be 0. This is a downside of NCL; every group requires two (or more) complementary circuits to produce its rails. This limitation results in increased die area.

See my next post for the implementation.

NCL Register Implementation

See this post for the design of the NCL Register.

Implementation

I implemented this module structurally, with a for...generate (you’ll find that I’m a big fan of generics if you keep up with the blog).

This module assumes that all groups (DATA0, DATA1, … DATAn) are dual rail (capped at DATA1); if I need other encodings, I’ll make them as separate modules later. I added a generic RegisterDelay parameter so that I can better observe pipelines of components (if a value is stable for some time, I can read it off the waves more easily).

library ieee;
use ieee.std_logic_1164.all;
use work.ncl.all;

entity RegisterN is
  generic(N : integer := 1;
  RegisterDelay : time := 20 ns);
  port(inputs : in ncl_pair_vector(0 to N-1);
  from_next : in std_logic;
  output : out ncl_pair_vector(0 to N-1);
  to_prev : out std_logic);
end RegisterN;

architecture structural of RegisterN is
  signal outs : std_logic_vector(0 to (2*N)-1);
  signal watcher_out : std_logic := '0';
begin

register_gates: for i in 0 to N-1 generate
  T22_i0 : THmn
             generic map(N => 2, M => 2, Delay => RegisterDelay)
             port map(inputs(0) => inputs(i).DATA0,
                      inputs(1) => from_next,
                      output => outs(2*i));
  output(i).DATA0 <= outs(2*i);

  T22_i1 : THmn
             generic map(N => 2, M => 2, Delay => RegisterDelay)
             port map(inputs(0) => inputs(i).DATA1,
                      inputs(1) => from_next,
                      output => outs(2*i+1));
  output(i).DATA1 <= outs(2*i+1);

end generate register_gates;

  watcher: THmn
             generic map (N => N*2, M => N)
             port map (inputs => outs,
                       output => watcher_out);
  WatcherOutput: to_prev <= NOT watcher_out;
end structural;

Testing

For testing I again used a test script, available on GitHub. It goes through the values and makes sure that data is sent through correctly. It also checks that DATA/NULL wavefronts are delayed correctly by the control signal (handshaking). Note that the to_prev output signal is always in the ‘opposite’ state of the outputs (outputs NULL => to_prev 1 and outputs DATA => to_prev 0), although it lags by the 1 ns default delay of the watcher gate (a TH36 in this case).
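To make the handshake concrete, here is a minimal self-checking testbench sketch for a 1-bit register, written in VHDL rather than TCL. It assumes the RegisterN entity and ncl package exactly as listed above, with the default 20 ns RegisterDelay. The name RegisterN_tb, the constants, and the 30 ns waits are my own assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use work.ncl.all;

entity RegisterN_tb is
end RegisterN_tb;

architecture sim of RegisterN_tb is
  constant NULL_PAIR : ncl_pair := (data0 => '0', data1 => '0');
  constant DATA_1    : ncl_pair := (data0 => '0', data1 => '1');

  signal inputs    : ncl_pair_vector(0 to 0) := (others => NULL_PAIR);
  signal output    : ncl_pair_vector(0 to 0);
  signal from_next : std_logic := '1';  -- next stage starts out requesting DATA
  signal to_prev   : std_logic;
begin
  dut: entity work.RegisterN
    generic map(N => 1)                 -- keeps the default RegisterDelay of 20 ns
    port map(inputs => inputs, from_next => from_next,
             output => output, to_prev => to_prev);

  stimulus: process
  begin
    wait for 30 ns;
    assert output(0) = NULL_PAIR and to_prev = '1'
      report "empty register should request DATA" severity error;

    inputs(0) <= DATA_1;                -- DATA wavefront arrives
    wait for 30 ns;
    assert output(0) = DATA_1 and to_prev = '0'
      report "register should latch DATA and request NULL" severity error;

    inputs(0) <= NULL_PAIR;             -- NULL arrives, but the next stage still wants DATA
    wait for 30 ns;
    assert output(0) = DATA_1
      report "register should hold DATA until the next stage requests NULL" severity error;

    from_next <= '0';                   -- next stage now requests NULL
    wait for 30 ns;
    assert output(0) = NULL_PAIR and to_prev = '1'
      report "register should pass NULL and request DATA again" severity error;

    wait;  -- end of test
  end process;
end sim;
```

The third step is the interesting one: the TH22 gates hold their state while from_next is still 1, so the NULL wavefront is delayed until the next stage asks for it.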

Capture

Commits: 5d349e7, 4ecfe25, bf16da3

NCL Half Adder Implementation

See this post for the theory and design of the NCL Half Adder.

Design Recap

Here’s the circuit design:

halfadder1-e1499625273754.png

(simple version)

HalfAdder Optimized

(optimized version)

The top two gates are THxor0 gates, the next is a TH34w22, and the last one is a TH22 gate. I will implement this structurally, a fairly straightforward process in this case.

Implementation

library ieee;
use ieee.std_logic_1164.all;
use work.ncl.all;

entity HalfAdder is
 port(a : in ncl_pair;
      b : in ncl_pair;
      s : out ncl_pair;
      c : out ncl_pair);
end HalfAdder;

architecture structural of HalfAdder is
  signal a0b0_ins : std_logic_vector(0 to 1);
  signal a0b0_out : std_logic;
  signal a0b1_ins : std_logic_vector(0 to 1);
  signal a0b1_out : std_logic;
  signal a1b0_ins : std_logic_vector(0 to 1);
  signal a1b0_out : std_logic;
  signal a1b1_ins : std_logic_vector(0 to 1);
  signal a1b1_out : std_logic;

  signal s0_ins : std_logic_vector(0 to 1);
  signal s0_out : std_logic;
  signal s1_ins : std_logic_vector(0 to 1);
  signal s1_out : std_logic;

  signal c0_ins : std_logic_vector(0 to 2);
  signal c0_out : std_logic;
begin
  a0b0_ins(0) <= a.DATA0;
  a0b0_ins(1) <= b.DATA0;

  T21_A0B0 : THmn
               generic map(N => 2, M => 2)
               port map(inputs => a0b0_ins,
                        output => a0b0_out);

  a0b1_ins(0) <= a.DATA0;
  a0b1_ins(1) <= b.DATA1;

  T21_A0B1 : THmn
               generic map(N => 2, M => 2)
               port map(inputs => a0b1_ins,
                        output => a0b1_out);

  a1b0_ins(0) <= a.DATA1;
  a1b0_ins(1) <= b.DATA0;

  T21_A1B0 : THmn
               generic map(N => 2, M => 2)
               port map(inputs => a1b0_ins,
                        output => a1b0_out);

  a1b1_ins(0) <= a.DATA1;
  a1b1_ins(1) <= b.DATA1;

  T21_A1B1 : THmn
               generic map(N => 2, M => 2)
               port map(inputs => a1b1_ins,
                        output => a1b1_out);

  s1_ins(0) <= a0b1_out;
  s1_ins(1) <= a1b0_out;

  T21_S1: THmn
            generic map(N => 2, M => 1)
            port map(inputs => s1_ins,
                     output => s1_out);
  s.DATA1 <= s1_out;

  s0_ins(0) <= a0b0_out;
  s0_ins(1) <= a1b1_out;

  T21_S0: THmn
            generic map(N => 2, M => 1)
            port map(inputs => s0_ins,
                     output => s0_out);
  s.DATA0 <= s0_out;
  c.DATA1 <= a1b1_out;

  c0_ins(0) <= a1b0_out;
  c0_ins(1) <= a0b1_out;
  c0_ins(2) <= a0b0_out;

  T31_C0: THmn
            generic map(N => 3, M => 1)
            port map(inputs => c0_ins,
                     output => c0_out);
  c.DATA0 <= c0_out;
end structural;

architecture optimized of HalfAdder is
begin   
  Sum0: THxor0
          port map(A => A.DATA0,
                   B => B.DATA0,
                   C => A.DATA1,
                   D => B.DATA1,
                   output => s.DATA0);

  Sum1: THxor0
          port map(A => A.DATA1,
                   B => B.DATA0,
                   C => A.DATA0,
                   D => B.DATA1,
                   output => s.DATA1);

  Carry0: THmn
            generic map(N => 6, M => 3)
            port map(inputs(0) => A.DATA0,
                     inputs(1) => A.DATA0,
                     inputs(2) => B.DATA0,
                     inputs(3) => B.DATA0,
                     inputs(4) => A.DATA1,
                     inputs(5) => B.DATA1,
                     output => c.DATA0);

  Carry1: THmn
            generic map(N => 2, M => 2)
            port map(inputs(0) => A.DATA1,
                     inputs(1) => B.DATA1,
                     output => c.DATA1);
end optimized;

I have two implementations here. The basic version, and the optimized version. There’s not a lot to explain beyond VHDL syntax. The gates are built as in the diagram, though some orderings might have changed.

Testing

The test script runs through all input values, clearing to NULL in between. The test simulation run:

Capture

If you have any questions, leave a comment below.

Commit: a57125a

NCL THxor0 Gate

The THxor0 Gate is a 4-input gate with logic function AB + CD. The data that I have found about implementing this gate is all at the transistor level. For now, I’ll make a behavioral model, but I may later design a structural version (technically a 1-bit state machine). Regardless, I think this implementation is actually synthesizable, so that’s nice.

The behavioral implementation:

library ieee;
use ieee.std_logic_1164.all;

entity THxor0 is
  generic(Delay : time := 1 ns);
  port(A, B, C, D : in std_logic;
       output : out std_logic);
end THxor0;

architecture behavioral of THxor0 is
begin
  process (A, B, C, D)
  begin
    if (A = '0' and B = '0' and C = '0' and D = '0') then
      output <= '0';
    elsif ((A and B) or (C and D)) = '1' then
      output <= '1';
    end if;
  end process;
end behavioral;

The process statement again has 2 conditions: Set and Clear. If neither is met, the gate holds its state.
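The set, hold, and clear behaviour can be checked with a small self-checking testbench. This is a minimal sketch that assumes the THxor0 entity above is compiled into the work library; the name THxor0_tb and the 5 ns waits are my own choices, and the post’s real tests are in the TCL script.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity THxor0_tb is
end THxor0_tb;

architecture sim of THxor0_tb is
  signal A, B, C, D : std_logic := '0';
  signal output     : std_logic;
begin
  dut: entity work.THxor0
    port map(A => A, B => B, C => C, D => D, output => output);

  stimulus: process
  begin
    wait for 5 ns;
    assert output = '0' report "all inputs low should clear the gate" severity error;

    A <= '1'; B <= '1';   -- set condition: AB
    wait for 5 ns;
    assert output = '1' report "AB should set the gate" severity error;

    B <= '0';             -- neither set nor clear: the gate must hold
    wait for 5 ns;
    assert output = '1' report "gate should hold its state" severity error;

    A <= '0';             -- clear condition: all inputs low
    wait for 5 ns;
    assert output = '0' report "gate should clear" severity error;

    wait;  -- end of test
  end process;
end sim;
```

The third check is the one that distinguishes this gate from a plain AND-OR: with only A set, an ordinary AB + CD would drop back to 0.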

Testing

The test script was a modified version of the THmn test script. See scripts/test/test_threshold_gate.tcl on GitHub. Below is an excerpt from the test simulation session.

capture1.png

See this post for a Half Adder component that uses this gate.

Commit: 19c8318

NCL Half Adder Design

Definition

A Half Adder is a logic component that takes in two inputs, and outputs a binary (base 2) representation of how many are set (0, 1, or 2).

Like any good logic designer working on a small part, we’ll start by making a truth table:

iA iB oSum oCarry
0 0 0 0
0 1 1 0
1 0 1 0
1 1 0 1

Design

The first step of getting from truth table to gates is to generate Sum-of-Product logic equations, even with NCL.

oSum = (iA*iB')+(iA'*iB)
oCarry = iA*iB

Now begins the difference: We don’t treat iA' the same as we would in standard boolean logic. In standard boolean logic, we get the complement by inverting the single signal. In NCL, we have to use an entirely different signal. In addition, we need logic functions to generate the complements of our outputs.

oSum.1 = (iA.1*iB.0)+(iA.0*iB.1)
oSum.0 = (iA.1*iB.1)+(iA.0*iB.0)
oCarry.1 = iA.1*iB.1
oCarry.0 = (iA.0*iB.0)+(iA.1*iB.0)+(iA.0*iB.1)

Looking at these functions, it looks like we need 4 2-input gates that each check if both inputs are set (C-Element/TH22), and several gates that check if any inputs are set (OR/TH1n). Let’s start by setting up the 4 TH22 gates (the ‘AND plane’):

halfadder-and-plane1.png

The gates represent iA'*iB', iA*iB', iA'*iB, and iA*iB from top to bottom. For each of these, if both inputs are set (remember that setting iA.Data0 means iA==0) the output is set; if both are clear, the output is clear. Since oCarry is just iA*iB, we can wire that output up directly. The other outputs combine several of these products, OR’ed together: if any one is set, the output line is set. Sounds like a TH1n gate.

HalfAdder

So, that’s the basic Half Adder. I double checked it by annotating the gates to make sure I got the same thing:

halfadder-annotated.png

Looks good. Now, let’s get fancy.

Optimizing

Optimizing NCL functions should be similar to optimizing standard logic functions. I’m going to try to optimize for logic levels, to see if I can make it all flat.

oCarry.0

By observation, we have symmetry between iA and iB, and between iA' and iB'. Since we don’t want iA*iB to be enough to trip the gate, let’s consider weighting iA' and iB' at 2. This gives a THm4W22 gate. Now to figure out m:

  • If iA and iB are set, then the total is 2, so we need to be larger than 2.
  • If any other 2 lines are set, we have ≥3 (either 2+1 or 2+2)

Three it is. We will use oCarry.0 = TH34W22(iA.0, iB.0, iA.1, iB.1)

oSum

Looking through the table here, I don’t see a matching function for the gates I’ve studied, but there is a XOR gate: THxor0. This gate picks out the first two, and last two inputs as an SOP function.

oSum.0 = THxor0(iA', iB', iA, iB)
oSum.1 = THxor0(iA, iB', iA', iB)

Together

HalfAdder Optimized

Implementation

I’ll save this for another post.

Getting Down to Business

Now that we’ve talked about what NCL is, it’s time to actually make something with it.

For complete project files, see my GitHub

I will be using Modelsim as my IDE (with a little Notepad++) and VHDL. I’ll be starting with simulation, but eventually I want to load something on an FPGA, so I’ll be keeping an eye for synthesis where I can.

Setup

The first thing to do is make a package with some useful types and functions. I’ll call it work.NCL:

library ieee;
use ieee.std_logic_1164.all;

package ncl is
  type ncl_pair is record
    data0 : std_logic;
    data1 : std_logic;
  end record ncl_pair;
 
  type ncl_pair_vector is array (integer range <>) of ncl_pair;

  component THmn is
    generic(M : integer := 1;
            N : integer := 1;
            Delay : time := 1 ns);
    port(inputs : in std_logic_vector(0 to N-1);
         output : out std_logic);
  end component THmn;
end ncl;

I’ve decided to make each NCL line be std_logic to allow the synthesizer to use standard logic functions.

The gates themselves operate on single lines, but components take in pairs, though it could be expanded to triples or higher if needed later. Before we can do anything else, we need to build the NCL Threshold gate.

Implementation

I tried to implement the threshold gate structurally with the generic parameter, but I was unable to make it work. I’ll look into making a structural version someday.

For synthesis, I may need to make parameter-specific components. The generic component could maybe act as an automatic selector, instantiating the correct implementations.

The most basic description of a threshold gate is that it sets when enough inputs are set and clears when no inputs are set; otherwise no action is taken and it holds its state. For now, I have decided to implement the threshold gate without weights, and instead to use repeated inputs (handled by the calling entity) if I need weights. A TH23W2 would be instantiated as a TH24, and the first input would be given twice.
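As an illustration of the repeated-input trick, a TH23W2 could be wrapped as shown in this minimal sketch. The wrapper entity TH23W2 and its port names are hypothetical; only the THmn component and its generics come from the ncl package in this post.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use work.ncl.all;

-- Hypothetical wrapper: threshold 2, three inputs, first input weighted 2.
entity TH23W2 is
  port(A, B, C : in  std_logic;   -- A carries weight 2
       output  : out std_logic);
end TH23W2;

architecture structural of TH23W2 is
begin
  -- Built as an unweighted TH24 with A connected to two of the four inputs
  gate: THmn
    generic map(M => 2, N => 4)
    port map(inputs(0) => A,
             inputs(1) => A,     -- A repeated to give it weight 2
             inputs(2) => B,
             inputs(3) => C,
             output    => output);
end structural;
```

With this wiring, A alone counts as 2 and sets the gate, while B and C must both be set to reach the threshold without A.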

Here goes:

library ieee;
use ieee.std_logic_1164.all;

use work.ncl.all;

entity THmn is
  generic(M : integer := 1;
          N : integer := 1;
          Delay : time := 1 ns);
  port(inputs : in std_logic_vector(0 to N-1);
       output : out std_logic := '0');
end THmn;

architecture simple of THmn is
begin
  ThresholdGate: process(inputs)
    variable num_1 : integer;
  begin
    num_1 := 0;
    for i in 0 to N-1 loop
      if inputs(i) = '1' then
        num_1 := num_1 + 1;
      end if;
    end loop;
    if num_1 >= M then
      output <= '1' after Delay;
    elsif num_1 = 0 then
      output <= '0' after Delay;
    end if;
  end process;
end simple;

Essentially, it counts the number of 1’s, and checks for the set and clear conditions.

Testing

I made a test script to help make sure the generics work correctly (and get practice for testing later modules). Part of a test for TH22 is below:

Capture

See scripts/tests/test_threshold_gate.tcl on GitHub for my tests.

I have tested it for up to TH77; beyond that it takes too long to run (runtime is 2^n). My test file is designed to test every possible transition of input values, in both the output set and clear states.
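For a quick sanity check without the TCL script, a hand-written VHDL testbench for the TH22 case might look like the sketch below. It assumes the THmn entity above is compiled into the work library with its default 1 ns delay; the name TH22_tb and the 5 ns waits are my own choices, and it covers only one path through the transitions rather than all of them.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity TH22_tb is
end TH22_tb;

architecture sim of TH22_tb is
  signal inputs : std_logic_vector(0 to 1) := "00";
  signal output : std_logic;
begin
  dut: entity work.THmn
    generic map(M => 2, N => 2)   -- keeps the default Delay of 1 ns
    port map(inputs => inputs, output => output);

  stimulus: process
  begin
    wait for 5 ns;
    assert output = '0' report "gate should start cleared" severity error;

    inputs <= "10"; wait for 5 ns;
    assert output = '0' report "one input must not set a TH22" severity error;

    inputs <= "11"; wait for 5 ns;
    assert output = '1' report "both inputs should set the gate" severity error;

    inputs <= "01"; wait for 5 ns;
    assert output = '1' report "gate should hold while any input is set" severity error;

    inputs <= "00"; wait for 5 ns;
    assert output = '0' report "all inputs low should clear the gate" severity error;

    wait;  -- end of test
  end process;
end sim;
```

The "10" and "01" steps apply the same kind of input from two different output states, which is why the full test has to visit every transition in both the set and clear states.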

 

Thanks for reading. If you have any thoughts on how to improve the design, let me know by message or in the comments.

Commit: 5b852e5