NCL Full Adder Implementation

Design Recap

The circuit from my last post:

FullAdder

oS.1:
TH12(
  TH22(
    TH13(iA.1, iB.1, iC.1), -- 1 <= NumBits
    TH23(iA.0, iB.0, iC.0)), -- NumBits < 2
  TH33(iA.1, iB.1, iC.1)); -- 3 <= NumBits

oS.0:
TH12(
  TH33(iA.0, iB.0, iC.0), -- NumBits < 1
  TH22(
    TH23(iA.1, iB.1, iC.1), -- 2 <= NumBits
    TH13(iA.0, iB.0, iC.0))); -- NumBits < 3

oC.1: TH23(iA.1, iB.1, iC.1) -- 2 <= NumBits

oC.0: TH23(iA.0, iB.0, iC.0) -- NumBits < 2

Optimized

The structural VHDL implementation:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.ncl.all;

entity FullAdder is
  port(iC : in ncl_pair;
       a : in ncl_pair;
       b : in ncl_pair;
       oS : out ncl_pair;
       oC : out ncl_pair);
end FullAdder;

architecture structural of FullAdder is
  type first_layer is array (integer range <>) of std_logic_vector(0 to 2);
  signal first_layer_inputs : first_layer(0 to 7);
  signal intermediate : std_logic_vector(0 to 7);
  signal inputs : ncl_pair_vector(0 to 2);

begin
  inputs(2) <= a;
  inputs(1) <= b;
  inputs(0) <= iC;
  input_layer: for i in 0 to 7 generate
    bits: for ibit in 0 to 2 generate
      Input0Selection: if (to_unsigned(2**iBit, 3) and to_unsigned(i, 3)) = 0 generate
        first_layer_inputs(i)(iBit) <= inputs(iBit).Data0;
      end generate;
      Input1Selection: if (to_unsigned(2**iBit, 3) and to_unsigned(i, 3)) > 0 generate
        first_layer_inputs(i)(iBit) <= inputs(iBit).Data1;
      end generate;
    end generate;
    gate: THmn
            generic map(M => 3, N => 3)
            port map(inputs => first_layer_inputs(i),
                     output => intermediate(i));
  end generate;

  oS0: THmn
         generic map(M => 1, N => 4)
         port map(inputs(0) => intermediate(0),
                  inputs(1) => intermediate(3),
                  inputs(2) => intermediate(5),
                  inputs(3) => intermediate(6),
                  output => oS.DATA0);

  oS1: THmn
         generic map(M => 1, N => 4)
         port map(inputs(0) => intermediate(1),
                  inputs(1) => intermediate(2),
                  inputs(2) => intermediate(4),
                  inputs(3) => intermediate(7),
                  output => oS.DATA1);

  oC0: THmn
         generic map(M => 1, N => 4)
         port map(inputs(0) => intermediate(0),
                  inputs(1) => intermediate(1),
                  inputs(2) => intermediate(2),
                  inputs(3) => intermediate(4),
                  output => oC.DATA0);

  oC1: THmn
         generic map(M => 1, N => 4)
         port map(inputs(0) => intermediate(3),
                  inputs(1) => intermediate(5),
                  inputs(2) => intermediate(6),
                  inputs(3) => intermediate(7),
                  output => oC.DATA1);
end structural;

architecture optimized of FullAdder is
  signal sLT2 : std_logic;
  signal sLT3 : std_logic;
  signal sGE2 : std_logic;
  signal sGE1 : std_logic;
  signal sEQ3 : std_logic;
  signal sEQ2 : std_logic;
  signal sEQ1 : std_logic;
  signal sEQ0 : std_logic;
begin
  LT2: THmn
         generic map(M => 2, N => 3)
         port map(inputs(0) => a.DATA0,
                  inputs(1) => b.DATA0,
                  inputs(2) => iC.DATA0,
                  output => sLT2);

  GE2: THmn
         generic map(M => 2, N => 3)
         port map(inputs(0) => a.DATA1,
                  inputs(1) => b.DATA1,
                  inputs(2) => iC.DATA1,
                  output => sGE2);

  GE1: THmn
         generic map(M => 1, N => 3)
         port map(inputs(0) => a.DATA1,
                  inputs(1) => b.DATA1,
                  inputs(2) => iC.DATA1,
                  output => sGE1);

  EQ1: THmn
         generic map(M => 2, N => 2)
         port map(inputs(0) => sGE1,
                  inputs(1) => sLT2,
                  output => sEQ1);

  EQ3: THmn
         generic map(M => 3, N => 3)
         port map(inputs(0) => a.DATA1,
                  inputs(1) => b.DATA1,
                  inputs(2) => iC.DATA1,
                  output => sEQ3);

  S1: THmn
        generic map(M => 1, N => 2)
        port map(inputs(0) => sEQ1,
                 inputs(1) => sEQ3,
                 output => oS.DATA1);

  EQ0: THmn
         generic map(M => 3, N => 3)
         port map(inputs(0) => a.DATA0,
                  inputs(1) => b.DATA0,
                  inputs(2) => iC.DATA0,
                  output => sEQ0);

  LT3: THmn
         generic map(M => 1, N => 3)
         port map(inputs(0) => a.DATA0,
                  inputs(1) => b.DATA0,
                  inputs(2) => iC.DATA0,
                  output => sLT3);

  EQ2: THmn
         generic map(M => 2, N => 2)
         port map(inputs(0) => sGE2,
                  inputs(1) => sLT3,
                  output => sEQ2);

  S0: THmn
        generic map(M => 1, N => 2)
        port map(inputs(0) => sEQ2,
                 inputs(1) => sEQ0,
                 output => oS.DATA0);

  oC.DATA0 <= sLT2;
  oC.DATA1 <= sGE2;

end optimized;

The VHDL implementation of the un-optimized design uses generate loops to set up the first layer (one TH33 gate for every combination of input values). The second layer is wired up manually.
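The index arithmetic in the generate loops can be checked with a small Python sketch. This is not part of the project; the function names are made up for illustration, and it only models which rails and which first-layer gates are selected, not the gates themselves.

```python
# Hypothetical model of the structural architecture's wiring (illustrative names).

def minterm_rails(i):
    """Rails feeding first-layer gate i: bit k of i picks Data1, otherwise Data0."""
    return [(k, 1 if (i >> k) & 1 else 0) for k in range(3)]

def popcount(i):
    """Number of inputs that carry a 1 in minterm i."""
    return bin(i).count("1")

# Second-layer groupings, derived from the number of set bits in the index.
oS1 = [i for i in range(8) if popcount(i) % 2 == 1]
oS0 = [i for i in range(8) if popcount(i) % 2 == 0]
oC1 = [i for i in range(8) if popcount(i) >= 2]
oC0 = [i for i in range(8) if popcount(i) < 2]

print(oS0, oS1, oC0, oC1)
# [0, 3, 5, 6] [1, 2, 4, 7] [0, 1, 2, 4] [3, 5, 6, 7]
```

The four lists match the intermediate signals wired into the oS0, oS1, oC0, and oC1 gates in the listing above.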

The optimized design’s signals are named after the relations they test: Greater than or Equal to # (GE), Less Than # (LT), and EQual to # (EQ). So sEQ1 is asserted when exactly 1 input group is set to 1, and sGE2 is asserted when at least 2 input groups are set to 1.
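To see how these relation signals combine, here is a rough Python model of the optimized architecture. It is a sketch under assumptions: it only models the set condition of each gate (hysteresis is ignored, so it describes a DATA wavefront arriving from the NULL state), and the helper names th, rails, and full_adder are invented for this example.

```python
from itertools import product

def th(m, *inputs):
    """Set condition of a THmn gate: 1 when at least m inputs are 1."""
    return int(sum(inputs) >= m)

def rails(bit):
    """Dual-rail encoding of a DATA value as (DATA0, DATA1)."""
    return (1 - bit, bit)

def full_adder(a, b, c):
    (a0, a1), (b0, b1), (c0, c1) = rails(a), rails(b), rails(c)
    s_lt2 = th(2, a0, b0, c0)   # NumBits < 2
    s_ge2 = th(2, a1, b1, c1)   # 2 <= NumBits
    s_ge1 = th(1, a1, b1, c1)   # 1 <= NumBits
    s_lt3 = th(1, a0, b0, c0)   # NumBits < 3
    s_eq3 = th(3, a1, b1, c1)   # NumBits = 3
    s_eq0 = th(3, a0, b0, c0)   # NumBits = 0
    s_eq1 = th(2, s_ge1, s_lt2) # 1 <= NumBits < 2
    s_eq2 = th(2, s_ge2, s_lt3) # 2 <= NumBits < 3
    sum_pair = (th(1, s_eq2, s_eq0), th(1, s_eq1, s_eq3))
    carry_pair = (s_lt2, s_ge2)
    return sum_pair, carry_pair

# Compare against ordinary binary addition for every DATA input combination.
for a, b, c in product((0, 1), repeat=3):
    total = a + b + c
    s, cy = full_adder(a, b, c)
    assert s == rails(total % 2) and cy == rails(total // 2)
```

Every output pair ends up with exactly one rail set, which is what a valid DATA value requires.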

Testing

The test script runs through all combinations of inputs. I ran the tests with both versions. Here’s the result; no surprises, really.

Full Adder Test

Commit: a7a9dba, d645811

NCL Full Adder Design

Theory

Like the Half Adder, a Full Adder counts its inputs. The Full Adder counts three of them, though, to account for the carry-in from the previous bit.

Truth Table

iA iB iC oS oC
0 0 0 0 0
0 0 1 1 0
0 1 0 1 0
0 1 1 0 1
1 0 0 1 0
1 0 1 0 1
1 1 0 0 1
1 1 1 1 1

Design

Once more, we’ll start with the truth table, derive sum-of-product equations, and circuit-ize.

oS = (iA'*iB'*iC) + (iA'*iB*iC') + (iA*iB'*iC') + (iA*iB*iC)
oC = (iA'*iB*iC) + (iA*iB'*iC) + (iA*iB*iC') + (iA*iB*iC)

Again, we need to convert these into NCL logic (DATA0 and DATA1).

oS.0 = (iA.0*iB.0*iC.0)+(iA.0*iB.1*iC.1)+(iA.1*iB.0*iC.1)+(iA.1*iB.1*iC.0)
oS.1 = (iA.0*iB.0*iC.1)+(iA.0*iB.1*iC.0)+(iA.1*iB.0*iC.0)+(iA.1*iB.1*iC.1)
oC.0 = (iA.0*iB.0*iC.0)+(iA.0*iB.0*iC.1)+(iA.0*iB.1*iC.0)+(iA.1*iB.0*iC.0)
oC.1 = (iA.0*iB.1*iC.1)+(iA.1*iB.0*iC.1)+(iA.1*iB.1*iC.0)+(iA.1*iB.1*iC.1)

Note that each row of the truth table is used exactly twice, once for each variable. Since 0’s and 1’s are both represented by a high signal, each output variable {oS, oC} has an assignment for each case. Build the AND-Plane with TH33 gates, and the OR-Plane with TH14 gates (each output rail ORs four product terms):

FullAdder

This design takes up 168 transistors in total. Let’s see if we can make it with fewer.

Optimization

This time, instead of SOP form, I’m going to look at it more intuitively. Since the bits are symmetric (the values of iA can be swapped with iB without any change in the expected output), let’s look at counting them with threshold gates instead of checking individual cases. To check a ‘less than’ relationship on the number of inputs set, just count the number of 0’s.

oS.1: (1 <= NumBits < 2) + (3 <= NumBits) --5 gates
oS.0: (NumBits < 1) + (2 <= NumBits < 3)  --5 gates

oC.1: (2 <= NumBits)                      -- 1 gate (shared)
oC.0: (NumBits < 2)                       -- 1 gate (shared)
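The counting trick can be verified exhaustively. The Python sketch below is only an illustration (th and check_relations are made-up names, and only the set condition of each gate is modelled): a THk3 gate on the .1 rails tests k <= NumBits, and a TH(4-k)3 gate on the .0 rails tests NumBits < k.

```python
from itertools import product

def th(m, *inputs):
    """Set condition of a THmn gate: 1 when at least m inputs are 1."""
    return int(sum(inputs) >= m)

def check_relations():
    for a, b, c in product((0, 1), repeat=3):
        ones = (a, b, c)                # the .1 rails
        zeros = (1 - a, 1 - b, 1 - c)   # the .0 rails
        num_bits = sum(ones)
        for k in (1, 2, 3):
            # k <= NumBits: at least k of the .1 rails are set.
            assert th(k, *ones) == int(num_bits >= k)
            # NumBits < k: at least 4 - k of the .0 rails are set.
            assert th(4 - k, *zeros) == int(num_bits < k)
    return True

check_relations()
```

For k = 2 this gives the two TH23 gates used for the carry, and for k = 1 and k = 3 it gives the TH13 and TH33 gates used for the sum.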

Gate version:

oS.1: 
TH12(
     TH22(
          TH13(iA.1, iB.1, iC.1),  -- 1 <= NumBits
          TH23(iA.0, iB.0, iC.0)), -- NumBits < 2
     TH33(iA.1, iB.1, iC.1)); -- 3 <= NumBits

oS.0:
TH12(
     TH33(iA.0, iB.0, iC.0), -- NumBits < 1
     TH22(
          TH23(iA.1, iB.1, iC.1),  -- 2 <= NumBits
          TH13(iA.0, iB.0, iC.0))); -- NumBits < 3

oC.1: TH23(iA.1, iB.1, iC.1) -- 2 <= NumBits
oC.0: TH23(iA.0, iB.0, iC.0) -- NumBits < 2

I’m going to call this ‘functional notation’. It treats each gate as a function with other gates as inputs; common expressions are evaluated only once (duplicate gates are only built once, then shared). This uses the following gates (with their transistor counts):

  • 2 TH12 = 2*6  = 12 transistors
  • 2 TH22 = 2*12 = 24 transistors
  • 2 TH13 = 2*8  = 16 transistors
  • 2 TH23 = 2*18 = 36 transistors
  • 2 TH33 = 2*16 = 32 transistors

Total: 120 transistors (-29% from SOP). The downside is that this is three-layer logic, and the third layer generally adds some delay. On the upside, most of the gates have fewer inputs (2 and 3 inputs instead of 3 and 4 inputs). This reduces the complexity of each gate and may actually shorten the critical path. I won’t do that analysis here.

You might be tempted to obtain the *.0 or *.1 signal by inverting the other. You cannot do this in NCL. You must be able to pass on NULL wavefronts, which require both lines to be 0. This is a downside of NCL: every group needs two (or more) complementary circuits to generate it. This limitation results in increased die area.
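A tiny Python sketch shows what goes wrong with the inversion shortcut. The names (invert_shortcut, decode) are made up for this illustration; pairs are written as (rail .0, rail .1).

```python
NULL, DATA0, DATA1 = (0, 0), (1, 0), (0, 1)   # (rail .0, rail .1)

def invert_shortcut(rail1):
    """Hypothetical shortcut: derive the .0 rail by inverting the .1 rail."""
    return (1 - rail1, rail1)

def decode(pair):
    """Name the state that a dual-rail pair represents."""
    return {NULL: "NULL", DATA0: "DATA0", DATA1: "DATA1"}.get(pair, "illegal")

for name, pair in (("NULL", NULL), ("DATA0", DATA0), ("DATA1", DATA1)):
    print(name, "->", decode(invert_shortcut(pair[1])))
# NULL -> DATA0
# DATA0 -> DATA0
# DATA1 -> DATA1
```

With the shortcut, a NULL on the .1 rail is indistinguishable from DATA0, so the NULL wavefront can never reach the next stage.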

See my next post for the implementation.

NCL Register Implementation

See this post for the design of the NCL Register.

Implementation

I implemented this module structurally, with a for...generate (you’ll find that I’m a big fan of generics if you keep up with the blog).

This module assumes that all groups (DATA0, DATA1, … DATAn) are dual rail (capped at DATA1); if I need other encodings, I’ll make them as separate modules later. I added a generic RegisterDelay parameter so that I can better observe pipelines of components (if the output stays stable for a while, I can read values off the waveforms more easily).

library ieee;
use ieee.std_logic_1164.all;
use work.ncl.all;

entity RegisterN is
  generic(N : integer := 1;
          RegisterDelay : time := 20 ns);
  port(inputs : in ncl_pair_vector(0 to N-1);
       from_next : in std_logic;
       output : out ncl_pair_vector(0 to N-1);
       to_prev : out std_logic);
end RegisterN;

architecture structural of RegisterN is
  signal outs : std_logic_vector(0 to (2*N)-1);
  signal watcher_out : std_logic := '0';
begin

register_gates: for i in 0 to N-1 generate
  T22_i0 : THmn
             generic map(N => 2, M => 2, Delay => RegisterDelay)
             port map(inputs(0) => inputs(i).DATA0,
                      inputs(1) => from_next,
                      output => outs(2*i));
  output(i).DATA0 <= outs(2*i);

  T22_i1 : THmn
             generic map(N => 2, M => 2, Delay => RegisterDelay)
             port map(inputs(0) => inputs(i).DATA1,
                      inputs(1) => from_next,
                      output => outs(2*i+1));
  output(i).DATA1 <= outs(2*i+1);

end generate register_gates;

  watcher: THmn
             generic map (N => N*2, M => N)
             port map (inputs => outs,
                       output => watcher_out);
  WatcherOutput: to_prev <= NOT watcher_out;
end structural;

Testing

For testing I again used a test script, available on GitHub. It steps through the values and makes sure that data is passed through correctly. It also checks that DATA/NULL wavefronts are held back correctly by the control signal (handshaking). Note that the to_prev output signal is always in the ‘opposite’ state of the outputs (outputs NULL => to_prev 1 and outputs DATA => to_prev 0), apart from the 1 ns delay (default) of the watcher gate (a TH36 in this case).

Capture

Commits: 5d349e7, 4ecfe25, bf16da3

NCL Register Design

We’ve covered some basics on NCL (signals and gates), next I’m looking into registers and structuring a system with multiple components.

Theory

In synchronous logic, designers use flip-flops to store data: they capture the current value on every clock edge and pass it to the next stage. In asynchronous logic there is no clock edge, so saving data requires something else. NCL uses threshold gates as registers, which works because of their hysteresis property. The requirements for the register:

  • Hold on to the value for as long as the next module needs it
  • Send a reset (NULL) signal to the next module on all inputs when it needs it
  • Let the previous register know what it needs (DATA/NULL)

So, there’s handshaking going on here, each register tells the one before it what it needs, and tries to send the next one what it asks for.

Design

How do we send the request to the previous module then? Let’s assume 1 control line, and see if we need something else later. Since we are representing NULL with 0, let’s set a request for NULL to be 0, and a request for DATA to be 1. We want to receive DATA as soon as the module has reset to NULL, and we want NULL as soon as the module is outputting data on all groups. Here’s the initial design:

NCL Register

If both A and B have a line set (either 0 or 1) then the ‘watcher’ gate is set. The little circle on the tip is an inverter, it turns the 1 (indicating we have DATA) to a 0 (indicating we want NULL) and vice versa.

There is one more requirement: if the module after us is requesting DATA, we can’t store the NULL wavefront (which would overwrite the DATA values), and vice versa, so we need to hold off the previous module until we can. This means that the request has to be based on the register’s outputs, not its inputs.

Refresher: A group of NCL lines are the set of lines representing a single entity, only one can be active at a time, but it is allowable to have none active (NULL).

NCL Register

Here we have a gate saving each bit: If the control input is low, then the gates will reset when the previous module’s outputs clear (next module requesting null). If the control input is high, then the gates will save DATA inputs (next module requesting DATA).

When both groups (A and B) have data, the watcher sees 2 data lines, sets its output, which goes through the inverter and requests NULL (which won’t be saved until the next module requests NULL).

Eventually, the previous module NULLs out and waits for a DATA request. When the next module requests NULL, the register gates flip to NULLs and the watcher outputs a 0, which is inverted to a 1 (request for DATA). The NULL wavefront passes through the module to the next register.

This cycle continues.
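The cycle can be stepped through with a small Python model. This is a sketch with made-up class names, not the project’s VHDL: it ignores gate delays, assumes dual-rail groups written as (DATA0, DATA1), and uses a watcher threshold equal to the number of groups.

```python
class THmn:
    """Threshold gate with hysteresis: sets at >= m ones, clears at all zeros."""
    def __init__(self, m):
        self.m, self.out = m, 0

    def update(self, *inputs):
        if sum(inputs) >= self.m:
            self.out = 1
        elif sum(inputs) == 0:
            self.out = 0
        return self.out            # otherwise the previous state is held

class Register:
    """n dual-rail groups, each rail gated by a TH22 with the control line."""
    def __init__(self, n):
        self.gates = [(THmn(2), THmn(2)) for _ in range(n)]
        self.watcher = THmn(n)

    def step(self, pairs, from_next):
        outs = [(g0.update(d0, from_next), g1.update(d1, from_next))
                for (g0, g1), (d0, d1) in zip(self.gates, pairs)]
        flat = [rail for pair in outs for rail in pair]
        to_prev = 1 - self.watcher.update(*flat)   # inverted watcher output
        return outs, to_prev

reg = Register(2)
null = [(0, 0), (0, 0)]
data = [(1, 0), (0, 1)]
print(reg.step(null, 1))  # ([(0, 0), (0, 0)], 1)  empty, requesting DATA
print(reg.step(data, 1))  # ([(1, 0), (0, 1)], 0)  DATA stored, requesting NULL
print(reg.step(null, 1))  # ([(1, 0), (0, 1)], 0)  NULL held back: next stage still wants DATA
print(reg.step(null, 0))  # ([(0, 0), (0, 0)], 1)  next stage asks for NULL, register clears
```

The third step is the important one: the incoming NULL is not stored until from_next drops, so the DATA value stays available for as long as the next module needs it.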

Notes

Components can be directly linked without registers, but then only one operation can occur between registers at a time. Adding registers splits the operation into smaller parts, which can run in parallel (on different inputs). At the start, the first set of inputs is loaded; when it moves to the second stage, the first stage is NULLed. After that, the first stage receives the second set of inputs while the first set is still running through the third stage. This continues, with all DATA wavefronts separated by NULL wavefronts.

NCL Half Adder Implementation

See this post for the theory and design of the NCL Half Adder.

Design Recap

Here’s the circuit design:

halfadder1-e1499625273754.png

(simple version)

HalfAdder Optimized

(optimized version)

The top two gates are THxor0 gates, the next is a TH34w22, and the last one is a TH22 gate. I will implement this structurally, a fairly straightforward process in this case.

Implementation

library ieee;
use ieee.std_logic_1164.all;
use work.ncl.all;

entity HalfAdder is
 port(a : in ncl_pair;
      b : in ncl_pair;
      s : out ncl_pair;
      c : out ncl_pair);
end HalfAdder;

architecture structural of HalfAdder is
  signal a0b0_ins : std_logic_vector(0 to 1);
  signal a0b0_out : std_logic;
  signal a0b1_ins : std_logic_vector(0 to 1);
  signal a0b1_out : std_logic;
  signal a1b0_ins : std_logic_vector(0 to 1);
  signal a1b0_out : std_logic;
  signal a1b1_ins : std_logic_vector(0 to 1);
  signal a1b1_out : std_logic;

  signal s0_ins : std_logic_vector(0 to 1);
  signal s0_out : std_logic;
  signal s1_ins : std_logic_vector(0 to 1);
  signal s1_out : std_logic;

  signal c0_ins : std_logic_vector(0 to 2);
  signal c0_out : std_logic;
begin
  a0b0_ins(0) <= a.DATA0;
  a0b0_ins(1) <= b.DATA0;
  T21_A0B0 : THmn
               generic map(N => 2, M => 2)
               port map(inputs => a0b0_ins,
                        output => a0b0_out);

  a0b1_ins(0) <= a.DATA0;
  a0b1_ins(1) <= b.DATA1;
  T21_A0B1 : THmn
               generic map(N => 2, M => 2)
               port map(inputs => a0b1_ins,
                        output => a0b1_out);

  a1b0_ins(0) <= a.DATA1;
  a1b0_ins(1) <= b.DATA0;
  T21_A1B0 : THmn
               generic map(N => 2, M => 2)
               port map(inputs => a1b0_ins,
                        output => a1b0_out);

  a1b1_ins(0) <= a.DATA1;
  a1b1_ins(1) <= b.DATA1;
  T21_A1B1 : THmn
               generic map(N => 2, M => 2)
               port map(inputs => a1b1_ins,
                        output => a1b1_out);

  s1_ins(0) <= a0b1_out;
  s1_ins(1) <= a1b0_out;
  T21_S1: THmn
            generic map(N => 2, M => 1)
            port map(inputs => s1_ins,
                     output => s1_out);
  s.DATA1 <= s1_out;

  s0_ins(0) <= a0b0_out;
  s0_ins(1) <= a1b1_out;
  T21_S0: THmn
            generic map(N => 2, M => 1)
            port map(inputs => s0_ins,
                     output => s0_out);
  s.DATA0 <= s0_out;
  c.DATA1 <= a1b1_out;

  c0_ins(0) <= a1b0_out;
  c0_ins(1) <= a0b1_out;
  c0_ins(2) <= a0b0_out;
  T31_C0: THmn
            generic map(N => 3, M => 1)
            port map(inputs => c0_ins,
                     output => c0_out);
  c.DATA0 <= c0_out;
end structural;

architecture optimized of HalfAdder is
begin   
  Sum0: THxor0
          port map(A => A.DATA0,
                   B => B.DATA0,
                   C => A.DATA1,
                   D => B.DATA1,
                   output => s.DATA0);

  Sum1: THxor0
          port map(A => A.DATA1,
                   B => B.DATA0,
                   C => A.DATA0,
                   D => B.DATA1,
                   output => s.DATA1);

  Carry0: THmn
            generic map(N => 6, M => 3)
            port map(inputs(0) => A.DATA0,
                     inputs(1) => A.DATA0,
                     inputs(2) => B.DATA0,
                     inputs(3) => B.DATA0,
                     inputs(4) => A.DATA1,
                     inputs(5) => B.DATA1,
                     output => c.DATA0);

  Carry1: THmn
            generic map(N => 2, M => 2)
            port map(inputs(0) => A.DATA1,
                     inputs(1) => B.DATA1,
                     output => c.DATA1);
end optimized;

I have two implementations here: the basic version and the optimized version. There’s not a lot to explain beyond VHDL syntax. The gates are built as in the diagrams, though some orderings might have changed.

Testing

The test script runs through all input values, clearing to NULL in between. The test simulation run:

Capture

If you have any questions, leave a comment below.

Commit: a57125a

NCL THxor0 Gate

The THxor0 gate is a 4-input gate with logic function AB + CD. All the information I have found about implementing this gate is at the transistor level. For now, I’ll make a behavioral model, but I may later design a structural version (technically a 1-bit state machine). Regardless, I think this implementation is actually synthesizable, so that’s nice.

The behavioral implementation:

library ieee;
use ieee.std_logic_1164.all;

entity THxor0 is
  generic(Delay : time := 1 ns);
  port(A, B, C, D : in std_logic;
       output : out std_logic);
end THxor0;

architecture behavioral of THxor0 is
begin
  process (A, B, C, D)
  begin
    if (A = '0' and B = '0' and C = '0' and D = '0') then
      output <= '0';
    elsif ((A and B) or (C and D)) = '1' then
      output <= '1';
    end if;
  end process;
end behavioral;

The process statement again has 2 conditions: Set and Clear. If neither is met, the gate holds its state.
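The same set/clear/hold behavior can be written as a few lines of Python to experiment with input sequences. This is only an illustrative model with an invented class name; it ignores the Delay generic.

```python
class THxor0:
    """Behavioral model: set on AB + CD, clear when all inputs are 0, else hold."""
    def __init__(self):
        self.out = 0

    def update(self, a, b, c, d):
        if (a and b) or (c and d):
            self.out = 1                  # Set condition
        elif not (a or b or c or d):
            self.out = 0                  # Clear condition
        return self.out                   # neither: hold the previous state

gate = THxor0()
sequence = [(1, 1, 0, 0), (1, 0, 0, 0), (0, 0, 0, 0), (1, 0, 1, 0)]
print([gate.update(*v) for v in sequence])  # [1, 1, 0, 0]
```

The second step shows the hold: dropping B alone does not clear the output. The last step shows that A and C together never set it, because they belong to different product terms.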

Testing

The test script was a modified version of the THmn test script. See scripts/test/test_threshold_gate.tcl on GitHub. Below is an excerpt from the test simulation session.

capture1.png

See this post for a Half Adder component that uses this gate.

Commit: 19c8318

NCL Half Adder Design

Definition

A Half Adder is a logic component that takes in two inputs, and outputs a binary (base 2) representation of how many are set (0, 1, or 2).

Like any good logic designer working on a small part, we’ll start by making a truth table:

iA iB oSum oCarry
0 0 0 0
0 1 1 0
1 0 1 0
1 1 0 1

Design

The first step of getting from truth table to gates is to generate Sum-of-Product logic equations, even with NCL.

oSum = (iA*iB')+(iA'*iB)
oCarry = iA*iB

Now begins the difference: we don’t treat iA' the same as we would in standard boolean logic. In standard boolean logic, we get the complement by inverting the single signal. In NCL, we have to use an entirely different signal. In addition, we need logic functions to generate the complements of our outputs.

oSum.1 = (iA.1*iB.0)+(iA.0*iB.1)
oSum.0 = (iA.1*iB.1)+(iA.0*iB.0)
oCarry.1 = iA.1*iB.1
oCarry.0 = (iA.0*iB.0)+(iA.1*iB.0)+(iA.0*iB.1)
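These four equations can be checked against ordinary addition with a short Python sketch. The function name and the (rail .0, rail .1) tuple layout are assumptions made for this example, and only DATA inputs are considered.

```python
from itertools import product

def half_adder(a, b):
    """Evaluate the dual-rail equations for DATA inputs a and b."""
    a0, a1 = 1 - a, a
    b0, b1 = 1 - b, b
    # Product terms, one per row of the truth table.
    a0b0, a0b1, a1b0, a1b1 = a0 & b0, a0 & b1, a1 & b0, a1 & b1
    sum_pair = (a1b1 | a0b0, a1b0 | a0b1)        # (oSum.0, oSum.1)
    carry_pair = (a0b0 | a1b0 | a0b1, a1b1)      # (oCarry.0, oCarry.1)
    return sum_pair, carry_pair

for a, b in product((0, 1), repeat=2):
    (s0, s1), (c0, c1) = half_adder(a, b)
    # Exactly one rail per output is set, and it encodes a + b.
    assert s0 + s1 == 1 and c0 + c1 == 1
    assert s1 == (a + b) % 2 and c1 == (a + b) // 2
```

Each row of the truth table activates exactly one product term, which is why every output pair gets exactly one rail set.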

Looking at these functions, it looks like we need 4 2-input gates that each check if both inputs are set (C-Element/TH22), and several gates that check if any inputs are set (OR/TH1n). Let’s start by setting up the 4 TH22 gates (the ‘AND plane’):

halfadder-and-plane1.png

The gates represent iA'*iB', iA*iB', iA'*iB, and iA*iB from top to bottom. For each of these, if both inputs are set (remember that setting iA.Data0 means iA==0) the output is set; if both are clear, the output is clear. Since oCarry is just iA*iB, we can wire that output up directly. The others use multiple pairs, that are OR’ed together: if any one is set, the output line is set. Sounds like a TH1n gate.

HalfAdder

So, that’s the basic Half Adder. I double checked it by annotating the gates to make sure I got the same thing:

halfadder-annotated.png

Looks good. Now, let’s get fancy.

Optimizing

Optimizing NCL functions should be similar to optimizing standard logic functions. I’m going to try to optimize for logic levels, to see if I can make it all flat.

oCarry.0

By observation, we have symmetry between iA and iB, and between iA' and iB'. Since we don’t want iA*iB to be enough to trip the gate, let’s consider weighting iA' and iB' at 2. This gives a THm4W22 gate. Now to figure out m:

  • If iA and iB are set, then the total is 2, so we need to be larger than 2.
  • If any other 2 lines are set, we have ≥3 (either 2+1 or 2+2)

Three it is. We will use oCarry.0 = TH34W22(iA.0, iB.0, iA.1, iB.1)
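The choice of m = 3 can be confirmed with a quick Python check. This is a sketch of the set condition only (no hysteresis), and th34w22 is a made-up helper name.

```python
from itertools import product

def th34w22(a, b, c, d):
    """Set condition: weights 2, 2, 1, 1 compared against a threshold of 3."""
    return int(2 * a + 2 * b + c + d >= 3)

for ia, ib in product((0, 1), repeat=2):
    got = th34w22(1 - ia, 1 - ib, ia, ib)   # (iA.0, iB.0, iA.1, iB.1)
    want = int(not (ia and ib))             # oCarry.0 is set unless both inputs are 1
    assert got == want
```

With iA.1 and iB.1 both set the weighted sum is only 2, so the gate stays clear; any other valid input pair involves at least one weight-2 line and reaches 3 or more.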

oSum

Looking through the table here, I don’t see a matching function for the gates I’ve studied, but there is a XOR gate: THxor0. This gate picks out the first two, and last two inputs as an SOP function.

oSum.0 = THxor0(iA', iB', iA, iB)
oSum.1 = THxor0(iA, iB', iA', iB)

Together

HalfAdder Optimized

Implementation

I’ll save this for another post.

Getting Down to Business

Now that we’ve talked about what NCL is, it’s time to actually make something with it.

For complete project files, see my GitHub

I will be using Modelsim as my IDE (with a little Notepad++) and VHDL. I’ll be starting with simulation, but eventually I want to load something on an FPGA, so I’ll be keeping an eye for synthesis where I can.

Setup

The first thing to do is make a package with some useful types and functions. I’ll call it work.NCL:

library ieee;
use ieee.std_logic_1164.all;

package ncl is
  type ncl_pair is record
    data0 : std_logic;
    data1 : std_logic;
  end record ncl_pair;
 
  type ncl_pair_vector is array (integer range <>) of ncl_pair;

  component THmn is
    generic(M : integer := 1;
            N : integer := 1;
            Delay : time := 1 ns);
    port(inputs : in std_logic_vector(0 to N-1);
         output : out std_logic);
  end component THmn;
end ncl;

I’ve decided to make each NCL line be std_logic to allow the synthesizer to use standard logic functions.

The gates themselves operate on single lines, but components take in pairs, though this could be expanded to triples or higher if needed later. Before we can do anything else, we need to build the NCL threshold gate.

Implementation

I tried to implement the threshold gate structurally with the generic parameters, but I was unable to make it work. I’ll look into making a structural version someday.

For synthesis, I may need to make parameter-specific components. The generic component could maybe act as an automatic selector, instantiating the correct implementations.

The most basic description of a threshold gate is that it sets when enough inputs are set, when no inputs are set, the output clears, otherwise no action is taken. For now, I have decided to implement the threshold gate without weights, and instead to use repeated inputs (handled by the calling entity) if I need weights. A TH23W2 would be instantiated as a TH24, and the first input would be given twice.

Here goes:

library ieee;
use ieee.std_logic_1164.all;

use work.ncl.all;

entity THmn is
  generic(M : integer := 1;
          N : integer := 1;
          Delay : time := 1 ns);
  port(inputs : in std_logic_vector(0 to N-1);
       output : out std_logic := '0');
end THmn;

architecture simple of THmn is
begin
  ThresholdGate: process(inputs)
    variable num_1 : integer;
  begin
    num_1 := 0;
    for i in 0 to N-1 loop
      if inputs(i) = '1' then
        num_1 := num_1 + 1;
      end if;
    end loop;
    if num_1 >= M then
      output <= '1' after Delay;
    elsif num_1 = 0 then
      output <= '0' after Delay;
    end if;
  end process;
end simple;

Essentially, it counts the number of 1’s, and checks for the set and clear conditions.
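For readers who want to play with the behavior outside a simulator, here is a rough Python equivalent of the process above. The class and helper names are invented for this sketch, and the Delay generic is not modelled.

```python
class THmn:
    """Counts the 1's on its inputs; sets at >= m, clears at 0, otherwise holds."""
    def __init__(self, m, n):
        self.m, self.n, self.out = m, n, 0

    def update(self, inputs):
        assert len(inputs) == self.n
        num_1 = sum(1 for x in inputs if x == 1)
        if num_1 >= self.m:
            self.out = 1
        elif num_1 == 0:
            self.out = 0
        return self.out

gate = THmn(2, 2)
trace = [gate.update(v) for v in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]]
print(trace)  # [0, 0, 1, 1, 0]

def th23w2(gate24, a, b, c):
    """A TH23W2 built from a TH24 by wiring the weighted input twice."""
    return gate24.update((a, a, b, c))

print(th23w2(THmn(2, 4), 1, 0, 0))  # 1: the weight-2 input alone reaches the threshold
```

The trace shows the hysteresis: the output only rises once both inputs are set, and it only falls again once both have cleared.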

Testing

I made a test script to help make sure the generics work correctly (and get practice for testing later modules). Part of a test for TH22 is below:

Capture

See scripts/tests/test_threshold_gate.tcl on GitHub for my tests.

I have tested it up to TH77; beyond that it takes too long to run (the runtime grows as 2^n). My test file is designed to test every possible transition of input values, in both the output set and clear states.

 

Thanks for reading. If you have any thoughts on how to improve the design, let me know by message or in the comments.

Commit: 5b852e5

The Logo

I want to explain the logo a bit:

Logo

It’s three relevant logic gates in one. The red portion is a Threshold Gate:

cropped-global-diagrams.png

Threshold gates are the building blocks of NCL. The next gate is the blue one; you may recognize it as an AND gate. Some models of asynchronous logic use something called a C-Element, which is drawn like an AND gate but with a ‘C’ in the middle (not shown in the logo). A two-input C-Element is functionally identical to a TH22 gate.

AND_C

The white portion in the middle was not actually something I planned, but I was quite pleased when I noticed the shape. It is the shape of an NCL Threshold gate without hysteresis. These gates aren’t (to my knowledge) used directly very often in designs, but they are the blocks that Threshold Gates with hysteresis are built on.

This is done by feeding the output back to the input m-1 times on an (n+m-1)-input non-hysteresis gate with the same m (see this post).
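The feedback construction can be checked exhaustively for small gates. The Python sketch below uses made-up function names and evaluates a single settled step, with the previous output standing in for the fed-back wire.

```python
from itertools import product

def plain_threshold(m, inputs):
    """Gate without hysteresis: 1 exactly when at least m inputs are 1."""
    return int(sum(inputs) >= m)

def with_feedback(m, inputs, prev_out):
    """(n + m - 1)-input plain gate with the output fed back m - 1 times."""
    return plain_threshold(m, list(inputs) + [prev_out] * (m - 1))

def hysteresis(m, inputs, prev_out):
    """Reference behavior: set at >= m ones, clear at all zeros, otherwise hold."""
    if sum(inputs) >= m:
        return 1
    if sum(inputs) == 0:
        return 0
    return prev_out

def equivalent(m, n):
    return all(with_feedback(m, ins, prev) == hysteresis(m, ins, prev)
               for ins in product((0, 1), repeat=n) for prev in (0, 1))

print(all(equivalent(m, n) for n in range(1, 5) for m in range(1, n + 1)))  # True
```

Once the output is 1, the m - 1 feedback inputs keep the count at or above m as long as any real input is still set, which is exactly the hold behavior.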

Threshold_without_Hysteresis

And of course I tried to make it all look sorta cool. Though I admit, it might be too complex for the 16×16 browser tab icon.

If you want, I can provide the GIMP project on GitHub. If you have suggestions for improvements, let me know.

Handshaking & Pipelining

Generally speaking (in this context), handshaking is the process of electronic systems agreeing on what to do next. In NCL, this takes the form of modules requesting a NULL or DATA signal.

  • The request for NULL indicates that the operation has been performed, and the results saved for the next module to use.
  • The request for DATA indicates that the module has reset, passed the reset signal to the next module, and is ready for more work.

If you’re having trouble, think of it like two people doing laundry: one is operating the washing machine, the other is operating the dryer. Additionally, assume that the operators can’t be sure of how long a load will take in either stage. It’s not a perfect analogy, but it might help. Initially, both are empty:

  1. In the empty state, the washing machine operator knows he or she needs clothes, so he or she finds some and puts them in the wash. The dryer is also empty, but it needs to get its clothes from the washer, so it has to wait.
  2. When the washer is finished, the operator takes out the clothes and puts them between the washer and dryer, for the dryer’s operator to use when ready. Since the washer is now ‘reset’, it is ready for clothes again, and its operator loads it.
  3. The dryer operator sees the clothes are ready to be dried and puts them in the dryer. Eventually they finish and leave the system.

Now, imagine that after a few loads, the dryer can’t keep up, and the washing machine finishes a load while the dryer is still running. The load goes into the space between the washer and dryer. Since the washer can’t know how close the dryer is to completion, it has to wait for the dryer to finish and take the waiting load before starting another.