Analysis of CMOS Logic and Transmission Gate for 64 Bit Parallel Prefix Adders

ABSTRACT


INTRODUCTION
Simple overlaying of all output evaluation trees from the unoptimized prefix algorithm leads to the tree-prefix algorithm proposed in [1]. This leads to a high fan-out at some black nodes (O(n), unbounded fan-out) but results in the smallest possible number of node delays (minimal depth), a small number of signals and very few wiring tracks (O(log n)).
The Kogge-Stone adder is a parallel prefix form of the carry look-ahead adder. It generates the carry signals in O(log n) time and is widely considered the fastest adder design possible. It is the common design for high-performance adders in industry [2][3][4]. This speed is achieved by using a large number of independent tree structures in parallel. The generate signal indicates that the outgoing carry is 1, independent of the incoming carry, while the propagate signal indicates that the outgoing carry is equal to the incoming carry.
The simpler Brent-Kung adder was proposed to address the disadvantages of the Kogge-Stone adder [5], [6]. It has only 2N-2-log2N carry merge blocks, so the cost and wiring complexity are greatly reduced. However, the logic depth of the Brent-Kung adder increases to 2log2N-1, so it is slower.
Han and Carlson proposed an algorithm that combines the advantages of the Brent-Kung and Kogge-Stone algorithms by mixing them [7], [8]. The first and last levels are of the Brent-Kung type, while the Kogge-Stone graph is used in the middle. The number of parallel trees, and thus the number of black nodes and interconnections, is reduced at the cost of a slightly longer critical path compared to the Kogge-Stone adder [9], [10].
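The cost and depth trade-off between the two classical structures can be tabulated with a short sketch. The Brent-Kung figures use the expressions quoted above; the Kogge-Stone cell count N log2N - N + 1 and depth log2N are the usual textbook values and are an assumption here, not taken from this paper. The function names are illustrative only.

```python
def log2_exact(n: int) -> int:
    """Return log2(n) for a power-of-two operand width n."""
    if n < 2 or n & (n - 1):
        raise ValueError("width must be a power of two >= 2")
    return n.bit_length() - 1


def brent_kung_cost(n: int) -> tuple[int, int]:
    """(carry merge blocks, logic depth) using the expressions quoted in the text."""
    k = log2_exact(n)
    return 2 * n - 2 - k, 2 * k - 1


def kogge_stone_cost(n: int) -> tuple[int, int]:
    """(prefix cells, logic depth); textbook values, assumed rather than from the paper."""
    k = log2_exact(n)
    return n * k - n + 1, k


if __name__ == "__main__":
    for width in (8, 16, 32, 64):
        print(width, brent_kung_cost(width), kogge_stone_cost(width))
```

For a 64-bit adder these expressions give 120 blocks at depth 11 for Brent-Kung and 321 cells at depth 6 for Kogge-Stone.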

New Prefix Cell Operators
In this PPA, the dot operator '•' and the semi-dot operator are introduced. The dot operator is defined by equation (1) and the semi-dot operator by equation (2).
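In the standard parallel-prefix formulation, the two operators can be written as follows. This is a sketch under that convention; the symbol $\circ$ for the semi-dot operator and the use of $+$ and $\cdot$ for OR and AND are assumptions made here.

```latex
\begin{align*}
(P_i, G_i) \bullet (P_{i-1}, G_{i-1}) &= \left(P_i \cdot P_{i-1},\; G_i + P_i \cdot G_{i-1}\right) \\
(P_i, G_i) \circ (P_{i-1}, G_{i-1}) &= G_i + P_i \cdot G_{i-1}
\end{align*}
```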
In the above equation, the '•' operator is applied to two pairs of bits, (Pi, Gi) and (Pi-1, Gi-1). These bits represent the propagate and generate signals used in addition. The output of the operator is a new pair of bits, which is again combined with another pair of bits using a dot or semi-dot operator. This repeated application of the dot and semi-dot operators creates a prefix tree network that ultimately generates all the carry signals.
In the final step, the sum bits of the adder are generated by XORing the propagate signal of each operand bit position with the carry from the preceding stage. The semi-dot operator appears as the last computation node in each column of the prefix graph, where only the generate term needs to be computed; its value is the carry passed from that bit to the succeeding bit.
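The three steps described above (per-bit signals, prefix combination, XOR for the sum) can be illustrated with a minimal Python sketch. It assumes the usual definitions propagate = a XOR b and generate = a AND b, and it chains the operators serially rather than in a tree; because the dot operator is associative, any tree arrangement yields the same carries. The names `dot`, `semi_dot` and `prefix_add` are hypothetical.

```python
def dot(hi, lo):
    """Dot operator on (P, G) pairs: returns the group (P, G) pair."""
    p_hi, g_hi = hi
    p_lo, g_lo = lo
    return (p_hi & p_lo, g_hi | (p_hi & g_lo))


def semi_dot(hi, lo):
    """Semi-dot operator: returns only the group generate term (the carry)."""
    p_hi, g_hi = hi
    _, g_lo = lo
    return g_hi | (p_hi & g_lo)


def prefix_add(a, b, width=8):
    """Add two unsigned integers of the given width using the prefix operators."""
    # Per-bit propagate and generate signals (assumed: XOR and AND).
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]

    # carries[i] is the carry INTO bit i; there is no external carry-in.
    carries = [0, g[0]]
    group = (p[0], g[0])
    for i in range(1, width):
        # The last node of each column only needs the generate term.
        carries.append(semi_dot((p[i], g[i]), group))
        group = dot((p[i], g[i]), group)

    # Sum bit = propagate XOR incoming carry.
    sum_bits = [p[i] ^ carries[i] for i in range(width)]
    total = sum(bit << i for i, bit in enumerate(sum_bits))
    return total | (carries[width] << width)
```

Calling `prefix_add(200, 100)` exercises a carry out of the top bit of an 8-bit operand pair.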

Four Operators Cells
In the first stage, the generate and propagate signals are produced by their respective gates (XOR gates for the propagate signal). To derive the carry signals in the second stage, this architecture introduces four different computation nodes for improved performance.

8 Bit Parallel Prefix Adder
The 8-bit parallel prefix adder, which has four stages for generating carries in the middle PPA network, is shown in Figure 1. This 8-bit PPA contains three odd-dot cells, one even-dot cell, three odd-semi-dot cells and two even-semi-dot cells, along with three inverter pairs.

16 Bit Parallel Prefix Adder
The 16-bit parallel prefix adder is shown in Figure 2. The generate and propagate signals are produced using equations (3) and (4). The 16-bit parallel prefix adder has five stages for generating carries in the middle PPA network. This 16-bit PPA contains seven odd-dot cells, four even-dot cells, nine odd-semi-dot cells and six even-semi-dot cells, along with seven inverter pairs. The second stage in the prefix addition is termed prefix computation; this stage is responsible for the creation of the group generate and group propagate signals.
Figure 3 shows the architecture of the proposed 32-bit parallel prefix adder. The objective is to eliminate the massive overlap between the prefix sub-terms being computed. Hence the associative property of the dot operator is employed to keep the number of computation nodes to a minimum. The first stage of the computation is called pre-processing. In the 32-bit prefix adder, it creates the generate and propagate signals for the individual operand bits in active-low format. Equations (3) and (4) represent the functionality of this first stage.
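A minimal sketch of the pre-processing stage in active-low form is given below. It assumes that the true-form signals are generate = a AND b and propagate = a XOR b, so that the active-low versions reduce to NAND and XNOR; this is an assumption about the form of equations (3) and (4), and the function name is hypothetical.

```python
def preprocess_active_low(a, b, width=32):
    """Return two lists (n_g, n_p) of complemented generate/propagate bits.

    Index i of each list corresponds to operand bit position i.
    Assumed true-form definitions: G = a AND b, P = a XOR b.
    """
    n_g, n_p = [], []
    for i in range(width):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        n_g.append(1 - (a_i & b_i))  # NAND: complemented generate
        n_p.append(1 - (a_i ^ b_i))  # XNOR: complemented propagate
    return n_g, n_p
```

Producing the complemented signals directly costs a single inverting gate per signal, which fits the inverting nature of CMOS gates discussed for the prefix stages.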

32-Bit Parallel Prefix Adders
In equations (3) and (4), ai and bi represent the input operand bits of the adder, where i varies from 0 to 31. The second stage in the prefix addition is termed prefix computation. This stage is responsible for the creation of the group generate and group propagate signals. The stages with odd indexes use odd-dot and odd-semi-dot cells, whereas the stages with even indexes use even-dot and even-semi-dot cells.
Figure 4 shows the architecture of the 64-bit parallel prefix adder. The proposed 64-bit parallel prefix adder has ten stages of implementation. The CMOS logic family implements only inverting functions. Cascading odd cells and even cells alternately therefore eliminates two inverters between them, provided that a dot or semi-dot computation node in an odd stage receives both of its input edges from even stages, and vice versa. However, two inverters must be introduced in a path if a dot or semi-dot computation node in an even stage receives any of its edges from an even stage, and vice versa. In the prefix graph of the proposed structure shown in Figure 4, we see that only a few edges carry a pair of inverters, which convert $(G, P)$ to $(\overline{G}, \overline{P})$ or $(\overline{G}, \overline{P})$ to $(G, P)$. Each such pair of inverters in a path is marked in the prefix graph. By introducing two cells for the dot operator and two cells for the semi-dot operator, we have eliminated a large number of inverters. Eliminating the inverters reduces the propagation delay in those paths. It also reduces power, since these inverters would otherwise contribute a significant amount of switching power dissipation. The output of an odd-semi-dot cell gives the value of the carry signal at the corresponding bit position. The output of an even-semi-dot cell gives the complemented value of the carry signal at the corresponding bit position.
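The polarity alternation can be checked with a small behavioural model. The sketch below assumes that odd cells take active-low inputs and produce true outputs, while even cells take true inputs and produce complemented outputs; this is inferred from the statements that the first stage is active-low and that odd-semi-dot cells deliver the true carry. The gate-level expressions follow from De Morgan's laws and are not taken from the paper's schematics.

```python
from itertools import product


def inv(x):
    """Single-bit inverter."""
    return 1 - x


def odd_dot(hi, lo):
    """Assumed odd cell: active-low (nG, nP) pairs in, true (G, P) pair out."""
    ng_hi, np_hi = hi
    ng_lo, np_lo = lo
    g = inv(ng_hi & (np_hi | ng_lo))  # one inverting OR-AND gate
    p = inv(np_hi | np_lo)            # NOR
    return g, p


def even_dot(hi, lo):
    """Assumed even cell: true (G, P) pairs in, active-low (nG, nP) pair out."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    ng = inv(g_hi | (p_hi & g_lo))    # one inverting AND-OR gate
    np_ = inv(p_hi & p_lo)            # NAND
    return ng, np_


def odd_semi_dot(hi, lo):
    """Generate-only version of the odd cell (true carry out)."""
    return odd_dot(hi, lo)[0]


def even_semi_dot(hi, lo):
    """Generate-only version of the even cell (complemented carry out)."""
    return even_dot(hi, lo)[0]


def cells_agree_with_reference():
    """Exhaustively compare all four cells with the true-form dot operator."""
    for g_hi, p_hi, g_lo, p_lo in product((0, 1), repeat=4):
        g_ref = g_hi | (p_hi & g_lo)
        p_ref = p_hi & p_lo
        low_hi = (inv(g_hi), inv(p_hi))
        low_lo = (inv(g_lo), inv(p_lo))
        if odd_dot(low_hi, low_lo) != (g_ref, p_ref):
            return False
        if even_dot((g_hi, p_hi), (g_lo, p_lo)) != (inv(g_ref), inv(p_ref)):
            return False
        if odd_semi_dot(low_hi, low_lo) != g_ref:
            return False
        if even_semi_dot((g_hi, p_hi), (g_lo, p_lo)) != inv(g_ref):
            return False
    return True
```

Under these assumptions each cell is a single inverting stage, so an odd cell can drive an even cell (and the reverse) with no inverter in between, which is the saving described above.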
The final stage in the prefix addition is termed post-processing. It generates the sum bits from the active-low propagate signals of the individual operand bits and the carry bits, which are available in true or complemented form. The first and last stages are intrinsically fast because they involve only simple operations on signals local to each bit position. The intermediate stage embodies the long-distance propagation of carries, so the performance of the adder depends on the intermediate stage.
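The post-processing step can be sketched as a single function that accepts the carry in either polarity. It assumes the sum bit is the true propagate XOR the true incoming carry, so an active-low propagate needs an XNOR with a true carry and a plain XOR with a complemented carry. The function and parameter names are hypothetical.

```python
def sum_bit(n_p, carry, carry_is_complemented):
    """Sum bit from an active-low propagate bit and the incoming carry.

    n_p: complemented propagate bit of this position (0 or 1).
    carry: incoming carry bit, in true or complemented form.
    carry_is_complemented: True if `carry` is the complemented carry.
    """
    if carry_is_complemented:
        # Both inputs are inverted, so the inversions cancel: plain XOR.
        return n_p ^ carry
    # Only the propagate is inverted: XNOR restores the true sum.
    return 1 - (n_p ^ carry)
```

In both cases a single two-input gate per bit suffices, which is consistent with the post-processing stage being fast and local to each bit position.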

8-Bit Parallel Prefix Adder
Schematic and simulation results for the 8-bit parallel prefix adder are shown in Figure 5 and Figure 6. The inputs are applied to T-Spice as bit patterns. The schematic is drawn in S-Edit, the schematic capture tool, using library model files. T-Spice is used to generate the netlist from the S-Edit schematic.

16-Bit Parallel Prefix Adders
Schematic and simulation results for the 16-bit parallel prefix adder are shown in Figure 7 and Figure 8. The inputs are applied to T-Spice as bit patterns. The schematic is drawn in S-Edit, the schematic capture tool, using library model files. T-Spice is used to generate the netlist from the S-Edit schematic.

32-bit Parallel Prefix Adders
Schematic and simulation results for the 32-bit parallel prefix adder are shown in Figure 9 and Figures 10 and 11. The inputs are applied to T-Spice as bit patterns. The schematic is drawn in S-Edit, the schematic capture tool, using library model files. T-Spice is used to generate the netlist from the S-Edit schematic.

Proposed 64-Bit Parallel Prefix Adder
Schematic and simulation results for the 64-bit parallel prefix adder are shown in Figure 12 and Figures 13, 14 and 15. The inputs are applied to T-Spice as bit patterns. The schematic is drawn in S-Edit, the schematic capture tool, using library model files. T-Spice is used to generate the netlist from the S-Edit schematic.

Results & Discussions
The power comparison shows that the power consumed by the adder increases with the number of input bits, as shown in Tables 1 and 2. The transistor counts for the different adder sizes are shown in Tables 3 and 4.

CONCLUSION
We have compared the proposed low-power 64-bit parallel prefix adder with existing parallel prefix adders, namely the 8-bit, 16-bit and 32-bit PPAs. The proposed low-power 64-bit parallel prefix adder was designed using the four different prefix cell operators. The trade-offs in power, number of nodes and number of transistors for the various input widths were analyzed and discussed with the results. For the 64-bit low-power parallel prefix adder, seventeen stages were used to generate the sixty-three carry outputs, and 1339 nodes were formed. This makes the proposed low-power adder well suited to complex digital systems.