FPGA Implementation of High Speed Hardware Efficient Carry Select Adder

This paper presents a novel architecture for high speed, hardware efficient carry select addition. We modify the two-operand ripple carry addition used in the conventional Carry SeLect Adder (CSLA) with a simple and efficient gate level circuit to reduce area and delay significantly. Specifically, we use an increment 1 block to generate the sum outputs for carry input 1, instead of the second ripple carry adder of each pair in the conventional CSLA. The novelty of the proposed approach is that it reduces both the area and the carry propagation delay of the second adder in each pair. The proposed CSLA has been designed using structural VHDL code and synthesized using Altera Quartus II. Experimental results show that the proposed design outperforms the previous approaches in terms of delay and area.


INTRODUCTION AND RELATED WORK
Addition is the basic arithmetic operation most frequently encountered in the design of digital processors, and the design of high speed, area efficient adders is a significant area of research in VLSI data path systems. A large number of adders have been designed to meet the requirements of different image and signal processing applications. Adders lie in the critical path of a processing architecture, and the critical path determines the overall performance of the system. A variety of applications require arithmetic operations such as incrementing the sum of two numbers by unity, finding the absolute difference between two numbers, or augmenting the sum of two numbers by a constant. One approach to performing these operations is to utilize dual adders or multi-operand adders such as the Carry-Save Adder (CSA), Carry SKip Adder (CSKA), Carry Propagate Adder (CPA) and CSLA. Multi-operand addition also forms a significant part of multiplication and certain DSP algorithms. Adder performance in multi bit addition can be improved by reducing the delay due to carry propagation between adder cells, which can be addressed by improving the structure of the basic adder block. A number of architectures and algorithms have been proposed to improve the efficiency of multi bit addition.
Kantabutra (1991) [6] proposed a novel approach for the design of an optimum-speed one-level CSKA. Min Cha and Swartzlander (2000) [7] proposed a modified version of carry skip logic in which carry look ahead logic is used to reduce the delay in the first block. However, the design is prone to high hardware complexity due to the use of a carry look ahead adder in the first stage. Youngjoon Kim and Lee-Sup Kim (2001) [10] proposed a CSLA using a Ripple Carry Adder (RCA) and an add-one circuit instead of dual RCAs. The add-one circuit uses a first-zero finding circuit and multiplexers to reduce area and power with no speed penalty. For bit length n=64, this CSLA requires 38 percent fewer transistors than the dual ripple carry CSLA. The work in [9] proposed a novel approach for the design of a 4-to-2 CSA using dynamic logic and the Limited Switch Dynamic Logic (LSDL) circuit. Yu Shen Lin and Radhakrishnan (2006) [8] proposed a novel design for 32 bit CSK addition, in which the Generate and Propagate logic of carry look ahead addition is used to reduce the delay in the RCA. The number of bits in the skip blocks is chosen by taking the critical path into account. The CSK adder in [8] is implemented using a 25 ηm CMOS technology file. The experimental results demonstrated a critical path delay reduction of 18% compared to the best of the previous approaches. A novel technique for multi bit addition using flag bit generation was proposed by Vibhuti Dave et al. (2010).
Yu Pang et al. (2012) [4] proposed a novel carry skip design for reducing the delay in addition due to carry propagation. The design also achieved a reduction in energy dissipation through the use of reversible logic. A 4 bit CSK adder was designed using 4*4 reversible TSG and Fredkin gates by Chiwande and Dakhole (2012) [5]. The adder in [5] demonstrated lower power dissipation than existing four bit CSKAs.
CSLA is preferred in digital signal processors and application specific ICs designed to execute dedicated algorithms such as convolution, correlation and filtering, because it alleviates the problem of carry propagation delay in addition [11]. However, the hardware complexity of CSLA is high due to the use of a pair of RCAs to generate the partial sums and carries corresponding to carry inputs 0 and 1. The final sum and carry are then selected from the partial results using multiplexers (mux) [12]. To alleviate this high hardware complexity, Ramkumar and Kittur (2012) [3] proposed a CSLA that uses a Binary to Excess-1 Converter (BEC) instead of an RCA for the second stage addition. The adder in [3] achieved lower area and reduced power dissipation due to minimal switching. To further reduce area and latency in CSLA addition, we propose an increment 1 block for the second stage addition. The novelty of the increment 1 block is that it reduces area and delay significantly.
The rest of the paper is organized as follows. Section 2 gives an overview of carry select addition. Section 3 discusses the design of the proposed hardware efficient CSLA. In Section 4, the performance of the proposed design is discussed and compared with the previous approaches. Section 5 gives a brief conclusion of the work done.

OVERVIEW OF CARRY SELECT ADDITION
Based on the carry select addition approach, for the case of 16 bit inputs, the input bits (A 15-0 and B 15-0) can be grouped into four pairs of 4 bit groups (A 15-12, A 11-8, A 7-4, A 3-0 and B 15-12, B 11-8, B 7-4, B 3-0). The corresponding groups, from least significant to most significant, can be added in parallel using separate 4 bit RCAs, with the initial carry input being 0 for the first stage and 1 for the second stage. The sum outputs from the pair of adder cells at each bit position are passed through a 2:1 mux, with the select signal for the mux being the carry out of the most significant adder in the previous group. Thus, the total delay for a 16 bit addition in CSLA is 4 carry and 4 mux delays, at the expense of a small increase in hardware. In contrast, a conventional 16 bit RCA requires 16 carry propagation delays to realise the final sum. The schematic of the basic CSLA for n=16 is shown in Figure 1. The various blocks of the proposed hardware efficient CSLA are the first stage RCA, the Increment 1 block and the 2:1 mux.
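The conventional carry select scheme described above can be illustrated with a small bit level model. This is only a behavioural sketch in Python, not the structural VHDL used in this work; the function names (full_adder, rca, conventional_csla) are hypothetical and chosen for illustration.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    p = a ^ b
    return p ^ cin, (a & b) | (p & cin)


def rca(a_bits, b_bits, cin):
    """Ripple carry adder over LSB-first bit lists: returns (sum_bits, carry_out)."""
    sum_bits, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(value, width):
    """Split an integer into a list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Reassemble an integer from LSB-first bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def conventional_csla(a, b, cin=0, width=16, group=4):
    """Carry select addition: two RCAs per group (cin=0 and cin=1) plus a 2:1 mux."""
    a_bits, b_bits = to_bits(a, width), to_bits(b, width)
    result, carry = [], cin
    for lo in range(0, width, group):
        ag, bg = a_bits[lo:lo + group], b_bits[lo:lo + group]
        sum0, cout0 = rca(ag, bg, 0)   # first stage: carry input 0
        sum1, cout1 = rca(ag, bg, 1)   # second stage: carry input 1
        # 2:1 mux: the carry out of the previous group selects the result.
        result += sum1 if carry else sum0
        carry = cout1 if carry else cout0
    return from_bits(result), carry
```

For example, conventional_csla(0xFFFF, 0x0001) returns (0, 1): every group selects its carry-input-1 result once the carry from the least significant group arrives.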

PROPOSED HARDWARE EFFICIENT CSLA
The proposed high speed hardware efficient CSLA uses a simple gate level increment 1 block for incrementing the outputs of the first stage adders, as shown in Figure 2. For 16 bit inputs A 15-0 and B 15-0, we divide the bits into the groups A 15-12, A 11-8, A 7-4, A 3-0 and B 15-12, B 11-8, B 7-4, B 3-0, which are fed to the first stage adders in parallel. The sum outputs S" (S" 15 S" 14 S" 13 S" 12), (S" 11 S" 10 S" 9 S" 8), (S" 7 S" 6 S" 5 S" 4) and (S" 3 S" 2 S" 1 S" 0) of the first stage adders are fed to the Increment 1 blocks to produce the equivalent sum outputs S (S 15 S 14 S 13 S 12), (S 11 S 10 S 9 S 8), (S 7 S 6 S 5 S 4) and (S 3 S 2 S 1 S 0) and a carry output C out, corresponding to carry input 1. The first stage RCA output (S") and the Increment 1 block output (S) are fed to the multiplexer, with the control signal for the multiplexer being the carry output of the previous group of adder cells. The schematic of the increment 1 block is shown in Figure 3. The logic that defines the outputs of the Increment 1 block is given in Equation (1) to Equation (5). We use "i" to represent the position of the adder block, with i=0 being the least significant adder block. (Note that the symbol Θ represents XOR logic, ~ represents NOT logic and ^ represents AND logic.)
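The data flow described above (first stage RCA, Increment 1 block, 2:1 mux) can be sketched behaviourally as follows. The add-one logic used here is the textbook incrementer (invert bit 0, then XOR each higher bit with the AND of all lower bits); it is an assumption made for illustration and is not a transcription of Equations (1) to (5) or of the gate network in Figure 3. The names rca4, increment1 and proposed_csla16 are hypothetical.

```python
def rca4(a_bits, b_bits):
    """First stage 4 bit ripple carry adder with carry input 0 (LSB-first bits)."""
    sum_bits, carry = [], 0
    for a, b in zip(a_bits, b_bits):
        p = a ^ b
        sum_bits.append(p ^ carry)
        carry = (a & b) | (p & carry)
    return sum_bits, carry


def increment1(s, c0):
    """Assumed add-one logic: maps the carry-input-0 result to the carry-input-1 result.

    s  : first stage sum bits S" of one group, LSB first
    c0 : first stage carry out of that group
    """
    s0, s1, s2, s3 = s
    inc = [
        s0 ^ 1,                 # ~S"0
        s1 ^ s0,                # S"1 XOR S"0
        s2 ^ (s0 & s1),         # S"2 XOR (S"0 AND S"1)
        s3 ^ (s0 & s1 & s2),    # S"3 XOR (S"0 AND S"1 AND S"2)
    ]
    # Adding one overflows the group only when all four sum bits are 1.
    cout = c0 | (s0 & s1 & s2 & s3)
    return inc, cout


def proposed_csla16(a, b, cin=0):
    """16 bit carry select addition with one RCA plus one increment block per group."""
    result, carry = 0, cin
    for g in range(4):
        a_bits = [(a >> (4 * g + k)) & 1 for k in range(4)]
        b_bits = [(b >> (4 * g + k)) & 1 for k in range(4)]
        s_first, c0 = rca4(a_bits, b_bits)      # result for carry input 0
        s_inc, c1 = increment1(s_first, c0)     # result for carry input 1
        chosen = s_inc if carry else s_first    # 2:1 mux on the previous carry
        carry = c1 if carry else c0
        for k, bit in enumerate(chosen):
            result |= bit << (4 * g + k)
    return result, carry
```

The point of the sketch is that the carry-input-1 result is derived from the first stage sum by a few gates per bit, so no second carry chain has to ripple through the group.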

AREA AND DELAY EVALUATION METHODOLOGY OF PROPOSED CSLA
To evaluate the area and delay of the proposed hardware efficient CSLA, we used the BEC-CSLA [3] and conventional [12] designs for comparison. The area and delay of the proposed and conventional designs are evaluated based on the NAND equivalent implementation of the basic elements that make up each design. The NAND implementations of the basic elements, viz., Full Adder (FA), NOT, AND and OR gates, used in the proposed and conventional CSLA designs are shown in Figure 4. To calculate the area count and critical delay of the proposed CSLA and the previous CSLA designs, we assume that each NAND gate has a delay of 1 unit and an area of 1 count. Gates drawn in parallel between dotted lines operate in parallel, so they contribute only one gate delay to the worst case delay of the circuit or element. Based on this approach, the worst case delay is found by counting the number of NAND gates in the critical path, and the area is evaluated by counting the total number of NAND gates that make up the circuit. The area and delay values of the proposed CSLA and the previous approaches are shown in Table 1. From Table 1, it is seen that the proposed CSLA design demonstrates a significant reduction in area and delay.
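The unit-area, unit-delay counting described above can be sketched as a small netlist walk: area is the number of NAND gates, and delay is the largest number of NAND gates on any input-to-output path. The 9-NAND full adder below is a commonly used textbook decomposition, assumed here for illustration; the netlists in Figure 4 may differ, so the numbers it yields should not be read as entries of Table 1. The names FULL_ADDER, area_and_delay and simulate are hypothetical.

```python
def nand(x, y):
    """Two-input NAND on 0/1 values."""
    return 1 - (x & y)


# Gate name -> (input, input). Names that are not keys are primary inputs.
# Gates are listed so that every gate appears after the gates that drive it.
FULL_ADDER = {
    "n1": ("a", "b"),
    "n2": ("a", "n1"),
    "n3": ("b", "n1"),
    "x": ("n2", "n3"),      # x = a XOR b
    "m1": ("x", "cin"),
    "m2": ("x", "m1"),
    "m3": ("cin", "m1"),
    "sum": ("m2", "m3"),    # sum = x XOR cin
    "cout": ("n1", "m1"),   # cout = (a AND b) OR (x AND cin)
}


def area_and_delay(netlist, outputs):
    """Area = NAND count; delay = NAND gates on the longest path to any output."""
    level = {}
    for gate, inputs in netlist.items():
        # Primary inputs sit at level 0; gates on the same level act in parallel.
        level[gate] = 1 + max(level.get(sig, 0) for sig in inputs)
    return len(netlist), max(level[out] for out in outputs)


def simulate(netlist, inputs):
    """Evaluate the netlist for one assignment of the primary inputs."""
    values = dict(inputs)
    for gate, (x, y) in netlist.items():
        values[gate] = nand(values[x], values[y])
    return values


print(area_and_delay(FULL_ADDER, ["sum", "cout"]))  # (9, 6) for this netlist
```

Under these assumptions the full adder costs 9 area counts, with 6 delay units to the sum output and 5 to the carry output.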

RESULTS AND DISCUSSION
The proposed hardware efficient CSLA and the designs used for comparison are described in structural VHDL to produce a gate level netlist and synthesized using Altera Quartus II for the EP2C35F672C6 device. The area, delay and total power dissipation results of the proposed CSLA and the previous approaches are shown in Table 2. The results show that the proposed CSLA design has a lower logic cell count than all other architectures used for comparison, thanks to the increment 1 block, which realizes the constant (0001) addition with fewer gates. The delay of the proposed increment 1 CSLA design is also lower than that of all other adder designs used for comparison. This is due to the elimination of the carry propagation delay in the second adder of each pair, which operates with carry input 1. However, the proposed CSLA exhibits slightly higher total power dissipation than the conventional design. Owing to its better delay performance, the proposed CSLA nevertheless achieves the best Area-Delay Product (ADP) and Power-Delay Product (PDP) among the compared approaches.
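For reference, the figures of merit used above can be written out as follows, assuming the usual definitions, with A the logic cell count, D the delay and P the total power dissipation. The subscripts "ref" and "prop" are notation introduced here for the reference and proposed designs.

```latex
\[
\mathrm{ADP} = A \cdot D, \qquad
\mathrm{PDP} = P \cdot D, \qquad
\text{reduction}(X) = \frac{X_{\mathrm{ref}} - X_{\mathrm{prop}}}{X_{\mathrm{ref}}} \times 100\%
\]
\[
\mathrm{PDP}_{\mathrm{prop}} < \mathrm{PDP}_{\mathrm{ref}}
\iff
\frac{D_{\mathrm{prop}}}{D_{\mathrm{ref}}} < \frac{P_{\mathrm{ref}}}{P_{\mathrm{prop}}}
\]
```

The second relation shows why a design with slightly higher power can still have a lower PDP: the relative delay reduction only has to exceed the relative power increase.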

CONCLUSION
A novel approach for the design of a carry select adder that reduces area and delay significantly is proposed in this paper. Extensive comparison using synthesis results shows that the proposed CSLA outperforms all the previous designs in terms of delay and area. The reduced logic cell count and delay of the proposed carry select adder yield ADP and PDP reductions of 23.7% and 7.4% respectively compared to the conventional design, and of 5.6% and 1.2% compared to the BEC-CSLA. The proposed CSLA is thus hardware and energy efficient, and suitable for portable VLSI implementation.