Space-time trellis codes: Field programmable gate array approach

This paper presents an implementation of 4-state Space-Time Trellis Codes (STTC) on an FPGA. Reaching the very high data rates offered by STTC in real-time applications would require many expensive high-speed Digital Signal Processors (DSPs), which may not be affordable. This has motivated the design of dedicated hardware implementations on Field Programmable Gate Arrays (FPGAs), which offer low cost and low power consumption. The target device is the XC3S400 from the Xilinx Spartan-3 family in the PQ208 package, on which the STTC encoder and decoder use at most 10% and 22% of the available device capacity, respectively. The design has been simulated and synthesized successfully in the Xilinx Integrated Software Environment.


INTRODUCTION
Space-Time Trellis Coding (STTC) is a channel coding technique that can be used to improve the performance of wireless communication systems over fading channels. STTC combines space and time diversity to combat multipath fading. Several researchers have undertaken the construction of space-time trellis codes [1]. The rank and determinant criterion (RDC) and the Euclidean distance criterion (EDC) have been developed as design criteria. Space-time (ST) coding can make good use of transmit diversity to alleviate the effect of fading and achieve a high data transmission rate without requiring extra bandwidth or power. The first ST code with a normalized rate of 1 symbol per channel use (pcu) was proposed by Alamouti for two transmit antennas and two time periods [2]. In [3], Tarokh gave design criteria that trade off constellation size, data rate, complexity, and diversity advantage for ST codes, and showed that space-time trellis codes can combat the effect of fading while simultaneously offering substantial coding gain, spectral efficiency, and diversity improvement on flat fading channels [4][5][6][7][8][9][10].

HARDWARE IMPLEMENTATION
The STTC encoder is simple to implement in hardware for any number of states, whereas the hardware complexity of the decoder grows with the number of states. The Viterbi algorithm, which performs maximum-likelihood decoding, is attractive for efficient VLSI hardware implementation.

STTC encoder
Space-time trellis codes (STTCs) combine modulation and trellis coding to transmit information over multiple transmit antennas [1]. To achieve full diversity, an STTC should satisfy the rank criterion, whereas the determinant criterion governs the coding gain. Assume a constellation that maps c bits of data to a symbol. The delay diversity scheme uses every c input bits to pick one symbol, which is transmitted from the second antenna. The first antenna then transmits the same symbols with a delay of one symbol period. Since the symbols go through two independent channel path gains, a diversity of two is achieved for one or more receive antennas.
The block diagram of the STTC encoder is shown in Figure 1. In our implementation, we considered a 4-state space-time trellis coded QPSK scheme with two transmit antennas. The encoder consists of a bit splitter, two feedforward shift registers, four multipliers, and a modulo-4 adder. The bit splitter splits the input sequence, c, into two parallel bit streams, which are delayed by a one-bit delay circuit. The bits,
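The delay diversity scheme described above can be illustrated with a minimal sketch. The function name and the `idle` symbol sent by the first antenna in the opening slot are assumptions made for illustration; the text does not specify them.

```python
def delay_diversity(symbols, idle=0):
    """Return (antenna1, antenna2) symbol streams for delay diversity.

    Antenna 2 sends each symbol as soon as it is chosen. Antenna 1 sends
    the same stream delayed by one symbol period. `idle` (an assumption)
    is what antenna 1 sends in the first slot, before any symbol exists.
    """
    antenna2 = list(symbols)
    antenna1 = [idle] + antenna2[:-1]
    return antenna1, antenna2


# Four QPSK symbol indices, each standing for c = 2 input bits.
ant1, ant2 = delay_diversity([2, 1, 3, 0])
print(ant1)  # [0, 2, 1, 3]
print(ant2)  # [2, 1, 3, 0]
```

In this sketch the last symbol is sent from the second antenna only; a practical transmitter would append one extra slot to flush it through the first antenna as well.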

Delay generator
The circuit diagram of the delay generator is illustrated in Figure 3. The state selection is designed for up to 64 states, but we have implemented it only up to 32 states, and the worked example is given for the 4-state case only.

Coder
A coder consists of a multiplier, a state selector, and a modulo-4 adder. The internal architecture (RTL schematic) of the coder is shown in the appendix. When the encoder operates with 4, 8, 16, or 32 states, the state selector should select 010, 011, 100, or 101, respectively. The multiplier multiplies the encoder coefficient set with the input bits and the delayed bits, and the modulo-4 adder then sums all the multiplier outputs, since the encoded symbols 0, 1, 2, and 3 drive the QPSK modulator. The adder outputs $x_t^1$ and $x_t^2$ are points from a QPSK constellation which are transmitted simultaneously through the first and second antenna, respectively.
Step 2: The input sequence, c, is split into two streams of bits, say
Step 6: For the 2nd time, 3rd time, …, repeat steps 2 to 5.
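The coder operation (multiply the input and delayed bits by the coefficient set, then add modulo 4) can be sketched in software as follows. The coefficient table `G` is an assumption: the text does not list the coefficients, so the sketch uses the commonly cited 4-state QPSK code for two transmit antennas, and it also assumes that the first bit of each pair goes to stream c1.

```python
# Assumed generator coefficients (not stated in the text).
# G[k][j] = (coefficient for antenna 1, coefficient for antenna 2),
# applied to bit stream k delayed by j symbol periods.
G = {
    1: [(0, 2), (2, 0)],  # stream c1 (assumed to be the first bit of a pair)
    2: [(0, 1), (1, 0)],  # stream c2 (assumed to be the second bit of a pair)
}


def sttc_encode(bits):
    """Encode a flat bit list into two QPSK symbol-index streams."""
    if len(bits) % 2:
        raise ValueError("expected an even number of input bits")
    c1_prev = c2_prev = 0  # shift registers cleared to zero
    x1, x2 = [], []
    for i in range(0, len(bits), 2):
        c1, c2 = bits[i], bits[i + 1]  # bit splitter
        taps = {1: (c1, c1_prev), 2: (c2, c2_prev)}
        s1 = s2 = 0
        for k, stream in taps.items():
            for j, bit in enumerate(stream):
                g1, g2 = G[k][j]
                s1 += g1 * bit  # multiplier outputs for antenna 1
                s2 += g2 * bit  # multiplier outputs for antenna 2
        x1.append(s1 % 4)  # modulo-4 adder
        x2.append(s2 % 4)
        c1_prev, c2_prev = c1, c2  # shift the delay elements
    return x1, x2


x1, x2 = sttc_encode([1, 0, 0, 1, 1, 1, 0, 0])
print(x1)  # [0, 2, 1, 3]
print(x2)  # [2, 1, 3, 0]
```

With these assumed coefficients, the second antenna carries the current symbol and the first antenna carries the previous one, which matches the delay diversity behaviour described earlier.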

Flow chart for STTC encoder
The flowchart of the STTC encoder is shown in Figure 4. As a first step, all the shift registers are cleared, i.e., reset to zero. The input sequence is then split into two streams of bits, i.e.,

STTC decoder
The implementation of the STTC decoder on the FPGA is shown in Figure 5. It consists of a code converter, a Squared Euclidean Distance (SED) generator, a feasible state selector, feasible branch selectors, a PISO converter, and a detector. The symbols received from the receive antennas are selected by the diversity selector and demodulated by the QPSK demodulator. The demodulated outputs are binary bits, which the code converter converts to decimal symbols. These symbols are fed to the SED generator, which generates the squared Euclidean distances for all states. Each state has 4 branches, so for the 4-state code the SED must be calculated for all sixteen state-branch combinations. The SEDs for all states are fed into the feasible state selector, which selects the required states, counts how many states are selected, and gives the corresponding Most Significant Digit (MSD) for each state. A state is required if it has branches whose SED is less than or equal to 2.
In the 4-state STTC, the feasible state selector yields at most three required states. These states are passed to the three feasible branch selectors, each of which selects the required branches of one state. A branch is required if its SED is less than or equal to 2. The feasible branch selector gives the corresponding Least Significant Digit (LSD) for each required branch and passes the SEDs of the required branches in a state to the detector. The detector consists of a reference symbol generator, a comparator, a subtractor, and a decoder. The reference symbols are initially set to zero. The SEDs for the reference symbols are calculated with their reference state (the MSD of the reference symbols) by the SED generator. The feasible branches are selected from the SED values for the reference symbols, and the feasible branch selector gives the SEDs and LSDs of these feasible branches.
The detector compares all the SEDs taken from the three feasible branch selectors for the states and for the reference symbols. It then selects one surviving (most likely) path, the feasible reference symbols, and the required output data. The output data is a decimal digit, which the code converter converts into binary. These parallel bits are then converted into serial data, which is the recovered information.
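As an illustration of the SED generator, the sketch below computes the sixteen state-branch distances for one received symbol pair. It rests on assumptions not stated in the text: QPSK symbol k lies on the unit circle at angle k times 90 degrees, the distances are taken between hard-decision symbol indices, and the trellis is that of a delay-diversity style 4-state code, in which the branch leaving state s with input u transmits the pair (s, u).

```python
# Assumed unit-energy QPSK points for symbol indices 0..3 (integer coordinates).
QPSK = [(1, 0), (0, 1), (-1, 0), (0, -1)]


def symbol_sed(a, b):
    """Squared Euclidean distance between QPSK symbols a and b."""
    (ar, ai), (br, bi) = QPSK[a], QPSK[b]
    return (ar - br) ** 2 + (ai - bi) ** 2


def sed_table(rx1, rx2):
    """Return table[s][u]: SED between the received pair (rx1, rx2) and
    the pair (s, u) assumed to label branch u of state s."""
    return [
        [symbol_sed(rx1, s) + symbol_sed(rx2, u) for u in range(4)]
        for s in range(4)
    ]


table = sed_table(2, 1)
for state, row in enumerate(table):
    print(state, row)
# 0 [6, 4, 6, 8]
# 1 [4, 2, 4, 6]
# 2 [2, 0, 2, 4]
# 3 [4, 2, 4, 6]
```

Under these assumptions every SED is an even integer between 0 and 8, so a threshold of 2 keeps only branches that differ from the received pair by at most one adjacent-symbol error.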
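The threshold rule used by the feasible state and feasible branch selectors can be sketched as below. The function names and the example SED table are hypothetical: the table holds illustrative values for one received symbol pair under an assumed unit-energy QPSK mapping and is not data from the paper.

```python
THRESHOLD = 2  # a branch is required when its SED is <= 2


def feasible_branches(sed_row):
    """Return (LSD, SED) for each branch of one state within the threshold."""
    return [(u, d) for u, d in enumerate(sed_row) if d <= THRESHOLD]


def feasible_states(table):
    """Map each required state (its MSD) to its list of required branches."""
    selected = {}
    for s, row in enumerate(table):
        branches = feasible_branches(row)
        if branches:  # keep a state only if some branch qualifies
            selected[s] = branches
    return selected


# Illustrative SED table: table[state][branch].
table = [
    [6, 4, 6, 8],
    [4, 2, 4, 6],
    [2, 0, 2, 4],
    [4, 2, 4, 6],
]
selected = feasible_states(table)
print(selected)
# {1: [(1, 2)], 2: [(0, 2), (1, 0), (2, 2)], 3: [(1, 2)]}
```

For this table three states survive, which is consistent with the statement that at most three required states are obtained in the 4-state case.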

Algorithm for STTC decoder
Step 1: Clear to zero all the reference symbols, states, branch metrics for all 16 combinations, and the memory elements used to store and process the symbols.
Step 2: Received symbols,
Step 3: Find the SEDs for all 16 combinations and store the results in memory.
Step 4: Select the feasible states from the above step, say $S'_0$, $S'_1$, and $S'_2$.
Step 5: Select the feasible branch metrics for each feasible state and store the results in memory.
Step 6: Find the SEDs for the reference symbols, dl and dr, and store the results in memory.
Step 7: Select the feasible branch metrics from the above step and store the results in memory.
Step 8: Subtract the LSDs of all feasible branches of the reference symbol from the input symbol, $rx_t^1$ (MSD).
Step 9: Compare the results and select the least one, which was generated for the LSD of the reference symbol. It represents the present state among the four states.
Step 10: Select the symbol that has the least SED among the four branches in the state. These symbols are the reference symbols for the next state.
Step 11: Select the MSD of the reference symbol as the output symbol.
Step 12: Convert the output symbol into binary, and then into serial data, which is the actual information.
Step 13: Repeat steps 2 to 12 for the next received data.
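For comparison, the sketch below decodes the same kind of symbol stream with a textbook hard-decision Viterbi search. It is not the reference-symbol detector of the steps above, and it rests on assumptions not stated in the text: unit-energy QPSK, a trellis in which the state is the previous symbol and the branch with input u transmits the pair (state, u), and the first bit of each pair as the most significant bit of the symbol.

```python
# Assumed unit-energy QPSK points for symbol indices 0..3.
QPSK = [(1, 0), (0, 1), (-1, 0), (0, -1)]


def symbol_sed(a, b):
    """Squared Euclidean distance between QPSK symbols a and b."""
    (ar, ai), (br, bi) = QPSK[a], QPSK[b]
    return (ar - br) ** 2 + (ai - bi) ** 2


def viterbi_decode(rx1, rx2):
    """Decode hard-decision symbol streams into a serial bit list."""
    inf = float("inf")
    metric = [0, inf, inf, inf]  # the encoder is assumed to start in state 0
    paths = [[], [], [], []]     # surviving input sequence for each state
    for r1, r2 in zip(rx1, rx2):
        new_metric = [inf] * 4
        new_paths = [None] * 4
        for s in range(4):
            if metric[s] == inf:
                continue  # state not reachable yet
            for u in range(4):  # branch with input u leads to state u
                m = metric[s] + symbol_sed(r1, s) + symbol_sed(r2, u)
                if m < new_metric[u]:  # keep the better path into state u
                    new_metric[u] = m
                    new_paths[u] = paths[s] + [u]
        metric, paths = new_metric, new_paths
    best = min(range(4), key=lambda s: metric[s])
    bits = []
    for sym in paths[best]:
        bits += [sym >> 1, sym & 1]  # decimal symbol to two serial bits
    return bits


decoded = viterbi_decode([0, 2, 1, 3], [2, 1, 3, 0])
print(decoded)  # [1, 0, 0, 1, 1, 1, 0, 0]
```

The usage line is a noise-free round trip. With hard-decision inputs a single symbol error can leave two paths with equal metrics, which is one reason soft-decision metrics are generally preferred.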
The trellis structure for the received symbols is shown in Figure 5. The surviving paths are shown in bold lines in Figure 7. The device logic utilization (number of slices, slice flip-flops, four-input LUTs, bonded IOBs, multipliers, and GCLKs) for the STTC encoder, the decoder, and the complete STTC system is shown in Table 9, Table 10, and Table 11, respectively.

SIMULATION RESULTS
The simulation results for the STTC encoder and decoder are presented below. The device XC3S400 (400k gates) from the Xilinx Spartan-3 family, in package PQ208 with speed grade -5, is used to implement the hardware for the STTC encoder and decoder. Table 1 and Table 2 describe the device utilization for the encoder and decoder, respectively. The encoder occupies at most 10% of the available device resources. It uses five 1-bit registers, two 3-bit registers, two 2-bit latches, and nine 1-bit XORs. For device speed grade -5, the minimum period is 2.882 ns (maximum frequency 346.933 MHz), the minimum input arrival time before clock is 4.161 ns, and the maximum output required time after clock is 6.141 ns.
The decoder occupies at most 22% of the available device resources. It uses one 4x1-bit ROM, two 4x2-bit ROMs, one 4x6-bit ROM, one 8x6-bit ROM, twelve 7x7-bit multipliers, sixteen 6-bit adders, fifteen 7-bit adders, three 7-bit subtractors, three 1-bit registers, two 2-bit registers, eight 6-bit registers, sixteen 6-bit latches, forty-one 6-bit equality comparators, sixteen 6-bit greater-than-or-equal comparators, forty-eight 6-bit less-than comparators, sixty-four 6-bit less-than-or-equal comparators, eighteen 8-bit less-than-or-equal comparators, and seventeen 6-bit 4-to-1 multiplexers. For device speed grade -5, the minimum period is 12.584 ns (maximum frequency 79.466 MHz), the minimum input arrival time before clock is 3.442 ns, and the maximum output required time after clock is 6.216 ns. The simulation results for the STTC encoder and decoder are shown in Figure 8, Figure 9, and Figure 10.

CONCLUSION
In this paper we have presented the implementation of a 4-state STTC encoder and decoder on an FPGA. The device XC3S400 (400k gates) from the Xilinx Spartan-3 family, in package PQ208 with speed grade -5, is used for the STTC implementation.
In the STTC decoder, the decoding complexity increases with the number of states. An optimal decoding technique such as the soft-decision Viterbi decoding algorithm is therefore used; it eases hardware implementation and reduces both the complexity of the branch metric calculations and the hardware cost. However, the branch metric calculations and the complexity of the detector still increase significantly with the number of states. The STTC encoder and decoder use 10% and 22% of the available device resources, respectively.