Vol. No.6, Issue No. 10, October 2017 www.ijarse.com



# LOW POWER HIGH-PERFORMANCE NON-BINARY LOW-DENSITY PARITY CHECK CODER (FNB-LDPC) OVER GF(2<sup>m</sup>) USING CHECK-NODE UNIT

## M.Srinivasan<sup>1</sup>, G.M. Tamilselvan<sup>2</sup>

<sup>1</sup>Department of Electronics and Communication Engineering,

KGiSL Institute of Technology, Coimbatore, (India)

<sup>2</sup>Department of Electronics and Communication Engineering,

Bannari Amman Institute of Technology, Sathyamangalam, (India)

#### **ABSTRACT**

Error correction codes are an important part of channel coding and enhance the performance of communication systems. Recently, low-density parity check (LDPC) codes have been used for this purpose, but their hardware design requires large resources, which limits the performance of the coders. This paper presents a low power, high-performance flexible non-binary LDPC coder (FNB-LDPC) over Galois fields $GF(2^m)$. The hardware realization of the check-node unit (CNU) is challenging because it consists of large modules such as the FFT/IFFT and the multiplier. We overcome these problems with a modified CNU structure. The proposed versatile, flexible LDPC coder supports all field sizes of the suggested Galois fields without the need to reconfigure the hardware structure, and improves performance in terms of hardware utilization, power, and delay.

Keywords: FNB-LDPC coder, check node unit, variable node unit, Galois field, real-valued FFT, versatile bit serial multiplier.

#### **I. INTRODUCTION**

LDPC codes are a class of block codes that satisfy both long-length and randomness requirements. The fully parallel LDPC decoder [1] extends the belief propagation (BP) decoding algorithm by capitalizing on a parallel structure. A 1024-bit LDPC code implemented in ASIC technology achieves a maximum symbol throughput of 1 Gbps. The QC-LDPC decoder [2] achieves high hardware utilization efficiency (HUE) and a large reduction in memory blocks without any performance degradation. It first splits the check matrix into several row blocks and then performs the improved message-passing computations sequentially, block by block. A resource-efficient LDPC decoder based on a reduced-complexity min-sum algorithm [3] reduces the interconnect complexity by restricting the extrinsic message length to 2 bits and simplifies the check node unit (CNU). For high-throughput decoding of high-rate LDPC codes, a modified sliced message passing (SMP) decoding architecture [4] overlaps the CNU and the variable node unit (VNU) and achieves a good tradeoff between area and throughput, together with high hardware utilization efficiency. A look-up table (LUT) based VNU design offers a good solution for high-speed hardware design and has been extended to the (2048, 1723) LDPC code of the IEEE 802.3an standard [5].

A parallel NB-LDPC decoder over GF(256) has been implemented in 28-nm CMOS technology [6]. Its trellis-based CNU design maximizes the available storage capacity by reducing a large amount of memory-occupying activity. The sorted log-likelihood ratio (LLR) vector of a check-to-variable message is approximated using a piecewise linear function. In [7], the first and second minimums are computed by a modified CNU in both accurate and imprecise forms. In this paper, we present a flexible non-binary LDPC coder (FNB-LDPC) over $GF(2^m)$.

The objective of the proposed FNB-LDPC coder is to increase the hardware utilization efficiency and the maximum clock frequency while minimizing the power consumption.

#### II. PROBLEM DEFINITION AND SYSTEM MODEL

Sułek [18] proposed an NB-LDPC coder that uses the mixed-domain FFT-BP decoding algorithm with multiplication units; it is also referred to as a semi-parallel decoder. The coder favors mapping part of the check-node computation to the multiplier cores embedded in an FPGA, thereby making use of the wide variety of FPGA resources, and its throughput is achieved by the decoder on a single FPGA. In this NB-LDPC coder, the CNU block is enhanced by an approximate evaluation of the nonlinear vectors. The coder was implemented on a Xilinx FPGA development board with a Virtex4 XC4VSX55 device for two different GF orders, GF(8) and GF(32). The GF(8) NB-LDPC decoder uses 14535 slices and 128 multipliers and reaches a maximum clock frequency of 170.8 MHz. The GF(32) NB-LDPC decoder uses 22494 slices and 192 multipliers and reaches a maximum clock frequency of 130.2 MHz. This NB-LDPC coder is not a fully parallel structure, and it uses more hardware than the existing LDPC coders discussed in the related work. Moreover, although the multiplier is a main part of the CNU block in an LDPC coder, the author uses a recursive multiplier for this design.

For that reason, we present the flexible non-binary LDPC coder (FNB-LDPC) over $GF(2^m)$, which does not need a reconfigurable hardware structure. The proposed FNB-LDPC coder is implemented over different Galois fields $GF(2^m)$ without modifying the structure of the hardware design. The performance of the proposed coder is compared with that of existing coders, including the NB-LDPC coder [18].

#### 2.1 System Model of Proposed FNB-LDPC Coder

The check node unit (CNU) consists of several modules: permutation, non-linear functions, real-valued FFT/IFFT, and a versatile bit serial multiplier.
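The multiplier in the CNU has to serve every supported field size. As a rough software model only (it is not the authors' circuit), the sketch below performs a bit-serial shift-and-add multiplication in $GF(2^m)$ in which both `m` and the reduction polynomial are run-time parameters. The function name `gf2m_mul` and the example polynomials $x^3+x+1$ for GF(8) and $x^5+x^2+1$ for GF(32) are assumptions made for illustration; this section does not state which irreducible polynomials the design uses.

```python
def gf2m_mul(a, b, m, poly):
    """Multiply a and b in GF(2^m), one bit of b per iteration.

    poly is the irreducible polynomial as an integer bit mask that
    includes the x^m term (for example 0b1011 for x^3 + x + 1).
    This is an illustrative model, not the hardware from the paper.
    """
    result = 0
    for _ in range(m):
        if b & 1:              # current bit of b selects a partial product
            result ^= a        # addition in GF(2^m) is XOR
        b >>= 1
        a <<= 1                # multiply a by x
        if a & (1 << m):       # degree reached m: reduce modulo poly
            a ^= poly
    return result


# Hypothetical field definitions used only for this example.
GF8_POLY = 0b1011      # x^3 + x + 1
GF32_POLY = 0b100101   # x^5 + x^2 + 1

if __name__ == "__main__":
    print(gf2m_mul(2, 4, 3, GF8_POLY))
    print(gf2m_mul(16, 2, 5, GF32_POLY))
```

The same loop handles GF(8) and GF(32) by changing only `m` and `poly`, which is the kind of flexibility the text attributes to a versatile multiplier.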

#### 2.1.1 Real-valued FFT/IFFT module

The initial step in the proposed technique for noise degradation is the transformation of the input signals from the time domain to the frequency domain. Since speech and noise signals are real-valued, the conventional FFT architecture used for the domain conversion can be replaced with a modified low-power pipelined architecture, which makes the complete hardware architecture efficient in terms of area and power consumption.

IJARSE ISSN 2319 - 8354



Fig. 1 Parallel Pipelined Architecture for 16 Point Radix 2 RFFT
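The saving that a real-valued FFT can exploit comes from conjugate symmetry: for a real input of length M, output bin M-k is the complex conjugate of bin k, so roughly half of the outputs are redundant. The short check below illustrates this property with a direct DFT. It is a numerical illustration under that general fact, not a model of the pipelined architecture in Fig. 1, and the helper names `dft` and `max_symmetry_error` are hypothetical.

```python
import cmath


def dft(x):
    """Direct O(M^2) discrete Fourier transform of the sequence x."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def max_symmetry_error(x):
    """Largest deviation from X[k] == conj(X[M-k]) over all bins k."""
    spectrum = dft(x)
    n = len(x)
    return max(
        abs(spectrum[k] - spectrum[(n - k) % n].conjugate()) for k in range(n)
    )


if __name__ == "__main__":
    samples = [float(i % 5) for i in range(16)]  # arbitrary real 16-point input
    print(max_symmetry_error(samples))           # close to zero for real input
```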

At stage 1, the butterfly unit processes the pair of real samples $x(\varphi)$ and $x(\varphi + M/2)$. The butterfly unit contains a 2:1 multiplexer with one selector line S. When the inputs are real, S is set to 1 and the butterfly computes on the input values. When the inputs are complex, S is set to 0 and the multiplexer passes the inputs through without computation. At stage 2, the architecture consists of a shuffling unit, a butterfly unit, and twiddle factor block A.
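The selector behaviour described above can be sketched in software as follows. The sketch assumes the usual radix-2 butterfly that outputs the sum and the difference of its two inputs, which the text does not spell out, and the function names `butterfly` and `stage1` are hypothetical.

```python
def butterfly(a, b, s):
    """Radix-2 butterfly with a bypass select.

    s == 1: compute the sum and difference of the two inputs.
    s == 0: pass both inputs through unchanged (multiplexer bypass).
    """
    if s == 1:
        return a + b, a - b
    return a, b


def stage1(x):
    """Apply the butterfly to each pair x[phi], x[phi + M/2] of a real input."""
    m = len(x)
    half = m // 2
    out = [0] * m
    for phi in range(half):
        # Inputs are real at stage 1, so the selector is driven to 1.
        out[phi], out[phi + half] = butterfly(x[phi], x[phi + half], 1)
    return out


if __name__ == "__main__":
    print(stage1([1, 2, 3, 4]))
    print(butterfly(3, 5, 0))
```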

The shuffling unit reorders the data passed from stage 1 to stage 2; it contains a 2:1 multiplexer and two delay elements. The twiddle factor ($W^{\phi}$) module is shown in Fig. 2.



Fig. 2 Twiddle Factor Module




Table 1: Twiddle factor real and imaginary coefficients for M=16

| Twiddle factor ($W^{\phi}$) | Real values | Imaginary values |
|-----------------------------|-------------|------------------|
| $W^0$                       | 1           | 0                |
| $W^1$                       | 0.9239      | 0.3827           |
| $W^2$                       | 0.7071      | 0.7071           |
| $W^3$                       | 0.3827      | 0.9239           |
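The coefficients in Table 1 follow from the definition of the twiddle factor for M=16, whose real and imaginary parts have magnitudes $\cos(2\pi\phi/16)$ and $\sin(2\pi\phi/16)$. The sketch below recomputes them to four decimals. It assumes, as the table appears to, that magnitudes are listed and the sign of the imaginary part is handled elsewhere; the function name `twiddle_table` is hypothetical. Note that for $\phi = 0$ the factor is exactly 1, so its imaginary part evaluates to 0.

```python
import math


def twiddle_table(m=16, count=4):
    """Return (phi, |real|, |imag|) of W^phi for phi = 0 .. count-1."""
    rows = []
    for phi in range(count):
        angle = 2 * math.pi * phi / m
        rows.append((phi, round(math.cos(angle), 4), round(math.sin(angle), 4)))
    return rows


if __name__ == "__main__":
    for phi, real, imag in twiddle_table():
        print(f"W^{phi}: real={real:.4f} imag={imag:.4f}")
```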

#### III. EXPERIMENTAL RESULTS

Performance metrics such as device utilization and maximum frequency of the proposed and existing coders are given in Table 2. The proposed coder is implemented with a flexible bit design on Virtex7 FPGA devices. Table 2 shows that, in terms of device utilization, maximum frequency, and power consumption, the proposed coder performs more effectively than the other existing coders.

The fully parallel stochastic LDPC-BC decoder [8] has a gate count of 760.3K, a maximum clock frequency of 768 MHz, and a power consumption of 437.2 mW. The half-stochastic decoding architecture for the LDPC-BC decoder over GF(16) [9] has a gate count of 1077K and a maximum clock frequency of 333 MHz. The multi-mode LDPC decoder architecture [10] has a gate count of 320K, a maximum clock frequency of 400 MHz, and a power consumption of 284.3 mW.

The memory-efficient decoder architecture [11] uses 16803 slices, 31305 slice LUTs, and 4066 slice registers, with a maximum clock frequency of 400 MHz and a power consumption of 1638 mW.

The quasi-cyclic LDPC coder [12] has a gate count of 416.2K, a maximum clock frequency of 474 MHz, and a power consumption of 114.3 mW. The self-corrected min-sum coders [13] use 60K LUT and FF pairs (SCMS-V1) and 51K LUT and FF pairs (SCMS-V2), each at a maximum clock frequency of 300 MHz.

The LDPC-BC decoder [14] occupies an area of 1.79 mm<sup>2</sup>, with a maximum clock frequency of 100 MHz and a power consumption of 104 mW.

The LDPC decoder [15] occupies an area of 3.11 mm<sup>2</sup>, with a maximum clock frequency of 200 MHz and a power consumption of 99.2 mW. The LDPC decoder [16] occupies an area of 15.75 mm<sup>2</sup>, with a maximum clock frequency of 100 MHz and a power consumption of 800 mW. The NB-LDPC decoder [17] has a gate count of 564K, a maximum clock frequency of 277 MHz, and a power consumption of 274 mW.

The NB-LDPC coder [18] is implemented for two separate GF orders, GF(8) and GF(32). The GF(8) NB-LDPC decoder uses 14535 slices and 128 multipliers and reaches a maximum clock frequency of 170.8 MHz. The GF(32) NB-LDPC decoder uses 22494 slices and 192 multipliers and reaches a maximum clock frequency of 130.2 MHz. The proposed design reaches a maximum clock frequency of 506.303 MHz with a power consumption of 143 mW. The FNB-LDPC 32 bit coder uses 2932 slice registers, 3523 slice LUTs, and 4323 LUT and FF pairs for y=3 VNUs, with a maximum clock frequency of 334.437 MHz and a power consumption of 143 mW. The maximum frequency of the proposed coder is 196.43% higher than that of the NB-LDPC coder for GF(8) and y=8, and 156.86% higher than that of the NB-LDPC coder for GF(32) and y=3.
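The two percentage figures quoted above can be checked from the clock frequencies reported in the same paragraph, using the relative increase (new - old) / old. The sketch below is only an arithmetic check; the function name `percent_increase` is hypothetical.

```python
def percent_increase(new_mhz, old_mhz):
    """Relative increase of new_mhz over old_mhz, in percent."""
    return (new_mhz - old_mhz) / old_mhz * 100.0


if __name__ == "__main__":
    # Proposed design vs. the GF(8) decoder of [18].
    print(round(percent_increase(506.303, 170.8), 2))
    # FNB-LDPC 32 bit coder vs. the GF(32) decoder of [18].
    print(round(percent_increase(334.437, 130.2), 2))
```

Both results agree with the 196.43% and 156.86% stated in the text.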

Table 2: Maximum frequency and power consumption of proposed and existing coders

| Design (reference) | Technology / FPGA family | Maximum frequency | Power consumption |
|--------------------|--------------------------|-------------------|-------------------|
| [8]                | 90 nm CMOS tech.         | 768 MHz           | 437.2 mW          |
| [9]                | 90 nm CMOS tech.         | 333 MHz           | -                 |
| [10]               | 65 nm CMOS tech.         | 400 MHz           | 284.3 mW          |
| [11]               | Virtex4                  | 82 MHz            | 1638 mW           |
| [12]               | Virtex5                  | 474 MHz           | 114.3 mW          |
| [13]               | Virtex7                  | 300 MHz           | -                 |
| [14]               | 130 nm CMOS tech.        | 100 MHz           | 104 mW            |
| [15]               | 220 nm CMOS tech.        | 200 MHz           | 99.2 mW           |
| [16]               | 180 nm CMOS tech.        | 100 MHz           | 800 mW            |
| [17]               | 90 nm CMOS tech.         | 277 MHz           | 274 mW            |
| [18], GF(8)        | Virtex4 (XC4VSX55)       | 170.8 MHz         | -                 |
| [18], GF(32)       | Virtex4 (XC4VSX55)       | 130.2 MHz         | -                 |
| Proposed           | Virtex7 (XC7VX330T)      | 317.068 MHz       | 143 mW            |
| Proposed, 32 bit   | Virtex7 (XC7VX330T)      | 334.437 MHz       | 143 mW            |






Fig. 3 Maximum Clock Frequency Comparison of Proposed and Existing Work



Fig. 4 Power Consumption Comparison of Proposed and Existing Works

#### IV. CONCLUSION

The new flexible non-binary LDPC (FNB-LDPC) coder is proposed to enhance the performance of the design, and it does not require the hardware structure to be reconfigured for any GF. The proposed coder consists of two units: the check node unit (CNU) and the variable node unit (VNU). The hardware realization of the CNU is challenging because it consists of large modules such as the FFT/IFFT and the multiplier.

The proposed FNB-LDPC coder has been implemented in Virtex7 FPGA technology using the Xilinx tool. The experimental results show that the proposed coder performs more effectively than existing coders in terms of hardware utilization, power, and delay.


#### REFERENCES

- [1] J. Broulim, P. Broulim, J. Moldaschl, V. Georgiev and R. Salom, "Fully parallel FPGA decoder for irregular LDPC codes", 2015 23rd Telecommunications Forum Telfor (TELFOR), 2015.
- [2] L. Zhao, R. Liu, Y. Hou and X. Zhang, "High Hardware Utilization and Low Memory Block Requirement Decoding of QC-LDPC Codes", Chinese Journal of Aeronautics, vol. 25, no. 5, pp. 747-756, 2012.
- [3] V. Chandrasetty and S. Aziz, "An area efficient LDPC decoder using a reduced complexity min-sum algorithm", Integration, the VLSI Journal, vol. 45, no. 2, pp. 141-148, 2012.
- [4] F. Angarita, T. Sansaloni, M. Canet and J. Valls, "Improved Sliced Message Passing Architecture for High Throughput Decoding of LDPC Codes", Journal of Signal Processing Systems, vol. 66, no. 2, pp. 99-104, 2011.
- [5] V. Torres, A. Perez-Pascual, T. Sansaloni and J. Valls, "Fully-parallel LUT based (2048,1723) LDPC code decoder for FPGA", 2012 19th IEEE International Conference on Electronics, Circuits, and Systems (ICECS 2012), 2012.
- [6] J. Lin and Z. Yan, "An Efficient Fully Parallel Decoder Architecture for Nonbinary LDPC Codes", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 22, no. 12, pp. 2649-2660, 2014.
- [7] O. Boncalo, D. Declercq, A. Amaricai, V. Savin and F. Ghaffari, "Check node unit for LDPC decoders based on one-hot data representation of messages", Electronics Letters, vol. 51, no. 12, pp. 907-908, 2015.
- [8] X. Lee, C. Chen, H. Chang and C. Lee, "A 7.92 Gb/s 437.2 mW Stochastic LDPC Decoder Chip for IEEE 802.15.3c Applications", IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 62, no. 2, pp. 507-516, 2015.
- [9] X. Lee, C. Yang, C. Chen, H. Chang and C. Lee, "An Area-Efficient Relaxed Half-Stochastic Decoding Architecture for Non-binary LDPC Codes", IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 62, no. 3, pp. 301-305, 2015.
- [10] S. Ajaz and H. Lee, "Efficient multi-Gb/s multi-mode LDPC decoder architecture for IEEE 802.11ad applications", Integration, the VLSI Journal, vol. 51, pp. 21-36, 2015.
- [11] V. Chandrasetty and S. Aziz, "Resource efficient LDPC decoders for multimedia communication", Integration, the VLSI Journal, vol. 48, pp. 213-220, 2015.
- [12] M. Roberts and R. Jayabalan, "A Power- and Area-Efficient Multirate Quasi-Cyclic LDPC Decoder", Circuits, Systems, and Signal Processing, vol. 34, no. 6, pp. 2015-2035, 2014.
- [13] O. Boncalo, A. Amaricai, P. Mihancea and V. Savin, "Memory trade-offs in layered self-corrected min-sum LDPC decoders", Analog Integrated Circuits and Signal Processing, vol. 87, no. 2, pp. 169-180, 2015.
- [14] J. Yoon and J. Park, "An Efficient Memory-Address Remapping Technique for High-Throughput QC-LDPC Decoder", Circuits, Systems, and Signal Processing, vol. 33, no. 11, pp. 3457-3473, 2014.
- [15] C. Condo, A. Baghdadi and G. Masera, "Reducing the Dissipated Energy in Multi-standard Turbo and LDPC Decoders", Circuits, Systems, and Signal Processing, vol. 34, no. 5, pp. 1571-1593, 2014.
- [16] K. Lin and M. Lin, "High-Throughput Architectures for Circular Block-Type Low-Density Parity-Check Codes", Circuits, Systems, and Signal Processing, vol. 34, no. 9, pp. 2993-3009, 2015.

- [17] C. Lin, S. Tu, C. Chen, H. Chang and C. Lee, "An Efficient Decoder Architecture for Non-binary LDPC Codes With Extended Min-Sum Algorithm", IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 63, no. 9, pp. 863-867, 2016.
- [18] W. Sułek, "Non-binary LDPC Decoders Design for Maximizing Throughput of an FPGA Implementation", Circuits, Systems, and Signal Processing, vol. 35, no. 11, pp. 4060-4080, 2016.
- [19] M. Srinivasan and G. M. Tamilselvan, "VLSI Implementation of Low Power High Speed ECC Processor using Versatile Bit Serial Multiplier", Journal of Circuits, Systems and Computers, vol. 26, no. 07, p. 1750114, 2017.