## International Journal of Advance Research in Science and Engineering Volume No.06, Issue No. 10, October 2017 www.ijarse.com IJARSE ISSN: 2319-8354

### Implementation of New Reconfiguration Arithmetic Units for Approximate Addition

### Ishrathunnisa Begum, T. Srivani

<sup>1</sup> Pursuing M.Tech (DSCE) from Sri Visvesvaraya Institute of Technology & Science,

Chowderpally, Devarkadra, Mahabubnagar

<sup>2</sup>Working as Assistant professor (ECE) from Sri Visvesvaraya Institute of Technology & Science,

Chowderpally, Devarkadra, Mahabubnagar

#### **ABSTRACT**

The field of approximate computing has received significant attention from the research community in recent years, especially in the context of signal processing applications. Image and video compression algorithms, such as JPEG and MPEG, are particularly attractive candidates for approximate computing because they tolerate computational imprecision owing to the limits of human perception, which can be exploited to realize highly power-efficient implementations of these algorithms. However, existing approximate architectures typically fix the level of hardware approximation statically and cannot adapt to the input data. This paper addresses this issue by proposing a reconfigurable approximate architecture for MPEG encoders that minimizes power consumption while maintaining a specified Peak Signal-to-Noise Ratio (PSNR) threshold for any video. To this end, we propose reconfigurable adder/subtractor blocks (RABs), which can change their degree of approximation, and integrate these blocks into the motion estimation and discrete cosine transform modules of the MPEG encoder. Experimental results show that dynamically adjusting the degree of hardware approximation based on the input video respects the given quality bound (PSNR degradation of 1%–10%) across different videos while achieving power savings of up to 38% over a conventional non-approximated MPEG encoder architecture. The approach can be readily extended to other DSP applications.

Keywords- Approximate circuits, Peak Signal-to-Noise Ratio (PSNR), Reconfigurable adder/subtractor blocks (RABs).

### **I. INTRODUCTION**

Introducing a limited amount of computing imprecision in image and video processing algorithms often results in a negligible amount of perceptible visual change in the output, which makes these algorithms ideal candidates for approximate computing architectures. Approximate computing architectures exploit the fact that a small relaxation in output correctness can result in significantly simpler and lower power implementations. However, most approximate hardware architectures proposed so far suffer from the limitation that, for widely varying input parameters, it becomes very hard to provide a quality bound on the output, and in some cases, the output quality may be severely degraded. The main reason for this output quality fluctuation is that the degree of approximation (DA) in the hardware architecture is fixed statically and cannot be customized for different inputs. One possible remedy is to adopt a conservative approach and use a very low DA in the hardware so that the output accuracy is not significantly affected. However, such a conservative approach will, as expected, drastically reduce the power savings as well. This paper takes a different approach to this problem by dynamically reconfiguring the approximate hardware architecture based on the inputs. Specifically, this paper makes the following contributions.

- 1) We show that, for a fixed level of hardware approximation in an MPEG encoder, the output quality varies widely across different videos, often falling below acceptable bounds. This shows that setting the level of hardware approximation statically is insufficient.
- 2) We explore, for the first time, the use of dynamically reconfigurable approximate hardware architectures that adjust the DA at run time across multiple computational cycles, based on the inputs. To this end, we propose the design of reconfigurable adder/subtractor blocks (RABs) for four commonly used adder architectures, viz., ripple carry adder (RCA), carry lookahead adder (CLA), carry bypass adder (CBA), and carry select adder (CSA), and integrate them into the MPEG encoder to enable quality-configurable execution.
- 3) We propose a design methodology to tune the DA dynamically based on the video characteristics, with the objective of ensuring that the output quality stays within a specified bound.
- 4) We have implemented the proposed architecture for an MPEG encoder on an Altera DE2 field-programmable gate array (FPGA) board and evaluated it using eight benchmark videos. Our experimental results show that the proposed architecture achieves power savings equivalent to a baseline approach that uses fixed approximate hardware while respecting quality constraints across different videos.

The remainder of this paper is organized as follows. Section II reviews related work in the domain of approximate computing. Section III describes the proposed reconfigurable approximate architecture for MPEG encoding. Section IV presents the results obtained through hardware implementation of our design, and Section V concludes this paper.

### **II. RELATED WORK**

There has been considerable work on designing energy-efficient video compression schemes, much of it related to the specific case of an MPEG encoder. The different power-reduction schemes include algorithmic modifications, voltage overscaling, and approximate computation of metrics. The introduction of approximate computing techniques has opened up entirely new opportunities for building low-power video compression architectures. Approximate computing methods achieve large power savings by introducing a small amount of error or imprecision into the logic block. Different approaches to approximation include error introduction through voltage overscaling, intelligent logic manipulation, and circuit simplification using don't-care-based optimization techniques. Some methods introduce imprecision by replacing adders with their approximate counterparts. The approximate adders are obtained by intelligently removing some of the transistors in a parallel adder. An important point to note is that these approximate circuits are hardwired and cannot be changed without resynthesizing the whole circuit. There are also instances of approximation proposed for MPEG encoders. Most of them exploit the inherent error resilience of the motion estimation (ME) algorithm, which results in only a small quality degradation. For instance, Moshnyaga et al. use a bitwidth compression technique to reduce the power consumption of video frame memory. An adaptive bit masking method has also been proposed, in which the authors truncate the pixels of the current and previous frames used for ME depending on the quantization step. However, such coarse-grained input truncation is applicable only to the specific case of ME and gives unacceptable results for other blocks, such as the discrete cosine transform (DCT), which requires finer control over the error.

This paper also aims at approximating the adders of the ME and DCT blocks of an MPEG encoder. However, it introduces the concept of dynamically reconfigurable approximation, which, as we will show, helps in maintaining better control over application-level quality metrics while simultaneously obtaining the power consumption benefits of hardware approximation. The proposed technique adjusts the degree of hardware approximation automatically and dynamically, based on the video characteristics. In addition, such dynamic reconfiguration gives users a control knob for trading off the output quality of the videos against the power consumption of battery-powered multimedia devices.
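To make the idea of a quality-driven control knob concrete, the following is a minimal Python sketch, not the heuristic used in this work. The `psnr` function uses the standard PSNR definition; the `next_da` step controller, its parameters (`max_loss`, `da_max`), and the use of the accurate encoder's PSNR as the reference are all hypothetical and only illustrate how a relative PSNR degradation bound could drive the DA up or down.

```python
import math

def psnr(reference, test, peak=255):
    """PSNR in dB between two equal-length sequences of pixel values."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)

def next_da(da, measured_psnr, exact_psnr, max_loss=0.05, da_max=8):
    """Hypothetical step controller (an assumption, not the paper's heuristic).

    Lowers the DA by one when the relative PSNR loss exceeds `max_loss`,
    otherwise raises it by one, clamped to the range [0, da_max].
    """
    loss = (exact_psnr - measured_psnr) / exact_psnr
    if loss > max_loss:
        return max(da - 1, 0)
    return min(da + 1, da_max)

if __name__ == "__main__":
    print(psnr([10, 20], [12, 18]))
    print(next_da(4, measured_psnr=37.0, exact_psnr=40.0))
```

A real controller would also have to decide how often to re-evaluate the quality and which blocks to adjust; those choices are outside the scope of this sketch.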

Compared with the preliminary version of this work, this paper adds a number of features. We improve the heuristics for modulating the DA of the reconfigurable hardware blocks by adding most significant bit (MSB) truncation, which improves the energy-quality tradeoff of the video encoding process. We also extend the RAB to cover three additional adder architectures, viz., CLA, CBA, and CSA. In addition, for the carry-lookahead-based RAB, we propose dual-mode carry lookahead and propagate-generate blocks as its basic building blocks. Finally, we present a comparative study of the power consumption of the different RABs and show how the DA is automatically adjusted across different frames at run time.

#### III. PROPOSED ARCHITECTURE

This section describes the steps involved in designing the proposed reconfigurable architecture and how it is integrated into the MPEG encoder.

### **Reconfigurable Adder/Subtractor Blocks:**

Dynamic variation of the DA is possible when each adder/subtractor block is equipped with one or more approximate versions of itself and can switch between them as required. This reconfigurable architecture can accommodate any approximate version of the adders/subtractors. Six different kinds of approximate adder circuits have been proposed in earlier work. However, it must also be ensured that the additional area overhead required for building the reconfigurable approximate circuits is minimal while the power savings remain sufficiently large. As examples, we have chosen the two simplest of these methods, namely, truncation and approximation 5, for approximating the adder/subtractor blocks. The latter can be regarded as an improved version of truncation, as it simply relays the two 1-bit inputs, one as Sum and the other as Carry Out (Choice 2). If A, B, and $C_{in}$ are the 1-bit inputs to the full adder (FA), then the outputs are Sum = B and Cout = A. The resulting truth table shows that the outputs are correct for more than half of the input combinations, making it a better approximation mode than truncation.
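The truth-table comparison can be checked by enumeration. The short Python sketch below compares approximation 5 (Sum = B, Cout = A) and truncation (both outputs 0) against an exact full adder; the function names are illustrative and not taken from the design.

```python
from itertools import product

def exact_fa(a, b, cin):
    """Exact 1-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ cin, (a & b) | (b & cin) | (a & cin)

def approx5_fa(a, b, cin):
    """Approximation 5 as described in the text: Sum = B, Cout = A."""
    return b, a

def truncated_fa(a, b, cin):
    """Truncation: both outputs are forced to 0."""
    return 0, 0

def score(approx):
    """Count the input rows (out of 8) where Sum, Cout, and both outputs are exact."""
    sum_ok = cout_ok = both_ok = 0
    for a, b, cin in product((0, 1), repeat=3):
        s, c = exact_fa(a, b, cin)
        s2, c2 = approx(a, b, cin)
        sum_ok += s == s2
        cout_ok += c == c2
        both_ok += (s, c) == (s2, c2)
    return sum_ok, cout_ok, both_ok

if __name__ == "__main__":
    print("approximation 5:", score(approx5_fa))   # (4, 6, 4)
    print("truncation:     ", score(truncated_fa)) # (4, 4, 1)
```

Counted per output bit, approximation 5 matches the exact adder on 10 of 16 output values against 8 of 16 for truncation, and it produces a fully correct (Sum, Cout) pair on 4 of 8 input rows against 1 of 8 for truncation.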


The proposed scheme replaces every FA cell of the adders/subtractors with a dual-mode FA (DMFA) cell (Fig. 1), in which each FA cell can operate either in fully accurate mode or in an approximate mode, depending on the state of the control signal APP. A logic high value of the APP signal indicates that the DMFA is operating in the approximate mode. We refer to these adders/subtractors as RABs. It is important to note that the FA cell is power-gated when operating in the approximate mode.

Our experiments have shown a negligible difference in the power consumption of the DMFA when operated in either of the two approximation modes. Hence, without loss of generality, approximation 5 was chosen for its higher probability of producing the correct output compared with truncation, which always outputs 0 irrespective of the input. Fig. 1 shows the logic block diagram of the DMFA cell, which replaces the constituent FA cells of an 8-bit RCA, as shown in Fig. 2. In addition, the RCA block contains the approximation controller, which generates the appropriate select signals for the multiplexers.
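A behavioral sketch of the reconfigurable RCA may help here. The Python model below chains DMFA cells and sets APP = 1 for the `da` least significant cells; treating the DA as a count of approximated LSB cells is an assumption of this sketch, and the function names are illustrative. It models logic values only, not power gating or the multiplexer circuitry.

```python
def dmfa(a, b, cin, app):
    """Dual-mode full adder: exact when app == 0; Sum = B, Cout = A when app == 1."""
    if app:
        return b, a
    return a ^ b ^ cin, (a & b) | (b & cin) | (a & cin)

def reconfigurable_rca(x, y, da, width=8, cin=0):
    """Add two unsigned `width`-bit integers with the `da` lowest cells approximated.

    Returns a (width + 1)-bit result whose top bit is the final carry out.
    """
    carry, result = cin, 0
    for i in range(width):
        a, b = (x >> i) & 1, (y >> i) & 1
        s, carry = dmfa(a, b, carry, app=1 if i < da else 0)
        result |= s << i
    return result | (carry << width)

if __name__ == "__main__":
    print(reconfigurable_rca(0b1111, 0b0001, da=0))  # exact: 16
    print(reconfigurable_rca(0b1111, 0b0001, da=4))  # approximate: 17
```

With `da=0` the model reduces to an exact ripple carry adder. With `da=4` in the example, the four low sum bits simply copy operand `y`, and the carry into bit 4 is bit 3 of operand `x`.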



Fig. 1. 1-bit DMFA

A multimode FA cell would provide an even better alternative to the DMFA from the standpoint of controlling the approximation magnitude. However, it also increases the complexity of the decoder block used to assert the correct select signals to the multiplexers, as well as the logic overhead of the multiplexers themselves. This undermines the primary objective, as most of the power savings obtained from approximating the bits are lost. In contrast, the two-mode decoder and the 2:1 multiplexers have negligible overhead and still give sufficient control over the degree of approximation.

### DMFA Overhead:

The power gating transistor and the multiplexers of the DMFA are designed to incur the least possible overhead. Our experiments show that the switching power of the CMOS transistors contributes most of the total power consumption of the FA and DMFA blocks. The power increases by $0.21~\mu W$ when the DMFA operates in accurate mode, compared with the original FA block. This difference can be attributed mainly to the increase in load capacitance of the FA block caused by the added input capacitance of the multiplexers it drives. A small fraction of the total power is contributed by the additional switching of the multiplexers. The power consumed in the DMFA approximate mode is almost negligible compared with the accurate mode, because the FA block is power-gated by the pMOS transistor, as shown in Fig. 1. The reduced input switching activity of the multiplexers is a secondary reason for this small power figure. The additional overhead due to switching of the power gating transistor can be neglected, since its switching activity is very low because of the nature of our control algorithms. This is primarily due to the spatial and temporal locality of the pixel values across consecutive frames.



Fig. 2. 8-bit reconfigurable RCA block



Fig. 3. 1-bit dual-mode carry propagate generate blocks

The RAB concept can also be extended to other adder architectures. Adder architectures such as CBA and CSA, which also use the FA as their basic building block, can be made accuracy configurable by directly replacing the FAs with DMFAs. Other types, such as CLA and tree adders, use different kinds of carry propagate and generate blocks as their basic building units and hence require some additional modifications to operate as RABs. As an example, we designed a 16-bit CLA consisting of four different kinds of basic blocks (Fig. 4), distinguished by the presence of sum (S), Cout, carry propagate (P), and carry generate (G) signals at different levels. We refer to the basic blocks at the first (lowermost) level of the CLA, which receive the inputs directly, as carry lookahead blocks, CLB1 and CLB2. The difference between them is that CLB1 produces an additional Cout signal compared with CLB2. Their dual-mode versions, DMCLB1 and DMCLB2, have both S and P approximated by input operand B and both Cout and G approximated by input operand A, as shown in Fig. 3. The basic blocks at the higher levels of the CLA hierarchy are referred to as propagate and generate blocks, PGB1 and PGB2. Here, PGB1 produces an extra Cout output compared with PGB2. As shown in Fig. 3, the configurable dual-mode versions, DMPGB1 and DMPGB2, use inputs $P_A$ and $G_B$ as approximations for outputs P and G, respectively, when operating in the approximate mode. These approximations were chosen empirically so that the ratio of the probability of a correct output to the additional circuit overhead is large for each of the blocks. Table 1 lists the outputs of each of the dual-mode blocks in accurate and approximate modes.




Fig. 8. 8-bit reconfigurable CLA block

| Basic block (adder type) | Outputs for APP = 0 (accurate mode) | Outputs for APP = 1 (approximate mode) |
|--------------------------|-------------------------------------|----------------------------------------|
| DMFA (RCA, CBA, CSA)     | $S = A \oplus B \oplus C_{in}$      | $S = B$                                |
|                          | $C_{out} = AB + BC_{in} + AC_{in}$  | $C_{out} = A$                          |
| DMCLB1 (CLA)             | $P = A \oplus B$                    | $P = B$                                |
|                          | $G = AB$                            | $G = A$                                |
|                          | $S = P \oplus C_{in}$               | $S = B$                                |
|                          | $C_{out} = G + PC_{in}$             | $C_{out} = A$                          |
| DMCLB2 (CLA)             | $P = A \oplus B$                    | $P = B$                                |
|                          | $G = AB$                            | $G = A$                                |
|                          | $S = P \oplus C_{in}$               | $S = B$                                |
| DMPGB1 (CLA)             | $P = P_A P_B$                       | $P = P_A$                              |
|                          | $G = G_B + G_A P_B$                 | $G = G_B$                              |
|                          | $C_{out} = G + PC_{in}$             | $C_{out} = G + PC_{in}$                |
| DMPGB2 (CLA)             | $P = P_A P_B$                       | $P = P_A$                              |
|                          | $G = G_B + G_A P_B$                 | $G = G_B$                              |

Table 1 Dual-mode block outputs for accurate and approximate modes
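The entries of Table 1 translate directly into Boolean functions. The Python sketch below models the DMCLB1 and DMPGB1 forms (the DMCLB2 and DMPGB2 forms are the same without the Cout output) and wires two leaf blocks and one propagate/generate block into a 2-bit adder as a sanity check. Two points are assumptions of this sketch rather than statements from the design: subscript A is taken to denote the less significant sub-block and B the more significant one, which is what $G = G_B + G_A P_B$ implies, and the wiring in `add2` is illustrative.

```python
def dmclb(a, b, cin, app):
    """Dual-mode carry lookahead block (DMCLB1 form): returns (P, G, S, Cout)."""
    if app:
        return b, a, b, a            # P = B, G = A, S = B, Cout = A
    p, g = a ^ b, a & b
    return p, g, p ^ cin, g | (p & cin)

def dmpgb(p_a, g_a, p_b, g_b, cin, app):
    """Dual-mode propagate/generate block (DMPGB1 form): returns (P, G, Cout)."""
    if app:
        p, g = p_a, g_b              # P = P_A, G = G_B
    else:
        p, g = p_a & p_b, g_b | (g_a & p_b)
    return p, g, g | (p & cin)       # Cout has the same expression in both modes

def add2(x, y, cin=0, app_leaf=0, app_node=0):
    """Illustrative 2-bit adder built from two leaf blocks and one DMPGB."""
    a0, b0, a1, b1 = x & 1, y & 1, (x >> 1) & 1, (y >> 1) & 1
    p0, g0, s0, c1 = dmclb(a0, b0, cin, app_leaf)
    p1, g1, s1, _ = dmclb(a1, b1, c1, app_leaf)
    _, _, cout = dmpgb(p0, g0, p1, g1, cin, app_node)
    return s0 | (s1 << 1) | (cout << 2)

if __name__ == "__main__":
    print(add2(3, 2, cin=1))                        # accurate mode
    print(add2(3, 2, cin=1, app_leaf=1, app_node=1))  # fully approximate mode
```

In accurate mode (`app_leaf=0`, `app_node=0`) the composed blocks reproduce exact 2-bit addition for every input, which confirms that the accurate-mode expressions in Table 1 are mutually consistent.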

For a reconfigurable CLA, the DMCLB1 and DMCLB2 blocks are approximated in accordance with the DA. However, a DMPGB1 or DMPGB2 block is approximated only when every DMCLB1, DMCLB2, DMPGB1, and DMPGB2 block in its transitive fan-in cone is approximated. Otherwise, the block operates in the accurate mode. For example, a DMPGB block at the second level of the CLA can operate in approximate mode if and only if both of its constituent DMCLB1 and DMCLB2 blocks are operating in the approximate mode. A similar rule applies to the blocks at higher levels of the tree, where each DMPGB block can be approximated only when both of its constituent DMPGB1 and DMPGB2 blocks are approximated. This architecture can be easily extended to other CLA-like adders, such as Kogge–Stone, Brent–Kung, Manchester carry chain, and so on.
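The fan-in rule can be expressed as a bottom-up AND over the tree. The Python sketch below assumes a pairwise (binary) lookahead tree over `width` leaf blocks, with `width` a power of two, and assumes that the DA counts approximated leaf blocks starting from the LSB; both are simplifications made for illustration, and `app_signals` is a hypothetical helper name.

```python
def app_signals(da, width=8):
    """Return APP flags per level of a pairwise CLA tree.

    levels[0] holds the leaf blocks (DMCLB), LSB first; levels[k] holds the
    DMPGB blocks k levels above the leaves. `width` must be a power of two.
    """
    levels = [[1 if i < da else 0 for i in range(width)]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # A block is approximated only if both of its constituent blocks are.
        levels.append([prev[i] & prev[i + 1] for i in range(0, len(prev), 2)])
    return levels

if __name__ == "__main__":
    for level in app_signals(5):
        print(level)
```

For `da=5` and `width=8`, the leaf flags are `[1, 1, 1, 1, 1, 0, 0, 0]`; the block above leaves 4 and 5 stays accurate because leaf 5 is accurate, and the root stays accurate because part of its fan-in cone is accurate.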

This can be attributed to the architecture of the carry save adders, where approximating each bit toward the MSB results in power gating of two FAs, compared with one FA when the LSBs are approximated. The inherent error resilience of the ME and the small inputs to the DCT block provide ample opportunity to reach a high DA (much greater than 5) and thereby high power savings.


### IV. SYNTHESIS AND SIMULATION RESULTS

The proposed reconfigurable CLA unit was described in Verilog HDL and synthesized and simulated with Xilinx ISE 14.5. The RTL schematics and simulation results are shown below.



Fig: Top level schematic diagram



Fig: Internal architectures of RTL diagram


| topcla Project Status (06/09/2017 - 10:41:10) |                           |                       |             |
|-----------------------------------------------|---------------------------|-----------------------|-------------|
| Project File:                                 | Videoenc.xise             | Parser Errors:        | No Errors   |
| Module Name:                                  | topcla                    | Implementation State: | Synthesized |
| Target Device:                                | xc7z010-2dg400            | Errors:               | No Errors   |
| Product Version:                              | ISE 14.5                  | Warnings:             | No Warnings |
| Design Goal:                                  | Balanced                  | Routing Results:      |             |
| Design Strategy:                              | Xilinx Default (unlocked) | Timing Constraints:   |             |
| Environment:                                  | System Settings           | Final Timing Score:   |             |

Device Utilization Summary (estimated values)

| Logic Utilization                 | Used | Available | Utilization |
|-----------------------------------|------|-----------|-------------|
| Number of Slice LUTs              | 54   | 17600     | 0%          |
| Number of fully used LUT-FF pairs | 0    | 54        | 0%          |
| Number of bonded IOBs             | 32   | 100       | 32%         |

Detailed Reports

| Report Name      | Status  | Generated                | Errors | Warnings | Infos          |
|------------------|---------|--------------------------|--------|----------|----------------|
| Synthesis Report | Current | Fri 9. Jun 10:41:07 2017 | 0      | 0        | 1 Info (0 new) |

Fig: Synthesis report



Fig: Simulation result


### V. CONCLUSION

In this paper, we have proposed a reconfigurable approximate architecture for MPEG encoders that minimizes power consumption while maintaining output quality across different input videos. The proposed architecture is based on the concept of dynamically reconfiguring the level of approximation in the hardware according to the input characteristics. It requires the user to specify only the overall minimum quality for videos instead of having to decide the level of hardware approximation. Synthesis and simulation in Xilinx ISE 14.5 using Verilog HDL indicate that the proposed architecture results in power savings equivalent to a baseline approach that uses fixed approximate hardware while respecting quality constraints across different videos.

### VI. FUTURE SCOPE

Future work includes incorporating other approximation techniques and extending the approximations to other arithmetic and functional blocks used in video encoding.


### **AUTHOR DETAILS**



**ISHRATHUNNISA BEGUM**, pursuing M.Tech (DSCE) from Sri Visvesvaraya Institute Of Technology & Science, Chowderpally (Vill), Devarkadra (Mdl), Mahabubnagar (Dist), TS, INDIA.



**T. SRIVANI**, working as Assistant professor (ECE) from Sri Visvesvaraya Institute Of Technology & Science, Chowderpally (Vill), Devarkadra (Mdl), Mahabubnagar (Dist), TS, INDIA.
