# Implementation of MAC Unit for Artificial Neural Network Architecture Using Verilog HDL

Mr. G. V. HANUMAN, M. Tech<sup>(1)</sup>, B Subhashini<sup>(2)</sup>, SK Ayeesha<sup>(3)</sup>, SK Abdul Kalam<sup>(4)</sup>, SK Asha<sup>(5)</sup>

<sup>(1)</sup> Assistant Professor, Department of Electronics and Communication Engineering, Tirumala Engineering College, <sup>(2), (3), (4), (5)</sup> Department of Electronics and Communication Engineering, Tirumala Engineering College, JNTUK KAKINADA, India

Abstract: The processing unit is the most essential part of an artificial neural network (ANN). It performs the complex parallel computations required for the efficient operation of the neuron, which together with the activation unit makes up an artificial neural network. In the existing system, the MAC processing unit is built from a Booth multiplier and a carry look-ahead adder; this unit incurs high delay and consumes more area and power. To address these drawbacks, we propose an efficient MAC (Multiplication and Accumulation) unit that can be used as the processing unit in an ANN. It uses a Vedic multiplier with a Square Root Carry Select Adder (SQRT-CSLA). Our design aims to outperform the existing implementation in area and delay. An efficient MAC processing unit can substantially improve the speed of an ANN, so the proposed approach overcomes the shortcomings of the existing system and improves overall network performance. The proposed MAC unit was designed and implemented in Verilog HDL, and the results were analyzed.

Keywords—Artificial neural network(ANN), MAC, Vedic multiplier, SQRT-CSLA, Booth multiplier, Verilog HDL.

## **I. INTRODUCTION**

MAC stands for multiply-accumulate, a fundamental arithmetic operation, and the multiply-accumulate unit is the essential building block of many digital signal processing (DSP) systems. Real-time processing demands high capacity and speed. The MAC unit strongly influences both the energy consumption and the speed of the system, and so plays an important role in signal processing hardware. For future DSP applications, it is essential to develop high-speed, low-power MAC units. A WSN (Wireless Sensor Network) is a network of spatially distributed sensor nodes that collect data and transmit it wirelessly. Essential operations in digital signal processing generally involve many multiplications, additions, and accumulations. A high-speed multiplier-accumulator (MAC) unit is therefore an essential part of high-performance, real-time digital signal processing.

The multiply-accumulate (MAC) operation adds the product of two integers to an accumulator. Convolutional Neural Network (CNN) processing consists of three types of layers: a convolution layer, a pooling layer, and a fully connected layer.
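As a brief illustration of the MAC operation described above, one accumulation step and its repeated use can be written as follows. The symbols $a_i$, $b_i$, $\mathrm{acc}_i$ and $N$ are notation assumed here for illustration and are not taken from the original design.

$$
\mathrm{acc}_i = \mathrm{acc}_{i-1} + a_i \cdot b_i, \qquad \mathrm{acc}_0 = 0
\quad\Longrightarrow\quad
\mathrm{acc}_N = \sum_{i=1}^{N} a_i \, b_i
$$

If $a_i$ is taken to be a weight and $b_i$ the corresponding input, $\mathrm{acc}_N$ is the weighted sum that a neuron passes to its activation unit.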

Basic MAC operations make up the convolution layer, which accounts for the majority of the execution time. To study the MAC unit, one must first understand the basic ideas of the ANN (Artificial Neural Network) and the DNN (Deep Neural Network). The purpose of this work is to design and build a multiplier-accumulator (MAC) unit for high-speed digital processing.

ANNs can be divided into feed-forward and feedback networks. In a feed-forward network, the input is fed directly to the processing unit and, once processing is complete, the result is forwarded to the output unit. The output of a feed-forward network therefore depends only on the present input, not on previous ones. A feedback network differs in that its output also depends on past outputs: the output of the previous stage is fed back to the input unit. Feed-forward networks are used to develop nonlinear models for pattern recognition and classification.

## **II. LITERATURE REVIEW**

G. Raut et al. proposed RECON, a resource-efficient and configurable CORDIC-based neuron architecture. Because the CORDIC-based architecture is configurable, a single block can realize both the MAC operation and a number of activation functions.

Vamsi and Ramesh asserted that a MAC unit built with the designed multiplier could be employed in DSP applications to improve efficiency and speed. This idea could be extended in the future by using reversible logic gates instead of multipliers to obtain even greater power savings and improved performance.

## International Journal of Advance Research in Science and Engineering Volume No. 13, Issue No. 05, May 2024 www.ijarse.com ISSN 2319 - 8354

Yuvaraj et al. introduced Sampoornam, a single integrated multiplier with a unique logic block that utilizes all of the Vedic mathematics multiplication modules to achieve better delay performance.

Langer explored the approximation power of deep neural networks (DNNs) with the sigmoid activation function. DNNs have been shown to approximate any d-dimensional smooth function on a compact set at a rate of order $W^{-p/d}$, where W denotes the number of nonzero weights and p the smoothness of the function. However, these rates hold only for a subset of DNNs with sparse connections.

Simonyan and Zisserman investigated the impact of the complexity of a convolutional network on its effectiveness in large-scale image identification.

Antony et al. suggested using Verilog HDL to develop a high-speed Vedic multiplier based on the Urdhva Tiryakbhyam Sutra. Partial product summation is accomplished using high-speed MUX-based full adders. Compared with existing traditional Vedic multipliers, the proposed device has a significantly shorter delay. In the future, its efficiency in the MAC unit and ALU may be analyzed and compared with other traditional and Vedic designs.

## **III. PROCESSING UNIT (MAC)**

A MAC unit combines a multiplier, an adder, and an accumulator. The operands are fetched from memory and passed to the multiplier block, which performs the multiplication. The product is then sent to the adder, which adds it to the accumulated value, and the result is stored back in memory. The entire process completes in a single clock cycle. The basic MAC unit architecture is shown in Fig. 1.
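The dataflow described above can be sketched as a small behavioral model. This is only an illustrative Python sketch, not the Verilog design of this work: the class name `MacUnit`, the helper `dot_product`, the unsigned 8-bit operand width, and the 16-bit accumulator that wraps on overflow are all assumptions made for the example.

```python
class MacUnit:
    """Behavioral model of a multiply-accumulate unit (illustrative only)."""

    def __init__(self, in_bits=8, acc_bits=16):
        # Assumed widths: unsigned operands and a wrapping accumulator register.
        self.in_mask = (1 << in_bits) - 1
        self.acc_mask = (1 << acc_bits) - 1
        self.acc = 0  # accumulator register, reset to zero

    def clock(self, a, b):
        """Model one clock cycle: multiply, add to the accumulator, store."""
        product = (a & self.in_mask) * (b & self.in_mask)  # multiplier block
        total = self.acc + product                          # adder block
        self.acc = total & self.acc_mask                    # register update, wraps on overflow
        return self.acc


def dot_product(weights, inputs, in_bits=8, acc_bits=16):
    """Accumulate sum(w * x) by clocking one operand pair per cycle."""
    mac = MacUnit(in_bits, acc_bits)
    for w, x in zip(weights, inputs):
        mac.clock(w, x)
    return mac.acc
```

For example, `dot_product([1, 2, 3], [4, 5, 6])` accumulates 4 + 10 + 18 and returns 32. A wider accumulator would normally be chosen in practice so that long sums do not wrap.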



Fig. 1. Multiply and Accumulate (MAC)


#### a. Vedic Multiplier

Multipliers and adders play a vital role in determining the performance of an FIR filter. Earlier work proposed modified Anurupye Vedic multiplier methods with a Kogge-Stone fast adder for implementation in a direct-form FIR filter. Multipliers play a major role in today's digital signal processing and in various other applications, many of which require both signed and unsigned multiplication. This work proposes the design of an efficient signed multiplier using Vedic mathematics.



Figure 2. Block diagram of the 8\*8 Vedic multiplier

Figure 2 shows the architecture of the 8-bit Vedic multiplier. It is built from four 4 x 4 Vedic multipliers, each of which operates independently [11]. The partial products are added by 8-bit SQRT-CSLAs to produce the final 16-bit multiplication output. The design thus uses the efficient Vedic multiplication technique.

The 8-bit Vedic multiplier is designed using four 4x4 Vedic multipliers and square root carry select adders (SQRT-CSLA) [11]. Each 8-bit input is divided into two 4-bit halves. The inputs to the four 4-bit multipliers are a[7:4] & b[7:4], a[3:0] & b[7:4], a[7:4] & b[3:0], and a[3:0] & b[3:0]. The intermediate partial products are added using three modified adders of the SQRT-CSLA type.
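The decomposition above can be checked with a short behavioral sketch. It is written in Python rather than Verilog, and it makes several assumptions: the function name `vedic_multiply` is hypothetical, the same split is applied recursively down to 1-bit AND gates, and the three adders are modeled with ordinary integer addition instead of SQRT-CSLA hardware.

```python
def vedic_multiply(a, b, n=8):
    """Multiply two unsigned n-bit values (n a power of two) by splitting
    each operand into halves, forming four half-width products, and
    combining them with three additions."""
    if n == 1:
        return a & b  # a 1x1 multiplier is a single AND gate
    h = n // 2
    mask = (1 << h) - 1
    a_lo, a_hi = a & mask, a >> h
    b_lo, b_hi = b & mask, b >> h

    q0 = vedic_multiply(a_lo, b_lo, h)  # a[3:0] & b[3:0] when n = 8
    q1 = vedic_multiply(a_hi, b_lo, h)  # a[7:4] & b[3:0]
    q2 = vedic_multiply(a_lo, b_hi, h)  # a[3:0] & b[7:4]
    q3 = vedic_multiply(a_hi, b_hi, h)  # a[7:4] & b[7:4]

    s1 = q1 + q2                # adder 1: the two cross products
    s2 = s1 + (q0 >> h)         # adder 2: add the upper half of q0
    s3 = q3 + (s2 >> h)         # adder 3: add the carry-over into q3

    # Concatenate the result fields: {s3, s2[h-1:0], q0[h-1:0]}
    return (s3 << n) | ((s2 & mask) << h) | (q0 & mask)
```

The lower half of `q0` passes straight to the output, which is why only three adders are needed for four partial products.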

#### b. SQRT-CSL Adder

Low carry propagation delay and low complexity are key requirements of any addition circuit [10]. To achieve an efficient output, the proposed SQRT-CSLA structure was designed. SQRT-CSLA circuits are classified into two types according to how the carry inputs are selected: a) dual RCA based SQRT-CSLA; b) BEC based SQRT-CSLA.


In the dual RCA (Ripple Carry Adder) based SQRT-CSLA circuit, each group has a pair of RCAs that compute the results for both possible carry-in values. This duplicated RCA structure is disadvantageous because of its increased propagation delay. To overcome this problem, a Binary to Excess-1 Converter (BEC) circuit is used in the SQRT-CSLA in place of the second RCA.

Figure 3 shows the architecture of the BEC-based SQRT-CSLA, which contains BEC, RCA, and multiplexer blocks. Half adders, full adders, and multiplexers are used to produce the partial product addition results. The BEC circuits provide the same function as the RCA they replace, but use a different architecture with a lower gate count.
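The carry-select idea with a BEC can be illustrated by the following bit-level sketch. It is a behavioral Python model under stated assumptions, not the circuit of Figure 3: the function names are hypothetical, and the group widths `(2, 3, 3)` for an 8-bit adder are chosen only for the example.

```python
GROUPS = (2, 3, 3)  # assumed group widths for an 8-bit adder (illustrative)


def ripple_carry_add(a, b, cin, width):
    """Bit-by-bit ripple-carry adder; returns (sum, carry_out)."""
    total, carry = 0, cin
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return total, carry


def binary_to_excess1(value, width):
    """BEC: increment a width-bit value using an XOR/AND chain."""
    result, carry = 0, 1
    for i in range(width):
        bit = (value >> i) & 1
        result |= (bit ^ carry) << i
        carry &= bit
    return result


def sqrt_csla_add(a, b, cin=0, groups=GROUPS):
    """Carry-select addition; returns (sum, carry_out)."""
    result, carry, offset = 0, cin, 0
    for index, width in enumerate(groups):
        mask = (1 << width) - 1
        a_g, b_g = (a >> offset) & mask, (b >> offset) & mask
        if index == 0:
            # The first group is a plain RCA driven by the real carry-in.
            s, carry = ripple_carry_add(a_g, b_g, carry, width)
        else:
            # RCA computes the carry-in = 0 case; BEC derives the carry-in = 1 case.
            s0, c0 = ripple_carry_add(a_g, b_g, 0, width)
            v0 = s0 | (c0 << width)
            v1 = binary_to_excess1(v0, width + 1)
            v = v1 if carry else v0  # multiplexer selects on the incoming carry
            s, carry = v & mask, v >> width
        result |= s << offset
        offset += width
    return result, carry
```

Each later group prepares both candidate results before its carry-in arrives, so only the multiplexer sits on the carry path between groups.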



Figure 3. Architecture of the BEC-based SQRT-CSLA

## **IV. SIMULATION RESULTS**

Simulation was performed using the ModelSim XE III 6.3c simulator. Parameters such as area, delay, and power were analyzed using Xilinx ISE 10.1. The Vedic multiplier produces the same output as other multipliers while offering higher speed and accuracy. The results shown in Figure 4 cover different combinations of inputs and the outputs produced for them.



Figure 4. Simulation output

## **V. PERFORMANCE EVALUATION**


| Parameters | Existing Method | Proposed Method |
|------------|-----------------|-----------------|
| LUT        | 717             | 3               |
| Slices     | 397             | 3               |
| Delay(ns)  | 19.1            | 2.321           |

Table 1. Comparison of area and delay between the existing and proposed systems.
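The relative improvement implied by Table 1 can be worked out with a few lines of arithmetic. The sketch below uses only the values reported in the table; the dictionary keys and the helper name `percent_reduction` are introduced here for illustration.

```python
# Values copied from Table 1.
existing = {"LUT": 717, "Slices": 397, "Delay_ns": 19.1}
proposed = {"LUT": 3, "Slices": 3, "Delay_ns": 2.321}


def percent_reduction(old, new):
    """Percentage by which `new` is smaller than `old`."""
    return 100.0 * (old - new) / old


reductions = {key: percent_reduction(existing[key], proposed[key]) for key in existing}
speedup = existing["Delay_ns"] / proposed["Delay_ns"]

for name, value in reductions.items():
    print(f"{name}: {value:.1f}% reduction")
print(f"Delay speedup: {speedup:.2f}x")
```

With the tabulated figures, the delay falls by roughly 88 percent, which corresponds to a speedup of about 8.2 times.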

## **VI. CONCLUSION AND FUTURE SCOPE**

Artificial neural networks are used in many applications. The MAC unit is one of the processing units in an artificial neural network, and it determines whether the output function is computed efficiently. We therefore designed a new MAC unit using a Vedic multiplier with SQRT-CSLA. It produces accurate and efficient output compared with the existing Booth multiplier with carry look-ahead adder. Our proposed MAC unit increases the speed of the neural network.

When the MAC operation performs well, the performance of the entire network also improves. The design is assessed in terms of resource utilization reports, on-chip power reports, and critical delay reports. From the above table it is clear that as the precision increases through 4-bit, 8-bit, 12-bit, and 16-bit, the physical parameters in the resource utilization reports, such as logic slices, slice LUTs, slice registers, and DSPs, also increase. In the on-chip power reports, the logic power, signal power, and I/O power increase as the precision increases. The circuit's critical delay report shows that the input delay is the same for 4-bit, 8-bit, and 12-bit precision but increases for 16-bit precision. Similarly, other parameters such as path delay, logic delay, and route delay have nearly the same values for 4-bit, 8-bit, and 12-bit precision but differ for 16-bit precision.

This MAC unit was also verified on the Zybo board for the different precisions and performs better, as the results in the table show. The multiplier operates with low delay, high computational speed, and low power consumption. Resource utilization on the FPGA board will decrease if the number of components is reduced further. If this design can be used as an image processing unit, there will be considerable scope for implementing image processing algorithms. If machine learning techniques are incorporated in the future, these kinds of multiplication algorithms will be very helpful for implementing specific tasks.


In the future, the multiplier circuit can be designed using reversible logic gates, which consume less power than ordinary logic gates. Applying this technique to the neural network is expected to give better results.

#### REFERENCES

[1] V. Gupta, D. Mohapatra, A. Raghunathan, and K. Roy, "Low-power digital signal processing using approximate adders," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 32, no. 1, pp. 124-137, Jan. 2013.

[2] E. J. King and E. E. Swartzlander, Jr., "Data-dependent truncation scheme for parallel multipliers," in Proc. 31st Asilomar Conf. Signals, Circuits Syst., Nov. 1998, pp. 1178-1182.

[3] K.-J. Cho, K.-C. Lee, J.-G. Chung, and K. K. Parhi, "Design of low-error fixed-width modified Booth multiplier," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 5, pp. 522-531, May 2004.

[4] H. R. Mahdiani, A. Ahmadi, S. M. Fakhraie, and C. Lucas, "Bio-inspired imprecise computational blocks for efficient VLSI implementation of soft-computing applications," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 4, pp. 850-862, Apr. 2010.

[5] A. Momeni, J. Han, P. Montuschi, and F. Lombardi, "Design and analysis of approximate compressors for multiplication," IEEE Trans. Comput., vol. 64, no. 4, pp. 984-994, Apr. 2015.

[6] S. Narayanamoorthy, H. A. Moghaddam, Z. Liu, T. Park, and N. S. Kim, "Energy-efficient approximate multiplication for digital signal processing and classification applications," IEEE

[7] G. Zervakis, K. Tsoumanis, S. Xydis, D. Soudris, and K. Pekmestzi, "Design-efficient approximate multiplication circuits through partial product perforation," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 24, no. 10, pp. 3105-3117,

[8] P. Kulkarni, P. Gupta, and M. D. Ercegovac, "Trading accuracy for power in a multiplier architecture," J. Low Power Electron., vol. 7, no. 4, pp. 490-501, 2011.

[9] C.-H. Lin and C. Lin, "High accuracy approximate multiplier with error correction," in Proc. IEEE 31st Int. Conf. Comput. Design, Sep. 2013, pp. 33-38.

[10] C. Liu, J. Han, and F. Lombardi, "A low-power, high performance approximate multiplier with configurable partial error recovery," in Proc. Conf. Exhibit. (DATE), 2014, pp. 1-4.


[11] R. Venkatesan, A. Agarwal, K. Roy, and A. Raghunathan, "MACACO: Modeling and analysis of circuits for approximate computing," in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD), Oct. 2011, pp. 667-673.

[12] J. Liang, J. Han, and F. Lombardi, "New metrics for the reliability of approximate and probabilistic adders," IEEE Trans. Comput., vol. 63, no. 9, pp. 1760-1771, Sep. 2013.

[13] S. Suman et al., "Image enhancement using geometric mean filter and gamma correction for WCE images," in Proc. 21st Int. Conf. Neural Inf. Process. (ICONIP), 2014, pp. 276-283.



Mr. G. V. HANUMAN, M.Tech, working as Assistant Professor in the department of Electronics and Communication Engineering, Tirumala Engineering College, Jonnalagadda, Narasaraopet, Palnadu District.







SK Ayeesha, Student in the department of Electronics and Communication Engineering, Tirumala Engineering College, Jonnalagadda, Narasaraopet, Palnadu District.




SK Abdul Kalam, Student in the department of Electronics and Communication Engineering, Tirumala Engineering College, Jonnalagadda, Narasaraopet, Palnadu District.



SK Asha, Student in the department of Electronics and Communication Engineering, Tirumala Engineering College, Jonnalagadda, Narasaraopet, Palnadu District.
