

# **Design of a VLSI Router with Buffer for Fast Data Transfer**

<sup>(1)</sup>I Venkata Ganga Srujana, <sup>(2)</sup>Ch Prathibha Chowdary, <sup>(3)</sup>G Sai Sri Krishna Teja, <sup>(4)</sup>B Charan Singh, <sup>(5)</sup>B Naga Sri, <sup>(6)</sup>Dr. Y. V. NARAYANA, <sup>(7)</sup>Mr. T. ANJI REDDY.

<sup>(1)(2)(3)(4)(5)(6)(7)</sup>Department of Electronics and Communication Engineering, Tirumala Engineering College, Narasaraopet.

Abstract: On-chip interconnects are the performance bottleneck in modern systems-on-chip. Code-division multiple access (CDMA) has been proposed for implementing on-chip crossbars because of its fixed latency, reduced arbitration overhead, and higher bandwidth. In this paper, we advance the overloaded CDMA interconnect (OCI) to enhance the capacity of CDMA network-on-chip (NoC) crossbars by increasing the number of usable spreading codes. Serial OCI and parallel OCI (P-OCI) architecture variants are presented to meet different area, delay, and power requirements. The parallel OCI crossbar achieves N times higher bandwidth than the serial OCI crossbar at the expense of increased area and power consumption. This extension results in high-speed P-OCI and serial OCI variants relative to the previously proposed P-OCI and serial OCI architectures, respectively.

Keywords: Code-division multiple access (CDMA), CDMA transmitter and CDMA receiver, network-on-chip (NoC), overloaded CDMA crossbar

#### **I. INTRODUCTION**

Developing effective high-performance on-chip interconnects has been crucial for the implementation of parallel and high-performance computing technologies, because on-chip communications have a significant impact on the overall area, performance, and power consumption of modern systems-on-chip (SoCs). Amdahl's law states that increasing communication overhead degrades the speedup achieved by parallel computing [1]. Networks-on-chip (NoCs) are the most scalable interconnection paradigm capable of meeting the different performance requirements of heavy workloads [2], including latency via adaptive routing [3], throughput via improved path diversity [4], power dissipation by optimizing the NoC for targeted workloads [5], and flexibility by run-time configuration [6].

On-chip processing elements (PEs) are treated as network nodes connected by routers and switches, while data in NoCs are handled as packets. NoCs offer a scalable solution for large SoCs, but they come with high resource overheads and greater power consumption [7]. The NoC layering model divides a transaction into four layers: application, transport, network, and physical [8]. The fundamental unit of the NoC physical layer is a crossbar.

A crossbar switch is a shared communication channel that uses multiple access to facilitate the interchange of physical packets. The primary resource sharing techniques used by current NoC crossbars are time-division multiple access (TDMA), in which the physical link is time shared between the interconnected PEs [9], and space-division multiple access (SDMA), in which a dedicated link is established between every pair of interconnected PEs [10]. An NoC router's physical layer additionally contains storage and buffering devices [7]. Code-division multiple access (CDMA) is another medium sharing method, which uses the code space to provide simultaneous medium access. Each transmit-receive (TX-RX) pair in a CDMA channel is assigned a unique bipolar spreading code, and the spread data from all transmitters are added together to create an additive communication channel. Since orthogonal spreading codes have zero cross correlation in standard CDMA systems, the received sum can be correctly decoded by a correlator decoder at the CDMA receiver. Classical CDMA systems use Walsh-Hadamard orthogonal codes to allow medium sharing. CDMA has been suggested as an on-chip interconnect sharing approach for both bus and NoC interconnect topologies [11].

Reduced power consumption, fixed communication delay, and decreased system complexity are just a few benefits of using CDMA for on-chip interconnects [12]. A CDMA switch offers a good compromise between SDMA and TDMA, since it has less wiring complexity than an SDMA crossbar and less arbitration overhead than a TDMA switch. However, the on-chip interconnect literature has primarily examined only the fundamental aspects of CDMA technology.

Overloaded CDMA is a well-known medium access technique in wireless communications, in which the number of users sharing the communication channel is boosted by raising the number of usable spreading codes at the expense of increased multiple-access interference (MAI) [13]. Applying the overloaded CDMA idea to on-chip interconnects can increase their capacity. In previous work, we applied the overloaded CDMA idea to CDMA-based on-chip buses and proposed two methods, MAI-based and difference-based overloaded CDMA interconnects, which boost the bus capacity by 25% and 50%, respectively [14], [15]. In this article, we apply the overloaded CDMA idea to NoCs and propose an original overloaded CDMA interconnect (OCI) crossbar design that improves the CDMA router capacity by 100% at minimal cost.

#### **II. RELATED WORK**

Crossbar switches that use CDMA as their medium access mechanism benefit from predictable transaction latency and minimal arbitration overhead. Nikolic et al. [16] proposed a scalable CDMA-based peripheral bus that reduces the number of PTP buses and parallel transfer lines while avoiding the overhead caused by TDMA arbiters. Because fewer lines are used to add and transmit the data from the peripherals, this method lowers the number of pins at the interface connecting multiple peripherals to multiple PEs. Data spreading increases the transaction latency, but this is acceptable because peripherals typically run at lower frequencies than master PEs.

In the CT-Bus, CDMA and TDMA are merged so that data are multiplexed over both the time and code domains [12]. The CT-Bus shows that the communication overhead of CDMA is lower than that of TDMA, because the TDMA controller must carry out arbitration every clock cycle whereas the CDMA bus controller is only required to assign spreading codes. Because the CT-Bus combines the scalability of the TDMA bus with the continuity of the CDMA channel, it performs better than its TDMA equivalent for heterogeneous traffic. In [17], a comparison between a PTP bidirectional ring-based NoC and a CDMA-based NoC indicates that the fixed data transfer latency of the CDMA NoC equals the best-case latency of the PTP NoC with the same channel width. The fixed data transfer latency of the CDMA NoC results from the concurrent sharing of the interconnect by the network nodes. In [18] and [19], a hierarchical CDMA star NoC router is presented.

The design of an NoC router requires an in-depth understanding of variables including topology, flow control, switching methods, and routing strategies. A network's topology defines the way its nodes and channels are connected. Several topologies have been proposed, including mesh, torus, star, spin, butterfly, and others. In this architecture, the mesh topology is favoured because it is straightforward, simple to integrate, and its addressing is in line with the communication paths. Low latency, high throughput, low power consumption, low cost, and high performance are all important features of an appropriate architecture, but achieving all of these goals together is difficult.





Fig. 1. Flow chart of round robin algorithm

The Round Robin (RR) algorithm is widely used in many industries for its fairness and simplicity in enhancing system performance. Its use in CPU scheduling and cloud computing has been reviewed, along with techniques for improving the algorithm. Researchers have proposed ways to optimize the RR algorithm, including the selection of an ideal time quantum.

The switching technique, which can be divided into packet switching and circuit switching, is the process of connecting a router's input and output. While packet switching transmits data as soon as it becomes available, independent of the path, circuit switching transmits data packets only once the path has been established. A routing strategy is a method that guides data packets to their destination while avoiding situations such as starvation, deadlock, and livelock.

Flow control governs the allocation of resources to data packets [20]. Excessive packet waiting indicates a poor flow control technique, while deadlock avoidance indicates a good one. The router's functionality can be analyzed by following a data packet's transit through it.

IJARSE ISSN 2319 - 8354





#### A. FIFO

With the FIFO (first in, first out) method, the earliest work request in a queue is processed first. In hardware, a FIFO can be implemented as a read/write memory or an array of flip-flops that stores data from one clock domain and provides it to another clock domain on request, in FIFO order [21,23]. A FIFO contains read, write, empty, full, memory map, counter, and two pointer counter blocks. It has two data pointers: one for writing to RAM and the other for reading from RAM. The FIFO first checks the header bit to validate the presence of data. The grant signal for a particular port is used to update the read and write addresses.
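The pointer-based behaviour of such a buffer can be sketched in software. The following Python model is a minimal illustration, not the implemented RTL: the class name `Fifo`, the default depth of 8, and the occupancy counter used to derive the empty and full flags are assumptions made for the example, and header-bit checking and grant handling are omitted.

```python
class Fifo:
    """Software model of a FIFO with separate read and write pointers."""

    def __init__(self, depth: int = 8):
        self.depth = depth
        self.mem = [0] * depth   # storage array (RAM or flip-flops in hardware)
        self.wr_ptr = 0          # next location to write
        self.rd_ptr = 0          # next location to read
        self.count = 0           # number of words currently stored

    @property
    def empty(self) -> bool:
        return self.count == 0

    @property
    def full(self) -> bool:
        return self.count == self.depth

    def write(self, word: int) -> bool:
        """Store a word; return False (word not stored) when the FIFO is full."""
        if self.full:
            return False
        self.mem[self.wr_ptr] = word
        self.wr_ptr = (self.wr_ptr + 1) % self.depth  # wrap around
        self.count += 1
        return True

    def read(self):
        """Return the oldest stored word, or None when the FIFO is empty."""
        if self.empty:
            return None
        word = self.mem[self.rd_ptr]
        self.rd_ptr = (self.rd_ptr + 1) % self.depth  # wrap around
        self.count -= 1
        return word
```

Words leave the model in the order in which they were written, and writes to a full buffer are refused rather than overwriting stored data; a hardware design might instead stall the sender.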

#### B. Arbiter

The arbiter is the router's control center. It executes the round robin scheduling algorithm, which serves the data from each FIFO buffer for a specific period of time. In computer science and operating systems, the round robin scheduling algorithm is used to split CPU time among programs in a time-sharing manner. The algorithm gives the CPU to each task in cyclic order, allotting an equal time slot to each process, hence the name round robin. This prevents a single process from monopolizing the CPU and gives each process a fair share of CPU time. In real-time systems, the round robin algorithm can be used when tasks must be completed within a specific period of time [24,29]. It offers a fast and simple way of distributing resources, reducing starvation, and enhancing overall system performance.
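A minimal Python sketch of a round robin arbiter over the request lines of the router's FIFO buffers is shown here. It is illustrative only: the class name `RoundRobinArbiter`, the five-port default, and the choice of a one-cycle time slot per grant are assumptions, not details taken from the implemented design.

```python
class RoundRobinArbiter:
    """Grants one requesting port per call, rotating priority cyclically."""

    def __init__(self, n_ports: int = 5):
        self.n_ports = n_ports
        self.last = n_ports - 1  # port 0 gets the highest priority first

    def grant(self, requests):
        """requests: sequence of booleans, one per port.

        Returns the index of the granted port, or None if nobody requests.
        """
        if len(requests) != self.n_ports:
            raise ValueError("one request flag per port is required")
        # Start searching just after the most recently granted port.
        for offset in range(1, self.n_ports + 1):
            port = (self.last + offset) % self.n_ports
            if requests[port]:
                self.last = port
                return port
        return None
```

Because the search always starts after the last granted port, a port that keeps requesting cannot be served twice before every other requesting port has been served once, which is the fairness property described above.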

#### C. Crossbar

A crossbar is a module that combines multiplexers and demultiplexers to establish a connection between an input port and an output port. This design does not have any input. The crossbar can establish only one link at a time. The inputs are Cin, Ein, Nin, Sin, and Win, and the corresponding binary outputs are Cout, Eout, Nout, Sout, and Wout. Both the inputs and the outputs are eight bits wide. Because the select line is reduced to two lines, four selections are possible with this design. Buffers have been reported to help reduce latency [24]. On this basis, the design is modified with an additional buffer at each output port of the router.
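The selection logic can be illustrated with a short Python sketch. It assumes, hypothetically, that each output port uses its two select lines to choose among the four input ports other than its own; the port ordering, the function names, and this exclusion rule are assumptions for illustration and are not confirmed by the design description.

```python
PORTS = ("C", "E", "N", "S", "W")  # centre, east, north, south, west


def candidates(out_port: str):
    """Input ports selectable by an output port (assumed: all but its own)."""
    if out_port not in PORTS:
        raise ValueError(f"unknown port {out_port!r}")
    return [p for p in PORTS if p != out_port]


def crossbar_output(out_port: str, select: int, inputs: dict) -> int:
    """Return the 8-bit word routed to out_port for a 2-bit select value."""
    if not 0 <= select <= 3:
        raise ValueError("select must fit in two bits")
    source = candidates(out_port)[select]
    return inputs[source] & 0xFF  # data paths are eight bits wide
```

With two select lines per output, exactly four sources can be addressed, which matches the four selections mentioned above.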

#### **III. METHODOLOGY**

This section introduces overloaded CDMA in wireless communications, the requirements of its on-chip interconnect counterpart, and the preliminaries of the classical on-chip CDMA crossbar.

#### A. Overloaded CDMA in Wireless Communications:

CDMA, or code-division multiple access, is a digital cellular technique that lets multiple users share the same frequency band at once. Whereas other cellular technologies such as GSM use time-division or frequency-division multiple access, CDMA uses unique codes to distinguish between users. Each user is assigned a different code sequence, and each user's signal is spread across the whole frequency band. To separate and decode the desired signal from the received mixture of signals, CDMA receivers employ correlation algorithms.

As a result of this signal spreading, CDMA networks attain higher capacity and better resilience to interference than competing technologies. CDMA has been employed extensively in 2G, 3G, and certain 4G cellular networks to provide dependable and effective communication services.
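The spreading and correlation steps can be illustrated with a small Python sketch. It assumes bipolar Walsh-Hadamard codes and an ideal, noise-free additive channel; the function names are hypothetical and chosen for the example.

```python
def walsh_codes(n: int):
    """Rows of the n x n Sylvester Hadamard matrix with +1/-1 entries."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    return [
        [-1 if bin(r & c).count("1") % 2 else 1 for c in range(n)]
        for r in range(n)
    ]


def spread(bit: int, code):
    """Map the bit to +1/-1 and multiply it by every chip of the code."""
    symbol = 1 if bit else -1
    return [symbol * chip for chip in code]


def channel(signals):
    """Additive channel: chip-wise sum of all transmitted signals."""
    return [sum(chips) for chips in zip(*signals)]


def despread(received, code) -> int:
    """Correlate the received sum with one user's code and decide the bit."""
    correlation = sum(r * c for r, c in zip(received, code))
    return 1 if correlation > 0 else 0


codes = walsh_codes(4)
bits = [1, 0, 1, 1]
received = channel([spread(b, c) for b, c in zip(bits, codes)])
decoded = [despread(received, c) for c in codes]
```

Because the codes are mutually orthogonal, the contributions of the other users cancel in each correlation, so `decoded` equals `bits` in this noise-free setting.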



Fig. 4. CDMA receiver

Fig. 5. CDMA crossbar

In computing systems, the accumulator register is essential to data exchange and transfer. Within a CPU, the accumulator temporarily holds data being processed by the arithmetic and logic unit (ALU) during data transfer, which makes it easier to manipulate and transport data efficiently.

#### B. Classical CDMA Crossbar Switch:

Fig. 3 and Fig. 4 illustrate the high-level architecture of a CDMA-based NoC router. The physical layer of the router is based on the classical CDMA switch, in which each data bit is spread over the $N$ chips of its spreading code, one chip per clock cycle. Therefore, the crossbar transaction frequency $f_t$ and the operating clock frequency $f_c$ are related as $f_t = f_c/N$.
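A minimal Python sketch of the classical crossbar data path, following the channel sum $S(i)$ of (1), is given here. Several details are assumptions rather than facts from this paper: binary Walsh codes are generated with the Sylvester Hadamard construction, the all-zero code is left unused (consistent with $M = N - 1$), and the decoder uses two accumulators that collect $S(i)$ over the chips where the receiver's code is 0 and 1, respectively. All function names are hypothetical.

```python
def walsh_bit(k: int, i: int) -> int:
    """Chip i of binary Walsh code k (0/1 form of the Hadamard matrix)."""
    return bin(k & i).count("1") % 2


def channel_sums(data_bits, n: int):
    """Channel sums S(i), i = 0..n-1, for encoders using codes 1..M."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two")
    if len(data_bits) > n - 1:
        raise ValueError("at most n - 1 encoders are supported")
    return [
        # Each encoder XORs its data bit with the current chip of its code.
        sum(d ^ walsh_bit(j, i) for j, d in enumerate(data_bits, start=1))
        for i in range(n)
    ]


def decode(sums, k: int) -> int:
    """Recover the bit sent with code k from the n channel sums."""
    acc0 = sum(s for i, s in enumerate(sums) if walsh_bit(k, i) == 0)
    acc1 = sum(s for i, s in enumerate(sums) if walsh_bit(k, i) == 1)
    return 1 if acc0 > acc1 else 0


def transaction_frequency(f_c: float, n: int) -> float:
    """f_t = f_c / N: one data bit per N clock cycles."""
    return f_c / n


N = 8
data = [1, 0, 1, 1, 0, 0, 1]            # M = N - 1 = 7 data bits
sums = channel_sums(data, N)            # one sum per clock cycle
recovered = [decode(sums, k) for k in range(1, N)]
```

In this sketch one data bit per encoder is delivered every `N` clock cycles, which is the relation $f_t = f_c/N$, and every channel sum lies between 0 and `N - 1`, so it fits in $\log_2 N$ bits.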

The channel sum of the classical CDMA crossbar is

$$S(i) = \sum_{j=1}^{M} d(j) \oplus C_o(j,i) \tag{1}$$

where $S(i)$ is an $m$-bit binary number representing the channel sum at the $i$<sup>th</sup> clock cycle, the crossbar width is $m = \lceil \log_2 M \rceil$, $d(j)$ is the data bit from the $j$<sup>th</sup> encoder, $C_o(j, i)$ is the $i$<sup>th</sup> chip of the $j$<sup>th</sup> orthogonal spreading code, and $\oplus$ is the XOR operation. In the ordinary CDMA crossbar, the adder has $M = N - 1$ input bits and $m = \lceil \log_2 M \rceil = \log_2 N$ output bits.

#### **IV. RESULTS & DISCUSSION**

The simulation waveforms of the proposed design are shown in Fig. 4. The input data are generated by the system itself. The clock pin controls the data flow between the input and the output: data are transferred from the data input (DataIn) to the data output (Dataout) when the clock is "0", and no data are transferred while it is "1". The data are cleared whenever the value of reset (RES) is "1".

#### **V. POWER AND TIME REPORT**

The design is implemented using Xilinx software on an Artix-7 FPGA board of the XC7A200 family. The board provides a total of 269200 flip-flops, 134600 LUT elements, 676 I/O pins, and 365 block RAMs. The table below compares the power consumption, timing analysis, and area of the existing and proposed designs.

| Parameter             | Existing | Proposed |
|-----------------------|----------|----------|
| Power consumption (W) | 0.988    | 4.420    |
| Timing analysis (ns)  | 10.08    | 4.233    |
| Area (LUT)            | 42       | 15       |
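As a cross-check, the figures in the comparison table can be reduced to a few derived quantities. The Python snippet below only restates the table's values; the variable names are illustrative.

```python
# Values copied from the comparison table (existing vs. proposed design).
existing = {"power_w": 0.988, "timing_ns": 10.08, "area_lut": 42}
proposed = {"power_w": 4.420, "timing_ns": 4.233, "area_lut": 15}

# Absolute and relative timing improvement.
timing_gain_ns = existing["timing_ns"] - proposed["timing_ns"]
speedup = existing["timing_ns"] / proposed["timing_ns"]

# Power grows, area shrinks.
power_ratio = proposed["power_w"] / existing["power_w"]
lut_saving = 1 - proposed["area_lut"] / existing["area_lut"]

print(f"timing reduction: {timing_gain_ns:.3f} ns")
print(f"speedup:          {speedup:.2f}x")
print(f"power ratio:      {power_ratio:.2f}x")
print(f"LUT saving:       {lut_saving:.1%}")
```

The timing reduction evaluates to 5.847 ns, while the power ratio shows that the gain in speed and area is paid for with a power figure several times that of the existing design.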

#### **VI. CONCLUSION**

The purpose of this work is to maximize the performance of a VLSI-based router architecture. The comparison table allows the proposed router model to be compared with the existing one. Compared with the existing model, the proposed approach reduces the reported timing by 5.847 ns, from 10.08 ns to 4.233 ns. The design achieves the target of higher performance, although power consumption rises from 0.988 W to 4.420 W. The architecture of the CDMA block can be optimized further by using additional techniques.

#### **VII. REFERENCES**

- [1] K. Asanovic et al., "The landscape of parallel computing research: A view from Berkeley," Dept. EECS, Univ. California, Berkeley, CA, USA, Tech. Rep. UCB/EECS-2006-183, 2006.
- [2] P. Bogdan, "Mathematical modeling and control of multifractal workloads for data-center-on-a-chip optimization," in Proc. 9th Int. Symp. Netw.-Chip, New York, NY, USA, 2015, pp. 21:1–21:8.
- [3] Z. Qian, P. Bogdan, G. Wei, C.-Y. Tsui, and R. Marculescu, "A traffic-aware adaptive routing algorithm on a highly reconfigurable network-on-chip architecture," in Proc. 8th IEEE/ACM/IFIP Int. Conf. Hardw./Softw. Codesign, Syst. Synth., New York, NY, USA, Oct. 2012, pp. 161–170.
- [4] Y. Xue and P. Bogdan, "User cooperation network coding approach for NoC performance improvement," in Proc. 9th Int. Symp. Netw.-Chip, New York, NY, USA, Sep. 2015, pp. 17:1–17:8.
- [5] T. Majumder, X. Li, P. Bogdan, and P. Pande, "NoC-enabled multicore architectures for stochastic analysis of biomolecular reactions," in Proc. Design, Autom. Test Eur. Conf. Exhibit. (DATE), San Jose, CA, USA, Mar. 2015, pp. 1102–1107.

- [6] S. J. Hollis, C. Jackson, P. Bogdan, and R. Marculescu, "Exploiting emergence in on-chip interconnects," IEEE Trans. Comput., vol. 63, no. 3, pp. 570–582, Mar. 2014.
- [7] S. Kumar et al., "A network on chip architecture and design methodology," in Proc. IEEE Comput. Soc. Annu. Symp. (VLSI), Apr. 2002, pp. 105–112.
- [8] T. Bjerregaard and S. Mahadevan, "A survey of research and practices of network-on-chip," ACM Comput. Surv., vol. 38, no. 1, 2006, Art. no. 1.
- [9] Y. Xue, Z. Qian, G. Wei, P. Bogdan, C. Y. Tsui, and R. Marculescu, "An efficient network-on-chip (NoC) based multicore platform for hierarchical parallel genetic algorithms," in Proc. 8th IEEE/ACM Int. Symp. Netw.-Chip (NoCS), Sep. 2014, pp. 17–24.
- [10] D. Kim, K. Lee, S.-J. Lee, and H.-J. Yoo, "A reconfigurable crossbar switch with adaptive bandwidth control for networks-on-chip," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2005, pp. 2369–2372.
- [11] R. H. Bell, C. Y. Kang, L. John, and E. E. Swartzlander, "CDMA as a multiprocessor interconnect strategy," in Proc. Conf. Rec. 35th Asilomar Conf. Signals, Syst. Comput., vol. 2. Nov. 2001, pp. 1246–1250.
- [12] B. C. C. Lai, P. Schaumont, and I. Verbauwhede, "CT-bus: A heterogeneous CDMA/TDMA bus for future SOC," in Proc. Conf. Rec. 35th Asilomar Conf. Signals, Syst. Comput., vol. 2. Nov. 2004, pp. 1868–1872.
- [13] S. A. Hosseini, O. Javidbakht, P. Pad, and F. Marvasti, "A review on synchronous CDMA systems: Optimum overloaded codes, channel capacity, and power control," EURASIP J. Wireless Commun. Netw., vol. 1, pp. 1–22, Dec. 2011.
- [14] K. E. Ahmed and M. M. Farag, "Overloaded CDMA bus topology for MPSoC interconnect," in Proc. Int. Conf. ReConFigurable Comput. FPGAs (ReConFig), Dec. 2014, pp. 1–7.
- [15] K. E. Ahmed and M. M. Farag, "Enhanced overloaded CDMA interconnect (OCI) bus architecture for on-chip communication," in Proc. IEEE 23rd Annu. Symp. High-Perform. Interconnects (HOTI), Aug. 2015, pp. 78–87.
- [16] T. Nikolic, G. Djordjevic, and M. Stojcev, "Simultaneous data transfers over peripheral bus using CDMA technique," in Proc. 26th Int. Conf. Microelectron. (MIEL), May 2008, pp. 437–440.

- [17] X. Wang, T. Ahonen, and J. Nurmi, "Applying CDMA technique to network-on-chip," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 15, no. 10, pp. 1091–1100, Oct. 2007.
- [18] D. Kim, M. Kim, and G. E. Sobelman, "CDMA-based network-on-chip architecture," in Proc. IEEE Asia–Pacific Conf. Circuits Syst., vol. 1. Dec. 2004, pp. 137–140.
- [19] D. Kim, M. Kim, and G. E. Sobelman, "Design of a high-performance scalable CDMA router for on-chip switched networks," in Proc. Int. SoC Des. Conf, Nov. 2005, pp. 32–35.
- [20] Venkataraman, N. L., Rajagopal Kumar, and P. Mohamed Shakeel. "Ant lion optimized bufferless routing in the design of low power application specific network on chip." Circuits, Systems, and Signal Processing 39.2 (2020): 961-976.
- [21] S. Madhavan and H. P. V, "Design and Verification of 1X5 ROUTER," 2022 IEEE 2nd Mysore Sub Section International Conference (MysuruCon), Mysuru, India, 2022, pp. 1-6, doi: 10.1109/MysuruCon55714.2022.9972633.
- [22] J. A. Williams, N. W. Bergmann and X. Xie, "FIFO communication models in operating systems for reconfigurable computing," 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM'05), Napa, CA, USA, 2005, pp. 277-278, doi: 10.1109/FCCM.2005.35.
- [23] J. T. Han, Y. Guan and Z. Dai, "Implementation of SoC-PC Communication Interface Based on USB2.0," 2009 International Conference on New Trends in Information and Service Science, Beijing, China, 2009, pp. 831-834, doi: 10.1109/NISS.2009.53.
- [24] M. Oveis-Gharan and G. N. Khan, "Index-Based Round-Robin Arbiter for NoC Routers," 2015 IEEE Computer Society Annual Symposium on VLSI, Montpellier, France, 2015, pp. 62-67, doi: 10.1109/ISVLSI.2015.27.
- [25] A. Mangukia, M. Ibrahim, S. Golamudi, N. Kumar and M. Anand Kumar, "Improved Variable Round Robin Scheduling Algorithm," 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT), Kharagpur, India, 2021, pp. 1-7, doi: 10.1109/ICCCNT51525.2021.9579716.

- [26] W. Ullah and M. A. Shah, "A novel resilent round robin algorithm based CPU scheduling for efficient CPU utilization," Competitive Advantage in the Digital Economy (CADE 2022), Hybrid Conference, Venice, Italy, 2022, pp. 41-48, doi: 10.1049/icp.2022.2038.
- [27] Hassan, Syed Minhaj, and Sudhakar Yalamanchili. "Centralized buffer router: A low latency, low power router for high radix nocs." 2013 Seventh IEEE/ACM International Symposium on Networks-on-Chip (NoCS). IEEE, 2013.
- [28] B. Zhao, Y. Zhang and J. Yang, "A speculative arbiter design to enable high-frequency many-VC router in NoCs," 2013 Seventh IEEE/ACM International Symposium on Networks-on-Chip (NoCS), Tempe, AZ, USA, 2013, pp. 1-8, doi: 10.1109/NoCS.2013.6558415.
- [29] G. Xiaopeng, Z. Zhe and L. Xiang, "Round Robin Arbiters for Virtual Channel Router," The Proceedings of the Multiconference on "Computational Engineering in Systems Applications", Beijing, China, 2006, pp. 1610-1614, doi: 10.1109/CESA.2006.4281893.