International Journal of Advance Research in Science and Engineering (IJARSE), Volume No.06, Issue No. 10, October 2017, www.ijarse.com, ISSN: 2319-8354

### **Design of Hybrid Hardware Blocks for FPGA Architectures**

A. Venkateswari<sup>1</sup>, K. Rajesh Kumar<sup>2</sup>

<sup>1</sup>M.Tech (DSCE) student, Sri Visvesvaraya Institute of Technology & Science, Chowderpally, Devarkadra, Mahabubnagar

<sup>2</sup>Assistant Professor (ECE), Sri Visvesvaraya Institute of Technology & Science, Chowderpally, Devarkadra, Mahabubnagar

#### **Abstract**

Hybrid configurable logic block (CLB) architectures for FPGAs that combine lookup tables (LUTs) and hardened multiplexers (MUXs) are evaluated with the goal of higher logic density and reduced area. Two types of hybrid CLB architecture are considered: nonfracturable and fracturable. Both are evaluated across a range of MUX:LUT logic element ratios using front-end synthesis and technology mapping, followed by VPR for packing, placement, routing, and architecture exploration. Technology mapping optimizations that target the proposed architectures are also implemented. Experimentally, we show that timing performance is only minimally affected while area is reduced for both nonfracturable and fracturable architectures. The proposed logic elements are designed and simulated with the Xilinx ISE 14.5 tool, from which the area, delay, and power consumption of the design are obtained.

Keywords: FPGA, CLB, multiplexer (MUX), lookup table (LUT).

#### **I.INTRODUCTION**

Throughout the history of FPGAs, LUTs have been the primary logic element (LE) used to realize combinational logic. A K-input LUT is generic and very flexible, able to implement any K-input Boolean function. The use of LUTs simplifies technology mapping, as the problem is reduced to a graph covering problem. However, an exponential area price is paid as larger LUTs are considered. Values of K between 4 and 6 are typically seen in industry and academia, and this range has been demonstrated to offer a good area/performance compromise. Recently, a number of other works have explored alternative FPGA LE architectures for performance improvement, to close the large gap between FPGAs and application-specific integrated circuits (ASICs). In this paper, we propose incorporating (some) hardened multiplexers (MUXs) in the FPGA logic blocks as a means of increasing silicon area efficiency and logic density.

MUX-based logic blocks for FPGAs saw success in early commercial architectures, such as the Actel ACT-1/2/3 architectures, and efficient mapping to these structures was studied in the early 1990s. However, their use in commercial chips has waned, perhaps partly because of the ease with which logic functions can be mapped into LUTs, which simplifies the entire computer-aided design (CAD) flow. Nevertheless, it is widely understood that LUTs are inefficient at implementing MUXs, and that MUXs are frequently used in logic circuits. To underscore the inefficiency of LUTs implementing MUXs, consider that a six-input LUT (6-LUT) is essentially a 64-to-1 MUX (to select 1 of 64 truth-table rows) plus 64 SRAM configuration cells, yet it can only realize a 4-to-1 MUX (4 data + 2 select = 6 inputs).
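The cost described above can be made concrete with a small behavioral sketch. It models a 6-LUT as a 64-entry configuration table and programs it to behave as a 4-to-1 MUX. The function names and the pin ordering (four data pins followed by two select pins) are assumptions made for illustration only.

```python
def lut6(config_bits, inputs):
    """Evaluate a 6-LUT: the six input bits index one of 64 configuration bits."""
    assert len(config_bits) == 64 and len(inputs) == 6
    index = sum(bit << i for i, bit in enumerate(inputs))
    return config_bits[index]


def mux4(data, select):
    """Reference 4-to-1 MUX: select = (s0, s1) picks data[2*s1 + s0]."""
    return data[2 * select[1] + select[0]]


def mux4_as_lut6_config():
    """Build the 64 SRAM bits that make a 6-LUT act as a 4-to-1 MUX.

    Assumed pin order (illustrative): inputs 0..3 are data, inputs 4..5 are select.
    """
    config = []
    for index in range(64):
        bits = [(index >> i) & 1 for i in range(6)]
        config.append(mux4(bits[:4], bits[4:]))
    return config


config = mux4_as_lut6_config()
print(len(config), "configuration bits are spent to realize one 4-to-1 MUX")
```

All 64 configuration cells are consumed even though the function itself is fully described by its six pins, which is the inefficiency the paragraph points out.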

In this paper, we present a six-input LE based on a 4-to-1 MUX, MUX4, that can realize a subset of six-input Boolean logic functions, and a new hybrid complex logic block (CLB) that contains a mixture of MUX4s and 6-LUTs. The proposed MUX4s are small compared with a 6-LUT (15% of 6-LUT area), and can efficiently map all {2, 3}-input functions and some {4, 5, 6}-input functions. In addition, we explore fracturability of LEs (the ability to split an LE into multiple smaller elements) in both LUTs and MUX4s to increase logic density. The ratio of LEs that should be LUTs versus MUX4s is also explored toward optimizing logic density for both nonfracturable and fracturable FPGA architectures. To facilitate the architecture exploration, we developed a CAD flow for mapping into the proposed hybrid CLBs, created using ABC and VPR, and describe technology mapping techniques that encourage the selection of logic functions that can be embedded into the MUX4 elements.

The main contributions of this paper are as follows.

- 1) Two hybrid CLB architectures (nonfracturable and fracturable) that contain a mixture of MUX4 LEs and traditional LUTs, yielding up to 8% area savings.
- 2) Mapping techniques called NaturalMux and MuxMap, targeted toward the hybrid CLB architecture, that optimize for area while preserving the original mapping depth.
- 3) A full post-place-and-route architecture evaluation with the VTR7 and CHStone benchmarks, facilitated by LegUp-HLS and the Verilog-to-Routing project, showing the impact on both area and delay.

Compared with the preliminary publication, we have introduced transistor-level modeling of the MUX4 LE, further studied the fracturable architectures, and combined the open-source tool flow from C through LegUp-HLS to the VTR flow. Sparse crossbars (versus full crossbars in the prior work) have also been included in our CLBs, increasing modeling accuracy. The new transistor-level modeling of the MUX4 also gives more accurate results than the prior work. Results have also been expanded with the inclusion of timing results and larger architectural ratio sweeps.

The remainder of this paper is organized as follows. Section II discusses prior work. Section III describes the proposed MUX4 LE, the variant used in the fracturable architecture, and the design of the hybrid complex logic block. Section IV presents synthesis and simulation results. Finally, we conclude with final remarks in Section V.

#### II. RELATED WORK

Recent works have shown that heterogeneous architectures and synthesis methods can have a significant impact on improving logic density and delay, narrowing the ASIC-FPGA gap. Work on "gated" LUTs, and later on asymmetric LUT LEs, shows that the LUT elements present in commercial FPGAs provide unnecessary flexibility.

Toward improved delay and area, macrocell-based FPGA architectures have been proposed. These studies describe significant changes to the traditional FPGA architectures, whereas the changes proposed here build on architectures used in industry and academia. Similarly, and-inverter cones have been proposed as replacements for LUTs, inspired by and-inverter graphs (AIGs).


Purnaprajna and Ienne explored the possibility of repurposing the existing MUXs contained within the Xilinx Logic Slices. Similar to this work, they use the ABC priority cut mapper and VPR for packing, placement, and routing. However, their work is primarily delay-based, showing an average speedup of 16% using only ten of 19 VTR7 benchmarks.

#### III. PROPOSED STRUCTURES

A number of FPGA architecture variants were evaluated, all of which are based on the basic MUX4 element described in Section III-A. The design of the LEs and considerations for fracturability are discussed in Section III-B, followed by the hybrid CLB design in Section III-C. The areas of all proposed structures are presented in Section III-D.

#### A. MUX4: 4-to-1 Multiplexer Logic Element

The MUX4 LE shown in Fig. 1 consists of a 4-to-1 MUX with optional inversion on its data inputs, which allows the realization of any {2, 3}-input function, some {4, 5}-input functions, and one 6-input function: the 4-to-1 MUX itself, with optional inversion on the data inputs. A 4-to-1 MUX matches the input pin count of a 6-LUT, allowing for fair comparisons with respect to connectivity and intracluster routing.



Fig. 1. MUX4 LE depicting optional data input inversions

Naturally, any two-input Boolean function can be implemented easily in the MUX4: the two function inputs can be tied to the select lines and the truth-table values (logic 0 or logic 1) can be routed to the data inputs accordingly. Alternatively, a Shannon decomposition can be performed about one of the two variables, with that variable then feeding a select input. The Shannon cofactors will contain at most one variable and can therefore be fed to the data inputs (the optional inversion may be needed).

For three-input functions, consider that a Shannon decomposition about one variable produces cofactors with at most two variables. A second decomposition of the cofactors about one of their two remaining variables produces cofactors with at most one variable. Such single-variable cofactors can be fed to the data inputs (the optional inversion may be needed), with the decomposition variables feeding the select inputs. Likewise, functions of more than three inputs can be implemented in the MUX4 provided that Shannon decomposition about any two inputs produces cofactors with at most one input.
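The two-step Shannon decomposition for three-input functions can be sketched constructively. The example below is a minimal illustration, not the mapper used in the paper: the function names, the truth-table row ordering, and the choice of `x2` and `x1` as select variables are all assumptions.

```python
def map_3input_to_mux4(truth_table):
    """Map a 3-input function onto a MUX4 by decomposing about x2 and x1.

    truth_table[r] holds f(x2, x1, x0) for row r = 4*x2 + 2*x1 + x0.
    Returns (selects, data); each data entry names the single-variable
    cofactor routed to that data pin: '0', '1', 'x0' or '~x0'.
    """
    assert len(truth_table) == 8
    cofactor_name = {(0, 0): "0", (1, 1): "1", (0, 1): "x0", (1, 0): "~x0"}
    data = []
    for k in range(4):  # k = 2*x2 + x1 is the data pin chosen by the selects
        at_x0_0, at_x0_1 = truth_table[2 * k], truth_table[2 * k + 1]
        data.append(cofactor_name[(at_x0_0, at_x0_1)])
    return ("x2", "x1"), data


def eval_mux4(data, x2, x1, x0):
    """Evaluate the configured MUX4 for one input assignment."""
    source = data[2 * x2 + x1]
    return {"0": 0, "1": 1, "x0": x0, "~x0": 1 - x0}[source]


# Three-input majority: 1 when at least two inputs are 1.
majority = [0, 0, 0, 1, 0, 1, 1, 1]
print(map_3input_to_mux4(majority))  # (('x2', 'x1'), ['0', 'x0', 'x0', '1'])
```

Because every cofactor after two decompositions depends on at most one variable, the lookup in `cofactor_name` never fails, which is why every three-input function fits.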

Observe that input inversion on each select input is omitted, as this would only serve to permute the four MUX data inputs. While this could help routability within the CLB's internal crossbar, additional inversions on the select inputs would not increase the number of Boolean functions that can be mapped to the MUX4 LE.
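The claims about MUX4 coverage and select inversion can be checked by brute force for small input counts. The sketch below is an illustrative model only: it assumes each select pin may be tied to a constant or a variable, each data pin to a constant or a variable in either polarity, and it represents every signal as a bit-packed truth table. The function name `realizable` is hypothetical.

```python
from itertools import product


def realizable(n, invert_selects=False):
    """Return the set of n-input truth tables a MUX4 LE can implement.

    Each truth table is an integer whose bit r is the function value for
    input row r (bit i of r is the value of variable x_i).
    """
    rows = 1 << n
    mask = (1 << rows) - 1
    variables = [sum(((r >> i) & 1) << r for r in range(rows)) for i in range(n)]
    inverted = [mask ^ v for v in variables]
    select_choices = [0, mask] + variables + (inverted if invert_selects else [])
    data_choices = [0, mask] + variables + inverted  # optional data inversion
    found = set()
    for s1, s0 in product(select_choices, repeat=2):
        # minterm[k] marks the rows in which data pin k = 2*s1 + s0 is selected
        minterm = [(mask ^ s1) & (mask ^ s0), (mask ^ s1) & s0,
                   s1 & (mask ^ s0), s1 & s0]
        for d in product(data_choices, repeat=4):
            found.add((minterm[0] & d[0]) | (minterm[1] & d[1]) |
                      (minterm[2] & d[2]) | (minterm[3] & d[3]))
    return found


for n in (2, 3):
    print(n, "inputs:", len(realizable(n)), "of", 2 ** (2 ** n), "functions")
```

Under this model, comparing `realizable(4)` with `realizable(4, invert_selects=True)` gives identical sets, consistent with the observation that inverting a select line only permutes the data inputs.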



#### B. Logic Elements, Fracturability, and MUX4-Based Variants

Two families of architectures were created: 1) nonfracturable LEs and 2) fracturable LEs. In this paper, a fracturable LE refers to an architectural element onto which one or more logic functions can be optionally mapped. A nonfracturable LE refers to an architectural element onto which only one logic function is mapped. In the nonfracturable architectures, the MUX4 element shown in Fig. 1 is used together with nonfracturable 6-LUTs. This element has the same number of inputs as a 6-LUT, allowing for fair comparison with respect to input connectivity.

Fig. 2. Fracturable 6-LUT that can be fractured into two 5-LUTs with two shared inputs

Fig. 3. Dual MUX4 LE that utilizes dedicated select inputs and shared data inputs

For the fracturable architecture, we consider an eight-input LE, closely matched to the adaptive logic module in recent Altera Stratix FPGA families. A 6-LUT that can be fractured into two 5-LUTs using eight inputs is shown in Fig. 2. Two five-input functions can be mapped into this LE if two inputs are shared between the two functions. If no inputs are shared, two four-input functions can be mapped, one to each 5-LUT. For the MUX4 variant, Dual MUX4, we use two MUX4s within a single eight-input LE. In the configuration shown in Fig. 3, the two MUX4s are wired to have dedicated select inputs and shared data inputs. This configuration allows the structure to map two independent (no shared inputs) three-input functions, while larger functions may be mapped depending on the inputs shared between the two functions.
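The input-sharing rule for the fracturable 6-LUT can be expressed as a simple feasibility check. This is a simplified sketch under stated assumptions: it considers only pin counts (eight LE pins, five per 5-LUT, two of them shared), assumes signals may be freely assigned to pins, and uses a hypothetical function name. It does not model the Dual MUX4, whose feasibility also depends on the structure of the functions.

```python
def fits_fracturable_6lut(inputs_f, inputs_g=None):
    """Check whether one or two functions fit an eight-input fracturable 6-LUT.

    inputs_f / inputs_g are iterables of signal names (the function supports).
    Pin-count model only: each half sees at most five inputs, and the two
    halves together see at most eight distinct signals.
    """
    f = set(inputs_f)
    if inputs_g is None:
        return len(f) <= 6  # unfractured mode: a single 6-LUT
    g = set(inputs_g)
    return len(f) <= 5 and len(g) <= 5 and len(f | g) <= 8


# Two 5-input functions sharing two inputs fit; sharing only one does not.
print(fits_fracturable_6lut("abcde", "abxyz"))  # True
print(fits_fracturable_6lut("abcde", "awxyz"))  # False
# Two independent 4-input functions fit without any sharing.
print(fits_fracturable_6lut("abcd", "wxyz"))    # True
```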

An architecture in which a 4-to-1 MUX (MUX4) is fractured into two smaller 2-to-1 MUXs was first considered. However, since a 2-to-1 MUX's mapping flexibility is quite limited (it can only map two-input functions and the three-input 2-to-1 MUX itself), little benefit was gained relative to the overhead of making the MUX4 fracturable, and poor area results were observed.

Fig. 4. Hybrid CLB with a 50% depopulated intra-CLB crossbar depicting BLE internals for a nonfracturable (one optional register and one output) architecture

Fig. 5. Hybrid CLB with a 50% depopulated intra-CLB crossbar depicting BLE internals for a fracturable (two optional registers and two outputs) architecture

#### C. Hybrid Complex Logic Block

A variety of architectures were considered, the first being a nonfracturable architecture. In the nonfracturable architecture, the CLB has 40 inputs and ten basic LEs (BLEs), with each BLE having six inputs and one output, following empirical data in prior work. Fig. 4 shows this nonfracturable CLB architecture with BLEs that contain an optional register. We vary the ratio of MUX4s to LUTs within the ten-element CLB from 1:9 to 5:5 MUX4s:6-LUTs. The MUX4 element is proposed to work in conjunction with 6-LUTs, creating a hybrid CLB with a mixture of 6-LUTs and MUX4s (or MUX4 variants). Fig. 4 shows the organization of our CLB and its internal BLEs.


For fracturable architectures, the CLB has 80 inputs and ten BLEs, with each BLE having eight inputs and two outputs, following an Altera Stratix Adaptive-LUT. The same sweep of MUX4 to LUT ratios was also performed. Fig. 5 shows the fracturable architecture with eight inputs to each BLE, which contains two optional registers. We evaluate fracturable LEs against nonfracturable LEs in the context of MUX4 elements because fracturable LUTs are common in commercial architectures. For example, Altera Adaptive 6-LUTs in Stratix IV and Xilinx Virtex 5 6-LUTs can be fractured into two smaller LUTs with some limitations on inputs.

The crossbar for fracturable architectures is larger than that for nonfracturable architectures for two reasons. First, because of the virtual increase in LEs, a larger number of CLB inputs is required, which increases crossbar size. Second, since there are now twice as many outputs from the LEs, these additional outputs also need to be fed back into the crossbar, further increasing its size. Owing to this difference in crossbar size, fair comparisons cannot be made between fracturable and nonfracturable architectures. Hence, in this paper, we compare nonfracturable hybrid CLB architectures to a baseline LUT-only nonfracturable architecture, and we compare fracturable hybrid CLB architectures to a baseline LUT-only fracturable architecture.

Sparse crossbars have been studied previously, and in this paper we model a 50% depopulated crossbar within the CLB for intracluster routing for both nonfracturable and fracturable architectures, in contrast to the preliminary publication, which modeled only a full input crossbar.
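A rough count of crossbar switch points illustrates why the fracturable crossbar is larger. The model below is an assumption made for illustration, not the area model used in the paper: it counts one potential switch per (source, BLE input pin) pair, treats CLB inputs and fed-back BLE outputs as sources, and scales by the crossbar population.

```python
def crossbar_crosspoints(clb_inputs, num_bles, ble_inputs, ble_outputs, population=0.5):
    """Estimate intra-CLB crossbar switch points under a simple counting model."""
    sources = clb_inputs + num_bles * ble_outputs  # CLB inputs plus feedback
    sinks = num_bles * ble_inputs                  # every BLE input pin
    return int(sources * sinks * population)


nonfracturable = crossbar_crosspoints(clb_inputs=40, num_bles=10, ble_inputs=6, ble_outputs=1)
fracturable = crossbar_crosspoints(clb_inputs=80, num_bles=10, ble_inputs=8, ble_outputs=2)
print(nonfracturable, fracturable)  # 1500 4000
```

Under these assumptions the fracturable crossbar has well over twice the switch points of the nonfracturable one, which is why the two families are each compared only against their own LUT-only baseline.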

#### D. Area Modelling

1) MUX4 Logic Element: Initial estimates of the MUX4 element showed that the MUX4 is ~10% of the area of a 6-LUT overall. A 4-to-1 MUX can be realized with three 2-to-1 MUXs. With one additional 2-to-1 MUX per data input to select between the true and inverted signal, the MUX4 element contains seven 2-to-1 MUXs, four SRAM cells, and four inverters in total (see Fig. 1). The optional inversion uses the four SRAM cells, whereas the rest of the LE configuration is performed through routing. In addition, the depth of the MUX tree is halved compared with the 6-LUT, which has six 2-to-1 MUXs on its longest path. Conservatively, assuming constant pass-transistor sizing and that the areas of a 2-to-1 MUX and a six-transistor SRAM cell are roughly equivalent, the MUX4 element has (1/16)th the SRAM area and (1/8)th the MUX area of a 6-LUT.
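The component counts behind these fractions can be tallied directly. The sketch below assumes a K-LUT is built as a full binary tree of 2-to-1 MUXs over 2^K SRAM cells, treats a 2-to-1 MUX and an SRAM cell as equal in area (as the text does), and ignores the four inverters; the names are illustrative.

```python
from fractions import Fraction


def lut_components(k):
    """SRAM cells and 2-to-1 MUXs in a K-LUT built as a full MUX tree."""
    return {"sram": 2 ** k, "mux2": 2 ** k - 1}


# MUX4 LE: three MUXs for the 4-to-1 tree plus four for optional inversion.
MUX4 = {"sram": 4, "mux2": 3 + 4}
LUT6 = lut_components(6)

sram_ratio = Fraction(MUX4["sram"], LUT6["sram"])
mux_ratio = Fraction(MUX4["mux2"], LUT6["mux2"])
overall = Fraction(MUX4["sram"] + MUX4["mux2"], LUT6["sram"] + LUT6["mux2"])

print("SRAM ratio:", sram_ratio)            # 1/16
print("MUX ratio:", mux_ratio)              # 1/9
print("overall:", round(float(overall), 3))
```

The raw MUX count gives 7/63 = 1/9, slightly below the (1/8) figure quoted in the text, which may reflect conservative rounding; the combined count lands a little under 9%, in the neighborhood of the ~10% initial estimate.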

These estimates were revisited using transistor-level modeling of the circuit blocks. Transistor-level optimization of the constituent circuit blocks of an FPGA requires an understanding of the optimal area-delay tradeoffs for each individual circuit block. This requires extracting a representative critical path, which is a path whose composition of blocks and topology is similar to the critical path of a specific design. Extracting the representative critical path allows us to judge to what extent each individual block is timing critical, which in turn provides an area-delay tradeoff goal for each block. This is in line with a transistor-level optimization tool developed earlier. We use the results of prior work to provide the optimal area-delay tradeoff for 6-LUTs in a conventional island-style FPGA architecture with typical architectural parameters. The resulting 6-LUT delay serves as a point of reference for optimization of the circuits considered in this paper: in the interest of maximizing area reduction while allowing performance to be maintained (ignoring the differences in cell counts between mapping to a conventional LUT and to the LEs proposed in this paper), we attempt to match the delay of a 6-LUT while optimizing the area of each of the variants of the MUX4 circuits.


| Logic Element Design | Area (MWTA) | % 6-LUT Area | Scaled Max. Delay (ps) |
|----------------------|-------------|--------------|------------------------|
| MUX4 Min. Area       | 95          | 10.2%        | 311                    |
| MUX4 Min. Delay      | 108         | 11.6%        | 248                    |
| Dual MUX4 Min. Area  | 249         | 26.7%        | 398                    |
| Dual MUX4 Min. Delay | 255         | 27.4%        | 375                    |
| 6-LUT                | 930         | 100.0%       | 398                    |

Table I. LE transistor models with area given in minimum-width transistor areas (MWTA) and delays scaled for a 40-nm process

Transistor-level modeling and optimization were based on a predictive 22-nm high-performance process, while the area model introduced in prior work was used to estimate the area of the various circuit structures. Using this methodology, we determined that an area-delay optimal 6-LUT has an area of 930 minimum-width transistors and a worst-case delay of 261 ps. For the MUX4 cell and the Dual MUX4 cell, both a minimum-area and a minimum-delay cell were designed. The minimum-area MUX4 cell has an area of 95 minimum-width transistors and a delay of 204 ps; all transistors were minimum-width in this case, as the minimum-area solution for this circuit was able to meet (and improve upon) the worst-case delay target of a 6-LUT. Similarly, the Dual MUX4 cell has an area of 249 minimum-width transistors while meeting the worst-case delay requirement. However, we chose to use the minimum-delay design for both the MUX4 and Dual MUX4 elements for the rest of the study, as there is not a significant increase in area over the minimum-area design.
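The numbers above can be cross-checked against Table I with a little arithmetic. The sketch below makes two assumptions that the text does not state explicitly: that a single linear factor relates the 22-nm delays to the scaled 40-nm column, and that raw LE area per CLB is simply the sum of its LE areas. It says nothing about post-mapping area, which also depends on how many LEs a circuit needs.

```python
# Values quoted in the text and Table I (areas in minimum-width transistor areas).
LUT6_AREA = 930
MUX4_MIN_DELAY_AREA = 108
LUT6_DELAY_22NM_PS = 261
LUT6_DELAY_SCALED_PS = 398


def scaled_delay(delay_22nm_ps):
    """Scale a 22-nm delay, assuming one linear factor set by the 6-LUT row."""
    return delay_22nm_ps * LUT6_DELAY_SCALED_PS / LUT6_DELAY_22NM_PS


def le_area_per_clb(num_mux4, num_les=10):
    """Raw LE area of a hybrid CLB with num_mux4 MUX4s and the rest 6-LUTs."""
    return num_mux4 * MUX4_MIN_DELAY_AREA + (num_les - num_mux4) * LUT6_AREA


# The minimum-area MUX4 delay of 204 ps scales to the 311 ps listed in Table I.
print(round(scaled_delay(204)))

baseline = le_area_per_clb(0)
for num_mux4 in range(1, 6):  # the 1:9 to 5:5 MUX4:6-LUT sweep
    area = le_area_per_clb(num_mux4)
    print(f"{num_mux4}:{10 - num_mux4}  {area} MWTA  ({area / baseline:.1%} of LUT-only)")
```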

#### IV. SYNTHESIS AND SIMULATION RESULTS

The proposed logic elements are described in Verilog HDL and synthesized and simulated with the Xilinx ISE 14.5 tool. The RTL schematics and simulation results are shown below.




Fig: Top level schematic diagram



Fig: Internal architectures of RTL diagram

| lut6 Project Status (06/08/2017 - 11:02:31) |                           |                       |                        |
|---------------------------------------------|---------------------------|-----------------------|------------------------|
| Project File:                               | fpgalutmux.xise           | Parser Errors:        | No Errors              |
| Module Name:                                | lut6                      | Implementation State: | Synthesized            |
| Target Device:                              | xc7z010-2dg400            | • Errors:             | No Errors              |
| Product Version:                            | ISE 14.5                  | • Warnings:           | 161 Warnings (161 new) |
| Design Goal:                                | Balanced                  | • Routing Results:    |                        |
| Design Strategy:                            | Xilinx Default (unlocked) | • Timing Constraints: |                        |
| Environment:                                | System Settings           | • Final Timing Score: |                        |

| Device Utilization Summary (estimated values) |      |           |             |
|-----------------------------------------------|------|-----------|-------------|
| Logic Utilization                             | Used | Available | Utilization |
| Number of Slice LUTs                          | 18   | 17600     | 0%          |
| Number of fully used LUT-FF pairs             | 0    | 18        | 0%          |
| Number of bonded IOBs                         | 19   | 100       | 19%         |
| Number of BUFG/BUFGCTRL/BUFHCEs               | 1    | 80        | 1%          |

Fig: Synthesis Report




Fig: Simulation result

#### V. CONCLUSION

In this paper, we have proposed a new hybrid CLB architecture containing MUX4 hardened MUX elements and have described techniques for efficiently mapping to these architectures. The addition of MUX4s to FPGA architectures minimally affects FMax and shows potential for improving logic density in nonfracturable architectures, and modest potential for improving logic density in fracturable architectures. The MUX4 LE, Dual MUX4 LE, and fracturable 6-LUT were described in Verilog HDL and synthesized in Xilinx ISE 14.5.

#### VI. FUTURE SCOPE

The hybrid CLB could be implemented using 4-input LUTs, targeting high-speed performance with the best area efficiency.

#### REFERENCES

- [1] J. Rose et al., "The VTR project: Architecture and CAD for FPGAs from Verilog to routing," in Proc. ACM/SIGDA FPGA, 2012, pp. 77-86.
- [2] Y. Hara, H. Tomiyama, S. Honda, and H. Takada, "Proposal and quantitative analysis of the CHStone benchmark program suite for practical C-based high-level synthesis," J. Inf. Process., vol. 17, pp. 242-254, Oct. 2009.
- [3] A. Canis et al., "LegUp: High-level synthesis for FPGA-based processor/accelerator systems," in Proc. ACM/SIGDA FPGA, 2011, pp. 33-36.
- [4] E. Ahmed and J. Rose, "The effect of LUT and cluster size on deep-submicron FPGA performance and density," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 3, pp. 288-298, Mar. 2004.
- [5] J. Rose, R. Francis, D. Lewis, and P. Chow, "Architecture of field programmable gate arrays: The effect of logic block functionality on area efficiency," IEEE J. Solid-State Circuits, vol. 25, no. 5, pp. 1217-1225, Oct. 1990.
- [6] H. Parandeh-Afshar, H. Benbihi, D. Novo, and P. Ienne, "Rethinking FPGAs: Elude the flexibility excess of LUTs with and-inverter cones," in Proc. ACM/SIGDA FPGA, 2012, pp. 119–128.
- [7] J. Anderson and Q. Wang, "Improving logic density through synthesis-inspired architecture," in Proc. IEEE FPL, Aug./Sep. 2009, pp. 105-111.
- [8] J. Anderson and Q. Wang, "Area-efficient FPGA logic elements: Architecture and synthesis," in Proc. ASP DAC, 2011, pp. 369-375.
- [9] J. Cong, H. Huang, and X. Yuan, "Technology mapping and architecture evaluation for k/m-macrocell-based FPGAs," ACM Trans. Design Autom. Electron. Syst., vol. 10, no. 1, pp. 3-23, Jan. 2005.
- [10] Y. Hu, S. Das, S. Trimberger, and L. He, "Design, synthesis and evaluation of heterogeneous FPGA with mixed LUTs and macro-gates," in Proc. IEEE ICCAD, Nov. 2007, pp. 188–193.


#### **AUTHOR DETAILS**



**A. VENKATESWARI** is pursuing an M.Tech (DSCE) at Sri Visvesvaraya Institute of Technology & Science, Chowderpally (Vill), Devarkadra (Mdl), Mahabubnagar (Dist), TS, India.



**K. RAJESH KUMAR** is an Assistant Professor (ECE) at Sri Visvesvaraya Institute of Technology & Science, Chowderpally (Vill), Devarkadra (Mdl), Mahabubnagar (Dist), TS, India.