## A LOW COMPLEXITY MULTI-MODE FLIP-FLOP DESIGN

JIN-FA LIN, KUN-SHENG LI, YUN-RONG JIANG, MING-YIN TSAI AND HUNG-CHI CHU

> Department of Information and Communication Engineering Chaoyang University of Technology 168, Jifeng E. Rd., Wufeng District, Taichung 41349, Taiwan jflin@cyut.edu.tw

Received February 2016; accepted April 2016

ABSTRACT. A novel pulse-triggered flip-flop (FF) design featuring low circuit complexity, power efficiency and multi-mode operation is presented. A pass transistor logic (PTL) based pulse generator, which incorporates 3 pulse generation modes into a unified logic of only 6 transistors, is first developed. Combining the pulse generator with one simple level sensitive latch leads to a novel FF design supporting multiple triggering modes. Compared with a patented multi-mode FF design for FPGA circuitry, the proposed design has a much smaller layout size and excels in both power and speed performance.

Keywords: Low power, Pass transistor logic, Pulse-triggered, Flip-flop

1. **Introduction.** Flip-flops (FFs) are essential storage elements used extensively in digital circuit designs. Although the circuit complexity of a single FF is generally small, it has a significant impact on the overall circuit complexity, especially in system designs that employ heavy pipelining or massive register files. Besides circuit complexity, FFs also account for a prominent portion of the system's power consumption. It is estimated that more than 80% of the clock tree power consumption comes from the FFs. Designing a low complexity and power efficient FF is thus crucial to the system performance [1-3]. As for the functionality of FFs, mode programmability was once a feature unique to FPGAs. Recent VLSI circuit designs, nonetheless, employ much more complicated clocking schemes, and FFs are often required to operate in different triggering modes [4,5]. Recent low power design techniques also point to dynamic control of the FF triggering mode, subject to various working conditions, as an appealing feature. Conventional FF designs support only a fixed operation mode, either single- or dual-edge triggering. Multi-mode and mode programmable FF designs have so far appeared in the FPGA literature. Figure 1 depicts a patented programmable FF design by Xilinx<sup>TM</sup> [1]. In this design, three latches plus extensive control logic are needed to realize multi-mode operations. The incurred circuit complexity is significant and adversely affects both power and speed performance. Because of the threshold voltage loss in its three nMOS type latches, this design also suffers from DC power consumption and cannot sustain low V<sub>DD</sub> operation.

In this paper, a novel low complexity and power efficient multi-mode FF design is presented. In contrast to the master-slave style approach, a pulse-triggered structure, which consists mainly of a pulse generator and a simple level sensitive latch, is adopted for its circuit simplicity. Pulse-triggered FFs have become increasingly popular and have been widely adopted in various microprocessor designs [6-8]. The mode control of the proposed FF design is accomplished by a low transistor count multi-mode pulse generator. In addition, a clock gating mode, which suspends pulse generation when the FF is disabled, is incorporated in the design as well. The proposed design is implemented in TSMC 0.18 $\mu$m CMOS process technology and extensive simulations are conducted to demonstrate its performance superiority over others. The rest of the paper is organized as follows. In Section 2, the proposed multi-mode pulse generator design and FF design are explained. In Section 3, post-layout simulation results are compiled to demonstrate the superiority of the proposed FF design, and the conclusion is given in Section 4.

FIGURE 1. Conventional master-slave based multi-mode FF design

FIGURE 2. Proposed pulse triggered based multi-mode FF design

2. **Proposed Circuit Designs.** The proposed multi-mode pulse generator design using pass transistor logic (PTL) is shown in Figures 2(a)-2(c). An inverter chain of 4 inverters generates 4 phased clock signals (Figure 2(a)). These 4 signals are classified into two groups, i.e., (*CK*, *CKBD*) and (*CKB*, *CKD*). Each group can be considered a pair of complementary and skewed clock signals, as required in pulse generation. The former is responsible for pulse generation on the rising edges of the clock and the latter controls pulse generation on the falling edges. The timing diagram of these clock signals is depicted in Figure 2(b). The signal PulseCK at the bottom is the desired output waveform of the pulse generator subject to two mode control signals *MA* and *MB*.


Basically, MA (active low) works in conjunction with (CK, CKBD), and MB (also active low) works in conjunction with (CKB, CKD). A total of 4 pulse generation modes can be realized. For (MA, MB) equal to (0,0), (0,1), (1,0) and (1,1), negative pulses are generated on both edges, on rising edges only, on falling edges only, and on neither edge of the clock, respectively. The last combination is the disable mode, which prevents the FF from latching new data. Figure 2(c) shows the proposed pulse generation logic based on PTL. It is basically a combination of two pulse generation logic blocks wire-ANDed together. Transistors P1, N1 and N2 form the logic for pulse generation on the rising edges. If the control signal MA is set to "0", a discharging path through N1 and N2 is formed momentarily on each rising edge to pull the output node low, producing a negative pulse. Otherwise, the output node remains high via transistor P1 when CK equals "0". Likewise, transistors P2, N3 and N4 form the logic for pulse generation on the falling edges. The outputs of the two logic blocks are tied together, which behaves like a wired-AND function. Transistors P1 and P2, controlled by the complementary signals CK and CKB, take turns pulling the output node high in the two halves of the clock cycle.
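The four modes can be checked with a small behavioral model. The following is only a logic-level sketch, not the transistor circuit: it assumes ideal inverters, so that CKB is the instantaneous complement of CK, CKD is CK delayed by a fixed number of samples, and CKBD is the complement of CKD. The function names and the `delay` parameter are illustrative and do not come from the paper.

```python
def pulse_ck(clock, ma, mb, delay=2):
    """Return PulseCK samples (1 = idle high, 0 = negative pulse).

    Assumed clock phases: CKB = not CK, CKD = CK delayed by `delay`
    samples, CKBD = not CKD (ideal inverters, no analog effects).
    """
    out = []
    for t, ck in enumerate(clock):
        ckd = clock[t - delay] if t >= delay else clock[0]
        ckb, ckbd = 1 - ck, 1 - ckd
        # N1/N2 conduct while CK and CKBD are both high; MA = 0 sinks the current.
        rise_path = bool(ck and ckbd) and ma == 0
        # N3/N4 conduct while CKB and CKD are both high; MB = 0 sinks the current.
        fall_path = bool(ckb and ckd) and mb == 0
        # Wired-AND: the output is low if either discharging path is active.
        out.append(0 if (rise_path or fall_path) else 1)
    return out


def pulse_edges(clock, ma, mb, delay=2):
    """Label each negative pulse by the clock edge that starts it."""
    out = pulse_ck(clock, ma, mb, delay)
    edges = []
    for t in range(1, len(out)):
        if out[t - 1] == 1 and out[t] == 0:
            edges.append("rise" if clock[t] == 1 else "fall")
    return edges


# Two clock periods, 5 samples per half period, starting low.
clock = ([0] * 5 + [1] * 5) * 2
for ma, mb in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((ma, mb), pulse_edges(clock, ma, mb))
```

In this model the pulse width equals the assumed delay between CK and CKD, which mirrors the role of the skewed clock pairs in Figure 2(b).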

This design neatly integrates two separate logic blocks into one to provide multi-mode pulse generation. The transistor count is as low as 6 (excluding the output buffer). Two additional inverters are employed at the output node to provide the complementary pulses needed by the subsequent latch, shown in Figure 2(d). It is a simple transmission gate (TG) based latch design with low circuit complexity. Although various static level-sensitive latch designs have been considered, the TG based version is chosen to reduce the loading capacitance of the clock tree and to ensure full voltage swing operations [1].

For the described pulse generation logic, short circuit power consumption occurs when pulses are generated, a typical pitfall of the wired-AND scheme. This is because the discharging path formed by transistors N1, N2 (or N3, N4) and the pull-up transistor P2 (or P1) conduct simultaneously at the rising (or falling) edges of the clock. Ratioed sizing must therefore be observed to ensure that the generated negative pulse has a sufficiently large swing. Fortunately, this is easily achieved by reducing the aspect ratio of the pull-up pMOS transistors. Owing to the rather short pulse duration, the incurred short circuit power consumption is greatly alleviated as well. Since the proposed pulse generation logic is PTL based, the input power of the mode control signals MA and MB must also be taken into account. Again, this part of the power is inherently small in this design. When either mode control signal is active low, it serves as the current sink of the discharging path and imposes no power loading on the signal driving circuit. When both mode control signals are high (i.e., disable mode), the output node of the pulse generator stays high and the nMOS transistors along the discharging path are in cut-off states ($i_{DS} = 0$). The only case in which input power consumption arises is when the pulse generator operates in a single edge triggering mode (either rising or falling). In this case, the inactive mode control signal (= 1) supplies a charging current to the output node at the end of every pulse generation. The inactive mode control signal, however, plays only a secondary role, as the pull-up pMOS transistor assumes the primary charging responsibility. Once the output node is charged to $V_{DD} - V_{TN}$, the charging path through the nMOS transistors turns off and the driving circuit of the mode control signal is relieved of any further loading.

The total transistor count of the proposed FF is only 26. Note that this number can be further reduced if one pulse generator is shared among several FFs, a common practice for pulse-triggered FF designs [6]. Circuit simplicity implies power efficiency, not just for the FF itself but also for the clock tree network. Another advantage of the proposed pulse generator is that it is free of the threshold voltage loss problem common in PTL based designs. This spares the driven logic from DC power consumption. The design can also tolerate lower V<sub>DD</sub> operations, and the simulation results indicate that it functions properly at 0.75 V in the SS corner. Another implication is that the proposed design, though targeting a 0.18 $\mu$m process in this paper, is more robust against deep submicron effects in more advanced processes than other PTL logic.
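The benefit of sharing can be estimated with simple bookkeeping. The sketch below assumes that every inverter is a 2-transistor static CMOS inverter and that the complete pulse generator (the 4-inverter chain, the 6-transistor PTL logic and the two output inverters) is shared. The latch size is inferred by subtracting from the stated total of 26 and is not given explicitly in the text; all names are illustrative.

```python
INVERTER = 2                    # assumption: static CMOS inverter
CLOCK_CHAIN = 4 * INVERTER      # 4-inverter chain producing the phased clocks
PULSE_LOGIC = 6                 # PTL pulse generation logic (P1, P2, N1-N4)
OUTPUT_BUFFER = 2 * INVERTER    # two inverters providing complementary pulses
GENERATOR = CLOCK_CHAIN + PULSE_LOGIC + OUTPUT_BUFFER

TOTAL_FF = 26                   # total transistor count stated in the text
LATCH = TOTAL_FF - GENERATOR    # inferred by subtraction, not stated in the text


def transistors_per_ff(n_sharing):
    """Amortized transistor count per FF when one pulse generator
    is shared among `n_sharing` latches."""
    if n_sharing < 1:
        raise ValueError("at least one FF must use the generator")
    return LATCH + GENERATOR / n_sharing


for n in (1, 2, 4, 8):
    print(n, transistors_per_ff(n))
```

Under these assumptions the generator accounts for 18 of the 26 transistors, so the per-FF count approaches the latch size as more FFs share one generator.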

3. **Simulation Results.** The simulation waveforms of the proposed multi-mode FF design working at 1.8V/200MHz are shown in Figure 3. All four modes of pulse generation are simulated, and the generated pulses are sharp in shape with sufficient signal swing. The simulation settings are as follows. The target technology is the TSMC 0.18 $\mu$m CMOS process. The minimum transistor size is 0.45 $\mu$m/0.18 $\mu$m.



FIGURE 3. Simulation waveforms of proposed FF@1.8V/200MHz (SS corner)

To produce realistic rise and fall times, the input signals are driven through buffers. Since pulse width control is crucial to the correctness of data capturing as well as to the power consumption in pulse-triggered FF (P-FF) designs, process variation is also taken into account when sizing the transistors. In simulations, the output of the FF is loaded with a 20fF capacitor. An extra loading capacitance of 3fF is also placed after the clock buffer, as suggested in [6-8]. The operating condition used in simulations is 200MHz/100MHz at 1.8V. The 200MHz and 100MHz settings are used in the single and the double edge triggered modes, respectively, for a fair power consumption comparison.
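The reasoning behind the two clock settings can be written out as a short calculation. The symbols $k$ (number of triggering edges per clock cycle) and $f_{trig}$ (triggering rate) are introduced here for illustration only:

$$f_{trig} = k \cdot f_{CK}, \qquad k = 1 \text{ (single edge)}, \quad k = 2 \text{ (double edge)}$$

$$1 \times 200\,\text{MHz} = 2 \times 100\,\text{MHz} = 2 \times 10^{8} \text{ triggering events per second}$$

Both configurations therefore capture data at the same rate, which is what makes their power figures comparable.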

The proposed design is evaluated against the master-slave based programmable FF design [1]. Six test patterns, each exhibiting a different data switching probability, are applied. Five of them are deterministic patterns with 0% (all-zero and all-one), 25%, 50% and 100% data transition probabilities, respectively. The sixth is a random test pattern with a 30% bit "1" population. Note that the test pattern toggling is synchronized with the 200MHz clock signal. This ensures that FFs working in either single- or double-edge-triggered mode experience the same data switching frequency. Furthermore, the power consumption of the two extra inverters driving the mode control signals is also counted.
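One possible way to build such stimulus is sketched below. The exact bit sequences used in the simulations are not given, so the toggle-every-N-cycles construction, the stream length, the random seed and the function names are all assumptions made for illustration.

```python
import random


def toggle_pattern(period, length):
    """Deterministic bit stream that flips once every `period` cycles."""
    return [(t // period) % 2 for t in range(length)]


def transition_rate(bits):
    """Fraction of consecutive cycle pairs in which the data bit changes."""
    flips = sum(a != b for a, b in zip(bits, bits[1:]))
    return flips / (len(bits) - 1)


def random_pattern(ones_fraction, length, seed=0):
    """Random bit stream with the requested expected share of 1s."""
    rng = random.Random(seed)
    return [1 if rng.random() < ones_fraction else 0 for _ in range(length)]


LENGTH = 401  # 400 cycle-to-cycle transitions
patterns = {
    "all-zero": [0] * LENGTH,
    "all-one": [1] * LENGTH,
    "25%": toggle_pattern(4, LENGTH),
    "50%": toggle_pattern(2, LENGTH),
    "100%": toggle_pattern(1, LENGTH),
    "random (30% ones)": random_pattern(0.3, LENGTH),
}
for name, bits in patterns.items():
    print(f"{name}: transition rate = {transition_rate(bits):.2f}")
```

Note that a random stream with a 30% share of 1s does not have a 30% transition rate; for independent bits the expected rate is 2 x 0.3 x 0.7 = 0.42.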

The power consumption simulation results are illustrated as a bar chart in Figure 4. The suffixes "SR", "SF" and "D" indicate rising-edge-triggered, falling-edge-triggered and double-edge-triggered operations, respectively. For the proposed FF, as a pulse-triggered design, the power consumption of the pulse generator is constant regardless of the data pattern. The total power consumption increases mildly with the data switching activity. On the other hand, the master-slave based design is free of the sustained power consumption of a pulse generator and has a lower quiescent power. Its power consumption, however, increases significantly as the data switching activity rises. This trend is evident in the simulation results. When the input data is static (all 0's or all 1's), the power consumption of the proposed design is slightly higher than that of the design in [1] in double-edge-triggered operations. (The proposed design regains its power advantage in single-edge-triggered operations.) Note that this is the only case in which the proposed design is outperformed by its counterpart. Once the input data switching probability reaches 25% or beyond, the proposed design exhibits much smaller power consumption than the design in [1]. The power performance gap widens as the data switching probability increases. The power behaviors of both designs under the random test pattern are similar to the case with 25% data switching activity.

Another advantage of the proposed design is that one pulse generator can be shared among several FFs. This can further reduce the power consumption overhead incurred by the pulse generator circuit. Table 1 summarizes the transistor counts, the layout areas, the setup times, the D to Q delays, and the power-delay-product (PDP) under 25% data switching probability of both designs. The proposed design is much more area efficient and significantly outperforms the rival design in both speed and PDP.

FIGURE 4. Power consumptions under different test patterns

| Designs                              | Design [1] |             |             | Proposed Design |             |             |
|--------------------------------------|------------|-------------|-------------|-----------------|-------------|-------------|
| Triggering Mode                      | Double     | Single-Rise | Single-Fall | Double          | Single-Rise | Single-Fall |
| # of Transistors / Area ($\mu m^2$)  | 37 / 701.9 |             |             | 26 / 218.1      |             |             |
| Disable Function                     | No         |             |             | Yes             |             |             |
| DC Power & $V_T$ Loss                | Yes        |             |             | No              |             |             |
| Setup Time (ps)                      | 75.8       |             |             | -122            |             |             |
| Data to Q (ns)                       | 0.70       | 0.51        | 0.70        | 0.21            | 0.26        | 0.22        |
| $PDP_{DQ}$ (fJ)                      | 53.2       | 45.8        | 73.9        | 21.0            | 27.5        | 26.6        |

TABLE 1. Features and simulation results summary of FF designs
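The size of the gap summarized in Table 1 can be quantified directly from the tabulated values. The snippet below is a small helper for that arithmetic only: the numbers are copied from Table 1, while the dictionary keys and the function name are illustrative.

```python
# (design [1], proposed design), values copied from Table 1
table1 = {
    "transistor count": (37, 26),
    "area (um^2)": (701.9, 218.1),
    "D-to-Q delay, double (ns)": (0.70, 0.21),
    "D-to-Q delay, single-rise (ns)": (0.51, 0.26),
    "D-to-Q delay, single-fall (ns)": (0.70, 0.22),
    "PDP, double (fJ)": (53.2, 21.0),
    "PDP, single-rise (fJ)": (45.8, 27.5),
    "PDP, single-fall (fJ)": (73.9, 26.6),
}


def reduction_percent(reference, proposed):
    """Relative saving of the proposed design with respect to the reference."""
    return 100.0 * (1.0 - proposed / reference)


for metric, (reference, proposed) in table1.items():
    saving = reduction_percent(reference, proposed)
    print(f"{metric}: {saving:.1f}% lower")
```

With these figures the proposed design uses roughly 30% fewer transistors and about 69% less area, and its PDP is between about 40% and 64% lower depending on the triggering mode.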

Note that transmission gate type (in lieu of fully complementary CMOS type) AND logic is used in design [1] to reduce its circuit complexity. Even so, its transistor count is still much larger than that of the proposed design. The area discrepancy between the two designs is even larger. This is attributed to the complicated wiring in design [1], which leads to a less compact layout. Owing to the nature of the pulse-triggered FF design, the setup time of the proposed design is negative, as opposed to the positive setup time of the master-slave based design. The design in [1] also suffers from DC power consumption caused by the threshold voltage loss along the feedback path of the adopted latch design. This problem can be alleviated by replacing the feedback nMOS transistor with a transmission gate, at the cost of extra transistors.

4. **Conclusion.** In this paper, a novel low complexity multi-mode FF design is presented. It supports 3 operation modes plus a disable function by using a simple pulse generator design. The circuit complexity of the proposed multi-mode FF design is much lower than that of its master-slave based counterpart. Its power and PDP performance are also superior to those of the counterpart design, and the power advantage becomes even more prominent in operations with higher data switching activity.

Acknowledgment. The authors would like to thank the National Chip Implementation Center (CIC), Taiwan, for technical support in simulations. This work was sponsored by MOST 103-2221-E-324-042-.

## REFERENCES

- [1] T. J. Bauer et al., FPGA Memory Element Programmably Triggered on Both Edges, U.S. Patent 6072348, 2000.
- [2] H. Kawaguchi and T. Sakurai, A reduced clock-swing flip-flop (RCSFF) for 63% power reduction, *IEEE Journal of Solid-State Circuits*, vol.33, no.5, pp.807-811, 1998.
- [3] N. Kawai et al., A fully static topologically-compressed 21-transistor flip-flop with 75% power saving, *IEEE Journal of Solid-State Circuits*, vol.49, no.11, pp.2526-2533, 2014.
- [4] S.-N. Tang, C.-H. Liao and T.-Y. Chang, An area- and energy-efficient multimode FFT processor for WPAN/WLAN/WMAN systems, *IEEE Journal of Solid-State Circuits*, vol.47, pp.1419-1435, 2012.
- [5] J.-F. Lin, Y.-T. Hwang and M.-H. Sheu, Novel low complexity dual mode pulse generator designs, *IEICE Trans. Fundamentals of Electronics, Communications and Computer Sciences*, vol.91-A, no.7, pp.1812-1815, 2008.
- [6] J. Tschanz, S. Narendra, Z. Chen, S. Borkar, M. Sachdev and V. De, Comparative delay and energy of single edge-triggered and dual edge triggered pulsed flip-flops for high-performance microprocessors, *Proc. of ISLPED*, pp.207-212, 2001.
- [7] P. Zhao, T. Darwish and M. Bayoumi, High-performance and low power conditional discharge flip-flop, *IEEE Trans. Very Large Scale Integration (VLSI) Systems*, vol.12, no.5, pp.477-484, 2004.
- [8] J.-F. Lin, Low power pulse-triggered flip-flop design based on a signal feed-through scheme, *IEEE Trans. Very Large Scale Integration (VLSI) Systems*, vol.22, no.1, pp.181-185, 2014.