## PROCESS-TOLERANT DIGITIZED CASCADE TIMING TRACKING SCHEME FOR SRAM SENSE AMPLIFIER

ZHENGPING LI AND YONGLIANG ZHOU

School of Electronics and Information Engineering Anhui University No. 111, Jiulong Road, Hefei 230601, P. R. China 842320622@qq.com

Received July 2016; accepted October 2016

ABSTRACT. To reduce the process variation of static random access memory (SRAM) sense amplifier timing more effectively, a novel digitized cascade replica bitline delay (DCCRBD) technique is proposed in this paper. The main idea of this technique is that both replica bitlines (RBLs) are utilized independently, and 2k times as many replica cells (RCs) are used as in the conventional technique. Simulation results show that, compared with the conventional replica bitline delay (CONV), digitized replica bitline delay (DRBD) and cascade replica bitline delay (CCRBD) techniques, the timing variation of the proposed technique is reduced by 59.51%, 23.73% and 18.29%, respectively, at a 600mV supply voltage and the slow-slow (SS) process corner in Taiwan Semiconductor Manufacturing Company (TSMC) 65nm technology.

**Keywords:** Replica bitline, Sense amplifier enable, Timing process-variation, Timing multiplier circuit

1. Introduction. For a fast and power-efficient read operation in SRAM, a sense amplifier (SA) is used to amplify the voltage difference between the pair of bitlines [1-5]. A sense amplifier enable (SAE) signal controls the SA. The SAE should be asserted when the voltage difference between the pair of bitlines reaches the threshold voltage of the sense amplifier. If the SAE arrives too early, a read failure may occur. If it arrives too late, unnecessary access time and energy consumption are introduced. The timing of SAE is therefore critical for reliable and efficient read operation of SRAM. However, the timing of the SAE signal is sensitive to process, voltage and temperature (PVT) variations. To obtain variation-resilient SAE timing, a replica bitline technique is usually utilized [6-8].

This technique uses replica cells (RCs) and a replica bitline (RBL) to track the read operation of the main memory cell. However, as transistor dimensions shrink, the process variation of the transistor threshold voltage becomes more serious and cannot be tracked tightly by the conventional replica bitline delay (CONV) technique. This degrades the SRAM access time and may result in read failure, particularly at low supply voltage [9-11].

To suppress the variation of the SAE timing, DRBD [9] and CCRBD [10] have been proposed. DRBD reduces the process variation of SAE by increasing the number of RCs and using a timing multiplier circuit (TMC) to delay the SAE timing. The CCRBD technique uses twice as many RCs as the conventional technique: once the control signal is asserted, the left RBL of CCRBD is discharged by its RCs first, and then an inverter inverts the left RBL level to discharge the right one. However, as the number of RCs in the DRBD technique increases, the larger quantization noise and the area overhead caused by the TMC cannot be neglected [10]. Moreover, although CCRBD is area efficient, it can only reduce the SAE timing variation to 0.5 times that of the CONV technique. Considering the trade-off between area overhead and efficiency, a novel DCCRBD technique is proposed in this paper to further optimize the SAE timing.

The remainder of this paper is organized as follows. In Section 2, we introduce the timing control technique of the conventional design and the other two improved designs: DRBD and CCRBD. Section 3 describes and analyzes the proposed digitized cascade replica bitline delay technique. Simulation results and performance comparisons are presented in Section 4. Finally, Section 5 concludes this paper.

## 2. CONV, DRBD and CCRBD Techniques.

2.1. Conventional replica bitline delay (CONV) technique. Figure 1 shows a block diagram of a conventional timing replica circuit with an SRAM cell array. A CONV circuit is made up of RCs and dummy cells (DCs), so that the bitline delay of the actual storage cells can be tracked. The number of RCs is equal to the number of memory cells (MCs) in each column of the memory array, and the length of the RBL is the same as that of the normal bitline (BL/BLB). Thus, the discharge current of the RBL through the timing replica module is n times the discharge current of BL/BLB through the storage array. During standby, the normal BL/BLB and the RBL are pre-charged to the supply voltage level. Once the word line and the control signal CK are activated, the replica cells begin to discharge the RBL; at the same time, the memory cell discharges BL/BLB. When the voltage difference between the pair of normal bitlines reaches the threshold voltage (Vth) of the SA, the SAE should be asserted. The timing of SAE is therefore critical for the read operation of SRAM and must be determined with PVT variation in mind. Assuming that the logic threshold voltage of the inverter is half of the supply voltage, the timing variation of the conventional SAE is difficult to reduce when the supply voltage is lowered, because the number of RCs has an upper limit [9,10].

FIGURE 1. Block diagram of conventional timing replica circuit

FIGURE 2. Comparison of the circuits and RCs of the (a) CONV, (b) DRBD and (c) CCRBD schemes

2.2. Digitized replica bitline delay (DRBD) technique. In the CONV technique, the number of RCs is small because of the low-supply-voltage limitation, so the Vth cannot be controlled accurately; the DRBD technique was proposed [9] to overcome this problem. Figures 2(a) and 2(b) show that the RCs of DRBD use the same structure as those of CONV, but the number of RCs in DRBD is k times that of CONV. The mean value of the SAE timing delay of DRBD therefore becomes $\frac{1}{k}$ of CONV's, and the standard deviation becomes $\frac{1}{\sqrt{k}}$ of CONV's. The discharge current of RCs $(I_{cell})$ is much smaller than that of CONV. However, to keep the delay time the same as that of the normal bitline, the TMC [9] is used to adjust the delay time.
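The scaling claimed for DRBD can be checked with a toy Monte Carlo experiment. The sketch below is not the simulation setup of the paper: it assumes independent Gaussian cell currents with a 5% relative spread, a normalised bitline charge, and an ideal multiplier in place of the TMC; the function name `sae_delays` and all parameter values are hypothetical.

```python
import random
import statistics

def sae_delays(n_cells, multiplier, samples=20000, sigma_rel=0.05, seed=1):
    """Sample replica-bitline delays (arbitrary units) under a toy model.

    Each replica cell draws an independent Gaussian discharge current with
    mean 1 and relative spread sigma_rel (an assumed value). The RBL delay is
    charge / total current with the charge normalised to 1, and `multiplier`
    stands in for an ideal timing multiplier circuit.
    """
    rng = random.Random(seed)
    delays = []
    for _ in range(samples):
        total_current = sum(rng.gauss(1.0, sigma_rel) for _ in range(n_cells))
        delays.append(multiplier / total_current)
    return delays

n, k = 2, 3  # illustrative RC count and multiplier
conv = sae_delays(n_cells=n, multiplier=1, seed=1)
drbd = sae_delays(n_cells=k * n, multiplier=k, seed=2)

mean_ratio = statistics.mean(drbd) / statistics.mean(conv)
sigma_ratio = statistics.stdev(drbd) / statistics.stdev(conv)
print(f"mean ratio  = {mean_ratio:.3f} (model predicts about 1)")
print(f"sigma ratio = {sigma_ratio:.3f} (model predicts about {k ** -0.5:.3f})")
```

Under these assumptions the mean delay after multiplication stays close to the CONV value while the standard deviation shrinks by roughly $\frac{1}{\sqrt{k}}$; quantization error in a real TMC is ignored here.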

2.3. Cascade replica bitline delay (CCRBD) technique. C. Peng et al. proposed the CCRBD technique in [10]. As Figure 2(c) shows, the number of RCs in CCRBD is twice that of CONV, and the control signals of the RCs are independent: the control signal of the left RCs is connected to CK, and the left RBL is connected to the control signal of the right RCs via an inverter. Once the control signal CK is asserted, the left RBL is discharged by its RCs, and the inverter then inverts the left RBL level to discharge the right RBL. Consequently, the mean value of the SAE timing delay of CCRBD is the same as that of CONV, while the standard deviation ( $\sigma$ ) of SAE is reduced to $\frac{1}{2}$ of that in the CONV technique.

## 3. Proposed DCCRBD Technique.

3.1. DCCRBD circuit analysis. In this section, a new DCCRBD technique is proposed. Figure 3 shows the circuit structure in detail. The number of RCs, which are similar to those used in the CCRBD technique, is 2kn in the proposed technique. Thus, it has the same layout area as the RCs in CONV. Both the left and right RBLs are pre-charged to the supply voltage level during standby. When the control signal CK arrives, the left RBL is discharged by its RCs, and the inverter then inverts the left RBL level to discharge the right RBL. When the right RBL voltage reaches the threshold voltage of the inverter, SAEi is generated. SAEi is then adjusted by the TMC to give an accurate SAE signal.

FIGURE 3. Block diagram of DCCRBD circuit

FIGURE 4. Circuit structure of TMC [9]

Figure 4 shows the circuit structure of the TMC. The number of delay units in the backward path is twice that in the forward path. Each delay unit consists of two NAND gates and an inverter. One of the NAND outputs is connected to the inverter input of the neighboring delay unit, in both the forward path and the backward path. The other NAND output of the delay unit in the forward path is connected to the NAND input of the delay unit in the backward path. Therefore, the CK signal can be delayed in this delay chain with these logic gates [9].

3.2. DCCRBD principle analysis. Figure 5 shows the waveform of the timing replica circuit. SAEi is activated by the replica bitline level-sense inverter, which is used to digitize and detect the timing. The RBL starts to discharge as soon as CK is asserted. At the same time, CK propagates along the forward path of the TMC. When CK has propagated to $F_{i-1}$, $N_i$ and the following nodes are negated, while $N_{i-1}$ and the previous nodes remain at the high level. Then $B_i$ is asserted, and the signal propagates along the backward path of the TMC. Because the backward path has twice as many delay units as the forward path, the propagation time in the backward path is twice that in the forward path. Finally, the SAE timing is generated, and it is 3 times the SAEi timing [9]. The random variation in the TMC itself is small compared with that of a logic gate delay, because the many delay units used in the TMC sufficiently suppress the random variation.
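The forward and backward propagation described above can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the circuit of [9]: every delay unit is given the same delay, the forward path is assumed to count only whole units passed when SAEi arrives, and the names `tmc_sae_time`, `unit_delay` and `backward_ratio` are hypothetical.

```python
import math

def tmc_sae_time(t_saei, unit_delay, backward_ratio=2):
    """Return the SAE assertion time for a behavioural TMC model.

    Assumptions of this sketch: all delay units share the delay `unit_delay`,
    and the forward path registers only the whole units that CK has passed
    when SAEi arrives at time `t_saei`.
    """
    # Forward-path position reached by CK when SAEi fires
    units_passed = math.floor(t_saei / unit_delay)
    # The backward path holds backward_ratio times as many units per forward unit
    backward_time = backward_ratio * units_passed * unit_delay
    return t_saei + backward_time

for t_saei in (10.0, 10.3):
    t_sae = tmc_sae_time(t_saei, unit_delay=0.5)
    print(f"SAEi at {t_saei} -> SAE at {t_sae:.2f} (ideal 3x = {3 * t_saei:.2f})")
```

In this model the SAE time equals three times the SAEi time whenever SAEi falls on a unit boundary, and otherwise falls short by less than `backward_ratio * unit_delay`, which is one way to picture the quantization error of the TMC.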



FIGURE 5. Waveform of the proposed timing replica circuit [9]

FIGURE 6. Distribution transformation of the SAE timing

Assume the number of CONV RCs is *n*. Here, $\mu_{CONV}$ and $\sigma_{CONV}$ are the mean delay and standard deviation of SAE in the CONV technique, respectively. We increase the number of RCs in DCCRBD to 2kn. The RCs used in our design are similar to those in the CCRBD technique, and both the total capacitance load of the RBLs ( $C_{RBL}$ ) and the discharge current of the RCs ( $I_{Cell}$ ) are the same as those in CONV. Consequently, the mean value of the SAEi timing delay is $\mu_{SAEi} = \frac{C_{RBL} \times V_{dd}}{2kI_{Cell}} + \frac{C_{RBL} \times V_{dd}}{2kI_{Cell}} = \frac{\mu_{CONV}}{k}$, which is $\frac{1}{k}$ of that in CONV. To obtain the correct SAE timing, the TMC is used, which gives $\mu_{DCCRBD} = \mu_{SAEi} \times k = \mu_{CONV}$. According to [5], the standard deviation is divided by $N\sqrt{N}$ when the number of RCs is multiplied by N compared with that in the CONV technique. Figure 6 shows the distribution transformation of the SAE timing obtained by applying the proposed scheme. Therefore, the $\sigma$ of the SAE timing with the proposed technique is $\sigma_{DCCRBD} = k \times \sqrt{\left(\frac{C_{RBL} \times V_{dd}}{2k\sqrt{2k}\,\Delta I_{Cell}}\right)^2 + \left(\frac{C_{RBL} \times V_{dd}}{2k\sqrt{2k}\,\Delta I_{Cell}}\right)^2} = \frac{\sigma_{CONV}}{2\sqrt{k}}$, which is $\frac{1}{2\sqrt{k}}$, $\frac{1}{2}$ and $\frac{1}{\sqrt{k}}$ of that of the CONV, DRBD and CCRBD techniques, respectively. Thus, for theoretically the same improvement, the k of DRBD must be 4 times that of the proposed technique. As a result, the quantization noise as well as the area overhead caused by the TMC is reduced significantly. Meanwhile, compared with the CCRBD technique, the timing variation of SAE is suppressed further than the factor of 0.5 that CCRBD achieves.
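The scaling relations of this subsection can be collected into a few helper functions to check that they agree with one another. The sketch below only encodes the rules stated in the text (the $N\sqrt{N}$ rule, multiplication by k in the TMC, and $\sigma_{CONV}/2$ for CCRBD); treating the two cascaded stages as independent and combining them by root-sum-square is an assumption of the sketch, and the function names are hypothetical.

```python
import math

def sigma_stage(sigma_conv, n_mult):
    """Sigma of one RBL stage with n_mult times the RCs of CONV (the N*sqrt(N) rule)."""
    return sigma_conv / (n_mult * math.sqrt(n_mult))

def sigma_drbd(sigma_conv, k):
    """One stage with k times the RCs, then stretched by k in the TMC."""
    return k * sigma_stage(sigma_conv, k)

def sigma_ccrbd(sigma_conv):
    """Value quoted in the text for CCRBD."""
    return sigma_conv / 2.0

def sigma_dccrbd(sigma_conv, k):
    """Two cascaded stages scaled by N = 2k, summed as independent variations
    (root-sum-square, an assumption of this sketch), then stretched by k."""
    per_stage = sigma_stage(sigma_conv, 2 * k)
    return k * math.sqrt(2.0 * per_stage ** 2)

k = 3  # illustrative multiplier
ratios = {
    "DRBD": sigma_drbd(1.0, k),
    "CCRBD": sigma_ccrbd(1.0),
    "DCCRBD": sigma_dccrbd(1.0, k),
}
for name, ratio in ratios.items():
    print(f"sigma_{name} / sigma_CONV = {ratio:.4f}")

# DRBD needs a multiplier of 4k to match DCCRBD at k
drbd_matches_at_4k = math.isclose(sigma_drbd(1.0, 4 * k), sigma_dccrbd(1.0, k))
print("DRBD at 4k equals DCCRBD at k:", drbd_matches_at_4k)
```

These are ideal ratios only; they leave out the quantization error and delay-unit variation that a real TMC adds.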

4. Simulation Results. The proposed technique is a circuit-level improvement that is based on theoretical analysis and is independent of the foundry and technology node. To demonstrate its effect, the TSMC 65nm CMOS technology is selected for the simulation. Figure 7 shows the Monte Carlo simulation results of the CONV, DRBD, CCRBD and proposed schemes. The conditions are a 0.6V supply voltage, the SS corner and 27°C. The number of RCs is 2 in CONV, 6 in DRBD, 4 in CCRBD and 12 in DCCRBD. The quantization error and the random variation caused by the delay units are included in the proposed circuit. The standard deviation of the SAE timing with the proposed scheme is 2.76ns, which is much smaller than the 6.817ns of CONV, the 3.619ns of DRBD and the 3.378ns of CCRBD.
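The percentage reductions quoted in the abstract and conclusions follow directly from these four standard deviations. The short calculation below reproduces them; the dictionary and function names are hypothetical and only the four values from the text are used.

```python
# Standard deviations of SAE timing reported in the text (ns)
sigma_ns = {"CONV": 6.817, "DRBD": 3.619, "CCRBD": 3.378, "DCCRBD": 2.76}

def reduction_percent(baseline, proposed):
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline

reductions = {
    name: reduction_percent(value, sigma_ns["DCCRBD"])
    for name, value in sigma_ns.items()
    if name != "DCCRBD"
}
for name, pct in reductions.items():
    print(f"DCCRBD vs {name}: {pct:.2f}% lower sigma")
```

The computed values agree with the quoted 59.51%, 23.73% and 18.29% to within about 0.01 percentage point, the small difference for DRBD being a matter of rounding versus truncation.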

Figure 8 shows the cycle time margin obtained by Monte Carlo simulation at different voltages. Generally, the CONV technique requires a timing margin to cover the timing deviation, which ensures that the chip can work in different environments. Here, we assume an SAE timing margin of 3 times the standard deviation. Moreover, the SAE timing is generally about half of the cycle time, as mentioned in [9,10]. According to Figure 7, at 0.6V supply voltage, the standard deviation ( $\sigma$ ) of the conventional SAE timing is 6.817ns; thus, the conventional timing margin is 40.9ns ( $3\sigma \times 2$ ). With the proposed DCCRBD technique, the timing margin is reduced to 16.56ns. Therefore, a 59.5% improvement in cycle time margin is expected from the proposed scheme. Additionally, as mentioned above, the cycle time is twice the SAE timing; thus, the conventional cycle time is 108.2ns (54.1ns $\times$ 2). Compared with the conventional technique, the cycle time with the proposed technique is therefore reduced by $\sim 45\%$ (i.e., (40.9 - 16.56) $\times 2/108.2$) owing to the 59.5% timing margin reduction. Similarly, at other supply voltages the cycle time is also reduced as the timing margin improves, as shown in Figure 8.
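The margin arithmetic in this paragraph can be retraced step by step. The sketch below uses only numbers given in the text; the function name `timing_margin` and its parameters are hypothetical, and the last expression is evaluated exactly as it is written in the text.

```python
def timing_margin(sigma_ns, n_sigma=3, cycle_factor=2):
    """Cycle-time margin: n_sigma times the SAE timing sigma, doubled because
    the SAE timing is taken as about half of the cycle time."""
    return n_sigma * sigma_ns * cycle_factor

margin_conv = timing_margin(6.817)   # about 40.9 ns
margin_prop = timing_margin(2.76)    # about 16.56 ns
margin_gain = (margin_conv - margin_prop) / margin_conv

cycle_conv = 54.1 * 2                # conventional cycle time from the text, 108.2 ns
# Cycle-time reduction, evaluated as the expression appears in the text
cycle_gain = (margin_conv - margin_prop) * 2 / cycle_conv

print(f"margin: {margin_conv:.2f} ns -> {margin_prop:.2f} ns ({100 * margin_gain:.1f}% smaller)")
print(f"cycle-time reduction per the text's expression: {100 * cycle_gain:.1f}%")
```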

FIGURE 7. Monte Carlo simulation of the SAE timing variation of the CONV, DRBD, CCRBD, and DCCRBD

FIGURE 8. Simulation results of cycle time margin improvement

Figure 9(a) shows the Monte Carlo simulation results of DCCRBD compared with those of CONV, DRBD and CCRBD at different process corners. This simulation is performed at 0.6V supply voltage and 27°C. Compared with that in CONV, the standard deviation of the SAE timing of the proposed DCCRBD is reduced by 59.50%, 60.10%, 56.70%, 47.90% and 36.90% at the SS, SF, TT, FS, and FF corners, respectively.

FIGURE 9. Standard deviation under different (a) process corners, (b) voltages, and (c) temperatures

Figure 9(b) shows the standard deviation of the SAE timing at different supply voltages. This simulation is performed at the SS corner and 27°C. Compared with that in CONV, the standard deviation of the SAE timing of the proposed DCCRBD is reduced by 59.50%, 59.21%, 61.81%, 46.84%, 82.04%, 80.75% and 50.91% at 0.6V, 0.7V, 0.8V, 0.9V, 1.0V, 1.1V and 1.2V supply voltages, respectively.

Figure 9(c) shows the standard deviation of the SAE timing at different temperatures. This simulation is performed at the SS corner and 0.6V supply voltage. Compared with that in CONV, the standard deviation of the SAE timing of the proposed DCCRBD is reduced by 35.50%, 49.30%, 59.51%, 55.50% and 57.67% at  $-40^{\circ}$ C,  $0^{\circ}$ C,  $27^{\circ}$ C,  $75^{\circ}$ C and  $125^{\circ}$ C, respectively.

To suppress the PVT variation of SAE more effectively, DCCRBD adds RCs, which may lead to additional power consumption. However, the number of additional RCs is quite small compared with the total number of SRAM storage cells, so the power consumed by the additional RCs accounts for a very small proportion of the overall SRAM power consumption. Considering the significant reduction of process variation, this power loss is acceptable.

5. Conclusions. A digitized cascade replica bitline delay technique is proposed in this paper to improve the process-variation resilience of the SRAM sense amplifier enable timing. Simulation results show that, compared with the conventional replica bitline delay, digitized replica bitline delay and cascade replica bitline delay techniques, the process variation is reduced by 59.51%, 23.73% and 18.29%, respectively, at a supply voltage of 600mV, 27°C and the SS process corner in TSMC 65nm technology. Also, the cycle time is  $\sim 45\%$  smaller than that of the CONV technique. However, the DRBD, CCRBD and proposed techniques consume more power than the conventional technique. Thus, the trade-off between power and efficiency will be considered in future work.

Acknowledgment. This work is supported by the National Natural Science Foundation of China (Grant No. 61474001).

## REFERENCES

- [1] T. Kobayashi, K. Nogami et al., A current-controlled latch sense amplifier and a static power-saving input buffer for low-power architecture, *IEEE Journal of Solid-State Circuits*, vol.28, no.4, pp.523-527, 1993.
- [2] A. Kawasumi, Y. Takeyama et al., A low-supply-voltage-operation SRAM with HCI trimmed sense amplifier, *IEEE Journal of Solid-State Circuits*, vol.45, no.11, pp.2341-2347, 2010.
- [3] J. Wu, J. Zhu et al., A multiple-stage parallel replica-bitline delay addition technique for reducing timing variation of SRAM sense amplifiers, *IEEE Trans. Circuits and Systems II: Express Briefs*, vol.61, no.4, pp.264-268, 2014.
- [4] H. Zhang and L. Lu, A low-voltage sense amplifier for embedded flash memories, *IEEE Trans. Circuits and Systems II: Express Briefs*, vol.62, no.3, pp.236-240, 2014.
- [5] J. Boley and B. Calhoun, Stack based sense amplifier designs for reducing input-referred offset, The 16th International Symposium on Quality Electronic Design, pp.1-4, 2015.

- [6] B. S. Amrutur and M. A. Horowitz, A replica technique for wordline and sense control in low-power SRAM's, *IEEE Journal of Solid-State Circuits*, vol.33, no.8, pp.1208-1219, 1998.
- [7] C. D. C. Arandilla and J. A. R. Madamba, Comparison of replica bitline technique and chain delay technique as read timing control for low-power asynchronous SRAM, *The 5th Asia Modelling Symposium*, pp.275-278, 2011.
- [8] U. Arslan, M. P. McCartney et al., Variation-tolerant SRAM sense-amplifier timing using configurable replica bitlines, *IEEE Custom Integrated Circuits Conference*, pp.415-418, 2008.
- [9] Y. Niki, A. Kawasumi et al., A digitized replica bitline delay technique for random-variation-tolerant timing generation of SRAM sense amplifiers, *IEEE Journal of Solid-State Circuits*, vol.46, no.11, pp.2545-2551, 2011.
- [10] C. Peng, Y. Tao et al., A novel cascade control replica-bitline delay technique for reducing timing process-variation of SRAM sense amplifier, *IEICE Electronics Express*, vol.12, no.5, pp.1012-1018, 2015.
- [11] V. K. Rajanna and B. Amrutur, A variation-tolerant replica-based reference-generation technique for single-ended sensing in wide voltage-range SRAMs, *IEEE Trans. Very Large Scale Integration (VLSI) Systems*, vol.24, no.5, pp.1663-1674, 2016.