# An Ultra Energy Efficient Nonvolatile Processor for Self-powered Sensor Platforms

## ABSTRACT

This paper presents a fabricated ultra-low-power nonvolatile processor and a self-powered sensor platform built on it. The design employs an efficient nonvolatile flip-flop controller and a reconfigurable voltage detection system that lets the processor adapt to power failures. A self-powered sensor is then built to demonstrate the processor's low-power properties. Measurement results show that the nonvolatile processor achieves a 70x energy saving in the wake-up action and a 19000x energy saving in the sleep action compared with conventional microprocessors. These capabilities make it highly practical for fine-grained power management and energy harvesting applications.

## Keywords

Nonvolatile processor, sleep/wake-up, leakage power

## 1. INTRODUCTION

In embedded and mobile applications, the computing system faces strict power consumption requirements. As the CMOS feature size scales down, leakage current becomes more and more severe. Some low-power processors cut off the power supply when idle to eliminate the leakage current. In that case, however, the states in volatile registers and local memories are lost, which makes system recovery inefficient or even impossible.

To solve the data backup problem in low-power processors, nonvolatile technology has been proposed. Some commercial microprocessors back up their state into a centralized nonvolatile memory (NVM) in low-power modes [?,?]. This achieves both zero standby leakage and efficient system recovery. However, the backup and recovery operations introduce nontrivial power and performance overheads, because data must be transferred from local registers to the peripheral NVM. In actual measurements, backing up 1KB of data to a centralized Flash or FRAM takes more than 1ms, which is too slow for fine-grained power gating or unexpected power failures.



Figure 1: Architecture comparison of NVP and processor with centralized NVM

To reduce the power and performance overheads induced by mass data transfer in centralized NVM architectures, the concept of a nonvolatile processor (NVP) [?,?,?,?] has been proposed. An NVP uses distributed nonvolatile flip-flops (FFs) to replace regular CMOS registers and memory cells. Because the system state is stored in local FFs, it can be stored and recalled in parallel regardless of the amount of data. Fig. 1 shows the difference between an NVP and a processor based on centralized NVM. In theory, an NVP can reduce the sleep and wake-up time to several nanoseconds, which provides the capability of "instant" sleep and wake-up. Furthermore, local data storage lowers the energy consumed in power on/off switching and gives the NVP high resilience to power failures and high flexibility for power management. Therefore, an NVP can support more efficient fine-grained power management and energy harvesting systems.

In this work we realize an NVP chip based on ferroelectric flip-flops. Moreover, we design a system that includes power failure detection and power management to suit the chip to low-power applications. The contributions of our design are as follows:

- 1. This design is the first fabricated nonvolatile processor with zero standby power, nanojoule-level backup/recall energy and microsecond-level sleep/wake-up time. Compared with existing industrial processors, our NVP achieves over 30-100x speedup in the wake-up/sleep time and over 70x energy savings.
- 2. The low switching overhead of the NVP is attributed to the careful design of the FF controller and the voltage detection system. Meanwhile, the NVP exhibits comparable performance and power consumption in normal operation.

The rest of this paper is organized as follows. Section 2 discusses the chip design flow and its circuit structure. Section 3 describes the system-level design, including the power management and the self-powered sensor platform. The chip measurement results are presented in Section 4. We conclude the paper in Section 5.

## 2. CHIP DESIGN

This section gives an overall description of the NVP chip design. We first present the chip design flow, then the overall chip architecture, and finally some detailed circuit structures.

### 2.1 NVP Design Flow

The basic differences between NVP design and conventional processor design are the local replacement of memory cells and the control of nonvolatile reads and writes. To implement these two steps, we propose a general NV-processor design flow, shown in Fig. 2. The first step divides the design into a volatile realm and a nonvolatile realm, that is, it chooses the memory units to be made nonvolatile. In our design, for the sake of recoverability, all registers and internal memory are chosen. However, this total replacement causes some area overhead. In the second step, we replace the chosen memory units with nonvolatile flip-flops (NVFFs). The replacement is performed at the gate level in three steps: I) the RTL circuit description is synthesized into a gate-level netlist with Synopsys Design Compiler; II) an in-house script replaces all volatile flip-flops with NVFFs; III) an NVFF controller is integrated to drive all NVFFs in the proper sequence during the backup/recall process. The NVFF controller is purpose-designed and is discussed in Section 2.3. The remaining steps are the same as in the conventional VLSI front-end design flow.
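Step II of the replacement can be illustrated with a minimal sketch. It is not the in-house script itself. It assumes a flat gate-level Verilog netlist with named port connections; the cell names `DFFX1` and `FEFFX1` and the way the extra control pins are wired are hypothetical, and only the signal names "RW", "PL" and "PCH" are taken from Section 2.3.

```python
import re

# Hypothetical cell names: a real flow would take these from the standard-cell
# library and the NVFF library that are actually in use.
VOLATILE_FF_CELL = "DFFX1"
NONVOLATILE_FF_CELL = "FEFFX1"

# Extra control pins assumed on the NVFF cell (signal names from Section 2.3).
NV_CONTROL_PINS = ("RW", "PL", "PCH")


def replace_flip_flops(netlist: str) -> tuple[str, int]:
    """Swap every volatile flip-flop instance for an NVFF instance.

    Returns the rewritten netlist and the number of replaced instances.
    """
    # Match "<cell> <instance> ( <named pin list> );" for the volatile cell only.
    pattern = re.compile(
        r"\b" + re.escape(VOLATILE_FF_CELL) + r"\s+(\w+)\s*\((.*?)\)\s*;",
        re.DOTALL,
    )

    def swap(match):
        instance = match.group(1)
        pins = match.group(2).strip()
        # Keep the original connections and append the controller signals.
        extra = ", ".join(f".{pin}({pin})" for pin in NV_CONTROL_PINS)
        return f"{NONVOLATILE_FF_CELL} {instance} ({pins}, {extra});"

    return pattern.subn(swap, netlist)
```

In this sketch every flip-flop is replaced, which matches the choice above of making all registers nonvolatile. A partial replacement would need an extra filter on the instance name.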



Figure 2: Design Flow from a Volatile Processor to a Nonvolatile One

### 2.2 NVP Architecture

Fig. 3 shows the block diagram of our NVP. It contains an 8051 core based on the Intel MCS-51 instruction set, which consists of an MCU controller, an 8-bit arithmetic logic unit (ALU), several timers and counters, a serial interface, a 128-byte register file and an 8K-byte static random access memory (SRAM). Two peripherals, the SPI and I2C modules, are connected to the core via a Wishbone bus. A JTAG module is added to support online debugging via scan chains. A Mode module controls the operating mode of the processor, which can be the volatile mode, the nonvolatile mode or the debug mode. The flip-flop controller (FFC) generates control signals for both the ferroelectric flip-flops (FeFFs) and the volatile flip-flops (VFFs). In this design, the SRAM is used as a remote data memory, so we do not replace it with a nonvolatile memory; it can be replaced with ferroelectric RAM (FeRAM) in the future. As we do not need to keep the states of the SPI and I2C modules, these two modules are implemented in the volatile logic domain.

Figure 3: Overall architecture of nonvolatile processor

### 2.3 Nonvolatile Flip-flop and Controller

The ferroelectric flip-flop (FeFF) used in this chip is shown in Fig. 4(a). It adopts a hybrid CMOS and ferroelectric technology, consisting of a standard master-slave D flip-flop (DFF) and a pair of backup ferroelectric capacitors (FeCaps). They are isolated by two CMOS switches, M1 and M2, controlled by the "RW" signal. In normal operating mode, M1 and M2 are open, so the FeFF works as a standard DFF. Therefore, no performance loss is introduced to the NVP during normal operation. When a sleep or wake-up signal is detected, the controller generates signals that make the FeFFs store or restore their states accordingly. In the store operation, M1 and M2 are closed, and the complementary "Dout" values are applied to the positive plates of the two FeCaps. "PL" is then pulled up to polarize the FeCaps into different states, so the data is stored in the pair of FeCaps. In the restore operation, M1 and M2 are closed again, and "PCH" is pulled up to short nodes "a" and "b", which places the back-to-back inverters in a metastable state. After that, "PL" is pulled up and "PCH" is pulled down simultaneously. The pair of FeCaps then drives nodes "a" and "b" with different currents until the back-to-back inverters settle to a stable "0" or "1" state. The complementary FeCaps and the differential architecture help to improve reliability and performance.

Figure 4: Hybrid structure of ferroelectric flip-flop

Figure 5: Block diagram of flip-flop controller

The flip-flop controller (FFC) generates the sequenced control signals for the FeFFs in sleep and wake-up actions. Fig. 5(a) shows the block diagram of the FFC. It is composed of a timing block and a signal-generating finite state machine (FSM). The timing block is self-timed by an inverter chain, and its three timers provide overflow signals (Tov1-Tov3) to the FSM. The FSM generates the control signals ("RW", "PL", "PCH") based on Tov1-Tov3 to meet the sequencing requirements. The FSM has two different state cycles, one for the sleep operation and one for the wake-up operation, selected by the input signal "Sleep/Wake-up". The "CG" signal gates the clock of both the FeFFs and the volatile FFs during the store and recall actions. Thanks to this clock gating, the NVP holds the data in the FeFFs steady during the store action, which prevents write uncertainty, and holds the system stable during the recall action, which guarantees precise recovery.
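The two state cycles of the FSM can be sketched as a small behavioral model. This is an illustration under stated assumptions rather than the fabricated controller: the signal names and the order of the "RW", "PL" and "PCH" transitions follow the store and restore descriptions in this section, but the assignment of Tov1-Tov3 to individual phases and the final release phase are assumptions, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Controls:
    CG: int   # 1 = clock gated
    RW: int   # 1 = switches M1/M2 closed
    PL: int
    PCH: int


IDLE = Controls(CG=0, RW=0, PL=0, PCH=0)

# Each phase is (outputs, timer overflow that ends the phase).
# The timer-to-phase mapping is assumed for illustration.
STORE_PHASES = (
    (Controls(CG=1, RW=1, PL=0, PCH=0), "Tov1"),  # apply Dout to the FeCaps
    (Controls(CG=1, RW=1, PL=1, PCH=0), "Tov2"),  # pull PL up to polarize
    (Controls(CG=1, RW=0, PL=0, PCH=0), "Tov3"),  # assumed release phase
)
RECALL_PHASES = (
    (Controls(CG=1, RW=1, PL=0, PCH=1), "Tov1"),  # short nodes a and b
    (Controls(CG=1, RW=1, PL=1, PCH=0), "Tov2"),  # PL up, PCH down together
    (Controls(CG=1, RW=0, PL=0, PCH=0), "Tov3"),  # assumed release phase
)


class FlipFlopController:
    """Behavioral model: outputs change only when the expected timer overflows."""

    def __init__(self):
        self.outputs = IDLE
        self._pending = []

    def start(self, sleep_wakeup):
        """Begin a cycle: 0 selects the sleep (store) cycle, 1 the wake-up (recall) cycle."""
        phases = RECALL_PHASES if sleep_wakeup else STORE_PHASES
        self._pending = list(phases)
        self.outputs = self._pending[0][0]
        return self.outputs

    def on_overflow(self, timer):
        """Advance to the next phase if `timer` is the overflow the current phase waits for."""
        if self._pending and self._pending[0][1] == timer:
            self._pending.pop(0)
            self.outputs = self._pending[0][0] if self._pending else IDLE
        return self.outputs
```

Note that "CG" stays asserted through every phase of both cycles and is released only when the controller returns to idle, which mirrors the clock gating described above.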

## 3. SYSTEM DESIGN

This section discusses the system-level design that supports the NVP chip in low-power applications. We first discuss the power management architecture, which supports active and passive power gating. We then describe the power failure detection circuit, which is critical for combining the NVP with an energy harvesting system. Finally, we present the self-powered sensor platform design.

### 3.1 Off-chip Power Management

The power management architecture is shown in Fig. 6. It supports both active and passive power gating, selected by the NVP's mode selection signal. In the active mode, the NVP manages its own power switching: it decides when to sleep and how long to stay powered off. To sleep, it first configures the timer to set the sleep time. The signal controller then switches off the voltage supply from the DC/DC converter and provides the "Sleep" signal, leaving a certain time for the data backup to complete. After the timer overflows, the signal controller switches the power supply back on and, after a delay that lets the clock stabilize, generates the "Wake-up" signal.

Figure 6: Active and passive power management architecture

Figure 7: Architecture of power failure detection block

In the passive mode, the NVP works under unexpected power failures, which occur in energy harvesting applications. In this case, a voltage detection circuit detects the falling and rising of the supply and generates the "Sleep/Wake-up" signal as soon as possible. A capacitor is required to slow down the voltage drop and guarantee enough time for data backup. Its capacitance must be chosen with care: too large a value slows down the sleep action, while too small a value cannot provide enough energy for data backup.
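The sizing tradeoff can be made concrete with a first-order estimate. The sketch below assumes an ideal capacitor that discharges from a starting voltage to a minimum operating voltage, and a constant-current load for the hold-up time; neither assumption comes from the measurements, and the function names are illustrative.

```python
def backup_energy_joules(capacitance_f, v_start, v_min):
    """Energy released by an ideal capacitor while its voltage falls from v_start to v_min."""
    return 0.5 * capacitance_f * (v_start ** 2 - v_min ** 2)


def holdup_time_seconds(capacitance_f, v_start, v_min, load_current_a):
    """Time for the voltage to fall from v_start to v_min under a constant-current load.

    The constant-current load is a simplifying assumption; a real chip draws
    a current that varies with voltage and activity.
    """
    if load_current_a <= 0:
        raise ValueError("load current must be positive")
    return capacitance_f * (v_start - v_min) / load_current_a
```

As a usage example, taking the 470nF lower bound reported in Section 4.4 and the 1.5V and 0.9V ends of the core voltage range in Table 1, `backup_energy_joules(470e-9, 1.5, 0.9)` gives about 338 nJ. With a purely hypothetical 1 mA load, `holdup_time_seconds(470e-9, 1.5, 0.9, 1e-3)` gives about 282 µs. A larger capacitor raises both figures, which is why it also slows down the voltage drop seen by the detector.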

### 3.2 Power Failure Detection

This section describes how the voltage detection circuit generates the "Sleep/Wake-up" signal when a power failure happens. We use a configurable circuit for this purpose.

The circuit structure is shown in Fig. 7. It contains two configurable units. The first is a switched capacitor array attached to the power line. This configurable capacitor, denoted as $C_{PL}$, provides the data backup energy to the NVP by keeping the voltage above the operating threshold after the power is cut off. The second, denoted as $C_{VD}$, is another switched capacitor array used in the voltage detection circuit. The voltage detection circuit detects both power failure and power recovery. It generates the "Sleep/Wake-up" signal (0 denotes sleep, 1 denotes wake-up) a certain time after "VDD" passes the detection threshold. The detection latency is set by trading off system reliability against backup speed. The control words for the switched capacitor arrays are given by external input switches. Measurement results showing how $C_{PL}$ and $C_{VD}$ affect the sleep/wake-up speed and power are given in Section 4.4.
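The behavior of the detector can be sketched as a discrete-time model. This is a simplified illustration, not the circuit: it assumes a single threshold for both falling and rising "VDD", and it represents the latency set by $C_{VD}$ as a number of samples (`latency_samples`, a hypothetical parameter) during which the new condition must persist before the output changes.

```python
def detect_sleep_wakeup(vdd_samples, threshold, latency_samples):
    """Return the Sleep/Wake-up level (0 = sleep, 1 = wake-up) for each VDD sample.

    The output follows the comparison VDD >= threshold, but it only switches
    once the comparison has disagreed with the current output for more than
    `latency_samples` consecutive samples.
    """
    signal = []
    current = 1 if vdd_samples and vdd_samples[0] >= threshold else 0
    pending = 0  # consecutive samples that disagree with the current output
    for vdd in vdd_samples:
        target = 1 if vdd >= threshold else 0
        if target != current:
            pending += 1
            if pending > latency_samples:
                current = target
                pending = 0
        else:
            pending = 0
        signal.append(current)
    return signal
```

In this model a longer latency rejects short glitches on "VDD" but delays the sleep signal, which is the reliability versus backup speed tradeoff described above.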

### 3.3 Self-powered Sensor Platform

The most novel application for demonstrating the low-power NVP is a self-powered sensor. The sensor consists of an energy harvesting module (e.g., a solar cell), a power management unit (PMU) and the nonvolatile processor (NVP), as shown in Fig. 8. A solar cell of only several square centimeters can provide 6 V and more than 5 mW under medium sunlight, which is sufficient for the processor. The PMU performs energy detection and voltage regulation. It measures the energy stored on the capacitor $C_{PL}$, generates activation signals for the NVP, and regulates the supply voltage to the NVP. The realized sensor platform based on the NVP is also shown in Fig. 8.

Figure 8: Architecture of energy driven sensor platform and its realization

Figure 9: Signal timing chart in sleep and wake-up actions

To better describe the working mechanism of the system, we draw the signal timing chart of the sleep and wake-up actions in Fig. 9. In the sleep action, when the PMU detects that the power is dropping, it first generates the sleep signal and maintains the power supply via the capacitor until the system state is stored in the nonvolatile cells, and then cuts off the power. In the wake-up action, the PMU detects that power has returned, supplies power to the NVP until the supply is stable, and then generates the wake-up signal to restart the NVP. According to the measured results, the wake-up action takes less than $100\,\mu s$ and the sleep action takes around $50\,\mu s$, which shows that our system can work under a frequently interrupted power supply.

## 4. MEASUREMENTS

### 4.1 Test Setup and Performance Overview

We name the nonvolatile processor **THU1010N**. It has been fabricated using ROHM's $0.13\mu m$ CMOS-ferroelectric hybrid process. Fig. 10(a) shows its photomicrograph. We test the processor with a suite of embedded benchmarks for sensor networks, which includes FFT, an FIR filter, AES encryption and the Zigbee MAC protocol. To test the nonvolatile function, we provide a square-wave power supply to the board, and the "Sleep/Wake-up" signal is generated automatically by the voltage detection circuit. Fig. ??(b) shows the testing platform of **THU1010N**. The performance measurements include the maximum operating frequency, the average power consumption, and the sleep/wake-up metrics under controlled power supplies. We list those measurements in the first column of Table 1.



Figure 10: Micrograph and general design statistics of THU1010N

Table 1 compares the NVP chip with a popular industrial processor, the "MSP430" [3], and an emerging FeRAM-based processor, the "MSP430FR series" [4]. The results show that our NVP chip has lower power consumption while achieving operating parameters comparable to those of the other two chips. Also, as a nonvolatile chip, the entire chip can be powered down, which leads to zero standby power. Moreover, our NVP's sleep/wake-up time is much shorter, showing a 100-1000x speedup in the sleep time and a 30-100x speedup in the wake-up time. Therefore, we conclude that the distributed nonvolatile architecture of the NVP provides much better sleep/wake-up performance than existing centralized nonvolatile storage.

Table 1: Overall Properties of Nonvolatile Processor Chip and Comparison Results

| Category           | Property                 | THU1010N                | TI MSP430 5-series (with Flash) | TI MSP430 (with FRAM) |
|--------------------|--------------------------|-------------------------|---------------------------------|-----------------------|
| General statistics | I/O pin number           | 100                     | 64-80                           | 24-40                 |
|                    | Memory capacity          | 1607-bit FeFF, 8KB SRAM | 2-16KB SRAM, 32-128KB Flash     | 16KB FRAM, 1KB SRAM   |
|                    | Non-volatilization level | Register level          | Memory level                    | Memory level          |
| Basic properties   | Max. clock freq.         | 25MHz                   | 25MHz                           | 24MHz                 |
|                    | VDD for core             | 0.9V-1.5V               | 1.8V-3.6V                       | 2-3.6V                |
|                    | Active power             | $160\mu W$ @ 1MHz       | $450\mu W$ @ 1MHz               | $200\mu W$ @ 1MHz     |
|                    | Standby power            | 0                       | $0.18\mu A$                     | $0.32\mu A$           |
| Sleep/wake-up time | Sleep time               | $7\mu s$                | $6ms$                           | $212\mu s$            |
|                    | Wake-up time             | $3\mu s$                | $3ms$                           | $310\mu s$            |

The sleep/wake-up times of the "MSP430" chips are the switching times between LPM4.5 mode and active mode (more details in [3,4]).

### 4.2 Sleep and Wake-up Energy

The energy consumed by each sleep and wake-up operation is a key property of a low-power chip. We compare the NVP with two other popular data backup methods: backup to off-chip Flash memory and backup to on-chip Flash memory. Table 2 shows the energy consumed to store and recall 1607 bits of data (the number of FeFFs in the NVP) with these three methods. Compared with the other two, our chip's method, i.e. the distributed NV memory architecture, decreases the energy consumption by 19000 times for the data store and by 74 times for the data recall. The results indicate that much higher energy efficiency can be obtained by using nonvolatile registers instead of a centralized nonvolatile memory.

Table 2: Data store and reload energy consumption of three different data backup locations.

Figure 11: NVP sleep and wake-up time under different voltage supply

### 4.3 Sleep and Wake-up Time

We measure the sleep and wake-up time of the processor in detail. To measure the chip-level sleep and wake-up time, we use a pattern generator to provide both the "VDD" and "Sleep/Wake-up" signals. In the experiments, the NVP executes a counting program under a square-wave power supply. By shortening the pulse width of "VDD" and the time interval between "VDD" and "Sleep/Wake-up", we can measure the minimal chip-level sleep and wake-up time. The results show that the sleep and wake-up times are $7\mu s$ and $3\mu s$, respectively, under a 1.5V power supply. The core voltage also has an impact on the sleep and wake-up time. Fig. 11 shows the sleep and wake-up time under different supply voltages. As can be seen, a lower voltage leads to slower operation, because the delays of both the FeFF and FFC circuits become larger. However, the sleep and wake-up times can still be kept below $20\mu s$ and $10\mu s$, respectively, even under a 0.8V supply voltage.
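The search for the minimal time can be sketched as a bisection over the pulse width. This is one possible way to organize such a sweep, not necessarily the procedure used in the measurements. The callback `passes` is hypothetical: on a real bench it would program the pattern generator with the given width, run the counting program, and report whether the count survived the power cycle. The sketch also assumes that any width larger than a passing width passes as well.

```python
def minimal_passing_width(passes, low_us, high_us, resolution_us):
    """Return a passing pulse width (in µs) within `resolution_us` of the minimum.

    `passes(width_us)` must return True when the chip still works at that width.
    `high_us` must be a passing width; `low_us` is a lower bound that is
    expected to fail.
    """
    if not passes(high_us):
        raise ValueError("the widest pulse already fails")
    # Invariant: high_us always passes, low_us is never known to pass.
    while high_us - low_us > resolution_us:
        mid = (low_us + high_us) / 2
        if passes(mid):
            high_us = mid
        else:
            low_us = mid
    return high_us
```

A bisection needs only a logarithmic number of bench runs, whereas a linear sweep with the same resolution needs one run per step. The sweep can be repeated at each supply voltage to obtain a curve of time against voltage.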

### 4.4 Unstable Power Tolerance

This section measures the chip's tolerance to an interrupted power supply. Fig. 12 shows the minimum power pulse width under different values of $C_{PL}$ and $C_{VD}$. The minimum power pulse width reflects the chip's robustness under an interrupted power supply. $C_{PL}$ affects the stabilization of "VDD", so the minimum power pulse width decreases as $C_{PL}$ gets smaller. However, if $C_{PL}$ is smaller than 470nF, the chip does not have enough energy to back up its system state. $C_{VD}$ affects the voltage detection time, so the minimum power pulse width decreases as $C_{VD}$ gets smaller. $C_{VD}$ cannot be smaller than 10pF because of power and clock stabilization concerns. As can be seen, the minimal required power-on time ($> 100\mu s$) is much larger than the chip's intrinsic sleep and wake-up time ($< 10\mu s$), because the dominant factor at the system level is the speed of power and clock stabilization rather than the speed of the on-chip circuits.

Figure 12: The minimal power-on time for system accuracy under different capacitance values

## 5. CONCLUSIONS

Nonvolatile processors, as an emerging computing technique, open new approaches to low-power computation and high resilience to power failures. We fabricated a nonvolatile processor based on ferroelectric flip-flops and evaluated its properties under some extreme conditions. In contrast to existing microprocessors, our design has clear advantages in leakage power, sleep and wake-up power and speed, and power failure tolerance, which make it well suited to low-power applications.

## 6. REFERENCES

- [1] C. Holland. First MRAM-based FPGA taped-out. Website: http://www.eetimes.com/General/DisplayPrintViewContent?contentItemId=4200035.
- [2] Rohm Co., Ltd. Rohm Demonstrates Nonvolatile CPU. Website: http://techon.nikkeibp.co.jp/english/NEWS\_EN/20071004/140206
- [3] TI. Datasheet of MSP430F522x mixed signal microcontrollers. 2009.
- [4] TI. Datasheet of MSP430FR573x mixed signal microcontrollers. 2011.
- [5] W. Yu, S. Rajwade, S. Wang, B. Lian, G. Suh, and E. Kan. A non-volatile microcontroller with integrated floating-gate transistors. In *Proceedings of the 5th Workshop on Dependable and Secure Nanocomputing*, pages 1–4. ACM Press, 2011.
- [6] W. Zhao, E. Belhaire, V. Javerliac, C. Chappert, and B. Dieny. Evaluation of a non-volatile FPGA based on MRAM technology. In *2006 IEEE International Conference on Integrated Circuit Design and Technology (ICICDT'06)*, pages 1–4. IEEE, 2006.