Publications - Conference Papers

[C15] 2011 Xi Tian, Fei Qiao, Dong Zaiwang, Liu Yujun, Zhao Yuting, "Design Methodology for Multipliers with Active Leakage and Dynamic Power Reduction," ICCDA 2011, Xi'an, China, 2011.

Abstract: A novel design methodology for multipliers that reduces both active leakage and dynamic power using dynamic power gating is presented. Sleep transistors are inserted between the real and virtual ground rails of various parts of the multiplier, which can be selectively turned on and off. On-chip sleep signals are extracted from the multiplier input signal that has the larger dynamic range. By detecting the magnitude of this input signal, the idle parts of the multiplier are identified and power gating is applied dynamically, even while the multiplier is performing useful computation.

Keywords: multiplier; active leakage; dynamic power; power gating
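
As a rough illustration of the magnitude-detection idea in [C15], the following Python sketch models an unsigned array multiplier in which each partial-product row corresponds to one bit of the wider-range operand. Rows above that operand's most significant set bit contribute nothing, so they are treated here as candidates for gating. The row-per-bit model, the 16-bit default width, and the names `idle_rows` and `gated_multiply` are illustrative assumptions, not the circuit described in the paper.

```python
def idle_rows(x: int, width: int = 16) -> int:
    """Count the high-order partial-product rows that are all zero for operand x."""
    if not 0 <= x < (1 << width):
        raise ValueError("operand out of range for the assumed width")
    return width - x.bit_length()


def gated_multiply(x: int, y: int, width: int = 16) -> tuple[int, int]:
    """Multiply by summing only the active rows; return (product, rows_gated)."""
    active = width - idle_rows(x, width)
    product = 0
    for i in range(active):  # rows at index >= active are assumed to stay asleep
        if (x >> i) & 1:
            product += y << i
    return product, width - active
```

In this model a small-magnitude operand such as 5 leaves 13 of the 16 rows idle, which is the kind of opportunity the abstract describes exploiting while the multiplier is still producing a useful result.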

[C14] 2011 Zidong Du, Bingbing Xia, Fei Qiao, and Huazhong Yang, "System-Level Evaluation of Video Processing System Using SimpleScalar-based Multi-core Processor Simulator," in Proc. of 10th ISADS 2011, pp. 256-259, Tokyo and Hiroshima, Japan, March 23-27, 2011. (Oral Presentation)

Abstract: A multi-core processor simulation platform is an important tool for system-level design and evaluation in modern multi-core processor design. In this paper, a multi-core processor simulator is proposed by modifying SimpleScalar_v3.0 to simulate parallelized multi-core programs. Shared memory is used for communication between the cores and serves as the communication network among the separate parts of the parallelized program. Two simulators are designed for different kinds of usage: one for functional simulation and the other for simulating a system with a two-level cache. The mismatch of the simulator is less than 10% on average, and the presented simulator is used to evaluate high-performance video processing systems.

Keywords: Multi-core processor; Simulator; SimpleScalar; Video processing

[C13] 2011 Ni Zhou, Fei Qiao, Huazhong Yang, and Hui Wang, "Low-Power Off-Chip Memory Design for Video Decoder Using Embedded Bus-Invert Coding," in Proc. of 10th ISADS 2011, pp. 251-255, Tokyo and Hiroshima, Japan, March 23-27, 2011. (Oral Presentation)

Abstract: In this paper, a simple, efficient, low-power off-chip memory design is proposed. It fully exploits the features of DRAM and of video applications, and it avoids the algorithmic complexity and system modifications required by embedded compression, a popular way to decrease the power consumption of off-chip memory. Integrating the scheme into a video decoder does not add any video decoding complexity. The design adopts the simple bus-invert encoding scheme. Based on the fact that a logic '0' bit consumes less power than a logic '1' bit, bus-invert encoding is applied to the data transferred between the video decoder and the off-chip memory. Meanwhile, the fault tolerance of the human eye and the lossy nature of video decoding are exploited to accommodate the extra flag bit of the encoding scheme in off-chip SDRAM, which has a fixed bit width and is less flexible than on-chip SRAM. The scheme is integrated into an MPEG-2 decoder system. Experimental results show that it achieves a 20%-35% reduction in the power consumption of logic '1' bits, and that the objective image quality shows about a 1.5 dB PSNR improvement on average.

Keywords: low power; bus-invert encoding scheme; off-chip SDRAM memory; MPEG-2 decoder; fault tolerance; lossy processing
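
To make the bus-invert idea in [C13] concrete, here is a minimal Python sketch of the ones-minimizing variant suggested by the abstract: a word is inverted whenever more than half of its bits are '1', and a flag records the inversion. The 8-bit width and the function names are assumptions chosen for illustration. The sketch keeps the flag as a separate value and does not attempt to reproduce how the paper embeds the flag bit within fixed-width SDRAM words.

```python
def bus_invert_encode(word: int, width: int = 8) -> tuple[int, int]:
    """Return (data, flag); data is inverted when the word has more 1s than 0s."""
    mask = (1 << width) - 1
    word &= mask
    ones = bin(word).count("1")
    if ones > width // 2:
        return (~word) & mask, 1  # inverted word now carries fewer '1' bits
    return word, 0


def bus_invert_decode(data: int, flag: int, width: int = 8) -> int:
    """Undo the inversion indicated by the flag."""
    mask = (1 << width) - 1
    return (~data) & mask if flag else data & mask
```

With this rule no encoded 8-bit data word carries more than four '1' bits, and decoding with the flag always recovers the original word.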

[C12] 2011 Bingbing Xia, Fei Qiao, Huazhong Yang and Hui Wang, "An Efficient Methodology for Transaction-Level Design of Multi-core H.264 Video Decoder," in Proc. of ICCE 2011, pp. 399-400, Las Vegas, USA, 2011. (Oral Presentation)

Abstract: The H.264 video decoder is a good choice for embedded instruments because of its higher compression ratio than MPEG-2, but it also places higher demands on run-time computational resources. Multi-core systems are the future of embedded processor design because of their power efficiency and multi-thread parallelization, and they fit the requirements of this decoder well. To simulate and evaluate the performance of such application-specific multi-core systems effectively, a method based on the combination of a TLM language (SystemC) and a shared-memory parallel programming model (OpenMP) is given. Experiments show that it can simulate the system in a short time and, more importantly, that it can help analyze the efficiency of each task-parallelization strategy. After optimization, the speedup ratio for slice decoding reaches about 3.06 on average on a 4-core system.
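
For readers who want to put the reported [C12] figure in context, the short Python sketch below converts a speedup of 3.06 on 4 cores into a parallel efficiency and, under the additional assumption that Amdahl's law applies, into an implied parallel fraction. Amdahl's law is not mentioned in the abstract; it is used here only as a common back-of-the-envelope model, and the function names are hypothetical.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear speedup that was actually achieved."""
    return speedup / cores


def amdahl_parallel_fraction(speedup: float, cores: int) -> float:
    """Solve S = 1 / ((1 - p) + p / n) for the parallel fraction p."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)


efficiency = parallel_efficiency(3.06, 4)        # about 0.765
fraction = amdahl_parallel_fraction(3.06, 4)     # about 0.90
```

Under these assumptions the reported result corresponds to roughly 76.5% parallel efficiency.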

[C11] 2010 Chang Li, Fei Qiao and Huazhong Yang, "Low Power Cache Architecture with Security Mechanism," in Proc. of ICETC 2010, Shanghai, China, Jun 22-24, 2010.

Abstract: Embedded cryptographic devices suffer not only from traditional physical side-channel attacks but, more recently, also from software cache-based side-channel attacks. These attacks can easily obtain the user's confidential data through information leakage in caches and do not require any special instruments. Among existing countermeasures, software solutions can defend against the attacks perfectly but lead to significant performance degradation, while hardware solutions are very effective in reducing performance overhead but have complex structures. This paper presents a novel, easily implemented cache architecture that adds a small cache and adopts a specific operation mechanism. Compared with a traditional cache architecture, it reduces the miss rate by 20% to 50% and power consumption by about 8.5% while remaining secure. Both theoretical analysis and experimental results are presented. In the experiments, the MiBench suite is used to evaluate the miss rate and energy consumption of the cache, and the security of the cache is also analyzed. Synthesis results from Synopsys Design Compiler for the RTL cache code in 0.18 μm CMOS technology are also provided.

[C10] 2010 Bingbing Xia, Fei Qiao, Huazhong Yang, and Hui Wang, "A Fault-tolerant Structure for Reliable Multi-core Systems Based on Hardware-Software Co-design," in Proc. of ISQED 2010, San Jose, CA, USA, March 22-24, 2010.

Abstract: To cope with soft errors and make full use of the multi-core system, this paper gives an efficient fault-tolerant hardware and software co-designed architecture for multi-core systems. With a modest number of test patterns, it requires less than 33% of the hardware resources compared with traditional hardware redundancy (TMR) and less than 50% of the time compared with traditional software (time) redundancy. It is therefore a good choice of fault-tolerant architecture for future highly reliable multi-core systems.
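
For reference, the TMR baseline that [C10] compares against can be summarized by its voter. The Python sketch below shows a bitwise majority vote over three redundant results; it illustrates only the conventional baseline, not the co-designed architecture proposed in the paper, and the function name is an assumption.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three redundant results (classic TMR voter)."""
    return (a & b) | (a & c) | (b & c)
```

A single corrupted copy is outvoted bit by bit, which is why TMR masks one faulty module at the cost of triplicating the hardware.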

[C9] 2010 Hongli Gao, Fei Qiao, Huazhong Yang, "Efficient 5/3-DWT Based Embedded Compression Algorithm for H.264 High Definition Decoder," in Proc. of ICCE 2010 Conference, Las Vegas, USA, 2010.

Abstract: To reduce the external memory cost, an efficient 5/3-DWT based embedded compression algorithm is proposed for the H.264 decoder. Decoded frames are decomposed into 4×4 blocks, which are then compressed into 32-bit or 64-bit segments. We achieve a compression ratio of 28~33% with only slight quality degradation.
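
As background for [C9], the sketch below shows one level of the standard reversible (integer) 5/3 lifting transform applied to a single row, such as one row of a 4×4 block. It is the textbook lifting formulation with symmetric boundary extension, offered under the assumption that the paper builds on this transform; the quantization and the packing into 32-bit or 64-bit segments described in the abstract are not reproduced, and the function names are illustrative.

```python
def dwt53_forward(x: list[int]) -> tuple[list[int], list[int]]:
    """One-level integer 5/3 lifting; returns (low-pass s, high-pass d)."""
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("expected an even-length row with at least 2 samples")
    half = n // 2
    d = []
    for i in range(half):  # predict step: odd samples minus the average of even neighbors
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]  # symmetric extension
        d.append(x[2 * i + 1] - ((x[2 * i] + right) >> 1))
    s = []
    for i in range(half):  # update step: even samples plus a rounded quarter of the details
        left_d = d[i - 1] if i > 0 else d[0]  # symmetric extension
        s.append(x[2 * i] + ((left_d + d[i] + 2) >> 2))
    return s, d


def dwt53_inverse(s: list[int], d: list[int]) -> list[int]:
    """Exactly undo dwt53_forward by reversing the lifting steps."""
    half = len(s)
    x = [0] * (2 * half)
    for i in range(half):
        left_d = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - ((left_d + d[i] + 2) >> 2)
    for i in range(half):
        right = x[2 * i + 2] if i + 1 < half else x[2 * i]
        x[2 * i + 1] = d[i] + ((x[2 * i] + right) >> 1)
    return x
```

Because each lifting step is undone exactly, the transform itself is lossless; any quality degradation in a scheme built on it would come from later stages such as quantization or truncation.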

[C8] 2009 Fei Qiao, Yuan Zhou, Xiang Xie, and Huazhong Yang, "A Programmable DCO-Based Fast-Locking Clock Generator," in Proc. of ISPACS 2009 Conference, pp. 93-98, Kanazawa, Japan, 2009.

Abstract: A programmable DCO-based fast-locking clock generator is presented. With a resettable DCO, the clock generator achieves jitter performance similar to that of a conventional MDLL and avoids the initial delay constraints by resetting the output clock every two reference cycles. Compared with previous work, a shorter locking time is obtained. The proposed clock generator is simulated in a generic 1.8 V, 0.18 μm CMOS process. The clock multiplication ratio can be programmed from 2 to 15. The frequency ranges of the input and output clocks are 16.7~212.5 MHz and 250~425 MHz, respectively, and the generator dissipates less than 32 mW at all operating frequencies.
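
The frequency figures quoted for [C8] can be cross-checked with a small calculation. Assuming the output frequency is simply the input frequency times the programmed ratio, an output range of 250 to 425 MHz and ratios from 2 to 15 imply the input range computed below. The constant and function names are illustrative assumptions.

```python
OUT_MIN_MHZ, OUT_MAX_MHZ = 250.0, 425.0  # output range quoted in the abstract


def input_range_mhz(ratio: int) -> tuple[float, float]:
    """Input frequencies that keep ratio * f_in inside the quoted output range."""
    if not 2 <= ratio <= 15:
        raise ValueError("ratio is programmable from 2 to 15")
    return OUT_MIN_MHZ / ratio, OUT_MAX_MHZ / ratio


lowest_input = input_range_mhz(15)[0]   # 250 / 15, about 16.7 MHz
highest_input = input_range_mhz(2)[1]   # 425 / 2 = 212.5 MHz
```

The two extremes match the 16.7 and 212.5 MHz input limits given in the abstract.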

[C7] 2009 Sisi Tan, Fei Qiao, Bingbing Xia, Huazhong Yang, and Hui Wang, "A Functional Model of SystemC-Based MPEG-2 Decoder with Heterogeneous Multi-IP-Cores and Hybrid-Interconnections Architecture," 2nd International Congress on Image and Signal Processing (CISP 2009), pp. 1-5, Tianjin, China, 2009.

Abstract: In this paper, a functional model of a SystemC-based MPEG-2 decoder with heterogeneous multi-IP-cores and hybrid interconnections is presented. To bring application-specific features into the design flow, three important aspects are analyzed: function partitioning, parameter sharing, and interconnection topology, which are the key technical difficulties in the system-level design of a video decoder. A method for determining function units is proposed that balances the traffic load on the on-chip communication subsystem. To handle the heavy transmission burden, a hybrid interconnection combining a bus with point-to-point (P2P) links is applied. Additionally, to enhance decoder performance, frequently transmitted and updated parameters are extracted and stored in on-chip shared memories. The presented application-specific architecture is shown to be efficient using a system-level model in a SystemC-based environment. Moreover, the topology and design flow can serve as a guideline for the design of such application-specific Systems-on-Chip (SoC) and establish a quick evaluation method.

[C6] 2008 Hongli Gao, Fei Qiao, Huazhong Yang, "A Lossless Memory Reduction and Efficient Frame Storage Architecture for HDTV Video Decoder," in Proc. Int. ICALIP2008 Conference, pp. 593-598, Shanghai, 2008.

Abstract: A novel lossless frame recompression method and an efficient memory address mapping scheme for frame storage are proposed, which reduce the read/write and row activation operations of the external memory. The proposed scheme has been verified with an MPEG-2 based HDTV video decoder. Without degrading video quality or increasing the number of bytes read from off-chip memory, the number of bytes written to the memory is reduced by about 50% in comparison with the conventional decoder. By storing the luma and chroma components of the same macroblock in one row of different banks, together with the ten neighboring macroblocks, the number of logical row changes can be cut down to nearly 5% of that of the conventional scheme.

[C5] 2006 Hongli GAO, Fei QIAO, Dingli WEI, Huazhong YANG, "A Novel Low-Power and High-Speed Master-Slave D Flip-Flop," in Proc. Int. TENCON 2006 Conference, 2006.

Abstract: A novel low-power and high-speed master-slave D flip-flop (MSDFF) is proposed in this paper. With no clocked inverter on the critical path, the operating speed of the flip-flop is improved. By employing pseudo-NAND logic in the slave stage, the flip-flop has a smaller clock capacitance load, which helps to reduce power consumption. The proposed flip-flop is verified in GSMC 1.5 V, 0.15 μm CMOS technology. Compared with the widely used conventional D flip-flop, a 48% saving in power-delay product (PDP) is achieved, and both the power consumption and the transition delay are better than those of many other flip-flops.

[C4] 2006 Yoshitaka Ueda¹, Hideki Yamauchi¹, Mamoru Mukuno¹, Shinji Furuichi¹, Mayumi Fujisawa¹, Fei Qiao², and Huazhong Yang², "Multimedia Application Signal Processor with 6mW at MPEG Decoding Employing Conditional Pre-charged Flip-Flop," IEEE International Solid-State Circuits Conference (ISSCC 2006), no. 22.7, pp. 413-415, 662, 2006. (¹ Sanyo; ² Tsinghua University)

Abstract: Techniques that ensure low power usage are very important for battery-driven multimedia processors used in wireless or multimedia applications. However, the latest multimedia applications, such as H.264/MPEG4/AAC/MP3 audio/video systems [5], are computationally intensive, placing ever greater demands on power consumption and forcing further innovation in low-power techniques. Techniques to realize a low-power multimedia application signal processor are presented, namely: (1) a parallel processing DSP for low-voltage operation, (2) multiple power domains, and (3) a conditional pre-charge flip-flop. The combination of these techniques reduces the power dissipation of the multimedia signal processor by 72.5% without area penalty or speed degradation, and it requires only a single power supply. Furthermore, it allows the use of conventional CAD tools.

[C3] 2005 Huazhong YANG, Fei QIAO, Gang Huang, Hui WANG, "A Low-Swing Differential Interface Circuit for High-Speed On-Chip Asynchronous Interconnection," in Proc. Int. ASICON'05 Conference, pp., IEEE, Shanghai, 2005.10.24~27 (Oral Presentation).

Abstract: A novel low-swing interface circuit for asynchronous interconnection is proposed in this paper. It uses a level-triggered differential latch to recover digital signals with an ultra-low voltage swing of less than 50 mV, and the driver part of the interface circuit is optimized for low power using the Driver-Array method [1]. The proposed circuit consumes less power than previously reported designs and can work at up to 500 MHz. It is simulated and fabricated in SMIC 0.18-μm 1.8-V digital CMOS technology.

[C2] 2005 Fei QIAO, Huazhong YANG, Hui WANG, "Low Power Switched-Capacitor Circuits Powered By AC-Power Supply," in Proc. Int. ICCCAS'05 Conference, pp. 1075-1078, IEEE, HongKong, 2005.05.27-31 (Oral Presentation).

Abstract: A novel design method for low-power switched-capacitor (SC) circuits is presented. The new SC circuits make full use of the characteristics of switched-capacitor circuits and can be powered directly by an alternating-current (AC) power supply. Compared with traditional direct-current-powered SC (DCPSC) circuits, AC-powered SC (ACPSC) circuits achieve a power saving ratio of up to 40% without obvious damage to the settling behavior. The design is simulated and fabricated in CSMC 5-V 0.6-μm technology.

[C1] 2003 Fei QIAO, Huazhong YANG, Hui WANG, "Design Of Low Power Buffer Using Driver-array For On-Chip IPs Interconnection," in Proc. Int. ASICON'03 Conference, pp. 1218-1221, IEEE, Beijing, 2003.10.22~24 (Oral Presentation).

Abstract: A novel design method for low-power buffers is presented. The design flow uses a driver-array to optimize the equivalent multi-stage buffer and inserts an additional inverter to keep the same fan-out for the logic signal. It avoids blind sizing decisions when designing interconnection buffers and achieves savings of 4.96%, 42.15%, and 22.30% in delay, area, and power consumption, respectively.

Keywords: Low power, Interconnection buffer, Multi-stage buffer, Driver-array

Copyright © 2006 - 2008   Fei Qiao, Tsinghua University, Beijing, P.R.C. All rights reserved

Last Update : 2012-04-30
