Publications - Conference Papers

[C15] 2011. Xi Tian, Fei Qiao, Dong Zaiwang, Liu Yujun, Zhao Yuting, "Design Methodology for Multipliers with Active Leakage and Dynamic Power Reduction," ICCDA 2011, Xi'an, China, 2011.
Abstract: A novel design methodology for reducing both the active leakage and the dynamic power of multipliers using dynamic power gating is presented. Sleep transistors, which can be selectively turned on and off, are inserted between the real and virtual ground rails of the various parts of the multiplier. On-chip sleep signals are extracted from the multiplier input that has the larger dynamic range. By detecting the magnitude of this input, the idle parts of the multiplier are identified, and power gating is applied dynamically even while the multiplier is performing useful computation.
Keywords: multiplier; active leakage; dynamic power; power gating
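
The abstract does not say how the input magnitude is mapped to sleep signals. A minimal sketch, assuming an unsigned array multiplier whose partial-product rows are grouped into separately gated blocks (the grouping, the unsigned encoding, and the names `sleep_signals` and `gated_multiply` are all hypothetical): a group can sleep when the wide-range operand is too small to reach it, because every partial product in that group is then zero.

```python
def sleep_signals(b: int, width: int, group_bits: int) -> list[bool]:
    """One sleep flag per group of partial-product rows (True = power-gated).

    Group g covers operand bits [g * group_bits, (g + 1) * group_bits).
    Magnitude detection: a group is idle when every operand bit from the
    group's lowest bit upward is zero.
    """
    if not 0 <= b < (1 << width):
        raise ValueError("operand does not fit in the given unsigned width")
    groups = (width + group_bits - 1) // group_bits
    return [(b >> (g * group_bits)) == 0 for g in range(groups)]


def gated_multiply(a: int, b: int, width: int, group_bits: int) -> int:
    """Shift-and-add model that skips rows belonging to sleeping groups."""
    sleep = sleep_signals(b, width, group_bits)
    product = 0
    for i in range(width):
        if sleep[i // group_bits]:
            continue  # gated row: its partial product would be zero anyway
        if (b >> i) & 1:
            product += a << i
    return product


# A 16-bit operand of 5 only reaches the lowest group of 4 rows.
print(sleep_signals(5, 16, 4))         # [False, True, True, True]
print(gated_multiply(1234, 5, 16, 4))  # 6170
```

Signed operands would need sign-extension detection instead of leading-zero detection; that case is not covered here.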

[C14] 2011. Zidong Du, Bingbing Xia, Fei Qiao, and Huazhong Yang, "System-Level Evaluation of Video Processing System Using SimpleScalar-based Multi-core Processor Simulator," in Proc. of 10th ISADS 2011, pp. 256-259, Tokyo and Hiroshima, Japan, March 23-27, 2011. (Oral Presentation)
Abstract: A multi-core processor simulation platform is an important tool for system-level design and evaluation in modern multi-core processor design. In this paper, a multi-core processor simulator is built by modifying SimpleScalar_v3.0 to simulate parallelized multi-core programs. Shared memory is used for communication between the cores, serving as the communication network among the separate parts of the parallelized program. Two simulators are designed for different uses: one for functional simulation and the other for simulating a system with a two-level cache. The mismatch of the simulator is less than 10% on average, and the simulator is used to evaluate high-performance video processing systems.
Keywords: Multi-core processor; Simulator; SimpleScalar; Video processing
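
The abstract does not define "mismatch". One plausible reading, stated here only as an assumption, is the mean relative error of a per-benchmark metric (for example, cycle count) between the simulator and some reference. The sketch below computes that quantity; the function name and the sample numbers are hypothetical.

```python
def mean_relative_mismatch(simulated: list[float], reference: list[float]) -> float:
    """Average of |simulated - reference| / reference over paired runs."""
    if len(simulated) != len(reference) or not reference:
        raise ValueError("need equally sized, non-empty lists")
    errors = [abs(s - r) / r for s, r in zip(simulated, reference)]
    return sum(errors) / len(errors)


# Hypothetical cycle counts for three benchmarks (not data from the paper).
sim = [1_100_000, 1_900_000, 3_050_000]
ref = [1_000_000, 2_000_000, 3_000_000]
print(f"{mean_relative_mismatch(sim, ref):.1%}")
```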

[C13] 2011. Ni Zhou, Fei Qiao, Huazhong Yang, and Hui Wang, "Low-Power Off-Chip Memory Design for Video Decoder Using Embedded Bus-Invert Coding," in Proc. of 10th ISADS 2011, pp. 251-255, Tokyo and Hiroshima, Japan, March 23-27, 2011. (Oral Presentation)
Abstract: In this paper, a simple, efficient, low-power off-chip memory design is proposed. It fully exploits the characteristics of DRAM memory and video applications, and it avoids the algorithmic complexity and system modifications required by embedded compression, a popular way to reduce the power consumption of off-chip memory. Integrating the scheme into a video decoder adds no extra video decoding complexity. The scheme adopts simple bus-invert encoding: because a logic '0' bit consumes less power than a logic '1' bit, bus-invert encoding is applied to the data transferred between the video decoder and the off-chip memory. Meanwhile, the fault tolerance of the human eye and the lossy nature of video decoding are exploited to eliminate the extra flag bit of the encoding scheme, which is a problem in off-chip SDRAM because its bit width is fixed and less flexible than that of on-chip SRAM. The scheme is integrated into an MPEG-2 decoder system. Experimental results show that it achieves a 20%-35% reduction in the power consumption of logic '1' bits, and the objective image quality improves by about 1.5 dB PSNR on average.
Keywords: low power; bus-invert encoding scheme; off-chip SDRAM memory; MPEG-2 decoder; fault tolerance; lossy processing
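
Bus-invert coding in the form the abstract describes (minimizing logic '1' bits rather than bus transitions) can be sketched as follows. This assumes an 8-bit data word and keeps the explicit flag bit; how the paper removes that flag by exploiting the error tolerance of video is not described and is not modeled. The function names are hypothetical.

```python
def bus_invert_encode(word: int, width: int = 8) -> tuple[int, int]:
    """Invert the word when more than half of its bits are '1'.

    Returns (coded_word, flag); flag = 1 tells the reader to invert back.
    """
    mask = (1 << width) - 1
    word &= mask
    ones = bin(word).count("1")
    if ones > width // 2:
        return (~word) & mask, 1
    return word, 0


def bus_invert_decode(coded: int, flag: int, width: int = 8) -> int:
    """Undo the optional inversion using the flag bit."""
    mask = (1 << width) - 1
    return (~coded) & mask if flag else coded & mask


coded, flag = bus_invert_encode(0xF7)  # 1111_0111 has seven '1' bits
print(hex(coded), flag)                # 0x8 1
assert bus_invert_decode(coded, flag) == 0xF7
```

With this rule a coded 8-bit word never carries more than four '1' bits, which is where the saving on '1'-bit power comes from.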

[C12] 2011. Bingbing Xia, Fei Qiao, Huazhong Yang and Hui Wang, "An Efficient Methodology for Transaction-Level Design of Multi-core H.264 Video Decoder," in Proc. of ICCE 2011, pp. 399-400, Las Vegas, USA, 2011. (Oral Presentation)
Abstract: The H.264 video decoder is attractive for embedded devices because of its higher compression ratio than MPEG-2, but it also places higher demands on run-time computational resources. Multi-core systems are the future of embedded processor design because of their power efficiency and multi-thread parallelism, and they fit the requirements of this decoder well. To simulate and evaluate the performance of such application-specific multi-core systems effectively, a method combining a TLM language (SystemC) with a shared-memory parallel programming model (OpenMP) is presented. Experiments show that it can simulate the system effectively in a short time and, more importantly, that it can help analyze the efficiency of each task-parallelization strategy. After optimization, the average speedup of slice decoding is about 3.06 on a 4-core system.
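
To put the reported 3.06 average speedup on 4 cores in perspective, the arithmetic below computes the corresponding parallel efficiency and, under the simplifying assumption that Amdahl's law applies (the paper does not say it does), the implied parallelizable fraction of slice decoding. The function names are hypothetical.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of the ideal linear speedup actually achieved."""
    return speedup / cores


def amdahl_parallel_fraction(speedup: float, cores: int) -> float:
    """Solve S = 1 / ((1 - p) + p / N) for the parallel fraction p."""
    return (1 - 1 / speedup) / (1 - 1 / cores)


s, n = 3.06, 4
print(f"efficiency = {parallel_efficiency(s, n):.1%}")        # 76.5%
print(f"implied p  = {amdahl_parallel_fraction(s, n):.3f}")   # 0.898
```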

[C11] 2010. Chang Li, Fei Qiao and Huazhong Yang, "Low Power Cache Architecture with Security Mechanism," in Proc. of ICETC 2010, Shanghai, China, Jun 22-24, 2010.
Abstract: Embedded cryptographic devices suffer not only from traditional physical side-channel attacks but, more recently, also from software cache-based side-channel attacks. These attacks can easily obtain a user's confidential data through information leakage in caches and do not require any special instruments. Among existing countermeasures, software solutions can fully defend against the attacks but cause significant performance degradation, while hardware solutions greatly reduce the performance overhead but have complex structures. This paper presents a novel, easily implemented cache architecture that adds a small cache and adopts a specific operating mechanism. Compared with a traditional cache architecture, it reduces the miss rate by 20% to 50% and the power consumption by about 8.5%, while remaining secure. Both theoretical analysis and experimental results are presented. In the experiments, the MiBench suite is used to evaluate the miss rate and energy consumption of the cache, and its security is also analyzed. Synthesis results from Synopsys Design Compiler for the RTL cache code in 0.18 μm CMOS technology are also reported.
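
The abstract does not describe the added small cache or its operating mechanism. As a hedged illustration of why a small auxiliary cache can lower the miss rate of a conventional cache, the sketch below models a classic victim cache, a well-known technique used here only as a stand-in; it says nothing about the paper's security mechanism. All names and parameters are hypothetical.

```python
from collections import OrderedDict


def count_misses(addresses, num_sets: int, block_bytes: int, extra_entries: int = 0) -> int:
    """Misses of a direct-mapped cache, optionally backed by a small fully
    associative buffer (a victim cache) holding `extra_entries` blocks."""
    lines = [None] * num_sets   # block number held by each set
    small = OrderedDict()       # victim buffer, oldest entry first
    misses = 0
    for addr in addresses:
        block = addr // block_bytes
        idx = block % num_sets
        if lines[idx] == block:
            continue            # hit in the main cache
        evicted = lines[idx]
        lines[idx] = block
        if block in small:
            del small[block]    # hit in the small cache: swap the two blocks
        else:
            misses += 1         # miss in both structures
        if extra_entries and evicted is not None:
            small[evicted] = True             # keep the displaced block nearby
            if len(small) > extra_entries:
                small.popitem(last=False)     # drop the oldest entry
    return misses


trace = [0, 64] * 4  # two addresses that conflict in set 0
print(count_misses(trace, num_sets=4, block_bytes=16))                   # 8
print(count_misses(trace, num_sets=4, block_bytes=16, extra_entries=1))  # 2
```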

[C10] 2010. Bingbing Xia, Fei Qiao, Huazhong Yang, and Hui Wang, "A Fault-tolerant Structure for Reliable Multi-core Systems Based on Hardware-Software Co-design," in Proc. of ISQED 2010, San Jose, CA, USA, March 22-24, 2010.
Abstract: To cope with soft errors and make full use of multi-core systems, this paper presents an efficient fault-tolerant architecture for multi-core systems based on hardware-software co-design. With a relatively small number of test patterns, it uses less than 33% of the hardware resources required by traditional hardware redundancy (TMR) and less than 50% of the time required by traditional software redundancy (time redundancy). It is therefore a good choice of fault-tolerant architecture for future highly reliable multi-core systems.
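
For reference, the two baselines the abstract compares against can be sketched in a few lines: hardware triple modular redundancy (TMR) masks a fault with a bitwise majority vote over three replicas, and software time redundancy runs the same computation twice and compares the results. This illustrates those baselines only, not the paper's co-designed architecture; the function names and the retry limit are hypothetical.

```python
from typing import Callable


def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three replica outputs (masks one faulty replica)."""
    return (a & b) | (a & c) | (b & c)


def time_redundant(compute: Callable[[int], int], x: int, max_tries: int = 3) -> int:
    """Run the computation twice; accept the result only if both runs agree."""
    for _ in range(max_tries):
        first, second = compute(x), compute(x)
        if first == second:
            return first
    raise RuntimeError("results kept disagreeing; persistent fault suspected")


# Replica c has a soft error in two bit positions; the vote masks it.
print(bin(tmr_vote(0b1010, 0b1010, 0b0110)))  # 0b1010
print(time_redundant(lambda v: v * v, 12))     # 144
```

TMR triples the hardware while time redundancy at least doubles the execution time, which are the costs the abstract's 33% and 50% figures are measured against.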

[C9] 2010. Hongli Gao, Fei Qiao, Huazhong Yang, "Efficient 5/3-DWT Based Embedded Compression Algorithm for H.264 High Definition Decoder," in Proc. of ICCE 2010 Conference, Las Vegas, USA, 2010.
Abstract: To reduce the external memory cost, an efficient 5/3-DWT-based embedded compression algorithm is proposed for an H.264 decoder. Decoded frames are decomposed into 4×4 blocks, which are then compressed into 32-bit or 64-bit segments. We achieve compression ratio1
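
The reversible 5/3 wavelet is commonly implemented with the integer lifting scheme also used in lossless JPEG 2000. A minimal sketch of one decomposition level on a 4×4 block follows, assuming that lifting form and symmetric boundary extension; the packing into 32-bit or 64-bit segments is not described here and is not modeled. The function names are hypothetical.

```python
def dwt53_forward(x: list[int]) -> tuple[list[int], list[int]]:
    """One level of integer 5/3 lifting on an even-length signal.

    Returns (low-pass s, high-pass d); boundaries use symmetric extension.
    """
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("signal length must be even and at least 2")
    half = n // 2

    def at(i: int) -> int:  # mirror x[n] -> x[n - 2]
        return x[2 * (n - 1) - i] if i >= n else x[i]

    d = [x[2 * k + 1] - (x[2 * k] + at(2 * k + 2)) // 2 for k in range(half)]
    s = [x[2 * k] + (d[max(k - 1, 0)] + d[k] + 2) // 4 for k in range(half)]
    return s, d


def dwt53_inverse(s: list[int], d: list[int]) -> list[int]:
    """Exact inverse of dwt53_forward (integer arithmetic, hence lossless)."""
    n = 2 * len(s)
    x = [0] * n
    for k in range(len(s)):
        x[2 * k] = s[k] - (d[max(k - 1, 0)] + d[k] + 2) // 4
    for k in range(len(s)):
        right = x[2 * k + 2] if 2 * k + 2 < n else x[2 * k]  # mirror at edge
        x[2 * k + 1] = d[k] + (x[2 * k] + right) // 2
    return x


def _split(v: list[int]) -> list[int]:
    s, d = dwt53_forward(v)
    return s + d


def _merge(v: list[int]) -> list[int]:
    h = len(v) // 2
    return dwt53_inverse(v[:h], v[h:])


def _transpose(m: list[list[int]]) -> list[list[int]]:
    return [list(col) for col in zip(*m)]


def dwt53_2d(block: list[list[int]]) -> list[list[int]]:
    """Rows first, then columns; each output line is [low-pass | high-pass]."""
    rows = [_split(r) for r in block]
    return _transpose([_split(c) for c in _transpose(rows)])


def dwt53_2d_inverse(coeffs: list[list[int]]) -> list[list[int]]:
    """Undo the column pass, then the row pass."""
    rows = _transpose([_merge(c) for c in _transpose(coeffs)])
    return [_merge(r) for r in rows]


block = [[52, 55, 61, 66],
         [63, 59, 55, 90],
         [62, 59, 68, 113],
         [63, 58, 71, 122]]  # arbitrary sample pixels
assert dwt53_2d_inverse(dwt53_2d(block)) == block  # lossless round trip
```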

[C8] 2009. Fei Qiao, Yuan Zhou, Xiang Xie, and Huazhong Yang, "A Programmable DCO-Based Fast-Locking Clock Generator," in Proc. of ISPACS 2009 Conference, pp. 93-98, Kanazawa, Japan, 2009.
Abstract: A programmable DCO-based fast-locking clock generator is presented. With a resettable DCO, the clock generator achieves jitter performance similar to that of a conventional MDLL and avoids the initial delay constraints by resetting the output clock every two reference cycles. Compared with previous work, a shorter locking time is obtained. The proposed clock generator is simulated in a generic 1.8 V, 0.18 μm CMOS process. The clock multiplication ratio can be programmed from 2 to 15. The frequency ranges of the input and output clocks are 16.7
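
The relationship between the programmable multiplication ratio and the reset interval can be made concrete with a small calculation. It assumes, as an idealization not stated in the abstract, that the locked DCO period equals the reference period divided by the ratio N, so resetting every two reference cycles corresponds to every 2N output cycles. The function name and the 20 MHz reference are hypothetical; the truncated frequency ranges above are not used.

```python
def clock_plan(f_ref_hz: float, ratio: int) -> tuple[float, int]:
    """Return (ideal output frequency, output cycles between DCO resets)."""
    if not 2 <= ratio <= 15:
        raise ValueError("multiplication ratio must be programmed between 2 and 15")
    f_out_hz = ratio * f_ref_hz
    cycles_per_reset = 2 * ratio  # one reset every two reference cycles
    return f_out_hz, cycles_per_reset


# Hypothetical 20 MHz reference, ratio 5.
print(clock_plan(20e6, 5))  # (100000000.0, 10)
```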

[C7] 2009. Sisi Tan, Fei Qiao, Bingbing Xia, Huazhong Yang, and Hui Wang, "A Functional Model of SystemC-Based MPEG-2 Decoder with Heterogeneous Multi-IP-Cores and Hybrid-Interconnections Architecture," 2nd International Congress on Image and Signal Processing (CISP 2009), pp. 1-5, Tianjin, China, 2009.
Abstract: In this paper, a functional model of a SystemC-based MPEG-2 decoder with a heterogeneous multi-IP-core, hybrid-interconnection architecture is presented. Taking application-specific features into account in the design flow, three important aspects are analyzed: function partitioning, parameter sharing, and interconnection topology, which are the key technical difficulties in the system-level design of a video decoder. A method for determining the function units is proposed based on balancing the traffic load on the on-chip communication subsystem. To handle the heavy transmission burden, a hybrid interconnection combining a bus and point-to-point (P2P) links is applied. Additionally, to enhance decoder performance, frequently transmitted and updated parameters are extracted and stored in on-chip shared memories. The presented application-specific architecture is shown to be efficient with a system-level model in a SystemC-based environment. Moreover, the topology and design flow can serve as a guideline for designing such application-specific systems-on-chip (SoCs) and provide a quick evaluation method.
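
The idea of balancing traffic by giving the heaviest links dedicated point-to-point connections and leaving the rest on a shared bus can be illustrated with a simple threshold rule. This is a hypothetical heuristic, not the paper's function-unit determination method; the IP-core names, traffic figures, and threshold are invented for illustration.

```python
def split_interconnect(traffic: dict[tuple[str, str], float], p2p_threshold: float):
    """Assign each (source, destination) link to P2P or to the shared bus.

    Links carrying at least `p2p_threshold` get a dedicated P2P connection;
    the rest share the bus. Returns (sorted P2P links, total bus load).
    """
    p2p = sorted(link for link, load in traffic.items() if load >= p2p_threshold)
    bus_load = sum(load for load in traffic.values() if load < p2p_threshold)
    return p2p, bus_load


# Hypothetical traffic between decoder IP cores, in MB/s.
traffic = {
    ("VLD", "IQ_IDCT"): 40.0,
    ("MC", "FrameMem"): 180.0,
    ("IQ_IDCT", "MC"): 60.0,
    ("Ctrl", "VLD"): 2.0,
    ("Ctrl", "MC"): 3.0,
}
p2p, bus_load = split_interconnect(traffic, p2p_threshold=50.0)
print(p2p)       # [('IQ_IDCT', 'MC'), ('MC', 'FrameMem')]
print(bus_load)  # 45.0
```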

[C6] 2008. Hongli Gao, Fei Qiao, Huazhong Yang, "A Lossless Memory Reduction and Efficient Frame Storage Architecture for HDTV Video Decoder," in Proc. Int. ICALIP 2008 Conference, pp. 593-598, Shanghai, 2008.
Abstract: A novel lossless frame recompression method and an efficient memory address mapping scheme for frame storage are proposed, which reduce the read/write and row activation operations of the external memory. The proposed scheme has been verified with an MPEG-2-based HDTV video decoder. Without degrading video quality or increasing the number of bytes read from off-chip memory, the number of bytes written to memory is reduced by about 50% compared with a conventional decoder. By storing the luma and chroma components of the same macroblock, as well as the ten neighboring macroblocks, in one row of different banks, the number of logical row changes can be cut to about 5% of that of the conventional scheme.
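
The benefit of placing the luma and chroma data of a macroblock in the same row of different banks can be seen by counting DRAM row activations under an open-page policy. The sketch below compares that placement with a baseline that keeps both planes in one bank; the row capacity, bank numbering, and access trace are assumptions, and neither the recompression method nor the placement of the ten neighboring macroblocks is modeled.

```python
MBS_PER_ROW = 8  # hypothetical: macroblocks whose data fit in one DRAM row


def count_activations(accesses) -> int:
    """Row activations under an open-page policy.

    `accesses` yields (bank, row); an activation is needed whenever the
    requested row differs from the row currently open in that bank.
    """
    open_rows: dict[int, int] = {}
    activations = 0
    for bank, row in accesses:
        if open_rows.get(bank) != row:
            activations += 1
            open_rows[bank] = row
    return activations


def map_same_bank(mb: int, component: str) -> tuple[int, int]:
    """Baseline: luma and chroma planes in separate regions of bank 0."""
    base = 0 if component == "luma" else 1000
    return 0, base + mb // MBS_PER_ROW


def map_split_banks(mb: int, component: str) -> tuple[int, int]:
    """Luma and chroma of a macroblock in the same row of different banks."""
    bank = 0 if component == "luma" else 1
    return bank, mb // MBS_PER_ROW


trace = [(mb, c) for mb in range(16) for c in ("luma", "chroma")]
print(count_activations(map_same_bank(mb, c) for mb, c in trace))    # 32
print(count_activations(map_split_banks(mb, c) for mb, c in trace))  # 4
```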

Copyright © 2006 - 2008 Fei Qiao, Tsinghua University, Beijing, P.R.C. All rights reserved. Last update: 2012-04-30