Data Intensive SWOs Speedup Using FPGA Board




Objective

A Field Programmable Gate Array (FPGA) based computing board, working as a co-processor alongside the host, can be used to speed up computationally intensive and data-intensive algorithms. Designers decide whether a board can meet an application's requirements by estimating the maximal speedup achievable under the board's constraints. A fast and accurate estimation method is presented here for applications based on Sliding Window Operations (SWOs), which are widely used in image processing. By defining three upper bounds, derived from the area, memory bandwidth, and on-chip memory size constraints, the method estimates the maximal speedup and determines a corresponding hardware block structure at the same time.


People

Miriam Leeser Professor Northeastern University
Haiqian Yu PhD Student Northeastern University


Project Description

Background

COTS (Commercial Off-The-Shelf) Computing Boards

  •   FPGA Chips
  •   External memory banks
  •   Interface between FPGA chips and memory banks
  •   Connection to host Processor

SWOs (Sliding Window Operations)

  •   Operation defined over a neighborhood of pixels
  •   Window moves in a raster-scan order
  •   Widely used in image processing
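
To make the definition above concrete, here is a minimal software sketch of a sliding window operation. The function names `sliding_window` and `median`, the 4x4 test image, and the choice of a median filter are illustrative assumptions, not details taken from the project.

```python
def sliding_window(image, w, op):
    """Apply op to every w x w window of image, in raster-scan order."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - w + 1):        # windows move top to bottom
        out_row = []
        for c in range(cols - w + 1):    # and left to right within a row
            # Gather the w*w neighborhood whose top-left corner is (r, c).
            window = [image[r + i][c + j] for i in range(w) for j in range(w)]
            out_row.append(op(window))
        out.append(out_row)
    return out


def median(window):
    """Example operation: the median of the pixels in the window."""
    return sorted(window)[len(window) // 2]


# Hypothetical 4x4 image, processed with a 3x3 window.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
result = sliding_window(image, 3, median)
print(result)  # [[6, 7], [10, 11]]
```

Neighboring windows share most of their pixels, which is why SWOs are data intensive and why they benefit from buffering.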

Why Implement SWOs on an FPGA Board

  •   SWOs are computationally intensive and data intensive
  •   Inherent parallelism makes SWOs suitable for hardware implementation
  •   On-chip memory can be used to form an efficient memory hierarchy for better performance

Solution

Area Constraint

  •   FPGA area is limited
  •   Each function unit occupies some area
  •   20% of the FPGA area needs to be reserved for routing
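
One way to read the area constraint is as an upper bound on how many function units fit on the chip. The sketch below is a simplified model under stated assumptions: the function name `area_bound`, the use of abstract area units, and the numbers in the example are all hypothetical. Only the 20% routing reserve comes from the list above.

```python
def area_bound(fpga_area, unit_area, routing_percent=20):
    """Upper bound on the number of function units that fit on the FPGA.

    fpga_area and unit_area are in the same (hypothetical) area units,
    for example slices. routing_percent is the share reserved for routing.
    """
    usable_area = fpga_area * (100 - routing_percent) // 100
    return usable_area // unit_area


# Hypothetical numbers: a 10000-unit FPGA and a 400-unit function unit.
print(area_bound(10000, 400))  # 20
```

If every function unit processes one window at a time, this count is also a ceiling on the parallelism the area allows.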

Memory Bandwidth Constraint

  •   Input and output data must be loaded from or stored to memory through the memory ports
  •   The maximal data transfer rate is limited by the board
  •   Buffering can reduce redundant data transfers
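
The effect of buffering on memory traffic can be illustrated by counting pixel reads. This is a rough sketch, not the project's model: it assumes a square w x w window, compares the two extremes (no buffering versus every input pixel read exactly once), and uses hypothetical function names and image sizes.

```python
def reads_without_buffering(rows, cols, w):
    """Every output pixel fetches its whole w x w window from external memory."""
    outputs = (rows - w + 1) * (cols - w + 1)
    return outputs * w * w


def reads_with_full_buffering(rows, cols):
    """Ideal case: each input pixel is read from external memory exactly once."""
    return rows * cols


# Hypothetical 512x512 image with a 3x3 window.
unbuffered = reads_without_buffering(512, 512, 3)
buffered = reads_with_full_buffering(512, 512)
print(unbuffered, buffered)  # 2340900 262144
```

In this example the unbuffered design reads almost nine times as many pixels, so a fixed memory bandwidth limits it far more severely.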

On-chip Memory Constraint

  •   A common data access pattern can be found for SWOs
  •   Block buffering can make optimal use of the on-chip memory resources
  •   The size of the block buffer determines the performance of the implementation
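
Why the block buffer size matters can be seen from a simple count. The sketch assumes that a block of `block_h` x `block_w` pixels is loaded into on-chip memory and that every w x w window lying fully inside it is then computed. The name `loads_per_output` and the block sizes are illustrative assumptions, and the actual buffering scheme in the project may differ.

```python
def loads_per_output(block_h, block_w, w):
    """External-memory pixel loads per output pixel for one buffered block."""
    outputs = (block_h - w + 1) * (block_w - w + 1)
    return (block_h * block_w) / outputs


# Hypothetical block sizes with a 3x3 window.
print(loads_per_output(3, 3, 3))    # 9.0, no reuse: one window per block
print(loads_per_output(10, 10, 3))  # 1.5625
print(loads_per_output(34, 34, 3))  # 1.12890625
```

Larger blocks push the ratio toward one load per output, but the block has to fit in the available on-chip memory, which is what turns buffer size into a constraint.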

By selecting the tightest constraint, the maximal speedup is estimated and a corresponding hardware block structure is determined at the same time.
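
Selecting the tightest constraint amounts to taking the minimum of the three upper bounds. The sketch below only illustrates that step. The bound values are made-up placeholders, and `maximal_speedup` is a hypothetical helper, not part of any published tool.

```python
def maximal_speedup(bounds):
    """Return the limiting constraint and the speedup it allows.

    bounds maps a constraint name to the speedup upper bound it implies.
    """
    limiting = min(bounds, key=bounds.get)
    return limiting, bounds[limiting]


# Placeholder bounds, for illustration only.
bounds = {"area": 20.0, "memory_bandwidth": 12.5, "on_chip_memory": 16.0}
limiting, speedup = maximal_speedup(bounds)
print(limiting, speedup)  # memory_bandwidth 12.5
```

Knowing which constraint is limiting also indicates where a different board or block structure would help most.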

Example


Publications

  •   "Optimizing Data Intensive Window-based Image Processing on Reconfigurable Hardware Boards"

Haiqian Yu, Miriam Leeser, IEEE Workshop on Signal Processing Systems, Nov 2005, Accepted.
