Reconfigurable and GPU Computing Laboratory

Current Research Projects

Floating Point Arithmetic:

Floating Point Arithmetic for Heterogeneous Architectures

This research develops tools and techniques to help programmers find problems with the use of floating-point arithmetic in parallel scientific code, particularly code written in OpenCL. The goal is to detect potential sources of reliability and portability deficiencies that arise because floating-point behavior depends on the underlying (IEEE-compliant) architecture.
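As a small illustration of the kind of dependency meant here (a sketch in Python rather than OpenCL, and not the lab's tool), the example below shows that IEEE 754 addition is not associative. A tree-shaped reduction, such as a parallel device might perform, can therefore return a different sum than a serial loop over the same data. The function names and the input values are chosen for illustration only.

```python
import math


def serial_sum(values):
    """Add the values left to right, as a single-threaded loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


def pairwise_sum(values):
    """Add the values in a binary tree, as a parallel reduction might."""
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])


# Illustrative input: large terms that cancel, plus small terms that
# are lost or kept depending on the order of the additions.
data = [1.0e16, 1.0, -1.0e16, 1.0]

serial = serial_sum(data)      # 1.0 in IEEE double precision
pairwise = pairwise_sum(data)  # 0.0 in IEEE double precision
exact = math.fsum(data)        # 2.0, the correctly rounded sum
```

Both orderings are IEEE-compliant, yet neither matches the correctly rounded result, which is why the order chosen by a given architecture or runtime matters for portability.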

VFLOAT: Variable Precision Floating Point Library

VFLOAT is a library of variable precision floating point units written in VHDL, targeting FPGAs. Components include floating point arithmetic (add, sub, mul, div, sqrt, acc) and format conversion (fix2float and float2fix).
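To give a feel for what variable precision means, the following Python sketch rounds a double-precision value to a chosen number of significand bits. It is a software model for illustration only, not part of the library and not its VHDL interface; the function name is hypothetical, and exponent range limits are ignored.

```python
import math


def round_to_mantissa_bits(x, bits):
    """Round x to `bits` significant binary digits (ties to even).

    Hypothetical helper: models a narrower significand in software.
    The exponent range of the narrower format is not modeled.
    """
    if x == 0.0 or math.isinf(x) or math.isnan(x):
        return x
    m, e = math.frexp(x)           # x == m * 2**e with 0.5 <= |m| < 1
    scaled = m * (1 << bits)       # exact: scaling by a power of two
    return math.ldexp(round(scaled), e - bits)


# 0.1 kept to 8 significant bits becomes 205/2048 = 0.10009765625.
coarse = round_to_mantissa_bits(0.1, 8)
# With all 53 bits of a double, the value is unchanged.
full = round_to_mantissa_bits(0.1, 53)
```

Fewer significand bits mean a smaller hardware unit at the cost of a larger rounding error, which is the trade-off a variable precision library lets a designer tune.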

GPU Research:

Adaptable Template Matching on CUDA GPUs

Efficient template matching in CUDA for templates of sizes that are not powers of two and that can be very large. The solution adapts to the problem size.

Adding GPU Support to SCIRun

This project incorporates GPU implementations of linear solvers (including conjugate gradient) into the SCIRun biomedical problem solving environment. Users can use the GPU alternative with minimal changes to the environment or to the user experience.
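For readers unfamiliar with the solver named above, here is a minimal CPU sketch of the conjugate gradient method in plain Python. It is the textbook algorithm for symmetric positive definite systems, not the project's GPU code and not SCIRun's API; the function names, tolerance, and test matrix are assumptions made for the example.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    """Dense matrix-vector product; A is a list of rows."""
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for a symmetric positive definite A."""
    x = [0.0] * len(b)
    r = list(b)                    # residual b - A x, with x = 0
    p = list(r)                    # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:    # residual norm small enough
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Small illustrative system; the exact solution is [1/11, 7/11].
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)
```

The matrix-vector product and the dot products dominate the cost of each iteration, and these are the operations that map naturally onto a GPU.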

CT Reconstruction Acceleration

This project accelerates Computed Tomography (CT) reconstruction using GPUs. Acceleration can be beneficial for biomedical image reconstruction applications with large datasets. Graphics Processing Units (GPUs) are particularly useful in this context as they can produce high fidelity images rapidly. An algorithm that reconstructs cone-beam CT images from two-dimensional projections is implemented on GPUs.

Extending Tasks and Conduits with GPU Support

This project extends the PVTOL Tasks and Conduits framework to include support for Graphics Processing Units (GPUs), alleviating the difficulty of interfacing with different GPU architectures and creating portable applications.

Portable Application Framework for Heterogeneous Systems

Architecturally diverse systems, composed of any variety and number of processing elements (multicore processors, FPGAs, GPUs, etc.), have become common in high-performance computing. Each of these architectures presents its own programming challenges and complexities, and each often has its own programming language, development environment, and processing constraints. MIT Lincoln Laboratory developed the Parallel Vector Tile Optimizing Library (PVTOL) as a means of writing high-performance signal and image processing code that is portable across a large number of multicore general purpose computing architectures. This work extends the PVTOL Tasks and Conduits framework to include support for Graphics Processing Units (GPUs), alleviating the difficulty of interfacing with different GPU architectures and creating portable applications.

FPGA Research:

CRASH: Cognitive Radio Accelerated with Software and Hardware

CRASH creates a low latency, high performance Cognitive Radio platform that simplifies offloading algorithms to programmable logic. This research shows that heterogeneous computing systems, such as CRASH, can provide Cognitive Radios substantial processing gains without sacrificing programmability.

CRUSH: Cognitive Radio Universal Hardware Software

The CRUSH hardware platform is composed of a Xilinx ML605 connected to an Ettus USRP through a custom interface board, allowing flexible data transfer between them. Software provides a framework for allowing the FPGA to interact with the USRP and the host. A spectrum sensing application has been implemented to demonstrate CRUSH.

Accelerating Model Checking with FPGAs

We are using FPGAs to accelerate MURPHI, an explicit-state model checker.

VFLOAT: Variable Precision Floating Point Library

Past Research Projects

Backprojection

Backprojection is the most common algorithm used in the tomographic reconstruction of clinical data. An everyday example is the medical x-ray CAT scan: a person is x-rayed from various angles and the two-dimensional density of the person can be "reconstructed" by using backprojection. However, the reconstruction is computationally intensive. The project goal is to implement backprojection in reconfigurable hardware, thus greatly decreasing the processing time.
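The idea can be sketched in a few lines of Python. The example below is a simplified, unfiltered, parallel-beam backprojection with nearest-neighbor sampling, written for illustration under those assumptions; it is not the project's hardware design, and the function name and tiny test data are hypothetical. Each projection value is smeared back across every pixel that lies along its ray.

```python
import math


def backproject(sinogram, angles, size):
    """Accumulate each projection back onto a size x size image.

    sinogram[i][k] is detector bin k of the projection at angles[i]
    (radians). Unfiltered, nearest-neighbor, parallel-beam geometry.
    """
    n_det = len(sinogram[0])
    center = (size - 1) / 2.0          # image center, in pixels
    det_center = (n_det - 1) / 2.0     # detector center, in bins
    image = [[0.0] * size for _ in range(size)]
    for proj, theta in zip(sinogram, angles):
        c, s = math.cos(theta), math.sin(theta)
        for row in range(size):
            y = row - center
            for col in range(size):
                x = col - center
                # Detector bin hit by the ray through pixel (x, y).
                k = int(round(x * c + y * s + det_center))
                if 0 <= k < n_det:
                    image[row][col] += proj[k]
    return image


# Illustrative data: a point object at the center is seen in the
# middle detector bin from every angle.
angles = [0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4]
sinogram = [[0.0, 0.0, 1.0, 0.0, 0.0] for _ in angles]
image = backproject(sinogram, angles, size=5)
```

Every pixel is visited once per projection angle, so the work grows with the image area times the number of angles. The loops are independent across pixels, which is what makes the algorithm a good fit for hardware acceleration.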

Dynamo

Systems with FPGAs in them are inherently hardware/software systems. The simplest of these systems have one host processor and one FPGA both of which are used for computation. We are developing tools to determine when to best make use of the FPGA hardware. Our tools are unique in that they take into account communication costs and overhead costs and not just the raw computational speedup from running an algorithm on FPGA hardware. Our tool focuses on image processing pipelines. It determines what to run in hardware and what in software, generates the pipeline implementation, and runs it. We will extend this work to other application domains as well as to more sophisticated systems with several FPGAs and several processors.

Embedded PowerPC

New FPGA devices have embedded processors on the chip with the reconfigurable logic. We are investigating how best to make use of these embedded processors and how best to interface them to the FPGA logic. In addition, we are investigating ways to quantify computation times for algorithms run on the different types of resources available, including the overhead costs incurred in the interfaces. The goal is to predict how best to partition an application between hardware and software. For this research, we are using software defined radio as a target application.

Finite Difference Time Domain

The Finite-Difference Time-Domain (FDTD) method is one of the most popular numerical methods for the solution of problems in electromagnetics. It is used for analyzing radar cross sections of airplanes, siting cell phone towers, and finding breast tumors, among other applications. One of the challenges to using FDTD is the large amount of computation required. We have accelerated FDTD using FPGAs.
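To show where the computation goes, here is a minimal one-dimensional FDTD sketch in Python. It uses the standard leapfrog update of the electric and magnetic fields in normalized units; the grid size, step count, Courant number of 0.5, and Gaussian source are assumptions chosen for the example and do not describe the lab's FPGA implementation.

```python
import math


def fdtd_1d(n_cells=200, n_steps=100, source_cell=100):
    """Propagate a Gaussian pulse on a 1D grid; returns the E field."""
    ez = [0.0] * n_cells   # electric field samples
    hy = [0.0] * n_cells   # magnetic field samples, staggered half a cell
    courant = 0.5          # normalized time step; stable for values <= 1
    for step in range(n_steps):
        # Update H from the spatial difference of E.
        for k in range(n_cells - 1):
            hy[k] += courant * (ez[k + 1] - ez[k])
        # Update E from the spatial difference of H.
        # ez[0] is never updated and acts as a fixed (reflecting) boundary.
        for k in range(1, n_cells):
            ez[k] += courant * (hy[k] - hy[k - 1])
        # Soft source: add a Gaussian pulse in time at one cell.
        ez[source_cell] += math.exp(-0.5 * ((step - 30) / 10.0) ** 2)
    return ez


ez = fdtd_1d()
```

Every cell is updated at every time step using only its nearest neighbors. Realistic three-dimensional problems repeat this pattern over very large grids, which is the source of the heavy computational load and also of the regular parallelism that hardware can exploit.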

Grape

Graph-based power estimation for designing low-power CMOS VLSI circuits.

HML

A high level hardware description language and its translation to VHDL.

K-means clustering algorithm

K-means clustering in both software and reconfigurable hardware.
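For reference, the software side of the algorithm can be sketched as follows. This is the standard Lloyd iteration in plain Python, given as an illustration rather than the project's code; the function names, the fixed iteration count, and the sample points are assumptions.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def kmeans(points, centroids, n_iter=10):
    """Alternate between assigning points and moving centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [mean(c) if c else old
                     for c, old in zip(clusters, centroids)]
    return centroids, clusters


# Illustrative data: two well separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

The assignment step computes one distance per point per centroid and dominates the run time, so it is the natural candidate for a hardware implementation.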

Memory Interfacing: Sliding Window Operations for FPGAs

Particle Image Velocimetry

Particle Image Velocimetry (PIV) is an important technique used in fluid dynamics to determine the flow of particles in a fluid. PIV computes instantaneous velocity vectors for an area of interest. We are using FPGAs to accelerate PIV so that real-time PIV information can be used for control.

Phase Unwrapping

Phase unwrapping is the process of recovering phase information that has been constrained to cycle through the range between -pi and pi. Recovering the original phase information is necessary for interference-based imaging such as that used in the Optical Quadrature Microscope (OQM). Robust methods for performing this task are very computationally intensive. The goal of this project is to identify the key components of such algorithms and implement them in reconfigurable hardware.
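The simplest form of the problem can be shown in one dimension. The Python sketch below removes jumps larger than pi between neighboring samples, which works only when the true phase changes by less than pi per sample and the data are free of noise. It is an illustration under those assumptions, with a hypothetical function name; the robust two-dimensional methods needed for microscope images are considerably more involved.

```python
import math


def unwrap(phases):
    """Remove 2*pi jumps from a 1D sequence of wrapped phases."""
    if not phases:
        return []
    out = [phases[0]]
    offset = 0.0
    for prev, cur in zip(phases, phases[1:]):
        delta = cur - prev
        if delta > math.pi:        # wrapped downward: subtract a cycle
            offset -= 2.0 * math.pi
        elif delta < -math.pi:     # wrapped upward: add a cycle
            offset += 2.0 * math.pi
        out.append(cur + offset)
    return out


# Illustrative data: a steadily increasing phase, wrapped into
# the range (-pi, pi] and then recovered.
true_phase = [0.5 * k for k in range(20)]
wrapped = [math.atan2(math.sin(t), math.cos(t)) for t in true_phase]
recovered = unwrap(wrapped)
```

In two dimensions the result must be consistent along every path through the image, and noise can make local jumps ambiguous, which is why robust methods are so much more expensive than this sketch.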

Retinal Vascular Tracing

Reconfigurable hardware is used to accelerate an existing real-time algorithm for tracing the vasculature and analysis of intersections and crossovers in live high-resolution retinal fundus image sequences.

Rothko

Northeastern Electron Devices Group

Synthetic Aperture Radar

Environmental monitoring, earth resource mapping, and military systems require broad-area imaging at high resolutions. Often the imagery must be acquired in inclement weather or at night as well as during the day. Synthetic Aperture Radar (SAR) provides such a capability. SAR systems take advantage of the long-range propagation characteristics of radar signals and the complex information processing capability of modern digital electronics to provide high resolution imagery. Synthetic aperture radar complements photographic and other optical imaging capabilities because of the minimal constraints on time-of-day and atmospheric conditions and because of the unique responses of terrain and other targets to radar frequencies. We are developing an FPGA system for reconstructing images from SAR data. This project makes use of a Beowulf cluster owned by the DOD which has 48 nodes with an FPGA board at every node. One of the goals of this project is to investigate both the fine-grained and coarse-grained parallelism available on this cluster, and to see how it can best be used to accelerate SAR processing.

VForce: VSIPL++ Framework

VSIPL++ is the C++ version of the Vector/Signal/Image Processing Library, a library of C and C++ routines for simply and efficiently writing programs to perform standard signal processing functions. This project creates a framework to ease the inclusion of hardware algorithm accelerators (specifically those targeted for FPGAs) in VSIPL++ code development.

Tools Developed in the Lab

libHLS

Tools for developing high level synthesis algorithms.

301 Ell Hall Boston, MA 02115 Phone: (617) 373-5294