Current Research Projects
Floating Point Arithmetic:
This research develops tools and techniques to help programmers find problems
with the use of floating-point arithmetic in parallel scientific code,
particularly code written in OpenCL. The goal is to detect
potential sources of reliability and portability deficiencies in such code
that are due to dependencies of the floating-point behavior on the
underlying (IEEE-compliant) architecture.
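One well-known source of such dependencies is that floating-point addition is not associative, so a parallel reduction that adds values in a different order can return a different result. The following is a minimal Python sketch of that effect, not the project's tooling; the function names and the data are illustrative assumptions.

```python
def sequential_sum(values):
    """Add the values left to right, as a single thread would."""
    total = 0.0
    for v in values:
        total += v
    return total


def pairwise_sum(values):
    """Add the values as a binary tree, as a parallel reduction might."""
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])


# Hypothetical data: the exact sum is 2.0, but 1.0 is too small to
# survive being added to 1e16 in double precision.
data = [1e16, 1.0, -1e16, 1.0]
seq = sequential_sum(data)
tree = pairwise_sum(data)
print(seq, tree, seq == tree)
```

Both orders are valid under IEEE arithmetic, yet they disagree with each other and with the exact sum, which is the kind of order dependence that can make results differ between devices.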
VFloat is a library of variable-precision floating-point units written in VHDL targeting FPGAs.
Components include floating point arithmetic (add, sub, mul, div, sqrt, acc) and format conversion
(fix2float and float2fix).
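To illustrate what the format conversion components compute, here is a small software model in Python. It is only a sketch: VFloat itself is VHDL, and the parameters `frac_bits` and `total_bits`, the saturation behavior, and the rounding mode are assumptions made for this example rather than the library's actual interface.

```python
def float2fix(x, frac_bits, total_bits):
    """Convert a float to a signed fixed-point integer (assumed format).

    The value is scaled by 2**frac_bits, rounded (Python rounds halves
    to even), and saturated to the two's-complement range.
    """
    scaled = round(x * (1 << frac_bits))
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, scaled))


def fix2float(q, frac_bits):
    """Convert a signed fixed-point integer back to a float."""
    return q / (1 << frac_bits)


q = float2fix(1.5, frac_bits=8, total_bits=16)
back = fix2float(q, frac_bits=8)
saturated = float2fix(1000.0, frac_bits=8, total_bits=16)
print(q, back, saturated)
```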
GPU Research:
This project implements efficient template matching in CUDA for templates whose sizes are not
powers of two and that can be very large. The solution adapts to the problem size.
This project incorporates GPU implementations of linear solvers (including conjugate
gradient) into the SCIRun biomedical problem solving
environment. Users can use the GPU alternative with minimal changes to the environment or
to the user experience.
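For readers unfamiliar with the conjugate gradient method, the following is a minimal CPU sketch in Python for a symmetric positive-definite system. It is not the SCIRun or GPU implementation; the helper names, tolerance, and test matrix are assumptions chosen for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A (dense lists)."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x, with x = 0 initially
    p = list(r)              # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)          # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old               # keeps directions A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Hypothetical 2x2 system; the exact solution is (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)
print(x)
```

The matrix-vector product and the dot products dominate the cost of each iteration, and they are the natural candidates for GPU offloading.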
This project accelerates Computed Tomography (CT) reconstruction using GPUs.
Acceleration can be beneficial for biomedical image reconstruction applications
with large datasets.
Graphics Processing Units (GPUs) are particularly useful in this context
as they can produce high-fidelity images rapidly.
An algorithm that reconstructs cone-beam computed tomography images
from two-dimensional projections is implemented on GPUs.
Architecturally diverse systems, comprising any variety and number of
processing elements (multicore processors, FPGAs, GPUs, etc.), have become
common in high-performance computing. All of these architectures present
their own programming challenges and complexities.
Each often has its own programming language, development environment,
and processing constraints. MIT Lincoln Laboratory developed the
Parallel Vector Tile Optimizing Library (PVTOL) as a means of writing
high-performance signal and image processing code that is portable
across a large number of multicore general purpose computing architectures.
This work extends the PVTOL Tasks and Conduits framework to
include support for Graphics Processing Units (GPUs), alleviating the
difficulty of interfacing with different GPU architectures and creating portable applications.
FPGA Research:
CRASH creates a low latency, high performance Cognitive Radio
platform that simplifies offloading algorithms to programmable
logic. This research shows that heterogeneous computing systems,
such as CRASH, can provide Cognitive Radios substantial processing
gains without sacrificing programmability.
The CRUSH hardware platform is composed of
a Xilinx ML605 connected to an Ettus USRP through a custom interface board, allowing
flexible data transfer between them. Software provides a framework
for allowing the FPGA to interact with the USRP and the host.
A spectrum sensing application has been implemented to demonstrate CRUSH.
We are accelerating MURPHI, an explicit-state model checker, using FPGAs.
VFloat is a library of variable-precision floating-point units written in VHDL targeting FPGAs.
Components include floating point arithmetic (add, sub, mul, div, sqrt, acc) and format conversion
(fix2float and float2fix).
Past Research Projects
Backprojection is the most common algorithm used in the tomographic
reconstruction of clinical data. An everyday example is the
medical x-ray CAT scan: a person is x-rayed from various angles and the
two-dimensional density of the person can be "reconstructed" by using
backprojection. However, the reconstruction is computationally intensive. The project
goal is to implement backprojection in reconfigurable hardware, thus
greatly decreasing the processing time.
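The core of backprojection is a loop that smears each projection back across the image along its acquisition angle. The Python sketch below shows an unfiltered, parallel-beam version with nearest-neighbor detector lookup. These simplifications, the function name, and the toy sinogram are assumptions for illustration and do not describe the hardware design.

```python
import math


def backproject(sinogram, angles, size):
    """Unfiltered parallel-beam backprojection onto a size x size image.

    sinogram[i] holds the detector readings for angles[i] (radians).
    """
    image = [[0.0] * size for _ in range(size)]
    center = (size - 1) / 2.0
    n_det = len(sinogram[0])
    det_center = (n_det - 1) / 2.0
    for projection, theta in zip(sinogram, angles):
        c, s = math.cos(theta), math.sin(theta)
        for y in range(size):
            for x in range(size):
                # Detector position that this pixel projects onto.
                t = (x - center) * c + (y - center) * s + det_center
                idx = int(round(t))          # nearest-neighbor lookup
                if 0 <= idx < n_det:
                    image[y][x] += projection[idx]
    return image


# Toy data: a point object at the image center seen from four angles.
angles = [0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4]
sinogram = [[0.0, 0.0, 1.0, 0.0, 0.0] for _ in angles]
image = backproject(sinogram, angles, size=5)
print(image[2][2])
```

Every pixel is visited once per projection angle, which is why the computation grows quickly with image size and angle count and why the inner loops are attractive to parallelize.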
Dynamo
Systems that contain FPGAs are inherently hardware/software systems. The
simplest of these systems have one host processor and one FPGA, both of
which are used for computation. We are developing tools to determine when
to best make use of the FPGA hardware. Our tools are unique in that they
take into account communication costs and overhead costs and not just the
raw computational speedup from running an algorithm on FPGA hardware. Our
tool focuses on image processing pipelines. It determines what to run in
hardware and what in software, generates the pipeline implementation, and
runs it. We will extend this work to other application domains as well as
to more sophisticated systems with several FPGAs and several processors.
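The point about communication and overhead costs can be made concrete with a small cost model. The Python sketch below is not the Dynamo tool; the function name, the parameters, and all of the numbers are hypothetical.

```python
def best_target(sw_time, hw_compute_time, bytes_moved,
                bandwidth_bytes_per_s, setup_overhead):
    """Pick hardware only if it wins after transfer and setup costs.

    All times are in seconds. Returns (target, estimated_time).
    """
    transfer_time = bytes_moved / bandwidth_bytes_per_s
    hw_total = hw_compute_time + transfer_time + setup_overhead
    if hw_total < sw_time:
        return "hardware", hw_total
    return "software", sw_time


# A stage with a 10x raw speedup that still loses once 1 MB of data
# has to cross a 100 MB/s link.
large_transfer = best_target(sw_time=0.010, hw_compute_time=0.001,
                             bytes_moved=1_000_000,
                             bandwidth_bytes_per_s=100_000_000,
                             setup_overhead=0.0005)

# The same stage moving only 100 kB is worth offloading.
small_transfer = best_target(sw_time=0.010, hw_compute_time=0.001,
                             bytes_moved=100_000,
                             bandwidth_bytes_per_s=100_000_000,
                             setup_overhead=0.0005)
print(large_transfer, small_transfer)
```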
Embedded PowerPC
New FPGA devices have embedded processors on the chip with the
reconfigurable logic. We are investigating how best to make use of these
embedded processors and how best to interface them to the FPGA logic. In
addition, we are investigating ways to quantify computation times for
algorithms run on the different types of resources available, including
the overhead costs incurred in the interfaces. The goal is to predict how
best to partition an application between hardware and software. For this
research, we are using software defined radio as a target application.
The Finite-Difference Time-Domain (FDTD) method is one of the most
popular numerical methods for the solution of problems in
electromagnetics. It is used for analyzing radar cross sections of
airplanes, siting cell phone towers, and finding breast tumors, among
other applications.
One of the challenges to using FDTD is the large
amount of computation required. We have accelerated FDTD
using FPGAs.
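To show where the computation in FDTD comes from, here is a minimal one-dimensional sketch in Python using normalized field units and a Courant number of 0.5. The grid size, step count, and Gaussian source are assumptions for illustration; this is not the FPGA implementation.

```python
import math


def fdtd_1d(n_cells=200, n_steps=100, source_cell=100):
    """Leapfrog update of Ez and Hy on a 1D grid (normalized units)."""
    ez = [0.0] * n_cells
    hy = [0.0] * n_cells
    for t in range(n_steps):
        # Magnetic field update from the spatial difference of Ez.
        for k in range(n_cells - 1):
            hy[k] += 0.5 * (ez[k + 1] - ez[k])
        # Electric field update from the spatial difference of Hy.
        # ez[0] is never updated, so the left edge acts as a conductor.
        for k in range(1, n_cells):
            ez[k] += 0.5 * (hy[k] - hy[k - 1])
        # Additive Gaussian pulse injected at the source cell.
        ez[source_cell] += math.exp(-0.5 * ((t - 30) / 10.0) ** 2)
    return ez


ez = fdtd_1d()
print(max(abs(v) for v in ez))
```

Every cell is updated at every time step, and in three dimensions there are six field components per cell, so the work scales with grid volume times the number of steps.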
Graph-based power estimation for designing low-power CMOS VLSI circuits.
HML
A high-level hardware description language and its translation to VHDL.
K-means clustering in both software and reconfigurable hardware.
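As a reference point for the software side, a basic K-means iteration (Lloyd's algorithm) can be sketched in Python as follows. The initialization from the first k points, the fixed iteration count, and the sample data are assumptions for this example, not details of the project.

```python
def kmeans(points, k, n_iter=20):
    """Cluster points (tuples of floats) into k groups."""
    # Assumption: seed the centers with the first k points.
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2
                                  for a, b in zip(p, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members)
                              for col in zip(*members)]
    return centers, labels


points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0),
          (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(points, 2)
print(centers, labels)
```

The assignment step computes a distance from every point to every center, and those independent distance computations are the part that maps most naturally onto parallel hardware.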
Particle Image Velocimetry (PIV) is an important technique used in
fluid dynamics to determine the flow of particles in a fluid. PIV computes instantaneous
velocity vectors for an area of interest. We are using FPGAs
to accelerate PIV so that real-time PIV information can be used for
control.
Phase unwrapping is the process of recovering phase information that
has been constrained to cycle through the range between -pi and pi.
Getting the original phase information is necessary for interference
based imaging such as that used in the Optical Quadrature
Microscope (OQM). Robust methods for performing this task are very
computationally intensive. The goal of this project is to
identify the key components of such algorithms and implement them in
reconfigurable hardware.
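The simplest form of the problem is one-dimensional and can be sketched in a few lines of Python: wrap each sample-to-sample difference into the range between -pi and pi and accumulate. This is only an illustration of what unwrapping means, assuming the true phase changes by less than pi between samples. The robust two-dimensional methods the project targets are considerably more involved, and the function names here are hypothetical.

```python
import math


def wrap(phi):
    """Wrap a phase value into the interval [-pi, pi)."""
    return (phi + math.pi) % (2 * math.pi) - math.pi


def unwrap_1d(wrapped):
    """Recover a continuous phase by summing wrapped differences."""
    out = [wrapped[0]]
    for prev, cur in zip(wrapped, wrapped[1:]):
        out.append(out[-1] + wrap(cur - prev))
    return out


# Hypothetical signal: a phase ramp that climbs well past pi.
true_phase = [0.5 * i for i in range(20)]
wrapped = [wrap(p) for p in true_phase]
recovered = unwrap_1d(wrapped)
print(recovered[-1])
```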
Reconfigurable hardware is used to accelerate an existing real-time algorithm for tracing
the vasculature and analyzing intersections and crossovers in live high-resolution
retinal fundus image sequences.
Rothko
Synthetic Aperture Radar
Environmental monitoring, earth resource mapping, and military systems
require broad-area imaging at high resolutions. Often the imagery
must be acquired in inclement weather, and at night as well as during the day.
Synthetic Aperture Radar (SAR) provides such a capability. SAR systems
take advantage of the long-range propagation characteristics of radar
signals and the complex information processing capability of modern
digital electronics to provide high resolution imagery. Synthetic
aperture radar complements photographic and other optical imaging
capabilities because of the minimal constraints on time of day and
atmospheric conditions and because of the unique responses of terrain and
other targets to radar frequencies. We are developing an FPGA system for
reconstructing images from SAR data. This project makes use of a Beowulf
cluster owned by the DOD which has 48 nodes with an FPGA board at every
node. One of the goals of this project is to investigate both the
fine-grained and coarse-grained parallelism available on this cluster and to see
how it can best be used to accelerate SAR processing.
VSIPL++ is the C++ version of the Vector/Signal/Image Processing Library, a
library of C and C++ routines for simply and efficiently writing programs to
perform standard signal processing functions. This project creates a
framework to ease the inclusion of hardware algorithm accelerators
(specifically those targeted for FPGAs) in VSIPL++ code development.
Tools Developed in the Lab
Tools for developing high-level synthesis algorithms.