Current Research: Ensuring Reliability and Portability of Scientific Software for Heterogeneous Architectures

This page is a reproduction of the project webpage

      http://www.ccs.neu.edu/home/wahl/Research/fpa-heterogeneous.html

maintained by Thomas Wahl, College of Computer Science, Northeastern University.
The page below may therefore be slightly out of date.

Ensuring Reliability and Portability of Scientific Software
for Heterogeneous Architectures

Floating-point arithmetic is used in scientific software to perform calculations with (approximations of) real numbers. Despite successful efforts to standardize floating-point arithmetic, reflected in the universally accepted IEEE 754 floating-point standard (the "Standard"), results of floating-point calculations are generally not portable across computer architectures and can in fact differ vastly.

There are a number of reasons for this phenomenon. One is the difference in floating-point hardware available on different architectures. For instance, the presence or absence of a fused multiply-add (FMA) unit significantly impacts the accuracy of expressions of the form a × b + c: an FMA computes the entire expression with a single rounding step, whereas a separate multiply and add round twice. The Standard sanctions such different ways of evaluating compound expressions, since evaluation rules for expressions are mostly left to the programming language (unlike the results of basic arithmetic operations such as a + b, which the Standard fixes).
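
As a concrete illustration, here is a minimal C sketch (the input values are illustrative choices, not taken from this page): it evaluates a × b + c once with a separate multiply and add, which rounds twice, and once with the C99 fma function, which rounds only once, and the two results differ. With GCC or Clang, compile with -ffp-contract=off so the compiler does not fuse the plain expression on its own, and link with -lm.

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        /* Illustrative values: the exact product a * b = 1 + 2^-26 + 2^-54
           does not fit into a double, so it must be rounded. */
        double a = 1.0 + 0x1p-27;
        double b = 1.0 + 0x1p-27;
        double c = -1.0;

        /* Two roundings: the product is rounded to 1 + 2^-26,
           then the addition is rounded. */
        double separate = a * b + c;

        /* One rounding: fma computes a * b + c exactly and rounds the final
           result once, preserving the 2^-54 term. */
        double fused = fma(a, b, c);

        printf("a * b + c    = %.17g\n", separate);  /* 2^-26 */
        printf("fma(a, b, c) = %.17g\n", fused);     /* 2^-26 + 2^-54 */
        return 0;
    }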

Another reason for differences in floating-point results is especially relevant for parallel architectures: for efficiency, complex expressions such as sums of many operands are split by the compiler into sub-expressions to be computed by individual threads; the partial results are combined at the end to obtain the final sum, as shown here for a sum of four operands:

        [Figure: reduction sum of four operands]

Unfortunately, due to the loss of precision in floating-point arithmetic compared to real arithmetic, many common laws of arithmetic known from high school no longer hold, in particular the associativity of addition. Different associations of summands in a long addition will therefore typically yield different results.
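
To make this concrete, here is a minimal C sketch (the summands are illustrative choices, not taken from this page): adding the same four floats left to right and in the pairwise order of the reduction tree above yields two different results.

    #include <stdio.h>

    int main(void)
    {
        /* Illustrative summands: whether a small term survives depends on
           when the two large terms cancel. */
        float x[4] = { 1.0e8f, 1.0f, -1.0e8f, 1.0f };

        /* Sequential, left-to-right sum, as a single thread would compute it:
           the first 1.0f is absorbed into 1.0e8f, the last one survives. */
        float sequential = ((x[0] + x[1]) + x[2]) + x[3];

        /* Pairwise reduction over two threads, as in the figure above: both
           partial sums round to +/-1.0e8f, so the small terms vanish. */
        float pairwise = (x[0] + x[1]) + (x[2] + x[3]);

        printf("sequential sum:     %g\n", sequential);  /* prints 1 */
        printf("pairwise reduction: %g\n", pairwise);    /* prints 0 */
        return 0;
    }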

Most programmers are unaware of these vagaries of floating-point arithmetic. As a result, parallel scientific programs are susceptible to reliability and portability issues that can range from simple deviations in precision to changes of program control flow when moving from one architecture to another. This threat of non-portability stands in contrast to the promise made by parallel programming standards such as OpenCL for "write-once, run anywhere" functionality.

This research aims to develop tools and techniques that help programmers find "issues" with the use of floating-point arithmetic in parallel scientific code, specifically code written in OpenCL. The goal is to detect potential sources of reliability and portability deficiencies in such code that are due to dependencies of the floating-point behavior on the underlying (IEEE-compliant) architecture. This will have important implications for the reliability of scientific programs such as those used in biomedical imaging, climate modeling, and vehicle design.

People:

Publications:

Sponsorship:

    National Science Foundation, under award number CCF-1218075.

External links (follow at your own risk!):