Research Projects


SecondWrite

In recent years, there has been a tremendous amount of activity in executable-level research targeting varied applications such as security vulnerability analysis, bug testing, and binary optimization. As part of this project, we have tried to resolve what we believe is a fundamental aberration: in spite of a significant overlap in the overall goals of several source-level methods and executable-level techniques, many sophisticated transformations that are well understood and implemented in source-level infrastructures have yet to become available in executable frameworks. We believe that one of the prime reasons behind this aberration is that current executable frameworks define their own intermediate representations (IRs), which are significantly more constrained than the IR used in a compiler. The IRs used in existing binary frameworks lack high-level features such as an abstract stack, variables, and symbols, and are even machine dependent in some cases.

To address this aberration, we have developed a novel executable framework called SecondWrite, which employs the IR of LLVM, a widely used open-source compiler infrastructure. We have developed several techniques for segmenting the flat address space of an executable, which otherwise consists of undifferentiated blocks of memory. We demonstrate the limitations of existing frameworks in this regard and present techniques such as converting the physically addressed stack in an executable to an abstract stack and promoting memory locations to symbols.
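As a rough illustration of what promoting memory locations to symbols means, the following minimal sketch renames stack-pointer-relative operands in a toy instruction list to named symbols. It is not SecondWrite's implementation: the tuple-based instruction format, the function name promote_stack_slots, and the local_N naming scheme are all hypothetical, and the sketch assumes the stack pointer stays constant within the function body, which real executables do not guarantee.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical toy operand: a register name, or a (base_register, offset) memory reference.
Operand = Union[str, Tuple[str, int]]
Instr = Tuple[str, Operand, Operand]  # (opcode, destination, source)


def promote_stack_slots(instrs: List[Instr]) -> Tuple[List[Instr], Dict[int, str]]:
    """Rename every esp-relative memory operand to a named symbol.

    Assumption: esp is constant inside the function body, so equal offsets
    always denote the same stack slot.
    """
    symbols: Dict[int, str] = {}

    def rename(op: Operand) -> Operand:
        if isinstance(op, tuple) and op[0] == "esp":
            offset = op[1]
            if offset not in symbols:
                # First time this slot is seen: give it a fresh symbol name.
                symbols[offset] = f"local_{len(symbols)}"
            return symbols[offset]
        # Registers and non-stack memory references are left untouched.
        return op

    rewritten = [(opcode, rename(dst), rename(src)) for opcode, dst, src in instrs]
    return rewritten, symbols


example = [
    ("mov", ("esp", 4), "eax"),   # store eax into stack slot at esp+4
    ("mov", "ebx", ("esp", 4)),   # reload the same slot
    ("mov", ("esp", 8), "ebx"),   # a second, distinct slot
    ("mov", "ecx", ("ebx", 0)),   # not stack-relative: left as raw memory
]
rewritten, symbols = promote_stack_slots(example)
```

Once slots carry names instead of raw addresses, a compiler IR can treat them like ordinary local variables, which is what allows source-level analyses and transformations to apply.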

We have also proposed scalable static analyses to recover variables, data types, and function prototypes from x86 executables. Our techniques can run 352X faster than current techniques while achieving the same precision. This enables analyzing executables with millions of instructions in minutes, which is not possible with existing techniques. Unlike current techniques, ours can also recover variables allocated to the floating-point stack.

We demonstrated the unique benefits of SecondWrite by implementing advanced transformations such as automatic parallelization inside our framework. SecondWrite has also been employed to embed various user-specified security policies inside executables.


Memory Management inside Binary Rewriter


Cache memories in embedded systems play an important role in reducing the execution time of applications. Various extensions have been added to cache hardware to enable software involvement in replacement decisions, thus improving run-time over a purely hardware-managed cache. Newer embedded systems provide the facility of locking one or more lines in the cache, a feature called cache locking. We presented the first method in the literature for instruction-cache locking that is able to reduce the average-case run-time of a program.


Symbolic Analysis for Executables

Symbolic analysis is a program analysis method based on abstract interpretation, in which the values of program variables are represented by symbolic expressions. It is an advanced method employed regularly in traditional optimizing compilers as well as for detecting parallelism in programs. However, traditional source-level symbolic analysis frameworks are not sufficient for executables, because they only track symbolic expressions for variables. We proposed a novel symbolic analysis framework for executables which computes symbolic abstractions for memory locations as well. Such a framework is applicable to improving multiple analyses, such as value numbering, alias analysis, and automatic parallelization, for executables.
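The idea of tracking symbolic expressions for memory locations as well as registers can be illustrated with a small sketch. This is a toy under strong assumptions, not the proposed framework: it handles only straight-line code, assumes no aliasing, identifies each memory location by a fixed textual key such as "[esp+8]", and uses a hypothetical three-opcode instruction format and function name (symbolic_run).

```python
from typing import Dict, List, Tuple, Union

Operand = Union[str, int]


def symbolic_run(program: List[Tuple], inputs: List[str]) -> Dict[str, str]:
    """Map every register AND memory location to a symbolic expression.

    Toy assumptions: straight-line code, no aliasing, and memory locations
    identified by a fixed textual key such as "[esp+8]".
    """
    # Each input starts out holding its own symbol.
    state: Dict[str, str] = {name: name for name in inputs}

    def value(op: Operand) -> str:
        if isinstance(op, int):
            return str(op)
        # A location never written before stands for its own unknown value.
        return state.get(op, op)

    for instr in program:
        opcode = instr[0]
        if opcode == "add":
            _, dst, lhs, rhs = instr
            state[dst] = f"({value(lhs)} + {value(rhs)})"
        elif opcode == "store":
            _, mem, src = instr
            state[mem] = value(src)   # memory location gets an expression too
        elif opcode == "load":
            _, dst, mem = instr
            state[dst] = value(mem)   # the expression flows back out of memory
        else:
            raise ValueError(f"unsupported opcode: {opcode}")
    return state


program = [
    ("add", "eax", "n", 1),        # eax = n + 1
    ("store", "[esp+8]", "eax"),   # spill eax to the stack
    ("load", "ebx", "[esp+8]"),    # reload it into a different register
    ("add", "ecx", "ebx", "ebx"),  # ecx = ebx + ebx
]
state = symbolic_run(program, inputs=["n"])
```

Because the spilled value keeps its expression while it sits in memory, eax and ebx end up with identical expressions, which is the kind of fact value numbering relies on. An analysis that tracked only variables would lose this information at the store.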