Browsing by Author "Foster, Jeffrey S."
Now showing 1 - 20 of 29
Results Per Page
Sort Options
Item Absynthe: Abstract Interpretation-Guided Synthesis(Association for Computer Machinery (ACM), 2023-06) Guria, Sankha Narayan; Foster, Jeffrey S.; Van Horn, DavidSynthesis tools have seen significant success in recent times. However, past approaches often require a complete and accurate embedding of the source language in the logic of the underlying solver, an approach difficult for industrial-grade languages. Other approaches couple the semantics of the source language with purpose-built synthesizers, necessarily tying the synthesis engine to a particular language model. In this paper, we propose Absynthe, an alternative approach based on user-defined abstract semantics that aims to be both lightweight and language agnostic, yet effective in guiding the search for programs. A synthesis goal in Absynthe is specified as an abstract specification in a lightweight user-defined abstract domain and concrete test cases. The synthesis engine is parameterized by the abstract semantics and independent of the source language. Absynthe validates candidate programs against test cases using the actual concrete language implementation to ensure correctness. We formalize the synthesis rules for Absynthe and describe how the key ideas are scaled-up in our implementation in Ruby. We evaluated Absynthe on SyGuS strings benchmark and found it competitive with other enumerative search solvers. Moreover, Absynthe’s ability to combine abstract domains allows the user to move along a cost spectrum, i.e., expressive domains prune more programs but require more time. Finally, to verify Absynthe can act as a general purpose synthesis tool, we use Absynthe to synthesize Pandas data frame manipulating programs in Python using simple abstractions like types and column labels of a data frame. Absynthe reaches parity with AutoPandas, a deep learning based tool for the same benchmark suite. In summary, our results demonstrate Absynthe is a promising step forward towards a general-purpose approach to synthesis that may broaden the applicability of synthesis to more full-featured languages.Item Adapting Scrum to Managing a Research Group(2010-09-18) Hicks, Michael; Foster, Jeffrey S.Score is an adaptation of the Scrum agile software development methodology to the task of managing Ph.D. students in an academic research group. This paper describes Score, conceived in October 2006, and our experience using it. We have found that Score enables us---faculty and students---to be more efficient and thereby more productive, and enhances the cohesion of our research group.Item Adapton: Composable, Demand-Driven Incremental Computation(2013-07-12) Hammer, Matthew A.; Phang, Khoo Yit; Hicks, Michael; Foster, Jeffrey S.Many researchers have proposed programming languages that support incremental computation (IC), which allows programs to be efficiently re-executed after a small change to the input. However, existing implementations of such languages have two important drawbacks. First, recomputation is oblivious to specific demands on the program output; that is, if a program input changes, all dependencies will be recomputed, even if an observer no longer requires certain outputs. Second, programs are made incremental as a unit, with little or no support for reusing results outside of their original context, e.g., when reordered. To address these problems, we present lambdaCDDIC, a core calculus that applies a demand-driven semantics to incremental computation, tracking changes in a hierarchical fashion in a novel demanded computation graph. lambdaCDDIC also formalizes an explicit separation between inner, incremental computations and outer observers. This combination ensures lambdaCDDIC programs only recompute computations as demanded by observers, and allows inner computations to be composed more freely. We describe an algorithm for implementing lambdaCDDIC efficiently, and we present AdaptOn, a library for writing lambdaCDDIC-style programs in OCaml. We evaluated AdaptOn on a range of benchmarks, and found that it provides reliable speedups, and in many cases dramatically outperforms prior state-of-the-art IC approaches.Item Appendix to CMod: Modular Information Hiding and Type-Safe Linking for C(2007-06-30) Srivastava, Saurabh; Hicks, Michael; Foster, Jeffrey S.This brief note is an appendix to the paper "CMod: Modular Information Hiding and Type-Safe Linking for C." It consists of the proof of soundness for the formal language presented in that paper.Item Checking Interaction-Based Declassification Policies for Android Using Symbolic Execution(2015-07-01) Micinski, Kristopher; Fetter-Degges, Jonathan; Jeon, Jinseong; Foster, Jeffrey S.; Clarkson, Michael R.Mobile apps can access a wide variety of secure information, such as contacts and location. However, current mobile platforms include only coarse access control mechanisms to protect such data. In this paper, we introduce interaction-based declassification policies, in which the user's interactions with the app constrain the release of sensitive information. Our policies are defined extensionally, so as to be independent of the app's implementation, based on sequences of security-relevant events that occur in app runs. Policies use LTL formulae to precisely specify which secret inputs, read at which times, may be released. We formalize a semantic security condition, interaction-based noninterference, to define our policies precisely. Finally, we describe a prototype tool that uses symbolic execution of Dalvik bytecode to check interaction-based declassification policies for Android, and we show that it enforces policies correctly on a set of apps.Item Checking Type Safety of Foreign Function Calls(2005-11-15T20:58:38Z) Furr, Michael; Foster, Jeffrey S.We present a multi-lingual type inference system for checking type safety across a foreign function interface. The goal of our system is to prevent foreign function calls from introducing type and memory safety violations into an otherwise safe language. Our system targets OCaml's FFI to C, which is relatively lightweight and illustrates some interesting challenges in multi-lingual type inference. The type language in our system embeds OCaml types in C types and vice-versa, which allows us to track type information accurately even through the foreign language, where the original types are lost. Our system uses a representational type that can model multiple OCaml types, because C programs can observe that many OCaml types have the same physical representation. Furthermore, because C has a low-level view of OCaml data, our inference system includes a dataflow analysis to track memory offsets and tag information. Finally, our type system includes garbage collection information to ensure that pointers from the FFI to the OCaml heap are tracked properly. We have implemented our inference system and applied it to a small set of benchmarks. Our results show that programmers do misuse these interfaces, and our implementation has found several bugs and questionable coding practices in our benchmarks.Item Cmod: Modular Information Hiding and Type-Safe Linking for C(2006-07-31) Srivastava, Saurabh; Hicks, Michael; Foster, Jeffrey S.This paper presents CMod, a novel tool that provides a sound module system for C. CMod works by enforcing a set of four rules that are based on principles of modular reasoning and on current programming practice. CMod's rules flesh out the convention that .h header files are module interfaces and .c source files are module implementations. Although this convention is well-known, developing CMod's rules revealed there are many subtleties in applying the basic pattern correctly. We have proven formally that CMod's rules enforce both information hiding and type-safe linking. We evaluated CMod on a number of benchmarks, and found that most programs obey CMod's rules, or can be made to with minimal effort, while rule violations reveal brittle coding practices including numerous information hiding violations and occasional type errors.Item Contextual Effects for Version-Consistent Dynamic Software Updating and Safe Concurrent Programming(2007-11-01) Neamtiu, Iulian; Hicks, Michael; Foster, Jeffrey S.; Pratikakis, PolyviosThis paper presents a generalization of standard effect systems that we call contextual effects. A traditional effect system computes the effect of an expression e. Our system additionally computes the effects of the computational context in which e occurs. More specifically, we compute the effect of the computation that has already occurred (the prior effect) and the effect of the computation yet to take place (the future effect). Contextual effects are useful when the past or future computation of the program is relevant at various program points. We present two substantial examples. First, we show how prior and future effects can be used to enforce transactional version consistency (TVC), a novel correctness property for dynamic software updates. TVC ensures that programmer-designated transactional code blocks appear to execute entirely at the same code version, even if a dynamic update occurs in the middle of the block. Second, we show how future effects can be used in the analysis of multi-threaded programs to find thread-shared locations. This is an essential step in applications such as data race detection.Item Directed symbolic execution(2011-04-07) Ma, Kin-Keung; Khoo, Yit Phang; Foster, Jeffrey S.; Hicks, MichaelIn this paper, we study the problem of automatically finding program executions that reach a particular target line. This problem arises in many debugging scenarios, e.g., a developer might learn that a failure is possible on a particular line but might not know exactly how to reproduce the failure or even whether it is reproducible. This can happen particularly often for bug reports from static analysis tools, which can produce false positives. We propose two new directed symbolic execution strategies that aim to solve this problem: shortest-distance symbolic execution (SDSE) uses a distance metric in an interprocedural control flow graph to guide symbolic execution toward a particular target; and call-chain-backward symbolic execution (CCBSE) iteratively runs forward symbolic execution, starting in the function containing the target line, and then jumping backward up the call chain until it finds a feasible path from the start of the program. We also propose Mix- CCBSE, a strategy in which CCBSE is composed with another search strategy to yield a hybrid that is more powerful than either strategy alone. We compare SDSE, CCBSE, and Mix-CCBSE with several existing strategies from the literature. We find that, while SDSE performs extremely well in many cases, it sometimes fails badly. However, by mixing CCBSE with KLEE's search strategy, we obtain a strategy that has the best overall performance across the strategies we studied.Item Directing JavaScript with Arrows (Functional Pearl)(2008-08-02) Khoo, Yit Phang; Hicks, Michael; Foster, Jeffrey S.; Sazawal, VibhaJavaScript, being a single-threaded language, makes extensive use of event-driven programming to enable responsive web applications. However, standard approaches to sequencing events are messy, and often lead to code that is difficult to understand and maintain. We have found that arrows, a generalization of monads, are an elegant solution to this problem. Arrows allow us to easily write asynchronous programs in small, modular units of code, and flexibly compose them in many different ways, while nicely abstracting the details of asynchronous program composition. In particular, we show how to use arrows to construct a variety of state machines, such as autoscrollers and drag-and-drop handlers.Item Dr. Android and Mr. Hide: Fine-grained security policies on unmodified Android(2011-12-09) Jeon, Jinseong; Micinski, Kristopher K.; Vaughan, Jeffrey A.; Reddy, Nikhilesh; Zhu, Yixin; Foster, Jeffrey S.; Millstein, ToddGoogle’s Android platform includes a permission model that protects access to sensitive capabilities, such as Internet access, GPS use, and telephony. We have found that Android’s current permissions are often overly broad, providing apps with more access than they truly require. This deviation from least privilege increases the threat from vulnerabilities and malware. To address this issue, we present a novel system that can replace existing platform permissions with finer-grained ones. A key property of our approach is that it runs today, on stock Android devices, requiring no platform modifications. Our solution is composed of two parts: Mr. Hide, which runs in a separate process on a device and provides access to sensitive data as a service; and Dr. Android (Dalvik Rewriter for Android), a tool that transforms existing Android apps to access sensitive resources via Mr. Hide rather than directly through the system. Together, Dr. Android and Mr. Hide can completely remove several of an app’s existing permissions and replace them with finer-grained ones, leveraging the platform to provide complete mediation for protected resources. We evaluated our ideas on several popular, free Android apps. We found that we can replace many commonly used "dangerous" permissions with finer-grained permissions. Moreover, apps transformed to use these finer-grained permissions run largely as expected, with reasonable performance overhead.Item Dynamic Inference of Static Types for Ruby(2010-07-19) An, Jong-hoon (David); Chaudhuri, Avik; Foster, Jeffrey S.; Hicks, MichaelThere have been several efforts to bring static type inference to object-oriented dynamic languages such as Ruby, Python, and Perl. In our experience, however, such type inference systems are extremely difficult to develop, because dynamic languages are typically complex, poorly specified, and include features, such as eval and reflection, that are hard to analyze. In this paper, we introduce constraint-based dynamic type inference, a technique that infers static types based on dynamic program executions. In our approach, we wrap each run-time value to associate it with a type variable, and the wrapper generates constraints on this type variable when the wrapped value is used. This technique avoids many of the often overly conservative approximations of static tools, as constraints are generated based on how values are used during actual program runs. Using wrappers is also easy to implement, since we need only write a constraint resolution algorithm and a transformation to introduce the wrappers. The best part is that we can eat our cake, too: our algorithm will infer sound types as long as it observes every path through each method body—note that the number of such paths may be dramatically smaller than the number of paths through the program as a whole. We have developed Rubydust, an implementation of our algorithm for Ruby. Rubydust takes advantage of Ruby’s dynamic features to implement wrappers as a language library. We applied Rubydust to a number of small programs. We found it to be lightweight and useful: Rubydust discovered 1 real type error, and all other inferred types were correct, and readable.Item Evaluating Dynamic Software Update Safety Using Systematic Testing(2011-09-22) Hayden, Christopher M.; Smith, Edward K.; Hardisty, Eric A.; Hicks, Michael; Foster, Jeffrey S.Dynamic software updating (DSU) systems patch programs on the fly without incurring downtime. To avoid failures due to the updating process itself, many DSU systems employ timing restrictions. However, timing restrictions are theoretically imperfect, and their practical effectiveness is an open question. This paper presents the first significant empirical evaluation of three popular timing restrictions: activeness safety (AS), which prevents updates to active functions; confreeness safety (CFS), which only allows modifications to active functions when doing so is provably type-safe; and manual identification of the event-handling loops during which an update may occur. We evaluated these timing restrictions using a series of DSU patches to three programs: OpenSSH, vsftpd, and ngIRCd.We systematically applied updates at each distinct update point reached during execution of a suite of system tests for these programs to determine which updates pass and which fail. We found that all three timing restrictions prevented most failures, but only manual identification allowed none. Further, although CFS and AS allowed many more update points, manual identification still supported updates with minimal delay. Finally, we found that manual identification required the least developer effort. Overall, we conclude that manual identification is most effective.Item Evaluating Interaction Patterns in Configurable Software Systems(2009-06-16) Reisner, Elnatan; Song, Charles; Ma, Kin-Keung; Foster, Jeffrey S.; Porter, AdamMany modern software systems are designed to be highly configurable, which makes testing them a challenge. One popular approach is combinatorial configuration testing, which, given an interaction strength $t$, computes a set of configurations to test such that all $t $-way combinations of option settings appear at least once. Basically, this approach assumes that interactions are complete in the sense that any combination of $t$ options can interact and therefore must be tested. We conjecture, however, that in practical systems interactions are limited. If our conjecture is true, then new techniques might be developed to identify or approximate infeasible interactions, greatly reducing the number of configurations that must be tested. We evaluated this conjecture with an initial empirical study of several configurable software systems. In this study we used symbolic evaluation to analyze how the settings of run-time configuration options affected a test suite's line coverage. Our results strongly suggest that for these subject programs, test suites and configuration options, at least at the level of line coverage, interactions between configuration options are not complete.Item Existential Label Flow Inference via CFL Reachability(2005-11-10T20:04:41Z) Pratikakis, Polyvios; Hicks, Michael; Foster, Jeffrey S.Label flow analysis is a fundamental static analysis problem with a wide variety of applications. Previous work by Mossin developed a polynomial time subtyping-based label flow inference that supports Hindley-Milner style polymorphism with polymorphic recursion. Rehof et al have developed an efficient O(n^3) inference algorithm for Mossin's system based on context-free language (CFL) reachability. In this paper, we extend these results to a system that also supports existential polymorphism, which is important for precisely describing correlations among members of a structured type, even when values of that type are part of dynamic data structures. We first develop a provably sound checking system based on polymorphically-constrained types. As usual, we restrict universal quantification to the top level of a type, but existential quantification is first class, with subtyping allowed between existentials with the same binding structure. We then develop a CFL-based inference system. Programmers specify which positions in a type are existentially quantified, and the algorithm infers the constraints bound in the type, or rejects a program if the annotations are inconsistent.Item Expositor: Scriptable Time-Travel Debugging with First Class Traces(2013-02-24) Khoo, Yit Phang; Foster, Jeffrey S.; Hicks, MichaelWe present Expositor, a new debugging environment that combines scripting and time-travel debugging to allow programmers to automate complex debugging tasks. The fundamental abstraction provided by Expositor is the execution trace, which is a time-indexed sequence of program state snapshots or projections thereof. Programmers can manipulate traces as if they were simple lists with operations such as map and filter. Under the hood, Expositor efficiently implements traces as lazy, sparse interval trees whose contents are materialized on demand. Expositor also provides a novel data structure, the edit hash array mapped trie, which is a lazy implementation of sets, maps, multisets, and multimaps that enables programmers to maximize the efficiency of their debugging scripts. In our micro-benchmarks, Expositor scripts are faster than the equivalent non-lazy scripts for common debugging scenarios. We have also used Expositor to debug a stack overflow, and to unravel a subtle data race in Firefox. We believe that Expositor represents an important step forward in improving the technology for diagnosing complex, hard-to-understand bugs.Item Kitsune: Efficient, General-purpose Dynamic Software Updating for C(2012-03-28) Hayden, Christopher M.; Smith, Edward K.; Denchev, Michail; Hicks, Michael; Foster, Jeffrey S.Dynamic software updating (DSU) systems allow programs to be updated while running, thereby allowing developers to add features and fix bugs without downtime. This paper introduces Kitsune, a new DSU system for C whose design has three notable features. First, Kitsune’s updating mechanism updates the whole program, not individual functions. This mechanism is more flexible than most prior approaches and places no restrictions on data representations or allowed compiler optimizations. Second, Kitsune makes the important aspects of updating explicit in the program text, making its semantics easy to understand while keeping programmer work to a minimum. Finally, the programmer can write simple specifications to direct Kitsune to generate code that traverses and transforms old-version state for use by the new code; such state transformation is often necessary, and is significantly more difficult in prior DSU systems. We have used Kitsune to update five popular, open-source, single- and multi-threaded programs, and find that few program changes are required to use Kitsune, and that it incurs essentially no performance overhead.Item Locksmith: Context-Sensitive Correlation Analysis for Race Detection(2006-06) Pratikakis, Polyvios; Foster, Jeffrey S.; Hicks, MichaelOne common technique for preventing data races in multi-threaded programs is to ensure that all accesses to shared locations are consistently protected by a lock. We present a tool called Locksmith for detecting data races in C programs by looking for violations of this pattern. We call the relationship between locks and the locations they protect consistent correlation, and the core of our technique is a novel constraint-based analysis that infers consistent correlation context-sensitively, using the results to check that locations are properly guarded by locks. We present the core of our algorithm for a simple formal language which we have proven sound, and discuss how we scale it up to an algorithm that aims to be sound for all of C. We develop several techniques to improve the precision and performance of the analysis, including a sharing analysis for inferring thread locality; existential quantification for modeling locks in data structures; and heuristics for modeling unsafe features of C such as type casts. When applied to several benchmarks, including multi-threaded servers and Linux device drivers, Locksmith found several races while producing a modest number of false alarms.Item Mixing Type Checking and Symbolic Execution (Extended Version)(2010-03-23) Khoo, Yit Phang; Chang, Bor-Yuh Evan; Foster, Jeffrey S.Static analysis designers must carefully balance precision and efficiency. In our experience, many static analysis tools are built around an elegant, core algorithm, but that algorithm is then extensively tweaked to add just enough precision for the coding idioms seen in practice, without sacrificing too much efficiency. There are several downsides to adding precision in this way: the tool's implementation becomes much more complicated; it can be hard for an end-user to interpret the tool's results; and as software systems vary tremendously in their coding styles, it may require significant algorithmic engineering to enhance a tool to perform well in a particular software domain. In this paper, we present Mix, a novel system that mixes type checking and symbolic execution. The key aspect of our approach is that these analyses are applied independently on disjoint parts of the program, in an off-the-shelf manner. At the boundaries between nested type checked and symbolically executed code regions, we use special mix rules to communicate information between the off-the-shelf systems. The resulting mixture is a provably sound analysis that is more precise than type checking alone and more efficient than exclusive symbolic execution. In addition, we also describe a prototype implementation, Mixy, for C. Mixy checks for potential null dereferences by mixing a null/non-null type qualifier inference system with a symbolic executorItem MultiOtter: Multiprocess Symbolic Execution(2011-05-26) Turpie, Jonathan; Reisner, Elnatan; Foster, Jeffrey S.; Hicks, MichaelSymbolic execution can be an effective technique for exploring large numbers of program paths, but it has generally been applied to programs running in isolation, whose inputs are files or command-line arguments. Programs that take inputs from other programs---servers, for example---have been beyond the reach of symbolic execution. To address this, we developed a multiprocess symbolic executor called MultiOtter, along with an implementation of many of the POSIX functions, such as socket and select, that interactive programs usually rely on. However, that is just a first step. Next, we must determine what symbolic inputs to feed to an interactive program to make multiprocess symbolic execution effective. Providing completely unconstrained symbolic values causes symbolic execution to spend too much time exploring uninteresting paths, such as paths to handle invalid inputs. MultiOtter allows us to generate inputs that conform to a context-free grammar, similar to previous work, but it also enables new input generation capabilities because we can now run arbitrary programs concurrently with the program being studied. As examples, we symbolically executed a key-value store server, redis, and an FTP server, vsftpd, each with a variety of inputs, including symbolic versions of tests from redis's test suite and wget as a client for vsftpd. We report the coverage provided by symbolic execution with various forms of symbolic input, showing that different testing goals require different degrees of symbolic inputs.