Technical Reports from UMIACS

Permanent URI for this collection

http://hdl.handle.net/1903/7

Browse

Now showing 1 - 20 of 950

Fast Subsequence Matching in Time-Series Databases
(1998-10-15) Faloutsos, Christos; Ranganathan, M.; Manolopoulos, Yannis
We present an efficient indexing method to locate 1-dimensional subsequences within a collection of sequences, such that the subsequences match a given (query) pattern within a specified tolerance. The idea is to map each data sequence into a small set of multidimensional rectangles in feature space. Then, these rectangles can be readily indexed using traditional spatial access methods, like the R*-tree \cite{Beckmann90R}. In more detail, we use a sliding window over the data sequence and extract its features; the result is a trail in feature space. We propose an efficient and effective algorithm to divide such trails into sub-trails, which are subsequently represented by their Minimum Bounding Rectangles (MBRs). We also examine queries of varying lengths, and we show how to handle each case efficiently. We implemented our method and carried out experiments on synthetic and real data (stock price movements). We compared the method to sequential scanning, which is the only obvious competitor. The results were excellent: our method accelerated the search time from 3 times up to 100 times. Appeared in ACM SIGMOD 1994, pp 419-429. Given "Best Paper award" (Also cross-referenced as UMIACS-TR-93-131)
Analysis of the n-dimensional quadtree decomposition for arbitrary hyper-rectangles
(1998-10-15) Faloutsos, Christos; Jagadish, H.V.; Manolopoulos, Yannis
We give a closed-form expression for the average number of $n$-dimensional quadtree nodes (`pieces' or `blocks') required by an $n$-dimensional hyper-rectangle aligned with the axes. Our formula includes as special cases the formulae of previous efforts for 2-dimensional spaces \cite{Faloutsos92Analytical}. It also agrees with theoretical and empirical results that the number of blocks depends on the hyper-surface of the hyper-rectangle and not on its hyper-volume. The practical use of the derived formula is that it allows the estimation of the space requirements of the $n$-dimensional quadtree decomposition. Quadtrees are used extensively in 2-dimensional spaces (geographic information systems and spatial databases in general), as well in higher dimensionality spaces (as oct-trees for 3-dimensional spaces, e.g. in graphics, robotics and 3-dimensional medical images [Arya et al., 1994]. Our formula permits the estimation of the space requirements for data hyper-rectangles when stored in an index structure like a ($n$-dimensional) quadtree, as well as the estimation of the search time for query hyper-rectangles. A theoretical contribution of the paper is the observation that the number of blocks is a piece-wise linear function of the sides of the hyper-rectangle. (Also cross-referenced as UMIACS-TR-94-130)
Stabbing Orthogonal Objects in 3-Space
(1998-10-15) Mount, David M.; Pu, Fan-Tao
We consider a problem that arises in the design of data structures for answering {\em visibility range queries}, that is, given a $3$-dimensional scene defined by a set of polygonal patches, we wish to preprocess the scene to answer queries involving the set of patches of the scene that are visible from a given range of points over a given range of viewing directions. These data structures recursively subdivide space into cells until some criterion is satisfied. One of the important problems that arise in the construction of such data structures is that of determining whether a cell represents a nonempty region of space, and more generally computing the size of a cell. In this paper we introduce a measure of the {\em size} of the subset of lines in 3-space that stab a given set of $n$ polygonal patches, based on the maximum angle and distance between any two lines in the set. Although the best known algorithm for computing this size measure runs in $O(n^2)$ time, we show that if the polygonal patches are orthogonal rectangles, then this measure can be approximated to within a constant factor in $O(n)$ time. (Also cross-referenced as UMIACS-TR-96-71)
Iterative methods for solving Ax = b GMRES/FOM versus QMR/BiCG
(1998-10-15) Cullum, Jane K.
We study the convergence of GMRES/FOM and QMR/BiCG methods for solving nonsymmetric Az = b. We prove that given the results of a BiCG computation on Az = b, we can obtain a matrix B with the same eigenvalues as A and a vector c such that the residual norms generated by a FOM computation on Bz = c are identical to those generated by the BiCG computations. Using a unitary equivalence for each of these methods, we obtain test problems where we can easily vary certain spectral properties of the matrices. We use these test problems to study the effects of nonnormality on the convergence of GMRES and QMR, to study the effects of eigenvalue outliers on the convergence of QMR, and to compare the convergence of restarted GMRES and QMR across a family of normal and nonnormal problems. Our GMRES tests on nonnormal test matrices indicate that nonnormality can have unexpected effects upon the residual norm convergence, giving misleading indications of superior convergence when the error norms for GMRES are not significantly different from those for QMR. Our QMR tests indicate that the convergence of the QMR residual and error norms is infLuenced predominantly by small and large eigenvalue outliers and by the character, real, complex, or nearly real, of the outliers and the other eigenvalues. In our comparison tests QMR outperformed GMRES(10) and GMRES(20) on both the normal and nonnormal test matrices. If you have difficulty viewing the second part of the linked postscript file, open the file: http://www.cs.umd.edu/fs/ftp/pub/papers/papers/3587.figures.ps. This is the second part of the paper in a separate file. (Also cross-referenced as UMIACS-TR-96-2)
Qualitative Analysis for Maintenance Process Assessment
(1998-10-15) Briand, Lionel; Kim, Yong-Mi; Melo, Walcelio L.; Seaman, Carolyn B.; Victor R. Basili ,
In order to improve software maintenance processes, we first need to be able to characterize and assess them. These tasks must be performed in depth and with objectivity since the problems are complex. One approach is to set up a measurement-based software process improvement program specifically aimed at maintenance. However, establishing a measurement program requires that one understands the problems to be addressed by the measurement program and is able to characterize the maintenance environment and processes in order to collect suitable and cost-effective data. Also, enacting such a program and getting usable data sets takes time. A short term substitute is therefore needed. We propose in this paper a characterization process aimed specifically at maintenance and based on a general qualitative analysis methodology. This process is rigorously defined in order to be repeatable and usable by people who are not acquainted with such analysis procedures. A basic feature of our approach is that actual implemented software changes are analyzed in order to understand the flaws in the maintenance process. Guidelines are provided and a case study is shown that demonstrates the usefulness of the approach. (Also cross-referenced as UMIACS-TR-96-7)
Concurrent Maintenance of Skip Lists
(1998-10-15) Pugh, William
This papers describes a new approach to providing efficient concurrent access to a dynamic search structure. Previous approaches have attempted to solve this problem using search trees (either balanced or unbalanced). We describe methods for performing concurrent access and updates using skip lists. Skip lists are a probabilistic alternative to balanced trees that provide much of the simplicity of unbalanced trees, together with good worst-case expected performance. In this paper, we briefly review skip lists, describe simple methods for concurrent maintenance of sorted linked lists, formally prove the correctness of these methods, and show how they can be extended to provide simple and efficient algorithms for concurrent maintenance of skip lists. (Also cross-referenced as UMIACS-TR-90-80)
What Size Neural Network Gives Optimal Generalization? Convergence Properties of Backpropagation
(1998-10-15) Lawrence, Steve; Giles, C. Lee; Tsoi, Ah Chung
One of the most important aspects of any machine learning paradigm is how it scales according to problem size and complexity. Using a task with known optimal training error, and a pre-specified maximum number of training updates, we investigate the convergence of the backpropagation algorithm with respect to a) the complexity of the required function approximation, b) the size of the network in relation to the size required for an optimal solution, and c) the degree of noise in the training data. In general, for a) the solution found is worse when the function to be approximated is more complex, for b) oversize networks can result in lower training and generalization error, and for c) the use of committee or ensemble techniques can be more beneficial as the amount of noise in the training data is increased. For the experiments we performed, we do not obtain the optimal solution in any case. We further support the observation that larger networks can produce better training and generalization error using a face recognition example where a network with many more parameters than training points generalizes better than smaller networks. (Also cross-referenced as UMIACS-TR-96-22)
UM Translog: A Planning Domain for the Development and Benchmarking of Planning Systems
(1998-10-15) Andrews, Scott; Kettler, Brian; Erol, Kutluhan; Hendler, James
The last twenty years of AI planning research has discovered a wide variety of planning techniques such as state-space search, hierarchical planning, case-based planning and reactive planning. These techniques have been implemented in numerous planning systems (e.g., STRIPS, SNLP, UCPOP, NONLIN, SIPE). Initially, a number of simple toy domains have been devised to assist in the analysis and evaluation of planning systems and techniques. The most well known examples are ``Blocks World'' and ``Towers of Hanoi.'' As planning systems grow in sophistication and capabilities, however, there is a clear need for planning benchmarks with matching complexity to evaluate those new features and capabilities. UM Translog is a planning domain designed specifically for this purpose. UM Translog was inspired by the CMU Transport Logistics domain developed by Manuela Veloso. UM Translog is an order of magnitude larger in size (41 actions versus 6), number of features and types interactions. It provides a rich set of entities, attributes, actions and conditions, which can be used to specify rather complex planning problems with a variety of plan interactions. The detailed set of operators provides long plans (~40 steps) with many possible solutions to the same problem, and thus this domain can also be used to evaluate the solution quality of planning systems. The UM Translog domain has been used with the UMCP, UM Nonlin, and CaPER planning systems thus far. (Also cross-referenced as UMIACS-TR-95-69)
Communication and Organization in Software Development: An Empirical Study
(1998-10-15) Seaman, Carolyn B.; Basili, Victor R.
The empirical study described in this paper addresses the issue of communication among members of a software development organization. The independent variables are various attributes of organizational structure. The dependent variable is the effort spent on sharing information which is required by the software development process in use. The research questions upon which the study is based ask whether or not these attributes of organizational structure have an effect on the amount of communication effort expended. In addition, there are a number of blocking variables which have been identified. These are used to account for factors other than organizational structure which may have an effect on communication effort. The study uses both quantitative and qualitative methods for data collection and analysis. These methods include participant observation, structured interviews, and graphical data presentation. The results of this study indicate that several attributes of organizational structure do affect communication effort, but not in a simple, straightforward way. In particular, the distances between communicators in the reporting structure of the organization, as well as in the physical layout of offices, affects how quickly they can share needed information, especially during meetings. These results provide a better understanding of how organizational structure helps or hinders communication in software development. This work was supported in part IBM's Centre for Advanced Studies, and by NASA grant NSG-5123. (Also cross-referenced as UMIACS-TR-96-23)
Optimization within a Unified Transformation Framework
(1998-10-15) Kelly, Wayne
Programmers typically want to write scientific programs in a high level language with semantics based on a sequential execution model. To execute efficiently on a parallel machine, however, a program typically needs to contain explicit parallelism and possibly explicit communication and synchronization. So, we need compilers to convert programs from the first of these forms to the second. There are two basic choices to be made when parallelizing a program. First, the computations of the program need to be distributed amongst the set of available processors. Second, the computations on each processor need to be ordered. My contribution has been the development of simple mathematical abstractions for representing these choices and the development of new algorithms for making these choices. I have developed a new framework that achieves good performance by minimizing communication between processors, minimizing the time processors spend waiting for messages from other processors, and ordering data accesses so as to exploit the memory hierarchy. This framework can be used by optimizing compilers, as well as by interactive transformation tools. The state of the art for vectorizing compilers is already quite good, but much work remains to bring parallelizing compilers up to the same standard. The main contribution of my work can be summarized as improving this situation by replacing existing ad hoc parallelization techniques with a sound underlying foundation on which future work can be built. (Also cross-referenced as UMIACS-TR-96-93)
Assessing Software Review Meetings: Results of a Comparative Analysis of Two Experimental Studies,
(1998-10-15) Porter, Adam A.; Johnson, Philip M.
Software review is a fundamental tool for software quality assurance. Nevertheless, there are significant controversies as to the most efficient and effective review method. One of the most important questions currently being debated is the utility of meetings. Although almost all industrial review methods are centered around the inspection meeting, recent findings call their value into question. To gain insight into these issues, the two authors of this paper separately and independently conducted controlled experimental studies. This paper discusses a joint effort to understand the broader implications of these tow studies. To do this, we designed and carried out a process of "reconciliation" in which we established a common framework for the comparison of the two experimental studies, re-analyzed to experimental data with respect to this common framework, and compared the results. Through this process we found many striking similarities between the the results of the two studies, strengthening their individual conclusions. it also revealed interesting differences between the two experiments, suggesting important avenues for future research. (Also cross-referenced as UMIACS-TR-97-15)
Sorting on Clusters of SMPs
(1998-10-15) Helman, David R.; JaJa, Joseph
Clusters of symmetric multiprocessors (SMPs) have emerged as the primary candidates for large scale multiprocessor systems. In this paper, we introduce an efficient sorting algorithm for clusters of SMPs. This algorithm relies on a novel scheme for stably sorting on a single SMP coupled with balanced regular communication on the cluster. Our SMP algorithm seems to be asymptotically faster than any of the published algorithms we are aware of. The algorithms were implemented in C using Posix Threads and the SIMPLE library of communication primitives and run on a cluster of DEC AlphaServer 2100A systems. Our experimental results verify the scalability and efficiency of our proposed solution and illustrate the importance of considering both memory hierarchy and the overhead of shifting to multiple nodes. (Also cross-reference as UMIACS-TR-97-69
Software Engineering of Distributed Simulation Environments
(1998-10-15) Duff, James; Purtilo, James M.; Capps, Michael; Stotts, David
With the increasing popularity of simulation and virtual environment software, it is necessary to provide software engineering techniques to simulation program designers. In this paper we lay out the requirements that any such techniques will have to meet, then suggest a formalism and an interconnection tool that will allow the interconnection of re-usable simulator components to build distributed simulation software. (Also cross-referenced as UMIACS-TR-95-100)
A Systematic Approach for Analyzing the Manufacturability of Machined Parts
(1998-10-15) Gupta, Satyandra K.; Nau, Dana S.
The ability to quickly introduce new quality products is a decisive factor in capturing market share. Because of pressing demands to reduce lead time, analyzing the manufacturability of the proposed design has become an important step in the design stage. This paper presents an approach for analyzing the manufacturability of machined parts. Evaluating the manufacturability of a proposed design involves determining whether. or not it is manufacturable with a given set of manufacturing operationsand if so, then finding the associated manufacturing efficiency. Since there can be several different ways to manufacture a proposed design, this requires us to consider different ways to manufacture it, in order to determine which one best meets the design and manufacturing objectives. The first step in our approach is to identify all machining operations which can potentially be used to create the given design. Using these operations, we generate different operation plans for machining the part. Each time we generate a new operation plan, we examine whether it can produce the desired shape and tolerances, and calculate its manufacturability rating. If no operation plan can be found that is capable of producing the design, then the given design is considered unmachinable; otherwise, the manufacturability rating for the design is the rating of the best operation plan. We anticipate that by providing feedback about possible problems with the design, this work will help in speeding up the evaluation of new product designs in order to decide how or whether to manufacture them. Such a capability will be useful in responding quickly to changing demands and opportunities in the marketplace. (Also cross-referenced as UMIACS-TR-93-105)
Cyclone Technology: An Overview
(1998-10-15) Lee, Sung
The current network, which is based on managing resources on demand and accepting uncontrolled communication request, often leads to problems such as congestion and other queueing bottlenecks. The extent of congestion and queues depends on the variability in customer arrival times, services needed, and the resource allocation mechanism used by system components. The queue sizes, which results in congestion, can be reduced only by controlling the variability in customer arrival times, and this is best done by making explicit use of time. Cyclone technology uses the information based on times of events explicitly, including the design of systems. Cyclone provides the coordination of resources through dynamic, time-based resource management leading to a network that is capable of providing end-to-end low latency communications free of losses, jitter, (Also cross-referenced as UMIACS-TR-98-10)
The End of Zero-Hit Queries: Query Previews for NASA's Global Change Master Directory
(1998-10-15) Greene, Stephan; Tanin, Egemen; Plaisant, Catherine; Shneiderman, Ben; Olsen, Lola; Major, Gene; Johns, Steve
The Human-Computer Interaction Laboratory (HCIL) of the University of Maryland and NASA have collaborated over the last three years to refine and apply user interface research concepts developed at HCIL in order to improve the usability of NASA data services. The research focused on dynamic query user interfaces, visualization, and overview +preview designs. An operational prototype, using query previews, was implemented with NASA's Global Change Master Directory (GCMD), a directory service for earth science data sets. Users can see the histogram of the data distribution over several attributes and choose among attribute values. A result bar shows the cardinality of the result set, thereby preventing users from submitting queries that would have zero hits. Our experience confirmed the importance of metadata accuracy and completeness. The query preview interfaces make visible problems or holes in the metadata that are unnoticeable with classic form fill-in interfaces. This could be seen as a problem, but we think that it will have a long-term beneficial effect on the quality of the metadata as data providers will be compelled to produce more complete and accurate metadata. The adaptation of the research prototype to the NASA data required revised data structures and algorithms. (Also cross-referenced as UMIACS-TR-97-84)
Product Unit Learning
(1998-10-15) Leerink, Laurens R.; Giles, C. Lee; Horne, Bill G.; Jabri, Marwan A.
Product units provide a method of automatically learning the higher-order input combinations required for the efficient synthesis of Boolean logic functions by neural networks. Product units also have a higher information capacity than sigmoidal networks. However, this activation function has not received much attention in the literature. A possible reason for this is that one encounters some problems when using standard backpropagation to train networks containing these units. This report examines these problems, and evaluates the performance of three training algorithms on networks of this type. Empirical results indicate that the error surface of networks containing product units have more local minima than corresponding networks with summation units. For this reason, a combination of local and global training algorithms were found to provide the most reliable convergence. We then investigate how `hints' can be added to the training algorithm. By extracting a common frequency from the input weights, and training this frequency separately, we show that convergence can be accelerated. A constructive algorithm is then introduced which adds product units to a network as required by the problem. Simulations show that for the same problems this method creates a network with significantly less neurons than those constructed by the tiling and upstart algorithms. In order to compare their performance with other transfer functions, product units were implemented as candidate units in the Cascade Correlation (CC) \cite{Fahlman90} system. Using these candidate units resulted in smaller networks which trained faster than when the any of the standard (three sigmoidal types and one Gaussian) transfer functions were used. This superiority was confirmed when a pool of candidate units of four different nonlinear activation functions were used, which have to compete for addition to the network. Extensive simulations showed that for the problem of implementing random Boolean logic functions, product units are always chosen above any of the other transfer functions. (Also cross-referenced as UMIACS-TR-95-80)
Configuration-Level Programming of Distributed Applications Using Implicit Invocation
(1998-10-15) Chen, Chen
An event-based distributed application is a group of software components interacting with each other by producing events that in turn trigger the invocation of procedures. In this work, we are concerned with the technology and methods for integrating an event-based application, whether that application is being constructed from scratch or synthesized from existing systems. Developing an event-based application is a complex task for programmers, who must address several issues not found in traditional systems and, currently, must do so without much assistance. These issues include event declaration, structure, binding, and naming. Our objective is to provide the same software engineering benefits to programmers of event-based applications as are currently provided to programmers of applications using traditional RPC or message-passing mechanisms. In this work, we broaden the technology for integration to encompass event-based programming. A method is described for separating event interaction properties from the implementation of the application modules so that system integration can be performed using only the abstractions. Then based upon the abstract aggregate, all interface software needed to validly implement the system can be generated automatically. A software bus model has been enhanced to accommodate the models which drive event-based distributed applications. In this way, designers may define complex event-based interactions abstractly, thus making it easier to integrate and experiment with event-based distributed applications. (Also cross-referenced as UMIACS-TR-95-70)
Iteration Space Slicing and Its Application to Communication Optimization
(1998-10-15) Pugh, William; Rosser, Evan
Program slicing is an analysis that answers questions such as ``Which statements might affect the computation of variable $v$ at statement $s$?'' or ``Which statements depend on the value of $v$ computed in statement $s$?''. The answers computed by program slicing are generally a set of statements. We introduce the idea of {\em iteration spacing slicing}: we refine program slicing to ask questions such as ``Which iterations of which statements might effect the computation in iterations $I$ of statement $s$?'' or ``Which iterations of which statements depend on the value computed by iterations $I$ of statement $s$?''. One application of this general-purpose technique is optimization of interprocessor communication in data-parallel compilers. For example, we can separate a code fragment into 1) those iterations that must be done before a send, 2) those iterations that don't need to be done before a send and don't depend on non-local data and 3), those iterations that depend on non-local data. We examine applications of iteration space slicing to communication optimizations in parallel executions of programs such as stencil computations and block-cyclic Gaussian elimination with partial pivoting. (Also cross-referenced as UMIACS-TR-97-02)
Thinking takes time: a modal active-logic for reasoning in time
(1998-10-15) Nirkhe, Madhura; Kraus, Sarit; Perlis, Don
Most common sense reasoning formalisms do not account for the passage of time a s the reasoning occurs, and hence are inadequate from the point of view of modeling an agent's {\em ongoing} process of reasoning. We present a modal active-logic that treats time as a valuable resource that is consumed in each step of the agent's reasoning. We provide a sound and complete characterization for this logic and examine how it addresses the problem of logical omniscience. (Also cross-referenced as UMIACS-TR-94-39)

Browse

Browsing Technical Reports from UMIACS by Issue Date

Results Per Page

Sort Options