Skip to content
University of Maryland LibrariesDigital Repository at the University of Maryland
    • Login
    View Item 
    •   DRUM
    • A. James Clark School of Engineering
    • Institute for Systems Research Technical Reports
    • View Item
    •   DRUM
    • A. James Clark School of Engineering
    • Institute for Systems Research Technical Reports
    • View Item
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    Using Categorical Information in Multidimensional Data Sets: Interactive Partition and Cluster Comparison

    Thumbnail
    View/Open
    TR_2005-102.pdf (681.0Kb)
    No. of downloads: 551

    Date
    2005
    Author
    Seo, Jinwook
    Shneiderman, Ben
    Metadata
    Show full item record
    Abstract
    Multidimensional data sets often include categorical information. When most columns have categorical information, clustering the data set by similarity of categorical values can reveal interesting patterns in the data set. However, when the data set includes only a small number (one or two) of categorical columns, the categorical information is probably more useful as a way to partition the data set. For example, researchers might be interested in gene expression data for healthy vs. diseased patients or stock performance for common, preferred, or convertible shares. For these cases, we present a novel way to utilize the categorical information together with clustering algorithms. Instead of incorporating categorical information into the clustering process, we can partition the data set according to categorical information. Clustering is then performed with each subset to generate two or more clustering results, each of which is homogeneous (i.e. only includes the same categorical value for the categorical column). By comparing the partitioned clustering results, users can get meaningful insights into the data set: users can identify an interesting group of items that are differentially/similarly expressed in two different homogeneous partitions. The partition can be done in two different directions: (1) by rows if categorical information is available for each column (e.g. some columns are from disease samples and other columns are from healthy samples) or (2) by a column if a column contains categorical information (e.g. a column represents a categorical attribute such as colors or sex). We designed and implemented an interface to facilitate this interactive partition-based clustering results comparison. Coordination between clustering results displays and comparison results overview enables users to identify interesting clusters, and a simple grid display clearly reveals correspondence between two clusters.
    URI
    http://hdl.handle.net/1903/6568
    Collections
    • Institute for Systems Research Technical Reports

    DRUM is brought to you by the University of Maryland Libraries
    University of Maryland, College Park, MD 20742-7011 (301)314-1328.
    Please send us your comments.
    Web Accessibility
     

     

    Browse

    All of DRUMCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

    My Account

    LoginRegister
    Pages
    About DRUMAbout Download Statistics

    DRUM is brought to you by the University of Maryland Libraries
    University of Maryland, College Park, MD 20742-7011 (301)314-1328.
    Please send us your comments.
    Web Accessibility