Computer Science Research Works

Permanent URI for this collection: http://hdl.handle.net/1903/1593

Search Results

Now showing 1 - 2 of 2
  • Item
    Scaffolding of long read assemblies using long range contact information
    (Springer Nature, 2017-07-12) Ghurye, Jay; Pop, Mihai; Koren, Sergey; Bickhart, Derek; Chin, Chen-Shan
    Long read technologies have revolutionized de novo genome assembly by generating contigs orders of magnitude longer than those of short read assemblies. Although assembly contiguity has increased, an assembly usually does not reconstruct a full chromosome or an arm of a chromosome, resulting in an unfinished chromosome level assembly. To increase the contiguity of the assembly to the chromosome level, different strategies are used which exploit long range contact information between chromosomes in the genome. We develop a scalable and computationally efficient scaffolding method that can boost the assembly contiguity to a large extent using genome-wide chromatin interaction data such as Hi-C. We demonstrate an algorithm that uses Hi-C data for longer-range scaffolding of de novo long read genome assemblies. We tested our methods on the human and goat genome assemblies. We compare our scaffolds with the scaffolds generated by LACHESIS based on various metrics. Our new algorithm SALSA produces more accurate scaffolds compared to the existing state-of-the-art method LACHESIS.
  • Item
    MetaCarvel: linking assembly graph motifs to biological variants
    (Springer Nature, 2019-08-26) Ghurye, Jay; Treangen, Todd; Fedarko, Marcus; Hervey, W. Judson IV; Pop, Mihai
    Reconstructing genomic segments from metagenomics data is a highly complex task. In addition to general challenges, such as repeats and sequencing errors, metagenomic assembly needs to tolerate the uneven depth of coverage among organisms in a community and differences between nearly identical strains. Previous methods have addressed these issues by smoothing genomic variants. We present a variant-aware metagenomic scaffolder called MetaCarvel, which combines new strategies for repeat detection with graph analytics for the discovery of variants. We show that MetaCarvel can accurately reconstruct genomic segments from complex microbial mixtures and correctly identify and characterize several classes of common genomic variants.