|dc.description||We used sequences and annotations for ten bat genomes (see Table 1 below), including six recently published reference assemblies, to locate each 50 bp probe on the array. Alignment was done with the QuasR package (Gaidatzis et al., 2015), assuming bisulfite conversion treatment of the genomic DNA. For each species’ genome sequence, QuasR creates an in silico bisulfite-treated version of the genome. The set of designed probe sequences, which contain degenerate base positions because of the bisulfite conversion, was expanded into a larger set of nucleotide sequences representing every possible combination of degenerate bases. We then ran QuasR (a wrapper for Bowtie2) with parameters -k 2 --strata --best -v 3 and bisulfite = "undir" to align the enlarged set of probe sequences to each prepared genome. From the resulting alignment files, we retained only alignments in which the entire length of the probe matched the genome sequence perfectly (i.e. CIGAR string 50M and flag XM=0).
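The expansion of degenerate probes described above can be illustrated with a minimal sketch. It assumes the degenerate positions are written as IUPAC ambiguity codes, which the description does not state; the function name `expand_degenerate` and the short example probe are hypothetical and are not taken from the original pipeline.

```python
from itertools import product

# IUPAC nucleotide codes mapped to the bases they can stand for.
# Assumption: the probes encode degenerate positions with these codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(probe):
    """Return every concrete sequence encoded by a degenerate probe."""
    # One string of candidate bases per position in the probe.
    choices = [IUPAC[base] for base in probe.upper()]
    # The Cartesian product enumerates every combination of candidates.
    return ["".join(combo) for combo in product(*choices)]


# Hypothetical 6-mer with two degenerate positions (Y and R).
variants = expand_degenerate("ACGYTR")
print(variants)
```

A probe with n two-fold degenerate positions expands to 2^n sequences, so the enlarged probe set grows quickly with the number of CpG or cytosine positions in a probe.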
Following the alignment, the CpGs were annotated based on the distance to the closest transcriptional start site (TSS) using the ChIPseeker package (Yu et al., 2015). A GFF file was created from these positions, sorted by scaffold and position, and compared to the location of each probe in BAM format. We report only probes whose variants mapped to a single unique locus in a given genome. The genomic location of each CpG is categorized as intergenic, 3’ UTR, 5’ UTR, promoter region (from -10 kb to +1000 bp relative to the nearest TSS), exon, or intron.
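The two filtering rules (full-length perfect match, then a single unique locus per probe) can be sketched as follows. This is an illustration only, not the original pipeline: the record layout `(probe_id, scaffold, position, cigar, mismatches)`, the function name `unique_perfect_loci`, and the example records are all hypothetical, and the `mismatches` field stands in for the XM=0 condition mentioned above.

```python
from collections import defaultdict


def unique_perfect_loci(records, probe_len=50):
    """Map each probe to its locus if its perfect hits share one locus.

    records: iterable of (probe_id, scaffold, position, cigar, mismatches)
    tuples. This layout is a hypothetical simplification of a BAM record.
    """
    perfect_cigar = f"{probe_len}M"
    loci = defaultdict(set)
    for probe_id, scaffold, position, cigar, mismatches in records:
        # Keep only full-length alignments without mismatches.
        if cigar == perfect_cigar and mismatches == 0:
            loci[probe_id].add((scaffold, position))
    # Report a probe only when all of its variants hit the same locus.
    return {p: next(iter(hits)) for p, hits in loci.items() if len(hits) == 1}


# Hypothetical alignment records for four probes.
records = [
    ("cg001", "scaffold_1", 1200, "50M", 0),
    ("cg001", "scaffold_1", 1200, "50M", 0),    # second variant, same locus
    ("cg002", "scaffold_2", 500, "50M", 0),
    ("cg002", "scaffold_7", 90, "50M", 0),      # two loci: probe dropped
    ("cg003", "scaffold_3", 42, "48M2S", 0),    # not full length: ignored
    ("cg004", "scaffold_4", 77, "50M", 1),      # mismatch: ignored
]
mapped = unique_perfect_loci(records)
print(mapped)
```

Collecting loci in a set means that several expanded variants of one probe landing on the same position still count as one locus, which matches the wording "variants mapped to a single unique locus".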
Gaidatzis, D., Lerch, A., Hahne, F., and Stadler, M.B. (2015). QuasR: quantification and annotation of short reads in R. Bioinformatics 31, 1130-1132.
Yu, G., Wang, L.G., and He, Q.Y. (2015). ChIPseeker: an R/Bioconductor package for ChIP peak annotation, comparison and visualization. Bioinformatics 31, 2382-2383.
Table 1. Bat genome assemblies and sources used for identifying location of CpG sites and number of sites mapped per genome.
Species, Assembly and annotation, Source, CpGs mapped
Molossus molossus, HLmolMol2, MPI*, 33557
Myotis myotis, HLmyoMyo6, MPI*, 32687
Phyllostomus discolor, HLphyDis3, MPI*, 33615
Rhinolophus ferrumequinum, HLrhiFer5, MPI*, 34411
Pipistrellus kuhlii, HLpipKuh2, MPI*, 31074
Rousettus aegyptiacus, HLrouAeg4, MPI*, 34308
Desmodus rotundus, GCF_002940915.1 (ASM294091v2), NCBI, 32930
Eptesicus fuscus, GCF_000308155.1 (EptFus1.0), NCBI, 32218
Myotis lucifugus, GCF_000147115.1 (Myoluc2.0), NCBI, 29810
Pteropus vampyrus, pteVam1.100, ENSEMBL, 24681
MPI* (downloaded from https://bds.mpi-cbg.de/hillerlab/Bat1KPilotProject/)||en_US