Project

Solve the genomic jigsaw puzzle: start with the borders

Chromosome-level assemblies are of key importance in studying genome evolution. Recent advances in long-read sequencing permit chromosome-level assembly for many species. Yet some species, including some model species, still lack a gapless, telomere-to-telomere assembly. Sub-telomeric regions in particular often remain problematic to assemble, because they are repetitive and tend to have lower read coverage, and the evolutionary dynamics of these regions are therefore not well studied.

To resolve these difficult regions of the genome and study genome evolution at the sub-telomeres, we will develop a pipeline that (1) selects long reads containing telomeric repeats, (2) clusters these reads based on sequence similarity, (3) assembles these reads separately, and (4) integrates the resulting chromosome ends into an existing assembly.
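The read-selection step could be sketched roughly as follows. This is a minimal illustration, not the project's method: it assumes the vertebrate-type repeat TTAGGG (the target species and its motif are not specified here), and the function names, the 500 bp window and the 0.5 density threshold are hypothetical choices. Instead of requiring a perfect repeat array, it measures how much of each read end is covered by exact motif copies, which tolerates scattered sequencing errors.

```python
# Assumption: vertebrate-type telomeric repeat; replace for other species.
MOTIF = "TTAGGG"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def motif_density(window: str, motif: str) -> float:
    """Fraction of the window covered by non-overlapping exact motif copies."""
    if not window:
        return 0.0
    return window.count(motif) * len(motif) / len(window)


def has_telomere(seq: str, motif: str = MOTIF,
                 window: int = 500, min_density: float = 0.5) -> bool:
    """Flag a read whose start or end is dense in the motif (either strand).

    The window size and density threshold are illustrative values that
    would need tuning on real Nanopore data.
    """
    seq = seq.upper()
    ends = (seq[:window], seq[-window:])
    motifs = (motif, revcomp(motif))
    return any(motif_density(end, m) >= min_density
               for end in ends for m in motifs)


def select_telomeric(reads):
    """Yield (name, sequence) pairs for reads that look telomeric."""
    for name, seq in reads:
        if has_telomere(seq):
            yield name, seq
```

In practice the same filter could be applied to reads streamed from a FASTQ file, or replaced by one of the dedicated tools listed under the techniques; the density threshold trades sensitivity against false positives from interstitial repeats.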

Research aims

  • Develop a pipeline that efficiently identifies telomeric repeats in high-error-rate Nanopore reads.
  • Develop a pipeline that identifies informative positions in read alignments and clusters reads based on differences at these positions.
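To make the second aim more concrete, here is a small sketch under simplifying assumptions: reads are already aligned into equal-length strings (with "-" for gaps), a column counts as informative when its second most common base is supported by enough reads to be unlikely to be a random error, and reads are grouped greedily by their bases at those columns. The function names and thresholds are hypothetical; a real pipeline would work from minimap2/samtools alignments and would probably use a more robust clustering method.

```python
from collections import Counter


def informative_columns(reads, min_minor_count=2, min_minor_frac=0.2):
    """Return alignment columns where a second allele is well supported.

    reads: list of equal-length aligned strings, '-' marks a gap.
    Thresholds are illustrative and meant to separate true variants
    from isolated sequencing errors.
    """
    columns = []
    for i in range(len(reads[0])):
        counts = Counter(r[i] for r in reads if r[i] != "-")
        if len(counts) < 2:
            continue  # column is invariant (or fully gapped)
        total = sum(counts.values())
        second = counts.most_common(2)[1][1]
        if second >= min_minor_count and second / total >= min_minor_frac:
            columns.append(i)
    return columns


def cluster_reads(reads, columns, max_mismatches=0):
    """Greedily group reads by their bases at the informative columns.

    A read joins the first cluster whose representative signature
    differs in at most max_mismatches positions; otherwise it starts
    a new cluster. Returns lists of read indices.
    """
    clusters = []  # (representative signature, member indices)
    for idx, read in enumerate(reads):
        sig = "".join(read[c] for c in columns)
        for rep, members in clusters:
            if sum(a != b for a, b in zip(sig, rep)) <= max_mismatches:
                members.append(idx)
                break
        else:
            clusters.append((sig, [idx]))
    return [members for _, members in clusters]


# Toy alignment: two haplotypes plus one read with an isolated error.
aligned = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGAACGTTC",
    "ACGAACGTTC",
    "ACGTACCTAC",  # single-read difference at column 6 is ignored
]
cols = informative_columns(aligned)
groups = cluster_reads(aligned, cols)
```

Here the isolated difference in the last read does not create an informative column, so that read is still grouped with the first two.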

Techniques used

  • Programming in Python (or language of choice)
  • Bash, assembly tools (e.g. Flye), mapping/alignment tools (e.g. minimap2), samtools, and bbduk, 2fast2q or seqkit for extracting reads with telomeric repeats

Supervisor