Theory Seminar

  • McLennan Physical Laboratories (MP), 255 Huron Street, Room 137, Toronto, Canada

Title: Data structures for representing sets of k-mers
Presented By: Paul Medvedev, Pennsylvania State University

Abstract:
The analysis of biological sequencing data has been one of the biggest applications of string algorithms. Many of the approaches used in such applications are based on the analysis of k-mers, which are short fixed-length strings present in a dataset. While these approaches are rather diverse, storing and querying k-mer sets has emerged as a shared underlying component, and many specialized data structures have been developed to represent them. In this talk, I will describe the applications of k-mer sets in bioinformatics and motivate the need for specialized data structures. I will give an overview of known approaches and lower bounds, with a focus on unitig-based representations. Finally, I will describe a data structure for representing sets of k-mer sets, called the HowDe Sequence Bloom Tree.
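To make "k-mer set" concrete, here is a minimal sketch (not taken from the talk) that extracts every length-k substring from a few reads into an ordinary Python set and answers membership queries. The function name `kmer_set` and the toy reads are hypothetical. For simplicity the sketch treats each k-mer exactly as written; many bioinformatics tools instead store a canonical form that merges a k-mer with its reverse complement. A plain hash set like this is the naive baseline that the specialized structures discussed in the talk aim to improve on in space.

```python
def kmer_set(reads, k):
    """Return the set of all length-k substrings (k-mers) occurring in the reads.

    Hypothetical helper for illustration; it does not merge reverse complements.
    """
    kmers = set()
    for read in reads:
        # A read of length n contains n - k + 1 k-mers (none if n < k).
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return kmers


reads = ["ACGTAC", "GTACGG"]  # toy reads, not real data
kmers = kmer_set(reads, 3)
print(sorted(kmers))  # ['ACG', 'CGG', 'CGT', 'GTA', 'TAC']

# Membership query: is a given k-mer present in the dataset?
print("CGT" in kmers)  # True
print("GGG" in kmers)  # False
```

Note that k-mers shared between reads (here GTA, TAC and ACG) are stored only once, which is why the set has five elements rather than eight.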
