Skip to main navigation Skip to Content

Computer Science

University of Toronto
  • Quercus
  • Student Support
  • Contact
  • About
    • History of U of T Computer Science
    • Computer Science at U of T Mississauga
    • Computer Science at U of T Scarborough
    • Employment Opportunities for Faculty/Lecturers
    • How to Find Us
    • Contact
  • Undergraduate
    • Prospective Undergraduates
    • Current Undergraduates
  • Graduate
    • Prospective Graduate Students
    • Current Graduate Students
  • Research
    • Research Areas
    • Partner with us
  • People
    • Faculty
    • Staff
    • In Memoriam
    • Alumni and Friends
    • Honours & Awards
    • Women in Computer Science
    • Graduate Student Society
    • Undergraduate Student Union
    • Undergraduate Artificial Intelligence Group
    • Undergraduate Theory Group
  • News & Events
    • News
    • Events
    • @DCS Update
    • Alumni
    • Donate
You are viewing: > Home > News & Events > Events > Theory Seminar: Apr 6
  • About
  • Undergraduate
  • Graduate
  • Research
  • People
  • News & Events

Theory Seminar: Apr 6

Event date: Friday, April 06, 2018, from 11:00 AM to 12:00 PM
Location: Roseburgh Building, 164 College Street, Room 208

Lecture Title: Indexing Genomic Databases

Presented By: Travis Gagie, Universidad Diego Portales, Santiago, Chile

Location: Roseburgh Building, 164 College Street, Room 208

Abstract: Fifteen years after the first human genome was sequenced, the British government is building a database of a hundred thousand human genomes. Uncompressed, this database will take hundreds of terabytes, so only data structures that take advantage of the genetic similarity between humans will be able to handle it. There has been a sustained effort to scale up indexes based on the Burrows-Wheeler Transform (BWT), for two main reasons: such indexes play several important roles in bioinformatics, and run-length compressing the BWTs of genomic databases often reduces their sizes by orders of magnitude. In order for an index to report not just how many time a pattern occurs in the database but also where, however, it needs a suffix array (SA) sample in addition to a BWT. Until very recently, it was not known how to make an SA sample very small without also making it very slow. In this talk we will review what BWT-based indexes do, how they work, how they use standard SA samples, and why run-length compression works so well on them, and then see how to build a fast SA sample whose size is bounded in terms of the number of runs in the BWT. This is joint work with Gonzalo Navarro and Nicola Prezza, presented at SODA 2018.

For more information, visit the Theory Group Events Page.


All rights reserved copyright Computer Science, University of Toronto | Site Map