Undergraduate Summer Research Program (UGSRP)
Students work with one of our professors on a leading-edge research project. The research positions are paid. In 2018, 75 students participated in the UGSRP.
Applications for summer 2020 are not available.
The Undergraduate Research in Computer Science Conference (UrCSC) – taking place on September 13 at Bahen Centre – will bring together undergraduates to present and discuss computer science research! In addition to undergraduate research presentations, the conference will feature talks by U of T professors on conducting research, and graduate students will share their advice and experience at a Q&A panel.
UGSRP 2019 Talk Series Schedule
"How to Give Good Talks" presented by Prof. Henry Yuen
Graduate Student Panel
"How to Write Great Research Papers" presented by Prof. Joseph Jay Williams
First Talk: Jeffrey Yi Nian Niu (Mentor: Danielle Denisko, Supervisor: Prof. Michael Hoffman)
Title: Corner Cases in Popular Bioinformatics Software
Abstract: The increased availability of high-throughput sequencing technologies over the years has led to the development of hundreds of software tools in genomics research. Each of these tools relies on a number of file formats. While some file formats have been rigorously defined by various working groups, others remain vague, causing confusion among users. In this presentation, I will discuss our work on the BED format which is widely used for gene annotation and visualization. We first aim to provide a rigorous definition of the format that meets the needs of the users. We will then use this definition to test various tools' ability to properly parse the format. Finally, I will discuss future extensions to this work including a file validator and defining different levels of conformity to our definition.
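As a rough illustration of the kind of parsing check the abstract describes, here is a minimal sketch in Python. It assumes only the widely documented core of BED: three required tab-separated fields (chromosome, start, end) with a 0-based start and an exclusive end. The function name `parse_bed_line` and the specific validity rules are hypothetical choices for this example, not the definition proposed in the talk.

```python
def parse_bed_line(line):
    """Parse the three required fields of one BED line.

    Assumptions (illustrative only): fields are tab-separated, start is
    0-based, end is exclusive, and start must not exceed end.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("a BED line needs at least chrom, start and end")

    chrom = fields[0]
    start = int(fields[1])  # raises ValueError if not an integer
    end = int(fields[2])

    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    return chrom, start, end


chrom, start, end = parse_bed_line("chr1\t100\t200\tgeneA")
length = end - start  # half-open interval, so no +1 is needed
```

Inputs such as space-separated fields or a start greater than the end are rejected here; whether a given tool should accept them is exactly the kind of question a rigorous format definition has to settle.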
Second Talk: Alex Cann (Mentor: Jeremy Ko, Supervisor: Prof. Faith Ellen)
Title: Non-Blocking Mergeable Search Trees
Abstract: Non-Blocking Mergeable Search Trees are a Concurrent Data Structure supporting the operations Insert, Delete, Search, Split and Merge in an asynchronous system. My talk discusses the challenges with the design of this data structure and an approach to addressing them.
Third Talk: Ryan Marten (Mentor: Charles-Olivier Dufresne Camaro, Supervisor: Prof. Sven Dickinson)
Title: Homogeneity in images and its use in object segmentation
Abstract: I will present my work from this summer on training a convolutional neural network to classify regions as homogeneous or heterogeneous. This problem is complex and interesting because even humans can find that there is sometimes no definite right classification for a given image region.
Fourth Talk: Yizhe Cheng and Bence Weisz (Supervisor: Prof. Azadeh Farzan)
Title: Automated Program Synthesis
Abstract: Writing a correct program is often tedious and difficult. To simplify the work of programmers, we present an algorithm, based on a counterexample-guided refinement loop (CEGAR), for synthesizing correct programs from a set of specifications. Moreover, we discuss a representation of programs, a technique for finding and generalizing counterexamples, and a method for determining the existence of a provably correct program.
First Talk: Jing Xie (Supervisor: Prof. Yang Xu)
Title: Moral Sense Acquisition from Text
Abstract: We present the problem of moral sense acquisition by asking whether machines can distinguish right from wrong given simple textual input. For example, the act of laughing at someone's mistake would be frowned upon, but laughing in general would not. We draw from work in social psychology and natural language processing to construct models that predict the moral sentiment of a stated situation. We describe preliminary methodologies and show that a simple approach based on verb schematization is often most competitive. We also discuss future directions to extend this research.
Second Talk: Lara Schull (Supervisor: Prof. Yang Xu)
Title: Evolution of the Moral Lexicon
Abstract: How does the lexicon evolve? One possibility is that the lexicon is driven by changing sociocultural needs that are hard to predict. The other possibility is that the lexicon is shaped by cognitive principles in adaptation to these needs, with predictable trajectories over time. We explore these possibilities by tracking how the moral lexicon evolved over a period of 1000 years in English. We present a suite of models that postulate different hypotheses about word emergence and discuss preliminary results on the predictability of lexical evolution above and beyond chance.
Third Talk: Ben Prystawski (Supervisor: Prof. Yang Xu)
Title: Gender Differences in Child Linguistic Input Reflect Implicit Biases in Text
Abstract: In recent years, word embeddings have been shown to reflect implicit biases in society, such as the stereotypical association of nurses as female and engineers as male. Using a corpus of child-directed speech, I will present evidence that caretakers speak to children differently depending on the gender of the child and that these differences correlate with the biases quantified by word embeddings. I will also compare these findings across languages.
Fourth Talk: Yuya Asano (Supervisor: Prof. Joseph Williams)
Title: Randomized Experiments with Machine Learning
Abstract: When researchers run randomized experiments, they have different goals. For example, they not only want to help users quickly but also want scientific (statistical) evidence to claim which option is better. However, it is difficult to balance out those goals. I will discuss how machine learning could potentially help researchers achieve all of the goals at the same time and how machine learning algorithms would behave under different conditions.
First Talk: Lukas O'Callahan (Supervisor: Prof. Marsha Chechik)
Title: Aiutare: A Modular Benchmarking Framework
Abstract: When running multiple competing programs on thousands of benchmark cases, effectively making use of the overwhelming amount of resulting information can be a significant challenge. Aiutare helps researchers by abstracting the benchmarking and data storage processes, allowing users to easily query output for visualization, plotting, and testing. As a case study, we compared SMT solvers (z3 and cvc4) on a benchmark set of 18,000 instances, then wrote an extension program to analyze the data generated by Aiutare, revealing multiple bugs in each solver. This example demonstrates the modularity of Aiutare, which is being developed with the intention of being easily extended by end-users in order to answer more domain-specific questions.
Second Talk: Yueze Fang (Supervisor: Prof. Peter Marbach)
Title: Online advertisement bidding algorithm
Abstract: Some websites hold a real time second price auction to decide which advertisement to display whenever a user visits the website. As a result some advertisers will pay brokers some price to obtain a certain number of ad spots targeting a certain group of users. I will be presenting our work on the algorithm to help brokers find the price that maximizes their profit, by treating this problem as an optimization problem. I will, in particular, focus on the computation power needed for the algorithm.
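For readers unfamiliar with the auction format mentioned in the abstract, here is a minimal Python sketch of the second-price rule: the highest bidder wins the ad spot but pays the second-highest bid. The function name, the `reserve` parameter and the sample bids are hypothetical; this illustrates only the auction rule, not the broker pricing algorithm presented in the talk.

```python
def second_price_auction(bids, reserve=0.0):
    """Return (winner, price) for a sealed-bid second-price auction.

    bids maps each advertiser to its bid. The reserve price (an assumption
    of this sketch) is what a lone bidder pays.
    """
    if not bids:
        raise ValueError("at least one bid is required")

    # Rank advertisers from highest to lowest bid.
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]

    # The winner pays the runner-up's bid, not its own.
    price = ranked[1][1] if len(ranked) > 1 else reserve
    return winner, price


winner, price = second_price_auction({"a": 3.0, "b": 5.0, "c": 4.0})
```

In this example advertiser "b" wins with a bid of 5.0 and pays 4.0, the bid of the runner-up "c".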
Third Talk: Defne Dilbaz (Mentor: Thi Ha-Kyaw, Supervisor: Prof. Alan Aspuru-Guzik)
Title: From Basics of Superconducting Circuits to 0-pi Qubit
Abstract: A qubit is the smallest data representation in a quantum computer. The 0-pi qubit offers topological protection against external noise. We look into the postulates of quantum mechanics to evolve the qubit Hamiltonian. We extend our understanding to different circuit models and explain the Josephson effect. We calculate the Hamiltonian of the 0-pi qubit and reflect on the importance of the structure. We discuss our future steps to improve the 0-pi qubit: simulating results of an open quantum system and using a Fast Holonomic scheme to manipulate the qubit.
Fourth Talk: Thomas Ma (Supervisor: Prof. Peter Marbach)
Title: Local Search Algorithms for Core Detection in Communities
Abstract: In every community within a network there are users who are particularly respected for their ability to consume or produce large amounts of influential content. We use an abstract model of how information is shared in a social network to develop a local search heuristic which should theoretically be able to find these users, and test an applied version of our algorithm on real-world data pulled from Twitter.
First Talk: Lun Yu Li (Supervisor: Prof. Ishtiaque Ahmed)
Title: Gender Bias Regarding Nudity Issues in Commercials
Abstract: It is widely accepted that advertising shapes public opinion, and one major issue with commercials is gender stereotyping. We focused specifically on the issue of nudity in commercials. We extracted key frames from the commercial videos and filtered out frames without people. Then, we labeled each frame with face, body and gender. We ran the frames through nudity detection APIs (Deep AI and Algorithmia) to label each as nudity or no nudity. After preprocessing our dataset, we wrote our own deep learning multimodal model to detect nudity in the frames and to improve on the accuracy of existing nudity detection tools. Finally, we conducted a statistical analysis of nudity by gender. We found interesting results regarding nudity versus gender, release year of the commercial, number of views and category of the commercial.
Second Talk: Jacob Chmura (Mentor: Gurnit Atwal, Supervisor: Prof. Quaid Morris)
Title: Learning Feature Importance for a Deep Cancer Classifier
Abstract: A major obstacle to the practical application of deep learning tools is the lack of interpretability and intuition in the results that they provide. This is typically the case in the medical sector, where complex architectures are employed, and understanding predictions is necessary for patient safety. Given a model trained to identify primary and metastatic cancer tumors, I explore feature importance using a known propagation-based method to improve the accuracy of the model and discover patterns in its misclassification. I also briefly discuss efforts in feature engineering and Bayesian construction.
Third Talk: Jai Aggarwal (Supervisor: Prof. Joseph Williams)
Title: An Integrative Approach to Digital Mental Health Interventions
Abstract: We aim to explore ways to create digital mental health interventions that are both effective at reducing stress and engaging enough to sustain long-term usage for the average individual. Present work is focused on translating insights from the fields of clinical and positive psychology into text messages that can be deployed to users to help achieve the above goals. Moving forwards, we seek to implement reinforcement learning to personalize content to users to ensure maximum efficacy at stress reduction and to ensure the development of healthy mental wellness habits.
Fourth Talk: Jacob Kelly (Mentor: Arvind Mer, Supervisor: Prof. Benjamin Haibe-Kains)
Title: Drug Response Prediction from Gene Expression
Abstract: Personalized medicine promises to individualize care for each patient according to their genetic makeup. To that end, we explore the problem of predicting drug response of a tumour from its gene expression. We use classical machine learning techniques, and explore formulations of the problem as classification and as regression. We consider different metrics for validating our model, and explore generalization of our model to patient data.
First Talk: Shuli Jones (Mentor: Serena Jeblee, Supervisor: Prof. Graeme Hirst)
Title: The Potential of Computer-Based Analysis of Medieval Documents for Improving Historical Knowledge
Abstract: Very little written information survives from many parts of the medieval period. Much of what we have is in the form of contracts recorded by monks, detailing transfer of property between their religious order and the laypeople; these contracts were recopied over and over, such that only the words themselves (not the materials used or writing style) can tell us anything about contemporaneous events. I explored different avenues to extract information from these contracts and will be presenting several promising methods, using statistical comparison techniques, to reveal more about the historical context of the period in which a set of these documents were written.
Second Talk: Abhishek Moturu, Alex Chang, Vinith Suriyakumar (Supervisor: Prof. Anna Goldenberg)
Title: Early Pediatric Cancer Detection from Whole-Body MRIs using GANs
Abstract: Approximately 3-10 in 100 people have Li-Fraumeni syndrome (LFS), which is an inherited familial predisposition to a wide range of certain, often rare, cancers. Early detection is key to a good prognosis in these patients, requiring frequent and regular testing. Whole-body magnetic resonance imaging (wbMRI) is an essential part of several well-established screening protocols for hereditary cancer patients with screening starting in early childhood. wbMRI enables accurate detection of pre-cancerous and cancerous lesions across a diverse set of tissues. To date, machine learning (ML) has been used on wbMRI images to stage adult cancer patients. It is not possible to use such tools in pediatrics due to the changing bone signal throughout the growth period as well as the difficulty of obtaining these images in young children due to movement and limited compliance as well as rarity of positive cases. The novelty of our project is three-fold: 1) using generative adversarial networks (GANs) to create a large augmented set of wbMRI images with and without cancer; 2) using ML for cancer detection in children from the augmented dataset; 3) making use of federated learning, where the models are tested and adopted at Children’s Hospital of Philadelphia directly with no need for explicit data sharing. We aim to improve cancer screening in children in order to facilitate earlier diagnosis and potentially less aggressive cancer treatment. Our proof of concept project has very broad applicability, including early detection of metastases in common adult cancers.
Third Talk: Yongzhen Huang (Supervisor: Prof. Maryam Mehri Dehnavi)
Title: Cross-Kernel Fusion for Optimizing Sparse Linear Algebra
Abstract: Sparse linear algebra is widely used in many scientific and machine learning applications. Previous work has focused on dense linear algebra kernel fusion. Sparse linear algebra kernel fusion, however, has not been researched extensively to the best of our knowledge. This work introduces a novel approach to kernel fusion in sparse linear algebra based on sparsity pattern and data dependency to achieve better performance. The eventual goal is to create an automatic code-generator to improve the performance of sparse linear algebra computations on modern parallel architectures.
Fourth Talk: Ruiqi Wang (Mentor: Ella Rabinovich, Supervisor: Prof. Suzanne Stevenson)
Title: Towards Understanding of Code-Switching in Written Multilingual Discourse
Abstract: Code-switching -- the mixing of two languages within a single interaction -- is a common phenomenon in bilingual communities. A better understanding of this phenomenon will shed light on the linguistic and cognitive motivations for mixing languages, as well as facilitate computational algorithms for processing and generating code-switched utterances. In this work, we draw on insights from theoretical and experimental linguistics to construct a hypothesis on the prediction of this phenomenon. We present a distributional semantics approach to study code-switching at scale and discuss preliminary findings on factors that explain this phenomenon in written discourse.