Presenters

Brittany Baur

Postdoctoral fellow in the Genomic Sciences Training Program (GSTP) |
Wisconsin Institute for Discovery |
University of Wisconsin, Madison

Da-Inn Erika Lee

PhD student in Biomedical Data Science | Department of Biostatistics & Medical Informatics |
University of Wisconsin, Madison

Xiaotong Liu

PhD student in Bioinformatics and Computational Biology |
University of Minnesota, Twin Cities

Henry Ward

PhD student in Bioinformatics and Computational Biology |
University of Minnesota, Twin Cities

Agenda

10:00

Introduction

Everyone

We’ll introduce ourselves, briefly discuss the motivation behind the tutorial, and go over the plan for the workshop.
 
15:00

Non-negative Matrix Factorization (NMF)

Da-Inn Erika Lee

Datasets can often be represented as a matrix. Consider the Netflix dataset, where the rows represent users, the columns represent movies, and each element is a star rating. Matrix factorization yields lower-dimensional factors that allow co-clustering, e.g. groups of users with similar taste and groups of movies they tend to like. We’ll go over the intuition behind matrix factorization, its popular variants, and a demo with a single-cell RNA-seq dataset.
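To make the co-clustering idea concrete, here is a minimal sketch of NMF using multiplicative updates on a toy ratings-like matrix. It is not the code used in the session demo; the function name `nmf`, the toy matrix, and the parameter defaults are assumptions chosen for illustration, and only NumPy is required.

```python
import numpy as np

def nmf(V, rank, n_iter=500, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (n x m) as W @ H with W, H >= 0.

    Uses multiplicative updates for the squared-error objective.
    This is an illustrative sketch, not a tuned implementation.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Each update rescales entries by a non-negative ratio,
        # so W and H stay non-negative throughout.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy data: rows are users, columns are movies.
# Users 0-1 rate movies 0-1 highly; users 2-3 rate movies 2-3 highly.
V = np.array([
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [0, 0, 5, 4],
    [0, 0, 4, 5],
], dtype=float)

W, H = nmf(V, rank=2)

# Co-clustering: assign each user and each movie to its dominant factor.
user_clusters = W.argmax(axis=1)
movie_clusters = H.argmax(axis=0)
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)

print("user clusters: ", user_clusters)
print("movie clusters:", movie_clusters)
print("relative reconstruction error:", round(rel_error, 3))
```

Reading the dominant factor off each row of `W` and each column of `H` is the simplest way to turn the factors into groups; which factor index goes with which group depends on the random initialization.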
 
15:00

Principal Component Analysis (PCA) and its variants

Xiaotong Liu

Principal component analysis (PCA) is widely used as a computational technique for dimensionality reduction and data visualization. It can reduce the complexity of high-dimensional biological data while capturing the major trends of variance within the data. In this tutorial, we will present the mathematical basis of PCA and its applications in biological studies. Related methods, including linear discriminant analysis (LDA) and generalized discriminant analysis (GDA), will also be discussed with regard to their use in dimensionality reduction.
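As a rough illustration of the mathematical basis, PCA can be computed from the singular value decomposition of the mean-centered data matrix. The sketch below assumes samples in rows and features in columns; the function name `pca` and the synthetic data are illustrative assumptions, not material from the session.

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD of the mean-centered data (rows = samples)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Project the samples onto the leading principal axes.
    scores = X_centered @ components.T
    # Squared singular values are proportional to the variance per axis.
    explained_variance = S**2 / (X.shape[0] - 1)
    explained_ratio = explained_variance[:n_components] / explained_variance.sum()
    return scores, components, explained_ratio

# Synthetic data: 100 samples, 5 features, driven mostly by one latent factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 1))
loadings = np.array([[3.0, 2.0, 1.0, 0.5, 0.1]])
X = latent @ loadings + 0.1 * rng.normal(size=(100, 5))

scores, components, explained_ratio = pca(X, n_components=2)
print("fraction of variance explained:", explained_ratio)
```

Because the synthetic data are generated from a single latent factor plus small noise, the first component should account for nearly all of the variance.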
 
15:00

t-SNE and UMAP

Da-Inn Erika Lee and Henry Ward

t-SNE and UMAP are approaches whose main aim is visualization: projecting high-dimensional datasets into 2- or 3-dimensional space to reveal easily discernible patterns or clusters. Intuitively, these methods try to preserve the similarity or distance between data points in the original high-dimensional space when the data points are projected to 2D or 3D space. We will cover their differences and their application to single-cell RNA-seq data.
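The "preserve similarity" intuition can be sketched by writing down a simplified version of the t-SNE objective: Gaussian affinities in the original space, Student-t affinities in the embedding, and the KL divergence between them. This is only a sketch under stated assumptions. It uses one fixed bandwidth `sigma` instead of t-SNE's per-point perplexity search, it evaluates the objective rather than optimizing it, and it does not cover UMAP, which builds a different neighbor graph and minimizes a cross-entropy loss. All function names and the toy data are hypothetical.

```python
import numpy as np

def pairwise_sq_dists(Z):
    """Squared Euclidean distances between all rows of Z."""
    sq = np.sum(Z**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.maximum(D, 0.0)

def high_dim_affinities(X, sigma=2.0):
    """Symmetrized Gaussian affinities (fixed bandwidth, a simplification)."""
    P = np.exp(-pairwise_sq_dists(X) / (2.0 * sigma**2))
    np.fill_diagonal(P, 0.0)
    P /= P.sum(axis=1, keepdims=True)       # conditional p(j | i)
    return (P + P.T) / (2.0 * X.shape[0])   # joint p(i, j), sums to 1

def low_dim_affinities(Y):
    """Student-t (one degree of freedom) affinities in the embedding."""
    Q = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(Q, 0.0)
    return Q / Q.sum()

def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q), the quantity t-SNE minimizes over the embedding."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / (Q[mask] + eps))))

# Toy data: two well-separated clusters in 10 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(20, 10)),
               rng.normal(size=(20, 10)) + 5.0])

P = high_dim_affinities(X)

# A 2D layout that keeps the clusters apart vs. the same layout with
# points shuffled, so that high-dimensional neighbors are scattered.
Y_good = X[:, :2]
Y_bad = Y_good[rng.permutation(len(Y_good))]

kl_good = kl_divergence(P, low_dim_affinities(Y_good))
kl_bad = kl_divergence(P, low_dim_affinities(Y_bad))
print("KL for structure-preserving layout:", round(kl_good, 3))
print("KL for shuffled layout:            ", round(kl_bad, 3))
```

A layout that keeps neighbors together scores a lower divergence than one that scatters them, which is the quantity gradient descent drives down in the full algorithm.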
 
15:00

Short break

15:00

Diffusion map and spectral clustering

Brittany Baur

Diffusion maps are a non-linear dimensionality reduction technique used in many areas, such as defining differentiation trajectories in single-cell analysis. Diffusion maps aim to identify the underlying lower-dimensional structure (manifold) from which the data have been sampled. Unlike PCA, diffusion maps can create a lower-dimensional representation even when the underlying manifold is non-linear. Diffusion maps provide a global description of the dataset by characterizing the relationships between samples using heat diffusion and a random-walk Markov chain, and then constructing a low-dimensional embedding. Spectral clustering is an application of diffusion maps that is used extensively to identify communities in graphs.
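The steps described above (kernel similarities, a random-walk Markov matrix, then an eigenvector embedding) can be sketched in a few lines of NumPy. This is a minimal sketch, assuming a Gaussian kernel with a single hand-picked bandwidth `sigma` and no density normalization; the function name `diffusion_map` and the arc-shaped toy data are illustrative assumptions.

```python
import numpy as np

def diffusion_map(X, n_components=2, sigma=0.3, t=1):
    """Basic diffusion map embedding of the rows of X."""
    # 1. Gaussian (heat) kernel on pairwise squared distances.
    sq = np.sum(X**2, axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-D / (2.0 * sigma**2))

    # 2. The random-walk Markov matrix is P = diag(d)^-1 K. We work with
    #    its symmetric conjugate A, which has the same eigenvalues and
    #    can be decomposed stably with eigh.
    d = K.sum(axis=1)
    A = K / np.sqrt(np.outer(d, d))
    eigvals, eigvecs = np.linalg.eigh(A)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # 3. Convert to right eigenvectors of P and drop the trivial first
    #    one (eigenvalue 1, constant vector).
    psi = eigvecs / np.sqrt(d)[:, None]
    coords = psi[:, 1:n_components + 1] * eigvals[1:n_components + 1] ** t
    return coords, eigvals

# Toy data: 100 points along three quarters of a circle, a curved
# one-dimensional manifold embedded in 2D.
theta = np.linspace(0.0, 1.5 * np.pi, 100)
X = np.column_stack([np.cos(theta), np.sin(theta)])

coords, eigvals = diffusion_map(X, n_components=2)

# The first diffusion coordinate should order points along the arc.
corr = np.corrcoef(coords[:, 0], theta)[0, 1]
print("leading eigenvalue:", round(eigvals[0], 6))
print("|correlation| with arc position:", round(abs(corr), 3))
```

Running k-means on coordinates like `coords` is one common route from this embedding to spectral clustering, although the exact normalization differs between formulations.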
 
15:00

Autoencoders and dimensionality reduction with neural nets

Henry Ward

05:00

Wrap-up

Everyone

We’ll point you to the tutorial website, code repository, and recommended reading list.

Materials

Slide decks and demo code can be found here:
https://github.com/dimension-reduction/slides-and-code