Chung’s major research interests lie in (a) the application and (b) the development of statistical methods for biomedical studies, including but not limited to CPD’s disease diagnostic research. Under interest (a), he provides statistical support to investigators on study design, data analysis, and interpretation of results. This support helps investigators produce high-quality publications and secure grants from public and private funding agencies. These projects often raise challenges, especially when existing statistical methods cannot address all of the research questions. Under interest (b), he addresses these challenges by developing new statistical methods, which in turn contribute to CPD’s scientific research (Figure 1). His methodological expertise includes survival data analysis, isotonic regression, and statistical evaluation of biomarkers. His group uses quantitative methods from mathematics, statistics, computer science, and biomedical informatics to better understand biomedical data from a variety of sources.

Figure 1. Chung's Research Scope

Project 1. Machine Learning Algorithm for Personalized Disease Diagnostics

Many diseases are heterogeneous across subjects. Breast cancer, for example, comprises multiple subgroups with different responses to treatment and divergent prognoses. When screening for or diagnosing a heterogeneous disease, it is highly unlikely that a single biomarker can accurately detect the disease in all subjects. Precision diagnostics accounts for this heterogeneity by assigning each individual the biomarkers that are most accurate for them, given their specific predictors (Figure 2). His research aims to develop a novel data-driven approach that identifies personalized diagnostic rules for optimal biomarker selection and personalized disease diagnostics.

Figure 2. Example of a personalized disease diagnostic process.
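
The idea of matching biomarkers to individuals can be illustrated with a toy rule. The sketch below is not the data-driven method under development in this project. It assumes that subgroups are already known and simply picks, within each subgroup, the biomarker with the highest empirical AUC. The record fields ("subgroup", "disease") and the function names are hypothetical and chosen only for illustration.

```python
def auc(scores, labels):
    """Empirical AUC: the proportion of (diseased, healthy) pairs in which the
    diseased subject has the higher score; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_biomarker_by_subgroup(records, biomarkers, subgroup_key="subgroup"):
    """Return a dict mapping each subgroup to its most accurate biomarker.

    Assumes every subgroup contains at least one diseased and one healthy
    subject, so that the AUC is defined.
    """
    groups = {}
    for r in records:
        groups.setdefault(r[subgroup_key], []).append(r)

    rule = {}
    for name, rows in groups.items():
        labels = [r["disease"] for r in rows]
        rule[name] = max(
            biomarkers, key=lambda b: auc([r[b] for r in rows], labels)
        )
    return rule
```

Given a list of records with two candidate markers, best_biomarker_by_subgroup(records, ["m1", "m2"]) returns one recommended marker per subgroup. A realistic personalized rule would learn the subgroups from the predictors rather than take them as given.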

Project 2. Order-Restricted Survival Analysis

Survival analysis is a branch of statistics that analyzes the time until an event of interest occurs, such as death or disease relapse. One of his research areas extends survival analysis to order-restricted settings, in which the hazard function is known before data analysis to have a specific shape. More formally, he proposed the isotonic proportional hazards model λ(t | z) = λ_0(t) exp(φ(z)), where λ_0(∙) is the baseline hazard function of time t and φ(∙) is a monotone increasing function of the covariate z. He developed an efficient computational method that estimates φ(∙) by maximizing the partial likelihood. The motivating example was an HIV dataset, in which the proposed isotonic method captured the nonlinear, monotone covariate effect of CD4 (Figure 3). The R package isoSurv is available on CRAN.

Figure 3. Example of an isotonic proportional hazards model.
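
To make the estimation problem concrete, the following minimal sketch maximizes the Cox partial likelihood over monotone increasing φ. It is not the algorithm implemented in isoSurv. It swaps in a generic technique, projected gradient ascent, in which each gradient step is followed by a pool-adjacent-violators projection onto non-decreasing sequences. All function names, the step size, and the iteration count are assumptions made for illustration, and the covariate values are assumed to be distinct.

```python
import math


def log_partial_likelihood(times, events, phi):
    """Cox log partial likelihood (Breslow form) for per-subject values phi[i] = φ(z_i)."""
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i:
            risk = sum(math.exp(phi[j]) for j, t_j in enumerate(times) if t_j >= t_i)
            total += phi[i] - math.log(risk)
    return total


def gradient(times, events, phi):
    """Gradient of the log partial likelihood with respect to each phi[j]."""
    grad = [float(d) for d in events]
    for t_i, d_i in zip(times, events):
        if d_i:
            at_risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
            denom = sum(math.exp(phi[j]) for j in at_risk)
            for j in at_risk:
                grad[j] -= math.exp(phi[j]) / denom
    return grad


def pava(values):
    """Pool-adjacent-violators: least-squares projection onto non-decreasing sequences."""
    blocks = []  # each block stores [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block mean exceeds the current one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)
    return out


def fit_isotonic_ph(times, events, z, step=0.1, n_iter=500):
    """Projected gradient ascent on the partial likelihood under monotone φ."""
    order = sorted(range(len(z)), key=lambda i: z[i])  # subjects sorted by covariate
    phi = [0.0] * len(z)
    for _ in range(n_iter):
        g = gradient(times, events, phi)
        stepped = [phi[i] + step * g[i] for i in order]
        projected = pava(stepped)
        # φ is identified only up to an additive constant, so anchor it at
        # zero for the smallest covariate value.
        shift = projected[0]
        for rank, i in enumerate(order):
            phi[i] = projected[rank] - shift
    return phi
```

Calling fit_isotonic_ph(times, events, z) with follow-up times, event indicators (1 = event, 0 = censored), and a single covariate returns one fitted value of φ per subject, non-decreasing in z. The fixed step size is adequate only for small examples. A dedicated algorithm would be needed for data of realistic size.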

Project 3. Analysis of Large-Scale Single-Cell RNA Sequencing Data

Single-cell RNA sequencing (scRNA-seq) is a powerful technology for studying gene expression, cellular heterogeneity, and the delineation of cell states within cell cultures, tissues, organ systems, and other biological samples. Analyzing scRNA-seq data, however, is challenging because of its large volume. For example, 10X Genomics publicly released an scRNA-seq dataset with 1.3 million cells from an E18.5 mouse brain. Analyzing such a large number of cells requires special processing capabilities. Therefore, his research focuses on developing scalable statistical methods to visualize and cluster large-scale scRNA-seq datasets (Figure 4).

Figure 4. Example of a divide-and-combine algorithm.
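
A divide-and-combine strategy for clustering can be sketched in a few lines. The example below is a generic illustration under stated assumptions, not the method developed in this project. It splits the cells into chunks, runs k-means on each chunk, clusters the pooled chunk-level centers, and then assigns every cell to its nearest combined center. The use of k-means, the farthest-point initialization, and all function names are assumptions. Cells are represented as tuples of coordinates, for example after dimension reduction.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def kmeans(points, k, n_iter=20):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist2(p, centers[j]))
            groups[nearest].append(p)
        # Keep the old center if a cluster happens to be empty.
        centers = [mean(g) if g else centers[j] for j, g in enumerate(groups)]
    return centers


def divide_and_combine(cells, k, chunk_size):
    """Cluster cells chunk by chunk, then combine the chunk-level centers."""
    # Divide: each chunk is small enough to be processed on its own.
    chunks = [cells[i:i + chunk_size] for i in range(0, len(cells), chunk_size)]
    local_centers = []
    for chunk in chunks:
        local_centers.extend(kmeans(chunk, min(k, len(chunk))))
    # Combine: cluster the pooled local centers into k global centers.
    centers = kmeans(local_centers, k)
    # Assign every cell to its nearest global center.
    labels = [min(range(k), key=lambda j: dist2(c, centers[j])) for c in cells]
    return centers, labels
```

Because each chunk is processed independently, the divide step can run in parallel or stream from disk, and only the small set of local centers has to be held together for the combine step.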