Transforming high-throughput raw data into biologically meaningful information can present a challenge to clinical, translational, and basic researchers alike. The CRI Bioinformatics Core offers services and expertise designed to allow BSD investigators to take full advantage of high-throughput technologies.

 View a selection of publications partially made possible by our bioinformatics analysis work.



  • Bioinformatics analysis of high-throughput biological data, including proteomics, using our well-defined analysis pipelines
  • Consulting services for custom analysis beyond our standard pipelines, including genome-wide association studies
  • Grant writing assistance, including help developing the bioinformatics components of a grant, cost analysis, letters of support, and documentation of the tools and expertise available to complete the proposed research
  • Free training sessions in bioinformatics topics: visit the CRI Seminar Series to find out about upcoming classes and download materials from past sessions



The Bioinformatics Core’s work combines the use of powerful computing resources, advanced analytics tools, and a commitment to security to transform large amounts of raw data into meaningful results. We use the CRI’s high-performance computing cluster and large-scale storage resources, with which complex analytics can be executed in parallel or distributed environments to produce fast data processing rates, improving application performance and cost effectiveness. Through advanced high-throughput analytics solutions, we dig down to the root of each computational challenge and design the most direct path to a solution. All our work is protected with the CRI’s automated, resilient backup and security systems, ensuring data integrity and access controls that are aligned with standards required by HIPAA, FISMA, and other regulations.



Planning an experiment? The Bioinformatics Core has created a guide to Bioinformatics Experimental Design to help you get the best results.

For each project request, the Core creates a proposal for the researcher that includes the scope of deliverables, a timeline for completion, and the estimated cost. The execution of the project is guided by frequent discussion between researchers and bioinformaticians, with progress updates provided regularly. Project results are delivered in the form of a written report.


Questions? Email us here.
Have you worked with us before? We’d love your feedback.


We currently offer bioinformatics pipelines for genomic and proteomic data.



  • RNA-Seq: Raw Data QC, Filtering, Mapping, Data Summarization, Expression Quantification, Differentially Expressed Genes, Pathways, and Gene Ontology Analysis
  • ChIP-Seq: Raw Data QC, Filtering, Mapping, Peak Calling, Peak Differential Analysis, Peak Related Genes Analysis, Gene Ontology Analysis, and Annotation
  • Exome Sequencing: Raw Data QC, Pre-processing, Mapping with 3 different tools, Realignment and Quality Recalibration, Multiple Samples Variant Calling, Variant Annotation, Variant Comparison, Filtration, and Summarization
  • Whole Genome Re-Sequencing (WGRS): Raw Data QC, Filtering, Mapping, Genotyping, SNP Detection, InDel Detection, SV (Somatic SV) Detection, CNV Analysis, and Annotation
  • Consensus Genotyping Pipeline: Genotyping, SNP Detection and InDel Detection using three different methods (Samtools, GATK and Atlas-2), comparison of variant calls, list of consensus call variants, and list of method specific calls
  • De-novo Assembly: Raw Data QC, Merging, Clipping, Filtering, Contig Assembly, Scaffold Assembly, Assembly Statistics, and Downstream Analysis
  • Somatic Mutation Detection for Tumor/Normal Pairs: Raw Data QC, Pre-processing, Mapping with 2 different tools, Realignment and Quality Recalibration, Somatic Mutation Detection with 4 different tools, Variant Annotation, and Summarization
  • Small RNA-Seq: Raw Data QC, Filtering, Mapping, Data Summarization, Quantification, Detection of Differentially Expressed miRNAs, Putative Gene Target Prediction, and Pathways Analysis
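
As an illustration of the comparison step in the Consensus Genotyping Pipeline listed above, the following is a minimal sketch of how variant calls from several methods could be split into consensus calls and method-specific calls. It is not the Core's actual implementation. The function name `compare_calls`, the variant key (chromosome, position, reference allele, alternate allele), the requirement that every method agree, and the toy calls are all assumptions made for this example.

```python
from collections import defaultdict

def compare_calls(calls_by_method):
    """Split variants into consensus calls and method-specific calls.

    calls_by_method maps a method name to a set of variants, where each
    variant is a (chrom, pos, ref, alt) tuple. This keying is an assumption
    for illustration; real pipelines usually normalize variants first.
    """
    # Record which methods support each variant.
    support = defaultdict(set)
    for method, variants in calls_by_method.items():
        for variant in variants:
            support[variant].add(method)

    n_methods = len(calls_by_method)
    # Consensus: variants reported by every method (assumed definition).
    consensus = sorted(v for v, m in support.items() if len(m) == n_methods)
    # Method-specific: variants reported by exactly one method.
    specific = {
        method: sorted(v for v, m in support.items() if m == {method})
        for method in calls_by_method
    }
    return consensus, specific

# Hypothetical toy calls, not real data.
calls = {
    "samtools": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")},
    "gatk": {("chr1", 100, "A", "G"), ("chr2", 50, "G", "A")},
    "atlas2": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")},
}
consensus, specific = compare_calls(calls)
```

In this toy input, only the chr1:100 variant is reported by all three methods, and the chr2:50 variant is specific to one method. A variant reported by two of the three methods falls into neither list under this strict definition; a majority-vote threshold would be an equally plausible design choice.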

Illumina and Affymetrix Expression Arrays

  • Filtering, Data Summarization and Normalization, Sample/Gene/Probe-based QC, Differentially Expressed Genes, Functional Annotation, and Pathway Enrichment Analysis

Affymetrix and Exiqon miRNA Arrays

  • Filtering, Data Summarization and Normalization, Sample/Gene/Probe-based QC, Differentially Expressed miRNAs, Predict miRNA Targeted Genes, Functional Annotation, and Pathway Enrichment Analysis

scRNA-seq pipeline

  • We provide bioinformatics analysis services for single-cell transcriptomic sequencing data generated by different technologies, including UMI-based Drop-seq and 10x Genomics Chromium, as well as non-UMI-based CEL-seq and Smart-seq. Our pipeline consists of (1) data de-multiplexing; (2) read alignment; (3) processing of cellular and molecular barcodes; (4) generation of the read count table; (5) quality control and normalization; (6) removal of debris, dead cells, and doublets; (7) detection of batch effects; (8) clustering and identification of cell subsets; (9) cluster visualization using tSNE or UMAP; (10) marker gene identification; (11) differential gene expression analysis; and (12) GO term and pathway analysis. Our customized pipeline can also perform single-cell lineage analysis and other study-specific analyses upon request.



Proteomics Analysis Pipeline

  • Raw MS data processing, deisotoping, data conversion, spectral filtering, spectral library creation, extracted ion chromatogram generation (MASIC), peptide and protein identification (X!Tandem, MaxQuant, Mascot, MSGF+), quantification (MaxQuant, Scaffold), targeted assay generation (Skyline), and dataset alignment and feature generation (MultiAlign)

CyTOF pipeline

  • We provide bioinformatics analysis services for single-cell Mass Cytometry (CyTOF) data. Our pipeline consists of (1) pre-gating (removal of debris, dead cells, and doublets); (2) diagnostic analysis; (3) marker ranking; (4) clustering, identification, and annotation of cell subsets; (5) cluster visualization using tSNE or UMAP; and (6) differential analysis, including differential cell population abundance, differential analysis of marker expression stratified by cell population, and differential analysis of the overall marker expression. Our customized pipeline can also be incorporated into other study-specific analyses upon request.

Additional Services

  • Profiling of post-translational modifications (phosphorylation, glycosylation, etc.), statistical analysis for labeled and label-free proteomics, and pathway enrichment analysis (Ingenuity Pathway Analysis)

We also offer custom-made pipelines and expertise for types of analysis not listed above, including genome-wide association studies. Email us to get started!


The Bioinformatics Core maintains the following catalogs of software tools, reference datasets, and databases for use by the BSD research community:


Our team of bioinformaticians offers a diversity of experience and areas of expertise.

Mengjie Chen, PhD
Faculty Director, Bioinformatics Core

Mengjie Chen is an assistant professor in the Section of Genetic Medicine in the Department of Medicine and the Department of Human Genetics at the University of Chicago. Before joining UChicago, she was an assistant professor in the Department of Biostatistics and Genetics at UNC-Chapel Hill from 2014 to 2016. She obtained her PhD in Computational Biology and Bioinformatics from Yale University in 2014. Dr. Chen was a recipient of the Alfred P. Sloan Research Fellowship in Computational and Molecular Evolutionary Biology in 2019. As a computational biologist and statistician by training, Dr. Chen bridges statistical methodological advances and biomedical applications in her research. Her group develops computational methods and open source tools to address challenges posed by high-throughput technologies for data analysis and interpretation. She has led analysis for numerous genomics projects, including single-cell RNA-seq analysis of female reproductive systems in the Human Cell Atlas project. Other scientific highlights in cancer genomics include profiling esophageal squamous-cell carcinoma in an Asian population for the first time and profiling matched primaries and multiple metastases from 16 breast cancer patients. As Faculty Director, Dr. Chen oversees development of all bioinformatics projects, facilitates on-campus team science, and sets overall policy and infrastructure for the Bioinformatics Core Facility.

Chang Chen, PhD

Chang obtained his PhD in Bioengineering from the University of Illinois at Chicago. Prior to joining the Bioinformatics Core in November 2020, Chang worked as a postdoctoral researcher at Northwestern University Feinberg School of Medicine. Chang’s current projects focus on Next Generation Sequencing (NGS) data analysis and pipeline development, and on statistical modeling for biomarker discovery. Chang has broad expertise across the biomedical sciences and bioinformatics, including genetics and epigenetics of human diseases, cancer genomics, protein function and structure, and public health.

Wenjun Kang, MS

Wenjun joined the CRI Bioinformatics Core in 2012. He has extensive experience in developing web-based applications for clinical studies, molecular pathology labs, and project management. His areas of expertise also include biostatistics analysis, study design, and software development. Wenjun’s preferred tools are Python/Django, jQuery, and MySQL for web development, and SAS/R for statistical analysis. He holds master’s degrees in Biostatistics and Health Informatics from the University of Minnesota.

Yan Li, PhD

Yan received her PhD in Bioinformatics (December 2013) and dual MS degrees in Statistics and Plant Pathology (August 2009) from the University of Georgia. Prior to joining the CRI in 2014, she worked on multidisciplinary projects using advanced computational tools, statistical methods, and mathematical modeling to address a variety of questions across the biological sciences. Currently, she is focused on developing and applying bioinformatics computational software and pipelines to facilitate the analysis of Next Generation Sequencing (NGS) data. She works closely with faculty and researchers to explore genomics-associated biomedical questions, including the NGS data analysis of RNA-Seq, ChIP-Seq, small RNA-Seq, and exome sequencing.

Ziyou Ren, PhD

Ziyou received his PhD in Biomedical Informatics and dual MS degrees in Statistics and Clinical Investigation from Northwestern University. Prior to joining the Bioinformatics Core, Ziyou worked as a postdoctoral scholar at Northwestern University. Currently, he works on the development and application of innovative computational algorithms, advanced pipelines, and statistical models for genomics analysis. His projects include complex network analysis, machine learning, data visualization, variant calling, and differential gene analysis of next generation sequencing data in high-performance computing environments. He has expertise in multiple programming languages and platforms, as well as strong clinical and biological knowledge for interpreting NGS results.

Lisha Zhu, PhD

Lisha received her PhD in Computational Biology (2013) from the University of Chinese Academy of Sciences. Before joining the CRI Bioinformatics Core in 2020, she was a research scientist in the UT Health Science Center at Houston. She has extensive experience with Next Generation Sequencing (NGS) data analysis, including RNA-Seq, ChIP-Seq, ATAC-Seq, DNA-Seq, and DNA methylation data. Currently, she is mainly focused on applying bioinformatics approaches to single-cell sequencing data analysis, including scRNA-Seq and scATAC-Seq.