Note: If you would like to use the CRDW for COVID-19 research, please refer to this summary of available resources and find further information here.

The CRI’s Clinical Research Data Warehouse (CRDW) is one of the deepest, richest, and most research-ready data repositories of its kind.

Containing more than a decade of University of Chicago medical data, it seamlessly brings together multiple internal and external data sources to provide researchers with access to more than 12 million encounters for 2.3 million patients. The associated diagnoses, labs, medications, and procedures number in the tens of millions each.

Our team of developers, nurse informaticians, and data analysts are experts in working with clinical data to create clean, cohesive datasets for research. Since its launch in 2012, the CRDW has contributed to dozens of publications and grants.

The CRDW runs on IBM Netezza PureData System for Analytics servers, which use a patented Asymmetric Massively Parallel Processing (AMPP) architecture designed to deliver exceptional query performance and modular scalability on highly complex mixed workloads.


The CRDW contains data from the University of Chicago Medicine (UCM) from 2006 to the present. We draw from a broad range of internal and external data sources, including Epic electronic medical records (EMR), the Centricity billing system, the Cancer Registry, the National Death Registry, LabVantage, and REDCap.

CRDW data elements include patient demographics, lab values, procedure and diagnosis codes, medications, and visit information. Our self-service cohort discovery tools are an easy way to explore the data in the CRDW and see what’s available.

How We’re Growing

The CRDW is continually enriched with new and deeper data sources to drive world-class research at the University of Chicago. A major factor in this growth has been UCM’s partnerships with healthcare institutions throughout the Chicago area. UCM completed its merger with Ingalls Memorial Hospital, located in Harvey, IL, in 2016. In addition, UCM has established partnerships with other local healthcare providers, including Silver Cross Hospital, Edward-Elmhurst Health, and Little Company of Mary Hospital. Through these agreements, certain clinical data for patients are aggregated within the UCM EMR system and become available for the CRDW. The numbers tell the story: in 2016, the CRDW held data on 900,000 patients; today researchers have access to more than 2.3 million.

The CRDW is also growing its diversity of national and international patients. As a highly ranked teaching hospital, UCM is a regional and national destination for patients with some of the most difficult-to-treat diseases.

Finally, our team works continually to obtain access to new domains and registries and to integrate new subject areas from existing data sources; these additions are often based on the interests and needs of our researchers. Thanks to all these efforts, the pool of patients and data points in the CRDW is consistently growing in both depth and breadth.


2,394,417 patients

12,827,152 encounters

65,258,272 procedures

58,053,655 medications

28,700,293 diagnoses

433,445,743 labs


Very large datasets are a necessity for much of the biomedical research being done today. This emphasis on big data makes repositories that bring together data from multiple sources, as the CRDW does, an essential research resource. However, a large quantity of data isn’t a meaningful asset unless the data is also of high quality. For data coming from disparate sources, this means not only that the data is clean and checked for discrepancies, but also that it is harmonized and mapped to a consistent data model so that data from different sources can be accurately compared and analyzed.

Our team has drawn on decades of experience working with clinical data to develop standards and procedures, including a customized extract, transform, and load (ETL) process, that ensure that all new data added to the CRDW meets these high standards. In addition to ensuring the reliability and usability of the datasets we curate, our commitment to standardized data models and good data governance practices makes it possible for us to collaborate and share data with other institutions.

Common Data Models

The CRI is committed to investing in curation and standardization of data through shared data models for the purposes of portability and analysis. Our participation in data-sharing initiatives with other institutions, such as CAPriCORN and All of Us, requires both data standardization and shared data models. To this end, we have built data transformation pipelines to populate and maintain an instance of the Observational Medical Outcomes Partnership (OMOP) Common Data Model.
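As a rough illustration of what one step in such a pipeline might look like, the sketch below maps a source demographics record onto a few columns of the OMOP person table. The source field names (`sex`, `birth_date`) and the function `to_omop_person` are hypothetical and do not describe the CRI's actual ETL. The gender concept IDs (8507 for male, 8532 for female) are the standard OMOP concepts, and 0 is used where no mapping exists.

```python
from datetime import date

# Standard OMOP gender concepts; 0 is the OMOP convention for "no matching concept".
GENDER_CONCEPTS = {"M": 8507, "F": 8532}


def to_omop_person(source: dict, person_id: int) -> dict:
    """Map a hypothetical source demographics record to OMOP person columns."""
    birth: date = source["birth_date"]
    # Normalize the source code so "f", " F " and "F" all map the same way.
    sex = (source.get("sex") or "").strip().upper()
    return {
        "person_id": person_id,
        "gender_concept_id": GENDER_CONCEPTS.get(sex, 0),
        "year_of_birth": birth.year,
        "month_of_birth": birth.month,
        "day_of_birth": birth.day,
        # Keep the original value so the mapping can be audited later.
        "gender_source_value": source.get("sex"),
    }


row = to_omop_person({"sex": "f", "birth_date": date(1980, 5, 17)}, person_id=1)
unmapped = to_omop_person({"sex": None, "birth_date": date(1975, 1, 2)}, person_id=2)
```

Retaining the source value alongside the standard concept is what lets data from different institutions be compared while remaining traceable to the original record.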

In an effort to shift the current, internally focused paradigm in health care data warehousing, we have also aligned ourselves with other large academic medical centers in the development of multi-institutional collaborative health care data warehouses. We believe that the real value of medical data will be achieved through its portability, when common standardized data models allow institutions to work seamlessly with one another. Our commitment to shared data models is making a positive impact on the quality and scope of current and future research at both the University of Chicago and other institutions.



Our nurse informaticians bring both detailed technical knowledge and decades of experience on the clinic floor to their role in guiding investigators through the data request process. They are experts in viewing your research questions through the lens of what relevant data points are available in the CRDW, and translating these questions into effective data requests.

Join us for a free consultation to discuss your research question, explore what relevant data is in the CRDW, and design a request that will give you exactly the data you need.

CRDW Consultation

For data-intensive clinical research (excluding clinical trials), please request a consultation with the CRI team.

If you have any questions, please contact


Dataset Curation

Once we’ve worked with you to customize your request, our team of data analysts will translate it into SQL queries and extract the data, which will be securely delivered to you in research-ready form. Data we provide may be identified or de-identified via the Safe Harbor method, depending on the IRB protocol and needs of the project.
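To give a sense of what Safe Harbor de-identification involves, here is a minimal sketch covering three of the rule's requirements: dates reduced to the year, ages over 89 grouped as 90 or older, and ZIP codes truncated to their first three digits. The record layout, the field names, and the contents of `SMALL_ZIP3` are illustrative assumptions. This is not the CRI's actual process, and a real implementation must address all 18 HIPAA identifier categories.

```python
from datetime import date

# Illustrative placeholder: 3-digit ZIP prefixes covering 20,000 or fewer people,
# which Safe Harbor requires replacing with "000". Real lists come from Census data.
SMALL_ZIP3 = {"036", "059"}

# Direct identifiers dropped outright in this sketch (field names are assumptions).
DIRECT_IDENTIFIERS = {"name", "mrn", "phone", "email"}


def deidentify(record: dict) -> dict:
    """Return a copy of a hypothetical patient record with Safe Harbor-style masking."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Dates: keep only the year.
    visit: date = out.pop("visit_date")
    out["visit_year"] = visit.year
    # Ages over 89 are aggregated into a single "90 or older" category.
    age = out.pop("age")
    out["age"] = "90+" if age > 89 else age
    # ZIP codes: keep the first three digits, or "000" for sparsely populated areas.
    zip3 = str(out.pop("zip"))[:3]
    out["zip3"] = "000" if zip3 in SMALL_ZIP3 else zip3
    return out


sample = {
    "name": "Jane Doe",
    "mrn": "12345",
    "age": 93,
    "zip": "60637",
    "visit_date": date(2019, 3, 4),
    "dx_code": "E11.9",
}
clean = deidentify(sample)
```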


For researchers at the University of Chicago, the CRI and OCR have enhanced the process for obtaining data used in clinical research studies (excluding clinical trials). This enhancement is designed to streamline access to relevant data by offering a consultation with CRI staff before IRB and/or DUA submission, to confirm feasibility and identify the best mechanism for data delivery. Please visit the myCRI login to request a consultation with the CRI team. (Please note that you must be using the University VPN to access this link.)

For more detailed information about using the CRDW as a University of Chicago researcher, please read our CRDW User Guide.

For our external clients, CRDW partnerships are established on a case-by-case basis to provide exactly what you need for your studies. Get in touch with us to discuss how we can collaborate!


Our work with the CRDW goes beyond the discovery phase of research. Our team also includes experts in advanced clinical data analytics, allowing us to collaborate on studies and generate deeper levels of meaning from CRDW data. Sr. Statistician and Research Assistant Professor Anoop Mayampurath, PhD, and Scientific Software Engineer Tomasz Oliwa, PhD, apply their specialized knowledge to work laterally across different types of clinical data and take research to new levels using innovative techniques.

Our team has specific expertise in several key areas:

Statistical Modeling

Algorithm Development

Machine Learning

Natural Language Processing

Data Visualization


The Clinical Research Data Warehouse was designed and built by the CRI under the direction of Timothy Holper, MS, MA. Our CRDW team now maintains and refreshes the warehouse, obtains and integrates new data sources, and keeps data harmonized and research-ready. Our nurse informaticians use their knowledge of both medical and technical domains to consult with investigators and curate data sets to effectively answer their research questions, which our experienced developers then translate into efficient SQL queries. Statisticians collaborate with investigators to provide advanced analytics.

Get to know our team




Environmental Influences on Child Health Outcomes

Elligo/FDA Data Harmonization




Chicago Area Patient Centered Outcomes Research Network



Chicago-area Shared Health Research Information Network

Comprehensive Care Program
