Select datasets and repositories used by Precision Health team members are listed below. This is not an exhaustive list of what is available. Consider using the NIH Finder tool to see which database or repository is right for your work: https://www.nnlm.gov/finder

WU One Protocol One Consent
Information flow for using a standardized, common consent to create a shared, institutional genomic database linked with the research medical record.

What is the “one protocol one consent”?
The “one protocol one consent” is an IRB-approved protocol and consent that permits the Institute of Informatics (I²) to link the participant or patient genetic data that you collect to the research copy of their BJC electronic health record.

What is the purpose of the “one protocol one consent”?
The goal of this standardized protocol and consent is to reduce the burden for individual researchers wishing to make progress in genomics and precision medicine, both for their teams and the wider research community. The standardized consent incorporates best-practices language for informing participants about genomics research and potential return of research results. It has been approved by the IRB for use by all WashU/BJH researchers and clinicians. Use of this protocol and consent as a companion to the consent for your own study or in your clinical care will facilitate future analyses for your team and lead to a large campus-wide resource for long-term research in precision medicine.

How do I get started?
Researchers and clinicians interested in using the “one protocol one consent” should contact the Precision Health Navigator, Tricia Salyer, at salyerp@wustl.edu to be added to the study’s protocol and to receive instructions for use.

UK Biobank

ICTS UK Biobank Genomic Repository

To facilitate genomic research, the ICTS Precision Health Function has established the ICTS UK Biobank Genomic Repository.  This Repository includes genomic data for 500,000 participants from the UK Biobank and has been enriched through annotation by the McDonnell Genome Institute (MGI).  Using the ICTS UK Biobank Genomic Repository allows researchers to access a curated and enriched version of the data.  The Repository is stored with Research Information Services (RIS) and is available for access by all approved UK Biobank users at WUSM.

The vast majority of UK Biobank data is accessible only within their cloud environment. Access to the UK Biobank data is available for a fee and cloud resources are charged when used.

UK Biobank offers free credits to early career researchers (within 4 years of degree, within 4 years of starting their first academic appointment, or with student status). More than $40,000 is available per user. This program is available whether or not your PI has access to UK Biobank. Enrollment gives you access to full genomes, exomes, imaging, proteomes and more.

If you have an approved UK Biobank project and would like access to the ICTS UK Biobank Genomic Repository for yourself and your research team, please email the Precision Health Administrator Debra Warren at debrawarren@wustl.edu.

Details about the ICTS UKB Repository can be found in the documents below.

dbGaP

The database of Genotypes and Phenotypes (dbGaP) was developed to archive and distribute the data and results from studies that have investigated the interaction of genotype and phenotype in humans. Investigators can request access to dbGaP datasets for approved research projects, or they can deposit their own data.

The documents below provide detailed guidance for researchers who wish to request data from, or submit data to, dbGaP.

Contact Jenny McKenzie at j.mckenzie@wustl.edu with any questions about these documents.

Greater Plains Collaborative

The Greater Plains Collaborative (GPC) is a network of 13 leading medical centers in 8 states committed to a shared vision of improving healthcare delivery through ongoing learning, adoption of evidence-based practices, and active research dissemination. The network brings together a diverse population of over 34 million patients. More information is available on their website (https://gpcnetwork.org/). Researchers can submit requests for patient counts, de-identified data, or limited data using the GPC Query and Data Request form.

All of Us Research Program

The All of Us Research Program, part of the National Institutes of Health, is building one of the largest biomedical data resources of its kind. The All of Us Research Hub stores health data from a diverse group of participants from across the United States.

There are three tiers of data available. The public tier contains aggregate data with identifiers removed. The registered tier contains electronic health records (EHRs), wearables, surveys, and physical measurement data. The controlled tier contains genomic data in the form of whole genome sequencing (WGS) and genotyping arrays, as well as demographic data fields from EHRs and surveys that are suppressed in the other tiers.

Approved researchers can access All of Us data and tools to conduct studies to help improve our understanding of human health.

Who at Washington University is publishing with this dataset? View the list here (updated monthly; last update November 2024). To request list edits, contact ictsprecisionhealth@wustl.edu.

To learn more about the All of Us Research Program and the resources available to researchers, please review the following resources:

Want to learn more about All of Us?

Digital Commons Data@Becker (WUSM Data Repository)

Digital Commons Data@Becker is a repository for faculty, staff, students and trainees at Washington University School of Medicine to share their data and supporting files in compliance with funder and publisher policies.

To start the data sharing process, submit the Data Management and Sharing Consultation Request form.

For more information about our services in this area please visit Becker Medical Library’s Data Management and Sharing site or contact Seonyoung Kim and Xing Jian at BeckerDMS@wustl.edu.

Patient-Derived Models Repository

The National Cancer Institute (NCI) developed a national repository of Patient-Derived Models (PDMs) comprising patient-derived xenografts (PDXs), in vitro patient-derived tumor cell cultures (PDCs), cancer-associated fibroblasts (CAFs), and patient-derived organoids (PDOrg).

These models include a limited amount of patient data, such as previous clinical therapies, smoking history, and race/ethnicity. Representative sequence data from a targeted gene panel, whole exome, and RNA-Seq are available for a subset of PDXs.

Learn more at https://pdmr.cancer.gov/

Genomic Data Commons

The NCI’s Genomic Data Commons (GDC) provides the cancer research community with a repository and computational platform for studying cancer, its clinical progression, and response to therapy.

The GDC supports several cancer genome programs at the NCI Center for Cancer Genomics (CCG), including The Cancer Genome Atlas (TCGA) and Therapeutically Applicable Research to Generate Effective Treatments (TARGET).

GDC Analysis Tools empower users to explore GDC data interactively, fostering a robust cancer genomics knowledge base. These cohort-centric tools facilitate gene-level variant analysis and clinical data examination, enabling users to analyze custom cohorts within the GDC Data Portal.

Learn more at https://gdc.cancer.gov/

St. Jude Cloud Genomics Platform

St. Jude Cloud, an initiative of St. Jude Children’s Research Hospital, provides data and analysis resources to the global research community. Their goal is to empower researchers across the world to advance cures for pediatric cancer and other pediatric catastrophic diseases. They partner with Microsoft and DNAnexus to develop apps that are a cohesive blend of comprehensive data, scientific expertise, engineering innovation, and cloud infrastructure.

Explore and request data from one of the largest pediatric cancer genome repositories. Their app offers high-quality whole genome sequencing (WGS), whole exome sequencing (WES) and RNA-Seq data aligned to the GRCh38 reference genome. Data can be viewed by disease, by publication and by curated dataset.

Learn more at https://platform.stjude.cloud/

Gabriella Miller Kids First Data Resource Center

The Gabriella Miller Kids First Data Resource Center is a collaborative, pediatric research effort with the goal of understanding the genetic causes and links between childhood cancer and structural birth defects.

The Kids First Data Resource Portal provides access to newly-released, large-scale, pediatric genomic and clinical disease data and empowers accelerated discovery efforts by enabling collaborative cloud-based analyses across institutions and researchers around the globe. Data from approximately 8,000 DNA and RNA samples from children affected with cancer or structural birth defects and their families is available for analysis and cross-disease discovery.

Learn more at https://kidsfirstdrc.org/

Human Tumor Atlas Network

The Human Tumor Atlas Network (HTAN) is a National Cancer Institute (NCI)-funded Cancer Moonshot℠ initiative to construct 3-dimensional atlases of the dynamic cellular, morphological, and molecular features of human cancers as they evolve from precancerous lesions to advanced disease.

Learn more at https://humantumoratlas.org/

WashU Research Data

What is WashU Research Data (WURD)?

WURD is a formal research data repository that supports research data sharing by providing:

  • DataCite metadata
  • Digital Object Identifiers (DOIs)
  • Open Researcher and Contributor ID (ORCiD)
  • Research Organization Registry (ROR)
  • Award Information
  • Integrated curatorial review for data quality
  • GitHub integration
  • Long-term preservation
  • FAIRness review
  • Indexed for search engines, integrated with discovery tools

Who can use WURD?

All Washington University faculty, students, staff, and other authorized University affiliates can use WURD and related Libraries’ services for data curation and sharing. Washington University Libraries provides data curation and sharing services to support Washington University researchers and scholars across all WashU schools and campuses.

Choosing a Data Repository

WURD is an institutional repository, and you may also consider domain and generalist repositories available to you. Becker Medical Library also offers repository services tailored for School of Medicine affiliates. More information on their services can be found on the Becker Medical Library Data Management and Sharing website. Library staff on either campus are available to assist you in choosing the most appropriate repository option.

How to Get Started

You can start your data deposit today at the WURD website.

Questions?

Contact researchdata@wustl.edu.

National Brain Gene Registry

The National Brain Gene Registry is a highly collaborative initiative. It aims to better understand the impact of rare gene variants in intellectual and developmental disabilities, with the vision of improving the lives of individuals and families touched by these conditions. Researchers, patients and families can join.

Led by investigators at Washington University in St. Louis, Harvard Medical School/Boston Children’s Hospital, and the University of North Carolina, the National Brain Gene Registry is funded by the NIH’s National Center for Advancing Translational Sciences. Thirteen leading US research institutions are participating in this initiative, enrolling participants and contributing to this rich source of genotypic and phenotypic data.

Want to learn more?

Visit the website