uc-0006: Gathering blood cancer data sets
Completion Date: ✅ September 2020
Tutorial walkthrough of this use case
NIH Goal:
Enhance the ability to ask scientific questions across data sets
Persona
p-001: Clinical Researcher
Objective
obj-0001: Multi DCC Comparison
Description
Acute Myeloid Leukemia (AML) is a type of blood cancer. In AML, the affected myeloid cells, a type of white blood cell, are not functional and build up in the bone marrow, leaving reduced capacity for healthy white and red blood cells. While risk factors for developing AML exist, the underlying cause often remains unknown. Gene mutations and chromosomal abnormalities in the leukemia cells occur sporadically. Characterizing the wide spectrum of genetic events involved in AML will aid in better understanding its etiology and, ultimately, in developing improved therapies.
Amberose would like to combine whole genome sequencing (WGS) data with global transcriptomic profiling using RNA-sequencing (RNA-seq) to look for functional dysregulation of a few genes. They know that there are likely already data sets created by NIH researchers that they could use for their initial hypothesis generation, and decide to start by searching Common Fund data at the CFDE portal.
Amberose navigates to the CFDE portal and searches by Biosample, then filters that list by Anatomy and searches within those results for 'blood' and 'venous blood'. They then filter these results to keep only the whole genome sequencing (WGS) and RNA-seq assay values. Amberose then looks at the "Project" filter and finds that the several thousand results in their search belong to only 17 projects. By reading the project information for each, Amberose narrows their search down to the two that seem the most applicable: Genotype-Tissue Expression (GTEx) and TARGET: Acute Myeloid Leukemia. Amberose exports the list of files that belong to those two projects, which they can use to obtain these files (or request access to them) at their parent portals (Kids First and GTEx).
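The filtering steps above can be sketched in a few lines of Python. This is only an illustration of the logic (filter by anatomy, filter by assay type, count results per project, export the files for the chosen projects), not the portal's actual interface or export format. The record fields (`file`, `anatomy`, `assay`, `project`), the file identifiers, and the project named "Other project" are hypothetical placeholders.

```python
from collections import Counter

# Hypothetical biosample-linked file records. Field names and values are
# illustrative assumptions, not the CFDE portal's real export schema.
RECORDS = [
    {"file": "f1", "anatomy": "blood", "assay": "WGS",
     "project": "TARGET: Acute Myeloid Leukemia"},
    {"file": "f2", "anatomy": "venous blood", "assay": "RNA-seq",
     "project": "Genotype-Tissue Expression (GTEx)"},
    {"file": "f3", "anatomy": "lung", "assay": "RNA-seq",
     "project": "Genotype-Tissue Expression (GTEx)"},
    {"file": "f4", "anatomy": "blood", "assay": "ChIP-seq",
     "project": "Other project"},
    {"file": "f5", "anatomy": "blood", "assay": "RNA-seq",
     "project": "TARGET: Acute Myeloid Leukemia"},
    {"file": "f6", "anatomy": "venous blood", "assay": "WGS",
     "project": "Other project"},
]


def filter_records(records, anatomy_terms, assay_types):
    """Keep records whose anatomy and assay both match the selected values."""
    return [
        r for r in records
        if r["anatomy"] in anatomy_terms and r["assay"] in assay_types
    ]


def count_by_project(records):
    """Count how many matching records belong to each project."""
    return Counter(r["project"] for r in records)


def export_file_list(records, projects):
    """Return the sorted file identifiers for the chosen projects."""
    return sorted(r["file"] for r in records if r["project"] in projects)


# Steps 1-2: anatomy and assay filters.
selected = filter_records(RECORDS, {"blood", "venous blood"}, {"WGS", "RNA-seq"})

# Step 3: see which projects the remaining results belong to.
projects = count_by_project(selected)

# Step 4: export the files for the two projects of interest.
files = export_file_list(
    selected,
    {"Genotype-Tissue Expression (GTEx)", "TARGET: Acute Myeloid Leukemia"},
)
```

In the use case itself these steps are performed through the portal's filters; the sketch only mirrors their order, which narrows the results before the per-project review.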
Tasks for this use case:
- t-0001: Access CFDE interface ✅ June 2020
- t-0018: Search/filter data sets by biosample ✅ June 2020
- t-0006: Search/filter data sets by anatomic terms ✅ June 2020
- t-0005: Search/filter data by assay type terms ✅ June 2020
- t-0019: Search/filter data sets by project ✅ June 2020
- t-0003: Export a file of results ✅ June 2020
Requirements for this use case:
- r-00001: The interface will support GUI web access to end users ✅ June 2020
- r-00002: The interface will support user authentication ✅ June 2020
- r-00034: The interface will support the selection of a biosample of interest ✅ June 2020
- r-00035: The C2M2 model will support information relating biosamples to CF programs ✅ June 2020
- r-00036: The catalog will store information relating Uberon terms to CF programs ✅ June 2020
- r-00003: The interface will support the selection of an Uberon term of interest ✅ June 2020
- r-00004: The C2M2 model will support information relating anatomy terms to CF programs ✅ June 2020
- r-00005: The catalog will store information relating anatomy terms to CF programs ✅ June 2020
- r-00006: The interface will support the selection of an assay type term of interest ✅ June 2020
- r-00007: The C2M2 model will support information relating assay types to CF programs ✅ June 2020
- r-00008: The catalog will store information relating assay types to CF programs ✅ June 2020
- r-00010: The catalog will store information relating projects to CF programs ✅ June 2020
- r-00011: The C2M2 model will support information relating projects to CF programs ✅ June 2020
- r-00014: The interface will support end user download of tables and figures in common formats ✅ June 2020