Genomics and Proteomics in Biomedicine

You may use your text, lectures, notes, old assignments, and online resources to answer these questions to the best of your ability. However, you CANNOT discuss the exam with your classmates! If you are found to have copied anything from another person, both of you will receive a score of zero (0)!

By signing this exam, I certify that I did not collaborate with or receive help from anyone else regarding this exam. This work reflects my own thoughts and ideas, and I have referenced others as appropriate.

E-SIGNATURE
Also put your name in the filename of your Word file, e.g., JohnSmith_Final_TakeHome_Exam.docx

Total: 120 points

I. Multiple Choice (3 points for each question)

1. Approximately what percentage of total RNA is mRNA?
A. 5%
B. 15%
C. 25%
D. 35%
ANSWER:

2. Which of the following is NOT a type of interspersed repeat in eukaryotic genomes?
A. SINE
B. LTR
C. LINE
D. FASTA
ANSWER:

3. Which of the following is NOT a human disease database?
A. OMIM
B. SRA
C. HGMD
D. GeneCards
ANSWER:

4. Which of the following sequencing platforms is NOT based on Sequencing by Synthesis (SBS)?
A. Roche 454
B. ABI SOLiD
C. Illumina/Solexa
D. LifeTech Ion Torrent
ANSWER:

5. The yeast two-hybrid method is used to
A. determine if two genes are turned on together
B. make hybrid proteins for in vitro isolation
C. identify protein-protein interactions
D. identify protein-DNA interactions
ANSWER:

6. Which of the following is a criterion for selecting a genome for sequencing?
A. Relevance to human disease
B. Relevance to basic biological questions
C. Genome size
D. All of the above
ANSWER:

II. True/False (3 points for each question)

If the statement is True, place a “T” in front of the number. If it is false, place an “F” in front of the number and explain why it is false.

7. DNA amplification is not required for most Next Generation Sequencing technologies, such as 454, Illumina, and SOLiD.

8. Both RNA-Seq and Exome-Seq can detect gene expression levels.

9. Next Generation Sequencing can detect SNPs, point mutations, chromosomal rearrangements, chromosomal duplications, DNA methylations, gene expression levels, novel transcripts and alternative splicing.

10. 45% of the human genome or more consists of repeats derived from transposons.

III. Short Answers (your answers should be concise and to the point)

11. In Illumina/Solexa sequencing technology, what unique design ensures that one and only one dNTP is incorporated into the newly synthesized DNA and is able to be detected in each chemistry cycle? (5 points)
Answers:

12. You are planning on hiking to the top of Mount Everest. This hike will take you to an unknown climate of decreased oxygen and atmospheric pressure. You are interested in how this hike might affect your muscle cell transcriptome.
a. Describe how you might use next-gen sequencing (NGS) to determine this. Include all possible steps. (5 points)

b. What technique(s) might you use, other than NGS and microarray, to verify your results? (2 points)

13. What are the 8 model organisms for functional genomics? Why were they chosen? (4 points)
Answers:

14. Mutations in Htt cause Huntington Disease (HD). What is the phenotype of the disease? On which chromosome is Htt located? (4 points)
Answers:

15. What are the significant differences between structural genomics and classical structural biology? (4 points)
Answers:

16. What is the difference between reverse and forward genetics? (4 points)
Answers:

17. What is the general workflow of the CRISPR/Cas9 gene editing approach? What is the critical step for targeting a specific gene? (4 points)
Answers:

IV. NGS data analysis
18. In this experiment, a transcription factor (TF) was studied using ChIP-Seq. The chromatin DNA associated with the TF was immunoprecipitated with an antibody against the TF and subjected to high-throughput sequencing. This is the “ChIP” data. Meanwhile, the genomic DNA without immunoprecipitation was also sequenced. This is the “input” data, which is regarded as background. The dataset is listed below:
http://104.192.7.254/temp/ChIP.fq.gz
http://104.192.7.254/temp/input.fq.gz

Your task is to analyze the dataset to identify the binding sites using Galaxy. During the analysis, use human “hg19” (Human Feb. 2009 (GRCh37/hg19)) as the reference genome assembly, and a tag size of 60 bp where appropriate.

Note: Because the analysis and waiting times are long, start as early as possible!

1) Report how many binding sites (regions) are identified. Also take a screenshot to show the number of regions. (15 points)

2) Report how many regions are on each chromosome. Do not include the contigs with “random”, “hap”, or “chrUn” in their names; report only “chr1” to “chr22”, “chrX”, and “chrY”. (5 points)

3) Report the top 10 regions of strongest binding (excluding those with “random”, “hap”, or “chrUn” in the chromosome name). (5 points)

4) Visualize the strongest binding site on chr3 in the UCSC Genome Browser. Take a screenshot. Report which gene(s) are closest to this binding site and briefly describe the gene(s) (limit 1 sentence). (8 points)

5) Find the peak (binding site) closest to the gene “BRAF” that lies in the upstream promoter region (not in an intron). Report the coordinates of this peak and take a screenshot of the peak along with the RefSeq track. (5 points)

6) Design a set of primers to detect this binding site. Report the DNA sequence you are using for primer design and give the primer set of your design. (5 points)

7) Visualize the primers obtained from step 6) in the UCSC Genome Browser. (5 points)

8) Briefly describe this gene “BRAF”. (5 points)

9) After a binding site was identified in the BRAF gene promoter, a qPCR experiment was conducted to verify it. The qPCR data are listed below. Plot the data as a bar graph with error bars. (5 points)

BRAF         Ct       Conc. (ng/ul)
IgG          33.02    0.060
IgG          34.56    0.023
IgG          33.86    0.036
CREB ChIP    30.30    0.340
CREB ChIP    30.83    0.243
CREB ChIP    30.32    0.337
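
A bar graph with error bars needs one bar height and one error value per sample. The following is a minimal Python sketch of that summary step, not a required method. It assumes the bar height is the mean of the three replicate concentrations and the error bar is the sample standard deviation (the standard error of the mean would be an equally reasonable choice). The names `conc`, `summarize`, and `summary` are illustrative only.

```python
from statistics import mean, stdev

# Concentrations (ng/ul) copied from the qPCR table; three replicates per sample.
conc = {
    "IgG": [0.060, 0.023, 0.036],
    "CREB ChIP": [0.340, 0.243, 0.337],
}

def summarize(values):
    """Return (mean, sample standard deviation) for one set of replicates."""
    return mean(values), stdev(values)

# Assumption: bar height = mean, error bar = sample SD (n - 1 denominator).
summary = {sample: summarize(values) for sample, values in conc.items()}

for sample, (m, sd) in summary.items():
    print(f"{sample}: mean = {m:.3f} ng/ul, SD = {sd:.3f} ng/ul")
```

The resulting means and standard deviations can then be entered into any plotting tool (for example, a spreadsheet chart) as the bar heights and error-bar values.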