Dissertation of Hojoon Lee

found here C:\kurt\storage\CIM Research Folder\DR\2012\11-18-12\Dissertations\Hojoon Lee Dissertation

Hojoon dissertation placemark (pg. 31)

--- I e-mailed Hojoon to get the chimeric transcript info 1-15-13 --

Notes

The use of FS peptides as tumor antigens was first suggested by Townsend et al. in 1994 (33).

Since then, several studies have shown the potential of FS peptides as novel antigens for cancer treatments by inducing tumor-specific cell-mediated immunity (34-39).

Imatinib targets the bcr-abl fusion gene.

microsatellites. . .high mutation rate (43-45).

What exactly is the function of the microsatellites?

SelTarbase has coding repeats

q According to Wang et al., 92-94% of human genes have splicing variants

q Neo-antigens can be identified by screening immune response of cancer patients. This approach would be lengthy and expensive.

^ haha this is what I am doing, and it is lengthy and expensive

q the EST database (51)

^ I guess this is the EST database Hojoon used

q RNA-seq data (58-62)

q complete viral genome sequences were retrieved from NCBI by querying “viruses [Organism] AND reference sequences”

q UniVec database contains vector sequences as well as sequences of adapters, linkers, and primers for cloning

I wonder which software Hojoon used to make his annotated virus ORF Figure (Figure 2.2)

//look up specificity and sensitivity again//
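
A quick reminder sketch of the two definitions, using the standard confusion-matrix formulas (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)). The counts below are made up for illustration and do not come from the dissertation.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: fraction of actual positives that were detected."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: fraction of actual negatives that were correctly rejected."""
    return tn / (tn + fp)


# Hypothetical counts, only to show the arithmetic.
print(sensitivity(tp=90, fn=10))
print(specificity(tn=80, fp=20))
```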

what exactly is an EST database again? how is it constructed? EST Database

An expressed sequence tag or EST is a short sub-sequence of a cDNA sequence. An EST results from one-shot sequencing of a cloned cDNA. http://en.wikipedia.org/wiki/Expressed_sequence_tag

//what exactly is next generation sequencing and deep sequencing again?// //DNA Sequencing//

//how many diseases are preventable through prophylactic vaccination?// //q// //According to the Centers for Disease Control and Prevention (CDC), there are 26 infectious diseases that are now preventable through prophylactic vaccination.// It would be nice to have the list of 26.

//what exactly is the definition of cytogenetic//

//q// //Collectively, through this approach, we validated 48 FS chimeric transcripts that when classified by the chromosomal location of the genes involved in the fusions, 13 are intra chromosomal, 34 are read-through, and 1 is inter-chromosomal.//

//table 3.1 with the validated breast cancer frame-shifted chimeric transcripts looks useful//

//databases that Hojoon used// //Specifically, we used the sequences found within the Expressed Sequence Tag (EST) library (51) and the Human RefSeq database (76) from the National Center for Biotechnology Information (NCBI).//

//The percentage of the population which would be protected by these chimeric transcripts and appropriate MHC alleles was then calculated by using “Population coverage calculation” program (90).//

//q// //Currently there are over 70 different gene fusions that have been reported for more than 60 different cancer types. Included in this list are 78 gene fusions that have been found in breast of which 33 are considered FS chimeric transcripts (61, 62, 80, 85, 86).//

//good cancer antigen criteria described by Cheever et al.// //ideal antigens according to nine criteria suggested by Cheever et al.: i) therapeutic function, ii) immunogenicity, iii) oncogenicity, iv) specificity, v) expression level and % positive cells, vi) stem cell expression, vii) No. of patients with antigen-positive cancers, viii) No. of epitopes and ix) cellular location of expression (23)//

//q// //The list of about 770 gene fusions from the literature and our study was obtained and stored as a table//

//database of gene fusions// //q// //The information regarding gene fusions or chimeric transcripts was retrieved from two sources; Mitelman Database (98) and the Catalogue Of Somatic Mutations In Cancer (COSMIC), Sanger Institute (99).//

//I wonder if the gene clusters he found contain many genes that are part of the same pathway//

I might want to look up imatinib

q Recent large-scale sequencing studies show that the prevalence and patterns of somatic mutations are substantially different between samples even though there are more mutations involved in tumor than previous estimation (106, 107).

q A recent large-scale sequencing study of cancer genome done by Greenman et al. (107)

q Several studies have already shown the potential of FS peptides from coding MS DNAs as novel targets of cancer treatment (35, 39, 46).

q there is no real consensus about what is microsatellite DNA in terms of number of iterations and degeneracy (43).
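
Since there is no agreed definition, any search for microsatellites has to pick its own thresholds. A minimal sketch, assuming perfect (non-degenerate) tandem repeats only; `unit_len` and `min_repeats` are arbitrary parameters I made up for illustration, not the cutoffs Hojoon used.

```python
import re


def find_microsatellites(seq, unit_len=1, min_repeats=6):
    """Return (start, unit, repeat_count) for perfect tandem repeats in seq.

    unit_len and min_repeats are illustrative thresholds; changing them
    changes what counts as a "microsatellite".
    """
    # One repeat unit, followed by at least (min_repeats - 1) more copies of it.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        hits.append((m.start(), m.group(1), len(m.group(0)) // unit_len))
    return hits


# Toy sequence: an A8 run followed later by (CA)5.
toy = "GCT" + "A" * 8 + "GG" + "CA" * 5 + "T"
print(find_microsatellites(toy, unit_len=1, min_repeats=6))
print(find_microsatellites(toy, unit_len=2, min_repeats=4))
```

Note that with `unit_len=2` a homopolymer run also matches (as "AA" repeated), which is one small example of why the definition is fuzzy.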

//wikipedia definition of microsatellite?// Microsatellites in DNA and RNA

//how he aligned low complexity sequences// //q// //we aligned qualified (see method) transcripts with human mRNA reference sequences that have MS DNA in their coding sequences by using BLASTN without repeat masking option. Therefore, we can align transcript sequences with MS DNAs, which were usually masked due to their low sequence complexity.//
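
A rough sketch of what "BLASTN without repeat masking" could look like on the command line. This assumes the BLAST+ `blastn` program, where `-dust no` turns off the DUST low-complexity filter; the notes do not say which BLAST version or settings Hojoon actually used (in the older `blastall` the equivalent was `-F F`). The file and database names are placeholders. The function only builds the command, it does not run it.

```python
import shlex


def blastn_unmasked_cmd(query_fasta, db_name, out_path):
    """Build a BLAST+ blastn command with low-complexity (DUST) filtering off."""
    return [
        "blastn",
        "-query", query_fasta,   # transcripts to align (placeholder name)
        "-db", db_name,          # reference mRNA database (placeholder name)
        "-dust", "no",           # do not mask low-complexity / repeat regions
        "-outfmt", "6",          # tabular output
        "-out", out_path,
    ]


cmd = blastn_unmasked_cmd("transcripts.fa", "refseq_mrna_ms", "hits.tsv")
print(shlex.join(cmd))
```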

//With Hojoon's gene fusion network graphs, I think it also would have been neat if he measured how frequently these genes interact with other proteins. This would let researchers know whether these proteins were hubs in the protein network or not and may indicate the importance of the protein.//

//Table 5.4 which has the list of top 10 coding microsatellite DNA looks useful//

what exactly is RNA-seq? DNA Sequencing

//It seems that Hojoon usually uses the chi-square test for his tests of statistical significance. When should a chi-square test be used again?// //Chi-Square Test//
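
As a reminder of the mechanics: the chi-square test of independence compares observed counts in a contingency table against the counts expected if the two categories were unrelated. A minimal sketch for a 2x2 table, using the plain Pearson statistic with no continuity correction; the counts are invented and are not from the dissertation.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if row and column categories were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


# Hypothetical table: rows = cancer / normal samples, columns = variant present / absent.
stat = chi_square_2x2(30, 10, 20, 40)
# With 1 degree of freedom, a statistic above 3.841 corresponds to p < 0.05.
print(round(stat, 3), stat > 3.841)
```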

In humans, 92~94% of total genes have multiple isoforms generated by alternative splicing, and a large number of them are tissue-specific variants (48).

I wonder why all of the transcripts I look at only seem to have one protein product

several studies reported tumor-specific alternative splicing that had not been detected in normal tissues (112-114).

Table 6.1 could be useful since it has the putative tumor-associated splicing variants that cause a frame-shift of a certain length
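
Reminder on why the length matters: codons are 3 nt, so a splicing event shifts the downstream reading frame only when the net change in coding length is not a multiple of 3. A small sketch; the function name and example lengths are mine, not from Table 6.1.

```python
def causes_frameshift(length_change_nt: int) -> bool:
    """True if inserting (+) or removing (-) this many coding nucleotides
    shifts the downstream reading frame."""
    return length_change_nt % 3 != 0


# Hypothetical events: exon skipping removes nucleotides, retention adds them.
for change in (-120, -121, 47, 48):
    print(change, causes_frameshift(change))
```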

q In addition, several studies reported truncated proteins by splicing in cancers; A-Raf, VEGFR, BCR-ABL, JAK2, and TrkB (128).

q Eight different alternative splicing events were suggested by Wang et al. (48).

q Our studies showed that about half of aberrant splicing in cancer samples was frame-shifted, which is consistent with the result from Venables et al. (50)

q However, all the frame-shift splicing variants that we tested were found in normal samples as well.

^ could this suggest that these splicing variants are not errors, but just normal variants sometimes used by a cell?

q The frequency of each cancer type was obtained from “Cancer Statistics 2011” (1).

Hojoon found some potential mutated genes that are similar to some of the genes uncovered by my screen. He found a sec gene and an rnf gene. The numbers after the gene name are a little different though, so I don't know how similar the genes are.

-Hojoon has sec62 in a table, and I have sec61b.
--Both of these proteins seem to be necessary for protein translocation in the endoplasmic reticulum. Also, when looking at the protein network on the string.embl website, sec62 and sec61b are only three links away from each other (sec61b is one link away from sec61g and sec62 is two links away from sec61g).
---Here is an image of the protein graph generated with STRING EMBL: "C:\kurt\storage\CIM Research Folder\DR\2013\1-18-13\protein network\sec62 sec61b sec61g protein graph 1-18-13.png"
--The genes for these transcripts are not on the same chromosome
---sec61b chromosome location, 4:47474658-47483242:1
---sec62 chromosome location, 3:30792876-30821263:1

-Hojoon has RNF103, RNF139, and RNF216 in a table, and I have RNF130.
--Here is a protein network graph for these 4 proteins. RNF216 seems to be separate from the other 3.
---"C:\kurt\storage\CIM Research Folder\DR\2013\1-18-13\protein network\rnf130 rnf103 rnf139 rnf216 protein graph.png"
---"C:\kurt\storage\CIM Research Folder\DR\2013\1-18-13\protein network\rnf net zoomed out 216 separated.png"
---RNF130 chromosome location, 11:50025346-50104758:1
---RNF103 chromosome location, 6:71493894-71525020:1
---RNF139 chromosome location, 15:58889229-58902390:1
---RNF216 chromosome location, 5:142990893-143113020:-1

EST sequences are downloaded from ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA