PCR to screen for selected frameshifts and gene fusions 4-11-13

Chimeric transcript primer order form 4-18-13

I would like to screen two pools (each consisting of half of the 41 pools from the PCR plates) for SMC1fs, as well as for the 13 validated mouse chimeric transcripts from Hojoon.

SMC1 WT size: 464 bp

SMC1fs size: 162 bp

Primers:

SMC1-mou-RV: GAGCTGTCCTCTCCTTG

SMC1-mou-FWD: CTGTCATGGGTTTCCTG

see Primers for amplifying SMC1fs

I need to find the primers for the 13 validated mouse chimeric transcripts (see List of Chimeric Transcripts from Hojoon).

Primers found here: C:\kurt\storage\CIM Research Folder\DR\2013\4-11-13\genefusion info\Mouse_Human_primers_for_mouse.xls

First I need to find the ID that matches each fusion in Table 3.3. Then I need to find the primers that match that ID.

Fusions in Table 3.3: "C:\kurt\storage\CIM Research Folder\DR\2012\11-18-12\Dissertations\Hojoon Lee Dissertation\Table 3p3 from Hojoon's dissertation.xlsx"

IDs can be found here: "C:\kurt\storage\CIM Research Folder\DR\2013\4-11-13\genefusion info\Total_FG_peptide_annotation2-1.xlsx"

Not all of the IDs could be found in this file. These chimeric transcripts { Rnf139 + Ndufb9, Lats2 + Xpo4, Mia1 + Rab4b } were found here: S:\Research\Cancer_Eradication\Candidate_List\96Pep_chip_0820_2009.txt

I need to find a few more transcripts { Tmem170 + Cfdp1, Slc35a3 + Hiat1, Noslap + EG665574, Samd5 + Sash1 }. I should search for these in S:\Research\Cancer_Eradication\ but not in S:\Research\Cancer_Eradication\file swap folder
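The two-step lookup described earlier (fusion to ID, then ID to primers) can be sketched as two dictionary lookups. This is a minimal sketch, assuming each spreadsheet has already been reduced to a simple name-to-value mapping; the table layouts and the function name `primers_for_fusion` are hypothetical. The only entries filled in are ones recorded in these notes (TR_E10176 for Noslap + EG665574, and Hojoon's original TR_E10150 primers for Cnpy2-Cs).

```python
from typing import Optional

# Hypothetical step-1 table: fusion name -> transcript ID (from the ID file).
fusion_to_id = {
    "Noslap + EG665574": "TR_E10176",
    "Cnpy2-Cs": "TR_E10150",
}

# Hypothetical step-2 table: transcript ID -> (forward, reverse) primers.
id_to_primers = {
    "TR_E10150": ("AGCGCTCTGGAGAATCCCG", "CCACCACCGTCTTGCCATG"),
}


def primers_for_fusion(fusion: str) -> Optional[tuple[str, str]]:
    """Return the primer pair for a fusion, or None if either lookup fails."""
    transcript_id = fusion_to_id.get(fusion)
    if transcript_id is None:
        return None  # fusion has no ID in the annotation file
    return id_to_primers.get(transcript_id)  # None if the ID has no primers yet
```

A `None` result separates the two failure cases seen here: fusions whose ID could not be found, and IDs that still lack primers.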

File information for the Sash1 search: "C:\kurt\storage\CIM Research Folder\DR\2013\4-12-13\searching transcripts\file information for Sash1 search 4-12-13.txt"

File information for the Cfdp1 search: "C:\kurt\storage\CIM Research Folder\DR\2013\4-12-13\searching transcripts\Cfdp1 search.txt"
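A recursive content search that skips the file swap folder could look like the sketch below. It assumes Python 3.9+, and `search_files` is a hypothetical helper, not an existing tool. It only finds terms stored as plain text, so the contents of .xlsx files (which are compressed) would not be matched.

```python
import os
from pathlib import Path


def search_files(root: str, terms: list[str], exclude: list[str]) -> dict[str, list[str]]:
    """Walk `root`, skipping any folder listed in `exclude`, and return a
    mapping from each search term to the files whose text contains it
    (case-insensitive)."""
    excluded = {os.path.normcase(os.path.abspath(p)) for p in exclude}
    hits: dict[str, list[str]] = {term: [] for term in terms}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded folders in place so os.walk never descends into them.
        dirnames[:] = [
            d for d in dirnames
            if os.path.normcase(os.path.abspath(os.path.join(dirpath, d))) not in excluded
        ]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                text = Path(path).read_text(errors="ignore").lower()
            except OSError:
                continue  # unreadable file (permissions, locks): skip it
            for term in terms:
                if term.lower() in text:
                    hits[term].append(path)
    return hits
```

For this search the call would be along the lines of `search_files(r"S:\Research\Cancer_Eradication", ["Tmem170", "Cfdp1", "Sash1"], [r"S:\Research\Cancer_Eradication\file swap folder"])`.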

I'm kind of stuck for now. I'll email Hojoon about this on 4-13-13.

Noslap + EG665574 seems to correspond to NOS1AP_^Exon10_FLJ13137_^Exon1, TR_E10176, in "C:\kurt\storage\CIM Research Folder\DR\2013\4-11-13\genefusion info\Total_FG_peptide_annotation2-1.xlsx", because the accession NM_001085375.1 matches the Homo sapiens C1orf226 nucleotide sequence, and a search for EG665574 in the HomoloGene database also returns the same C1orf226.

Still to find: Tmem170 + Cfdp1, Slc35a3 + Hiat1, Samd5 + Sash1

I could try to retrieve a batch of accessions using Batch Entrez: http://www.ncbi.nlm.nih.gov/sites/batchentrez

I used these accessions: "C:\kurt\storage\CIM Research Folder\DR\2013\4-13-13\transcript search\accessions_4-13-13.txt"

and obtained these results: "C:\kurt\storage\CIM Research Folder\DR\2013\4-13-13\transcript search\accession_search_nucleotide database.gb" and "C:\kurt\storage\CIM Research Folder\DR\2013\4-13-13\transcript search\accession_search_gene_database.txt"
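As an alternative to the Batch Entrez web page, NCBI's E-utilities `efetch` endpoint accepts a comma-separated list of accessions in its `id` parameter. The sketch below only builds the request URL and does not contact NCBI. It assumes the accessions file holds one accession per line, and the helper names are hypothetical. The two accessions in the test are the transcript accessions recorded for Cnpy2-Cs further down in these notes.

```python
from urllib.parse import urlencode

EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def read_accessions(path: str) -> list[str]:
    """Read one accession per line, ignoring blank lines (assumed file layout)."""
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


def efetch_url(accessions: list[str], db: str = "nucleotide", rettype: str = "gb") -> str:
    """Build an efetch URL that returns GenBank flat-file text for a batch."""
    params = {
        "db": db,
        "id": ",".join(accessions),
        "rettype": rettype,
        "retmode": "text",
    }
    return f"{EUTILS_EFETCH}?{urlencode(params)}"
```

Opening the resulting URL would return the same kind of GenBank text as the .gb results file above.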

However, neither of these files contained the search terms Tmem170 + Cfdp1, Slc35a3 + Hiat1, or Samd5 + Sash1.
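One way to double-check which genes a GenBank flat file actually covers is to collect its `/gene="..."` feature qualifiers and compare them with the wanted symbols. This is a sketch with hypothetical helper names; it assumes the gene symbols appear as `/gene` qualifiers, as in standard GenBank feature tables, and compares them case-insensitively because mouse and human symbols differ in capitalization.

```python
import re

# Matches GenBank feature qualifiers such as:   /gene="Sash1"
GENE_QUALIFIER = re.compile(r'/gene="([^"]+)"')


def genes_in_genbank(text: str) -> set[str]:
    """Collect every /gene="..." value in a GenBank flat-file string."""
    return set(GENE_QUALIFIER.findall(text))


def missing_genes(text: str, wanted: list[str]) -> list[str]:
    """Return the wanted symbols that never appear as a /gene qualifier."""
    found = {g.lower() for g in genes_in_genbank(text)}
    return [g for g in wanted if g.lower() not in found]
```

Run on the .gb results file, `missing_genes` would list which of Tmem170, Cfdp1, Slc35a3, Hiat1, Samd5, and Sash1 have no annotated record.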

4-16-13 Alright, Hojoon sent me a spreadsheet that looks like it has what I need: "C:\kurt\storage\CIM Research Folder\DR\2013\4-16-13\64_primer_fusion_for_mouse.xlsx"

Now I can order fresh primers for the 13 chimeric transcripts that I would like to screen in my tumor cDNA library.

Since I can't find the Cnpy2-Cs annotation, I think I will just design these primers myself.

First accession of transcript

NM_014255.4

Second accession of transcript

NM_004077.2

Cnpy2_section_protein_sequence

MKGWGWLALLLGALLGTAWARRSQDLHCGACRALVDELEWEIAQVDPKKTIQMGSFRINPDGSQSVVE

Cs_section_protein_sequence

NASCLVLAARHASASSTNLKDILADLIPKEQARIKTFRQQHGKTVVGQITVDMMYGGMRGMKGLVYETSVLDPDEGIRFRGFSIPECQKLLPKAKGGEEPLPEGLFWLLVTGHIPTEEQVSWLSKEWAKRAALPSHVVTMLDNFPTNLHPMSQLSAAVTALNSESNFARAYAQGISRTKYWELIYEDSMDLIAKLPCVAAKIYRNLYREGSGIGAIDSNLDWSHNFTNMLGYTDHQFTELTRLYLTIHSDHEGGNVSAHTSHLVGSALSDPYLSFAAAMNGLAGPLHGLANQEVLVWLTQLQKEVGKDVSDEKLRDYIWNTLNSGRVVPGYGHAVLRKTDPRYTCQREFALKHLPNDPMFKLVAQLYKIVPNVLLEQGKAKNPWPNVDAHSGVLLQYYGMTEMNYYTVLFGVSRALGVLAQLIWSRALGFPLERPKSMSTEGLMKFVDSKSG

CNPY2_nucleotide_seq

ctggactgctcgctggccggcagcgcaccgttttgaaggtcctagcccacctgggctggctcacgcgcacgactagccgctcccatacagcacgcccggactctgtcgtcgcttaaggccactcctattctacggctgacccctggtggtcacgtggatctgttcgccacgcaagtctgggtccttcggcgattgaccggggtccttgctgttcgggagcctctcctaagctgcctgttcgcgcgagagtttggaggggcgggtttggggtcggtgtctgattggggctcgcaccgcagcacgctggagtcccgcttaggtaccagttagcgtcaggggagctgggtcaggcggtcgccgggacaccccgtgtgtggcaggcggcgaagcgctctggagaatcccggacagccctgctccctgcagccaggtgtagtttcgggagccactggggccaaagtgagagtccagcggtcttccagcgcttgggccacggcggcggccctgggagcagaggtggagcgaccccattacgctaaagatgaaaggctggggttggctggccctgcttctgggggccctgctgggaaccgcctgggctcggaggagccaggatctccactgtggagcatgcagggctctggtggatgaactagaatgggaaattgcccaggtggaccccaagaagaccattcagatgggatctttccggatcaatccagatggcagccagtcagtggtggaggtgccttatgcccgctcagaggcccacctcacagagctgctggaggagatatgtgaccggatgaaggagtatggggaacagattgatccttccacccatcgcaagaactacgtacgtgtagtgggccggaatggagaatccagtgaactggacctacaaggcatccgaatcgactcagatattagcggcaccctcaagtttgcgtgtgagagcattgtggaggaatacgaggatgaactcattgaattcttttcccgagaggctgacaatgttaaagacaaactttgcagtaagcgaacagatctttgtgaccatgccctgcacatatcgcatgatgagctatgaaccactggagcagcccacactggcttgatggatcacccccaggaggggaaaatggtggcaatgccttttatatattatgtttttactgaaattaactgaaaaaatatgaaaccaaaagt

-chimeric transcript protein start 542 AAG^ATGA

-chimeric transcript protein end 746 GAG^GTG

-cnpy2 chimeric transcript portion

CTGGACTGCTCGCTGGCCGGCAGCGCACCGTTTTGAAGGTCCTAGCCCACCTGGGCTGGCTCACGCGCACGACTAGCCGCTCCCATACAGCACGCCCGGACTCTGTCGTCGCTTAAGGCCACTCCTATTCTACGGCTGACCCCTGGTGGTCACGTGGATCTGTTCGCCACGCAAGTCTGGGTCCTTCGGCGATTGACCGGGGTCCTTGCTGTTCGGGAGCCTCTCCTAAGCTGCCTGTTCGCGCGAGAGTTTGGAGGGGCGGGTTTGGGGTCGGTGTCTGATTGGGGCTCGCACCGCAGCACGCTGGAGTCCCGCTTAGGTACCAGTTAGCGTCAGGGGAGCTGGGTCAGGCGGTCGCCGGGACACCCCGTGTGTGGCAGGCGGCGAAGCGCTCTGGAGAATCCCGGACAGCCCTGCTCCCTGCAGCCAGGTGTAGTTTCGGGAGCCACTGGGGCCAAAGTGAGAGTCCAGCGGTCTTCCAGCGCTTGGGCCACGGCGGCGGCCCTGGGAGCAGAGGTGGAGCGACCCCATTACGCTAAAGATGAAAGGCTGGGGTTGGCTGGCCCTGCTTCTGGGGGCCCTGCTGGGAACCGCCTGGGCTCGGAGGAGCCAGGATCTCCACTGTGGAGCATGCAGGGCTCTGGTGGATGAACTAGAATGGGAAATTGCCCAGGTGGACCCCAAGAAGACCATTCAGATGGGATCTTTCCGGATCAATCCAGATGGCAGCCAGTCAGTGGTGGAG

Cs_nucleotide_seq

agtgggcggggcctccttgaggaccccgggctgggcgccgccgccggttcgtctactctttccttcagccgcctcctttcaaccttgtcaacccgtcggcgcggcctctggtgcagcggcggcggctcctgttcctgccgcagctctctccctttcttacctccccaccagatcccggagatcgcccgccatggctttacttactgcggccgcccggctcttgggaaccaagaatgcatcttgtcttgttcttgcagcccggcatgccagtgcttcctccacgaatttgaaagacatattggctgacctgatacctaaggagcaggccagaattaagactttcaggcagcaacatggcaagacggtggtgggccaaatcactgtggacatgatgtatggtggcatgagaggcatgaagggattggtctatgaaacatcagttcttgatcctgatgagggcatccgtttccgaggctttagtatccctgaatgccagaaactgctacccaaggctaagggtggggaagaacccctgcctgagggcttattttggctgctggtaactggacatatcccaacagaggaacaggtatcttggctctcaaaagagtgggcaaagagggcagctctgccttcccatgtggtcaccatgctggacaactttcccaccaatctacaccccatgtctcagctcagtgcagctgttacagccctcaacagtgaaagtaactttgcccgagcatatgcacagggtatcagccgaaccaagtactgggagttgatttatgaagactctatggatctaatcgcaaagctaccttgtgttgcagcaaagatctaccgaaatctctacagagaaggcagcggtattggggccattgactctaacctggactggtctcacaatttcaccaacatgttaggctatactgatcatcagttcactgagctcacgcgcctgtacctcaccatccacagtgaccatgagggtggcaatgtaagtgcccataccagccatttggtgggcagtgccctttccgacccttacctgtcctttgcagcagccatgaacgggctggcagggcctctccatggactggcaaatcaggaagtgcttgtctggctaacacagctgcagaaggaagttggcaaagatgtgtcagatgagaagttacgagactacatctggaacacactcaactcaggacgggttgttccaggctatggccatgcagtactaaggaagactgatccgcgatatacctgtcagcgagagtttgctctgaaacacctgcctaatgaccccatgtttaagttggttgctcagctgtacaagattgtgcccaatgtcctcttagagcagggtaaagccaagaatccttggcccaatgtagatgctcacagtggggtgctgctccagtattatggcatgacggagatgaattactacacggtcctgtttggggtgtcacgagcattgggtgtactggcacagctcatctggagccgagccttaggcttccctctagaaaggcccaagtccatgagcacagagggtctgatgaagtttgtggactctaagtcagggtaaaactggagactgggtgaaagtgactaccagaaagtgaggaagcctaaataaaaagtatacttttgtttcagggggcctttaaagacttaagattaaattatatctgaggcactgataatatgtttgaggttaaaatataaattaagactttaaaagatgaaaaatggtcccttcttccctaatcagctcccttcccctgcctggtatgagttgcccatcatacgcatggtcctggaggatgaccaggactaatgcatgtggtatgagtaggtttggccccctcactatctctagagtgagaatctggctcctgtttccatgggtcaaagccggttgcagagaatctgtagtcactttggagctttagcttctctgccaagccctcaataagccagcaaaccaggactct
gccccttctgtttccataggaatcatgttggatagtcagctgtaccaagccccttggccctctcccatgcacacaaacacctcctagcaagacctgttggttagctggacatgctttggcaatttttttatactaccaagtgaccataaaggcatggcatttgttgtgactggcacccaatgtttgattttttttttaaaactatccaattaaaattaaggtctgggagtgttctgtttcccattactttaatactcacctcctcccagactttctacacctgttgcacctcaggcagaggatgttctggacctccccctcttggtccctactagagacctctcaacagatctgtgggcccagtcattgggttttatcagtgcttaatgtgaactaagttttttacttccacagaatacaagccactaccttctgacctccccaccccccaccaacccccatcttttaatatgctgtggggcatagaactccggaatgaccagcatgatattttcagagtcttgtccccggggtattagcacctctttttgaacagggaattgattcaagattggacatggtctcctctgattatcaggtactggggctgagggcattaaaaatagtaagcctccctcctcgtcccctgcctcaagaaattgcctccttatttatcaacatctttttcctccctttccctgagagctcacagtacaatgtttcagaagccccatttgcacaggttttcagcaactcagaatgctctacttctttttctttgagaaaggattaagatacactcctgctgtgcccccatctttcctccaaactcctgcctgtgtttgtgtggatacccagtcccagaaccacactgttgagttggacacactgtaaacccctgggtaactgtcaagtcatgatggagacttcaggttgttctgtataaaatgcaaaataaatgtttttattaacaatgaaaaaaaaaaaaaaaaaaaaa

-chimeric transcript protein start 233 CAAG^AAT

-chimeric transcript protein end 1589 GGG^TA

-cs chimeric transcript portion

AATGCATCTTGTCTTGTTCTTGCAGCCCGGCATGCCAGTGCTTCCTCCACGAATTTGAAAGACATATTGGCTGACCTGATACCTAAGGAGCAGGCCAGAATTAAGACTTTCAGGCAGCAACATGGCAAGACGGTGGTGGGCCAAATCACTGTGGACATGATGTATGGTGGCATGAGAGGCATGAAGGGATTGGTCTATGAAACATCAGTTCTTGATCCTGATGAGGGCATCCGTTTCCGAGGCTTTAGTATCCCTGAATGCCAGAAACTGCTACCCAAGGCTAAGGGTGGGGAAGAACCCCTGCCTGAGGGCTTATTTTGGCTGCTGGTAACTGGACATATCCCAACAGAGGAACAGGTATCTTGGCTCTCAAAAGAGTGGGCAAAGAGGGCAGCTCTGCCTTCCCATGTGGTCACCATGCTGGACAACTTTCCCACCAATCTACACCCCATGTCTCAGCTCAGTGCAGCTGTTACAGCCCTCAACAGTGAAAGTAACTTTGCCCGAGCATATGCACAGGGTATCAGCCGAACCAAGTACTGGGAGTTGATTTATGAAGACTCTATGGATCTAATCGCAAAGCTACCTTGTGTTGCAGCAAAGATCTACCGAAATCTCTACAGAGAAGGCAGCGGTATTGGGGCCATTGACTCTAACCTGGACTGGTCTCACAATTTCACCAACATGTTAGGCTATACTGATCATCAGTTCACTGAGCTCACGCGCCTGTACCTCACCATCCACAGTGACCATGAGGGTGGCAATGTAAGTGCCCATACCAGCCATTTGGTGGGCAGTGCCCTTTCCGACCCTTACCTGTCCTTTGCAGCAGCCATGAACGGGCTGGCAGGGCCTCTCCATGGACTGGCAAATCAGGAAGTGCTTGTCTGGCTAACACAGCTGCAGAAGGAAGTTGGCAAAGATGTGTCAGATGAGAAGTTACGAGACTACATCTGGAACACACTCAACTCAGGACGGGTTGTTCCAGGCTATGGCCATGCAGTACTAAGGAAGACTGATCCGCGATATACCTGTCAGCGAGAGTTTGCTCTGAAACACCTGCCTAATGACCCCATGTTTAAGTTGGTTGCTCAGCTGTACAAGATTGTGCCCAATGTCCTCTTAGAGCAGGGTAAAGCCAAGAATCCTTGGCCCAATGTAGATGCTCACAGTGGGGTGCTGCTCCAGTATTATGGCATGACGGAGATGAATTACTACACGGTCCTGTTTGGGGTGTCACGAGCATTGGGTGTACTGGCACAGCTCATCTGGAGCCGAGCCTTAGGCTTCCCTCTAGAAAGGCCCAAGTCCATGAGCACAGAGGGTCTGATGAAGTTTGTGGACTCTAAGTCAGGGTAAAACTGGAGACTGGGTGAAAGTGACTACCAGAAAGTGAGGAAGCCTAAATAAAAAGTATACTTTTGTTTCAGGGGGCCTTTAAAGACTTAAGATTAAATTATATCTGAGGCACTGATAATATGTTTGAGGTTAAAATATAAATTAAGACTTTAAAAGATGAAAAATGGTCCCTTCTTCCCTAATCAGCTCCCTTCCCCTGCCTGGTATGAGTTGCCCATCATACGCATGGTCCTGGAGGATGACCAGGACTAATGCATGTGGTATGAGTAGGTTTGGCCCCCTCACTATCTCTAGAGTGAGAATCTGGCTCCTGTTTCCATGGGTCAAAGCCGGTTGCAGAGAATCTGTAGTCACTTTGGAGCTTTAGCTTCTCTGCCAAGCCCTCAATAAGCCAGCAAACCAGGACTCTGCCCCTTCTGTTTCCATAGGAATCATGTTGGATAGTCAGCTGTACCAAGCCCCTTGGCCCTCTCCCATGCACACAAACACCTCCTAGCAAGACCTGTTGGTTAGCTGGACATGCTTTGGCAATTTTTTTATACTACCAAGTGACCATAAAGGCATGGCATTTGTTGTGACTGGCACCCAATGTTTGATTTTTTTTTTAAAACTATCCAATTAAAATTAAGGTCTGGGAGTGT
TCTGTTTCCCATTACTTTAATACTCACCTCCTCCCAGACTTTCTACACCTGTTGCACCTCAGGCAGAGGATGTTCTGGACCTCCCCCTCTTGGTCCCTACTAGAGACCTCTCAACAGATCTGTGGGCCCAGTCATTGGGTTTTATCAGTGCTTAATGTGAACTAAGTTTTTTACTTCCACAGAATACAAGCCACTACCTTCTGACCTCCCCACCCCCCACCAACCCCCATCTTTTAATATGCTGTGGGGCATAGAACTCCGGAATGACCAGCATGATATTTTCAGAGTCTTGTCCCCGGGGTATTAGCACCTCTTTTTGAACAGGGAATTGATTCAAGATTGGACATGGTCTCCTCTGATTATCAGGTACTGGGGCTGAGGGCATTAAAAATAGTAAGCCTCCCTCCTCGTCCCCTGCCTCAAGAAATTGCCTCCTTATTTATCAACATCTTTTTCCTCCCTTTCCCTGAGAGCTCACAGTACAATGTTTCAGAAGCCCCATTTGCACAGGTTTTCAGCAACTCAGAATGCTCTACTTCTTTTTCTTTGAGAAAGGATTAAGATACACTCCTGCTGTGCCCCCATCTTTCCTCCAAACTCCTGCCTGTGTTTGTGTGGATACCCAGTCCCAGAACCACACTGTTGAGTTGGACACACTGTAAACCCCTGGGTAACTGTCAAGTCATGATGGAGACTTCAGGTTGTTCTGTATAAAATGCAAAATAAATGTTTTTATTAACAATGAAAAAAAAAAAAAAAAAAAAA

-the whole Cnpy2-Cs chimeric transcript (note that I am not completely sure that the sequence I have here before the protein-coding part and the sequence I have here after the protein-coding part are correct. However, at least the protein-coding part from Hojoon's table ("C:\kurt\storage\CIM Research Folder\DR\2013\4-11-13\genefusion info\Total_FG_peptide_annotation2-1.xlsx") should be correct.)

"C:\kurt\storage\CIM Research Folder\DR\2013\4-18-13\chimeric transcript\Cnpy2_Cs chimeric transcript info 4-18-13.ape"

Need to design a forward primer for TR_E10150:
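Because the coding part is the piece expected to be correct, one quick check on an assembled chimera is that it translates in frame into the Cnpy2 section protein followed by the Cs section protein. The sketch below assumes 1-based inclusive positions for the junction; `join_fusion` and `translate` are hypothetical helpers, and whether ApE's caret positions map directly onto these arguments is an assumption to check before plugging in the numbers recorded above. The test uses short toy sequences modeled on the two junction contexts (AAG^ATGA and CAAG^AAT).

```python
# Standard genetic code, indexed by codon with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate DNA in frame 0; '*' is a stop, 'X' an unknown codon,
    and a trailing partial codon is ignored."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def join_fusion(head: str, head_end: int, tail: str, tail_start: int) -> str:
    """Keep head[1..head_end] and tail[tail_start..] (1-based, inclusive)."""
    return head[:head_end] + tail[tail_start - 1:]
```

In the toy junction, the head ends in ...ATG GAG and the tail resumes at AAT GCA TCT, so the translation reads M E | N A S, mirroring the ...SVVE | NASCLV... boundary between the two protein sections.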

ATGAAAGGCTGGGGTTGGCTG

Length: 21

Tm: 56.2 °C

Hairpin Tm (none)

Need to design a reverse primer for TR_E10150:

GTTTGTGGACTCTAAGTCAGGGT

Length: 23

Tm: 55.2 °C

Hairpin Tm (none)
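For a quick cross-check of the primer numbers above, GC content and a rough Tm can be computed with the basic formula Tm = 64.9 + 41 × (nG + nC − 16.4) / N, which is meant for primers longer than 13 nt. This is not necessarily the method used by the program that reported the values above, and the helper names are hypothetical. Applied to the forward and reverse primers it gives about 56.3 °C and 55.3 °C, close to the reported 56.2 °C and 55.2 °C.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G + C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def basic_tm(primer: str) -> float:
    """Rough Tm in °C from the basic formula (primers > 13 nt).
    Ignores salt, primer concentration, and nearest-neighbor effects."""
    p = primer.upper()
    gc = p.count("G") + p.count("C")
    return 64.9 + 41 * (gc - 16.4) / len(p)
```

The forward primer has 12 G/C out of 21 bases (about 57% GC); the reverse primer has 11 out of 23 (about 48% GC).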

Location of the Sequencher file used to verify that everything is facing the correct direction: "C:\kurt\storage\CIM Research Folder\DR\2013\4-18-13\chimeric transcript\Cnpy2-Cs chimeric transcript.SPF"

Expected length of amplified chimeric transcript

1561 bp
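A scripted version of the orientation and product-size check could look like the sketch below. The helper names are hypothetical; it assumes a sense-strand template and a reverse primer given 5' to 3' as it would be ordered. One thing this makes visible: GTTTGTGGACTCTAAGTCAGGGT, as written above, occurs directly in the sense strand of the Cs portion (...GAAGTTTGTGGACTCTAAGTCAGGGTAAAAC...). If it is meant to be the ordered oligo, the sequence that would actually anneal as a reverse primer is its reverse complement, ACCCTGACTTAGAGTCCACAAAC. Hojoon's TR_E10150Ro (CCACCACCGTCTTGCCATG), by contrast, is written as the reverse complement of its binding site (CATGGCAAGACGGTGGTGG in the Cs portion).

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def amplicon_length(template: str, forward: str, reverse_oligo: str) -> Optional[int]:
    """PCR product length on a sense-strand template, or None if the primers
    do not bind in the expected orientation.

    `forward` must match the sense strand; `reverse_oligo` is the oligo as
    ordered (5' to 3'), so its reverse complement must match the sense strand
    downstream of the forward primer."""
    t = template.upper()
    start = t.find(forward.upper())
    if start < 0:
        return None
    site = reverse_complement(reverse_oligo).upper()
    end = t.find(site, start)
    if end < 0:
        return None
    return end + len(site) - start
```

Passing a reverse primer written in the sense orientation returns None, which flags the orientation problem before ordering.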

Chimeric transcript primer order form 4-18-13

4-19-13 I found Hojoon's original primers for Cnpy2-Cs here

S:\Research\Cancer_Eradication\HoJoon\Primers\1st&2nd_plate_Invitrogen_ Oligo_Plate_Jan16_2009.xls

File information for the TR_E10150 search for Cnpy2-Cs primers (4-19-13): "C:\kurt\storage\CIM Research Folder\DR\2013\4-19-13\chimeric transcript primer\TR_E10150 search for Cnpy2-Cs primers 4-19-13.txt"

TR_E10150Fo (the "o" is for original) AGCGCTCTGGAGAATCCCG

TR_E10150Ro (the "o" is for original) CCACCACCGTCTTGCCATG

Order form for these primers: "C:\kurt\storage\CIM Research Folder\DR\2013\4-19-13\chimeric transcript primer\Kurt primer order form 4-19-13.xls"

5-1-13 I don't have the size information for all of the chimeric transcripts I would like to PCR screen.

Gene fusion    Size (bp)
TR_E10142      449
TR_E10150      884
TR_E10339      315
TR_E20081      (unknown)
TR_E10028      400
TR_E20166      (unknown)
TR_E20026      (unknown)
TR_E20131      (unknown)
TR_E10176      506
TR_E20151      (unknown)
TR_E10002      293
TR_E10446      384
TR_E10324      499
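Keeping the size table as data makes it easy to list which fusions still need a size. This is a minimal sketch; the values are copied from the 5-1-13 size table, and `None` marks a size that was not yet known.

```python
from typing import Optional

# Expected PCR product sizes (bp) as of 5-1-13; None = size not yet known.
sizes: dict[str, Optional[int]] = {
    "TR_E10142": 449, "TR_E10150": 884, "TR_E10339": 315, "TR_E20081": None,
    "TR_E10028": 400, "TR_E20166": None, "TR_E20026": None, "TR_E20131": None,
    "TR_E10176": 506, "TR_E20151": None, "TR_E10002": 293, "TR_E10446": 384,
    "TR_E10324": 499,
}

# Fusions whose product size still has to be tracked down, in table order.
missing = [fusion for fusion, size in sizes.items() if size is None]

# Known product sizes in ascending order.
known = sorted(size for size in sizes.values() if size is not None)
```

Here `missing` holds the five TR_E2xxxx IDs, starting with TR_E20081.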

Maybe if I find files containing the name, I can find the missing sizes. I'll do a file search for "TR_E20081" (search started 5-1-13).

No files were found which contained "TR_E20081" in S:\Research\Cancer_Eradication\

Shen sent me a file that has the mouse chimeric transcript peptide info: "F:\kurt\storage\CIM Research Folder\DR\2013\5-3-13\chim transcr\GF antigens list from Hojoon.xlsx"

Here's another file that may be somewhat related: "F:\kurt\storage\CIM Research Folder\DR\2013\5-3-13\chim transcr\summary of Mus GF .xls"

I now have the size information for the PCR products for all 13 validated mouse chimeric transcripts in Hojoon's Table 3.3 in his dissertation "F:\kurt\storage\CIM Research Folder\DR\2012\11-18-12\Dissertations\Hojoon Lee Dissertation\Table 3p3 from Hojoon's dissertation.xlsx"