comparison of Tiger's peptide sequences with protein sequences 3-25-13

I would like to look at the HER2 specific peptides from Tiger and see if any of them align with the proteins in my single clones in the final screen of my tumor cDNA library.

Information about HER2 specific peptides from Tiger 1-7-13

-Database
-I will need to convert my 9 proteins into a database that each peptide can be blasted against.
-Lu sent me some commands about how to convert a fasta file into a database.

1. PROTEIN MASKING
segmasker -in database-v3.fasta -infmt fasta -parse_seqids -outfmt maskinfo_asn1_bin -out database_v3.asnb

2. MAKING DATABASE
makeblastdb -in database-v3.fasta -dbtype prot -parse_seqids -out database_v3 -title "database_v3" -mask_data database_v3.asnb -max_file_sz 100GB

-When I do the BLAST I will want the parameters to be appropriate for a short sequence. Therefore I should use some type of "blastp short" option.

3-30-13

see versions of protein sequences from associated transcript sequences

Now that I have a FASTA file of all of the versions of the protein sequences, I can try to use programs to compare Tiger's peptides with these sequences. I could potentially try to use GUITOPE, GLAM2, and BLAST. I could also potentially look at Hojoon's frameshift sequences rather than Tiger's peptides.

FASTA file of all of the versions of the protein sequences
C:\kurt\storage\CIM Research Folder\DR\2013\3-30-13\sequence analysis\versions of protein sequences from associated transcript sequences 3-30-13.fa
shorter name here
"C:\kurt\storage\CIM Research Folder\DR\2013\3-30-13\sequence analysis\prot_3-30-13.fa"

fasta file of Tiger's peptides "C:\kurt\storage\CIM Research Folder\DR\2013\3-25-13\her2 peptides\all peptides without CSG.fa"

Here's a FASTA file with the versions of the protein sequences combined with Tiger's peptides
"C:\kurt\storage\CIM Research Folder\DR\2013\3-30-13\sequence analysis\versions of protein sequences from associated transcript sequences plus tigers peptides 3-30-13.fa"
Here's the glam2 output from this FASTA file. At first glance I can't get much meaning from it. A BLAST search might be better.
C:\kurt\storage\CIM Research Folder\DR\2013\3-30-13\sequence analysis\glam2 results 033013

Alright I will try to make a BLAST database. First I guess I need to mask using this command
segmasker -in prot_3-30-13.fa -infmt fasta -parse_seqids -outfmt maskinfo_asn1_bin -out prot_3-30-13.asnb

navigate to directory
cd C:\kurt\storage\CIM Research Folder\DR\2013\3-30-13\sequence analysis

run segmasker
segmasker -in prot_3-30-13.fa -infmt fasta -parse_seqids -outfmt maskinfo_asn1_bin -out prot_3-30-13.asnb
The command line froze up after this command. Maybe I can just make the blast database directly.

Now I can make the blast database
makeblastdb -in prot_3-30-13.fa -dbtype prot -title protdb

Now that I have my blast database I can blast Tiger's peptides against this database.

1807 I actually don't think I needed the database. I can just blast two fasta files against each other. Actually a database is needed.

blastp -task blastp-short -db prot_3-30-13.fa -query "C:\kurt\storage\CIM Research Folder\DR\2013\3-25-13\her2 peptides\all peptides without CSG.fa" -out results_3-31-13_1634.txt

This command resulted in a file with a lot of information. "C:\kurt\storage\CIM Research Folder\DR\2013\3-30-13\sequence analysis\results_3-30-13_1747.txt"

I need to figure out how to organize and make sense of this information. Maybe I can write a program to collect the information piece by piece and then start putting it all together. Here's how I might want to collect and group some of the information.
-protein
--Which peptides align with this protein and how many peptides are there?
--How many alignments align with this protein, where are the alignments, which peptide aligns, what is the position of the alignment relative to the peptide sequence, which "version" of the protein does this align with?
--protein version
---Which peptides align with this protein and how many peptides are there?
---How many alignments align with this protein, where are the alignments, which peptide aligns, what is the position of the alignment relative to the peptide sequence, which "version" of the protein does this align with?
---How do the positions of the alignment with this version correspond with the whole transcript? I guess to figure that out I would need to know information about the position of the start codon in the original and how long the sequence is.
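A minimal sketch of the per-protein grouping described above. It assumes the search is re-run with BLAST's tabular output (-outfmt 6) instead of the default report that the command above produced, because the tabular form has one alignment per tab-separated line (query id, subject id, percent identity, length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score). The class and method names (HitGrouper, Hit, groupByProtein, peptidesFor) are hypothetical and are not taken from the notebook's project.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper; names are illustrative, not taken from the notebook's project.
class HitGrouper {

    // One row of BLAST tabular output (-outfmt 6), keeping only the columns used here.
    static class Hit {
        final String peptide;  // column 1: query id (a peptide)
        final String protein;  // column 2: subject id (a protein version)
        final int qStart;      // column 7: alignment start in the peptide
        final int qEnd;        // column 8: alignment end in the peptide
        final int sStart;      // column 9: alignment start in the protein version
        final int sEnd;        // column 10: alignment end in the protein version

        Hit(String line) {
            String[] f = line.trim().split("\t");
            peptide = f[0];
            protein = f[1];
            qStart = Integer.parseInt(f[6]);
            qEnd = Integer.parseInt(f[7]);
            sStart = Integer.parseInt(f[8]);
            sEnd = Integer.parseInt(f[9]);
        }
    }

    // Groups hits by protein version, in the order the versions first appear.
    static Map<String, List<Hit>> groupByProtein(List<String> lines) {
        Map<String, List<Hit>> byProtein = new LinkedHashMap<>();
        for (String line : lines) {
            // Skip empty lines and comment lines (present with -outfmt 7).
            if (line.trim().isEmpty() || line.startsWith("#")) {
                continue;
            }
            Hit h = new Hit(line);
            byProtein.computeIfAbsent(h.protein, k -> new ArrayList<>()).add(h);
        }
        return byProtein;
    }

    // Distinct peptides among a list of hits, sorted by name.
    static TreeSet<String> peptidesFor(List<Hit> hits) {
        TreeSet<String> names = new TreeSet<>();
        for (Hit h : hits) {
            names.add(h.peptide);
        }
        return names;
    }
}
```

For each protein version, the size of its hit list answers "how many alignments", and peptidesFor answers "which peptides and how many". Rolling versions up to their parent protein would need a naming convention for the FASTA headers, which is not assumed here.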

Basically I think the program can have an alignment object. The alignment object would have the following information.
-corresponding peptide
-list of match positions in peptide
-corresponding transcript version
-list of match positions in transcript version
-corresponding transcript
-list of match positions in transcript
-id of alignment

There would be a transcript object which would have the following information
-length
-start codon position
-number of alignments
-a list of alignment objects
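The two objects described above could be sketched roughly as follows. This is only an illustration of the field lists, not the code in the actual project. The class names AlignmentRecord and TranscriptRecord and the transcript name field are assumptions added for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative alignment object holding the fields listed in the notes.
class AlignmentRecord {
    final int id;                            // id of alignment
    final String peptide;                    // corresponding peptide
    final List<Integer> peptidePositions;    // match positions in peptide
    final String transcriptVersion;          // corresponding transcript version
    final List<Integer> versionPositions;    // match positions in transcript version
    final String transcript;                 // corresponding transcript
    final List<Integer> transcriptPositions; // match positions in transcript

    AlignmentRecord(int id, String peptide, List<Integer> peptidePositions,
                    String transcriptVersion, List<Integer> versionPositions,
                    String transcript, List<Integer> transcriptPositions) {
        this.id = id;
        this.peptide = peptide;
        // Copy the lists so later changes by the caller do not affect this record.
        this.peptidePositions = new ArrayList<>(peptidePositions);
        this.transcriptVersion = transcriptVersion;
        this.versionPositions = new ArrayList<>(versionPositions);
        this.transcript = transcript;
        this.transcriptPositions = new ArrayList<>(transcriptPositions);
    }
}

// Illustrative transcript object. The name field is an assumption, not in the notes.
class TranscriptRecord {
    final String name;
    final int length;
    final int startCodonPosition;
    private final List<AlignmentRecord> alignments = new ArrayList<>();

    TranscriptRecord(String name, int length, int startCodonPosition) {
        this.name = name;
        this.length = length;
        this.startCodonPosition = startCodonPosition;
    }

    void addAlignment(AlignmentRecord a) {
        alignments.add(a);
    }

    // Derived from the list so the count cannot drift out of sync with it.
    int numberOfAlignments() {
        return alignments.size();
    }

    List<AlignmentRecord> alignments() {
        return Collections.unmodifiableList(alignments);
    }
}
```

Computing the number of alignments from the list, rather than storing it as a separate field, is a design choice made for this sketch; a stored counter would also work as long as it is updated on every add.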

So maybe the program could basically go through the output file and collect all of the alignment information and create all of the alignment objects. Then it could assign these alignment objects to transcript versions. Then it could collect the alignments to the transcript versions to the main original transcript for these versions. Then it could possibly output an ape file to visualize this alignment information. I could also make a bar graph of the number of alignments for each transcript version.

3-31-13

Created a new Java project titled AnalyzeAlignmentsToTranscripts

Made an alignment object. Now I'll make a transcript version object. This will contain "String transcript_version_name, String portion_of_transcript, String sequence, int reading_frame, ArrayList alignment_objects_al, int number_of_alignments"

image of notes to figure out positions on transcript depending on alignment of a certain frame "C:\kurt\storage\CIM Research Folder\DR\2013\4-3-13\transcript position note image 4-3-13.jpg"
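The contents of the notes image are not reproduced here, so the following is only one plausible version of the position arithmetic. It assumes that each protein version was translated from the forward strand of its transcript, that frame 1 starts at the first nucleotide, frame 2 at the second, and frame 3 at the third, and that all positions are 1-based. The class name FramePosition and its methods are hypothetical.

```java
// Hypothetical helper for mapping protein positions back to transcript positions.
class FramePosition {

    // 1-based nucleotide position of the first base of the codon that encodes
    // residue aaPos (1-based) in the given forward reading frame (1, 2 or 3).
    static int codonStart(int aaPos, int frame) {
        if (aaPos < 1 || frame < 1 || frame > 3) {
            throw new IllegalArgumentException("aaPos must be >= 1 and frame must be 1, 2 or 3");
        }
        return (frame - 1) + 3 * (aaPos - 1) + 1;
    }

    // 1-based nucleotide position of the last base of that codon.
    static int codonEnd(int aaPos, int frame) {
        return codonStart(aaPos, frame) + 2;
    }

    // Nucleotide span {start, end} covered by an alignment over residues aaFrom..aaTo.
    static int[] span(int aaFrom, int aaTo, int frame) {
        if (aaTo < aaFrom) {
            throw new IllegalArgumentException("aaTo must be >= aaFrom");
        }
        return new int[] { codonStart(aaFrom, frame), codonEnd(aaTo, frame) };
    }
}
```

For example, residue 5 in frame 2 starts at nucleotide (2 - 1) + 3 * (5 - 1) + 1 = 14. If a version covers only a portion of the transcript, the offset of that portion within the transcript would have to be added to both ends of the span.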

Visualization of alignments to the 9 proteins can be found here C:\kurt\storage\CIM Research Folder\DR\2013\3-31-13\visualization of alignments

code found here C:\kurt\storage\CIM Research Folder\DR\2013\3-31-13\code\code as of 2141

4-1-13

I would also like to see how Tiger's peptides align with GST, KLH, and BSA.

fasta file of GST, KLH, and BSA "C:\kurt\storage\CIM Research Folder\DR\2013\3-30-13\sequence analysis\non-human_4-1-13.fa"

Oh I forgot that I want the other two reading frames as well. I couldn't find the nucleotide sequence for KLH. This is just a test so I'll just use the reverse translate tool http://www.bioinformatics.org/sms2/rev_trans.html to get a possible nucleotide sequence and then get my other two frames for all of these proteins.
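Once a nucleotide sequence is in hand, getting the three forward frames can be sketched as below. This assumes the standard genetic code and forward-strand frames only. The class name FrameTranslator is hypothetical, and this is not the tool used in the notebook. Note that a reverse-translated sequence is only one of many possible coding sequences, so frames 2 and 3 derived from it are not expected to match those of the real gene.

```java
// Hypothetical helper: translates a DNA sequence in forward frame 1, 2 or 3
// using the standard genetic code.
class FrameTranslator {
    // Base order used to index the codon table below.
    private static final String BASES = "TCAG";
    // Standard genetic code, ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop).
    private static final String AMINO =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    static String translate(String dna, int frame) {
        if (frame < 1 || frame > 3) {
            throw new IllegalArgumentException("frame must be 1, 2 or 3");
        }
        String s = dna.toUpperCase().replace('U', 'T');
        StringBuilder protein = new StringBuilder();
        // Step through complete codons only; trailing 1-2 bases are ignored.
        for (int i = frame - 1; i + 3 <= s.length(); i += 3) {
            int a = BASES.indexOf(s.charAt(i));
            int b = BASES.indexOf(s.charAt(i + 1));
            int c = BASES.indexOf(s.charAt(i + 2));
            if (a < 0 || b < 0 || c < 0) {
                protein.append('X'); // ambiguous or unknown base in this codon
            } else {
                protein.append(AMINO.charAt(16 * a + 4 * b + c));
            }
        }
        return protein.toString();
    }
}
```

Calling translate(seq, 1), translate(seq, 2) and translate(seq, 3) gives the three versions of a protein; stop codons are kept as '*' so they stay visible in the output.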

some sequence information for gst klh and bsa 4-1-13

some command line commands non-human_4-1-13.fa

"C:\kurt\storage\CIM Research Folder\DR\2013\3-30-13\sequence analysis\non-human_4-1-13_2.fa"

navigate to directory
cd C:\kurt\storage\CIM Research Folder\DR\2013\3-30-13\sequence analysis

make the database
makeblastdb -in non-human_4-1-13_2.fa -dbtype prot -title protdb

blast tiger's peptides against these non-human proteins

blastp -task blastp-short -db non-human_4-1-13_2.fa -query "C:\kurt\storage\CIM Research Folder\DR\2013\3-25-13\her2 peptides\all peptides without CSG.fa" -out non-human_results_4-1-13_1429.txt

output file "C:\kurt\storage\CIM Research Folder\DR\2013\3-30-13\sequence analysis\non-human_results_4-1-13_1429.txt"

I'll now use my Java program to look at the sequence information in this file.

summary of transcript info "C:\kurt\storage\CIM Research Folder\DR\2013\3-30-13\sequence analysis\summary of transcript info.xlsx"

summary of output from analysis with Guitope here C:\kurt\storage\CIM Research Folder\DR\2013\3-30-13\sequence analysis\Guitope Summary

4-22-13

I would like to determine if any of the alignment objects align at the same position, and try to determine how many of these there are.
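One way this check might be done is sketched below. It assumes "same position" means the same target sequence with identical start and end coordinates; counting overlapping but non-identical alignments would need a different comparison. The class name PositionGrouper, the Aln class, and its fields are hypothetical stand-ins for the alignment objects in the actual program.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for finding alignments that land on the same position.
class PositionGrouper {

    // Minimal stand-in for an alignment object.
    static class Aln {
        final String peptide;
        final String target; // transcript or transcript version name
        final int start;     // first aligned position on the target
        final int end;       // last aligned position on the target

        Aln(String peptide, String target, int start, int end) {
            this.peptide = peptide;
            this.target = target;
            this.start = start;
            this.end = end;
        }
    }

    // Counts alignments per "target:start-end" key.
    static Map<String, Integer> countByPosition(List<Aln> alignments) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Aln a : alignments) {
            String key = a.target + ":" + a.start + "-" + a.end;
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    // Number of positions that are hit by more than one alignment.
    static int sharedPositions(Map<String, Integer> counts) {
        int shared = 0;
        for (int n : counts.values()) {
            if (n > 1) {
                shared++;
            }
        }
        return shared;
    }
}
```

The map from countByPosition shows how many alignments fall at each exact position, and sharedPositions gives the number of positions with two or more.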