Josh presentation 4-10-13

After this presentation, I had Josh use his program to find the most significant epitope in the 108-peptide SMC1fs immunosignature. See the page "epitope in 108 SMC1fs peptide immunosignature identified by program of Josh".

Some slides explaining how Josh's program finds the predominant epitope in an immunosignature are here: "C:\kurt\storage\CIM Research Folder\DR\2013\4-20-13\Linear epitope finding presentation from Josh\Linear Epitope Finding and Substitution Analysis on 330k.pptx". Note that these are not the slides from the presentation he gave on 4-10-13 at the General Meeting. I think I actually missed the presentation these slides come from; it was given at one of the Friday 1 pm Array meetings.

Josh uses Biopython

He didn't look at what percentage of biological space the 330k random-peptide array covers; he only looked at how much of random sequence space it covers.
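One way to put a number on "how much of random space" a peptide library covers is the fraction of all 20^k possible k-mers that occur at least once in it. The notes don't say how Josh measured this, so the metric, the `kmer_coverage` helper, and the toy peptides below are all assumptions; this is a minimal sketch, not his method.

```python
# Hypothetical helper: fraction of random k-mer space covered by a peptide library.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard residues


def kmer_coverage(peptides, k):
    """Return the fraction of all 20**k possible k-mers that occur at least
    once in `peptides`. k-mers containing non-standard letters are skipped."""
    seen = set()
    for pep in peptides:
        for i in range(len(pep) - k + 1):
            kmer = pep[i:i + k]
            if set(kmer) <= AMINO_ACIDS:
                seen.add(kmer)
    return len(seen) / 20 ** k


if __name__ == "__main__":
    toy_library = ["ACDEG", "GHIKL"]  # toy peptides, not real array content
    print(kmer_coverage(toy_library, 2))  # 8 distinct 2-mers out of 400
```

Each peptide of length L contributes at most L - k + 1 k-mers while the space grows as 20^k, so for a fixed library coverage falls off quickly as k increases.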

What did he use to make his dendrograms? His plotting script: /home/josh/CIM/Research/labdata/jaricher/newDecipher/Data for Database/RegExDBSearch/Plot2.py

How to do the hierarchical clustering (HClustering): http://docs.scipy.org/doc/scipy/reference/cluster.hierarchy.html
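The scipy.cluster.hierarchy module linked above provides `linkage`, `fcluster`, and `dendrogram`. A minimal sketch of clustering samples, assuming a samples × peptides intensity matrix; the `cluster_samples` helper, average linkage, Euclidean metric, and toy data are illustrative choices and not necessarily what Plot2.py does.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage


def cluster_samples(data, labels, n_clusters=2):
    """Average-linkage hierarchical clustering of the rows of `data`."""
    Z = linkage(data, method="average", metric="euclidean")
    # Cut the tree into at most n_clusters flat clusters.
    assignments = fcluster(Z, t=n_clusters, criterion="maxclust")
    # no_plot=True returns the leaf order without drawing; drop it
    # (and call matplotlib.pyplot.show()) to draw the dendrogram.
    tree = dendrogram(Z, labels=labels, no_plot=True)
    return assignments, tree["ivl"]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: 6 samples x 50 peptides, two groups with different means.
    data = np.vstack([rng.normal(0.0, 1.0, (3, 50)),
                      rng.normal(5.0, 1.0, (3, 50))])
    labels = ["A1", "A2", "A3", "B1", "B2", "B3"]
    assignments, leaf_order = cluster_samples(data, labels)
    print(assignments, leaf_order)
```

The linkage method and distance metric change the tree shape considerably, so they are worth recording next to any dendrogram.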

Paper on the required length of n-mers: http://www.mcponline.org/content/7/2/247.full. It addresses how many n-mers are required to resolve a pathogen. Also see: Related papers 2-13-13
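A back-of-envelope way to see why n-mer length matters, assuming independent residues drawn uniformly from the 20 amino acids and treating a proteome as one long sequence: a specific n-mer is expected to occur about (L - n + 1) / 20^n times by chance in L residues. This is a simplification for illustration, not the method of the linked paper; the function names and the one-million-residue proteome are hypothetical.

```python
# Back-of-envelope estimate assuming independent, uniformly distributed residues.
def expected_chance_hits(n, total_residues):
    """Expected number of chance occurrences of one specific n-mer in a
    proteome with `total_residues` residues (treated as one long sequence)."""
    positions = max(total_residues - n + 1, 0)
    return positions / 20 ** n


def min_distinguishing_length(total_residues, threshold=1.0):
    """Smallest n for which a given n-mer is expected to occur fewer than
    `threshold` times by chance."""
    n = 1
    while expected_chance_hits(n, total_residues) >= threshold:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical proteome size of one million residues.
    print(min_distinguishing_length(1_000_000))  # 5
```

Real amino-acid frequencies are not uniform, so common n-mers will occur more often than this estimate suggests.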

How to estimate how many alignments to a pathogen's proteins would occur by chance: here's a sketch Josh drew: "C:\kurt\storage\CIM Research Folder\DR\2013\4-10-13\Josh sketch alignments by random chance 4-10-13.jpg". Basically, he aligned a particular sequence against 100 random proteins to establish the probability of a chance alignment for that sequence. He then multiplied that probability by the number of proteins in a given pathogen to get the expected number of chance alignments to that pathogen. He has a script for this here: /home/josh/CIM/Research/labdata/jaricher/newDecipher/Data for Database/RandomTest/randEpi.py
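The procedure described above can be sketched as follows. This is not randEpi.py: it assumes an "alignment" counts as a regular-expression match anywhere in the protein (the motifs from searchMotifs are regexes), that random proteins are 400 residues long with uniformly drawn amino acids, and that the pathogen has a hypothetical 5,000 proteins.

```python
import random
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_protein(length, rng):
    """One random protein with uniformly drawn residues (an assumption)."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def chance_match_rate(pattern, n_random=100, length=400, seed=0):
    """Fraction of `n_random` random proteins in which `pattern` occurs."""
    rng = random.Random(seed)
    regex = re.compile(pattern)
    hits = sum(1 for _ in range(n_random)
               if regex.search(random_protein(length, rng)))
    return hits / n_random


def expected_chance_alignments(pattern, n_pathogen_proteins, **kwargs):
    """Per-protein chance rate times the number of proteins in the pathogen."""
    return chance_match_rate(pattern, **kwargs) * n_pathogen_proteins


if __name__ == "__main__":
    # Hypothetical pathogen with 5,000 proteins; "DAAPW" is an example motif.
    print(expected_chance_alignments("DAAPW", 5000))
```

With only 100 random proteins the smallest nonzero rate is 0.01, so rare motifs will often come out as zero; more random proteins give a finer estimate.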

Here's an example of running Josh's script to find the predominant epitopes in an immunosignature:

In [1]: import Data

In [2]: Data.ChipData.fromFileName("../")
(tab completion) ../Array Results/  ../Proteins_Genomes/  ../Bioinformatics_Papers/  ../RandomTest/  ../Paper/  ../RegExDBSearch/  ../Peptide_Corpus/

In [2]: x=Data.ChipData.fromFileName("../Array Results/First Chip Disease Dataset/llnl.csv")

In [3]: x.me
(tab completion) x.medianNormalize  x.mergeCols

In [3]: x=x.medianNormalize
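What medianNormalize does internally isn't recorded in these notes. A common form of median normalization, assumed here, divides each sample's intensities by that sample's median so all samples share a median of 1. A minimal sketch on a peptides × samples array (the `median_normalize` name is hypothetical):

```python
import numpy as np


def median_normalize(intensities):
    """Divide each sample (column) of a peptides x samples array by its median,
    so every sample ends up with a median of 1."""
    intensities = np.asarray(intensities, dtype=float)
    return intensities / np.median(intensities, axis=0)
```

In [3] has no parentheses after medianNormalize. In plain Python that would bind the method rather than call it, and x.samples in In [4] would then fail, so medianNormalize presumably behaves as a property in Data.py (not confirmed from these notes).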

In [4]: x.samples Out[4]: ['BPE', 'BPE', 'BPE', 'BPE', 'BPE', 'BPE', 'BPE', 'BPE', 'BPE', 'BPE', 'BPE', 'BPE', 'Borrelia', 'Borrelia', 'Borrelia', 'Borrelia', 'Borrelia', 'Borrelia', 'Borrelia', 'Borrelia', 'Dengue', 'Dengue', 'Dengue', 'Dengue', 'Dengue', 'Dengue', 'Dengue', 'Dengue', 'Dengue', 'HBV', 'HBV', 'HBV', 'HBV', 'HBV', 'HBV', 'HBV', 'HBV', 'HBV', 'HBV', 'HBV', 'HBV', 'HBV', 'HBV', 'HBV', 'HNP', 'HNP', 'HNP', 'HNP', 'HNP', 'HNP', 'HNP', 'HNP', 'HNP', 'HNP', 'Malaria', 'Malaria', 'Malaria', 'Malaria', 'Malaria', 'Malaria', 'Malaria', 'Malaria', 'Malaria', 'Malaria', 'Malaria', 'Malaria', 'Malaria', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Syphllis', 'Syphllis', 'Syphllis', 'Syphllis', 'Syphllis', 'Syphllis', 'Syphllis', 'Syphllis', 'WNV', 'WNV', 'WNV', 'WNV', 'WNV', 'WNV', 'WNV', 'WNV', 'WNV', 'WNV', 'WNV', 'WNV', 'WNV', 'WNV', 'WNV', 'WNV', 'WNV', 'WNV', 'WNV', 'WNV', 'WNV']

In [5]: x=x.foldChange("WN",'g')
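The first argument "WN" matches the 'WNV' labels listed in Out[4], which suggests foldChange compares the samples whose label starts with that prefix against the rest; that reading is an inference, and what the second argument 'g' controls is not recorded here, so the sketch ignores it. A minimal per-peptide fold change under those assumptions (the `fold_change` helper is hypothetical):

```python
import numpy as np


def fold_change(intensities, samples, prefix):
    """Per-peptide mean of samples whose label starts with `prefix`,
    divided by the mean of all remaining samples.

    `intensities` is a peptides x samples array; `samples` holds one label
    per column (like x.samples above).
    """
    intensities = np.asarray(intensities, dtype=float)
    in_group = np.array([label.startswith(prefix) for label in samples])
    group_mean = intensities[:, in_group].mean(axis=1)
    rest_mean = intensities[:, ~in_group].mean(axis=1)
    return group_mean / rest_mean
```

For example, `fold_change(data, samples, "WN")` would give one ratio per peptide, with values well above 1 marking peptides brighter in the WNV sera.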

In [6]: ?x.searchMotifs
Type: instancemethod
String Form:
Definition: x.searchMotifs(self, above=4, fuzzy=6, prob='Bon')
Docstring:
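The default prob='Bon' may refer to a Bonferroni correction for testing many motifs at once; that is only a guess from the abbreviation, since the docstring is empty. For reference, a Bonferroni adjustment multiplies each p-value by the number of tests m and caps it at 1, which is equivalent to comparing the raw p-values against alpha/m:

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests
    and cap the result at 1."""
    m = len(p_values)
    return [min(p * m, 1.0) for p in p_values]
```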

In [7]: z=x.searchMotifs(10)
searching 32 peptides
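Calling searchMotifs(10) reports "searching 32 peptides", which suggests that `above` is a fold-change cutoff deciding which peptides enter the motif search (an inference; the docstring is empty). A minimal sketch of such a cutoff; the helper name and the strict `>` comparison are assumptions:

```python
import numpy as np


def peptides_above(fold_changes, above=4):
    """Indices of peptides whose fold change is strictly greater than `above`."""
    return np.flatnonzero(np.asarray(fold_changes, dtype=float) > above)
```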

In [8]: z
Out[8]: [(32, 1, 1.8392572851598359, '^.{2,2}DAAPW', [121469], [115566, 121469, 64645], 3.1132551477345425)]
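The motif field in Out[8] is a Python regular expression: `^` anchors at the start of the peptide, `.{2,2}` (the same as `.{2}`) matches exactly two residues of any kind, and `DAAPW` must follow. The other fields of the tuple are not explained in these notes. A small sketch of what the motif does and doesn't match, using toy sequences:

```python
import re

# Motif string copied from Out[8] above.
MOTIF = re.compile(r"^.{2,2}DAAPW")


def matching_peptides(peptides):
    """Return the peptides that start with exactly two residues of any kind
    followed by DAAPW."""
    return [p for p in peptides if MOTIF.match(p)]


if __name__ == "__main__":
    toy = ["GSDAAPWKL", "DAAPWGGSC", "KRDAAPW"]  # toy sequences, not array data
    print(matching_peptides(toy))  # ['GSDAAPWKL', 'KRDAAPW']
```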
