Work+092612

Another thing I need to figure out is how to retrieve a sequence from a BLAST database by its accession number from the command line.

Here is an example command that retrieves a sequence from a database:

blastdbcmd -db nr.00 -entry ZP_01172638.1 -outfmt %f -out sequence.fsa
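Since the rest of the program is in Java, here is a minimal sketch of how the same blastdbcmd call could be launched from Java. The class and method names (BlastDbCmdSketch, buildCommand, fetchSequence) are made up for illustration and are not part of my existing code; it also assumes blastdbcmd is on the PATH and that nr.00 is reachable from the working directory.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

class BlastDbCmdSketch {

    // Builds the argument list for the blastdbcmd call shown above.
    // Each flag and value is its own list element, so nothing needs quoting.
    static List<String> buildCommand(String db, String accession, String outFile) {
        return Arrays.asList(
                "blastdbcmd",
                "-db", db,
                "-entry", accession,
                "-outfmt", "%f",   // FASTA output
                "-out", outFile);
    }

    // Runs blastdbcmd and returns its exit code.
    static int fetchSequence(String db, String accession, String outFile)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(db, accession, outFile));
        pb.inheritIO(); // show blastdbcmd's own messages in this console
        return pb.start().waitFor();
    }

    public static void main(String[] args) throws InterruptedException {
        try {
            int exit = fetchSequence("nr.00", "ZP_01172638.1", "sequence.fsa");
            System.out.println("blastdbcmd exited with code " + exit);
        } catch (IOException e) {
            // Typically means the executable was not found.
            System.out.println("Could not start blastdbcmd: " + e.getMessage());
        }
    }
}
```

An exit code of 0 would normally mean the sequence was written to sequence.fsa; anything else means the message printed by blastdbcmd is worth reading.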

Now that I have all of the capabilities I need, I can continue writing the Java code. I think I'll start with the blasting rather than bepipred.

Created a Blastp_Handler class.

Copied code over from the glam2scan handler class and deleted much of it, including these lines:

console_output = system_command_string;
useful_tools.createTextFile(directory, glam2scan_output_filename, system_command_string);

I then added them back in and wrote the output to a console file that I could check later.

There was a problem executing the code because the command fails when a directory name contains a space. It looks like I have to escape the space with a backslash. http://forums.macrumors.com/showthread.php?t=291346

I tried escaping the space as well as putting quotes around the whole path, but nothing seems to work. Maybe I will just move everything to a directory without spaces.

Moved everything to this directory: S:\Research\Cancer_Eradication\Users\kwhittem\DR\2012\9-26-12

It appears to be working now.
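One possible explanation, which I have not confirmed for my code: if the command is launched as a single string through Runtime.exec(String), Java splits that string on whitespace only and does not interpret quotes or backslashes, so neither escaping nor quoting can keep a path together. Passing each argument separately to ProcessBuilder avoids the split. The sketch below illustrates the difference without actually running anything; the path with a space is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class SpaceInPathSketch {

    // Mimics how Runtime.exec(String) breaks up a command string:
    // on whitespace only, with no special handling of quotes or backslashes.
    static List<String> splitOnWhitespace(String command) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(command);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String db = "C:\\My Data\\nr.00"; // hypothetical path containing a space

        // Quoting the path inside one command string does not help:
        // the path still ends up split across two tokens.
        List<String> split = splitOnWhitespace("blastp -db \"" + db + "\"");
        System.out.println(split.size() + " tokens: " + split);

        // Passing the arguments one by one keeps the path as a single argument.
        ProcessBuilder pb = new ProcessBuilder("blastp", "-db", db);
        System.out.println(pb.command().size() + " arguments: " + pb.command());
    }
}
```

If this is the cause, switching to the argument-list form would allow directories with spaces again; moving to a directory without spaces works either way.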

Blasting the most representative motif against the whole nr database now appears to work. Next I can see whether there are any overlaps among the blast result lists from the different motif groups. These overlapping proteins may be the proteins that the antibodies binding to these random peptides originally bound to.

For testing purposes, I will have to create some artificial scenarios. In my actual testing I had only two motifs: PMRE and HEE. However, HEE does not align with anything in the nr database. Therefore, for testing I will probably compare the blast results for PMRE with those for PQREGS from another search that I did before.

Started making the compareAllSequencesInFile1WithFile2 method in Blastp_Handler.
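The overlap idea can be sketched as a set intersection. This is only an illustration and not the actual compareAllSequencesInFile1WithFile2 code: it assumes each result file has already been reduced to one hit identifier (for example an accession number) per line, and the class and method names are made up.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class BlastOverlapSketch {

    // Trims each line and drops blank ones, keeping first-seen order.
    static Set<String> toIdSet(List<String> lines) {
        Set<String> ids = new LinkedHashSet<>();
        for (String line : lines) {
            String id = line.trim();
            if (!id.isEmpty()) {
                ids.add(id);
            }
        }
        return ids;
    }

    // Returns the identifiers found in both lists, in the order of the first list.
    static Set<String> overlap(List<String> hits1, List<String> hits2) {
        Set<String> shared = toIdSet(hits1);
        shared.retainAll(new HashSet<>(toIdSet(hits2)));
        return shared;
    }

    // Same comparison, reading one identifier per line from two files.
    static Set<String> overlapOfFiles(Path file1, Path file2) throws IOException {
        return overlap(Files.readAllLines(file1), Files.readAllLines(file2));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: java BlastOverlapSketch <hits1.txt> <hits2.txt>");
            return;
        }
        for (String id : overlapOfFiles(Paths.get(args[0]), Paths.get(args[1]))) {
            System.out.println(id);
        }
    }
}
```

With more than two motif groups, the same overlap call could be applied pairwise, or repeatedly to narrow down to hits shared by every group.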

I found that if I blast PMRE from the BLAST website I get some SMC hits, but I don't when I blast just the nr database. I wonder what database I should be using.

What is the most comprehensive protein blast database? The website blast uses these databases: all non-redundant GenBank CDS translations + PDB + SwissProt + PIR + PRF, excluding environmental samples from WGS projects. The database name was just nr, though.

I think I wasn't getting as many hits because my search was against only nr.00 and not the other nr volumes, up to nr.07, that can be downloaded. Multiple databases can be specified on the command line simply by separating them with a space. http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall_node25.html That site says that it is better to search multiple databases through aliases, though.
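A small sketch of building the space-separated volume list in Java. The page linked above documents the older blastall program; this sketch assumes that blastp also accepts several database names in a single -db value, which I still need to verify. The class name, method names and file names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class MultiDbSketch {

    // Builds a space-separated list such as "nr.00 nr.01 ... nr.07".
    static String volumeList(String base, int first, int last) {
        StringBuilder sb = new StringBuilder();
        for (int i = first; i <= last; i++) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format(Locale.ROOT, "%s.%02d", base, i));
        }
        return sb.toString();
    }

    // Assumed blastp invocation; the whole volume list is passed as ONE argument.
    static List<String> buildBlastpCommand(String queryFile, String dbList, String outFile) {
        return Arrays.asList("blastp", "-query", queryFile, "-db", dbList, "-out", outFile);
    }

    public static void main(String[] args) {
        String dbs = volumeList("nr", 0, 7);
        // query.fsa and results.txt are placeholder file names.
        System.out.println(buildBlastpCommand("query.fsa", dbs, "results.txt"));
    }
}
```

If an alias covering all of the volumes is available, passing that single name would be simpler than listing every volume.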

What E-value is considered a good match? The default blast E-value threshold is 10. http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=FAQ#expect

How do I append a line to the end of a text file in Java without rewriting the whole file? This has some information: http://stackoverflow.com/questions/4614227/how-to-add-a-new-line-of-text-to-an-existing-file-in-java It doesn't look totally easy, though. Actually, maybe it's not that bad. Here's another site: http://www.kodejava.org/examples/108.html
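A minimal sketch of appending, using only the standard library: opening a FileWriter with its second argument set to true adds to the end of the file instead of overwriting it. The class name, method name and the file name console_log.txt are made up for this example.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

class AppendLineSketch {

    // Appends one line to the end of the file, creating the file if needed.
    static void appendLine(String fileName, String line) throws IOException {
        // The 'true' flag opens the file in append mode.
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(fileName, true))) {
            writer.write(line);
            writer.newLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // console_log.txt is a placeholder name for a running log file.
        appendLine("console_log.txt", "Started blastp run");
        appendLine("console_log.txt", "Finished blastp run");
    }
}
```

Because the writer is opened and closed on each call, every line reaches the disk even if the program stops partway through a long blast run.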

Almost all of the compareAllSequencesInFile1WithFile2 method is written. I haven't tested it yet, though, since my program is trying to blast against the whole nr database this time instead of just nr.00. I might want to add some of the information to a text file by appending to the end. I'd like to do that for the entire program as well, so that the program keeps a running log of what it does. I haven't done this yet, though.