A program that extracts from BLAST output one line per query containing: the name of the top hit, the bit score, the frame,
the query and subject start positions, the E-value for the first hit, and the difference between the E-value exponents of the first two hits. For example:
Protein: Contig1001_98304 sp|P80303|NUCB2_HUMAN_Nucleobindin-2 bits 367 Frame = +1 Query: 1 Sbjct: 155 1e-102 22.
Nucleotides: NODE_15_length_647_cov_20.438950 >Contig1310_123726_incomplete_UBL7_Ubiquitin-like_protein_7 Expect= 1e-200 ident_of 638 (100%) Plus Query: 1 Sbjct: 204 00 200
See also the example below.
#!/usr/bin/perl -w
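# Reads BLAST search output (files given as arguments, or STDIN) and prints one summary line per query.
# $flag is 0 while the alignment section is being scanned; $flag1 holds the line number of the hit-table header.
$flag=1;$flag1=0;
while (<>){ s/ 0\.0/ 1e-200/; # an E-value of 0.0 is treated as 1e-200
# New query: store the text before the first space as the query name, reset $x (E-value) and $y (exponent)
if (/Query=/) {chomp; s/Query=\s+//;s/\s+\S+/,$&/;s/ .*//;$navn=$_;$x="00";$y=0;}
# Header of the hit table: remember its line number and start scanning
if (/significant/) {$flag1=$.;$flag=0;}
# Three lines after the header (the second hit when a blank line follows the header):
# $x = last field of the line (E-value), $y = its exponent
if (length($_) >5 && $flag1 == $.-3 ) {s/ \S+$/$&/;$x=$&;s/-\S+$/$&/;$y=$&;
if($x eq " 0.0"){$x="1e-200";$y=-200}}
if ($flag==0) {chomp;
# Hit name: replace spaces with underscores and start a new output line
if (/^>/ ) {s/ /_/g;print "\n$navn $_" ;}
# Score line: print the Expect part, then subtract this hit's exponent from $y
if (/Score =/ ) {s/Score \=.*\,//;s/ =/=/; print "$_" ;s/-\S+/$&/; $y=$y-$&;}
# Identities line: rewrite "Identities = n/m" as "ident_of m" and drop the gap count
if (/Identities =/) {s/Identities = \d+\//ident_of /;s/\, Gap.*//;print $_ ;}
# Strand (nucleotide) or Frame (translated) line
if (/Strand|Frame/) {s/Strand = Plus \/ //;print $_ ;}
# Start position of the first query line of the alignment
if (/Query:/ ) {s/Query\:\s+\d+/$&/;print " $& ";}
# Start position of the first subject line, then $x and $y; a nonzero $flag stops scanning until the next hit table
if (/Sbjct/ ) {$flag=$.;s/Sbjct\:\s+\d+//;print " $& $x $y";}
}
}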
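The last field of each output line is a difference of E-value exponents. The following is a minimal sketch of that arithmetic on its own, not the script's own code: the helper names `evalue_exponent` and `exponent_difference` are hypothetical, and the sketch assumes E-values are written the way BLAST prints them (`e-119`, `3e-77`, `0.0`).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): return the decimal
# exponent of a BLAST E-value string such as "e-119", "3e-77" or "0.0".
sub evalue_exponent {
    my ($evalue) = @_;
    # The script rewrites an E-value of 0.0 as 1e-200; do the same here.
    $evalue = '1e-200' if $evalue eq '0.0';
    # Take the signed number after the "e". E-values without an exponent
    # part (e.g. "0.5") are treated as exponent 0 in this sketch.
    return $evalue =~ /e([-+]?\d+)$/i ? $1 + 0 : 0;
}

# Hypothetical helper: exponent of the second hit minus exponent of the first.
sub exponent_difference {
    my ($first, $second) = @_;
    return evalue_exponent($second) - evalue_exponent($first);
}

# E-values of the two hits listed in the example hit table of this document.
print exponent_difference('e-119', '3e-77'), "\n";
```

For the E-values e-119 and 3e-77 this prints 42. A large difference suggests that the top hit is much better supported than the runner-up.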
Example BLAST output:
BLASTX 2.2.23 [Feb-03-2010]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.
Query= Contig1072_32458_IMA8
(978 letters)
Database: hum-proteome.fasta
65,485 sequences; 25,588,757 total letters
Searching..................................................done
Score E
Sequences producing significant alignments: (bits) Value
sp|A9QM74|IMA8_HUMAN Importin subunit alpha-8 OS=Homo sapiens GN... 424 e-119
sp|P52292|IMA2_HUMAN Importin subunit alpha-2 OS=Homo sapiens GN... 286 3e-77
.
.
.
>sp|A9QM74|IMA8_HUMAN_Importin_subunit_alpha-8_OS=Homo_sapiens_GN=KPNA7_E=1_SV=1
Length = 516
Score = 424 bits (1091), Expect = e-119
Identities = 214/263 (81%), Positives = 232/263 (88%), Gaps = 2/263 (0%)
Frame = +1
Query: 196 MPTLDAPEGRLRKFKYRGKDASIRRHQRMAVSLELRKAKKDEQALKRRNITIFSPEPASG 375
MPTLDAPE R RKFKYRGKD S+RR QRMAVSLELRKAKKDEQ LKRRNIT F P+ S
Sbjct: 1 MPTLDAPEERRRKFKYRGKDVSLRRQQRMAVSLELRKAKKDEQTLKRRNITSFCPDTPSE 60
Query: 376 ELTKGV--SLTLQEIISGVNASDPDLCFQATQAARKMLSQEKNPPLKLIVEAGLIPRLVE 549
+ KGV SLTL EII GVN+SDP LCFQATQ ARKMLSQEKNPPLKL++EAGLIPR+VE
Sbjct: 61 KTAKGVAVSLTLGEIIKGVNSSDPVLCFQATQTARKMLSQEKNPPLKLVIEAGLIPRMVE 120
Query: 550 FLKLSPHPCLQFEAAWALTNIASGTSEQTQAVVEGGAIPPLVELLSSPHMTVCEQAVWAL 729
FLK S +PCLQFEAAWALTNIASGTSEQT+AVVEGGAI PL+ELLSS ++ VCEQAVWAL
Sbjct: 121 FLKSSLYPCLQFEAAWALTNIASGTSEQTRAVVEGGAIQPLIELLSSSNVAVCEQAVWAL 180
Query: 730 GNIAGDGPEFRDLVISSNAIPYLLALVSSTIPITFLRNITWTLSNLCRNKNPYPSVKAVK 909
GNIAGDGPEFRD VI+SNAIP+LLAL+S T+PITFLRNITWTLSNLCRNKNPYP AVK
Sbjct: 181 GNIAGDGPEFRDNVITSNAIPHLLALISPTLPITFLRNITWTLSNLCRNKNPYPCDTAVK 240
Query: 910 QMLPVLSHLLQHQDSEILSDTCW 978
Q+LP L HLLQHQDSE+LSD CW
Sbjct: 241 QILPALLHLLQHQDSEVLSDACW 263
Output produced by the program:
Contig1072_32458 >sp|A9QM74|IMA8_HUMAN_Importin_subunit_alpha-8_OS=Homo_sapiens_GN=KPNA7_E=1_SV=1 > 424 bits (1091), Expect = e-119 ident of 263 (81%) Frame = +1 Query: 196 Sbjct: 1 3e-77 42
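
For downstream filtering it can help to split such a summary line back into fields. The following is a minimal sketch under stated assumptions: fields are separated by whitespace, the first two are the query and hit names, and the last two are the values the script prints last. The function name `parse_summary_line` and the hash keys are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull the query name, the hit name and the two
# trailing fields out of one summary line.
sub parse_summary_line {
    my ($line) = @_;
    my @fields = split ' ', $line;      # split on runs of whitespace
    return unless @fields >= 4;         # too short to be a summary line
    return {
        query        => $fields[0],
        hit          => $fields[1],     # still carries a leading ">" if one was printed
        evalue       => $fields[-2],    # E-value printed second to last
        exponent_gap => $fields[-1],    # difference of E-value exponents
    };
}

my $line = 'Contig1072_32458 >sp|A9QM74|IMA8_HUMAN_Importin_subunit_alpha-8_OS=Homo_sapiens_GN=KPNA7_E=1_SV=1 > 424 bits (1091), Expect = e-119 ident of 263 (81%) Frame = +1 Query: 196 Sbjct: 1 3e-77 42';
my $rec = parse_summary_line($line);
print "$rec->{query}\t$rec->{exponent_gap}\n" if $rec;
```

Only the first two and last two fields are addressed by position, because the middle of the line differs between protein and nucleotide searches.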