Linux sequence software - extract one line from BLAST

by Knud Christensen


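A program that extracts from BLAST output one line per query containing: the name of the best hit, its score in bits or its E-value, the frame or strand, the start positions of the first alignment in query and subject, and at the end the E-value of the second hit (00 if there is none) together with the difference between the E-value exponents of the first two hits, for example: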

Protein
Contig1001_98304 sp|P80303|NUCB2_HUMAN_Nucleobindin-2 bits 367 Frame = +1 Query: 1 Sbjct: 155  1e-102 22.

Nucleotides
NODE_15_length_647_cov_20.438950 >Contig1310_123726_incomplete_UBL7_Ubiquitin-like_protein_7 Expect= 1e-200 ident_of 638 (100%) Plus Query: 1 Sbjct: 204 00 200

See also the example below.


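#!/usr/bin/perl -w
# Read BLAST text output (blastn or blastx) from the files named on the
# command line, or from standard input, and print one summary line per query.

$flag=1;     # 0 from the hit-table header until the first Sbjct line is printed
$flag1=0;    # line number of the "Sequences producing significant alignments" header

while (<>) {
    s/ 0\.0/ 1e-200/;    # an E-value reported as 0.0 is treated as 1e-200

    # New query: keep the first word of the query name (a comma is appended
    # if a description follows) and reset the saved values.
    if (/Query=/) {chomp; s/Query=\s+//; s/\s+\S+/,$&/; s/ .*//; $navn=$_; $x="00"; $y=0;}

    # Header of the hit table: remember its line number and enable output.
    if (/significant/) {$flag1=$.; $flag=0;}

    # Third line after the header, which is the second hit in the output format
    # shown below: save its E-value in $x and the exponent of that E-value in $y.
    if (length($_) > 5 && $flag1 == $.-3) {
        / \S+$/; $x=$&; /-\S+$/; $y=$&;
        if ($x eq " 0.0") {$x="1e-200"; $y=-200}
    }

    # Print the fields of the first alignment only.
    if ($flag==0) {
        chomp;
        if (/^>/)           {s/ /_/g; print "\n$navn $_";}
        # Print the Expect value and subtract its exponent from $y.
        if (/Score =/)      {s/Score \=.*\,//; s/ =/=/; print "$_"; /-\S+/; $y=$y-$&;}
        if (/Identities =/) {s/Identities = \d+\//ident_of /; s/\, Gap.*//; print $_;}
        if (/Strand|Frame/) {s/Strand = Plus \/ //; print $_;}
        if (/Query:/)       {/Query\:\s+\d+/; print " $& ";}
        # The first Sbjct line ends the record; a non-zero $flag stops further printing.
        if (/Sbjct/)        {$flag=$.; s/Sbjct\:\s+\d+//; print " $& $x $y";}
    }
}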
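
The last number on each output line is the difference between the E-value exponents of the two best hits. The following is a minimal sketch of that arithmetic in isolation, not a part of the program above. It assumes E-values written as in BLAST text output (e-119, 3e-77 or 0.0) and follows the script's convention of treating 0.0 as 1e-200. The helper name evalue_exponent is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): return the signed
# exponent of a BLAST E-value string. "0.0" is treated as 1e-200.
sub evalue_exponent {
    my ($evalue) = @_;
    $evalue = '1e-200' if $evalue eq '0.0';
    return $1 if $evalue =~ /e([+-]?\d+)$/i;   # "e-119" -> -119, "3e-77" -> -77
    return 0;                                  # no exponent part, e.g. "0.5"
}

# E-values of the first and second hit, taken from the example hit table.
my ($first, $second) = ('e-119', '3e-77');

# Exponent of the second hit minus exponent of the first hit.
my $difference = evalue_exponent($second) - evalue_exponent($first);
print "$second $difference\n";    # prints "3e-77 42"
```

A large difference suggests that the best hit is much stronger than the runner-up, while a value near 0 suggests that the two hits are about equally good.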







Example: BLAST output:

BLASTX 2.2.23 [Feb-03-2010]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= Contig1072_32458_IMA8
         (978 letters)

Database: hum-proteome.fasta
           65,485 sequences; 25,588,757 total letters

Searching..................................................done



                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

sp|A9QM74|IMA8_HUMAN Importin subunit alpha-8 OS=Homo sapiens GN...   424   e-119
sp|P52292|IMA2_HUMAN Importin subunit alpha-2 OS=Homo sapiens GN...   286   3e-77
.
.
.


>sp|A9QM74|IMA8_HUMAN_Importin_subunit_alpha-8_OS=Homo_sapiens_GN=KPNA7_E=1_SV=1
          Length = 516

 Score =  424 bits (1091), Expect = e-119
 Identities = 214/263 (81%), Positives = 232/263 (88%), Gaps = 2/263 (0%)
 Frame = +1

Query: 196 MPTLDAPEGRLRKFKYRGKDASIRRHQRMAVSLELRKAKKDEQALKRRNITIFSPEPASG 375
           MPTLDAPE R RKFKYRGKD S+RR QRMAVSLELRKAKKDEQ LKRRNIT F P+  S
Sbjct: 1   MPTLDAPEERRRKFKYRGKDVSLRRQQRMAVSLELRKAKKDEQTLKRRNITSFCPDTPSE 60

Query: 376 ELTKGV--SLTLQEIISGVNASDPDLCFQATQAARKMLSQEKNPPLKLIVEAGLIPRLVE 549
           +  KGV  SLTL EII GVN+SDP LCFQATQ ARKMLSQEKNPPLKL++EAGLIPR+VE
Sbjct: 61  KTAKGVAVSLTLGEIIKGVNSSDPVLCFQATQTARKMLSQEKNPPLKLVIEAGLIPRMVE 120

Query: 550 FLKLSPHPCLQFEAAWALTNIASGTSEQTQAVVEGGAIPPLVELLSSPHMTVCEQAVWAL 729
           FLK S +PCLQFEAAWALTNIASGTSEQT+AVVEGGAI PL+ELLSS ++ VCEQAVWAL
Sbjct: 121 FLKSSLYPCLQFEAAWALTNIASGTSEQTRAVVEGGAIQPLIELLSSSNVAVCEQAVWAL 180

Query: 730 GNIAGDGPEFRDLVISSNAIPYLLALVSSTIPITFLRNITWTLSNLCRNKNPYPSVKAVK 909
           GNIAGDGPEFRD VI+SNAIP+LLAL+S T+PITFLRNITWTLSNLCRNKNPYP   AVK
Sbjct: 181 GNIAGDGPEFRDNVITSNAIPHLLALISPTLPITFLRNITWTLSNLCRNKNPYPCDTAVK 240

Query: 910 QMLPVLSHLLQHQDSEILSDTCW 978
           Q+LP L HLLQHQDSE+LSD CW
Sbjct: 241 QILPALLHLLQHQDSEVLSDACW 263

Output using the program:


Contig1072_32458 >sp|A9QM74|IMA8_HUMAN_Importin_subunit_alpha-8_OS=Homo_sapiens_GN=KPNA7_E=1_SV=1 > 424 bits (1091), Expect = e-119 ident of 263 (81%) Frame = +1 Query: 196 Sbjct: 1 3e-77 42
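
A line in this format can be split back into fields for further filtering. The sketch below is not part of the original program. It assumes that fields are separated by whitespace, that the first two fields are the query name and the hit name, and that the last two fields are the second-hit E-value and the exponent difference, as in the output line above. The name parse_summary is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pick the outer fields out of one summary line.
sub parse_summary {
    my ($line) = @_;
    my @fields = split ' ', $line;          # split on runs of whitespace
    (my $hit = $fields[1]) =~ s/^>//;       # drop the leading ">" of the hit name
    return {
        query    => $fields[0],
        hit      => $hit,
        evalue   => $fields[-2],            # E-value of the second hit, or "00"
        exp_diff => $fields[-1],            # difference of the E-value exponents
    };
}

my $line = 'Contig1072_32458 >sp|A9QM74|IMA8_HUMAN_Importin_subunit_alpha-8_OS=Homo_sapiens_GN=KPNA7_E=1_SV=1 > 424 bits (1091), Expect = e-119 ident of 263 (81%) Frame = +1 Query: 196 Sbjct: 1 3e-77 42';

my $rec = parse_summary($line);
print join("\t", $rec->{query}, $rec->{evalue}, $rec->{exp_diff}), "\n";
```

The middle fields differ between protein and nucleotide searches, so the sketch only relies on the first two and the last two fields.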