I have a list.csv file that contains a series of coordinates. The coordinates belong to sequences in different FASTA files, namely ocean.fasta, lake.fasta and river.fasta, all of which are in a target folder. The list.csv, ocean.fasta, lake.fasta and river.fasta files are shown below:

./target/

list.csv
Contig3,15,7
Contig2,5,10
Xantho1,12,3

ocean.fasta
>Contig2 contig1 Bacillus 985, ocean [298]
ACGAAGCGCTCGCCAAGGCCGAAGAAGAAGGCC
>Contig85 Bacillus 956, ocean [895]
ATGCNNNGCTAT

lake.fasta
>Xantho1 [Pseudomonas] cissicola strain CCUG 18839 contig_0000001
AAGGCCATCAAGGACGTGGATGAGGTCGTCAAG
>Contig8 [Pseudomonas] cissicola strain CCUG 18839 contig_0000008
ATGCTTAGCTGATGC
>Contig20 [Pseudomonas] cissicola strain CCUG 18839 contig_0000020
ATGCTTAGCTGATGCTAGTA

river.fasta
>Contig3 8954 e.coli [856]
GCTGCGGCGCTGATCCTGGCGGCCCGCGCCGAG
>Contig8 8954 e.coli [859] 
TAGTGCGTATAT

In Contig3,15,7 and Xantho1,12,3 the coordinates are not in ascending order, i.e. $2 > $3, so I need to reorder them so that $2 < $3 (Contig3,7,15 and Xantho1,3,12). I then need to reverse complement the sequences extracted for those two entries. Finally, I need to save the extracted sequences as new_ocean.fasta, new_lake.fasta and new_river.fasta in a new folder named target_sequences. The expected output is as follows:

./target/target_sequences

new_river.fasta
>Contig3
GATCAGCGC

new_ocean.fasta 
>Contig2
AGCGCT

new_lake.fasta
>Xantho1
CTTGATGGCC
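
To make the expected output concrete, here is a small sketch that reproduces the Contig3,15,7 entry by hand. It assumes the coordinates are 1-based and inclusive, which is what the expected output above implies, and it uses only standard tools (cut, rev, tr). The variable names are illustrative.

```shell
seq=GCTGCGGCGCTGATCCTGGCGGCCCGCGCCGAG   # Contig3 from river.fasta
a=15; b=7                                # coordinates from list.csv

# Order the coordinates so that start <= end, remembering whether they were swapped
if [ "$a" -gt "$b" ]; then
    start=$b; end=$a; swapped=1
else
    start=$a; end=$b; swapped=0
fi

# 1-based, inclusive slice of the sequence
sub=$(printf '%s\n' "$seq" | cut -c"${start}-${end}")

# Reverse complement only when the original coordinates were descending
if [ "$swapped" -eq 1 ]; then
    sub=$(printf '%s\n' "$sub" | rev | tr 'ACGTacgt' 'TGCAtgca')
fi

echo "$sub"   # prints GATCAGCGC
```

Positions 7 to 15 of Contig3 are GCGCTGATC, and the reverse complement of that is GATCAGCGC, which matches new_river.fasta above.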

I used the following script, but it ends with an error:

for file in *.fasta
do
fastaexplode "$file" &&
awk -F[:-] '{ 
                if($2>$3){
                    start=$3-1; 
                    len=$2-start" -"
                }
                else{
                    start=$2-1; 
                    len=$3-start
                } 
                print $1,start,len}' list.csv &&
tmpFile=$(mktemp); 
    > subseqs.fa
    awk -F[:-] '{ 
                if($2>$3){
                    start=$3-1; 
                    len=$2-start" -"
                }
                else{
                    start=$2-1; 
                    len=$3-start
                } 
                print $1,start,len}' list.csv |
    while read cont start len rev; do 
        fastasubseq "$cont".fa $start $len > $tmpFile; 
        if [[ -n $rev ]]; then 
            fastarevcomp "$tmpFile" >> subseqs.fa; 
        else 
            cat "$tmpFile" >> subseqs.fa; 
    fi && cp subseqs.fa target_sequences/"new_${file}" 
done
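
Two things in the script above look like probable causes of trouble: the awk field separator `-F[:-]` splits on ":" and "-", while list.csv is comma-separated, and the `while` loop is never closed with its own `done`, so the single `done` leaves the `for` loop open. The following is a minimal sketch of an alternative that uses only awk instead of the exonerate tools (fastaexplode, fastasubseq, fastarevcomp). It assumes 1-based inclusive coordinates, at most one list.csv entry per contig ID, and that an ID listed in list.csv should be extracted from every FASTA file that contains it. The function names `extract_regions` and `process_dir` are hypothetical.

```shell
# extract_regions LIST FASTA: print the regions of FASTA that are named in LIST
extract_regions() {
    awk -F, '
        # Reverse complement of s; symbols such as N are passed through unchanged
        function revcomp(s,    i, c, out) {
            out = ""
            for (i = length(s); i >= 1; i--) {
                c = substr(s, i, 1)
                out = out ((c in comp) ? comp[c] : c)
            }
            return out
        }
        # Print the requested region of the record collected so far, if it is listed
        function emit(    s) {
            if (id in lo) {
                s = substr(seq, lo[id], hi[id] - lo[id] + 1)
                print ">" id
                print (rc[id] ? revcomp(s) : s)
            }
        }
        BEGIN {
            n = split("A T C G a t c g", from, " ")
            split("T A G C t a g c", to, " ")
            for (i = 1; i <= n; i++) comp[from[i]] = to[i]
        }
        # First file: coordinate list (id,a,b); descending pairs are swapped and flagged
        NR == FNR {
            if (NF >= 3) {
                if ($2 + 0 > $3 + 0) { lo[$1] = $3 + 0; hi[$1] = $2 + 0; rc[$1] = 1 }
                else                 { lo[$1] = $2 + 0; hi[$1] = $3 + 0; rc[$1] = 0 }
            }
            next
        }
        # Second file: FASTA; the ID is the first word after ">"
        /^>/ { emit(); split(substr($0, 2), h, " "); id = h[1]; seq = ""; next }
        { gsub(/[ \t\r]/, ""); seq = seq $0 }
        END { emit() }
    ' "$1" "$2"
}

# process_dir DIR: write DIR/target_sequences/new_<name>.fasta for every DIR/*.fasta
process_dir() {
    dir=$1
    mkdir -p "$dir/target_sequences"
    for f in "$dir"/*.fasta; do
        [ -e "$f" ] || continue
        extract_regions "$dir/list.csv" "$f" > "$dir/target_sequences/new_$(basename "$f")"
    done
}
```

With the layout shown earlier, calling `process_dir ./target` should write new_ocean.fasta, new_lake.fasta and new_river.fasta into ./target/target_sequences. Reading list.csv once per FASTA file keeps the sketch short; for very large inputs a dedicated tool may be a better fit.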

Please help me achieve this.


