I have a list.csv file containing a series of coordinates. The coordinates refer to sequences in different FASTA files, namely ocean.fasta, lake.fasta, and river.fasta, which are all present in a target folder. The list.csv, ocean.fasta, lake.fasta, and river.fasta files are shown below.
/target/
list.csv
Contig3,15,7
Contig2,5,10
Xantho1,12,3
ocean.fasta
>Contig2 contig1 Bacillus 985, ocean [298]
ACGAAGCGCTCGCCAAGGCCGAAGAAGAAGGCC
>Contig85 Bacillus 956, ocean [895]
ATGCNNNGCTAT
lake.fasta
>Xantho1 [Pseudomonas] cissicola strain CCUG 18839 contig_0000001
AAGGCCATCAAGGACGTGGATGAGGTCGTCAAG
>Contig8 [Pseudomonas] cissicola strain CCUG 18839 contig_0000008
ATGCTTAGCTGATGC
>Contig20 [Pseudomonas] cissicola strain CCUG 18839 contig_0000020
ATGCTTAGCTGATGCTAGTA
river.fasta
>Contig3 8954 e.coli [856]
GCTGCGGCGCTGATCCTGGCGGCCCGCGCCGAG
>Contig8 8954 e.coli [859]
TAGTGCGTATAT
The entries Contig3,15,7 and Xantho1,12,3 are not in the right order: for them $2 > $3 (the second field is larger than the third), so I need to reorder those coordinates so that $2 < $3. I also need to reverse complement the sequences extracted for the swapped entries, i.e. Xantho1,3,12 and Contig3,7,15. In addition, I need to save the extracted sequences as new_ocean.fasta, new_lake.fasta, and new_river.fasta in a fresh folder named target_sequences. The expected output is as follows:
./target/target_sequences
new_river.fasta
>Contig3
GATCAGCGC
new_ocean.fasta
>Contig2
AGCGCT
new_lake.fasta
>Xantho1
CTTGATGGCC
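The coordinate rule described above can be sketched for a single sequence as follows. This is only a minimal sketch: it assumes the coordinates are 1-based and inclusive (which is what the expected output implies), and `extract_region` is a hypothetical helper name written in plain awk, not part of any existing toolkit.

```shell
# Hypothetical helper: extract_region SEQUENCE A B
# Prints the 1-based, inclusive region between A and B. If A > B, the two
# coordinates are swapped and the extracted piece is reverse-complemented.
extract_region() {
    awk -v seq="$1" -v a="$2" -v b="$3" '
        function revcomp(s,    i, out, c) {
            out = ""
            # Walk the string backwards, complementing each base.
            for (i = length(s); i >= 1; i--) {
                c = substr(s, i, 1)
                out = out ((c in comp) ? comp[c] : c)
            }
            return out
        }
        BEGIN {
            # Complement table; other symbols (e.g. N) are left unchanged.
            comp["A"] = "T"; comp["T"] = "A"; comp["C"] = "G"; comp["G"] = "C"
            comp["a"] = "t"; comp["t"] = "a"; comp["c"] = "g"; comp["g"] = "c"

            a += 0; b += 0          # force numeric comparison
            rc = 0
            if (a > b) { t = a; a = b; b = t; rc = 1 }

            piece = substr(seq, a, b - a + 1)
            if (rc) piece = revcomp(piece)
            print piece
        }'
}
```

For example, `extract_region GCTGCGGCGCTGATCCTGGCGGCCCGCGCCGAG 15 7` takes positions 7 to 15 (GCGCTGATC) and prints its reverse complement, GATCAGCGC, which matches the expected Contig3 record.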
I have used the following script, but it ends with an error:
for file in *.fasta
do
    fastaexplode "$file" &&
    awk -F[:-] '{
        if($2>$3){
            start=$3-1;
            len=$2-start" -"
        }
        else{
            start=$2-1;
            len=$3-start
        }
        print $1,start,len}' list.csv &&
    tmpFile=$(mktemp);
    > subseqs.fa
    awk -F[:-] '{
        if($2>$3){
            start=$3-1;
            len=$2-start" -"
        }
        else{
            start=$2-1;
            len=$3-start
        }
        print $1,start,len}' list.csv |
    while read cont start len rev; do
        fastasubseq "$cont".fa $start $len > $tmpFile;
        if [[ -n $rev ]]; then
            fastarevcomp "$tmpFile" >> subseqs.fa;
        else
            cat "$tmpFile" >> subseqs.fa;
        fi && cp subseqs.fa target_sequences/"new_${file}"
    done
Please help me achieve this.
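For reference, one possible way to get the expected output is sketched below. It deliberately swaps the fastaexplode / fastasubseq / fastarevcomp calls for a single plain awk pass per FASTA file, so it is not a fix of the script above but an alternative under stated assumptions: coordinates are 1-based and inclusive, the contig ID is the first word after `>`, and list.csv is comma-separated (so the sketch splits on `,` rather than `[:-]`). The function name `extract_all` and its argument are hypothetical.

```shell
# Hypothetical helper: extract_all TARGET_DIR
# Reads TARGET_DIR/list.csv and writes TARGET_DIR/target_sequences/new_<name>.fasta
# for every TARGET_DIR/*.fasta. Runs in a subshell so its variables stay local.
extract_all() (
    target=$1
    out="$target/target_sequences"
    mkdir -p "$out"

    for fasta in "$target"/*.fasta; do
        [ -e "$fasta" ] || continue      # no .fasta files matched the glob
        name=$(basename "$fasta")

        awk -F',' '
            function revcomp(s,    i, out, c) {
                out = ""
                for (i = length(s); i >= 1; i--) {
                    c = substr(s, i, 1)
                    out = out ((c in comp) ? comp[c] : c)
                }
                return out
            }
            # Print the current record if its ID is listed in list.csv.
            function emit(    s) {
                if (cur != "" && (cur in lo)) {
                    s = substr(seq, lo[cur], hi[cur] - lo[cur] + 1)
                    if (cur in rc) s = revcomp(s)
                    print ">" cur
                    print s
                }
                cur = ""; seq = ""
            }
            BEGIN {
                comp["A"] = "T"; comp["T"] = "A"; comp["C"] = "G"; comp["G"] = "C"
                comp["a"] = "t"; comp["t"] = "a"; comp["c"] = "g"; comp["g"] = "c"
            }
            # First file (list.csv): store ordered coordinates per ID and
            # remember which IDs had $2 > $3 and so need reverse complementing.
            FILENAME == ARGV[1] {
                if (NF >= 3) {
                    a = $2 + 0; b = $3 + 0
                    if (a > b) { rc[$1] = 1; t = a; a = b; b = t }
                    lo[$1] = a; hi[$1] = b
                }
                next
            }
            # FASTA header: flush the previous record, keep the first word as ID.
            /^>/ {
                emit()
                split(substr($0, 2), parts, " ")
                cur = parts[1]
                next
            }
            # Sequence line: append, so multi-line records are handled too.
            { seq = seq $0 }
            END { emit() }
        ' "$target/list.csv" "$fasta" > "$out/new_$name"
    done
)
```

It would be called as `extract_all ./target`. A FASTA file with no IDs listed in list.csv still produces an empty new_ file in this sketch; that could be filtered out afterwards if it is not wanted.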