Replace headers of a FASTA file with simplified headers from another file

Hello,

I am creating a custom TE library and need fasta file headers to be in a specific format.

If I have file 1 with headers like this:

>L2-10_EL__1_000087d4-94a9-4af9-a82b-db9caeebb418--3803-3889 LINE/L2__frg=1__len=87_st=C_div=21.6_sp=idaho.fa
AAGTGACGTTCTCAGCAATCTTGGAGATGTTGTAAGGTCCTAGAAGGGCAGTTTCAGTGCACGTGTTTGGCTCTGAACCCCGACTGG

and file 2 (a text file) with just the simplified names:

>000087d4-94a9-4af9-a82b-db9caeebb418#LINE/L2
>000087d4-94a9-4af9-a82b-db9caeebb418#Unknown
>000087d4-94a9-4af9-a82b-db9caeebb418#LINE/L1
>000087d4-94a9-4af9-a82b-db9caeebb418#LINE/L2
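Comparing the two examples, both parts of each simplified name (the UUID-like ID and the TE class) seem to be present in the original header already. Assuming the ID is always an 8-4-4-4-12 hex token and the class is always the text between the first space and the next `__`, a small Python helper could rebuild the simplified name directly from a file-1 header. The function name `simplified_header` is hypothetical:

```python
import re

# Assumption: the shared ID is the 8-4-4-4-12 hex token (UUID-like) in the header
UUID_RE = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")
# Assumption: the TE class sits between the first whitespace and the next "__"
CLASS_RE = re.compile(r"\s(\S+?)__")


def simplified_header(header):
    """Return '>ID#Class' built from a file-1 header, or None if a part is missing."""
    m_id = UUID_RE.search(header)
    m_cls = CLASS_RE.search(header)
    if m_id is None or m_cls is None:
        return None
    return f">{m_id.group(0)}#{m_cls.group(1)}"


hdr = (">L2-10_EL__1_000087d4-94a9-4af9-a82b-db9caeebb418--3803-3889 "
       "LINE/L2__frg=1__len=87_st=C_div=21.6_sp=idaho.fa")
print(simplified_header(hdr))
# prints >000087d4-94a9-4af9-a82b-db9caeebb418#LINE/L2
```

If the real headers deviate from this pattern (for example, uppercase hex digits or a class label without a trailing `__`), the two regular expressions would need adjusting.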

My sequences are all on a single line, so I want to replace each header line in the FASTA file that contains the matching ID and keep the sequence line that follows it. Note that some of the sequence names are duplicates, because the TEs come from different parts of the same initial read, so I'm also unsure how to make sure every copy ends up in the output. Hopefully that makes sense! I'm worried this isn't possible because of the odd format of the FASTA headers.
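A minimal Python sketch of the whole replacement, assuming strict two-line records (one header line followed by one sequence line) and that the simplified name can be rebuilt from each header as an 8-4-4-4-12 hex ID plus the class between the first space and the next `__`. Each record is handled independently, so records that share an ID (the copies) are all written out. A header whose rebuilt name is not listed in file 2 is left unchanged and reported on stderr rather than dropped. The script name `rename_headers.py` and all function names are hypothetical:

```python
import re
import sys

# Assumption: the shared ID is the 8-4-4-4-12 hex token (UUID-like) in each header
UUID_RE = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")
# Assumption: the TE class sits between the first whitespace and the next "__"
CLASS_RE = re.compile(r"\s(\S+?)__")


def simplified_header(header):
    """Return '>ID#Class' built from a file-1 header, or None if a part is missing."""
    m_id = UUID_RE.search(header)
    m_cls = CLASS_RE.search(header)
    if m_id is None or m_cls is None:
        return None
    return f">{m_id.group(0)}#{m_cls.group(1)}"


def rename_records(fasta_lines, allowed_names):
    """Rename two-line FASTA records whose rebuilt name appears in allowed_names.

    Returns (output_lines, unmatched_headers). Every record is kept, so
    records that share an ID (copies) all appear in the output.
    """
    lines = [ln.strip() for ln in fasta_lines if ln.strip()]  # drop blank lines
    if len(lines) % 2:
        raise ValueError("expected header/sequence pairs, got an odd number of lines")
    out, unmatched = [], []
    for header, seq in zip(lines[0::2], lines[1::2]):
        if not header.startswith(">"):
            raise ValueError(f"expected a header line, got: {header[:40]}")
        new = simplified_header(header)
        if new is not None and new in allowed_names:
            out.append(new)
        else:
            unmatched.append(header)
            out.append(header)  # keep the original header rather than drop the record
        out.append(seq)
    return out, unmatched


def main(fasta_path, names_path):
    with open(names_path) as fh:
        names = {ln.strip() for ln in fh if ln.strip()}
    with open(fasta_path) as fh:
        renamed, missing = rename_records(fh, names)
    sys.stdout.write("\n".join(renamed) + "\n")
    for h in missing:
        print(f"no match in names file: {h}", file=sys.stderr)


# Hypothetical usage: python rename_headers.py file1.fa file2.txt > renamed.fa
# Only runs when called with exactly the two file paths.
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

This matches on the full rebuilt name, so it relies on the class in file 2 agreeing with the class in the original header. If file 2 is meant to reassign classes (for example, relabelling a record as `Unknown`), matching would have to use the ID alone, which is ambiguous whenever the same ID occurs more than once.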

Thank you in advance!!!


Tags: bash, python, fasta
