Edit headers of a FASTA file


Hi, I have a FASTA file for a genome assembly with 646 sequences. The first 7 are pseudochromosomes and the rest are unassigned scaffolds. The headers look something like this:

>GWHABKY00000001    Chromosome 1    Complete=F  Circular=F  OriSeqID=chr1   Len=553355525
>GWHABKY00000002    Chromosome 2    Complete=F  Circular=F  OriSeqID=chr2   Len=740519526
>GWHABKY00000003    Chromosome 3    Complete=F  Circular=F  OriSeqID=chr3   Len=676969686
>GWHABKY00000004    Chromosome 4    Complete=F  Circular=F  OriSeqID=chr4   Len=612967577
>GWHABKY00000005    Chromosome 5    Complete=F  Circular=F  OriSeqID=chr5   Len=625473173
>GWHABKY00000006    Chromosome 6    Complete=F  Circular=F  OriSeqID=chr6   Len=584270320
>GWHABKY00000007    Chromosome 7    Complete=F  Circular=F  OriSeqID=chr7   Len=744096988
>GWHABKY00000008    OriSeqID=scaffold1  Len=1816015
>GWHABKY00000009    OriSeqID=scaffold10 Len=942477
>GWHABKY00000010    OriSeqID=scaffold100    Len=268586
>GWHABKY00000011    OriSeqID=scaffold101    Len=265196
>GWHABKY00000012    OriSeqID=scaffold102    Len=259718
>GWHABKY00000013    OriSeqID=scaffold103    Len=258511
>GWHABKY00000014    OriSeqID=scaffold104    Len=258489
>GWHABKY00000015    OriSeqID=scaffold105    Len=257418
>GWHABKY00000016    OriSeqID=scaffold106    Len=256425
...

I want to edit the headers so that the first 7 just say:

>Chr1E
>Chr2E
...

For the rest, I just want the scaffold ID:

>scaffold1
>scaffold10
...

What's the best way to do this using sed/awk?
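For reference, here is a minimal awk sketch of the renaming described above. It assumes every header carries an `OriSeqID=` field separated from its neighbours by tabs or spaces, and that the pseudochromosome IDs always look like `chr` followed by digits. The function name `rename_headers` and the file names are placeholders, not part of any existing tool. Headers without an `OriSeqID=` field are passed through unchanged, as are all sequence lines.

```shell
# Hypothetical helper: rewrite FASTA headers, leave sequence lines untouched.
rename_headers() {
    awk '
    /^>/ {
        id = ""
        # split the header on runs of tabs/spaces and look for OriSeqID=...
        n = split($0, f, /[ \t]+/)
        for (i = 1; i <= n; i++) {
            if (f[i] ~ /^OriSeqID=/) {
                id = f[i]
                sub(/^OriSeqID=/, "", id)
            }
        }
        # chrN becomes ChrNE; scaffold IDs are kept as they are
        if (id ~ /^chr[0-9]+$/) id = "Chr" substr(id, 4) "E"
        if (id != "") { print ">" id; next }
    }
    { print }   # sequence lines and headers without OriSeqID
    ' "$@"
}

# Usage (placeholder file names):
# rename_headers genome.fa > genome.renamed.fa
```

Running `grep '^>' genome.renamed.fa | head` afterwards is a quick way to check that the new headers look as expected before using the file downstream.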


