2 hours ago by projetoic

I have several FASTA files, each containing the sequence of an organism (or species) together with its reference sequence (CDS). I would like to remove the reference sequences and keep only the organism's sequences.

So far, the only solution I have found is a sequence editor, where I have to select and delete each reference sequence one by one. Is there a command-line solution, or some other automated way, to remove these entries?

I merged them all into a single file so I could work on them in the editor.
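Since the files only had to be merged for the editor, the filtering can also run on each file separately. Below is a minimal Python sketch, assuming every organism header contains `Organism:` (like `>gb:AB189120|Organism:Dengue/1-10179` in the examples below) and no reference header does. The function names, the `*.fasta` pattern and the `.noref.fasta` output suffix are hypothetical choices, not part of any existing tool.

```python
from pathlib import Path


def keep_organism_records(lines, marker="Organism:"):
    """Yield the stripped lines of every record whose header contains marker.

    Assumption: organism headers contain "Organism:" (as in
    ">gb:AB189120|Organism:Dengue/1-10179") and reference headers do not.
    """
    keep = False  # lines before the first header are discarded
    for raw_line in lines:
        line = raw_line.strip()  # drops newlines and stray indentation
        if line.startswith(">"):
            keep = marker in line  # decide once per record, at its header
        if keep and line:
            yield line


def filter_directory(folder, pattern="*.fasta", suffix=".noref.fasta"):
    """Write a filtered copy next to each matching file and return the new paths.

    The pattern and output suffix are hypothetical defaults; adjust them to
    the real file names.
    """
    outputs = []
    for path in sorted(Path(folder).glob(pattern)):
        if path.name.endswith(suffix):
            continue  # skip outputs written by an earlier run
        out_path = path.with_name(path.stem + suffix)
        with path.open() as src:
            kept = list(keep_organism_records(src))
        out_path.write_text("".join(line + "\n" for line in kept))
        outputs.append(out_path)
    return outputs
```

With this sketch, `filter_directory("my_sequences")` would turn `sample.fasta` into `sample.noref.fasta` in the same folder and leave the original untouched. Because the keep/drop decision is made at each header, sequences wrapped over several lines are handled as whole records.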

Input file:

    >lcl|NC_001477.1_cds_NP_059433.1_1/
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    >gb:AB189120|Organism:Dengue/1-10179
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    >lcl|NC_001477.1_cds_NP_059433.1_1/
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    >gb:AB189120|Organism:Dengue/1-10179
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    >lcl|NC_001477.1_cds_NP_059433.1_1/
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    >gb:AB189120|Organism:Dengue/1-10179
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Output:

    >gb:AB189120|Organism:Dengue/1-10179
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    >gb:AB189120|Organism:Dengue/1-10179
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    >gb:AB189120|Organism:Dengue/1-10179
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Or, for a file with a single pair:

Input file:

    >lcl|NC_001477.1_cds_NP_059433.1_1/
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    >gb:AB189120|Organism:Dengue/1-10179
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Output:

    >gb:AB189120|Organism:Dengue/1-10179
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
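A minimal Python sketch of one automated approach for the merged file, assuming every reference record has a header starting with `lcl|` (as `>lcl|NC_001477.1_cds_NP_059433.1_1/` does above) and every other record should be kept. It drops whole records rather than single lines, so it also works when a sequence spans several lines. The function name `drop_reference_records` is hypothetical.

```python
import sys


def drop_reference_records(fasta_text, ref_prefix="lcl|"):
    """Return the FASTA text without records whose header starts with ref_prefix.

    Assumption: reference CDS headers start with "lcl|" (as in the examples
    above); every other record is kept unchanged.
    """
    kept = []
    keep = True  # any lines before the first header are kept
    for raw_line in fasta_text.splitlines():
        line = raw_line.strip()  # also removes stray indentation
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # New record: drop it if its header names a reference sequence.
            keep = not line[1:].lstrip().startswith(ref_prefix)
        if keep:
            kept.append(line)
    return "".join(line + "\n" for line in kept)


if __name__ == "__main__":
    # Filter every file named on the command line and print the result.
    for path in sys.argv[1:]:
        with open(path) as handle:
            sys.stdout.write(drop_reference_records(handle.read()))
```

Saved under a hypothetical name such as `drop_refs.py`, it could be run from the command line as `python drop_refs.py merged.fasta > organism_only.fasta`. Passing several input files concatenates their filtered records into one output.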


modified 2 hours ago



