Remove spaces from FASTA file in Python


I have downloaded the Severe acute respiratory syndrome coronavirus 2 isolate FASTA sequence, but when I tried to get the exact length and the C and G counts, I got the wrong answer every time.

Script:

genome = open("C://Users//USER//Desktop/Corona.fasta")
dna = genome.read()
dna1 = dna.rstrip("\n")
print(len(dna1))

The exact length of the sequence is 29903 without a header 30018, but I don't understand why I am not getting 30018 although I am using the strip function as well.

Please help me.

29903


Python


Bioinformatics


There are numerous examples of substring counting that can be found with a bit of googling, e.g. here.

Please post a link to the sequence/file in question if you want specific help with your issue. My guess is that it could be a platform-specific issue: try dna.rstrip("\n\r"), which should remove trailing newline and carriage-return characters. Alternatively, just plain dna.rstrip() should also remove trailing newline and whitespace characters (spaces, tabs).
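To illustrate the platform point, here is a minimal sketch on a made-up string with Windows-style line endings (the toy data is an assumption, not the file from the question). It shows that rstrip only touches the right-hand end of the string, while str.splitlines handles both kinds of line ending:

```python
# Toy FASTA-like text with Windows-style line endings (hypothetical data).
text = ">header\r\nACGT\r\nGGCC\r\n"

# rstrip() only trims the listed characters from the right-hand end.
trimmed = text.rstrip("\r\n")
print(repr(trimmed))  # the line breaks inside the string are still there

# splitlines() recognises \n, \r\n and \r, so it works on any platform.
lines = text.splitlines()
print(lines)
```

So stripping alone is probably not enough when the sequence is wrapped over many lines.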

You are reading the whole file into a single string, from which only the last newline character is being stripped. It is easier to replace all instances of the newline character since it is going to be interspersed throughout your read.

dna1 = dna.replace('\n', '')
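Note that removing newlines from the whole file still leaves the header line in the string, so the length would include it. A rough sketch of one way to skip the header and count G and C, assuming a single-record FASTA file (the function name fasta_sequence and the toy record are made up for illustration):

```python
def fasta_sequence(lines):
    """Join the sequence lines of a single-record FASTA, skipping the header."""
    parts = []
    for line in lines:
        line = line.strip()          # drops \n, \r and stray spaces
        if not line or line.startswith(">"):
            continue                 # skip blank lines and the header line
        parts.append(line)
    return "".join(parts)


# Small in-memory example; with a real file, pass the open file handle instead:
#   with open("Corona.fasta") as handle:
#       seq = fasta_sequence(handle)
example = ">toy record\nACGT\nGGCC\nAT\n".splitlines()
seq = fasta_sequence(example).upper()

print(len(seq))                        # sequence length without header/newlines
print(seq.count("G"), seq.count("C"))  # G and C counts
```

For multi-record files or anything beyond a quick check, a dedicated parser such as Biopython's SeqIO is likely the safer choice.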

