Remove spaces from FASTA file in Python
I have downloaded a Severe acute respiratory syndrome coronavirus 2 isolate FASTA sequence, but when I try to get its exact length and its C and G count, I get the wrong answer every time.
genome = open("C://Users//USER//Desktop/Corona.fasta")
dna = genome.read()
dna1 = dna.rstrip("\n")
print(len(dna1))
The exact length of the sequence is 29903 (30018 without a header), but I don't understand why I am not getting 30018 even though I am using the strip function as well.
Please help.
There are numerous examples of substring counting that can be found with a bit of googling, e.g. here.
Please post a link to the sequence/file in question if you want specific help with your issue.
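For the C and G count specifically, plain substring counting with str.count is enough. A minimal sketch, assuming the header and line breaks have already been removed from the sequence string (after upper(), letters in a header such as "Severe acute respiratory..." would otherwise be counted too). The names gc_count and gc_fraction are illustrative, not from any library:

```python
def gc_count(seq: str) -> int:
    """Count C and G bases in a sequence string (case-insensitive).

    Assumes seq contains only the sequence: no header, no newlines.
    """
    seq = seq.upper()
    return seq.count("C") + seq.count("G")


def gc_fraction(seq: str) -> float:
    """Fraction of C+G among all characters of seq (0.0 for an empty string)."""
    return gc_count(seq) / len(seq) if seq else 0.0


# Example on a short made-up fragment, not the real genome.
example = "ATTAAAGGTTTATACCTTCC"
print(gc_count(example))     # 6
print(gc_fraction(example))  # 0.3
```

Since str.count counts non-overlapping occurrences, single-letter counts like these are exact.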
My guess is that it could be a platform-specific issue with line endings (try dna.rstrip("\n\r"), which removes trailing newline and carriage-return characters). Alternatively, a plain dna.rstrip() removes all trailing whitespace characters (newlines, spaces, tabs).
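A small sketch on a made-up string (not the real file) shows what rstrip actually does: it only trims characters from the right-hand end, and its argument is treated as a set of characters, so "\n\r" and "\r\n" behave the same. Also worth knowing: a file opened in text mode with Python 3's default newline handling already has \r\n line endings translated to \n, so leftover \r characters are less likely than one might expect.

```python
# Made-up FASTA-like text: a header line and three short sequence lines.
raw = ">demo\nACGT\nACGT\nAC\n"

# rstrip() only trims characters from the right-hand end of the string.
stripped = raw.rstrip("\n\r")
print(repr(stripped))        # '>demo\nACGT\nACGT\nAC'
print(stripped.count("\n"))  # 3 interior newlines are still there

# The argument is a set of characters, so the order does not matter.
print(raw.rstrip("\r\n") == stripped)  # True

# A plain rstrip() trims any trailing whitespace, with the same limitation.
print(raw.rstrip() == stripped)  # True
```

So len() on the stripped string still counts every interior newline (and the header).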
You are reading the whole file into a single string, from which only the trailing newline is stripped. Since newline characters are interspersed throughout the sequence, it is easier to replace all of them:
dna1 = dna.replace('\n', '')
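One caveat: dna.replace('\n', '') still leaves the header line glued to the front of the sequence, so its length includes the header characters. A minimal sketch that skips header lines instead, assuming the file holds a single record (several records would simply be concatenated). The helper names sequence_from_lines and read_fasta_sequence are hypothetical, not part of any library:

```python
def sequence_from_lines(lines):
    """Join the sequence lines of a single-record FASTA text.

    Header lines (starting with '>') and blank lines are skipped; strip()
    removes the newline plus any stray carriage returns, spaces or tabs.
    """
    parts = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        parts.append(line)
    return "".join(parts)


def read_fasta_sequence(path):
    """Open a FASTA file and return its sequence without header or newlines."""
    with open(path) as handle:
        return sequence_from_lines(handle)


# Demo on made-up lines; for a real file use read_fasta_sequence(path).
demo = [">demo header\n", "ACGTAC\n", "GGCCTT\r\n", "AA \n"]
seq = sequence_from_lines(demo)
upper = seq.upper()
print(len(seq))                               # 14
print(upper.count("C") + upper.count("G"))    # 7
```

Passing the file handle straight to sequence_from_lines reads the file line by line instead of loading it all at once. If Biopython is installed, Bio.SeqIO.read(path, "fasta") performs the same parsing, and len(record.seq) gives the sequence length.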