Hi there

I have a quick question. I am working with several gigabytes of FASTA files and am trying to build a statistical analysis of protein composition (counting amino acids and k-mers, and fitting some probabilistic models), and I need some random data to compare against.

I have already written the randomization code. My question is: do I need to randomize all of the data?
The randomized data is based on the original FASTA files. The code reads the data and randomizes each sequence individually, preserving the length and amino acid composition of each sequence.

For example:

>seq 1
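
To make the idea concrete, here is a minimal sketch of the per-sequence randomization described above. It assumes that "randomization" means a plain shuffle of the residues within each sequence, which keeps the length and amino acid composition unchanged. The function names (read_fasta, shuffle_sequence, shuffle_fasta) and the seed parameter are hypothetical and only for illustration, not the actual code.

```python
import random


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def shuffle_sequence(seq, rng):
    """Return a random permutation of seq (same length, same composition)."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def shuffle_fasta(lines, seed=None):
    """Yield (header, shuffled_sequence) for every record, one at a time."""
    rng = random.Random(seed)  # seed is optional, for reproducible runs
    for header, seq in read_fasta(lines):
        yield header, shuffle_sequence(seq, rng)
```

Because the records are processed one at a time, something like shuffle_fasta(open(path)) should work on large files without loading everything into memory, and the same loop could be restricted to a subset of the records if randomizing all of the data turns out not to be necessary.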



Thank you for your time and attention!


