specific k-mer count in multi fasta file

1

Hello, I'm trying to count a specific kmer in multifasta files contained in a directory with approximately 65 thousand multifasta files, I have tried with Bioconductor but not it's possible to save all data in memory, and I would like to do this task with a perl or python script. I would like to modify the next script to count the kmer occurrences in the fasta file.

#!/usr/bin/perl -w 
# Searching for motifs 
# Ask the user for the filename of the file containing 
# the protein sequence data, and collect it from the 
keyboard 
print "Please type the filename of the DNA sequence 
data: "; 
$dnafilename = <STDIN>; 
# Remove the newline from the DNA filename 
chomp $dnafilename; 
# open the file, or exit 
unless ( open(DNAFILE, $dnafilename) ) { 
    print "Cannot open file "$dnafilename"nn"; 
    exit; 
} 
# Read the dna sequence data from the file, and store 
it 
# into the array variable @protein 
@dna = <DNAFILE>; 
# Close the file - we've read all the data into @dna 
now. 
close DNAFILE; 
# Put the DNA sequence data into a single string, as 
it's easier 
# to search for a motif in a string than in an array of 
# lines (what if the motif occurs over a line break?) 
$dna = join( '', @dna); 
# Remove whitespace 
$dna =~ s/s//g; 
# In a loop, ask the user for a motif, search for the motif, 
# and report if it was found. 
# Exit if no motif is entered. 
do { 
    print "Enter a motif to search for: "; 
    $motif = <STDIN>; 
    # Remove the newline at the end of $motif 
    chomp $motif; 
    # Look for the motif 
    if ( $dna =~ /$motif/ ) { 
        print "I found it!nn"; 
    } else { 
        print "I couldn't find it.nn"; 
    } 
# exit on an empty user input 
} until ( $motif =~ /^s*$/ ); 
# exit the program 
exit;


sequence

• 1.4k views



Source link