I am trying to use a pipeline for RNA seq which has been used by one of my colleagues. I have got in total of 8 files. These files are 8 count files of RNA seq data. I am using less command
less wth1count.txt less wth2count.txt less wth3count.txt less wth4count.txt less 99h1count.txt less 99h2count.txt less 99h3count.txt less 99h4count.txt
and to count the number of lines for each of the 8 files the command that I use is
wc -l wth1count.txt wc -l wth2count.txt wc -l wth3count.txt wc -l wth4count.txt wc -l 99h1count.txt wc -l 99h2count.txt wc -l 99h3count.txt wc -l 99h4count.txt
My subsequent goal is to filter out reads that are lower than 5 and for that I am using python scripts
cat filtercounts.py #!/usr/bin/python #Load relevant modules import sys #Read in the HTseq gene count output files for each of the 8 samples inputlist = [sys.argv, sys.argv, sys.argv, sys.argv, sys.argv, sys.argv, sys.argv, sys.argv] ##Determine the genes with sufficient coverage #Set up a variable to hold genes with sufficient coverage in any of the 8 samples genelist =  #For each HTseq file for i in range(len(inputlist)): #Open it inputfile = open(inputlist[i], 'r') #For each line for line in inputfile: #Determine the number of reads/transcripts for this gene splitline = line.strip().split('t') #Does it pass the threshold? if int(splitline) >= 5: #If so, add it to list if not splitline in genelist: genelist.append(splitline) #Close the file inputfile.close() #Write out the list of sufficiently expressed genes outputlist = open(sys.argv, 'w') for gene in genelist: outputlist.write(gene + 'n') outputlist.close() ##Filter each of the HTseq files for only sufficiently expressed genes across our samples #For each HTseq file for i in range(len(inputlist)): #Open it inputfile = open(inputlist[i], 'r') #Open an output 'filtered' file outputfile = open(sys.argv[i+10], 'w') #For each line for line in inputfile: #Determine the gene splitline = line.strip().split('t') #Is it in our list? if splitline in genelist: #If so, write out the original line outputfile.write(line) #Close the files inputfile.close() outputfile.close()
Problem - When I am trying to execute the script I am getting the error "index list out of range"
I am also not sure where in the script I can feed the string of files as input?
Any help will be useful.