I am trying to use a pipeline for RNA seq which has been used by one of my colleagues. I have got in total of 8 files. These files are 8 count files of RNA seq data. I am using less command
less wth1count.txt
less wth2count.txt
less wth3count.txt
less wth4count.txt
less 99h1count.txt
less 99h2count.txt
less 99h3count.txt
less 99h4count.txt
and to count the number of lines for each of the 8 files the command that I use is
wc -l wth1count.txt
wc -l wth2count.txt
wc -l wth3count.txt
wc -l wth4count.txt
wc -l 99h1count.txt
wc -l 99h2count.txt
wc -l 99h3count.txt
wc -l 99h4count.txt
My subsequent goal is to filter out reads that are lower than 5 and for that I am using python scripts
cat filtercounts.py
#!/usr/bin/python
#Load relevant modules
import sys
#Read in the HTseq gene count output files for each of the 8 samples
inputlist = [sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4], sys.argv[5], sys.argv[6], sys.argv[7], sys.argv[8]]
##Determine the genes with sufficient coverage
#Set up a variable to hold genes with sufficient coverage in any of the 8 samples
genelist = []
#For each HTseq file
for i in range(len(inputlist)):
#Open it
inputfile = open(inputlist[i], 'r')
#For each line
for line in inputfile:
#Determine the number of reads/transcripts for this gene
splitline = line.strip().split('t')
#Does it pass the threshold?
if int(splitline[1]) >= 5:
#If so, add it to list
if not splitline[0] in genelist:
genelist.append(splitline[0])
#Close the file
inputfile.close()
#Write out the list of sufficiently expressed genes
outputlist = open(sys.argv[9], 'w')
for gene in genelist:
outputlist.write(gene + 'n')
outputlist.close()
##Filter each of the HTseq files for only sufficiently expressed genes across our samples
#For each HTseq file
for i in range(len(inputlist)):
#Open it
inputfile = open(inputlist[i], 'r')
#Open an output 'filtered' file
outputfile = open(sys.argv[i+10], 'w')
#For each line
for line in inputfile:
#Determine the gene
splitline = line.strip().split('t')
#Is it in our list?
if splitline[0] in genelist:
#If so, write out the original line
outputfile.write(line)
#Close the files
inputfile.close()
outputfile.close()
Problem - When I am trying to execute the script I am getting the error "index list out of range"
I am also not sure where in the script I can feed the string of files as input?
Any help will be useful.