gravatar for aranyak111

2 hours ago by

I am trying to use a pipeline for RNA seq which has been used by one of my colleagues. I have got in total of 8 files. These files are 8 count files of RNA seq data. I am using less command

less wth1count.txt
less wth2count.txt
less wth3count.txt
less wth4count.txt
less 99h1count.txt
less 99h2count.txt
less 99h3count.txt
less 99h4count.txt

and to count the number of lines for each of the 8 files the command that I use is

wc -l wth1count.txt
wc -l wth2count.txt
wc -l wth3count.txt
wc -l wth4count.txt
wc -l 99h1count.txt
wc -l 99h2count.txt
wc -l 99h3count.txt
wc -l 99h4count.txt

My subsequent goal is to filter out reads that are lower than 5 and for that I am using python scripts

cat filtercounts.py

#!/usr/bin/python

#Load relevant modules

import sys

#Read in the HTseq gene count output files for each of the 8 samples

inputlist = [sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4], sys.argv[5], sys.argv[6], sys.argv[7], sys.argv[8]]  

##Determine the genes with sufficient coverage
#Set up a variable to hold genes with sufficient coverage in any of the 8 samples

genelist = []

#For each HTseq file

for i in range(len(inputlist)):
    #Open it
    inputfile = open(inputlist[i], 'r')

    #For each line
    for line in inputfile:
        #Determine the number of reads/transcripts for this gene
        splitline = line.strip().split('t')
        #Does it pass the threshold?
        if int(splitline[1]) >= 5: 
            #If so, add it to list
            if not splitline[0] in genelist:
                genelist.append(splitline[0])
    #Close the file
    inputfile.close()

#Write out the list of sufficiently expressed genes
outputlist = open(sys.argv[9], 'w') 

for gene in genelist:
    outputlist.write(gene + 'n')

outputlist.close()

##Filter each of the HTseq files for only sufficiently expressed genes across our samples
#For each HTseq file
for i in range(len(inputlist)):
    #Open it
    inputfile = open(inputlist[i], 'r')
    #Open an output 'filtered' file
    outputfile = open(sys.argv[i+10], 'w')
    #For each line
    for line in inputfile:
        #Determine the gene
        splitline = line.strip().split('t')
        #Is it in our list?
        if splitline[0] in genelist:
            #If so, write out the original line
            outputfile.write(line)
    #Close the files
    inputfile.close()
    outputfile.close()

Problem - When I am trying to execute the script I am getting the error "index list out of range"

I am also not sure where in the script I can feed the string of files as input?

Any help will be useful.



Source link