plberry (Kansas City, Missouri, USA), 2 hours ago

I am trying to generate an index of ncRNAs for use with the STAR aligner. I downloaded the long non-coding RNA (lncRNA) GTF and FASTA files from GENCODE: www.gencodegenes.org/human/

When I run STAR in genomeGenerate mode:

STAR --runThreadN 10 --runMode genomeGenerate --genomeDir . --genomeFastaFiles gencode.v36.lncRNA_transcripts.fa --sjdbGTFfile gencode.v36.long_noncoding_RNAs.gff3 --sjdbOverhang 100 --limitGenomeGenerateRAM 34068260906 --sjdbGTFtagExonParentTranscript transcript_id

I get this error:
Fatal INPUT FILE error, no valid exon lines in the GTF file: gencode.v36.long_noncoding_RNAs.gtf
Solution: check the formatting of the GTF file. Most likely cause is the difference in chromosome naming between GTF and FASTA file.

I then tried the ncRNA GTF and FASTA files from Ensembl: www.ensembl.org/info/data/ftp/index.html

I ran STAR in genomeGenerate mode:

STAR --runThreadN 10 --runMode genomeGenerate --genomeDir . --genomeFastaFiles Homo_sapiens.GRCh38.ncrna.fa --sjdbGTFfile Homo_sapiens.GRCh38.102.gtf --sjdbOverhang 100 --limitGenomeGenerateRAM 40764467242

And got the same error:
Fatal INPUT FILE error, no valid exon lines in the GTF file: Homo_sapiens.GRCh38.102.gtf
Solution: check the formatting of the GTF file. Most likely cause is the difference in chromosome naming between GTF and FASTA file.
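
Since the error message points at a naming difference between the GTF and the FASTA, one way to check this is to compare the sequence names in the FASTA headers with the first column of the exon lines in the annotation. Below is a minimal Python sketch of that check. It is not part of STAR. The function names are made up for illustration, the toy records are invented, and it assumes that the sequence name is the header text after ">" up to the first whitespace.

```python
import io


def fasta_names(handle):
    """Collect sequence names: the text after '>' up to the first whitespace."""
    names = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def annotation_seqids(handle):
    """Collect column 1 of every exon line in a tab-separated GTF/GFF3 file."""
    seqids = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 9 and cols[2] == "exon":
            seqids.add(cols[0])
    return seqids


def shared_names(fasta_handle, annotation_handle):
    """Names present both as FASTA sequences and as exon seqids."""
    return fasta_names(fasta_handle) & annotation_seqids(annotation_handle)


if __name__ == "__main__":
    # Toy records (invented for illustration): a transcript-style FASTA header
    # and an exon line whose first column is a chromosome name.
    toy_fasta = io.StringIO(">ENST00000000001.1|ENSG00000000001.1|\nACGTACGT\n")
    toy_gtf = io.StringIO(
        'chr1\tHAVANA\texon\t100\t200\t.\t+\t.\t'
        'gene_id "ENSG00000000001.1"; transcript_id "ENST00000000001.1";\n'
    )
    overlap = shared_names(toy_fasta, toy_gtf)
    print("shared sequence names:", sorted(overlap))
```

For real files, pass open(path) handles instead of the toy StringIO objects. If the overlap is empty, no exon line refers to a sequence in the FASTA, which would be consistent with the error above: a transcript FASTA is named by transcript ID, while the annotation is named by chromosome.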

I am clearly missing something about how to build this index, but after going through the STAR documentation and searching for the error I have come up empty. My end goal is to remove all reads in the sequencing file that map to ncRNAs, so if I am going about this the wrong way, please let me know.

modified 1 hour ago by GenoMax (95k), written 2 hours ago by plberry (10)