Asked 2 hours ago by Shadi.nayeri

Hello All,

I know this question has been answered a couple of times, but I am still confused about how the indexing should be done.

I have RNA-seq data from two conditions. I am planning to identify both DE mRNAs and DE lncRNAs, using HISAT2 for alignment.

To identify DE lncRNAs from RNA-seq data, I know that I should use a GTF file from the GENCODE website.
Here is what I did, in order:

I have two GTF files:

  1. known_lncRNA.gtf (obtained from GENCODE)
  2. gencode.v35.annotation.gtf (obtained from GENCODE)

To identify known DE lncRNAs, I performed the following steps:

  • Make an index by first extracting the splice sites from the known_lncRNA.gtf file:

    hisat2_extract_splice_sites.py known_lncRNA.gtf > known_lncRNA_splicSite.ss

  • Extract the exons from the whole GTF file (or should I use known_lncRNA.gtf here instead of gencode.v35.annotation.gtf?):

    hisat2_extract_exons.py gencode.v35.annotation.gtf > genome.exon

  • Then build the index:

    hisat2-build -p 16 --exon genome.exon --ss known_lncRNA_splicSite.ss genome.fa ./genome_tran

Is this the correct way to build an index specifically for lncRNAs?
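
For comparison, here is a minimal sketch of the same three steps with both the splice sites and the exons taken from a single annotation file. It assumes that the comprehensive GENCODE annotation already contains the lncRNA transcripts, which is worth verifying for your release. index_commands is a hypothetical helper name, the file names are the ones from the post, and the function only prints the commands rather than running them.

```shell
#!/usr/bin/env bash
# Sketch: print the HISAT2 index-building commands for one annotation file,
# so that splice sites and exons come from the same GTF.
# index_commands is a hypothetical helper; paths containing spaces are not handled.
index_commands() {
    local gtf="$1" genome="$2" prefix="$3" threads="${4:-16}"
    echo "hisat2_extract_splice_sites.py $gtf > $prefix.ss"
    echo "hisat2_extract_exons.py $gtf > $prefix.exon"
    echo "hisat2-build -p $threads --ss $prefix.ss --exon $prefix.exon $genome $prefix"
}

# Print the commands for review; append "| bash" to execute them.
index_commands gencode.v35.annotation.gtf genome.fa genome_tran
```

Printing first makes it easy to confirm that the same GTF feeds both extraction scripts before committing to a long index build.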

I then ran the following steps:
1. QC of the reads and adapter removal
2. Alignment with HISAT2
3. featureCounts
4. DESeq or edgeR

Also, for the featureCounts step, should I use the combined GTF file (known_lncRNA.gtf + gencode.v35.annotation.gtf) or just known_lncRNA.gtf?
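
One way to inform this choice is to check which gene types the comprehensive annotation already contains. The following is a minimal sketch, assuming GENCODE-style attributes (gene_type "..." in column 9) and tab-separated columns. count_gene_types is a hypothetical helper name, and the attribute values may differ between releases.

```shell
#!/usr/bin/env bash
# Sketch: tabulate gene_type values for "gene" records in a GTF.
# Assumes GENCODE-style attributes in column 9 and tab-separated fields.
count_gene_types() {
    local gtf="$1"
    awk -F'\t' '
        $0 !~ /^#/ && $3 == "gene" {
            if (match($9, /gene_type "[^"]+"/)) {
                # drop the 11-character prefix (gene_type, space, quote) and the closing quote
                type = substr($9, RSTART + 11, RLENGTH - 12)
                n[type]++
            }
        }
        END { for (t in n) print n[t] "\t" t }
    ' "$gtf" | sort -k1,1nr -k2,2
}

# Run only if the annotation file from the post is present.
if [ -f gencode.v35.annotation.gtf ]; then
    count_gene_types gencode.v35.annotation.gtf
fi
```

If lncRNA shows up in the tally, those genes are already annotated in that file, and concatenating the two GTFs would likely count them twice.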

I would really appreciate any hints, as I am stuck at this step.


