I am new to microRNA analysis.
I have got sequences Homo_sapiens.GRCh38.miRNA.fasta representing the latest build of genome index few of whose first lines look like this.

 >hsa-let-7a-5p MIMAT0000062 Homo sapiens let-7a-5p UGAGGUAGUAGGUUGUAUAGUU
 >hsa-let-7a-3p MIMAT0004481 Homo sapiens let-7a-3p CUAUACAAUCUACUGUCUUUC
 >hsa-let-7a-2-3p MIMAT0010195 Homo sapiens let-7a-2-3p CUGUACAGCCUCCUAGCUUUCC
 >hsa-let-7b-5p MIMAT0000063 Homo sapiens let-7b-5p UGAGGUAGUAGGUUGUGUGGUU
 >hsa-let-7b-3p MIMAT0004482 Homo sapiens let-7b-3p CUAUACAACCUACUGCCUUCCC

I converted all the Us to Ts using sed and awk scripts

The first few lines of the converted file look like this.

hsa-let-7a-5p MIMAT0000062 Homo sapiens let-7a-5p TGAGGTAGTAGGTTGTATAGTT hsa-let-7a-3p MIMAT0004481 Homo sapiens let-7a-3p CTATACAATCTACTGTCTTTC.

I have built the reference genome based on the bowtie script

/gpfs/ycga/scratch60/nicoli/ag2646/Bowtie
2.4.2/bowtie2-2.4.2-linux-x86_64/bowtie2-build Homo_sappeins.GRCH38_DNA.fasta Homo_sapiens_microRNA_sub_ref

I now want to align my trimmed FASTQ file of human microRNAs to this indexed genome .
The first few lines of my trimmed FASTQ file looks like this

@SRR8248787.1601 HWI-D00306:1090:HKVGMBCX2:1:1101:8925:2469/1 TCACAGTGAACCGGTCTCTTTAGATCGGAAGATCACACGTCT
+ DDADDHHHEDF<[email protected]@GHIHHD<<G @SRR8248787.1602 HWI-D00306:1090:HKVGMBCX2:1:1101:9104:2269/1 TGAGGTAGTAGTTTGTGCTGTTAGATCGGAAGAGCACACGTCT
+ DDDDDIIIIIIIHIIIHIIIIIHIHIIIIIIIHIIIIIIHHII @SRR8248787.1603 HWI-D00306:1090:HKVGMBCX2:1:1101:9222:2274/1

I am using Bowtie2 default scripts to align the reads that have been trimmed using BBDuk

The bowtie scripts look like this

module load  Bowtie2; bowtie2  -x Homo_sapiens_microRNA_sub_ref -U /gpfs/ycga/scratch60/nicoli/ag2646/HumanmicroRNAbowtie_verysensitive_1.06.2021./3kPa_HDF_miRNA_1.bbdukfastq
-S /gpfs/ycga/scratch60/nicoli/ag2646/HumanmicroRNAbowtie_verysensitive_1.06.2021./3kPa_HDF_miRNA_1_Gr38trimmed.sam

The log file that I get after alignment is

29980128 reads; of these:
  29980128 (100.00%) were unpaired; of these:
    29979332 (100.00%) aligned 0 times
    546 (0.00%) aligned exactly 1 time
    250 (0.00%) aligned >1 times
0.00% overall alignment rate

The scripts used to do adapter trimming was

bbduk.sh in=3kPa_HDF_miRNA_1.fastq out=3kPa_HDF_miRNA_1bbduk.fastq ref=polyA.fa.gz,truseq_rna.fa.gz k=13 ktrim=r useshortkmers=t mink=5 qtrim=r trimq=10 minlength

Any help will be useful.



Source link