Read alignment to a combined human and pathogen genome

Hi everyone,

I have FASTQ files from an RNA sequencing experiment. My samples are human cells infected with an intracellular pathogen, so I would like to align all reads to both genomes (human and pathogen). I am working on Linux and have run some standard alignments before, using STAR with an Ensembl reference genome.

I have read that it is better to perform the alignment in one step rather than in two separate steps. However, I can't figure out how to build the "hybrid" STAR reference genome; ideally, I would like a "hybrid" genome in which the pathogen sequence looks like an additional chromosome at the end of the human genome.
For a standard alignment, I would run STAR with --runMode genomeGenerate to build the reference. Can I simply give STAR a "hybrid" FASTA obtained by concatenating the human and pathogen FASTA files with the "cat" command?
What about the GTF files? How should I handle them to build the reference, and to count the aligned reads afterwards?
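
To make the question concrete, here is a minimal Python sketch of what I have in mind (from the shell, "cat" would do the same job). Everything specific in it is an assumption on my side: the file names are placeholders, the helper names concatenate and star_index_command are my own, I assume 100-base reads, and I assume the two GTF files have to be merged into one before being passed to --sjdbGTFfile. The sketch only builds and prints the STAR command, it does not run it.

```python
"""Sketch: build a combined human + pathogen reference for STAR."""


def concatenate(paths, out_path):
    """Concatenate files in order, like `cat a b > out`.

    Unlike plain `cat`, a newline is added after any file that does not end
    with one, so the first line of the next file (for example a FASTA header)
    is not glued onto the last line of the previous file.
    """
    with open(out_path, "wb") as out:
        for path in paths:
            last_byte = b"\n"
            with open(path, "rb") as src:
                # Copy in 1 MiB chunks so a multi-gigabyte FASTA
                # is never held in memory all at once.
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    out.write(chunk)
                    last_byte = chunk[-1:]
            if last_byte != b"\n":
                out.write(b"\n")


def star_index_command(genome_dir, fasta, gtf, read_length=100, threads=8):
    """Return the STAR genomeGenerate command as a list of arguments.

    read_length=100 and threads=8 are assumed defaults for illustration;
    --sjdbOverhang is set to read length minus one.
    """
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", str(genome_dir),
        "--genomeFastaFiles", str(fasta),
        "--sjdbGTFfile", str(gtf),
        "--sjdbOverhang", str(read_length - 1),
        "--runThreadN", str(threads),
    ]


# Example with hypothetical file names (human first, pathogen appended last):
# concatenate(["human.fna", "pathogen.fna"], "hybrid.fna")
# concatenate(["human.gtf", "pathogen.gtf"], "hybrid.gtf")
# print(" ".join(star_index_command("hybrid_index", "hybrid.fna", "hybrid.gtf")))
```

Putting the pathogen file last in the list is what would make it appear as an extra "chromosome" after the human sequences. Is this the right general approach?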

Note: I downloaded both FASTA files and both GTF files from NCBI, because the pathogen sequence is only available from this resource.
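
Since all four files come from NCBI, my understanding is that the sequence names in the first column of the GTF have to match the FASTA header names (the text after ">" up to the first whitespace), and that no name should occur twice in the combined FASTA. Below is a rough Python sketch of how I would check this; the function names and the file names in the example are hypothetical, and I may well be missing other things that need checking.

```python
"""Sketch: sanity-check that a FASTA and a GTF use the same sequence names."""
from collections import Counter


def fasta_names(path):
    """Return sequence names in file order: header text up to first whitespace."""
    names = []
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                fields = line[1:].split()
                names.append(fields[0] if fields else "")
    return names


def gtf_seqnames(path):
    """Return the set of names in GTF column 1, skipping comments and blanks."""
    names = set()
    with open(path) as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.split("\t", 1)[0])
    return names


def check_reference(fasta_path, gtf_path):
    """Return (duplicated FASTA names, GTF names absent from the FASTA)."""
    counts = Counter(fasta_names(fasta_path))
    duplicated = sorted(name for name, n in counts.items() if n > 1)
    missing = sorted(gtf_seqnames(gtf_path) - set(counts))
    return duplicated, missing


# Example with hypothetical file names:
# duplicated, missing = check_reference("hybrid.fna", "hybrid.gtf")
# print("duplicated FASTA names:", duplicated)
# print("GTF names not found in FASTA:", missing)
```

If both lists come back empty, I would assume the names are at least consistent between the two files.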

I am completely new to this kind of task and to the command line, so sorry if my question is badly formulated. Thanks to anyone who can help me through this!


Tags: linux, STAR, alignment
