Allele-Specific Expression (ASE) from RNA-seq data


Hey all,
I have a few samples of RNA-seq data (Illumina paired-end, 161 bp) on which I need to run allele-specific expression testing.
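For context, ASE testing at a single heterozygous site usually comes down to asking whether the reference and alternate read counts depart from a 50:50 split. Below is a minimal sketch, assuming an exact two-sided binomial test against p = 0.5. This is one common choice, not necessarily what this assignment requires. The counts are assumed to be something like the refCount/altCount columns that ASEReadCounter reports.

```python
from math import comb


def binom_two_sided_p(ref_count: int, alt_count: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value for ref_count reads out of ref + alt.

    Sums the probabilities of every outcome that is no more likely than the
    observed one (a convention similar to scipy.stats.binomtest).
    """
    n = ref_count + alt_count
    if n == 0:
        return 1.0  # no reads at the site: nothing to test
    probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = probs[ref_count]
    # A small relative tolerance keeps floating-point ties in the tail sum.
    tail = sum(q for q in probs if q <= observed * (1 + 1e-7))
    return min(1.0, tail)


if __name__ == "__main__":
    # Hypothetical site with 30 reference and 10 alternate reads.
    print(binom_two_sided_p(30, 10))
```

In practice this would be applied per site and followed by multiple-testing correction. Reference-mapping bias also tends to shift the null away from exactly 0.5, so treat p = 0.5 as a simplifying assumption.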

Now, I have done some standard RNA-seq work before (DGE, GO enrichment, etc.) but am completely clueless about the current assignment. It was suggested that I use ASEReadCounter from GATK. Going over the documentation, I saw that it needs both a BAM file (which I have, after aligning my reads with STAR) and a VCF file containing the sites to be processed. Since I do not have any genotype or DNA-seq data for my samples, I tried downloading the 1000 Genomes sites VCF (called on hg38). Here's the link
and here's the file from the directory
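To make the inputs concrete, here is a sketch that only assembles a GATK4 ASEReadCounter command line from the three inputs described above plus an output path. It uses the standard GATK short flags -R, -I, -V and -O. All file names are hypothetical placeholders, and the command is printed rather than executed.

```python
def ase_read_counter_cmd(reference: str, bam: str, sites_vcf: str, output: str) -> list[str]:
    """Build (but do not run) a GATK4 ASEReadCounter command as an argv list."""
    return [
        "gatk", "ASEReadCounter",
        "-R", reference,  # reference FASTA; GATK expects .fai and .dict files next to it
        "-I", bam,        # STAR-aligned BAM, coordinate-sorted and indexed
        "-V", sites_vcf,  # sites VCF; non-heterozygous sites are skipped with a warning
        "-O", output,     # per-site allele count table
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = ase_read_counter_cmd(
        "GRCh38.primary_assembly.genome.fa",
        "sample.Aligned.sortedByCoord.out.bam",
        "sites.vcf.gz",
        "sample.ase.table",
    )
    print(" ".join(cmd))
```

Keeping the command as a list makes it easy to pass to `subprocess.run(cmd, check=True)` on a machine where `gatk` is on the PATH.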

I ran GATK ASEReadCounter with this VCF, using as the reference the hg38 primary_genome FASTA file downloaded from GENCODE release 36. The output gives the following warning:

“Ignoring site: variant is not het at position: chr1:(numbers)”.

for every position in the VCF file. This is likely because the VCF contains no genotype information (no GT field).
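A quick way to confirm this diagnosis is to look at the VCF's #CHROM header line. A sites-only VCF has just the eight fixed columns (CHROM through INFO). A VCF with genotypes adds a FORMAT column plus one column per sample. The sketch below checks the header only. It assumes a plain or bgzipped file, which Python's `gzip` module can read, and it does not verify that GT actually appears in each record.

```python
import gzip
from typing import Iterable


def header_has_genotypes(lines: Iterable[str]) -> bool:
    """True if the #CHROM line has a FORMAT column and at least one sample column."""
    for line in lines:
        if line.startswith("#CHROM"):
            columns = line.rstrip("\n").split("\t")
            # 8 fixed columns, then FORMAT (index 8), then one column per sample.
            return len(columns) >= 10 and columns[8] == "FORMAT"
    raise ValueError("no #CHROM header line found")


def vcf_has_genotypes(path: str) -> bool:
    """Open a .vcf or (b)gzipped .vcf.gz file and inspect its header."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return header_has_genotypes(handle)
```

Calling `vcf_has_genotypes` on a 1000 Genomes sites file (path hypothetical) would be expected to return False, matching the warning above.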

I would be thankful if someone could answer these questions:

  1. Is it possible to do ASE with only RNA-seq data, as I have, and no DNA data? If so, it would be really helpful if you could point me to a link or workflow.

  2. Would it work if I simply add a GT field to the sites VCF and populate it with 0/1 for every site? This would trick the program into treating all sites as heterozygous, so it might then work as intended. Is this logic sound, or do I need genotype information for my samples for ASE to work?
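For reference, the edit proposed in question 2 would mechanically look something like the sketch below. It assumes a sites-only input VCF and uses a hypothetical sample name. Whether ASEReadCounter then behaves as hoped is untested here. Note what the relabeling cannot do: it does not make a site heterozygous in the sample. A site where the sample is really homozygous would show all reads on one allele, which looks the same as complete allelic imbalance.

```python
from typing import Iterable, Iterator

GT_HEADER = '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">'


def force_het_genotypes(lines: Iterable[str], sample: str = "SAMPLE") -> Iterator[str]:
    """Turn a sites-only VCF into a one-sample VCF with every site called 0/1.

    Caution: this only relabels sites. Multi-allelic records also get 0/1,
    i.e. REF versus the first ALT allele only.
    """
    has_gt_header = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##"):
            if line.startswith("##FORMAT=<ID=GT,"):
                has_gt_header = True
            yield line
        elif line.startswith("#CHROM"):
            # Meta-information lines must precede #CHROM, so add GT here.
            if not has_gt_header:
                yield GT_HEADER
            yield f"{line}\tFORMAT\t{sample}"
        elif line:
            yield f"{line}\tGT\t0/1"  # same placeholder genotype for every record
```

Writing the generator's output line by line (with newlines) to a new file would produce the modified VCF, which would then need to be bgzipped and indexed before GATK can use it.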

Thanks a lot,





