Using bowtie2, I want to keep only the reads that align to the animal1 genome and not to the animal2 genome. I aligned to animal1 with high sensitivity, then converted the aligned .sam file to fastq using gatk SamToFastq. The plan was then to align that fastq to animal2 and keep the unaligned reads, which would leave only animal1 reads.
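
A minimal sketch of this two-pass filter, written as a small Python helper that only builds the two bowtie2 command lines (it does not run them). It is not the exact procedure described above: it assumes unpaired reads and uses bowtie2's `--al` and `--un` options to write aligned and unaligned reads directly as FASTQ instead of going through SamToFastq. The function name, index names, and file names are all hypothetical placeholders, and `--very-sensitive` is an assumed reading of "high sensitivity".

```python
import shlex


def two_pass_commands(reads_fq, animal1_index, animal2_index, prefix):
    """Build bowtie2 command lines for a two-pass species filter.

    Pass 1 keeps reads that align to animal1 (--al).
    Pass 2 keeps, from those, reads that do NOT align to animal2 (--un).
    All paths and index names are placeholders; unpaired reads are assumed.
    """
    aligned_fq = f"{prefix}.animal1_aligned.fastq"
    final_fq = f"{prefix}.animal1_only.fastq"

    pass1 = [
        "bowtie2", "--very-sensitive",
        "-x", animal1_index,          # bowtie2 index built from the animal1 genome
        "-U", reads_fq,               # unpaired input reads
        "--al", aligned_fq,           # write reads that aligned at least once
        "-S", f"{prefix}.animal1.sam",
    ]
    pass2 = [
        "bowtie2", "--very-sensitive",
        "-x", animal2_index,          # bowtie2 index built from the animal2 genome
        "-U", aligned_fq,             # reads kept by pass 1
        "--un", final_fq,             # write reads that failed to align
        "-S", f"{prefix}.animal2.sam",
    ]
    return pass1, pass2


pass1, pass2 = two_pass_commands("raw.fastq", "animal1_idx", "animal2_idx", "sample")
print(shlex.join(pass1))
print(shlex.join(pass2))
```

Because `--al` and `--un` write reads in the same format as the input, this route would keep the original read headers.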

The problem is that bowtie2 throws an error when I try to align this fastq file to animal2:

Error reading RefRecord offset from FILE

Looking at github.com/BenLangmead/bowtie2/blob/master/ref_read.h, the message is printed here:

RefRecord(FILE *in, bool swap) {
    assert(in != NULL);
    if(!fread(&off, OFF_SIZE, 1, in)) {
        cerr << "Error reading RefRecord offset from FILE" << endl;
        throw 1;
    }
    if(swap) off = endianSwapU(off);
    if(!fread(&len, OFF_SIZE, 1, in)) {
        cerr << "Error reading RefRecord offset from FILE" << endl;
        throw 1;
    }
    if(swap) len = endianSwapU(len);
    first = fgetc(in) ? true : false;
}

The issue appears to be that the header in the output .sam contains only the read ID (shown below after conversion to fastq):

@SRR10111187.1.1
GTTTATTAGTACGTTGAGGTTGTGATCCGGAGTTTTCGGGGTATGGGCAACCTAGCTTGCTTAGCTGACCTTATTATAGATTGTGGTGTAAGTT

But bowtie2 seems to want additional information, not just the ID.

Can I just take the original raw fastq file, look up each matching read ID, copy the rest of the read header from it, and use that to replace the header in the fastq file produced from the bowtie2 output?

E.g.:

@SRR10111187.1.1 1 length=150
GTTTATTAGTACGTTGAGGTTGTGATCCGGAGTTTTCGGGGTATGGGCAACCTAGCTTGCTTAGCTGACCTTATTATAGATTGTGGTGTAAGTT

I presume this is valid since, as far as I know, bowtie2 does not alter the reads?
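
The header-restoration idea could be sketched roughly as follows. This is only an illustration under stated assumptions: both files use standard 4-line FASTQ records, and the ID in the filtered file is identical to the first whitespace-delimited token of the raw header. The function names are hypothetical, and the sequence and quality strings in the demo are toy data. Whether restoring the headers makes the bowtie2 error go away is a separate question that this sketch does not answer.

```python
import itertools


def read_id(header):
    """Return the read ID: text after '@' up to the first whitespace."""
    return header[1:].split()[0]


def full_headers(raw_lines):
    """Map read ID -> full header line, reading every 4th line of the raw FASTQ."""
    headers = {}
    for header in itertools.islice(raw_lines, 0, None, 4):
        header = header.rstrip("\n")
        if not header.strip():
            continue  # tolerate a trailing blank line
        headers[read_id(header)] = header
    return headers


def restore_headers(filtered_lines, headers):
    """Yield the filtered FASTQ lines, swapping each header for the raw one.

    Headers with no match in the raw file are passed through unchanged.
    """
    for i, line in enumerate(filtered_lines):
        line = line.rstrip("\n")
        if i % 4 == 0 and line.strip():
            line = headers.get(read_id(line), line)
        yield line


# Toy records (not real data); with real files, pass open file objects instead.
raw = ["@SRR10111187.1.1 1 length=150", "ACGT", "+", "IIII"]
filtered = ["@SRR10111187.1.1", "ACGT", "+", "IIII"]

restored = list(restore_headers(filtered, full_headers(raw)))
print("\n".join(restored))
```

Only the header line of each record is touched, so the sequence and quality lines stay exactly as they are in the filtered file. The dictionary holds every raw header in memory, which may matter for very large runs.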


