How To Split A Bam File By Chromosome

Hello,
I am having a hard time opening a very large bedgraph. It was suggested that I split my BAM file by chromosome with chrom-bed.py, but it didn't work.
Is there an alternative?
Thanks,
GP.



In this other answer, Aaron Quinlan stated:

bamtools has a "split" command for exactly this purpose

I can only add that I've just tried it with this simple command:

bamtools split -in file.bam -reference

and it works like a charm. The BAM file gets split into separate BAM files, one per reference sequence, which are suffixed with .REF_xxx.bam by default, which is very convenient.
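If the per-reference files need to be indexed afterwards, a loop along these lines could do it. This is only a sketch: it assumes the outputs follow the file.REF_xxx.bam pattern described above and that samtools is installed; ref_name_from_split and index_split_bams are hypothetical helper names, not part of bamtools.

```shell
# Recover the reference name from a bamtools-split file name,
# e.g. /data/file.REF_chr1.bam -> chr1 (naming pattern assumed from the answer above).
ref_name_from_split() {
    local base
    base=${1##*/}        # drop any directory part
    base=${base%.bam}    # drop the .bam extension
    printf '%s\n' "${base##*.REF_}"
}

# Index every split file that belongs to the given prefix (hypothetical helper).
index_split_bams() {
    local prefix f
    prefix=$1
    for f in "$prefix".REF_*.bam; do
        [ -e "$f" ] || continue          # nothing matched the pattern
        echo "indexing reference $(ref_name_from_split "$f")"
        samtools index "$f"
    done
}
```

Calling index_split_bams file after the split command should leave an index next to each file.REF_xxx.bam.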

Try samtools: samtools view -?

A region should be presented in one of the following formats:
`chr1',`chr2:1,000' and `chr3:1000-2,000'. When a region is
specified, the input alignment file must be an indexed BAM file.

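Something like samtools view in.bam chr1 > chr1.bam should work, provided in.bam is sorted and indexed.

samtools view -b in.bam chr1 > out.bam

Use -b to output BAM format; without it, samtools view prints SAM text, so the file written by the first command would contain SAM despite its .bam name.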
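To split every chromosome this way rather than a single one, the region query can be wrapped in a loop over the sequence names in the header. A minimal sketch, assuming the input is coordinate-sorted and indexed; contigs_from_header and split_bam_by_contig are hypothetical helper names and the output naming is just one possible choice:

```shell
# Print the sequence name (SN tag) of every @SQ line in a SAM header read from stdin.
contigs_from_header() {
    awk -F '\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/) print substr($i, 4)
    }'
}

# Write one BAM per reference sequence: in.bam -> in.chr1.bam, in.chr2.bam, ...
split_bam_by_contig() {
    local bam prefix contig
    bam=$1
    prefix=${bam%.bam}
    samtools view -H "$bam" | contigs_from_header |
    while IFS= read -r contig; do
        # -b writes BAM, -o names the output file; the region query needs the index
        samtools view -b -o "$prefix.$contig.bam" "$bam" "$contig"
    done
}
```

Usage would be samtools index in.bam followed by split_bam_by_contig in.bam. Reading the SN tag with awk instead of cutting on ':' keeps contig names that themselves contain a colon intact.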


I wrote a Java tool to split a BAM per chromosome; see code.google.com/p/jvarkit/wiki/SplitBam

It also creates an empty BAM (filled with a pair of mock SAMRecords) for each chromosome in the Reference, if no SAMRecord was found for the chromosome.

You can use the following pipeline to extract the chrY reads from a raw BAM file while keeping the header:

samtools sort A.bam -o A.sort.bam
samtools index A.sort.bam
samtools view -H A.sort.bam > output.extraction.sam
samtools view A.sort.bam chrY >> output.extraction.sam
samtools view -hb output.extraction.sam > output.extraction.bam
samtools view  -H output.extraction.bam

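output.extraction.bam is the BAM file containing the extracted chrY reads together with the original header; the last command prints that header so you can check it.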
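Since BAM output from samtools view -b always carries the header, the intermediate SAM file can probably be skipped. A condensed sketch of the same pipeline, where extract_contig is a hypothetical helper name and the .sort.bam naming is an assumption copied from the commands above:

```shell
# Sort, index, then pull the reads of one contig straight into a BAM file.
# Usage: extract_contig A.bam chrY chrY.bam
extract_contig() {
    local in_bam contig out_bam sorted
    in_bam=$1
    contig=$2
    out_bam=$3
    sorted=${in_bam%.bam}.sort.bam
    samtools sort -o "$sorted" "$in_bam" &&
    samtools index "$sorted" &&
    samtools view -b -o "$out_bam" "$sorted" "$contig"
}
```

Running samtools view -H on the output afterwards should show the same header as the sorted input.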


There is also a nice blog post about this by Sam Nicholls here.

tl;dr (extracted from the blog post)

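# Take the SN:<name> field of each @SQ header line, then strip the "SN:" prefix
# (-f 2- keeps contig names that themselves contain ':').
samtools view -H in.bam | grep '^@SQ' | cut -f 2 | cut -d ':' -f 2- | while read -r contig; do
    # Quote "$contig" so names with glob characters such as '*' are passed through unchanged.
    samtools faidx reference.fa "$contig" > my_contig.fa
    java -jar picard.jar CreateSequenceDictionary R=my_contig.fa O=my_contig.dict
    java -jar picard.jar ReorderSam INPUT=in.bam OUTPUT="out-${contig}.bam" REFERENCE=my_contig.fa S=true VERBOSITY=WARNING
    rm my_contig.fa my_contig.dict
done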

This goes through the whole input BAM file (in.bam) and writes a separate BAM (out-${contig}.bam) for each contig listed in its header, using the matching sequence from the reference FASTA (reference.fa). The output BAM files are compatible with most of the common bioinformatics software. Of course, you can skip the while loop and just use the contig name of your choice instead.

samtools view -b in.bam chr1 > out.bam

did not work for me.
But

samtools view -b in.bam 1  > out.bam

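worked, presumably because the sequences in my BAM header are named 1, 2, ... rather than chr1, chr2, ...; the region name has to match an @SQ name exactly.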
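Which naming style a given file uses can be checked from its header before running the region query. A rough sketch, where header_has_contig and resolve_contig are hypothetical helper names and only the "chr" prefix difference is handled:

```shell
# Succeed if the SAM header on stdin contains an @SQ line with SN:<name>.
header_has_contig() {
    awk -F '\t' -v name="$1" '
        $1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i == "SN:" name) found = 1 }
        END { exit (found ? 0 : 1) }'
}

# Print whichever spelling of a chromosome name (as given, with "chr", without "chr")
# is actually present in the BAM header; fail if none is.
resolve_contig() {
    local bam want header cand
    bam=$1
    want=$2
    header=$(samtools view -H "$bam")
    for cand in "$want" "chr${want#chr}" "${want#chr}"; do
        if printf '%s\n' "$header" | header_has_contig "$cand"; then
            printf '%s\n' "$cand"
            return 0
        fi
    done
    return 1
}
```

With this, something like samtools view -b in.bam "$(resolve_contig in.bam chr1)" > out.bam would pick 1 or chr1 depending on the file.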

