Identification of plastid-derived sequences

I would like to identify plastid-derived sequences in a mitogenome.

The complete mitochondrial genome of the early flowering plant Nymphaea colorata is highly repetitive with low recombination
The methods of this paper state that, to identify plastid-derived mitochondrial sequences, the Nymphaea mitogenome was searched against the plastid genome of Nymphaea colorata (data unpublished) and a database of all plant mitogenomes simultaneously, with an e-value cut-off of 1e-6 and a word size of 7.

What are the actual steps involved in identifying plastid-derived sequences in a mitogenome? Is it a BLAST search against NCBI? How should I choose the e-value cut-off for the BLAST search? Which plant mitogenome database can be used for this?



I recommend the following:

  1. Use blastn to align a reference plastid genome to your mitochondrial genome. Plastid genomes are highly conserved, so you can take the plastid genome of any phylogenetically close species as a reference. The best choice is the plastid genome of the same species whose mitochondrial genome you are studying.
    An e-value cut-off of 1e-3 should be fine, I think. It provides enough sensitivity without producing too many false positives.
  2. Exclude matches to mitochondrial genes. Both mitochondria and plastids are descendants of bacteria, so they share some homologous genes, for example ribosomal RNA genes, which will produce matches.
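
The two steps above could be sketched roughly as follows. This is only an illustration, not a tested pipeline: the file names, the function names, and the `gene_intervals` list (1-based start/end coordinates of annotated mitochondrial genes, which you would take from your own annotation) are all assumptions. It also assumes the plastid genome is the query and the mitogenome is the subject, and that the output is in the default tabular format (`-outfmt 6`). Dropping every hit that overlaps any annotated gene is a simplification; you may want to inspect those hits by hand instead.

```python
def blastn_command(plastid_fasta, mito_fasta, evalue=1e-3,
                   out="plastid_vs_mito.tsv"):
    """Build the blastn argument list for step 1.

    Run it with subprocess.run(cmd, check=True) if BLAST+ is installed.
    File names are placeholders.
    """
    return ["blastn",
            "-query", plastid_fasta,    # reference plastid genome
            "-subject", mito_fasta,     # mitochondrial genome under study
            "-evalue", str(evalue),
            "-outfmt", "6",             # 12-column tabular output
            "-out", out]


def parse_hits(lines):
    """Parse default outfmt 6 rows; keep coordinates on the mitogenome."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        sstart, send = int(f[8]), int(f[9])
        hits.append({
            "pident": float(f[2]),
            "length": int(f[3]),
            # minus-strand hits have sstart > send, so normalise the order
            "mito_start": min(sstart, send),
            "mito_end": max(sstart, send),
            "evalue": float(f[10]),
        })
    return hits


def exclude_gene_overlaps(hits, gene_intervals):
    """Step 2: drop hits that overlap an annotated mitochondrial gene."""
    kept = []
    for h in hits:
        overlaps = any(h["mito_start"] <= g_end and g_start <= h["mito_end"]
                       for g_start, g_end in gene_intervals)
        if not overlaps:
            kept.append(h)
    return kept
```

For example, `exclude_gene_overlaps(parse_hits(open("plastid_vs_mito.tsv")), gene_intervals)` would return the hits that fall outside the annotated genes.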

You can see a similar method in the paper Mitochondrial Genome of Fagopyrum esculentum and the Genetic Diversity of Extranuclear Genomes in Buckwheat, of which I am a co-author. Plastid-to-mitochondria transfers are called "MIPTs" there. Instead of a maximum e-value criterion, we used two other criteria: a MIPT had to have sequence identity >= 90% and length >= 100 bp.
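
As a rough illustration of the identity and length criteria, here is a minimal sketch that filters blastn tabular output (`-outfmt 6`, default columns). The function name is hypothetical, and it assumes that "length" means the alignment length reported in column 4, which may differ slightly from how the length was measured in the paper.

```python
def filter_mipts(lines, min_identity=90.0, min_length=100):
    """Keep outfmt 6 rows with identity >= 90% and length >= 100 bp."""
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        pident = float(fields[2])   # column 3: percent identity
        length = int(fields[3])     # column 4: alignment length
        if pident >= min_identity and length >= min_length:
            kept.append(fields)
    return kept
```

Both thresholds must be met for a row to be kept, so a long but divergent match or a short but identical one is discarded.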

