Posted 2 hours ago by Arsenal

Hi!

In Roux et al. (2017), the authors state that they used nucmer (from MUMmer) to cluster the contigs:

' Contigs from all samples were clustered with nucmer (Delcher, Salzberg & Phillippy, 2003) at ≥95% ANI across ≥80% of their lengths, as in (Brum et al., 2015; Gregory et al., 2016), to generate a pool of non-redundant “population contigs” '
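To make the quoted criterion concrete, here is a minimal sketch of how "≥95% identity across ≥80% of the length" could be applied to pairwise alignment hits. It is only one reading of the sentence, not the authors' confirmed procedure. The assumptions: the function names and the hit tuple layout `(query, ref, q_start, q_end, identity_percent)` are hypothetical, identity is filtered per alignment block rather than averaged into a true ANI, and clustering is a greedy, longest-first assignment.

```python
from collections import defaultdict


def covered_length(intervals):
    """Total length covered by possibly overlapping 1-based inclusive intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            # Close the previous merged interval (if any) and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def cluster_contigs(lengths, hits, min_identity=95.0, min_coverage=0.80):
    """Greedy longest-first clustering (an assumed strategy, see lead-in).

    lengths: {contig_id: contig_length}
    hits:    iterable of (query, ref, q_start, q_end, identity_percent)
    Returns  {representative_id: [member_ids, representative first]}
    """
    # Collect query-side intervals of blocks that pass the identity cutoff.
    by_pair = defaultdict(list)
    for query, ref, q_start, q_end, identity in hits:
        if query != ref and identity >= min_identity:
            by_pair[(query, ref)].append((min(q_start, q_end), max(q_start, q_end)))

    # Keep pairs where enough of the query is covered by the reference.
    matches = defaultdict(set)
    for (query, ref), intervals in by_pair.items():
        if covered_length(intervals) / lengths[query] >= min_coverage:
            matches[query].add(ref)

    # Longest contigs become representatives; shorter ones join the first
    # representative they match, otherwise they start a new cluster.
    clusters = {}
    for contig in sorted(lengths, key=lambda c: (-lengths[c], c)):
        rep = next((r for r in clusters if r in matches[contig]), None)
        if rep is None:
            clusters[contig] = [contig]
        else:
            clusters[rep].append(contig)
    return clusters
```

Merging the intervals before measuring coverage avoids counting overlapping alignment blocks twice, which matters when an aligner reports several blocks for the same contig pair.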

  • I have all the contigs from all my samples (which are grouped by experimental condition) in a single de novo assembly file (produced with MEGAHIT). There is no reference; the samples come from mouse gut, so in theory they contain several (probably unknown) genomes.
  • nucmer requires at least two multi-FASTA inputs: a reference and a query. What should I use as the reference here?
  • Merge a selection of viral genomes and use that as the reference?
  • Assemble the samples/groups separately and then use one assembly as the reference?
  • Use the same assembly file as both reference and query?
  • Split the assembly file, then use one contig (maybe the largest) as the reference?
  • Anything else?
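For the option of using the same assembly file as both reference and query, the commands might look like the sketch below. Whether this matches what Roux et al. did is not stated in the quoted text, so treat it as an assumption. The helper names, the `self_vs_self` prefix, and the file name are hypothetical. `--maxmatch`, `--nosimplify`, and `-p` are nucmer options, and `-r -c -l -T -H` are show-coords options, but it is worth checking them against the help output of the installed MUMmer version.

```python
import subprocess


def self_alignment_commands(assembly_fasta, prefix="self_vs_self"):
    """Build (but do not run) the nucmer and show-coords command lines."""
    nucmer_cmd = [
        "nucmer",
        "--maxmatch",    # use all anchor matches, not only unique ones
        "--nosimplify",  # keep shadowed clusters; intended for self-alignment
        "-p", prefix,    # output is written to <prefix>.delta
        assembly_fasta,  # reference
        assembly_fasta,  # query (the same file)
    ]
    coords_cmd = [
        "show-coords",
        "-r",  # sort by reference
        "-c",  # include percent coverage columns
        "-l",  # include sequence length columns
        "-T",  # tab-delimited output
        "-H",  # no header
        f"{prefix}.delta",
    ]
    return nucmer_cmd, coords_cmd


def run_self_alignment(assembly_fasta, prefix="self_vs_self"):
    """Run both commands and return the show-coords table as text."""
    nucmer_cmd, coords_cmd = self_alignment_commands(assembly_fasta, prefix)
    subprocess.run(nucmer_cmd, check=True)
    result = subprocess.run(coords_cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

Each contig will align perfectly to itself in this setup, so self-hits have to be discarded before the identity and coverage thresholds are applied to the remaining pairs.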

Alternatively, I have already clustered the contigs with CD-HIT. Would nucmer do a better job? I can only answer that once I manage to run nucmer.

If anyone has good experience with another viromics pipeline, I would be happy to test it.
Any help will be very much appreciated. Thanks!


