I'm trying to do a pangenome analysis, and I'm getting inconsistent results from Roary. I don't really expect the Roary pipeline to be fully deterministic, but the results vary enough that I'd really appreciate some feedback. I don't have much experience with genomics and would be very grateful to hear from more experienced people.

To learn how to do the analysis, I started by following a tutorial. I followed the steps exactly as described, except that I added a --prefix flag to the Prokka command. I first ran the analysis on my laptop, and the automatically generated plots are slightly different from the plots shown in the tutorial. I then re-ran the exact same analysis on our cluster (I'm eventually going to do this for hundreds of genomes) and got results that differ slightly from both my laptop run and the tutorial. I've attached two of the plots to give an idea.

The results are not enormously different: roughly 4492 vs 4464 gene clusters and 2475 vs 2454 shell genes. Is this kind of variance normal, acceptable, or what you'd expect? Should it worry me, and should I look into it further? I can imagine multiple places in the pipeline that could cause this kind of variation, but I don't know what the best practice is here.
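
To put those numbers in perspective, here is a quick sketch of the relative differences. It uses only the counts quoted above; the helper name `relative_difference` is something I made up for illustration, and measuring against the larger of the two counts is just one reasonable convention.

```python
def relative_difference(a, b):
    """Return |a - b| as a percentage of the larger of the two values."""
    return abs(a - b) / max(a, b) * 100


# Counts from the two runs, as quoted in this post.
counts = {
    "gene clusters": (4492, 4464),
    "shell genes": (2475, 2454),
}

for label, (first, second) in counts.items():
    pct = relative_difference(first, second)
    print(f"{label}: {first} vs {second} -> {pct:.2f}% difference")
```

Both differences come out below 1% (about 0.62% for gene clusters and 0.85% for shell genes), which is the scale of variation I'm asking about.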

Thanks for your help!


