The biggies are obviously DESeq2, limma and edgeR, but they are massive packages implementing some very complex statistics, and their dependency trees would also need to be considered.

Depending on your background, you might want to look into the rtracklayer/GenomicRanges ecosystem. While I personally am not a fan, I know they are very popular, and AFAIK no standard for genomic features has arisen in Python (we have our own classes for dealing with GTF/BED files). The other thing that R has and Python doesn't (AFAIK) is tools for creating genomic graphics: there are no Gviz or ggbio equivalents.
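
To give a sense of the core of what a GenomicRanges-style port would have to provide, here is a minimal sketch of a range container plus a `findOverlaps`-style query in Python. The names `GRange` and `find_overlaps` are hypothetical, not from any existing package. It assumes 1-based, closed coordinates (the GenomicRanges convention) and treats `*` strand as compatible with either strand. It uses a naive all-pairs loop; a real implementation would need an indexed structure such as an interval tree to scale.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class GRange:
    """Hypothetical minimal genomic range: 1-based, closed [start, end]."""
    seqname: str
    start: int
    end: int
    strand: str = "*"  # "+", "-" or "*" (unknown / either)


def _strands_compatible(a: str, b: str) -> bool:
    # "*" matches anything; otherwise strands must be identical
    return a == "*" or b == "*" or a == b


def find_overlaps(query: List[GRange], subject: List[GRange],
                  ignore_strand: bool = False) -> List[Tuple[int, int]]:
    """Return (query_index, subject_index) pairs for every overlapping pair.

    Naive O(len(query) * len(subject)) scan, for illustration only.
    """
    hits = []
    for qi, q in enumerate(query):
        for si, s in enumerate(subject):
            if q.seqname != s.seqname:
                continue
            if not ignore_strand and not _strands_compatible(q.strand, s.strand):
                continue
            # closed intervals overlap if each starts before the other ends
            if q.start <= s.end and s.start <= q.end:
                hits.append((qi, si))
    return hits
```

For example, querying peaks against genes returns index pairs, in the same spirit as the `Hits` object GenomicRanges gives back:

```python
genes = [GRange("chr1", 100, 200, "+"), GRange("chr1", 300, 400, "-")]
peaks = [GRange("chr1", 150, 160), GRange("chr1", 250, 299)]
find_overlaps(peaks, genes)  # [(0, 0)]
```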

As people have already mentioned, another big thing missing from Python is all the annotation-related packages in Bioconductor (the AnnotationDbi packages and biomaRt). These might also be more manageable for a BSc thesis than some of the giant packages mentioned above.

Finally, while pandas and dplyr are quite good at matching each other's features, I think there is functionality in packages like tidyr that isn't in pandas.
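
One example is `tidyr::complete()`, which turns implicit missing combinations into explicit rows in a single call. A rough pandas equivalent has to be assembled by hand, as in this sketch. The helper name `complete` is hypothetical, and it assumes each combination of the key columns appears at most once (otherwise `reindex` raises on the duplicate labels), which tidyr itself does not require.

```python
from typing import List, Optional

import pandas as pd


def complete(df: pd.DataFrame, cols: List[str],
             fill_value: Optional[object] = None) -> pd.DataFrame:
    """Hypothetical, simplified pandas analogue of tidyr::complete().

    Adds a row for every combination of the unique values in `cols`,
    assuming each existing combination occurs only once.
    """
    # sorted unique values of each key column
    levels = [sorted(df[c].unique()) for c in cols]
    full_index = pd.MultiIndex.from_product(levels, names=cols)
    # reindexing against the full product inserts NaN rows for missing combos
    out = df.set_index(cols).reindex(full_index).reset_index()
    if fill_value is not None:
        out = out.fillna(fill_value)
    return out


counts = pd.DataFrame({"sample": ["A", "A", "B"],
                       "gene": ["g1", "g2", "g1"],
                       "count": [5, 3, 7]})
print(complete(counts, ["sample", "gene"], fill_value=0))
```

Note that the filled column comes back as float, because `reindex` introduces NaN before `fillna` runs; that is the kind of small friction that makes the tidyr version feel nicer.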

In fact, the thing that brings me back to R again and again, apart from things like DESeq2, is ggplot. I've tried the Python plotting libraries, and I just don't love any of them as much as I love ggplot. I wouldn't recommend trying to port it, though. There are at least two ports out there already that haven't really succeeded, and it's unlikely that any port ever will: RStudio has a whole team of coders employed to maintain ggplot, and a one-off port is never going to compete.