I'd like to try to reconstruct a plausible structure for the C-terminal end of the SARS-CoV-2 spike protein. The existing structures in the PDB (6VXX, 6VYB, 6VSB) only cover the N-terminal part up to residue 1147 or so; the C-terminal end is unknown, but presumably includes a "stalk" and a membrane anchor plus a short inside-virion region.
Here is the partial info that is known:
2FXP is a structure for the heptad repeat 2 (HR2) domain from CoV-1; it includes residues 1141-1193. It is a symmetrical trimeric helix bundle; the structure comes from solution NMR.
2RUN is a structure for the near-membrane region, also from CoV-1; it includes residues 1185-1202. It is not a trimer; the structure comes from NMR in the presence of lipid micelles. This overlaps the 2FXP structure but doesn't necessarily agree with it.
The transmembrane domain (1195-1219 in CoV-1, 1213-1237 in CoV-2) is predicted to be a simple helix by various structure prediction tools like I-TASSER, QUARK, etc. It is probably a trimeric helix bundle inserted into the membrane, but the relative orientation of the helices, their tilt angle, etc. are unknown. Again, this overlaps the 2RUN structure.
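To make the overlaps concrete, here is a small bookkeeping sketch in plain Python. It only uses the residue ranges quoted above; the function names are made up for illustration, and the constant numbering offset between CoV-1 and CoV-2 is an assumption read off the two transmembrane ranges (it would not hold if there were insertions or deletions between the fragments).

```python
# Fragment ranges as quoted above, in SARS-CoV-1 numbering (inclusive).
FRAGMENTS = {
    "HR2 (2FXP)": (1141, 1193),
    "near-membrane (2RUN)": (1185, 1202),
    "TM helix (predicted)": (1195, 1219),
}

# Assumption: a constant offset between CoV-1 and CoV-2 numbering,
# taken from the TM ranges quoted above (1195 -> 1213).
COV1_TO_COV2_OFFSET = 1213 - 1195


def overlap(a, b):
    """Return the inclusive overlap of two ranges, or None if disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


def to_cov2(residue_range, offset=COV1_TO_COV2_OFFSET):
    """Shift a CoV-1 range into CoV-2 numbering (assumes no indels)."""
    return (residue_range[0] + offset, residue_range[1] + offset)


def pairwise_overlaps(fragments):
    """Map each overlapping pair of fragment names to the shared range."""
    names = list(fragments)
    shared = {}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            common = overlap(fragments[first], fragments[second])
            if common is not None:
                shared[(first, second)] = common
    return shared


if __name__ == "__main__":
    for (first, second), (start, end) in pairwise_overlaps(FRAGMENTS).items():
        n = end - start + 1
        print(f"{first} / {second}: {start}-{end} ({n} residues)")
```

The shared residues are where the fragments would have to be superimposed or reconciled, so they are also where any disagreement between the structures will show up.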
How does one assemble a protein structure from known fragments like this?
What tools would be best? What is the workflow/pipeline? How to tell that the structure is good quality?
There are a few aspects of this problem which make it quite hard, and also not exactly easy to describe as any standard workflow like "template based folding" or "ab initio folding" or "docking":
The partial structures overlap, but don't necessarily agree - and they may not even be correct, for example the near-membrane region may have a different fold in the context of a whole protein. How do I build a structure from several partial structures?
This has to reconstruct a trimer - it should dock well with itself! Symmetric "fold-and-dock" in Rosetta would seem to be a good start, except I don't know how to tell it about the known parts of the structure.
The transmembrane region should be docked in a way that makes sense for the interior of the membrane (of a known thickness / orientation). I've tried docking just the transmembrane helices in Rosetta but they often end up being much more tilted than is realistic, so they wouldn't span the full thickness of the membrane.
Ideally I'd like to keep the HR2 structure fixed, initialize the near-membrane region from the known structure but leave it flexible, and dock the helices using that pose as a starting point (with the constraint that they have to be helices, have to form a trimer, should bind tightly to each other, and should span the membrane).
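As a rough sanity check on the tilt problem, the geometry can be sketched in a few lines of Python. This assumes an ideal straight alpha helix with a rise of 1.5 Å per residue and treats the membrane as a slab; the 30 Å hydrophobic thickness used in the example is an assumed illustrative value, not a measured one for the viral envelope, and the function names are hypothetical.

```python
import math

RISE_PER_RESIDUE = 1.5  # angstroms along the axis of an ideal alpha helix


def helix_length(n_residues):
    """Axial length of an ideal straight alpha helix, in angstroms."""
    return n_residues * RISE_PER_RESIDUE


def membrane_span(n_residues, tilt_deg):
    """Length of the helix projected onto the membrane normal."""
    return helix_length(n_residues) * math.cos(math.radians(tilt_deg))


def max_tilt_deg(n_residues, thickness):
    """Largest tilt at which the helix still spans the slab.

    Returns None if the helix is too short to span it even when upright.
    """
    length = helix_length(n_residues)
    if length < thickness:
        return None
    return math.degrees(math.acos(thickness / length))


if __name__ == "__main__":
    n = 1237 - 1213 + 1   # 25 residues in the predicted CoV-2 TM helix
    thickness = 30.0      # assumed hydrophobic thickness, in angstroms
    print(f"helix length: {helix_length(n):.1f} A")
    print(f"max tilt to span {thickness} A: {max_tilt_deg(n, thickness):.1f} deg")
```

Under these assumptions a docked pose whose helices are tilted beyond the computed limit cannot span the membrane and could be filtered out before more expensive scoring.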
I know this is a tough problem, and I'm not yet at the level where I can handle it, but I'm learning quite a lot by trying to handle it - my Rosetta knowledge has gotten way better in just a couple of days. I'm not sure if Rosetta is the right tool either - is there something better for this?
P.S. An example of what this is expected to look like is 5V2S, a full structure of the transmembrane domain from HSV glycoprotein B, which is also a trimer and has a similar function.