Co-occurrence of proteins of two families

For example, I have two proteins from two different families. To check whether a given species has both, I would run two rounds of BLAST and look at the hits.

But what if I have not one species but three hundred? How can I automate this search? As output, I want a list of species and, for each species, whether each protein is present. I work with bacteria, and this co-occurrence can be a signal of interaction between these proteins. Additionally, I don't expect these proteins to co-occur in every bacterium; I want to estimate what percentage of the bacteria that have the first protein also have the second.

Sorry for the simple questions; I'm new to this field, and sometimes it's very hard to find a starting point. Thanks in advance!


proteins


BLAST


Co-occurrence

Are you doing this with code or with the web interface? You probably want to do something like this:

1) Run blastp (query A and query B vs. a database of the proteomes of the 300 species). You will get a set of hits, with one or more proteins from each strain. Each hit protein has an HSP whose E-value is below (or whose bit score is above) the threshold you set when you run the program. "Which bacteria have this protein" can be subjective, depending on how you define a homolog; in practice you are defining it through these settings.

2) At this point you will probably want to consolidate your results to get the best hit for each strain. So if a single query gives 10 hits, five to strain A (A-protein #1 to A-protein #5) and five to strain B (B-protein #1 to B-protein #5), where A and B are two of the 300 strains, you want two output sequences (the best A protein and the best B protein) for that query. The number of output sequences for a query is the "number of strains" used in step 3.

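3) The query with the most hits (max_query) can be taken as the reference for co-occurrence, unless you have a specific protein in mind for this. Then % co-occurrence = number of strains with a hit to both queries / number of strains with a hit to the reference query x 100. If every strain that has the other query also has max_query, this reduces to number of strains (other query) / number of strains (max_query) x 100.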
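The three steps above can be sketched in Python roughly as follows. This is only a minimal illustration, not a tested pipeline: it assumes blastp was run with tabular output (-outfmt 6, whose default columns end in evalue and bitscore), and that each subject ID in the database was named as "strain|protein_id" so that the strain can be read back from the hit. That naming scheme, the E-value cutoff of 1e-5, the function names, and the toy hit table are all assumptions made up for the example.

```python
import csv
from collections import defaultdict

# Toy BLAST tabular output (-outfmt 6 default columns):
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
# Subject IDs follow a hypothetical "strain|protein_id" convention.
EXAMPLE = """\
protA\tstrain1|p001\t90.0\t200\t20\t0\t1\t200\t1\t200\t1e-50\t300
protA\tstrain1|p002\t40.0\t180\t100\t2\t1\t180\t5\t185\t1e-10\t80
protA\tstrain2|p010\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-45\t280
protB\tstrain1|p050\t70.0\t150\t45\t0\t1\t150\t1\t150\t1e-30\t200
protB\tstrain3|p077\t30.0\t100\t70\t1\t1\t100\t1\t100\t0.5\t30
"""


def best_hits_per_strain(rows, max_evalue=1e-5):
    """Steps 1-2: filter hits by E-value, keep the top-scoring hit per strain.

    Returns {query: {strain: (protein_id, bitscore)}}.
    """
    best = defaultdict(dict)
    for row in rows:
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # not counted as a homolog under this cutoff
        strain, _, protein = subject.partition("|")
        current = best[query].get(strain)
        if current is None or bitscore > current[1]:
            best[query][strain] = (protein, bitscore)
    return best


def cooccurrence_percent(best, reference, other):
    """Step 3: percent of strains with the reference query that also have the other."""
    ref_strains = set(best.get(reference, {}))
    if not ref_strains:
        return 0.0
    both = ref_strains & set(best.get(other, {}))
    return 100.0 * len(both) / len(ref_strains)


rows = list(csv.reader(EXAMPLE.splitlines(), delimiter="\t"))
best = best_hits_per_strain(rows)

# Presence/absence per strain (only strains with at least one accepted hit appear).
strains = sorted({s for hits in best.values() for s in hits})
for strain in strains:
    print(strain, "protA" in best and strain in best["protA"],
          "protB" in best and strain in best["protB"])

print(cooccurrence_percent(best, "protA", "protB"))
```

For real data you would read the BLAST output file with csv.reader instead of the toy string. Strains with no accepted hit to either query never show up in the hit table, so to report all 300 species you would start from your own list of strain names.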

