Getting comfortable with protein domains

Protein domains

Domains are compact arrangements of folded chains. From a purely artistic perspective, you can think of a domain as a major substructure (a chunk) of the overall protein. A domain stands apart from the rest of the structure. If a protein were a human body, the head would be one of its domains, the trunk another, the left arm another, and so on. Some proteins have a single domain; others have many. A chain sometimes folds into a single domain, sometimes into multiple domains. Myoglobin (an oxygen-binding protein that is abundant in muscle) is a single chain and a single domain. An IgG molecule (one of the main kinds of antibody molecule in the blood) includes four chains, two heavy and two light, folded into twelve domains.

Myoglobin by Geis

Antibody structure by Geis

One of the great discoveries of the past decade is the conservation of domains across all of biology. The biological world includes a few hundred domains as the canonical elements that account for the structures and functions of essentially all of the many thousands of existing proteins. Long ago, nature evidently discovered a set of compact machines (domains) and has used them creatively in assorted mix-and-match combinations. This is amazing: the many thousands of known protein structures (they can all be looked up in the Protein Data Bank, the PDB!) fold into just a few hundred generic protein domains.

  • For the artist, one very helpful exercise is to make 2-D sketches of 3-D proteins. A 2-D “topological diagram” can serve as a quick and easy surrogate for a rotatable 3-D computer model when you are trying to make sense of how a chain travels through a molecule. Below are examples from Jane Richardson. She is deservedly credited as the inventor of 2-D topological sketches of proteins. Note how her topology diagrams readily highlight the differences between superficially similar alpha-beta class proteins:

  • An artist can take advantage of 2-D topologies while attempting to depict the complicated multi-domain and oligomeric structure of big protein molecules. Whichever depiction you have in mind, it is useful to make some 2-D topology sketches first: will you depict a small protein by showing its details, or will you smudge out the detail and portray a big protein? The sketchbook is your friend!

Examples of protein domains

Proteins adopt shapes that are related to one another in discernible ways, and those relationships beg for classification schemes that help us make sense of the complexity. In much the same way, the Linnaean classification scheme lets us compare zebras, horses, and sharks and then proclaim that two of the three are more closely related.

Below is a sampling of some of the thousands of ways that protein chains fold into domains. A domain is simply a discernible shape, often repeated in related proteins and often explanatory of the function of the protein. A protein may consist of a single domain, while more complicated proteins are assemblages of numerous domains. If the human body were a single protein molecule, one might claim that it includes a single head domain but also several other domains that might be lumped together as appendage domains or more narrowly defined as hand domains and foot domains, for example.

How to classify domains?

The people who study protein domains scientifically are often artists at heart. This is also a great field for those interested in classification and bioinformatics. Some very large and interesting databases have been created to help keep track of the families of proteins as arranged according to domain structure. One of the well-known databases is called CATH. Another is SCOP. The following quotation (from a 2009 paper describing the CATH database) gives you an idea of how such classifications are organized.

“CATH (class, architecture, topology, homology) is a hierarchical protein domain classification (1) where domains are classified manually by curators, guided by prediction algorithms (such as structure comparison). Each protein structure is decomposed into one or more chains which in turn are split into one or more domains before being classified into homologous superfamilies according to both structure and function. At the Class, or C-level, the domains are classified simply on the basis of their secondary structure content [whether they are mostly α-helical (Class 1) or β-sheet (Class 2), contain a significant percentage of both secondary structure elements (Class 3) or contain very little secondary structure (Class 4)]. The domains within each class are then sorted according to their architecture—that is similarities in the arrangements of secondary structures in 3D space. Each architecture (A-level) is further broken down into one or more topology, or fold, groups (T-level), where the connectivity between these secondary structures are taken into account. The domains are then classified into their respective homologous superfamilies (H-level) according to similarities in sequence, structure and/or function. Clustering performed at the H-level (>35% sequence identity and above) then produces one or more sequence families for each of the homologous superfamilies (S-level).”

–From: Nucleic Acids Res. 2009 January; 37 (Database issue): D310–D314.
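To make the hierarchy in the quotation concrete, here is a minimal Python sketch of how a four-part CATH-style identifier could be read level by level. It assumes identifiers written as four dot-separated integers (class.architecture.topology.homology); the example identifiers, the class descriptions (paraphrased from the quotation), and the names `CathNode`, `parse_cath_id`, and `deepest_shared_level` are all illustrative and are not part of any official CATH software.

```python
from dataclasses import astuple, dataclass

# C-level descriptions, paraphrased from the quotation above (an assumption
# about wording, not official CATH labels).
CATH_CLASSES = {
    1: "mostly alpha-helical",
    2: "mostly beta-sheet",
    3: "mixed alpha and beta",
    4: "very little secondary structure",
}


@dataclass(frozen=True)
class CathNode:
    """One position in the hierarchy: Class, Architecture, Topology, Homology."""
    cath_class: int
    architecture: int
    topology: int
    homology: int

    @property
    def class_name(self) -> str:
        return CATH_CLASSES.get(self.cath_class, "unknown class")

    def lineage(self) -> list[str]:
        """Return the identifier of each level, from C down to H."""
        parts = [str(p) for p in astuple(self)]
        return [".".join(parts[:i]) for i in range(1, 5)]


def parse_cath_id(text: str) -> CathNode:
    """Parse a hypothetical 'C.A.T.H' string such as '3.40.50.300'."""
    fields = text.strip().split(".")
    if len(fields) != 4:
        raise ValueError(f"expected four dot-separated numbers, got {text!r}")
    c, a, t, h = (int(f) for f in fields)
    return CathNode(c, a, t, h)


def deepest_shared_level(a: CathNode, b: CathNode) -> str:
    """Return the deepest level (C, A, T, or H) at which two domains still agree."""
    shared = "none"
    for level, x, y in zip("CATH", astuple(a), astuple(b)):
        if x != y:
            break
        shared = level
    return shared


if __name__ == "__main__":
    first = parse_cath_id("3.40.50.300")
    second = parse_cath_id("3.40.50.720")
    print(first.class_name)
    print(first.lineage())
    print(deepest_shared_level(first, second))
```

Two domains that agree down to the T-level share a fold but sit in different homologous superfamilies, which is the kind of "two of these three are more closely related" comparison that the classification is meant to support.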

