Supplementary data from "An atlas of the thioredoxin fold class reveals the complexity of function-enabling adaptations"
Atkinson, HJ, and Babbitt, PC. "An atlas of the thioredoxin fold class reveals the complexity of function-enabling adaptations." 2009, in preparation.
1. Supplementary figures referenced in the text. All files are in Portable Document Format (PDF)
| File | Description |
| SF1_strucNets_minority.pdf | Fig. S1. A structure-based similarity network describes a map of the Trx fold class: colored by minority Thioredoxin-like Clan families. A. Structure-similarity network, containing 159 structures that are a maximum of 60% identical (by sequence) that span the Trx fold class. Similarity is defined by FAST scores better than a score of 4.5; edges at this limiting score represent alignments with a median of 2.75Å RMSD across 72 aligned positions. Each node is colored by a PFAM Thioredoxin-like Clan family if the chain sequence is a member of that family. Nodes with thick red borders and bold labels denote chains present in the hierarchical clustering tree in D. Labels like 1ON4_A denote PDB ID 1ON4, chain A. B. Structure similarity network containing the same structures as in A, shown at the more stringent threshold of 7.5. Edges at this limiting score correspond to alignments with a median of 2.45Å RMSD across 89 aligned positions. Nodes are colored as in A. C. Structure similarity network containing the 105 structures from the large connected cluster in B, displayed at a FAST score cutoff of 12.0; edges at this limiting score represent alignments with a median of 2.21Å RMSD across 102 aligned positions. Nodes are colored as in A. D. Complete linkage hierarchical clustering tree based on pairwise FAST scores for 15 representative structures singled out in the networks in A-C, with PDB IDs in bold, and associated SWISSPROT sequence IDs in plain text. |
| SF2_TrxClan_AnnotBySP.pdf | Fig. S2. A sequence similarity network shows how each Trx fold superfamily is distributed (colored by SwissProt classification). [From UniProtKB/Swiss-Prot family/domain classification: http://ca.expasy.org/cgi-bin/get-similar?all=all] Sequence similarity network, containing 4,082 representative sequences that are a maximum of 40% identical that span the Trx fold class. Similarity is defined by pairwise BLAST alignments better than an E-value of 1×10⁻¹²; edges at this threshold represent alignments with a median 30% identity over 120 residues, while the rest of the edges represent better alignments. Each node is colored by the sequence's SWISSPROT family classification, if available; sequences that are not classified in SWISSPROT are colored grey. Large nodes represent sequences that are at least 40% identical to the 159 structures in Fig. 3. The sequences associated with the 15 representative structures in Fig. 3C are labeled using bold text and white arrows. The general locations of other sequences representing different superfamilies are noted using italicized text. |
| SF3_TrxClan_AnnotByDomOrder.pdf | Fig. S3. Many Trx domains occur in combination with other Trx domains. A. Sequence similarity network, containing 4,082 representative sequences that are a maximum of 40% identical that span the Trx fold class. Similarity is defined by pairwise BLAST alignments better than an E-value of 1×10⁻¹²; edges at this threshold represent alignments with a median 30% identity over 120 residues, while the rest of the edges represent better alignments. Nodes are colored by the number of PFAM Thioredoxin-like Clan family domains occurring within the sequence; with the exception of H. influenzae Prx 5 -- labeled (iii) -- and the monothiol glutaredoxins -- labeled (ii) -- these domains are typically duplications of the same domain, such as the PDI-type enzymes (iv), which can contain two to four thioredoxin domains, or the few DSBA-like enzymes (i), which contain up to three DSBA-like domains. Large nodes represent sequences that are at least 40% identical to the 159 structures in Fig. 3. The sequences associated with the 15 representative structures in Fig. 3C are labeled using bold text and white arrows. The occurrences of other sequences representing different superfamilies are noted using italicized text. B. Domain structures for example sequences from the groups labeled (i)-(iv); some domains are shorter than expected, and this is denoted by a gradient that fades to white. The sequences are identified by their UNIPROT sequence IDs. |
| SF4_TrxFold_byPFAM_tally.tsv.R.pdf | Fig. S4. The relative populations of the Trx fold superfamilies vary. A. 4,082 representative sequences that are a maximum of 40% identical and span the Trx fold class, binned according to their membership in PFAM families within the Thioredoxin-like Clan. B. All 29,206 sequences in the Trx fold class. |
| SF5_strucNets_withSeqNet.pdf | Fig. S5. There is good correspondence between the structure- and sequence-based Trx fold class networks. The three views of the structure-based network from Fig. 3 are repeated in A-C, and panel D contains a sequence-based network derived from the amino acid sequences in the 159 structure chains. A. Structure similarity network, containing 159 structures that are a maximum of 60% identical (by sequence) that span the Trx fold class. Similarity is defined by FAST scores better than a score of 4.5; edges at this threshold represent alignments with a median of 2.75Å RMSD across 72 aligned positions, while the rest of the edges represent better alignments. Each node is colored by a PFAM Thioredoxin-like Clan family if the chain sequence is a member. Nodes with thick white borders and bold labels denote chains present in the hierarchical clustering tree in Fig. 3D. Labels like 1ON4_A denote PDB ID 1ON4, chain A. B. Structure similarity network containing the same structures as in A, shown at the more stringent threshold of 7.5. Edges at this threshold correspond to alignments with a median of 2.45Å RMSD across 89 aligned positions. Nodes are colored as in A. C. Structure similarity network containing the 105 structures from the large connected cluster in B, displayed at a FAST score cutoff of 12.0; edges at this threshold represent alignments with a median of 2.21Å RMSD across 102 aligned positions. Nodes are colored as in A. D. Sequence similarity network, containing 159 chain sequences from A-C. Similarity is defined by pairwise BLAST alignments better than an E-value of 1×10⁻⁵; edges at this threshold represent alignments with a median 27% identity over 84 residues, while the rest of the edges represent better alignments. |
| SF6_TrxClan_AnnotByTax.pdf | Fig. S6. Use of some members of the Trx fold class is restricted to taxonomic subsets. Here, the sequence similarity network from Fig. 4, containing 4,082 sequences, is colored by the species kingdom (Metazoa, Fungi, Viridiplantae) or superkingdom (Bacteria, Eukaryota, Archaea). Note that Eukaryota includes all eukaryotic species without a more specific kingdom, and is primarily associated with protozoan parasites. Large nodes represent sequences that are associated with the structures from Fig. 3. Blue letter labels correspond to sequence groups in Fig. 5. |
2. Supplementary tables referenced in the text. All files are in Portable Document Format (PDF)
| File | Description |
| ST_1-3.pdf | Table S1. Table S2. Table S3. |
3. Datafiles generated in the analysis: sequence files, trees, protein similarity networks, etc
1. Dataset files
| File | Description |
| tab-separated file: SIMILARITY_classes.txt | List of the 20 Trx-fold-relevant SwissProt superfamilies, example sequence IDs, and counts for each class that contributed to the total sequence set. |
| tab-separated file: SwissProtPfamTrxClan.ids.tsv | All 29,206 Trx fold class sequences |
| sequences: SwissProtPfamTrxClan40nrVips60gr.fa | 4,082 Trx-fold sequences: the above sequences filtered to a max of 40% identity and a minimum length of 60 amino acids |
| sequences: SwissProtPfamTrxClanOnly.pdb.fa | 563 sequences extracted from Trx-fold structures |
| sequences: SwissProtPfamTrxClanOnly60nr.pdb.fa | 159 sequences extracted from Trx-fold structures (chains): the above sequences, filtered to a maximum of 60% identity |
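The minimum-length filter mentioned for SwissProtPfamTrxClan40nrVips60gr.fa can be illustrated with a minimal Python sketch. This is not the pipeline used to build the dataset: the function names and the toy records are hypothetical, and the 40% identity filter (which needs a sequence-clustering tool) is not shown.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} mapping."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines may be wrapped; concatenate them.
            records[header] += line
    return records


def filter_min_length(records: Dict[str, str], min_len: int = 60) -> Dict[str, str]:
    """Keep only sequences with at least min_len residues."""
    return {h: s for h, s in records.items() if len(s) >= min_len}


# Toy input (hypothetical records, not taken from the dataset).
example = [">seqA hypothetical", "M" * 70, ">seqB hypothetical", "MKV", "CGPC"]
kept = filter_min_length(read_fasta(example))
print(sorted(kept))
```

To apply it to a downloaded file, pass an open file handle instead of the toy list, for example `read_fasta(open("SwissProtPfamTrxClan40nrVips60gr.fa"))`.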
2.1 Structure-based networks and tree
| File | Description |
| network: SwissProtPfamTrxClanOnly60nr.pdb.tsv.ids_4.5.fc.xgmml [3M] | Fig. 3A: structure similarity network, FAST score cutoff of 4.5. Structure-similarity network, containing 159 structures that are a maximum of 60% identical (by sequence) that span the Trx fold class. Similarity is defined by FAST scores better than a score of 4.5; edges at this threshold represent alignments with a median of 2.75Å RMSD across 72 aligned positions, while the rest of the edges represent better alignments. Load in Cytoscape using File: Import network |
| network: SwissProtPfamTrxClanOnly60nr.pdb.tsv.ids_7.5.fc.xgmml [1M] | Fig. 3B: structure similarity network, FAST score cutoff of 7.5. Structure-similarity network, containing 159 structures that are a maximum of 60% identical (by sequence) that span the Trx fold class. ... same structures as in A, shown at the more stringent threshold of 7.5. Edges at this threshold correspond to alignments with a median of 2.45Å RMSD across 89 aligned positions. *See description of Fig. 3A network above for xgmml file structure attributes |
| network: fast7.5_subset.ids_12.0.fc.xgmml [686K] | Fig. 3C: structure similarity network, FAST score cutoff of 12.0. Structure-similarity network containing the 105 structures from the large connected cluster in B, displayed at a FAST score cutoff of 12.0; edges at this threshold represent alignments with a median of 2.21Å RMSD across 102 aligned positions. *See description of Fig. 3A network above for xgmml file structure attributes |
| tree: repr_strucs2.ids.fast.m.cLink0-30.tre [4K] | Fig. 3D: hierarchical clustering tree. Complete linkage hierarchical clustering tree based on pairwise FAST scores for 15 representative structures singled out in the networks in A-C |
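The three structure networks differ only in the FAST score cutoff used to keep edges (4.5, 7.5, 12.0). The sketch below shows how such a cutoff could be applied to an XGMML file outside Cytoscape, using Python's standard XML parser. The edge attribute name `score`, the toy graph, and the inclusive comparison (`>=`) are assumptions made for illustration; check the attribute names in the actual files before relying on it.

```python
import xml.etree.ElementTree as ET

# Toy XGMML document; node IDs and the "score" attribute name are hypothetical.
TOY_XGMML = """<graph label="toy" xmlns="http://www.cs.rpi.edu/XGMML">
  <node id="a" label="1ON4_A"/>
  <node id="b" label="toy_B"/>
  <node id="c" label="toy_C"/>
  <edge source="a" target="b" label="a-b"><att name="score" type="real" value="5.0"/></edge>
  <edge source="b" target="c" label="b-c"><att name="score" type="real" value="13.1"/></edge>
</graph>"""


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}edge'."""
    return tag.rsplit("}", 1)[-1]


def edges_at_cutoff(xml_text, cutoff, score_att="score"):
    """Return (source, target) pairs whose score attribute is >= cutoff."""
    root = ET.fromstring(xml_text)
    kept = []
    for el in root.iter():
        if local_name(el.tag) != "edge":
            continue
        for att in el:
            if local_name(att.tag) == "att" and att.get("name") == score_att:
                if float(att.get("value")) >= cutoff:
                    kept.append((el.get("source"), el.get("target")))
    return kept


for cutoff in (4.5, 7.5, 12.0):
    print(cutoff, edges_at_cutoff(TOY_XGMML, cutoff))
```

Raising the cutoff removes the weaker edge first, which is why the networks break into smaller, more tightly related clusters at 7.5 and 12.0.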
2.2 Sequence-based networks
| File | Description |
| zipped network: SwissProtPfamTrxClan40nrVips60gr.fa_1e-12.fc.xgmml.zip [4M zipped; 60M unzipped] | Figs. 4, 5, 6, S2, S3, S6: sequence similarity network, BLAST E-value cutoff of 1×10⁻¹². Sequence-similarity network, containing 4,082 sequences that are a maximum of 40% identical that span the Trx fold class. Similarity is defined by pairwise BLAST alignments better than an E-value of 1×10⁻¹²; edges at this threshold represent alignments with a median 30% identity over 120 residues, while the rest of the edges represent better alignments. To use: (1) unzip the file using gunzip (or double-click); (2) load the xgmml file in Cytoscape using File: Import network. Note: this file is very large and will take at least several minutes to load. |
| File | Description |
| network: 159_chain_seqs.fa_1e-05.fc.xgmml [1M] | Fig. S5D: sequence similarity network, BLAST E-value cutoff of 1×10⁻⁵. Sequence-similarity network, containing 159 chain sequences from Fig 1 A-C. Similarity is defined by pairwise BLAST alignments better than an E-value of 1×10⁻⁵; edges at this threshold represent alignments with a median 27% identity over 84 residues, while the rest of the edges represent better alignments. Load in Cytoscape using File: Import network. *See description of Fig. 3A network above for xgmml file structure/sequence attributes |
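Because the zipped sequence network expands to roughly 60M, it can be useful to inspect it without loading the whole document into memory. The following sketch streams an XGMML member directly out of a zip archive and counts nodes and edges; it uses only the Python standard library. The function name and the in-memory toy archive are hypothetical stand-ins for the real download.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def count_nodes_edges(zip_file, member=None):
    """Stream one XGMML member of a zip archive and count node/edge elements."""
    counts = {"node": 0, "edge": 0}
    with zipfile.ZipFile(zip_file) as zf:
        # Default to the first member if no name is given.
        name = member or zf.namelist()[0]
        with zf.open(name) as fh:
            for _event, elem in ET.iterparse(fh, events=("end",)):
                tag = elem.tag.rsplit("}", 1)[-1]  # drop any namespace prefix
                if tag in counts:
                    counts[tag] += 1
                    elem.clear()  # release attributes and children already read
    return counts


# Build a tiny archive in memory so the sketch runs without the real file.
toy_xml = (
    '<graph xmlns="http://www.cs.rpi.edu/XGMML">'
    '<node id="a"/><node id="b"/><edge source="a" target="b"/>'
    "</graph>"
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("toy.xgmml", toy_xml)
buf.seek(0)

result = count_nodes_edges(buf)
print(result)
```

For the real archive, pass its path instead of the in-memory buffer; the node count should then match the number of sequences stated in the file description.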
| File | Description |
| tab-separated text file:
activeSiteMotifsByFamily.tsv
[961B] |
Text file associating each Thioredoxin-like Clan PFAM model with
a CxxC motif, if present.
Columns: PFAM model, motif, notes, example seq and structure
Contents:
| PFAM model | motif | notes | example seq and structure |
| AhpC-TSA | tPvC | =cxxC | PRDX6_HUMAN 1PRX T44,C47 |
| ArsC | Cstc | =CxxC | ARSC1_ECOLI 1I9D C12,S15 |
| Calsequestrin | [no motif] | has multiple Trx domains, no CxxC in any | CASQ1_RABIT 1A8Y |
| DSBA | CPyC | =CxxC | DSBA_ECOLI 1FVK C30,C33 |
| DUF1687 | [no motif] | | YK29_YEAST 1WPI |
| DUF836 | ChLC | =CxxC | Q8P6W3_XANCP 1TTZ C11,C14 |
| DUF953 | CGpC | =CxxC | Q9BRA2_HUMAN 1WOU C43,C46 |
| ERp29_N | [no motif] | | WBL_DROME 1OVN |
| GSHPx | CGlT | =Cxxc | GPX7_HUMAN 2P31 C57,T60 |
| GST_N | spra | poor fit: cxxc | GSTT1_HUMAN 2C3N S11,C14 |
| Glutaredoxin | CpfC | =CxxC | GLRX3_ECOLI 3GRX C11,S14 |
| HyaE | [no motif] | | HYAE_ECOLI 2HFD |
| OST3_OST6 | CqlC | =Cxxc | OST3_YEAST [no structure] C73,C76 |
| Phosducin | GtdA | poor fit: model based on 2 seq; 't'-pos is Cys | PHOS_RAT 1B9Y_C C148 |
| Redoxin | cPtC | =cxxC | PRDX5_HUMAN 1OC3 T44,C47 |
| SCO1-SenC | CPdiC | =CxxxC | SCO1_HUMAN 2GGT C169,C173 |
| SH3BGR | [no motif] | | SH3L1_HUMAN 1U6T |
| T4_deiodinase | TCP | | IOD2_HUMAN [no structure] U133 (Sec) |
| Thioredoxin | CGpC | =CxxC | THIO_BACSU 2GZY C29,C32 |
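As a small illustration of what the motif column describes, the sketch below locates CxxC-type motifs (two cysteines separated by any two residues) in a protein sequence and reports 1-based positions. The function name and the toy sequence are hypothetical; this is not the procedure used to derive the table, which is based on PFAM models.

```python
import re


def find_cxxc(seq):
    """Return (1-based start position, matched motif) for every C-x-x-C in seq.

    A lookahead is used so that overlapping motifs are all reported.
    """
    return [
        (m.start() + 1, seq[m.start():m.start() + 4])
        for m in re.finditer(r"(?=C..C)", seq)
    ]


# Hypothetical toy sequence containing a CGPC motif.
toy = "MSDKIIHLTWCGPCKMIAP"
hits = find_cxxc(toy)
print(hits)  # [(11, 'CGPC')]
```

The same pattern can be widened to `C...C` for CxxxC-type motifs such as the SCO1-SenC entry.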