Supplementary data from "The global cysteine peptidase landscape in parasites" (Babbitt Lab)

The global cysteine peptidase landscape in parasites

Atkinson HJ, Babbitt PC, Sajid M. The global cysteine peptidase landscape in parasites. Trends in Parasitology 2009; submitted.

The accumulation of sequenced genomes has expanded the already sizeable population of cysteine peptidases from parasites. Characterization of a few of these enzymes has ascribed key roles to peptidases in parasite life cycles and also shed light on mechanisms of pathogenesis. Here, we discuss recent observations on the physiological activities of cysteine peptidases of parasitic organisms, paired with a global view of all cysteine peptidases from the MEROPS database grouped by similarity. This snapshot of the landscape of parasite cysteine peptidases is complex and highly populated, suggesting that expansion of research beyond the few model parasite peptidases is now timely.

Available files and info:

  1. Supplementary figures
  2. Sequence similarity network (SSN) files
  3. Sequences and data sources used to create SSNs (primarily derived from MEROPS database)
  4. Tutorials on sequence similarity networks

Supplementary figures

File
Right-click on link to download
Description
Fig_S1-C1AbySpec.pdf
[PDF format; 729K]
Fig S1. Shows the network from Fig. 2c, colored by ten specific parasite species. This figure shows where certain clusters contain multiple proteins from the same organism.
  • T. brucei
  • T. congolense
  • G. intestinalis
  • L. mexicana
  • T. annulata
  • T. vaginalis
  • E. histolytica
  • P. falciparum
  • F. hepatica
  • S. japonicum
Fig_S2_meta_align.pdf
[PDF format; 853K]
Fig S2. Shows an annotated alignment of the region expected to contain the catalytic residues of metacaspases. Also shown is the network containing the corresponding sequences (Fig 3c). Illustrates the ambiguity of estimating the catalytic cysteine in metacaspases.



Sequence similarity network files

These files are all in zipped (compressed) XGMML format (*.xgmml.zip).
You may find the section "How to view sequence similarity network files using Cytoscape" useful.

File
Right-click on link to download
Description
CA.fa_1e-05.fc.xgmml.zip
[Zipped: 2M; unzipped: 61M]
from Fig. 2b: Clan CA

Threshold: 1e-5
Selected attributes and descriptions:
  • ID: MEROPS sequence ID; e.g., MER054779
  • Clan: MEROPS clan (see the MEROPS documentation on classifications); e.g., CA
  • Family: MEROPS family; e.g., C01
  • Identifier: MEROPS peptidase identifier; e.g., 001; combine with Family to form labels like "C01.001" or "C01.UPW"
  • Ident_name: MEROPS identifier name; e.g., falcipain-1
  • Type Peptidase: MEROPS type peptidase for the family
  • species: species name
  • genus: genus name
  • genus_type: helminth, protozoan, reference, or none (used in figures); see genuses.txt
  • kingToSuperking: the kingdom, if the species is associated with a kingdom in the NCBI taxonomy database; otherwise the superkingdom, e.g., superkingdom:Eukaryota
  • sequence: the protein sequence of the peptidase domain (from MEROPS)
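The attributes listed above can also be read outside Cytoscape. The following is a minimal sketch, assuming that each network file follows the usual XGMML layout in which every `<node>` element carries `<att name="..." value="..."/>` children; the toy document, its node ids, and the function names are made up for illustration and are not taken from the actual files.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def read_node_attributes(xgmml_source):
    """Return {node id: {attribute name: value}} for every <node> element.

    xgmml_source may be a file path or a binary file object.
    """
    nodes = {}
    for _, elem in ET.iterparse(xgmml_source, events=("end",)):
        kind = local_name(elem.tag)
        if kind == "node":
            attrs = {}
            for child in elem:
                if local_name(child.tag) == "att" and "name" in child.attrib:
                    attrs[child.attrib["name"]] = child.attrib.get("value", "")
            nodes[elem.get("id")] = attrs
        if kind in ("node", "edge"):
            elem.clear()  # release memory; unzipped networks can be large
    return nodes


def merops_label(attrs):
    """Combine Family and Identifier into a label such as 'C01.001'."""
    return f"{attrs['Family']}.{attrs['Identifier']}"


# Toy document with invented nodes (assumed structure, not real data).
EXAMPLE_XGMML = b"""<?xml version="1.0" encoding="UTF-8"?>
<graph label="toy" xmlns="http://www.cs.rpi.edu/XGMML">
  <node id="n1" label="toy node 1">
    <att name="Clan" type="string" value="CA"/>
    <att name="Family" type="string" value="C01"/>
    <att name="Identifier" type="string" value="001"/>
  </node>
  <node id="n2" label="toy node 2">
    <att name="Clan" type="string" value="CA"/>
    <att name="Family" type="string" value="C01"/>
    <att name="Identifier" type="string" value="UPW"/>
  </node>
  <edge source="n1" target="n2" label="toy edge"/>
</graph>"""

example_nodes = read_node_attributes(io.BytesIO(EXAMPLE_XGMML))
for node_id, attrs in example_nodes.items():
    print(node_id, attrs["Clan"], merops_label(attrs))
```

To use it on a downloaded network, pass the path of the unzipped `.xgmml` file to `read_node_attributes` instead of the toy bytes.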

C01A.all.ids.fa_1e-60.4.cys.xgmml.zip
[Zipped: 2M; unzipped: 61M]
from Fig. 2c: Family C1

Threshold: 1e-60

* see CA network above for more attribute descriptions
Additional selected attributes and descriptions:
  • catCys: the amino acid at the catalytic Cys position, predicted by aligning the peptidase sequence to a hidden Markov model (HMM) of family C1 and recording the amino acid that aligns to the expected catalytic cysteine position; e.g., C or S
  • catHis: the amino acid at the catalytic His position, predicted in the same way from the residue that aligns to the expected catalytic histidine position
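The idea of "recording the amino acid that aligns to the expected catalytic position" can be sketched in a few lines. This is an illustration only, not the pipeline used for the review: it assumes a hypothetical two-row alignment (a model consensus row and a sequence row of equal length, with `-` or `.` marking gaps), whereas real HMM search output is formatted differently. The motif `GSCWAF` is the C1 consensus motif listed with the Peptidase_C1_ls.hmm file on this page.

```python
def residue_at_motif_position(model_row, seq_row, motif, offset):
    """Return the sequence residue aligned to one position of a consensus motif.

    model_row: consensus row of a model-to-sequence alignment
    seq_row:   the aligned sequence row, same length as model_row
    motif:     consensus motif to locate in the model row (e.g. 'GSCWAF')
    offset:    0-based index of the residue of interest within the motif
    Returns None if the motif is absent, or '-' if the sequence has a gap there.
    """
    if len(model_row) != len(seq_row):
        raise ValueError("alignment rows must have equal length")
    # Alignment columns that hold a model position (i.e. not a gap in the model)
    columns = [i for i, ch in enumerate(model_row) if ch not in "-."]
    ungapped_model = "".join(model_row[i] for i in columns).upper()
    start = ungapped_model.find(motif.upper())
    if start < 0:
        return None
    return seq_row[columns[start + offset]].upper()


# Hypothetical alignment rows: the sequence carries S where the model expects C.
print(residue_at_motif_position("aGSCWAFs", "PGSSWAFA", "GSCWAF", 2))
```

The same function would serve for the histidine position by passing the `LdHa` motif with offset 2.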

CD.fa_1e-05.fc.xgmml.zip
[Zipped: 4M; unzipped: 113M]
from Fig. 3a: Clan CD

Threshold: 1e-5

* see CA network above for selected attribute descriptions

C13s_1e-5.fa_1e-30.cys.xgmml.zip
[Zipped: 358K; unzipped: 7M]
from Fig. 3b: Family C13

Threshold: 1e-30

* see CA network above for selected attribute descriptions

metas.fa_1e-40.2.cys.xgmml.zip
[Zipped: 132K; unzipped: 2M]
from Fig. 3c: Family C14: metacaspases

Threshold: 1e-40

* see CA and C1 networks above for selected attribute descriptions



Sequences and data sources used to create SSNs

File
Right-click on link to download
Description
genuses.txt
[text file]
List of genera considered parasitic helminths, protozoa, or references in the review
pepunit2.6.09.lib.fa
[fasta-format sequence file; 41M]
Snapshot of MEROPS "pepunit.lib" file of peptidase sequences from Feb. 6, 2009, used to derive statistics and SSNs in this review.

Source: http://merops.sanger.ac.uk

Rawlings, N.D., Morton, F.R., Kok, C.Y., Kong, J., Barrett, A.J. (2008) MEROPS: the peptidase database. Nucleic Acids Res 36, D320-D325.

parasite_CPs.tsv
[tab-separated text file; 329K]
List of 834 parasite cysteine peptidases (CPs), obtained by filtering the Feb. 6, 2009 MEROPS sequence database above according to the parasite-associated genera in genuses.txt; corresponds to the statistics in Fig. 1

Column definitions: see attribute definitions from SSNs above

(includes: Clan, Family, Identifier, Ident_name, species, genus, genus_type, Type Peptidase, sequence)
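A tab-separated file with these columns can be summarized with the Python standard library. The sketch below assumes that parasite_CPs.tsv begins with a header row using the column names listed above, which has not been checked against the file; the three rows are toy data, and `read_tsv` and `count_by` are illustrative helper names.

```python
import csv
import io
from collections import Counter


def read_tsv(handle):
    """Read a tab-separated file with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def count_by(rows, *columns):
    """Count rows for each combination of values in the given columns."""
    return Counter(tuple(row[c] for c in columns) for row in rows)


# Toy rows standing in for the real file (assumed header names).
EXAMPLE_TSV = (
    "Clan\tFamily\tgenus\tgenus_type\n"
    "CA\tC01\tPlasmodium\tprotozoan\n"
    "CA\tC01\tFasciola\thelminth\n"
    "CD\tC13\tSchistosoma\thelminth\n"
)

example_rows = read_tsv(io.StringIO(EXAMPLE_TSV))
for key, n in sorted(count_by(example_rows, "Clan", "genus_type").items()):
    print(key, n)
```

For the real file, open it with `open("parasite_CPs.tsv", newline="")` and pass the handle to `read_tsv`.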

Peptidase_C1_ls.hmm
[Hidden Markov Model (HMM)]
Model used to predict catalytic residues of family C1 peptidases (as shown in Fig. 2c).

Family: Peptidase_C1 (PF00112); accessed Mar. 5, 2009 (Pfam 23.0)

Cys position: C in 'GSCWAF' consensus motif
His position: H in 'LdHa' consensus motif

Source: Pfam database

Finn, R.D., Tate, J., Mistry, J., Coggill, P.C., Sammut, J.S., Hotz, H.R., Ceric, G., Forslund, K., Eddy, S.R., Sonnhammer, E.L., Bateman, A. (2008) The Pfam protein families database. Nucleic Acids Res 36, D281-D288.

metas_60nr.afa.hmm
[Hidden Markov Model (HMM)]
Model used to predict catalytic residues of metacaspases from family C14 peptidases (as shown in Fig. 3c; also, see Fig. S2 above for an annotated sequence alignment of metacaspases).

His position: H in 'GHG' consensus motif
Cys position: C in 'CHY' consensus motif

Based on a sequence alignment of the metacaspases in Fig. 3c filtered to a maximum of 60% identity (includes 95 sequences)



Sequence similarity network tutorials


How to view sequence similarity network files using Cytoscape

The network files provided with this review must be downloaded and can then be viewed using a free software application called Cytoscape (http://www.cytoscape.org).
  1. Go to http://www.cytoscape.org; download and install the version of Cytoscape that is appropriate for your operating system; the instructions here correspond to Cytoscape version 2.6
  2. Download the network files of interest using the links in the "Sequence similarity network files" section above
  3. The network files are provided in zipped (compressed) XGMML format -- you must first unzip them (on a Mac, double-click on the file to unzip); the resulting file should have the ".xgmml" file suffix
  4. Launch Cytoscape
  5. From the File drop-down menu, select
    Import > Network (Multiple file types) (keyboard shortcut: Ctrl+L)
    and navigate to the downloaded, unzipped file (e.g., network.xgmml)
  6. Cytoscape will proceed to load the file. If the network is large, this may take several minutes.
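
The unzipping in step 3 can also be scripted, which is convenient when downloading several networks. This is a minimal sketch using Python's standard `zipfile` module; the function name `unzip_xgmml` is illustrative, and it assumes each archive contains one or more members ending in `.xgmml`.

```python
import zipfile
from pathlib import Path


def unzip_xgmml(zip_path, out_dir="."):
    """Extract every .xgmml member of zip_path into out_dir.

    Returns the list of extracted file paths, ready to import into Cytoscape.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".xgmml"):
                extracted.append(Path(archive.extract(name, path=out_dir)))
    return extracted
```

For example, `unzip_xgmml("CA.fa_1e-05.fc.xgmml.zip", "networks")` would place the unzipped network in a `networks` folder, from which it can be opened in step 5.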

Basic Cytoscape tutorial

Below are some screenshots illustrating some of the ways you can interact with a network using Cytoscape (version 2.6).

Illustrated below:

1. Navigating: [screenshot]

2. Getting more information about nodes/sequences and edges: [screenshot]

3. Searching for specific nodes:

A. Go to the Select drop-down menu > Use old filters

B. Click "Create new filter"

C. Select String filter to search for text

D. Configure your search

E. Examine the results
