Using pLink to Analyze Cross-Linked Peptides

UNIT 8.21

Sheng-Bo Fan,1,2 Jia-Ming Meng,1,2 Shan Lu,3 Kun Zhang,1,2 Hao Yang,1,2 Hao Chi,1 Rui-Xiang Sun,1 Meng-Qiu Dong,3,4 and Si-Min He1

1 Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
2 University of the Chinese Academy of Sciences, Beijing, China
3 National Institute of Biological Sciences, Beijing, China
4 Corresponding author ([email protected])

pLink is a search engine for high-throughput identification of cross-linked peptides from their tandem mass spectra, which is the data-analysis step in chemical cross-linking of proteins coupled with mass spectrometry analysis. pLink has accumulated more than 200 registered users from all over the world since its first release in 2012. After 2 years of continual development, a new version of pLink has been released, which is at least 40 times faster, more versatile, and more user-friendly. Also, the function of the new pLink has been expanded to identifying endogenous protein cross-linking sites such as disulfide bonds and SUMO (Small Ubiquitin-like MOdifier) modification sites. Integrated into the new version are two accessory tools: pLabel, to annotate spectra of cross-linked peptides for visual inspection and publication, and pConfig, to assist users in setting up search parameters. Here, we provide detailed guidance on running a database search for identification of protein cross-links using the 2014 version of pLink. © 2015 by John Wiley & Sons, Inc.

Keywords: cross-linking • mass spectrometry • pLink

How to cite this article: Fan, S.-B., Meng, J.-M., Lu, S., Zhang, K., Yang, H., Chi, H., Sun, R.-X., Dong, M.-Q. and He, S.-M. 2015. Using pLink to Analyze Cross-Linked Peptides. Curr. Protoc. Bioinform. 49:8.21.1-8.21.19. doi: 10.1002/0471250953.bi0821s49

INTRODUCTION

pLink is a search engine for mass spectrometry (MS) analysis of cross-linked proteins or protein complexes (Yang et al., 2012a). It is developed at the Institute of Computing Technology, Chinese Academy of Sciences, and the National Institute of Biological Sciences, Beijing. pLink originally focused on chemical cross-linking of proteins coupled with mass spectrometry analysis (CXMS) to provide spatial distance constraints between cross-linked residues within a protein or between subunits of a protein complex. pLink works with commonly available, inexpensive, gas-phase-stable chemical cross-linkers, either homo-bifunctional ones such as BS3 [bis(sulfosuccinimidyl)suberate], BS2G [bis(sulfosuccinimidyl)glutarate], and DSS (disuccinimidyl suberate), or hetero-bifunctional ones such as sulfo-GMBS [N-(γ-maleimidobutyryl-oxy) sulfosuccinimide ester] and EDC [1-ethyl-3-(3-dimethylaminopropyl)carbodiimide; Yang et al., 2012a]. Samples need not be cross-linked with a 1:1 mix of light and heavy stable isotope-labeled cross-linkers (for example, [d0/d4]-BS3, [d0/d4]-BS2G, or [d0/d12]-DSS), although data generated from such samples are perfectly compatible with pLink. A cross-linker of

Current Protocols in Bioinformatics 8.21.1-8.21.19, March 2015. Published online March 2015 in Wiley Online Library (wileyonlinelibrary.com). doi: 10.1002/0471250953.bi0821s49. Copyright © 2015 John Wiley & Sons, Inc.

Analyzing Molecular Interactions

8.21.1 Supplement 49

natural isotope composition is sufficient for pLink, which could translate into considerable savings in experimental costs. pLink can process CXMS data of purified proteins as well as those of complex samples such as E. coli lysates. In addition, pLink estimates and controls the false discovery rate (FDR) in identification results, and provides detailed graphic spectral annotation for manual inspection or publication (Yang et al., 2012a). pLink is available at http://pfind.ict.ac.cn/software/pLink/index.html. During the past two years, more than 200 users from six continents have downloaded pLink for their research in academic institutions or pharmaceutical companies. Some of the results have appeared in publications (Epshtein et al., 2014; Plocinski et al., 2014; Wong et al., 2014).

In the same period, the pLink program has been refined. Powered by an entirely new engine, the new pLink is accelerated at least 40-fold in cross-link search time (unpub. observ.). Additionally, the new pLink is optimized for identification of protein disulfide bonds and covalent attachment sites of ubiquitin-like modifiers (UBLs) such as SUMO (Small Ubiquitin-like Modifier; unpub. observ.). These new functions are a natural outgrowth of pLink, thanks to a principle that the pLink developers have adhered to, i.e., to make minimal demands on cross-linkers and to solve problems on the software side. Because pLink requires no isotope labeling in cross-linkers, it can be used to map cross-linking sites in proteins that are either in vitro products of chemical cross-linking reactions or natural products of living organisms, i.e., endogenous, native protein cross-links.
With the assistance of pConfig, an accessory tool with an intuitive and interactive interface, users can easily set up search parameters and activate one of the three data-processing flows (pLink, pLink-SS, and pLink-SUMO) to identify chemically cross-linked peptides, disulfide-bonded peptides, and SUMO modification sites, respectively. For the task of spectrum annotation, pLink provides another tool called pLabel. pLabel allows users to view automatically labeled spectra of identified cross-links, to manually annotate spectra in different ways to explore more possibilities, and to save labeled or re-labeled spectra for publication.

This unit describes how to analyze cross-link data using the new pLink (released in 2014). Basic Protocol 1 describes how to set up a search; Basic Protocol 2 describes how to browse the search results; Basic Protocol 3 describes how to use pLabel to annotate spectra of identified cross-links; Support Protocol 1 describes customization to include your own linkers, modifications, and enzymes; and Support Protocol 2 shows how to install pLink.

NOTE: Words and terms that are found in the pLink program user interface are underlined.

NOTE: A number of search terms are described in the Understanding Search Parameters section at the end of the Commentary.

BASIC PROTOCOL 1

SETTING UP A pLink SEARCH

pLink requires two inputs: (1) a set of tandem mass spectra (high-resolution, high-mass accuracy data are strongly recommended, typically R ≥ 60,000 for MS and R ≥ 7500 for MS/MS at m/z 400) in RAW, MGF, or MS2 file format; (2) a custom database of protein sequences in FASTA format.

Necessary Resources

Hardware

Microsoft Windows PC or workstation, preferably with 64-bit architecture and a minimum of 4 GB RAM.


Figure 8.21.1 Main interface: MS Data tab. Use this interface to add mass spectrometry data files.

Software

pLink (Support Protocol 2)
Operating system: Microsoft Windows Vista or higher
Microsoft .Net Framework 3.5 or higher
Thermo Scientific Xcalibur 2.0 or higher (or the free version, Thermo MSFileReader) should be installed if RAW files are searched

1. Launch pLink. Double-click pLink.exe, which is under the installation path.

If pLink is launched for the first time, a registration window will pop up. See Support Protocol 2 for details and also for explanation of the search parameters. After registration, the pLink graphical user interface (GUI) will appear (Fig. 8.21.1). The workspace, a folder to hold the search results, also needs to be defined upon first use.

2. Optional: Change the workspace path.

The user can change the workspace by clicking the File menu item Set Workspace. Search results and parameters will be stored in the workspace.

3. Set the spectra. Click tab MS Data and add MS/MS data to Data File List (Fig. 8.21.1). Multiple files can be selected and added at a time. With Xcalibur (or MSFileReader) installed, RAW files can be searched, and pParse (Yuan et al., 2012), which has been included in pLink, will launch automatically to recalibrate the precursor monoisotopic mass.

4. Set the database. On tab Identification under Search, specify the database to be used in the search (Fig. 8.21.2). If a database is used for the first time and cannot be found in the list, choose Tool in the menu bar and select Configuration to launch pConfig (Fig. 8.21.3). In pConfig, select tab Databases and click Add to add the database FASTA file. The decoy database is not needed because pLink will generate the reverse database automatically. There is also a


Figure 8.21.2 Main interface: Identification tab. Use this interface to configure the parameters for a pLink search.

Figure 8.21.3 pConfig. pConfig is integrated in pLink to assist users to set up databases, linkers, modifications, and enzymes.


built-in contaminant database, which can be automatically merged into the database if selected. Lastly, choose Save in pConfig and return to pLink. Now in the pLink page, on tab Identification under Search, select the database just added. pConfig helps users to configure databases, linkers, modifications, enzymes, and amino acids. More details about pConfig can be found in Support Protocol 1.
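The idea of a reversed decoy database mentioned in step 4 can be illustrated with a minimal Python sketch. This is not pLink's own code: the function name `reverse_decoys` and the `REVERSE_` header prefix are hypothetical, and the sketch only assumes the generic approach of reversing each target sequence.

```python
def reverse_decoys(fasta_text, decoy_prefix="REVERSE_"):
    """Return FASTA text holding one reversed-sequence decoy per target entry.

    decoy_prefix is a hypothetical label; pLink's own naming is not
    described in this unit.
    """
    entries = []                      # (header, sequence) pairs
    header, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                entries.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        entries.append((header, "".join(chunks)))

    out = []
    for name, seq in entries:
        out.append(">" + decoy_prefix + name)
        out.append(seq[::-1])         # reverse the residue order
    return "\n".join(out) + "\n"
```

Because pLink builds the decoys itself, a script like this is only useful for understanding what a reversed database contains, not as a required preprocessing step.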


5. Optional: Set the Missed Cleavage Number and the Enzymes. By default, Missed Cleavage Number is set to 3, and Trypsin is used. On tab Identification in pLink, under Search, Missed Cleavage Number can be set. To set a site-specific protease that is not trypsin, click Set Enzymes, and use the left arrow to add enzymes and the right arrow to remove enzymes. Multiple enzymes are supported. To add new enzymes, see Support Protocol 1.
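To see what the Missed Cleavage Number controls, the following Python sketch enumerates peptides from an in-silico digest. It is a simplified illustration under stated assumptions, not pLink's digestion routine: the function `digest` and its parameters are hypothetical, cleavage is taken to occur on the C-terminal side of K or R, and refinements such as sites that block cleavage are not modeled.

```python
def digest(sequence, cleave_after="KR", max_missed=3, min_len=1):
    """Cut on the C-terminal side of each residue in cleave_after and
    return every peptide that spans at most max_missed uncut sites."""
    # Cut positions: sequence start, after each cleavage site, sequence end.
    cuts = [0]
    for i, residue in enumerate(sequence):
        if residue in cleave_after and i + 1 < len(sequence):
            cuts.append(i + 1)
    cuts.append(len(sequence))

    peptides = []
    for start in range(len(cuts) - 1):
        # A peptide may extend over up to max_missed additional segments.
        last = min(start + max_missed + 1, len(cuts) - 1)
        for end in range(start + 1, last + 1):
            pep = sequence[cuts[start]:cuts[end]]
            if len(pep) >= min_len:
                peptides.append(pep)
    return peptides
```

For example, `digest("AKGRC", max_missed=1)` yields the fully cleaved peptides plus those containing one missed site. Raising the allowed number of missed cleavages enlarges the candidate list, which is why it affects search time.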

6. Set the Precursor Tolerance and Fragment Tolerance. On the tab Identification in pLink, under Search, precursor tolerance and fragment tolerance can be set in either ppm or Daltons.

7. Set the Linker. On the tab Identification under Search, Linker can be specified.

For cross-linkers BS3 and BS2G, there is a checkbox Heavy. This should be checked if data come from samples cross-linked with [d0/d4]-BS3 or [d0/d4]-BS2G. For other isotopically coded light and heavy cross-linker pairs, matching linker parameters can be set using pConfig. For SUMO, there are three text boxes to input the necessary information. The first one is used to specify the sequence of the remnant peptide of SUMO (or another UBL) after protease digestion, the second one is used to specify which amino acid in the remnant SUMO peptide is directly connected to a modified protein (glycine by default), and the last one is used to specify what amino acid in a modified protein can serve as an acceptor site (lysine by default). For identification of disulfide bonds, linker SS, which sets the linker mass to -2.01456 Da, leads to a typical pLink search and outputs peptides or peptide pairs each containing one disulfide bond; linker SS_0 can output complex disulfide forms, i.e., peptides or peptide pairs each containing two or more disulfide bonds, but they should be manually examined. To add new linkers, see Support Protocol 1 for the usage of pConfig.
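The two tolerance units offered in step 6 are related through the mass being tested: a ppm tolerance scales with the theoretical mass, whereas a Dalton tolerance is fixed. The Python sketch below shows the standard conversion; the function names are hypothetical and nothing here is taken from pLink's code.

```python
def tolerance_in_da(theoretical_mass, tolerance, unit="ppm"):
    """Convert a tolerance given in ppm or Da into an absolute window in Da."""
    if unit == "ppm":
        return theoretical_mass * tolerance / 1e6
    if unit == "Da":
        return tolerance
    raise ValueError("unit must be 'ppm' or 'Da'")


def mass_error_ppm(observed_mass, theoretical_mass):
    """Signed mass deviation of an observation, in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def within_tolerance(observed_mass, theoretical_mass, tolerance, unit="ppm"):
    """True if the observed mass lies inside the tolerance window."""
    window = tolerance_in_da(theoretical_mass, tolerance, unit)
    return abs(observed_mass - theoretical_mass) <= window
```

At a mass of 2000 Da, for instance, a 20 ppm tolerance corresponds to a window of 0.04 Da, so the same ppm setting becomes wider in absolute terms for the heavier precursors typical of cross-linked peptide pairs.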

8. Optional: Set modifications. For both fixed and variable modifications, click Add Modifications on tab Identification. Use the left arrow to add modifications and the right arrow to remove modifications. To add new modifications, see Support Protocol 1.

9. Optional: Set filtering criteria. Click Filter on the tab Identification.

There are two filtering modes to choose from in the combo box: Global and Separate. In the Global mode, common (linear peptides not modified by linkers), mono-link, and loop-link results are filtered together; all inter-link matches of either an intra-protein or an inter-protein origin are filtered together. In the Separate mode, common, mono-link, loop-link, intra-molecular inter-link, and inter-molecular inter-link search results are filtered separately. The FDR threshold can be set to any value from 1 to 100 using the text box FDR. The default value is 1, for 1%. FDR can be calculated at the spectrum level or peptide level. Score Threshold is used to remove results whose score values are lower than the set threshold. Precursor Filter Tolerance is the allowed maximal mass difference between the precursor mass and the mass of a candidate peptide.
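The FDR filtering in step 9 can be pictured with a small target-decoy sketch in Python. It rests on assumptions that are not stated in this unit: each match is labeled target-target (TT), target-decoy (TD), or decoy-decoy (DD), and the FDR is estimated as (TD - DD)/TT, an estimator commonly used for cross-link searches. Whether pLink applies exactly this form is an assumption here; see Yang et al. (2012a) for the method actually used. The function name `filter_by_fdr` is hypothetical, and the threshold is given as a fraction (0.01 for 1%).

```python
def filter_by_fdr(psms, fdr_threshold=0.01):
    """Keep target-target matches up to the loosest cutoff that meets the FDR.

    psms: list of (score, label) tuples, label in {"TT", "TD", "DD"};
    a smaller score means a better match.
    """
    ranked = sorted(psms, key=lambda p: p[0])
    counts = {"TT": 0, "TD": 0, "DD": 0}
    best_cut = 0
    for i, (_score, label) in enumerate(ranked, start=1):
        counts[label] += 1
        if counts["TT"] == 0:
            continue
        # Assumed estimator: (TD - DD) / TT, floored at zero.
        fdr = max(counts["TD"] - counts["DD"], 0) / counts["TT"]
        if fdr <= fdr_threshold:
            best_cut = i
    return [p for p in ranked[:best_cut] if p[1] == "TT"]
```

In Separate mode the same procedure would simply be run once per result category, whereas in Global mode the categories listed above are pooled before filtering.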

10. Optional: Set the Core Number.

pLink can use multiple cores to accelerate the search. If n is the maximal core number of the computer in use, the default value for Core Number is 4 if n is 5 or above, or (n - 1) if n is 4 or below. To speed up the search, more cores can be used with the necessary increase in memory.

11. Optional: Set the Task Name.

Task Name is on tab Identification, under Flow. By default the task name is the date and time when this search task is created. It can be modified.
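The default Core Number rule from step 10 can be written as a short Python function. The function name is hypothetical, and the lower bound of one core for a single-core machine is an assumption, since the rule as stated would otherwise give zero.

```python
import os


def default_core_number(n_cores):
    """Default Core Number as described in step 10: 4 when the machine
    has 5 or more cores, otherwise n - 1 (never below 1, by assumption)."""
    if n_cores >= 5:
        return 4
    return max(n_cores - 1, 1)


# Value that the rule would give on the current machine.
suggested = default_core_number(os.cpu_count() or 1)
```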


Figure 8.21.4 Main interface: Summary tab. This tab allows users to view all the specified parameters.

12. Save and start the search. Click the tab Summary; this panel shows the parameters that have been set (Fig. 8.21.4). Incorrect parameters are highlighted in red, and they need to be corrected before the search starts. When the parameters are modified, the title bar of pLink will show an asterisk (*). Click Save to save the parameters to a file under the workspace path for later use.

The radio buttons above the Save button let users decide at which step, Search or Filter, the pLink program starts. Search should be chosen if the data have not been analyzed before or are to be re-analyzed under different search options. Search includes the filtering step. Filter is for re-filtering existing search results using different filtering criteria. All the saved parameters can be loaded later. Choose File in the menu, select Load Task, choose the folder with the task name in the workspace folder, and click OK. Then, all the saved parameters are loaded into the program.

SUPPORT PROTOCOL 1

CUSTOMIZE YOUR OWN LINKERS, MODIFICATIONS, AND ENZYMES

Many linkers, modifications, and enzymes are built into the pLink program. To accommodate different laboratory requirements, adding, deleting, and modifying linkers, modifications, and enzymes is also supported.

Necessary Resources

Hardware

Microsoft Windows PC or workstation, preferably with 64-bit architecture and a minimum of 4 GB RAM

Software

pLink (Support Protocol 2)
Operating system: Microsoft Windows Vista or higher
Microsoft .Net Framework 3.5 or higher
Thermo Scientific Xcalibur 2.0 or higher (or the free version, Thermo MSFileReader) should be installed if RAW files are searched


1. Launch pConfig (Fig. 8.21.3). Choose Tool in the menu and select Configuration.

2. Modify, add, or delete linkers. In pConfig, click tab Linkers. There are three types of operations:

a. Modify an existing linker. Double click on the linker in the table; the Linker Information dialog box will appear. Modify and click Update.
b. Add a new linker. Click the button Add; the Linker Information dialog box will appear. Fill in the necessary information and click Update.
c. Delete a linker. Choose the linker to be deleted, and click the button Delete.

Then, click the button Save to save all the modifications.

The Linker Information dialog box consists of five text boxes and one button:

d. In the first text box, enter the name of the linker.
e. In the second text box, enter the alpha site of the linker, which is the amino acid to which one end of the linker is covalently attached. Choices are the 20 amino acids, the N-terminus of a peptide "(", the C-terminus of a peptide ")", the N-terminus of a protein "[", or the C-terminus of a protein "]". Combination of different amino acids is also supported. For example, when using BS3, the second text box should be "[K", which means that it can react with lysine and the N-terminus of a protein.
f. In the third text box, enter the beta site of the linker, which is the amino acid to which the other end of the linker is covalently attached. The symbols are the same as those in the second text box.
g. In the fourth text box, enter the mass change (in Daltons, usually mass addition for chemical cross-linkers, occasionally mass deduction such as in the case of EDC) brought about by an inter-link or loop-link.
h. In the fifth text box, enter the mass change (in Daltons) brought about by a mono-link.

Enter all the required information and click the Update button.
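The five fields of the Linker Information dialog map naturally onto a small record. The Python sketch below is illustrative only: the class and function names are hypothetical, only the protein N-terminus symbol "[" is handled, and the two BS3 mass values in the usage note are commonly cited figures that are not given in this unit, so they should be checked against the Linkers table in pConfig.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Linker:
    name: str          # first text box
    alpha_sites: str   # second text box, e.g. "[K"
    beta_sites: str    # third text box
    link_mass: float   # fourth text box: inter-link or loop-link mass change (Da)
    mono_mass: float   # fifth text box: mono-link mass change (Da)


def site_allowed(site_spec, residue, at_protein_n_term=False):
    """True if the residue, or the protein N-terminus marked by '[',
    is listed in the site specification."""
    if residue in site_spec:
        return True
    return at_protein_n_term and "[" in site_spec


def cross_linked_mass(alpha_mass, beta_mass, linker):
    """Mass of a cross-linked pair: both peptides plus the linker mass change."""
    return alpha_mass + beta_mass + linker.link_mass


def mono_linked_mass(peptide_mass, linker):
    """Mass of a peptide carrying a mono-link."""
    return peptide_mass + linker.mono_mass
```

A BS3 entry could then be written as `Linker("BS3", "[K", "[K", 138.0680796, 156.0786442)`, with the site strings taken from the example in step e and the masses being the assumed values noted above.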

3. Modify, add, or delete modifications. In pConfig, click tab Modifications. The operations are similar to those for Linkers.

The Modification Information dialog box consists of five text boxes, one button, and one combo box.

a. In the first text box, enter the name of the modification.
b. In the second text box, enter the composition of this modification. This information is currently not in use. The format is Element1(Element1 number)Element2(Element2 number), for example, O(1) for Oxidation[M].
c. In the third text box, enter the mass difference (in Daltons) caused by the modification.
d. In the fourth text box, enter the amino acid to be modified. Choose from 20 amino acids. Combinations of different amino acids are also supported.
e. In the fifth text box, enter the neutral-loss mass of this modification. This field can be left empty if no neutral loss is expected.
f. Use the combo box named Position to specify at what position this modification occurs. Anywhere means no limitation, PepN-term and PepC-term mean the N-terminus and the C-terminus of a peptide, respectively, and ProN-term and ProC-term mean the N-terminus and the C-terminus of a protein, respectively.

Enter all the required information and click Update.
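The composition format in step b, Element1(Element1 number)Element2(Element2 number), is easy to check mechanically. The Python sketch below parses such a string and sums standard monoisotopic masses, which can serve as a sanity check on the mass difference entered in step c. The function names and the small element table are illustrative assumptions; pLink itself is stated not to use the composition field at present.

```python
import re

# Standard monoisotopic masses (Da) for a few common elements.
MONOISOTOPIC = {
    "H": 1.00782503, "C": 12.0, "N": 14.0030740,
    "O": 15.9949146, "P": 30.9737615, "S": 31.9720707,
}


def parse_composition(text):
    """Parse a string such as 'H(2)C(2)O(1)' into {'H': 2, 'C': 2, 'O': 1}."""
    pairs = re.findall(r"([A-Z][a-z]?)\((\d+)\)", text)
    # Reject input that contains anything besides Element(count) groups.
    if "".join(f"{el}({n})" for el, n in pairs) != text:
        raise ValueError(f"malformed composition: {text!r}")
    composition = {}
    for element, count in pairs:
        composition[element] = composition.get(element, 0) + int(count)
    return composition


def composition_mass(composition):
    """Monoisotopic mass change implied by an element composition."""
    return sum(MONOISOTOPIC[el] * n for el, n in composition.items())
```

For the example given in step b, `composition_mass(parse_composition("O(1)"))` returns the mass of one oxygen atom, which is the value expected in the third text box for an oxidation.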


4. Modify, add or delete enzymes. In pConfig, click the tab Enzymes. The operations are similar to those for Linkers. The Enzyme Information dialog box consists of three text boxes and one combo box.

a. The first text box is the name of the enzyme.
b. The second text box is the cleavage site. Choose one or more from 20 amino acids.
c. The third text box is used to define the site(s) to ignore. When the cleavage site is followed by a to-be-ignored site, it is no longer cleavable. This field can be left empty.
d. The combo box specifies the type of the enzyme. N-term means that this enzyme will cut on the N-terminal side of the cleavage-site amino acid, and C-term means that this enzyme will cut on the C-terminal side of the cleavage-site amino acid.

Enter all the required information, and click Update.

BASIC PROTOCOL 2

BROWSING pLink RESULTS

When a pLink search is done, an HTML Web page will open automatically to show the search results (Fig. 8.21.5). Results are also provided in CSV format (Fig. 8.21.6) for the convenience of browsing, searching, sorting, and importing into other programs. In addition, a program called pLabel is provided for spectral labeling (Fig. 8.21.7), and the details are described in Basic Protocol 3.

Necessary Resources

Hardware

Microsoft Windows PC or workstation

Software

Microsoft Windows Vista or higher
Any Web browser
Microsoft Excel or any text editor

1. Open the report Web page.

The report Web page will open automatically after a successful pLink search.


Figure 8.21.5 An HTML result report.


Figure 8.21.6 A CSV result report.

Figure 8.21.7 An example of the use of pLabel for spectral labeling and manual annotation. The upper panel below the menu displays the annotated spectrum and a summary of matched b/y ions along the identified cross-linked peptides. The lower panel displays various types of information associated with this annotated spectrum.

If the report Web page is accidentally closed, it can be recovered from the stored file in the task folder. Choose File in the menu, and select Open Task Location. In the folder that opens, the report Web page can be found under the name general.html. Double click it to open.

2. Inspect the report Web page. It consists of five parts:


a. Cross-Linked Results. Table 1 on the report Web page shows the cross-link results after filtering (FDR, Score Threshold, and Precursor Filter Tolerance), including the number of identified spectra, number of peptide pairs, and number of unique cross-linked residue pairs (linked sites). Click on individual numbers in the table for more details.
b. All Results. The results are shown in four categories: cross-linked, loop-linked, mono-linked, and un-linked linear peptides. Table 2 and Table 3 display the number of identified spectra and the number of identified peptides or peptide pairs for each category. Table 4 shows how many pairs of unique cross-linked sites are identified in the inter-link or loop-link form. Click on individual numbers in the tables for more information.
c. FDR Curve. This figure shows the relationship between the FDR values and the spectral counts for inter-protein cross-links, intra-protein cross-links, loop-links, mono-links, and peptides not modified by cross-linkers (common peptides). If a category has too few identification results, it may not have a visible curve in the figure.
d. Precursor Error Distribution. The x axis of the figure shows the precursor mass error (in Daltons) and the y axis shows the value of -lg(Score). Every dot in the figure represents an identified spectrum. Blue dots are the target-target results, green dots are the target-decoy results, and red dots are the decoy-decoy results.
e. Running Time. Table 5 reports the time cost of the identification task under inspection.

3. Optional: Inspect the results in comma-separated value (CSV) format.

a. Open the CSV-format results. Choose File from the menu bar and select Open Task Location. In the search task folder that opens up, there is a subfolder named reports, which holds all the CSV files.
b. Examine the results in different cross-link categories: common (un-linked linear peptides), mono-link, loop-link, and cross-link. For each type, separate CSV files are generated at the level of spectra, peptides, and cross-linked sites.
c. Read a specified CSV file. The CSV file can be opened using any text editor. For the CSV files of spectra, peptides, and linked sites, the columns are different. The title of the spectrum, charge, precursor mass, and peptide mass are listed, along with other basic information such as score. The smaller the score, the better the peptide-spectrum match (PSM). Further details can be found in Guidelines for Understanding Results.

BASIC PROTOCOL 3

USING pLabel TO ANNOTATE PEPTIDE-SPECTRUM MATCHES

pLabel is a mass-spectrum peak-labeling software tool developed for proteomics research (http://pfind.ict.ac.cn/software/pLabel/index.html). It is currently included in the pLink program kit, and can be used as the annotation tool for cross-linked peptides (Fig. 8.21.7).

Necessary Resources

Hardware

Microsoft Windows PC or workstation

Software

pLabel (part of pLink program kit; see Support Protocol 2 for installation)

1. Open pLabel. Open the pLink installation path. pLabel is in the folder bin. Double click pLabel.exe to open the program.


2. Open the *.pLabel file to load the identified results. Choose File from the menu bar, then click Load pLabel File.

The *.pLabel files for un-linked linear, mono-linked, loop-linked, and cross-linked peptides are located in the task folder. Load any *.pLabel file of interest.

3. Check the results. When a pLabel file is loaded, most of the parameters have been selected automatically. Use the mouse wheel or the right and left arrows to switch between different spectra.

Moreover, manual adjustment is supported. Below are some guidelines for setting the parameters. The parameters are at the bottom of the panel, in the parameters setting area.

a. SEQ: Type in a peptide sequence; only characters A to Z and a to z are allowed. A cross-linked peptide pair should be presented as two peptide sequences linked by a hyphen (-). For example, VSEMLSTLDGAAYIER is an un-linked linear peptide sequence and VSEMLSTLDK-GAAYIER denotes a pair of cross-linked peptides.
b. Precursor Info: Precursor mass (MH+), precursor charge, and precursor mass deviation are listed here. They cannot be edited.
c. CID/ETD: Two fragmentation methods: CID stands for collision-induced dissociation and ETD stands for electron-transfer dissociation. For HCD (higher-energy C-trap dissociation), choose CID.
d. Display: Information on matched peaks.
e. Normal/Cross-link: Click to switch between different cross-link types. For un-linked linear peptides, choose Normal; for others, choose XLink. When XLink is chosen, three drop-down boxes become available on the right. The first one is used to switch between mono-, loop-, and cross-links; the second one is used to select the linker; the third one is used to change cross-linking sites on a peptide if there is more than one potential site on it. The linker can be edited by clicking the button XL-Reagent.
f. TOL: Only peaks satisfying the set mass tolerance are taken into account.
g. Threshold: Only peaks above the set threshold (in percentage of the base peak) are taken into account.
h. Mass Measurement: Mono/average.
i. Match Type: In the Highest mode, the most intense peak within the mass tolerance window of a theoretical fragment ion is selected as the matched peak. In the Nearest mode, the peak with the closest mass is selected.

4. Optional: Save labeled spectra. With this function, users can save a picture file (.bmp, .jpg, .png, .tiff) of a labeled spectrum or one picture per spectrum for all the spectra listed in the pLabel file. Click File, select Save Spectrum As/Save All Spectra As, and follow the steps.
For more details about the use of pLabel, please read the user’s guide for pLabel online (http://pfind.ict.ac.cn/software/pLabel/Manual.pdf).
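Two of the pLabel settings described in step 3, the SEQ format and the Match Type modes, can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than pLabel's implementation: the function names are hypothetical, peaks are represented as (m/z, intensity) tuples, and the tolerance is taken to be in Daltons.

```python
import re


def parse_seq(seq):
    """Split a SEQ entry into one peptide (linear) or two peptides
    (cross-linked pair joined by a hyphen); only letters are allowed."""
    parts = seq.split("-")
    if len(parts) > 2 or not all(re.fullmatch(r"[A-Za-z]+", p) for p in parts):
        raise ValueError(f"invalid SEQ entry: {seq!r}")
    return parts


def match_peak(peaks, theoretical_mz, tol_da, mode="Highest"):
    """Pick the peak matched to a theoretical fragment ion, or None.

    Highest: most intense peak inside the tolerance window.
    Nearest: peak whose m/z is closest to the theoretical value.
    """
    window = [p for p in peaks if abs(p[0] - theoretical_mz) <= tol_da]
    if not window:
        return None
    if mode == "Highest":
        return max(window, key=lambda p: p[1])
    if mode == "Nearest":
        return min(window, key=lambda p: abs(p[0] - theoretical_mz))
    raise ValueError("mode must be 'Highest' or 'Nearest'")
```

The two modes differ only when several peaks fall inside the same window: a weak peak sitting close to the theoretical m/z wins in Nearest mode, while a stronger peak slightly further away wins in Highest mode.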

SUPPORT PROTOCOL 2

INSTALLING AND UNINSTALLING pLink

pLink is currently free. It can be downloaded at http://pfind.ict.ac.cn/software/pLink/index.html.

Necessary Resources

Hardware

Microsoft Windows PC or workstation, preferably with 64-bit architecture and a minimum of 4 GB RAM

Software

Microsoft Windows Vista or higher; Thermo Scientific Xcalibur 2.0 or higher (or Thermo MSFileReader) if RAW search is needed; Microsoft .Net Framework 3.5 or higher

1. Download and run the setup kit from the Web site. Follow the guide to finish the setup.

2. Click pLink.exe. If the program has not been activated, the License Dialog will pop up. Fill in the User Name, Institute/Company Name, Country/Region, and Email Address. Click the button Copy to Clipboard, and paste the content into an e-mail. Send it to [email protected] to get the activation file.

3. Place the activation file in the installation folder. Reopen pLink.exe to check if the program has been activated successfully.

4. To uninstall pLink, choose Uninstall in the Windows Control Panel, or rerun the setup kit and select Uninstall.

GUIDELINES FOR UNDERSTANDING RESULTS

pLink presents search results in three lists: list of spectra, list of peptides, and list of cross-linked sites. Listed below are most of the data fields, with explanations for the less obvious ones.

List of Spectra

The spectra list has 13 columns. The list is sorted by Score values in ascending order.

Order
The identifier of the spectrum.

Title
The title of the spectrum. This is read from the data file to distinguish the spectrum.

Charge
The precursor charge of the spectrum.

Precursor_MH
The precursor mass (MH) of the spectrum, which is obtained from the data file.

Peptide
The identified peptide sequence. For mono-linked results, the modified site is also given inside a pair of brackets; for loop-linked results, the two linked sites are both given; for cross-linked results, the two peptide sequences are separated with a dash; linked sites are given after the sequences.

Peptide_Type
There are different types: common (un-linked linear) peptides, mono-linked peptides, loop-linked peptides, and cross-linked peptide pairs.

Peptide_MH
The theoretical mass (MH) of an identified peptide sequence. This is calculated by pLink with the mass of the linker (different values for a cross-link and a mono-link) taken into consideration.

Modifications
This shows the modification(s) on an identified peptide.

Score
The PSM score given by pLink. The smaller the score, the better the PSM. Results are presented as follows: all the PSM results are sorted according to this score in ascending order. The false discovery rate is estimated as described previously (Yang et al., 2012a). Only the PSMs that pass the user-defined threshold will be reported. The Score value can be used in further filtering.

Precursor_Mass_Error(Da)
This is the mass difference in Daltons between the Precursor_MH and the Peptide_MH. The reliability of results can be further verified using this value.

Precursor_Mass_Error(ppm)
This is the mass difference in ppm between the Precursor_MH and the Peptide_MH. For high-resolution instruments, such as Q-Exactive, less than 10 ppm is normal when resolution is set higher than 70,000.

Proteins
The parent protein(s) of the identified peptides and the sites of cross-linking as in the protein sequence(s) are also given. When there are several candidate proteins, all the possible combinations are shown, separated by semicolons.

Protein_Type
For inter-link results only. Intra-protein means that the two peptides come from the same protein, and inter-protein means that they come from different proteins.

List of Peptides

This list is organized by peptides. The report of each identified peptide or peptide pair starts with one row of peptide information, followed by all the spectra that support this identification. This list has 13 columns, the same as those found in the spectra list.

List of Cross-Linked Sites

There are cases where different cross-linked peptides of overlapping sequences correspond to the same cross-link on the same protein(s). This can be due to a number of reasons, including cleavage sites missed by a protease and incomplete modification of a peptide. Therefore, it is necessary to organize the results by pairs of cross-linked sites on proteins. The report of each pair of cross-linked sites begins with one row of information about this pair, followed by all the spectra that support this identification. This list contains 15 columns, most of which can be found in the spectra list except the following two:

Unique_Peptide_Number
The number of peptides or peptide pairs that support this pair of linked sites.

Spectrum_Number
The number of the spectra that support this pair. All the spectra that support this pair are listed. The more spectra identified, the more reliable the identification of this pair. This number can also be used as a threshold for further filtering.
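The further filtering mentioned for Score, Precursor_Mass_Error(ppm), and Spectrum_Number can be done in a spreadsheet or scripted. The Python sketch below assumes a spectra-level CSV whose header row uses the field names listed above; the exact headers and the layout of the Proteins column in real pLink output may differ, and the function names are hypothetical.

```python
import csv
import io
from collections import Counter


def load_rows(csv_text):
    """Read CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def filter_spectra(rows, max_score=None, max_abs_ppm=None):
    """Keep rows within the score and precursor-error limits,
    sorted by Score in ascending order (smaller is better)."""
    kept = []
    for row in rows:
        score = float(row["Score"])
        ppm = float(row["Precursor_Mass_Error(ppm)"])
        if max_score is not None and score > max_score:
            continue
        if max_abs_ppm is not None and abs(ppm) > max_abs_ppm:
            continue
        kept.append(row)
    return sorted(kept, key=lambda r: float(r["Score"]))


def spectra_per_site(rows):
    """Count supporting spectra for each entry of the Proteins column."""
    return Counter(row["Proteins"] for row in rows)
```

A count from `spectra_per_site` plays the same role as Spectrum_Number in the list of cross-linked sites: site pairs supported by only one spectrum can be set aside for manual inspection in pLabel.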


COMMENTARY

Background Information


Chemical cross-linking of proteins coupled with mass spectrometry analysis can provide spatial structural information at the residue level. Although such information is limited to amino acids (usually lysine residues) that are accessible to cross-linkers used in a CXMS analysis, this approach is not limited by the size of a protein or protein complex, nor does it require a large amount of purified proteins as in crystallography and NMR. As such, structural information can be obtained from micrograms of samples in a couple of days using CXMS. Therefore, it is a very attractive and powerful approach to understanding protein folding and protein complex assembly. Research on the CXMS method can be traced back to the 1970s (Davies and Stark, 1970), but did not gather momentum until after 2005. The past 10 years have witnessed the most exciting development of the CXMS technology (Rappsilber, 2011; Stengel et al., 2012; Marion, 2013). It has completed the transformation from being a "technology never heard of" to being a "technology to watch," and further to being a "technology to use." Behind this rapid development are two driving forces:

(1) The invention of novel cross-linkers (Tang et al., 2005a; Chowdhury et al., 2006; Soderblom and Goshe, 2006; Kasper et al., 2007; Soderblom et al., 2007; Gardner et al., 2008; Krauth et al., 2009; Petrotchenko et al., 2009; Zhang et al., 2009; Gardner and Brodbelt, 2010; Muller et al., 2010; Trnka and Burlingame, 2010; Vellucci et al., 2010; Yang et al., 2010; Kao et al., 2011; Petrotchenko et al., 2011; Yan et al., 2011; Clifford-Nunn et al., 2012; Liu et al., 2012; Luo et al., 2012; Sohn et al., 2012).
(2) The development of software tools (Bennett et al., 2000; Young et al., 2000; Schilling et al., 2003; Tang et al., 2005a; Tang et al., 2005b; de Koning et al., 2006; Gao et al., 2006; Seebacher et al., 2006; Wefing et al., 2006; Anderson et al., 2007; Lee et al., 2007; Maiolica et al., 2007; Gao et al., 2008; Nadeau et al., 2008; Rinner et al., 2008; Singh et al., 2008; Xu et al., 2008; Lee, 2009; Choi et al., 2010; Chu et al., 2010; Hoopmann et al., 2010; McIlwain et al., 2010; Panchaud et al., 2010; Petrotchenko and Borchers, 2010; Shen et al., 2010; Xu et al., 2010; Du et al., 2011; Gotze et al., 2012; Soderberg et al., 2012; Walzthoeni et al., 2012; Yang et al., 2012a).

Data analysis has been a bottleneck for CXMS until recently (Tabb, 2012). Compared with regular linear peptides, cross-linked peptides are much harder to identify for the following reasons. First, search space explosion: taking a human cell lysate sample as an example, the number of possible tryptic peptides in the sample is 3.33 × 10^6, and the number of possible cross-linked peptide pairs is 4.44 × 10^12, causing a dramatic increase in random matches and search time. Second, complex fragment ions: for cross-linked peptides, each experimental spectrum contains fragment ions from two peptides, and theoretical spectra of candidate pairs also tend to have more fragment ions, which makes a random match more likely and a correct identification harder. Third, multiple types of cross-linking products: un-linked, mono-linked, loop-linked, and cross-linked peptide candidates all need to be considered for each spectrum because they are all possible in CXMS data. Lastly, there is the issue of how to control the quality of CXMS identification. This was a big problem until 2012 (Walzthoeni et al., 2012; Yang et al., 2012a), when FDR control methods were developed for CXMS.

There exist over a dozen software tools for CXMS. Many of the earlier ones operate by converting database search of cross-linked peptides into database search of linear peptides (Maiolica et al., 2007; Panchaud et al., 2010). Some require the use of special chemical cross-linkers that dissociate readily in the gas phase at the MS2 stage to release cross-linked peptides, which are then sequenced separately in MS3 (Tang et al., 2005a; Anderson et al., 2007; Hoopmann et al., 2010; Petrotchenko and Borchers, 2010; Yang et al., 2012b; Weisbrod et al., 2013). xQuest/xProphet relies on a pair of light- and heavy-isotope-labeled cross-linkers to differentiate cross-linked peptides from non-cross-linked ones and, for cross-linked peptides, to differentiate fragment ions containing the cross-linker from those lacking it (Rinner et al., 2008; Walzthoeni et al., 2012).
For pLink, the single most important feature is that it works with a variety of commonly available, inexpensive chemical cross-linkers without isotope labeling, such as BS3, DSS, sulfo-GMBS, and EDC (Yang et al., 2012a). Now, pLink has gone beyond CXMS and into the frontier of endogenous protein cross-links. The new pLink can be used to identify protein disulfide bonds and SUMOylation sites in complex samples with an appropriate FDR control.
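The search-space growth mentioned earlier in this section can be illustrated with a small calculation. The sketch below is only an upper-bound estimate under a simplifying assumption that is not from the text: every peptide may pair with every peptide, including itself. The function name pair_count is illustrative. For 3.33 × 10^6 peptides this bound is on the same order of magnitude as the figure quoted above. The exact count depends on restrictions that are not modeled here, such as which peptides carry a linkable residue.

```python
def pair_count(n_peptides):
    """Upper bound on unordered peptide pairs, including self-pairs."""
    return n_peptides * (n_peptides + 1) // 2


# The number of candidate pairs grows roughly with the square of the
# number of peptides, so a 10-fold larger database costs about 100-fold more.
for n in (1_000, 100_000, 3_330_000):
    print(f"{n:>9,} peptides -> {pair_count(n):,} candidate pairs")
```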


Critical Parameters

The path to the MS/MS data and the path to the protein sequence database must both be correct for a pLink search to succeed. It is surprising how often a search goes on forever because of an erroneous file path, so verify the file paths carefully before the search starts.

The other parameters determine the quality of the identification results. The settings for instrument, enzymes, precursor and fragment tolerances, modifications, linker, and filter methods all influence the final results. Some important parameters are discussed here.

The instrument setting determines the fragment ions considered by the search engine. For HCD, b-, y-, and a-ions, internal ions, and fragment ions containing a cleaved linker are considered. For ETD, c- and z-ions are considered. Details can be found in instrument.ini in the installation folder. Specify the correct fragmentation method to obtain correct results.

The mass tolerance values for precursors and fragment ions should be appropriate for the data being analyzed. These parameters determine the maximal allowed difference between the theoretical mass and the observed mass. For high-resolution data, choose 10 to 50 ppm depending on the resolution setting. Users may first search with a large precursor mass tolerance, examine the Precursor Error Distribution in the analysis report, and then set a more stringent precursor tolerance.

Set enzyme-related parameters according to the sample processing conditions. The maximal missed cleavage number varies depending on the digestion efficiency. Be mindful that a search with no enzyme specificity against a large database takes a long time, often prohibitively long, and is therefore not recommended.

Set variable or differential modifications wisely based on the experimental conditions. Avoid irrelevant modifications, since they can expand the search space so much that correct results may fail to stand out. In a regular search, it is sensible to add only high-frequency modifications (for example, oxidation of methionine or deamidation of asparagine). If additional modifications must be considered, keep the total number of variable modifications small. Too many modifications not only make the search time unbearably long, but also reduce the sensitivity of identification.
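To make the ppm tolerance settings discussed above concrete, the following minimal sketch converts between a ppm tolerance and an absolute mass window using the usual definition of relative mass error. The function names and the example masses are illustrative assumptions, and the code does not reproduce pLink's internal matching logic.

```python
def ppm_error(observed_mass, theoretical_mass):
    """Relative mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def within_tolerance(observed_mass, theoretical_mass, tol_ppm=20.0):
    """True if the observed mass lies inside the +/- tol_ppm window."""
    return abs(ppm_error(observed_mass, theoretical_mass)) <= tol_ppm


def mass_window(theoretical_mass, tol_ppm=20.0):
    """Absolute (low, high) mass limits implied by a ppm tolerance."""
    delta = theoretical_mass * tol_ppm / 1e6
    return theoretical_mass - delta, theoretical_mass + delta


# Hypothetical precursor of 2000 Da: 20 ppm corresponds to about +/- 0.04 Da.
low, high = mass_window(2000.0, tol_ppm=20.0)
print(f"20 ppm window at 2000 Da: {low:.4f} to {high:.4f}")
print("error of a 2000.03 Da observation (ppm):", round(ppm_error(2000.03, 2000.0), 2))
```

Because the window scales with mass, the same ppm setting is wider in absolute terms for the heavier precursors typical of cross-linked peptide pairs.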

The filtering criteria also influence the final results. The false discovery rate is controlled using the extended target-decoy algorithm for cross-linked peptides (Walzthoeni et al., 2012; Yang et al., 2012a), and is set to 1% by default. If more results are wanted for manual evaluation, the FDR can be increased, for example, to 5%. More results will be returned, but with lower confidence. Score Threshold and Precursor Filter Tolerance are also adjustable. Another critical parameter is Separate/Global. Search results of different types may be filtered separately in the Separate mode, or together in the Global mode. Compared to the Global mode, the Separate mode usually returns more intra-protein cross-links and fewer inter-protein cross-links.
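The idea behind target-decoy filtering of cross-linked peptide pairs can be sketched as follows. Among the matches above a score cutoff, count target-target (TT), target-decoy (TD), and decoy-decoy (DD) pairs, and estimate the FDR as (TD - DD)/TT, the form commonly associated with the cited target-decoy papers. Whether pLink's implementation matches this sketch in every detail is an assumption. The function names, the "higher score is better" convention, and the example matches are illustrative only.

```python
def estimate_fdr(n_tt, n_td, n_dd):
    """Estimated FDR from counts of TT, TD, and DD matches."""
    if n_tt == 0:
        return 0.0
    # Clamp at zero: DD can exceed TD by chance in small samples.
    return max(0.0, (n_td - n_dd) / n_tt)


def score_cutoff(matches, max_fdr=0.01):
    """Lowest score at which the estimated FDR is still <= max_fdr.

    `matches` is a list of (score, label) tuples with label in
    {"TT", "TD", "DD"}; higher scores are assumed to be better.
    Returns None if no cutoff satisfies the threshold.
    """
    counts = {"TT": 0, "TD": 0, "DD": 0}
    cutoff = None
    for score, label in sorted(matches, key=lambda m: m[0], reverse=True):
        counts[label] += 1
        fdr = estimate_fdr(counts["TT"], counts["TD"], counts["DD"])
        if counts["TT"] > 0 and fdr <= max_fdr:
            cutoff = score
    return cutoff


# Hypothetical scored matches.
example = [(9.0, "TT"), (8.0, "TT"), (7.0, "TD"), (6.0, "TT")]
print(score_cutoff(example, max_fdr=0.05))
```

Raising max_fdr, as described above for manual evaluation, moves the cutoff down the ranked list and admits more matches at lower confidence.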

Troubleshooting

Incorrect parameters may cause unknown errors. Choose appropriate parameters; when in doubt, leave them at their defaults.

Using RAW-format data without Thermo Scientific Xcalibur (or MSFileReader) installed, or with only an older version installed, may cause an unknown error. When MGF-format data are used, the m/z values are assumed to represent centroid peaks, and each spectrum is expected to be delimited by the standard "BEGIN IONS" and "END IONS" lines. Charge information is not needed, since it will be re-determined by the program.

Operating system–related limitations may also affect the search flow. On a 32-bit system, a single program may use roughly 2 GB of memory at most. When this limit is reached, the program will crash; reducing the number of cores used in the search will help.

pLink generates temporary files in the task folder during a search, and writes the report files after the search. If there is not enough disk space, or the program is not given permission to write to disk, these operations will fail. Make sure that enough disk space is available and that the program has sufficient permissions.

A hidden problem is the upper limit on the length of a path. The Windows system has a path length limit of 256 characters. When the path to a task location is already long (for example, due to many layers of folders or a long folder name), pLink will not be able to generate search files under it. In such a case, shorten the path. Alternatively, if data files have very long names, make their names shorter.
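Several of the problems above (wrong file paths, an unwritable task folder, low disk space, overly long paths) can be checked before a search is started. The following is a minimal sketch of such a pre-flight check; it is not part of pLink. The function name preflight, the 1 GB free-space threshold, the example paths, and the way the longest generated path is approximated are all assumptions made for illustration. The 256-character figure is the limit quoted in the text.

```python
import os
import shutil

MAX_PATH_LEN = 256  # path length limit quoted in the text


def preflight(task_dir, input_files, min_free_bytes=1024 ** 3):
    """Return a list of problems found; an empty list means none detected."""
    problems = []
    for path in input_files:
        if not os.path.isfile(path):
            problems.append(f"missing input file: {path}")
    if not os.path.isdir(task_dir):
        problems.append(f"task folder does not exist: {task_dir}")
        return problems
    if not os.access(task_dir, os.W_OK):
        problems.append(f"task folder is not writable: {task_dir}")
    if shutil.disk_usage(task_dir).free < min_free_bytes:
        problems.append(f"low disk space under: {task_dir}")
    # The names of generated files are not known here, so approximate the
    # longest output path as task folder + separator + longest input name.
    longest_name = max((len(os.path.basename(p)) for p in input_files), default=0)
    if len(os.path.abspath(task_dir)) + 1 + longest_name > MAX_PATH_LEN:
        problems.append("paths under the task folder may exceed the length limit")
    return problems


# Hypothetical paths; replace them with the real task folder and data files.
for problem in preflight("plink_task", ["sample.mgf", "proteins.fasta"]):
    print("WARNING:", problem)
```

An empty result does not guarantee a successful search; it only rules out the specific problems checked here.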


Understanding Search Parameters

Data File List: A list of mass spectrometry data files (RAW, MGF, or MS2 file format) to be searched (Fig. 8.21.1). Use Add, Delete, and Clear to configure. By default, it is empty.

Core Number: The number of CPU (central processing unit) cores to be used (Fig. 8.21.2). pLink can use multiple cores to accelerate the search on a multi-core computer. By default, it is 4 if the computer has more than 4 cores.

Database: The file containing protein sequences in FASTA format (Fig. 8.21.2). Protein sequence files can be downloaded from Uniprot (http://www.uniprot.org/) or other sites. This parameter can be configured using pConfig.

Missed Cleavage Number: Due to varying degrees of completeness of digestion, not all protease cleavage sites in a protein are cleaved, so some resulting peptides contain one or more protease cleavage sites. Missed Cleavage Number is the maximal number of missed cleavage sites considered on each peptide (Fig. 8.21.2). Set this parameter based on past experience. If this number is too large, the number of identifications may decrease. This parameter is 3 by default.

Enzymes: The protease(s) used to prepare samples (Fig. 8.21.2). Most of the common enzymes can be found in pLink, and enzymes can be customized using pConfig. See Support Protocol 1 for more details.

Precursor Tolerance: The error window considered for the experimental precursor mass (Fig. 8.21.2). By default, it is set to 20 ppm.

Fragment Tolerance: The error window considered for the fragment ion mass (Fig. 8.21.2). By default, it is set to 20 ppm.

Linker: The chemical cross-linker used (Fig. 8.21.2). This can also be customized using pConfig; see Support Protocol 1 for more details. By default, the linker is set to BS3.

Modifications: The post-translational modification(s) considered on peptides (Fig. 8.21.2). Select known or potential modifications here. There are no modifications by default.

FDR threshold: Some mass spectra can be matched to peptides in the database by random chance, leading to false identifications in the results. We use an extended target-decoy algorithm (Walzthoeni et al., 2012; Yang et al., 2012a) to control the false discovery rate. This parameter is the upper limit of FDR (Fig. 8.21.2). The lower the FDR threshold, the fewer the false identifications, usually at the cost of fewer true identifications. This parameter is 1% by default.

Acknowledgements

This work was funded by the National Natural Science Foundation of China (21475141), the National Scientific Instrumentation Grant Program (2011YQ09000506 to M.-Q.D.), the Ministry of Science and Technology of China (973 grants 2013CB911203, 2012CB910602 to R.-X. S. and 2010CB912701 to S.-M. H.), the CAS Knowledge Innovation Program (ICT-20126033 to S.-M. H.), and the municipal government of Beijing.

Literature Cited

Anderson, G.A., Tolic, N., Tang, X., Zheng, C., and Bruce, J.E. 2007. Informatics strategies for large-scale novel cross-linking analysis. J. Proteome. Res. 6:3412-3421.
Bennett, K.L., Kussmann, M., Bjork, P., Godzwon, M., Mikkelsen, M., Sorensen, P., and Roepstorff, P. 2000. Chemical cross-linking with thiol-cleavable reagents combined with differential mass spectrometric peptide mapping: A novel approach to assess intermolecular protein contacts. Protein Sci. 9:1503-1518.
Choi, S., Jeong, J., Na, S., Lee, H.S., Kim, H.Y., Lee, K.J., and Paek, E. 2010. New algorithm for the identification of intact disulfide linkages based on fragmentation characteristics in tandem mass spectra. J. Proteome. Res. 9:626-635.
Chowdhury, S.M., Munske, G.R., Tang, X., and Bruce, J.E. 2006. Collisionally activated dissociation and electron capture dissociation of several mass spectrometry-identifiable chemical crosslinkers. Anal. Chem. 78:8183-8193.
Chu, F., Baker, P.R., Burlingame, A.L., and Chalkley, R.J. 2010. Finding chimeras: A bioinformatics strategy for identification of cross-linked peptides. Mol. Cell Proteomics 9:25-31.
Clifford-Nunn, B., Showalter, H.D., and Andrews, P.C. 2012. Quaternary diamines as mass spectrometry cleavable crosslinkers for protein interactions. J. Am. Soc. Mass. Spectrom. 23:201-212.
Davies, G.E. and Stark, G.R. 1970. Use of dimethyl suberimidate, a cross-linking reagent, in studying the subunit structure of oligomeric proteins. Proc. Natl. Acad. Sci. U.S.A. 66:651-656.
de Koning, L.J., Kasper, P.T., Back, J.W., Nessen, M.A., Vanrobaeys, F., Van Beeumen, J., Gherardi, E., de Koster, C.G., and de Jong, L. 2006. Computer-assisted mass spectrometric analysis of naturally occurring and artificially introduced cross-links in proteins and protein complexes. FEBS. J. 273:281-291.
Du, X., Chowdhury, S.M., Manes, N.P., Wu, S., Mayer, M.U., Adkins, J.N., Anderson, G.A., and Smith, R.D. 2011. Xlink-identifier: An automated data analysis platform for confident identifications of chemically cross-linked peptides using tandem mass spectrometry. J. Proteome. Res. 10:923-931.
Epshtein, V., Kamarthapu, V., McGary, K., Svetlov, V., Ueberheide, B., Proshkin, S., Mironov, A., and Nudler, E. 2014. UvrD facilitates DNA repair by pulling RNA polymerase backwards. Nature 505:372-377.
Gao, Q., Xue, S., Doneanu, C.E., Shaffer, S.A., Goodlett, D.R., and Nelson, S.D. 2006. Procrosslink. Software tool for protein cross-linking and mass spectrometry. Anal. Chem. 78:2145-2149.
Gao, Q., Xue, S., Shaffer, S.A., Doneanu, C.E., Goodlett, D.R., and Nelson, S.D. 2008. Minimize the detection of false positives by the software program DetectShift for 18O-labeled cross-linked peptide analysis. Eur. J. Mass Spectrom. (Chichester, Eng.) 14:275-280.
Gardner, M.W. and Brodbelt, J.S. 2010. Preferential cleavage of N-N hydrazone bonds for sequencing bis-arylhydrazone conjugated peptides by electron transfer dissociation. Anal. Chem. 82:5751-5759.

Gardner, M.W., Vasicek, L.A., Shabbir, S., Anslyn, E.V., and Brodbelt, J.S. 2008. Chromogenic cross-linker for the characterization of protein structure by infrared multiphoton dissociation mass spectrometry. Anal. Chem. 80:4807-4819.
Gotze, M., Pettelkau, J., Schaks, S., Bosse, K., Ihling, C.H., Krauth, F., Fritzsche, R., Kuhn, U., and Sinz, A. 2012. StavroX: A software for analyzing crosslinked products in protein interaction studies. J. Am. Soc. Mass. Spectrom. 23:76-87.
Hoopmann, M.R., Weisbrod, C.R., and Bruce, J.E. 2010. Improved strategies for rapid identification of chemically cross-linked peptides using protein interaction reporter technology. J. Proteome. Res. 9:6323-6333.
Kao, A., Chiu, C.L., Vellucci, D., Yang, Y., Patel, V.R., Guan, S., Randall, A., Baldi, P., Rychnovsky, S.D., and Huang, L. 2011. Development of a novel cross-linking strategy for fast and accurate identification of cross-linked peptides of protein complexes. Mol. Cell Proteomics 10:M110 002212.
Kasper, P.T., Back, J.W., Vitale, M., Hartog, A.F., Roseboom, W., de Koning, L.J., van Maarseveen, J.H., Muijsers, A.O., de Koster, C.G., and de Jong, L. 2007. An aptly positioned azido group in the spacer of a protein cross-linker for facile mapping of lysines in close proximity. Chembiochem 8:1281-1292.
Krauth, F., Ihling, C.H., Ruttinger, H.H., and Sinz, A. 2009. Heterobifunctional isotope-labeled amine-reactive photo-cross-linker for structural investigation of proteins by matrix-assisted laser desorption/ionization tandem time-of-flight and electrospray ionization LTQ-Orbitrap mass spectrometry. Rapid Commun. Mass Spectrom. 23:2811-2818.
Lee, Y.J. 2009. Probability-based shotgun crosslinking sites analysis. J. Am. Soc. Mass. Spectrom. 20:1896-1899.
Lee, Y.J., Lackner, L.L., Nunnari, J.M., and Phinney, B.S. 2007. Shotgun cross-linking analysis for studying quaternary and tertiary protein structures. J. Proteome. Res. 6:3908-3917.
Liu, F., Wu, C., Sweedler, J.V., and Goshe, M.B. 2012. An enhanced protein crosslink identification strategy using CID-cleavable chemical crosslinkers and LC/MS(n) analysis. Proteomics 12:401-405.
Luo, J., Fishburn, J., Hahn, S., and Ranish, J. 2012. An integrated chemical cross-linking and mass spectrometry approach to study protein complex architecture and function. Mol. Cell Proteomics 11:M111 008318.
Maiolica, A., Cittaro, D., Borsotti, D., Sennels, L., Ciferri, C., Tarricone, C., Musacchio, A., and Rappsilber, J. 2007. Structural analysis of multiprotein complexes by cross-linking, mass spectrometry, and database searching. Mol. Cell Proteomics 6:2200-2211.
Marion, D. 2013. An introduction to biological NMR spectroscopy. Mol. Cell Proteomics 12:3006-3025.
McIlwain, S., Draghicescu, P., Singh, P., Goodlett, D.R., and Noble, W.S. 2010. Detecting cross-linked peptides by searching against a database of cross-linked peptide pairs. J. Proteome. Res. 9:2488-2495.
Muller, M.Q., Dreiocker, F., Ihling, C.H., Schafer, M., and Sinz, A. 2010. Cleavable cross-linker for protein structure analysis: Reliable identification of cross-linking products by tandem MS. Anal. Chem. 82:6958-6968.
Nadeau, O.W., Wyckoff, G.J., Paschall, J.E., Artigues, A., Sage, J., Villar, M.T., and Carlson, G.M. 2008. CrossSearch, a user-friendly search engine for detecting chemically cross-linked peptides in conjugated proteins. Mol. Cell Proteomics 7:739-749.
Panchaud, A., Singh, P., Shaffer, S.A., and Goodlett, D.R. 2010. xComb: A cross-linked peptide database approach to protein-protein interaction analysis. J. Proteome. Res. 9:2508-2515.
Petrotchenko, E.V. and Borchers, C.H. 2010. ICC-CLASS: Isotopically-coded cleavable crosslinking analysis software suite. BMC. Bioinformatics 11:64.
Petrotchenko, E.V., Serpa, J.J., and Borchers, C.H. 2011. An isotopically coded CID-cleavable biotinylated cross-linker for structural proteomics. Mol. Cell Proteomics 10:M110 001420.
Petrotchenko, E.V., Xiao, K., Cable, J., Chen, Y., Dokholyan, N.V., and Borchers, C.H. 2009. BiPS, a photocleavable, isotopically coded, fluorescent cross-linker for structural proteomics. Mol. Cell Proteomics 8:273-286.
Plocinski, P., Laubitz, D., Cysewski, D., Stodus, K., Kowalska, K., and Dziembowski, A. 2014. Identification of protein partners in mycobacteria using a single-step affinity purification method. PLoS One 9:e91380.
Rappsilber, J. 2011. The beginning of a beautiful friendship: Cross-linking/mass spectrometry and modelling of proteins and multi-protein complexes. J. Struct. Biol. 173:530-540.
Rinner, O., Seebacher, J., Walzthoeni, T., Mueller, L.N., Beck, M., Schmidt, A., Mueller, M., and Aebersold, R. 2008. Identification of cross-linked peptides from large sequence databases. Nat. Methods 5:315-318.
Schilling, B., Row, R.H., Gibson, B.W., Guo, X., and Young, M.M. 2003. MS2Assign, automated assignment and nomenclature of tandem mass spectra of chemically crosslinked peptides. J. Am. Soc. Mass. Spectrom. 14:834-850.
Seebacher, J., Mallick, P., Zhang, N., Eddes, J.S., Aebersold, R., and Gelb, M.H. 2006. Protein cross-linking analysis using mass spectrometry, isotope-coded cross-linkers, and integrated computational data processing. J. Proteome. Res. 5:2270-2282.
Shen, Y., Tolic, N., Purvine, S.O., and Smith, R.D. 2010. Identification of disulfide bonds in protein proteolytic degradation products using de novo-protein unique sequence tags approach. J. Proteome. Res. 9:4053-4060.
Singh, P., Shaffer, S.A., Scherl, A., Holman, C., Pfuetzner, R.A., Larson Freeman, T.J., Miller, S.I., Hernandez, P., Appel, R.D., and Goodlett, D.R. 2008. Characterization of protein cross-links via mass spectrometry and an open-modification search strategy. Anal. Chem. 80:8799-8806.
Soderberg, C.A., Lambert, W., Kjellstrom, S., Wiegandt, A., Wulff, R.P., Mansson, C., Rutsdottir, G., and Emanuelsson, C. 2012. Detection of crosslinks within and between proteins by LC-MALDI-TOFTOF and the software FINDX to reduce the MSMS-data to acquire for validation. PLoS One 7:e38927.
Soderblom, E.J. and Goshe, M.B. 2006. Collision-induced dissociative chemical cross-linking reagents and methodology: Applications to protein structural characterization using tandem mass spectrometry analysis. Anal. Chem. 78:8059-8068.
Soderblom, E.J., Bobay, B.G., Cavanagh, J., and Goshe, M.B. 2007. Tandem mass spectrometry acquisition approaches to enhance identification of protein-protein interactions using low-energy collision-induced dissociative chemical crosslinking reagents. Rapid Commun. Mass. Spectrom. 21:3395-3408.
Sohn, C.H., Agnew, H.D., Lee, J.E., Sweredoski, M.J., Graham, R.L., Smith, G.T., Hess, S., Czerwieniec, G., Loo, J.A., Heath, J.R., Deshaies, R.J., and Beauchamp, J.L. 2012. Designer reagents for mass spectrometry-based proteomics: Clickable cross-linkers for elucidation of protein structures and interactions. Anal. Chem. 84:2662-2669.
Stengel, F., Aebersold, R., and Robinson, C.V. 2012. Joining forces: Integrating proteomics and cross-linking with the mass spectrometry of intact complexes. Mol. Cell Proteomics 11:R111 014027.

Tabb, D.L. 2012. Evaluating protein interactions through cross-linking mass spectrometry. Nat. Methods 9:879-881.
Tang, X., Munske, G.R., Siems, W.F., and Bruce, J.E. 2005a. Mass spectrometry identifiable cross-linking strategy for studying protein-protein interactions. Anal. Chem. 77:311-318.
Tang, Y., Chen, Y., Lichti, C.F., Hall, R.A., Raney, K.D., and Jennings, S.F. 2005b. CLPM: A crosslinked peptide mapping algorithm for mass spectrometric analysis. BMC Bioinformatics 6:S9.
Trnka, M.J. and Burlingame, A.L. 2010. Topographic studies of the GroEL-GroES chaperonin complex by chemical cross-linking using diformyl ethynylbenzene: The power of high resolution electron transfer dissociation for determination of both peptide sequences and their attachment sites. Mol. Cell Proteomics 9:2306-2317.
Vellucci, D., Kao, A., Kaake, R.M., Rychnovsky, S.D., and Huang, L. 2010. Selective enrichment and identification of azide-tagged cross-linked peptides using chemical ligation and mass spectrometry. J. Am. Soc. Mass. Spectrom. 21:1432-1445.
Walzthoeni, T., Claassen, M., Leitner, A., Herzog, F., Bohn, S., Forster, F., Beck, M., and Aebersold, R. 2012. False discovery rate estimation for cross-linked peptides identified by mass spectrometry. Nat. Methods 9:901-903.
Wefing, S., Schnaible, V., and Hoffmann, D. 2006. SearchXLinks. A program for the identification of disulfide bonds in proteins from mass spectra. Anal. Chem. 78:1235-1241.
Weisbrod, C.R., Chavez, J.D., Eng, J.K., Yang, L., Zheng, C., and Bruce, J.E. 2013. In vivo protein interaction network identified with a novel real-time cross-linked peptide identification strategy. J. Proteome. Res. 12:1569-1579.
Wong, W., Webb, A.I., Olshina, M.A., Infusini, G., Tan, Y.H., Hanssen, E., Catimel, B., Suarez, C., Condron, M., Angrisano, F., Nebi, T., Kovar, D.R., and Baum, J. 2014. A mechanism for actin filament severing by malaria parasite actin depolymerizing factor 1 via a low affinity binding interface. J. Biol. Chem. 289:4043-4054.
Xu, H., Zhang, L., and Freitas, M.A. 2008. Identification and characterization of disulfide bonds in proteins and peptides from tandem MS data by use of the MassMatrix MS/MS search engine. J. Proteome. Res. 7:138-144.
Xu, H., Hsu, P.H., Zhang, L., Tsai, M.D., and Freitas, M.A. 2010. Database search algorithm for identification of intact cross-links in proteins and peptides using tandem mass spectrometry. J. Proteome. Res. 9:3384-3393.
Yan, F., Che, F.Y., Nieves, E., Weiss, L.M., Angeletti, R.H., and Fiser, A. 2011. Photo-assisted peptide enrichment in protein complex crosslinking analysis of a model homodimeric protein using mass spectrometry. Proteomics 11:4109-4115.
Yang, L., Tang, X., Weisbrod, C.R., Munske, G.R., Eng, J.K., von Haller, P.D., Kaiser, N.K., and Bruce, J.E. 2010. A photocleavable and mass spectrometry identifiable cross-linker for protein interaction studies. Anal. Chem. 82:3556-3566.
Yang, B., Wu, Y.J., Zhu, M., Fan, S.B., Lin, J., Zhang, K., Li, S., Chi, H., Li, Y.X., Chen, H.F., Luo, S.K., Ding, Y.H., Wang, L.H., Hao, Z., Xiu, L.Y., Chen, S., Ye, K., He, S.M., and Dong, M.Q. 2012a. Identification of cross-linked peptides from complex samples. Nat. Methods 9:904-906.
Yang, L., Zheng, C., Weisbrod, C.R., Tang, X., Munske, G.R., Hoopmann, M.R., Eng, J.K., and Bruce, J.E. 2012b. In vivo application of photocleavable protein interaction reporter technology. J. Proteome. Res. 11:1027-1041.
Young, M.M., Tang, N., Hempel, J.C., Oshiro, C.M., Taylor, E.W., Kuntz, I.D., Gibson, B.W., and Dollinger, G. 2000. High throughput protein fold identification by using experimental constraints derived from intramolecular crosslinks and mass spectrometry. Proc. Natl. Acad. Sci. U.S.A. 97:5802-5806.
Yuan, Z.F., Liu, C., Wang, H.P., Sun, R.X., Fu, Y., Zhang, J.F., Wang, L.H., Chi, H., Li, Y., Xiu, L.Y., Wang, W.P., and He, S.M. 2012. pParse: A method for accurate determination of monoisotopic peaks in high-resolution mass spectra. Proteomics 12:226-235.
Zhang, H., Tang, X., Munske, G.R., Tolic, N., Anderson, G.A., and Bruce, J.E. 2009. Identification of protein-protein interactions and topologies in living cells with chemical cross-linking and mass spectrometry. Mol. Cell Proteomics 8:409-420.

Internet Resources

http://pfind.ict.ac.cn/software/pLink/index.html
pLink.

http://pfind.ict.ac.cn/software/pLabel/index.html
pLabel.

http://pfind.ict.ac.cn/index.html
pFind Studio.

http://thermo-msfilereader.software.informer.com/
MSFileReader.

http://www.uniprot.org/
Uniprot.

