# dripTrain

## Usage:

./dripTrain.py [options] --psm-library <PSM library> --spectra <spectra file>

## Description:

DripTrain learns the model parameters for DRIP with the expectation-maximization (EM) algorithm, using a library of high-confidence PSMs together with the corresponding set of spectra. The learned parameters may then be used in dripSearch. If you use DRIP-trained parameters in your research, please cite:

John T. Halloran, Jeff A. Bilmes, and William S. Noble. "Learning Peptide-Spectrum Alignment Models for Tandem Mass Spectrometry". Thirtieth Conference on Uncertainty in Artificial Intelligence (UAI 2014). AUAI, Quebec City, Quebec, Canada, July 2014.
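To make the EM idea above concrete, here is a minimal, generic sketch that fits a one-dimensional Gaussian mixture by expectation-maximization. It is only an illustration of the technique: DRIP itself is a dynamic Bayesian network whose parameters are trained through GMTK, not this standalone mixture, and the function name `em_gaussian_mixture` and its arguments are hypothetical.

```python
import math


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Density of a univariate normal distribution evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gaussian_mixture(data, means, variances, weights, iterations=50, var_floor=1e-6):
    """Fit a 1-D Gaussian mixture with EM (hypothetical helper, illustration only).

    Starts from the given initial means, variances, and mixture weights and
    alternates E- and M-steps for a fixed number of iterations.
    """
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(iterations):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            if total == 0.0:
                # Every density underflowed; fall back to uniform responsibilities.
                resp.append([1.0 / k] * k)
            else:
                resp.append([p / total for p in joint])
        # M-step: re-estimate weights, means, and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            # A variance floor keeps a component from collapsing onto a single point.
            variances[j] = max(
                var_floor,
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
            )
    return means, variances, weights
```

For example, `em_gaussian_mixture([-0.1, 0.0, 0.1, 4.9, 5.0, 5.1], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5])` moves the two means toward the clusters near 0 and 5. Learning Gaussian means from training data, as DRIP does for low-resolution spectra, follows the same alternate-and-re-estimate pattern, although on a much richer model.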

## Input:

• --psm-library <PSM library> – Collection of high-confidence peptide-spectrum matches (PSMs). The file must be tab-delimited, with the fields Peptide, Scan, and Charge.
• --spectra <spectra file> – Corresponding ms2 spectra for the PSM library.
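A minimal sketch of reading a PSM library in the format described above, assuming the first row is a header naming the columns Peptide, Scan, and Charge (extra columns are ignored). The `PSM` type and `read_psm_library` function are hypothetical and are not part of dripTrain.

```python
import csv
from typing import Iterable, List, NamedTuple


class PSM(NamedTuple):
    """One peptide-spectrum match from the library (hypothetical container)."""
    peptide: str
    scan: int
    charge: int


REQUIRED_FIELDS = ("Peptide", "Scan", "Charge")


def read_psm_library(lines: Iterable[str]) -> List[PSM]:
    """Parse a tab-delimited PSM library that starts with a header row."""
    reader = csv.DictReader(lines, delimiter="\t")
    # fieldnames is None for empty input, so treat that as "no columns".
    missing = [f for f in REQUIRED_FIELDS if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"PSM library is missing column(s): {', '.join(missing)}")
    return [PSM(row["Peptide"], int(row["Scan"]), int(row["Charge"])) for row in reader]
```

Usage, with a hypothetical file name: `with open("train.psm", newline="") as handle: psms = read_psm_library(handle)`. Failing early on a missing column is easier to debug than a malformed training run.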

## Output:

The program writes the learned parameters to dripLearned.params by default. The name of the output file can be changed with the --dripTrain-file option.

## Options:

### Training parameters

• --dripTrain-file <string> – Name of GMTK output file. Default = dripLearned.params.
• --output-mean-file <string> – Name of output file for learned Gaussian means. Default = dripLearned.means.
• --output-covar-file <string> – Name of output file for learned Gaussian covariances. Default = dripLearned.covars.
• --high-res-ms2 <T|F> – Boolean; whether the search is over high-resolution ms2 (high-high) spectra. When this parameter is true, DRIP uses the real-valued masses of candidate peptides as its Gaussian means; for low-resolution ms2 (low-low or high-low), the observed m/z measurements are much less accurate, so the Gaussian means are instead learned from the training data. Default = False.
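To illustrate what "real-valued masses of candidate peptides" can look like, the sketch below computes singly charged b- and y-ion m/z values for an unmodified peptide from standard monoisotopic residue masses. It assumes, for illustration only, that these fragment m/z values play the role of the Gaussian means in high-resolution mode; DRIP's actual theoretical spectrum, charge-state handling, and treatment of modifications (such as C+57.02146) may differ. The name `by_ion_mz` is hypothetical.

```python
from typing import Dict, List, Tuple

# Monoisotopic residue masses (Da) of the 20 standard amino acids, unmodified.
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # mass of a proton (Da)
WATER = 18.010565  # monoisotopic mass of H2O (Da)


def by_ion_mz(peptide: str) -> Tuple[List[float], List[float]]:
    """Return singly charged (b1..b(n-1), y1..y(n-1)) m/z values for a peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions: List[float] = []
    prefix = 0.0
    for m in masses[:-1]:  # b ions grow from the N-terminus
        prefix += m
        b_ions.append(prefix + PROTON)
    y_ions: List[float] = []
    suffix = 0.0
    for m in reversed(masses[1:]):  # y ions grow from the C-terminus
        suffix += m
        y_ions.append(suffix + WATER + PROTON)
    return b_ions, y_ions
```

For "GA", this gives a single b ion near 58.0287 and a single y ion near 90.0550. With accurate high-resolution measurements, such computed values can be used directly; with low-resolution data, the observed peaks drift too far from them, which is why the means are learned instead.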
### Amino acid modifications

• --mods-spec <string> – The general form of a modification specification has three components, as exemplified by 1STY+79.966331. The three components are [max_per_peptide]residues[+/-]mass_change. In the example, max_per_peptide is 1, the residues are STY, and mass_change is +79.966331. To specify a static modification, omit the number preceding the amino acids; e.g., C+57.02146 specifies a static modification of 57.02146 Da to cysteine. Note that Tide allows at most one modification per amino acid. Also, the default modification (C+57.02146) is added to every mods-spec string unless an explicit C+0 is included. Default = C+57.02146.
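The mods-spec grammar above can be parsed with a short regular expression. This is a minimal sketch, not dripTrain's parser: it assumes multiple specifications are separated by commas, and it approximates the default rule by adding C+57.02146 only when no static cysteine modification (such as an explicit C+0) is present. The names `Mod`, `parse_mod`, and `parse_mods_spec` are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional

# [max_per_peptide]residues[+/-]mass_change, e.g. 1STY+79.966331 or C+57.02146
MOD_PATTERN = re.compile(r"^(\d*)([A-Z]+)([+-]\d+(?:\.\d+)?)$")
DEFAULT_STATIC_MOD = "C+57.02146"


class Mod(NamedTuple):
    max_per_peptide: Optional[int]  # None means a static modification
    residues: str
    mass_change: float


def parse_mod(spec: str) -> Mod:
    """Parse a single modification specification."""
    match = MOD_PATTERN.match(spec.strip())
    if match is None:
        raise ValueError(f"malformed modification specification: {spec!r}")
    count, residues, mass = match.groups()
    return Mod(int(count) if count else None, residues, float(mass))


def parse_mods_spec(spec: str) -> List[Mod]:
    """Parse a comma-separated mods-spec string (comma separation is an assumption)."""
    mods = [parse_mod(part) for part in spec.split(",") if part.strip()]
    # Add the default static cysteine modification unless C already has a static entry.
    if not any(m.max_per_peptide is None and "C" in m.residues for m in mods):
        mods.append(parse_mod(DEFAULT_STATIC_MOD))
    return mods
```

For example, `parse_mods_spec("1STY+79.966331")` yields a variable phosphorylation entry plus the default static cysteine modification, while `parse_mods_spec("C+0")` suppresses the default.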
• --cterm-peptide-mods-spec <string> – Specify peptide C-terminal modifications. See --nterm-peptide-mods-spec for syntax. Default = <empty>.
• --nterm-peptide-mods-spec <string> – Specify peptide N-terminal modifications. Like --mods-spec, this specification has three components, but with a slightly different syntax. The max_per_peptide can be either "1", in which case it defines a variable terminal modification, or missing, in which case the modification is static. The residues field indicates which amino acids are subject to the modification, with the residue X corresponding to any amino acid. Finally, mass_change is defined as for --mods-spec. Default = <empty>.
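A minimal sketch of the terminal-modification syntax described above: an optional leading "1" marks a variable modification, and the residue X matches any amino acid. The names `TerminalMod`, `parse_terminal_mod`, and `applies_to` are hypothetical and only illustrate the documented rules.

```python
import re
from typing import NamedTuple

# Optional "1" (variable), residues (X = any amino acid), signed mass change.
TERM_PATTERN = re.compile(r"^(1?)([A-Z]+)([+-]\d+(?:\.\d+)?)$")


class TerminalMod(NamedTuple):
    variable: bool  # True if max_per_peptide was "1", False if static
    residues: str
    mass_change: float


def parse_terminal_mod(spec: str) -> TerminalMod:
    """Parse an N- or C-terminal modification specification."""
    match = TERM_PATTERN.match(spec.strip())
    if match is None:
        raise ValueError(f"malformed terminal modification: {spec!r}")
    count, residues, mass = match.groups()
    return TerminalMod(count == "1", residues, float(mass))


def applies_to(mod: TerminalMod, peptide: str, terminus: str) -> bool:
    """Whether mod can occur at the 'n' or 'c' terminus of peptide."""
    residue = peptide[0] if terminus == "n" else peptide[-1]
    return "X" in mod.residues or residue in mod.residues
```

For instance, a static `X+42.010565` N-terminal specification applies to every peptide, whereas a variable `1Q-17.026549` specification applies only to peptides whose first residue is Q.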

## Example usage

• To learn parameters from a library of high-confidence PSMs with one static carbamidomethyl modification (C+57.02146), run in the download directory:
• To run the above and save the results to parameter file output.params, run in the download directory:
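The two examples above differ only in which documented options are passed. The sketch below assembles such a command line from the options listed in this page; the file names in it (train.psm, train.ms2) are hypothetical placeholders, not the files from the download, and the helper `drip_train_command` is not part of the tool.

```python
import shlex
from typing import List, Optional


def drip_train_command(psm_library: str, spectra: str,
                       mods_spec: Optional[str] = None,
                       output_file: Optional[str] = None) -> List[str]:
    """Build a dripTrain argument list from the documented options (hypothetical helper)."""
    argv = ["./dripTrain.py"]
    if mods_spec is not None:
        argv += ["--mods-spec", mods_spec]
    if output_file is not None:
        argv += ["--dripTrain-file", output_file]  # default is dripLearned.params
    argv += ["--psm-library", psm_library, "--spectra", spectra]
    return argv


if __name__ == "__main__":
    # Hypothetical file names; substitute the PSM library and spectra you actually use.
    print(shlex.join(drip_train_command("train.psm", "train.ms2",
                                        mods_spec="C+57.02146",
                                        output_file="output.params")))
```

Returning a list rather than a single string keeps each argument intact if it is later passed to `subprocess.run`.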

