./dripSearch.py [options] --spectra <spectra file> --digest-dir <protein database>
dripSearch uses a Dynamic Bayesian Network for Rapid Identification of Peptides
(DRIP) to identify peptides from tandem mass spectra. DRIP is used
primarily for its high peptide identification accuracy and for improved
features derived from peptide-spectrum matches, or PSMs (these features are used in
dripExtract). Model parameters may also be learned via
expectation-maximization (implemented in
utilized during search for improved accuracy.
If you use DRIP in your research, please cite:
John T. Halloran, Jeff A. Bilmes, and William S. Noble. "Learning Peptide-Spectrum Alignment Models for Tandem Mass Spectrometry". Thirtieth Conference on Uncertainty in Artificial Intelligence (UAI 2014). AUAI, Quebec City, Quebec, Canada, July 2014.
--spectra <spectra file>– The name of the file from which to parse the fragmentation spectra, in ms2 file format.
--digest-dir <dripDigest output directory>– Output directory of dripDigest (note, the protein database must be digested with dripDigest prior to running dripSearch). Default =
The following directories will be created:
log– directory containing DRIP results. If used in cluster mode (
--cluster-mode True), cluster search results are written to this directory. If used in standalone mode (
--cluster-mode False), GMTK output files are written to this directory.
encode– directory containing GMTK input files.
drip_collection– directory containing DRIP parameter files for GMTK.
--precursor-window <float>– Tolerance used for matching peptides to spectra. Peptides must be within +/- 'precursor-window' of the spectrum value. The precursor window units depend upon precursor-window-type. Default =
--precursor-window-type <Da|ppm>– Specify the units for the window that is used to select peptides around the precursor mass location, either in Daltons (
Da) or parts-per-million (
ppm). Default =
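To illustrate how the two window types differ, the sketch below converts a precursor window into absolute mass bounds. The function name `precursor_bounds` is hypothetical, and scaling the ppm tolerance by the spectrum's precursor mass is an assumption made for illustration; it is not taken from dripSearch's source.

```python
def precursor_bounds(precursor_mass, window, window_type="Da"):
    """Return (low, high) mass bounds for candidate peptides (illustrative only)."""
    if window_type == "Da":
        # The window is already an absolute mass tolerance in Daltons.
        delta = window
    elif window_type == "ppm":
        # Assumption: ppm is taken relative to the observed precursor mass.
        delta = precursor_mass * window * 1e-6
    else:
        raise ValueError("window_type must be 'Da' or 'ppm'")
    return precursor_mass - delta, precursor_mass + delta
```

Under this assumption, a 10 ppm window around a 1000 Da precursor spans about +/- 0.01 Da, whereas a 3 Da window spans +/- 3 Da regardless of the precursor mass.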
--charges <comma-separated-integers|all>– Precursor charges to search. To specify individual charges, give a comma-delimited list, e.g., “1,2,3” to search all spectra of charge 1, 2, or 3. Default =
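The accepted value format can be sketched as a small parser. The helper `parse_charges` is hypothetical; how dripSearch itself validates this option (for example, whether whitespace or a differently cased "all" is accepted) is not stated here, so those details are assumptions.

```python
def parse_charges(value):
    """Return None for 'all', else a sorted list of distinct integer charges."""
    if value.strip().lower() == "all":
        # None stands for "search every precursor charge".
        return None
    # Split on commas, ignore empty tokens, and drop duplicates.
    charges = sorted({int(tok) for tok in value.split(",") if tok.strip()})
    if not charges or charges[0] < 1:
        raise ValueError("charges must be positive integers or 'all'")
    return charges
```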
--high-res-ms2 <T|F>– boolean, whether the search is over high-res ms2 (high-high) spectra. When this parameter is true, DRIP uses the real-valued masses of candidate peptides as its Gaussian means; for low-res ms2 (low-low or high-low), the observed m/z measurements are much less accurate, so these Gaussian means are learned from training data. Default =
--high-res-gauss-dist <float>– m/z distance within which 99.9% of each m/z Gaussian's probability mass lies. Only available for high-res MS2 searches. Default=
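The relationship between such a distance and a Gaussian standard deviation can be worked out as follows. This is a sketch under the assumption that the distance is a symmetric half-width around the Gaussian mean; the exact conversion used by dripSearch is not stated here, and the name `sigma_from_coverage` is hypothetical.

```python
from statistics import NormalDist

def sigma_from_coverage(distance, coverage=0.999):
    """Standard deviation such that `coverage` of a Gaussian's mass
    lies within +/- `distance` of its mean (assumed symmetric interval)."""
    # For a symmetric interval, the upper bound sits at the
    # (0.5 + coverage / 2) quantile of the standard normal.
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    return distance / z
```

For 99.9% coverage the quantile is about 3.29 standard deviations, so an illustrative distance of 0.05 m/z units corresponds to a standard deviation of roughly 0.0152.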
--precursor-filter <T|F>– boolean, when true, filter out all peaks within 1.5 Da of the observed precursor mass. Default=
--decoys <T|F>– whether to create (shuffle target peptides) and search decoy peptides. Default =
--num-threads <integer>– the number of threads to run on a multithreaded CPU. If the supplied value is greater than the number of supported threads, the maximum number of supported threads is used. Multithreading is not supported for cluster use, as this is typically handled by the cluster job manager. Default =
--top-match <integer>– The number of PSMs per spectrum written to the output files. Default =
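Conceptually, this option keeps only the best-scoring candidates for each spectrum. A minimal sketch, assuming each PSM is a `(peptide, score)` pair and that a higher score is better; the function name and the example peptides are illustrative, not dripSearch output:

```python
import heapq

def top_matches(psms, top_match=1):
    """Keep the `top_match` highest-scoring PSMs for one spectrum.

    psms: iterable of (peptide, score) pairs; higher score assumed better.
    """
    return heapq.nlargest(top_match, psms, key=lambda psm: psm[1])
```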
--beam <integer>– K-beam width used to speed up inference. The default value of 0 means exact inference. Warning: identification accuracy may degrade significantly if the beam width is too small, i.e., beam < 100. Default =
--random-wait <integer>– randomly wait up to specified number of seconds before writing results back to NFS (for cluster use). Default =
--num-jobs <integer>– the number of jobs to run in parallel (for cluster use). Default =
--cluster-mode <T|F>– evaluate dripSearch prepared data as jobs on a cluster. Only set this to true once dripSearch has been run to prepare data for cluster use. Default =
--write-cluster-scripts <T|F>– write scripts to be submitted to the cluster queue. Only used when num-jobs > 1. Job outputs will be written to the log subdirectory of the current directory. Default =
--cluster-dir <string>– absolute path of the directory in which to run cluster jobs. Default =
--merge-cluster-results <T|F>– merge dripSearch cluster results collected in directory
log. Default =
--output <string>– output file to which both target and decoy results are written. Default =
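When dripSearch is driven from another script, the command line can be assembled programmatically. The sketch below only builds the argument list, following the usage line and the options listed above. The helper `build_dripsearch_argv`, the file names, and the option values are hypothetical, and booleans are written as `True`/`False` as in the `--cluster-mode True` example.

```python
def build_dripsearch_argv(spectra, digest_dir, **options):
    """Build an argument list for dripSearch.py (illustrative helper)."""
    argv = ["./dripSearch.py"]
    for name, value in options.items():
        # Map Python keyword names to flags, e.g. num_threads -> --num-threads.
        flag = "--" + name.replace("_", "-")
        if isinstance(value, bool):
            value = "True" if value else "False"
        argv += [flag, str(value)]
    # Required arguments go last, matching the usage line.
    argv += ["--spectra", spectra, "--digest-dir", digest_dir]
    return argv

# Hypothetical inputs; substitute real paths before running.
argv = build_dripsearch_argv(
    "data/test.ms2",
    "dripDigest-output",
    precursor_window=3.0,
    precursor_window_type="Da",
    num_threads=4,
)
```

The resulting list can be passed to `subprocess.run(argv, check=True)` once the digest directory has been produced by dripDigest.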
The following examples are available in
dripTrain first, as necessary.