./dripSearch.py [options] --spectra <spectra file> --digest-dir <dripDigest output directory>
DripSearch uses DRIP (a Dynamic Bayesian network for Rapid Identification of Peptides) to identify peptides from tandem mass spectra. DRIP is primarily used for its high peptide identification accuracy and for the improved features it derives for peptide-spectrum matches (PSMs); the latter are used by dripExtract. Model parameters may also be learned via expectation-maximization (implemented in dripTrain) and used during search to improve accuracy.
If you use DRIP in your research, please cite:
John T. Halloran, Jeff A. Bilmes, and William S. Noble. "Learning Peptide-Spectrum Alignment Models for Tandem Mass Spectrometry". Thirtieth Conference on Uncertainty in Artificial Intelligence (UAI 2014). AUAI, Quebec City, Quebec, Canada, July 2014.
--spectra <spectra file>
– The name of the file from which to parse the fragmentation spectra, in ms2 file format.
--digest-dir <dripDigest output directory>
– Output directory of dripDigest (note: the protein database must be digested with dripDigest prior to running dripSearch). Default = dripDigest-output.
The following directories will be created:
log
– Directory containing DRIP results. In cluster mode (--cluster-mode True), cluster search results are written to this directory; in standalone mode (--cluster-mode False), GMTK output files are written to this directory.
encode
– Directory containing GMTK input files.
drip_collection
– Directory containing DRIP parameter files for GMTK.
--precursor-window <float>
– Tolerance used for matching peptides to spectra. Peptides must be within +/- precursor-window of the spectrum's precursor mass. The units of the window depend upon --precursor-window-type. Default = 3.
--precursor-window-type <Da|ppm>
– Units of the window used to select candidate peptides around the precursor mass, either Daltons (Da) or parts-per-million (ppm). Default = Da.
--charges <comma-separated-integers|all>
– Precursor charges to search. To specify individual charges, list them comma-delimited, e.g., "1,2,3" to search all charge 1, 2, or 3 spectra. Default = All.
--high-res-ms2 <T|F>
– Boolean; whether the search is over high-resolution MS2 (high-high) spectra. When true, DRIP uses the real-valued masses of candidate peptides as its Gaussian means. For low-resolution MS2 (low-low or high-low) spectra, the observed m/z measurements are much less accurate, so these Gaussian means are instead learned from training data. Default = False.
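The --precursor-window and --precursor-window-type options above determine which candidate peptides are scored against each spectrum. The sketch below shows the arithmetic, assuming the window is centered on the spectrum's precursor mass and that a ppm window is converted to Daltons relative to that mass; the function names are hypothetical and not part of dripSearch.

```python
# Hypothetical sketch of precursor-window candidate selection (not dripSearch code).
# Assumes peptide masses and the precursor mass are in the same units (Daltons).

def window_halfwidth(precursor_mass: float, window: float, window_type: str = "Da") -> float:
    """Return the +/- half-width of the search window in Daltons."""
    if window_type == "Da":
        return window
    if window_type == "ppm":
        # ppm is relative to the precursor mass: 10 ppm of 1000 Da is 0.01 Da.
        return precursor_mass * window / 1e6
    raise ValueError(f"unknown window type: {window_type!r}")


def select_candidates(peptide_masses, precursor_mass, window=3.0, window_type="Da"):
    """Keep peptide masses within +/- the window of the precursor mass."""
    half = window_halfwidth(precursor_mass, window, window_type)
    return [m for m in peptide_masses if abs(m - precursor_mass) <= half]


if __name__ == "__main__":
    masses = [1000.0, 1002.5, 1003.5, 1000.005]
    print(select_candidates(masses, 1000.0))               # default 3 Da window
    print(select_candidates(masses, 1000.0, 10.0, "ppm"))  # 10 ppm window
```

A ppm window therefore widens in absolute terms as the precursor mass grows, while a Da window stays fixed.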
--high-res-gauss-dist <float>
– m/z distance from the mean within which 99.9% of each m/z Gaussian's probability mass lies. Only available for high-resolution MS2 searches. Default = 0.05.
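To see what --high-res-gauss-dist implies for the Gaussian's spread, the worked conversion below assumes the 99.9% interval is symmetric (+/- the distance around the mean). Under that assumption the standard deviation is the distance divided by the two-sided 99.9% z-value (about 3.29). How DRIP performs this conversion internally is not documented here, so treat the helper as a hypothetical illustration.

```python
from statistics import NormalDist


def sigma_from_gauss_dist(dist: float, coverage: float = 0.999) -> float:
    """Hypothetical helper: standard deviation such that `coverage` of a
    Gaussian's mass lies within +/- dist of its mean."""
    # Two-sided interval: (1 - coverage) / 2 of the mass lies in each tail,
    # so the upper bound sits at the 0.5 + coverage / 2 quantile.
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)  # about 3.29 for 99.9%
    return dist / z


if __name__ == "__main__":
    sigma = sigma_from_gauss_dist(0.05)  # default --high-res-gauss-dist
    g = NormalDist(mu=0.0, sigma=sigma)
    print(f"sigma = {sigma:.5f}")
    print(f"mass within +/-0.05 = {g.cdf(0.05) - g.cdf(-0.05):.4f}")
```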
--precursor-filter <T|F>
– Boolean; when true, filter out all peaks within 1.5 Da of the observed precursor mass. Default = False.
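The --precursor-filter option can be pictured as a simple peak filter. This is a hypothetical sketch, assuming peaks are (m/z, intensity) pairs and that the precursor value is on the same m/z scale as the peaks; whether dripSearch compares against precursor m/z or neutral mass, and whether the 1.5 Da boundary is inclusive, is not stated above.

```python
# Hypothetical sketch of precursor-peak filtering (not dripSearch code).

def filter_precursor_peaks(peaks, precursor, tol=1.5):
    """Drop (mz, intensity) peaks whose m/z lies within +/- tol of precursor.

    Assumes `precursor` and the peak m/z values share the same scale.
    """
    return [(mz, inten) for mz, inten in peaks if abs(mz - precursor) > tol]


if __name__ == "__main__":
    spectrum = [(500.0, 10.0), (748.0, 5.0), (749.5, 8.0), (750.2, 7.0), (752.5, 3.0)]
    print(filter_precursor_peaks(spectrum, precursor=750.5))
```

Removing these peaks keeps residual unfragmented precursor ions from being matched as if they were fragment ions.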
--decoys <T|F>
– Whether to create decoy peptides (by shuffling target peptides) and search them. Default = T.
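The --decoys option creates decoys by shuffling target peptides. The sketch below shows one common shuffling convention, keeping the N- and C-terminal residues fixed so that decoys retain the target's cleavage sites; this convention, the helper name, and the fixed random seed are assumptions for illustration, not confirmed details of dripSearch.

```python
import random


def shuffle_decoy(peptide: str, rng: random.Random) -> str:
    """Hypothetical helper: shuffle a peptide's interior residues,
    keeping both terminal residues in place."""
    if len(peptide) <= 3:
        return peptide  # at most one interior residue: nothing to shuffle
    interior = list(peptide[1:-1])
    rng.shuffle(interior)
    return peptide[0] + "".join(interior) + peptide[-1]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducible decoys
    targets = ["PEPTIDEK", "LLSAGVR", "AK"]
    print([shuffle_decoy(p, rng) for p in targets])
```

Because a decoy has the same residues as its target, it also has the same precursor mass, so each target and its decoy compete for the same spectra.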
--num-threads <integer>
– The number of threads to run on a multithreaded CPU. If the supplied value is greater than the number of supported threads, the maximum number of supported threads is used. Multithreading is not supported in cluster mode, since parallelism there is typically handled by the cluster job manager. Default = 1.
--top-match <integer>
– The number of PSMs per spectrum written to the output files. Default = 1.
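Conceptually, --top-match keeps the k best-scoring PSMs for each spectrum. The sketch below assumes PSMs arrive as hypothetical (spectrum_id, peptide, score) tuples and that higher scores are better; it is an illustration, not dripSearch's output code.

```python
import heapq
from collections import defaultdict


def top_psms(psms, top_match=1):
    """Hypothetical helper: return the `top_match` highest-scoring
    (score, peptide) pairs for each spectrum.

    `psms` is an iterable of (spectrum_id, peptide, score) tuples;
    higher scores are assumed to be better.
    """
    by_spectrum = defaultdict(list)
    for spectrum_id, peptide, score in psms:
        by_spectrum[spectrum_id].append((score, peptide))
    # nlargest returns the k largest items in descending order.
    return {sid: heapq.nlargest(top_match, cands) for sid, cands in by_spectrum.items()}


if __name__ == "__main__":
    psms = [(1, "PEPK", 0.5), (1, "AAK", 0.9), (1, "GGK", 0.7), (2, "LLK", 0.1)]
    print(top_psms(psms, top_match=2))
```

Using `heapq.nlargest` avoids fully sorting each spectrum's candidate list when only a few top matches are needed.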
--beam <integer>
– K-beam width used to speed up inference. A value of 0 means exact inference. Warning: identification accuracy may degrade significantly if the beam width is too small, e.g., beam < 100. Default = 0.
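To illustrate the trade-off behind --beam, the sketch below runs generic k-beam pruning over a toy two-state lattice: after each step only the `beam` best partial hypotheses are kept, and 0 keeps all of them (exact search). This is a swapped-in, simplified technique for illustration only, not GMTK's inference or DRIP's DBN; the function and the toy scores are hypothetical.

```python
def best_path(emissions, transitions, beam=0):
    """Hypothetical k-beam search over a toy lattice.

    emissions[t][s]: log-score of state s at step t.
    transitions[p][s]: log-score of moving from state p to state s.
    beam=0 keeps every hypothesis (exact); beam=k keeps the k best per step.
    Returns (score, path).
    """

    def prune(hyps):
        hyps = sorted(hyps, key=lambda h: h[0], reverse=True)
        return hyps[:beam] if beam > 0 else hyps

    n_states = len(emissions[0])
    hyps = prune([(emissions[0][s], [s]) for s in range(n_states)])
    for t in range(1, len(emissions)):
        best = {}  # best surviving extension ending in each state
        for score, path in hyps:
            prev = path[-1]
            for s in range(n_states):
                cand = score + transitions[prev][s] + emissions[t][s]
                if s not in best or cand > best[s][0]:
                    best[s] = (cand, path + [s])
        hyps = prune(list(best.values()))
    return max(hyps, key=lambda h: h[0])


if __name__ == "__main__":
    emissions = [[1.0, 0.0], [0.0, 0.0]]
    transitions = [[0.0, 0.0], [5.0, 5.0]]  # starting in state 1 pays off later
    print(best_path(emissions, transitions, beam=0))  # exact
    print(best_path(emissions, transitions, beam=1))  # pruned too aggressively
```

With beam=1 the search discards state 1 at the first step because it looks worse locally, and so misses the higher-scoring path; this is the kind of error the warning about small beam widths refers to.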
--random-wait <integer>
– Randomly wait up to the specified number of seconds before writing results back to NFS (for cluster use). Default = 10.
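A random wait like --random-wait presumably staggers writes, so that many cluster jobs finishing at the same time do not all hit the shared NFS directory at once. The sketch below assumes a uniform delay in [0, max_wait] seconds; the helper name and the callback are hypothetical.

```python
import random
import time


def write_with_random_wait(write_results, max_wait=10, rng=random):
    """Hypothetical helper: sleep a uniformly random 0..max_wait seconds,
    then call write_results(). Returns the delay used."""
    delay = rng.uniform(0, max_wait) if max_wait > 0 else 0.0
    time.sleep(delay)
    write_results()
    return delay


if __name__ == "__main__":
    delay = write_with_random_wait(lambda: print("results written"), max_wait=1)
    print(f"waited {delay:.2f} s")
```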
--num-jobs <integer>
– The number of jobs to run in parallel (for cluster use). Default = 1.
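How dripSearch divides work among --num-jobs jobs is not described here. As a hypothetical illustration, the sketch below splits a list of spectrum IDs into contiguous, near-equal chunks, one per job.

```python
def split_into_jobs(spectrum_ids, num_jobs=1):
    """Hypothetical helper: split spectra into num_jobs contiguous,
    near-equal chunks (the first len % num_jobs chunks get one extra)."""
    if not spectrum_ids:
        return [[]]
    num_jobs = max(1, min(num_jobs, len(spectrum_ids)))  # no empty jobs
    base, extra = divmod(len(spectrum_ids), num_jobs)
    chunks, start = [], 0
    for j in range(num_jobs):
        size = base + (1 if j < extra else 0)
        chunks.append(spectrum_ids[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    print(split_into_jobs(list(range(10)), num_jobs=3))
```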
--cluster-mode <T|F>
– Evaluate data prepared by dripSearch as jobs on a cluster. Only set this to true after dripSearch has been run to prepare data for cluster use. Default = False.
--write-cluster-scripts <T|F>
– Write scripts to be submitted to the cluster queue. Only used when --num-jobs > 1. Job outputs are written to the log subdirectory of the current directory. Default = True.
--cluster-dir <string>
– Absolute path of the directory in which to run cluster jobs. Default = /tmp.
--merge-cluster-results <T|F>
– Merge dripSearch cluster results collected in the log directory. Default = False.
--output <string>
– Output file to which both target and decoy results are written. Default = none.
The following examples are available in test.sh. Run dripDigest and dripTrain first, as necessary.
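For scripted runs, a standalone dripSearch command line can be assembled from the options documented above. The sketch below is not taken from test.sh: the spectra and output file names are hypothetical, the option values are illustrative, and it only builds and prints the command rather than running it.

```python
import shlex


def drip_search_command(spectra, digest_dir="dripDigest-output", output=None, **options):
    """Hypothetical helper: build an argv list for a standalone dripSearch run.

    Keyword options map to flags by replacing '_' with '-', e.g.
    precursor_window_type="ppm" -> --precursor-window-type ppm.
    Boolean options should be passed as "T" or "F".
    """
    argv = ["./dripSearch.py", "--spectra", spectra, "--digest-dir", digest_dir]
    for name, value in options.items():
        argv += ["--" + name.replace("_", "-"), str(value)]
    if output is not None:
        argv += ["--output", output]
    return argv


if __name__ == "__main__":
    cmd = drip_search_command(
        "data/test.ms2",  # hypothetical spectra file
        precursor_window=50,
        precursor_window_type="ppm",
        high_res_ms2="T",
        num_threads=4,
        output="dripSearch-output.txt",  # hypothetical output file
    )
    print(shlex.join(cmd))  # pass `cmd` to subprocess.run(cmd, check=True) to execute
```

Building an argument list rather than a single string avoids shell-quoting problems with file names that contain spaces.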