dripExtract [options] <PSM file> <spectra file>
DripExtract uses a dynamic Bayesian network (DBN) for Rapid Identification of Peptides (DRIP) to derive features for PSMs. Model parameters may also be learned via expectation-maximization (implemented in dripTrain) and used during feature extraction. Note that the amino acid modifications used for the initial search must be specified correctly to extract the relevant features.
If you use features extracted using DRIP, please cite (handle this later, current work is unpublished).
<PSM file> – Collection of peptide-spectrum matches (PSMs) for which to derive DRIP features. The file must be either tab-delimited (with dripSearch output fields) or in PIN format.
<spectra file> – The name of the file from which to parse the fragmentation spectra, in ms2 file format.
The program writes to the file dripExtract.pin by default. The name of the output file can be set with the --output-file option.
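As an illustration of the usage line and the --output-file option, the following minimal Python sketch assembles an argument list without running anything. The helper name and the file names are hypothetical, and how the dripExtract executable is invoked on a particular system is an assumption.

```python
import shlex


def build_drip_extract_command(psm_file, spectra_file, output_file=None):
    """Assemble a dripExtract argument list; nothing is executed here."""
    cmd = ["dripExtract"]
    if output_file is not None:
        # Override the default output name (dripExtract.pin).
        cmd += ["--output-file", output_file]
    # Positional arguments follow the options, as in the usage line.
    cmd += [psm_file, spectra_file]
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_drip_extract_command("search.txt", "spectra.ms2", "features.pin")
print(shlex.join(cmd))
```

Keeping the command as a list, rather than a single string, avoids quoting problems if it is later passed to a process launcher.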
--output-file <string> – Name of output file. Default = dripExtract.pin.
--high-res-ms2 <T|F> – Boolean; whether the search is over high-res ms2 (high-high) spectra. When this parameter is true, DRIP uses the real-valued masses of candidate peptides as its Gaussian means; for low-res ms2 (low-low or high-low), the observed m/z measurements are much less accurate, so these Gaussian means are learned from training data. Default = False.
--high-res-gauss-dist <float> – m/z distance within which 99.9% of the m/z Gaussian mass lies. Only available for high-res MS2 searches. Default = 0.05.
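To give a feel for what a 99.9% distance implies, the sketch below converts such a distance into a Gaussian standard deviation. It assumes the distance is a symmetric half-width around the mean (that is, 99.9% of the mass lies within +/- the given distance); this section does not state how DRIP performs the conversion internally, and the function name is hypothetical.

```python
from statistics import NormalDist


def sigma_for_central_mass(distance, mass=0.999):
    """Std. dev. such that `mass` of a zero-mean Gaussian lies within +/- distance."""
    # Two-sided interval: each tail holds (1 - mass) / 2 of the probability.
    z = NormalDist().inv_cdf(1.0 - (1.0 - mass) / 2.0)
    return distance / z


sigma = sigma_for_central_mass(0.05)
# Check the result by integrating the Gaussian over [-0.05, 0.05].
gauss = NormalDist(0.0, sigma)
covered = gauss.cdf(0.05) - gauss.cdf(-0.05)
print(f"sigma = {sigma:.5f}, mass within +/-0.05 = {covered:.4f}")
```

Under this reading, the default of 0.05 corresponds to a standard deviation of roughly 0.015 m/z.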
--precursor-filter <T|F> – Boolean; when true, filter out all peaks within 1.5 Da of the observed precursor mass. Default = False.
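The effect of the precursor filter can be sketched as follows. This is an illustration only: it assumes the filter removes peaks within 1.5 units of the precursor on the m/z axis and treats the boundary as removed. The function name and the peak values are hypothetical, and the actual comparison used by dripExtract is not specified here.

```python
def filter_precursor_peaks(peaks, precursor_mz, window=1.5):
    """Drop (m/z, intensity) peaks lying within `window` of the precursor m/z."""
    return [(mz, inten) for mz, inten in peaks if abs(mz - precursor_mz) > window]


# Hypothetical spectrum with a precursor at m/z 500.0.
peaks = [(300.2, 10.0), (499.0, 55.0), (500.4, 80.0), (502.1, 12.0)]
kept = filter_precursor_peaks(peaks, precursor_mz=500.0)
print(kept)
```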
--num-threads <integer> – The number of threads to run on a multithreaded CPU. If the supplied value is greater than the number of supported threads, it defaults to the maximum number of supported threads minus one. Multithreading is not supported for cluster use, as this is typically handled by the cluster job manager. Default = 1.
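The fallback rule for --num-threads can be written out as a small sketch. The function name is hypothetical, using os.cpu_count() to determine the number of supported threads is an assumption, and the floor of one thread is added here for safety rather than taken from the description above.

```python
import os


def effective_num_threads(requested, supported=None):
    """Thread count after applying the documented fallback rule."""
    if supported is None:
        # Assumption: supported threads = logical CPUs reported by the OS.
        supported = os.cpu_count() or 1
    if requested > supported:
        # Too many requested: fall back to the maximum minus one.
        return max(1, supported - 1)
    # Assumed floor of one thread; not stated in the option description.
    return max(1, requested)


print(effective_num_threads(16, supported=8))
print(effective_num_threads(4, supported=8))
```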
--write-pin <T|F> – Write output in Percolator PIN format. If true and the input PSM file is not a PIN file, the relevant database digested using dripSearch must be available to access peptide-candidate-per-spectrum features {dm, absdM, enzN, enzC, Protein} (see option --dripSearch-database-file). Default = T.
--append-to-pin <T|F> – Append DRIP features to the features of the input PIN file. Default = T.
--learned-means <string> – Output of dripTrain: DRIP Gaussian means to be used during feature extraction. Default = ''.
--learned-covariances <string> – Output of dripTrain: DRIP Gaussian covariances to be used during feature extraction. Default = ''.
--mods-spec <string> – The general form of a modification specification has three components, as exemplified by 1STY+79.966331.C+57.02146.
--cterm-peptide-mods-spec <string> – Specify peptide C-terminal modifications. See --nterm-peptide-mods-spec for syntax. Default = <empty>.
--nterm-peptide-mods-spec <string> – Specify peptide N-terminal modifications. Like --mods-spec, this specification has three components, but with a slightly different syntax. The max_per_peptide can be either "1", in which case it defines a variable terminal modification, or missing, in which case the modification is static. The residues field indicates which amino acids are subject to the modification, with the residue X corresponding to any amino acid. Finally, added_mass is defined as before. Default = <empty>.
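The three components of a terminal modification specification can be illustrated with a small parser. This is a sketch under stated assumptions: the exact grammar accepted by dripExtract is not given in this section, so the pattern below (optional count, uppercase residues, signed mass) is a guess at its shape. The names ModSpec and parse_mod_spec are hypothetical, and the example strings are illustrative values, not taken from the DRIP documentation.

```python
import re
from typing import NamedTuple, Optional


class ModSpec(NamedTuple):
    max_per_peptide: Optional[int]  # None means the modification is static
    residues: str                   # "X" stands for any amino acid
    added_mass: float


# Assumed shape: [max_per_peptide]residues(+|-)added_mass
_SPEC_RE = re.compile(r"^(\d*)([A-Z]+)([+-]\d+(?:\.\d+)?)$")


def parse_mod_spec(spec):
    """Split a single specification into its three components."""
    match = _SPEC_RE.match(spec)
    if match is None:
        raise ValueError(f"unrecognized modification spec: {spec!r}")
    count, residues, mass = match.groups()
    return ModSpec(int(count) if count else None, residues, float(mass))


# Illustrative strings: a variable and a static modification on any residue.
variable = parse_mod_spec("1X+42.010565")
static = parse_mod_spec("X+42.010565")
print(variable)
print(static)
```

A missing count is mapped to None so that static and variable modifications remain distinguishable after parsing.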