dripExtract

Description:

dripExtract uses a dynamic Bayesian network (DBN) for Rapid Identification of Peptides (DRIP) to derive features for peptide-spectrum matches (PSMs). Model parameters may also be learned via expectation-maximization (implemented in dripTrain) and used during feature extraction. Note that the amino acid modifications used for the initial search must be specified correctly to extract the relevant features.

If you use features extracted using DRIP, please cite (handle this later, current work is unpublished).

Input:

<PSM file> – Collection of peptide-spectrum matches (PSMs) for which to derive DRIP features. The file must be either tab-delimited (with dripSearch output fields) or in PIN format.

<spectra file> – The name of the file from which to parse the fragmentation spectra, in ms2 file format.
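
For readers unfamiliar with the ms2 format, the following is a minimal sketch of how such a file can be read. It is not the parser used by dripExtract; the function name parse_ms2 and the dictionary layout are hypothetical. It assumes the usual ms2 line types: H (header), S (scan numbers and precursor m/z), Z (charge and singly protonated mass), I and D (auxiliary information), followed by "m/z intensity" peak lines.

```python
from typing import Dict, Iterable, List


def parse_ms2(lines: Iterable[str]) -> List[Dict]:
    """Parse ms2-format text into a list of spectra (illustrative only).

    Only S, Z and peak lines are interpreted; H, I and D lines are skipped.
    """
    spectra: List[Dict] = []
    current = None
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue  # ignore blank lines
        tag = fields[0]
        if tag == "S":
            # S <first scan> <last scan> <precursor m/z>
            current = {
                "scan": int(fields[1]),
                "precursor_mz": float(fields[3]),
                "charges": [],
                "peaks": [],
            }
            spectra.append(current)
        elif tag == "Z" and current is not None:
            # Z <charge> <singly protonated mass>
            current["charges"].append((int(fields[1]), float(fields[2])))
        elif tag in ("H", "I", "D"):
            continue  # header and auxiliary lines are not needed here
        elif current is not None:
            # peak line: <m/z> <intensity>
            current["peaks"].append((float(fields[0]), float(fields[1])))
    return spectra


# Small made-up example (not real data).
example = """H\tCreationDate\t2015
S\t1\t1\t500.25
Z\t2\t999.49
100.1 10.0
200.2 20.5
S\t2\t2\t600.30
Z\t3\t1798.88
150.0 5.0
"""
spectra = parse_ms2(example.splitlines())
```

To read a file on disk, the same function can be given an open file handle, for example parse_ms2(open("data/test.ms2")).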

Options:

Feature extraction parameters

--output-file <string> – Name of output file. Default = dripExtract.pin.
--high-res-ms2 <T|F> – boolean, whether the search is over high-res ms2 (high-high) spectra. When this parameter is true, DRIP uses the real-valued masses of candidate peptides as its Gaussian means; for low-res ms2 (low-low or high-low), the observed m/z measurements are much less accurate, so these Gaussian means are learned from training data. Default = False.
--high-res-gauss-dist <float> – m/z distance within which 99.9% of the mass of each m/z Gaussian must lie. Only available for high-res MS2 searches. Default = 0.05.
--precursor-filter <T|F> – boolean, when true, filter all peaks within 1.5 Da of the observed precursor mass. Default = False.
--num-threads <integer> – the number of threads to run on a multithreaded CPU. If the supplied value is greater than the number of supported threads, it defaults to the maximum number of supported threads minus one. Multithreading is not supported for cluster use, as this is typically handled by the cluster job manager. Default = 1.
--write-pin <T|F> – Write output in Percolator PIN format. If true and the input PSM file is not a PIN file, the relevant database digested using dripSearch must be available to access peptide-candidate-per-spectrum features {dm, absdM, enzN, enzC, Protein} (see option --dripSearch-database-file). Default = T.
--append-to-pin <T|F> – Append DRIP features to the features of the input PIN file. Default = T.
--learned-means <string> – Output of dripTrain, DRIP Gaussian means to be used during feature extraction. Default = ''.
--learned-covariances <string> – Output of dripTrain, DRIP Gaussian covariances to be used during feature extraction. Default = ''.
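
To make the meaning of --high-res-gauss-dist concrete, here is a minimal sketch of how a distance of this kind can be converted into a Gaussian standard deviation. It assumes the distance is a half-width, i.e. that 99.9% of the Gaussian's mass should lie within plus or minus the given distance of the mean; that interpretation, and the function name sigma_from_gauss_dist, are assumptions for illustration and are not taken from the dripExtract source.

```python
from statistics import NormalDist


def sigma_from_gauss_dist(dist: float = 0.05, mass_fraction: float = 0.999) -> float:
    """Return sigma such that P(|X - mean| <= dist) == mass_fraction.

    Assumption: `dist` is a half-width around the Gaussian mean.
    """
    # For a two-sided interval, each tail holds (1 - mass_fraction) / 2,
    # so the upper bound sits at the (0.5 + mass_fraction / 2) quantile.
    z = NormalDist().inv_cdf(0.5 + mass_fraction / 2.0)
    return dist / z


sigma = sigma_from_gauss_dist(0.05)   # standard deviation in m/z units
variance = sigma ** 2                 # corresponding Gaussian variance

# Sanity check: mass inside [-0.05, +0.05] for a zero-mean Gaussian.
g = NormalDist(mu=0.0, sigma=sigma)
covered = g.cdf(0.05) - g.cdf(-0.05)
```

Under this reading, the default of 0.05 corresponds to a standard deviation of roughly 0.015 m/z; a full-width interpretation would halve that value.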

Amino acid modifications

--mods-spec <string> – The general form of a modification specification has three components, as exemplified by 1STY+79.966331.
The three components are: [max_per_peptide]residues[+/-]mass_change
In the example, max_per_peptide is 1, residues are STY, and mass_change is +79.966331. To specify a static modification, the number preceding the amino acid must be omitted; i.e., C+57.02146 specifies a static modification of 57.02146 Da to cysteine. Note that Tide allows at most one modification per amino acid. Also, the default modification (C+57.02146) will be added to every mods-spec string unless an explicit C+0 is included. Default = C+57.02146.
--cterm-peptide-mods-spec <string> – Specify peptide C-terminal modifications. See --nterm-peptide-mods-spec for syntax. Default = <empty>.
--nterm-peptide-mods-spec <string> – Specify peptide N-terminal modifications. Like --mods-spec, this specification has three components, but with a slightly different syntax. The max_per_peptide can be either "1", in which case it defines a variable terminal modification, or missing, in which case the modification is static. The residues field indicates which amino acids are subject to the modification, with the residue X corresponding to any amino acid. Finally, mass_change is defined as before. Default = <empty>.
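
The modification syntax described above can be illustrated with a small parser. This is a sketch only, not the code dripExtract uses: the names Mod and parse_mods_spec are hypothetical, entries are assumed to be comma-separated as in the usage examples, and the rule that adds the default C+57.02146 modification is deliberately not modeled.

```python
import re
from typing import List, NamedTuple, Optional


class Mod(NamedTuple):
    max_per_peptide: Optional[int]  # None means a static modification
    residues: str                   # e.g. "STY", or "X" for any amino acid
    mass_change: float              # signed mass shift in Da


# [max_per_peptide]residues[+/-]mass_change, e.g. 1STY+79.966331
_MOD_RE = re.compile(r"^(\d+)?([A-Z]+)([+-]\d+(?:\.\d+)?)$")


def parse_mods_spec(spec: str) -> List[Mod]:
    """Split a comma-separated specification into its three components."""
    mods: List[Mod] = []
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty strings and trailing commas
        match = _MOD_RE.match(entry)
        if match is None:
            raise ValueError(f"malformed modification: {entry!r}")
        count, residues, mass = match.groups()
        mods.append(Mod(int(count) if count else None, residues, float(mass)))
    return mods


mods = parse_mods_spec("3M+15.9949,C+57.0214,K+229.16293")
nterm = parse_mods_spec("X+229.16293")
```

Here mods[0] is a variable modification (up to 3 per peptide on M), while the C and K entries and the terminal X entry have no leading count and are therefore static. The same pattern accepts terminal specifications, although it does not enforce that their count be "1" or absent.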

Example usage:

Extract low-resolution MS2 features, output PIN file


          python -OO dripExtract.py \
          --write-pin true \
          --learned-means dripLearned.means \
          --learned-covars dripLearned.covars \
          --psm-file dripSearch-test-output.txt \
          --num-threads 8 \
          --mods-spec 'C+57.0214' \
          --spectra data/test.ms2 \
          --output dripExtract-test-output.txt

Extract high-resolution MS2 features for a search with variable mods, append features to input PIN file


          python -OO dripExtract.py \
          --append-to-pin true \
          --high-res-ms2 true \
          --precursor-filter 'True' \
          --psm-file crux-malariaTestVarmods-output.pin \
          --num-threads 8 \
          --spectra data/malariaTest.ms2 \
          --output dripExtract-malariaTestVarmods-output.pin \
          --mods-spec '3M+15.9949,C+57.0214,K+229.16293' \
          --nterm-peptide-mods-spec 'X+229.16293'
