dtk
Description:
Python library of PSM analysis tools. Given PSMs and MS2 spectra, dtk
provides modules for plotting the decoded Viterbi paths of DRIP PSMs and
for generating Lorikeet files for interactive, in-browser analysis.
Within the Python interactive shell, it supports instantiation of PSMs,
real-time Viterbi decoding of DRIP PSMs, and plotting of a DRIP PSM's
decoded Viterbi path.
Modules:
- plot_psms() - given a PSM file, decode and plot all DRIP PSMs.
Inputs:
- PSM file in DRIP's output format (required)
- .ms2 file of observed spectra (required)
- HTML file to write list of output figures to (if not
specified, defaults to 'currPsms.html')
- Boolean denoting whether the MS2 spectra are high-resolution (if
unspecified, defaults to False)
- Learned DRIP means to use, output of dripTrain (if unspecified,
defaults to '')
- Learned DRIP covariances to use, output of dripTrain (if unspecified,
defaults to '')
- Modifications specification (if unspecified, defaults to 'C+57.0214')
- N-terminal modifications specification (if unspecified, defaults to '')
- C-terminal modifications specification (if unspecified, defaults to '')
- Boolean denoting whether to filter the precursor MS1 peak (if
unspecified, defaults to False)
- For high-resolution MS2 spectra, m/z value within which to fit 99.9%
of the theoretical Gaussian mass (if unspecified, defaults to 0.05)
Output:
- Figure of the DRIP decoded Viterbi path for each PSM in the input
file, listed in the specified HTML file (3rd input)
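As a worked illustration of the last input above, one possible reading (assumed here, not taken from DRIP's source) is that the m/z value is the half-width of a symmetric interval around a theoretical peak that contains 99.9% of the Gaussian's probability mass. Under that assumption, the implied standard deviation can be computed with Python's standard library. The function name is hypothetical.

```python
from statistics import NormalDist

def sigma_for_central_mass(half_width, mass=0.999):
    """Standard deviation of a zero-mean Gaussian whose central interval
    [-half_width, +half_width] holds the given fraction of probability mass.

    Assumption: the m/z value is a half-width; not taken from DRIP's code.
    """
    # Two-sided interval: the upper endpoint sits at the (0.5 + mass/2) quantile.
    z = NormalDist().inv_cdf(0.5 + mass / 2.0)
    return half_width / z

sigma = sigma_for_central_mass(0.05)
print(round(sigma, 4))
```

If the value is instead meant as a full width, the resulting standard deviation would be half as large.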
- gen_lorikeet() - generate Lorikeet files for input PSMs. The Lorikeet
package is assumed to be unzipped in the run directory.
Inputs:
- Tab-delimited PSM file with user-specified fields (required)
- .ms2 file of observed spectra (required)
- directory to write output Lorikeet HTML files to (required)
- HTML file to write list of output Lorikeet files to (if not
specified, defaults to 'currPsms.html')
- Modifications specification (if unspecified, defaults to 'C+57.0214')
- N-terminal modifications specification (if unspecified, defaults to '')
- C-terminal modifications specification (if unspecified, defaults to '')
- Header field denoting scan ID number (if unspecified, defaults to
'scan')
- Header field denoting peptide sequence (if unspecified, defaults to
'sequence')
- Header field denoting PSM charge (if unspecified, defaults to 'charge')
- Optional header field denoting score (if unspecified, defaults to
'score')
- Header field denoting the variable mod sequence, if variable mods were
specified during the search, following DRIP's output specification (if
unspecified, defaults to '')
Output:
- Lorikeet file for each PSM in input file, listed in the
specified HTML file (4th input)
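The header-field inputs above tell gen_lorikeet() which columns of the tab-delimited file hold the scan, peptide, charge, and score. The following is a minimal sketch of that idea using only the standard library. It is not dtk's parser; the function name and the example rows are made up for illustration.

```python
import csv
import io

def read_psm_rows(handle, scan_field='scan', peptide_field='sequence',
                  charge_field='charge', score_field='score'):
    """Yield (scan, peptide, charge, score) tuples from a tab-delimited PSM file.

    Illustrative only. score is None when the score column is absent or
    empty, since that field is optional.
    """
    reader = csv.DictReader(handle, delimiter='\t')
    for row in reader:
        score = row.get(score_field)
        yield (int(row[scan_field]), row[peptide_field],
               int(row[charge_field]),
               float(score) if score not in (None, '') else None)

# Made-up example data with a non-default score column name.
example = io.StringIO(
    "scan\tcharge\tsequence\tpercolator score\n"
    "101\t2\tPEPTIDEK\t1.25\n"
    "102\t3\tSAMPLER\t-0.5\n"
)
rows = list(read_psm_rows(example, score_field='percolator score'))
print(rows)
```

Looking columns up by header name rather than by position is what lets the same routine handle PSM files whose columns are ordered or named differently.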
- percolatorPsms_gen_lorikeet() - generate Lorikeet files specifically
for Percolator output PSMs (variable mods are not supported, since this
information is not typically output by Percolator). The Lorikeet package
is assumed to be unzipped in the run directory.
Inputs:
- Tab-delimited PSM file with user-specified fields (required)
- .ms2 file of observed spectra (required)
- directory to write output Lorikeet HTML files to (required)
- Boolean denoting whether the file was generated by crux percolator (if
unspecified, defaults to False)
- HTML file to write list of output Lorikeet files to (if not
specified, defaults to 'currPsms.html')
- Modifications specification (if unspecified, defaults to 'C+57.0214')
- N-terminal modifications specification (if unspecified, defaults to '')
- C-terminal modifications specification (if unspecified, defaults to '')
Output:
- Lorikeet file for each PSM in input file, listed in the
specified HTML file (5th input)
- load_spectra() - load spectra from a .ms2 file.
Input:
- .ms2 file of observed spectra (required)
Output:
- Dictionary whose keys are scan numbers and whose values are spectrum
objects with relevant attributes .spectrum_id (scan ID number),
.intensity (list of intensity values), and .mz (list of m/z values)
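To illustrate the shape of the dictionary described above, here is a minimal stand-alone sketch that parses MS2-formatted text into scan-keyed objects carrying the same three attributes. It is not dtk's implementation: the Spectrum class and load_ms2 function are hypothetical stand-ins, and precursor and charge information is ignored.

```python
import io

class Spectrum:
    """Stand-in for the spectrum objects described above (not dtk's class)."""
    def __init__(self, spectrum_id):
        self.spectrum_id = spectrum_id
        self.mz = []
        self.intensity = []

def load_ms2(handle):
    """Parse MS2 text into a dictionary {scan number: Spectrum}."""
    spectra = {}
    current = None
    for line in handle:
        fields = line.split()
        if not fields:
            continue
        tag = fields[0]
        if tag == 'S':
            # S lines open a new spectrum: S <first scan> <last scan> <precursor m/z>
            current = Spectrum(int(fields[1]))
            spectra[current.spectrum_id] = current
        elif tag in ('H', 'I', 'Z', 'D'):
            # Header, charge, and annotation lines are skipped in this sketch.
            continue
        elif current is not None:
            # Remaining lines are peaks: <m/z> <intensity>
            current.mz.append(float(fields[0]))
            current.intensity.append(float(fields[1]))
    return spectra

# Made-up two-scan example.
example = io.StringIO(
    "H\tComment\texample file\n"
    "S\t101\t101\t500.25\n"
    "Z\t2\t999.49\n"
    "100.1 10.0\n"
    "200.2 20.5\n"
    "S\t102\t102\t600.5\n"
    "Z\t3\t1799.5\n"
    "150.0 5.0\n"
)
spectra = load_ms2(example)
print(sorted(spectra), spectra[101].mz, spectra[101].intensity)
```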
- psm() - instantiate a PSM object.
Inputs:
- Peptide sequence (required)
- Observed spectrum, accessed via the dictionary returned by
load_spectra (required)
- PSM charge (if unspecified, defaults to 2)
- Boolean denoting whether the MS2 spectrum is high-resolution (if
unspecified, defaults to False)
- Learned DRIP means to use, output of dripTrain (if unspecified,
defaults to '')
- Learned DRIP covariances to use, output of dripTrain (if unspecified,
defaults to '')
- Modifications specification (if unspecified, defaults to 'C+57.0214')
- N-terminal modifications specification (if unspecified, defaults to '')
- C-terminal modifications specification (if unspecified, defaults to '')
- Boolean denoting whether to filter the precursor MS1 peak (if
unspecified, defaults to False)
- For a high-resolution MS2 spectrum, m/z value within which to fit
99.9% of the theoretical Gaussian mass (if unspecified, defaults to 0.05)
Output:
- Decoded DRIP PSM object with relevant attributes .peptide (peptide
sequence), .spectrum (PSM observed spectrum object), .insertion_sequence
(decoded sequence of Booleans denoting whether the ith peak in the
observed spectrum is an insertion), and .used_ions (decoded sequence of
non-deleted theoretical peaks)
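Because .insertion_sequence is indexed by observed peak, it can be zipped against the spectrum's .mz and .intensity lists to separate inserted peaks from matched ones. The sketch below shows this on plain lists. It assumes the three sequences align one-to-one, and the function name and example values are hypothetical.

```python
def split_peaks(mz, intensity, insertion_sequence):
    """Partition observed peaks into (matched, inserted) lists of (m/z, intensity).

    insertion_sequence[i] is True when the ith observed peak is an insertion.
    Assumption: all three sequences have one entry per observed peak.
    """
    if not (len(mz) == len(intensity) == len(insertion_sequence)):
        raise ValueError("all three sequences must have the same length")
    matched, inserted = [], []
    for peak_mz, peak_int, is_insertion in zip(mz, intensity, insertion_sequence):
        (inserted if is_insertion else matched).append((peak_mz, peak_int))
    return matched, inserted

# Made-up values standing in for p.spectrum.mz, p.spectrum.intensity,
# and p.insertion_sequence of a decoded PSM p.
matched, inserted = split_peaks([100.1, 200.2, 300.3],
                                [10.0, 20.5, 5.0],
                                [False, True, False])
print(matched, inserted)
```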
- plot_drip_viterbi() - for a PSM object p returned by psm(),
p.plot_drip_viterbi(plotName) plots the decoded sequence of DRIP-scored
b- and y-ions to the figure plotName.
Example usage:
Step-by-step usage examples are given below. In what follows, lines
beginning with >>> are run within Python's shell. To run the examples,
first import the library in the Python shell:
>>> import dtk
DRIP decode and plot PSMs
>>> psms = 'dripSearch-test-output-beam75.txt'
>>> spectra = 'data/test.ms2'
>>> dtk.plot_psms(psms, spectra, 'currPsms.html')
DRIP decode and plot high-res MS2 PSMs with variable mods
>>> psms = 'dripSearch-malariaTestVarmods-output.txt'
>>> ms2="data/malariaTest.ms2"
>>> mods = '3M+15.9949,C+57.0214,K+229.16293'
>>> nterm_mods = 'X+229.16293'
>>> cterm_mods = ''
>>> dtk.plot_psms(psms, ms2, 'currPsms.html',
...               True, '', '',
...               mods, nterm_mods, cterm_mods)
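The modification strings used above pack several fields into each comma-separated entry. The sketch below splits such a string into its parts. It assumes, without confirmation from DRIP's source, that each entry has the form [count]residues(+|-)mass and that a leading integer caps the number of occurrences per peptide (a variable mod), while entries without one are static. The parse_mods name is hypothetical.

```python
import re

# Optional integer, one or more residue letters, then a signed mass shift.
MOD_ENTRY = re.compile(r'^(\d*)([A-Z]+)([+-]\d+(?:\.\d+)?)$')

def parse_mods(spec):
    """Split a modifications string into a list of dicts, one per entry."""
    mods = []
    for entry in filter(None, (part.strip() for part in spec.split(','))):
        match = MOD_ENTRY.match(entry)
        if match is None:
            raise ValueError('unrecognized modification entry: %r' % entry)
        count, residues, delta = match.groups()
        mods.append({
            'residues': residues,
            'mass_delta': float(delta),
            # Assumption: a leading integer caps occurrences per peptide
            # (variable mod); no integer means a static mod.
            'max_per_peptide': int(count) if count else None,
        })
    return mods

parsed = parse_mods('3M+15.9949,C+57.0214,K+229.16293')
print(parsed)
```

An empty string, the default for the terminal modification inputs, yields an empty list under this sketch.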
Generate Lorikeet files for general PSM file with variable mods
>>> ms2="data/malariaTest.ms2"
>>> mods = '3M+15.9949,C+57.0214,K+229.16293'
>>> nterm_mods = 'X+229.16293'
>>> cterm_mods = ''
>>> scanField = 'scan'
>>> chargeField = 'charge'
>>> peptideField = 'sequence'
>>> scoreField = 'percolator score'
>>> psmFile='data/lorikeetData/cruxPercolatorMalariaVarModsTestOutput.txt'
>>> dtk.gen_lorikeet(psmFile, ms2, 'genLorikeetPlasmoCruxPlots',
...                  'genLorikeetPlasmoCruxPlots.html', mods, nterm_mods,
...                  cterm_mods, scanField, peptideField, chargeField,
...                  scoreField, 'Var_mod_seq')
Generate Lorikeet files for stand-alone Percolator output
>>> ms2="data/test.ms2"
>>> mods = 'C+57.0214'
>>> psmFile='data/lorikeetData/percolatorTestOutput.txt'
>>> dtk.percolatorPsms_gen_lorikeet(psmFile, ms2, 'lorikeetPlots', False,
...                                 'currPsms.html', mods)