About
I am a machine learning scientist at
Amazon, where I focus on methods to effectively
train deep generative models. Before Amazon, I was a postdoc at
UC Davis.
My previous research focused primarily on two
different approaches to accelerate the training of
widely used machine learning models:
- Deriving theoretical guarantees to efficiently learn
large numbers of model parameters
- Developing fast machine
learning algorithms for high-performance computing
systems
These approaches were regularly applied to computational biology
problems, leading to both
speed and accuracy improvements on massive-scale
data.
I received my Ph.D. in Electrical Engineering
from the University
of Washington, advised by Jeff
Bilmes and Bill
Noble.
I received the UC Davis Award for
Excellence in Postdoctoral Research and a Genome
Training Grant while at UW.
Curriculum Vitae
Recent News
- July 2022 - Received an Outstanding Reviewer
Award for ICML 2022.
- July 2021 - Joined Amazon as an ML Applied Scientist.
- March 22, 2021 - Presenting at
the SoCal
ML & NLP Symposium 2021.
[Poster]
- March 19, 2021 - Received an Outstanding Reviewer Award for ICLR 2021.
- March 2021 - Presenting some recent GPU speedup
work at
the UC
Davis Postdoc Research Symposium.
- Feb. 9, 2021 - Gave a seminar in the UCLA CS
department, video here.
- Nov. 14, 2020 - New preprint and software
available, ProteoTorch:
deep learning and ultrafast machine learning for
improved semi-supervised analysis of shotgun proteomics data.
[Preprint],
[Software],
[Documentation]
- Sept. 25, 2020 - Our paper, GPU-Accelerated
Primal Learning for Extremely Fast Large-Scale
Classification, was
accepted to NeurIPS 2020.
[Paper],
[Code]
- June 17, 2020 - Paper accepted for poster
presentation at the 2020
ICML Workshop on Computational Biology.
- May 2020 - Our recent speedups have been
released
in Percolator 3.5, including lower memory
consumption and more efficient multithreading than
before (the speedups are described here
and here).
Have fun!
- November 22, 2019 - Giving a talk at the UC
Riverside Data Science Center.
- November 18, 2019 - Giving a talk at the UC
Irvine AI/ML seminar.
- November 7, 2019 - Paper accepted for poster
presentation at
the Machine
Learning in Computational Biology (MLCB) workshop,
co-located with NeurIPS.
- September 2019 - Top 50% of NeurIPS reviewers.
- August 2019 - Our
paper, Speeding
up Percolator, was
accepted to the Journal of Proteome Research. The upgraded Percolator software described therein is freely
available here.
- April 15, 2019 - Received the UC Davis Award for Excellence in Postdoctoral Research, 2019.
- Oct. 29, 2018 - Our paper, Learning
Concave Conditional Likelihood Models for Improved
Analysis of Tandem Mass Spectra, was
accepted to NeurIPS 2018.
[Code], [PDF]
- See here for less recent news.
Selected Publications
See
my publications page for the
full list.
- John T. Halloran and David
M. Rocke.
GPU-Accelerated
Primal Learning for Extremely Fast Large-Scale
Classification.
Advances in Neural
Information Processing Systems (NeurIPS).
2020
20% Acceptance rate,
1900 out of 9454 submissions.
[Paper],
[Poster],
[Code]
- John T. Halloran, Hantian Zhang, Kaan Kara, Cédric
Renggli, Matthew The, Ce Zhang, David M. Rocke, Lukas
Käll, William Stafford Noble.
Speeding up Percolator.
Journal of Proteome
Research (JPR). 2019
[PDF], [Software]
- John T. Halloran and David
M. Rocke.
Learning
Concave Conditional Likelihood Models for Improved
Analysis of Tandem Mass
Spectra.
Advances in Neural
Information Processing Systems (NeurIPS).
2018
20.8% Acceptance rate, 1011 out of 4856 submissions.
[Code], [PDF]
- John T. Halloran and David
M. Rocke.
Gradients of Generative Models for
Improved Discriminative Analysis of Tandem Mass
Spectra.
Advances in Neural
Information Processing Systems (NIPS). 2017
Spotlight presentation; 3.5% Acceptance rate, 112 out
of 3240 submissions.
[PDF], [Supplementary]
- Jie Liu, John T. Halloran, Jeffrey Bilmes, Riza
Daza, Choli Lee, Elisabeth Mahen, Donna
Prunkard, Chaozhong Song, Sibel Blau, Michael
Dorschner, Vijayakrishna Gadi, Jay
Shendure, Anthony Blau, and William
Noble.
Comprehensive statistical inference of
the clonal structure of cancer from multiple
biopsies.
Scientific Reports. 2017
[PDF],
[URL],
[Software]
- Shengjie Wang, John
T. Halloran, Jeff A. Bilmes and William
S. Noble.
Faster and more accurate graphical
model identification of tandem mass spectra using
trellises.
Conference on Intelligent Systems for
Molecular Biology (ISMB). 2016
[PDF]
- John
T. Halloran, Jeff A. Bilmes, and William
S. Noble.
Learning Peptide-Spectrum Alignment
Models for Tandem Mass Spectrometry.
Uncertainty in Artificial Intelligence
(UAI). 2014
[PDF],
[Supplementary Data]
Selected Awards
- Award for Excellence in Postdoctoral
Research, UC Davis, 2019
- Nvidia Hardware Grant (Tesla K40 GPU awarded for
Deep Learning research), 2016
- Genome Training Grant, University
of Washington, 2011-2013
Software
ProteoTorch: Deep
Semi-Supervised Learning for Accurate Recalibration of Shotgun
Proteomics Data
ProteoTorch faithfully implements the
semi-supervised learning framework (with
cross-validation) pioneered by the C++ Percolator software, in a
lightweight and flexible Python package with an emphasis
on speed. By default, deep neural network
classifiers are used to accurately recalibrate
peptide-spectrum matches (PSMs) and features collected from a
database search. A host of other classifiers are also available,
including ultrafast support vector machine (SVM)
implementations.
[Preprint],
[Software],
[Documentation]
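For intuition, below is a minimal sketch of the Percolator-style semi-supervised loop that ProteoTorch implements, written against NumPy and scikit-learn rather than ProteoTorch's actual API. The q-value helper is a deliberately crude stand-in, the single train/score split omits the cross-validation used in practice, and the linear SVM stands in for the default deep neural network classifier; all names and parameters here are illustrative assumptions, not ProteoTorch code.

```python
import numpy as np
from sklearn.svm import LinearSVC

def tdc_qvalues(scores, is_target):
    # Crude target-decoy q-value estimate (illustrative only): at each rank,
    # the ratio of decoys to targets scoring at or above that point,
    # monotonized from the bottom of the ranked list upward.
    order = np.argsort(-scores)
    targets = np.cumsum(is_target[order])
    decoys = np.cumsum(~is_target[order])
    fdr = decoys / np.maximum(targets, 1)
    q = np.minimum.accumulate(fdr[::-1])[::-1]
    qvals = np.empty_like(q)
    qvals[order] = q
    return qvals

def rescore(X, is_target, init_scores, q_thresh=0.01, iters=10):
    # Percolator-style semi-supervised rescoring (single split for brevity).
    scores = np.asarray(init_scores, dtype=float)
    for _ in range(iters):
        q = tdc_qvalues(scores, is_target)
        pos = is_target & (q <= q_thresh)   # confident target PSMs = positives
        neg = ~is_target                    # all decoy PSMs = negatives
        X_train = np.vstack([X[pos], X[neg]])
        y_train = np.concatenate([np.ones(pos.sum()), np.zeros(neg.sum())])
        clf = LinearSVC().fit(X_train, y_train)
        scores = clf.decision_function(X)   # rescore every PSM for the next round
    return scores
```

In the full framework, PSMs are partitioned into folds, each fold is scored by a classifier trained on the remaining folds, and the per-fold scores are merged before the final q-value computation.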
GPU-Optimized Logistic Regression and SVM Training Algorithms
The following contains highly optimized primal algorithms for GPU
training of logistic regression (in LIBLINEAR, for sparse features)
and SVMs (in Percolator, for dense features). The underlying
CPU-centric solver, the trust-region second-order algorithm TRON,
natively resists GPU optimization due to the heavy sequential
dependence of its variables and its reliance on random memory access.
Thus, the code enables GPU optimization by extensively applying the
following principles: (1) decouple CPU and GPU variable dependencies,
(2) minimize transfer latency between the CPU and GPU, (3) saturate
the GPU by using routines that allow memory coalescing, and
(4) maximize concurrency between the CPU and GPU.
[NeurIPS
2020 Paper],
[Software]
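The actual implementation lives in the LIBLINEAR and Percolator code linked above. Purely to illustrate principles (2) and (4), here is a minimal, hedged PyTorch sketch that hides host-to-device transfer latency behind GPU compute using pinned memory and a side CUDA stream; it assumes the model already lives on the GPU, and every name in it is hypothetical rather than taken from the paper's implementation.

```python
import torch

def score_batches(batches, model):
    # Sketch of principles (2) and (4): copy the next batch to the GPU on a
    # side stream while the current batch is scored on the default stream.
    copy_stream = torch.cuda.Stream()
    pinned = [b.pin_memory() for b in batches]   # pinned host memory enables async copies
    results = []

    current = pinned[0].to("cuda", non_blocking=True)
    for i in range(len(pinned)):
        nxt = None
        if i + 1 < len(pinned):
            with torch.cuda.stream(copy_stream):          # overlap copy with compute
                nxt = pinned[i + 1].to("cuda", non_blocking=True)
        results.append(model(current))                    # compute on the default stream
        if nxt is not None:
            torch.cuda.current_stream().wait_stream(copy_stream)  # order copy before use
            nxt.record_stream(torch.cuda.current_stream())
            current = nxt
    torch.cuda.synchronize()
    return results
```

Principles (1) and (3) are properties of the solver and kernel code itself (restructuring TRON's variable updates and laying out feature accesses contiguously), so they do not show up in a sketch this small.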
Jensen: A General Toolkit for Large-Scale Machine Learning and Convex Optimization
Written in C++, Jensen is an easily customizable
toolkit for production-level machine learning and
convex optimization. Fast, flexible, and light on
external dependencies (only CMake is necessary to
build the source), Jensen natively supports a large
number of popular loss functions,
state-of-the-art optimization algorithms, and
machine learning applications.
Documentation and code
examples are
described here.
The software is freely
available here and supported on Unix, OSX, and
Windows operating systems.
[Paper],
[Software]
Optimized Percolator Software
The following repository contains SVM solvers
highly optimized for large-scale Percolator
analysis: bitbucket.org/jthalloran/percolator_upgrade.
Both solvers, Trust Region Newton (TRON) and
L2-SVM-MFN*, support multithreading and are also
optimized for single-threaded use. Further details
may be found in our
paper, A
Matter of Time: Faster Percolator Analysis via Efficient SVM
Learning for Large-Scale Proteomics.
[Paper],
[Software]
The DRIP Toolkit
The DRIP Toolkit (DTK), for searching tandem mass spectra using a dynamic Bayesian network (DBN) for Rapid Identification of Peptides (DRIP), is now available! DTK supports parameter estimation for low-resolution MS2 searches, multithreading on a single machine, utilities easing cluster use, instantiating/decoding/plotting DRIP PSMs in the Python shell, and in-browser analysis of identified spectra via the Lorikeet plugin. Further information and documentation regarding the toolkit's use are available in the DRIP Toolkit documentation.
Details of the DRIP model may be found here.
[Software],
[Paper], [Documentation]
HMM-DNN tutorial for GMTK
I've written a short tutorial for the Graphical Models Toolkit (GMTK), with all pertinent files available in this tarball. The following is a copy of the tutorial's documentation. This tutorial covers training an HMM in GMTK via generative training (expectation maximization), discriminative training (maximum mutual information), and training an HMM/DNN hybrid.
Contact Info
- Email: j concatenated with the first five
letters of my last name at amazon
dot com