John T. Halloran

 
About

I am a machine learning scientist at Amazon, where I focus on methods to effectively train deep generative models. Before Amazon, I was a postdoc at UC Davis.

My previous research focused primarily on two different approaches to accelerate the training of widely used machine learning models:

  • Deriving theoretical guarantees to efficiently learn large numbers of model parameters
  • Developing fast machine learning algorithms for high-performance computing systems

These two methodologies were regularly applied to computational biology problems, leading to both speed and accuracy improvements on massive-scale data. I received my Ph.D. in Electrical Engineering from the University of Washington, advised by Jeff Bilmes and Bill Noble.

I received the UC Davis Award for Excellence in Postdoctoral Research and, while at UW, a Genome Training Grant.

Curriculum Vitae

Recent News
  • July, 2022 - Received an Outstanding Reviewer Award for ICML 2022.
  • July, 2021 - Joined Amazon as an ML Applied Scientist.
  • March 22, 2021 - Presenting at the SoCal ML & NLP Symposium 2021.
    [Poster]
  • March 19, 2021 - Received an Outstanding Reviewer Award for ICLR 2021.
  • March 2021 - Presenting some recent GPU speedup work at the UC Davis Postdoc Research Symposium.
  • Feb. 9, 2021 - Gave a seminar in the UCLA CS department, video here.
  • Nov. 14, 2020 - New preprint and software available, ProteoTorch: deep learning and ultrafast machine learning for improved semi-supervised analysis of shotgun proteomics data.
    [Preprint], [Software], [Documentation]
  • Sept. 25, 2020 - Our paper, GPU-Accelerated Primal Learning for Extremely Fast Large-Scale Classification, was accepted to NeurIPS 2020.
    [Paper], [Code]
  • June 17, 2020 - Paper accepted for poster presentation at the 2020 ICML Workshop on Computational Biology.
  • May 2020 - Our recent speedups have been released in Percolator 3.5, which now features lower memory consumption and more efficient multithreading (the speedups are described here and here). Have fun!
  • November 22, 2019 - Giving a talk at the UC Riverside Data Science Center.
  • November 18, 2019 - Giving a talk at the UC Irvine AI/ML seminar.
  • November 7, 2019 - Paper accepted for poster presentation at the Machine Learning in Computational Biology (MLCB) workshop, co-located with NeurIPS.
  • September 2019 - Recognized among the top 50% of NeurIPS 2019 reviewers.
  • August 2019 - Our paper, Speeding up Percolator, was accepted to the Journal of Proteome Research. The upgraded Percolator software described therein is freely available here.
  • April 15, 2019 - Received the UC Davis Award for Excellence in Postdoctoral Research, 2019.
  • Oct. 29, 2018 - Our paper, Learning Concave Conditional Likelihood Models for Improved Analysis of Tandem Mass Spectra, was accepted to NeurIPS 2018.
    [Code], [PDF]
  • See here for less recent news.


Selected Publications
See my publications page for the full list.
  • John T. Halloran and David M. Rocke.
    GPU-Accelerated Primal Learning for Extremely Fast Large-Scale Classification.
    Advances in Neural Information Processing Systems (NeurIPS). 2020
    20% Acceptance rate, 1900 out of 9454 submissions.
    [Paper], [Poster], [Code]
  • John T. Halloran, Hantian Zhang, Kaan Kara, Cédric Renggli, Matthew The, Ce Zhang, David M. Rocke, Lukas Käll, and William Stafford Noble.
    Speeding up Percolator.
    Journal of Proteome Research (JPR). 2019
    [PDF], [Software]
  • John T. Halloran and David M. Rocke.
    Learning Concave Conditional Likelihood Models for Improved Analysis of Tandem Mass Spectra.
    Advances in Neural Information Processing Systems (NeurIPS). 2018
    20.8% Acceptance rate, 1011 out of 4856 submissions.
    [Code], [PDF]
  • John T. Halloran and David M. Rocke.
    Gradients of Generative Models for Improved Discriminative Analysis of Tandem Mass Spectra.
    Advances in Neural Information Processing Systems (NIPS). 2017
    Spotlight presentation; 3.5% Acceptance rate, 112 out of 3240 submissions.
    [PDF], [Supplementary]
  • Jie Liu, John T. Halloran, Jeffrey Bilmes, Riza Daza, Choli Lee, Elisabeth Mahen, Donna Prunkard, Chaozhong Song, Sibel Blau, Michael Dorschner, Vijayakrishna Gadi, Jay Shendure, Anthony Blau, and William Noble.
    Comprehensive statistical inference of the clonal structure of cancer from multiple biopsies.
    Scientific Reports. 2017
    [PDF], [URL], [Software]
  • Shengjie Wang, John T. Halloran, Jeff A. Bilmes and William S. Noble.
    Faster and more accurate graphical model identification of tandem mass spectra using trellises.
    Conference on Intelligent Systems for Molecular Biology (ISMB). 2016
    [PDF]
  • John T. Halloran, Jeff A. Bilmes, and William S. Noble.
    Learning Peptide-Spectrum Alignment Models for Tandem Mass Spectrometry.
    Uncertainty in Artificial Intelligence (UAI). 2014
    [PDF], [Supplementary Data]


Selected Awards
  • Award for Excellence in Postdoctoral Research, UC Davis 2019
  • Nvidia Hardware Grant (Tesla K40 GPU awarded for Deep Learning research), 2016
  • Genome Training Grant, University of Washington 2011-2013


Software
ProteoTorch: Deep Semi-Supervised Learning for Accurate Recalibration of Shotgun Proteomics Data

    ProteoTorch faithfully implements the semi-supervised learning framework (with cross-validation) pioneered by the C++ Percolator software, in a lightweight and flexible Python package with an emphasis on speed. By default, deep neural network classifiers are used to accurately recalibrate PSMs and features collected from a database search. A host of other classifiers are also available, including ultrafast support vector machine (SVM) implementations.
    [Preprint], [Software], [Documentation]
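
    For intuition, here is a minimal sketch of the Percolator-style iterative semi-supervised loop that ProteoTorch implements (the function names and the scikit-learn classifier are illustrative assumptions, not ProteoTorch's actual API, and the nested cross-validation used in practice is omitted for brevity):

        # Sketch of Percolator-style semi-supervised rescoring (illustrative only).
        # X: PSM feature matrix; is_target: boolean array (True = target, False = decoy).
        import numpy as np
        from sklearn.svm import LinearSVC

        def qvalues(scores, is_target):
            """Crude target-decoy q-values: FDR ~ decoys/targets above each cutoff."""
            order = np.argsort(-scores)
            t = np.cumsum(is_target[order])
            d = np.cumsum(~is_target[order])
            fdr = d / np.maximum(t, 1)
            q = np.minimum.accumulate(fdr[::-1])[::-1]  # enforce monotonicity
            out = np.empty_like(q)
            out[order] = q
            return out

        def rescore(X, is_target, q_thresh=0.01, iters=10):
            scores = X[:, 0]  # seed with a single discriminative input feature
            for _ in range(iters):
                q = qvalues(scores, is_target)
                pos = is_target & (q <= q_thresh)   # confident targets = positives
                train = pos | ~is_target            # all decoys = negatives
                clf = LinearSVC().fit(X[train], pos[train])
                scores = clf.decision_function(X)   # rescore every PSM
            return scores

    ProteoTorch's default swaps the linear classifier for a deep neural network while keeping this same iterate-and-rescore structure.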

GPU-Optimized Logistic Regression and SVM Training Algorithms

    The following contains primal training algorithms highly optimized for GPUs: logistic regression (in LIBLINEAR, for sparse features) and SVMs (in Percolator, for dense features). The underlying CPU-centric algorithm, the trust-region second-order method TRON, natively resists GPU optimization due to heavy sequential dependence between variables and a reliance on random memory access. The code therefore enables GPU optimization by extensively applying four principles: (1) decouple CPU and GPU variable dependencies, (2) minimize transfer latency between the CPU and GPU, (3) saturate the GPU by using routines that allow memory coalescing, and (4) maximize concurrency between the CPU and GPU.
    [NeurIPS 2020 Paper], [Software]
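
    To see why TRON rewards this restructuring, note that its inner loop is dominated by Hessian-vector products of the form X.T @ (D * (X @ v)), i.e., large matrix-vector products that a GPU executes far faster than the surrounding sequential bookkeeping. The following is a simplified NumPy sketch of the trust-region Newton core for L2-regularized logistic regression (illustrative only, not the released CUDA code; the trust-region radius update is omitted):

        # Simplified TRON core for L2-regularized logistic regression.
        # y must contain labels in {-1, +1}; the X @ v and X.T @ u matvecs
        # below are the operations worth offloading to a GPU.
        import numpy as np

        def tron_logreg(X, y, C=1.0, outer=20, cg_iters=50):
            n, d = X.shape
            w = np.zeros(d)
            for _ in range(outer):
                sigma = 1.0 / (1.0 + np.exp(-y * (X @ w)))
                grad = w + C * (X.T @ ((sigma - 1.0) * y))
                if np.linalg.norm(grad) < 1e-6:
                    break
                D = sigma * (1.0 - sigma)                     # Hessian curvature weights
                hv = lambda v: v + C * (X.T @ (D * (X @ v)))  # Hessian-vector product
                # Truncated conjugate gradient for the Newton step H s = -grad.
                s, r = np.zeros(d), -grad
                p, rs = r.copy(), r @ r
                for _ in range(cg_iters):
                    Hp = hv(p)
                    alpha = rs / (p @ Hp)
                    s += alpha * p
                    r -= alpha * Hp
                    rs_new = r @ r
                    if np.sqrt(rs_new) < 1e-8:
                        break
                    p = r + (rs_new / rs) * p
                    rs = rs_new
                w += s
            return w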

Jensen: A General Toolkit for Large-Scale Machine Learning and Convex Optimization

    Written in C++, Jensen is an easily customizable toolkit for production-level machine learning and convex optimization. Fast, flexible, and light on external dependencies (only CMake is necessary to build the source), Jensen natively supports a large number of popular loss functions, state-of-the-art optimization algorithms, and machine learning applications. Documentation and code examples are described here. The software is freely available here and supported on Unix, OSX, and Windows operating systems.
    [Paper], [Software]

Optimized Percolator Software

    The following repository contains SVM solvers highly optimized for large-scale Percolator analysis: bitbucket.org/jthalloran/percolator_upgrade. Both solvers, Trust-Region Newton (TRON) and L2-SVM-MFN, support multithreading and are also optimized for single-threaded use. Further details may be found in our paper, A Matter of Time: Faster Percolator Analysis via Efficient SVM Learning for Large-Scale Proteomics.
    [Paper], [Software]

The DRIP Toolkit

    The DRIP Toolkit (DTK), for searching tandem mass spectra using a dynamic Bayesian network (DBN) for Rapid Identification of Peptides (DRIP), is now available! DTK supports parameter estimation for low-resolution MS2 searches, multithreading on a single machine, utilities easing cluster use, instantiating/decoding/plotting DRIP PSMs in the Python shell, and in-browser analysis of identified spectra via the Lorikeet plugin. Further information regarding the toolkit's use is available in the DRIP Toolkit documentation, and details of the DRIP model may be found here.
    [Software], [Paper], [Documentation]

HMM-DNN tutorial for GMTK

    I've written a short tutorial for the Graphical Models Toolkit (GMTK), with all pertinent files available in this tarball. The tutorial covers training an HMM in GMTK via generative training (expectation maximization), discriminative training (maximum mutual information), and training an HMM/DNN hybrid.
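
    For a concrete sense of the hybrid setup, here is a minimal NumPy sketch of the standard HMM/DNN "scaled likelihood" trick: the DNN's state posteriors p(state|frame) are divided by the state priors to obtain quantities proportional to the emission likelihoods p(frame|state), which then drive ordinary HMM decoding (all names below are illustrative assumptions, not GMTK's API):

        # Hybrid HMM/DNN decoding with scaled likelihoods (illustrative sketch).
        import numpy as np

        def scaled_log_likelihoods(log_posteriors, state_priors):
            """log p(x_t|s) up to a constant, from DNN log-posteriors of shape (T, S)."""
            return log_posteriors - np.log(state_priors)[None, :]

        def viterbi(log_lik, log_trans, log_init):
            """Most probable HMM state path given per-frame log-likelihoods."""
            T, S = log_lik.shape
            delta = log_init + log_lik[0]
            back = np.zeros((T, S), dtype=int)
            for t in range(1, T):
                cand = delta[:, None] + log_trans     # cand[i, j]: prev state i -> state j
                back[t] = np.argmax(cand, axis=0)
                delta = cand[back[t], np.arange(S)] + log_lik[t]
            path = [int(np.argmax(delta))]
            for t in range(T - 1, 0, -1):
                path.append(int(back[t][path[-1]]))
            return path[::-1]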

Contact Info
  • Email: j concatenated with the first five letters of my last name at amazon dot com