John T. Halloran

About

I am a machine learning postdoc at UC Davis. My research primarily deals with two different approaches to accelerate the training of popular machine learning models:

  • Deriving theoretical guarantees to efficiently learn large numbers of model parameters
  • Developing fast machine learning algorithms for high-performance computing systems

With a focus on computational biology applications, my work has regularly improved both the speed and accuracy of analysis of massive-scale data.

I joined UC Davis in 2016 to work with David Rocke. I received my Ph.D. that same year from the University of Washington Department of Electrical Engineering working with Jeff Bilmes and Bill Noble, where I was also affiliated with the Department of Genome Sciences.

Curriculum Vitae

News
  • November 22, 2019 - Giving a talk at the UC Riverside Data Science Center.
  • November 18, 2019 - Giving a talk at the UC Irvine AI/ML seminar.
  • November 7, 2019 - Paper accepted for poster presentation at the Machine Learning in Computational Biology (MLCB) workshop, co-located with NeurIPS.
  • August 2019 - Our paper, Speeding up Percolator, was accepted to the Journal of Proteome Research. The upgraded Percolator software described therein is freely available here.
  • April 15, 2019 - Received the 2019 UC Davis Award for Excellence in Postdoctoral Research.
  • October 29, 2018 - Our paper, Learning Concave Conditional Likelihood Models for Improved Analysis of Tandem Mass Spectra, was accepted to NeurIPS 2018.
    [Code], [PDF]
  • July 21, 2018 - A new book chapter detailing how to use the DRIP Toolkit (our DBN software for mass spectrometry analysis) is available in Data Mining for Systems Biology: Methods and Protocols, Second Edition, published by Springer.
  • July 17, 2018 - Release v0.1.0 of Jensen, our easily extensible toolkit for large-scale machine learning and convex optimization, is freely available for download on GitHub. Further toolkit details and documentation are available here.
  • June 27, 2018 - The multithreaded Percolator software from our recent JPR paper has been upgraded to support multithreaded cross-validation. The updated software is freely available here.
  • April 2018 - Our paper, A Matter of Time: Faster Percolator Analysis via Efficient SVM Learning for Large-Scale Proteomics, was accepted to the Journal of Proteome Research. The optimized Percolator software described therein is freely available here.
  • December 2017 - Our paper, Comprehensive statistical inference of the clonal structure of cancer from multiple biopsies, was accepted to Scientific Reports. Software for inference in the described graphical model, THEMIS, is available here.
  • September 2017 - Our paper, Gradients of Generative Models for Improved Discriminative Analysis of Tandem Mass Spectra, was accepted for a spotlight at NIPS 2017.
  • May 2017 - I'll be giving a keynote at The Third Workshop on Advanced Methodologies for Bayesian Networks in Kyoto, Japan. More info is available here.


Selected Publications
See my publications page for the full list.
  • John T. Halloran and David M. Rocke
    GPU-Accelerated SVM Learning for Extremely Fast Large-Scale Proteomics Analysis.
    Machine Learning in Computational Biology (MLCB) Meeting. 2019
  • John T. Halloran and David M. Rocke.
    Learning Concave Conditional Likelihood Models for Improved Analysis of Tandem Mass Spectra.
    Advances in Neural Information Processing Systems (NeurIPS). 2018
    20.8% acceptance rate, 1011 out of 4856 submissions.
    [Code], [PDF]
  • John T. Halloran and David M. Rocke.
    Gradients of Generative Models for Improved Discriminative Analysis of Tandem Mass Spectra.
    Advances in Neural Information Processing Systems (NIPS). 2017
    Spotlight presentation; 3.5% acceptance rate, 112 out of 3240 submissions.
    [PDF], [Supplementary]
  • Jie Liu, John T. Halloran, Jeffrey Bilmes, Riza Daza, Choli Lee, Elisabeth Mahen, Donna Prunkard, Chaozhong Song, Sibel Blau, Michael Dorschner, Vijayakrishna Gadi, Jay Shendure, Anthony Blau, and William Noble.
    Comprehensive statistical inference of the clonal structure of cancer from multiple biopsies.
    Scientific Reports. 2017
    [PDF], [URL], [Software]
  • Shengjie Wang, John T. Halloran, Jeff A. Bilmes and William S. Noble.
    Faster and more accurate graphical model identification of tandem mass spectra using trellises.
    Conference on Intelligent Systems for Molecular Biology (ISMB). 2016
    [PDF]
  • John T. Halloran, Jeff A. Bilmes, and William S. Noble.
    Learning Peptide-Spectrum Alignment Models for Tandem Mass Spectrometry.
    Uncertainty in Artificial Intelligence (UAI). 2014
    [PDF], [Supplementary Data]


Selected Awards
  • Award for Excellence in Postdoctoral Research, UC Davis 2019
  • Nvidia Hardware Grant (Tesla K40 GPU awarded for Deep Learning research), 2016
  • Genome Training Grant, University of Washington 2011-2013


Software
Jensen: A General Toolkit for Large-Scale Machine Learning and Convex Optimization

    Written in C++, Jensen is an easily customizable toolkit for production-level machine learning and convex optimization. Fast, flexible, and light on external dependencies (only CMake is needed to build from source), Jensen natively supports a large number of popular loss functions, state-of-the-art optimization algorithms, and machine learning applications. Documentation and code examples are described here. The software is freely available here and supported on Unix, OS X, and Windows operating systems.
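    To illustrate the loss-plus-optimizer design Jensen is built around, here is a minimal Python sketch; the class and function names are hypothetical illustrations, not Jensen's actual C++ API:

```python
import numpy as np

# Hypothetical sketch of the "loss object + solver" pattern: a loss
# exposes value/gradient, and any solver can be pointed at it.
# Names are illustrative Python, not Jensen's C++ interface.

class LogisticLoss:
    """L2-regularized mean logistic loss for labels y in {-1, +1}."""
    def __init__(self, X, y, lam=1e-2):
        self.X, self.y, self.lam = X, y, lam

    def value(self, w):
        margins = self.y * (self.X @ w)
        return np.mean(np.log1p(np.exp(-margins))) + 0.5 * self.lam * w @ w

    def gradient(self, w):
        margins = self.y * (self.X @ w)
        sigma = 1.0 / (1.0 + np.exp(margins))    # probability of the wrong label
        return -(self.X.T @ (self.y * sigma)) / len(self.y) + self.lam * w

def gradient_descent(loss, w0, step=0.5, iters=500):
    """Plain fixed-step gradient descent; any solver fits this slot."""
    w = w0.copy()
    for _ in range(iters):
        w -= step * loss.gradient(w)
    return w

# Toy, linearly separable problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.sign(X @ np.array([1.0, -2.0, 0.5]))      # noiseless synthetic labels
loss = LogisticLoss(X, y)
w = gradient_descent(loss, np.zeros(3))
```

    Swapping in a different loss (hinge, least squares) or a different solver only requires changing one of the two objects, which is the kind of composability the toolkit aims for.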

Optimized Percolator Software

    The following repository contains SVM solvers highly optimized for large-scale Percolator analysis: bitbucket.org/jthalloran/percolator_upgrade. Both solvers, Trust Region Newton (TRON) and L2-SVM-MFN, support multithreading and are also heavily optimized for single-threaded use. Further details may be found in our paper, A Matter of Time: Faster Percolator Analysis via Efficient SVM Learning for Large-Scale Proteomics.
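    For intuition about the objective these solvers target, here is a toy numpy sketch of Newton-type training for an L2-regularized, squared-hinge ("L2-loss") linear SVM, the problem class that TRON and L2-SVM-MFN solve; it omits the trust-region and line-search safeguards the real solvers use, so it is a sketch rather than a faithful reimplementation:

```python
import numpy as np

def l2svm_objective(w, X, y, lam):
    """0.5*lam*||w||^2 + sum of squared hinge losses."""
    slack = np.maximum(0.0, 1.0 - y * (X @ w))
    return 0.5 * lam * w @ w + np.sum(slack ** 2)

def l2svm_newton(X, y, lam=1.0, iters=10):
    """Unsafeguarded Newton iterations on the squared-hinge SVM objective."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        margins = y * (X @ w)
        active = margins < 1.0                   # points violating the margin
        if not active.any():
            break
        Xa, ya = X[active], y[active]
        grad = lam * w - 2.0 * Xa.T @ (ya * (1.0 - margins[active]))
        hess = lam * np.eye(d) + 2.0 * Xa.T @ Xa  # generalized Hessian
        w = w - np.linalg.solve(hess, grad)
    return w

# Toy, linearly separable data.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = np.sign(X @ np.array([2.0, -1.0, 0.0, 1.0]))
w = l2svm_newton(X, y)
```

    Because the squared hinge is piecewise quadratic, each Newton step solves a small linear system over the margin-violating points, which is why these methods converge in very few iterations compared to the default solver.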

The DRIP Toolkit

    The DRIP Toolkit (DTK), for searching tandem mass spectra using a dynamic Bayesian network (DBN) for Rapid Identification of Peptides (DRIP), is now available! DTK supports parameter estimation for low-resolution MS2 searches, multithreading on a single machine, utilities that ease cluster use, instantiating/decoding/plotting DRIP PSMs in the Python shell, and in-browser analysis of identified spectra via the Lorikeet plugin. Further information regarding the toolkit's use is available in the DRIP Toolkit documentation. Details of the DRIP model may be found here.

HMM-DNN tutorial for GMTK

    I've written a short tutorial for the Graphical Models Toolkit (GMTK), with all pertinent files available in this tarball. The following is a copy of the tutorial's documentation: the tutorial covers training an HMM in GMTK via generative training (expectation maximization), discriminative training (maximum mutual information), and training an HMM/DNN hybrid.
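    As a taste of the generative-training step the tutorial walks through, here is a self-contained numpy sketch of one Baum-Welch (EM) iteration for a tiny discrete HMM; it is for illustration only and does not use GMTK's interface or file formats:

```python
import numpy as np

# One EM (Baum-Welch) update for a discrete HMM with initial distribution
# pi, transition matrix A, and emission matrix B. Unscaled forward/backward
# recursions are fine for the short sequence used here.

def forward(pi, A, B, obs):
    alpha = np.zeros((len(obs), len(pi)))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, len(obs)):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

def backward(A, B, obs, n_states):
    beta = np.ones((len(obs), n_states))
    for t in range(len(obs) - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

def em_step(pi, A, B, obs):
    """E-step posteriors (gamma, xi), then M-step reestimates of pi, A, B."""
    S = len(pi)
    alpha, beta = forward(pi, A, B, obs), backward(A, B, obs, S)
    like = alpha[-1].sum()
    gamma = alpha * beta / like                  # P(state at t | observations)
    xi = (alpha[:-1, :, None] * A[None] *
          (B[:, obs[1:]].T * beta[1:])[:, None, :]) / like
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for v in range(B.shape[1]):
        new_B[:, v] = gamma[np.array(obs) == v].sum(axis=0)
    new_B /= gamma.sum(axis=0)[:, None]
    return gamma[0], new_A, new_B

# Two hidden states, three observation symbols.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
obs = [0, 1, 2, 2, 1, 0, 0, 2, 1, 2]
pi2, A2, B2 = em_step(pi, A, B, obs)
```

    Each such iteration is guaranteed not to decrease the data likelihood; discriminative (MMI) training instead optimizes the conditional likelihood and needs gradient-based updates rather than these closed-form reestimates.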

Contact Info
  • Email: jt concatenated with my last name at ucdavis dot edu