U.S. Department of Energy

Pacific Northwest National Laboratory

Online Predictive Analytics (OPA)

Objective

Traditional machine-learning approaches assume that a good training set is available a priori and contains all the knowledge required to build models that can be applied to new data collected under similar constraints. In many real-world settings, however, actors change their behavior dynamically and in ways that are not known in advance, or only a subset of their behaviors is known.

This research challenges existing paradigms by taking incremental machine-learning methods from theory to practice and by modifying a single framework to simultaneously perform classification, detect anomalies, steer data collection, and handle missing data. The framework enables semi-supervised learning that incorporates information from subject matter experts and/or other knowledge sources.

Approach

We are developing a framework that combines dynamic model generation, dynamic data mining, and anomaly detection to address challenge problems in which features and concepts drift and training examples are generated or made available incrementally to feed an evolving model. We are merging machine-learning algorithms (e.g., Bayesian networks, support vector machines) with latent variable approaches (e.g., peak subtraction) to identify new classes of interest. We are applying our methodology across diverse domain challenges, such as identifying metabolites from nuclear magnetic resonance data and characterizing company behavior from shipping transactions.
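As a rough illustration of how a single incrementally updated model can both classify and flag candidates for new classes, the following minimal sketch keeps a running mean and variance per class for one numeric feature. It is not the project's framework: the class name `IncrementalGaussianClassifier`, the `novelty_z` threshold, and the one-dimensional Gaussian model are all assumptions made for illustration, standing in for the Bayesian networks and support vector machines named above.

```python
import math


class IncrementalGaussianClassifier:
    """Hypothetical sketch: per-class running statistics for one numeric feature.

    Each labeled example updates the model in place (no batch retraining).
    A sample that lies far from every known class is flagged as novel so
    that a subject matter expert could label it as a new class.
    """

    def __init__(self, novelty_z=4.0):
        self.novelty_z = novelty_z  # assumed threshold, in standard deviations
        self.stats = {}             # label -> (count, mean, sum of squared deviations)

    def learn_one(self, x, label):
        # Welford's online update of the mean and squared deviations.
        n, mean, m2 = self.stats.get(label, (0, 0.0, 0.0))
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        self.stats[label] = (n, mean, m2)

    def _z(self, x, label):
        # Distance from the class mean in units of the class standard deviation.
        n, mean, m2 = self.stats[label]
        std = math.sqrt(m2 / (n - 1)) if n > 1 else 0.0
        if std == 0.0:
            return 0.0 if x == mean else math.inf
        return abs(x - mean) / std

    def predict_one(self, x):
        """Return (closest_label, is_novel)."""
        if not self.stats:
            return None, True
        label = min(self.stats, key=lambda c: self._z(x, c))
        return label, self._z(x, label) > self.novelty_z


model = IncrementalGaussianClassifier()
for value in [0.9, 1.0, 1.1, 1.0, 0.95, 1.05]:
    model.learn_one(value, "a")
for value in [5.0, 5.1, 4.9, 5.05, 4.95]:
    model.learn_one(value, "b")

known = model.predict_one(1.02)    # close to class "a"
unknown = model.predict_one(20.0)  # far from both classes, flagged as novel

# Simulated expert feedback: the flagged sample becomes the first example of a new class.
model.learn_one(20.0, "c")
relabeled = model.predict_one(20.0)
```

The design choice worth noting is that classification, novelty flagging, and expert feedback all go through the same small set of running statistics, so the model can evolve one example at a time.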

Achievements

  • Implemented and evaluated machine-learning algorithms
  • Developed latent variable approaches for identifying new unknown entities
  • Demonstrated ability to identify shifts in underlying streaming data
  • Demonstrated potential metrics for determining when incremental model retraining is needed
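
One simple way to picture a retraining metric of the kind listed above is a sliding-window error-rate monitor. The sketch below is an assumption made for illustration, not the metric evaluated in this project: the name `RetrainMonitor`, the `window` size, and the `tolerance` margin are hypothetical. It treats the error rate of the first full window as a reference and suggests retraining when the recent error rate exceeds that reference by more than the margin.

```python
from collections import deque


class RetrainMonitor:
    """Hypothetical sketch: suggest retraining when recent errors rise above a reference."""

    def __init__(self, window=50, tolerance=0.15):
        self.window = window
        self.tolerance = tolerance          # assumed allowable rise in error rate
        self.recent = deque(maxlen=window)  # most recent prediction outcomes
        self.baseline = None                # error rate of the first full window

    def update(self, was_error):
        """Record one prediction outcome; return True when retraining is suggested."""
        self.recent.append(1 if was_error else 0)
        if len(self.recent) < self.window:
            return False
        rate = sum(self.recent) / self.window
        if self.baseline is None:
            self.baseline = rate
            return False
        return rate - self.baseline > self.tolerance

    def reset(self):
        """Call after retraining so a new reference window is collected."""
        self.recent.clear()
        self.baseline = None


def first_alarm(outcomes, **kwargs):
    """Return the index of the first retraining suggestion, or None if there is none."""
    monitor = RetrainMonitor(**kwargs)
    for index, was_error in enumerate(outcomes):
        if monitor.update(was_error):
            return index
    return None


# A stable stream with a 10% error rate, and the same stream followed by a burst of errors.
stable = ([False] * 9 + [True]) * 5
drifting = ([False] * 9 + [True]) * 2 + [True] * 10

stable_alarm = first_alarm(stable, window=10, tolerance=0.2)
drift_alarm = first_alarm(drifting, window=10, tolerance=0.2)
```

Under these assumptions the stable stream should never raise an alarm, while the drifting stream should raise one shortly after the error burst begins at index 20.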

Impact

  • Links sets of observations to specific classifications, outcomes, and/or predictions
  • Provides viable methods for automated identification of compounds in spectroscopy data
  • Incremental machine-learning algorithms improve performance compared to traditional algorithms
  • Automatically evolves causal models as concepts and features drift in data streams and/or in response to domain expert feedback