U.S. Department of Energy

Pacific Northwest National Laboratory

Online Predictive Analytics for Known and Unknown Chemical Identification in Spectroscopy Data

Publish Date: Thursday, January 23, 2014
With rapid real-time data streams, predictive analysis algorithms must execute under stringent space and time constraints. A class of algorithms that naturally supports predictive analysis on streaming data may be found in incremental machine learning. Traditional machine learning approaches assume that a good training set is always available a priori and contains all the knowledge required to construct sufficient models that may be applied to new examples or problems. Yet, in many real-world applications involving dynamic control, dynamic data mining, time-series analysis, or drifting features and concepts, training examples are often generated or made available incrementally and then fed into an evolving model. Thus, the learning process for such a dynamic model is incremental, which, in many ways, mirrors how humans assimilate knowledge in piecemeal fashion over time. However, state-of-the-art incremental machine learning algorithms, which use the current state of an algorithm to inform the next predictive model, are limited in their capacity to ingest large-scale and/or high-velocity data, generate and maintain rapidly evolving models, deal effectively with missing or incomplete data, and detect rare or anomalous outcomes or events. We will demonstrate the extension of incremental machine learning to identify unknown chemicals in streaming spectroscopy data. The algorithm developed can evolve as new chemicals are added to the library based on unknown identifications or new discoveries. This development could enable rapid identification of alerts in near real time, even when the chemical(s) are not components of a predefined database.
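The idea of a library that evolves as unknown chemicals appear can be illustrated with a minimal sketch. This is not the algorithm developed in the report, whose details are not given here. It substitutes a simple nearest-centroid classifier with a running-mean update and a distance cutoff for flagging unknowns. The class name `IncrementalSpectralLibrary`, the `threshold` parameter, the `unknown-N` labels, and the three-channel toy spectra are all assumptions made for illustration.

```python
import math


class IncrementalSpectralLibrary:
    """Toy nearest-centroid library updated one spectrum at a time.

    Illustrative only: a stand-in for incremental learning in general,
    not the method described in the report.
    """

    def __init__(self, threshold):
        self.threshold = threshold  # assumed distance cutoff for "unknown"
        self.centroids = {}         # label -> running mean spectrum
        self.counts = {}            # label -> number of spectra absorbed
        self._next_unknown = 0      # counter for naming new entries

    @staticmethod
    def _distance(a, b):
        # Euclidean distance between two equal-length spectra.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def learn(self, label, spectrum):
        """Fold one spectrum into the running mean for its label."""
        if label not in self.centroids:
            self.centroids[label] = list(spectrum)
            self.counts[label] = 1
            return
        self.counts[label] += 1
        n = self.counts[label]
        centroid = self.centroids[label]
        for i, x in enumerate(spectrum):
            # Incremental mean: no past spectra need to be stored.
            centroid[i] += (x - centroid[i]) / n

    def identify(self, spectrum):
        """Return (label, is_new) and update the library in place."""
        best_label, best_dist = None, float("inf")
        for label, centroid in self.centroids.items():
            d = self._distance(spectrum, centroid)
            if d < best_dist:
                best_label, best_dist = label, d
        is_new = best_label is None or best_dist > self.threshold
        if is_new:
            # Nothing in the library is close enough: register a new entry.
            best_label = f"unknown-{self._next_unknown}"
            self._next_unknown += 1
        self.learn(best_label, spectrum)
        return best_label, is_new


# Seed the library with two known chemicals, then process a small stream.
library = IncrementalSpectralLibrary(threshold=0.5)
library.learn("chemical-A", [1.0, 0.0, 0.0])
library.learn("chemical-B", [0.0, 1.0, 0.0])

stream = [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]
results = [library.identify(s) for s in stream]
for label, is_new in results:
    print(label, "(new library entry)" if is_new else "")
```

In this sketch the second spectrum matches nothing in the library and becomes a new entry, and the third spectrum is then matched against that entry. Each update touches only one centroid and one counter, which is the property that keeps memory and time per observation bounded on a stream.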
Webb-Robertson BJM, SM Robinson, LM Bramer, J Yin, M Thomas, KT Mueller, and G Chin, Jr. In Press. "Online Predictive Analytics for Known and Unknown Chemical Identification in Spectroscopy Data." In Algorithms for Threat Detection Program Review. PNNL-SA-101271, Pacific Northwest National Laboratory, Richland, Washington.