U.S. Department of Energy

Pacific Northwest National Laboratory


Defining optimality and ensuring global optimal performance for a complex system of systems is a challenging, active area of research. These challenges are exacerbated when humans are components in such systems: while humans excel at learning and are adept at generalizing experiences to solve new problems, human behavior is often irrational and human processing speeds are comparatively slow and unpredictable at performing certain tasks (e.g., due to cognitive depletion).

Today, most streaming data is collected and stored for later analysis using custom, data-type-specific algorithms, an approach that does not support real-time analysis. The goal of compressive analysis is to identify events of interest in streaming data without the need for long-term storage or custom algorithms.

We propose to develop a graph mining-based approach and framework that will allow humans to discover and detect important or critical graph patterns in data streams through the analysis of local patterns of interactions and behaviors of actors, entities, and/or features.
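As a toy illustration of detecting a local graph pattern in a data stream (a minimal sketch, not the proposed framework itself), the following counts triangles incrementally as edges arrive, since each new edge closes one triangle per common neighbor of its endpoints:

```python
from collections import defaultdict

def stream_triangles(edges):
    """Count triangles incrementally over a stream of undirected edges."""
    adj = defaultdict(set)
    triangles = 0
    for u, v in edges:
        # Every common neighbor of u and v closes a new triangle.
        triangles += len(adj[u] & adj[v])
        adj[u].add(v)
        adj[v].add(u)
    return triangles

# A 4-cycle with one chord contains two triangles.
print(stream_triangles([(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]))  # 2
```

The same incremental structure extends to richer local patterns (wedges, stars, labeled motifs) by maintaining the appropriate neighborhood state per vertex.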

Cognitive depletion is poorly understood: today we rely on people to self-assess their levels of fatigue, or we restrict them to fixed periods of time on or off the job. Both approaches reduce overall performance and increase stress. Our research seeks to leverage techniques for modeling cognitive resources to create adaptive systems that extend peak performance.

The goal of this research effort is to develop human information processing modeling techniques that apply to a combination of measurable response behaviors. These behaviors include response choices, response times, eye or hand movement dynamics, vocal/speech patterns, and related psychophysiological measures (e.g., heart rate). By leveraging the richness of multimodal behavior dynamics, we can make inferences about the mental activity demanded by continuous, online decision making.

A knowledge graph is a repository of information about entities, where entities can be anything of interest. We are focused primarily on constructing knowledge graphs from natural language text. We employ state-of-the-art natural language processing techniques to parse text documents and extract named entities and their relationships, a process that confronts the full scale of the data deluge head-on.
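A real pipeline would use trained NLP parsers; as an illustrative stand-in (the patterns, relation names, and sentences below are all hypothetical), this toy extractor matches fixed phrasings and emits (subject, relation, object) triples suitable for loading into a knowledge graph:

```python
import re

# Toy relation patterns standing in for a full NLP extraction pipeline.
PATTERNS = [
    (re.compile(r"(\w[\w ]*?) works at (\w[\w ]*)"), "works_at"),
    (re.compile(r"(\w[\w ]*?) is located in (\w[\w ]*)"), "located_in"),
]

def extract_triples(sentence):
    """Return (subject, relation, object) triples found in a sentence."""
    triples = []
    for pattern, relation in PATTERNS:
        for subj, obj in pattern.findall(sentence):
            triples.append((subj.strip(), relation, obj.strip()))
    return triples

graph = []
for sent in ["Alice works at PNNL", "PNNL is located in Richland"]:
    graph.extend(extract_triples(sent))
print(graph)  # [('Alice', 'works_at', 'PNNL'), ('PNNL', 'located_in', 'Richland')]
```

Triples sharing an entity ("PNNL" above) link up naturally, which is how independent extractions accumulate into a connected graph.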

This research challenges existing paradigms by taking incremental machine-learning methods from theory to practice. A single framework is modified to simultaneously perform classification, detect anomalies, steer data collection, and handle missing data, enabling semi-supervised learning that incorporates information from subject matter experts and other knowledge sources.

In complex environments, a single model may be of little use and fixed ensembles may be unable to arrive at a statistically optimal set that can produce insightful information for an analyst. We propose a population-based approach where different combinations of models are selected using statistical principles of effectiveness, parsimony, and goodness of fit.
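One standard way to trade goodness of fit against parsimony is an information criterion such as AIC. The sketch below is an illustrative assumption, not the project's actual selection procedure: it scores two candidate models (a constant mean and a simple linear trend) and keeps the one with the lower AIC.

```python
import math

def rss_mean(xs, ys):
    """Residual sum of squares for a constant-mean model."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def rss_line(xs, ys):
    """Residual sum of squares for a least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def aic(rss, n, k):
    # Smaller is better: rewards fit, penalizes extra parameters.
    return n * math.log(max(rss, 1e-12) / n) + 2 * k

def select_model(xs, ys):
    n = len(xs)
    scores = {"mean": aic(rss_mean(xs, ys), n, 1),
              "line": aic(rss_line(xs, ys), n, 2)}
    return min(scores, key=scores.get)

xs = list(range(10))
print(select_model(xs, [2 * x + 1 for x in xs]))  # line
print(select_model(xs, [5.0] * 10))               # mean
```

A population-based search would apply the same scoring to many candidate model combinations rather than just two.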

Today, scientific simulations, experiments, and handheld devices are producing data at unprecedented velocity. Machine learning and data mining (MLDM) algorithms are essential for analyzing high-velocity streams, yet existing algorithms are restricted to low velocities and sequential execution.

SHyRe is exploring the effects of streaming data on the Semantic application stack and how volatile streaming data affects well-known reasoning platforms within the Semantic application stack.

Users interact with visual analytic systems to carry out cognitive processes as part of making sense of their information. What can we glean from the interactions that users perform during analysis? If successful, this project will develop an understanding, or science, of interaction that systematically re-casts user interaction into mixed-initiative systems.

Stream Adaptive Foraging for Evidence (SAFE) will find and characterize events of interest within high-volume streaming data by leveraging deep learning, which has been shown to be effective at learning hierarchical representations from unlabeled data.

The Streaming Data Characterization project is targeted at providing a set of algorithms to determine, in a computationally efficient manner, which data items should be remembered as the stream flows on, and which should be forgotten. We do this by leveraging formal task descriptions and domain ontologies, and combining them with computationally efficient data structures and ranking systems.
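A minimal sketch of the remember-or-forget decision (assuming a simple scalar relevance score, not the project's task descriptions or ontologies) is a bounded memory that keeps only the k highest-scoring items, evicting the lowest-scored item whenever a better one arrives:

```python
import heapq

class StreamMemory:
    """Bounded memory for a stream: remember the k items scoring
    highest under a task-specific relevance function; forget the rest."""

    def __init__(self, capacity, score):
        self.capacity = capacity
        self.score = score
        self._heap = []  # min-heap of (score, item); root is next to forget

    def observe(self, item):
        entry = (self.score(item), item)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        elif entry > self._heap[0]:
            heapq.heapreplace(self._heap, entry)  # forget lowest-scored item

    def remembered(self):
        return sorted((item for _, item in self._heap),
                      key=self.score, reverse=True)

# Toy task: remember the three largest-magnitude readings seen so far.
mem = StreamMemory(3, score=abs)
for reading in [4, -9, 1, 7, 3, -2, 8]:
    mem.observe(reading)
print(mem.remembered())  # [-9, 8, 7]
```

Each observation costs O(log k), independent of stream length, which is what makes the approach viable on unbounded streams.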

SQUINT is a stream summarization architecture that facilitates real-time summarization and visual representation of large collections of temporal event sequences. The capability will allow analysts to build and refine queries from example events, either observed or constructed, and then deploy these queries to continuously organize and summarize the large volumes of streaming data.
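As a toy stand-in for query-by-example over temporal event sequences (the event names and summary format here are hypothetical, not SQUINT's actual interface), a query built from an example can be matched as an ordered subsequence against each incoming sequence, and the collection summarized by match counts:

```python
def matches(query, sequence):
    """True if `query` occurs as an ordered (not necessarily
    contiguous) subsequence of `sequence`."""
    it = iter(sequence)
    return all(event in it for event in query)  # `in` advances the iterator

def summarize(query, streams):
    """Summarize a collection of event sequences by query hit rate."""
    hits = [s for s in streams if matches(query, s)]
    return {"matched": len(hits), "total": len(streams)}

streams = [["login", "search", "download", "logout"],
           ["login", "logout"],
           ["search", "login", "download"]]
print(summarize(["login", "download"], streams))  # {'matched': 2, 'total': 3}
```

Refining a query then amounts to editing the example sequence and re-running the summary against the live stream.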

Streaming data has a natural temporal component as data is observed and collected in a sequential manner. We are developing a framework that improves statistical modeling and predictive analytics on data streams by leveraging the temporal structure of the data and including temporal dependence in probabilistic learning models.
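The simplest probabilistic model with temporal dependence is a first-order autoregression, x[t] ≈ φ·x[t-1]. The sketch below (a minimal illustration, not the project's framework) estimates φ by least squares and uses it for one-step-ahead prediction:

```python
def fit_ar1(series):
    """Least-squares estimate of the lag-1 coefficient phi
    in the model x[t] = phi * x[t-1] + noise."""
    num = sum(prev * curr for prev, curr in zip(series[:-1], series[1:]))
    den = sum(prev * prev for prev in series[:-1])
    return num / den

def predict_next(series):
    """One-step-ahead forecast from the fitted AR(1) model."""
    return fit_ar1(series) * series[-1]

# A geometrically decaying series: each value is half the previous one.
series = [16.0, 8.0, 4.0, 2.0, 1.0]
print(fit_ar1(series))       # 0.5
print(predict_next(series))  # 0.5
```

Because the estimate depends only on running sums of products, it can be updated incrementally as each new observation arrives, which suits streaming settings.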

Sensemaking is a cognitively demanding process and has been the focus of considerable research from varied disciplines, including cognitive psychology, neuroscience, and information sciences. We are now bringing this research into streaming environments to learn how to support tasks beyond simple anomaly detection and situation awareness.

In streaming data environments, the only thing that is constant is change. To help analysts discover and reason, we will be conceptualizing a transparency-based visual analytics framework for integrating visualizations with automated methods that increase confidence and accuracy of human judgment for streaming data analysis.

Today's active learning methods have yet to fulfill their promise of revolutionizing machine learning by combining human and machine into a more powerful learning system. Broadly, AIM User Centric Hypothesis Definition (UCHD) is concerned with how to help users understand complex dynamic or streaming data.
