U.S. Department of Energy

Pacific Northwest National Laboratory

Publications

2017

Experts across domains such as biology, climate science, cybersecurity, and energy frequently use visualizations as the principal medium for making analytical judgments or for presenting the results of their analysis. However, scientists are often skeptical about adopting new visualization methods over familiar ones, even though the familiar methods might be perceptually sub-optimal. This is due to the use...
Visual analytic tools combine the complementary strengths of humans and machines in human-in-the-loop systems. Humans provide invaluable domain expertise and sensemaking capabilities to this discourse with analytic models; however, little consideration has yet been given to the ways inherent human biases might shape the visual analytic process. In this paper, we establish a conceptual framework...
Systems have biases. Their interfaces naturally guide a user toward specific patterns of action. For example, modern word-processors and spreadsheets are both capable of word wrapping, spell checking, storing tables, and calculating formulas. You could write a paper in a spreadsheet or do simple business modeling in a word-processor. However, their interfaces naturally...
Wall E, LM Blaha, C Paul, KA Cook, and A Endert. 2017. "Four Perspectives on Human Bias in Visual Analytics." In Dealing with Cognitive Biases in Visualisations: A VIS 2017 Workshop.
In the age of data science, the use of interactive information visualization techniques has become increasingly ubiquitous. From online scientific journals to the New York Times graphics desk, the utility of interactive visualization for both storytelling and analysis has become ever more apparent. As these techniques have become more readily accessible, the appeal of combining interactive...
Storylines are adept at communicating complex change by encoding time naturally on the x-axis and using the proximity of lines in the y direction to encode an interaction between entities at a particular time. The original definition of a storyline visualization requires data defined in terms of explicit interaction sessions. A more relaxed definition allows storyline visualization to be applied...
Cyber network analysts follow complex processes in their investigations of potential threats to their network. Much research is dedicated to providing automated tool support in the effort to make their tasks more efficient, accurate, and timely. This tool support comes in a variety of implementations from machine learning algorithms that monitor streams of data to visual analytic environments for...
Property graphs can be used to represent heterogeneous networks with attributed vertices and edges. Given one property graph, simulating another graph of the same or greater size with identical statistical properties with respect to attributes and connectivity is critical for privacy preservation and benchmarking purposes. In this work we tackle the problem of capturing the statistical...
State-of-the-art visual analytics models and frameworks mostly assume a static snapshot of the data, while in many cases it is a stream with constant updates and changes. Exploration of streaming data poses unique challenges as machine-level computations and abstractions need to be synchronized with the visual representation of the data and the temporally evolving human insights. In the visual...
Electron energy loss spectroscopy (EELS) is typically conducted in STEM mode with a spectrometer, or in TEM mode with energy selection. These methods produce a 3D data set (x, y, energy). Some compressive sensing [1,2] and inpainting [3,4,5] approaches have been proposed for recovering a full set of spectra from compressed measurements. In many cases the final form of the spectral data is an...
Compressive Sensing (CS) allows a signal to be sparsely measured first and accurately recovered later in software [1]. In scanning transmission electron microscopy (STEM), it is possible to compress an image spatially by reducing the number of measured pixels, which decreases electron dose and increases sensing speed [2,3,4]. The two requirements for CS to work are: (1) sparsity of basis...
Traditionally, microscopists have worked with the Nyquist-Shannon theory of sampling, which states that to reconstruct an image fully it must be sampled at a rate of at least twice the highest spatial frequency in the image. This sampling rate assumes that the image is sampled at regular intervals and that every pixel contains information that is crucial for the image (it even...
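For reference, the sampling criterion invoked above can be stated compactly. This is the textbook form of the Nyquist-Shannon theorem, not a result specific to this work; f_max denotes the highest spatial frequency present in the image.

```latex
% Nyquist-Shannon criterion: exact reconstruction of a regularly sampled image
% requires a sampling rate of at least twice the highest spatial frequency.
f_s \ge 2 f_{\max}, \qquad \Delta x \le \frac{1}{2 f_{\max}}
```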
Compressive sensing approaches are beginning to take hold in (scanning) transmission electron microscopy (S/TEM) [1,2,3]. Compressive sensing is a mathematical theory about acquiring signals in a compressed form (measurements) and the probability of recovering the original signal by solving an inverse problem [4]. The inverse problem is underdetermined (more unknowns than measurements), so it is...
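The underdetermined inverse problem mentioned above is commonly posed as sparse recovery. The formulation below is the standard basis-pursuit form, given here for orientation; the cited works may use a different solver or penalty.

```latex
% Compressive-sensing recovery: y ∈ R^m are the compressed measurements,
% Φ ∈ R^{m×n} is the sensing matrix with m < n, and the signal x is assumed
% to be sparse in some basis.
\hat{x} = \arg\min_{x} \|x\|_1 \quad \text{subject to} \quad y = \Phi x
```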
Images that are collected via scanning transmission electron microscopy (STEM) can be undersampled to avoid damage to the specimen while maintaining resolution [1, 2]. We have used BPFA to impute missing data and reduce noise [3]. The reconstruction is typically evaluated using the peak signal-to-noise ratio (PSNR). This measure is too conservative for STEM images and we propose that the Fourier...
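For context, the PSNR measure discussed above is defined as follows (standard definition; MAX is the maximum possible pixel value and the images are M by N):

```latex
% Peak signal-to-noise ratio between a reference image f and a reconstruction g.
\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right),
\qquad
\mathrm{MSE} = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(f_{ij}-g_{ij}\right)^2
```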
Since Wolfgang Pauli posed the question in 1933 of whether the probability densities |Ψ(r)|² (real-space image) and |Ψ(q)|² (reciprocal-space image) uniquely determine the wave function Ψ(r) [1], the so-called Pauli problem has sparked numerous methods in all fields of microscopy [2, 3]. Reconstructing the complete wave function Ψ(r) = a(r)e^{-iφ(r)} with the amplitude a(r) and the phase φ(r) from the...
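For orientation, the two measured densities constrain the wave function only through its modulus in each domain, since Ψ(q) is the Fourier transform of Ψ(r). This is the standard statement of the Pauli problem, not the reconstruction method of the paper.

```latex
% The real-space and reciprocal-space images fix only the moduli:
|\Psi(\mathbf{r})|^2 = a(\mathbf{r})^2,
\qquad
|\Psi(\mathbf{q})|^2 = \left|\int \Psi(\mathbf{r})\, e^{-2\pi i\,\mathbf{q}\cdot\mathbf{r}}\, d\mathbf{r}\right|^2,
\qquad
\Psi(\mathbf{r}) = a(\mathbf{r})\, e^{-i\varphi(\mathbf{r})}
```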
Scanning transmission electron microscopes (STEM) provide high resolution images at an atomic scale. Unfortunately, the level of electron dose required to achieve these high resolution images results in a potentially large amount of specimen damage. A promising approach to mitigate specimen damage is to subsample the specimen [1, 2, 3]. With random sampling, the microscope creates high resolution...
We developed CHISSL, a human-machine interface that utilizes supervised machine learning in an unsupervised context to help the user group unlabeled instances according to her own mental model. The user primarily interacts via correction (moving a misplaced instance into its correct group) or confirmation (accepting that an instance is placed in its correct group). Concurrent with the user's...
We introduce new dictionary learning methods for tensor-variate data of any order. We represent each data item as a sum of Kruskal decomposed dictionary atoms within the framework of beta-process factor analysis (BPFA). Our model is nonparametric and can infer the tensor-rank of each dictionary atom. This Kruskal-Factor Analysis (KFA) is a natural generalization of BPFA. We also extend KFA to a...
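As a reference for the decomposition named above, a third-order dictionary atom in Kruskal (CANDECOMP/PARAFAC) form, and a data item expressed as a weighted sum of such atoms, can be written as below. This is the generic form of the decomposition; the weights and priors of the BPFA/KFA model itself are not reproduced here.

```latex
% Rank-R Kruskal form of a third-order dictionary atom D_k, and a data item
% X_i represented as a weighted sum of K atoms.
\mathcal{D}_k = \sum_{r=1}^{R} \lambda_{kr}\, \mathbf{a}_{kr} \circ \mathbf{b}_{kr} \circ \mathbf{c}_{kr},
\qquad
\mathcal{X}_i \approx \sum_{k=1}^{K} w_{ik}\, \mathcal{D}_k
```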
The ability to construct domain-specific knowledge graphs (KG) and perform question-answering or hypothesis generation is a transformative capability. Despite their value, automated construction of knowledge graphs remains an expensive technical challenge that is beyond the reach of most enterprises and academic institutions. We propose an end-to-end framework for developing custom knowledge...
To promote more interactive and dynamic machine learning, we revisit the notion of user-interface metaphors. User-interface metaphors provide intuitive constructs for supporting user needs through interface design elements. A user-interface metaphor provides a visual or action pattern that leverages a user’s knowledge of another domain. Metaphors suggest both the visual representations that...
Capacity coefficient analysis could offer a theoretically grounded alternative to subjective measures and dual-task assessment of cognitive workload. Workload capacity, or workload efficiency, is a human information processing modeling construct defined as the amount of information that can be processed by the visual cognitive system in a specified amount of time. In this paper, I...
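For readers unfamiliar with the construct, the workload capacity coefficient for a redundant-target (OR) design is conventionally defined from integrated hazard functions of response times, as below. This is the standard definition from the systems factorial technology literature, not a formula introduced by this paper.

```latex
% H_X(t) = -ln S_X(t) is the integrated hazard of the response-time
% distribution for condition X; AB is the redundant (double-target) condition.
C_{\mathrm{OR}}(t) = \frac{H_{AB}(t)}{H_{A}(t) + H_{B}(t)}
```

Values near 1 indicate unlimited-capacity processing; values below or above 1 indicate limited or super capacity, respectively.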
Visual data analysis helps people gain insights into data via interactive visualizations. People generate and test hypotheses and questions about data in the context of the domain. This process can generally be referred to as sensemaking. Much of the work on studying sensemaking (and creating visual analytic techniques in support of it) has focused on static datasets. However, how do the...
Visual analytic systems have long relied on user studies and standard datasets to demonstrate advances to the state of the art, as well as to illustrate the efficiency of solutions to domain-specific challenges. This approach has enabled some important comparisons between systems, but unfortunately the narrow scope required to facilitate these comparisons has prevented many of these lessons from...

2016

The FP-Growth algorithm is a Frequent Pattern Mining (FPM) algorithm that has been extensively used to study correlations and patterns in large-scale datasets. While several researchers have designed distributed-memory FP-Growth algorithms, it is pivotal to consider fault-tolerant FP-Growth, which can address the increasing fault rates in large-scale systems. In this work, we propose a novel parallel...
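As a point of reference for what FPM computes, a minimal single-node sketch using the mlxtend library's fpgrowth implementation is shown below; the transactions and support threshold are illustrative, and the paper's distributed, fault-tolerant implementation is not reproduced here.

```python
# Minimal single-node illustration of frequent-pattern mining with FP-Growth.
# Transactions and the support threshold are illustrative only; this is not
# the paper's distributed, fault-tolerant implementation.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
]

# One-hot encode the transactions, then mine itemsets with support >= 0.5.
encoder = TransactionEncoder()
onehot = pd.DataFrame(encoder.fit(transactions).transform(transactions),
                      columns=encoder.columns_)
frequent_itemsets = fpgrowth(onehot, min_support=0.5, use_colnames=True)
print(frequent_itemsets)
```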
The recent successes of machine learning (ML) have exposed the need for making models more interpretable and accessible to different stakeholders in the data science ecosystem, like domain experts, data analysts, and even the general public. The assumption here is that higher interpretability will lead to more confident human decision-making based on model outcomes. In this talk, we report on two...
Many techniques exist for visually representing temporal event sequences. However, fewer techniques are purposely designed to support visual comparison of event sequences within a large collection of sequences, despite the importance of this task for understanding common trends or detecting anomalies. Comparing multiple event sequences is further complicated when those sequences have a very long...
Dealing with the curse of dimensionality is a key challenge in high-dimensional data visualization. We present SeekAView to address three main gaps in the existing research literature. First, automated methods like dimensionality reduction or clustering suffer from a lack of transparency in letting analysts interact with their outputs in real-time to suit their exploration strategies. The results...
Reasoning and querying over data streams rely on the ability to deliver a sequence of stream snapshots to the processing algorithms. These snapshots are typically provided using windows as views into streams and associated window management strategies. Generally, the goal of any window management strategy is to preserve the most important data in the current window and preferentially evict the...
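To make the notion of window management concrete, here is a minimal sketch of a bounded window with a pluggable importance score that drives eviction. All names are hypothetical and the scoring function is a stand-in for whatever a particular strategy considers "important".

```python
# Sketch of a bounded stream window with a pluggable eviction policy.
# ManagedWindow and its score function are illustrative names only.
class ManagedWindow:
    def __init__(self, capacity, score):
        self.capacity = capacity   # maximum number of items retained
        self.score = score         # higher score = more important to keep
        self.items = []

    def insert(self, item):
        self.items.append(item)
        if len(self.items) > self.capacity:
            # Preferentially evict the least important item.
            victim = min(self.items, key=self.score)
            self.items.remove(victim)

    def snapshot(self):
        # A snapshot is the current window contents, handed to downstream
        # reasoning or query algorithms.
        return list(self.items)

# Example: keep the three largest readings seen so far.
window = ManagedWindow(capacity=3, score=lambda x: x)
for reading in [5, 1, 9, 2, 7, 3]:
    window.insert(reading)
print(window.snapshot())   # [5, 9, 7]
```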
In-situ (scanning) transmission electron microscopy (S/TEM) is being developed for numerous applications in the study of nucleation and growth under electrochemical driving forces. For this type of experiment, one of the key parameters is the moment at which nucleation initiates. Typically, identifying the moment that crystals begin to form is a manual process requiring the user to...
Combining interactive visualization with automated analytical methods like statistics and data mining facilitates data-driven discovery. These visual analytic methods are beginning to be instantiated within mixed-initiative systems, where humans and machines collaboratively influence evidence-gathering and decision-making. But an open research question remains: when domain experts analyze their...
Learning the representation of shape cues in 2D & 3D objects for recognition is a fundamental task in computer vision. Deep neural networks (DNNs) have shown promising performance on this task. Due to the large variability of shapes, accurate recognition relies on good estimates of model uncertainty, which are ignored in the traditional training of DNNs via stochastic optimization. This...
As we delve deeper into the ‘Digital Age’, we witness an explosive growth in the volume, velocity, and variety of the data available on the Internet. For example, in 2012 about 2.5 quintillion bytes of data were created on a daily basis, originating from a myriad of sources and applications including mobile devices, sensors, individual archives, social networks, the Internet of Things, enterprises,...
While streaming data have become increasingly more popular in business and research communities, semantic models and processing software for streaming data have not kept pace. Traditional semantic solutions have not addressed transient data streams. Semantic web languages (e.g., RDF, OWL) have typically addressed static data settings and linked data approaches have predominantly addressed static...
Precise analysis of both (S)TEM images and video is a time- and labor-intensive process. As an example, determining when crystal growth and shrinkage occur during the dynamic process of Li dendrite deposition and stripping involves manually scanning through each frame in the video to extract a specific set of frames/images. For large numbers of images, this process can be very time consuming, so...
NSF workshop on Stream Reasoning
This report documents progress made on all LDRD-funded projects during fiscal year 2015.
A deep generative model is developed for representation and analysis of images, based on a hierarchical convolutional dictionary-learning framework. Stochastic unpooling is employed to link consecutive layers in the model, yielding top-down image generation. A Bayesian support vector machine is linked to the top-layer features, yielding max-margin discrimination. Deep deconvolutional inference is...
Estimating the confidence for a link is a critical task in Knowledge Graph construction. Link prediction, or predicting the likelihood of a link in a knowledge graph based on its prior state, is a key research direction within this area. We propose a Latent Feature Embedding based link recommendation model for the prediction task and utilize a Bayesian Personalized Ranking based optimization technique for...
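For orientation, a BPR-style optimization over latent-feature link scores takes the following general form; the exact scoring function f_Θ and regularization used in the paper may differ.

```latex
% For each observed triple (h, r, t) and a corrupted negative (h, r, t'),
% push the latent-feature score of the observed link above the negative one;
% σ is the logistic sigmoid and Θ are the latent embeddings.
\max_{\Theta} \sum_{(h,r,t),\,(h,r,t')} \ln \sigma\!\bigl(f_{\Theta}(h,r,t) - f_{\Theta}(h,r,t')\bigr) \;-\; \lambda\,\|\Theta\|^2
```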
Storyline visualizations offer an approach that promises to capture the spatio-temporal characteristics of individual observers and simultaneously illustrate emerging group behaviors. We develop a visual analytics approach to parsing, aligning, and clustering fixation sequences from eye tracking data. Visualization of the results captures the similarities and differences across a group of...

2015

It is useful to understand and to predict the dynamics of cognitive performance and how the timing of breaks affects this process. Prior research analyzed data from online standardized test questions, enabling the creation of a model in which a secondary resource replenishes a primary resource that determines the probability of a successful outcome. However, parameters for this model require...
Computing innovations have fundamentally changed many aspects of scientific inquiry. For example, advances in robotics, high-end computing, networking, and databases now underlie much of what we do in science such as gene sequencing, general number crunching, sharing information between scientists, and analyzing large amounts of data. As computing has evolved at a rapid pace, so too has its...
We characterize the commercial behavior of a group of companies in a common line of business using a small ensemble of classifiers on a stream of records containing commercial activity information. This approach is able to effectively find a subset of classifiers that can be used to predict company labels with reasonable accuracy. Performance of the ensemble, its error rate under stable...
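A minimal sketch of the general pattern, a small ensemble of incrementally trained classifiers combined by majority vote over a stream of records, is given below. The classifiers, features, and voting rule are illustrative; the paper's classifier-selection procedure is not reproduced.

```python
# Sketch: a small ensemble updated incrementally on a record stream and
# combined by majority vote. Models, features, and labels are illustrative.
import numpy as np
from sklearn.linear_model import SGDClassifier, PassiveAggressiveClassifier
from sklearn.naive_bayes import GaussianNB

classes = np.array([0, 1])            # hypothetical company labels
ensemble = [SGDClassifier(), PassiveAggressiveClassifier(), GaussianNB()]

def update(batch_X, batch_y):
    # Incrementally train every ensemble member on the latest mini-batch.
    for clf in ensemble:
        clf.partial_fit(batch_X, batch_y, classes=classes)

def predict(X):
    votes = np.stack([clf.predict(X) for clf in ensemble])
    return (votes.mean(axis=0) >= 0.5).astype(int)   # majority vote

# Simulated stream of mini-batches of commercial-activity features.
rng = np.random.default_rng(0)
for _ in range(10):
    X = rng.normal(size=(32, 5))
    y = (X[:, 0] > 0).astype(int)
    update(X, y)
print(predict(rng.normal(size=(4, 5))))
```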
Power has become the major impediment in designing large-scale high-end systems. The Message Passing Interface (MPI) is the de facto communication interface used as the back-end for designing applications, programming models, and runtimes for these systems. Slack, the time spent by an MPI process in a single MPI call, provides a potential for energy and power savings if an appropriate...
Machine Learning algorithms are benefiting from the continuous improvement of programming models, including MPI, MapReduce, and PGAS. The k-Nearest Neighbors (k-NN) algorithm is a widely used machine learning algorithm, applied to supervised learning tasks such as classification. Several parallel implementations of k-NN have been proposed in the literature and practice. However, on high-performance...
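For context, the algorithm being parallelized can be stated in a few lines of serial code; this sketch is purely illustrative and contains none of the distributed-memory machinery the paper addresses.

```python
# Minimal serial k-nearest-neighbors classifier (illustrative only).
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    # Euclidean distance from the query to every training sample.
    dists = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(dists)[:k]
    # Majority label among the k nearest neighbors.
    return Counter(train_y[nearest]).most_common(1)[0][0]

train_X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
train_y = np.array([0, 0, 1, 1])
print(knn_predict(train_X, train_y, np.array([0.95, 0.9])))   # -> 1
```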
Cognitive depletion, the decline in user performance over time through the exhaustion of mental resources, leads to an increasing prevalence of human error in the interaction between computers and their users. Keylogger data from the Science of Interaction project were analyzed to determine whether patterns in user activity could be used to detect changes in user performance. Though the majority of...
This brief white paper describes PNNL’s Analysis In Motion (AIM) initiative, with special emphasis on the requirements of AIM’s TEM use case.
Business intelligence problems are particularly challenging due to the use of large volume and high velocity data in attempts to model and explain complex underlying phenomena. Incremental machine learning based approaches for summarizing trends and identifying anomalous behavior are often desirable in such conditions to assist domain experts in characterizing their data. The overall goal of this...
Support Vector Machines (SVM) is a supervised Machine Learning and Data Mining (MLDM) algorithm that has become ubiquitous largely due to its high accuracy and obliviousness to dimensionality. The objective of SVM is to find an optimal boundary, also known as a hyperplane, which separates the samples (examples in a dataset) of different classes by a maximum margin. Usually, very few samples...
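The max-margin objective described above is conventionally written as the following hard-margin optimization problem; the samples whose constraints are active are the support vectors.

```latex
% Hard-margin SVM: classes y_i ∈ {-1, +1}, hyperplane parameters (w, b),
% margin 2 / ||w||.
\min_{\mathbf{w},\, b}\; \tfrac{1}{2}\,\|\mathbf{w}\|^2
\quad \text{subject to} \quad
y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right) \ge 1, \quad i = 1, \dots, n
```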
