U.S. Department of Energy

Pacific Northwest National Laboratory

Insider Threat Detection in the Cloud

As global computing culture quickly migrates away from traditional IT environments to a mixture of on-premises private and off-premises commercial cloud computing solutions, additional cybersecurity challenges are being introduced that many enterprises are unprepared to address. One such challenge, insider threat detection, is a well-known and notoriously difficult area of research and application in traditional IT. It is further complicated in the cloud, which simultaneously introduces unique data streams and decreased transparency, and for which traditional threat detection solutions are not well adapted.

The Insider Threat Detection in the Cloud (ITDC) research program will develop a rigorous methodology and streaming analysis paradigm to provide continuous, automated synthesis of new knowledge and enable steering of measurement or collection systems in response to emerging knowledge. By doing so, we aim to rebalance the effort between humans and machines.

Challenge

Insider threat detection is at its core a human-centric cybersecurity challenge: to properly identify and mitigate insider threat activities, context and intention must be derived through the fusion of cyber and non-cyber data into actionable intelligence. The goal of actionable intelligence is to help decision makers interpret and understand observed activities and patterns of behavior, and ultimately to drive informed cybersecurity decisions. Today, the insider threat detection process is at best partially automated and remains highly manual. Current practice for detecting insider threat activities, known as User Activity Monitoring (UAM), typically relies on general observations, consisting of basic automated alerting and filtering of cyber data, combined with post hoc analysis performed by domain experts. These domain experts then create rudimentary models (e.g., mental models) to characterize observed cyber phenomena in an attempt to identify what might constitute meaningful initial indicators of interest. Once patterns of interest begin to emerge from the initial indicators, further context must be derived through focused observations, which include additional non-cyber data collection, before further actions or mitigations can be taken. In other words, collected cyber data must be fused with non-cyber data to generate actionable intelligence, a process similar to the discipline of all-source analysis.

Many difficult, generally unsolved problems fall under the all-source, cybersecurity, and insider threat detection analysis umbrellas, and some of them are outside the scope of the AIM Initiative. This research program therefore aims to explore two primary challenges:

  1. Extract indicators of interest from streaming cloud computing system telemetry and security auditing data
  2. Develop a methodology and tooling to efficiently and effectively perform streaming all-source analysis for insider threat detection, fusing cyber and non-cyber data
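
As a rough illustration of the first challenge, the sketch below flags per-user event counts that deviate sharply from that user's own recent history. It is a minimal sketch under stated assumptions, not the program's method: the `RollingIndicator` class, the `extract_indicators` helper, the `(user, count)` stream format, the window size, and the z-score threshold are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque
from math import sqrt


class RollingIndicator:
    """Flags per-user counts that exceed the user's recent rolling baseline.

    Hypothetical helper: window and threshold are illustrative assumptions.
    """

    def __init__(self, window=5, threshold=3.0):
        self.window = window        # number of past intervals kept per user
        self.threshold = threshold  # z-score cutoff for an indicator
        self.history = defaultdict(lambda: deque(maxlen=window))

    def observe(self, user, count):
        """Return True if `count` is an indicator of interest for `user`."""
        past = self.history[user]
        flagged = False
        # Only score once a full window of history exists for this user.
        if len(past) == self.window:
            mean = sum(past) / len(past)
            var = sum((x - mean) ** 2 for x in past) / len(past)
            # Floor the deviation at 1.0 so a flat history cannot divide by zero.
            z = (count - mean) / max(sqrt(var), 1.0)
            flagged = z > self.threshold
        past.append(count)
        return flagged


def extract_indicators(stream, window=5, threshold=3.0):
    """Reduce a stream of (user, count) pairs to the flagged subset."""
    detector = RollingIndicator(window, threshold)
    return [(user, count) for user, count in stream
            if detector.observe(user, count)]


# Assumed input: one (user, event_count) pair per user per time interval.
stream = [("alice", 10)] * 5 + [("alice", 11), ("alice", 60), ("bob", 500)]
indicators = extract_indicators(stream)
```

In this example only the jump for `alice` is flagged; `bob` has no history yet, so the sketch stays silent rather than guessing. A real system would need a policy for such cold-start users.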

Why Cloud Computing Environments?

A large body of research exists on insider threat detection in traditional IT systems, but research on insider threat detection in cloud computing environments is relatively new, and these environments introduce a number of unique challenges:

  • Unique cloud computing environment traits
    • Software-defined networks
    • Inherent multi-tenancy
    • User/object-controlled tenants
    • Root access within tenants
    • Token-based and non-token-based authentication
  • Extremely difficult and time-consuming to correlate and fuse cloud-specific cyber data, such as security event auditing and telemetry, with non-cyber data
  • Traditional network monitoring solutions are unavailable or ineffective in cloud computing environments
    • Low level of visibility, or no visibility, into regular tenant activities
      • Default root access in tenants allows users to deactivate or otherwise obfuscate cloud-provider installed software agents
    • Tenant-controlled software-defined networking (SDN) increases the complexity of directly monitoring internal network activities across the entire cloud
      • External (north/south) network traffic monitoring only available at the borders
      • Internal (east/west) network traffic unavailable for monitoring
      • NetFlow is generally unavailable
    • Traditional data streams are missing or may no longer be meaningful
      • Internal system log data may no longer make sense externally, as internal private IP addresses and hostnames are non-unique
      • External floating IPs are not managed directly in the individual tenant SDNs, so there is no internally exposed linkage between external floating IPs and internal tenant instances
  • New opportunities exist to compromise and exfiltrate data stored in cloud-based storage and network file systems that may not exist in traditional IT environments
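
To make the floating IP linkage problem above concrete, the following sketch resolves an externally observed floating IP to a tenant instance at a given time. It is only a sketch under assumptions: it presumes the cloud control plane can export a history of floating IP associations, and the `Association` record layout and the `resolve_instance` helper are hypothetical. Because private IP addresses are non-unique across tenants, the result carries the tenant and instance identifiers rather than the private address alone.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Association:
    """Hypothetical record of one floating IP attachment interval."""
    floating_ip: str
    tenant_id: str
    instance_id: str
    private_ip: str
    start: float           # epoch seconds when the floating IP was attached
    end: Optional[float]   # None while the attachment is still active


def resolve_instance(associations: Iterable[Association],
                     floating_ip: str,
                     timestamp: float) -> Optional[Association]:
    """Return the attachment covering `timestamp`, or None if there is none."""
    for a in associations:
        if a.floating_ip != floating_ip:
            continue
        # Interval is half-open: [start, end), or open-ended if still attached.
        if a.start <= timestamp and (a.end is None or timestamp < a.end):
            return a
    return None


# Assumed history: the same floating IP moves between two tenants whose
# instances happen to share the same private address.
history = [
    Association("203.0.113.7", "tenant-a", "i-1", "10.0.0.5", 100.0, 200.0),
    Association("203.0.113.7", "tenant-b", "i-2", "10.0.0.5", 200.0, None),
]

early = resolve_instance(history, "203.0.113.7", 150.0)
late = resolve_instance(history, "203.0.113.7", 250.0)
before = resolve_instance(history, "203.0.113.7", 50.0)
```

Here the two lookups return different instances even though both share the private address `10.0.0.5`, which is why border traffic records need the timestamp and the tenant identifier to be attributed correctly.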

Approach

AIM’s transformational research agenda for Insider Threat Detection in the Cloud (ITDC) will tackle this new challenge head-on, through the development of novel, scientifically rigorous approaches, methodologies and tools for cyber defenders. These approaches and methodologies will assist cyber defenders in leveraging real-time quantitative information to make qualitative determinations of potential insider threat activities in cloud computing environments.

The ITDC research program will leverage the outcomes of human-in-the-loop (HIL) research conducted under the AIM Initiative, which aims to enhance the way humans interact with large bodies of real-time, streaming data. In addition, tools will be developed to help cyber defenders identify and understand activities of potential interest or high risk in cloud telemetry and security audit streams. With this information, a cyber defender will then have mechanisms to steer the appropriate data streams, ideally in real time, to collect the additional information needed to support decision-making and analysis.

The program is divided into two phases:

Phase I

  • Establish use case approach, scenario, and goals through review of best practices, available resources, and research gaps
  • Using a representative user interface in a controlled experiment, establish a baseline for human performance in terms of anomalous pattern detection at differing data stream rates
  • Through interviews with subject matter experts, characterize the defender’s decision-making process with regard to cloud computing environments
  • Using a public insider threat dataset, optimize cyber data size reduction approaches to reduce the cognitive burden on the human defender of understanding multiple, simultaneous real-time data streams and events
    • Using these approaches, automatically notify the defender of interesting subsets of data (e.g. alert events)
  • Develop user interfaces (UIs) and computer interaction approaches to present the data subsets in a meaningful, approachable way and allow defenders to identify and triage events
  • Identify key gaps in the data processing and analysis loop and identify research areas, products, or partners that can help address them

Phase II

The overall goal of this phase is to integrate the data reduction approaches, computer interaction approaches, and user interfaces developed or identified in Phase I into a cohesive, human-centered, all-source analysis system for cyber defenders. The major activities in this phase are:

  • Develop approaches to support the real-time modeling and machine learning of streaming cyber activities and events
    • Evaluate performance of dynamic retraining methods on subsetting methods developed during Phase I and test performance of supervised methods trained by user feedback
  • Develop approaches to support the real-time human-guided steering of data collection and training of multiple competing hypotheses with supporting visualizations
    • Continue developing UIs to allow users to efficiently manipulate the data stream and retrain models
    • Compare all-source analytical performance of humans on streaming data against the Phase I non-streaming baseline in subsequent user studies

Success Metrics

The ITDC research program has several high-level performance metrics, derived from AIM’s overarching research metrics, to be used as barometers of performance for the Initiative: Correctness, Usefulness, and Timeliness.

Additionally, the programmatic objective for this research program is to contribute to the ITDC research domain through:

  • Technical publications
  • Workshops
  • AIM Seminar speakers

Benefit

We aim to make the following contributions to the area of insider threat detection in cloud computing environments:

  • Increase transparency for cyber defense of cloud computing environments, specifically for detection of insider threat activities
  • Improve the quality of event alerting and activity modeling by reducing the volume of false positives and false negatives presented to defenders
  • Create approaches to characterize “typical” user cloud behavior of individuals and groups using system-based data
  • Assess improvements in decision-making given AIM-developed streaming interfaces
  • Provide an array of approaches (algorithms, software) that can be used independently or in concert to enhance the cyber defender’s toolkit