Applied Data Science

In 2006 I started researching and creating defensive technologies for the discovery of botnets and other cyber operations. The goal of this research was to leverage natural processes, such as immunological processes, and develop models that distinguish between the normal and abnormal features of a system and its behaviors. In this blog post, I want to bring that focus back into view and differentiate it from today’s network defense technologies.

Traditional sensor technologies focus on the use of rules implemented as countermeasures to detect known behaviors of threat vectors. These countermeasures produce alerts and are managed by various types of SIEM technologies like Splunk. Such systems can correlate alerts, yet in order to create an alert, you most often need to know the signature of a threat. I wanted to take a different approach, looking at both normal and abnormal system behaviors.

During the research and creation of a proof of concept, I introduced the concept of system aggregate behavior analytics, which was presented at DHS’s CATCH 2009 (https://resources.sei.cmu.edu/asset_files/Presentation/2010_017_001_49775.pdf). Here, rather than interpreting threat activity through events produced by intrusion detection systems, firewalls, etc., this work focused on the capture and transformation of network flow data, and then host data, e.g. process behaviors.

What do network system behaviors look like? Consider the following picture. There are specific areas within the feature space that are consistent over time.

The picture is taken from my DMnet 2009 presentation. Every dot is a separate system/device, plotted in terms of the packets and bytes captured within flow data over a month. Hosts exchanging mostly small packets in small flows are down to the right. Servers dishing out more packets with more bytes, e.g. web servers, go up to the right.

Defining behavior analytics in terms of system behaviors can provide a new path to understanding behavior attribution analytics (https://applieddatascience.org/?p=53&).

In the Beginning…

In 1987 Dr. Dorothy Denning, in her seminal paper (https://www.cs.colostate.edu/~cs656/reading/ieee-se-13-2.pdf), defined network intrusions in terms of Subjects, Objects, and Profiles. To summarize, in her model, subjects, e.g. users/processes, act on objects, creating profiles comprised of behaviors. For example, copying an executable to disk is a subject’s action that can be recorded by a sensor, e.g. a host-based IDS or Tripwire.

Dr. Denning’s paper served as a cornerstone in establishing and formalizing future models for network security. My departure from her model involved establishing more abstract system behaviors and using them to identify both normal and abnormal behaviors. Yet most current research has focused on user behavior analytics.

Today’s Behavior Analytics

Most behavior analytic models today focus on user behaviors, as seen in the 2020 article “Review and insight on the behavioral aspects of cybersecurity” (https://doi.org/10.1186/s42400-020-00050-w). The article surveys behavioral approaches in cyber security in terms of hacker behavior, human factors, models and simulation. In file:///Users/owenmccusker/Downloads/07-krasznay-hamornik-aarms-2018-03-online.pdf, user behavior models are reviewed in terms of identifying some cyber threats through authentication and user behaviors using advanced analytics.

The focus of my research and development in 2009 was not on user behavior models, but rather on establishing system behavior models.

Defining a System Behavior Analytical Model

I defined a system behavior analytical model in terms of an aggregated behavioral feature space, an N-dimensional structure that captures contact behaviors over a set of defined time periods: hour, day, week, month, year, cumulative, and/or a custom time period.

Yet to understand normal and abnormal behaviors, I needed a different perception of system behaviors. I chose to leverage network flow data to get there, to be followed later by system process data. I also needed to change the object of analysis from the flow to something more tangible, e.g. a device. Doing so flips the model from one focused solely on perceiving and comprehending threats in terms of security events (event-centric or alert-centric) to one built on transformations that are system-centric, or device-centric.
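To make the flip from flow-centric to device-centric concrete, here is a minimal Python sketch. It is not the original proof of concept: the record layout, the sample addresses, and the helper names `period_key` and `aggregate_by_device` are all assumptions made for illustration. It folds individual flow records into one row of counters per device and time period.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical flow records: (start time, source IP, destination IP, packets, bytes).
FLOWS = [
    ("2009-03-02T10:05:00", "10.0.0.5", "192.0.2.10", 12, 900),
    ("2009-03-02T10:40:00", "10.0.0.5", "192.0.2.11", 8, 600),
    ("2009-03-02T11:15:00", "10.0.0.5", "192.0.2.10", 4, 300),
    ("2009-03-02T10:20:00", "10.0.0.9", "192.0.2.10", 400, 520000),
]


def period_key(timestamp, period):
    """Truncate an ISO timestamp to the hour or day it falls in."""
    t = datetime.fromisoformat(timestamp)
    if period == "hour":
        return t.strftime("%Y-%m-%dT%H")
    if period == "day":
        return t.strftime("%Y-%m-%d")
    raise ValueError(f"unsupported period: {period}")


def aggregate_by_device(flows, period="hour"):
    """Fold flow records into one feature row per (device, time period)."""
    rows = defaultdict(lambda: {"flows": 0, "packets": 0, "bytes": 0})
    for start, src, _dst, packets, nbytes in flows:
        row = rows[(src, period_key(start, period))]
        row["flows"] += 1
        row["packets"] += packets
        row["bytes"] += nbytes
    return dict(rows)


hourly = aggregate_by_device(FLOWS, "hour")
daily = aggregate_by_device(FLOWS, "day")
print(daily[("10.0.0.5", "2009-03-02")])
```

The number of rows grows with the number of devices and time periods rather than with the number of flows or alerts, which is the property the device-centric view relies on.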

This contact-centric feature space is scalable compared to conventional systems in that there is a known number of behaviors and time periods being collected per system. It is also scalable in that thousands of trained models could operate over the feature space, each looking for a specific behavior, e.g. beaconing. This is in stark contrast to conventional alert-centric technology, in which an unknown number of alerts, generated by various contacts, must be managed in the system over time.
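As one illustration of a behavior-specific check, beaconing can be approximated by how regular the gaps between a device's contacts are. The sketch below is a simple stand-in heuristic, not a trained model and not the method used in the original work. The function name `beacon_score` and the use of the coefficient of variation are assumptions.

```python
from statistics import mean, pstdev


def beacon_score(timestamps):
    """Score in [0, 1]: 1.0 means perfectly regular contact intervals.

    timestamps: sorted contact times in seconds for one device and one peer.
    Uses 1 - coefficient of variation of the gaps, floored at 0 (an assumed heuristic).
    """
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2:
        return 0.0  # too few contacts to judge regularity
    avg = mean(gaps)
    if avg <= 0:
        return 0.0
    return max(0.0, 1.0 - pstdev(gaps) / avg)


regular = beacon_score([0, 60, 120, 180, 240])    # contact every 60 seconds
irregular = beacon_score([0, 5, 200, 210, 900])   # bursty, human-like timing
print(regular, irregular)
```

A score like this would simply become one more dimension of the feature space for each device and time period.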

Aggregated behaviors of various contacts, e.g. hosts and/or employees within an enterprise, are invaluable in their ability to provide indicators of changes in threat behavior.

The detection model processes network data and events, $d_{O}$ , using network sensors, $S$ , grouping the data by network objects, $O$ , e.g., hosts. Instead of being an alert-centric model, this model focuses on identifying threats and threat behaviors in terms of higher-level network objects, e.g., hosts. Note that in the case of network flow, the network data represents a tuple where $tuple_{d} = \langle d_{1}, d_{2}, \ldots, d_{n} \rangle$ . Network data from sensors is inherently non-numeric and must be transformed into numerical form, represented by $td_{O}$ and its tuples $tuple_{td} = \langle td_{1}, td_{2}, \ldots, td_{n} \rangle$ .
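A small Python sketch of the step from $tuple_{d}$ to $tuple_{td}$ follows. The field order of the raw flow tuple and the `transform` helper are assumptions for illustration; only the protocol numbers are standard IANA values.

```python
import ipaddress
from datetime import datetime

# IANA-assigned protocol numbers for three common protocols.
PROTOCOL_NUMBERS = {"icmp": 1, "tcp": 6, "udp": 17}


def transform(flow):
    """Map a raw flow tuple d to a purely numeric tuple td.

    Assumed layout: (src IP, dst IP, protocol name, start, end, packets, bytes),
    with every field arriving as text from the sensor.
    """
    src, dst, proto, start, end, packets, nbytes = flow
    duration = (datetime.fromisoformat(end) - datetime.fromisoformat(start)).total_seconds()
    return (
        int(ipaddress.ip_address(src)),   # address as an integer
        int(ipaddress.ip_address(dst)),
        PROTOCOL_NUMBERS[proto.lower()],  # protocol name as its number
        duration,                         # seconds
        int(packets),
        int(nbytes),
    )


tuple_d = ("10.0.0.5", "192.0.2.10", "TCP",
           "2009-03-02T10:05:00", "2009-03-02T10:05:30", "12", "900")
tuple_td = transform(tuple_d)
print(tuple_td)
```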

Various behavioral analysis functions, $bf_{analysis}$ , operate over the transformed data, $tuple_{td}$ , creating an n-tuple of behavioral feature values, $\nu_{bf}$ , associated with each object processed, $O$ . The behavioral feature value, $\nu_{bf}$ , represents $feedback$ in the overall model, where,

(1) $\begin{equation*} bf_{analysis}(tuple_{td}) = \langle \nu_{bf1}, \nu_{bf2}, \ldots, \nu_{bfn} \rangle = tuple_{\nu bf} \end{equation*}$
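Equation 1 can be read as an ordinary function from a host's transformed records to a tuple of feature values. The Python sketch below is one possible instance under assumed inputs: the record layout, the sample data, and the three features chosen (share of outgoing flows, average packets per flow, average bytes per flow) are illustrative rather than the feature set of the original system.

```python
from statistics import mean

# Hypothetical transformed records for one host: (direction, packets, bytes),
# where direction is +1 for an outgoing flow and -1 for an incoming flow.
TUPLES_TD = [(+1, 12, 900), (+1, 8, 600), (-1, 4, 300), (+1, 16, 1200)]


def bf_analysis(tuples_td):
    """Return a tuple of behavioral feature values for one network object."""
    outgoing_ratio = sum(1 for d, _, _ in tuples_td if d > 0) / len(tuples_td)
    avg_packets = mean(p for _, p, _ in tuples_td)
    avg_bytes = mean(b for _, _, b in tuples_td)
    return (outgoing_ratio, avg_packets, avg_bytes)


tuple_vbf = bf_analysis(TUPLES_TD)
print(tuple_vbf)
```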

These behavioral value tuples, $tuple_{\nu bf}$ , are then input into a shared sample space or behavioral feature space, $FS_{\nu bf}$ . Behavioral values can represent the results of simple heuristics, such as statistically determining whether the network flows associated with a host are incoming or outgoing. In the future, they can also represent a metric resulting from a distance measure. A distance measure may be a good choice when tight clusters of behaviors are exhibited, like the AOL cluster in Figure ??. A distance measure of the host, $O_{host}$ , from a centroid of a region, $c_{region}$ , of the behavioral feature space, $FS_{\nu bf}$ , can be calculated where,

(2) $\begin{equation*}\sqrt{(x_1-x_2)^2 + (y_1-y_2)^2}.\end{equation*}$

If $O_{host}=(h\nu_{bf1},h\nu_{bf2},\ldots,h\nu_{bfn})$ and $c_{region}=(c\nu_{bf1},c\nu_{bf2},\ldots,c\nu_{bfn})$ , then formula 2 can be generalized by defining the Euclidean distance from $O_{host}$ to $c_{region}$ as

(3) $\begin{equation*}d(O_{host},c_{region})=\sqrt{(c\nu_{bf1}-h\nu_{bf1})^2+(c\nu_{bf2}-h\nu_{bf2})^2+\cdots+(c\nu_{bfn}-h\nu_{bfn})^2}.\end{equation*}$
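Formula 3 translates directly into a few lines of Python. The sample host and centroid values below are made up for illustration, and the function name `distance` is an assumption.

```python
import math


def distance(host, centroid):
    """Euclidean distance between a host and a region centroid (formula 3)."""
    if len(host) != len(centroid):
        raise ValueError("host and centroid must have the same number of features")
    return math.sqrt(sum((c - h) ** 2 for h, c in zip(host, centroid)))


o_host = (0.75, 10.0, 750.0)    # hypothetical host feature values
c_region = (0.75, 13.0, 754.0)  # hypothetical region centroid
d = distance(o_host, c_region)
print(d)  # 5.0
```

Because features such as bytes and ratios live on very different scales, a real deployment would likely normalize each dimension before measuring distance. That step is left out here to stay close to the formula.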

Classifiers are tuned to view the feature space using different sets of features, or feature-tuples. Within those views, behavioral regions are defined, and a distance measure is used to score a host within a region based on the centroid of that region. In this context, classification is determined from a set of behavioral feature values, $\nu_{bf}$ , calculated using various methods, including simple statistics-based heuristics and distance measurements from known regions.
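The idea of feature views, regions, and centroid-based scoring can be sketched as follows. Everything specific here is an assumption for illustration: the region names, centroids, radii, and the linear scoring rule (1 at the centroid, falling to 0 at the region's radius) are not taken from the original system.

```python
import math

# Hypothetical regions: each names the features it views, a centroid, and a radius.
REGIONS = {
    "workstation": {"features": ("avg_packets", "avg_bytes"),
                    "centroid": (10.0, 800.0), "radius": 100.0},
    "web_server": {"features": ("avg_packets", "avg_bytes"),
                   "centroid": (400.0, 500000.0), "radius": 50000.0},
}


def score(host, region):
    """Score a host against one region using only that region's feature view."""
    view = [host[name] for name in region["features"]]
    d = math.sqrt(sum((c - h) ** 2 for h, c in zip(view, region["centroid"])))
    return max(0.0, 1.0 - d / region["radius"])


def classify(host, regions):
    """Return (best region, score), or (None, 0.0) if the host is outside all regions."""
    scores = {name: score(host, region) for name, region in regions.items()}
    best = max(scores, key=scores.get)
    return (best, scores[best]) if scores[best] > 0 else (None, 0.0)


host = {"outgoing_ratio": 0.75, "avg_packets": 10.0, "avg_bytes": 750.0}
label, value = classify(host, REGIONS)
print(label, value)
```

A host that falls outside every region is left unlabeled, which is itself a useful signal that its behavior does not match any known profile.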

Element	Description
Sensor	A device providing observables in the form of raw network data and/or network events.
Network Object	An object being tracked, representing either a host, host group, or subnet
Network Object Data	Raw network data and events, $d_{O}$, grouped by network object. This data is usually non-numerical in nature and must be transformed into a numerical format for processing.
Transformed Network Object Data	The numerical form, $td_{O}$, of the network object data; e.g., average incoming network bytes per flow. This data is used by the behavioral analysis functions.
Behavioral Analysis Function	A function that operates over the sample space to create a behavioral feature value for a network object.
Behavioral Feature Value	A single behavioral feature produced by $bf_{analysis}$. This represents a single classification.
Behavioral Feature Space	The feature space used by classifiers and correlators.
Behavioral Feature N-Tuple	A set of features (in the form of a record of label-value pairs) describing a network object, $O$. The feature characteristics of a host, for example, may list inter-packet arrival time, outgoing packet size, work-weight ratio, etc.

Such a system acts as both an overlay and an underlay technology, providing a unique perspective on existing defensive strategies by ingesting data from existing sensors and pushing threat indications to existing security information management systems. As an overlay to existing alert-centric sensors, e.g., Snort®, the system transforms the raw data into a host-centric perspective and fuses behaviors. As an underlay to a Unified Threat Management (UTM) technology, the system can provide a unique behavior perspective correlated to existing alerts and data.

Examples of System Behavior Derived from Network Flow

Transforming and aggregating flow data into a device-centric perspective provides insight into how devices behave over various time periods. Yet identifying a behavior does not necessarily mean that a threat has been identified. For example, a customer asked me to identify “slow-downs” on their internal network. What I found in looking at the flow data was that there were behaviors on their network indicative of the distributed hash table (DHT) protocol, which is used to find the closest peer within a peered network. DHT is used in BitTorrent, which may be acceptable in some situations, but it is also used by the Storm botnet.

This aggregate device-centric view shows how a specific protocol can be identified within the inside of a network.

Behavior Analytic Solutions

In the work I did for DHS from 2006 to 2009, I started to define behavior analytics in terms of system behaviors. The concept was picked up by a number of companies, but applied in a different manner. Some folks in the industry look at behavior analytics as a relatively new concept created a few years ago (https://www.securityondemand.com/exactly-behavioral-analytics/).

Conclusion

There is absolutely a place for user behavior analytics, but more importantly, device-centric system behavior analytics has proven to be a complementary approach to traditional event-centric network defense. Such capabilities can offer an “over the horizon” perspective that complements today’s alert-centric technologies.