PhD Thesis Defense for Esra Nergis Yolacan
Thesis Title: Learning from Sequential Data for Anomaly Detection}
October 9, 2014
442 Dana Research Center
Anomaly detection has been used in a wide range of real world problems and has received significant attention in a number of research fields over the last decades.
Anomaly detection attempts to identify events, activities, or observations which are measurably different than an expected behavior or pattern present in a dataset.
This thesis focuses on a specific set of techniques targeting the detection of anomalous behavior in a discrete, symbolic, and sequential dataset. Since profiling complex sequential data is still an open problem in anomaly detection, and given that the rate of production of sequential data in fields ranging from finance to homeland security is exploding, there is a pressing need to develop effective detection algorithms that can handle patterns in sequential information flows.
In this thesis, we address context-aware multi-class anomaly detection as applied to discrete sequences and develop a context learning approach using an unsupervised learning paradigm.
We begin the anomaly detection process by applying our approach to differentiate normal behavior classes (contexts) before attempting to model normal behavior. This approach leads to stronger learning on each class by taking advantage of the power of advanced models to identify normal behavior of the sequence classes. We evaluate our discrete sequence-based anomaly detection framework using two illustrative applications: 1) System call intrusion detection and 2) Crowd anomaly detection. We also evaluate how clustering can guide our context-aware methodology to positively impact the anomaly detection rate.
In this thesis, we utilize a Hidden Markov Model (HMM) to perform anomaly detection. A HMM is the simplest dynamic Bayesian network. A HMM is a Markov model which can be used when the states are not observable, but observed data is dependent on these hidden states.
While there has been a large amount of prior work utilizing Hidden Markov Models (HMMs) for anomaly detection, the proposed models became overly complex when attempting to improve the detection rate, while reducing the false detection rate.
We apply HMMs to perform anomaly detection on discrete sequential data. We utilize multiple HMMs, one for each context class. We demonstrate our multi-HMM approach to system call anomalies in cybersecurity and provide results in the presence of anomalies. Applying process trace analysis with multi-HMMs, system call anomaly detection achieves better results using better tuned model settings and a less complex structure to detect anomalies.
To evaluate the extensibility of our approach, we consider a second application, crowd behavior analytics. We attempt to classify crowd behavior and treat this as an anomaly detection problem on sequential data. We convert crowd video data into a discrete/symbolic sequence of data. We apply computer vision techniques to generate features from objects, and use these features for frame-based representations to model the behavior of the crowd in a video stream.
We attempt to identify anomalous behavior of a crowd in a scene by applying machine learning techniques to understand what it means for a video stream to be identified as ``normal''.
The results of applying our context-aware multi-HMMs approach to crowd analytics show the generality of our anomaly detection approach, and the power of our context-learning approach.
PhD Thesis Committee:
Dr. Fatemeh Azmandian, EMC
Dr. Jennifer Dy
Dr. David Kaeli, Advisor