Detecting Privileged Accounts in Cybersecurity Data

Nariman Mammadli
7 min read · May 12, 2020

A data-driven approach to inferring privileged accounts, inspired by Alan Turing’s Imitation Game


Previously, I gave an introduction to applications of Artificial Intelligence (AI) and Machine Learning (ML) algorithms to cybersecurity problems, and argued that they will change the game of cyber defense. Here, I discuss a promising application of AI/ML to privileged access management. A privileged user is someone who has administrative access to critical systems, applications, and accounts. Privileged access management can be defined as managing and auditing account and data access by privileged users. Conventionally, such privileged users, credentials, or accounts are identified beforehand. I propose a novel solution that learns which users, credentials, and accounts are privileged from the event logs, without prior knowledge. The AI solution for this inference problem is inspired by the work of the father of Artificial Intelligence, Alan Turing.

To measure machine intelligence, Alan Turing ingeniously used a simple criterion. If a machine can deceive humans into believing it is a human over a communication channel, then the machine can be called intelligent. Turing did not articulate this test using any sort of mathematics. He reasoned that, even though the precise description of intelligence eludes us humans, we have built-in expectations and intuitions about it. Turing, in his Imitation Game, uses this human intuition without explicitly formulating it. Here, I formulate Turing’s test mathematically by connecting the Imitation Game with the information theory concept of Kullback–Leibler divergence. Finally, I show how this concept can be used for the task of inferring privileges from event logs in a cybersecurity context.

Privileged accounts or privileged credentials have elevated access and permissions to critical systems, applications, and other accounts. Special management of such accounts is crucial for defending against cyberattacks and misuse. In the context of threat detection, awareness of privileged accounts is required for prioritizing security events, especially when the network is large and resources for defense are limited. Conventionally, such accounts and credentials are identified beforehand, and access to them is monitored more closely. Security events that occur around them are also given higher priority. This approach has two blind spots:

  1. There are accounts, especially in a complex network, that have accidentally been granted privileged permissions. This can happen through the unforeseen side effects of various security policies.
  2. Privileged credentials can be shared across privileged and non-privileged accounts. An admin user might use the same credentials for a highly critical account and for a personal account on a laptop. In this case, the seemingly innocent personal account is as critical as the other one, since an attacker who gains access to the personal account can in principle gain access to the critical one.

Here, I explore the possibility of filling these gaps by learning privileges from data. My reverse reasoning about privileges using a machine learning algorithm extends the above definition of privilege and provides new insights into a cyber defense strategy. The essential component of the approach lies in the concept of KL divergence.

An interesting comparison metric: Kullback–Leibler divergence

KL divergence is built upon the concept of surprise in information theory. The surprise of an event is defined as the negative logarithm of the probability of the event. There is intuitive reasoning behind the choice of the logarithm. If an event has a probability of 1 (100%), its occurrence incurs zero surprise. If an event has a probability of 0 (0%), its occurrence incurs infinite surprise. If an event has a 50% probability of happening and the base 2 logarithm is chosen, its occurrence yields 1 bit, the unit of information. We can also quantify the average surprise of a probability distribution, a quantity known as entropy. For example, if we have a prior probability distribution over possible weather conditions for tomorrow, we can quantify the level of average surprise that awaits us.
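The definition of surprise can be made concrete with a minimal Python sketch. The function names and the weather probabilities below are hypothetical, chosen only for illustration; they are not taken from any real data.

```python
import math

def surprise(p: float) -> float:
    """Surprise of an event with probability p, in bits (base 2 logarithm)."""
    if p == 0:
        return math.inf          # an "impossible" event is infinitely surprising
    return math.log2(1 / p)      # equals -log2(p)

def average_surprise(dist: dict) -> float:
    """Expected surprise (entropy) of a probability distribution, in bits."""
    return sum(p * surprise(p) for p in dist.values() if p > 0)

# Hypothetical prior over tomorrow's weather (illustrative numbers only).
weather = {"sunny": 0.5, "cloudy": 0.25, "rainy": 0.25}

print(surprise(1.0))              # certain event: 0.0 bits
print(surprise(0.5))              # coin-flip event: 1.0 bit
print(average_surprise(weather))  # 1.5 bits on average
```

Using the base 2 logarithm expresses everything in bits; any other base only rescales the numbers.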

KL divergence is an information theory tool for quantifying the difference between two probability distributions. It is closely related to the famous Imitation Game developed by Alan Turing to measure machine intelligence. Although Turing did not formulate it mathematically, the Imitation Game carries a mathematical intuition that is tied to the concept of KL divergence. To recreate the Imitation Game setting, imagine we are interacting via a computer terminal with two users, one of whom is a human participant and the other a bot. Assume that both the human and the bot use the same dictionary for communication, and that each has a probability distribution over all the words in it, denoted as P(human) and P(bot). These probability distributions quantify every word’s chance of being uttered next by the human and by the bot, respectively. Another useful way of thinking about probability distributions is to see them as our state of knowledge or expectations about the bot and the human.

Now consider four scenarios:

  1. We believe we are interacting with a human user (we employ P(human) in our expectations of the conversation), and it is indeed the human that is typing the messages on the other side. This incurs in us a surprise amount of S_human.
  2. We believe we are interacting with a bot, and it is the bot that is talking to us. This causes a surprise amount of S_bot.
  3. We believe we are interacting with a human user, but it is the bot that is typing on the other side. This results in a surprise amount of S_bot_human.
  4. We believe we are interacting with a bot, but it is the human user that is typing on the other side. This results in a surprise amount of S_human_bot.

Now, the KL divergence from the bot to the human, that is, from P(bot) to P(human), denoted as KL(P(bot) -> P(human)), equals:

KL(P(bot) -> P(human)) = S_bot_human - S_bot

This reads as how much more surprised we expect to be if the bot is typing on the other side, but we mistakenly believe we are talking to the human user.

Similarly,

KL(P(human) -> P(bot)) = S_human_bot - S_human

This reads as how much more surprised we expect to be if the human user is typing on the other side, but we mistakenly believe we are talking to the bot.

If KL(P(bot) -> P(human)) < KL(P(human) -> P(bot)), then the bot is better at imitating the human user than vice versa. Better, because the bot incurred less extra surprise while deceiving us than the human user did while pretending to be the bot. Turing’s test indirectly measures KL(P(bot) -> P(human)). The smaller this value, the better the bot managed to deceive us into believing that we are talking to a human user.
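The four surprise amounts and the two divergences can be computed directly once the word distributions are known. The following is a minimal sketch, assuming a tiny three-word dictionary with made-up probabilities; the function names `cross_surprise` and `kl` are hypothetical helpers, not part of any library.

```python
import math

def cross_surprise(actual: dict, believed: dict) -> float:
    """Average surprise in bits when `actual` is typing but we expect `believed`."""
    total = 0.0
    for word, p in actual.items():
        if p == 0:
            continue
        q = believed.get(word, 0.0)
        if q == 0:
            return math.inf      # a word we thought impossible was uttered
        total += p * math.log2(1 / q)
    return total

def kl(actual: dict, believed: dict) -> float:
    """KL(P(actual) -> P(believed)): extra surprise caused by the wrong belief."""
    return cross_surprise(actual, believed) - cross_surprise(actual, actual)

# Hypothetical word distributions (illustrative numbers only).
p_human = {"hello": 0.5, "thanks": 0.25, "sorry": 0.25}
p_bot = {"hello": 0.8, "thanks": 0.1, "sorry": 0.1}

s_human = cross_surprise(p_human, p_human)    # scenario 1
s_bot = cross_surprise(p_bot, p_bot)          # scenario 2
s_bot_human = cross_surprise(p_bot, p_human)  # scenario 3
s_human_bot = cross_surprise(p_human, p_bot)  # scenario 4

kl_bot_to_human = s_bot_human - s_bot         # about 0.28 bits
kl_human_to_bot = s_human_bot - s_human       # about 0.32 bits
print(kl_bot_to_human < kl_human_to_bot)      # True: the bot is the better imitator here
```

With these toy numbers the bot incurs less extra surprise than the human, so by the rule above it is the better imitator. Different distributions can flip the outcome.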

Imitation game among the cyber accounts

Now, given the probability distributions summarizing our state of knowledge about accounts A and B, we can apply the technique above and compute which account is the better imitator of the other. We can do this for all account pairs present in the data, and rank the accounts based on their power to imitate other accounts. What is special about the account that ranks first? The first-ranking account is the one that can imitate all the other accounts better than they can imitate it. Being able to imitate all the other accounts implies that the behavior repertoire of this account is not only complex and rich but also contains in itself the behavior repertoire of the other accounts. From a cybersecurity perspective, one can reason that an attacker who gets hold of such an account can navigate the victim network and carry out a far more diverse set of operations against it than one who gets hold of a low-ranking account. Given this reasoning, I can extend the definition of what privilege means in cybersecurity settings as follows: privileged accounts or credentials are those whose activity repertoire contains in itself the repertoire of the other accounts; they can in principle imitate the other accounts if needed, but not vice versa.
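The pairwise comparison and ranking step could be sketched as follows. Several choices here are assumptions rather than part of the method as described: scoring each account by its number of pairwise "wins", the small additive smoothing that keeps the divergence finite when one account never touches an entity that another does, and the account and host names, which are purely illustrative.

```python
import itertools
import math

def smooth(dist: dict, keys: list, eps: float) -> dict:
    """Additive smoothing (an assumption) so that no probability is exactly zero."""
    total = sum(dist.get(k, 0.0) + eps for k in keys)
    return {k: (dist.get(k, 0.0) + eps) / total for k in keys}

def kl_bits(p: dict, q: dict, eps: float = 1e-6) -> float:
    """KL(P -> Q) in bits, computed on smoothed copies over the union of keys."""
    keys = sorted(set(p) | set(q))
    ps, qs = smooth(p, keys, eps), smooth(q, keys, eps)
    return sum(ps[k] * math.log2(ps[k] / qs[k]) for k in keys)

def rank_by_imitation(dists: dict) -> list:
    """Rank accounts by how many pairwise imitation comparisons they win."""
    wins = {name: 0 for name in dists}
    for a, b in itertools.combinations(dists, 2):
        kl_ab = kl_bits(dists[a], dists[b])  # a is acting, we believe it is b
        kl_ba = kl_bits(dists[b], dists[a])  # b is acting, we believe it is a
        if kl_ab < kl_ba:
            wins[a] += 1                     # a imitates b better than b imitates a
        elif kl_ba < kl_ab:
            wins[b] += 1
    return sorted(wins.items(), key=lambda item: item[1], reverse=True)

# Hypothetical per-account distributions over hosts (illustrative numbers only).
accounts = {
    "acct_a": {"host1": 0.25, "host2": 0.25, "host3": 0.25, "host4": 0.25},
    "acct_b": {"host1": 0.97, "host2": 0.01, "host3": 0.01, "host4": 0.01},
    "acct_c": {"host1": 0.5, "host2": 0.5},
}
ranking = rank_by_imitation(accounts)
print(ranking)
```

Counting wins is only one way to aggregate the pairwise results; summing the divergence differences would be another. The direction of the comparison and the aggregation rule both affect the final order and should be validated against accounts whose privileges are already known.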

An example: analyzing login events

Login events are one example of cyber data from which I can infer the privileges of the accounts involved. Every login event tells a story of who logged in to what account, at what time, using what software, and so on. This data can be represented as a network of entities, where two entities are connected if a login event occurred between them. The intensity of a connection between two entities can be derived from its frequency (how many times does it occur?) or from its persistence (does it happen every hour or sporadically?). This network represents our knowledge. We can extract our knowledge of accounts A and B from this network, or graph, in the form of probability distributions. One way of doing this is to use the PageRank algorithm, which Google has used heavily to assign probabilities to web pages and rank them by those probabilities. Here, I use a slight variation of the PageRank algorithm, personalized PageRank. Personalized PageRank also returns a probability distribution over the entities; however, it is conditioned on the source entity. Running personalized PageRank for accounts A and B returns a probability distribution for each of them. Given these probability distributions, the above ranking technique can be applied to see which account is the better imitator of the other.
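To illustrate how a per-account distribution might be extracted from login data, here is a minimal sketch of personalized PageRank implemented by plain power iteration. The event list, the entity names, the damping factor of 0.85, and the use of raw event counts as edge weights are all assumptions made for the example.

```python
from collections import defaultdict

def build_graph(logins):
    """Undirected weighted graph: edge weight = number of login events between two entities."""
    graph = defaultdict(lambda: defaultdict(float))
    for src, dst in logins:
        graph[src][dst] += 1.0
        graph[dst][src] += 1.0
    return graph

def personalized_pagerank(graph, source, damping=0.85, iterations=100):
    """Probability distribution over entities for a random walk that restarts at `source`."""
    nodes = list(graph)
    rank = {n: 0.0 for n in nodes}
    rank[source] = 1.0
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        nxt[source] += 1.0 - damping              # restart mass always returns to the source
        for n in nodes:
            out_weight = sum(graph[n].values())   # > 0: every node has at least one edge
            for nbr, w in graph[n].items():
                nxt[nbr] += damping * rank[n] * w / out_weight
        rank = nxt
    return rank

# Hypothetical login events as (account, host) pairs (illustrative only).
logins = [
    ("acct_a", "host1"), ("acct_a", "host1"), ("acct_a", "host1"),
    ("acct_a", "host2"),
    ("acct_b", "host1"),
]
graph = build_graph(logins)
dist_a = personalized_pagerank(graph, "acct_a")
dist_b = personalized_pagerank(graph, "acct_b")
print(dist_a)
print(dist_b)
```

The two resulting dictionaries play the role of the distributions for accounts A and B and can be compared with KL divergence as described earlier. A graph library's personalized PageRank routine could replace the hand-written loop in practice.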

The ranking of an account based on this technique may not be in full agreement with the prior expectations about the account. This situation might hint at multiple scenarios:

  1. The account has been granted elevated access by mistake.
  2. The account might have been subject to a privilege escalation attack.
  3. The account is used by a user who has access to other critical accounts (see blind spot 2 above).

Conclusion

I have presented the previously unexplored mathematical basis of the concept of the Imitation Game by Alan Turing by tying it to the concept of Kullback–Leibler (KL) divergence. I proposed a technique based on the KL divergence to learn privileged account usage from cybersecurity data that can remedy the possible blind spots left by the conventional approaches to privileged access management.

Nariman Mammadli

Exploring the boundaries of artificial intelligence with a special interest in its applications on cybersecurity. linkedin.com/in/mammadlinariman