The Case of The Stolen Szechuan Sauce¶

This is a simple colab demonstrating one way of analyzing data from the Stolen Szechuan Sauce challenge (found here).

This colab will not go into any of the data upload. It assumes that all data is already collected and uploaded to Timesketch. To see one way of uploading the data to Timesketch, use this colab

For a more generic instructions of Colab can be found here

Setup¶

If you are running this on a cloud runtime you'll need to install these dependencies:

In [ ]:

# @markdown Only execute if not already installed and running a cloud runtime
!pip install -q timesketch_api_client
!pip install -q vt-py nest_asyncio pandas
!pip install -q picatrix

In [ ]:

# @title Import libraries
# @markdown This cell will import all the libraries needed for the running of this colab.

import re
import requests

import pandas as pd

from timesketch_api_client import config
from picatrix import notebook_init

import vt
import nest_asyncio # https://github.com/VirusTotal/vt-py/issues/21

nest_asyncio.apply()
notebook_init.init()

In [ ]:

# @title VirusTotal Configuration
# @markdown In order to be able to lookup domains/IPs/samples using VirtusTotal we need to get an API key.
# @markdown
# @markdown If you don't have an API key you must sign up to [VirusTotal Community](https://www.virustotal.com/gui/join-us).
# @markdown Once you have a valid VirusTotal Community account you will find your personal API key in your personal settings section. 

VT_API_KEY = '' # @param {type: "string"}

# @markdown If you don't have the API key you will not be able to use the Virustotal API
# @markdown to lookup information.

In [ ]:

# @title Declare functions

# @markdown This cell will define few functions that we will use throughout
# @markdown this colab. This would be better to define outside of the notebook
# @markdown in a library that would be imported, but we keep it here for now.

def print_dict(my_dict, space_before=0):
  """Print the content of a dictionary."""
  max_len = max([len(x) for x in my_dict.keys()])
  spaces = ' '*space_before
  format_str = f'{spaces}{{key:{max_len}s}} = {{value}}'
  for key, value in my_dict.items():
    if isinstance(value, dict):
      print(format_str.format(key=key, value=''))
      print_dict(value, space_before=space_before + 8)
    elif isinstance(value, list):
      value_str = ', '.join(value)
      print(format_str.format(key=key, value=value_str))
    else:
      print(format_str.format(key=key, value=value))


def ip_info(address):
  """Print out information about an IP address using the VT API."""
  url = 'https://www.virustotal.com/vtapi/v2/ip-address/report'
  params = {
      'apikey': VT_API_KEY,
      'ip': address}

  response = requests.get(url, params=params)
  j_obj = response.json()

  def _print_stuff(part):
    print('')
    header = part.replace('_', ' ').capitalize()
    print(f'{header}:')
    for item in j_obj.get(part, []):
      print_dict(item, 2)

  _print_stuff('resolutions')
  _print_stuff('detected_urls')
  _print_stuff('detected_referrer_samples')
  _print_stuff('detected_communicating_samples')
  _print_stuff('detected_downloaded_samples')

In [ ]:

# @markdown Get a copy of the Timesketch client object.
# @markdown Parameters to configure the client:
# @markdown + host_uri: https://demo.timesketch.org
# @markdown + username: demo
# @markdown + auth_mode: timesketch (username/password)
# @markdown + password: demo

ts_client = config.get_client(confirm_choices=True)

Now that we've got a copy of the TS client we need to get to the sketch.

In [ ]:

for sketch in ts_client.list_sketches():
  if not sketch.name.startswith('Szechuan'):
    continue

  print('We found the sketch to use')
  print(f'[{sketch.id}] {sketch.name} - {sketch.description}')
  break

OK, sketch nr 6 is the one that we are after, let's set that as the active sketch. This is something that the Timesketch picatrix magics expect, that is to first set the active sketch that you will be using. After that all the magics don't need sketch definitions.

In [ ]:

%timesketch_set_active_sketch 6

To learn more about picatrix and how it works, please use the magic %picatrixmagics and see what magics are available and then use %magic --help or magic_func? to see more information about that magic.

One such example could be:

In [ ]:

timesketch_list_saved_searches_func?

Pre-Thoughts¶

Timesketch analyzers can provide quite a lot of value to any analysis. They can do pretty much everything that can be achieved in a colab like this, and in the Timesketch UI, except programatically. In this case, one of the very valuable analyzers is the logon analyzer. That analyzer will look for evidence of logons, and then extract values out of the logon entries and add them to the dataset.

Another potentially valuable analyzer is browser search, etc. To get a history of what analyzers have been run you can visit this page or run the following code snippet:

In [ ]:

for status in sketch.get_analyzer_status():
  print(f'Analyzer: {status["analyzer"]} - status: {status["status"]}')
  print(f'Results: {status["results"]}')
  print('')

From there you can get a glance at what has analysis has been done on the dataset, and what the results were.. for instance that login was completed and it found several logon and logoff entries.

However now we can start answering the questions.

Questions¶

What’s the Operating System of the Server?¶

Let's start exploring this, OS information is stored in the registry. Let's query it

In [ ]:

search_query = timesketch_query_func(
    'parser:"winreg/windows_version"',
    fields='datetime,key_path,data_type,message,timestamp_desc,parser,display_name,product_name,hostname,timestamp_desc'
)
cur_df = search_query.table

In [ ]:

cur_df[['hostname', 'product_name']]

So we now have the all the data, we can read the data from the table or do one more filtering to get the answer:

In [ ]:

cur_df[cur_df.hostname == 'CITADEL-DC01'].product_name.value_counts()

What’s the Operating System of the Desktop?¶

we can use the same data as we collected before:

In [ ]:

cur_df[cur_df.hostname == 'DESKTOP-SDN1RPT'].product_name.value_counts()

What was the local time of the Server?¶

To answer that we need to get the current control set

In [ ]:

cur_df = timesketch_query_func(
    'HKEY_LOCAL_MACHINE*System*Select AND hostname:"CITADEL-DC01"',
    fields=(
        'datetime,key_path,data_type,message,timestamp_desc,parser,display_name,'
        'product_name,hostname,timestamp_desc,values')
).table

Now let's look at what the value is set for the key.

In [ ]:

for key, value in cur_df[['key_path', 'values']].values:
  print(f'Key: {key}')
  print(f'Value: {value}')

We can parse this out a bit more if we want to, or just read from there that the current value is 1

In [ ]:

cur_df['current_value'] = cur_df['values'].str.extract(r'Current: \[[A-Z_]+\] (\d) ')

cur_df[['key_path', 'current_value']]

The current one is set 1

In [ ]:

cur_df = timesketch_query_func(
    'TimeZoneInformation AND hostname:"CITADEL-DC01"',
    fields='datetime,key_path,data_type,message,timestamp_desc,parser,display_name,product_name,hostname,timestamp_desc,configuration'
).table
cur_df

Let's increase the column with for pandas, that will make it easier to read columns with longer text in them.

In [ ]:

pd.set_option('max_colwidth', 400)

In [ ]:

cur_df[cur_df.key_path.str.contains('ControlSet001')][['configuration']]

So we need to extract what is in TimeZoneKeyName, we can do this differently. For now we can just read the configuration field, and then split it into a dict and then construct a new DataFrame with these fields, that is taking a line that is key1: value1 key2: value2 ... and creating a data frame with key1, key2, ... being the column names.

In [ ]:

lines = []

for value in cur_df[cur_df.key_path.str.contains('ControlSet001')]['configuration'].values:
  items = value.split(':')
  line_dict = {}
  key = items[0]
  for item in items[1:-1]:
    *values, new_key = item.split()

    line_dict[key] = ' '.join(values)
    key = new_key

  line_dict[key] = items[-1]
  lines.append(line_dict)

time_df = pd.DataFrame(lines)

Let's look at the newly constructed data frame

In [ ]:

time_df

Then we've got the time zone of the server, which is Pacific Standard Time

What was the initial entry vector (how did they get in)?¶

If we assume they got in from externally, doing some statistics on the network data might be useful. For that we need to do some aggregations.

First to understand what aggregations are available to use, and how to use them, let's use the list_available_aggregators which produces a data frame with the names of the aggregators and what parameters they need for configuration.

In [ ]:

%timesketch_available_aggregators

Now that we know what aggregators are available, let's start with aggregating the field Source, and get the top 10.

For that we need to use the field_bucket aggregator, and configuring it using the parameters field, limit and supported_charts.

The charts that are available are:

barchart
hbarchart
table
circlechart
linechart

For this let's use a horizontal bar chart, hbarchart

In [ ]:

params = {
    'field': 'Source',
    'limit': 10,
    'supported_charts': 'hbarchart',
    'chart_title': 'Top 10 Source IP',
}

aggregation = timesketch_run_aggregator_func(
    'field_bucket', parameters=params
)
aggregation.chart

If you are viewing this as in Colab but connecting to a local runtime you may need to enable this in order to be able to view the charts:

(if it doesn't work, uncomment the code that is applicable to you and then re-run the aggregation cell)

In [ ]:

# Remove the commend and run this code if you are running in colab
# but have a local Jupyter kernel running:
# alt.renderers.enable('colab')

# Remove this comment if you are running in Jupyter and the chart is not displayed
# alt.renderers.enable('notebook')

If you prefer to get the data frame instead of the chart you can call aggregation.table

In [ ]:

aggregation.table

Now let's look at the Destination field, same as before:

In [ ]:

params = {
    'field': 'Destination',
    'limit': 10,
    'supported_charts': 'hbarchart',
    'chart_title': 'Top 10 Source IP',
}

aggregation = timesketch_run_aggregator_func('field_bucket', parameters=params)
aggregation.chart

We can clearly see that the 194.61.24.102 sticks out, so lets try to understand what this IP did. Also note that it is not common that a system from the internet tries to connect to a intranet IP.

A Look at IP 194.61.24.102¶

In [ ]:

attacker_dst = timesketch_query_func(
    'Source:"194.61.24.102" AND data_type:"pcap:wireshark:entry"',
    fields='datetime,message,timestamp_desc,Destination,DST port,Source,Protocol,src port').table
attacker_dst.head(10)

OK, we can see that the API says we got 40k records returned but the search actually produced 128.328 records,so let's increase our max entries...

In [ ]:

search_obj = timesketch_query_func(
    'Source:"194.61.24.102" AND data_type:"pcap:wireshark:entry"',
    fields='datetime,message,timestamp_desc,Destination,DST port,Source,Protocol,src port')

search_obj.max_entries = 150000
attacker_dst = search_obj.table
attacker_dst.head(10)

We got a fairly large table, let's look at the size:

In [ ]:

attacker_dst.shape

We will now need to do some aggregation on the data that we got, let's use pandas for that. For that there is a function called groupby where we can run aggregations.

We want to group based on DST port and Destination, so we only need those two columns + one more to store the count/sum.

In [ ]:

attacker_group = attacker_dst[['DST port','Destination', 'Protocol']].groupby(
    ['DST port','Destination'], as_index=False)

Now we got a group, and to get a count, we can use the count() function of the group.

In [ ]:

attacker_dst_mytable = attacker_group.count()
attacker_dst_mytable.rename(columns={'Protocol': 'Count'}, inplace=True)
attacker_dst_mytable.sort_values(by=['Count'], ascending=False)

So we can already point out that there is a lot of traffic from this ip to 10.42.85.10 on port 3389which is used for Remote Desktop Protocol (RDP)

Let's now look at the IP traffic as it was parsed by scapy

In [ ]:

attacker_dst = timesketch_query_func(
    '194.61.24.102 AND data_type:"scapy:pcap:entry"',
    fields='datetime,message,timestamp_desc,ip_flags,ip_dst,ip_src,payload,tcp_flags,tcp_seq,tcp_sport,tcp_dport,tcp_window').table

Let's look at a few entries here:

In [ ]:

attacker_dst.head(10)

What we can see here is that quite a bit of the information is in the message field that we need to decode.

We also see that the evil bit is set... we could query for that as well. Let's start there, to do an aggregation based on that.

In [ ]:

params = {
    'field': 'ip_src',
    'query_string': 'ip_flags:"evil"',
    'supported_charts': 'hbarchart',
    'chart_title': 'Source IPs with "evil" bit set',
}

aggregation = timesketch_run_aggregator_func('query_bucket', parameters=params)
aggregation.table

We could even save this (if you have write access to the sketch, which the demo user does not have)

In [ ]:

name = 'Source IPs with "evil" bit set'
aggregation.name = name
aggregation.title = name
aggregation.save()

And now we could use this in a story for instance.

But let's move on and parse the message field:

First let's look at a single entry. To see how it is constructed:

In [ ]:

attacker_dst.iloc[0].message

Now that we know that, let's first remove the <bound method... in the beginning. Let's check to see if it's the same across the board:

In [ ]:

attacker_dst.message.str.slice(start=0, stop=30).unique()

OK, so it's the same, we can therefore just use the slice method to remove this part of the string. After that we can then split the string based on | which separates the protocols.

In [ ]:

attacker_packages = attacker_dst.message.str.slice(start=30).str.split('|', expand=True)

Let's explain what was done in the above syntax. First of all we used the slice method to cut the first 30 characters out of the messages field. What we are left with is the rest of the message string. We then use the split method to split the string, based on |, and adding the option of expand=True, which then expands the results into a separate dataframe (as an opposed to just a list).

Now let's look at how this looks like:

In [ ]:

attacker_packages.head(3)

We can see a lot of values there are marked as None.. and basically all the columns from 3 and up are not useful, so let's remove those. And then rename the remaining columns

In [ ]:

attacker_packages = attacker_packages[[0, 1, 2]]
attacker_packages.columns = ['ether', 'ip', 'transport']

And let's look at how this looks like now:

In [ ]:

attacker_packages.head(3)

Now let's look at what happened in the first few packages:

In [ ]:

attacker_packages[['transport']].head(10)

What we can see here is that there is first an ICMP (Ping) then two HTTP/HTTPS Requests , another ICMP and then the 3389 traffic begins.

We could obviously parse this even further if we want to.

In [ ]:

def parse_row(row):
  items = row.split()
  protocol = items[0][1:]
  line_dict = {
      'protocol': protocol
  }
  for item in items[1:]:
    key, _, value = item.partition('=')
    if key == 'options':
      # We don't want options nor anything after that.
      break
    line_dict[key] = value
  return line_dict

proto_df = pd.DataFrame(list(attacker_packages['transport'].apply(parse_row).values))

Let's look at it, but first let's add in the datetime, since these are the same records as we had in the original DF we can simply apply the datatime there.

In [ ]:

proto_df['datetime'] = attacker_dst['datetime']

In [ ]:

proto_df.head(3)

So now if we look at the first few actions made:

In [ ]:

proto_df[['datetime', 'protocol', 'type', 'dport']].head(10)

So you can see the first action here.

ICMP echo request
TCP HTTPS
TCP HTTP
ICMP timestamp request
ICMP echo reply
TCP Remote Desktop, 3389

Let's look at the pair of both IPs:

In [ ]:

attacker_dst = timesketch_query_func(
    '(194.61.24.102 AND 10.42.85.10) AND data_type:"scapy:pcap:entry"', 
    fields='datetime,message,timestamp_desc,ip_flags,ip_dst,ip_src,payload,tcp_flags,tcp_seq,tcp_sport,tcp_dport,tcp_window', max_entries=500000).table
attacker_dst.head(10)

We can then do the same as we did before to break things down.

In [ ]:

attacker_packages = attacker_dst.message.str.slice(start=30).str.split('|', expand=True)
attacker_packages = attacker_packages[[0, 1, 2]]
attacker_packages.columns = ['ether', 'ip', 'transport']

proto_df = pd.DataFrame(list(attacker_packages['transport'].apply(parse_row).values))
proto_df['datetime'] = attacker_dst['datetime']

proto_df[['datetime', 'protocol', 'type', 'dport']].head(10)

So we know that this seems to be a RDP connection from the IP 194.61.24.102. Let's look at login events:

In [ ]:

evtx_df = timesketch_query_func(
    '194.61.24.102 AND data_type:"windows:evtx:record"', fields='*').table

In [ ]:

evtx_df.head(3)

Let's get a quick overview of the data:

In [ ]:

evtx_df.username.value_counts()

In [ ]:

evtx_df.event_identifier.value_counts()

In [ ]:

evtx_df.source_name.value_counts()

Let's look at the Administrator logins here:

In [ ]:

evtx_df[evtx_df.username == 'Administrator'][['datetime', 'event_identifier', 'tag', 'logon_type', 'source_address']]

So we can see here that the user Administrator was logged in remotely on 2020-09-19 quite a few times, all between 3 and 4 am UTC.

Lets look at whether there was any other user that got logged into the machine from this IP address:

In [ ]:

timesketch_query_func(
    'source_address:"194.61.24.102" AND data_type:"windows:evtx:record"',
    fields='logon_type,username').table[['logon_type', 'username']].drop_duplicates()

Does not look like it. Only administrator. But now we've got a timeframe to search for.

We can now start looking at some actions on the system around that time.

In [ ]:

timeframe_df = timesketch_query_func(
    '*', start_date='2020-09-19T01:00:00', end_date='2020-09-19T04:20:00', max_entries=50000
).table

OK, we can see that in this timeframe we have 925k records, but we only got back 50k, so let's re-run this and increase the size

warning: since we are pulling in quite a lot of records (925k) this will take a bit of time, as well as memory

In [ ]:

max_entries = 1500000

timeframe_df = timesketch_query_func(
    '*', start_date='2020-09-19T01:00:00', end_date='2020-09-19T04:20:00', max_entries=max_entries, fields='*'
).table

And now look at the size:

In [ ]:

timeframe_df.shape

And to look at what we've got here:

In [ ]:

timeframe_df.data_type.value_counts()

Let's start by looking at what type of EVTX records we are seeing:

In [ ]:

group = timeframe_df[
    timeframe_df.data_type == 'windows:evtx:record'][['event_identifier', 'timestamp', 'source_name']].groupby(
        by=['event_identifier', 'source_name'], as_index=False
    )

In [ ]:

group.count().rename(columns={'timestamp': 'count'}).sort_values('count', ascending=False)

The two most common alerts here are Schannel/36888 (A Fatal Alert Was Generated) and Schannel/36874 (An SSL Connection Request Was Received From a Remote Client Application, But None of the Cipher Suites Supported by the Client Application Are Supported by the Server).

In [ ]:

timeframe_evtx = timeframe_df[timeframe_df.data_type == 'windows:evtx:record'].copy()
timeframe_evtx['event_identifier'] = timeframe_evtx.event_identifier.fillna(value=0)

timeframe_evtx[timeframe_evtx.event_identifier == 36888].strings.str.join('|').unique()

Let's ignore those for a while, but we do see a lot of RDP connections:

261: EVENT_LISTENER_CONNECTION_RECEIVED)
131: RDP connection established

In this timeframe we are seeing 700 established RDP connections, looks like someone is brute-forcing the password? LEt's look at these records:

In [ ]:

timeframe_evtx = timeframe_df[timeframe_df.data_type == 'windows:evtx:record'].copy()
timeframe_evtx['event_identifier'] = timeframe_evtx.event_identifier.fillna(value=0)

timeframe_evtx[(timeframe_evtx.event_identifier == 131) & (timeframe_evtx.source_name == 'Microsoft-Windows-RemoteDesktopServices-RdpCoreTS')].strings.str.join('|').unique()

It all seems to be our infamous IP address. That is brute forcing RDP.

That's how they got in, brute-forcing RDP until they got in via the Administrator account.

Was malware used? If so what was it? If there was malware answer the following:¶

What IP Address is the malware calling to?¶

When did it first appear?¶

Did someone move it?¶

What were the capabilities of this malware?¶

Is this malware easily obtained?¶

Was this malware installed with persistence on any machine?¶

When?¶

Where?¶

OK, we've still got the data from the timeframe, let's look at it again:

In [ ]:

timeframe_df.data_type.value_counts()

We see some windows:prefetch:execution, let's look at that one (remember first logon entry is at 2020-09-19 03:21:48.891087+00:00 ):

In [ ]:

timeframe_df[timeframe_df.data_type == 'windows:prefetch:execution'][['datetime', 'executable', 'run_count']]

Let's look at this another way:

In [ ]:

timeframe_df[timeframe_df.data_type == 'windows:prefetch:execution'].executable.value_counts()

Or by using run count as an indicator of something that is rare

In [ ]:

timeframe_df[(timeframe_df.data_type == 'windows:prefetch:execution') & (~timeframe_df.run_count.isna()) & (timeframe_df.run_count < 2)][['executable', 'run_count']].drop_duplicates()

Here we see few applications that we may want to take a look at...

sc.exe - why was that executed?
coreupdater.exe - what is this?
onedrivesetup.exe - why is one drive being installed?
filesyncconfig.exe - what is this?

Since SC was used, we should be looking for some services being registered:

In [ ]:

timeframe_evtx[(timeframe_evtx.event_identifier == 7045) & (timeframe_evtx.source_name == 'Service Control Manager')]['strings']

OK, here we do see some services being created, onoe of which is to create an auto start service called coreupdater. This is one of the files that we saw only executed once. Let's take a closer look at coreupdater.

What process was malicious?¶

That would be the coreupdater.exe

(and here is where we need to do some memory analysis, since we are going to miss some data here by not including the memory content... if you were to look at that, then you would see another malicious process)

Identify the IP Address that delivered the payload.¶

Lets simply do the same again from above with slicing the packages based on HTTP Get and the potential evil domain (just a guess, but somehow the attacker needs to deliver the payload)

In [ ]:

attacker_dst_http = timesketch_query_func(
    '(194.61.24.102 AND 10.42.85.10) AND data_type:"scapy:pcap:entry" AND *http* AND *GET*', 
    fields='datetime,message,timestamp_desc,ip_flags,ip_dst,ip_src,payload,tcp_flags,tcp_seq,tcp_sport,tcp_dport,tcp_window').table
attacker_dst_http.head(4)

In [ ]:

attacker_dst.shape

Let's look at HTTP traffic:

In [ ]:

attacker_dst_http[attacker_dst_http.message.str.contains(r'GET|POST')].message.str.extract(r'<Raw  load=([^|]+)')

Lets look at this again, this time splitting on the new lines, etc..

In [ ]:

for v in attacker_dst_http[attacker_dst_http.message.str.contains(r'GET|POST')].message.str.extract(r'<Raw  load=([^|]+)').values:
  value = str(v)
  print(value.replace('\\\\r\\\\n', '\n'))

So we are doing a HTTP request to: http://194.61.24.102/coreupdater.exe

We could now go back to the PCAP and extract that file from the stream

But the IP address is 192.61.24.102

Where is this malware on disk?¶

Now we know the name of a suspicious file, lets search for it...

In [ ]:

coreupdater = timesketch_query_func(
    'coreupdater.exe AND data_type:"fs:stat"',
    fields='file_size,filename,hostname,message,data_type,datetime,sha256_hash').table

coreupdater.sort_values(by=['datetime'], ascending=True, inplace=True)
coreupdater.head(10)

So we have all file entries with the filename coreupdater.exe. What is interesting here is that it is on both systems.

Let's start to make sure all the hashes are the same:

In [ ]:

coreupdater[['hostname', 'filename', 'sha256_hash']].drop_duplicates()

We can now use the hash to lookup on Virustotal

In [ ]:

if VT_API_KEY:
  vt_client = vt.Client(VT_API_KEY)
else:
  vt_client = None

Let's extract the hash value

In [ ]:

hash_value = list(coreupdater[coreupdater.filename == '\Windows\System32\coreupdater.exe'].sha256_hash.unique())[0]

And now we can look up the data.

In [ ]:

if vt_client:
  file_info = vt_client.get_object(f'/files/{hash_value}')

  print_dict(file_info.last_analysis_stats)

Let's look at some of the summary

In [ ]:

if file_info:
  stars = '*'*10
  print(f'{stars}Summary{stars}')
  print('')
  print_dict(file_info.sigma_analysis_summary)
  print('')
  print(f'{stars}Analysis Stats{stars}')
  print('')
  print_dict(file_info.sigma_analysis_stats)
else:
  print('No VT API key defined, you\'ll need to manually loookup the information')

This clearly does not look good. Let's look at some information here:

In [ ]:

if file_info:
  print_dict(file_info.exiftool)

In [ ]:

if file_info:
  print_dict(file_info.last_analysis_results)

This does not look very good. Lets look for other events where coreupdater.exe is present.

In [ ]:

coreupdater = timesketch_query_func(
    'coreupdater.exe AND NOT data_type:"fs:stat"', fields='*').table

coreupdater.sort_values(by=['datetime'], ascending=True, inplace=True)
coreupdater.head(10)

Ok there is a lot to see here, lets start with the Top event, the autoruns are mathing the md5 sum we discovered earlier.

In [ ]:

coreupdater.loc[coreupdater['data_type'] == 'autoruns:record']

Then we have the service install events (note that it happened on both systems):

In [ ]:

coreupdater.loc[coreupdater['data_type'] == 'windows:registry:service'][['datetime', 'hostname', 'key_path', 'image_path', 'name', 'start_type', 'service_type', 'values']]

And we have the GET events again:

In [ ]:

coreupdater.loc[coreupdater['data_type'] == 'pcap:wireshark:entry'][['datetime', 'message']]

What malicious IP Addresses were involved?¶

Were any IP Addresses from known adversary infrastructure?¶

Are these pieces of adversary infrastructure involved in other attacks around the time of the attack?¶

If we assume 10.42.0.0 is the internal network, lets see which connections are made from that internal network

(here we make use of a cell magic for timesketch to demonstrate different ways of using the picatrix library)

In [ ]:

%%timesketch_query search_obj --fields datetime,ip_src,message,timestamp_desc,tcp_dport,ip_dst --max_entries 150000
ip_src:10.42.85* AND NOT ip_dst:10.42.85* AND data_type:"scapy:pcap:entry"'

In [ ]:

pd.options.display.max_colwidth = 200

Let's print out an aggregation of these IP pairs:

In [ ]:

data = search_obj.table
data[['datetime','ip_src', 'ip_dst', 'tcp_dport']]

mytable = data.groupby(['ip_src','ip_dst']).size().to_frame('count').reset_index()
mytable.sort_values(by=['count'], ascending=False, inplace=True)
mytable

The vast majority is the 194. IP, let's get a list of all external IPs here:

In [ ]:

ip_set = set()
list(map(ip_set.add, data['ip_src'].unique()))
list(map(ip_set.add, data['ip_dst'].unique()))

print(f'Found: {len(ip_set)} IPs (including internal)')

Let's look up the IP from earlier:

In [ ]:

ip_info('194.61.24.102')

We can now look up all of the IPs, and put it into a data frame

(this will take a while, since we are not doing bulk lookups here, but individual lookups per IP)

In [ ]:

lines = []
url = 'https://www.virustotal.com/vtapi/v2/ip-address/report'

for ip_address in ip_set:
  if ip_address.startswith('10.'):
    continue

  params = {
      'apikey': VT_API_KEY,
      'ip': ip_address,
  }
  response = requests.get(url, params=params)
  j_obj = response.json()

  line_dict = {
      'resolutions': [x.get('hostname') for x in j_obj.get('resolutions', [])],
      'ip': ip_address,
      'detected_urls': j_obj.get('detected_urls'),
      'detected_downloaded_samples': [
          x.get('sha256') for x in j_obj.get('detected_downloaded_samples', [])] or None,
      'detected_referrer_samples': [
          x.get('sha256') for x in j_obj.get('detected_referrer_samples', [])] or None,
      'detected_communicating_samples': [
          x.get('sha256') for x in j_obj.get('detected_communicating_samples', [])] or None,
  }  
  lines.append(line_dict)
ip_df = pd.DataFrame(lines)

Now we have a data frame, let's look at it briefly

In [ ]:

ip_df.head(3)

We had our hash value from last time:

In [ ]:

hash_value

Let's see if that shows up:

In [ ]:

ip_df[~ip_df.detected_referrer_samples.isna() & ip_df.detected_referrer_samples.str.contains(hash_value)]

In [ ]:

ip_df[~ip_df.detected_communicating_samples.isna() & ip_df.detected_communicating_samples.str.contains(hash_value)]

In [ ]:

ip_df[~ip_df.detected_downloaded_samples.isna() & ip_df.detected_downloaded_samples.str.contains(hash_value)]

That does not seem to show up in our data frame, but let's see if there are other potentially malicious IPs in the data set.

Let's look at those that have values in detected_downloaded_samples

In [ ]:

ip_df[~ip_df.detected_downloaded_samples.isna()]

There are several here.. let's get a summary:

In [ ]:

ip_df[~ip_df.detected_downloaded_samples.isna()]['ip'].value_counts()

Let's look for some strings in nthe data...

In [ ]:

timesketch_query_func(
    '"This program cannot be run in DOS" AND data_type:"scapy:pcap:entry"',
    fields='*'
).table[['ip_dst', 'ip_src']].drop_duplicates()

There is another IP address here that we should look for... 203.78.103.109

Did the attacker access any other systems?¶

How?¶

When?¶

Did the attacker steal or access any data?¶

When?¶

In [ ]:

search_obj = %timesketch_query --fields * username:"Administrator"
other_df = search_obj.table

Let's look at what systems are involved:

In [ ]:

other_df.hostname.value_counts()

Let's look at this a bit more:

In [ ]:

other_df[other_df['datetime'].dt.strftime('%Y%m%d') == '20200919'][['datetime', 'hostname', 'logon_type']]

Let's look at the different data types we've got:

In [ ]:

other_df.data_type.value_counts()

Let's look at the content of the registry entry for MSTC connection:

In [ ]:

other_df[other_df.data_type == 'windows:registry:mstsc:connection'][['datetime', 'display_name', 'hostname', 'message']]

We can see that it seems that the attacker did enter the desktop (DESKTOP-SDN1RPT) using RDP from the Domain Controller (that we already established that they brute-forced to gain access)

Let's look at all logon events that occurred on 2020-09-19

In [ ]:

day_filter = other_df.datetime.dt.strftime('%Y%m%d') == '20200919'
filtered_view = other_df[day_filter & other_df.tag.str.join(',').str.contains('logon-event')][[
    'datetime', 'computer_name', 'logon_process', 'logon_type', 'source_address', 'source_username', 'windows_domain', 'username', 'workstation']]
filtered_view

We can look at fewer records here, or some summaries:

In [ ]:

filtered_view.workstation.value_counts()

So we go from kali to both the DC server and the Desktop.

In [ ]:

filtered_view[filtered_view.workstation == 'kali']

Likely that the attacker used Kali as their attack platform or operating system of choice for the attacker. Let's look at the remote interactive logons:

In [ ]:

filtered_view[filtered_view.logon_type == 'RemoteInteractive']

Here is is clear that the attacker did RDP into CITADEL-DC01 from the IP of 194.61.24.102 and from there they entered DESKTOP-SDN1RPT (also via RDP using the same credentials).

What was the network layout of the victim network?¶

This is a question we might not be able to answer with Timesketch (at least not with the current version, but this may change soon with the introduction of graphing)

Did the attacker steal the Szechuan sauce? If so, what time?¶

Ok so here we are just starting by guessing for potential interesting filenames based on the case description

In [ ]:

secret_files = timesketch_query_func('secret', fields='*').table

Let's take a look at what we've got

In [ ]:

secret_files.data_type.value_counts()

and

In [ ]:

secret_files.parser.value_counts()

Let's look at some of these browser entries...

In [ ]:

secret_files[secret_files.data_type == 'msie:webcache:container'][['datetime', 'timestamp_desc', 'url']]

Uh that looks interesting: Administrator@file:///C:/FileShare/Secret/Szechuan%20Sauce.txt and the whole directory.

Let's filter out the Secret folder... use the message field to start with:

In [ ]:

secret_files[secret_files.message.str.contains(r'FileShare\/Secret')]['message'].unique()

Let's look at some of the details here:

In [ ]:

secret_files[secret_files.message.str.contains(r'FileShare\/Secret')][['timestamp_desc', 'message']].drop_duplicates()

Let's remove the expiration time from this...

In [ ]:

secret_files[secret_files.message.str.contains(r'FileShare\/Secret') & secret_files['timestamp_desc'] != 'Expiration Time'][['datetime', 'message']].drop_duplicates()

Let's extract unique filenames here

In [ ]:

secret_files[secret_files.message.str.contains(r'FileShare\/Secret')].message.str.extract(r'(FileShare\/Secret\/[^ ]+)')[0].unique()

There are more files here than just the stolen szechuan sauce....

Let's look more closely into the beth files:

In [ ]:

beth_df = secret_files[secret_files.message.str.contains('beth', case=False)]
beth_df.data_type.value_counts()

Start with the simple fs stat:

In [ ]:

beth_df[beth_df.data_type == 'fs:stat'][['datetime', 'display_name', 'timestamp_desc']]

Let's look at another source of timestamped information:

In [ ]:

beth_df[beth_df.data_type == 'olecf:dest_list:entry'][['datetime', 'hostname', 'timestamp_desc', 'path']]

In [ ]:

beth_df[beth_df.data_type.str.contains('windows:shell_item:file_entry|msie:webcache:container')][[
    'datetime', 'data_type', 'source_long', 'timestamp_desc', 'url', 'long_name', 'origin', 'shell_item_path']]

The Administrator account accesses the file at "2020-09-18 20:32:13.141000+00:0', yet the file's creation date is at 2020-09-18 23:33:54+00:00.. that's odd...

There are also shell entries (for instance jumplist entries) there before the file was created.

This needs to be looked at more closely, but it looks like there may be some timestomping activity here.

These files are also visited by the Administrator account... let's look at logins...

In [ ]:

search_obj = %timesketch_query --fields * username:"Administrator" AND tag:"logon-event"
admin_login = search_obj.table

In [ ]:

admin_login.shape

In [ ]:

admin_login['datetime'] = pd.to_datetime(admin_login['datetime'])
admin_login[admin_login.datetime.dt.strftime('%Y%m%d') == '20200918'][[
    'datetime', 'logon_type', 'source_address', 'source_username', 'username', 'windows_domain', 'workstation']]

Did the attacker steal or access any other sensitive files? If so, what times?¶

Let's look for signs of compressed files.

In [ ]:

%%timesketch_query search_obj --fields *
*.zip OR *.tar.gz OR *.tgz OR *.tbz OR *.tar.bz2 OR *.cab OR *.7a

In [ ]:

zip_df = search_obj.table
zip_df.columns

In [ ]:

zip_df.data_type.value_counts()

In [ ]:

zip_df[~zip_df.filename.isna() & (zip_df.filename.str.contains('.zip$'))]['filename'].value_counts()

Let's limit ourselves to the day of the attack

In [ ]:

zip_day = zip_df[zip_df['datetime'].dt.strftime('%Y%m%d') == '20200919']
zip_day['filename'].value_counts()

OK, there are some interesting files here... most notably the Secret.zip the temporary Secret.zip, the My Social Security Number.zip and loot.zip

We can check first what files have an associated LNK entry there:

In [ ]:

zip_day[zip_day.data_type == 'windows:lnk:link'][['link_target', 'local_path']].drop_duplicates()

In [ ]:

zip_day[zip_day.data_type == 'fs:ntfs:usn_change'][['filename']].drop_duplicates()

These should be looked at a bit more closely.

What file was time stomped?¶

There are many possible ways to "timestomp" (or alter the timestamps) of a file. Some of these are harder to detect than others. One method of altering timestamps changes the file timestamp to the second; however, NTFS timestamps have greater precision than that. An easy way to detect files with timestamps altered in this way is to look for timestamps with the fractional seconds set to 0.

Let's query for the file timestamps ("fs:stat"). As this would pull back a lot of results, I'll add some search terms found earlier in the case to limit the results. This would work with the keyword (but it would take much longer).

In [ ]:

timestomp_df = timesketch_query_func(
    'data_type:"fs:stat" AND (*secret* OR *zip* OR *coreupdater* OR *Szechuan*)', 
    fields='datetime,timestamp,timestamp_desc,filename,hostname,inode').table

Let's see how many results we got:

In [ ]:

timestomp_df.shape

And finally, let's look for any timestamps the end with 0s (this could have false positivies, but it's literally a 1 in a million chance).

In [ ]:

timestomp_df[timestomp_df.timestamp.astype('str').str.endswith('000000')]

There we go! Beth_Secret.txt has timestamps that lack fractional seconds; that's strange. Let's take a closer look to see if anything else looks strange about this file.

From the above results, the inode (or MFT file reference number) is 87111. Since we got a disk image of the server relatively quickly after the incident, there's a good chance we still have USN journal entries that cover that file. Let's look!

The following query looks for any USN change journal records for 87111, as well as the timestamps from the above table (for comparison), limited to the server:

In [ ]:

secret_timestomp_df = timesketch_query_func(
    '"File reference: 87111-" AND data_type:"fs:ntfs:usn_change"',
    fields='data_type,datetime,filename,hostname,inode,message').table

In [ ]:

secret_timestomp_df.sort_values('datetime')

Interesting! We do have change journal records for that file. Let's work through these results (sorted oldest to newest):

Rows 0 through 3: pertain to a favicon; row 3 shows it being deleted. This isn't relevant to our file of interest.
Rows 4 and 5: New Text Document.txt is created
Rows 6, 7 & 8: all have the same timestamp (so the order here might be incorrect). Together they show a rename from New Text Document.txt to Beth_Secret.txt
Rows 9 & 10: Show an OBJECT_ID_CHANGE; honestly not sure what that is.
Rows 11 & 12: Data being added to the file (likely the "secret" being type into it (USN_REASON_DATA_EXTEND).
Rows 13 & 14: USN_REASON_BASIC_INFO_CHANGE; maybe, basic info like changing the timestamps :)

That's a pretty comprehensive timeline for actions on that particular "secrets" file. All the timestamps seem to fit with each other and tell a plausible story. Now, let's add in those timestamps that look a bit suspicious:

In [ ]:

more_timestomp_df = timesketch_query_func(
    '(("File reference: 87111-" AND data_type:"fs:ntfs:usn_change") OR (inode:87111)) AND hostname:"CITADEL-DC01"', 
    fields='data_type,datetime,timestamp,timestamp_desc,filename,hostname,inode,message').table

In [ ]:

more_timestomp_df

They are all way before the USN journal activity! That, along with the null fractional seconds and the tight USN change journal timeline, point toward manipulated timestamps.