This is a simple colab demonstrating one way of analyzing data from the Stolen Szechuan Sauce challenge (found here).
This colab will not go into any of the data upload. It assumes that all data is already collected and uploaded to Timesketch. To see one way of uploading the data to Timesketch, use this colab
For a more generic instructions of Colab can be found here
If you are running this on a cloud runtime you'll need to install these dependencies:
# @markdown Only execute if not already installed and running a cloud runtime
!pip install -q timesketch_api_client
!pip install -q vt-py nest_asyncio pandas
!pip install -q picatrix
# @title Import libraries
# @markdown This cell will import all the libraries needed for the running of this colab.
import re
import requests
import pandas as pd
from timesketch_api_client import config
from picatrix import notebook_init
import vt
import nest_asyncio # https://github.com/VirusTotal/vt-py/issues/21
nest_asyncio.apply()
notebook_init.init()
# @title VirusTotal Configuration
# @markdown In order to be able to lookup domains/IPs/samples using VirtusTotal we need to get an API key.
# @markdown
# @markdown If you don't have an API key you must sign up to [VirusTotal Community](https://www.virustotal.com/gui/join-us).
# @markdown Once you have a valid VirusTotal Community account you will find your personal API key in your personal settings section.
VT_API_KEY = '' # @param {type: "string"}
# @markdown If you don't have the API key you will not be able to use the Virustotal API
# @markdown to lookup information.
# @title Declare functions
# @markdown This cell will define few functions that we will use throughout
# @markdown this colab. This would be better to define outside of the notebook
# @markdown in a library that would be imported, but we keep it here for now.
def print_dict(my_dict, space_before=0):
"""Print the content of a dictionary."""
max_len = max([len(x) for x in my_dict.keys()])
spaces = ' '*space_before
format_str = f'{spaces}{{key:{max_len}s}} = {{value}}'
for key, value in my_dict.items():
if isinstance(value, dict):
print(format_str.format(key=key, value=''))
print_dict(value, space_before=space_before + 8)
elif isinstance(value, list):
value_str = ', '.join(value)
print(format_str.format(key=key, value=value_str))
else:
print(format_str.format(key=key, value=value))
def ip_info(address):
"""Print out information about an IP address using the VT API."""
url = 'https://www.virustotal.com/vtapi/v2/ip-address/report'
params = {
'apikey': VT_API_KEY,
'ip': address}
response = requests.get(url, params=params)
j_obj = response.json()
def _print_stuff(part):
print('')
header = part.replace('_', ' ').capitalize()
print(f'{header}:')
for item in j_obj.get(part, []):
print_dict(item, 2)
_print_stuff('resolutions')
_print_stuff('detected_urls')
_print_stuff('detected_referrer_samples')
_print_stuff('detected_communicating_samples')
_print_stuff('detected_downloaded_samples')
# @markdown Get a copy of the Timesketch client object.
# @markdown Parameters to configure the client:
# @markdown + host_uri: https://demo.timesketch.org
# @markdown + username: demo
# @markdown + auth_mode: timesketch (username/password)
# @markdown + password: demo
ts_client = config.get_client(confirm_choices=True)
Now that we've got a copy of the TS client we need to get to the sketch.
for sketch in ts_client.list_sketches():
if not sketch.name.startswith('Szechuan'):
continue
print('We found the sketch to use')
print(f'[{sketch.id}] {sketch.name} - {sketch.description}')
break
OK, sketch nr 6 is the one that we are after, let's set that as the active sketch. This is something that the Timesketch picatrix magics expect, that is to first set the active sketch that you will be using. After that all the magics don't need sketch definitions.
%timesketch_set_active_sketch 6
To learn more about picatrix and how it works, please use the magic %picatrixmagics
and see what magics are available and then use %magic --help
or magic_func?
to see more information about that magic.
One such example could be:
timesketch_list_saved_searches_func?
Timesketch analyzers can provide quite a lot of value to any analysis. They can do pretty much everything that can be achieved in a colab like this, and in the Timesketch UI, except programatically. In this case, one of the very valuable analyzers is the logon
analyzer. That analyzer will look for evidence of logons, and then extract values out of the logon entries and add them to the dataset.
Another potentially valuable analyzer is browser search, etc. To get a history of what analyzers have been run you can visit this page or run the following code snippet:
for status in sketch.get_analyzer_status():
print(f'Analyzer: {status["analyzer"]} - status: {status["status"]}')
print(f'Results: {status["results"]}')
print('')
From there you can get a glance at what has analysis has been done on the dataset, and what the results were.. for instance that login
was completed and it found several logon and logoff entries.
However now we can start answering the questions.
Let's start exploring this, OS information is stored in the registry. Let's query it
search_query = timesketch_query_func(
'parser:"winreg/windows_version"',
fields='datetime,key_path,data_type,message,timestamp_desc,parser,display_name,product_name,hostname,timestamp_desc'
)
cur_df = search_query.table
cur_df[['hostname', 'product_name']]
So we now have the all the data, we can read the data from the table or do one more filtering to get the answer:
cur_df[cur_df.hostname == 'CITADEL-DC01'].product_name.value_counts()
we can use the same data as we collected before:
cur_df[cur_df.hostname == 'DESKTOP-SDN1RPT'].product_name.value_counts()
To answer that we need to get the current control set
cur_df = timesketch_query_func(
'HKEY_LOCAL_MACHINE*System*Select AND hostname:"CITADEL-DC01"',
fields=(
'datetime,key_path,data_type,message,timestamp_desc,parser,display_name,'
'product_name,hostname,timestamp_desc,values')
).table
Now let's look at what the value is set for the key.
for key, value in cur_df[['key_path', 'values']].values:
print(f'Key: {key}')
print(f'Value: {value}')
We can parse this out a bit more if we want to, or just read from there that the current value is 1
cur_df['current_value'] = cur_df['values'].str.extract(r'Current: \[[A-Z_]+\] (\d) ')
cur_df[['key_path', 'current_value']]
The current one is set 1
cur_df = timesketch_query_func(
'TimeZoneInformation AND hostname:"CITADEL-DC01"',
fields='datetime,key_path,data_type,message,timestamp_desc,parser,display_name,product_name,hostname,timestamp_desc,configuration'
).table
cur_df
Let's increase the column with for pandas, that will make it easier to read columns with longer text in them.
pd.set_option('max_colwidth', 400)
cur_df[cur_df.key_path.str.contains('ControlSet001')][['configuration']]
So we need to extract what is in TimeZoneKeyName
, we can do this differently. For now we can just read the configuration field, and then split it into a dict and then construct a new DataFrame with these fields, that is taking a line that is key1: value1 key2: value2 ...
and creating a data frame with key1, key2, ...
being the column names.
lines = []
for value in cur_df[cur_df.key_path.str.contains('ControlSet001')]['configuration'].values:
items = value.split(':')
line_dict = {}
key = items[0]
for item in items[1:-1]:
*values, new_key = item.split()
line_dict[key] = ' '.join(values)
key = new_key
line_dict[key] = items[-1]
lines.append(line_dict)
time_df = pd.DataFrame(lines)
Let's look at the newly constructed data frame
time_df
Then we've got the time zone of the server, which is Pacific Standard Time
If we assume they got in from externally, doing some statistics on the network data might be useful. For that we need to do some aggregations.
First to understand what aggregations are available to use, and how to use them, let's use the list_available_aggregators
which produces a data frame with the names of the aggregators and what parameters they need for configuration.
%timesketch_available_aggregators
Now that we know what aggregators are available, let's start with aggregating the field Source
, and get the top 10.
For that we need to use the field_bucket
aggregator, and configuring it using the parameters field
, limit
and supported_charts
.
The charts that are available are:
For this let's use a horizontal bar chart, hbarchart
params = {
'field': 'Source',
'limit': 10,
'supported_charts': 'hbarchart',
'chart_title': 'Top 10 Source IP',
}
aggregation = timesketch_run_aggregator_func(
'field_bucket', parameters=params
)
aggregation.chart
If you are viewing this as in Colab but connecting to a local runtime you may need to enable this in order to be able to view the charts:
(if it doesn't work, uncomment the code that is applicable to you and then re-run the aggregation cell)
# Remove the commend and run this code if you are running in colab
# but have a local Jupyter kernel running:
# alt.renderers.enable('colab')
# Remove this comment if you are running in Jupyter and the chart is not displayed
# alt.renderers.enable('notebook')
If you prefer to get the data frame instead of the chart you can call aggregation.table
aggregation.table
Now let's look at the Destination
field, same as before:
params = {
'field': 'Destination',
'limit': 10,
'supported_charts': 'hbarchart',
'chart_title': 'Top 10 Source IP',
}
aggregation = timesketch_run_aggregator_func('field_bucket', parameters=params)
aggregation.chart
We can clearly see that the 194.61.24.102
sticks out, so lets try to understand what this IP did. Also note that it is not common that a system from the internet tries to connect to a intranet IP.
attacker_dst = timesketch_query_func(
'Source:"194.61.24.102" AND data_type:"pcap:wireshark:entry"',
fields='datetime,message,timestamp_desc,Destination,DST port,Source,Protocol,src port').table
attacker_dst.head(10)
OK, we can see that the API says we got 40k records returned but the search actually produced 128.328 records,so let's increase our max entries...
search_obj = timesketch_query_func(
'Source:"194.61.24.102" AND data_type:"pcap:wireshark:entry"',
fields='datetime,message,timestamp_desc,Destination,DST port,Source,Protocol,src port')
search_obj.max_entries = 150000
attacker_dst = search_obj.table
attacker_dst.head(10)
We got a fairly large table, let's look at the size:
attacker_dst.shape
We will now need to do some aggregation on the data that we got, let's use pandas for that. For that there is a function called groupby
where we can run aggregations.
We want to group based on DST port
and Destination
, so we only need those two columns + one more to store the count/sum.
attacker_group = attacker_dst[['DST port','Destination', 'Protocol']].groupby(
['DST port','Destination'], as_index=False)
Now we got a group, and to get a count, we can use the count()
function of the group.
attacker_dst_mytable = attacker_group.count()
attacker_dst_mytable.rename(columns={'Protocol': 'Count'}, inplace=True)
attacker_dst_mytable.sort_values(by=['Count'], ascending=False)
So we can already point out that there is a lot of traffic from this ip to 10.42.85.10
on port 3389
which is used for Remote Desktop Protocol (RDP)
Let's now look at the IP traffic as it was parsed by scapy
attacker_dst = timesketch_query_func(
'194.61.24.102 AND data_type:"scapy:pcap:entry"',
fields='datetime,message,timestamp_desc,ip_flags,ip_dst,ip_src,payload,tcp_flags,tcp_seq,tcp_sport,tcp_dport,tcp_window').table
Let's look at a few entries here:
attacker_dst.head(10)
What we can see here is that quite a bit of the information is in the message field that we need to decode.
We also see that the evil
bit is set... we could query for that as well. Let's start there, to do an aggregation based on that.
params = {
'field': 'ip_src',
'query_string': 'ip_flags:"evil"',
'supported_charts': 'hbarchart',
'chart_title': 'Source IPs with "evil" bit set',
}
aggregation = timesketch_run_aggregator_func('query_bucket', parameters=params)
aggregation.table
We could even save this (if you have write access to the sketch, which the demo user does not have)
name = 'Source IPs with "evil" bit set'
aggregation.name = name
aggregation.title = name
aggregation.save()
And now we could use this in a story for instance.
But let's move on and parse the message field:
First let's look at a single entry. To see how it is constructed:
attacker_dst.iloc[0].message
Now that we know that, let's first remove the <bound method...
in the beginning. Let's check to see if it's the same across the board:
attacker_dst.message.str.slice(start=0, stop=30).unique()
OK, so it's the same, we can therefore just use the slice method to remove this part of the string. After that we can then split the string based on |
which separates the protocols.
attacker_packages = attacker_dst.message.str.slice(start=30).str.split('|', expand=True)
Let's explain what was done in the above syntax. First of all we used the slice method to cut the first 30 characters out of the messages field. What we are left with is the rest of the message string. We then use the split method to split the string, based on |
, and adding the option of expand=True
, which then expands the results into a separate dataframe (as an opposed to just a list).
Now let's look at how this looks like:
attacker_packages.head(3)
We can see a lot of values there are marked as None.. and basically all the columns from 3 and up are not useful, so let's remove those. And then rename the remaining columns
attacker_packages = attacker_packages[[0, 1, 2]]
attacker_packages.columns = ['ether', 'ip', 'transport']
And let's look at how this looks like now:
attacker_packages.head(3)
Now let's look at what happened in the first few packages:
attacker_packages[['transport']].head(10)
What we can see here is that there is first an ICMP (Ping) then two HTTP/HTTPS Requests , another ICMP and then the 3389 traffic begins.
We could obviously parse this even further if we want to.
def parse_row(row):
items = row.split()
protocol = items[0][1:]
line_dict = {
'protocol': protocol
}
for item in items[1:]:
key, _, value = item.partition('=')
if key == 'options':
# We don't want options nor anything after that.
break
line_dict[key] = value
return line_dict
proto_df = pd.DataFrame(list(attacker_packages['transport'].apply(parse_row).values))
Let's look at it, but first let's add in the datetime, since these are the same records as we had in the original DF we can simply apply the datatime there.
proto_df['datetime'] = attacker_dst['datetime']
proto_df.head(3)
So now if we look at the first few actions made:
proto_df[['datetime', 'protocol', 'type', 'dport']].head(10)
So you can see the first action here.
Let's look at the pair of both IPs:
attacker_dst = timesketch_query_func(
'(194.61.24.102 AND 10.42.85.10) AND data_type:"scapy:pcap:entry"',
fields='datetime,message,timestamp_desc,ip_flags,ip_dst,ip_src,payload,tcp_flags,tcp_seq,tcp_sport,tcp_dport,tcp_window', max_entries=500000).table
attacker_dst.head(10)
We can then do the same as we did before to break things down.
attacker_packages = attacker_dst.message.str.slice(start=30).str.split('|', expand=True)
attacker_packages = attacker_packages[[0, 1, 2]]
attacker_packages.columns = ['ether', 'ip', 'transport']
proto_df = pd.DataFrame(list(attacker_packages['transport'].apply(parse_row).values))
proto_df['datetime'] = attacker_dst['datetime']
proto_df[['datetime', 'protocol', 'type', 'dport']].head(10)
So we know that this seems to be a RDP connection from the IP 194.61.24.102. Let's look at login events:
evtx_df = timesketch_query_func(
'194.61.24.102 AND data_type:"windows:evtx:record"', fields='*').table
evtx_df.head(3)
Let's get a quick overview of the data:
evtx_df.username.value_counts()
evtx_df.event_identifier.value_counts()
evtx_df.source_name.value_counts()
Let's look at the Administrator logins here:
evtx_df[evtx_df.username == 'Administrator'][['datetime', 'event_identifier', 'tag', 'logon_type', 'source_address']]
So we can see here that the user Administrator was logged in remotely on 2020-09-19
quite a few times, all between 3 and 4 am UTC.
Lets look at whether there was any other user that got logged into the machine from this IP address:
timesketch_query_func(
'source_address:"194.61.24.102" AND data_type:"windows:evtx:record"',
fields='logon_type,username').table[['logon_type', 'username']].drop_duplicates()
Does not look like it. Only administrator. But now we've got a timeframe to search for.
We can now start looking at some actions on the system around that time.
timeframe_df = timesketch_query_func(
'*', start_date='2020-09-19T01:00:00', end_date='2020-09-19T04:20:00', max_entries=50000
).table
OK, we can see that in this timeframe we have 925k records, but we only got back 50k, so let's re-run this and increase the size
warning: since we are pulling in quite a lot of records (925k) this will take a bit of time, as well as memory
max_entries = 1500000
timeframe_df = timesketch_query_func(
'*', start_date='2020-09-19T01:00:00', end_date='2020-09-19T04:20:00', max_entries=max_entries, fields='*'
).table
And now look at the size:
timeframe_df.shape
And to look at what we've got here:
timeframe_df.data_type.value_counts()
Let's start by looking at what type of EVTX records we are seeing:
group = timeframe_df[
timeframe_df.data_type == 'windows:evtx:record'][['event_identifier', 'timestamp', 'source_name']].groupby(
by=['event_identifier', 'source_name'], as_index=False
)
group.count().rename(columns={'timestamp': 'count'}).sort_values('count', ascending=False)
The two most common alerts here are Schannel/36888
(A Fatal Alert Was Generated) and Schannel/36874
(An SSL Connection Request Was Received From a Remote Client Application, But None of the Cipher Suites Supported by the Client Application Are Supported by the Server).
timeframe_evtx = timeframe_df[timeframe_df.data_type == 'windows:evtx:record'].copy()
timeframe_evtx['event_identifier'] = timeframe_evtx.event_identifier.fillna(value=0)
timeframe_evtx[timeframe_evtx.event_identifier == 36888].strings.str.join('|').unique()
Let's ignore those for a while, but we do see a lot of RDP connections:
In this timeframe we are seeing 700 established RDP connections, looks like someone is brute-forcing the password? LEt's look at these records:
timeframe_evtx = timeframe_df[timeframe_df.data_type == 'windows:evtx:record'].copy()
timeframe_evtx['event_identifier'] = timeframe_evtx.event_identifier.fillna(value=0)
timeframe_evtx[(timeframe_evtx.event_identifier == 131) & (timeframe_evtx.source_name == 'Microsoft-Windows-RemoteDesktopServices-RdpCoreTS')].strings.str.join('|').unique()
It all seems to be our infamous IP address. That is brute forcing RDP.
That's how they got in, brute-forcing RDP until they got in via the Administrator account.
OK, we've still got the data from the timeframe, let's look at it again:
timeframe_df.data_type.value_counts()
We see some windows:prefetch:execution
, let's look at that one (remember first logon entry is at 2020-09-19 03:21:48.891087+00:00 ):
timeframe_df[timeframe_df.data_type == 'windows:prefetch:execution'][['datetime', 'executable', 'run_count']]
Let's look at this another way:
timeframe_df[timeframe_df.data_type == 'windows:prefetch:execution'].executable.value_counts()
Or by using run count as an indicator of something that is rare
timeframe_df[(timeframe_df.data_type == 'windows:prefetch:execution') & (~timeframe_df.run_count.isna()) & (timeframe_df.run_count < 2)][['executable', 'run_count']].drop_duplicates()
Here we see few applications that we may want to take a look at...
Since SC was used, we should be looking for some services being registered:
timeframe_evtx[(timeframe_evtx.event_identifier == 7045) & (timeframe_evtx.source_name == 'Service Control Manager')]['strings']
OK, here we do see some services being created, onoe of which is to create an auto start
service called coreupdater
. This is one of the files that we saw only executed once. Let's take a closer look at coreupdater.
That would be the coreupdater.exe
(and here is where we need to do some memory analysis, since we are going to miss some data here by not including the memory content... if you were to look at that, then you would see another malicious process)
Lets simply do the same again from above with slicing the packages based on HTTP Get and the potential evil domain (just a guess, but somehow the attacker needs to deliver the payload)
attacker_dst_http = timesketch_query_func(
'(194.61.24.102 AND 10.42.85.10) AND data_type:"scapy:pcap:entry" AND *http* AND *GET*',
fields='datetime,message,timestamp_desc,ip_flags,ip_dst,ip_src,payload,tcp_flags,tcp_seq,tcp_sport,tcp_dport,tcp_window').table
attacker_dst_http.head(4)
attacker_dst.shape
Let's look at HTTP traffic:
attacker_dst_http[attacker_dst_http.message.str.contains(r'GET|POST')].message.str.extract(r'<Raw load=([^|]+)')
Lets look at this again, this time splitting on the new lines, etc..
for v in attacker_dst_http[attacker_dst_http.message.str.contains(r'GET|POST')].message.str.extract(r'<Raw load=([^|]+)').values:
value = str(v)
print(value.replace('\\\\r\\\\n', '\n'))
So we are doing a HTTP request to: http://194.61.24.102/coreupdater.exe
We could now go back to the PCAP and extract that file from the stream
But the IP address is 192.61.24.102
Now we know the name of a suspicious file, lets search for it...
coreupdater = timesketch_query_func(
'coreupdater.exe AND data_type:"fs:stat"',
fields='file_size,filename,hostname,message,data_type,datetime,sha256_hash').table
coreupdater.sort_values(by=['datetime'], ascending=True, inplace=True)
coreupdater.head(10)
So we have all file entries with the filename coreupdater.exe. What is interesting here is that it is on both systems.
Let's start to make sure all the hashes are the same:
coreupdater[['hostname', 'filename', 'sha256_hash']].drop_duplicates()
We can now use the hash to lookup on Virustotal
if VT_API_KEY:
vt_client = vt.Client(VT_API_KEY)
else:
vt_client = None
Let's extract the hash value
hash_value = list(coreupdater[coreupdater.filename == '\Windows\System32\coreupdater.exe'].sha256_hash.unique())[0]
And now we can look up the data.
if vt_client:
file_info = vt_client.get_object(f'/files/{hash_value}')
print_dict(file_info.last_analysis_stats)
Let's look at some of the summary
if file_info:
stars = '*'*10
print(f'{stars}Summary{stars}')
print('')
print_dict(file_info.sigma_analysis_summary)
print('')
print(f'{stars}Analysis Stats{stars}')
print('')
print_dict(file_info.sigma_analysis_stats)
else:
print('No VT API key defined, you\'ll need to manually loookup the information')
This clearly does not look good. Let's look at some information here:
if file_info:
print_dict(file_info.exiftool)
if file_info:
print_dict(file_info.last_analysis_results)
This does not look very good. Lets look for other events where coreupdater.exe is present.
coreupdater = timesketch_query_func(
'coreupdater.exe AND NOT data_type:"fs:stat"', fields='*').table
coreupdater.sort_values(by=['datetime'], ascending=True, inplace=True)
coreupdater.head(10)
Ok there is a lot to see here, lets start with the Top event, the autoruns are mathing the md5 sum we discovered earlier.
coreupdater.loc[coreupdater['data_type'] == 'autoruns:record']
Then we have the service install events (note that it happened on both systems):
coreupdater.loc[coreupdater['data_type'] == 'windows:registry:service'][['datetime', 'hostname', 'key_path', 'image_path', 'name', 'start_type', 'service_type', 'values']]
And we have the GET events again:
coreupdater.loc[coreupdater['data_type'] == 'pcap:wireshark:entry'][['datetime', 'message']]
If we assume 10.42.0.0 is the internal network, lets see which connections are made from that internal network
(here we make use of a cell magic for timesketch to demonstrate different ways of using the picatrix library)
%%timesketch_query search_obj --fields datetime,ip_src,message,timestamp_desc,tcp_dport,ip_dst --max_entries 150000
ip_src:10.42.85* AND NOT ip_dst:10.42.85* AND data_type:"scapy:pcap:entry"'
pd.options.display.max_colwidth = 200
Let's print out an aggregation of these IP pairs:
data = search_obj.table
data[['datetime','ip_src', 'ip_dst', 'tcp_dport']]
mytable = data.groupby(['ip_src','ip_dst']).size().to_frame('count').reset_index()
mytable.sort_values(by=['count'], ascending=False, inplace=True)
mytable
The vast majority is the 194. IP, let's get a list of all external IPs here:
ip_set = set()
list(map(ip_set.add, data['ip_src'].unique()))
list(map(ip_set.add, data['ip_dst'].unique()))
print(f'Found: {len(ip_set)} IPs (including internal)')
Let's look up the IP from earlier:
ip_info('194.61.24.102')
We can now look up all of the IPs, and put it into a data frame
(this will take a while, since we are not doing bulk lookups here, but individual lookups per IP)
lines = []
url = 'https://www.virustotal.com/vtapi/v2/ip-address/report'
for ip_address in ip_set:
if ip_address.startswith('10.'):
continue
params = {
'apikey': VT_API_KEY,
'ip': ip_address,
}
response = requests.get(url, params=params)
j_obj = response.json()
line_dict = {
'resolutions': [x.get('hostname') for x in j_obj.get('resolutions', [])],
'ip': ip_address,
'detected_urls': j_obj.get('detected_urls'),
'detected_downloaded_samples': [
x.get('sha256') for x in j_obj.get('detected_downloaded_samples', [])] or None,
'detected_referrer_samples': [
x.get('sha256') for x in j_obj.get('detected_referrer_samples', [])] or None,
'detected_communicating_samples': [
x.get('sha256') for x in j_obj.get('detected_communicating_samples', [])] or None,
}
lines.append(line_dict)
ip_df = pd.DataFrame(lines)
Now we have a data frame, let's look at it briefly
ip_df.head(3)
We had our hash value from last time:
hash_value
Let's see if that shows up:
ip_df[~ip_df.detected_referrer_samples.isna() & ip_df.detected_referrer_samples.str.contains(hash_value)]
ip_df[~ip_df.detected_communicating_samples.isna() & ip_df.detected_communicating_samples.str.contains(hash_value)]
ip_df[~ip_df.detected_downloaded_samples.isna() & ip_df.detected_downloaded_samples.str.contains(hash_value)]
That does not seem to show up in our data frame, but let's see if there are other potentially malicious IPs in the data set.
Let's look at those that have values in detected_downloaded_samples
ip_df[~ip_df.detected_downloaded_samples.isna()]
There are several here.. let's get a summary:
ip_df[~ip_df.detected_downloaded_samples.isna()]['ip'].value_counts()
Let's look for some strings in nthe data...
timesketch_query_func(
'"This program cannot be run in DOS" AND data_type:"scapy:pcap:entry"',
fields='*'
).table[['ip_dst', 'ip_src']].drop_duplicates()
There is another IP address here that we should look for... 203.78.103.109
search_obj = %timesketch_query --fields * username:"Administrator"
other_df = search_obj.table
Let's look at what systems are involved:
other_df.hostname.value_counts()
Let's look at this a bit more:
other_df[other_df['datetime'].dt.strftime('%Y%m%d') == '20200919'][['datetime', 'hostname', 'logon_type']]
Let's look at the different data types we've got:
other_df.data_type.value_counts()
Let's look at the content of the registry entry for MSTC connection:
other_df[other_df.data_type == 'windows:registry:mstsc:connection'][['datetime', 'display_name', 'hostname', 'message']]
We can see that it seems that the attacker did enter the desktop (DESKTOP-SDN1RPT) using RDP from the Domain Controller (that we already established that they brute-forced to gain access)
Let's look at all logon events that occurred on 2020-09-19
day_filter = other_df.datetime.dt.strftime('%Y%m%d') == '20200919'
filtered_view = other_df[day_filter & other_df.tag.str.join(',').str.contains('logon-event')][[
'datetime', 'computer_name', 'logon_process', 'logon_type', 'source_address', 'source_username', 'windows_domain', 'username', 'workstation']]
filtered_view
We can look at fewer records here, or some summaries:
filtered_view.workstation.value_counts()
So we go from kali
to both the DC server and the Desktop.
filtered_view[filtered_view.workstation == 'kali']
Likely that the attacker used Kali as their attack platform
or operating system of choice for the attacker. Let's look at the remote interactive logons:
filtered_view[filtered_view.logon_type == 'RemoteInteractive']
Here is is clear that the attacker did RDP into CITADEL-DC01 from the IP of 194.61.24.102 and from there they entered DESKTOP-SDN1RPT (also via RDP using the same credentials).
This is a question we might not be able to answer with Timesketch (at least not with the current version, but this may change soon with the introduction of graphing)
Ok so here we are just starting by guessing for potential interesting filenames based on the case description
secret_files = timesketch_query_func('secret', fields='*').table
Let's take a look at what we've got
secret_files.data_type.value_counts()
and
secret_files.parser.value_counts()
Let's look at some of these browser entries...
secret_files[secret_files.data_type == 'msie:webcache:container'][['datetime', 'timestamp_desc', 'url']]
Uh that looks interesting: Administrator@file:///C:/FileShare/Secret/Szechuan%20Sauce.txt
and the whole directory.
Let's filter out the Secret folder... use the message field to start with:
secret_files[secret_files.message.str.contains(r'FileShare\/Secret')]['message'].unique()
Let's look at some of the details here:
secret_files[secret_files.message.str.contains(r'FileShare\/Secret')][['timestamp_desc', 'message']].drop_duplicates()
Let's remove the expiration time from this...
secret_files[secret_files.message.str.contains(r'FileShare\/Secret') & secret_files['timestamp_desc'] != 'Expiration Time'][['datetime', 'message']].drop_duplicates()
Let's extract unique filenames here
secret_files[secret_files.message.str.contains(r'FileShare\/Secret')].message.str.extract(r'(FileShare\/Secret\/[^ ]+)')[0].unique()
There are more files here than just the stolen szechuan sauce....
Let's look more closely into the beth
files:
beth_df = secret_files[secret_files.message.str.contains('beth', case=False)]
beth_df.data_type.value_counts()
Start with the simple fs stat:
beth_df[beth_df.data_type == 'fs:stat'][['datetime', 'display_name', 'timestamp_desc']]
Let's look at another source of timestamped information:
beth_df[beth_df.data_type == 'olecf:dest_list:entry'][['datetime', 'hostname', 'timestamp_desc', 'path']]
beth_df[beth_df.data_type.str.contains('windows:shell_item:file_entry|msie:webcache:container')][[
'datetime', 'data_type', 'source_long', 'timestamp_desc', 'url', 'long_name', 'origin', 'shell_item_path']]
The Administrator account accesses the file at "2020-09-18 20:32:13.141000+00:0
', yet the file's creation date is at 2020-09-18 23:33:54+00:00
.. that's odd...
There are also shell entries (for instance jumplist entries) there before the file was created
.
This needs to be looked at more closely, but it looks like there may be some timestomping activity here.
These files are also visited by the Administrator account... let's look at logins...
search_obj = %timesketch_query --fields * username:"Administrator" AND tag:"logon-event"
admin_login = search_obj.table
admin_login.shape
admin_login['datetime'] = pd.to_datetime(admin_login['datetime'])
admin_login[admin_login.datetime.dt.strftime('%Y%m%d') == '20200918'][[
'datetime', 'logon_type', 'source_address', 'source_username', 'username', 'windows_domain', 'workstation']]
Let's look for signs of compressed files.
%%timesketch_query search_obj --fields *
*.zip OR *.tar.gz OR *.tgz OR *.tbz OR *.tar.bz2 OR *.cab OR *.7a
zip_df = search_obj.table
zip_df.columns
zip_df.data_type.value_counts()
zip_df[~zip_df.filename.isna() & (zip_df.filename.str.contains('.zip$'))]['filename'].value_counts()
Let's limit ourselves to the day of the attack
zip_day = zip_df[zip_df['datetime'].dt.strftime('%Y%m%d') == '20200919']
zip_day['filename'].value_counts()
OK, there are some interesting files here... most notably the Secret.zip
the temporary Secret.zip
, the My Social Security Number.zip
and loot.zip
We can check first what files have an associated LNK entry there:
zip_day[zip_day.data_type == 'windows:lnk:link'][['link_target', 'local_path']].drop_duplicates()
zip_day[zip_day.data_type == 'fs:ntfs:usn_change'][['filename']].drop_duplicates()
These should be looked at a bit more closely.
There are many possible ways to "timestomp" (or alter the timestamps) of a file. Some of these are harder to detect than others. One method of altering timestamps changes the file timestamp to the second; however, NTFS timestamps have greater precision than that. An easy way to detect files with timestamps altered in this way is to look for timestamps with the fractional seconds set to 0.
Let's query for the file timestamps ("fs:stat"). As this would pull back a lot of results, I'll add some search terms found earlier in the case to limit the results. This would work with the keyword (but it would take much longer).
timestomp_df = timesketch_query_func(
'data_type:"fs:stat" AND (*secret* OR *zip* OR *coreupdater* OR *Szechuan*)',
fields='datetime,timestamp,timestamp_desc,filename,hostname,inode').table
Let's see how many results we got:
timestomp_df.shape
And finally, let's look for any timestamps the end with 0s (this could have false positivies, but it's literally a 1 in a million chance).
timestomp_df[timestomp_df.timestamp.astype('str').str.endswith('000000')]
There we go! Beth_Secret.txt
has timestamps that lack fractional seconds; that's strange. Let's take a closer look to see if anything else looks strange about this file.
From the above results, the inode (or MFT file reference number) is 87111
. Since we got a disk image of the server relatively quickly after the incident, there's a good chance we still have USN journal entries that cover that file. Let's look!
The following query looks for any USN change journal records for 87111
, as well as the timestamps from the above table (for comparison), limited to the server:
secret_timestomp_df = timesketch_query_func(
'"File reference: 87111-" AND data_type:"fs:ntfs:usn_change"',
fields='data_type,datetime,filename,hostname,inode,message').table
secret_timestomp_df.sort_values('datetime')
Interesting! We do have change journal records for that file. Let's work through these results (sorted oldest to newest):
New Text Document.txt
is createdNew Text Document.txt
to Beth_Secret.txt
OBJECT_ID_CHANGE
; honestly not sure what that is.USN_REASON_DATA_EXTEND
).USN_REASON_BASIC_INFO_CHANGE
; maybe, basic info like changing the timestamps :)That's a pretty comprehensive timeline for actions on that particular "secrets" file. All the timestamps seem to fit with each other and tell a plausible story. Now, let's add in those timestamps that look a bit suspicious:
more_timestomp_df = timesketch_query_func(
'(("File reference: 87111-" AND data_type:"fs:ntfs:usn_change") OR (inode:87111)) AND hostname:"CITADEL-DC01"',
fields='data_type,datetime,timestamp,timestamp_desc,filename,hostname,inode,message').table
more_timestomp_df
They are all way before the USN journal activity! That, along with the null fractional seconds and the tight USN change journal timeline, point toward manipulated timestamps.