Another day, more data breaches – where the data goes
This is my first blog post as principal threat analyst with ThreatGEN. The intent is to release a roughly weekly security news aggregate article with a preference for industrial security threats and a strong technical emphasis. Well, I will start that ritual next week, because this week I must address the elephant in the room: data breaches.
After reading about yet another "biggest ever" data breach, this time concerning Capital One, and reading yet again how the circumstances were ideal for the company to have noticed something, had they cared enough to be on the lookout, I had to write an article that at least included me complaining about these companies and their ignorance of basic cybersecurity. Companies like Equifax and Capital One are entrusted with our most sensitive data but are unwilling to do their due diligence to prevent that data from being stolen. We are left having to mop up the mess of their reluctance, which manifests in issues like stolen identities. If you have ever had to deal with trying to clean up your credit report, you know exactly what I am talking about. It takes months, even years, to clean up the mess that results from having your identity stolen.

Thing is, though, there isn't a good incentive for big companies to secure their stuff. It is still cheaper to deal with the aftermath of a security breach than to spend the time and effort to prevent the breach. Therefore, I applaud fines and regulations like those the GDPR is (supposed to be) imposing in Europe. Seeing Equifax pay the price for its half-hearted behavior in the form of a $700 million settlement is a step in the right direction, and I hope the hurt is enough for them to learn their lesson. We need to get to a point where due diligence and properly securing a company's systems is more profitable (cheaper) than ignoring it.
Well, that's all the complaining I want to do on the subject. What always intrigues me though, when I read about these massive amounts of stolen data, is where does it all go? What is being done with it? If we look at the numbers of some of the biggest data breaches of the past years:
- Yahoo (2013): 3 billion email addresses and passwords
- Marriott (2018): 500 million (email) addresses, passwords, passport info and credit card info
- Experian (2013): 200 million Social Security numbers and personally identifiable information
- Equifax (2017): 145 million Social Security numbers and personally identifiable information
- Target (2014): 110 million credit card numbers
- Capital One (2019): 106 million records of personally identifiable information, including Social Security numbers
We should quickly realize that this is a lot of information to be had! Where does all this information go and what is being done with it? Well, most of the stolen data ends up being sold in bulk on shady sites on the dark web. At this point, to clarify some terminology, here is a brief description of the difference between the terms deep, dark and regular (or surface) web:
The Surface Web is anything that a search engine can index/access:
The surface web is what all of us use every day. It's where you check your email, go to Facebook and do your banking online. This part of the internet gets indexed by the various search engines like Google, Bing and DuckDuckGo.
The Deep Web is anything that a search engine can't index/access:
The deep web is the area of the internet that search engine crawlers cannot index (results from page queries for example) or are not allowed to index (via restrictions in the robots.txt).
The Dark Web is an intentionally hidden network that is inaccessible through standard web browsers:
The dark web is an area of the internet that is not directly accessible with a standard browser. You need specialized software like Tor (https://www.torproject.org/) to access and participate in the dark web, or 'tor', network. Because the tor client software connects to the internet via a network of proxy relays, the user's computer details are hidden and activities on and over the tor network are relatively anonymous. This allows the user of the tor software to access the surface web anonymously. The tor software also allows running web servers on the user's side that are only accessible via the network of proxy relays, effectively creating so-called onion sites that allow anonymous hosting. This anonymity is ideal for people who live under restrictive regimes, but it has also attracted the attention of more malicious and devious users, who use the tor network to sell and distribute illegal materials and goods like drugs, as well as personal information like credit cards, Social Security numbers and login credentials. This has given the dark web a bad reputation. To set things straight: the dark web is not a malicious place in and of itself; however, there are areas and users that engage in malicious activities.
With the terminology out of the way, we know our information makes it to the dark web to be sold. But where exactly? Well, the answer to that is ever changing, depending on who stole the data, who is selling and who you ask. A quick internet search for "reddit onion sites for buying credit cards" reveals many sources of answers (I include reddit in the search term because many links and discussions around dark web activity can be found on the reddit site):
At this point, before we can go any further and explore any of the onion links found, we need the tools to get there: tor installed and a browser configured to use tor as a proxy to access onion sites. The easiest way to achieve all this is to install the Tor Browser.
I highly recommend that you use a virtual machine to follow along with the rest of the article.
If you are running a Windows machine, the installer for the Tor Browser can be found here: https://www.torproject.org/download/download/. On Linux we can simply run the command "sudo apt install torbrowser-launcher", then start the browser from the launch menu and we are off to the races.
Click Connect and watch the browser pop up, allowing access to the network of onion sites.
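As a quick sanity check from the command line, you can ask the Tor Project's check page whether your traffic really goes through tor. This sketch assumes a tor service is listening on the default SOCKS port 9050; the tor instance bundled with the Tor Browser typically listens on port 9150 instead:
curl --socks5-hostname 127.0.0.1:9050 https://check.torproject.org/ | grep -i congratulations
If the page comes back with the "Congratulations" banner, tor is doing its job.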
We can now start exploring onion sites and the deep dark secrets they hold. A good place to start is with the dark web search engine 'NotEvil' (http://hss3uro2hsxfogfq.onion/). A search for 'ssn dumps for sale' results in 241 hits:
After trying a site or two, MeccaDumps (http://mecca2tlb6dac76g.onion/) seems to be offering what we are looking for.
After a painless registration process, we can now query the dumps section for breach data:
At this point, you can look for particular data breaches or companies of interest and see what is available. If this site doesn't have what you are looking for, you need to move on and try the next dump-selling onion site. Onion sites are often unreliable, can be fake and will disappear for a variety of reasons. Finding breach data that directly ties to a known breach is not an easy task. Even though you will develop the proper skills and assemble a list of go-to sites if you spend enough time researching the dark web, your chances of finding any particular information on a given breach, let alone on an individual person, are slim to none. The biggest issue is, as you will find out, that data sold on the dark web doesn't come advertised as "came from Equifax" or "stolen from Target". The stolen data is split up by type and sold along with similar information from other data breaches. This effectively obfuscates both the origin of the data and the identity of the attacker.
There are companies that will try to sell you a service that will 'scan' the deep and dark web for your personal information. Let's think about that for a minute. Although these dark web scanning services are very vague on how exactly they operate, we can conclude some things here. There are 1,208,925,819,614,629,174,706,176 possible site addresses on the dark web (that is 32^16, one for every possible 16-character onion address), so they are probably not scanning all of the dark web; that is just not feasible. They will also not go out and buy the dumps that are being offered on sites like MeccaDumps; that is legally dubious at best. The best they can do is find publicly available data breach dumps like "Collection #1" (https://www.digit.in/news/internet/773-million-records-have-been-leaked-in-the-single-largest-data-breach-to-go-public-is-yours-one-45888.html) and tell you if your data is in those publicly available dumps. Well, you don't have to pay for that service: Troy Hunt maintains a site at https://haveibeenpwned.com/ that lets you check, free of charge, whether your email address appears in the publicly known breach dumps he has collected.
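As a side note, while checking an email address is done through the site itself, the companion Pwned Passwords API can be queried straight from a shell, and thanks to its k-anonymity model only the first five characters of your password's SHA-1 hash ever leave your machine. A minimal sketch ('password123' is obviously just a demo value):
HASH=$(printf '%s' 'password123' | sha1sum | awk '{print toupper($1)}')
PREFIX=${HASH:0:5}; SUFFIX=${HASH:5}
# the API returns all known hash suffixes for this 5-character prefix, one per line with a count
curl -s "https://api.pwnedpasswords.com/range/$PREFIX" | grep -i "$SUFFIX"
Any output means the password has shown up in a public breach dump and should never be used again.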
Most people shouldn't even bother checking though. The sad truth nowadays is that if you conducted any of the following activities within the past 15 years:
- Performed online banking
- Purchased something online
- Had health care
- Owned a cell phone
- Had a job
- Made a hotel reservation
Then, like it or not, your data is out there, being sold and resold by criminals. With stolen credit card information, banks typically "mitigate" by issuing a new credit card. For stolen account login information, one typically changes the account password or closes the affected account to deal with the compromise. However, our lovely credit scoring system cannot operate that way: you only get one Social Security number and it is extremely hard, if not impossible, to change. So, if your SSN falls into the hands of criminals, it could be used at any point in time. The only remediation is a good credit monitoring regime. Notice that I refrain from saying "credit monitoring service" here, that service companies offer us after they are breached, which can keep an eye on your credit reports for you and alert you when something weird happens. Although a service like that should be PART OF your regime, you need to do more. You need to actively keep an eye on your credit details, meaning running credit reports, setting alerts and, most importantly, freezing your credit in between major purchases like cars or real estate. Thankfully, after the Equifax breach this has become a lot cheaper, if not free, so do it!
If you are like me and intrigued to see what the dark web holds in terms of (your) personal information, or information in general, come and join my journey of crawling and indexing (part of) the dark web. What follows in the remainder of this article is the setup of an instance of the ACHE web crawler, specifically targeted at crawling the dark web. The idea is to crawl and index as much as we can from the dark web and store it in an Elasticsearch instance, so we can run targeted queries and reports to do our research. So, without further delay, let's build a crawler to start collecting that info:
The setup I am describing next is based on the tutorial on the ReadTheDocs site here: https://ache.readthedocs.io/en/latest/tutorial-crawling-tor.html. I have adjusted it slightly to use a different starting point.
On a Linux VM, open the terminal and install some dependencies we need for the work ahead:
sudo apt update && sudo apt full-upgrade -y
sudo apt install tor docker.io docker-compose curl wget git openjdk-8-jdk
sudo service docker start
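Depending on your distribution, the Docker engine package may be called docker.io (as used above) or docker-ce. A quick sanity check that the daemon is up and can run containers, and that docker-compose is on the PATH:
sudo docker run --rm hello-world
docker-compose --version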
Now we make a working directory and fetch the setup files for the docker images and the ACHE crawler from GitHub with the following commands:
mkdir workdir
cd workdir
mkdir tools
cd tools/
mkdir config_docker_tor/
cd config_docker_tor/
curl -O https://raw.githubusercontent.com/ViDA-NYU/ache/master/config/config_docker_tor/ache.yml
curl -O https://raw.githubusercontent.com/ViDA-NYU/ache/master/config/config_docker_tor/docker-compose.yml
curl -O https://raw.githubusercontent.com/ViDA-NYU/ache/master/config/config_docker_tor/tor.seeds
Finally, we add some starting URLs to the tor.seeds file to feed the crawler some more relevant onion site URLs (a shell sketch after the list shows one way to append them). The following are added to the existing list:
- http://hss3uro2hsxfogfq.onion/index.php?q=ssn+credential+dump&session=fayTL686py6R9apcKstWssOEgHtQjM4P0Yso3%2BVt63k%3D&numRows=20&hostLimit=20&templat$
- http://grjfadb7bweuyauw.onion/2.html
- http://visiwnqyii4r5f5l.onion/address/
- http://gdaqpaukrkqwjop6.onion/tag/hacking/
- http://deepmartyqzffl5n.onion/hacking/
- http://hss3uro2hsxfogfq.onion/index.php?q=hacking&session=pZgzgIxNNhiO7%2FPxMx3x3zEykXHT%2B%2Bq0Gnsg8kCjHlc%3D&numRows=20&hostLimit=20&template=0
- http://www.6id6knemqavczyxc4hebboorxg2gpfm3silsfj25jzguagk4q4blepqd.onion/links.html
- http://www.6id6knemqavczyxc4hebboorxg2gpfm3silsfj25jzguagk4q4blepqd.onion/backups/freshonions_lastlist.json
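For those following along in a terminal, a minimal sketch of appending extra seeds to the downloaded file (the two URLs are examples taken from the list above; substitute your own finds):
cat >> tor.seeds <<'EOF'
http://grjfadb7bweuyauw.onion/2.html
http://visiwnqyii4r5f5l.onion/address/
EOF
# ACHE expects one URL per line in the seeds file
wc -l tor.seeds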
The links that I added came from searches like "hacking dumps" and "ssn credential dump link" via the NotEvil search page. The added onion sites themselves contain many links to other, related onion sites, serving as a great seed source for our crawler.
With all the pieces of the puzzle in place, we can start the crawler setup by running the command:
sudo docker-compose up -d
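To verify that everything came up, check the container status and follow the crawler's log output. The service name "ache" below is an assumption based on what the project's docker-compose.yml typically defines (the ACHE crawler itself, an Elasticsearch backend and a tor proxy); adjust it to match your compose file:
sudo docker-compose ps
# follow the crawler's log output to confirm pages are actually being fetched
sudo docker-compose logs -f ache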
While the crawler does its work, you can look at the statistics by navigating to http://localhost:8080
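Because the compose setup backs the crawler with Elasticsearch, you can also watch the index grow directly, assuming Elasticsearch is exposed on its default port 9200:
curl -s 'http://localhost:9200/_cat/indices?v'
curl -s 'http://localhost:9200/_count?pretty'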
With the onion sites now being crawled and their content indexed, let's add Kibana to the equation for easy searching and visualizing. On your Linux VM, enter the following commands:
Download and install the Public Signing Key:
wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
Add the repository definition to your /etc/apt/sources.list.d/kibana.list file:
echo "deb https://packages.elastic.co/kibana/4.6/debian stable main" | sudo tee -a /etc/apt/sources.list.d/kibana.list
Run apt-get update and the repository is ready for use. Install Kibana with the following command:
sudo apt-get update && sudo apt-get install kibana
Configure Kibana to automatically start during bootup. If your distribution is using the System V version of init, run the following command:
sudo update-rc.d kibana defaults 95 10
Now we can start Kibana:
sudo service kibana start
Kibana will start, and once you navigate to the web portal URL at http://localhost:5601 you can start interacting with the Kibana application.
The first time you open the Kibana app you will have to define the index pattern. Replace the pre-filled text with an * and save it as the default index. Now you can start using the data that is already available:
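If you prefer the command line over the Kibana UI, the same data can be queried straight from Elasticsearch. A quick keyword search sketch (the exact document fields depend on ACHE's Elasticsearch mapping, but hits typically include the crawled page's URL and text content):
curl -s 'http://localhost:9200/_search?q=ssn&size=3&pretty'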
That's it for this blog post. We are now set up with a powerful dark web research environment. Data is being retrieved, and next time we will look at how we can slice and dice the information we are indexing here. Let your crawler run; it will likely still be indexing by the time my next blog post comes out. You will have a ton of data to play with by then.
About the Author
Pascal Ackerman is a seasoned industrial security professional with a degree in electrical engineering and with 18 years of experience in industrial network design and support, information and network security, risk assessments, pentesting, threat hunting and forensics. After almost two decades of hands-on, in-the-field and consulting experience, he joined ThreatGEN in 2019 and is currently employed as Principal Analyst in Industrial Threat Intelligence & Forensics. His passion lies in analyzing new and existing threats to ICS environments and he fights cyber adversaries both from his home base and while traveling the world with his family as a digital nomad.