The World Wide Web is much bigger than you probably think.
Most of us will only ever skim the surface of the wild jungles of unexplored data that exist on the web. Yet every time we average explorers use the Internet, we leave behind a tantalizing trail of breadcrumbs for thieves and criminals who operate in the depths below.
To understand the vastness of the web and the data that live there, you should understand the difference between the surface web, the deep web, and the dark web.
The surface web is the part of the web most of us are familiar with. It consists of all the sites and pages that can be found by search engines like Google, Bing, and Yahoo.
The deep web is made up of pages that are not easily accessed by search engines. This includes any pages that are not linked to by an outside source and pages that are password or registration protected. You probably interact with pages like this regularly. A good example is the search results on a travel site like Hotwire. If you want to find the cost of a hotel for a certain city on a particular date, you cannot easily access that information by simply clicking links — and neither can a search engine.
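To make the travel-site example concrete, here is a minimal sketch of a crawler that discovers pages only by following links. The site map, the page paths, and the `crawl` function are all hypothetical and exist only for illustration; real search engine crawlers are far more sophisticated. The point is simply that a results page generated by a search form has no inbound links, so a link-following crawler never sees it.

```python
from collections import deque

# Hypothetical toy site: each page maps to the pages it links to.
# The "/results?..." page exists, but nothing links to it. It is only
# produced when a visitor submits the search form on "/search".
SITE_LINKS = {
    "/": ["/about", "/search"],
    "/about": ["/"],
    "/search": [],  # holds a form, but no plain links to any results
    "/results?city=Paris&date=2015-06-01": ["/"],
}


def crawl(start, links):
    """Return every page reachable from `start` by following links only."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


surface = crawl("/", SITE_LINKS)   # pages a link-following crawler finds
deep = set(SITE_LINKS) - surface   # pages that exist but are never found

print("Reachable by links:", sorted(surface))
print("Hidden from the crawler:", sorted(deep))
```

In this toy model the home, about, and search pages land in `surface`, while the hotel results page lands in `deep`, even though it is neither secret nor illegal.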
The deep web can also be accessed through Tor, client software that routes Internet traffic through a worldwide volunteer network of servers and thus makes the user as close to completely anonymous as possible. Of course, the possibility of anonymity is very attractive to criminals. So while some of what goes on in the deep web is perfectly normal and legal, some is not.
The dark web refers to pages that are not indexed by search engines and that offer illegal products, information, or transactions.
Very little concrete information is known about the dark web (or the deep web) because, by its very nature, it resists classification and quantification. Estimates say it could be anywhere from 5,000 to 7,000 times bigger than the surface web, but no one knows for sure.
Accurately determining the size of the deep web or the dark web is all but impossible. In 2001, it was estimated that the deep web contained 7,500 terabytes of information. The surface web, by comparison, contained only 19 terabytes of content at the time.
What we do know is that the deep web has between 400 and 550 times more public information than the surface web. More than 200,000 deep web sites currently exist. Together, the 60 largest deep web sites contain around 750 terabytes of data, about 40 times the size of the entire surface web. Compared with the few billion individual documents on the surface web, 550 billion individual documents can be found on the deep web. About 95 percent of the deep web is publicly accessible, meaning it requires no fees or subscriptions.
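As a quick sanity check on these ratios, the short sketch below redoes the arithmetic using only the terabyte figures quoted in this article. The variable and function names are illustrative assumptions, and the inputs are the article's 2001 estimates rather than independently verified measurements.

```python
# Figures quoted in the article (2001 estimates), in terabytes.
DEEP_WEB_TB = 7_500
SURFACE_WEB_TB = 19
TOP_60_DEEP_SITES_TB = 750


def times_larger(bigger_tb, smaller_tb):
    """Return how many times larger the first size is than the second."""
    return bigger_tb / smaller_tb


deep_ratio = times_larger(DEEP_WEB_TB, SURFACE_WEB_TB)
top60_ratio = times_larger(TOP_60_DEEP_SITES_TB, SURFACE_WEB_TB)

print(f"Deep web vs. surface web: about {deep_ratio:.0f}x")
print(f"60 largest deep sites vs. surface web: about {top60_ratio:.0f}x")
```

The quoted sizes work out to roughly 395 times and 39 times the surface web, which sits at the low end of the "400 to 550 times" range and agrees with the rounded figure of 40 for the 60 largest sites.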
Unfortunately, a considerable amount of crime takes place on the dark web. Cybercrime is the most commonly committed offense in the UK, with 2.5 million incidents recorded in the past year. Stolen credit and debit cards go for very little on the dark web: $5 to $30 in the United States, $20 to $35 in the UK, $20 to $40 in Canada, $21 to $40 in Australia, and $25 to $45 in the European Union. Login credentials for a bank account with a $2,200 balance cost an average of $190.
Markets on the dark web have processed more Bitcoin transactions than the legitimate BitPay service. In fact, estimates put the volume of dark web transactions at around $650,000 per day in 2014.
One study estimated that more than 80 percent of the traffic on the dark web was related to child pornography. But the dark web is not merely the domain of criminals and degenerates. It’s also used by whistle-blowers, journalists working in dangerous areas, and national and international security organizations to pass sensitive information.
This is big data at its most raw and untapped, and it is all but impossible to analyze. Yet the information exists for those enterprising few who seek it. For example, cyber-security expert John McAfee wrote for the International Business Times that, while a comprehensive database of police-involved shootings in the United States does not exist on the surface web, the data exist in the deep web for anyone interested enough to find them.
For me, the deep web seems like the frontier, the Wild West, the uncharted wilderness of data, just waiting for the right innovators to explore and perhaps tame it.
Bernard Marr is a bestselling author, keynote speaker, strategic performance consultant, and analytics, KPI, and big data guru. In addition, he is a member of the Data Informed Board of Advisers. He helps companies to better manage, measure, report, and analyze performance. His leading-edge work with major companies, organizations, and governments across the globe makes him an acclaimed and award-winning keynote speaker, researcher, consultant, and teacher.