Cloudflare revealed a serious bug in its software yesterday which caused sensitive data like passwords, cookies and authentication tokens to spill unencrypted from its customers’ websites. The announcement is a major blow for the content delivery network, which offers enhanced security and performance for more than 5 million websites.
The massive leak (or “cloudbleed” as many are calling it) could have allowed anyone who noticed the error to collect a variety of very personal information that is typically encrypted or obscured. Emergency damage control efforts were further complicated by yet another issue: Some of that data was automatically cached by search engines and inadvertently became publicly available, making the aftermath particularly difficult to clean up. In an attempt to limit the damage, Cloudflare approached Google, Bing, Yahoo and other search engines and asked them to manually scrub the data.
Using a special set of search parameters in the lesser-known DuckDuckGo search engine, one can still examine copious amounts of personal user data that remain available on the unencrypted internet. (View the search here) Many sources estimate the amount of personal information leaked to be roughly 1,000,000 pages in length, covering 4,287,625 possibly affected domains. Various programmers have gone to work correlating known data from previously compromised databases (such as this one) to produce an unofficial list of the confirmed affected domains. You may download the unofficial list below:
Download the full list.zip (22MB)
Download this file, unzip it, then run
grep -x domaintocheck.com sorted_unique_cf.txt
to see if a domain is present.
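For readers who want to check several domains at once, the same whole-line lookup can be sketched in Python. This is only a rough illustration, not part of the published list or any official tooling: the function names `load_domains` and `check_domains` are made up for this example, and it assumes the unzipped file holds one domain per line. Unlike `grep -x`, this sketch lowercases both sides before comparing, on the assumption that domain names should match regardless of case.

```python
def load_domains(list_path):
    """Read the unzipped list into a set, assuming one domain per line."""
    with open(list_path, encoding="utf-8", errors="replace") as handle:
        # Skip blank lines; lowercase so the later lookup ignores case.
        return {line.strip().lower() for line in handle if line.strip()}


def check_domains(candidates, listed):
    """Return {domain: True/False}, True when the domain is an exact entry.

    Like grep -x, this matches whole entries only, so "example.com" does
    not match an entry such as "sub.example.com".
    """
    return {domain: domain.strip().lower() in listed for domain in candidates}
```

A call such as `check_domains(["domaintocheck.com"], load_domains("sorted_unique_cf.txt"))` would then report whether each name appears in the list. Loading the file into a set once keeps repeated lookups fast, at the cost of holding the whole list in memory.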
Some of the notable sites compromised are as follows:
Alexa top 10,000 sites affected are listed HERE.
Additionally, a list of some iOS apps that may have been affected may be found HERE.
The leak may have been active as early as Sept. 22, 2016, almost five months before a security researcher, Tavis Ormandy, at Google’s Project Zero discovered it and reported it to Cloudflare.
Tavis explained his discovery:
“On February 17th 2017, I was working on a corpus distillation project, [see also Google Security Blog] when I encountered some data that didn’t match what I had been expecting. It’s not unusual to find garbage, corrupt data, mislabeled data or just crazy non-conforming data…but the format of the data this time was confusing enough that I spent some time trying to debug what had gone wrong, wondering if it was a bug in my code. In fact, the data was bizarre enough that some colleagues around the Project Zero office even got intrigued.
“It became clear after a while we were looking at chunks of uninitialized memory interspersed with valid data. The program that this uninitialized data was coming from just happened to have the data I wanted in memory at the time. That solved the mystery, but some of the nearby memory had strings and objects that really seemed like they could be from a reverse proxy operated by cloudflare – a major cdn service.
“… My working theory was that this was related to their “ScrapeShield” feature which parses and obfuscates html – but because reverse proxies are shared between customers, it would affect *all* Cloudflare customers.
“We fetched a few live samples, and we observed encryption keys, cookies, passwords, chunks of POST data and even HTTPS requests for other major cloudflare-hosted sites from other users. Once we understood what we were seeing and the implications, we immediately stopped and contacted cloudflare security.”
Mr. Tavis Ormandy may be contacted via email at firstname.lastname@example.org
Additional information can be found on GitHub
Read the official statement and report by Cloudflare concerning this issue.