
June 24, 2014

Snowden-documents show no evidence for global mass surveillance

(Updated: May 24, 2015)

Earlier this month marked the first anniversary of the Snowden-leaks, by far the biggest disclosure ever of highly secret documents from the US National Security Agency (NSA). Edward Snowden and Glenn Greenwald are using these documents to show how eager NSA is to collect every bit of communication that travels around the world.

But a close and careful look at the original slides and reports published so far shows that they contain no hard evidence of a massive abuse of power or violation of the law, nor even of the alleged mass surveillance of innocent people all over the world.





Headquarters of the National Security Agency at Fort George G. Meade
(screenshot from PBS Frontline - United States of Secrets)
 

No Place To Hide

Edward Snowden and Glenn Greenwald claim that NSA wants to collect, store, monitor and analyse the electronic communications of innocent citizens all over the world, which would be an unprecedented abuse of power and a violation of the American constitution. This is how the story is told over and over in numerous media reports worldwide, and also in Greenwald's book 'No Place To Hide', which was published in over twenty countries on May 13, 2014.

After a year of countless revelations, people might have expected that this book would provide a detailed and comprehensive explanation of all those confusing NSA programs, tools, and operations. But although it contains a range of new documents, these go without any proper explanation. Greenwald just uses them for picking a phrase or a number which he thinks supports his own narrative.


Libertarianism

Both Snowden and Greenwald are acting from points of view that are based on Libertarianism, a political ideology which encompasses minimizing the influence of government and maximizing the freedom and liberties of individual citizens.

They argue that state surveillance is a great evil, not least because when people know that they are being watched and followed, most of them will start to behave in compliance with the existing powers all by themselves (the so-called "chilling effect").

But for that, people first have to know that they are being monitored, and NSA did everything to keep the extent of its spying operations hidden from the public. Only after the documents taken by Edward Snowden were published did people actually learn how massive that spying is - in the eyes of Snowden and Greenwald.

 

NSA's military tasks

The Snowden-leaks of the past year taught us a lot about NSA, but some important aspects were ignored. One is the fact that NSA is a military intelligence agency: it falls under the US Department of Defense (DoD), is led by a high-ranking military officer and plays an important role in supporting the US armed forces.

For that, NSA is not only intercepting communications that are of strategic or tactical importance, but also collecting and analysing many other types of electromagnetic radiation, like from radar, which is called ELINT. All five US Armed Services have dedicated signals intelligence and cryptologic units, which together form the Central Security Service (CSS), the tactical branch of NSA:




Neither Snowden, nor Greenwald, nor the vast majority of the media reports even came close to mentioning the true extent of NSA's military job. One indication that can be put together from the numbers from the BOUNDLESSINFORMANT tool is that 54% of the data that NSA collects globally comes from countries in the Middle East plus India.

Because no NSA activities related to US military operations, for example in Afghanistan, have been revealed either, most people will now think that NSA is only spying on civilians. One of the very few exceptions was the Dutch newspaper NRC Handelsblad, which revealed how the Dutch military intelligence service MIVD cooperated with American troops in Afghanistan and helped to map a network of Somali pirates.




Military personnel in NSA's National Security Operations Center (NSOC)
(Screengrab from a 60 Minutes documentary)


One example of where the military aspect seems to have been withheld deliberately was the revelation by The Guardian and the New York Times of the 9-Eyes and the 14-Eyes, groups in which a number of European countries closely cooperate with NSA. Later it became clear that the data and intelligence exchanged within both groups are for military purposes.
Update:
On July 9, 2014, Glenn Greenwald indicated on Reddit that it was part of the agreement with Snowden not to publish anything about Afghanistan and other military operations. This probably also led to the next misrepresentation...


NSA spying in Europe?

During the second half of 2013 we learned about BOUNDLESSINFORMANT, the tool used by NSA for counting and visualizing its worldwide data collection activities. Initially, Glenn Greenwald reported in various European newspapers that charts from this tool show that tens of millions of phone calls of citizens from Germany, Spain, France, Norway and Italy were intercepted by NSA.

But soon, military intelligence services from these countries declared that this interpretation was wrong and that the charts actually show metadata that were not collected by NSA, but by them. These statements are supported by the fact that the related BOUNDLESSINFORMANT charts show the DRTBox technique, which is primarily used in tactical military environments.

The metadata were derived from foreign communications in crisis zones and collected in support of military operations abroad. Subsequently these data were shared with partner agencies, most likely through the SIGDASYS system of the SIGINT Seniors Europe (SSEUR or 14-Eyes) group, which made them available for NSA too.

In the end, the disclosures about various European countries did not prove massive spying by NSA, but rather show how closely European agencies cooperate with the Americans in the field of military intelligence.


Chart from the BOUNDLESSINFORMANT tool that was released by Der Spiegel on June 18, 2014
It shows that SIGADs related to European countries are actually part of 3rd Party collection

 

NSA's goals

Something that Snowden and Greenwald repeat over and over is that NSA wants to have all digital communications from all over the world: "Collect it All". But the evidence they present is thin and unconvincing. According to Greenwald's book, that alleged goal comes from a memo about the satellite intercept station Misawa in Japan and from a few slides about the Menwith Hill satellite station in the UK:



About the Foreign Satellite Collection (FORNSAT)
at Menwith Hill Station (MHS) in the UK



NSA Director Keith Alexander talking about FORNSAT
during a 16 June 2008 visit to MHS


Since international telecommunications shifted to undersea fiber-optic cables after the year 2000, satellite links nowadays carry only a small share of the traffic. It might be possible to collect all of that, but that aim cannot be applied to the entire collection effort of NSA, which is so much larger. Furthermore, if "Collect it All" really was NSA's ultimate goal, it would certainly have appeared in more high-level policy documents for the entire organization - and no such documents have been presented so far.

In a television interview with John Oliver from April 2014, now former NSA director Alexander explained that "Collect Everything" was only meant for specific problems, and as such applied to Iraq. The same was the case for Afghanistan, as these are the only known countries for which NSA conducted a real mass surveillance effort under the name Real Time-Regional Gateway (RT-RG).



Strategic Mission List

The real and far more specified goals for NSA can actually be found in the 2007 Strategic Mission List (pdf). This document was revealed by The New York Times in November 2013, but got hardly any attention.

Besides the strategically important countries China, North Korea, Iraq, Iran, Russia and Venezuela, which are enduring targets, the document also lists 16 topical missions. The most important ones are: winning the war against terror; protecting the US homeland; supporting military operations; preventing the proliferation of weapons of mass destruction by countries like China, India, Iran and Pakistan.

Some of the non-military goals for NSA are: anticipating state instability; monitoring regional tensions; countering drug trafficking; gathering economic, political and diplomatic information; ensuring a steady and reliable energy supply for the US. All these goals can be considered more or less legitimate for a large intelligence agency like NSA.

The topics in the Strategic Mission List are derived from a number of other strategic planning documents, including the National Intelligence Priorities Framework (NIPF), which sets the priorities for the US Intelligence Community as a whole. NSA is therefore assigned its tasks by the US federal government.


Economic espionage

The US government insists that its intelligence agencies are not spying on foreign companies for the benefit of individual American corporations: economic intelligence is only used to support policies, lawmaking and negotiations that benefit the US economy as a whole. Greenwald doesn't make that distinction, so he interprets every reference in NSA documents to commercial companies in the worst possible way.

For example, he tried to prove economic espionage by publishing a slide that shows the names of companies like Petrobras, Gazprom and Aeroflot. But the slide clearly says "Many targets use private networks", which indicates that NSA is focusing on specific, and probably legitimate, targets rather than on these companies themselves:




Just as in many other publications based upon the Snowden-documents, conclusions are drawn from a very selective reading of a single slide, out of context and with parts of the content redacted. That cannot be sufficient evidence for the far-reaching claims and accusations that Greenwald and Snowden are making.

NSA and GCHQ hacking into the computer networks of Swift, Belgacom and SIM card manufacturers are also often presented as examples of economic espionage, although in all these cases the goal was not to steal trade secrets or gain economic advantage. These hacking operations were instrumental for getting access to information or the communications of other targets: the "Gemalto-hack" was in support of military operations in countries like Afghanistan, Yemen and Somalia.



End-reports

To be certain whether NSA conducted the unwanted kind of economic espionage, or about the results of its eavesdropping operations in general, we would need to see the end-product intelligence reports that NSA analysts write after analysing the collected data. It seems that access to these reports is more strictly controlled, because apparently Snowden was not able to take them too.

This indicates that NSA actually has internal access control systems that work, which contradicts the alleged uncontrolled access that analysts have to virtually anyone's communications - according to Snowden, who has not provided any documents that prove that claim either, for example by showing deficiencies in NSA's user authentication system CASPORT.

At first sight it looks very impressive that almost all documents he leaked are stamped TOP SECRET//COMINT, but inside NSA information at that classification level is actually available to virtually everyone. Really sensitive secrets are in compartments like those for Exceptionally Controlled Information (ECI) of which often not even the codeword is known.
Update:
On July 5, 2014, The Washington Post revealed that Snowden actually did have access to reports containing full internet messages that were intercepted under section 702 FAA authority and that he was able to exfiltrate some 160,000 of them. The article suggests that he was able to do this because he had authorized access to at least the RAGTIME compartment.

Some other ECI-codewords that have been disclosed are REDHARVEST (RDV) and WHIPGENIE (WPG), and also details about the scope of the STELLARWIND (STLW) control system came out.

Hacking operations

Also misleading are the press reports about NSA hacking into smartphones and computers, whether through the telephone networks, the internet or by bridging the "air gap". Because they do not mention what kind of targets these methods are used for, and use general terms like "internet users" instead of "targets", people get the idea that it can affect everyone.

This is illustrated by the story that NSA has facilities where they intercept shipments of commercial computer hardware in order to covertly install spying implants. A scary idea if NSA did that randomly with hundreds of thousands of shipments, but as we can see in this internal report, the method is used to "Crack Some of SIGINT's Hardest Targets" - in which case it can be considered legitimate and proportionate:


Update:
On January 17, 2015, Der Spiegel published the full version of this NSA report, which appeared to be longer than the one published in Greenwald's book. What he left out was a section that describes a successful supply-chain interdiction against the Syrian Telecommunications Establishment (STE) - a target for which such methods are clearly justified.


Damaging disclosures

It may not have been that lives of American officials or specific operations have been endangered, but there's no doubt that disclosing these methods damaged NSA's ability to get access to communications which are otherwise impossible to intercept. Both friends and enemies will now check every new computer shipment and all of their existing sensitive computer and telephone systems in order to remove every piece that resembles those shown in the media.

Snowden said he does not want to harm the US or strain its bilateral relations with other countries. But as the opposite has happened, it seems that some journalists to whom he gave his documents are not always publishing them according to his intentions.

For example, the German magazine Der Spiegel revealed details about NSA's computer spying implants, while Glenn Greenwald published about NSA's spying on the presidents of Mexico and Brazil, which put the relationship of those countries with the US under severe pressure (the eavesdropping on German chancellor Merkel was not based upon information from Snowden, but from another source).


Similar were the disclosures about NSA eavesdropping on the communications of the UN, the European Union, a number of foreign embassies, international conferences and some large private companies. It was embarrassing for the US to have these activities exposed, although these kinds of activities are the core business of every foreign intelligence agency.


GCHQ operations

Looking at the legal framework and official tasks also helps to better understand the disclosures about the British signals intelligence service GCHQ. From various documents, it seems this agency is especially eager and aggressive, for example in collecting webcam images and planning "disruption" operations against hackers associated with Anonymous.

Rarely mentioned is that such activities would actually fit within the broader mandate and the fewer legal restrictions that the British service has compared to NSA. For example, GCHQ is allowed to operate domestically and assist the security service MI5, as well as law enforcement, whereas the activities of NSA are strictly limited to foreign intelligence.

GCHQ also wants to be a major player in the field of foreign signals intelligence. Although it reportedly has access to 200 fiber-optic cables, the agency is only able to intercept 46 cables of 10 gigabits/second at a time. At full capacity, all 200 cables together would carry some 21 petabytes of data a day; the 46 cables that can be intercepted simultaneously account for roughly 5 petabytes a day.
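The arithmetic behind such cable volumes is easy to check. The following is a minimal Python sketch, assuming that every cable runs at its full 10 gigabits per second around the clock and that decimal petabytes are meant; both are simplifying assumptions, not figures from the documents. Under these assumptions, a volume of about 21 petabytes a day corresponds to all 200 cables, while 46 cables carry about 5 petabytes a day.

```python
# Back-of-the-envelope check of theoretical cable volumes.
# Assumptions (not from the documents): cables are fully loaded 24 hours
# a day, and 1 petabyte = 10**15 bytes.
GBIT_PER_SECOND_PER_CABLE = 10
SECONDS_PER_DAY = 24 * 60 * 60


def petabytes_per_day(cables: int) -> float:
    """Theoretical daily volume for a number of fully loaded cables."""
    bits_per_day = cables * GBIT_PER_SECOND_PER_CABLE * 1e9 * SECONDS_PER_DAY
    return bits_per_day / 8 / 1e15


accessible = petabytes_per_day(200)   # cables GCHQ reportedly has access to
intercepted = petabytes_per_day(46)   # cables it can intercept at a time

print(f"200 cables: {accessible:.1f} PB/day")
print(f" 46 cables: {intercepted:.1f} PB/day")
```

Real traffic will be lower than these ceilings, because cables are rarely filled to capacity.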


GCHQ's umbrella program to tap, filter and search internet traffic is codenamed TEMPORA. It incorporates NSA's XKEYSCORE system and is thereby able to preserve all content for 3 days and all metadata for up to 30 days in a rolling buffer. Unlike NSA, which has XKEYSCORE at some 150 sites worldwide, GCHQ concentrated its TEMPORA system at three processing centers:


Explanation of the TEMPORA system used by GCHQ

 

NSA collection worldwide

One of the major accusations of Snowden and Greenwald is that NSA is indiscriminately gathering and storing electronic communications from all over the world. As said, there are no documents about the tactical systems for military purposes, but we learned a lot about the various ways the agency taps into general telecommunication channels like satellite links and fiber-optic cables, both submarine and land-based.

NSA's access to them can be unilateral or in cooperation with foreign partner agencies: with 2nd Party partners under the WINDSTOP program, and with 3rd party agencies under the RAMPART-A program.


Some numbers

From the BOUNDLESSINFORMANT tool and some other charts we know that NSA collects billions of records a day. That sounds like a huge number, but remarkably enough not a single press report has provided numbers on global telecommunication traffic in general for comparison.


The NSA itself issued a statement (pdf) in August 2013 saying that about 30 petabytes a day pass their collection systems, which filter out and store about 7.3 terabytes. Cisco estimates that in 2013 there was some 181 petabytes of consumer web, email, and data traffic a day, which means that roughly 16% passes through NSA systems, which eventually store about 0.004% of that total (the NSA statement itself cites 0.00004%, measured against its own, larger estimate of total internet traffic).
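These shares can be rechecked from the three volumes quoted above. A minimal Python sketch, assuming decimal units (1 petabyte = 1,000 terabytes) and taking the quoted daily volumes at face value; the variable names are only illustrative. Computed directly from these volumes, the stored share of the Cisco total comes out at about 0.004%.

```python
# Recompute the shares from the daily volumes quoted in the text.
# Assumption: decimal units, 1 PB = 1000 TB.
TB_PER_PB = 1000

nsa_touched_pb = 30       # passes NSA collection systems per day
nsa_stored_tb = 7.3       # filtered out and stored per day
cisco_traffic_pb = 181    # Cisco estimate of daily consumer traffic, 2013

touched_share = nsa_touched_pb / cisco_traffic_pb
stored_share_of_traffic = nsa_stored_tb / (cisco_traffic_pb * TB_PER_PB)
stored_share_of_touched = nsa_stored_tb / (nsa_touched_pb * TB_PER_PB)

print(f"Share of traffic passing NSA systems: {touched_share:.1%}")
print(f"Share of traffic that is stored:      {stored_share_of_traffic:.4%}")
print(f"Share of touched data that is stored: {stored_share_of_touched:.3%}")
```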


XKEYSCORE

At 150 sites where NSA intercepts cables, satellites and other communication channels, the agency has installed the XKEYSCORE (XKS) system, which is able to store a "full take" of the communications that flow past, but only 3 to 5 days of content and 30 days of metadata. At some sites, the amount of data exceeds 20 terabytes a day, which can only be stored for 24 hours:




With this temporary buffer, XKEYSCORE provides NSA analysts with the opportunity to search these data for "soft selectors" like keywords and for other target related characteristics like the use of encryption, virtual private networks, the TOR network or a different language. This enables analysts to use the temporarily buffered data in order to find internet activities that are conducted anonymously and therefore cannot be found by just looking for a target's e-mail address.




Before XKEYSCORE was installed, there were only the more traditional systems that automatically filter out content when there is a match with so-called "strong selectors" like e-mail and IP addresses. This is less than 5% of the internet communications that pass NSA's front-end filters.

Both the traditional filters and the XKEYSCORE system are picking out a relatively small number of communications in a targeted and focussed way. Traffic that is not of interest is only stored for a few days and then automatically disappears as it's overwritten by new data. So, although these NSA systems "see" a huge amount of data, there's certainly no "Store it All".
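The idea of a rolling buffer, in which old traffic drops out as new traffic arrives, can be illustrated with a small Python sketch. This is purely a toy model and says nothing about how XKEYSCORE is actually built: the class `RollingBuffer`, its methods and the sample records are all hypothetical, the retention periods of 3 and 30 days are taken from the text, and timestamps are assumed to arrive in chronological order.

```python
from collections import deque
from datetime import datetime, timedelta


class RollingBuffer:
    """Toy buffer that keeps items only for a fixed retention period."""

    def __init__(self, retention: timedelta):
        self.retention = retention
        self._items = deque()  # (timestamp, item) pairs, oldest first

    def add(self, timestamp: datetime, item: str) -> None:
        """Store a new item and drop everything older than the retention."""
        self._items.append((timestamp, item))
        while self._items and timestamp - self._items[0][0] > self.retention:
            self._items.popleft()

    def search(self, keyword: str) -> list:
        """Return the buffered items that contain the keyword."""
        return [item for _, item in self._items if keyword in item]


# Hypothetical example: content kept for 3 days, metadata for 30 days.
content = RollingBuffer(timedelta(days=3))
metadata = RollingBuffer(timedelta(days=30))

start = datetime(2013, 3, 1)
for day, record in [(0, "mail a"), (2, "vpn session b"), (10, "vpn session c")]:
    moment = start + timedelta(days=day)
    content.add(moment, record)
    metadata.add(moment, record)

content_hits = content.search("vpn")    # only the most recent record is left
metadata_hits = metadata.search("vpn")  # both vpn records are still there
print(content_hits, metadata_hits)
```

The point of the sketch is only that a search can never return more than what the retention window still holds.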


Entire countries

XKEYSCORE is only used for searching and analysing internet communications, but it seems that a similar system for telephone calls is available under the MYSTIC program, which was revealed by The Intercept on May 19, 2014. Under MYSTIC, NSA has access to the entire mobile phone traffic of five or six countries.

But in this case too, the storage of communication data is limited to thirty days, and for the networks of three countries (Mexico, Kenya and the Philippines) this only applies to metadata. Content of phone calls is only stored for two countries. One is the Bahamas, but that was only for testing the system. The other is probably Afghanistan, where the MYSTIC program eventually went live and likely became part of NSA's Real Time-Regional Gateway (RT-RG) effort.

For these countries NSA's collection effort comes close to mass surveillance, but strangely enough, the SOMALGET program, which comprises the content collection, accounts for less than 2% of NSA's cable tapping programs, which could indicate that the program is used in a very focussed way.

 

Bulk collection of metadata

Probably even more misleading and exaggerated is what most Snowden-stories say about the collection of metadata. This is the information needed for the technical and administrative handling of communications, like the calling and the called phone numbers, and the time and duration of a call. This matter is important because NSA collects far more metadata than content, probably up to several trillion records a month.



Chart showing the volumes and limits of NSA metadata collection
(the domestic metadata collection seems to be excluded)


The collection of metadata is even more controversial than storing content. Not only Snowden and Greenwald, but also most civil liberties organizations say that "bulk collection" equals "mass surveillance", because analysing metadata is more intrusive and thus a bigger violation of privacy than looking at the content of phone calls or e-mail messages.

That might be correct in theory and in potential, but in reality the collection of huge amounts of data doesn't automatically mean that equal numbers of individuals are being actively tracked and traced. From the documents that have been disclosed by Snowden and from those that have been declassified by the US Director of National Intelligence (DNI), we learn that NSA uses metadata in two ways:

1. To discover new suspects through a method called "contact chaining". Starting with the phone number of a known foreign bad guy, a specialized tool presents the numbers he was in contact with, and the numbers those, in turn, had been in contact with. By cross-referencing, this can point to conspirators that were previously unknown.
In 2012, NSA used 288 phone numbers as a "seed" for starting such a query in its domestic phone record database and this resulted in a total of twelve "tips" to the FBI that called for further investigation. In 2013, the number of seeds had risen to 423. This domestic collection is legally authorized under section 215 of the Patriot Act and is additionally regulated by the FISA Court, so under the existing legal framework this is not illegal spying on Americans.

Update:
On May 7, 2015, a US federal appeals court ruled that NSA's bulk collection of telephone metadata stretches the meaning of, and therefore violates, the USA Patriot Act.

2. Only for people who are identified as legitimate foreign intelligence targets, the metadata of their phone numbers are pulled from the databases to be used for creating a full "pattern-of-life" analysis. There's no evidence that NSA is randomly querying ("data-mining") the metadata they collected for some kind of profiling without any specific lead.
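The contact chaining mentioned under point 1 is essentially a breadth-first walk over a graph of call records, limited to two hops from the seed. The Python sketch below illustrates that general technique with invented data; the function names and the sample records are hypothetical and do not describe NSA's actual tool.

```python
from collections import defaultdict


def build_contact_graph(call_records):
    """Turn (caller, called) pairs into an undirected contact graph."""
    graph = defaultdict(set)
    for caller, called in call_records:
        graph[caller].add(called)
        graph[called].add(caller)
    return graph


def contact_chain(graph, seed, hops=2):
    """Return {number: hop distance} for all numbers within `hops` of the seed."""
    distance = {seed: 0}
    frontier = {seed}
    for hop in range(1, hops + 1):
        next_frontier = set()
        for number in frontier:
            for contact in graph.get(number, ()):
                if contact not in distance:
                    distance[contact] = hop
                    next_frontier.add(contact)
        frontier = next_frontier
    del distance[seed]
    return distance


# Invented call records: "seed" is the known target's number.
records = [("seed", "A"), ("seed", "B"), ("A", "C"),
           ("B", "C"), ("C", "D"), ("X", "Y")]
graph = build_contact_graph(records)
chain = contact_chain(graph, "seed")

# Cross-referencing: second-hop numbers linked to several first-hop contacts.
first_hop = {n for n, d in chain.items() if d == 1}
shared = {n for n, d in chain.items()
          if d == 2 and len(graph[n] & first_hop) >= 2}
print(chain, shared)
```

In this example the number "C" stands out because it is in contact with both direct contacts of the seed, while "D" (three hops away) and the unrelated pair "X" and "Y" never enter the result.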


Most of what we know about the domestic collection of US telephone metadata comes from declassified court orders, because from the Snowden-trove we haven't seen any internal NSA documents about the Section 215 program. At least in this case, NSA seems to be able to "Store it All", but there's no "Analyse it All".

 

Collection inside the US

Probably Snowden's biggest disclosure was the existence of the PRISM program, through which NSA collects communications from major American internet companies like Facebook, Google, Microsoft and Apple. However, the initial claim that NSA had direct access to the servers of these companies proved to be misleading. PRISM is also not used for spying on ordinary citizens, but only for gathering information to counter threats from foreign governments, terrorist groups and weapons proliferation.



Slide from the PRISM-presentation that shows NSA has no direct
relationship with communication providers - only through FBI


The disclosure that had the biggest impact on the American public was that large telecommunication providers like Verizon are handing over all their telephone records to NSA. Apparently Americans only became fully aware of this after it was revealed by Snowden, even though the collection of domestic telephony metadata had already been disclosed in 2006.

It should be noted that in 2006, NSA still received close to 100% of the domestic phone records, but that since 2013 that share plummeted to less than 30%, mainly because two major cell phone providers do not hand over their records.

Should NSA be allowed to request phone metadata from the telecom companies, as proposed in the USA FREEDOM Act, then it would regain the ability to access virtually all records.


Upstream collection

Also in 2006 it was disclosed that NSA had installed intercept devices at switching stations of major fiber-optic cables inside the United States. This equipment is used to filter the phone and internet traffic, but because this was done inside the US, it looked like NSA was eavesdropping on Americans, something that is strictly prohibited.




Sensationalist headlines of many press reports following the Snowden-leaks also suggested that NSA was "listening on American phone calls" and "reading American e-mails". This however is only the case for the very few people in the US who are known associates of terrorist groups or foreign governments.
Update:
On July 5, 2014, The Washington Post revealed that Snowden exfiltrated some 160,000 internet messages collected under 702 FAA authority and that almost 90% of them were from persons, both American and foreign, who were not listed as a foreign intelligence target. A large number were correctly minimized and there's no evidence the overcollected messages were actually read or used, but they also weren't deleted.

The domestic cable tapping is part of NSA's Upstream collection program, which is primarily used for access to communications between foreigners or foreign targets and possible conspirators inside the US. Most surprising was probably how close the cooperation with American telecommunication companies is.

The codenames for these domestic programs are FAIRVIEW, BLARNEY and STORMBREW, and under OAKSTAR, American telecoms are providing cable intercept facilities abroad.


In filtering the traffic from these cables, it proved to be impossible for NSA to fully separate communications of approved foreign targets from those of uninvolved Americans. Up to 10,000 of the latter landed in NSA databases each year and the agency was repeatedly criticized for this overcollection by the FISA Court.

This shows that this oversight mechanism isn't the mere "rubber stamp" that Snowden and Greenwald continually call it. The fact that the FISA Court decides behind closed doors is also not a scandalous exception, as the same applies to grand juries in ordinary criminal cases.


Whistleblowing?

Except for some other similar minor violations of internal rules and legal requirements, the documents published so far don't contain evidence of large scale abuse of power, mismanagement or deliberate illegal behaviour. Therefore, it seems that Edward Snowden cannot be considered a whistleblower in the traditional and official sense of the word. Snowden himself said that he lacked whistleblower protection because he was just a contractor, but that's not true, as the 1998 Intelligence Community Whistleblower Protection Act (ICWPA) clearly includes contractors. Besides that, the official whistleblowing criteria won't apply to his case:


US Federal Government whistleblower
awareness poster


Of course, not everything that is legally allowed is always right, and many people don't agree with the actual scope of NSA's spying operations. Snowden additionally warns against the (future) misuse of these kinds of systems in general, also in other countries worldwide. That's a legitimate cause, but a personal disagreement with current policies and practices alone doesn't constitute whistleblowing. It's rather a political and/or moral issue.

 

Conclusion

In the past year we really learned a lot about the methods and the collection programs of the NSA. But in the media, the facts that arise from the original documents have often been instrumentalized for the ideological fight between Snowden and Greenwald on one side and the NSA and the US government on the other. The latter parties are being accused of trying to eliminate all forms of privacy, but in the documents that have been disclosed, there's no hard evidence that proves that claim.

The documents show that NSA has a large, worldwide network of data collection systems, but these systems are not capable of collecting, let alone storing all the communications that occur all over the world. Instead, NSA tries to collect its data in as targeted and focussed a way as possible, in order to fulfill its foreign intelligence tasks, many of which are of a military nature.

The NSA is trying to do this carefully and in compliance with the laws and the policies, although it is sometimes operating on the edge of what is legally and politically acceptable. Preventing those borders from being crossed can only be done by taking a very close look at what NSA is actually doing. The documents leaked by Snowden give us some insight into that, but the myth of an agency that is able to know everything we are doing, saying, thinking and planning is just distracting.


Update:
On July 9, 2014, Greenwald published a story that was announced as a grand finale that would show that NSA does eavesdrop on ordinary American citizens. However, his actual article was about NSA and FBI monitoring five Muslim-American leaders between 2005 and 2008. But in the original documents we once again saw no evidence for the involvement of NSA, just for FBI, which is of course the proper agency for such domestic investigations. Whatever this means for what FBI is doing, it shows no illegal activities of NSA.





June 5, 2014

Some numbers about NSA's data collection

(Updated: July 16, 2017)

Today it is exactly one year since the Snowden-leaks started. Among the many highly classified documents which were disclosed during the past year are various charts that provide us with actual numbers about the amount of data the National Security Agency (NSA) is collecting.

Here we will take a look at those numbers and see what we can learn from them by comparing various sources and by breaking them down into NSA-divisions, countries and collection programs. As still only fragmented parts have been published, this overview cannot provide completeness or full accuracy (estimates are shown as round numbers).
Numbers related to:
- BOUNDLESSINFORMANT
- NSA volumes and limits
- GCHQ metadata collection
- NSA collection by country
- NSA collection by division
- SSO Collection programs
- Shared by 2nd party partner agencies
- Shared by 3rd party partner agencies

 
BOUNDLESSINFORMANT

The most detailed numbers about NSA's data collection are from the BOUNDLESSINFORMANT tool, which is used by NSA officials to view the metadata volumes collected from specific countries or by specific programs.

A worldwide overview is provided by a heat map which was published by The Guardian on June 11, 2013. It displays the figures over a 30-day period ending in March 2013:


NSA worldwide total:        221.919.881.317

Internet records (DNI):      97.111.188.358
Telephony records (DNR):    124.808.692.959


This total of almost 222 billion telephony and internet records a month equals about 2,7 trillion a year and 7,4 billion a day. However, the actual number of what NSA collects worldwide might be higher - see the update below.
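
The conversion from the 30-day total to daily and yearly figures is simple arithmetic. A minimal sketch in Python, assuming that a "month" is the 30-day window of the heat map and that a year consists of twelve such periods (the variable names are only chosen for illustration):

```python
# Figures from the BOUNDLESSINFORMANT heat map (30-day period ending March 2013).
dni_records = 97_111_188_358    # internet records (DNI)
dnr_records = 124_808_692_959   # telephony records (DNR)

monthly_total = dni_records + dnr_records

# Assumption: one month = 30 days, one year = 12 of those months.
per_day = monthly_total / 30
per_year = monthly_total * 12

print(f"per month: {monthly_total:,} records")
print(f"per day:   {per_day / 1e9:.1f} billion records")
print(f"per year:  {per_year / 1e12:.2f} trillion records")
```

Using a 365-day year instead of twelve 30-day periods would give a slightly higher yearly figure.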


The BOUNDLESSINFORMANT worldwide overview for March 2013


 
NSA volumes and limits

The BOUNDLESSINFORMANT tool seems to be very accurate, but there's another chart that gives different numbers. It's from a 2012 presentation for the SIGINT Development conference of the Five Eyes community and shows the volumes and limits of NSA metadata collection. The chart was published by The Washington Post on December 4, 2013 and again in Greenwald's book 'No Place To Hide' on May 13, 2014.



Chart showing the volumes and limits of NSA metadata collection
between January and June 2012
Redactions by Greenwald or the press, explanations added by the author


This chart shows the numbers of:
- telephony metadata which are received by FASCIA, which is NSA's main ingest processor for telephony metadata;
- internet metadata that are transferred to MARINA, which is a huge NSA database that can store internet metadata for up to a year;
- internet metadata that had to be deleted because there was apparently not enough storage space.

Except for the deleted metadata, the chart shows ca. 10,4 billion internet metadata (DNI) a day, which makes 312 billion a month or 3,7 trillion a year. There are ca. 4,5 billion telephony metadata (DNR) a day, which makes 135 billion a month or 1,6 trillion a year. If we compare these numbers with those from BOUNDLESSINFORMANT, we see a big difference:





                              Volumes and Limits          BOUNDLESSINFORMANT
                              (a month, 1st half 2012)    (a month, 1st half 2013)

Internet metadata (DNI):      312.000.000.000              97.111.188.358
Telephony metadata (DNR):     135.000.000.000             124.808.692.959


There's a difference of about 10 billion telephony metadata between both charts, but an even bigger gap exists between the internet metadata: the Volumes and Limits chart shows 215 billion more than BOUNDLESSINFORMANT. This discrepancy wasn't noticed in the press reports, nor in Greenwald's book, so at the moment there's no clear explanation for this.
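
The gap between both charts can be recomputed directly from the monthly figures. The following is only a sketch: the Volumes and Limits values are the rounded monthly estimates derived above from daily averages, and the dictionary names are made up for this example:

```python
# Monthly numbers of metadata records per source.
# Volumes and Limits: rounded estimates (10,4 and 4,5 billion a day times 30).
volumes_and_limits = {"DNI": 312_000_000_000, "DNR": 135_000_000_000}
boundlessinformant = {"DNI": 97_111_188_358, "DNR": 124_808_692_959}

gaps = {}
for kind in ("DNI", "DNR"):
    gap = volumes_and_limits[kind] - boundlessinformant[kind]
    factor = volumes_and_limits[kind] / boundlessinformant[kind]
    gaps[kind] = gap
    print(f"{kind}: gap of {gap / 1e9:.1f} billion records, factor {factor:.2f}")
```

Because the Volumes and Limits numbers are rounded, the result is only meaningful at the level of whole billions.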

Update:
A possible explanation for the discrepancies between these numbers can be found in a FAQ document for the BOUNDLESSINFORMANT tool, which says the numbers shown in the "map view" are lower than in the so-called "org view" of the tool, because the latter also counts records that don't contain the country identifiers which are needed to be counted in the "map view".
This would also explain the far bigger difference between the numbers of internet metadata, because for internet communications it is often much more difficult to attribute them to a particular country than for telephone conversations (which always contain country and region codes). This means the Volumes and Limits slide provides the more realistic numbers.


Telephony metadata

After being processed by FASCIA, the telephony metadata go to MAINWAY, which is another huge NSA database that keeps this kind of data for at least five years. In 2006 it was estimated that MAINWAY contained 1,9 trillion (1.900.000.000.000) call detail records.

For comparison: in 2007, AT&T's Daytona system, which is used to manage its call detail records (CDRs), supported 2,8 trillion records. In 2012, T-Mobile USA Inc. upgraded to an IBM Netezza 1000 platform with a capacity of 2 petabytes. This is used for loading 17 billion records a day, making 510 billion a month and more than 6 trillion a year.

If we assume the telecom providers and NSA use "records" in the same sense, then this shows that the telecommunication companies produce far more phone call metadata than NSA collects. As T-Mobile USA alone apparently creates 4 times as many records as the telephony records presented in NSA's BOUNDLESSINFORMANT tool, the domestic telephone metadata collection under section 215 Patriot Act cannot be included in the numbers we've seen so far.
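
The comparison with T-Mobile USA can be checked with a few lines of Python. This sketch assumes a 30-day month, a year of twelve such months, and that a "record" means the same thing on both sides, which is not certain:

```python
tmobile_per_day = 17_000_000_000        # records loaded per day (2012)
bi_telephony_month = 124_808_692_959    # BOUNDLESSINFORMANT telephony records (DNR), 30 days

# Assumption: one month = 30 days, one year = 12 months.
tmobile_per_month = tmobile_per_day * 30
tmobile_per_year = tmobile_per_month * 12

# How many times more records T-Mobile USA loads than NSA counts worldwide.
ratio = tmobile_per_month / bi_telephony_month

print(f"T-Mobile per month: {tmobile_per_month / 1e9:.0f} billion")
print(f"T-Mobile per year:  {tmobile_per_year / 1e12:.2f} trillion")
print(f"Ratio to BOUNDLESSINFORMANT telephony total: {ratio:.1f}")
```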

Update #1:
Also interesting is that according to slides about the Hemisphere project, some 4 billion telephone metadata records are collected every day from any carrier that uses AT&T switches in response to grand jury subpoenas in counter-narcotics investigations.

Update #2:
During a parliamentary hearing in Germany, an official of BND explained that one cell phone creates between 100 and 200 metadata and business records a day. For 4,5 billion cell phone users worldwide that would equal at least 450 billion metadata each day.

Update #3:
A 2017 tourism report from the Netherlands provided numbers showing that in January 2013, Dutch mobile phone users generated 255 million metadata a day or 7,65 billion a month. The report also confirms that for Dutch users, mobile phones create about 100 "transactions" a day.


 
GCHQ metadata collection

Even more metadata seem to be collected by NSA's British partner agency GCHQ, which according to this slide from 2011 collects 50 billion metadata per day. This makes 1,5 trillion a month and an astonishing 18 trillion (18.000.000.000.000) a year!




This (partial) slide was published in Greenwald's book No Place To Hide, but without any further explanation, so we don't know whether GCHQ is able to actually store everything or has to delete large amounts, like NSA. From the slide itself it seems that the number of 50 billion refers to internet metadata alone, which would make this number even more remarkable.

According to a report by The Guardian, GCHQ also collects 600 million telephony metadata a day, which makes 18 billion a month - a small number compared to the internet metadata this agency receives:




                                  BOUNDLESSINFORMANT    Volumes and Limits    GCHQ

Internet metadata per month:      97 bln.               312 bln.              1500 bln.
Telephony metadata per month:     124 bln.              135 bln.              18 bln.


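For indexing and searching the content of internet communications, GCHQ uses the TEMPORA system, which is capable of processing the traffic from 46 fiber-optic cables of 10 gigabits per second at a time. According to The Guardian, the more than 200 cables tapped by GCHQ could in theory deliver some 21 petabytes of data every day; the 46 cables that can be processed at a time account for roughly 5 petabytes a day.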
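
As a back-of-the-envelope check on these cable volumes, the sketch below converts line rate into petabytes per day. It assumes decimal units and that every cable runs at its full 10 gigabits per second all day. The function name is made up for this example, and the 200-cable case is included as an assumption based on The Guardian's reporting of more than 200 tapped cables:

```python
GBIT = 10**9                     # bits in one gigabit (decimal)
PETABYTE = 10**15                # bytes in one petabyte (decimal)
SECONDS_PER_DAY = 24 * 60 * 60

def petabytes_per_day(cables: int, gbit_per_second: float = 10.0) -> float:
    """Theoretical daily volume if every cable runs at full line rate."""
    bits_per_day = cables * gbit_per_second * GBIT * SECONDS_PER_DAY
    return bits_per_day / 8 / PETABYTE

for cables in (46, 200):
    print(f"{cables} cables: {petabytes_per_day(cables):.1f} PB a day")
```

In this calculation, a volume of around 21 petabytes a day corresponds to some 200 cables, while 46 cables yield about a quarter of that.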


 
NSA collection by country

The main BOUNDLESSINFORMANT interface with the heat map also lists the names of the countries which provide the highest numbers of data. These can be sorted in three different ways: Aggregate, DNI (internet) and DNR (telephony), each resulting in a slightly different top-5. The following aggregated totals (so both DNI and DNR) are known:


NSA worldwide total:    221.919.881.317  (100%)

Pakistan:                27.275.944.618   (12%)
Afghanistan:             24.293.973.693   (11%)
Iran:                    15.834.475.801    (7%)
Jordan:                  14.374.155.469    (6%)
India:                   12.616.915.557    (6%)
Saudi Arabia:            11.367.867.117    (5%)
Iraq:                    10.487.011.026    (5%)
Egypt:                    9.064.623.040    (4%)
...
United States:            3.095.553.478
...
Brazil:                   2.300.000.000


These numbers indicate from which countries NSA gathers most data, but the exact meaning of the numbers has still not been clarified. We do know that BOUNDLESSINFORMANT counts metadata records, but what these records exactly are (for example: how many records are created by one phone call?), and how they are attributed to a specific country is not clear.
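
The percentages for the countries listed above can be recomputed from the published totals. A minimal sketch, in which the dictionary and variable names are only illustrative and the number for Brazil is a rounded estimate:

```python
worldwide_total = 221_919_881_317

# Aggregated (DNI + DNR) records per country over the 30-day period.
by_country = {
    "Pakistan": 27_275_944_618,
    "Afghanistan": 24_293_973_693,
    "Iran": 15_834_475_801,
    "Jordan": 14_374_155_469,
    "India": 12_616_915_557,
    "Saudi Arabia": 11_367_867_117,
    "Iraq": 10_487_011_026,
    "Egypt": 9_064_623_040,
    "United States": 3_095_553_478,
    "Brazil": 2_300_000_000,  # rounded estimate
}

shares = {country: 100 * n / worldwide_total for country, n in by_country.items()}
for country, share in shares.items():
    print(f"{country:<14}{share:5.1f}%")

# Combined share of the eight highest-ranking countries.
top8_share = sum(sorted(shares.values(), reverse=True)[:8])
print(f"Top 8 together: {top8_share:.1f}%")
```

This only redistributes the published numbers; it says nothing about how a record is attributed to a country in the first place.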

Communications by definition have two ends: the originating and the receiving end. When both ends are in the same country, it's easy to attribute the communication to that particular country. But when the originating and the receiving ends are in different countries, how is such a communication registered? Maybe for both countries, although that would make many of them appear in these numbers twice.


United States

Edward Snowden saw the heat map with the 3 billion attributed to the United States as proof that NSA was conducting domestic surveillance, although the heat map itself cannot provide sufficient evidence for that. The 3 billion could very well relate to foreign communications which are just transiting the US or to the American end of, for example, phone calls where the other end is a foreign suspect. Somewhat more information could have been provided by the bar charts for the US, but these haven't been published.

The number of 3.095.553.478 for the United States is the aggregated total. The number of internet records (DNI) for the US is 2.892.343.446, which leaves just 203.210.032 telephony records (DNR) or about 6,6% of the aggregated total. In a table this looks like this:

United States total:        3.095.553.478 per month

Internet records (DNI):     2.892.343.446 per month
Telephony records (DNR):      203.210.032 per month

This small share for telephone metadata is rather strange given the fact that NSA is collecting all American phone records, but does not do so with internet metadata. This seems to indicate that these domestic phone records are not counted by BOUNDLESSINFORMANT and that the internet records are from communications with at least one end foreign.
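
The split for the United States follows from subtracting the internet records from the aggregated total. A short sketch of that calculation (variable names are only for illustration):

```python
us_total = 3_095_553_478    # aggregated total for the United States
us_dni = 2_892_343_446      # internet records (DNI)

# Whatever is not DNI must be telephony records (DNR).
us_dnr = us_total - us_dni
dnr_share = 100 * us_dnr / us_total

print(f"Telephony records (DNR): {us_dnr:,}")
print(f"Share of US total:       {dnr_share:.1f}%")
```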


 
NSA collection by division

With a BOUNDLESSINFORMANT chart about the NSA's Special Source Operations (SSO) division published in Greenwald's book, we can also compare the amount of data collected by this division with the total amount of NSA data collection. We see that SSO, which is responsible for tapping the world's main fiber optic cables, accounts for 72% of all data:


NSA worldwide total:                 221.919.881.317  (100%)

Special Source Operations (SSO):     160.168.000.000   (72%)
Other NSA divisions:                  61.751.000.000   (28%)


This leaves the remaining 28% of the data to be collected by NSA's other main divisions: Global Access Operations (GAO), which operates mobile collection platforms like satellites, planes, drones and ships, and Tailored Access Operations (TAO), which collects data by hacking into foreign computer networks. The remaining 28% could also encompass data collected by the joint NSA/CIA Special Collection Service (SCS) units and by 3rd Party partner agencies.



BOUNDLESSINFORMANT chart about the SSO division

 

SSO Collection programs

From the BOUNDLESSINFORMANT chart about Special Source Operations we can see how the total number of data collected by this division breaks down into the 5 biggest collection programs. From other charts we also know the numbers collected by some other programs, and these are added here too:


SSO worldwide total:                            160.168.000.000  (100%)

DANCINGOASIS (US-3171):                          57.788.148.908   (36%)
SPINNERET (US-3180, part of RAMPART-A):          23.003.996.216   (14%)
MOONLIGHTPATH (US-3145, part of RAMPART-A):      15.237.950.124   (10%)
INCENSER (DS-300, part of WINDSTOP):             14.100.359.119    (9%)
AZUREPHOENIX (US-3127, part of RAMPART-A):       13.255.960.192    (8%)
...
FAIRVIEW (US-990):                                6.142.932.557
...
SOMALGET (US-3310, part of MYSTIC):               3.000.000.000
...
ACIDWASH (part of MYSTIC):                        1.050.000.000
...
MUSCULAR (DS-200B, part of WINDSTOP):               181.280.466

Other programs in total:                         26.412.000.000


This listing shows that roughly one third of the data from telecommunication cables are collected by just one single program: DANCINGOASIS. Another third is intercepted by the programs ranking second, third and fourth, but despite their weight, we still don't know more about them than just their names. Finally, the last third of this type of collection is divided among numerous smaller and very small programs, a number of which have been disclosed through the Snowden-documents.
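
The division into three roughly equal parts can be verified from the five biggest programs. A sketch, using the rounded SSO total from the chart and illustrative variable names:

```python
sso_total = 160_168_000_000   # rounded SSO total

top5 = {
    "DANCINGOASIS": 57_788_148_908,
    "SPINNERET": 23_003_996_216,
    "MOONLIGHTPATH": 15_237_950_124,
    "INCENSER": 14_100_359_119,
    "AZUREPHOENIX": 13_255_960_192,
}

shares = {name: 100 * n / sso_total for name, n in top5.items()}

first = shares["DANCINGOASIS"]
second_to_fourth = shares["SPINNERET"] + shares["MOONLIGHTPATH"] + shares["INCENSER"]
# Everything else, including AZUREPHOENIX and all smaller programs.
rest = 100 - first - second_to_fourth

print(f"DANCINGOASIS:        {first:.1f}%")
print(f"Ranks 2 to 4:        {second_to_fourth:.1f}%")
print(f"All other programs:  {rest:.1f}%")
```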

Update:
On June 18, 2014 the Danish newspaper Information and Greenwald's website The Intercept broke a story saying that SPINNERET, MOONLIGHTPATH and AZUREPHOENIX are all part of the RAMPART-A program, which encompasses access to fiber-optic cables abroad, in cooperation with 3rd Party partner agencies from at least five different countries.

According to a FAQ document, the BOUNDLESSINFORMANT tool doesn't count data which are collected under FISA authority, so numbers about the famous PRISM program are excluded. However, another source (pdf) says that under PRISM, more than 227 million "internet communications" are collected annually, which is ca. 19 million a month, but it is not known whether these "internet communications" are the same kind of records as presented by BOUNDLESSINFORMANT.

 
Processing and storing

Metadata from a number of big and important SSO collection programs are processed by a system codenamed SHELLTRUMPET. As can be read in the document below, this system processed almost 500 billion metadata records in 2012, which gives an average of 41,6 billion a month, but by the end of 2012 SHELLTRUMPET was already processing 2 billion call detail records a day, which would make 60 billion a month:




MUSCULAR contributes 60 gigabytes of data to the PINWALE database for internet content every day, which is 1,8 terabytes a month. As BOUNDLESSINFORMANT counts 181 million records for MUSCULAR, this would mean that 1 million internet metadata records represent almost 10 gigabytes of (content) data.

This correlation can be used to make a very rough estimate of the total amount of internet data collected by NSA. The worldwide total of 97 billion internet records a month would then equal some 961 terabytes of data each month or 11,5 petabytes a year (for comparison: the new NSA data center in Bluffdale, Utah can store an estimated 12 exabytes, which is 12.000 petabytes).
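
This extrapolation can be written out step by step. The sketch below assumes a 30-day month, decimal units (1 terabyte = 1.000 gigabytes) and, most importantly, that the ratio between content volume and metadata records found for MUSCULAR holds for all of NSA's internet collection, which is a very rough assumption:

```python
muscular_gb_per_day = 60                 # content sent to PINWALE per day
muscular_records_month = 181_280_466     # BOUNDLESSINFORMANT records for MUSCULAR
nsa_dni_records_month = 97_111_188_358   # worldwide internet records (DNI)

# Assumption: one month = 30 days.
muscular_gb_month = muscular_gb_per_day * 30

# Gigabytes of content per one million metadata records.
gb_per_million = muscular_gb_month / (muscular_records_month / 1e6)

# Apply the same ratio to the worldwide number of internet records.
total_tb_month = gb_per_million * (nsa_dni_records_month / 1e6) / 1000
total_pb_year = total_tb_month * 12 / 1000

print(f"GB per million records: {gb_per_million:.2f}")
print(f"Estimated total:        {total_tb_month:.0f} TB a month")
print(f"Estimated total:        {total_pb_year:.1f} PB a year")
```

Small differences with the numbers in the text come from rounding the intermediate values.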


 
Shared by 2nd party partner agencies

The very close working relationship between NSA and the 2nd Party partner agencies from the Five Eyes community leads to a regular exchange of data, of which the most productive facilities can be seen in a BOUNDLESSINFORMANT chart that was published by Der Spiegel:

DS-300 (INCENSER):      14.100.359.119
...
DS-800:                  4.412.803.504
DS-204A:                 1.691.419.171
UKC-302A:                1.245.109.650
UKC-215:                   937.317.036
...
DS-200B (MUSCULAR):        181.280.466


The SIGAD codes starting with DS denote some kind of joint collection program, those starting with UKC stand for civilian operated facilities of the British signals intelligence agency GCHQ.


 
Shared by 3rd party partner agencies

NSA also gets data provided by 3rd Party partner agencies. These are counted by the BOUNDLESSINFORMANT tool too, as we know from charts about a number of European countries:

Germany (US-987LA):             471.258.864
? (US-985HA):                   181.115.922
Germany (US-987LB):              81.786.967
Poland (US-916A):                71.819.443
France (US-985D):                70.271.990
Spain (US-987S):                 60.506.610
Italy (US-987A3005):             45.893.570
Norway (US-987F):                33.186.042
Denmark (?):                     23.000.000
The Netherlands (US-985Y):        1.831.506


The total number of data received from these nine countries is slightly more than 1 billion a month, which is just 0,47% of NSA's overall collection as counted by the BOUNDLESSINFORMANT tool.
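
The total for the 3rd Party partners and its share of the worldwide number can be recomputed from the list above. A minimal sketch, in which the dictionary name is only illustrative and the number for Denmark is a rounded estimate:

```python
worldwide_total = 221_919_881_317

# Records per SIGAD as shown in the BOUNDLESSINFORMANT charts.
third_party = {
    "Germany (US-987LA)": 471_258_864,
    "? (US-985HA)": 181_115_922,
    "Germany (US-987LB)": 81_786_967,
    "Poland (US-916A)": 71_819_443,
    "France (US-985D)": 70_271_990,
    "Spain (US-987S)": 60_506_610,
    "Italy (US-987A3005)": 45_893_570,
    "Norway (US-987F)": 33_186_042,
    "Denmark (?)": 23_000_000,  # rounded estimate
    "The Netherlands (US-985Y)": 1_831_506,
}

third_party_total = sum(third_party.values())
share = 100 * third_party_total / worldwide_total

print(f"3rd Party total: {third_party_total:,} records a month")
print(f"Share of NSA worldwide total: {share:.2f}%")
```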

Initially, Glenn Greenwald reported in various European newspapers that these numbers represented the phone calls of European citizens intercepted by NSA. But gradually it came out that his interpretation was wrong.

The charts actually show numbers of metadata that were collected from foreign communications by European military intelligence agencies in support of military operations abroad. These data were subsequently shared with partner agencies, most likely through the SIGDASYS system of the SIGINT Seniors Europe (SSEUR) group, which is led by NSA.






Links and Sources
- Syncsort.com: How Hadoop is Transforming Telecom
- Secret-bases.co.uk: Secret Data Centres, including GCHQ's Tempora and NSA's PRISM projects
- Cryptome.org: Numbers of reports generated by various NSA programs (pdf)
- Forbes.com: Blueprints Of NSA's Ridiculously Expensive Data Center In Utah Suggest It Holds Less Info Than Thought