(Updated: June 7, 2015)
On May 13, Glenn Greenwald published his book 'No Place To Hide' about the Snowden disclosures. It doesn't contain substantial new revelations, but from one of the original documents in it we can determine that NSA's largest cable tapping program is codenamed DANCINGOASIS, something that had not been reported earlier.
Here we will combine information from a number of other documents and sources to create a somewhat more complete picture of the DANCINGOASIS program.
Special Source Operations
In Greenwald's book and on his website, the following chart from NSA's BOUNDLESSINFORMANT tool was published. Although these charts are not always easy to interpret, we can rather safely assume that this one gives the overview for NSA's Special Source Operations (SSO) division, which is responsible for collecting data from major telephony and internet cables and switches.
During the one month period between December 10, 2012 and January 8, 2013, a total of more than 160 billion metadata records were counted, divided into 93 billion DNI (internet) data and 67 billion DNR (telephony) data:
In the "Most Volume" section we see that the program which collects most data is identified by the SIGINT Activity Designator (SIGAD) US-3171, a facility that is also known under the codename DANCINGOASIS, which is sometimes abbreviated as DGO.
During the one-month period covered by the chart, this program collected 57.7 billion data records, which is more than twice as much as the second-largest program: US-3180, which is codenamed SPINNERET. Third is US-3145 or MOONLIGHTPATH and fourth is DS-300 or INCENSER. This chart will be analysed in general in a separate article.
Numbers
Previously it seemed that INCENSER collected the largest amount of data. A BOUNDLESSINFORMANT chart published in November 2013 said that this program gathered some 14 billion metadata records a month. Now we know that DANCINGOASIS collects about four times as much: more than 57 billion records each month, or some 684 billion every year.
Comparing some numbers shows that DANCINGOASIS (57 bln.) accounts for more than a third of everything the SSO division collects (160 bln.). It is also far more than what is collected under FAIRVIEW (6 bln.), which is one of the big domestic cable tapping programs that NSA operates in cooperation with US telecom providers.
Comparing DANCINGOASIS with the total amount of data collected worldwide during one month in early 2013 (221 bln.), as presented in the BOUNDLESSINFORMANT heat map, we see that DANCINGOASIS alone seems to account for roughly a quarter of the entire NSA data collection.
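The comparisons above can be reproduced with a few lines of Python. This is only a rough sketch: the figures are the rounded ones quoted from the charts, the variable names are our own, and the charts do not all cover exactly the same period, so the outcomes should be read as approximations.

```python
# Monthly record counts in billions, as quoted in the text above.
DANCINGOASIS = 57.7
INCENSER = 14.0
SSO_TOTAL = 160.0        # 93 bln. DNI + 67 bln. DNR
WORLDWIDE_TOTAL = 221.0  # BOUNDLESSINFORMANT heat map, early 2013


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


ratio_to_incenser = DANCINGOASIS / INCENSER
share_of_sso = share(DANCINGOASIS, SSO_TOTAL)
share_of_worldwide = share(DANCINGOASIS, WORLDWIDE_TOTAL)
yearly_records = 57 * 12  # rounded monthly figure times twelve months

print(f"DANCINGOASIS vs INCENSER: {ratio_to_incenser:.1f} times as much")
print(f"Share of SSO collection: {share_of_sso:.1f}%")
print(f"Share of worldwide collection: {share_of_worldwide:.1f}%")
print(f"Records per year: {yearly_records} billion")
```

With these inputs the ratio to INCENSER comes out at about 4.1, the share of SSO collection at about 36 percent and the share of worldwide collection at about 26 percent.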
Given this large share, it could be that DANCINGOASIS is an umbrella program which encompasses various smaller sub-programs. However, DANCINGOASIS is different from MYSTIC, which is an umbrella program containing facilities that monitor at least five entire countries, as was revealed recently by The Intercept. The part of MYSTIC that stores all phone calls of two countries, codenamed SOMALGET, processes only about 3 billion telephony metadata every month.
> See also: Some numbers about NSA's data collection
Whereabouts
Strangely enough, we haven't (yet) read about DANCINGOASIS in media reports or in Glenn Greenwald's book, nor have we seen any slides or documents that deal specifically with this program.
Update:
On July 9, 2014, Glenn Greenwald indicated on Reddit that it was part of the agreement with Snowden not to publish anything about Afghanistan and other military operations, so this might be the reason why Greenwald didn't publish anything about DANCINGOASIS.
But in the book 'Der NSA Komplex' written by two journalists from the German magazine Der Spiegel, there's more information. It says that the DANCINGOASIS program started in May 2011 and monitors a fiber optic cable between Western Europe and the Far East.*
It is not clarified what kind of targets DANCINGOASIS collection is used for, but given the enormous amount of data (57 billion records), it most likely comes from top-priority countries in the Middle East. According to the BOUNDLESSINFORMANT heat map, NSA collected more than 27 billion records a month from Pakistan, 24 billion from Afghanistan, 15 billion from Iran and 13 billion from Jordan - all countries that are along the fiber optic cables between Europe and the Far East.
Blocking address books
Such a huge collection of communications inevitably comes with data that are useless, for example address books from e-mail accounts that are not related to target persons. Because the number of these address books grew steadily, NSA started to block them from being ingested by installing the SCISSORS selection system.
This is shown in slides published by The Washington Post on October 15, 2013. We see that SCISSORS was enabled for DANCINGOASIS (US-3171) on March 13, 2012:
The slide on the right shows two codes associated with content collected under DANCINGOASIS: DGOT and DGOD. Similar codes for metadata are written in reverse: TOGD and DOGD respectively.
Processing
The systems which are used to process the data from DANCINGOASIS are listed in the "Top 5 Tech" section of the SSO chart. Of the four most important systems, three are used for processing internet data: XKEYSCORE (42 bln.), TURMOIL (23 bln.) and FALLOUT (12 bln.), with LOPERS (41 bln.) being a system for processing data derived from telephone networks.
This means that there are two options regarding what kind of data are collected under the DANCINGOASIS program:
- Either 100% derived from the internet and then being processed by a combination of the XKEYSCORE, TURMOIL and FALLOUT systems;
- Or a mix of internet and telephony data, which are processed partly by the internet processing systems and partly by LOPERS.
Clarity about this can only be provided by the as yet unpublished BOUNDLESSINFORMANT chart for DANCINGOASIS specifically, but the fact that data from this collection facility end up in two separate databases (see below) could indicate that one receives internet data and the other telephone communications.
Data filtering
The cable intercepted by DANCINGOASIS transfers 25 petabytes of communications data each day. Between 3 and 6 petabytes of that are scanned by NSA computers. These systems search the data for keywords that are determined by NSA's targeting offices and are derived from the topics in the Strategic Mission List (pdf) and the National Intelligence Priorities Framework, as approved by the White House.
Based upon an unpublished NSA presentation from March 22, 2013 titled "Cyber Threats and Special Sources Operations", the Spiegel book says that between 10 and 40 percent of the data (both content and metadata) collected under the DANCINGOASIS program are filtered out and stored in two databases: 43 gigabytes in one and 132 gigabytes in another database, every day.*
This means that 175 gigabytes of data are stored daily, which is about 0.0007% of the 25 petabytes transmitted by the cable. The 175 gigabytes add up to 5.2 terabytes a month and 63 terabytes a year. Whether the 57.7 billion records collected under DANCINGOASIS also equal 5.2 terabytes of digital storage space seems a bit questionable, however.
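The storage arithmetic can be checked with the short Python sketch below. It assumes decimal units (1 petabyte = 10^15 bytes), a 30-day month and a year of twelve such months; none of these conventions is stated in the source, and the variable names are ours. Expressed as a fraction, the stored share is about 0.000007, which corresponds to roughly 0.0007 percent.

```python
# Decimal (SI) units are an assumption; the source does not say which convention it uses.
GB = 10**9
TB = 10**12
PB = 10**15

stored_per_day = (43 + 132) * GB  # the two databases combined
cable_per_day = 25 * PB           # traffic carried by the cable each day

fraction_stored = stored_per_day / cable_per_day
percent_stored = 100 * fraction_stored

stored_per_month_tb = stored_per_day * 30 / TB  # assumes a 30-day month
stored_per_year_tb = stored_per_month_tb * 12   # assumes twelve such months

# Hypothetical average: if one month of storage held nothing but the
# 57.7 billion records counted by BOUNDLESSINFORMANT.
records_per_month = 57.7e9
bytes_per_record = stored_per_day * 30 / records_per_month

print(f"Stored share of cable traffic: {percent_stored:.4f}%")
print(f"Stored per month: {stored_per_month_tb:.2f} TB")
print(f"Stored per year: {stored_per_year_tb:.1f} TB")
print(f"Average bytes per record: {bytes_per_record:.0f}")
```

Under these assumptions the average comes out at roughly 90 bytes per record, and that would also have to cover any stored content. This illustrates why equating the record count with the storage figure seems doubtful.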
The book doesn't provide the names of the databases, so they are probably not the known ones like PINWALE, MAINWAY and MARINA. The data from DANCINGOASIS might therefore be stored in NSA's new cloud systems, the names of which NSA likes to keep secret for one reason or another.
Because of similar capacity limits across a range of collection programs, the NSA is leaping forward with cloud-based collection systems and a huge new "mission data repository" in Utah.
Metadata processing
According to the excerpt of an NSA document published in the book of Glenn Greenwald, metadata records from DANCINGOASIS are processed by a system codenamed SHELLTRUMPET. This system "began as a near-real time metadata analyser in December 2007 for a CLASSIC collection system":
On December 21, 2012 SHELLTRUMPET had processed its 1 trillionth metadata record. Almost half of this volume was processed during 2012, and half of that volume, so one quarter of a trillion (250 billion) metadata records, came from DANCINGOASIS.*
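The breakdown of the SHELLTRUMPET figures can be written out as a small Python sketch. Because the document says "almost half", the numbers below are upper bounds; the even spread over twelve months is purely our own assumption for illustration, as are the variable names.

```python
TOTAL_RECORDS = 1_000_000_000_000  # one trillion records processed by December 21, 2012

# "Almost half" was processed during 2012, so half is an upper bound.
processed_2012_upper = TOTAL_RECORDS // 2

# Half of the 2012 volume came from DANCINGOASIS.
from_dancingoasis_upper = processed_2012_upper // 2

share_of_total = from_dancingoasis_upper / TOTAL_RECORDS

# Assumption: the 2012 volume was spread evenly over twelve months.
per_month_upper = from_dancingoasis_upper / 12

print(f"Processed in 2012 (at most): {processed_2012_upper / 1e9:.0f} billion")
print(f"From DANCINGOASIS (at most): {from_dancingoasis_upper / 1e9:.0f} billion")
print(f"Share of all records: {share_of_total:.0%}")
print(f"Monthly average (at most): {per_month_upper / 1e9:.1f} billion")
```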
Reporting
A system that collects a huge amount of data does not automatically yield a proportionally large number of intelligence reports. We can see this in a slide about results from NSA's Upstream collection during the fiscal year 2010/2011.
In the chart, US-3171, the SIGAD of DANCINGOASIS, ranks 6th with some 5452 so-called "Serialized Product Reports". Data collected under section 702 FAA authority (PRISM and the domestic Upstream cable tapping) led to almost 4 times more reports:
With a blue bar, DANCINGOASIS is listed as an "SSO Non-Corporate Program", which means the collection is done without cooperation of a commercial telecommunications company.
Update:
On June 4, 2015, the New York Times published a slide about SSO cyber operations, which indicates that DANCINGOASIS became operational in June 2011. The remark "Need I see more?" seems to confirm the importance of this very large cable access program: