On October 6, 2013, the Brazilian television program
Fantástico revealed the existance of a software program called OLYMPIA. In this case, the program was used by the
Communications Security Establishment Canada (CSEC) to map the telephone and computer connections of the Brazilian
Ministry of Mines and Energy (MME).
OLYMPIA is a sophisticated software framework that combines access to a range of databases and analytic tools. It's used to discover and identify the telephone and computer infrastructure used by potential targets. This information can then be used for setting up tapping, bugging and/or hacking operations. OLYMPIA itself does not collect any actual content of communications.
In this article we take a close look at the OLYMPIA tool, based on the powerpoint presentation that was first shown on Brazilian television on October 6, 2013. On November 30, the Canadian newspaper The Globe and Mail published most of the slides on its website. Here, all available slides are pulled together, including one that had to be reconstructed from the video footage (click the slides to enlarge them).
The OLYMPIA presentation was dissected and analysed in depth by a reader of this weblog, who wants to stay anonymous, but kindly allowed me to publish his interpretation here. I did some editing to make his text fit the format of this weblog.
For some readers these explanations may be too complex and detailed, but for those who are interested, they provide a unique look at this part of the signals intelligence tradecraft. We can assume that similar tools are used by NSA, GCHQ and other agencies.
The OLYMPIA presentation was held in June 2012 during the "SD Conference", where SD stands for SIGINT Development - an intelligence term for testing and creating new ways to collect signals intelligence information. According to Fantástico this is an annual conference for members of the Five Eyes partnership, which consists of the United States, United Kingdom, Canada, Australia and New Zealand.
This case study was presented by what seems to be someone from the Advanced Network Tradecraft unit of CSEC, probably because "one of the things Canada does very well is analysis" - according to NSA historian Matthew Aid. (or could Advanced Network Tradecraft be the
ANT unit of NSA's Tailored Access Operations (TAO) division?)
This slide gives an overview of the Olympia interface which can present all sorts of different types of information at the same time and probably can be customized by the user. Right in the middle, probably just to have something graphical amidst all the tables, there is a map, showing the central part of Brasil, with a purple dot marking the capital Brasilia. It's not a Google map, because that would have replaced the jagged coastline with bathyometeric shaded relief and would look much nicer than this geolocation satellite view.
This slide shows the same image of the Olympia interface as in the previous slide, but this time with a pop-up menu open. The list shows eight previously known NSA tools and databases, a GCHQ tool, commercial software, and software tools developed by CSEC staff which are recognizable by their classical Greek names. Arranged in alphabetical order, the tools and databases listed in the pop-up menu and in another list from the interface are:
ATHENA - Ports Information (CSEC)
ATLAS - Geolocation and Network Information (CSEC)
BLACKPEARL - Survey Information (NSA)
COEUS - WHOIS Information (CSEC)
DANAUS - Reverse DNS (CSEC)
EONBLUE - Decoding Hostnames? (commercial)
EVILOLIVE - Geolocation (NSA)
FRIARTUCK - VPN Events
GCHQ Geofusion - Geolocation (GCHQ)
HYPERION - IP-IP Communication Summaries (CSEC)
LEVITATE - FFU (FireFox User??) Events
MARINA - TDI Online Events (NSA)
MASTERSHAKE - VSAT Terminals (NSA)
OCTSKYWARD - GSM Cells (NSA)
PACKAGEDGOODS/ARK - Traceroutes (NSA)
PEITHO - TDI Online Events (CSEC)
PEPPERBOX - Targeting Requests
PROMETHEUS - CNO Event Summaries (CSEC)
QUOVA - Anonymizers, Geolocation Map (commercial)
SEDB - FASCIA PCS and PSTN Events (NSA)
SLINGSHOT - End Product Reports
STALKER - Web Forum Events
STARSEARCH - Target Knowledge
STRATOS - GPRS Events (CSEC)
TIDALSURGE - Router Configs
TOYGRIPPE - VPN Detailed Events (NSA)
TRITON - TOR Nodes (CSEC)
TWINSERPENT - Phone Book
Only a handful of Olympia's tools do all the heavy lifting in the slide algorithms. The rest get passing mention in pull-down menus. Thus the presentation provides only a glimpse of Olympia's capabilities - we see for example TRITON for attacking the TOR network and TOYGRIPPE and FRIARTUCK for attacking VPN (virtual private networks) but not examples of their actual use.
The tools of Olympia represents a very large team effort at CSEC over several years with sheltering nearly all its database processing resources under the Olympia umbrella. The shelf life of the Olympia environment may be longer than its tools.
Of the 13 tools that are used, their use in the following algorithmic slides is almost entirely restricted to ATLAS, DANAUS, HYPERION, PEITHO and HANDSET. The reporting tool of course finishes every slide but two others, EONBLUE and QUOVO, only appear obliquely as report sources.
This slide presents a simple algorithm by using drag 'n' drop icons linked by arrows to specify its operations step-by-step. It draws on entries data records already stored in NSA's huge telephony database MAINWAY of call metadata. As NSA does the hoovering down in Brazil, the slide does not build use fresh Canadian surveillance by intercepts or insertion of malware on Brazilian cell phones or servers - that comes later in partnership with NSA's
Tailored Access Operations (TAO) as warranted and informed by initial results obtained here.
Olympia is thus modular software that allows a mid-level analyst (who cannot write computer code) to specify and test advanced NoQYL database queries from within an intuitive visual environment. It provides an intuitive graphical interface allowing to assembly some 40 component tools into a flexible fit-for-purpose logic pipeline by simple drag and drop of icons. Such pipe-and-flow visual programming environments have a rich history – they match how Unix developers can quickly put together complex processes from the simple ones provided by the operating system.
Should an analyst drag one of the widgets into the design, a form window will pop up asking for parameters to be supplied. After stepping through the algorithm to fill in various pop-up forms that address database housekeeping issues, Olympia can then button up (compile) the tested product into a new icon that the next analyst can use as a trusted component for an even more complex investigative process. This allows analysts to conduct sophisticated target-development with minimal additional training.
With a database like MARINA consisting of trillions of rows (records) and 13 columns (fields), it is very easy to pose a query (play a design) that, after hours of delay, returns way too much data, or submit a query so complex or boolean-illogical that it freeezes NSA's server. To prevent this, it would make sense to have expert analysts work out main designs once and for all. Low-level analysts then just enter specific parameter ranges in the forms, but this of course would undercut the whole modular design power of Olympia.
So the whole process can be buttoned up, enabling one-button automation from a few business cards to the best phones to turn into meeting listening devices. While Olympia is a MySQL query builder, it does much more than that, notably advanced post-processing analytics of query results (which amount to a derived special-purpose database or QFD in NSA-speak: Question-Focused Dataset) resulting in convenient output to CSEC's reporting tool Tradecraft Navigator.
In the slide we see the following process:
-1- The process begins with a 'TC Init' widget that initializes processes Olympia needs to run. That may include starting up software, locating Five Eyes network resources, and verifying security authorizations for the analyst's 'thin client' interface to Olympia and NSA's remote network databases. That is, for security purposes following the Jeffrey Delisle spy case, Canadian analysts are given desktop computers without hard drives that cannot copy files to inserted thumb drives nor write to blank CDs. TC is used later in lower case to personalize data field header names so could alternatively represent the initials of the analyst (for logging purposes).
-2- The analyst next fills in a pop-up form called 'Dynamic Configuration' to provide initial data and establish project-specific terminology. The form amounts to a small database with one record (row) for each configuration needed and 7-8 fields (columns) with the specifics: configuration name and number, initial data, default value to use if actual value is missing after enrichment, true/false option to govern whether a later filter condition is met, field names to begin with tc_ (for thin client), and field type.
Configuration here seeds the coming discovery process with the MSISDN (SIM card routing number) for nine cell phones linked to staff at Brazil's Minerals Mining Energy (MME), either from business cards acquire by Canadian diplomats and mining executives or as metadata incidentally ingested by NSA from rooftop mobile phone intercepts at the American embassy in Brasilia. Recalling that MAINWAY has many billions of records just for Brazil, a narrow date range will keep the number of records, and so the subsequent latency (processing delay), manageable
-3- The initial set of phone numbers is then greatly expanded (enriched) by contact-chaining in the huge NSA metabase MAINWAY. This process collects the MSISDN of recipients of calls from the seed numbers, and recipients of their calls (two or more hops). Some of these will be just pizza joints or calls home but others will belong to coworkers at MME.
-4- A filter, defined by another pop-up form, is then applied so MSISDN not meeting certain criteria can be discarded. Those conditions aren't spelled out here but could require 1EF (one end foreign), repeated calls to a number, the call being made during working hours, and so forth to enrich for interesting MME phone numbers.
-5- Next, the GNDB tool is applied to records passing the filter looks up the Network Registration associated to the MSISDN selector. While the ITU.E164's Global Numbering Data Base is open source, the GNDB tool here probably consults a version maintained and likely augmented by NSA under the cover name TAPERLAY. The analyst then fills out a pop-up form specifying the parameter mapping between NSA and session databases.
TAPERLAY is one of the most common skills listed in LinkedIn profiles, with one SIGINT analyst writing he "was responsible for entering numbering information for 132 countries and multiple service providers in each country by reviewing forms and reports and conferring with management." It is often used in conjunction with CHALKFUN, a NSA tool that searches the vast FASCIA database of device location information to find past or current location (notably US roaming) of mobile phones.
-6- The original phoneNumber field has now been supplemented by Last Seen (last recorded use), City and Country of initial registration, Identity (target's name), FIPS, destination number called and its fields, and others we cannot see on the alphabetical pulldown list. Here FIPS is an open source geolocation code maintained by the US government.
-7- The 'Sort' widget is then configured to re-order the records in some sensible way, say reverse chronological order and most frequent MSISDN.
-8- The records are now ready for export into Tradecraft Navigator, the Canadian reporting counterpart to the Analyst Notebook at NSA, except the algorithm won't execute as written because the slide omits the final connecting link. Tradecraft Navigator or downstream commercial software such as Palantir could then explore social network linkages and exploitable associations.
-9- Prior to writing up a final report, the analyst could return to step 7 and insert further operational icons - 29 options are shown (even with A-E and Q-Z missing from the pop-up menu).
This slide says that the presentation is a case study about how to map the target's communication infrastructure when there's only very little information to start with, in this case:
- One known e-mail domain: @mme.gov.br
- Nine known phone numbers
- Very little data collected earlier
Starting with the single e-mail domain @mmm.gov.br for Brazil's Ministry of Mines and Energy (MME), the algorithm works out IP numbers of MME's mail and internet servers plus their network owners and backbone carriers. Note the potential target here is the entire department, not an individual.
-0- After initialization, the input - here just a single domain @mmm.gov.br but optionally a list of thousands - is put in a storage area (buffered) until its entries can be processed.
-1- The CSEC-developed tool DANAUS looks up the domain in its DNS (Domain Name System) repository. For one domain, this can easily be done by google search on the open internet but that is inefficient on a larger scale. Olympia will not only automates this process but can re-package it as a meta-tool icon that can be re-used as a component (sub-routine) of more ambitious algorithms.
-2- The DNS are next sorted by IP record type which splits them into two streams (Type A and Type MX records in DNS nomenclature). Here MX (Mail Exchanger) records specify the mail servers accepting e-mail messages on behalf of the recipient's domain. Type A (address) records specify IP numbers of the mail servers sending email from this domain.
-3a- The MX fork of the diagram filters records according to analyst specifications (pop-up window not shown), changes out value names, and merges text strings with certain information (extracted by the small 'i' icon, never explained) derived from records rejected by the filter. The output to Tradecraft Navigator is a simple database called 'Mail Servers' having six fields discussed below: Response_MX, Hostname, IPv4, Source, FirstSeen and LastSeen.
-3b- The A fork is filtered differently but here rejects are discarded. A new Canadian tool icon labelled ATLAS acts on the records that have been stored in fastBuffer to look up geospatial locations of the IPs. After a sort, duplicated IP locations can be eliminated by a standard database reporting feature (break on change in geolocation field). Duplicates might arise from a single server location hosting multiple IPs or a server cluster.
-4b- Records passing another filter (e.g. geolocation Brasilia) are then sorted by IP number for orderly output to Tradecraft Navigator for report-generation. Here the resulting database 'Domain's IPs' has 9 columns (fields) for IP Range, Country, ASN, Owner, and Carrier in addition to the ones above. The Autonomous System Number (ASN) provides the officially registered IP routing prefix that uniquely identifies each network on the Internet. Here the IPv4 numbers correspond to Global Village Telecom, Embratel and Pelpro. The analyst wants to know this because some carriers sell access to NSA while others have been hacked.
From the mail server records, it turned out the Ministry only used correio.mme.gov.br and correio2.mme.gov.br for their mail servers (
correio means mail in Portuguese). Journalists have inexplicably blacked out IPv4 numbers but anyone can look up the IP address for a given domain name at WHOIS websites, or apply the COEUS widget if they work at CSEC.
The analyst has now actually determined the IP addresses, their blocks of consecutive numbers (ranges), geolocation of servers used by MME's internet services providers plus the identities of backbone carrier networks. Some 27 IPs shown associated with the domain @mme.gov.br came out of processing A type records.
Some of this is unremarkable (the hostname www.mme.gov.br is MME's public home page, ns1.mme.gov.br is just a name server) while others have undeterminable relevance (being barely legible) to commercial espionage. One of these, acessovpn.mme.gov.br (189.9.36.98) running on http port 80 with A, comes up later as a potential target for a man-on-the-side attack.
- The Source field is a bit mysterious. It takes on only two values, EONBLUE and QUOVA. These are tool icons within Olympia whose names lie outside the Greek mythology theme, suggesting software from elsewhere. The explanation: a US company named Quova provides online blocking based on geolocation of a computer's IP address, like for example blacking out URL access to a football game in the home team's city so people purchase stadium tickets. Quova was acquired in 2010 by Neustar which provides a much broader range of backbone internet registry services. EonBlue is also corporate but more obscure.
- Between them, EONBLUE and QUOVA can report on recorded activities and attributes of the IPs at Brazil's MME: the MX record of correio.mme.gov.br shows it was first seen active from 17 Jun 09 and last seen active on 15 Feb 10; similar dates for correio2.mme.gov.br active are later and don't overlap, namely 21 Jun 10 to 19 Jun 11.
- Later Olympia slides show QUOVA within a diagram, so this one should show both QUOVA and EONBLUE but does neither. QUOVA concerns itself with IP ranges, IP geolocation, and anonymizers (proxy servers relaying on a user's behalf, hiding identifying information), yet ATLAS provided IP geolocation in later slides and HYPERION and PEITHO the IP proxies. So it must be that QUOVA add value to the in-house DNS lookup tool DANAUS.
This slide shows how the analyst can identify a proxy server at the Ministry of Mines and Energy based on its observed behavior. It's not clear whether a discovered proxy server has been identified for certain, or that is only the strongest candidate seen, nor whether the full set of MME proxy servers have been located or just one of several. However, this is the most promising site for defeat of SSL by a man-on-the-side attack to intercept of transiting documents before they can be encrypted.
-1- After initialization, the Dynamic Configuration for the IPs of MME determined above is set with three lines: high, low, high - low +1 = range for each block. Here a reverse proxy server (firewall surrogate) often holds the first number of the range block and sits in front of a local network of other computers utilizing the rest of the range block as their addresses. Those other IPs don't show up in metabases because the URL requested by an outside visitor passes through the proxy on its way to the server (that actually can fulfill the request) is returned as if it came from the proxy server.
-2- The initial data is split at an enhancement fork which is not described further. Buffers should have been created for two subsequent tools PEITHO and HYPERION because they are sent large files (as indicated by the little 2-page icon on the connecting line). Those icons are missing from the algorithm, breaking it. Both PEITHO and HYPERION also need demultiplexing as followup but the De-Mux icons (the all-purpose dummy widget) are also missing from the diagram.
Recall many different ongoing processes on a given server are sending (and receiving data) simultaneously using the same Internet Protocol software. To accomplish this, packets of different types are intermingled ('multiplexed') in the exit stream. As the stream of packets is received, it is sorted out by type (demultiplexed) and passed to appropriate application on the receiving client.
-3a- PEITHO specializes in "TDI events" and has the same iconography as MARINA, tinted blue instead of pink. A menu in another slide ties MARINA to these same mysterious TDI events. MARINA is known to be a vast NSA metabase of internet metadata. An online LinkedIn profile speaks of having "used MARINA as a raw SIGINT data viewer for detection and analysis of priority targets and as a tracking and pattern-of-life tool."
PEITHO can thus be presumed very similar to MARINA, probably a refined subset of it adapted to dissecting out the TCP/IP connection metadata needed here, in particular recognizing and compiling the exchange of SSL certificates that are the hallmark of a secure (https) site. In one scenario, an off-site MME staffer uploading oil lease data points a web browser at the MME server that will host the documents, which sits within a LAN (local area network) behind a proxy server running port 443 for https.
After exchange of SSL certificates, the content can be sent over the internet encrypted rather than as plain text, and will decrypted at the MME repository. NSA data trawling - while not specifically seeking them out - intercepts these exchanges and stores them as a Sigint record subset in MARINA. PEITHO extracts these for the specified IP address ranges. This has nothing to do with defeating SSL - that comes later.
PEITHO can only provide half of a full TCP/IP 4-tuple (the output of this algorithm), namely the connection pairs with mentioning MME and server port numbers. This is done by filtering records in PEITHO high and low IP values provided by the initial configuration file, partitioning it into passing and not-passing. Values from both are renamed and retained in output because they define IP blocks.
-3b- Meanwhile, HYPERION works in parallel to PEITHO to provide IP to IP communication summaries, how data flows in and out of MME servers and their IP range blocks, in response to remote IP requests. This data too undergoes similar filtering and re-mapping of value names and formats, again with ultimate retention of both streams as the entity_IP and remote_IP components of the TCP/IP 4-tuple.
-4- The four fields of a TCP/IP 4-tuple are called entity_IP, remote_IP, remote_port, entity_port and will appear as a small table on the proxy output page. They are obtained by merger of the PEITHO 2-tuple with that of HYPERION.
-5- At this point, only https (port 443) and http (port 80) metadata remains as remote_port values. The latter is discarded on the basis of its port value under the assumption that high-value data will be encrypted in transit by a secure socket layer (SSL) using port 443. Note email servers use port 25 - that will show up in the next slide in the context of correio.mme.gov.br.
On the results page provided by Tradecraft Navigator, only the two port columns are visible from the original socket pair 4-tuple. Ports are described by an esoteric compressed four-field format such as 6:443:TS(1) where the second element is the actual port number.
Here every port entry starts with 6: (making it uninformative) followed by 443 in the case of a remote https port, respectively high and variable (ephemeral) port numbers in the case of the entity_port column. The port description is then completed by a cryptic digraph drawn from TS, TC, FS, FC and a small qualifying number in parentheses.
It's not clear whether any more than just the straight port number needed to be retained here to substantiated a discovered proxy. Curiously, Olympia contains a distinct tool called ATHENA specializes in port information but it is not applied in this algorithm or any of the other slides.
The bottom line here is the analyst seems to have identified MME's proxy server and so a line of attack to be described later. That is of interest because closely held documents (like providing extents of offshore oil reserves or assay grade of mineral deposits being auctioned off) would be sent through this server as a measure to protect them from theft.
This slide presents a more complicated diagram of how an analyst can discover IP addresses the target, in the case the Brazilian MME, communicates with. This information can later be used to intercept these communications links.
-1- This starts with DNS lookup of the hostnames (eg correio.mme.gov.br). That process can give duplicates and other records that are empty with respect to fields of interest. These are discarded.
-2- After appropriate menu enrichments have expanded out from the initial seeds, PEITHO and HYPERION act again in parallel to reconstruct the TCP 4-tuples (or socket pairs). The stream of internet packets sent out by a given server are a mix of packets from whatever processes are running, for example http, https, ftp, smtp and telnet on the TCP side and dns, dhcp, tftp, snmp, rip, voip via UDP.
-3- As only http and https are of interest here, the other packets are discarded via the De-Mux widget. Note the packets are not really multiplexed in the traditional sense used in signal electronics but remain discrete and merely alternate in the packet stream connecting server to client. De-multiplexing in this context simply means separating the packets as they come along, retaining only the subset of interest.
-4 - Not everything is of interest here, so the 'select values to carry' widget is necessary to whittle down the fields retained. Since TCP processes are bi-directional, with some of the packets coming from the server and others heading to the server, it's necessary to flip the latter set so that FROM always goes with the MME server and TO goes with IP addresses it communicates with. The two streams are then sorted by IP contacted which allows them to merge coherently to the 4-tuples described before.
-5a- The results are duplicated and split with one fork - after a sort and break-on-same field value reduction - sent to Tradecraft Navigator as a summary of the number of times each IP pair has connected, with most frequent presumably on top. No data page is provided in the slides.
-5b- The other duplicate is sorted so that each client is represented just once for geolocation lookup by ATLAS. That needs another version of de-multiplexing, followed by discard of empty rows. ATLAS is mentioned in three slides; from those annotations, it has to do with geolocation of network information and is filterable by date and IP range.
-6- The output to Tradecraft Navigator is sorted by ASN (Autonomous System Number, the unique identifier for an ISP network). The internet had some 42,000 unique autonomous networks in the routing system at the beginning of 2013; ten distinct ASN networks that MME connects with are discovered here. These include ASNs 6453 and 32613 in Canada, 16322 for Iran, 25019 and two others for Saudi Arabia, plus inexplicable IPs in Eritrea, Jordan and Thailand. ASN lookup is readily available and it provides country, date of registration, registrar, and owner name.
The data page is quite instructive. It shows the silliness of newspaper redactions: Fantástico/Greenwald scrubbed out all tool annotations on the algorithm and blocked columns 2, 4, 5, and 8 in the output whereas the Globe & Mail showed the whole algorithm legibly and redacted columns 2, 3 and 8.
Column 2 is merely DNS lookup, freely available on the open internet. Column 3 in the Globe & Mail can be restored using the months-earlier Fantástico publication. The IP ranges of MME's contacts in Column 8 are not too hard to get at using the initial IP contact from Fantástico as they will be a block extending the last 3 digits of the initial IP contact out to 255, e.g. the first row gives the range 196.200.208.114 to 196.200.208.255, all assigned to Eritrea.
Here MOEM, the Ministry of Energy and Mines in Eritrea, is located at www.moem.gov.er. While their server is not often working, the IP address there 196.200.102.242 does not correspond to any result found by the algorithm. Those IP addresses are assigned to Eritrea but do not have Hostnames and may be routers. Note that British Telecom provides the ASN network so all traffic there is routinely ingested by GCHQ and available to the Canadians. However there is no evidence from this algorithm that MME had any interest in its Eritrean counterpart MOEM.
The algorithm here re-uses tools and widgets seen before with very similar logic: previously determined hostnames associated with Brazil's MME seed the IP address look-up via 'Forward DNS' (Danaus) followed by DNI enrichment at unspecified NSA databases, the symmetric same split to PEITHO and HYPERION to collect IPs and ports, followed by filters, sorts and field renaming (no pop-up details provided) as seen in slides 2 and 4. After Atlas provides geolocation of the retained IPs (note the never-explained x5 in the upper left corner of the ATLAS icon), the fields are consolidated, with just the ones geo-located to non-Five Eyes countries retained.
It's not clear why results for the Five Eyes countries are discarded. These countries by agreement don't launch spying operations on each other; Canada could certainly launch attack on IPs on itself but that may not be within the remit of CSEC. It's hard to believe the analyst would not take a peak at friendly country IPs - perhaps these were only discarded for purposes of this presentation (at which NSA and GCHQ analysts were surely represented).
From other Snowden leaks, it's known NSA also runs its own Brazilian espionage program; if Canada installed its own man-in-the-middle malware on top of a pre-existing NSA attack, these could conceivably collide and crash the Brazilian system, or at least alert the Brazilians via degradation of network performance. For this reason, the analyst contacted TAO prior to the presentation, turning over subsequent man-in-the-middle attack details to them. TAO maintains the central malware repository and is better positioned to vett installations for redundancy and collisions.
These four output tables provide the best view to what CSEC learned about MME's vulnerabilities from applying the algorithm:
-1- The first table consists of two records for acessovpn.mme.gov.br. This Brazilian server was obtained earlier as record 5 from the slide 2 processing (which started with mme.br.gov and provided IPs and ISPs in the 'Domain's IPs Output' table). Here journalists have blacked out the target column out of internet illiteracy (they are 189.9.36.98 and 177.43.69.130) and the IP it contacts. The port numbers indicate the target server is using ephemeral ports and the contact http port 80, meaning it is not a mail server nor secure like https.
This server in Brasilia has been assigned a new database field with value Case Notation MA10099(1) here that was added by the analyst later (certainly not produced from running the algorithm). It's not clear whether this case notation is that of GSEC or joint notation with NSA's TAO.
It's instructive to look at what anyone can learn in seconds for free on the open internet -- and how this works. In the case of acessovpn.mme.gov.br, the TLD (top level domain) acessovpn is recognized by the Root Server i.root-servers.net which redirects to c.dns.br which redirects to two name servers ns1.mme.gov.br and ns1.mme.gov.br which themselves have A type records 177.43.69.148 and 189.9.36.101 so separate IP addresses both located at the same geolocation in Brazil.
-2- This pair of tables unfortunately has the headers censored. They may simply represent the two IP addresses 189.9.36.98 and 177.43.69.130. They are sorted by order of use - number IPs contacted. Thus the ASN contacted the most (26 and 15 times respectively in the time frame considered) was 18881. That indicates the IPS was Global Village Telecom, a formerly Brazilian telecom owned since 2010 by the French company Vivendi. After that, the first IP contacted ASN 7738 11 times whereas the second IP contacted ASN 26599 9 times. Farther down the list, providers in Columbia, Mexico, India and China are listed.
-3- The final result table utilizes two tools not mentioned in the script suggesting these were applied from within Tradecraft Navigator: Reverse DNS (DANAUS) and EONBLUE. The latter is a closely held corporate tool, apparently used here for decoding Hostnames behind proxies, though nothing came of it here. EONBLUE surfaced earlier in slide 2 paired with corporate tool QUOVA (that was the source of acessovpn.mme.gov.br there). The entire table refers to A type rather than MX (email servers).
This slide shows the contact chaining for Brazil's Ministry of Mines and Energy on both the internet and telephony side, mostly the latter. The process is initialized from a small plaintext file of initial selectors (CSV comma separated values, records separated by carriage returns) which is reconfigured to a standardized database format with administrative oversight (front door rules: legal and policy justifications for collection) before being passed to the thin client of the analyst. This is the only appearance of 'Justification' in the slide set.
-1- Another field is added, 'SelectorRealm'. Realm isn't explained here by a popup or sample output slide but in the MonkeyPuzzle memo it meant divisions of a large database (emailAddrm, google, msnpassport, and yahoo). Realm here might specify a subset of collection SIGADS. Thus this step is narrowing the field of inquiry by adding a realm field to the input records to restrict subsequent processing to that realm.
-2a- The records are now filtered by their DNR (telephony) selectors in an unspecified manner. The fork meeting filter conditions is expanded by DNI (internet) chaining via unspecified databases (web email contacts possibly being the realm) and using one hop (see below) for output to Tradecraft Navigator. The fork of records failing to meet filter conditions is discarded.
-2b- The other fork meeting filter conditions, after specifying date ranges etc, is sent out to be expanded DNR contacted chaining. This enrichment step is quite instructive: it involves four telephony databases (FASTBAT, DISHFIRE, FASCIA, MAINWAY). Here FASTBAT appears for the first time in Snowden document releases. It must be partially non-redundant with respect to the others or it would make no sense to include it. It is possibly a SIGAD specific to Brazil or South America, possibly CSEC collection at the Canadian Embassy in Brasilia (the other three are NSA). DISHFIRE holds SMS records (cell phone texting).
It would be amazing if this contact-chaining step did not take overnight (or at least involve long latency) - these databases contain many trillions of records and NSA could be running thousands of multi-hop contact-chaining requests simultaneously for analysts throughout Five Eyes. It's not clear whether NSA's move to the cloud will expedite such searches or break algorithms such as this for whom the haystack has gotten too large.
-3- Because of how realms, date ranges, country of call origin etc were initially specified, not all records produced by contact chaining having any data left in the fields of interest. (It is very common for some fields to be blank in database records) These empty records are discarded so they don't contribute rubbish to the output.
-4a- After renaming records for consistent output, the records are sorted by an important field (e.g. MSISDN phone number) and split, with one fork going to summary statistics (how many records had a given value for the fixed field), as seen by the capital greek letter Sigma (symbol for sum in math) in the 'Group by' icon. These are likely sorted to highest frequency order.
-4b- The other fork simply outputs all the records to Tradecraft Navigator, which may have its own social networking visualization tool or just pass it on to RENOIR. The original presentation may have contained a sample of output but if so, Greenwald may not have included it or if he did, the Globe and Mail didn't publish it.
In this important Olympia algorithm slide, CSEC leverages an initially modest collection of 9 cell phone call records (called DNR selectors) to successively recover the three identification numbers characterizing a cell phone, which in turn lead the analyst to identification of two obsolete handset models (Nokia 3120c-1c and Motorola MURQ7) owned by top MME staffers at one time. The handset models might next be checked against NSA's collection of cell phone malware at TAO or NAC to see if existing tools could hack the phones and turn them into surveillance devices.
A Snowden document disclosed earlier revealed the NSA asking State Department to pass along all cell phone numbers they had been given in the course of normal high level contacts with foreign counterparts. Thus numbers turned in by the American Embassy representatives in Brazil with day-to-day dealings with MME were ingested into an NSA database to which Canada had ready access to. These 9 selectors probably have originated by this route.
What all can be deciphered from this slide?
-1- The overall logic flow is very clear: start from the 9 DNR call record seeds, determine the MSISDN number of the two cell phones, with that find the IMSI, from that the IMEI, and finally the handset model. This is far from trivial due to the properties of cell phone numbers (see below) and devious manufacturing practices in countries such as China. Unlike in previous slides (where anyone online can do reverse DNS lookup in seconds), cell phone owners cannot follow CSEC's logic flow even for their own phone.
-2- The three ellipses show a practically identical logic flow. Even though the tool and widget logos are barely legible, they are evidently the same. In fact, the ellipse processes make very little use of high-powered Olympia tools. The icons primarily represent housekeeping widgets (filter, dummy, rename, sort, delete, etc) that are useful but don't provide enough muscle to do more than shuffle record formats. The real work is done almost entirely by the large outlined-text H icon, not named in the redacted slide or seen elsewhere in menus or other algorithms. It will be called H for HANDSET here.
-3- The output (smaller orange rectangle on far right holding the Tradecraft Navigator icon) is key to understanding the steps of the algorithm. The output is provided for us below the schematic in the form of a small database with 8 fields and two records (the upper dark blue line is highlighted). Although it is highly unlikely these phones are still in use, the MSISDN numbers providing the original input are blued-out as are the IMSI and IMEI. Interestingly, their field names include the work 'correlation' suggesting that they cannot be unambiguously determined but are instead inferred from associations. The Motorola model is more specifically the MURQ7-3334411C11.
-4- The last column TOPI (Target Office of Primary Interest) here takes on the value CSEC, suggesting it is Five Eyes terminology. It's not clear why TOPI needs to be included as a database field. Perhaps adding MME to the NSA's target database - where priority, legal authority, resources needed and operational risk are reviewed - requires tracking of the originating partner agency. Since Canada lacks the malware and insert capabilities of NSA, Brazil's MME must go in the queue to compete with many other projects in the works.
-5- The output line 'Bands Supported by IMEI' can be read well enough that google search can be used to correct any letters mis-read initially. The result provides a look-up of the band wavelengths that the cell phone can use - that might be useful down the roar for DRTBOX interception - and the various communication protocols, like GSM, WCDMA, FDD, HSUPA and HSDPA.
-6- To understand the main algorithm flow, it is necessary to delve into the meaning of the MSISDN, IMSI and IMEI, the three main numbers associated with a cell phone. While that seems straightforward, nominal explanations have to be corrected for online tools that make end runs around official protocols. Cell phones are commonly lost, stolen, re-sold, unlocked, unblocked, registered in one country but used in another, SIM cards replaced, chip sets re-soldered and so on. And that can take place on phones whose manufacturers violated all the rules for unique serial numbers, billing information and so forth.
MSISDN (Mobile Subscriber ISDN Number) is just the ordinary telephone number of a mobile cell that would be on a business card. CSEC may have asked their Brazilian embassy to scan business cards of high level MME staff acquired in the course of ordinary interaction. These selectors could account for the 9 DNR records mentioned here as initializers.
-7- Due to the blurred slide and erased annotations, we cannot follow exactly how CSEC get from the MSISDN to the IMSI to IMEI to the handset model. This cannot be straightforward because the headers indicate correlation (possibly via different databases that share time of call) rather than a determinative algorithm.
In the CO-TRAVELER cloud analytics document, we see two years later that NSA cannot routinely obtain either the MSISDN or IMEI starting from the IMSI in the SEDB Tower QFD summary database. Thus this slide is in some ways the most interesting of all, more the pity that it was so poorly disclosed.
This slide provides a summary showing how all the information gathered can be used for BPoA (Backdoor Point of Access?) leading to further actions through:
- CNE (Computer Network Exploitation, such as cookie-replay, man-on-the-side attacks, CDR, etc.)
- Passive tasking (Upstream collection through backbone cable splitting and filtering, router intercept or telecom carrier cooperation)
- HUMINT-enabled (Human Intelligence, like information derived from voluntary, paid or bribed informants)
It's not clear whether CSEC could take things only so far and then NSA and GCHQ had to step in to aid in an actual tapping, bugging or hacking operation.
This slide is reconstructed from the video footage and shows a diagram containing all the telephone and internet connections discovered in the OLYMPIA case study. At the left side of the slide there are the telephone connections and at the right side the internet links.
It's interesting to see that in this diagram there are also a number of
SIGADs, which are codes designating interception facilities. It's not really clear whether they were used to collect the metadata used for the chaining by the OLYMPIA tools, or whether they were eventually used to conduct interception of content on these communication links.
At the telephony side we see DS-800 as the facility for phone lines between the Brazilian ministry and numbers in Equador and Venezuela. Telephone communications to some other countries are monitored by facilities designated US-3294 and US-966V.
Internet traffic between IP addresses from Global Village Telecom and internet providers in Africa, the Middle East and Canada are also monitored by DS-800. We can also see that for internet traffic to India there's a facility designated DS-200 (maybe because GCHQ has good access to India?).
> See also:
What are SIGADs starting with DS for?
This slide seems to be the final one of the OLYMPIA case study presentation. The analyst writes that he identified mail servers, which meanwhile have been targeted by means of passive collection. That means by tapping the traffic from internet backbone cables. Analysts have been assessing the value of these e-mail data.
The analyst also says that he is working with NSA's TAO division "to further examine the possibility for a Man on the Side operation". Here he's evidently referring to acessovpn.mme.gov.br. Based on the network information gathered, the Network Analysis Centre (NAC) of the British signals intelligence agency GCHQ has started "a BPoA analysis on the MME".
This shows that the OLYMPIA presentation was not just a software tutorial or an example of coding. The results prove CSEC actually ran this exercise against the Brazilian Ministry of Mines and Energy and got some real results: information about their telephone and internet connections, although probably by far not complete.
As OLYMPIA is target-development software, this tool didn't gather any content of phone calls or e-mail messages, but this last slide tells us that as a result of the OLYMPIA effort, at least the e-mail of the Brazilian ministry became subject of an actual collection operation.
> See also:
An NSA eavesdropping case study
> See also:
About the legality of the NSA's testing and SIGINT Development projects
> See also on Lux ex Umbra:
Analyzing the "airport wi-fi" map
Links and Sources
- Vice.com: How does CSEC work with the world's most connected telecom company?
- Theoreti.ca: Interpreting the CSEC Presentation: Watch Out Olympians in the House!
- TheGlobeAndMail.com: Slides reveal Canada’s powerful espionage tool
- Globo.com: American and Canadian Spies target Brazilian Energy and Mining Ministry
- Anonymous: Total tear-down of Canada's Olympia spyware (pdf)