All posts by Narseo Vallina-Rodriguez

The bandwidth costs of third-party tracking services on mobile apps

Mobile operators offer data plans with volume caps to control network congestion and also for profit. The presence of such data caps forces many mobile users to control and limit their browsing and mobile habits to make sure that they do not exceed their data allowance.

In October 2015, a New York Times article measured the bandwidth and performance costs of online advertising on 50 websites. The article revealed that “more than half of all data in those websites came from ads and other content filtered by ad blockers”. For a mobile subscriber, it implies that a large fraction of their traffic is generated simply for tracking them.

What about mobile apps? How much data does an Android app generate for tracking users and app activity, or for displaying ads?

In our previous blogpost, we talked about how Haystack can help you to identify the presence of third-party trackers on your mobile apps. The large number of trackers that we have found on our mobile apps has motivated us to measure the portion of traffic that each app generates for tracking and advertising purposes.

In total, we analyzed more than 1,700 mobile apps. The figure below shows the distribution of the percentage of each app’s traffic going to such third parties.

Histogram tracking traffic for tracking purposes per app

The results may vary depending on how our ICSI Haystack users interact with their apps. However, the share of an app’s traffic dedicated to tracking is much higher than we initially expected: on average, 24% of an app’s traffic is associated with third-party tracking and advertising services. This network activity affects not only users’ data plans but also the battery life of their devices.

If we look in detail at the distribution, we can see that 40% of the apps dedicate at least 10% of their traffic to tracking and advertising, while more than 10% of mobile apps have at least 90% of their traffic associated with such activities. If it weren’t for user tracking and advertising, many mobile apps could operate completely offline!
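To make these percentages concrete, here is a minimal sketch of how a per-app tracking share and the "at least X%" fractions could be computed from flow records. It is not Haystack’s actual code: the `(app, domain, bytes)` record format, the `TRACKER_DOMAINS` set and the function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical tracker list; a real one would hold known ad/analytics domains.
TRACKER_DOMAINS = {"tracker.example", "ads.example"}


def tracking_share(flows):
    """Return {app: percentage of the app's bytes exchanged with tracker domains}."""
    total = defaultdict(int)
    tracked = defaultdict(int)
    for app, domain, nbytes in flows:
        total[app] += nbytes
        if domain in TRACKER_DOMAINS:
            tracked[app] += nbytes
    # Skip apps with no traffic to avoid dividing by zero.
    return {app: 100.0 * tracked[app] / total[app] for app in total if total[app] > 0}


def fraction_of_apps_at_least(shares, threshold):
    """Fraction of apps whose tracking share is >= threshold (in percent)."""
    if not shares:
        return 0.0
    return sum(1 for s in shares.values() if s >= threshold) / len(shares)


# Made-up flow records: (app, remote domain, bytes transferred).
flows = [
    ("game.example", "tracker.example", 900),
    ("game.example", "api.game.example", 100),
    ("notes.example", "sync.notes.example", 500),
]
shares = tracking_share(flows)
print(shares)
print(fraction_of_apps_at_least(shares, 10))
```

With real data, the same two functions would yield the histogram above and statements such as "40% of the apps dedicate at least 10% of their traffic to tracking".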

The table below lists some of the apps for which tracking activity and ads account for at least 98% of their total traffic. The apps listed below have user ratings higher than 3.5/5 stars, and many of them have millions of installs.

App Name | App Category | App Audience | Google Play Installs | Rating | % of traffic
Tottoko Dungeon | Role Game | Everyone | 50K-100K | 3.9 | 100%
Busuu – Easy Language Learning | Education | Everyone, 10+ | 10M-50M | 4.3 | 100%
Piano Tiles 2 (Don’t Tap…2) | Arcade | Everyone | 100M-500M | 4.7 | 99%
Drippler – Android Tips & Apps | News & Magazines | Everyone | 5M-10M | 4.5 | 99%
Tap Titans | Role Playing | Everyone, 10+ | 10M-50M | 4.7 | 99%
Headspace – meditation | Health & Fitness | Everyone | 1M-5M | 4.4 | 99%
Kung Fu Panda: BattleOfDestiny | Game | Everyone | 1M-5M | 3.7 | 99%
Darklings | Arcade | Everyone | 100K-500K | 4.0 | 99%
Cooking Fever | Arcade | Everyone | 10M-50M | 4.4 | 99%
BestFriends – Puzzle Adventure | Casual | Everyone | 10M-50M | 4.6 | 99%
A Dark Dragon AD | Role Game | Everyone, 10+ | 100K-500K | 4.3 | 99%
Square Trade | Shopping | Everyone | 10K-50K | 3.6 | 99%
Jungle Cubes | Puzzle | Everyone | 500K-1M | 4.3 | 99%

If we look at the type of apps and their targeted audience, we can see that many of them are games rated as suitable for children. These apps connect to third-party services like Facebook Graph (Facebook’s analytics and ad network), tools for user engagement and A/B testing like HelpShift or Optimizely, tools to promote app installs like Chartboost, analytics services like mobileapptracking (part of Tune), and mobile-game-specific tracking services and gaming ad networks like Unity3D.

According to EU legislation, no tracking activity should take place on apps for children without parental consent. For some of the children’s games that we’ve manually tried, we have not seen any activity or information on Google Play aiming to inform the parents (or the user) about any tracking activity. Even in the USA, the FTC recently reached a settlement of nearly 1M USD with InMobi over tracking children without parental consent. We will further investigate this topic in future blogposts.

We’re currently working to enable new features on the Haystack app that would help you to keep control of your network data consumption and privacy. In the meantime, our current user-interface reports how much of the data generated by your app goes to third-party trackers and ad networks:

ICSI Haystack profile for the Hailo app

Stay tuned!

Exposing indirect privacy leaks on mobile apps

Today, we have been informed that the ICSI Haystack Project has been awarded one of the prestigious Data Transparency Lab 2016 grants. If you are not familiar with the Data Transparency Lab efforts, the DTL is a community of technologists, researchers, policymakers and industry representatives working to advance online personal data transparency through scientific research and design. The initiative is led by Mozilla, Telefonica, and ODI.

Our DTL research proposal aims to illuminate the presence of indirect privacy leaks in mobile apps. A typical privacy-aware user checks the app’s permission list at the time of installing a new Android app from Google Play. Some users may still agree to share part of their personal information with the app developer even when they consider an app permission harmful for their privacy. However, what most users do not know is that the app developer may not be the only organization collecting their personal information.

As in the browser context, mobile apps can leak user personal information to third parties such as ad networks and analytics services without user awareness and consent. While these services are valuable to app developers, they may track users and collect a vast amount of personal information about them by piggybacking on the permissions requested by the app developer and granted by the user. Google Play does not require the app developer to inform users about the presence of tracking services in Android apps.

Mobile users, and even regulators, lack tools to understand how mobile apps operate behind the scenes and which organizations collect user data. Our research and development efforts in the ICSI Haystack project seek to illuminate this dark space with the hope of helping users to stay in control of their online privacy and of raising societal awareness.

To that end, we created an interactive map of tracking services on Android apps: the ICSI Panopticon. The image below contains a screenshot of the interactive map.

ICSI Haystack Panopticon Screenshot

The ICSI Haystack Panopticon contains records for more than 1,500 Android apps and is built upon the data collected from the users of our ICSI Haystack Privacy Monitor app. If you’re one of them, we would like to thank you for your help. If not, we strongly invite you to install the app and help us extend our catalogue of Android apps. Note that we crowdsource the data in a completely anonymized way: we do not collect any personal information about our users, as we describe in our privacy policy.

Our analysis revealed that 70% of our monitored mobile apps connect with at least one tracking service. A significant fraction of apps even connect to more than 10 tracking services simultaneously. We invite you to play with the Panopticon and identify for yourself the organizations collecting your personal information when you use a given app. As you will notice, there is a strong power-law distribution, as a few organizations dominate this ecosystem: Crashlytics (owned by Twitter), Flurry (owned by Yahoo), Google Analytics, AdJust, AppsFlyer, Mixpanel and Facebook Analytics. Interestingly, many of these services are cross-platform, so they can track you not only in your mobile apps but also in the browser.

We’re working hard to release new app features to help you to better protect your online privacy. We are taking inspiration from the Ghostery and Privacy Badger browser extensions to enable data flow blocking in an easy-to-use way. Stay tuned!

New Haystack paper version available on arxiv!

We have uploaded a new version of our Haystack paper on arxiv!

In this new version, we discuss in more detail Haystack’s goals and its ability to conduct real-world mobile measurements at scale from the device. Haystack’s design allows it to be flexible enough to measure aspects beyond privacy leaks and basic traffic characterization.

As we explain in the paper, thanks to the data provided by our first 450 users, we could already investigate the behavior of more than 1,300 mobile apps. We have identified a surprisingly high presence of third-party services on mobile apps for tracking and advertising purposes. Likewise, we also identified interesting facts about pre-installed apps, which are typically ignored by previous research studies that rely on static and dynamic analysis. Furthermore, thanks to Haystack’s data we can explore new computing paradigms like the Internet of Things in real-world settings and IoT deployments. We invite you to take a look at the paper and let us know what you think!

If you haven’t tried Haystack yet, we also invite you to install and try our Android app. The latest version allows you to identify more than 200 analytics and ad networks and estimate how much data you’ve wasted on those activities.

We’re currently working to improve Haystack’s flow re-assembly mechanism and to provide new features. For instance, we want to adapt Haystack to run performance measurements, to let you filter undesired flows (as in a privacy firewall), to detect censorship, and to enable packet capture so that you can use Haystack to get pcaps on your phone.

Finally, we’re also working on adding new features to our website. In particular, we are quite excited about the release of the Haystack Analytics services. Haystack Analytics will allow anyone to obtain information about specific mobile apps and services, so you can learn about the behavior of a given app before installing it.

As usual, we would like to thank our users. Without their participation and help, this project would not have succeeded.

Haystack: Preliminary Results

Last week, Haystack crossed the bar of 200 installs on Google Play! We want to take this opportunity to thank our users as your collaboration is extremely valuable for us in our research efforts to illuminate the fast-evolving mobile ecosystem.

If you haven’t tried Haystack yet, we would like to invite you to install it. Haystack is a tool developed by a group of academic researchers at the International Computer Science Institute and UC Berkeley in collaboration with Stony Brook University. Haystack analyzes the traffic generated by the apps running on your phone so you can understand how they communicate with online services, and whether they leak any sensitive information about you or about your device. Haystack is available for free on Google Play.

How does Haystack work?

Your phone hosts a rich array of information about you and your activities. This includes a range of identifiers that can enable sites to track you, as well as data about your device, your installed apps, your accounts, and your location.

Sometimes apps require information to provide useful functions of the app, or to adapt content to your device. For example, a Maps application of course needs to know your exact location. But in other cases, apps may collect and upload privacy-sensitive information for advertising, analytics, and tracking purposes. We consider these to be privacy-sensitive leaks but it is up to you to decide whether you want to continue using the app.

Android uses a permission model to control how applications access sensitive resources, but it does not really help you to know which organizations collect data about you or your device. Haystack aims to fill this gap.

The ICSI Haystack app helps you to identify which apps leak information about you, where your apps connect to and which protocols they use. It also informs you about the organizations collecting this information, even when the apps use encryption techniques like TLS/SSL (i.e., the technology providing the “S” for Secure in HTTPS). We accomplish this through a technique known as TLS interception. If you want to disable or enable TLS interception at any time, you can easily do it in the app settings.

Haystack requires access to a number of Android permissions to capture your app’s traffic and to identify privacy leaks. If you want to know more specific details, or if you want to know how Haystack will affect your battery life or performance, you can read our paper. We tried our best to minimize Haystack’s overhead and we will continue investigating new techniques to improve its performance.

A note about Samsung devices: We have noticed that Haystack cannot intercept app traffic on many Samsung devices. In some cases, it only intercepts DNS traffic, and in others it only works when connected over WiFi. Please accept our apologies. We are investigating the underlying problem to come up with a solution, but it is possible that this is a device or firmware limitation. Users of other Android VPN clients available in Google Play have also reported similar problems. To minimize users’ frustration, we have not included Samsung devices in the list of supported devices in our Google Play listing. Nevertheless, you can install and try Haystack using this direct link to the APK under your own responsibility.

Data Collection

Haystack is also a tool to research the mobile ecosystem at scale. The mobile ecosystem is fast-evolving and it is difficult to identify new players and services and how app developers integrate their services. It is also important to know how secure data communications are across all apps and whether developers are implementing countermeasures against potential attacks.

Your collaboration is very important to successfully achieve our research goals. However, in order to perform our studies, we need to collect certain pieces of information about how your apps behave. We do this without compromising your privacy by aggregating and anonymizing our traces. For instance, we want to know that an application leaked a unique identifier like the IMEI to a given server using TLS while the screen was off, but we do not want to know who you are, which apps you run, the value of the IMEI, your location or your IP address. All your personal data remains on your phone!
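As an illustration of the idea, the sketch below reduces a raw on-device observation to a report that says what happened without carrying the leaked value, the IP address or any user identifier. The record layout and field names are hypothetical and do not describe Haystack’s real data format.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class LeakEvent:
    """Raw observation; in this sketch it never leaves the device."""
    app: str
    leak_type: str   # kind of identifier, e.g. "IMEI"
    value: str       # the sensitive value itself
    server: str
    over_tls: bool
    screen_on: bool
    client_ip: str


# Fields describing what happened. There is deliberately no user or device ID,
# so a report cannot be linked back to the person running the app.
REPORTED_FIELDS = ("app", "leak_type", "server", "over_tls", "screen_on")


def sanitize(event):
    """Return the anonymized report for one raw leak observation."""
    raw = asdict(event)
    return {field: raw[field] for field in REPORTED_FIELDS}


event = LeakEvent(
    app="game.example",
    leak_type="IMEI",
    value="490154203237518",        # stays on the phone
    server="analytics.example",
    over_tls=True,
    screen_on=False,
    client_ip="192.0.2.10",         # stays on the phone
)
report = sanitize(event)
print(report)
```

Using an explicit allow-list of reported fields, rather than deleting known-sensitive ones, means that a field added to the raw record later is not uploaded by accident.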

Because of this data sanitization process, ICSI-UC Berkeley’s Committee for Protection of Human Subjects (IRB) has not considered this study as a human-subject study. If you have any questions about our data collection process, we strongly encourage you to send us an email to haystack-help“@”

Preliminary Results

Thanks to our first 200 users, we have identified 423 apps leaking information to 2,995 unique domains (many of them owned by the same organization, as in the case of Google Services). The figure below shows the most popular online services among our monitored apps: 9 of the 12 belong to Google.

Online services popularity

The interactive figure below illustrates the network of apps leaking sensitive information and the organizations collecting it, as seen by Haystack. The color of each edge represents whether the developer uploads the data securely with TLS by default (blue) or whether we have at least one record of an insecure flow (red). Nearly 72% of the privacy leaks happened over HTTPS/TLS. This stresses the need to intercept TLS traffic locally in order to identify those leaks. We cannot access, completely or partially, the flows of certain apps like Mega, Uber and Twitter, as they implement effective countermeasures against TLS interception.
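The edge-coloring rule can be written down in a few lines. This is only a sketch of the rule as stated in the text, applied to made-up leak records of the form `(app, service, over_tls)`; the function names are ours.

```python
def edge_colors(leaks):
    """Map each (app, service) edge to "blue" or "red".

    An edge is red as soon as one insecure flow was recorded for it,
    and blue only if every recorded flow used TLS.
    """
    colors = {}
    for app, service, over_tls in leaks:
        edge = (app, service)
        if not over_tls:
            colors[edge] = "red"             # a single insecure flow is enough
        else:
            colors.setdefault(edge, "blue")  # never overwrite an earlier "red"
    return colors


def tls_share(leaks):
    """Percentage of leak records that were sent over TLS."""
    leaks = list(leaks)
    if not leaks:
        return 0.0
    return 100.0 * sum(1 for _, _, over_tls in leaks if over_tls) / len(leaks)


leaks = [
    ("app.one", "analytics.example", True),
    ("app.one", "analytics.example", False),
    ("app.two", "analytics.example", True),
    ("app.two", "ads.example", True),
]
colors = edge_colors(leaks)
print(colors)
print(tls_share(leaks))
```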

Because of rendering and performance issues, we pruned the network. In particular, we excluded leaks caused by browser activity (e.g., Mozilla’s Firefox or Google Chrome), those caused by pre-installed Android processes, and unpopular online services reached by fewer than three applications (mainly apps reporting data to their own online infrastructure). To reduce the number of nodes, we also grouped sub-domains together.
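A rough sketch of these pruning steps follows. The order of operations (grouping sub-domains before counting apps per service), the naive "last two labels" grouping and the sample package names are assumptions made for illustration; a production version would need a public-suffix list to group domains correctly.

```python
from collections import defaultdict

# Example browser package names to exclude (illustrative list).
BROWSERS = {"org.mozilla.firefox", "com.android.chrome"}


def group_subdomains(host):
    """Naive grouping: keep only the last two labels of the hostname."""
    return ".".join(host.split(".")[-2:])


def prune(edges, preinstalled, min_apps=3):
    """Drop browser and pre-installed apps, then services reached by < min_apps apps."""
    kept = {
        (app, group_subdomains(host))
        for app, host in edges
        if app not in BROWSERS and app not in preinstalled
    }
    apps_per_service = defaultdict(set)
    for app, service in kept:
        apps_per_service[service].add(app)
    return sorted(
        (app, service)
        for app, service in kept
        if len(apps_per_service[service]) >= min_apps
    )


edges = [
    ("app.a", "graph.social.example"),
    ("app.b", "api.social.example"),
    ("app.c", "social.example"),
    ("app.a", "backend.app-a.example"),        # app talking to its own servers
    ("org.mozilla.firefox", "social.example"),
    ("com.vendor.preinstalled", "social.example"),
]
pruned = prune(edges, preinstalled={"com.vendor.preinstalled"})
print(pruned)
```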

The figure shows how applications cluster themselves around popular tracking, analytics and advertisement services, but also for social media integration as in the case of Facebook’s Graph API. Nevertheless, there is a clear power-law distribution in the number of apps using those services: 53 of the services have been reached by at least 5 apps. The most popular services in our dataset are:

  1. Facebook Graph: Facebook’s Graph allows app developers to integrate their apps with Facebook’s social graph and platform. However, Facebook’s Graph API also allows developers to monetize their apps through advertising and perform app analytics. Facebook’s services are always delivered over HTTPS and they do not seem to use specific domains for each service: all the services seem to be delivered by the same domain. This characteristic makes it harder for traditional ad blockers to block this traffic, as they need to perform TLS interception and analyze the traffic to identify the content of a given flow. We have identified more than 15 types of sensitive data leaks, with significant variation between apps, probably associated with developers’ needs. Nevertheless, Facebook’s official apps (e.g., Instagram, Facebook’s Pages, Facebook’s Messenger and Facebook App) upload the main bulk of sensitive information. The applications that integrate Facebook’s Graph API in their services generally report information like OS build ID (which can be seen as a cookie), operator name, device model and brand, with the exception of a few applications reporting unique identifiers like the IMEI or even the private IP address. We have manually checked that applications using this service can operate even on devices without Facebook’s official apps installed.
  2. Google Services: This includes more than 20 domains, covering 1) apps interacting with Google’s API, including authentication; 2) online advertisement; and 3) Google Analytics for Android. The native processes Android’s Backup Service and Google Play leak most of the device and user related information to Google’s services. The rest of the apps leak information like device model, brand and build ID.
  3. Flurry. Flurry is a popular analytics service owned by Yahoo. So far, we have identified more than 10 different types of data leaks associated with it, mostly device information (brand and model and build ID), and ISP information (MCC/MNC codes).
  4. Crashlytics. Crashlytics, now acquired by Twitter, is a mobile company building crash reporting for iOS and Android. 27 of the apps on devices running Haystack use this service.

We will investigate the content of these protocols in detail on instrumented phones under our control, so as not to cause any privacy violation to our users. We will release our results in this blog. Stay tuned!

Case Study: Unique Identifiers

Mobile devices contain many types of unique identifiers, each guaranteed to be unique among all the values used for a given class of objects, users, devices or resources. Unique identifiers are extremely useful for mobile advertisement and tracking services to link data to users across all the apps using their services. They play a role similar to that of cookies in the browser, with the advantage (for trackers) of being immutable. Below, we describe some of the unique identifiers stored on mobile phones:


  • The International Mobile Station Equipment Identity or IMEI is a unique 15 digit number that identifies every mobile phone, GSM modem or device with a built-in phone / modem. Based on this value, it is also possible to obtain some additional information about the device brand or model.
  • The International Mobile Subscriber Identity or IMSI is a 64 bit value that uniquely identifies the user of a cellular network globally. The subscriber’s phone number is another value falling in this category.
  • The Media Access Control Address or MAC Address is a unique identifier assigned to network interfaces for communications on the physical network segment (e.g., WiFi or Bluetooth).
  • Like any product, mobile phones also contain unique identifiers that are assigned incrementally or sequentially by the manufacturer, like serial numbers. Any app can request this information programmatically.
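As a side note on how device information can be derived from an IMEI: its first 8 digits form the Type Allocation Code (TAC), which identifies the device model, and its last digit is a Luhn check digit. The sketch below validates and splits an IMEI; the sample value is a widely published test number, and the function names are ours.

```python
def luhn_valid(number):
    """Return True if the digit string passes the Luhn checksum."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, ch in enumerate(reversed(number)):
        digit = int(ch)
        if i % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def parse_imei(imei):
    """Split a 15-digit IMEI into its parts, or return None if it is invalid."""
    if len(imei) != 15 or not luhn_valid(imei):
        return None
    return {
        "tac": imei[:8],         # Type Allocation Code: identifies the model
        "serial": imei[8:14],    # per-device serial within that model
        "check_digit": imei[14],
    }


print(parse_imei("490154203237518"))
```

Looking up the TAC in an allocation database is what reveals the brand and model; that lookup is not shown here.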

An app developer must request the permission READ_PHONE_STATE, which Android’s documentation lists among the dangerous permissions, to read device identifiers such as the IMEI, phone number and IMSI. However, there is no permission to access other unique identifiers like the serial number and the MAC address. In this case, the app developer just needs to access the information provided by the getprop command, which exposes a vast number of system properties and configurations managed by the system. This is an anonymized version of a real hexadecimal serial number: [ro.serialno]: [0X8X9XX214084221] as reported by getprop. In practical terms, this information has the same impact on users’ privacy as the other values. Haystack reads and parses these properties and searches for their content in users’ traffic. This allows us to investigate how applications access and leak any unique identifier, which organizations collect them, and whether they are encrypted in transit. The histogram below shows the number of apps leaking those values and whether they use encryption to upload this sensitive data.

Histogram of Apps per Leak Type and Encryption
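The getprop-based matching described above can be sketched as follows. This is a simplified illustration, not Haystack’s implementation: it parses text in the `[key]: [value]` format shown earlier and looks for verbatim occurrences of the values in a payload, so hashed or otherwise encoded identifiers would be missed. The sample output, the payload and the function names are made up.

```python
import re

# One getprop line looks like: [ro.serialno]: [0X8X9XX214084221]
GETPROP_LINE = re.compile(r"^\[(?P<key>[^\]]+)\]: \[(?P<value>.*)\]$")


def parse_getprop(output):
    """Parse getprop-style text into a {property: value} dictionary."""
    props = {}
    for line in output.splitlines():
        match = GETPROP_LINE.match(line.strip())
        if match:
            props[match.group("key")] = match.group("value")
    return props


def find_leaks(payload, identifiers):
    """Return the names of identifiers whose value appears verbatim in payload."""
    return sorted(
        name
        for name, value in identifiers.items()
        if value and value.encode() in payload   # skip empty properties
    )


sample_output = """\
[ro.serialno]: [0X8X9XX214084221]
[ro.product.model]: [ExamplePhone]
[net.hostname]: []
"""
props = parse_getprop(sample_output)
identifiers = {
    "serial": props["ro.serialno"],
    "hostname": props["net.hostname"],
}
payload = b"GET /track?sn=0X8X9XX214084221&os=android HTTP/1.1"
print(find_leaks(payload, identifiers))
```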

In addition to unique identifiers, applications can access other sensitive information such as the device hostname, or even the WiFi SSID, through the getprop command. This can be used to geo-locate the user without requiring any further permission. As in the case of unique identifiers, the following table lists the apps leaking this information and their destination as seen by Haystack. Only Adobe’s analytics service does not use encryption by default.

Identifying apps that upload those values without encryption is very important, especially in oppressive regimes. The presence of this metadata in users’ traffic makes them trackable by in-path middleboxes (some of which, such as WiFi APs, can use it for advertising purposes) and by any surveillance agency. Nonetheless, as a result of the popularity of ad blockers working at the network level and the increasing user concerns about mass surveillance, the number of online trackers, advertising and analytics services using TLS is increasing. In total, 69 apps leak some type of unique identifier, and these identifiers are collected by more than 150 domains. Those values allow online analytics and advertising services to identify users across the metadata provided by all the apps using their services. The figure below shows which apps leak any unique identifier or network-related information and the organizations (domains) collecting them. As in the interactive graph, we highlight in red the applications leaking any unique identifier without encryption. An interesting case is the app PayTM, an e-commerce app for online payments that leaks the IMEI value over plain HTTP.

An interesting observation that benefits from Haystack’s crowd-sourcing nature is that application developers use third-party libraries in different ways: not all apps leak the same type of information to those online services. Finally, many apps upload this information solely to their own servers.

The use of this information can be legitimate and very useful to prevent device or identity theft, and fraud. That could apply to applications like Cerberus anti-theft. However, even if legit, it is not a good practice to request this information without user awareness by using the getprop command.

You can access the Google Play profile for each one of the apps by replacing the corresponding APP_ID (listed on the Y-axis of the Figures) in the following URL:

Apps leaking the IMEI:

Apps leaking IMEI

Apps leaking the IMSI number:

Apps leaking IMSI

Apps leaking the serial number:

Apps leaking device's serial number

Apps leaking the MAC address:

Apps leaking device's MAC address

Future Work and Improvements

Haystack is still in its early stages and the results that we have presented in this post are just a first analysis of the data we have collected so far. We hope that you’ve found it interesting.

We are confident that as more users try the tool, we will get a better idea of how the different stakeholders of the mobile ecosystem leverage users’ metadata, while also increasing the range of leaks that we can detect.

As we can allocate resources to the project, we also plan to make our dataset publicly available so any user can search for information about a given application or online service. We also want to extend the tool to become a platform for general mobile measurement: from performance measurements to security measurements. Nevertheless, our priority is developing and maintaining a tool for the mobile user so users can stay in control of their apps, their traffic, and their privacy.

Once more, we would like to invite you to try the tool if you haven’t done it yet and thank those who have already installed the app. As usual, we love to hear your feedback as this is the best way to improve the tool. You can send us your comments by email at haystack-help “@” or through our Twitter account. Likewise, we will be happy to answer any question or concern that you may have about the app or the data collection process.

Many thanks!

The ICSI Haystack Team