Looking at NTP usage by Android apps

A recent post by Dan Drown in the official NTP community forum reported that NTP IPv4 traffic has increased significantly since December 2016, to between 5 and 20 times previous traffic levels. Unfortunately, this unusually high load directly affects NTP server performance.

Further investigations by Dan Drown and other researchers revealed that the traffic originated from mobile devices. More detailed analysis that leveraged on-device packet captures allowed them to better identify the actual origin: Snapchat for iOS.

Those findings motivated us to analyse whether any Android application is also responsible for NTP activity and how they use this critical network protocol. The Lumen Privacy Monitor offers us a unique vantage point to capture and analyse accurate traffic traces collected by crowd-sourcing mechanisms in user space. We have access to anonymised network flows generated by more than 3000 mobile apps with real user-stimuli.

According to our records, 84 applications use NTP. These include popular applications with more than 100M installs, such as WhatsApp, games like Subway Surfers, audio apps like Shazam, and tools like Battery Doctor and OfficeSuite + PDF Editor. We provide the full list of applications and their number of installs, according to Google Play, at the end of this post.

If we classify the applications by category, using the public information available on Google Play, we identify the 19 application categories listed in the table below.

| Category | Number of apps |
|---|---|
| preInstalled | 30 |
| Educational | 16 |
| Tools | 8 |
| Communication | 6 |
| Travel & Local | 3 |
| Video Players & Editors | 2 |
| Lifestyle | 2 |
| Business | 2 |
| Social | 2 |
| Arcade | 2 |
| Food & Drink | 2 |
| Productivity | 2 |
| Shopping | 1 |
| Entertainment | 1 |
| Puzzle | 1 |
| Racing | 1 |
| Casual | 1 |
| Sports | 1 |
| Maps & Navigation | 1 |

Note that the preInstalled category covers both pre-installed services and applications as well as apps not available in Google Play (either because they were removed or because they are available on alternative app stores). Excluding those, most apps relying on NTP fall in the “Educational”, “Tools” and “Communication” categories.

WHAT NTP POOLS DO MOBILE ANDROID APPS USE?

Most applications use the NTP server infrastructure provided by ntp.org. Some exceptions are applications using NIST’s time servers and Qualcomm’s server time.izatcloud.net, possibly associated with A-GPS services.

We validated with active measurements, from a machine under our control in Spain, that those domains host actual NTP servers. All machines replied to our NTP queries except s2m.time.edu.cn, which is under the control of the “China education and research” authority. We are unsure whether this is an artefact of the Great Firewall of China.
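An active check of this kind can be approximated with a few lines of Python. The sketch below is not the tooling used for the measurements above; it is a minimal SNTP-style client, and the function names (build_request, parse_transmit_time, query) are ours. It sends a 48-byte client request over UDP port 123 and reads the server's transmit timestamp from the reply.

```python
import socket
import struct

NTP_PORT = 123
# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def build_request() -> bytes:
    """Return a 48-byte NTP client request (LI=0, version=3, mode=3)."""
    return bytes([0x1B]) + bytes(47)


def parse_transmit_time(packet: bytes) -> float:
    """Extract the server transmit timestamp (bytes 40-47) as Unix seconds."""
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def query(host: str, timeout: float = 2.0):
    """Return the server time, or None if the host does not answer in time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_request(), (host, NTP_PORT))
            packet, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and name-resolution failures
            return None
    return parse_transmit_time(packet)
```

Calling query("pool.ntp.org") returns a Unix timestamp when the server answers. A None result only means that no reply arrived within the timeout, which is why a silent server such as s2m.time.edu.cn cannot be distinguished from on-path filtering with this test alone.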

Although there is a specific NTP pool for Android applications, recommended by the NTP community, only about 15% of the applications seem to communicate with it. However, if we compute the volume of requests for each pool, Android’s dedicated NTP pool accounts for 64.5% of the total NTP requests in our records. This suggests that a limited number of Android applications with a high number of installs, mainly communication apps, use the right NTP pool for Android.

| NTP domain | NTP queries (%) | Number of requesting apps |
|---|---|---|
| 2.android.pool.ntp.org | 64.54 | 12 |
| pool.ntp.org | 13.15 | 12 |
| 1.cn.pool.ntp.org | 5.58 | 20 |
| north-america.pool.ntp.org | 3.98 | 3 |
| asia.pool.ntp.org | 2.99 | 4 |
| 0.asia.pool.ntp.org | 2.59 | 10 |
| ntp.nict.jp | 1.79 | 1 |
| time-a.nist.gov | 1.79 | 1 |
| 2.asia.pool.ntp.org | 1.0 | 5 |
| time.izatcloud.net | 0.6 | 2 |
| 0.pool.ntp.org | 0.4 | 2 |
| 0.us.pool.ntp.org | 0.4 | 1 |
| europe.pool.ntp.org | 0.2 | 1 |
| 2.us.pool.ntp.org | 0.2 | 1 |
| s2m.time.edu.cn | 0.2 | 1 |
| 1.us.pool.ntp.org | 0.2 | 1 |

We extended our analysis by investigating the geographical distribution of the NTP pools and the geographical location of the clients performing NTP requests. Note that we do not have accurate geolocation for our clients because of our efforts to avoid user tracking. The table below reports the percentage of requests by geographical region for each NTP server.

The results show that multiple NTP pools often receive queries from devices located outside their supposed region. This is probably because app developers hardcode the NTP pools used by their apps rather than selecting the right one for a given user location. For instance, the asia.pool.ntp.org pool received a considerable percentage of its requests (82.61%) from devices located in Europe. Two pre-installed LG services, com.lge.sync and com.lge.qmemoplus, use this pool independently of the device location. Further, 0.us.pool.ntp.org presents a similar behaviour: 50% of its queries originated in Europe.

Not using the right NTP server may have a negative impact on the accuracy of the returned time value. It has been reported that a client-server latency above 20 ms is not recommended.

| NTP pool | Europe | North America | South America | Asia | Oceania |
|---|---|---|---|---|---|
| 2.android.pool.ntp.org | 45.88 | 30.97 | 18.46 | 4.28 | 0.42 |
| pool.ntp.org | 36.0 | 32.0 | 8.0 | 24.0 | - |
| asia.pool.ntp.org | 82.61 | - | 4.35 | 13.04 | - |
| time-a.nist.gov | - | 100.0 | - | - | - |
| ntp.nict.jp | 69.23 | 30.77 | - | - | - |
| time.izatcloud.net | 20.0 | 80.0 | - | - | - |
| 3.time.crdcloud.com | - | 100.0 | - | - | - |
| 0.pool.ntp.org | 50.0 | 50.0 | - | - | - |
| 0.us.pool.ntp.org | 50.0 | 50.0 | - | - | - |
| n1.netalyzr.icsi.berkeley.edu | - | 50.0 | - | 50.0 | - |
| north-america.pool.ntp.org | - | 100.0 | - | - | - |
| 2.us.pool.ntp.org | - | 100.0 | - | - | - |
| 2.time.crdcloud.com | - | 100.0 | - | - | - |
| ro.pool.ntp.org | 100.0 | - | - | - | - |
| 1.time.crdcloud.com | - | 100.0 | - | - | - |
| us.pool.ntp.org | 100.0 | - | - | - | - |
| 1.us.pool.ntp.org | - | 100.0 | - | - | - |
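The out-of-region pattern in this table can be checked mechanically. The sketch below is illustrative only: the ZONE_REGION mapping and both function names are our own assumptions, covering just the pool zones that appear in this post. It derives the region implied by a pool hostname and sums the share of requests coming from elsewhere.

```python
# Hypothetical mapping from a pool.ntp.org zone label to the region it serves.
ZONE_REGION = {
    "europe": "Europe", "ro": "Europe",
    "north-america": "North America", "us": "North America",
    "south-america": "South America",
    "asia": "Asia", "cn": "Asia",
    "oceania": "Oceania",
}

POOL_SUFFIX = ".pool.ntp.org"


def expected_region(host):
    """Region implied by a pool hostname, or None for global/vendor/other hosts."""
    if not host.endswith(POOL_SUFFIX):
        return None
    # "0.us.pool.ntp.org" -> ["0", "us"]; the last label is the zone.
    zone = host[: -len(POOL_SUFFIX)].split(".")[-1]
    return ZONE_REGION.get(zone)


def out_of_region_share(host, shares):
    """Percentage of requests from regions other than the pool's own region."""
    region = expected_region(host)
    if region is None:
        return None
    return sum(pct for name, pct in shares.items() if name != region)
```

With the asia.pool.ntp.org row as input ({"Europe": 82.61, "South America": 4.35, "Asia": 13.04}), roughly 87% of requests come from outside Asia. Global and vendor zones such as pool.ntp.org or 2.android.pool.ntp.org return None because they have no single home region.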

FINAL REMARKS

This short study confirms that NTP is commonly used by mobile applications, including pre-installed services, many of which are highly popular and heavily used by mobile users all over the world. Unfortunately, we are unable to say to what extent those apps contribute to the total increase in NTP queries reported by Dan Drown in his article. We hope, nevertheless, to shed some more light on the little-known usage of NTP by mobile apps and on the role that application developers may have in server load and in the final accuracy of the NTP response.

| Category | Package name | NTP pool(s) | Number of installs |
|---|---|---|---|
| Communication | air.com.oak.walkoffame | | 1,000,000,000 – 5,000,000,000 |
| Communication | com.sinyee.babybus.kindergarten | 1.cn.pool.ntp.org | 1,000,000,000 – 5,000,000,000 |
| Communication | com.whatsapp | 2.android.pool.ntp.org, ntp.nict.jp, asia.pool.ntp.org | 1,000,000,000 – 5,000,000,000 |
| Communication | com.sinyee.babybus.cars | 1.cn.pool.ntp.org, 0.asia.pool.ntp.org | 1,000,000,000 – 5,000,000,000 |
| Arcade | com.kiloo.subwaysurf | | 500,000,000 – 1,000,000,000 |
| Music & Audio | com.shazam.android | asia.pool.ntp.org | 100,000,000 – 500,000,000 |
| Tools | com.ijinshan.kbatterydoctor_en | pool.ntp.org | 100,000,000 – 500,000,000 |
| Arcade | com.nordcurrent.canteenhd | | 50,000,000 – 100,000,000 |
| Music & Audio | ru.org.amip.clocksync | | 50,000,000 – 100,000,000 |
| Racing | com.topfreegames.bikeracefreeworld | | 50,000,000 – 100,000,000 |
| Music & Audio | com.clearchannel.iheartradio.controller | time-a.nist.gov | 50,000,000 – 100,000,000 |
| Video Players & Editors | com.mobitv.client.tmobiletvhd | | 10,000,000 – 50,000,000 |
| Social | tv.periscope.android | pool.ntp.org | 10,000,000 – 50,000,000 |
| Entertainment | com.verizonmedia.go90.enterprise | | 10,000,000 – 50,000,000 |
| Tools | com.tmobile.pr.mytmobile | 0.pool.ntp.org | 10,000,000 – 50,000,000 |
| Tools | org.hola | pool.ntp.org | 10,000,000 – 50,000,000 |
| Travel & Local | com.airbnb.android | pool.ntp.org | 10,000,000 – 50,000,000 |
| Communication | com.cmcm.whatscall | pool.ntp.org, 2.android.pool.ntp.org | 10,000,000 – 50,000,000 |
| Tools | com.lbe.parallel.intl | | 10,000,000 – 50,000,000 |
| Travel & Local | com.hcom.android | | 10,000,000 – 50,000,000 |
| Educational | com.sinyee.babybus.care | 0.asia.pool.ntp.org, 2.asia.pool.ntp.org, 1.cn.pool.ntp.org | 10,000,000 – 50,000,000 |
| Travel & Local | com.makemytrip | pool.ntp.org | 10,000,000 – 50,000,000 |
| Sports | de.motain.iliga | | 5,000,000 – 10,000,000 |
| Communication | com.google.android.wearable.app | | 5,000,000 – 10,000,000 |
| Lifestyle | ch.bitspin.timely | 2.android.pool.ntp.org | 5,000,000 – 10,000,000 |
| Educational | com.sinyee.babybus.shopping | 1.cn.pool.ntp.org | 5,000,000 – 10,000,000 |
| Food & Drink | com.application.zomato | | 5,000,000 – 10,000,000 |
| Shopping | com.target.socsav | north-america.pool.ntp.org | 5,000,000 – 10,000,000 |
| Maps & Navigation | me.lyft.android | 0.us.pool.ntp.org, 2.us.pool.ntp.org, 1.us.pool.ntp.org | 5,000,000 – 10,000,000 |
| Productivity | com.drippler.android.updates | | 5,000,000 – 10,000,000 |
| Educational | com.sinyee.babybus.animal | 1.cn.pool.ntp.org, 0.asia.pool.ntp.org | 1,000,000 – 5,000,000 |
| Educational | com.sinyee.babybus.moonexplorer | 1.cn.pool.ntp.org | 1,000,000 – 5,000,000 |
| Educational | com.sinyee.babybus.takecar | 1.cn.pool.ntp.org | 1,000,000 – 5,000,000 |
| Educational | com.sinyee.babybus.travelsafety | 1.cn.pool.ntp.org | 1,000,000 – 5,000,000 |
| Educational | com.sinyee.babybus.fireman | 1.cn.pool.ntp.org | 1,000,000 – 5,000,000 |
| Communication | com.snrblabs.grooveip | pool.ntp.org | 1,000,000 – 5,000,000 |
| Productivity | me.bluemail.mail | 2.android.pool.ntp.org | 1,000,000 – 5,000,000 |
| Educational | com.sinyee.babybus.flowers | 1.cn.pool.ntp.org | 1,000,000 – 5,000,000 |
| Educational | com.sinyee.babybus.shoes | 1.cn.pool.ntp.org, 0.asia.pool.ntp.org | 1,000,000 – 5,000,000 |
| Educational | com.sinyee.babybus.birthdayparty | 1.cn.pool.ntp.org | 1,000,000 – 5,000,000 |
| Food & Drink | com.done.faasos | | 1,000,000 – 5,000,000 |
| Educational | com.sinyee.babybus.dining | 2.asia.pool.ntp.org, 1.cn.pool.ntp.org, 0.asia.pool.ntp.org | 1,000,000 – 5,000,000 |
| Tools | com.koushikdutta.tether | | 1,000,000 – 5,000,000 |
| Social | com.cmcm.live | pool.ntp.org, north-america.pool.ntp.org, europe.pool.ntp.org | 1,000,000 – 5,000,000 |
| Business | system | time.izatcloud.net | 1,000,000 – 5,000,000 |
| Educational | com.sinyee.babybus.truck | 0.asia.pool.ntp.org, 2.asia.pool.ntp.org, 1.cn.pool.ntp.org | 1,000,000 – 5,000,000 |
| Lifestyle | com.nest.android | | 1,000,000 – 5,000,000 |
| Educational | com.sinyee.babybus.toilet | 1.cn.pool.ntp.org | 1,000,000 – 5,000,000 |
| Puzzle | com.disney.disneyrestaurant_goo | | 1,000,000 – 5,000,000 |
| Educational | com.sinyee.babybus.organized | 1.cn.pool.ntp.org, 2.asia.pool.ntp.org, 0.asia.pool.ntp.org | 1,000,000 – 5,000,000 |
| Video Players & Editors | com.plexapp.android | 2.android.pool.ntp.org | 1,000,000 – 5,000,000 |
| Casual | app.fotopalabra | pool.ntp.org | 1,000,000 – 5,000,000 |
| Communication | com.trtf.blue | 2.android.pool.ntp.org | 1,000,000 – 5,000,000 |
| Educational | com.sinyee.babybus.photostudio | 0.asia.pool.ntp.org, 2.asia.pool.ntp.org, 1.cn.pool.ntp.org | 500,000 – 1,000,000 |
| Educational | com.sinyee.babybus.share | 1.cn.pool.ntp.org, 0.asia.pool.ntp.org | 500,000 – 1,000,000 |
| Music & Audio | com.shazam.encore.android | | 500,000 – 1,000,000 |
| Educational | com.sinyee.babybus.dayandnight | 1.cn.pool.ntp.org, 0.asia.pool.ntp.org | 500,000 – 1,000,000 |
| Business | com.microsoft.windowsintune.companyportal | 0.pool.ntp.org | 100,000 – 500,000 |
| Tools | edu.berkeley.icsi.netalyzr.android | glue.glue1u19589.n1.netalyzr.icsi.berkeley.edu, n1.netalyzr.icsi.berkeley.edu | 50,000 – 100,000 |
| Communication | com.rdx.kyanuserinterface | | 10,000 – 50,000 |
| preInstalled | com.fingerprints.fingerprintsensortest | | Not Available |
| preInstalled | com.qualcomm.embms | | Not Available |
| preInstalled | com.huawei.android.airsharing | time.izatcloud.net | Not Available |
| preInstalled | com.qualcomm.timeservice | ntp.nict.jp | Not Available |
| preInstalled | com.samsung.android.fingerprint.service | | Not Available |
| preInstalled | air.com.oak.oddsocksmob | pool.ntp.org | Not Available |
| preInstalled | com.android.keychain | | Not Available |
| preInstalled | com.sinyee.fruit.activity | 1.cn.pool.ntp.org | Not Available |
| preInstalled | com.android.development | 2.android.pool.ntp.org | Not Available |
| preInstalled | com.amazon.connectivitydiag | | Not Available |
| preInstalled | com.sonymobile.screenrecording | | Not Available |
| preInstalled | com.mediatek.providers.drm | | Not Available |
| preInstalled | com.lge.qmemoplus | asia.pool.ntp.org | Not Available |
| preInstalled | com.jide.networkconn | s2m.time.edu.cn | Not Available |
| preInstalled | com.lge.sync | asia.pool.ntp.org | Not Available |
| preInstalled | com.android.location.fused | 2.android.pool.ntp.org | Not Available |
| preInstalled | com.lge.sprinthiddenmenu | | Not Available |
| preInstalled | com.sonymobile.simlock.service | | Not Available |
| preInstalled | com.motricity.verizon.ssodownloadable | 2.android.pool.ntp.org | Not Available |
| preInstalled | com.google.android.backuptransport | pool.ntp.org, 2.android.pool.ntp.org | Not Available |
| preInstalled | com.android.vending | pool.ntp.org | Not Available |
| preInstalled | com.qualcomm.atfwd | 2.android.pool.ntp.org | Not Available |
| preInstalled | com.samsung.helphub | | Not Available |
| preInstalled | com.sec.knox.switcher | north-america.pool.ntp.org | Not Available |

Monkey business in children’s apps

It’s not uncommon for small children, toddlers, and even babies to be mesmerized by smartphones and tablets. Whether it’s angering birds, ninja-ing fruit, or crafting mines, intuitive touchscreen-based interfaces allow the youngest of users to play in high-definition virtual environments with ease.

Given such a compelling audience, there’s now a vibrant ecosystem of app developers specializing in products designed for young children. A quick glance at the “Kids” category in the iOS App Store and the “Family” category in the Google Play Store reveals hundreds of offerings from dozens of companies.

Many apps listed in children’s categories are offered free of charge. Just like free apps for general audiences, developers of no-cost children’s apps still need to generate revenue. They might opt to partner with advertising networks to display ads alongside app content. These developers might also use analytics services to better understand their users’ interests.

Advertising and analytics are ubiquitous online. However, developers of children’s apps must be especially careful with any data they share with affiliates, inadvertently or otherwise. In the United States, the Children’s Online Privacy Protection Act (COPPA) limits the collection and sharing of data from young children. Multimedia captures, street addresses, and unique identifiers, among many others, fall within the scope of this law. Children’s apps need to properly inform parents of the nature of any data collection, and obtain parental consent before proceeding. The Federal Trade Commission (FTC) has imposed stiff civil penalties on software developers that violate these standards.

The ICSI Usable Security and Privacy group and the Haystack team are in a unique position to investigate the data collection behavior of children’s apps — behavior often invisible to end-users. We aim to empower parents and regulators alike with a bird’s eye view of the state of data collection in children’s apps. Our goal is to evaluate large numbers of children’s apps at scale.

We have combined two of our in-house research tools to characterize app data collection behavior: the Lumen Privacy Monitor (a tool that detects privacy leaks in mobile apps, available on Google Play), which monitors data transmitted to remote servers, and a customized Android platform that logs accesses to permission-protected resources and tracks coverage.

Previous investigations of children’s apps relied on human-directed exploration of the software. Manual efforts are time-consuming and financially costly. We automate investigations using the Android Exerciser Monkey, which generates a pseudorandom stream of input events, such as taps, swipes, hardware button presses, and application activity triggers. We also hired a tester to manually explore apps, so that we can establish a baseline against which to compare the monkey’s thoroughness.
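For readers unfamiliar with the monkey, a run is started through adb. The sketch below only shows the general shape of such an invocation; it is not our test harness. The package name, event count, seed and throttle values are placeholders, and the two helper functions are hypothetical.

```python
import subprocess


def monkey_command(package, events=5000, seed=42, throttle_ms=200):
    """Build an adb command that sends pseudorandom UI events to one app.

    -p restricts events to the given package, -s fixes the random seed so a
    run can be repeated, --throttle inserts a delay between events, and the
    final number is the count of events to inject.
    """
    return [
        "adb", "shell", "monkey",
        "-p", package,
        "-s", str(seed),
        "--throttle", str(throttle_ms),
        "-v", str(events),
    ]


def run_monkey(package, **options):
    """Run the monkey against a connected device and return the finished process."""
    return subprocess.run(
        monkey_command(package, **options),
        capture_output=True, text=True, check=False,
    )
```

For example, run_monkey("com.example.kidsgame", events=1000) would exercise a hypothetical app with 1,000 events. Fixing the seed matters for a study like this one, because it lets the same event stream be replayed when a suspicious flow needs a second look.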

We have assembled a corpus of 446 free apps gathered from the “Ages 5 & under,” “Ages 6-8,” and “For kids 9 & up” subcategories in the Google Play Store. Developers self-declare their apps for these subcategories for young children. This is an explicit acknowledgement that their target audience is indeed children under 13 years of age.

We presented a report and a poster of our ongoing work at FTC PrivacyCon 2017. Although this effort is still underway, we’re excited to share some of our early findings. The following results were collected using automated monkey-driven analysis.

“Do you know where your children are?”

COPPA prohibits the collection of street-resolvable geolocation from users under 13 years of age. Online services can (and often do) identify users’ cities and ISPs via IP address location lookups. However, this doesn’t necessarily run afoul of COPPA restrictions, as that’s insufficiently fine information to determine the user’s home address or street.

Apps that use GPS, cell network, and Wi-Fi location would likely be in violation though. Those geolocation technologies enable apps to resolve the user’s location well within tens of meters; enough to identify a street address with high confidence. Luckily, Android requires apps to declare the ACCESS_FINE_LOCATION permission and request user approval in order to access this functionality.

Using our automated test platform, we identified a likely leakage of fine geolocation data to third parties without informing the user. Out of the 111 children’s apps we’ve auto-tested so far, we observed 15 sharing Wi-Fi router MAC addresses with third parties. All 15 were developed by BabyBus, and all shared them with the same analytics firm. Wi-Fi router MAC addresses are trivially street-resolvable using public APIs offered by WiGLE and Google Maps Geolocation.

Excerpt of POST request to third party:

[{"type":"wifi","available":true,"connected":true,"current":[{"name":"\"ICSI\"","id":"38:1c:1a:c6:b9:c0","level":-76,"hidden":false,"ip":1711712448,"speed":65,"networkId":0,"mac":"02:00:00:00:00:00","dhcp":{"dns1":146446016,"dns2":163223232,"gw":17213632,"ip":1711712448,"mask":0,"server":-239429952,"leaseDuration":3600}}],"configured":[{"networkId":1,"priority":7,"name":"\"AirBears2\""},{"networkId":0,"priority":5,"name":"\"ICSI\"","id":"any"}]},{"type":"cellular","available":false,"connected":false}]

We were able to locate our offices with very high accuracy using this data. We also asked our manual tester where she evaluated the affected apps and, with her permission, verified those locations too (not displayed to protect her privacy).

Additionally, this third party collects the names of all Wi-Fi routers saved on the host device. It’s unclear what purpose this information could serve, but it could easily be used to fingerprint users and identify when the user is at home vs. in school.
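To make the contents of that request concrete, the sketch below parses a trimmed copy of the excerpt and pulls out the fields discussed above. Two points are our interpretation rather than documented facts: we read the "id" field of the current network as the router's MAC address (BSSID), and we assume the integer "ip" values are IPv4 addresses packed in little-endian order. The function names are ours.

```python
import json
import socket
import struct

# Trimmed copy of the excerpt above (only the fields used below are kept).
PAYLOAD = (
    r'[{"type":"wifi","current":[{"name":"\"ICSI\"","id":"38:1c:1a:c6:b9:c0",'
    r'"ip":1711712448,"dhcp":{"gw":17213632}}],'
    r'"configured":[{"networkId":1,"name":"\"AirBears2\""},'
    r'{"networkId":0,"name":"\"ICSI\""}]}]'
)


def int_to_ipv4(value):
    """Decode an IPv4 address stored as a little-endian 32-bit integer (assumed)."""
    return socket.inet_ntoa(struct.pack("<I", value & 0xFFFFFFFF))


def summarize(payload):
    """Return the location-relevant fields found in the Wi-Fi part of the payload."""
    wifi = next(item for item in json.loads(payload) if item["type"] == "wifi")
    current = wifi["current"][0]
    return {
        "router_mac": current["id"],
        "ssid": current["name"].strip('"'),
        "device_ip": int_to_ipv4(current["ip"]),
        "gateway_ip": int_to_ipv4(current["dhcp"]["gw"]),
        "saved_networks": sorted({n["name"].strip('"') for n in wifi["configured"]}),
    }
```

Under those assumptions, summarize(PAYLOAD) yields the router MAC address, the private addresses 192.168.6.102 and 192.168.6.1, and the two saved network names. The router MAC address is the street-resolvable part, and the saved network list is what could serve as a fingerprint.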

We notified BabyBus of this finding, and they responded rapidly, saying that they have now stopped using this analytics package. All their products on the Google Play Store were indeed updated within days of our notice, but we still need to re-run our tests to verify this change.

“Just say no”

Online advertisers seek to increase conversion rates by serving users with timely ads relevant to their interests. This requires tracking users over time and across different services, often by uniquely identifying their devices and building personal profiles tied to those identifiers. With enough observations and data tied to a particular device, advertisers can build a profile of that user’s interests and background. For young children though, this kind of highly targeted behavioral advertising is restricted under COPPA guidelines.

A number of unique identifiers are available on Android devices. These include the Wi-Fi radio’s MAC address, phone IMEI, and SIM card identifiers. Another such identifier is the Android Advertising ID (AAID). The AAID is a persistent OS-generated identifier that the user can opt not to share with apps. The user can also choose to manually regenerate it, effectively disconnecting themselves from marketing profiles built on the previous AAID. Google Play policy recommends that the AAID be used exclusively for advertising and analytics purposes. The policy discourages developers from associating the AAID with other (more difficult to reset) device identifiers.

From our automated tests to date, we identified a children’s game that not only collected the AAID, but also other device identifiers with it, going against Google’s best practices. This collection persists even if a parent uses the system settings to explicitly opt out of tracking.

Excerpt of POST request to third party:

advertiserIdEnabled=false&ang=English&app_version_name=2.2.0&dkh=yZnL9BNt&android_id=84f942c74fffbdef&advertiserId=fff3ca7e-61d7-4298-ab14-256033002de9&deviceType=userdebug&network=WIFI&operator=&brand=Android&date2=2016-11-02_0126-0700&uid=1478118365655-1389078544330603868&isFirstCall=true&counter=1&product=aosp_angler&model=AOSP+on+angler
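A request body like this one can be checked automatically. The sketch below is only an illustration of that check, not our detection pipeline. It assumes, from the parameter names alone, that advertiserIdEnabled=false reflects the user's opt-out, that advertiserId carries the AAID and that android_id carries the persistent Android ID. The function name is ours.

```python
from urllib.parse import parse_qs

# Shortened copy of the excerpt above.
BODY = (
    "advertiserIdEnabled=false&android_id=84f942c74fffbdef"
    "&advertiserId=fff3ca7e-61d7-4298-ab14-256033002de9&network=WIFI&operator="
)


def identifier_findings(body):
    """Flag identifier combinations that conflict with the practices described above."""
    # parse_qs drops empty values such as "operator=" by default.
    fields = {key: values[0] for key, values in parse_qs(body).items()}
    opted_out = fields.get("advertiserIdEnabled") == "false"
    sent = [name for name in ("advertiserId", "android_id") if fields.get(name)]
    return {
        "opted_out": opted_out,
        "identifiers_sent": sent,
        "aaid_sent_despite_opt_out": opted_out and "advertiserId" in sent,
        "aaid_linked_to_persistent_id": {"advertiserId", "android_id"} <= set(sent),
    }
```

For the excerpt, both flags come out true: the AAID is transmitted although the opt-out flag is set, and it travels in the same request as a harder-to-reset identifier, which makes resetting the AAID ineffective.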

We reached out to the app developer for verification and comments. They have yet to respond to us. We are in the process of checking for this behavior in their other apps, as well as other companies’ products that use the same third-party service.

Baby steps

These results just scratch the surface of the data collection behavior in the children’s apps we’re testing. Our automated testing process revealed the collection and sharing of COPPA-relevant data — behavior otherwise invisible even to tech-savvy parents, much less to the young children actually using these apps. We still have the rest of our corpus to analyze, both with our automated system, and with our manual tester for comparison. We look forward to sharing additional findings on this blog and discussing our methods rigorously in refereed academic venues.

The bandwidth costs of third-party tracking services on mobile apps

Mobile operators offer data plans with volume caps to control network congestion and also for profit. The presence of such data caps forces many mobile users to control and limit their browsing and mobile habits to make sure that they do not exceed their data allowance.

In October 2015, a New York Times article measured the bandwidth and performance costs of online advertising on 50 websites. The article revealed that “more than half of all data in those websites came from ads and other content filtered by ad blockers”. For a mobile subscriber, it implies that a large fraction of their traffic is generated simply for tracking them.

What about mobile apps? How much data does an Android app generate for tracking users and app activity, or for displaying ads?

In our previous blogpost, we talked about how Haystack can help you to identify the presence of third-party trackers on your mobile apps. The large number of trackers that we have found on our mobile apps has motivated us to measure the portion of traffic that each app generates for tracking and advertising purposes.

In total, we analyzed more than 1,700 mobile apps. The figure below shows the distribution of the percentage of each app’s traffic going to such third parties.

Figure: histogram of the percentage of each app’s traffic used for tracking purposes

The results may vary depending on how our ICSI Haystack users interact with their apps. However, the share of app traffic dedicated to tracking is much higher than we initially expected: on average, 24% of an app’s traffic is associated with third-party tracking and advertising services. This network activity affects not only users’ data plans but also the battery life of their devices.

If we look in detail at the distribution, we can see that 40% of the apps dedicate at least 10% of their traffic to tracking and advertising, while more than 10% of mobile apps have at least 90% of their traffic associated with such activities. If it weren’t for user tracking and advertising, many mobile apps could operate completely offline!
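The statistics above reduce to a simple per-app ratio. The sketch below shows one way to compute them from flow records; it is not Haystack's code. The (app, bytes, is_tracker) record layout is a hypothetical simplification, and we assume shares are measured in bytes and averaged per app.

```python
from collections import defaultdict


def tracking_share(flows):
    """Map each app to the percentage of its bytes sent to tracking services.

    `flows` is an iterable of (app, byte_count, is_tracker) tuples.
    """
    total = defaultdict(int)
    tracked = defaultdict(int)
    for app, size, is_tracker in flows:
        total[app] += size
        if is_tracker:
            tracked[app] += size
    return {app: 100.0 * tracked[app] / total[app] for app in total if total[app] > 0}


def fraction_at_least(shares, threshold):
    """Fraction of apps whose tracking share is at least `threshold` percent."""
    return sum(1 for share in shares.values() if share >= threshold) / len(shares)


def mean_share(shares):
    """Unweighted average of the per-app tracking shares."""
    return sum(shares.values()) / len(shares)
```

With this layout, fraction_at_least(shares, 10) and fraction_at_least(shares, 90) correspond to the 40% and 10% figures quoted above, and mean_share(shares) to the 24% average. An unweighted per-app mean treats a rarely used app the same as a heavily used one, which is one reason the results depend on how users interact with their apps.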

The table below lists some of the apps for which tracking activity and ads account for at least 98% of their total traffic. The apps listed below have user ratings higher than 3.5/5 stars and millions of users.

| App name | Category | Audience | Google Play installs | Rating | % of traffic |
|---|---|---|---|---|---|
| Tottoko Dungeon | Role Game | Everyone | 50K-100K | 3.9 | 100% |
| Busuu – Easy Language Learning | Education | Everyone, 10+ | 10M-50M | 4.3 | 100% |
| Piano Tiles 2 (Don’t Tap…2) | Arcade | Everyone | 100M-500M | 4.7 | 99% |
| Drippler – Android Tips & Apps | News & Magazines | Everyone | 5M-10M | 4.5 | 99% |
| Tap Titans | Role Playing | Everyone, 10+ | 10M-50M | 4.7 | 99% |
| Headspace – meditation | Health & Fitness | Everyone | 1M-5M | 4.4 | 99% |
| Kung Fu Panda: BattleOfDestiny | Game | Everyone | 1M-5M | 3.7 | 99% |
| Darklings | Arcade | Everyone | 100K-500K | 4.0 | 99% |
| Cooking Fever | Arcade | Everyone | 10M-50M | 4.4 | 99% |
| BestFriends – Puzzle Adventure | Casual | Everyone | 10M-50M | 4.6 | 99% |
| A Dark Dragon AD | Role Game | Everyone, 10+ | 100K-500K | 4.3 | 99% |
| Square Trade | Shopping | Everyone | 10K-50K | 3.6 | 99% |
| Jungle Cubes | Puzzle | Everyone | 500K-1M | 4.3 | 99% |

If we look at the type of apps and their target audience, we can see that many of them are games rated as suitable for children. These apps connect to third-party services such as Facebook Graph (Facebook’s analytics and ad network), tools for user engagement and A/B testing like HelpShift or Optimizely, tools to promote app installs like Chartboost, analytics services like mobileapptracking (part of Tune), and mobile-game-specific tracking services and gaming ad networks like Unity3D.

According to EU legislation, no tracking activity should take place in apps for children without parental consent. For some of the children’s games that we’ve tried manually, we have not seen any activity or information on Google Play aiming to inform the parents, or the user, about any tracking activity. Even in the USA, the FTC recently reached a settlement of nearly 1M USD with InMobi for tracking children without parental consent. We will investigate this topic further in future blogposts.

We’re currently working to enable new features on the Haystack app that would help you to keep control of your network data consumption and privacy. In the meantime, our current user-interface reports how much of the data generated by your app goes to third-party trackers and ad networks:

ICSI Haystack profile for the Hailo app

Stay tuned!

Exposing indirect privacy leaks on mobile apps

Today, we have been informed that the ICSI Haystack Project has been awarded one of the prestigious Data Transparency Lab 2016 grants. If you are not familiar with the Data Transparency Lab, the DTL is a community of technologists, researchers, policymakers and industry representatives working to advance online personal data transparency through scientific research and design. The initiative is led by Mozilla, Telefonica, and ODI.

Our DTL research proposal aims to illuminate the presence of indirect privacy leaks in mobile apps. A typical privacy-aware user checks the app’s permission list when installing a new Android app from Google Play. Some users may still agree to share part of their personal information with the app developer even when they consider an app permission harmful to their privacy. However, what most users do not know is that the app developer may not be the only organization collecting their personal information.

As in the browser context, mobile apps can leak user personal information to third parties such as ad networks and analytics services without user awareness and consent. While these services are valuable to app developers, they may track users and collect a vast amount of personal information about them by piggybacking on the permissions requested by the app developer and granted by the user. Google Play does not require the app developer to inform users about the presence of tracking services in Android apps.

Mobile users, and even regulators, lack tools to understand how mobile apps operate behind the scenes and which organizations collect user data. Our research and development efforts in the ICSI Haystack project seek to illuminate this dark space, with the hope of helping users stay in control of their online privacy and of raising societal awareness.

To that end, we created an interactive map of tracking services on Android apps: the ICSI Panopticon. The image below contains a screenshot of the interactive map.

ICSI Haystack Panopticon Screenshot

The ICSI Haystack Panopticon contains records for more than 1,500 Android apps and is built upon the data collected from the users of our ICSI Haystack Privacy Monitor app. If you’re one of them, we would like to thank you for your help. If not, we warmly invite you to install the app and help extend our catalogue of Android apps. Note that we collect the data by crowdsourcing in a completely anonymized way: we do not collect any personal information about our users, as we describe in our privacy policy.

Our analysis revealed that 70% of our monitored mobile apps connect with at least one tracking service. A significant fraction of apps even connect to more than 10 tracking services. We invite you to play with the Panopticon and identify for yourself the organizations collecting your personal information when you use a given app. As you will notice, there is a strong power-law distribution, as a few organizations dominate this ecosystem: Crashlytics (owned by Twitter), Flurry (owned by Yahoo), Google Analytics, AdJust, AppsFlyer, Mixpanel and Facebook Analytics. Interestingly, many of these services are cross-platform, so they can track you not only in your mobile apps but also in the browser.

We’re working hard to release new app features to help you better protect your online privacy. We are taking inspiration from the Ghostery and Privacy Badger browser extensions to enable data flow blocking in an easy-to-use way. Stay tuned!

New Haystack paper version available on arXiv!

We have uploaded a new version of our Haystack paper to arXiv!

In this new version, we discuss in more detail Haystack’s goals and its ability to conduct real-world mobile measurements at scale from the device. Haystack’s design allows it to be flexible enough to measure aspects beyond privacy leaks and basic traffic characterization.

As we explain in the paper, thanks to the data provided by our first 450 users, we could already investigate the behavior of more than 1,300 mobile apps. We have identified a surprisingly high presence of third-party services in mobile apps for tracking and advertising purposes. Likewise, we identified interesting facts about pre-installed apps, which are typically ignored by previous research studies that rely on static and dynamic analysis. Furthermore, thanks to Haystack’s data we can explore new computing paradigms like the Internet of Things in real-world settings and IoT deployments. We invite you to take a look at the paper and let us know what you think!

If you haven’t tried Haystack yet, we also invite you to install and try our Android app. The latest version allows you to identify more than 200 analytics and ad networks and estimate how much data you’ve wasted on those activities.

We’re currently working to improve Haystack’s flow re-assembly mechanism and to provide new features. For instance, we want to adapt Haystack to run performance measurements, filter undesired flows (as in a privacy firewall), detect censorship, and capture packets so that you can use Haystack to get pcaps on your phone.

Finally, we’re also working on adding new features to our website. In particular, we are quite excited about the release of the Haystack Analytics services. Haystack Analytics will allow anyone to obtain information about specific mobile apps and services, so you can learn about the behavior of a given app before installing it.

As usual, we would like to thank our users. Without their participation and help, this project would not have succeeded.

Haystack: Preliminary Results

Last week, Haystack crossed the bar of 200 installs on Google Play! We want to take this opportunity to thank our users as your collaboration is extremely valuable for us in our research efforts to illuminate the fast-evolving mobile ecosystem.

If you haven’t tried Haystack yet, we would like to invite you to install it. Haystack is a tool developed by a group of academic researchers at the International Computer Science Institute and UC Berkeley in collaboration with Stony Brook University. Haystack analyzes the traffic generated by the apps running on your phone so you can understand how they communicate with online services, and whether they leak any sensitive information about you or about your device. Haystack is available for free on Google Play.

How does Haystack work?

Your phone hosts a rich array of information about you and your activities. This includes a range of identifiers that can enable sites to track you, as well as data about your device, your installed apps, your accounts, and your location.

Sometimes apps need this information to provide useful functionality or to adapt content to your device. For example, a Maps application of course needs to know your exact location. But in other cases, apps may collect and upload privacy-sensitive information for advertising, analytics, and tracking purposes. We consider these to be privacy-sensitive leaks, but it is up to you to decide whether you want to continue using the app.

Android uses a permission model to control how applications access sensitive resources, but it does not really help you to know which organizations collect data about you or your device. Haystack aims to fill this gap.

The ICSI Haystack app helps you identify which apps leak information about you, where your apps connect to, and which protocols they use. It also informs you about the organizations collecting this information, even when the apps use encryption techniques like TLS/SSL (i.e., the technology providing the “S” for Secure in HTTPS). We accomplish this with a technique known as TLS interception. You can disable or enable TLS interception at any time in the app settings.

Haystack requires access to a number of Android permissions to capture your app’s traffic and to identify privacy leaks. If you want to know more specific details, or if you want to know how Haystack will affect your battery life or performance, you can read our paper. We tried our best to minimize Haystack’s overhead and we will continue investigating new techniques to improve its performance.

A note about Samsung devices: We have noticed that Haystack cannot intercept app traffic on many Samsung devices. In some cases it only intercepts DNS traffic, and in others it only works when connected over WiFi. Please accept our apologies. We are investigating the underlying problem to come up with a solution, but it is possible that this is a device or firmware limitation. Users of other Android VPN clients available on Google Play have reported similar problems. To minimize user frustration, we have not included Samsung devices in the list of supported devices in our Google Play listing. Nevertheless, you can install and try Haystack at your own risk using this direct link to the APK.

Data Collection

Haystack is also a tool to research the mobile ecosystem at scale. The mobile ecosystem evolves quickly, and it is difficult to identify new players and services and how app developers integrate them. It is also important to know how secure data communications are across all apps and whether developers are implementing countermeasures against potential attacks.

Your collaboration is very important to achieving our research goals. However, in order to perform our studies, we need to collect certain pieces of information about how your apps behave. We do this without compromising your privacy by aggregating and anonymizing our traces. For instance, we want to know that an application leaked a unique identifier like the IMEI to a given server using TLS while the screen was off, but we do not want to know who you are, which apps you run, the value of the IMEI, your location, or your IP address. All your personal data remains on your phone!
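
To make the idea concrete, here is a minimal Python sketch of this kind of sanitization. The field names and the `sanitize` helper are hypothetical and chosen only for illustration; they are not Haystack's actual schema or code.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class LeakEvent:
    """Raw on-device observation (hypothetical fields, for illustration only)."""
    app_id: str
    leak_type: str    # e.g. "IMEI"
    leak_value: str   # the identifier itself, which must stay on the phone
    server: str
    uses_tls: bool
    screen_on: bool
    client_ip: str    # also stays on the phone


# Assumption: only these fields would ever be reported, and only in aggregate.
REPORTED_FIELDS = ("app_id", "leak_type", "server", "uses_tls", "screen_on")


def sanitize(event: LeakEvent) -> dict:
    """Keep what kind of leak happened and drop the sensitive values."""
    raw = asdict(event)
    return {name: raw[name] for name in REPORTED_FIELDS}
```

With this shape, a report can say that some app sent an IMEI to a given server over TLS while the screen was off, without ever containing the IMEI or the IP address.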

Because of this data sanitization process, ICSI-UC Berkeley’s Committee for Protection of Human Subjects (IRB) does not consider this study to be a human-subjects study. If you have any questions about our data collection process, we strongly encourage you to email us at haystack-help“@”icsi.berkeley.edu.

Preliminary Results

Thanks to our first 200 users, we have identified 423 apps leaking information to 2,995 unique domains (many of them owned by the same organization, as in the case of Google services). The figure below shows the most popular online services among our monitored apps: 9 of the 12 belong to Google.

Online services popularity

The interactive figure below illustrates the network of apps leaking sensitive information and the organizations collecting it, as seen by Haystack. The color of each edge represents whether the developer uploads the data securely with TLS by default (blue) or whether we have at least one record of an insecure flow (red). Nearly 72% of the privacy leaks happened over HTTPS/TLS. This stresses the need to intercept TLS traffic locally in order to identify those leaks. We cannot access, completely or partially, the flows of certain apps like Mega, Uber and Twitter, as they implement effective countermeasures against TLS interception.
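
The edge-coloring rule can be sketched in a few lines of Python. This is only an illustration: the `(app, organization, uses_tls)` tuple layout and the function names are assumptions, not the code behind the figure.

```python
from collections import defaultdict


def edge_colors(flows):
    """Map each (app, organization) edge to "blue" or "red".

    flows: iterable of (app, organization, uses_tls) tuples (assumed layout).
    An edge is red as soon as one insecure flow was recorded for it.
    """
    insecure = defaultdict(bool)
    for app, org, uses_tls in flows:
        insecure[(app, org)] |= not uses_tls
    return {edge: ("red" if bad else "blue") for edge, bad in insecure.items()}


def tls_share(flows):
    """Fraction of leak flows that were sent over TLS."""
    flows = list(flows)
    if not flows:
        return 0.0
    return sum(1 for _, _, uses_tls in flows if uses_tls) / len(flows)
```

Note that a single plain-HTTP flow is enough to turn an edge red, even if every other flow between that app and that organization was encrypted.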

Because of rendering and performance issues, we pruned the network. In particular, we excluded leaks caused by browser activity (e.g., Mozilla’s Firefox or Google Chrome), those caused by pre-installed Android processes, and unpopular online services reached by fewer than three applications (mainly apps reporting data to their own online infrastructure). To reduce the number of nodes, we also grouped sub-domains together.
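
A rough Python sketch of this pruning step follows. The function names are hypothetical, and `base_domain` uses a deliberately naive rule (keep the last two labels) that is an assumption of this example; a robust version would consult the Public Suffix List.

```python
from collections import defaultdict


def base_domain(hostname: str) -> str:
    """Naively group sub-domains by keeping the last two labels.

    Assumption for illustration only: this is wrong for suffixes
    such as "co.uk".
    """
    labels = hostname.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def prune(leaks, excluded_apps=frozenset(), min_apps=3):
    """Return {service: set of apps} for services reached by enough apps.

    leaks: iterable of (app, hostname) pairs.
    excluded_apps: apps to ignore entirely (e.g. browsers, pre-installed processes).
    """
    apps_per_service = defaultdict(set)
    for app, host in leaks:
        if app in excluded_apps:
            continue
        apps_per_service[base_domain(host)].add(app)
    return {svc: apps for svc, apps in apps_per_service.items()
            if len(apps) >= min_apps}
```

Services contacted only by their own app fall below the threshold and disappear, which is what keeps the graph readable.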

The figure shows how applications cluster around popular tracking, analytics and advertisement services, and also around services for social media integration, as in the case of Facebook’s Graph API. Nevertheless, there is a clear power-law distribution in the number of apps using those services: 53 of the services have been reached by at least 5 apps. The most popular services in our dataset are:

  1. Facebook Graph: Facebook’s Graph allows app developers to integrate their apps with Facebook’s social graph and platform. However, Facebook’s Graph API also allows developers to monetize their apps through advertising and to perform app analytics. Facebook’s services are always delivered over HTTPS and they do not seem to use specific domains for each service: all the services seem to be delivered by the domain graph.facebook.com. This characteristic makes it harder for traditional ad blockers to block this traffic, as they need to perform TLS interception and analyze the traffic to identify the content of a given flow. We have identified more than 15 types of sensitive data leaks with significant variation between apps, probably associated with developers’ needs. Nevertheless, Facebook’s official apps (e.g., Instagram, Facebook’s Pages, Facebook’s Messenger and Facebook App) upload the bulk of the sensitive information. The applications that integrate Facebook’s Graph API in their services generally report information like OS build ID (which can be seen as a cookie), operator name, device model and brand, with the exception of a few applications reporting unique identifiers like the IMEI or even the private IP address. We have manually checked that applications using this service can operate even on devices without Facebook’s official apps installed.
  2. Google Services: This includes more than 20 domains like 1) googleapis.com (for apps interacting with Google’s API, including authentication); 2) gstatic.com, googlesyndication.com, and doubleclick.net for online advertisement; and 3) Google Analytics for Android (google-analytics.com). The native processes Android’s Backup Service and Google Play leak most of the device- and user-related information to Google’s services. The rest of the apps leak information like device model, brand and build ID.
  3. Flurry: Flurry is a popular analytics service owned by Yahoo. So far, we have identified more than 10 different types of data leaks associated with it, mostly device information (brand, model and build ID) and ISP information (MCC/MNC codes).
  4. Crashlytics: Crashlytics, now owned by Twitter, is a mobile company building crash reporting for iOS and Android. 27 of the apps on devices running Haystack use this service.

We will investigate in detail the content of the protocols used by each of these services on instrumented phones under our control, so that we do not cause any privacy violation to our users. We will release our results in this blog. Stay tuned!

Case Study: Unique Identifiers

Mobile devices contain many types of unique identifiers: values guaranteed to be unique among all those used for a given class of objects, users, devices or resources. Unique identifiers are extremely useful for mobile advertisement and tracking services, which use them to link data to users across all the apps using their services. They play a role similar to that of cookies in the browser, with the advantage of being immutable. Below, we describe some of the unique identifiers stored on mobile phones:

 

  • The International Mobile Station Equipment Identity or IMEI is a unique 15-digit number that identifies every mobile phone, GSM modem or device with a built-in phone / modem. Based on this value, it is also possible to obtain some additional information about the device brand or model.
  • The International Mobile Subscriber Identity or IMSI is a 64-bit value that uniquely identifies the user of a cellular network globally. The subscriber’s phone number is another value falling in this category.
  • The Media Access Control Address or MAC Address is a unique identifier assigned to network interfaces for communications on the physical network segment (e.g., WiFi or Bluetooth).
  • As any product, mobile phones also contain unique identifiers that are assigned incrementally or sequentially by the manufacturer like serial numbers. Any app can request this information programmatically.
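
As an illustration of why the IMEI reveals brand and model information, the Python sketch below splits a 15-digit IMEI into its standard parts. The layout used here (an 8-digit Type Allocation Code, a 6-digit serial and a Luhn check digit) is general background on the IMEI format rather than something described in this post, and `split_imei` is a hypothetical helper name.

```python
import re


def luhn_valid(digits: str) -> bool:
    """Check a digit string against the Luhn checksum used by the IMEI."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def split_imei(imei: str):
    """Return the parts of a well-formed IMEI, or None if it is not valid."""
    if not re.fullmatch(r"[0-9]{15}", imei) or not luhn_valid(imei):
        return None
    return {
        "tac": imei[:8],        # Type Allocation Code: identifies brand and model
        "serial": imei[8:14],   # per-device serial within that model
        "check_digit": imei[14],
    }
```

The TAC is the part that can be looked up to learn the device brand and model, while the serial is what makes the value unique to one phone.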

An app developer must request the permission READ_PHONE_STATE, which Android’s documentation lists among the dangerous permissions, to read device identifiers such as the IMEI, phone number and IMSI. However, there is no permission protecting other unique identifiers like the serial number and the MAC address. In this case, the app developer just needs to read the information provided by the getprop command, which exposes a vast number of system properties and configuration values. This is an anonymized version of a real hexadecimal serial number as reported by getprop: [ro.serialno]: [0X8X9XX214084221]. In practical terms, this information has the same impact on users’ privacy as the other values. Haystack reads and parses this output and searches for its content in the user’s traffic. This allows us to investigate how applications access and leak any unique identifier, which organizations collect them, and whether encryption is used. The histogram below shows the number of apps leaking those values and whether they use encryption to upload this sensitive data.

Histogram of Apps per Leak Type and Encryption
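
As a rough illustration of the getprop-based matching described above, the following Python sketch parses `[key]: [value]` lines and looks for watched values in a captured payload. The function names and the plain substring match are assumptions made for this example; Haystack's own implementation is not shown here, and a practical matcher would also need to handle encoded or hashed values.

```python
import re

# getprop prints one property per line in the form "[key]: [value]".
PROP_LINE = re.compile(r"^\[(?P<key>[^\]]+)\]: \[(?P<value>.*)\]$")


def parse_getprop(output: str) -> dict:
    """Turn getprop output into a {property: value} dictionary."""
    props = {}
    for line in output.splitlines():
        match = PROP_LINE.match(line.strip())
        if match:
            props[match.group("key")] = match.group("value")
    return props


def find_leaks(payload: bytes, props: dict, watched=("ro.serialno",)):
    """Return the watched property names whose value appears in the payload."""
    found = []
    for key in watched:
        value = props.get(key, "")
        if value and value.encode() in payload:
            found.append(key)
    return found
```

For example, with the anonymized serial number from above, a request containing `sn=0X8X9XX214084221` would be flagged as leaking `ro.serialno`.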

In addition to unique identifiers, applications can access other sensitive information, such as the device hostname or even the WiFi SSID, through the getprop command. This can be used to geo-locate the user without requiring any further permission. As in the case of unique identifiers, the following table lists the apps leaking this information and their destinations as seen by Haystack. Only Adobe’s analytics service does not use encryption by default.

Identifying apps that upload those values without encryption is very important, especially under oppressive regimes. The presence of this metadata in users’ traffic makes them trackable by in-path middleboxes (some of which, such as WiFi APs, can use it for advertising purposes) and by any surveillance agency. Nonetheless, as a result of the popularity of ad-blockers working at the network level and increasing user concerns about mass surveillance, the number of online trackers, advertising and analytics services using TLS is increasing. In total, 69 apps leak some type of unique identifier, and these identifiers are collected by more than 150 domains. Those values allow online analytics and advertising services to identify users across the metadata provided by all the apps using their services. The figure below shows which apps leak any unique identifier or network-related information and the organizations (domains) collecting them. As in the interactive graph, we highlight in red the applications leaking any unique identifier without encryption. An interesting case is PayTM, an e-commerce app for online payments that leaks the IMEI value over plain HTTP.

An interesting observation that benefits from Haystack’s crowd-sourcing nature is that application developers use third-party libraries in different ways: not all apps leak the same type of information to those online services. Finally, many apps upload this information solely to their own servers.

The use of this information can be legitimate and very useful to prevent device or identity theft, and fraud. That could apply to applications like Cerberus anti-theft. However, even when the purpose is legitimate, it is not good practice to obtain this information through the getprop command without the user being aware of it.

You can access the Google Play profile for each one of the apps by replacing the corresponding APP_ID (listed on the Y-axis of the Figures) in the following URL: https://play.google.com/store/apps/details?id=APP_ID.
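
If you want to build these links programmatically, a small Python helper is enough. The function name `play_url` and the example app ID below are hypothetical.

```python
from urllib.parse import urlencode

PLAY_BASE = "https://play.google.com/store/apps/details"


def play_url(app_id: str) -> str:
    """Build the Google Play profile URL for a given application ID."""
    return f"{PLAY_BASE}?{urlencode({'id': app_id})}"
```

Calling `play_url("com.example.app")` yields the URL above with `APP_ID` replaced by `com.example.app`.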

Apps leaking the IMEI:

Apps leaking IMEI

Apps leaking the IMSI number:

Apps leaking IMSI

Apps leaking the serial number:

Apps leaking device's serial number

Apps leaking the MAC address:

Apps leaking device's MAC address

Future Work and Improvements

Haystack is still in its early stages and the results that we have presented in this post are just a first analysis of the data we have collected so far. We hope that you’ve found it interesting.

We are confident that as more users try the tool, we will get a better idea of how the different stakeholders of the mobile ecosystem leverage users’ metadata, while also increasing the range of leaks that we can detect.

Once we can allocate resources to the project, we also plan to make our dataset publicly available so any user can search for information about a given application or online service. We also want to extend the tool to become a platform for general mobile measurement, from performance measurements to security measurements. Nevertheless, our priority is developing and maintaining a tool for mobile users so they can stay in control of their apps, their traffic, and their privacy.

Once more, we would like to invite you to try the tool if you haven’t done it yet and thank those who have already installed the app. As usual, we love to hear your feedback as this is the best way to improve the tool. You can send us your comments by email at haystack-help “@” icsi.berkeley.edu or through our Twitter account. Likewise, we will be happy to answer any question or concern that you may have about the app or the data collection process.

Many thanks!

The ICSI Haystack Team

Welcome!

A few weeks ago, we released our Android app, the ICSI Haystack, on Google Play. The ICSI Haystack is a tool that allows you to identify privacy leaks associated with your installed mobile apps, where they connect to, which organizations collect personal information about you, and many other interesting facts about your mobile traffic, such as use of encryption and protocol breakdown.

The ICSI Haystack app comes from a research team at ICSI and UC Berkeley in collaboration with Stony Brook University.

One of the main goals of this project is to promote user awareness of the risks associated with mobile applications while increasing their transparency. By installing Haystack, you actively contribute to our research efforts to illuminate the mobile ecosystem, the technologies used by mobile apps and their hidden behavior. In this blog, we will report our findings as well as any interesting news regarding mobile privacy and security. Stay tuned!

If you’re curious about the technology behind Haystack, you can find many details in our paper and in this poster that we presented at UC Berkeley’s workshop “1984+31: Is Nothing Private Anymore?”

Finally, we would like to take this opportunity for a sincere Thank You to all of our users! We always welcome your feedback and suggestions at haystack-help ”@” icsi.berkeley.edu

A blog about mobile privacy, tracking services, online security and mobile apps