Your phone hosts a rich array of information about you and your activities.
This includes a range
of identifiers, location data and even your contacts list. Often,
apps collect such privacy-sensitive information and share it with
third parties such as ad networks and analytics services
without your consent for advertising and tracking purposes.
The Haystack Project is an academic initiative led by
independent academic researchers at ICSI--UC Berkeley and IMDEA Networks
in collaboration with UMass and Stony Brook University.
At the core of the project is the Lumen app, an Android app that
analyzes your mobile traffic and helps you identify the privacy
leaks caused by your apps, along with the organizations collecting this information.
Lumen supports TLS interception so you can identify
apps leaking privacy-sensitive information over
encrypted traffic in real time.
Be part of a research study!
Lumen comes from a research team at
ICSI--UC Berkeley.
By installing Lumen, you actively contribute to ongoing research
efforts aiming to
improve the operational transparency of mobile technologies.
[ Lumen features ]
Easy to Use
Finding out how your apps behave on the network and how they extract or leak your personal information
is as simple as clicking the start button and letting Lumen run!
For security purposes, Android will inform you that your traffic will be intercepted, asking you
for permission to continue. You may also need to install an additional TLS certificate to enable
TLS traffic interception. If you miss it during installation, don't worry! You can
re-install it at any time from the app settings.
We strongly recommend reading the entire tutorial shown the first time you run the app.
Learn About Your Mobile Apps
Soon after turning on Lumen, you will most likely learn
interesting facts about the apps that you run on your phone.
You can use Lumen to understand where your apps
connect to, which data they share with third parties and even how much traffic they waste
for advertising and tracking purposes so you can
decide whether to uninstall those that strike you as too intrusive.
Not all devices provide the features required by Lumen to operate.
If after a few minutes you observe that
Lumen does not identify any privacy leaks, read our FAQ
and feel free to get in touch with us.
Detailed Reports
Apps may sometimes leak information not only to their own servers but also
to online advertising networks or other online tracking services that monetize your metadata.
Lumen aims to help you understand these dynamics, which may otherwise remain unknown to you!
Lumen analyzes your mobile traffic
and generates reports about the traffic patterns and the private data collected
by each application and online service.
Illuminating App Behavior
Nearly 70% of Android apps leak personal data to third-party services such as analytics services and ad networks. The data provided by Lumen users is used to promote app and service transparency. For instance, you can play with our interactive ICSI panopticon tool to better understand the whole mobile ecosystem and how apps use third-party online trackers. You can also contribute to our research efforts by installing and running our Lumen app!
[ Papers ]
Beyond Google Play: A Large-Scale Comparative Study of Chinese Android App Markets
Haoyu Wang, Zhe Liu, Jingyue Liang, Narseo Vallina-Rodriguez, Yao Guo, Li Li, Juan Tapiador, Jingcun Cao, Guoai Xu
Proceedings of the ACM Internet Measurement
Conference (IMC), 2018
China is one of the largest Android markets in the world. As Chinese users cannot access Google Play to buy and install Android apps, a number of independent app stores have emerged and compete in the Chinese app market. Some of the Chinese app stores are pre-installed vendor-specific app markets (e.g., Huawei, Xiaomi and OPPO), whereas others are maintained by large tech companies (e.g., Baidu, Qihoo 360 and Tencent). The nature of these app stores and the content available through them vary greatly, including their trustworthiness and security guarantees.
As of today, the research community has not studied the Chinese Android ecosystem in depth. To fill this gap, we present the first large-scale comparative study that covers more than 6 million Android apps downloaded from 16 Chinese app markets and Google Play. We focus our study on catalog similarity across app stores, their features, publishing dynamics, and the prevalence of various forms of misbehavior (including the presence of fake, cloned and malicious apps). Our findings also suggest heterogeneous developer behavior across app stores, in terms of code maintenance, use of third-party services, and so forth. Overall, Chinese app markets perform substantially worse when taking active measures to protect mobile users and legit developers from deceptive and abusive actors, showing a significantly higher prevalence of malware, fake, and cloned apps than Google Play.
“Won’t Somebody Think of the Children?” Examining COPPA Compliance at Scale
Irwin Reyes, Primal Wijesekera, Joel Reardon, Amit Elazari Bar On, Abbas Razaghpanah, Narseo Vallina-Rodriguez, Serge Egelman
Proceedings on Privacy Enhancing
Technologies (PETS), 2018
We present a scalable dynamic analysis framework that allows for the automatic
evaluation of the privacy behaviors of Android apps. We use our system
to analyze mobile apps’ compliance with the Children’s
Online Privacy Protection Act (COPPA), one of the few
stringent privacy laws in the U.S. Based on our automated analysis of 5,855 of the most popular free children’s apps, we found that a majority are potentially in
violation of COPPA, mainly due to their use of third-party SDKs. While many of these SDKs offer configuration options to respect COPPA by disabling tracking
and behavioral advertising, our data suggest that a majority of apps either do not make use of these options or incorrectly propagate them across mediation SDKs.
Worse, we observed that 19% of children’s apps collect
identifiers or other personally identifiable information
(PII) via SDKs whose terms of service outright prohibit
their use in child-directed apps. Finally, we show that
efforts by Google to limit tracking through the use of a
resettable advertising ID have had little success: of the
3,454 apps that share the resettable ID with advertisers, 66% transmit other, non-resettable, persistent identifiers as well, negating any intended privacy-preserving
properties of the advertising ID.
The Cloud that Runs the Mobile Internet: A Measurement Study of Mobile Cloud Services
Foivos Michelinakis, Hossein Doroud, Abbas Razaghpanah, Andra Lutu, Narseo Vallina-Rodriguez, Phillipa Gill, Joerg Widmer
IEEE International Conference on Computer Communications (INFOCOM), 2018
Mobile applications outsource their cloud infrastructure deployment and content delivery to cloud computing services and content delivery networks. Studying how these services, which we collectively denote Cloud Service Providers (CSPs), perform over Mobile Network Operators (MNOs) is crucial to understanding some of the performance limitations of today's mobile apps. To that end, we perform the first empirical study of the complex dynamics between applications, MNOs and CSPs. First, we use real mobile app traffic traces that we gathered through a global crowdsourcing campaign to identify the most prevalent CSPs supporting today's mobile Internet. Then, we investigate how well these services interconnect with major European MNOs at a topological level, and measure their performance over European MNO networks through a month-long measurement campaign on the MONROE mobile broadband testbed. We discover that the top 6 most prevalent CSPs are used by 85% of apps, and observe significant differences in their performance across different MNOs due to the nature of their services, peering relationships with MNOs, and deployment strategies. We also find that CSP performance in MNOs is affected by inflated path length, roaming, and presence of middleboxes, but not influenced by the choice of DNS resolver.
Apps, Trackers, Privacy, and Regulators: A Global Study of the Mobile Tracking Ecosystem
Abbas Razaghpanah, Rishab Nithyanand, Narseo Vallina-Rodriguez, Srikanth Sundaresan, Mark Allman, Christian Kreibich, Phillipa Gill
Network and Distributed System Security Symposium (NDSS), 2018
Third-party services form an integral part of the mobile ecosystem: they ease application development and enable features such as analytics, social network integration, and app monetization through ads. However, aided by the general opacity of mobile systems, such services are also largely invisible to users. This has negative consequences for user privacy as third-party services can potentially track users without their consent, even across multiple applications. Using real-world mobile traffic data gathered by the Lumen Privacy Monitor (Lumen), a privacy-enhancing app with the ability to analyze network traffic on mobile devices in user space, we present insights into the mobile advertising and tracking ecosystem and its stakeholders. In this study, we develop automated methods to detect third-party advertising and tracking services at the traffic level. Using this technique we identify 2,121 such services, of which 233 were previously unknown to other popular advertising and tracking blacklists. We then uncover the business relationships between the providers of these services and characterize them by their prevalence in the mobile and Web ecosystem. Our analysis of the privacy policies of the largest advertising and tracking service providers shows that sharing harvested data with subsidiaries and third-party affiliates is the norm. Finally, we seek to identify the services likely to be most impacted by privacy regulations such as the European General Data Protection Regulation (GDPR) and ePrivacy directives.
Bug Fixes, Improvements,... and Privacy Leaks
Jingjing Ren, Martina Lindorfer, Daniel J. Dubois, Ashwin Rao, David Choffnes and Narseo Vallina-Rodriguez
Network and Distributed System Security Symposium (NDSS), 2018
Is mobile privacy getting better or worse over time? In this paper, we address this question by studying privacy leaks from historical and current versions of 512 popular Android apps, covering 7,665 app releases over 8 years of app version history. Through automated and scripted interaction with apps and analysis of the network traffic they generate on real mobile devices, we identify how privacy changes over time for individual apps and in aggregate. We find several trends that include increased collection of personally identifiable information (PII) across app versions, slow adoption of HTTPS to secure the information sent to other parties, and a large number of third parties being able to link user activity and locations across apps. Interestingly, while privacy is getting worse in aggregate, we find that the privacy risk of individual apps varies greatly over time, and a substantial fraction of apps see little change or even improvement in privacy. Given these trends, we propose metrics for quantifying privacy risk and for providing this risk assessment proactively to help users balance the risks and benefits of installing new versions of apps.
Studying TLS Usage in Android Apps
Abbas Razaghpanah, Arian Akhavan Niaki, Narseo Vallina-Rodriguez, Srikanth Sundaresan,
Johanna Amann, Phillipa Gill
ACM International Conference on emerging Networking EXperiments and Technologies (CoNEXT), 2017
Transport Layer Security (TLS) has become the de-facto standard for secure Internet communication. When used correctly, it provides secure data transfer, but used incorrectly, it can leave users vulnerable to attacks while giving them a false sense of security. Numerous efforts have studied the adoption of TLS (and its predecessor, SSL) and its use in the desktop ecosystem, attacks, and vulnerabilities in both desktop clients and servers. However, there is a dearth of knowledge of how TLS is used in mobile platforms. In this paper we use data collected by Lumen, a mobile measurement platform, to analyze how 7,258 Android apps use TLS in the wild. We analyze and fingerprint handshake messages to characterize the TLS APIs and libraries that apps use, and also evaluate weaknesses. We see that about 84% of apps use default OS APIs for TLS. Many apps use third-party TLS libraries; in some cases they are forced to do so because of restricted Android capabilities. Our analysis shows that both approaches have limitations, and that improving TLS security in mobile is not straightforward. Apps that use their own TLS configurations may have vulnerabilities due to developer inexperience, but apps that use OS defaults are vulnerable to certain attacks if the OS is out of date, even if the apps themselves are up to date. We also study certificate verification, and see low prevalence of security measures such as certificate pinning, even among high-risk apps such as those providing financial services, though we did observe major third-party tracking and advertisement services deploying certificate pinning.
Dissecting DNS Stakeholders in Mobile Networks
Alessandro Finamore, Diego Perino, Narseo Vallina-Rodriguez, Mario Almeida, Matteo Varvello
ACM International Conference on emerging Networking EXperiments and Technologies (CoNEXT), 2017
When using mobile apps, users ignite a complex set of network operations. Among all protocols and elements behind the scenes, Domain Name System (DNS) is an almost omnipresent component. Despite being one of the oldest Internet systems, DNS still operates with semi-obscure interactions among its stakeholders, i.e., domain owners, network operators, and apps/operating system (OS) developers. The goal of this work is to understand the dynamics of DNS in mobile traffic, and quantify the role of each of its stakeholders. To do so, we use two different but complementary anonymized datasets: traffic logs provided by a European mobile operator with about 19M customers, and a second one containing traffic logs from 5,000 users of Lumen, an Android traffic monitoring app. Our analysis shows that 10k domains (out of 198M) are responsible for 87% of total network flows. We complement our traffic analysis with active measurements which reveal that i) TTL values for such domains are mostly short (< 1 min) despite IP mapping changes occurring at a lower pace, and ii) DNS lookup time cost, about 10% of page load time (PLT), can potentially be reduced with optimisations, but those are rarely used in the wild.
"Is Our Children's Apps Learning?" Automatically Detecting COPPA Violations
Irwin Reyes, Primal Wijesekera, Abbas Razaghpanah, Joel Reardon, Narseo Vallina-Rodriguez, Serge Egelman and Christian Kreibich
Workshop on Technology and Consumer Protection (ConPro 2017), in conjunction with the 38th IEEE Symposium on Security and Privacy (IEEE S&P 2017), 2017
In recent years, a market of games and learning apps for children has flourished in the mobile world. Many of these often "free" mobile apps have access to a variety of sensitive personal information about the user, which app developers can monetize via advertising or other means. In the United States, the Children's Online Privacy Protection Act (COPPA) protects children's privacy, requiring parental consent to the use of personal information and prohibiting behavioral advertising and online tracking. In this work, we present our ongoing effort to develop a method to automatically evaluate mobile apps' COPPA compliance. Our method combines dynamic execution analysis (to track sensitive resource access at runtime) with traffic monitoring (to reveal private information leaving the device and recording with whom it gets shared, even if encrypted). We complement empirical technical observations with legal analysis of the apps' corresponding privacy policies. As a proof of concept, we scraped the Google Play store for apps distributed in categories specifically targeting users under 13 years of age, which subjects these products to COPPA's regulations. We automated app execution on an instrumented version of the Android OS, recording the apps' access to and transmission of sensitive information. To contextualize third parties (e.g., advertising networks) with whom the apps share information, we leveraged a crowdsourced dataset collected by the Lumen Privacy Tool (formerly Haystack), an Android-based device-local traffic inspection platform. Our effort seeks to illuminate apps' compliance with COPPA and catalog the organizations that collect sensitive user information. In our preliminary results, we find several likely COPPA violations, including omission of prior consent and active sharing of persistent identifiers with third-party services for tracking and profiling of children.
These results demonstrate our testbed's capability to detect different types of possible violations in the market for children's apps.
Tracking the Trackers: Towards Understanding the Mobile Advertising and Tracking Ecosystem
Narseo Vallina-Rodriguez, Srikanth Sundaresan, Abbas Razaghpanah, Rishab Nithyanand, Mark Allman, Christian Kreibich, Phillipa Gill
1st Data and Algorithm Transparency Workshop (DAT), 2016
Third-party services form an integral part of the mobile ecosystem: they allow app developers to add features such as performance analytics and social network integration, and to monetize their apps by enabling user tracking and targeted ad delivery.
At present, users, researchers, and regulators all have at best limited understanding of this third-party ecosystem. In this paper we seek to shrink this gap. Using data from users of our ICSI Haystack app we gain a rich view of the mobile ecosystem: we identify and characterize domains associated with mobile advertising and user tracking, thereby taking an important step towards greater transparency. We furthermore outline our steps towards a public catalog and census of analytics services, their behavior, their personal data collection processes, and their use across mobile apps.
Haystack: In Situ Mobile Traffic Analysis in User Space
Abbas Razaghpanah, Narseo Vallina-Rodriguez, Srikanth Sundaresan, Christian Kreibich,
Phillipa Gill, Mark Allman, Vern Paxson
arXiv, 2015
Despite our growing reliance on mobile phones for a wide range of daily tasks, their operation remains largely opaque. A number of previous studies have addressed elements of this problem in a partial fashion, trading off analytic comprehensiveness and deployment scale. We overcome the barriers to large-scale deployment (e.g., requiring rooted devices) and comprehensiveness of previous efforts by taking a novel approach that leverages the VPN API on mobile devices to design Haystack, an in-situ mobile measurement platform that operates exclusively on the device, providing full access to the device's network traffic and local context without requiring root access. We present the design of Haystack and its implementation in an Android app that we deploy via standard distribution channels. Using data collected from 450 users of the app, we exemplify the advantages of Haystack over the state of the art and demonstrate its seamless experience even under demanding conditions. We also demonstrate its utility to users and researchers in characterizing mobile traffic and privacy risks.
[ FAQ ]
What data do you collect for your research studies?
We care about your privacy.
The ICSI Haystack project is led by researchers at the University of California,
the International Computer Science Institute (ICSI) in Berkeley, USA and IMDEA Networks (Spain).
The Haystack project is sponsored by the
National Science Foundation (NSF)
and the Data Transparency Lab (DTL).
The goal of our project is to better
understand the mobile app ecosystem and its impact on user
security and privacy, while also helping individual users to understand which organizations
and apps collect personal information from their devices. By installing and running
Lumen, you support our research by helping us understand
how mobile apps behave and communicate with online services. For this, we thank you!
For our research efforts, we collect information about your apps' behavior,
the type of information the apps leak, and the organizations collecting this information:
WE DO NOT COLLECT ANY SENSITIVE INFORMATION ABOUT YOU OR YOUR DEVICE.
All your personal information
remains on your phone and it is not uploaded to our servers.
You can read the full
permission list on the Google Play listing (see also the question below)
and read more about our privacy policy.
We collect our data securely
over encrypted traffic to our servers; it remains completely anonymous, and conforms
with a protocol reviewed by
UC Berkeley / ICSI's Institutional Review Board (IRB).
Please do not hesitate to
contact us if you have any questions or concerns.
Why does Lumen need so many permissions?
Lumen requires several sensitive permissions in order to search for private data
in your apps' traffic. Many apps may leak your last phone calls, your text messages,
your location, and even your contacts. As a result,
Lumen requires permissions to access this information so it knows what to look for.
Please do not hesitate to get in touch with us
if you have any concerns.
If you want to know more about Lumen's technical details, please read our
paper!
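To illustrate why knowing the values matters, here is a minimal Python sketch of the general idea: once a tool can read a sensitive value from the device, it can search outgoing traffic for that value, either as-is or in a commonly hashed form. This is not Lumen's actual implementation. The function names, the sample identifiers, and the choice of hash functions are all assumptions made for this example.

```python
import hashlib


def candidate_encodings(value: str) -> dict[str, str]:
    """Return the plain value plus a few common hex digests of it.

    The set of encodings is an assumption for this sketch; real traffic
    may carry identifiers in many other forms.
    """
    raw = value.encode("utf-8")
    return {
        "plain": value,
        "md5": hashlib.md5(raw).hexdigest(),
        "sha1": hashlib.sha1(raw).hexdigest(),
        "sha256": hashlib.sha256(raw).hexdigest(),
    }


def find_leaks(payload: str, known_values: dict[str, str]) -> list[tuple[str, str]]:
    """List (label, encoding) pairs whose encoded value appears in payload."""
    haystack = payload.lower()
    hits = []
    for label, value in known_values.items():
        if not value:
            continue  # an empty string would match every payload
        for encoding, needle in candidate_encodings(value).items():
            if needle.lower() in haystack:
                hits.append((label, encoding))
    return hits


# Hypothetical values that a monitoring tool could only know
# after being granted the corresponding permissions.
KNOWN = {
    "android_id": "9774d56d682e549c",
    "phone_number": "+15555550123",
}

# Hypothetical request: one identifier in the clear, one hashed with MD5.
payload = (
    "GET /track?aid=9774d56d682e549c&h="
    + hashlib.md5(b"+15555550123").hexdigest()
)

print(find_leaks(payload, KNOWN))
```

Without permission to read the phone number in the first place, the hashed copy in the request above would look like a random string, and the leak would go unnoticed.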
How much data does Lumen take from my data plan?
Nothing at all! Lumen does not generate any traffic except to upload
the data required for our analysis, which happens only when you are
connected to WiFi. However, as it intercepts all your apps' traffic,
Android's Data Usage statistics will attribute other apps' traffic
to Lumen.
Why does Lumen not identify any leaks on my phone?
First, it could be that in fact you
have a well-secured phone that does
not leak any information.
If not that, then
it is possible that your phone does not provide the basic support
necessary to run Lumen. This could arise due to Android incompatibilities stemming from
OS versions customized by your mobile operator or your handset vendor. We are aware of incompatibilities
in some Samsung devices running Samsung's proprietary KNOX API.
Unless you know this to be the case,
it's worth restarting your phone if you encounter any problem, or checking whether Lumen works
over different types of connectivity. Also, make sure
that you have installed the Certificate Authority that Lumen requires for TLS interception,
as more and more applications use secure protocols. See the next
FAQ entry for further discussion. Please let us know if you encounter problems
so that we can investigate the issues
further, enabling us to support a wider range of devices. Your feedback is very
valuable to us!
How can I uninstall the root certificate for TLS interception?
First, some background. Some apps send privacy-sensitive data over non-encrypted channels such as HTTP.
Others use encrypted channels such as HTTPS, where the S stands for Secure.
HTTPS connections
use a protocol known as SSL/TLS to create a secure channel between the app and
the server. Accordingly, information between the app and the server cannot be "snooped" by
any intermediate entity.
In order for Lumen to gain visibility into such encrypted traffic,
you need to install Lumen's "Certificate Authority" (CA)
certificate in your phone's "trusted store". You will see a dialogue box that will ask you to install
the CA certificate.
Please select OK and install it. You can install it from the app's GUI.
Unfortunately, Android does not allow apps to
uninstall certificates, so if you wish
to remove it, you need to do so manually
through your System settings:
Settings > Security > Trusted Credentials. There,
select the User tab and delete all the installed ICSI certificates by tapping on them.
Get in touch with us!
Your feedback is highly valuable for enabling us to improve the app. What doesn't work on your phone?
Which aspects of the app don't you like? What do you think we can improve? Is the app difficult to
use or to understand?
Do you feel that using it affects your system performance? We want to hear from you! Below, you can find
our email, Google Plus, and Twitter contact details.