Your phone hosts a rich array of information about you and your activities, including a range of identifiers, location data, and even your contacts list. Oftentimes, apps collect such privacy-sensitive information and share it with third parties such as ad networks and analytics services for advertising and tracking purposes, without your consent.
The Haystack Project is an academic initiative led by independent researchers at ICSI / UC Berkeley and IMDEA Networks, in collaboration with UMass and Stony Brook University.
At the core of the project is Lumen, an Android app that analyzes your mobile traffic and helps you identify privacy leaks caused by your apps, as well as the organizations collecting this information. Lumen supports TLS interception, so you can identify apps leaking privacy-sensitive information over encrypted traffic in real time.
Be part of a research study!
Lumen comes from a research team at ICSI / UC Berkeley.
By installing Lumen, you actively contribute to ongoing research
efforts aiming to
improve the operational transparency of mobile technologies.
[ Lumen features ]
Easy to Use
Finding out how your apps behave on the network and how they extract or leak your personal information is as simple as tapping the start button and letting Lumen run!
For security purposes, Android will inform you that your traffic will be intercepted and ask you for permission to continue. You may also need to install an additional TLS certificate to enable interception of TLS traffic. If you skip this step during installation, don't worry: you can install it at any time from the app settings.
We strongly recommend reading the entire tutorial shown the first time you run the app.
Learn About Your Mobile Apps
Soon after turning on Lumen, you will start learning interesting facts about the apps you run on your phone. You can use Lumen to understand where your apps connect, which data they share with third parties, and even how much traffic they waste on advertising and tracking, so you can decide whether to uninstall those that strike you as too intrusive.
Not all devices provide the features required by Lumen to operate.
If after a few minutes you observe that
Lumen does not identify any privacy leaks, read our FAQ
and feel free to get in touch with us.
Detailed Reports
Apps may leak information not only to their own servers but also to online advertising networks and other tracking services that monetize your metadata. Lumen aims to help you understand dynamics that would otherwise remain invisible to you: it analyzes your mobile traffic and generates reports about the traffic patterns and the private data collected by each application and online service.
Illuminating App Behavior
Nearly 70% of Android apps leak personal data to third-party services such as analytics services and ad networks. The data provided by Lumen users is used to promote app and service transparency. For instance, you can play with our interactive ICSI panopticon tool to better understand the whole mobile ecosystem and how apps use third-party online trackers. You can also contribute to our research efforts by installing and running our Lumen app!
[ Papers ]
Tracking the deployment of TLS 1.3 on the web: a story of experimentation and centralization
Ralph Holz, Jens Hiller, Johanna Amann, Abbas Razaghpanah, Thomas Jost, Narseo Vallina-Rodriguez, Oliver Hohlfeld
ACM SIGCOMM Computer Communication
Review (CCR), 2020
Transport Layer Security (TLS) 1.3 is a redesign of the Web's
most important security protocol. It was standardized in August 2018 after a four year-long, unprecedented design process
involving many cryptographers and industry stakeholders.
We use the rare opportunity to track deployment, uptake,
and use of a new mission-critical security protocol from the
early design phase until well over a year after standardization.
For a profound view, we combine and analyze data from active domain scans, passive monitoring of large networks, and
a crowd-sourcing effort on Android devices. In contrast to
TLS 1.2, where adoption took more than five years and was
prompted by severe attacks on previous versions, TLS 1.3
is deployed surprisingly speedily and without security concerns calling for it. Just 15 months after standardization, it
is used in about 20% of connections we observe. Deployment
on popular domains is at 30% and at about 10% across the
com/net/org top-level domains (TLDs). We show that the
development and fast deployment of TLS 1.3 is best understood as a story of experimentation and centralization. Very
few giant, global actors drive the development. We show
that Cloudflare alone brings deployment to sizable numbers
and describe how actors like Facebook and Google use their
control over both client and server endpoints to experiment
with the protocol and ultimately deploy it at scale. This story
cannot be captured by a single dataset alone, highlighting
the need for multi-perspective studies on Internet evolution.
The Price is (Not) Right: Comparing Privacy in Free and Paid Apps
Catherine Han, Irwin Reyes, Álvaro Feal, Joel Reardon, Primal Wijesekera, Narseo Vallina-Rodriguez, Amit Elazari, Kenneth A Bamberger, Serge Egelman
Proceedings on Privacy Enhancing
Technologies (PETS), 2020
It is commonly assumed that “free” mobile apps come at the cost of consumer privacy and that paying for apps could offer consumers protection from behavioral advertising and long-term tracking. This work empirically evaluates the validity of this assumption by comparing the privacy practices of free apps and their paid premium versions, while also gauging consumer expectations surrounding free and paid apps. We use both static and dynamic analysis to examine 5,877 pairs of free Android apps and their paid counterparts for differences in data collection practices and privacy policies between pairs. To understand user expectations for paid apps, we conducted a 998-participant online survey and found that consumers expect paid apps to have better security and privacy behaviors. However, there is no clear evidence that paying for an app will actually guarantee protection from extensive data collection in practice. Given that the free version had at least one third-party library or dangerous permission, respectively, we discovered that 45% of the paid versions reused all of the same third-party libraries as their free versions, and 74% of the paid versions had all of the dangerous permissions held by the free app. Likewise, our dynamic analysis revealed that 32% of the paid apps exhibit all of the same data collection and transmission behaviors as their free counterparts. Finally, we found that 40% of apps did not have a privacy policy link in the Google Play Store and that only 3.7% of the pairs that did reflected differences between the free and paid versions.
Angel or Devil? A Privacy Study of Mobile Parental Control Apps
Álvaro Feal, Paolo Calciati, Narseo Vallina-Rodriguez, Carmela Troncoso, Alessandra Gorla
Proceedings on Privacy Enhancing
Technologies (PETS), 2020
Android parental control applications are used by parents to monitor and limit their children’s mobile behaviour (e.g., mobile apps usage, web browsing, calling, and texting). In order to offer this service, parental control apps require privileged access to system resources and access to sensitive data. This may significantly reduce the dangers associated with kids’ online activities, but it raises important privacy concerns. These concerns have so far been overlooked by organizations providing recommendations regarding the use of parental control applications to the public.
We conduct the first in-depth study of the Android parental control apps’ ecosystem from a privacy and regulatory point of view. We exhaustively study 46 apps from 43 developers which have a combined 20M installs in the Google Play Store. Using a combination of static and dynamic analysis we find that: these apps are on average more permissions-hungry than the top 150 apps in the Google Play Store, and tend to request more dangerous permissions with new releases; 11% of the apps transmit personal data in the clear; 34% of the apps gather and send personal information without appropriate consent; and 72% of the apps share data with third parties (including online advertising and analytics services) without mentioning their presence in their privacy policies. In summary, parental control applications lack transparency and lack compliance with regulatory requirements. This holds even for those applications recommended by European and other national security centers.
An Analysis of Pre-installed Android Software
Julien Gamba, Mohammed Rashed, Abbas Razaghpanah, Narseo Vallina-Rodriguez, Juan Tapiador
IEEE Symposium on Security and Privacy
(Oakland), 2020 [BEST PRACTICAL PAPER AWARD]
The open-source nature of the Android OS makes it possible for manufacturers to ship custom versions of the OS along with a set of pre-installed apps, often for product differentiation. Some device vendors have recently come under scrutiny for potentially invasive private data collection practices and other potentially harmful or unwanted behavior of the pre-installed apps on their devices. Yet, the landscape of pre-installed software in Android has largely remained unexplored, particularly in terms of the security and privacy implications of such customizations. In this paper, we present the first large-scale study of pre-installed software on Android devices from more than 200 vendors. Our work relies on a large dataset of real-world Android firmware acquired worldwide using crowd-sourcing methods. This allows us to answer questions related to the stakeholders involved in the supply chain, from device manufacturers and mobile network operators to third-party organizations like advertising and tracking services, and social network platforms. Our study allows us to also uncover relationships between these actors, which seem to revolve primarily around advertising and data-driven services. Overall, the supply chain around Android's open source model lacks transparency and has facilitated potentially harmful behaviors and backdoored access to sensitive data and services without user consent or awareness. We conclude the paper with recommendations to improve transparency, attribution, and accountability in the Android ecosystem.
Don’t accept candy from strangers: An analysis of third-party SDKs
Julien Gamba, Mohammed Rashed, Abbas Razaghpanah, Narseo Vallina-Rodriguez, Juan Tapiador
CPDP Book Series, 2020
Mobile app developers often include third-party Software Development Kits (SDKs) in their software to externalize services and features, or monetize their apps through advertisements. Unfortunately, these development practices often come at a privacy cost to the end user. In this paper, we discuss the privacy damage that third-party SDKs can cause to end users due to limitations present in today’s mobile permission models, and the overall lack of transparency in the ecosystem. We combine static, dynamic and manual analysis of the SDKs embedded in the top 50 applications in the Google Play store to develop a taxonomy of third-party libraries, and provide insights about their data collection and transparency issues. We also discuss different ways to tackle current challenges, like increasing developers’ awareness or changing the permission model of mobile phones to clearly state the purpose of permissions and to separate permissions requested by the app itself and third-party libraries, as well as mechanisms to ease certification and regulatory enforcement efforts.
50 ways to leak your data: An exploration of apps' circumvention of the Android permissions system
Joel Reardon, Álvaro Feal, Primal Wijesekera, Amit Elazari Bar On, Narseo Vallina-Rodriguez, Serge Egelman
USENIX Security Symposium, 2019
[DISTINGUISHED PAPER AWARD]
Modern smartphone platforms implement permission-based models to protect access to sensitive data and system resources. However, apps can circumvent the permission model and gain access to protected data without user consent by using both covert and side channels. Side channels present in the implementation of the permission system allow apps to access protected data and system resources without permission; whereas covert channels enable communication between two colluding apps so that one app can share its permission-protected data with another app lacking those permissions. Both pose threats to user privacy.
On the ridiculousness of notice and consent:
Contradictions in app privacy policies
Ehimare Okoyomon, Nikita Samarin, Primal Wijesekera, Amit Elazari Bar On, Narseo Vallina-Rodriguez, Irwin Reyes, Álvaro Feal, Serge Egelman
IEEE ConPro Workshop, 2019
The dominant privacy framework of the information
age relies on notions of “notice and consent.” That is, service
providers will disclose, often through privacy policies, their
data collection practices, and users can then consent to their
terms. However, it is unlikely that most users comprehend these
disclosures, which is due in no small part to ambiguous, deceptive,
and misleading statements. By comparing actual collection and
sharing practices to disclosures in privacy policies, we demonstrate the scope of the problem.
Through analysis of 68,051 apps from the Google Play Store,
their corresponding privacy policies, and observed data transmissions, we investigated the potential misrepresentations of apps
in the Designed For Families (DFF) program, inconsistencies
in disclosures regarding third-party data sharing, as well as
contradictory disclosures about secure data transmissions. We
find that of the 8,030 DFF apps (i.e., apps directed at children),
9.1% claim that their apps are not directed at children, while
30.6% claim to have no knowledge that the received data comes
from children. In addition, we observe that 10.5% of 68,051 apps
share personal identifiers with third-party service providers, yet
do not declare any in their privacy policies, and only 22.2% of the
apps explicitly name third parties. This ultimately makes it not
only difficult, but in most cases impossible, for users to establish
where their personal data is being processed. Furthermore, we
find that 9,424 apps do not use TLS when transmitting personal
identifiers, yet 28.4% of these apps claim to take measures
to secure data transfer. Ultimately, these divergences between
disclosures and actual app behaviors illustrate the ridiculousness
of the notice and consent framework.
Do you get what you pay for? Comparing the
privacy behaviors of free vs. paid apps
Catherine Han, Irwin Reyes, Amit Elazari Bar On, Joel Reardon, Álvaro Feal, Serge Egelman, Narseo Vallina-Rodriguez
IEEE ConPro Workshop, 2019
It is commonly assumed that the availability of
“free” mobile apps comes at the cost of consumer privacy, and
that paying for apps could offer consumers protection from
behavioral advertising and long-term tracking. This work empirically evaluates the validity of this assumption by investigating
the degree to which “free” apps and their paid premium versions
differ in their bundled code, their declared permissions, and their
data collection behaviors and privacy practices.
We compare pairs of free and paid apps using a combination
of static and dynamic analysis. We also examine the differences
in the privacy policies within pairs. We rely on static analysis
to determine the requested permissions and third-party SDKs
in each app; we use dynamic analysis to detect sensitive data
collected by remote services at the network traffic level; and we
compare text versions of privacy policies to identify differences in
the disclosure of data collection behaviors. In total, we analyzed
1,505 pairs of free Android apps and their paid counterparts,
with free apps randomly drawn from the Google Play Store’s
category-level top charts.
Our results show that over our corpus of free and paid pairs,
there is no clear evidence that paying for an app will guarantee
protection from extensive data collection. Specifically, 48% of the
paid versions reused all of the same third-party libraries as their
free versions, while 56% of the paid versions inherited all of
the free versions’ Android permissions to access sensitive device
resources (when considering free apps that include at least one
third-party library and request at least one Android permission).
Additionally, our dynamic analysis reveals that 38% of the paid
apps exhibit all of the same data collection and transmission
behaviors as their free counterparts. Our exploration of privacy
policies reveals that only 45% of the pairs provide a privacy
policy of some sort, and less than 1% of the pairs overall have
policies that differ between free and paid versions.
Coming of Age: A Longitudinal Study of TLS Deployment
Platon Kotzias, Abbas Razaghpanah, Johanna
Amann, Kenneth G. Paterson, Narseo Vallina-Rodriguez, and Juan Caballero
Proceedings of the ACM Internet Measurements
Conference (IMC), 2018
[DISTINGUISHED PAPER AWARD]
The Transport Layer Security (TLS) protocol is the de-facto standard
for encrypted communication on the Internet. However, it has been
plagued by a number of different attacks and security issues over
the last years. Addressing these attacks requires changes to the
protocol, to server- or client-software, or to all of them. In this
paper we conduct the first large-scale longitudinal study examining
the evolution of the TLS ecosystem over the last six years. We
place a special focus on the ecosystem’s evolution in response to
high-profile attacks.
For our analysis, we use a passive measurement dataset with
more than 319.3B connections since February 2012, and an active
dataset that contains TLS and SSL scans of the entire IPv4 address
space since August 2015. To identify the evolution of specific clients
we also create the—to our knowledge—largest TLS client fingerprint
database to date, consisting of 1,684 fingerprints.
We observe that the ecosystem has shifted significantly since
2012, with major changes in which cipher suites and TLS extensions
are offered by clients and accepted by servers having taken place.
Where possible, we correlate these with the timing of specific attacks on TLS. At the same time, our results show that while clients,
especially browsers, are quick to adopt new algorithms, they are
also slow to drop support for older ones. We also encounter significant amounts of client software that probably unwittingly offer
unsafe ciphers. We discuss these findings in the context of long tail
effects in the TLS ecosystem.
An Empirical Analysis of the Commercial VPN Ecosystem
Mohammad Taha Khan, Joe DeBlasio, Geoffrey
M. Voelker, Alex C. Snoeren, Chris Kanich, and Narseo Vallina-Rodriguez
Proceedings of the ACM Internet Measurements
Conference (IMC), 2018
Global Internet users increasingly rely on virtual private network
(VPN) services to preserve their privacy, circumvent censorship,
and access geo-filtered content. Due to their own lack of technical
sophistication and the opaque nature of VPN clients, however, the
vast majority of users have limited means to verify a given VPN
service’s claims along any of these dimensions. We design an active
measurement system to test various infrastructural and privacy aspects of VPN services and evaluate 62 commercial providers. Our
results suggest that while commercial VPN services seem, on the
whole, less likely to intercept or tamper with user traffic than other,
previously studied forms of traffic proxying, many VPNs do leak
user traffic—perhaps inadvertently—through a variety of means. We
also find that a non-trivial fraction of VPN providers transparently
proxy traffic, and many misrepresent the physical location of their
vantage points: 5–30% of the vantage points, associated with 10%
of the providers we study, appear to be hosted on servers located in
countries other than those advertised to users.
Beyond Google Play: A large-scale comparative study of Chinese Android app markets
Haoyu Wang, Zhe Liu, Jingyue Liang, Narseo Vallina-Rodriguez, Yao Guo, Li Li, Juan Tapiador, Jingcun Cao, Guoai Xu
Proceedings of the ACM Internet Measurements
Conference (IMC), 2018
China is one of the largest Android markets in the world. As Chinese users cannot access Google Play to buy and install Android apps, a number of independent app stores have emerged and compete in the Chinese app market. Some of the Chinese app stores are pre-installed vendor-specific app markets (e.g., Huawei, Xiaomi and OPPO), whereas others are maintained by large tech companies (e.g., Baidu, Qihoo 360 and Tencent). The nature of these app stores and the content available through them vary greatly, including their trustworthiness and security guarantees.
“Won’t Somebody Think of the Children?” Examining COPPA Compliance at Scale
Irwin Reyes, Primal Wijesekera, Joel Reardon, Amit Elazari Bar On, Abbas Razaghpanah, Narseo Vallina-Rodriguez, Serge Egelman
Proceedings on Privacy Enhancing
Technologies (PETS), 2018
[CASPAR BOWDEN PETS AWARD 2020]
We present a scalable dynamic analysis framework that allows for the automatic
evaluation of the privacy behaviors of Android apps. We use our system
to analyze mobile apps’ compliance with the Children’s
Online Privacy Protection Act (COPPA), one of the few
stringent privacy laws in the U.S. Based on our automated analysis of 5,855 of the most popular free children’s apps, we found that a majority are potentially in
violation of COPPA, mainly due to their use of third-party SDKs. While many of these SDKs offer configuration options to respect COPPA by disabling tracking
and behavioral advertising, our data suggest that a majority of apps either do not make use of these options or incorrectly propagate them across mediation SDKs.
Worse, we observed that 19% of children’s apps collect
identifiers or other personally identifiable information
(PII) via SDKs whose terms of service outright prohibit
their use in child-directed apps. Finally, we show that
efforts by Google to limit tracking through the use of a
resettable advertising ID have had little success: of the
3,454 apps that share the resettable ID with advertisers, 66% transmit other, non-resettable, persistent identifiers as well, negating any intended privacy-preserving
properties of the advertising ID.
The Cloud that Runs the Mobile Internet: A Measurement Study of Mobile Cloud Services
Foivos Michelinakis, Hossein Doroud, Abbas Razaghpanah, Andra Lutu, Narseo Vallina-Rodriguez, Phillipa Gill, Joerg Widmer
IEEE International Conference on Computer Communications (INFOCOM), 2018
Mobile applications outsource their cloud infrastructure deployment and content delivery to cloud computing services and content delivery networks. Studying how these services, which we collectively denote Cloud Service Providers (CSPs), perform over Mobile Network Operators (MNOs) is crucial to understanding some of the performance limitations of today's mobile apps. To that end, we perform the first empirical study of the complex dynamics between applications, MNOs and CSPs. First, we use real mobile app traffic traces that we gathered through a global crowdsourcing campaign to identify the most prevalent CSPs supporting today's mobile Internet. Then, we investigate how well these services interconnect with major European MNOs at a topological level, and measure their performance over European MNO networks through a month-long measurement campaign on the MONROE mobile broadband testbed. We discover that the top 6 most prevalent CSPs are used by 85% of apps, and observe significant differences in their performance across different MNOs due to the nature of their services, peering relationships with MNOs, and deployment strategies. We also find that CSP performance in MNOs is affected by inflated path length, roaming, and presence of middleboxes, but not influenced by the choice of DNS resolver.
Apps, Trackers, Privacy, and Regulators: A Global Study of the Mobile Tracking Ecosystem
Abbas Razaghpanah, Rishab Nithyanand, Narseo Vallina-Rodriguez, Srikanth Sundaresan, Mark Allman, Christian Kreibich, Phillipa Gill
Network and Distributed System Security Symposium (NDSS), 2018
Third-party services form an integral part of the mobile ecosystem: they ease application development and enable features such as analytics, social network integration, and app monetization through ads. However, aided by the general opacity of mobile systems, such services are also largely invisible to users. This has negative consequences for user privacy as third-party services can potentially track users without their consent, even across multiple applications. Using real-world mobile traffic data gathered by the Lumen Privacy Monitor (Lumen), a privacy-enhancing app with the ability to analyze network traffic on mobile devices in user space, we present insights into the mobile advertising and tracking ecosystem and its stakeholders. In this study, we develop automated methods to detect third-party advertising and tracking services at the traffic level. Using this technique we identify 2,121 such services, of which 233 were previously unknown to other popular advertising and tracking blacklists. We then uncover the business relationships between the providers of these services and characterize them by their prevalence in the mobile and Web ecosystem. Our analysis of the privacy policies of the largest advertising and tracking service providers shows that sharing harvested data with subsidiaries and third-party affiliates is the norm. Finally, we seek to identify the services likely to be most impacted by privacy regulations such as the European General Data Protection Regulation (GDPR) and ePrivacy directives.
Bug Fixes, Improvements,... and Privacy Leaks
Jingjing Ren, Martina Lindorfer, Daniel J. Dubois, Ashwin Rao, David Choffnes and Narseo Vallina-Rodriguez
Network and Distributed System Security Symposium (NDSS), 2018
Is mobile privacy getting better or worse over time? In this paper, we address this question by studying privacy leaks from historical and current versions of 512 popular Android apps, covering 7,665 app releases over 8 years of app version history. Through automated and scripted interaction with apps and analysis of the network traffic they generate on real mobile devices, we identify how privacy changes over time for individual apps and in aggregate. We find several trends that include increased collection of personally identifiable information (PII) across app versions, slow adoption of HTTPS to secure the information sent to other parties, and a large number of third parties being able to link user activity and locations across apps. Interestingly, while privacy is getting worse in aggregate, we find that the privacy risk of individual apps varies greatly over time, and a substantial fraction of apps see little change or even improvement in privacy. Given these trends, we propose metrics for quantifying privacy risk and for providing this risk assessment proactively to help users balance the risks and benefits of installing new versions of apps.
Studying TLS Usage in Android Apps
Abbas Razaghpanah, Arian Akhavan Niaki, Narseo Vallina-Rodriguez, Srikanth Sundaresan,
Johanna Amann, Phillipa Gill
ACM International Conference on emerging Networking EXperiments and Technologies (CoNEXT), 2017
Transport Layer Security (TLS) has become the de-facto standard for secure Internet communication. When used correctly, it provides secure data transfer, but used incorrectly, it can leave users vulnerable to attacks while giving them a false sense of security. Numerous efforts have studied the adoption of TLS (and its predecessor, SSL) and its use in the desktop ecosystem, attacks, and vulnerabilities in both desktop clients and servers. However, there is a dearth of knowledge of how TLS is used in mobile platforms. In this paper we use data collected by Lumen, a mobile measurement platform, to analyze how 7,258 Android apps use TLS in the wild. We analyze and fingerprint handshake messages to characterize the TLS APIs and libraries that apps use, and also evaluate weaknesses. We see that about 84% of apps use default OS APIs for TLS. Many apps use third-party TLS libraries; in some cases they are forced to do so because of restricted Android capabilities. Our analysis shows that both approaches have limitations, and that improving TLS security in mobile is not straightforward. Apps that use their own TLS configurations may have vulnerabilities due to developer inexperience, but apps that use OS defaults are vulnerable to certain attacks if the OS is out of date, even if the apps themselves are up to date. We also study certificate verification, and see low prevalence of security measures such as certificate pinning, even among high-risk apps such as those providing financial services, though we did observe major third-party tracking and advertisement services deploying certificate pinning.
Dissecting DNS Stakeholders in Mobile Networks
Alessandro Finamore, Diego Perino, Narseo Vallina-Rodriguez, Mario Almeida, Matteo Varvello
ACM International Conference on emerging Networking EXperiments and Technologies (CoNEXT), 2017
When using mobile apps, users ignite a complex set of network operations. Among all the protocols and elements behind the scenes, the Domain Name System (DNS) is an almost omnipresent component. Despite being one of the oldest Internet systems, DNS still operates with semi-obscure interactions among its stakeholders, i.e., domain owners, network operators, and apps/operating system (OS) developers. The goal of this work is to understand the dynamics of DNS in mobile traffic, and quantify the role of each of its stakeholders. To do so, we use two different but complementary anonymized datasets: traffic logs provided by a European mobile operator with about 19M customers, and a second one containing traffic logs from 5,000 users of Lumen, an Android traffic-monitoring app. Our analysis shows that 10k domains (out of 198M) are responsible for 87% of total network flows. We complement our traffic analysis with active measurements, which reveal that i) TTL values for such domains are mostly short (< 1 min) despite IP mapping changes occurring at a slower pace, and ii) the DNS lookup time cost, about 10% of page load time (PLT), can potentially be reduced with optimisations, but those are rarely used in the wild.
"Is Our Children's Apps Learning?" Automatically Detecting COPPA Violations
Irwin Reyes, Primal Wijesekera, Abbas Razaghpanah, Joel Reardon, Narseo Vallina-Rodriguez, Serge Egelman and Christian Kreibich
Workshop on Technology and Consumer Protection (ConPro 2017), in conjunction with the 38th IEEE Symposium on Security and Privacy (IEEE S&P 2017), 2017
In recent years, a market of games and learning apps for children has flourished in the mobile world. Many of these often "free" mobile apps have access to a variety of sensitive personal information about the user, which app developers can monetize via advertising or other means. In the United States, the Children's Online Privacy Protection Act (COPPA) protects children's privacy, requiring parental consent to the use of personal information and prohibiting behavioral advertising and online tracking. In this work, we present our ongoing effort to develop a method to automatically evaluate mobile apps' COPPA compliance. Our method combines dynamic execution analysis (to track sensitive resource access at runtime) with traffic monitoring (to reveal private information leaving the device and recording with whom it gets shared, even if encrypted). We complement empirical technical observations with legal analysis of the apps' corresponding privacy policies. As a proof of concept, we scraped the Google Play store for apps distributed in categories specifically targeting users under 13 years of age, which subjects these products to COPPA's regulations. We automated app execution on an instrumented version of the Android OS, recording the apps' access to and transmission of sensitive information. To contextualize third parties (e.g., advertising networks) with whom the apps share information, we leveraged a crowdsourced dataset collected by the Lumen Privacy Tool (formerly Haystack), an Android-based device-local traffic inspection platform. Our effort seeks to illuminate apps' compliance with COPPA and catalog the organizations that collect sensitive user information. In our preliminary results, we find several likely COPPA violations, including omission of prior consent and active sharing of persistent identifiers with third-party services for tracking and profiling of children. These results demonstrate our testbed's capability to detect different types of possible violations in the market for children's apps.
Tracking the Trackers: Towards Understanding the Mobile Advertising and Tracking Ecosystem
Narseo Vallina-Rodriguez, Srikanth Sundaresan, Abbas Razaghpanah, Rishab Nithyanand, Mark Allman, Christian Kreibich, Phillipa Gill
1st Data and Algorithm Transparency Workshop (DAT), 2016
Third-party services form an integral part of the mobile ecosystem: they allow app developers to add features such as performance analytics and social network integration, and to monetize their apps by enabling user tracking and targeted ad delivery.
At present users, researchers, and regulators all have at best limited understanding of this third-party ecosystem. In this paper we seek to shrink this gap. Using data from users of our ICSI Haystack app we gain a rich view of the mobile ecosystem: we identify and characterize domains associated with mobile advertising and user tracking, thereby taking an important step towards greater transparency. We furthermore outline our steps towards a public catalog and census of analytics services, their behavior, their personal data collection processes, and their use across mobile apps.
Haystack: In Situ Mobile Traffic Analysis in User Space
Abbas Razaghpanah, Narseo Vallina-Rodriguez, Srikanth Sundaresan, Christian Kreibich,
Phillipa Gill, Mark Allman, Vern Paxson
arXiv, 2015
Despite our growing reliance on mobile phones for a wide range of daily tasks, their operation remains largely opaque. A number of previous studies have addressed elements of this problem in a partial fashion, trading off analytic comprehensiveness and deployment scale. We overcome the barriers to large-scale deployment (e.g., requiring rooted devices) and comprehensiveness of previous efforts by taking a novel approach that leverages the VPN API on mobile devices to design Haystack, an in-situ mobile measurement platform that operates exclusively on the device, providing full access to the device's network traffic and local context without requiring root access. We present the design of Haystack and its implementation in an Android app that we deploy via standard distribution channels. Using data collected from 450 users of the app, we exemplify the advantages of Haystack over the state of the art and demonstrate its seamless experience even under demanding conditions. We also demonstrate its utility to users and researchers in characterizing mobile traffic and privacy risks.
[ FAQ ]
What data do you collect for your research studies?
We care about your privacy.
The ICSI Haystack project is led by researchers at the University of California and the International Computer Science Institute (ICSI) in Berkeley, USA, and at IMDEA Networks in Spain.
The Haystack project is sponsored by the National Science Foundation (NSF) and the Data Transparency Lab (DTL).
The goal of our project is to better understand the mobile app ecosystem and its impact on user security and privacy, while also helping individual users understand which organizations and apps collect personal information from their devices. By installing and running Lumen, you contribute to our research efforts by helping us understand how mobile apps behave and communicate with online services. Thank you!
For our research efforts, we collect information about your apps' behavior,
the type of information the apps leak, and the organizations collecting this information:
WE DO NOT COLLECT ANY SENSITIVE INFORMATION ABOUT YOU OR YOUR DEVICE.
All your personal information remains on your phone and is not uploaded to our servers. You can read the full permission list on the Google Play listing (see also the question below) and read more about our privacy policy.
We collect our data securely over encrypted connections to our servers; the data remain completely anonymous, and our collection conforms to a protocol reviewed by UC Berkeley / ICSI's Institutional Review Board (IRB).
Please do not hesitate to contact us if you have any questions or concerns.
Why does Lumen need so many permissions?
Lumen requires access to several sensitive permissions in order to search for private data in your apps' traffic. Many apps may leak your last phone calls, your text messages, your location, and even your contacts; as a result, Lumen needs permission to access this information so it knows what to look for. Please do not hesitate to get in touch with us if you have any concerns. If you want to know more about Lumen's technical details, please read our paper!
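For readers curious about what this looks like under the hood, the following is a minimal Kotlin sketch of how an Android app typically checks and requests sensitive runtime permissions. The class name, request code, and exact permission set are illustrative assumptions, not Lumen's actual code.

    // Hypothetical sketch of a sensitive-permission request flow on Android.
    // The permission set below is illustrative; Lumen's real list differs.
    import android.Manifest
    import android.content.pm.PackageManager
    import androidx.appcompat.app.AppCompatActivity
    import androidx.core.app.ActivityCompat
    import androidx.core.content.ContextCompat

    class PermissionSetupActivity : AppCompatActivity() {

        // Data types a traffic analyzer would need to recognise in outgoing flows.
        private val sensitivePermissions = arrayOf(
            Manifest.permission.READ_CONTACTS,
            Manifest.permission.READ_PHONE_STATE,
            Manifest.permission.ACCESS_FINE_LOCATION
        )

        fun requestMissingPermissions() {
            val missing = sensitivePermissions.filter {
                ContextCompat.checkSelfPermission(this, it) != PackageManager.PERMISSION_GRANTED
            }
            if (missing.isNotEmpty()) {
                // Android shows its standard consent dialog for each permission group.
                ActivityCompat.requestPermissions(this, missing.toTypedArray(), REQUEST_CODE)
            }
        }

        companion object {
            private const val REQUEST_CODE = 42 // arbitrary request code for the result callback
        }
    }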
How much data does Lumen take from my data plan?
Nothing at all! Lumen does not generate any traffic except for uploading the data required for our analysis, which happens only when you are connected to WiFi. However, because Lumen intercepts all of your apps' traffic, Android's Data Usage statistics will attribute other apps' traffic to Lumen.
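To illustrate why Android attributes that traffic to Lumen, here is a minimal Kotlin sketch, assuming the standard Android VpnService API, of how an on-device traffic monitor routes every app's packets through its own virtual interface. The class name and addresses are placeholders, not Lumen's actual implementation.

    // Sketch of a user-space traffic interceptor built on Android's VpnService.
    // All routed traffic appears to originate from this app, which is why the
    // system's Data Usage screen charges other apps' traffic to the monitor.
    import android.content.Intent
    import android.net.VpnService
    import android.os.ParcelFileDescriptor

    class TrafficMonitorService : VpnService() {

        private var tunInterface: ParcelFileDescriptor? = null

        override fun onStartCommand(intent: Intent?, flags: Int, startId: Int): Int {
            // The user must first approve the VPN via VpnService.prepare() in an Activity.
            tunInterface = Builder()
                .setSession("TrafficMonitor")
                .addAddress("10.0.0.2", 32)   // virtual address of the TUN interface
                .addRoute("0.0.0.0", 0)       // send all IPv4 traffic through it
                .establish()
            // A real implementation would now read packets from tunInterface,
            // inspect them, and forward them to their destinations.
            return START_STICKY
        }

        override fun onDestroy() {
            tunInterface?.close()
            super.onDestroy()
        }
    }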
Why doesn't Lumen identify any leaks on my phone?
First, it could be that in fact you
have a well-secured phone that does
not leak any information.
If not that, then
it is possible that your phone does not provide the basic support
necessary to run Lumen. This could arise due to Android incompatibilities stemming from
OS versions customized by your mobile operator or your handset vendor. We are aware of incompatibilities
in some Samsung devices running Samsung's proprietary KNOX API.
Unless you know this to be the case, it is worth restarting your phone if you encounter any problem, or checking whether Lumen works on different types of connectivity (e.g., WiFi vs. mobile data). Also, make sure that you have installed the Certificate Authority that Lumen requires for TLS interception, as more and more applications use secure protocols; see the next FAQ entry for further discussion. Please let us know if you encounter problems so that we can investigate the issues further, enabling us to support a wider range of devices. Your feedback is very valuable to us!
How can I uninstall the root certificate for TLS interception?
First, some background. Some apps send privacy-sensitive data over non-encrypted channels such as HTTP.
Others use encrypted channels such as HTTPS, for which the S stands for Secure.
HTTPS connections
use a protocol known as SSL/TLS to create a secure channel between the app and
the server. Accordingly, information between the app and the server cannot be "snooped" by
any intermediate entity.
In order for Lumen to gain visibility into such encrypted traffic, you need to install Lumen's "Certificate Authority" (CA) certificate in your phone's "trusted store". You will see a dialog box asking you to install the CA certificate; please select OK and install it. You can also install it at any time from the app's GUI.
Unfortunately, Android does not allow apps to uninstall certificates, so if you wish to do so, you need to do it manually through your system settings: Settings > Security > Trusted Credentials. There, select the User tab and delete the installed ICSI certificates by tapping on them.
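As background on how that installation dialog is typically triggered, here is a hypothetical Kotlin sketch using Android's KeyChain API, which worked for user-installed CA certificates on older Android versions; the certificate bytes and display name are placeholders rather than Lumen's actual values.

    // Sketch: prompting the user to add a CA certificate to the "User" trusted store.
    import android.app.Activity
    import android.security.KeyChain

    fun promptCaInstall(activity: Activity, caCertDer: ByteArray) {
        val installIntent = KeyChain.createInstallIntent().apply {
            putExtra(KeyChain.EXTRA_CERTIFICATE, caCertDer) // DER-encoded X.509 certificate
            putExtra(KeyChain.EXTRA_NAME, "Lumen CA")       // name shown in Android's dialog
        }
        // Android displays its own confirmation screen before installing anything.
        activity.startActivity(installIntent)
    }

Once installed this way, the certificate appears under the User tab mentioned above and can be removed from there.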
Get in touch with us!
Your feedback is invaluable in helping us improve the app. What doesn't work on your phone?
Which aspects of the app don't you like? What do you think we can improve? Is the app difficult to
use or to understand?
Do you feel that using it affects your system performance? We want to hear from you! Below, you can find
our email, Google Plus, and Twitter contact details.