Mozilla Rally: trading privacy for the "public good"
A new project from Mozilla, which is meant to help researchers collect browsing data, but only with the informed consent of the browser user, is taking a lot of heat, perhaps in part because the company can never seem to do anything right, at least in the eyes of some. Mozilla Rally was announced on June 25 as a joint venture between the company and researchers at Princeton University "to enable crowdsourced science for public good". The idea is that users can volunteer to give academic studies access to the same kinds of browser data that are being tracked in some browsers today. Whether the privacy safeguards are strong enough, and whether there is sufficient reason for users to sign up, remains to be seen.
Studies
The underlying theme of being able to control who is able to access your data, coupled with using it for the "public good", may well resonate with some people. The initial study that is available for participation by Rally users is "Political and COVID-19 News". It is being run by Princeton University’s Center for Information Technology Policy under the auspices of professor Jonathan Mayer, who also helped develop the Rally platform. The goals of the study are interesting and any conclusions that it draws could potentially be quite helpful for fighting the problems of misinformation on the net:
This study will help us understand how web users encounter, consume, and share news online about politics and COVID-19. There are a variety of sources for information on these topics: some authoritative and trustworthy, and some not. We hope the study can inform efforts to help users distinguish trustworthy and untrustworthy content.
As might be expected for a Mozilla project, Rally is integrated with the Firefox browser; it is an optional add-on that can easily be installed from a button on the Rally home page. Doing so brings up a list of various permissions the add-on needs in order to function, which can be reviewed in the "about:addons" page after it is installed. There is also an extensive privacy policy that must be agreed to; it outlines the kinds of data that can be collected (which includes things like geographic location, demographic information, and the hardware/software platform), how it can be used, and who it can be shared with. The policy outlines the big picture, while each individual study will further narrow down its data needs and plans.
Rally is only available to Firefox users in the US who are at least 19 years of age. Neither the age nor the location requirement is enforced in any obvious way; no problem was encountered installing the add-on from a non-US location. There is a page that asks for demographic information (age, gender, race, education, income, and US zip code), but it is optional. It may be that certain studies will not use any data shared without corresponding demographic information, however.
The announcement talks about academic studies, mentioning the COVID-19 study and an upcoming "Beyond the Paywall" study in conjunction with the Stanford University Graduate School of Business. The latter sounds like it could also be useful, especially to newspapers, magazines, and internet news outlets:
It aims to better understand news consumption, what people value in news and the economics that could build a more sustainable ecosystem for newspapers in the online marketplace.
Beyond those two academic studies, the Rally team has its own "Your Time Online and 'Doomscrolling'" study that is currently running. It is meant to better understand "how our community browses the internet, and how these browsing dynamics differ across segments of people". The description of the kinds of information that will be gathered, and of what might be done with it, is particularly concerning from a privacy perspective, however. In truth, those who are highly privacy-conscious are likely to find much that is worrisome in the descriptions of both of the current studies.
The study descriptions do try to allay some of the fears that potential volunteers might have, though. The "How We Protect You" section in the information pages of the studies is meant to clarify what is being done with the data and the protections being placed on it. The "How Rally Works" page is also geared toward reassuring the privacy-conscious user. By the sounds, Rally is taking extraordinary care; it is only collecting what it needs (and has specified), is encrypting the data from the browser all the way to an offline analysis environment, and is limiting access to the data to only those who are working on the study.
On the other hand, the Rally add-on can run in private-browsing windows, which was not apparent when it was being installed. The two current studies explicitly state that they will not collect data from private-browsing sessions, which leaves open the possibility that others will collect that data down the road. That may also be of concern to the privacy-conscious, though, of course, anyone using the private-browsing feature is pretty obviously conscious of their privacy to some extent.
In general, though, Rally is protecting its data far more carefully than the advertising networks and other user-tracking organizations are doing with their data. Many who have commented about the project, here and at other sites, seem to mistrust Mozilla's commitment to privacy and some see Rally as a mechanism for the company to generate income. Research organizations might be willing to pay for the privilege of using the platform, but it does not seem likely to be a huge income stream, especially once Rally rolls out on other browsers (as is mentioned in the FAQs and elsewhere).
Filling in details
One question might be: why would anyone want to sign up? Those who are privacy-conscious may well not be interested in allowing any access to their data, while those who are not seem rather unlikely to go out of their way to install the add-on, if they are even using Firefox at all. It essentially comes down to the whole "research for the public good" theme: whether people will care enough about it to forgo some of their privacy, and whether enough of them, whatever their privacy inclinations, will be willing to install an add-on to foster it.
Over at Hacker News, Mayer has been commenting to try to answer some of the questions and concerns posted in a thread about the announcement. The "why?" question came up there, and Mayer tried to clarify what he and others are after:
The motivation is enabling crowdsourced scientific research that benefits society. [...] There are many research questions at the intersection of technology and society where conventional methods like web crawls, surveys, and social media feeds aren't sufficient. That's especially true for platform accountability research; the major platforms have generally refused to facilitate independent research that might identify problems, and platform problems often involve targeting and personalization that other methods can't meaningfully examine.
One thing in particular that might be of interest to potential volunteers is this area of "platform accountability". The large social-media platforms have often come under scrutiny for their behavior, and its effect on users, but independent researchers have no good way to gather data on that except from within the browsers of users of those sites. Like many other commenters, "Yaina" lamented that the announcement did not specify the problem being solved very well. Yaina noted that the big internet companies can already do these kinds of studies, but that others are left out:
This is a luxury many researchers that work outside of these big tech companies don't have, which creates a scientific power imbalance. Mozilla Rally is meant to give these capabilities to everyone, and the platform is meant to ensure that you always know what you sign up for and what data is being used.
If I understand the Princeton example correctly: They want to figure out how people consume and spread misinformation. Social networks like Facebook have all that data but won't share it. Now you can opt-in to a Rally study where independent researchers can examine the data.
Mayer largely agreed with that characterization, though the imbalance is more far-reaching:
The power imbalance goes far beyond science. Independent research is foundational for platform accountability. An example: when I was working on the Senate staff, before I started teaching at Princeton, a recurring challenge was the lack of rigorous independent research on platform problems. We were mostly compelled to rely on anecdotes, which made oversight and building a factual record for legislation difficult.
There is, of course, something of a self-selection bias at work among Rally users. If all of the participants have to know about the project, believe in its goals, and be willing to donate their data even though it reduces their privacy to a certain extent, they may well not reflect a cross-section of the browser-using public. Mayer addressed that issue as well:
The Rally participant population is not representative of the U.S. population—these are users who run Firefox (other browsers coming soon), choose to join Rally, and choose to join a study. In research jargon, there's significant sampling bias.
For some studies, that's OK, because the research doesn't depend on a representative sample. For other studies, researchers can approximate U.S. population demographics. When a user joins Rally, they can optionally provide demographic information. Researchers can then use the demographics with reweighting, matching, subsampling, and similar methods to approximate a representative population. Those methods already appear throughout social science; whether they're sufficient also depends on the study.
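The reweighting that Mayer mentions can be illustrated with a minimal post-stratification sketch in Python. Everything in it is an assumption made for illustration: the age brackets, the population shares, and the per-participant minutes are invented, and the function names are hypothetical rather than part of any Rally tooling or of any study's actual analysis.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Return one weight per participant so that the weighted share of each
    demographic group in the sample matches its share of the population.

    sample_groups: list with the group label of each participant.
    population_shares: dict mapping group label -> population fraction.
    """
    counts = Counter(sample_groups)
    total = len(sample_groups)
    weights = []
    for group in sample_groups:
        sample_share = counts[group] / total
        # Over-represented groups get weights below 1, under-represented
        # groups get weights above 1.
        weights.append(population_shares[group] / sample_share)
    return weights


def weighted_mean(values, weights):
    """Mean of values where each value counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Invented example: a volunteer sample that skews young.
sample_groups = ["19-34"] * 6 + ["35-64"] * 3 + ["65+"] * 1
# Invented population shares (must cover every group in the sample).
population_shares = {"19-34": 0.3, "35-64": 0.5, "65+": 0.2}
# Invented measurement: minutes of news reading per participant.
minutes = [60] * 6 + [30] * 3 + [10] * 1

weights = poststratification_weights(sample_groups, population_shares)
raw_mean = sum(minutes) / len(minutes)
adjusted_mean = weighted_mean(minutes, weights)

print(f"unweighted mean: {raw_mean:.1f}")
print(f"reweighted mean: {adjusted_mean:.1f}")
```

With these made-up numbers, the young volunteers dominate the unweighted average; reweighting pulls the estimate toward what a sample with the population's age mix would show. It only corrects for the attributes that participants chose to report, which is one reason that, as Mayer says, whether such methods are sufficient depends on the study.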
Part of the difficulty in the messaging around a project like Rally is that it has many moving parts, and different kinds of users will need different points emphasized before the project really makes sense to them. It is a project that sits at a particularly uncomfortable intersection of concerns, or the lack thereof. The lack of any real tangible benefit from joining up is problematic as well. "For the good of society" has a nice ring to it, but it is terribly difficult to quantify.
If Mozilla were a different kind of company, one could imagine it gathering this kind of information without any kind of uproar from the social-network-using folks who seem utterly unconcerned with the massive privacy invasions those kinds of sites routinely perform. But Mozilla is not that kind of organization, so it needs to convince those who do not really seem to care about privacy much to care enough to install the add-on, while not excessively irritating the more tech-savvy users who get up in arms about even the smallest loss of private data. It is a hard balance to find.
Given all that, it is a little hard to see Rally being a huge success. There are certainly perfectly reasonable concerns about gathering this kind of data, storing it, dealing with governments that want access to it, and so on. The privacy-savvy may well skip over Rally for its real or perceived shortcomings, while the vast majority of folks may either never hear of it or pay it no attention whatsoever. That is somewhat sad, perhaps, at least to those who can see value in the kinds of studies (and platform oversight) that Rally data-gathering would enable. It will be interesting to see what comes of it.