The Externalities of Search 2.0: The Emerging Privacy Threats when the Drive for the Perfect Search Engine meets Web 2.0

by Michael Zimmer

First Monday, Volume 13, Number 3 – 3 March 2008

Web search engines have emerged as ubiquitous and vital tools for the successful navigation of the growing online informational sphere. As Google puts it, the goal is to “organize the world’s information and make it universally accessible and useful” and to create the “perfect search engine” that provides only intuitive, personalized, and relevant results. Meanwhile, the so–called Web 2.0 phenomenon has blossomed based, largely, on the faith in the power of the networked masses to capture, process, and mash up one’s personal information flows in order to make them more useful, social, and meaningful. The (inevitable) combining of Google’s suite of information–seeking products with Web 2.0 infrastructures – what I call Search 2.0 – intends to capture the best of both technical systems for the touted benefit of users. By capturing the information flowing across Web 2.0, search engines can better predict users’ needs and wants, and deliver more relevant and meaningful results. While intended to enhance mobility in the online sphere, this paper argues that the drive for Search 2.0 necessarily requires the widespread monitoring and aggregation of users’ online personal and intellectual activities, bringing with it particular externalities, such as threats to informational privacy while online.


Contents

The Drive for the Perfect Search Engine
Web 2.0 and Personal Information Flows
Search 2.0: The Perfect Search Engine Meets Web 2.0
Externalities of Search 2.0
Potential Effects of Search 2.0




The rhetoric surrounding Web 2.0 infrastructures presents certain cultural claims about media, identity, and technology. It suggests that everyone can and should use new Internet technologies to organize and share information, to interact within communities, and to express oneself. It promises to empower creativity, to democratize media production, and to celebrate the individual while also relishing the power of collaboration and social networks. Web sites such as Flickr, Wikipedia, del.icio.us, MySpace, and YouTube are all part of this second–generation Internet phenomenon, which has spurred a variety of new services and communities – and venture capitalist dollars. But Web 2.0 also embodies a set of unintended consequences emerging from the resultant blurring of the boundaries between Web users and producers, consumption and participation, authority and amateurism, play and work, data and the network, reality and virtuality.

The focus of this article is the unintended consequence of the increased flow of personal information across Web 2.0 infrastructures, and in particular, the efforts by Web search engines to crawl and aggregate this data in order to build profiles, predict intentions, and deliver personalized products and services. This drive for the perfect search engine through the capture of personal information flowing across these networks – the quest for Search 2.0 – brings with it particular externalities, such as threats to the privacy of individuals’ online intellectual activities. This article argues that the externalities of Search 2.0 represent a new and powerful infrastructure of data surveillance – otherwise referred to as “dataveillance” (Clarke, 1988) – for the aggregation of one’s online information–seeking activities, inflaming a growing environment of discipline and social control.

This article is divided into five sections [1]. The first section describes the quest for the “perfect search engine,” with its requisite components of “perfect reach” and “perfect recall.” The next section introduces various quintessential Web 2.0 applications and how they are increasingly being incorporated by search engines – either through indexing or by integrating the applications themselves – to fuel the perfect search engine, resulting in what I call Search 2.0. The third section reveals two key externalities of Search 2.0, which lead to the potential effects of Search 2.0 outlined in the fourth section. Finally, the article outlines possible spaces for intervention, including the value–conscious design of future Search 2.0 platforms in order to mitigate these externalities.



The Drive for the Perfect Search Engine

Since the first search engines started to provide a way of interfacing with the content on the Web, there has been a drive for the “perfect search engine,” one that has indexed all available information and provides fast and relevant results (see Kushmerick, 1998; Andrews, 1999; Gussow, 1999; Mostafa, 2005). A perfect search engine would deliver intuitive results based on users’ past searches and general browsing history (Pitkow, et al., 2002; Teevan, et al., 2005), knowing, for example, whether a search for the keywords “Washington” and “apple” is meant to help a user locate Apple Computer stores in Washington, D.C. or nutritional information about the Washington variety of the fruit. Search engine companies have clear financial incentives for achieving the “perfect search”: receiving personalized search results might contribute to a user’s allegiance to a particular search engine service, increasing exposure to that site’s advertising partners as well as improving chances the user would use fee–based services. Similarly, search engines can charge higher advertising rates when ads are accurately placed before the eyes of users with relevant needs and interests (i.e., someone shopping for computers rather than fruit) (Hansell, 2005).

Web journalist John Battelle summarizes how such a perfect search engine might work:

Imagine the ability to ask any question and get not just an accurate answer, but your perfect answer – an answer that suits the context and intent of your question, an answer that is informed by who you are and why you might be asking. The engine providing this answer is capable of incorporating all the world’s knowledge to the task at hand – be it captured in text, video, or audio. It’s capable of discerning between straightforward requests – who was the third president of the United States? – and more nuanced ones – under what circumstances did the third president of the United States foreswear his views on slavery?

This perfect search also has perfect recall – it knows what you’ve seen, and can discern between a journey of discovery – where you want to find something new – and recovery – where you want to find something you’ve seen before. (Battelle, 2004)

To attain such an omnipresent and omniscient ideal, search engines must have both “perfect reach” in order to provide access to all available information on the Web and “perfect recall” in order to deliver personalized and relevant results that are informed by who the searcher is.

Perfect Reach

To achieve the reach necessary for the realization of Search 2.0, Web search engines amass enormous indices of the Web’s content. Expanding beyond just HTML–based Web pages, search engine providers have indexed a wide variety of media found on the Web, including images, video files, PDFs, and other computer documents. For example, in 2005 Yahoo! claimed to have indexed over 20 billion items, including over 19.2 billion Web documents, 1.6 billion images, and over 50 million audio and video files (Mayer, 2005). The increasing sophistication and reach of Web crawler and indexing technology provide search engine companies the means to obtain an increasingly perfect reach, indexing an incredible diversity of content types available on the Internet and World Wide Web. In addition to expansive and diverse searchable indexes, today’s search engines also pursue the “perfect reach” by developing various tools and services to help users organize and use information in contexts not considered traditional Web searching. These include communication and social networking platforms, personal data management, financial data management, shopping and product research, computer file management, and enhanced Internet browsing.

Combining these two aspects of the perfect reach – expansive searchable indexes and diverse information organization products – the perfect search engine empowers users to search, find, and relate to nearly all forms of information they need in their everyday lives. The reach of the perfect search engine allows users to search and access nearly all content on the Web, and also enables them to communicate, navigate, shop, and organize their lives, both online and off.

Perfect Recall

Complementing the perfect reach of the perfect search engine is the desire of search engine providers to obtain perfect recall of each individual searcher, allowing the personalization of both services and advertising. To achieve this perfect recall, Web search engines must be able to identify and understand searchers’ intellectual wants, needs, and desires when they perform information–seeking tasks online. In order to discern the context and intent of a search for “Washington apple,” for example, the perfect search engine would know whether the searcher has shown interest in computer products and lives in the Washington, D.C. area, or whether she spends time online searching for recipes and various food items.

The primary means for search engines to obtain perfect recall is to monitor and track users’ search habits and history (see, for example, Pitkow, et al., 2002; Speretta, 2004; Teevan, et al., 2005). To gather users’ search histories, most Web search engines maintain detailed server logs recording each Web search request processed through their search engine, the pages viewed, and the results clicked (see, for example, Google, 2005a; IAC Search & Media, 2005; Yahoo!, 2006). Google, for example, records the originating IP address, cookie ID, date and time, search terms, and results clicked for each of the 100 million search requests processed daily (Google, 2005b).
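
To make the shape of such logs concrete, the following is a minimal sketch of how a single search request might be recorded. The field names and the JSON–lines storage format are illustrative assumptions, not any provider’s actual schema:

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

@dataclass
class SearchLogRecord:
    """One entry in a hypothetical search engine query log."""
    ip_address: str       # originating IP address of the request
    cookie_id: str        # unique ID stored in the browser's cookie
    timestamp: str        # date and time the query was received
    query: str            # the search terms submitted
    clicked_result: str   # URL of the result the user clicked, if any

record = SearchLogRecord(
    ip_address="203.0.113.42",
    cookie_id="a1b2c3d4e5f6",
    timestamp=datetime.now(timezone.utc).isoformat(),
    query="washington apple",
    clicked_result="http://example.com/washington-apples",
)

# Append the record to the server log, one JSON object per line.
with open("query_log.jsonl", "a") as log:
    log.write(json.dumps(asdict(record)) + "\n")
```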

Logging this array of data enhances a search engine’s ability to reconstruct a particular user’s search activities in support of obtaining perfect recall. For example, by cross–referencing the IP address of each request sent to the server along with the particular page being requested and other server log data, it is possible to determine which pages, and in which sequence, a particular IP address has visited. When asked, “Given a list of search terms, can Google produce a list of people who searched for that term, identified by IP address and/or Google cookie value?” and “Given an IP address or Google cookie value, can Google produce a list of the terms searched by the user of that IP address or cookie value?”, Google responded in the affirmative to both questions, confirming the general ability of search providers to track a particular user’s (or, at least, a particular browser’s or IP address’s) activity through such logs (Battelle, 2006a; 2006b).
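
In computational terms, the two capabilities Google confirmed amount to simple filtering and grouping over such logs. A minimal sketch, reusing the hypothetical SearchLogRecord fields from the example above:

```python
def users_who_searched(records, term):
    """First question: given a search term, return the IP address and
    cookie pairs that submitted queries containing it."""
    return {(r.ip_address, r.cookie_id)
            for r in records if term.lower() in r.query.lower()}

def history_of(records, cookie_id):
    """Second question: given a cookie value, reconstruct that browser's
    queries and clicks in chronological order (ISO timestamps sort
    lexicographically)."""
    matched = [r for r in records if r.cookie_id == cookie_id]
    return [(r.query, r.clicked_result)
            for r in sorted(matched, key=lambda r: r.timestamp)]
```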

The practice of collecting and retaining search query data in support of attaining “perfect recall” has not escaped controversy. In January 2006, it was revealed that, as part of the government’s effort to uphold an online pornography law, the U.S. Department of Justice had asked a federal judge to compel the Web search engine Google to turn over records on millions of its users’ search queries (Hafner and Richtel, 2006; Mintz, 2006). Google resisted, but three of its competitors, America Online (AOL), Microsoft, and Yahoo!, complied with similar government subpoenas of their search records (Hafner and Richtel, 2006). Later that year, AOL released over 20 million search queries from 658,000 of its users to the public in an attempt to support academic research on search engine query analysis (Hansell, 2006). Despite AOL’s attempts to anonymize the data, individual users remained identifiable based solely on their search histories, which included search terms matching users’ names, social security numbers, addresses, phone numbers, and other personally identifiable information (McCullagh, 2006a).
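
The AOL release illustrates a general failure mode: replacing names and IP addresses with arbitrary numbers does not anonymize records whose content is itself identifying. A toy reconstruction of the problem – the sample queries loosely paraphrase those reported by Barbaro and Zeller (2006), and the second user ID is invented:

```python
from collections import defaultdict

# An "anonymized" release: user identifiers are arbitrary numbers,
# but every query from the same user carries the same number.
released = [
    (4417749, "landscapers in lilburn ga"),
    (4417749, "homes sold in shadow lake subdivision gwinnett county georgia"),
    (4417749, "arnold"),              # repeated searches for one surname
    (1234567, "how to make pancakes"),
]

# Grouping by pseudonym reassembles each user's full search history ...
dossiers = defaultdict(list)
for user_id, query in released:
    dossiers[user_id].append(query)

# ... at which point the content of the queries -- a town, a
# subdivision, a surname -- does the identifying, not the identifier.
print(dossiers[4417749])
```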

These cases brought search query retention practices into a more public light, creating anxiety among many searchers about the presence of such systematic monitoring of their online information–seeking activities (Barbaro and Zeller, 2006; Hansell, 2006; McCullagh, 2006a), and leading news organizations to investigate and report on the information search engines routinely collect from their users (Glasner, 2005; Ackerman, 2006). In turn, various advocacy groups have criticized the extent to which Web search engines are able to track and collect search queries, often with little knowledge by the users themselves (see, for example, Electronic Frontier Foundation, 2007; Privacy International, 2007), while both European and U.S. government regulators have started to investigate search engine query retention practices and policies (Associated Press, 2007; Lohr, 2007).

Yet, while public attention has recently focused on the industry practice of archiving users’ Web search queries in server logs, less attention has been paid to how search engine providers are able to monitor and aggregate activity across their growing array of products and services. Most notably, search companies like Google and Yahoo! have taken great steps to add the latest trend of Web services to their information infrastructures: Web 2.0.



Web 2.0 and Personal Information Flows

In 2004, Tim O’Reilly and Dale Dougherty of O’Reilly Media (a company known for its information technology–related books and conferences) sought to describe the common features of various Web companies that survived the “dot–com burst” of the late 1990s (O’Reilly, 2005). The companies – and their services and technologies – that survived, they argued, all had certain characteristics in common: they were collaborative, interactive, dynamic, user–centered, network–based, and data–rich. To describe this emerging trend in Web technologies and services, they coined the term “Web 2.0,” a concept that has been hailed as the “new wisdom of the Web” (Levy and Stone, 2006) and “a new cultural force based on mass collaboration” (Kelly, 2005).

While Web 2.0 has not been universally embraced – some deride it as merely a hyped–up buzzword (Boutin, 2006), “millennialist rhetoric” (Carr, 2006), and even an extension of Marxist ideology that is “inherently dangerous for the vitality of culture and the arts” (Keen, 2006) – the concept does seem to encapsulate the growing trend of user–generated and user–driven Web technologies. Popular Web sites such as Flickr, Wikipedia, del.icio.us, Facebook, and YouTube are all part of this second–generation Internet phenomenon, featuring user–generated content, opportunities for collaboration and harnessing collective intelligence, and relatively open platforms for anyone to participate in, modify (mash up), or share content (via RSS feeds, APIs, and the like).

Much of Web 2.0 is based upon – indeed, built upon – increased personal information flows online. Inherent in Web 2.0 evangelism is an overall faith in the logic of the networked masses as the vehicle that provides meaning to one’s otherwise solitary existence – the idea that users should give up their information to the Web and allow various services, APIs, and communities to capture, process, and mash up their information flows to make them more useful, more social, and more meaningful. For example, users of Web 2.0 are encouraged to put as much of their lives as possible online: to divulge and share their personal lives through blogs or on LiveJournal, to chronicle their professional development on LinkedIn, to share bookmarks of favorite Web sites on del.icio.us, to upload the music they listen to on Last.fm, to detail their friendships on Facebook and MySpace, to share their appointments and social events on UpComing, to note where they are traveling on Dopplr, and to record where they’ve connected to wi–fi on Plazes, just to name a few.

The prevalence of open flows of personal information on and across Web 2.0 platforms has prompted both general concerns over user privacy (see, for example, Barnes, 2006; George, 2006; Harris, 2006; Solove, 2007), as well as explorations into whether expectations of privacy online are shifting towards acceptance of – or at least ambivalence about – the sharing of personal information in these contexts, especially among younger users (see, for example, Lenhart and Madden, 2007; Nussbaum, 2007). Often missing from these vital investigations and debates, however, is recognition of the growing integration of Web 2.0 platforms – and the personal information flows they contain – with the power of Web search engines: the emergence of Search 2.0.



Search 2.0: The Perfect Search Engine Meets Web 2.0

In their pursuit of the perfect search engine, search providers have increasingly capitalized on the growing Web 2.0 infrastructure to complement both the reach of the search engine’s indexes and the user information fueling their perfect recall. Enhancing their perfect reach, many search engines incorporate the information flows from Web 2.0 applications directly into their searchable indexes. For example, a Google search for an individual’s name routinely returns Facebook and LinkedIn profile pages, and even the minute and often personal details shared with friends through the Web 2.0 service Twitter. Taking Search 2.0 one step further, Yahoo!, through the purchase of Web 2.0 properties like Flickr and del.icio.us, has integrated user–generated photos and folksonomies of bookmarks directly into its search engine results (Yahoo!, 2007; Sullivan, 2008).

Yahoo!’s purchase and integration of these two popular Web 2.0 services also contributes to its ability to attain the perfect recall necessary for the perfect search engine. Recalling that search providers typically track user activity in order to personalize results and target advertising, adding various Web 2.0 technologies to their suites of products allows search providers to amass even more detailed records of user actions and interests. By requiring users to create Yahoo! accounts to use Web 2.0 services such as Flickr or UpComing, Yahoo! can add user data about their photos and social events, respectively, to its vast search history logs. Similarly, by linking Web 2.0 products, such as Orkut, Dodgeball, Picasa, and YouTube, to traditional Google Accounts (see Weinberg, 2005), Google can amass much more detailed and personal information about users of these services, including their personal interests (Orkut), the places they visit (Dodgeball), the photos they share (Picasa), and the videos they enjoy (YouTube). In short, Search 2.0 empowers search providers to capture the personal information flows inherent in Web 2.0 applications and link them to users’ other search activities, resulting in the ability to amass detailed and comprehensive records of users’ online activities.
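
Mechanically, this linkage is a join on a shared account identifier. The following sketch – with invented service names and record fields – shows how data scattered across separately acquired Web 2.0 services collapses into a single dossier once those services share a login:

```python
def aggregate_profile(account_id, services):
    """Join per-service records on a shared account ID -- the unified
    dossier a single provider can assemble once logins are linked."""
    profile = {"account_id": account_id}
    for name, records in services.items():
        profile[name] = [r for r in records if r["account"] == account_id]
    return profile

# Invented records from separately branded services, all keyed to the
# same provider-wide account.
services = {
    "search_history": [{"account": "u42", "query": "washington apple"}],
    "photos":         [{"account": "u42", "tags": ["family", "vacation"]}],
    "events":         [{"account": "u42", "event": "concert in chicago"}],
    "bookmarks":      [{"account": "u42", "url": "http://example.com",
                        "tags": ["job hunting"]}],
}

profile = aggregate_profile("u42", services)   # one dossier, four services
```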



Externalities of Search 2.0

In their effort to achieve the perfect search engine, search providers such as Google and Yahoo! have captured within their searchable indexes many of the personal information flows inherent in the new Web 2.0 infrastructures, and have integrated Web 2.0 platforms directly into their suites of products. The result is Search 2.0, a powerful Web search information infrastructure that promises to provide more extensive and relevant search results and information management services to users. But not without a price. Inherent in the Search 2.0 infrastructure are two key externalities: one, the deterioration of what I call “privacy via obscurity” of one’s personal information online; and two, the concentrated surveillance, capture, and aggregation of one’s online intellectual and social activities by a single provider.

Lack of “Privacy via Obscurity”

The notion of “Googling” someone has become common practice. People use search engines to learn about prospective blind dates (Lobron, 2006). Almost one in four Web users have searched online for information about co–workers or business contacts (Sharma, 2004), and employers are Googling prospective employees before making hiring decisions (Weiss, 2006). Through the powerful reach of search engines, obscure pieces of personal information – such as court records in the archives of a county government building, e–mail messages sent a decade ago to a now–defunct discussion forum, or a newsletter from an obscure social club – are increasingly retrievable by a simple keyword search. As a result, any “privacy via obscurity” that generally kept such information from public view has been diminished.

The personal information flows normally relegated to particular Web 2.0 platforms have similarly become broadly accessible via search engines’ desire to expand their reach by including these flows in their searchable indexes. Bits of personal information previously thought to exist merely on relatively obscure Web 2.0 platforms such as Twitter or Plazes, or even the early Facebook [2], are now increasingly available to anyone searching through Google or Yahoo!. As a result, the playful or investigative searching done by potential dates or employers can now reveal much more personal insights. The consequences can be significant: job applicants have lost offers due to postings on social networking sites (Lewis, 2006), others have lost existing jobs (Czekaj, 2007), and social networking sites have been used in dozens of criminal and other police investigations. By integrating the information flows from disparate – and often obscure – Web 2.0 services into the indexes of popular search engines, any notion of “privacy via obscurity” is diminished, and the availability of these personal information flows for disciplinary or discriminatory activity increases.

Concentrated Surveillance of Online Activities

While the potential harms that emerge once Web 2.0–related personal data streams are indexed and searchable within the major Web search engines are significant, they are matched – if not exceeded – by the externalities of the integration of Web 2.0 applications within search companies’ suites of products. By offering their users Web 2.0 services, search providers are increasingly able to track users’ social and intellectual activities across these innovative services, adding the personal information flows within Web 2.0 to the stores of information they can leverage for personalized services and advertising. This represents a significant shift in the norms of personal information flow online. Previously, a person’s social and intellectual activities were distributed across multiple Web 2.0 applications scattered across the Web. But with the drive towards Search 2.0, single entities, such as Google or Yahoo!, have the means of monitoring, collecting, and aggregating an increasing amount of one’s online social and intellectual activities. Search 2.0’s ability to collect and aggregate a wide array of personal and intellectual information about its users now extends beyond just what a user searches for (the original goal of “perfect recall”) to potentially include detailed demographic and profile information on linked social networking sites, the friends in one’s social networks, the photos shared (and the tags used to describe them), the various websites bookmarked (and, again, the descriptive tags), the RSS feeds subscribed to, and so on.



Potential Effects of Search 2.0

In their quest for Search 2.0, Web search engines have gained the ability to track, capture, and aggregate a wealth of personal information stemming from the increased flow of personal information made available by growing use and reliance on Web 2.0–based applications. The full effects and consequences of the emerging Search 2.0 infrastructure are difficult to predict, but potentially include the exercise of disciplinary power against users, the panoptic sorting of users, and the general invisibility and inescapability of Search 2.0’s impact on users’ online activities.

Disciplinary Power

Clive Norris warns of how infrastructures of dataveillance could be used to “[render] visualization meaningful for the basis of disciplinary social control” [3]. Instances of how users of Search 2.0 have been made visible for the exercise of disciplinary power include a court ordering Google to provide the complete contents of a user’s Gmail account, including e–mail messages he thought were deleted (McCullagh, 2006b); the introduction of evidence that a suspected murderer performed a Google search for the words “neck snap break” (Cohen, 2005); the Brazilian government asking Google to release data on users of its Orkut social networking site to help authorities investigate potential use of the site for illegal activities (Downie, 2006); and Yahoo! providing e–mail and other account data to Chinese officials, resulting in the jailing of dissidents within that country (Olesen, 2005; Schonfeld, 2006). The possibility of search providers turning over detailed Search 2.0 data to government bodies for disciplinary action has reached new heights within the United States with the passage of the USA PATRIOT Act, which greatly expands the ability of law enforcement to access such records while prohibiting the source of the records from disclosing that any such request has even been made [4]. Given the recent discovery that the National Security Agency has had direct access to citizens’ telecommunication activities (Singel, 2006), fears that the personal information flows inherent in Search 2.0 could similarly fall into government hands become all too real.

Panoptic Sorting

Search 2.0’s infrastructure of dataveillance also spawns instances of “panoptic sorting,” where users of search engines are identified, assessed, and classified “to coordinate and control their access to the goods and services that define life in the modern capitalist economy” [5]. Google, like most for–profit search engine providers, is financially motivated to collect as much information as possible about each user: as noted above, personalized results can build a user’s allegiance to a service, and accurately targeted advertisements command higher rates (Hansell, 2005). Through the panoptic gaze of their diverse suites of products – fueled by the growing Web 2.0 portion of their offerings – search providers capture as much information as possible about individuals’ behavior and consider it potentially useful in profiling and categorizing each user’s potential economic value. Recognizing that targeted advertising will be the “growth engine of Google for a very long time,” Google CEO Eric Schmidt stressed the importance of collecting user information, acknowledging that “Google knows a lot about the person surfing, especially if they have used personal search or logged into a service such as Gmail” (Miller, 2006). Beyond Gmail, the personal information flows gleaned from search providers’ Web 2.0 offerings fuel a more detailed panoptic sorting of their users.
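
Reduced to code, the panoptic sort is a classification function applied to such aggregated profiles. The following deliberately crude sketch is purely hypothetical – a caricature of the mechanism, not any provider’s actual model:

```python
def sort_user(profile):
    """Assign an advertising segment from an aggregated profile --
    a caricature of Gandy's panoptic sort, not any actual model."""
    text = " ".join(r["query"] for r in profile.get("search_history", []))
    text += " " + " ".join(tag for r in profile.get("bookmarks", [])
                           for tag in r.get("tags", []))
    if any(word in text for word in ("laptop", "computer", "camera")):
        return "electronics shopper: high advertising value"
    if any(word in text for word in ("recipe", "grocery", "coupon")):
        return "food and grocery segment"
    return "unclassified: low targeting value"

# A user's aggregated profile (invented) is sorted into a segment that
# determines which advertisements -- and prices -- she will see.
profile = {"search_history": [{"query": "cheap laptops washington dc"}],
           "bookmarks": [{"tags": ["computers", "deals"]}]}
print(sort_user(profile))   # -> electronics shopper: high advertising value
```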

Invisibility and Allure of Search 2.0

Perhaps the most potent externality of Search 2.0 stems from its relative invisibility, indispensability, and apparent inescapability. The majority of Web searchers are not aware that search engines have the ability to actively track users’ search behavior [6], and as search providers continue to expand their information infrastructures to include a variety of Web 2.0 services, it becomes arduous for everyday users to recognize the data collection threats of these services, and easier to take the design of these services merely “at interface value” [7]. Greg Elmer warns of the dangers of such an environment, in which making the collection of personal information a prerequisite of participation inevitably entrenches power in the hands of the technology designers:

Ultimately, what both requesting and requiring personal information highlight is the centrality of producing, updating, and deploying consumer profiles – simulations or pictures of consumer likes, dislikes, and behaviors that are automated within the process of consuming goods, services, or media and that increasingly anticipate our future needs and wants based on our aggregated past choices and behaviors. And although Foucault warns of the self–disciplinary model of punishment in panoptic surveillance, computer profiling, conversely, oscillates between seemingly rewarding participation and punishing attempts to elect not to divulge personal information. [8]

This blurring of punishments and rewards – subtle requests and not–so–subtle commands for personal information – is often repeated in the interfaces of various Search 2.0 products, where the default settings and arrangement of services make the collection of personal information automatic and difficult to resist. Given the rising ubiquity of Web 2.0 services – and search providers’ attempts to bring such services into their own product suites – many users appear willing to embrace Search 2.0 with only scant hesitation. Commenting on Google’s collection of user data, one user has stated, “I don’t know if I want all my personal information saved on this massive server in Mountain View, but it is so much of an improvement on how life was before, I can’t help it” (Williams, 2006). Search 2.0 places users under an almost invisible gaze, resulting in a kind of anticipatory conformity, whereby the divulgence of personal information becomes both routinized and internalized.




In conclusion, by amassing a tantalizing collection of admittedly innovative and useful Web 2.0 tools, the quest to achieve Search 2.0 has resulted in the emergence of a robust infrastructure of dataveillance that can quickly be internalized and become the basis of disciplinary social control. Roger Clarke provides a prescient warning about the effects of dataveillance on the individual:

[The] real impact of dataveillance is the reduction in the meaningfulness of individual actions, and hence in self–reliance and self–responsibility. Although this may be efficient and even fair, it involves a change in mankind’s image of itself, and risks sullen acceptance by the masses and stultification of the independent spirit needed to meet the challenges of the future. …In general, mass dataveillance tends to subvert individualism and the meaningfulness of human decisions and actions. [9]

Thus a kind of Faustian bargain emerges: Search 2.0 promises breadth, depth, efficiency, and relevancy, but enables the widespread collection of personal and intellectual information in the name of its perfect recall. If left unchecked, the potential cost of this bargain is nothing less than the “individualism and the meaningfulness of human decisions and actions.”

What options exist for renegotiating our Faustian bargain with Search 2.0? One avenue for changing the terms of the Faustian bargain is to enact laws to regulate the capture and use of personal information by Web search engines. A recent gathering of leading legal scholars and industry lawyers to discuss the possibility of regulating search engines revealed, however, that viable and constitutional solutions are difficult to conceive, let alone agree upon [10]. Alternatively, the search engine industry could self–regulate, creating strict policies regarding the capture, aggregation, and use of personal data via their services. But as Chris Hoofnagle reminds us, “We now have ten years of experience with privacy self–regulation online, and the evidence points to a sustained failure of business to provide reasonable privacy protections” [11]. Given search engine companies’ economic interests in capturing user information for powering Search 2.0, relying solely on self–regulation will likely be unsatisfying.

A third option is to affect the design of the technology itself. Since, as Larry Lessig notes, “how a system is designed will affect the freedoms and control the system enables” [12], I argue that technological design is one of the critical junctures at which society can re–negotiate its Faustian bargain with Search 2.0 in order to preserve a sense of “individualism and the meaningfulness of human decisions and actions” [13]. Potential design variables include whether default settings for new products or services automatically enroll users in data–collecting processes – or whether the collection can be turned off. Or the extent to which different products should be interconnected: for example, if a user signs up to use Gmail, should Personalized Search automatically be activated? If a user logs into Flickr, should she automatically be logged in to other services? Ideally, new tools and interfaces can be developed to give users access to, and control over, the personal information collected: in the spirit of the Code of Fair Information Practices, search providers should allow users to view all the personal data collected about them, make changes and deletions, restrict how it is used, and so on [14].
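
To make these design variables concrete, the following sketch imagines what value–conscious defaults might look like; the class and its methods are inventions for illustration, in the spirit of the Code of Fair Information Practices rather than any existing API:

```python
class PrivacyConsciousAccount:
    """Sketch of value-conscious defaults: collection is opt-in,
    services are not linked automatically, and users can inspect
    and purge what has been collected."""

    def __init__(self):
        self.collection_enabled = False   # default: no data collection
        self.linked_services = set()      # default: no cross-service linkage
        self._collected = []

    def opt_in_to_collection(self):
        """Collection starts only after an explicit, revocable choice."""
        self.collection_enabled = True

    def link_service(self, service_name):
        """Linking (e.g., search history to photo tags) is per-service
        and deliberate, never a side effect of logging in."""
        self.linked_services.add(service_name)

    def record(self, item):
        if self.collection_enabled:       # silently drop data otherwise
            self._collected.append(item)

    def view_my_data(self):
        return list(self._collected)      # users can see everything held

    def delete_my_data(self):
        self._collected.clear()           # users can purge their records
```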

In a speech to recent information school graduates, Tim O’Reilly warned of the dangers of companies gaining control over the information flows inherent in Web 2.0:

If history is any guide, the democratization promised by Web 2.0 will eventually be succeeded by new monopolies, just as the democratization promised by the personal computer led to an industry dominated by only a few companies. Those companies will have enormous power over our lives – and may use it for good or ill. (O’Reilly, 2006)

As the personal information flows of Web 2.0 become incorporated into the power of Search 2.0, and increasingly concentrated into the hands of only a few major search providers, this potential for “good or ill” increases exponentially. We, as scholars, activists, and users, must work to re–negotiate the Faustian bargain and mitigate the potential externalities of Search 2.0.


About the author

Michael Zimmer, PhD, is the Microsoft Resident Fellow at the Information Society Project at Yale Law School. He received his PhD from the Department of Media, Culture, and Communication at New York University under the guidance of Profs. Helen Nissenbaum, Alex Galloway, and Siva Vaidhyanathan. He frequently writes about the social, political, and ethical dimensions of information and communication technologies.



Notes

1. Portions of this article appear in Michael Zimmer, 2008. “The gaze of the perfect search engine: Google as an infrastructure of dataveillance,” In: Amanda Spink and Michael Zimmer (editors). Web search: Multidisciplinary perspectives. Berlin: Springer.

2. Originally, Facebook was exclusive to Harvard University students, and later expanded to include other universities, requiring an “.edu” e–mail address to gain access to the platform. Eventually, this requirement was dropped, and Facebook was opened to any Web user.

3. Norris, 2003, p. 251.

4. See Battelle, 2005, pp. 197–204.

5. Gandy, 1993, p. 15.

6. Fallows, 2005, p. 21; Kopytoff, 2006.

7. Turkle, 1995, p. 103.

8. Elmer, 2004, pp. 5–6.

9. Clarke, 1988, p. 508.

10. See “Regulating Search: A Symposium on Search Engines, Law, and Public Policy,” held in December 2005 at the Yale Law School.

11. Hoofnagle, 2005, p. 1.

12. Lessig, 2001, p. 35.

13. Various pragmatic frameworks have recently emerged to broaden the criteria for judging the quality of technological systems to include the advancement of ethical and human values, and to proactively influence the design of technologies to account for such values during the conception and design process. These include Design for Values (Camp, n.d.), Values at Play (Flanagan, et al., in press), and Value Sensitive Design (Friedman, et al., 2002).

14. Ask.com’s recent launch of AskEraser, which allows users to instruct Ask.com not to retain their search records and not to collect any information about them, is a step in this direction.



References

Elise Ackerman, 2006. “What do Google, Yahoo, AOL and Microsoft’s MSN know about you?” San Jose Mercury News (19 August).

Paul Andrews, 1999. “The search for the perfect search engine,” Seattle Times (7 February), p. E1.

Associated Press, 2007. “EU data privacy officers launch investigation into Google’s Internet search engine,” International Herald Tribune (25 May).

Michael Barbaro and Tom Zeller, 2006. “A face is exposed for AOL searcher no. 4417749,” New York Times (9 August), p. A1.

Susan Barnes, 2006. “A privacy paradox: Social networking in the United States,” First Monday, volume 11, number 9 (September).

John Battelle, 2006a. “More on what Google (and probably a lot of others) know,” Searchblog (30 January).

John Battelle, 2006b. “What info does Google keep?” Searchblog (27 January).

John Battelle, 2005. The search: How Google and its rivals rewrote the rules of business and transformed our culture. New York: Portfolio.

John Battelle, 2004. “Perfect search,” Searchblog (8 September).

Paul Boutin, 2006. “Web 2.0: The new Internet ‘boom’ doesn’t live up to its name,” Slate (29 March).

Nicholas Carr, 2006. “The amorality of Web 2.0,” Rough Type (3 October).

Roger Clarke, 1988. “Information technology and dataveillance,” Communications of the ACM, volume 31, number 5, pp. 498–512.

Adam Cohen, 2005. “What Google should roll out next: A privacy upgrade,” New York Times (28 November), p. A18.

Laura Czekaj, 2007. “Workers fired over Internet postings,” Ottawa Sun (17 January).

Andrew Downie, 2006. “Google carves a middle path on privacy,” Christian Science Monitor (8 September), p. 1.

Electronic Frontier Foundation, 2007. “Privacy and search engines.”

Greg Elmer, 2004. Profiling machines: Mapping the personal information economy. Cambridge, Mass.: MIT Press.

Deborah Fallows, 2005. “Search engine users: Internet searchers are confident, satisfied and trusting – but they are also unaware and naïve,” Pew Internet & American Life Project.

Oscar H. Gandy, 1993. The panoptic sort: A political economy of personal information. Boulder, Colo.: Westview.

A. George, 2006. “Things you wouldn’t tell your mother,” New Scientist, volume 191, number 2569 (16 September), pp. 50–51.

Joanna Glasner, 2005. “What search sites know about you,” Wired News (5 April).

Google, 2005a. “Google privacy FAQ.”

Google, 2005b. “Google privacy policy” (14 October).

Dave Gussow, 1999. “In search of ...,” St. Petersburg Times (4 October), p. 13.

Katie Hafner and Matt Richtel, 2006. “Google resists U.S. subpoena of search data,” New York Times (20 January), pp. A1, C4.

Saul Hansell, 2006. “AOL removes search data on vast group of Web users,” New York Times (8 August), p. C4.

Saul Hansell, 2005. “Microsoft plans to sell search ads of its own,” New York Times (26 September), pp. C1, C8.

Will Harris, 2006. “Why Web 2.0 will end your privacy,” bit–tech.net (3 June).

Chris Hoofnagle, 2005. “Privacy self regulation: A decade of disappointment,” Electronic Privacy Information Center (4 March).

IAC Search & Media, 2005. “Privacy policy for Ask.com” (13 July).

Andrew Keen, 2006. “Web 2.0: The second generation of the Internet has arrived. It’s worse than you think,” Daily Standard (15 February).

Kevin Kelly, 2005. “We are the Web,” Wired (August).

Verne Kopytoff, 2006. “Most Web users say Google should keep data private,” San Francisco Chronicle (24 January), p. C3.

Nicholas Kushmerick, 1998. “The search engineers,” Irish Times (23 February), p. 10.

Amanda Lenhart and Mary Madden, 2007. “Teens, privacy & online social networks,” Pew Internet & American Life Project (18 April).

Lawrence Lessig, 2001. The future of ideas: The fate of the commons in a connected world. New York: Random House.

Steven Levy and Brad Stone, 2006. “The new wisdom of the Web,” Newsweek (3 April).

Diane Lewis, 2006. “Job applicants’ online musings get hard look” (30 March).

Alison Lobron, 2006. “Googling your Friday–night date may or may not be snooping, but it won’t let you peek inside any souls,” Boston Globe Magazine (5 February), p. 42.

Steve Lohr, 2007. “Google deal said to bring U.S. scrutiny,” New York Times (29 May).

Tim Mayer, 2005. “Our blog is growing up – and so has our index,” Yahoo! Search Blog (8 August).

Declan McCullagh, 2006a. “AOL’s disturbing glimpse into users’ lives,” CNET (7 August).

Declan McCullagh, 2006b. “Police blotter: Judge orders Gmail disclosure,” CNET (17 March).

Michael Miller, 2006. “Google’s Schmidt clears the air” (17 March).

Howard Mintz, 2006. “Feds after Google data: Records sought in U.S. quest to revive porn law,” San Jose Mercury News (16 January).

Javed Mostafa, 2005. “Seeking better Web searches,” Scientific American (24 January).

Clive Norris, 2003. “From personal to digital: CCTV, the panopticon, and the technological mediation of suspicion and social control,” In: David Lyon (editor). Surveillance as social sorting: Privacy, risk, and digital discrimination. London: Routledge, pp. 249–281.

Emily Nussbaum, 2007. “Kids, the Internet, and the end of privacy,” New York Magazine (February).

Tim O’Reilly, 2006. “My commencement speech at SIMS,” O’Reilly Radar (14 May).

Tim O’Reilly, 2005. “What is Web 2.0?” (30 September).

Alexa Olesen, 2005. “Rights group says Yahoo helped China jail journalist,” USA Today (6 September).

James Pitkow, Hinrich Schütze, Todd Cass, Rob Cooley, Don Turnbull, Andy Edmonds, Eytan Adar, and Thomas Breuel, 2002. “Personalized search,” Communications of the ACM, volume 45, number 9 (September), pp. 50–55.

Privacy International, 2007. “A race to the bottom: Privacy ranking of Internet service companies” (9 June).

Erick Schonfeld, 2006. “Analysis: Yahoo’s China problem,” CNN/Money (8 February).

Dinesh Sharma, 2004. “Is your boss Googling you?” CNET (21 October).

Ryan Singel, 2006. “AT&T sued over NSA eavesdropping,” Wired (31 January).

Daniel J. Solove, 2007. The future of reputation: Gossip, rumor, and privacy on the Internet. New Haven: Yale University Press.

Mirco Speretta, 2004. “Personalizing search based on user search histories,” unpublished Master’s thesis, University of Kansas.

Danny Sullivan, 2008. “Yahoo tests Delicious integration in search results,” Search Engine Land (21 January).

Jamie Teevan, Susan T. Dumais, and Eric Horvitz, 2005. “Personalizing search via automated analysis of interests and activities,” Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 449–456.

Sherry Turkle, 1995. Life on the screen: Identity in the age of the Internet. New York: Simon & Schuster.

Nathan Weinberg, 2005. “Google unifying logins,” Inside Google (11 September).

Piper Weiss, 2006. “What a tangled Web we weave: Being Googled can jeopardize your job search,” New York Daily News (19 March).

Alex Williams, 2006. “Planet Google wants you,” New York Times (15 October).

Yahoo!, 2007. “Flickr–izing image search,” Yahoo! Search Blog (26 June).

Yahoo!, 2006. “Yahoo! privacy policy” (11 November).

