Propagation of unintentionally shared information and online tracking
First Monday

Propagation of unintentionally shared information and online tracking by Rath Kanha Sar and Yeslam Al-Saggaf



Abstract
Information is shared online while users browse on the Internet. This information is being leaked from first party or visited sites to third party sites (such as advertisers) in a number of ways, including in HTTP headers. In this paper, we analysed HTTP headers resulting from browsing activities and reported on the types of information being leaked or shared, and to whom. We observed that within just a single browsing session among some social network sites as well as non–social networking sites, identifiable and non–identifiable information was leaked or shared to various third party sites and propagated to more than just one level of third party sites. In addition, we also discovered that sites such as Facebook, Twitter, and Google Plus are able to track browsing activities not only within their sites but beyond their boundaries, particularly among Web sites that embed widgets, such as Facebook’s Like button, Twitter’s Tweet button, and Google’s Plus One button.

Contents

1. Introduction
2. Methods and data collection
3. Findings
4. Discussion
5. Conclusion

 


 

1. Introduction

Concerns about the impact that technologies have on informational privacy arise when different types of information about individuals can be rapidly collected, in large volumes, without users being aware of the collection, or how their information will be used, or the duration for which information will be kept (Tavani, 2011; 1999). Large volumes of information are available in topical databases or diffused online. We categorise information diffused online into two types: the “intentionally” and the “unintentionally” shared information.

The term “intentionally shared information” refers to information that is voluntarily supplied by users to an intended recipient group. For example, people share their opinions in Web forums and personal blogs or on social network sites (SNSs) with an expectation or awareness that their posts could be seen by other online users. SNSs could be the biggest space where large volumes of personal information are being voluntarily shared by SNS users owing to the fact that there are at least one billion users active on SNS like Facebook (ABC News, 2012), which accounts for one seventh of the world’s population. SNS privacy settings, for example, are seen to be one of the many factors that make the users’ information available to the public beyond their expected audience (Gross and Acquisti, 2005; Krishnamurthy and Wills, 2008). Some users do not change their privacy settings because they are not aware that they can do it, or because they do not have the technical knowledge to do it (boyd and Hargittai, 2010), while others are aware of the privacy settings and do care about limiting access to their profiles (Strater and Lipford, 2008; Young, 2009; Madden, 2012). Users may open themselves to risks such as embarrassment, stalking, identity theft, phishing attacks, or scamming, which could harm them physically and mentally (Gross and Acquisti, 2005; boyd and Heer, 2006).

On the other hand, the term “unintentionally shared information” refers to information about a user which can be gathered or leaked without the user intending to reveal it, or even being aware that the information is being shared. This could be information about online browsing activities. For example, John is searching for allergy tablets and nasal spray on a pharmacy Web site. He might not be aware that his searches can be collected not only by the pharmacy site, which is the first party site, but also by various third party sites such as advertisers, and he did not intend to disclose this information to those third party sites. The term “first party sites” refers to sites that are directly or intentionally accessed by users, whereas “third party sites” are those which are not directly requested by users. Third party sites can be data aggregators or advertisers who display targeted advertising content on the first party Web pages. Users’ online movements or browsing behaviours can be tracked or recorded by various technologies including HTTP cookies (Angwin, 2010; Tene and Polonetsky, 2012). HTTP cookies are seen as troublesome for privacy because they allow users’ movements to be tracked from site to site.

HTTP, or the Hypertext Transfer Protocol, is used to communicate between the browser, any intermediate machines, and Web servers (Comer, 2000). Since HTTP is a stateless protocol, each request or response is treated independently. So in order to remember the state, it uses a small text file called an HTTP cookie, which is stored by the browser on the user’s machine (Kristol, 2001). Cookies were first developed in 1994 for the purpose of assisting users in online shopping by serving as a virtual shopping trolley (Hormozi, 2005). Cookies were not intended to be used as spying mechanisms, but rather for the purpose of informing the Web server that the same user has returned. HTTP cookies are not programs, nor can they be accessed at any time by the Web site; however, they can be used by marketers and Web developers to collect personal information and to track users’ visits and browsing habits.
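
As a rough illustration of how a cookie lets a stateless protocol recognise a returning browser, the following Python sketch mimics the browser side of the exchange. The class name TinyCookieJar, the host names and the cookie value are hypothetical, and the domain matching is deliberately simplified; real browsers also apply path, expiry and security rules that are omitted here.

```python
def parse_set_cookie(value):
    """Split a Set-Cookie header value into (name, value, attributes)."""
    parts = [p.strip() for p in value.split(";") if p.strip()]
    name, _, cookie_value = parts[0].partition("=")
    attributes = {}
    for part in parts[1:]:
        key, _, attr_value = part.partition("=")
        attributes[key.strip().lower()] = attr_value.strip()
    return name.strip(), cookie_value.strip(), attributes


class TinyCookieJar:
    """Hypothetical, simplified stand-in for a browser's cookie store."""

    def __init__(self):
        self._by_domain = {}

    def store(self, response_host, set_cookie_value):
        # Remember the cookie under its domain attribute (or the responding host).
        name, value, attributes = parse_set_cookie(set_cookie_value)
        domain = attributes.get("domain", response_host).lstrip(".")
        self._by_domain.setdefault(domain, {})[name] = value

    def cookie_header(self, request_host):
        # Return the Cookie header value the browser would attach to a request.
        pairs = []
        for domain, cookies in self._by_domain.items():
            if request_host == domain or request_host.endswith("." + domain):
                pairs.extend(f"{n}={v}" for n, v in cookies.items())
        return "; ".join(pairs)


jar = TinyCookieJar()
jar.store("shop.example.com", "trolley=abc123; domain=.example.com; path=/")
print(jar.cookie_header("shop.example.com"))   # cookie is sent back to the same site
print(jar.cookie_header("www.other.example"))  # nothing is sent to an unrelated site
```

Each request is still independent at the protocol level; the server recognises the returning browser only because the stored value is sent back with every later request to a matching domain.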

Previous studies have shed light on the current practice of information leakage or gathering from different perspectives, including the privacy settings of SNSs (Gross and Acquisti, 2005; Krishnamurthy and Wills, 2008), via flash cookies (Soltani, et al., 2009; Ayenson, et al., 2011) and via HTTP headers (Krishnamurthy, et al., 2011, 2007; Krishnamurthy and Wills, 2010a, 2010b, 2009, 2008, 2006; Soltani, et al., 2009; Mayer, 2011). Those studies have investigated large numbers of sites, including social network sites (SNSs) and other sites that are not SNS (or non–SNSs). Their results, however, do not really reflect the browsing habits of users in real life, because users tend to have a combination of browsing among both SNSs and non–SNSs (Purcell, 2011). They are not likely to browse large numbers of SNSs alone (e.g., 10 SNSs at a time), and some of them may have their browsers in a private browsing mode (Bursztein, 2012) which means they regularly clear search histories and cookies from their browsers, and they are likely to have a new online browsing session every time they close and restart their browser. In addition, it is important to examine the information sharing among both SNSs and non–SNSs because while the information leaked from SNSs tends to be more personal and identifiable to a specific user (e.g., name, e–mail address and post code) (Krishnamurthy and Wills, 2008; Soltani, et al., 2009), the combination of this identifiable information and the browsing behaviours among non–SNSs could reveal so much about a person’s life.

The question remains, what are the ramifications of information leakages from one browsing session among both SNSs and non–SNSs? Based on the literature and the rationale above, we intend to investigate the ramifications of the unintentionally shared information that is being diffused in the HTTP headers from the perspective of a user who usually browses both SNSs and non–SNSs while also regularly clearing search histories and cookies. We examine the HTTP headers resulting from the first author’s browsing activities while also reporting on the types of shared information, and to whom they are being shared.

This study is similar in some ways to those by Krishnamurthy and Wills (2010a, 2010b), Mayer (2011), and Krishnamurthy, et al. (2011), which investigated information leakages in HTTP headers; it is also similar to studies by Dwyer (2009) and Wongyai and Charoeunwatana (2012), which employed a small case study using Wireshark as a tool. However, this study differs from these others in the number and categories of sites chosen and in the online activities examined. This study did not investigate large numbers of either SNSs or non–SNSs alone (Krishnamurthy and Wills, 2010a, 2010b; Mayer, 2011; Krishnamurthy, et al., 2011); nor did it focus on only one organisation (Dwyer, 2009) or site (Wongyai and Charoeunwatana, 2012). Rather, this study examined a small number of sites frequently visited by most people while employing different sets of browsing activities and Google search trends common among most people (see justification in the next section). Several browsing sessions were conducted, and each browsing session consisted of a set of browsing activities as described in the next section.

 

++++++++++

2. Methods and data collection

2.1. Online activities, sites, and online search trends for study

First we decided on the set of online activities to be performed, based on a report by Pew Internet (Purcell, 2011). We then selected sites associated with chosen activities based on rankings in Alexa (www.alexa.com). In terms of online search, we relied on Google statistics (Google, 2012b) for the top search trends. The top or most popular online activities surveyed by Pew Internet (Purcell, 2011) were checking e–mail messages, using SNSs, doing online shopping, reading online news articles, and performing online searches (e.g., by using Google). After choosing top visited sites (ranked by Alexa) combined with popular online activities (Purcell, 2011) and Google search trends (Google, 2012b), we developed a list of sites as summarised in Table 1.

 

Table 1: Online activities and sites chosen for the study.
Online activity | Sites
E–mail | Yahoo, Gmail
SNS | Facebook, Twitter, LinkedIn, YouTube
Online shopping | eBay
News | Nine News, ABC News
Google search | Lyrics007, Taste.com.au, Weatherzone, Wikipedia

 

Wireshark (http://www.wireshark.org/) was our chosen data collection tool. It is a network protocol analyser that captures and displays all HTTP traffic, such as communication between the browser or application and the requested sites or servers (Figure 1). We chose Wireshark because it allowed us to triangulate our data by conducting the experiments across different operating systems (OS) and different network environments (with and without proxy settings); unlike Fiddler (http://fiddler2.com/), Wireshark is more versatile and can be installed on any OS. It can also be used to record the activities of any application or browser running on the device. Each recorded session can be saved or exported into different file formats for later analysis. The only drawback is that encrypted information, that is, information transmitted over SSL or HTTPS, is not observable. However, as our aim was to analyse HTTP packets, this limitation has no effect on our study.

 

Figure 1: File captured by Wireshark.

 

2.2. Data collection and analysis

Each experiment involved the first author performing a set of browsing activities among the selected sites (as summarised in Table 1) while having her activities recorded by Wireshark. The experiments were conducted on Windows, Linux and Mac OS machines using the Firefox browser, with and without the ad–block Firefox extension, over two separate networks (with and without a proxy server). Users are often encouraged to create accounts for many categories of sites they visit. However, for this study, the first author already owned accounts for sites under investigation, so our work did not involve investigating the possible information leakage/sharing during the sign–up process.

Examples of the first author’s actions included, but were not limited to, the following:

  • E–mail: signing in, checking, reading and sending e–mail messages.
  • SNSs: signing in, checking her own and her friends’ profiles, checking messages and, where feasible, playing third–party applications and clicking on advertisements.
  • Online shopping: signing in, searching for and occasionally purchasing a few items (e.g., hair accessories).
  • News: browsing to different types of articles (e.g., technology, health, or national news).
  • Online search: checking the daily weather forecast for her current location, searching for some general knowledge about a specific topic (e.g., hay fever).

Each experiment or browsing session included the following steps:

  • Making notes of what browsing activities will be performed.
  • Terminating other applications running on the device which may also be using the HTTP protocol.
  • Opening the browser and clearing all the cookies and search histories.
  • Running Wireshark and starting to record the HTTP messages.
  • Performing a set of actions as planned (e.g., those noted in the list above).
  • Taking notes of what exact actions are performed if there are changes.
  • Stopping the recording and saving the trace when the browsing actions are completed.
  • Examining and observing the saved HTTP messages line by line and taking notes of what/who the third party sites are and what types of information are being shared to them.
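
The final step was carried out by hand in this study. Purely as an illustration, the sketch below shows how such a pass could be approximated in Python: it parses request headers that have been exported as plain text and flags requests whose Host does not belong to the site named in the Referer. The function names, the blank-line-separated input format and the subdomain-based "same party" heuristic are assumptions made for this example and are not part of the study's procedure. The sample trace reuses two requests from Table 2.

```python
from collections import Counter
from urllib.parse import urlsplit


def parse_request(block):
    """Parse one plain-text HTTP request into (request_line, headers)."""
    lines = [line for line in block.strip().splitlines() if line.strip()]
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return lines[0].strip(), headers


def strip_www(host):
    return host[4:] if host.startswith("www.") else host


def is_third_party(request_host, referer_url):
    """Assumed heuristic: same party if one host is a subdomain of the other."""
    site = strip_www((urlsplit(referer_url).hostname or "").lower())
    host = strip_www(request_host.lower())
    if not site:
        return False
    same = host == site or host.endswith("." + site) or site.endswith("." + host)
    return not same


def third_party_hosts(trace_text):
    """Count requests per third party host in a blank-line-separated trace."""
    counts = Counter()
    for block in trace_text.strip().split("\n\n"):
        if not block.strip():
            continue
        _, headers = parse_request(block)
        host, referer = headers.get("host", ""), headers.get("referer", "")
        if host and referer and is_third_party(host, referer):
            counts[host] += 1
    return counts


trace = """GET /recipes/25158/mediterranean+chicken+pasta+salad HTTP/1.1
Host: www.taste.com.au
Referer: http://www.taste.com.au

GET /cgi-bin/m?...
Host: secure-au.imrworldwide.com
Referer: http://www.taste.com.au/recipes/25158/mediterranean+chicken+pasta+salad"""

print(third_party_hosts(trace))
```

A heuristic of this kind cannot tell that two unrelated-looking domains belong to the same company, so the manual inspection described above remains necessary for interpreting who the third parties are.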

We examined the HTTP headers in each file (as shown in Figure 1) resulting from the experiments in order to identify the types of shared information and the types of third party sites. Table 2 illustrates an HTTP conversation between a browser, a first party site and a third party site, which proceeds as follows.

(a) The first author visits the Mediterranean chicken pasta salad recipe page on Taste.com.au. The browser makes an HTTP request to Taste.com.au in order to retrieve the page content.
(b) Taste.com.au receives the HTTP request and returns an HTTP response which contains the page content (in HTML format) as well as Javascript code. The browser executes the code, which requires it to fetch image content from another site, imrworldwide.com.
(c) The browser then sends another HTTP request to imrworldwide.com for the image content to display on the recipe page. Within this request, Taste.com.au is seen to leak the user’s search keyword, Mediterranean chicken pasta salad, to a third party site (imrworldwide.com) in the Referer field of the HTTP header.
(d) imrworldwide.com then sends an HTTP response (containing a 1x1 pixel image) and sets cookies in the browser.

 

Table 2: First party site connects to third party site, sharing user’s search vocabulary.
 HTTP messages
(a) GET /recipes/25158/mediterranean+chicken+pasta+salad HTTP/1.1
Host: www.taste.com.au
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:8.0.1) Gecko/20100101 Firefox/8.0.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Referer: http://www.taste.com.au
Cookie: PHPSESSID=mersn2toc44pvidn2ph61t1lc3 ...
(b) HTTP/1.1 200 OK
Date: Fri, 02 Dec 2011 00:38:07 GMT
Server: Apache/2.0.52 (CentOS)
Content-Type: text/html;charset=utf-8
...
Line-based text data: text/html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
...
<script type="text/javascript">
...
<img src="http://secure-au.imrworldwide.com/...>
...
(c) GET /cgi-bin/m?...
Host: secure-au.imrworldwide.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:8.0.1) Gecko/20100101 Firefox/8.0.1
Referer: http://www.taste.com.au/recipes/25158/mediterranean+chicken+pasta+salad
(d) HTTP/1.1 200 OK
Date: Fri, 02 Dec 2011 00:37:16 GMT
set-cookie: V5=cookie1; expires=Mon, 29-Nov-2021 00:37:16 GMT; domain=.imrworldwide.com; path=/cgi-bin
...
Compuserve GIF, Version: GIF89a
Screen width: 1
Screen height: 1
...
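
To make the leak in request (c) concrete, the following Python sketch decomposes a Referer value into the parts a third party can read from it. The function name referer_disclosure is hypothetical, and treating "+" in the path as a word separator is an assumption that happens to match the Taste.com.au URL shown in Table 2.

```python
from urllib.parse import parse_qsl, unquote_plus, urlsplit


def referer_disclosure(referer_url):
    """Return what a Referer value reveals: first party host, path words, query pairs."""
    parts = urlsplit(referer_url)
    # Assumption: '+' separates words in the path, as in the Table 2 URL.
    segments = [unquote_plus(s) for s in parts.path.split("/") if s]
    return {
        "first_party": parts.hostname,
        "path_segments": segments,
        "query": parse_qsl(parts.query),
    }


referer = "http://www.taste.com.au/recipes/25158/mediterranean+chicken+pasta+salad"
info = referer_disclosure(referer)
print(info["first_party"])        # the site the user was visiting
print(info["path_segments"][-1])  # the recipe name the user looked up
```

The third party needs no cooperation from the first party beyond being embedded in the page: the browser supplies the full address of the page being viewed with the image request.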

 

 

++++++++++

3. Findings

Visits to a few Web sites result in thousands of lines of HTTP messages captured by Wireshark, whether on Windows, Linux or Mac OS. Similar to the finding reported by Krishnamurthy, et al. (2007), the ad–block extension on the browser helps to limit the number of third party site connections, but it does not eliminate them. We observed that not all the advertisements were removed from the visited pages while the ad–block was on. We summarise our findings by first reporting on the propagation of information beyond one level of third party sites, which was not covered in the previous literature, then on the types of third party site being contacted at each propagation level, while also pinpointing the types of information being transferred from the first party site. We found that users’ identifiable and non–identifiable information was being leaked to third party sites while users were visiting first party sites. Two types of third party sites were identified in this study: advertisers or data aggregators, and SNSs.

3.1. First level traverse: From first party sites to third party sites

3.1.1. Third party sites who are advertisers or data aggregators

Unsurprisingly, and consistent with the literature, first party sites usually transferred the user’s search keywords to third party sites that are advertisers or data aggregators. The connection is necessary to fetch advertisement content from advertiser servers for display on first party site pages. For example, paying a single visit to each of several non–SNS sites, namely Taste.com.au (for a vanilla cupcake recipe), Lyrics007 (for a song lyric: “Yesterday” by the Beatles), and Nine News (for an article: Google gets personal with search results), resulted in at least 10 connections to third party sites while also transmitting the user’s searches to those sites (Table 3).

 

Table 3: List of third party sites connected by the browser while visiting first party sites.
taste.com.au
(Recipe: vanilla cupcake)
Lyrics007
(Song title: Yesterday)
Nine News
(Article: Google gets personal with search results)
News–static.com
sops.news.com.au
trakr-news.com.au
google–analytics.com
notebookmagazin.com
fashion.vogue.com.au
doubleclick.net
sunbeamfoods.com.au
bs.serving–sys.com
jdn.monster.com
facebook.com
twitter.com
getprice.com.au
imrworldwide.com
unica.com
Clickfuse.com
addthis.com
fastclick.net
googlesyndication.com
ringtonemaker.com
b.scorecardresearch.com
apmebf.com
rubiconproject.com
rtbidder.net
amazonaws.com
jangonetwork.com
advertising.com
abmr.net
googleapis.com
doubleclick.net
facebook.com
api.google.com
twitter.com
Msnportal.112.2o7.net
imrworldwide.com
m.adnxs.com
bs.serving-sys.com
b.scorecardsearch.com
widget.twimg.com
am.au.msn.overture.com
facebook.com
twitter.com

 

In terms of information leakages among SNSs (Table 4), there are two cases for Facebook. In the first case, connections were made only to the Facebook CDN (Content Delivery Network) if no third party applications or advertisements on Facebook were clicked or used. In the second case, Facebook shared the user’s unique ID with Zynga (the online game company which hosts the Farmville application) when the user played Farmville from Facebook. Farmville then began forwarding multiple requests to third party sites such as DoubleClick. However, transmission of the user’s Facebook ID from Farmville to advertisers was not observed. Meanwhile, although Twitter made connections to third party sites, leaks or shares of identifiable information from Twitter were not observed; the user’s movements between Twitter pages, however, were detected. LinkedIn, on the other hand, was seen to share or leak the user’s information (in this case, LinkedIn ID and full name) to third party servers, like Doubleclick (Table 5).

 

Table 4: List of third party sites connected to by the SNS.
LinkedIn
(No click on ads)
Twitter
(No click on ads)
Facebook
(Farmville application)
Facebook
(No click on ads)
Google–analytics.com
imrworldwide.com
quantserve.com
b.scorecardresearch.com
doubleclick.net
Google–analytics.com
twimg.com
Akamaihd.net
googletagservices.com
quantserve.com
b.scorecardresearch.com
doubleclick.net
rubiconproject.com
googleadservices.com
rtbidder.net
socialvi.be
No third party sites

 

 

Table 5: The leakage of the user’s LinkedIn ID and full name.
GET /b?
 
...
Host: ad.au.doubleclick.net
User–Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:8.0.1) Gecko/20100101 Firefox/8.0.1
Referer: http://www.linkedin.com/profile/... a user’s LinkedIn ID and full name

 

3.1.2. Third party sites who are SNSs

The user’s information and searches were transferred from first party sites to third party SNSs. New SNS widgets, such as Facebook’s Like button, Twitter’s Tweet button and Google’s Plus One button, enabled site users to share content from other Web sites with their SNS friends (Facebook, 2012; Twitter, 2012; Google, 2012a). Facebook’s Like button also enabled site owners to view the number of likes on their domain, broken down by day and by demographic group. It was observed that when the user visited sites embedding those widgets, whether or not she was logged into any SNS, the browser always sent HTTP requests to the SNS to populate the page with the SNS buttons. As shown in Table 3, Taste.com.au, Lyrics007, and Nine News connected to at least three SNSs (Facebook, Twitter and Google Plus) because SNS widgets reside on those first party sites. For example, Table 6 shows a request sent from a Lyrics007 page to Twitter for the widget, which also shares the page the user is currently visiting. Twitter recognises the visit as coming from the same user through the cookies that were set when the user logged into her Twitter account.

 

Table 6: Lyrics007 shares user’s searched word to Twitter while retrieving Twitter widget.
GET /widgets.js HTTP/1.1
Host: platform.twitter.com
User–Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:8.0.1) Gecko/20100101 Firefox/8.0.1
...
Referer: http://www.lyrics007.com/ the song title and the artist name
Cookie: k=Twitter cookie 1
...

 

3.2. Second level traverse: From third party to other third party sites

It was observed that the user’s information and browser connection propagated beyond one level from the first party sites. The following cases illustrate this finding.

3.2.1. eBay

When the user visited eBay and searched for or purchased an item, eBay shared the search keyword with third party sites, like Doubleclick. Doubleclick was in turn seen to share that information with other third party sites, like amgdgt.com and b.scorecardresearch.com (Table 7). The user’s information was thus transmitted from eBay to a third party site, and from that site to another third party site, as shown in Figure 2.

 

Table 7: The traverse of eBay user’s search keyword.
(1) GET /adi/ebay.au.search/keywords...
Host: ad–apac.doubleclick.net
User–Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:8.0.1) Gecko/20100101 Firefox/8.0.1
Referer: http://www.ebay.com.au/ searched item
(2) GET /base/js/v1/amgdgt.js HTTP/1.1
Host: cdn.amgdgt.com
User–Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:8.0.1) Gecko/20100101 Firefox/8.0.1
Referer: http://ad–apac.doubleclick.net/adi/ebay.au.search/ keywords searched item
(3) GET /p?c1=8c2=6035179...
Host: b.scorecardresearch.com
User–Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:8.0.1) Gecko/20100101 Firefox/8.0.1
Referer: http://ad–apac.doubleclick.net/adi/ebay.au.search/ keywords searched item
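
One way to derive the propagation levels from a trace is to treat each request as an edge from the host named in the Referer to the host being contacted, and then walk outwards from the first party. The Python sketch below does this for the three requests in Table 7. The function name propagation_levels is hypothetical, the Referer values are shortened (the searched item is omitted), and host names are written with ASCII hyphens; these are assumptions of the example rather than details of the study's analysis.

```python
from collections import deque
from urllib.parse import urlsplit


def propagation_levels(requests, first_party_host):
    """Breadth-first levels over Referer -> Host edges, starting at the first party."""
    edges = {}
    for host, referer in requests:
        source = urlsplit(referer).hostname
        if source and source != host:
            edges.setdefault(source, set()).add(host)
    levels = {first_party_host: 0}
    queue = deque([first_party_host])
    while queue:
        current = queue.popleft()
        for target in sorted(edges.get(current, ())):
            if target not in levels:
                levels[target] = levels[current] + 1
                queue.append(target)
    return levels


# (Host contacted, Referer sent) pairs based on Table 7.
observed = [
    ("ad-apac.doubleclick.net", "http://www.ebay.com.au/"),
    ("cdn.amgdgt.com", "http://ad-apac.doubleclick.net/adi/ebay.au.search/"),
    ("b.scorecardresearch.com", "http://ad-apac.doubleclick.net/adi/ebay.au.search/"),
]
for host, level in propagation_levels(observed, "www.ebay.com.au").items():
    print(level, host)
```

Hosts at level 1 were contacted from the first party page itself, while hosts at level 2 were contacted with a third party address in the Referer, which corresponds to the second level traverse described in this section.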

 

 

Figure 2: eBay traverse map.

 

3.2.2. Lyrics007

Lyrics007 was similar to eBay in that there was a second level of information sharing to other third party sites. For example, the song title propagated from Lyrics007 to jangonetwork, then from jangonetwork to Doubleclick (Table 8). In addition, when the user connected to Lyrics007 (1), the browser connection went to rubiconproject.com (2), then from rubiconproject.com to w55c.net (3), and from w55c.net to bluekai.com (4) (Figure 3). However, information sharing ended at the second level, as shown in Figure 3.

 

Table 8: The traverse of Lyrics007.
(1) GET /00?... song title and artist name
Host: jmn.jangonetwork.com
User–Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:8.0.1) Gecko/20100101 Firefox/8.0.1
Referer: http://www.lyrics007.com/ the song title and the artist name
(2) GET /widgets.js HTTP/1.1
Host: partner.googleadservices.com
User–Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:8.0.1) Gecko/20100101 Firefox/8.0.1
Referer: http://jmn.jangonetwork.com/ song title, and artist name
(3) GET /gampad/ads?...
Host: pubads.g.doubleclick.net
User–Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:8.0.1) Gecko/20100101 Firefox/8.0.1
Referer: http://jmn.jangonetwork.com/ song title, and artist name

 

 

Figure 3: Lyrics007 traverse map.

 

 

++++++++++

4. Discussion

First party sites were seen to leak users’ identifiable and non–identifiable information to third party sites. Third party sites in this study were identified as advertisers or data aggregators and SNSs, particularly Facebook, Twitter, and Google Plus (Table 9). Components of users’ information propagate from first party sites (both SNS and non–SNS) to third party sites, and also from third party sites to other third party sites. If several first party sites use or connect to the same third party sites, those third party sites are able to track users’ movements across different sites, both SNS and non–SNS, via the use of HTTP cookies. The tracking can be classified into two categories: tracking by third party sites that are advertisers or data aggregators, and tracking by third party sites that are SNSs.

 

Table 9: Summary of information leakages.
First party sites | Leaked information | Third party sites
Yahoo | User’s clickstream within Yahoo | Non–SNS
Gmail | Not observable | Not observable
Facebook (No ads or app clicked) | None | None
Facebook (e.g., Farmville) | Facebook ID | Non–SNS
Twitter | User’s searches and clickstream | Non–SNS
LinkedIn | User’s clickstream, full name and ID | Non–SNS
Taste.com.au | User’s searches and clickstream | SNS and non–SNS
Lyrics007 | User’s searches and clickstream | SNS and non–SNS
Nine News | User’s searches and clickstream | SNS and non–SNS
eBay | User’s searches (e.g., hair clips) | Non–SNS
Wikipedia | None | None
YouTube [2] | User’s searches and clickstream | Google
Weatherzone | User’s location and clickstream | SNS and non–SNS

 

4.1. Tracking by advertisers or data aggregators as third party sites

Case: b.scorecardresearch

b.scorecardresearch is one of the most contacted third party sites among the sites visited in this study. It is able to obtain both the user’s identifiable information (in this case from LinkedIn: name and ID) and non–identifiable information (user searches). Therefore, within just one browsing session, b.scorecardresearch is able to associate a given user with specific online movements. In this case, b.scorecardresearch knows the first author’s name from LinkedIn and knows that she likes cooking (from her search for recipes) and listening to the Beatles.

4.2. Tracking by SNS as third party sites

Through the use of the SNS widgets, it has been shown that SNSs are able to track users’ online movements not only within the SNS but also across the many non–SNS sites that embed SNS widgets. SNSs themselves hold large amounts of personal information about a person’s life. There are two separate situations here. In the first, where the user does not log into any SNS during a browsing session, SNSs can track the user’s movements without being able to identify a specific person. In the second, if the user remains logged into any SNS account (Facebook, Twitter, or Google Plus, or even Gmail or YouTube) while also browsing different Web sites, that SNS is able to combine the user’s non–SNS browsing habits with the user’s profile within that browsing session.

Case: Twitter

Once a user logs into Twitter, Twitter sets cookies to remember the state (e.g., a user’s credentials). The cookies store information about a particular user (e.g., the user’s Twitter ID, name, and IP address). When the user visits other sites that embed the Tweet button, Twitter associates those visits with the same cookie value that was set when the user logged into the account. Even after the user logs out of her Twitter account, Twitter is still able to track her movements outside the SNS because the cookie values remain the same. From the perspective of Twitter, a Twitter user’s profile may include all online browsing movements.
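
The linkage described here can be pictured as a simple grouping operation on the widget provider's side: every widget request arrives with a Cookie value and a Referer, and requests that carry the same cookie value can be collected into one browsing record. The Python sketch below illustrates that idea only. The function name, the cookie values and the Lyrics007 page addresses are hypothetical, and nothing is implied about how Twitter actually stores or processes such data.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def visits_by_cookie(widget_requests):
    """Group the first party pages seen in Referer headers by the cookie sent with them."""
    profile = defaultdict(list)
    for cookie_value, referer in widget_requests:
        page = urlsplit(referer)
        profile[cookie_value].append((page.hostname or "") + page.path)
    return dict(profile)


# Hypothetical widget requests as received by the SNS: (Cookie value, Referer).
requests_seen = [
    ("k=cookie-1", "http://www.lyrics007.com/some-song.html"),
    ("k=cookie-1", "http://www.taste.com.au/recipes/25158/mediterranean+chicken+pasta+salad"),
    ("k=cookie-2", "http://www.lyrics007.com/another-song.html"),
]
for cookie, pages in visits_by_cookie(requests_seen).items():
    print(cookie, pages)
```

If one of those cookie values was issued during a login, every page grouped under it can be tied to the account that logged in, which is why an unchanged cookie value after logout still permits tracking.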

Case: Facebook

Facebook, on the other hand, is slightly different from Twitter in the sense that once a given user logs out, the cookies associated with that person (which include the user’s name, e–mail address and ID) are destroyed. However, other cookies (e.g., guest cookies) which are not directly associated with a particular person remain. Those cookies can still associate browsed Web sites with browsing devices (e.g., OS, browser, and IP address).

Case: Google

Google was observed to act differently in this study. The traffic associated with Gmail or Google Plus could not be observed in HTTP messages, because Google carries it over TLS (Transport Layer Security), a cryptographic protocol providing secured communication over the Internet, so all the data are encrypted. However, we observed a form of tracking by Google via the cookie values used when the user later uses the Google search engine or YouTube. The majority of the first party sites within this study connect directly to Google or its other domains (e.g., google–syndication or google–analytics) or its subsidiary (Doubleclick), so Google has the ability to track a user across different sites, both SNS and non–SNS, from the use of the same cookies.

Again, without logging into Gmail or Google Plus, there should not be any linkage of the user’s online search with a specific identifier. However, logging into either of these services will cause the user’s online activities (among first party sites that use Plus One buttons, sites discovered through Google search, or sites using Google ads) to be linkable with the user’s Google profile via the HTTP cookie. We also noted that once users sign into their Gmail account, they are automatically logged into Google Plus and YouTube. Though Gmail and Google Plus traffic data is not observable via the HTTP headers, the rest of the traffic via HTTP messages shows that Google can gain access to many details about a person’s life if that user logs into a Gmail account and stays logged in during the browsing session.

 

++++++++++

5. Conclusion

Within this study, although non–SNSs share or leak only non–identifiable information, they reveal a person’s browsing habits or searches to third party sites (e.g., advertisers and/or SNSs). SNSs like LinkedIn and Facebook (when the user clicks on advertisements or third party applications) share the user’s identifiable information (in the case of LinkedIn, the user’s name and ID; in the case of Facebook, the user’s Facebook ID) with third party sites. The traffic between the browser and the Gmail and Google Plus services is not visible in the HTTP protocol because it is encrypted. However, based on Google HTTP cookies, we observed that all of the user’s online movements can be tracked and linked to a specific individual by Google if the user happens to log into any of its services (Gmail, Google Plus, or YouTube) within the same browsing session.

It was also observed that both the user’s information (e.g., searches) and the browser connection propagate to more than just one level, traversing from first party sites to third party sites, and from those third party sites to other third party sites. There are two types of third party sites in this study: advertisers or data aggregators and SNSs, such as Facebook, Twitter, and Google Plus. Both are able to track a user’s movement across different sites by the use of cookies and SNS widgets. With a piece of identifiable information from an SNS (as in the case of LinkedIn: name and ID), the browsing habits or behaviours can be linked to a specific person.

Technologies that enable tracking, like HTTP cookies, are not new. Users’ online movements, as well as browsing behaviours, have been trackable for as long as cookies have existed. It becomes problematic when a specific user can be identified. If a user never clears their browser history and cookies, their browsing profile may include their online activities for 365 days of the year (some cookies can live up to 100 years). It appears that whenever a user visits any Web site (SNS or non–SNS), it is inevitable that they will leave digital footprints within and across sites.

These results provide insight into information leakages and the nature of behavioural tracking within a browsing session in which the first author of this article browsed both a SNS and a non–SNS frequented by many, performing activities that are common among online users. It is important to examine information leakages from both a SNS and non–SNS perspective, because the combination of leaked information, both identifiable and non–identifiable (e.g., name and browsing activities), reveals much about an individual. Although the findings of this study are not intended to generalise about information gathering online, they show that within just one browsing session, the first author’s identifiable and non–identifiable information propagated to third party sites, and that those sites were able to track her browsing habits and combine those details with her identifiable information.

Future work

SNSs — particularly Facebook, Twitter and Google Plus — and advertisers or data aggregators have the ability to track a user’s online movements across different sites. Online tracking is very common; however, SNSs have a great deal of personal information provided by and about their online users, along with information about their movements outside the SNS itself. In our future work, we would like to analyse privacy issues within the context of SNS tracking through the lens of a privacy framework like contextual integrity (CI).

CI was introduced by Helen Nissenbaum (2004) as a means for evaluating the impacts of a technology or system from a moral and political viewpoint. There are two kinds of norms in this privacy scheme: (a) norms of appropriateness (which dictate the types of information that are allowable to be divulged in a particular context); and, (b) norms of distribution (which govern the flow of the information from one context to another). Contextual integrity of information flow is maintained when both kinds of norms are respected; otherwise, a breach of privacy occurs. We would like to find out if the practice or ability of tracking users’ online movements within and outside SNSs by SNS companies violates the norms of information flow, and thus users’ privacy.

 

About the authors

Rath Kanha Sar is a Ph.D. candidate in the School of Computing and Mathematics at Charles Sturt University in Australia. Her research focuses on privacy issues raised by the use of social network sites. She has given several presentations about her research, including at international conferences.
E–mail: rsar [at] csu [dot] edu [dot] au

Yeslam Al–Saggaf is a Research Fellow at the Centre for Applied Philosophy and Public Ethics (CAPPE) and a Senior Lecturer in Information Technology at the School of Computing and Mathematics, Charles Sturt University. His research interests lie in the areas of privacy in social media and ICT ethics. He has published in those areas in a number of international refereed journals and has made presentations at a number of international conferences.
E–mail: yalsaggaf [at] csu [dot] edu [dot] au

 

References

ABC (Australian Broadcasting Corporation) News, 2012. “Facebook hits billion users amid revenue worries,” at http://www.abc.net.au/news/2012-10-05/facebook-hits-billion-users-amid-revenue-worries/4296792, accessed 5 October 2012.

J. Angwin, 2010. “The Web’s new gold mine: Your secrets,” Wall Street Journal (30 July), at http://online.wsj.com/article/SB10001424052748703940904575395073512989404.html, accessed 9 September 2012.

M. Ayenson, D.J. Wambach, A. Soltani, N. Good, and C.J. Hoofnagle, 2011. “Flash cookies and privacy II: Now with HTML5 and ETag respawning,” Social Science Research Network, at http://ssrn.com/abstract=1898390, accessed 19 May 2013.

d. boyd and E. Hargittai, 2010. “Facebook privacy settings: Who cares?” First Monday, volume 15, number 8, at http://firstmonday.org/article/view/3086/2589, accessed 19 May 2013.

d. boyd and J. Heer, 2006. “Profiles as conversation: Networked identity performance on Friendster,” HICSS ’06: Proceedings of the 39th Annual Hawaii International Conference on System Sciences, volume 3, p. 59.3.

E. Bursztein, 2012. “19% of users use their browser private mode” (17 May), at http://elie.im/blog/privacy/19-of-users-use-their-browser-private-mode/, accessed 12 November 2012.

D.E. Comer, 2000. Internetworking with TCP/IP. Fourth edition. Upper Saddle River, N.J.: Prentice Hall.

C. Dwyer, 2009. “Behavioural targeting: A case study of consumer tracking on levis.com,” Proceedings of the Fifteenth Americas Conference on Information Systems, at http://csis.pace.edu/~dwyer/research/AMCISDwyer2009.pdf, accessed 19 May 2013.

Facebook, 2012. “Facebook developers: Like button,” at http://developers.facebook.com/docs/reference/plugins/like/, accessed 1 December 2012.

Google, 2012a. “Plus one: Recommend on search, share on Google +,” at http://www.google.com/+1/button/, accessed 1 December 2012.

Google, 2012b. “Web search interest,” at http://www.google.com/trends/explore, accessed 2 December 2012.

R. Gross and A. Acquisti, 2005. “Information revelation and privacy in online social networks,” WPES ’05: Proceedings of the 2005 ACM Workshop on Privacy in the Electronic Society, pp. 71–80.

A.M. Hormozi, 2005. “Cookies and privacy,” Information Systems Security, volume 13, number 6, pp. 51–59. http://dx.doi.org/10.1201/1086/44954.13.6.20050101/86221.8

B. Krishnamurthy and C.E. Wills, 2010a. “On the leakage of personally identifiable information via online social networks,” WOSN ’09: Proceedings of the Second ACM Workshop on Online Social Networks, pp. 7–12.

B. Krishnamurthy and C.E. Wills, 2010b. “Privacy leakage in mobile online social networks,” WOSN ’10: Proceedings of the Third Conference on Online Social Networks, at http://web.cs.wpi.edu/~cew/papers/wosn10.pdf, accessed 19 May 2013.

B. Krishnamurthy and C.E. Wills, 2009. “Privacy diffusion on the Web: A longitudinal perspective,” WWW ’09: Proceedings of the 18th International Conference on World Wide Web, pp. 541–550.

B. Krishnamurthy and C.E. Wills, 2008. “Characterizing privacy in online social networks,” WOSN ’08: Proceedings of the First Workshop on Online Social Networks, pp. 37–42.

B. Krishnamurthy and C.E. Wills, 2006. “Generating a privacy footprint on the Internet,” IMC ’06: Proceedings of the Sixth ACM SIGCOMM Conference on Internet Measurement, pp. 65–70.

B. Krishnamurthy, K. Naryshkin, and C.E. Wills, 2011. “Privacy leakage vs. protection measures: The growing disconnect,” Proceedings of the Web 2.0 Security and Privacy Workshop, at http://w2spconf.com/2011/papers/privacyVsProtection.pdf, accessed 19 May 2013.

B. Krishnamurthy, D. Malandrino, and C.E. Wills, 2007. “Measuring privacy loss and the impact of privacy protection in Web browsing,” SOUPS ’07: Proceedings of the Third Symposium on Usable Privacy and Security, pp. 52–63.

D.M. Kristol, 2001. “HTTP cookies: Standards, privacy, and politics,” ACM Transactions on Internet Technology, volume 1, number 2, pp. 151–198. http://dx.doi.org/10.1145/502152.502153

M. Madden, 2012. “Privacy management on social media sites,” Pew Internet & American Life Project (24 February), at http://www.pewinternet.org/Reports/2012/Privacy-management-on-social-media.aspx, accessed 19 May 2013.

J. Mayer, 2011. “Tracking the trackers: Where everybody knows your username,” Center for Internet and Society at the Stanford Law School (11 October), at http://cyberlaw.stanford.edu/blog/2011/10/tracking-trackers-where-everybody-knows-your-username, accessed 19 May 2013.

H. Nissenbaum, 2004. “Privacy as contextual integrity,” Washington Law Review, volume 79, number 1, pp. 119–158.

K. Purcell, 2011. “Search and email still top the list of most popular online activities,” Pew Internet & American Life Project (9 August), at http://www.pewinternet.org/Reports/2011/Search-and-email.aspx, accessed 19 May 2013.

A. Soltani, S. Canty, Q. Mayo, L. Thomas, and C.J. Hoofnagle, 2009. “Flash cookies and privacy,” Social Science Research Network, at http://ssrn.com/abstract=1446862, accessed 19 May 2013.

K. Strater and H.R. Lipford, 2008. “Strategies and struggles with privacy in an online social networking community,” BCS–HCI ’08: Proceedings of the 22nd British HCI Group Annual Conference on People and Computers: Culture, Creativity, Interaction, volume 1, pp. 111–119.

H.T. Tavani, 2011. Ethics and technology: Controversies, questions, and strategies for ethical computing. Third edition. Hoboken, N.J.: Wiley.

H.T. Tavani, 1999. “Informational privacy, data mining, and the Internet,” Ethics and Information Technology, volume 1, number 2, pp. 137–145. http://dx.doi.org/10.1023/A:1010063528863

O. Tene and J. Polonetsky, 2012. “To track or ‘do not track’: Advancing transparency and individual control in online behavioral advertising,” Social Science Research Network, at http://ssrn.com/abstract=1920505, accessed 19 May 2013.

Twitter, 2012. “Twitter buttons,” at http://twitter.com/about/resources/buttons, accessed 28 November 2012.

W. Wongyai and L. Charoeunwatana, 2012. “Examining the network traffic of Facebook homepage retrieval: An end user perspective,” Proceedings of the 2012 International Joint Conference on Computer Science and Software Engineering (JCSSE), pp. 77–81.

K. Young, 2009. “Online social networking: An Australian perspective,” International Journal of Emerging Technologies and Society, volume 7, number 1, pp. 39–57.

 


Editorial history

Received 23 January 2013; revised 7 May 2013; accepted 10 May 2013.


Creative Commons License
“Propagation of unintentionally shared information and online tracking” by Rath Kanha Sar and Yeslam Al–Saggaf is licensed under a Creative Commons Attribution–NonCommercial–NoDerivs 3.0 Unported License.

Propagation of unintentionally shared information and online tracking
by Rath Kanha Sar and Yeslam Al–Saggaf.
First Monday, Volume 18, Number 6 - 3 June 2013
http://journals.uic.edu/ojs/index.php/fm/article/view/4349/3681
doi:10.5210/fm.v18i6.4349




