If you’re not the site owner, you’ll have to use competitive intelligence services to estimate the traffic. These services use a wide range of methodological approaches, generally recruiting panel members through an opt–in mechanism, sometimes mixing in anonymized data bought from Internet service providers. Three of the leading free services of this type are Alexa, Compete, and Quantcast.
This study examines the methodologies of these three services, using site analytics as the reference. It compares results from all three services against a panel of 18 Web sites whose owners agreed to share analytics data with the author.
Results show high variability in the quality of the estimated audience data from competitive intelligence services. Using analytics as the reference, Compete’s estimates averaged less than one–third of measured traffic, and Quantcast’s averaged less than one–half. Confounding this pattern were occasional overestimates of traffic by both services. Alexa data, which are presented differently than Compete and Quantcast data, showed one significant correlation (page views) with analytics data, while three other variables showed no relationship to the analytics data.
Overall, these services performed poorly. But are the data worthless? The report concludes with recommendations for understanding the limitations of competitive intelligence services and appropriately reporting audience activity based upon differing methodologies.
We live in the age of data. And it’s often said, in the digital world, everything is measurable. But gaining access to those clicks is another matter. Analytics are generally only available to the owner of a particular Web domain. And so competitive intelligence services have sprung up to fill this void, albeit with different kinds of data.
These services — including Compete (http://www.compete.com/us/), Quantcast (https://www.quantcast.com/) and Alexa (http://www.alexa.com/) — help answer everyday business questions: How am I doing relative to my competitors? Is that new business truly a threat? Are the trends up or down?
The purpose of this inquiry is to explain these services and provide exploratory analyses that compare data from these services with known analytics data.
It shouldn’t be a surprise that three different services using different data will generate different estimates. The services are often wildly at odds with one another and also with data from analytics software. One goal of this inquiry is to address the notion of “false precision” that is common when using the services. This form of faulty logic goes as follows: a computer generated the numbers coming from the services, they are presented with precision, and so they must be right.
But they may not be. An exploration of the methodology of these services helps to reinforce that these are estimates, bounded by error terms that are sometimes so large as to render the estimates meaningless.
A second goal is to generate enough understanding of these services to aid in their correct use. By exploring how the services are designed, we can educate the end user to carefully use these services within their limits. After all, bad data can be more harmful than no data at all.
This inquiry will first explain the advantages and limitations of analytics data, and will make the case that analytics represents the “gold standard” of audience measurement online. Then we will explore the methodologies of the three popular services that estimate online traffic. Next, using a limited data set, we will compare the services’ estimates against analytics data. Finally, the inquiry will offer some principles of good practice for using competitive intelligence services.
Audience data from traditional media
Throughout the history of mass communication, it has been difficult for researchers to obtain accurate measurements of audience behavior.
While it’s easy to know how many newspapers were printed, we’ve never had good information on which articles people actually read — although researchers have tried.
Measuring radio has been even more difficult. Economists call radio a “free good,” in that a station can serve an essentially unlimited audience at no increase in marginal cost per listener. For years in the United States, Arbitron (http://www.arbitron.com/) has measured radio listening by marrying survey–sampling techniques to a written diary.
In the U.S., AC Nielsen (http://www.nielsen.com/) has done much the same for the measurement of television viewing, although its people meter automates much of the data collection in larger markets. While the people meter monitors when the set is on and what channel it is tuned to, people in the household must manually tell the device when they enter and leave the room. Still, there’s no way of knowing if those people are actually watching. People meter data are also subject to sampling error.
Our data in digital media are much better and more complete than any of these traditional mass media. Through analytics software, we can track every visit to a Web page and can learn about where visitors come from, how they behave, and what conversions may occur (such as purchasing a product on an e–commerce site). Virtually every commercial site takes care to keep track of its visitors, as this information is essential to monetize and understand the activity on the sites.
Criticism of competitive intelligence tools
While analytics software provides accurate counts of visitor activity, it is generally only available to the owner of a site. If you want to estimate the traffic on someone else’s site, you have to use other tools, which may be based on entirely different ways of capturing visitor data.
This study focuses on three free tools that provide information on Web visitor activity: Alexa.com, Quantcast.com and Compete.com. Each of these services estimates viewer activity differently.
Webmasters, who have very good information on their own traffic, have criticized competitive intelligence services. Matt Cutts, a software engineer at Google, has addressed this issue on his blog, where one post generated 54 comments.
Cutts cited an informal study by Peter Norvig, director of research at Google, who noted, “people with the Alexa toolbar installed are 25 times more likely to view a page on Matt’s site versus mine, but overall, all users view twice as many pages on my site. That’s a 50 to 1 difference introduced by the selection bias of Alexa.”
Norvig appears nearly alone in applying traditional social–scientific principles (here, sample selection) in criticizing Alexa. The other webmasters tend to focus on real–world issues. Said one commenter on Cutts’ post:
Alexa scores are utterly dependent on visits by alexa toolbar users; very largely webmasters and geeks, with a scattering of well–meaning teenagers.
Alexa’s ability to tell you ANYTHING about YOUR, GENUINE visitors is precisely zero. 
Another commenter cited Aaron Wall’s SEO Book (http://www.seobook.com/) which explains:
Alexa is widely touted as a must–use tool by many marketing gurus. The problems with Alexa are:
- Alexa does not get much direct traffic and has a limited reach with its toolbar
- a small change in site visitors can represent a huge change in Alexa rating
- Alexa is biased toward webmaster traffic
- many times new webmasters are only tracking themselves visiting their own site
Why do many marketing hucksters heavily promote Alexa? Usually one of the following reasons:
- if you install the Alexa toolbar and then watch your own Alexa rating quickly rise as you surf your own site it is easy for me to tell you that you are learning quickly and seeing great results, thus it is easy to sell my customers results as being some of the best on the market
- if many people who visit my site about marketing install the Alexa toolbar then my Alexa rating would go exceptionally high
- the marketers may associate their own rise in success with their increasing Alexa ranking although it happens to be more of a coincidence than a direct correlation 
A panel of Webmasters in the SEO (search engine optimization) community publicly shared their analytics data, then compared it to competitive intelligence tools. They found the tools wanting in comparison to analytics data. Using Pearson correlation coefficients, they found the Alexa rank correlated at only .49 with Web site popularity; Compete’s correlation was only .38. The study’s conclusion:
Services like Alexa, Ranking.com, Compete.com & Netcraft are nearly useless when it comes to predicting traffic or comparing relative levels of popularity, even when used on a highly comparable set of sites in a similar field
none of these are nearly accurate enough to use, even in combination, to help predict a site’s level of traffic or its relative popularity, even in a small niche with similar competitors. Unfortunately, it appears that the external metrics available for competitive intelligence on the web today simply do not provide a significant source of value. 
Before taking a closer look at competitive intelligence tools, we must understand the value and limitations of true analytics data.
Page tagging, the collection method used by Google Analytics, relies on a snippet of JavaScript on each page that reports visitor activity to the vendor. Among the advantages of this approach:
- Easy to implement;
- Implementable if you don’t have access to Web servers or logs;
- Not affected by ISP page caching;
- High degree of control; can add e–commerce tags;
- Analytics vendor is responsible for data capture, not your IT department;
- Facilitates use of third–party cookies. 
There are also some methodological limitations related to how people use the Web:
The last click. When someone visits your Web site, your analytics software sets a cookie on the visitor’s browser. This permits analysis of activity over time. But there is no click to mark the end of a visit. While every analytics package has a way to deal with this, it represents a degree of imprecision in your analytics data.
Same visitor, different computer, or different visitor, same computer. Since the cookies are placed on a particular browser on a particular computer, they are limited to behavior on that combination of software and hardware. So, if the same person visits your site with Firefox on one day and Chrome on another, the analytics will report these as two visits from two visitors. Conversely, if two different household members visit a common site with the same browser and computer, the second visit will be counted as a returning visit from the same person. Remember also that many individuals have multiple ways to visit a particular Web site, including mobile devices, tablets, game machines and office computers.
Tabbed browsing. With tabbed browsing, the user concurrently loads multiple sites and then jumps from one tab to another. For example, an office worker may load a news site at the beginning of the day, but only check it occasionally during the day while tending to work activities. The analytics software must decide when a session is over. The default in Google Analytics is that after 30 minutes of no activity, the session has ended. This may not reflect actual behavior.
Wiped cookies. When a computer user deletes browser cookies, historical analytical data is lost, and repeat visits will be logged as new visits.
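The 30–minute timeout discussed above can be illustrated with a simple sessionizer. This is a minimal sketch of the general technique, not Google Analytics’ actual implementation; the timestamps are invented:

```python
# Group a visitor's page hits into sessions using a 30-minute
# inactivity timeout, mirroring the default rule described above.
# Illustrative sketch only, not Google Analytics' implementation;
# timestamps (in seconds) are invented.

TIMEOUT_SECONDS = 30 * 60  # session ends after 30 minutes of inactivity

def sessionize(hit_times):
    """Split a list of hit timestamps (in seconds) into sessions."""
    sessions = []
    current = []
    for t in sorted(hit_times):
        if current and t - current[-1] > TIMEOUT_SECONDS:
            sessions.append(current)  # gap exceeded the timeout: close the session
            current = []
        current.append(t)
    if current:
        sessions.append(current)
    return sessions

# Hits at 0, 10 and 50 minutes: the 40-minute gap starts a new session.
print(len(sessionize([0, 600, 3000])))  # 2
```

Note the judgment call embedded in the timeout constant: a reader parked on a background tab for 31 minutes becomes two visits, which is exactly the imprecision described above.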
A major limitation of page tagging for some applications is that the analytical information is not public. If you need information on Web sites that you don’t control, you must look to competitive intelligence tools to estimate traffic. Three widely available services with a free option are Compete, Alexa and Quantcast.
Summary of competitive intelligence tools
Compete, owned by Kantar Media, is a “freemium” service — there is a limited free service, and a more detailed paid service with monthly fees from US$199 to US$499.
Compete relies upon panel data. Its sample begins with a panel of 350,000 persons who opt in and agree to have their Web activity monitored. Anyone can opt in by visiting consumerinput.com. People join the panel for a chance to win prizes for their participation. This sample is supplemented with click stream data purchased from Internet service providers and application service providers with a goal of two million “representative” panelists. Through a “harmonization” process these data are turned into one large dataset, a “single, unified online consumer measurement panel that is representative of the Internet browser population in the United States.” 
The data are then normalized against a 4,000–person omnibus survey of the U.S. Internet population. This survey creates weights, which are applied to the click stream panel, resulting in “data that meet the high quality threshold required for trusted, reliable consumer and media research.”
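This normalization step resembles standard post–stratification weighting: panelists in demographic cells that are under–represented relative to the reference survey count for more than one person, and those in over–represented cells count for less. A minimal sketch follows; the cells, counts and shares are invented, and Compete’s actual procedure is proprietary:

```python
# Post-stratification weighting sketch: scale each panel cell so its
# share of the weighted panel matches the share observed in a
# reference survey. All cells and numbers here are invented.

panel_counts = {"18-34": 500, "35-54": 300, "55+": 200}     # panelists per cell
survey_share = {"18-34": 0.30, "35-54": 0.40, "55+": 0.30}  # population shares

total = sum(panel_counts.values())
weights = {cell: survey_share[cell] / (panel_counts[cell] / total)
           for cell in panel_counts}

# The 55+ cell is 20 percent of the panel but 30 percent of the
# population, so each of its panelists counts for 1.5 people.
print(round(weights["55+"], 2))  # 1.5
```

Weighting can correct for cells that are under–sampled, but it cannot fix a cell that is systematically unlike its population counterpart, which is the core objection to SLOP samples raised below.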
Since so much of the Compete methodology is proprietary, it is difficult to assess. In survey research terminology, however, the initial opt–in panel is a SLOP sample, a self–selected opinion poll . Because anyone can participate, the sample is not representative of any larger population.
Compete provides only U.S. data, and does not purport to measure Web surfing at work (most information technology departments would forbid its software as “spyware”). Office Web browsing is a significant traffic source; so much so that Nielsen offers separate office and home panels for its service.
Alexa, owned by Amazon.com, also relies upon panel data. Alexa data comes from a toolbar, which a volunteer can download for Internet Explorer, Google Chrome and Mozilla Firefox. There appears to be no option for a Macintosh user with the Safari browser. As with Compete, Alexa utilizes a SLOP sample. The toolbar includes a few features to enhance the browsing process, such as the ability to view ratings or popularity of a particular site, a “wayback” button to see how the site used to look, and a chance to view related links and search terms.
Alexa data is then “normalized”:
Alexa’s ranking methodology corrects for a large number of potential biases and calculates the ranks accordingly. We normalize based on the geographic location of site visitors. We correct for biases in the demographic distribution of site visitors. We correct for potential biases in the data collected from our Alexa Toolbar to better represent those types of site visitors who might not use an Alexa Toolbar. 
Alexa makes several disclaimers about its data. The service notes “Sites with relatively low traffic will not be accurately ranked by Alexa.” Alexa recommends caution in interpreting data from sites with fewer than 100,000 visits in a month .
Quantcast, unlike the other two services, offers direct measurement: Web sites that install Quantcast tags become “Quantified” publishers, whose traffic is counted much as analytics software counts it. For Web sites that choose not to implement Quantcast tags, a different methodology is utilized:
Quantcast has NOT built its model on PII (personal identifiable data) data, but instead uses numerous data inputs that provide some insight into demographic benchmarks and a large–scale mathematical model to infer demographics of visitors. 
Because the methodology is not explicitly stated, it is difficult to evaluate. However, Quantcast states, “If you’re not a Quantified Publisher then we’re only able to present estimates for your audience, and it’s entirely possible that these estimates aren’t very good.” 
Summary of traffic estimating services
All three of the services in question here — Alexa, Compete and Quantcast — share three qualities that should be troubling when their respective methodologies are examined through the filter of good research practice. First, they all appear to use — at least in part — SLOP samples for much or all of their traffic estimations. Second, they all rely upon “black box” methodologies. In essence, they hide behind proprietary methodologies that make the services difficult to evaluate. Third, while they may note their limitations in fine print, the services present their data with the sheen of precision, fostering the illusion that the data are better than they appear to be.
This study addresses these questions:
- How accurate are the data from free traffic estimating services Alexa, Compete and Quantcast?
- What can we learn from the limitations of these services that can facilitate their effective use?
Eighteen Web site owners agreed to share their analytics data from February through August 2011. The data have been anonymized to protect confidential information. The sites include a wide range of business and non–profit interests, including health, professional services, e–commerce, consumer products and retail. The monthly unique visitor counts for these sites ranged from about 2,000 to 200,000.
All reports came from Google Analytics, the leading analytics package with a reported 80 percent market share. Competitive estimates were taken from the free versions of Alexa, Compete and Quantcast. Variables coded were: monthly unique visits at three points in time (August, May and February, 2011) from Compete and Quantcast, and four measures from the last three months (September–November 2011) from Alexa: page views, bounce rate, percentage of traffic from search and time on site. These variables were chosen to facilitate direct comparison between analytics and the three services.
Data were organized and entered into an MS Excel spreadsheet, then imported into SPSS version 19 for analysis.
Data from Compete were positively correlated with Google Analytics data. Table 1 shows Pearson correlation coefficients for all Web sites in the sample, using monthly unique visits for August, May and February:
Table 1: Correlation coefficients — Google Analytics and Compete.

          r        p      n
August    .637**   .004   18
May       .709**   .001   17
February  .679**   .004   16
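Coefficients like these can be computed directly from paired monthly counts. The following is a minimal sketch of the Pearson calculation; the site figures are invented, not the study’s data:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Invented example: monthly unique visitors from analytics versus a
# service's estimates for five sites. A high positive r means the two
# sources rank the sites similarly, even if absolute levels disagree.
analytics = [2000, 8000, 15000, 60000, 200000]
estimates = [800, 2500, 6000, 15000, 70000]
print(round(pearson_r(analytics, estimates), 3))
```

This is worth keeping in mind when reading the tables: a strong r establishes only that the two sources move together, not that they agree in level.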
To illustrate these relationships, Figures 1–3 show scatter plots for each month’s measurements:
Figure 1: August Compete estimates plotted against analytics data.
Figure 2: May Compete estimates plotted against analytics data.
Figure 3: February Compete estimates plotted against analytics data.
While the relationships between analytics measures and Compete measures are positive and generally linear, the correlations don’t show the absolute level of agreement between the two data sources.
To illustrate this relationship, unique visitors from Analytics were divided by unique visitors from Compete to create a ratio. If the two sources report the same number, the resulting ratio will be 1. If Google Analytics reports 1,000 unique visitors while Compete reports 500, the ratio is 2. Table 2 shows that Compete consistently underreports unique visitors relative to Analytics:
Table 2: Ratio of unique visitors: Google Analytics/Compete.

          Mean ratio   SD     n
August    3.34         4.36   18
May       3.25         3.15   17
February  3.25         4.06   16
Overall, analytics recorded more than three times the audience that Compete estimated. Further, large standard deviations show high variability in these estimates. Manual tabulation of the August data, for example, shows that Compete overestimated the audience for one site but underestimated it for 16 sites. In seven cases, Compete’s estimate was less than half the analytics count. In the worst case, for a non–profit Web site, analytics logged 4,268 unique visitors while Compete estimated 219, a ratio of almost 20 to 1. This general pattern is evident in the data for May and February as well: Compete tends to under–report unique visitors, and its estimates vary greatly relative to data from analytics.
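The ratio measure is easy to reproduce for your own comparisons. In the sketch below, only the first pair (4,268 versus 219 unique visitors) comes from the study; the other site figures are invented:

```python
import statistics

# Ratio of analytics unique visitors to a service's estimate, per site.
# A ratio of 1 means perfect agreement; 2 means the service reported
# half the actual count. Only the first pair (4,268 vs. 219) comes
# from the study; the rest are invented for illustration.
analytics = [4268, 12000, 3500, 50000]
service   = [219, 6000, 3500, 40000]

ratios = [a / s for a, s in zip(analytics, service)]
print(round(ratios[0], 2))                 # the study's worst case: almost 20 to 1
print(round(statistics.mean(ratios), 2))   # mean ratio across sites
print(round(statistics.stdev(ratios), 2))  # a large SD signals unstable estimates
```

A mean ratio well above 1 paired with a large standard deviation is precisely the pattern Tables 2 and 4 report: the services run low, and inconsistently so.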
Quantcast has a higher threshold for reporting Web site traffic than Compete. The practical implication for this study is that Quantcast does not report estimates for smaller or newer sites, leading to a smaller working sample. Also noteworthy is that only two of the 18 sites in this study are “Quantified,” or directly measured; the rest have traffic estimated from Quantcast’s opt–in panel. Due to the small sample size, data from both sources are presented together in this analysis.
In two of the three months analyzed, Quantcast data were significantly and linearly related to analytics data.
Table 3: Correlation coefficients — Google Analytics and Quantcast.

          r        p          n
August    .908**   .002       8
May       .190     NS (.653)  8
February  .910**   .002       8
Scatter plots, shown in Figures 4–6, illustrate the relationship between unique visitors as estimated by Quantcast and recorded by analytics. In the May graph, one outlier may be enough, with a sample of eight, to render the relationship non–significant:
Figure 4: August Quantcast estimates plotted against analytics data.
Figure 5: May Quantcast estimates plotted against analytics data.
Figure 6: February Quantcast estimates plotted against analytics data.
While the relationships between analytics measures and Quantcast measures are generally positive, the correlations mask the true agreement between the different approaches.
For Table 4, unique visitors from analytics were divided by unique visitors from Quantcast (this study used Quantcast’s “people” measure). The service consistently underreports unique visitors relative to analytics:
Table 4: Ratio of unique visitors: Google Analytics/Quantcast.

          Mean ratio   SD     n
August    2.43         1.32   8
May       2.20         1.67   8
February  2.75         3.05   8
On average, analytics recorded more than twice the unique visitors that Quantcast reported. Out of 24 observations (three months times eight Web sites), Quantcast over–reported unique visitors in only two cases. In half of all cases, Quantcast’s estimate was less than half the analytics count.
While Alexa reports many kinds of data for Web sites, it does not report many common metrics such as unique visitors. Several reports in Alexa are based upon reach and are difficult to interpret. For example, the largest site in the panel for this study has an “estimated percentage of global page views” of .000079 percent. To further complicate things, Alexa reports only limited data for sites ranked below its top 100,000. All sites considered for this study fall below this threshold.
Four measures available from Alexa are directly comparable to data from analytics: average page views, bounce rate, percentage of traffic from search and average time on site. Each of these measures is aggregated over a three–month reporting period.
Table 5: Correlation coefficients — Google Analytics and Alexa.

               r        p          n
Page views     .642**   .004       18
Bounce rate    .473     NS (.087)  14
% from search  .627*    .016       14
Time/site      .282     NS (.290)  16
The mean number of page views recorded by analytics was 3.49 (SD=2.23) while the mean number of page views as recorded by Alexa was 3.68 (SD=3.14). The correlation between these measures was significant (r=.642, p=.004). Figure 7 displays a scatter plot of that relationship:
Figure 7: Alexa page view estimates plotted against analytics data.
Bounce rate measures the percentage of visits that result in only one page view. The mean bounce rate as recorded by analytics was 44.4 percent, with a standard deviation of 13.39. The mean bounce rate as recorded by Alexa was 44.74 percent, with a standard deviation of 14.06. The correlation between the two measures was not significant (r=.473, p=.087). The relationship is visualized on this scatter plot:
Figure 8: Alexa bounce rate estimates plotted against analytics data.
The percentage of traffic from search as recorded by analytics was 46.7 percent, with a standard deviation of 19.76 percent. The percentage of traffic from search as reported by Alexa was 22.53 percent, with a standard deviation of 13.27 percent. These two measures were positively correlated (r=.627, p=.016). This relationship is visualized in Figure 9:
Figure 9: Alexa search referral estimates plotted against analytics data.
Average time on site as recorded by analytics was 177 seconds, with a standard deviation of 91 seconds. Alexa reported an average time on site of 267 seconds and a standard deviation of 283 seconds. The correlation between the two was not significant (r=.282, p=.290). This relationship is represented on Figure 10:
Figure 10: Alexa time on site estimates plotted against analytics data.
Overall, the Alexa data provide less utility than those of the other two services, and they appear less stable. In one case (bounce rate) the descriptive statistics agree (about 44 percent), but there is no correlation. For another variable, percentage of traffic from search, there is wide disparity (analytics recorded 47 percent while Alexa estimated 23 percent), yet the two are significantly correlated. The scatter plot shows a generally linear relationship, but with notable outliers. With this limited sample, it is difficult to assess the quality of the Alexa data.
Should you use bad data or no data at all? This appears to be the choice facing the analyst considering competitive intelligence services. While every data source has its own flaws, these are greatly compounded when relying upon a panel that opts in to win prizes or get free anti–virus software. These panels also significantly under–report “at–work” Web activity, because IT departments generally ban the toolbars as invasive or as spyware.
There are some principles that will help you get value from competitive intelligence services. Most importantly, never forget that these are just estimates. While they are using the same traffic concepts as your analytics, it’s best to divorce these numbers from your analytics numbers. Instead, consider the estimates as a point of departure. Compare them over time. And try to make your comparisons within the same estimating tool, even if you have analytical data available.
When presented with competitive intelligence, triangulate it with other data sources. If the metric in question should be stable, go back in time and see if the service reports it so. If you know a site experienced a huge increase in traffic at a specific time, see if the tool picked up this trend. While absolute numbers may not agree, trends may still be presented with some accuracy.
Compete’s data were far from the analytical data. Overall, analytics recorded more than three times the traffic that Compete estimated. But before you apply a correction to Compete data, be advised that the estimates are highly variable: in one case, Compete was low by a factor of 20, yet in another, the service overestimated traffic. Even a casual glance at Compete’s numbers shows high variability, even though most Webmasters know traffic is generally predictable.
Quantcast is really two services, a panel and a true analytics sample. For the purposes of this study, conclusions should be tentative, since Quantcast measured only eight of 18 Web sites.
Overall, Quantcast’s estimates averaged less than half the traffic recorded by analytics. But while the estimates were low, they were more tightly aligned with analytics data, even though one of the three months measured showed a non–significant correlation.
Because Quantcast also maintains a panel, it is able to move beyond clicks to people, thus being able to present data in a manner that is more like traditional Nielsen television viewing data. While this is useful, remember that the demographics overlay comes from the Quantcast panel, with the attendant issues related to sample selection and estimation.
Alexa’s data appeared least stable of the three services. Two measures, bounce rate and time on site, were not correlated with the corresponding measures reported by analytics.
Alexa is also less useful because of how it reports data. Because the service clings to “global reach” instead of simply reporting visits, visitors or other standard measures, its reports are difficult to interpret. This is especially true for smaller sites.
So much data is now available that it’s easy to look past the basic principles of how the information is collected. Relatively little is known about how competitive intelligence services create their reports. They may use sophisticated, proprietary methods, but this inquiry finds those methods wanting.
Social scientists are known to stress over how to reduce sampling error from plus or minus seven percentage points to five. Here, the errors are orders of magnitude larger.
Competitive analysis tools demand that you interpret with caution. Triangulate with other data sources or time periods. It’s important to remember that the data are not carved into stone tablets. They’re just estimates, and frequently not very accurate.
About the author
David Kamerer serves as assistant professor in the School of Communication at Loyola University Chicago.
E–mail: david [at] davidkamerer [dot] com
1. “Estimating Webmaster skew in Alexa metrics,” at http://www.mattcutts.com/blog/estimating-webmaster-skew-in-alexa-metrics/, accessed 3 March 2012.
3. “Alexa toolbar and the problem of experiment design,” at http://norvig.com/logs-alexa.html, accessed 3 March 2012.
5. “SEOmoz | Website Analytics vs. Competitive Intelligence Metrics,” at http://www.seomoz.org/article/search-blog-stats, accessed 3 March 2012.
7. Avinash Kaushik, 2007. Web analytics: An hour a day. Indianapolis, Ind.: Sybex.
9. Avinash Kaushik, 2010. Web analytics 2.0. Indianapolis, Ind.: Wiley.
10. “Cookies & Google Analytics — Google Analytics — Google code,” at http://code.google.com/apis/analytics/docs/concepts/gaConceptsCookies.html, accessed 3 March 2012.
11. “Revisiting log file analysis versus page tagging,” at http://web.analyticsblog.ca/2010/02/revisiting-log-file-analysis-versus-page-tagging/, accessed 19 November 2011.
12. “Compete data methodology white paper,” at http://blog.compete.com/2010/03/15/compete-data-methodology-white-paper/, accessed 6 March 2012.
13. “AAPOR | bad samples,” at http://www.aapor.org/Content/aapor/Resources/PollampSurveyFAQ1/WhatisaRandomSample/BadSamples/default.htm, accessed 6 March 2012.
14. “Help,” at http://www.alexa.com/help/traffic-learn-more, accessed 6 March 2012.
16. “Quantcast publisher FAQs,” at http://www.quantcast.com/learning-center/faqs/quantcast-publisher-faqs/, accessed 6 March 2012.
Alexa, “Our data,” at http://www.alexa.com/help/traffic-learn-more, accessed 6 March 2012.
Compete, “Where does Compete’s data come from?” at http://www.compete.com/us/about/our-data/, accessed 6 March 2012.
Google Analytics, “Google Analytics,” at http://google.com/analytics, accessed 6 March 2012.
Avinash Kaushik, 2010. Web analytics 2.0: The art of online accountability & science of customer centricity. Indianapolis, Ind.: Wiley.
Avinash Kaushik, 2007. Web analytics: An hour a day. Indianapolis, Ind.: Sybex.
Quantcast, “Quantcast Methodology FAQs,” at http://www.quantcast.com/learning-center/faqs/methodology-faq/, accessed 6 March 2012.
Received 7 March 2012; accepted 5 March 2013.
Copyright © 2013, First Monday.
Copyright © 2013, David Kamerer.
Estimating online audiences: Understanding the limitations of competitive intelligence services
by David Kamerer.
First Monday, Volume 18, Number 5 - 6 May 2013