
The Trouble with Cookies

Research cautions against an overreliance on cookies

There’s a growing realization in the online publishing industry that Web user measurement cannot be accurate as long as it relies on anonymous cookies to distinguish one site visitor from another.

The question now is: What is the solution?

At Lee Enterprises, Internet sales director Greg Swanson has one answer. He has stopped using server log analysis tools to figure out the reach and frequency of his company’s newspaper Web sites. He believes that pop-up surveys that ask site visitors about their online habits produce more accurate counts of Web visitors and their frequency of usage.

At other sites, executives have concluded that only mandatory registration will give them the data they need about their audience. Meanwhile, audience measurement vendors – recognizing a growing need among site producers and advertisers for a consistent, understandable approach to measuring online usage – are scrambling to create and offer solutions that avoid the limitations associated with anonymous cookies.

Cookies Defined

A cookie is a small string of text code delivered by a Web server to a user’s computer to be stored in a file or folder. With the technology architecture built into the World Wide Web, the only Web domain that can access the cookie later is the one that delivered it in the first place.

Privacy advocates have long been concerned about the potential threats to privacy caused by cookies, but some fail to realize that cookies themselves cannot be connected to personally identifiable information unless a Web user chooses to provide it. Giving information to a site that also delivers you a cookie can be a great convenience. If you have sites that “recognize” you each time you visit, making it unnecessary for you to log in or, say, provide your shipping address each time, you are benefiting from the site’s use of cookies and its storage of personal information about you.

From the perspective of a Web site manager, the problem with cookies is that many audience measurement tools rely on them to identify unique users, yet those tools capture no personal information that could be connected to the cookie.

Take server log analysis tools as an example. If your site uses WebTrends to analyze server log files, you are relying on cookies to count the number of visitors to your site. Each time someone visits your site, WebTrends checks for a previously issued cookie from your site. If a cookie is present, that user is recognized as a repeat visitor. If no cookie is present, a new cookie is issued – and the user is considered to be a first-time visitor.

Unless your site is requiring or offering registration, the cookie cannot be connected to personally identifiable information. In the absence of personal information, there are various ways WebTrends might consider someone to be a first-time visitor when that person actually has visited before:

  • The user might be visiting from a different computer than the one he used last time (for instance, someone who used his office computer and received a cookie there might now be visiting from a home computer that has no cookie).
  • The user might have his Internet Explorer security settings configured to block cookies from your site.
  • The user might use more than one browser (say, Internet Explorer and Mozilla Firefox) and have separate cookies delivered to each.
  • The user might have deleted his cookie files since the last time he visited your site.

In each of these cases, the effect of anonymous cookies is to cause WebTrends to fail to recognize that a particular site user has visited the site before. There is also the chance that the site visitor is using a computer shared with other users – in which case WebTrends would consider the user a repeat visitor when he actually is visiting for the first time.
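The counting logic described above can be sketched in a few lines of Python. This is a minimal illustration of the general technique, not WebTrends' actual implementation. The class name `CookieCounter`, the `visitor_id` cookie name and the dictionaries standing in for browser cookie stores are all assumptions made for the example.

```python
import itertools


class CookieCounter:
    """Counts 'unique visitors' the way an anonymous-cookie tool would."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.issued = set()      # every cookie ID this site has issued
        self.repeat_visits = 0

    def visit(self, cookie_jar):
        """Record one visit; cookie_jar is a dict standing in for one browser's cookie store."""
        cookie = cookie_jar.get("visitor_id")
        if cookie in self.issued:
            self.repeat_visits += 1          # cookie present: repeat visitor
        else:
            cookie = next(self._next_id)     # no cookie: treated as a first-time visitor
            cookie_jar["visitor_id"] = cookie
            self.issued.add(cookie)

    @property
    def unique_visitors(self):
        return len(self.issued)


# One real person with three cookie stores: an office browser and two home browsers.
counter = CookieCounter()
office, home_ie, home_firefox = {}, {}, {}
for jar in (office, home_ie, home_firefox, office):
    counter.visit(jar)

office.clear()           # the person deletes cookies at work...
counter.visit(office)    # ...and is issued a fresh cookie on the next visit

print(counter.unique_visitors)  # 4
print(counter.repeat_visits)    # 1

# The opposite error: two different people sharing one computer.
shared_counter = CookieCounter()
shared_computer = {}
shared_counter.visit(shared_computer)   # person A
shared_counter.visit(shared_computer)   # person B
print(shared_counter.unique_visitors)   # 1
```

In the first scenario a single person is reported as four unique visitors, with only one of five visits recognized as a repeat. In the second, two people are reported as one repeat visitor.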

Overcounting Problems

In the past year, several researchers have attempted to measure the significance of the cookie problem. All have found evidence that audience measurement systems that rely on anonymous cookies significantly overestimate the number of unique visitors to a given site. A few data points:

  • On two British e-commerce sites, RedEye International found that cookie-based measurement overcounted unique visitors by a factor of as much as 2.3 over the course of a month.
  • Looking at a sample of sites that required registration, SageMetrics Corp. found that cookie-based measurement would overstate the number of users for those sites by 27 to 35 percent.
  • Using online surveys, Belden Associates found that up to half the users of newspaper sites report using more than one computer to visit those sites.
  • In a survey taken earlier this year, NetRatings Inc.’s @Plan found that 43.7 percent of Internet users said they had deleted cookies from their computer in the previous month.

“If you’re calculating based on cookies, you may be getting a very skewed measure of reach and frequency,” said Michael Saxon, vice president of media products for Nielsen//NetRatings.

Cookies are fundamental to the technologies used by most server log analysis tools such as WebTrends and Urchin, as well as by page-tagging solutions such as Sage and Omniture Inc. Cookies are also critical to the capabilities provided by companies that offer behavioral targeting of advertisements, such as Tacoda Systems.

(There are some bare-bones server log analysis tools that don’t use cookies. These rely on users’ Internet protocol (IP) addresses to distinguish visitors from one another. These are even less accurate than cookie-based systems because visitors from a single company, institution or Internet Service Provider are perceived as having the same IP address. RedEye’s research indicated that these tools overcounted unique visitors by a factor of as much as 7.6 – meaning they were more than three times less accurate than cookie-based measurement.)

If the number of unique visitors is overestimated by cookie-based measurement tools, one could infer that people visit more frequently than the cookie-based tools suggest.
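The arithmetic behind that reasoning is simple: total visits stay the same, so if the visitor count shrinks by some factor, visits per person grow by the same factor. The sketch below uses hypothetical traffic figures (30,000 visits, 10,000 cookie-counted visitors). It borrows RedEye's upper-bound overcount factor of 2.3 purely for illustration, since that figure came from two e-commerce sites and may not hold elsewhere.

```python
# Hypothetical monthly figures for one site (not from any study).
total_visits = 30_000
cookie_unique_visitors = 10_000

# Assumption: unique visitors are overcounted by RedEye's upper-bound factor.
overcount_factor = 2.3

# Fewer real people account for the same number of visits.
actual_unique_visitors = cookie_unique_visitors / overcount_factor

measured_frequency = total_visits / cookie_unique_visitors
corrected_frequency = total_visits / actual_unique_visitors

print(round(measured_frequency, 1))   # 3.0 visits per person per month
print(round(corrected_frequency, 1))  # 6.9 visits per person per month
```

Under these assumptions, a site that appears to see each user three times a month would actually be seeing each person about seven times.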

In a survey of sites using cookie-based monitoring techniques, Belden Associates found that the average newspaper Web site saw its users visit two to three times per month. This was at odds with Belden’s findings from pop-up online surveys. In those surveys, newspaper site users said they were visiting an average of seven times per month – with a core of frequent users who visit almost every day.

Lee Enterprises’ Swanson considered the conflicting data. One possibility, he realized, was that the pop-up online surveys were being filled out mostly by loyal and frequent site users. But he ultimately concluded that it was the pop-up surveys that presented a more accurate picture of the reach and frequency of his Web sites.

New Approach to Audience, Ads

Working with Belden, Lee constructed a model of the Web audiences for its small- and medium-sized newspapers. (Prior to its acquisition of Pulitzer Inc., the company’s largest solely owned property was the North County Times in Escondido, Calif., with a weekday circulation of about 93,000.) The goal of the model was to reconcile the conflicting data from its online audience surveys and its server log analysis data.

Lee’s model suggested that its newspaper Web sites had a core audience – less than a third of all visitors – who visited an average of four to five days a week, once or twice a day. These frequent visitors accounted for 80 percent or more of the sites’ traffic. Meanwhile, there was a much larger number of users, two or three times as many as the frequent visitors, who came just once or twice a month, accounting for the remainder of the traffic.
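A quick calculation shows that the figures in this model are at least roughly consistent with one another. The sketch below plugs in the midpoints of the ranges quoted above. The midpoints, the variable names and the weeks-per-month conversion are assumptions for illustration, not part of Lee's or Belden's model.

```python
WEEKS_PER_MONTH = 52 / 12


def monthly_visits(days_per_week, visits_per_day):
    """Convert a weekly visiting pattern into visits per month."""
    return days_per_week * visits_per_day * WEEKS_PER_MONTH


# Midpoints of the quoted ranges (assumed for illustration).
core_visits = monthly_visits(days_per_week=4.5, visits_per_day=1.5)
occasional_visits = 1.5        # "once or twice a month"
occasional_per_core = 2.5      # "two or three times as many" occasional users

# Work per single core user, so the audience size cancels out.
core_share_of_users = 1 / (1 + occasional_per_core)
occasional_traffic = occasional_per_core * occasional_visits
core_share_of_traffic = core_visits / (core_visits + occasional_traffic)

print(f"{core_share_of_users:.0%}")    # 29%
print(f"{core_share_of_traffic:.0%}")  # 89%
```

At these midpoints the core audience is a little under a third of all visitors and generates close to nine-tenths of visits, in line with the "80 percent or more" figure. Other values within the quoted ranges give different shares.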

This conclusion led Lee executives to dramatically change the way the company approached managing its sites – and selling online advertising.

To drive more traffic to its sites, Lee concluded that it would be more productive to appeal to frequent visitors than infrequent ones. In Casper, Wyo., for instance, the company added a midday news update – and doubled traffic within a month.

“We have found that updating throughout the day, adding more content throughout the day aimed at the core users, drives traffic much faster than trolling for incidental visitors,” Swanson said.

The company’s new understanding of its audience had significant implications for its advertising sales efforts as well. Newspaper ad sales representatives had been wary of helping to sell online ads – in part, Swanson said, because they were armed with cookie-based data that suggested their papers’ Web sites had bigger audiences than their newspapers. No one who’d dedicated a career to selling print advertising found that to be an exciting selling point.

Instead, Swanson said, Lee believes that within local markets, its papers’ Web sites have a core, loyal audience about a fifth as large as the newspaper’s readership (as opposed to circulation). An audience that size is usually comparable to that of the most successful television station in the market, and larger than the audience of any local radio station, Swanson said.

“Now the print rep,” said Swanson, “gets to tell a story they like to tell” – namely, that the newspaper’s Web site has the second- or third-largest audience (after the print edition) of any local media outlet. Armed with that information, Lee found significant success selling online ads to local merchants in its markets. Company executives reported that increased online ad revenue contributed a third of the company’s profit growth for 2004. For 2005, Lee has established a corporate goal that its Web sites “become the second largest media buy in the market,” Swanson said.

Alternative Systems

Lee’s approach is particularly relevant for newspapers in small markets, which cannot afford to invest in more sophisticated audience measurement systems. For larger markets, Web publishers concerned about the cookie problem can look to other approaches to understand their sites’ audiences:

Page-tagging (or “beacon”) systems, which embed JavaScript tags in each page, use cookies in essentially the same way that server log analysis tools do. Relying on these tags to measure reach and frequency will produce the same kinds of distortions as server log analysis tools. But these systems (from companies including SageMetrics and Omniture) offer much more than measures of reach and frequency.

Page-tagging systems can collect and transmit a variety of information about a page that’s being viewed, including the section of the site where it is located, the time the page was delivered, and even details about which specific link is clicked. These tools also make it possible to see the paths that users take through a site – including the pages they start on and the ones they were viewing when they decide to leave. While the number of visitors and their frequency may be inaccurate, this rich data warehouse can be invaluable for understanding how people use your site and how to improve it.

Mandatory site registration offers the potential to overcome the limitations associated with anonymous cookies. Sites that require people to register can link the registration data to cookies and, in concept, ensure that visitors are properly identified and counted.

Even with registration, though, most sites make parts of their site (for instance, the home page) open even to non-registered users. For such sites, cookie-based measurement systems will still overcount reach and undercount frequency. The site can now emphasize its base of registered users. But that number may be so much lower than its count of unique visitors that site managers and ad sales representatives may still prefer to talk about the larger number.

Behavioral ad targeting systems such as those marketed by Tacoda Systems rely on cookies to determine if, say, a particular site visitor today is the same one who visited the auto classified section last week. If so, the site should deliver a car dealer’s ad to the visitor.

Behavioral systems count the “targetable audience” only – an audience smaller than what might be counted by other measurement tools.

“In a sense, what a system like ours is doing is extracting the most valuable audience you have – people who allow themselves to be identified, because they haven’t turned off cookies, are coming back to the site, and are doing things that enable the site to correctly represent them as being of value to particular ad categories,” said Bennett Zucker, executive director of customer success for Tacoda.

To measure a site’s market share, another option is to use a network traffic monitoring system such as Hitwise, which obtains its traffic data from Internet service providers. Using this data, Hitwise reports market share for different sites in 160 site categories. Hitwise cannot count the number of users, or their frequency of visiting, but it can show how many page views or visits your site gets compared with your competitors.

Nielsen//NetRatings, one of the two leading providers of panel-based audience measurement for Web sites, has proposed a different approach to more accurately measure reach and frequency of Web sites and advertising campaigns.

NetRatings and its competitor, comScore Networks, have an advantage over cookie-based measurement systems in that they have identification and demographics for their panel members. Assuming their panels are representative of the Internet population (an assumption that can be challenged – see “Pondering Panels”), Nielsen and comScore can project the number of unique visitors a site has and can construct a demographic profile of a site’s users.

ComScore and Nielsen can’t help an advertiser decide how to target their advertising based on content interests of Web users. A car advertiser might know how many users a site such as cars.com would have, but not be able to estimate how many car buyers they could reach with an ad in a particular newspaper’s auto classifieds section. That’s because, except for very large sites, the panel-based data doesn’t categorize Web pages based on type of content.

To address this problem, about a year ago, Nielsen announced the acquisition of RedSheriff – a company providing page-tagging technology linked to cookies. Nielsen renamed the company’s product SiteCensus and announced plans to connect its panel-based tracking data to SiteCensus tags.

In a recent proposal to the Online Publishers Association (OPA), Nielsen suggested that OPA members consider adopting SiteCensus on their sites. If enough sites agreed to use this common measurement system, Nielsen argued, it could combine data on page views and ad impressions from SiteCensus with data on audience characteristics from its panel. Then it would be possible to know:

  • The true reach and frequency of advertising campaigns purchased across multiple sites.
  • The demographics of people exposed to advertising on multiple sites.
  • The degree to which Internet advertising contributes to an advertiser’s overall media mix.

“What we’re proposing here is industry-level market intelligence,” said Saxon of Nielsen//NetRatings. “If all the firms in an industry are interested in getting really accurate data, all of you tag with SiteCensus.”

Contacted in December, executives of the OPA and Interactive Advertising Bureau (IAB) were not leaping to embrace Nielsen’s proposal.

“What we continually hear from marketers is that they are moving beyond ‘counting eyeballs’ and are looking for ways to measure consumer engagement with the various media,” said Michael Zimbalist, president of OPA, in a comment relayed through a spokesperson. “Therefore, we believe that the online marketplace would benefit more from research and development focused on these sorts of metrics rather than yet another attempt to reinvent Web audience measurement.”

Greg Stuart, president of the IAB, said the online advertising industry is interested in seeing if Nielsen’s approach would help but needs to know more about it. “It’s clearly a devil-in-the-details kind of situation,” Stuart said.

For Stuart, a key challenge in online advertising is developing consistency and “transparency” in how online advertising and traffic are measured.

“We have superior data (to other media). It’s just that it’s not accepted, which is kind of a funny dynamic,” Stuart said. “In every other medium, audience data is incredibly transparent to buyers and sellers in terms of how it’s developed and packaged. One of the challenges is that we don’t have the same level of transparency as everyone else.”

Overcoming Cookie Limitations

    
The following measurement systems provide a more sophisticated approach to audience knowledge than simple cookie tracking:
  • Page-tagging (or “beacon”) systems
  • Mandatory site registration
  • Behavioral ad targeting systems
  • Network traffic monitoring systems
  • Panel-based audience measurement

Digital Links:

Lee Enterprises
Interactive Advertising Bureau
North County Times
Online Publishers Association

By Rich Gordon, Chair - Newspapers & New Media, Medill School of Journalism, Northwestern University


First Published:
June 8, 2007