

December 07, 2007

Search Engines: Meet ACAP

Group asks media sites to implement new standards for communicating with search engines

I just wanted to call some attention to this, since it may pertain to the searchability of a lot of newspaper Web sites. (Disclosure: My employer, the Newspaper Association of America, is a member/supporter of ACAP.)

ACAP version 1.0, a technical answer to the controversy over which mainstream media content search engines may index, launched at the end of November. The people behind ACAP are asking newspaper and other media Web sites to install and implement it.

ACAP stands for Automated Content Access Protocol. Here's a quick, easy-to-understand background from the Nov. 30 Online Publishing Update:

Currently, most (but not all) search engines respect the instructions in a document called “robots.txt” that tells search engines not to crawl certain Web pages or site sections. A group of publishers unveiled a proposal [Nov. 29] called the “Automated Content Access Protocol” which would require all search engines to respect search instructions and restrictions.

“If accepted by search engines, publishers say they would be willing to make more of their copyright-protected materials available online. But Web surfers also could find sites disappear from search engines more quickly, or find smaller versions of images called thumbnails missing if sites ban such presentations,” The Associated Press reported.

Many Web sites already have robots.txt files that tell search engines what to crawl and what to ignore. There is some controversy over robots.txt files, though, because some Web developers see them as a road map for hackers: the file is publicly readable, so marking a directory as off-limits also advertises that it exists. According to the ACAP FAQ section:

We recognise that robots.txt is a well established method for communication between content owners and crawler operators. However, robots.txt is not sophisticated enough for today's content and publishing models. Robots.txt, in its current form as implemented by most search engine operators, provides only a simple choice between allowing and disallowing access. These simple choices are inconsistently interpreted. A number of proprietary extensions have been implemented by several of the major search engines, but not all search engines recognise all or even any of these extensions. ACAP provides a standard mechanism for expressing conditional access which is what is now required.
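To make the "simple choice between allowing and disallowing access" concrete, here is a minimal sketch that uses Python's standard-library `urllib.robotparser` to read a robots.txt file and ask whether a crawler may fetch a given page. The robots.txt contents, the `example.com` URLs and the `ExampleBot` crawler name are hypothetical, chosen only for illustration; ACAP's own extensions are not shown.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary newspaper site (illustration only).
ROBOTS_TXT = """\
User-agent: *
Disallow: /archives/
Disallow: /subscribers/
"""

SITE = "https://www.example.com"  # hypothetical site

# parse() accepts the file's lines, just as a crawler would read them.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Each question gets only a yes/no answer: may this crawler fetch this URL?
for path in ["/news/today.html", "/archives/1998/story.html", "/subscribers/login"]:
    allowed = parser.can_fetch("ExampleBot", SITE + path)
    print(path, "->", "allowed" if allowed else "blocked")
# /news/today.html -> allowed
# /archives/1998/story.html -> blocked
# /subscribers/login -> blocked
```

Every answer is a plain allowed or blocked; the format has no way to say, for instance, that a crawler may index a story but not show a thumbnail of its photo. The Disallow lines also name the /subscribers/ section outright, which is the "road map" worry some developers raise.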

According to a recent press release, ACAP "will allow publishers, broadcasters and indeed any other publisher of content on the network to express their individual access and use policies in a language that search engine robots and similar automated tools can read and understand."

The primary drivers of ACAP are the World Association of Newspapers (WAN), the European Publishers Council (EPC) and the International Publishers Association (IPA). That press release also lists the pilot project participants (who tested the system for a year starting in late 2006) and current members, which now include The Associated Press, Reuters and NAA.



Posted by Beth Lawton at 11:47 AM
