But this article also taps into a common fear of the 20th and 21st centuries: that our lives will be dictated by machines and not humans. Are we losing something, it seems to ask. Will headless bodies ever again walk into topless bars? But the reality is that indexing is not only being performed by algorithms; it is also being performed by humans as they leave a trail of tags and blogs marking their paths on the web. How we connect information may in fact be our most creative act.
It is also intriguing that Times web journalists are trained in search engine optimization, given that the Times prevents search engines from indexing its pages, as I learned from this On the Media interview about Google Books with NYU media scholar Siva Vaidhyanathan.
"SIVA VAIDHYANATHAN: Well, they already have, for the basic reason that Google has based its entire business on the fact that it can copy almost the entire Web without any serious copyright implications, because to create an index, it has to make copies. If you don't want someone copying your work online, it's your duty to opt out of the system. There are ways that the New York Times makes sure that its news stories, for instance, don't get copied by Google and then placed in the index. Now, most people producing most webpages opt in or don't opt out...."
Trademark Protection, Copyright and Search Engines by Sara Holoubek, a report on the Search Engine Strategies conference, explains how websites can opt out.
When prompted about altered or excerpted content, copyright laws exclude facts, so it is legal to post facts taken from another written account. [Peter D.] Raymond brought up the fact that if a publisher did not want his content available in a search engine's index, he should simply use a robot[s].txt file to instruct search engine crawlers to avoid the content.
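To make the opt-out mechanism a little more concrete, here is a minimal sketch of how a robots.txt file (the conventional name of the file Raymond is referring to) is read by a well-behaved crawler. It uses Python's standard `urllib.robotparser` module. The site `news.example.com`, the paths, and the rules themselves are invented for illustration; they are not the actual configuration of the Times or any other publisher, and a real crawler is free to ignore the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary news site (an assumption for
# illustration, not any real publisher's file). Googlebot is kept out of
# the archive only; every other crawler is asked to stay out entirely.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /archive/

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text the way a polite crawler would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Made-up URLs on the imaginary site.
archive_url = "https://news.example.com/archive/2006/story.html"
today_url = "https://news.example.com/today/story.html"

print("Googlebot, archive:", parser.can_fetch("Googlebot", archive_url))
print("Googlebot, today:  ", parser.can_fetch("Googlebot", today_url))
print("Other bot, today:  ", parser.can_fetch("SomeOtherBot", today_url))
```

Under these made-up rules, Googlebot is turned away from the archive but allowed to read today's stories, while any other crawler is asked to stay away from the whole site. The point is that the opt-out is a request the publisher posts, which the crawler then chooses to honor.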
This Boring Headline Is Written for Google - New York Times
Ideas & Trends
By STEVE LOHR
Published: April 9, 2006
JOURNALISTS over the years have assumed they were writing their headlines and articles for two audiences — fickle readers and nitpicking editors. Today, there is a third important arbiter of their work: the software programs that scour the Web, analyzing and ranking online news articles on behalf of Internet search engines like Google, Yahoo and MSN.
The search-engine "bots" that crawl the Web are increasingly influential, delivering 30 percent or more of the traffic on some newspaper, magazine or television news Web sites. And traffic means readers and advertisers, at a time when the mainstream media is desperately trying to make a living on the Web.
So news organizations large and small have begun experimenting with tweaking their Web sites for better search engine results. But software bots are not your ordinary readers: They are blazingly fast yet numbingly literal-minded. There are no algorithms for wit, irony, humor or stylish writing. The software is a logical, sequential, left-brain reader, while humans are often right brain.
In newspapers and magazines, for example, section titles and headlines are distilled nuggets of human brainwork, tapping context and culture. "Part of the craft of journalism for more than a century has been to think up clever titles and headlines, and Google comes along and says, 'The heck with that,' " observed Ed Canale, vice president for strategy and new media at The Sacramento Bee.
Moves to accommodate the technology are tricky. How far can a news organization go without undercutting its editorial judgment concerning the presentation, tone and content of news?
So far, the news media are gingerly stepping into the field of "search engine optimization." It is a booming business, estimated at $1.25 billion in revenue worldwide last year, and projected to more than double this year.
Much of this revenue comes from e-commerce businesses, whose sole purpose is to sell goods and services online. For these sites, search engine optimization has become a constant battle of one-upmanship, pitting the search engine technologists against the marketing experts and computer scientists working for the Web sites.
Think of it as an endless chess game. The optimizer wizards devise some technical trick to outwit the search-engine algorithms that rank the results of a search. The search engines periodically change their algorithms to thwart such self-interested manipulation, and the game starts again.
News organizations, by contrast, have moved cautiously. Mostly, they are making titles and headlines easier for search engines to find and fathom. About a year ago, The Sacramento Bee changed online section titles. "Real Estate" became "Homes," "Scene" turned into "Lifestyle," and dining information found in newsprint under "Taste," is online under "Taste/Food."
Some news sites offer two headlines. One headline, often on the first Web page, is clever, meant to attract human readers. Then, one click to a second Web page, a more quotidian, factual headline appears with the article itself. The popular BBC News Web site does this routinely on longer articles.
Nic Newman, head of product development and technology at BBC News Interactive, pointed to a few examples from last Wednesday. The first headline a human reader sees: "Unsafe sex: Has Jacob Zuma's rape trial hit South Africa's war on AIDS?" One click down: "Zuma testimony sparks HIV fear." Another headline meant to lure the human reader: "Tulsa star: The life and career of much-loved 1960's singer." One click down: "Obituary: Gene Pitney."
"The search engine has to get a straightforward, factual headline, so it can understand it," Mr. Newman said. With a little programming sleight-of-hand, the search engine can be steered first to the straightforward, somewhat duller headline, according to some search optimizers.
On the Web, space limitations can coincide with search-engine preferences. In the print version of The New York Times, an article last Tuesday on Florida beating U.C.L.A. for the men's college basketball championship carried a longish headline, with allusions to sports history: "It's Chemistry Over Pedigree as Gators Roll to First Title." On the Times Web site, whose staff has undergone some search-engine optimization training, the headline of the article was, "Gators Cap Run With First Title."
The Associated Press, which feeds articles to 11,000 newspapers, radio and television stations, limits its online headlines to less than 40 characters, a concession to small screens. And on the Web, there is added emphasis on speed and constant updates.
"You put those demands, and that you know you're also writing for search engines, and you tend to write headlines that are more straightforward," said Lou Ferrara, online editor of The Associated Press. "My worry is that some creativity is lost."
Whether search engines will influence journalism below the headline is uncertain. The natural-language processing algorithms, search experts say, scan the title, headline and at least the first hundred words or so of news articles.
Journalists, they say, would be wise to do a little keyword research to determine the two or three most-searched words that relate to their subject — and then include them in the first few sentences. "That's not something they teach in journalism schools," said Danny Sullivan, editor of SearchEngineWatch, an online newsletter. "But in the future, they should."
Such suggestions stir mixed sentiments. "My first thought is that reporters and editors have a job to do and they shouldn't worry about what Google's or Yahoo's software thinks of their work," said Michael Schudson, a professor at the University of California, San Diego, who is a visiting faculty member at the Columbia University Graduate School of Journalism.
"But my second thought is that newspaper headlines and the presentation of stories in print are in a sense marketing devices to bring readers to your story," Mr. Schudson added. "Why not use a new marketing device appropriate to the age of the Internet and the search engine?"
In journalism, as in other fields, the tradition of today was once an innovation. The so-called inverted pyramid structure of a news article — placing the most important information at the top — was shaped in part by a new technology of the 19th century, the telegraph, the Internet of its day. Putting words on telegraph wires was costly, so reporters made sure the most significant points were made at the start.
Yet it wasn't all technological determinism by any means. The inverted pyramid style of journalism, according to Mr. Schudson, became standard practice only in 1900, four decades or more after telegraph networks came into use. It awaited the rise of journalists as "an avowedly independent, self-conscious, professionalizing group," confident of their judgments about what information was most important, he said.
The new technology shaped practice, but people determined how the technology was used — and it took a while. Something similar is the likely path of the Internet.
"We're all struggling and experimenting with how news is presented in the future," said Larry Kramer, president of CBS Digital Media. "And there's nothing wrong with search engine optimization as long as it doesn't interfere with news judgment. It shouldn't, and it's up to us to make sure it doesn't. But it is a tool that is part of being effective in this medium."