'Scrapers' Dig Deep for Data on Web


Senior Member
Didn't know where to put this article. Be careful about the details of what you put out there. I am more worried about identity theft than someone knowing I have CFS/FM, hate the CDC, take Ultracet, hate most shrinks, and so on...I do LOVE Coke so if the Coke company wants to send me a few years' worth of Coke I would be most happy! Oh, and Doral Ultralights too!! Scrape THAT!

OCTOBER 12, 2010
'Scrapers' Dig Deep for Data on Web http://online.wsj.com/article/SB10001424052748703358504575544381288117888.html

At 1 a.m. on May 7, the website PatientsLikeMe.com noticed suspicious activity on its "Mood" discussion board. There, people exchange highly personal stories about their emotional disorders, ranging from bipolar disease to a desire to cut themselves.
It was a break-in. A new member of the site, using sophisticated software, was "scraping," or copying, every single message off PatientsLikeMe's private online forums.

PatientsLikeMe managed to block and identify the intruder: Nielsen Co., the privately held New York media-research firm. Nielsen monitors online "buzz" for clients, including major drug makers, which buy data gleaned from the Web to get insight from consumers about their products, Nielsen says.
"I felt totally violated," says Bilal Ahmed, a 33-year-old resident of Sydney, Australia, who used PatientsLikeMe to connect with other people suffering from depression. He used a pseudonym on the message boards, but his PatientsLikeMe profile linked to his blog, which contains his real name.
After PatientsLikeMe told users about the break-in, Mr. Ahmed deleted all his posts, plus a list of drugs he uses. "It was very disturbing to know that your information is being sold," he says. Nielsen says it no longer scrapes sites requiring an individual account for access, unless it has permission.

Digits: How to Escape Web Scrapers
The market for data about Web users is hot, and one of the methods used is "scraping," or harvesting online conversations. In May, Nielsen scraped private forums where patients discuss illnesses. How can Web users prevent their data from being scraped? Julia Angwin joins Digits to discuss.

The market for personal data about Internet users is booming, and in the vanguard is the practice of "scraping." Firms offer to harvest online conversations and collect personal details from social-networking sites, résumé sites and online forums where people might discuss their lives.
The emerging business of web scraping provides some of the raw material for a rapidly expanding data economy. Marketers spent $7.8 billion on online and offline data in 2009, according to the New York management consulting firm Winterberry Group LLC. Spending on data from online sources is set to more than double, to $840 million in 2012 from $410 million in 2009.
The Wall Street Journal's examination of scraping, a trade that involves personal information as well as many other types of data, is part of the newspaper's investigation into the business of tracking people's activities online and selling details about their behavior and personal interests.
Some companies collect personal information for detailed background reports on individuals, such as email addresses, cell numbers, photographs and posts on social-network sites.
Others offer what are known as listening services, which monitor in real time hundreds or thousands of news sources, blogs and websites to see what people are saying about specific products or topics.
One such service is offered by Dow Jones & Co., publisher of the Journal. Dow Jones collects data from the Web (which may include personal information contained in news articles and blog postings) that help corporate clients monitor how they are portrayed. It says it doesn't gather information from password-protected parts of sites.
It's rarely a coincidence when you see Web ads for products that match your interests. WSJ's Christina Tsuei explains how advertisers use cookies to track your online habits.
The competition for data is fierce. PatientsLikeMe also sells data about its users. PatientsLikeMe says the data it sells is anonymized, no names attached.
Nielsen spokesman Matt Anchin says the company's reports to its clients include publicly available information gleaned from the Internet, "so if someone decides to share personally identifiable information, it could be included."
Internet users often have little recourse if personally identifiable data is scraped: There is no national law requiring data companies to let people remove or change information about themselves, though some firms let users remove their profiles under certain circumstances.
California has a special protection for public officials, including politicians, sheriffs and district attorneys. It makes it easier for them to remove their home address and phone numbers from these databases, by filling out a special form stating they fear for their safety.
Data brokers long have scoured public records, such as real-estate transactions and courthouse documents, for information on individuals. Now, some are adding online information to people's profiles.
Many scrapers and data brokers argue that if information is available online, it is fair game, no matter how personal.
"Social networks are becoming the new public records," says Jim Adler, chief privacy officer of Intelius Inc., a leading paid people-search website. It offers services that include criminal background checks and "Date Check," which promises details about a prospective date for $14.95.
"This data is out there," Mr. Adler says. "If we don't bring it to the consumer's attention, someone else will."
Scraping for Your Real Name
PeekYou.com has applied for a patent for a way to, among other things, match people's real names to pseudonyms they use on blogs, Twitter and online forums.


XMRV - L'Agent du Jour
Thanks for posting this info Muffin. Cort - would it be a good idea for this info to be made into a 'Sticky' thread on the forum, to alert all users to be careful what we post in terms of personal info here?