<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="http://feeds.shumans.com/~d/styles/atomfull.xsl" type="text/xsl" media="screen"?><?xml-stylesheet href="http://feeds.shumans.com/~d/styles/itemcontent.css" type="text/css" media="screen"?><feed xmlns="http://purl.org/atom/ns#" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="0.3" xml:lang="en"><title>shumans.com</title><link rel="alternate" type="text/html" href="http://shumans.com/" /><tagline type="text/html" mode="escaped">Business, media and technology blog and articles by Shuman Ghosemajumder</tagline><copyright>Copyright 2008</copyright><modified>2008-05-02T08:01:47+00:00</modified><generator>http://www.sixapart.com/movabletype/?v=3.35</generator><link rel="start" href="http://feeds.shumans.com/shumans" type="application/atom+xml" /><feedburner:feedFlare href="http://add.my.yahoo.com/rss?url=http%3A%2F%2Ffeeds.shumans.com%2Fshumans" src="http://us.i1.yimg.com/us.yimg.com/i/us/my/addtomyyahoo4.gif">Subscribe with My Yahoo!</feedburner:feedFlare><feedburner:feedFlare href="http://www.newsgator.com/ngs/subscriber/subext.aspx?url=http%3A%2F%2Ffeeds.shumans.com%2Fshumans" src="http://www.newsgator.com/images/ngsub1.gif">Subscribe with NewsGator</feedburner:feedFlare><feedburner:feedFlare href="http://feeds.my.aol.com/add.jsp?url=http%3A%2F%2Ffeeds.shumans.com%2Fshumans" src="http://o.aolcdn.com/favorites.my.aol.com/webmaster/ffclient/webroot/locale/en-US/images/myAOLButtonSmall.gif">Subscribe with My AOL</feedburner:feedFlare><feedburner:feedFlare href="http://www.rojo.com/add-subscription?resource=http%3A%2F%2Ffeeds.shumans.com%2Fshumans" src="http://blog.rojo.com/RojoWideRed.gif">Subscribe with Rojo</feedburner:feedFlare><feedburner:feedFlare href="http://www.bloglines.com/sub/http://feeds.shumans.com/shumans" src="http://www.bloglines.com/images/sub_modern11.gif">Subscribe with Bloglines</feedburner:feedFlare><feedburner:feedFlare 
href="http://www.netvibes.com/subscribe.php?url=http%3A%2F%2Ffeeds.shumans.com%2Fshumans" src="http://www.netvibes.com/img/add2netvibes.gif">Subscribe with Netvibes</feedburner:feedFlare><feedburner:feedFlare href="http://fusion.google.com/add?feedurl=http%3A%2F%2Ffeeds.shumans.com%2Fshumans" src="http://buttons.googlesyndication.com/fusion/add.gif">Subscribe with Google</feedburner:feedFlare><feedburner:feedFlare href="http://www.pageflakes.com/subscribe.aspx?url=http%3A%2F%2Ffeeds.shumans.com%2Fshumans" src="http://www.pageflakes.com/ImageFile.ashx?instanceId=Static_4&amp;fileName=ATP_blu_91x17.gif">Subscribe with Pageflakes</feedburner:feedFlare><entry><title>Tips on how to avoid phishing attacks</title><link rel="alternate" type="text/html" href="http://feeds.shumans.com/~r/shumans/~3/281931513/000063.php" /><dc:subject xmlns:dc="http://purl.org/dc/elements/1.1/">Technology</dc:subject><author><name>Shuman Ghosemajumder</name></author><issued>2008-05-02T03:01:47-05:00</issued><modified>2008-05-02T03:01:47-05:00</modified><id>http://shumans.com/articles/000063.php</id><content type="text/html" mode="escaped">Via the Google Blog, here are some recent tips from our security team on how to not get caught by phishing attacks: &lt;a href="http://googleblog.blogspot.com/2008/04/how-to-avoid-getting-hooked.html"&gt;How to avoid getting hooked&lt;/a&gt;.&lt;img src="http://feeds.shumans.com/~r/shumans/~4/281931513" height="1" width="1"/&gt;</content><feedburner:origLink>http://shumans.com/articles/000063.php</feedburner:origLink></entry><entry><title>Yahoo to add invalid clicks report</title><link rel="alternate" type="text/html" href="http://feeds.shumans.com/~r/shumans/~3/276473477/000062.php" /><dc:subject xmlns:dc="http://purl.org/dc/elements/1.1/">Technology</dc:subject><author><name>Shuman Ghosemajumder</name></author><issued>2008-04-23T18:17:28-05:00</issued><modified>2008-04-23T18:17:28-05:00</modified><id>http://shumans.com/articles/000062.php</id><content 
type="text/html" mode="escaped">It's great to hear that Yahoo will soon be adding &lt;a href="http://www.webpronews.com/topnews/2008/04/23/yahoo-introduces-click-filter-report"&gt;this report&lt;/a&gt;. I hope that every PPC ad network will provide a feature like this, and was glad when Microsoft added it &lt;a href="http://www.webpronews.com/topnews/2007/07/06/new-reports-added-to-adcenter"&gt;last year&lt;/a&gt;. For details on how to use this feature on Google AdWords, see our &lt;a href="http://adwords.blogspot.com/2006/07/estimating-invalid-clicks.html"&gt;blog post&lt;/a&gt;.&lt;img src="http://feeds.shumans.com/~r/shumans/~4/276473477" height="1" width="1"/&gt;</content><feedburner:origLink>http://shumans.com/articles/000062.php</feedburner:origLink></entry><entry><title>Back to Facebook</title><link rel="alternate" type="text/html" href="http://feeds.shumans.com/~r/shumans/~3/260988514/000061.php" /><dc:subject xmlns:dc="http://purl.org/dc/elements/1.1/">Technology</dc:subject><author><name>Shuman Ghosemajumder</name></author><issued>2008-03-30T21:04:22-05:00</issued><modified>2008-03-30T21:04:22-05:00</modified><id>http://shumans.com/articles/000061.php</id><content type="text/html" mode="escaped">After trying several social apps, I'm back to Facebook. The reason: I think Facebook could do most of what those apps do, and better. 
All they need to do: offer an additional News Feed that &lt;em&gt;removes&lt;/em&gt; the &lt;a href="http://blog.facebook.com/blog.php?post=2242467130"&gt;intelligence&lt;/a&gt; from the existing code (letting you see all your friends' updates, including from all apps) and let you customize from there.&lt;img src="http://feeds.shumans.com/~r/shumans/~4/260988514" height="1" width="1"/&gt;</content><feedburner:origLink>http://shumans.com/articles/000061.php</feedburner:origLink></entry><entry><title>How Google uses log data to improve search results</title><link rel="alternate" type="text/html" href="http://feeds.shumans.com/~r/shumans/~3/260547316/000060.php" /><dc:subject xmlns:dc="http://purl.org/dc/elements/1.1/">Technology</dc:subject><author><name>Shuman Ghosemajumder</name></author><issued>2008-03-30T00:26:39-05:00</issued><modified>2008-03-30T00:26:39-05:00</modified><id>http://shumans.com/articles/000060.php</id><content type="text/html" mode="escaped">Another post in our uses of data series, this time from Paul and Steve from Search Quality on how we comb through massive logs to build models which deliver relevant search results. 
&lt;a href="http://googleblog.blogspot.com/2008/03/making-search-better-in-catalonia.html"&gt;Making search better in Catalonia, Estonia, and everywhere else&lt;/a&gt;.&lt;img src="http://feeds.shumans.com/~r/shumans/~4/260547316" height="1" width="1"/&gt;</content><feedburner:origLink>http://shumans.com/articles/000060.php</feedburner:origLink></entry><entry><title>Staying safe online</title><link rel="alternate" type="text/html" href="http://feeds.shumans.com/~r/shumans/~3/260485010/000059.php" /><dc:subject xmlns:dc="http://purl.org/dc/elements/1.1/">Media</dc:subject><author><name>Shuman Ghosemajumder</name></author><issued>2008-03-29T21:24:33-05:00</issued><modified>2008-03-29T21:24:33-05:00</modified><id>http://shumans.com/articles/000059.php</id><content type="text/html" mode="escaped">Here are a number of resources we posted this week, including a family safety guide and video. &lt;a href="http://googleblog.blogspot.com/2008/03/common-sense-approach-to-internet.html"&gt;Official Google Blog: A common sense approach to Internet safety&lt;/a&gt;.&lt;img src="http://feeds.shumans.com/~r/shumans/~4/260485010" height="1" width="1"/&gt;</content><feedburner:origLink>http://shumans.com/articles/000059.php</feedburner:origLink></entry><entry><title>Kourosh's click fraud talk at CMU</title><link rel="alternate" type="text/html" href="http://feeds.shumans.com/~r/shumans/~3/260131283/000058.php" /><dc:subject xmlns:dc="http://purl.org/dc/elements/1.1/">Technology</dc:subject><author><name>Shuman Ghosemajumder</name></author><issued>2008-03-29T04:59:02-05:00</issued><modified>2008-03-29T04:59:02-05:00</modified><id>http://shumans.com/articles/000058.php</id><content type="text/html" mode="escaped">Here's a great video of the 70-minute talk the head of our Ad Traffic Quality engineering team gave at Carnegie Mellon in October. 
&lt;a href="http://www.youtube.com/watch?v=6gihlx0tEWM"&gt;YouTube - Click Fraud: Anecdotes from the Front Line&lt;/a&gt;.&lt;img src="http://feeds.shumans.com/~r/shumans/~4/260131283" height="1" width="1"/&gt;</content><feedburner:origLink>http://shumans.com/articles/000058.php</feedburner:origLink></entry><entry><title>I didn't really need 4 RSS feeds</title><link rel="alternate" type="text/html" href="http://feeds.shumans.com/~r/shumans/~3/259976558/000057.php" /><dc:subject xmlns:dc="http://purl.org/dc/elements/1.1/">Media</dc:subject><author><name>Shuman Ghosemajumder</name></author><issued>2008-03-28T21:30:22-05:00</issued><modified>2008-03-28T21:30:22-05:00</modified><id>http://shumans.com/articles/000057.php</id><content type="text/html" mode="escaped">I recently tried Twitter, del.icio.us, and FriendFeed to enable more frequent and granular writing and sharing. I've now stopped using the first two, but am still using FriendFeed. Instead, I've created a new section on my site with my microblog content and included it in my main RSS feed.&lt;img src="http://feeds.shumans.com/~r/shumans/~4/259976558" height="1" width="1"/&gt;</content><feedburner:origLink>http://shumans.com/articles/000057.php</feedburner:origLink></entry><entry><title>How we use log data to protect against click fraud</title><link rel="alternate" type="text/html" href="http://feeds.shumans.com/~r/shumans/~3/259854981/000056.php" /><dc:subject xmlns:dc="http://purl.org/dc/elements/1.1/">Technology</dc:subject><author><name>Shuman Ghosemajumder</name></author><issued>2008-03-28T17:05:46-05:00</issued><modified>2008-03-28T17:05:46-05:00</modified><id>http://shumans.com/articles/000056.php</id><content type="text/html" mode="escaped">This post includes a simple example of analyzing IP distributions. 
&lt;a href="http://googleblog.blogspot.com/2008/03/using-data-to-help-prevent-fraud.html"&gt;Official Google Blog: Using data to help prevent fraud&lt;/a&gt;.&lt;img src="http://feeds.shumans.com/~r/shumans/~4/259854981" height="1" width="1"/&gt;</content><feedburner:origLink>http://shumans.com/articles/000056.php</feedburner:origLink></entry><entry><title>How to unlist your phone number from Google</title><link rel="alternate" type="text/html" href="http://feeds.shumans.com/~r/shumans/~3/259854982/000055.php" /><dc:subject xmlns:dc="http://purl.org/dc/elements/1.1/">Media</dc:subject><author><name>Shuman Ghosemajumder</name></author><issued>2008-03-28T16:59:23-05:00</issued><modified>2008-03-28T16:59:23-05:00</modified><id>http://shumans.com/articles/000055.php</id><content type="text/html" mode="escaped">I helped out with the Privacy Tips series on YouTube. &lt;a href="http://www.youtube.com/watch?v=XyuGCI7o_2c"&gt;YouTube: Google Privacy Tips: Unlisting Phone Numbers&lt;/a&gt;.&lt;img src="http://feeds.shumans.com/~r/shumans/~4/259854982" height="1" width="1"/&gt;</content><feedburner:origLink>http://shumans.com/articles/000055.php</feedburner:origLink></entry><entry><title>Our new Privacy Center</title><link rel="alternate" type="text/html" href="http://feeds.shumans.com/~r/shumans/~3/259844758/000054.php" /><dc:subject xmlns:dc="http://purl.org/dc/elements/1.1/">Media</dc:subject><author><name>Shuman Ghosemajumder</name></author><issued>2008-03-28T16:32:47-05:00</issued><modified>2008-03-28T16:32:47-05:00</modified><id>http://shumans.com/articles/000054.php</id><content type="text/html" mode="escaped">Is now up: &lt;a href="http://www.google.com/intl/en/privacy.html"&gt;Google Privacy Center&lt;/a&gt;. 
Google Blog Post: &lt;a href="http://googleblog.blogspot.com/2008/03/privacy-made-easier.html"&gt;Privacy Made Easier&lt;/a&gt;.&lt;img src="http://feeds.shumans.com/~r/shumans/~4/259844758" height="1" width="1"/&gt;</content><feedburner:origLink>http://shumans.com/articles/000054.php</feedburner:origLink></entry><entry><title>Fair Isaac Says They Find Less Click Fraud Than Headlines Report</title><link rel="alternate" type="text/html" href="http://feeds.shumans.com/~r/shumans/~3/147576432/000053.php" /><dc:subject xmlns:dc="http://purl.org/dc/elements/1.1/">Technology</dc:subject><author><name>Shuman Ghosemajumder</name></author><issued>2007-05-19T00:16:15-05:00</issued><modified>2007-05-19T00:16:15-05:00</modified><id>http://shumans.com/articles/000053.php</id><content type="text/html" mode="escaped">&lt;span class=dropcap&gt;F&lt;/span&gt;air Isaac is one of the leading fraud detection companies in the world, and is an organization I have a great deal of respect for. I have spoken with them in the past, and they told us they have been trying to determine if click fraud detection might be a viable business for them. At Google, we’re very happy to see organizations with scientific backgrounds in anomaly detection getting into this space and conducting research. Fair Isaac put out a press release yesterday which has gotten coverage in a number of media outlets. The headlines indicate that Fair Isaac conducted a study which showed that 10-15% of all clicks on online advertising were fraudulent. It turns out this is not true. I spoke with Joe Milana, chief scientist at Fair Isaac, today to find out what the real story was. He told me that most of the headlines and stories were wrong.&lt;P&gt;First of all, he said Fair Isaac has &lt;em&gt;&lt;strong&gt;not &lt;/strong&gt;&lt;/em&gt;come up with an estimate of click fraud in the industry – and in fact only analyzed data from a handful (fewer than ten) advertisers. 
And even this finding pertains only to the syndication networks and not search engines, where the majority of pay-per-click advertising occurs. In fact, they found that the rate of “pathological activity” on search engines was “negligible” (“a few percent or less”). This would imply a combined &lt;strong&gt;click fraud rate in the single digits&lt;/strong&gt; even in their sample set – which they said they would certainly not generalize to the entire industry.
&lt;P&gt;
Fair Isaac indicated that they needed a lot more data before they could conduct a meaningful study. They also recognized the need for clean data, acknowledging the importance of using auto-tagging to remove fictitious clicks, as we had mentioned to them previously. Unfortunately, none of the advertisers in their initial survey were using auto-tagging to fix this problem, and leaving fictitious clicks in the data results in inflated click fraud estimates.
&lt;P&gt;
We’re continuing to talk and I hope we’ll be able to help them further understand the challenges relating to click fraud detection, which is completely different from fraud detection in other industries. The biggest difference is the fact that it requires unsupervised analysis, something they told us they are aware of. Of course, they won’t share their methodologies with us, in order to protect their intellectual property, but I get the feeling that they may not be aware of many other factors relating to the specific behavior of the Internet, web browsers, etc., which make this much more than just a generic task for existing fraud tools from other industries. I’m looking forward to talking to them more as their study progresses, and I hope it takes these and other issues into account.
&lt;P&gt;
&lt;i&gt;Update: Search Engine Watch has additional details on this at &lt;a href="http://blog.searchenginewatch.com/blog/070519-082108"&gt;"Fair Isaac Click Fraud Report Spreads False Alarm"&lt;/a&gt;.&lt;/i&gt;&lt;img src="http://feeds.shumans.com/~r/shumans/~4/147576432" height="1" width="1"/&gt;</content><feedburner:origLink>http://shumans.com/articles/000053.php</feedburner:origLink></entry><entry><title>Advertiser Requests on Invalid Clicks</title><link rel="alternate" type="text/html" href="http://feeds.shumans.com/~r/shumans/~3/147576433/000052.php" /><dc:subject xmlns:dc="http://purl.org/dc/elements/1.1/">Technology</dc:subject><author><name>Shuman Ghosemajumder</name></author><issued>2007-04-30T02:20:27-05:00</issued><modified>2007-04-30T02:20:27-05:00</modified><id>http://shumans.com/articles/000052.php</id><content type="text/html" mode="escaped">&lt;span class=dropcap&gt;T&lt;/span&gt;he Click Quality Council is a group of
  advertisers which meets regularly to discuss click fraud. A few days ago they
  came out with their “Cornerstone Principles for Pay-Per-Click Quality
  Improvement” &amp;ndash; eight requests from advertisers, similar to Jeffrey Rohrs’
  Sausage Manifesto, which also collected and presented advertiser requests in
  January. I thought folks might be interested in
  where Google stands on some of their requests. Overall, depending on how they
  would define some of these items, it looks like we’re already doing these things. Let’s take a look.&lt;BLOCKQUOTE&gt;
  &lt;p&gt; &lt;em&gt;&lt;strong&gt;1) Advertisers should never pay for double clicks or repeat 
    clicks from the same session. &lt;/strong&gt;&lt;/em&gt;
  &lt;p&gt;I agree that advertisers should not be charged for double clicks. While the 
    activity of comparison shopping is a common reason that multiple clicks to 
    the same ad can occur within a short period of time, if the clicks occur so
    close together that they could only be caused by double-clicking or malicious 
    repeated clicking, the extra clicks clearly provide no value to the advertiser. 
    &lt;p&gt; But “same session” is not defined here, and it would be bad for advertisers 
    to define it in a way that would exclude comparison shopping. For example, 
    if publishers and search engines decided not to charge for multiple clicks on 
    an ad within the same day, they would redesign their ad systems to not show 
    that advertiser’s ad the second time a user searched on the same keyword, 
    since showing ads which produce no revenue is not desirable. But this would 
    deny that advertiser the opportunity to have a user who was comparison shopping 
    revisit their site, and that would rob them of sales opportunities.
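  &lt;p&gt; As a rough illustration of this distinction (a sketch only, not a description of Google’s filters; the two-second window and the function name are assumptions made for this example), rapid repeats on one ad by one user can be dropped while a later comparison-shopping click stays billable:
  &lt;pre&gt;
```python
from datetime import datetime, timedelta

# Assumed threshold for illustration; the post does not state a real value.
DOUBLE_CLICK_WINDOW = timedelta(seconds=2)


def billable_clicks(click_times, window=DOUBLE_CLICK_WINDOW):
    """Return the click timestamps that stay billable for one user and one ad.

    A click is dropped when it follows the previous click (billed or not)
    by less than `window`, so a whole burst of rapid clicks counts once.
    """
    billable = []
    previous = None
    for clicked_at in sorted(click_times):
        if previous is None or clicked_at - previous >= window:
            billable.append(clicked_at)
        previous = clicked_at
    return billable
```
  &lt;/pre&gt;
  &lt;p&gt; With clicks at 0, 0.5 and 1 seconds followed by one ten minutes later, only the first and the last would be kept. A window defined as a whole day would instead discard the ten-minute click, which is the outcome the paragraph above warns against.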
  &lt;p&gt; &lt;em&gt;&lt;strong&gt;2) Advertisers should never pay for traffic from bots. &lt;/strong&gt;&lt;/em&gt;
&lt;p&gt; This request surprised me, since I am not aware of any company in the entire 
    industry which has a policy of charging for clicks made by known bots. We obviously 
    monitor for bot activity and have lists of known bots which we maintain. The 
    difficulty is in knowing whether something is a bot. There are bots which 
    are easily identifiable (for example, if their User-Agent value announces 
    them as a bot) but there are also bots which nobody can identify. We have 
    systems and processes to detect and identify bots (as well as other click 
    fraud attempt methods, such as click farms), but even in cases where traffic 
    cannot be identified as coming from a specific method, our overall detection 
    approach is still effective because it is based on analyzing data related 
    to the clicks themselves.
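&lt;p&gt; The easy case mentioned above, a bot that announces itself in its User-Agent value, might be checked along these lines. This is a minimal sketch: the token list and the function name are assumptions for illustration, and it is not how any particular ad system maintains its lists.
&lt;pre&gt;
```python
# Illustrative tokens only; a maintained list of known bots would be
# far longer and is not spelled out in the post.
KNOWN_BOT_TOKENS = ("googlebot", "msnbot", "slurp", "crawler", "spider")


def announces_itself_as_bot(user_agent):
    """Return True when the User-Agent string contains a known bot token."""
    normalized = (user_agent or "").lower()
    return any(token in normalized for token in KNOWN_BOT_TOKENS)
```
&lt;/pre&gt;
&lt;p&gt; A bot that sends a browser-like User-Agent passes this check untouched, which is why a test like this can only ever be a first filter and the analysis of the click data itself has to do the rest.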
&lt;p&gt; &lt;em&gt;&lt;strong&gt;3) Advertisers should have control over where, when and to whom 
    ads are distributed. &lt;/strong&gt;&lt;/em&gt;
  &lt;p&gt;Definitely. We provide multiple levels of control, ranging from the coarse 
    granularity offered by geotargeting, or opting in or out of syndication or 
    the content network, to more detailed controls such as opting out of specific 
    URLs, which we’re the only major search engine to provide at the moment. We 
    are also going to be releasing the ability to prevent ads from showing to 
    specified IP addresses (see #4) in the next month.
&lt;p&gt; &lt;em&gt;&lt;strong&gt;4) Domain and IP exclusion lists from search providers should 
    be easy to use and maintain.&lt;/strong&gt;&lt;/em&gt;
&lt;p&gt; I agree. We currently have URL/domain exclusion features and will be launching 
    IP exclusion in the next month. We have worked hard, and will continue to work hard, to 
    ensure features like these are easy to use. At the same time, it is important 
    to provide advertisers with more accurate information about domains and IPs so they 
    can make informed decisions and are not misled into thinking that Google expects 
    them to maintain such lists in order to protect against click fraud. These 
    are features which provide targeting controls to advertisers and are more 
    similar to geotargeting than anything related to invalid click detection.&lt;br/&gt;
    &lt;br/&gt;
    &lt;em&gt;&lt;strong&gt;5) Search providers should provide advertisers detailed referrer 
    information on all traffic that is billed. &lt;/strong&gt;&lt;/em&gt;
  &lt;p&gt;I agree with this, and we are currently working on ways to provide advertisers 
    with more transparency into where their ads are placed. Advertisers can already 
    obtain referrer URLs from their own web logs, of course.&lt;br/&gt;
    &lt;br/&gt;
    &lt;em&gt;&lt;strong&gt;6) Advertisers should never pay for traffic originating outside 
    the specified geo-targeted settings. &lt;/strong&gt;&lt;/em&gt;
  &lt;p&gt; I agree with this also, but we need to be clear on what geotargeting is. 
    Geotargeting is based on IP address and other signals and works very well, 
    but is not perfect. There are some instances of IP addresses where geographic 
    location cannot be determined. In addition, when an advertiser targets a specific 
    country, our policy is to show their ads to users who are in that country 
    as well as to users who opt into results from that country. For example, if 
    a user chooses to use a country-specific Google site such as our French site 
    &lt;a href="http://www.google.fr/"&gt;www.google.fr&lt;/a&gt;, we will show them ads geotargeted 
    to France even if their computer is located elsewhere. (A side note: Google 
    does not have a US-specific site, and using Google.com from non-US countries 
    will not result in the user opting into US results and ads. Instead, the geotargeting 
    in that case will be based only on their machine location). Another example 
    of user choice taking precedence over machine location is when a user actually 
    types in a query which indicates they are interested in ads relevant to a 
    specific geography, such as "paris france travel". 
&lt;p&gt; &lt;em&gt;&lt;strong&gt;7) Search engines should adopt third-party validation for click 
    quality as other media companies have done for their audience validation. 
    &lt;/strong&gt;&lt;/em&gt;
&lt;p&gt; We are in favor of submitting our systems to an audit by a trusted third party, 
    and are working with the other members of the &lt;a href="http://adwords.blogspot.com/2006/08/creating-standards-for-clicks.html" title="IAB Click Measurement Working Group"&gt;IAB 
    Click Measurement Working Group&lt;/a&gt; to set this up. The audit will likely 
    be administered through the Media Rating Council, the organization which 
    audits Nielsen and Arbitron. Third-party click fraud auditing firms should 
    also be audited through the MRC to ensure they do not repeat the types of 
    errors that have happened in the past, when fictitious clicks were included 
    in advertiser reports. Those reports misled advertisers and advised them to 
    make decisions which could significantly damage their businesses. 
  &lt;p&gt; A simple example of continuing serious accounting issues with third parties: 
    several firms have admitted to overcounting errors in the past due to fictitious 
    clicks and have adopted Google's auto-tagging support in their systems to 
    begin to correct the problem for analysis they do for Google advertisers. 
    While they claim to have dealt with the problem of fictitious clicks, some 
    of the same firms continue to publicize estimates of industry click fraud 
    rates which include networks (such as Yahoo and MSN) where it is not yet possible 
    to distinguish between fictitious clicks and real clicks (due to lack of support 
    similar to Google's auto-tagging).
  &lt;p&gt; &lt;em&gt;&lt;strong&gt;8) Search providers should provide an easy mechanism to reconcile 
    paid clicks on a monthly basis. &lt;/strong&gt;&lt;/em&gt; 
&lt;p&gt; Definitely. Google provides this through &lt;a href="https://adwords.google.com/support/bin/answer.py?answer=31216&amp;amp;query=auto-" title="auto-tagging"&gt;auto-tagging&lt;/a&gt;, 
    which allows advertisers (and third party analytics firms, including click 
    fraud auditing firms) to reconcile the clicks they see in their logs with 
    the number of clicks in their AdWords reports. Using auto-tagging, advertisers 
    (and third-party firms) are able to get accurate information on how many clicks 
    occurred on their campaigns and how those figures compare to the activity 
    seen in their logs. This allows them to properly count clicks and avoid the 
    problem of fictitious clicks we have discussed &lt;a href="http://shumans.com/articles/000048.php"&gt;before&lt;/a&gt;.
  &lt;p&gt; Google also provides our advertisers with reports of the &lt;a href="http://adwords.blogspot.com/2006/07/estimating-invalid-clicks.html"&gt;daily 
    number of invalid clicks&lt;/a&gt; on their campaigns, which is what they (and third-party 
    auditing firms) need to verify whether the number of clicks they thought were 
    suspicious was less than or equal to the number of clicks we already filtered 
    out for them that day.
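  &lt;p&gt; The daily check described above could be sketched as follows, assuming the advertiser has its own per-day count of suspicious clicks and the per-day invalid click counts from its reports. The function name and the dictionary layout are assumptions made for this example.
  &lt;pre&gt;
```python
def days_needing_review(suspicious_by_day, invalid_by_day):
    """Return (day, excess) pairs for days on which the advertiser's count
    of suspicious clicks exceeds the invalid clicks already filtered out."""
    flagged = []
    for day in sorted(suspicious_by_day):
        suspicious = suspicious_by_day[day]
        filtered = invalid_by_day.get(day, 0)  # no report row counts as zero
        if suspicious > filtered:
            flagged.append((day, suspicious - filtered))
    return flagged
```
  &lt;/pre&gt;
  &lt;p&gt; Days that are not flagged only show that the counts are consistent; the comparison cannot show that the same individual clicks were filtered.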
  &lt;p&gt;

  We are the only company in the industry
  that currently provides either of these features,
  but we have been working on
  evangelizing them to our competitors and the industry overall. MSN has
  announced that they will be releasing their version of invalid clicks
  reporting later this year, but none of the other major search engines has yet
  adopted a feature like auto-tagging. We hope both of these will become part of
  the IAB standards.
  
  We have
  also been working on plans to share detailed click information, similar to a
  phone bill as many in the industry have pointed out. It would contain
  information such as the IP addresses, time, and cost associated with
  individual clicks. It would not contain flags for which specific
  clicks were detected as invalid (and not charged for), since that would make
  it simple for a fraudster to pose as an advertiser, run an experiment with
  millions of clicks, and then attempt to reverse engineer our system.
  But this type of report would provide advertisers further
  transparency into which clicks occurred on their ads and help them more easily identify
  discrepancies between their systems and ours.
&lt;/BLOCKQUOTE&gt;
&lt;p&gt;

  Many thanks to the advertisers who
  provided their suggestions, as well as to all of the other groups that send us
  ideas regularly. We benefit greatly
  from the feedback our advertisers provide us, as it helps
  us constantly improve our systems and customer service, and we would
  always like to get more.
  In fact, we are hosting our
  first advertiser forum dedicated exclusively to invalid clicks
  at Google headquarters this coming week. In it, we will be meeting
  with several dozen advertisers, both large and not-so-large, to
  discuss their concerns, share information about our invalid click
  detection methods and policies, and come up with ways to continue to deliver a
  great advertising experience on Google.&lt;img src="http://feeds.shumans.com/~r/shumans/~4/147576433" height="1" width="1"/&gt;</content><feedburner:origLink>http://shumans.com/articles/000052.php</feedburner:origLink></entry><entry><title>Structure of a Click Fraud Botnet</title><link rel="alternate" type="text/html" href="http://feeds.shumans.com/~r/shumans/~3/147576434/000051.php" /><dc:subject xmlns:dc="http://purl.org/dc/elements/1.1/">Technology</dc:subject><author><name>Shuman Ghosemajumder</name></author><issued>2007-04-11T15:19:14-05:00</issued><modified>2007-04-11T15:19:14-05:00</modified><id>http://shumans.com/articles/000051.php</id><content type="text/html" mode="escaped">&lt;span class=dropcap&gt;O&lt;/span&gt;ne question I frequently get asked by some of our more advanced advertisers is "what is Google doing about click fraud from botnets?" Analyzing botnets is an important activity in both our Click Quality and Security Teams. Yesterday, Dr. &lt;a href="http://www.neildaswani.com/"&gt;Neil Daswani&lt;/a&gt;, a member of both teams, presented a &lt;a href="http://www.usenix.org/events/hotbots07/tech/full_papers/daswani/daswani.pdf"&gt;paper&lt;/a&gt; at the &lt;a href="http://www.usenix.org/events/hotbots07/tech/"&gt;HotBots 2007&lt;/a&gt; workshop on a case study of one such botnet we examined last year called Clickbot.A. The paper provides an in-depth look at how a fraudster was attempting to utilize 100,000 machines to execute a low-noise click fraud attack through syndicated search ads.&lt;P&gt;
Botnets have of course been around for many years, and have been used most commonly for activities like denial of service attacks. We have also seen them used for click fraud. There are many different ways that click fraud is attempted, and the use of botnets generally represents one of the more sophisticated methods. At a basic level, the main benefit of a botnet to fraudsters is the use of many diverse IP addresses and other machine-specific signals. By utilizing thousands of hijacked IPs, a fraudster hopes that their attack will be difficult to catch. Of course, IP address is only one of hundreds of factors we analyze when looking for evidence of click fraud. Some sophisticated fraudsters realize this, and program their botnets to behave in more complex and subtle ways than just randomizing IPs (as Clickbot.A demonstrates).
&lt;P&gt;
One reason we're publishing this paper is to continue to share more information on the types of analysis we do to protect our advertisers against click fraud. But an even more important reason is to provide greater understanding of a challenging area the entire Internet community should work together to manage. The bad guys share their information with each other, and so should we. We hope to discuss more of our own work publicly in the future, and we hope that other security-related companies will share similar case studies and findings, which will end up benefiting everyone. The concluding observations and recommendations from the paper are worth repeating here:
&lt;ul&gt;
&lt;li&gt; Search engines need to investigate botnets that might be used to issue automated, distributed click fraud attacks.
&lt;li&gt; ISPs need to protect their web hosting and customer accounts from being compromised. Many of the domains and hosts involved in conducting the attack described in this paper were compromised.
&lt;li&gt; Malware detection rates may need to be improved. Only 7 out of 24 of the anti-virus scanners run as part of Virus-TOTAL detected Clickbot.A around the time the attack was publicly reported.
&lt;li&gt; Web site publishers, financial institutions, and advertisers can encourage their users and customers to proactively install anti-virus tools.
&lt;li&gt; Users can run anti-virus software to help prevent their computer from participating in a botnet. There are several free offerings available to users in the market.
&lt;li&gt; Security researchers and corporate IT departments can proactively and more aggressively share data and publish results to help the white-hat community prevent, detect, contain, and recover from attacks conducted by miscreants in the underground Internet economy.
&lt;/ul&gt;
You can read more about the Clickbot.A case at our &lt;a href="http://adwords.blogspot.com/2007/04/new-case-study-on-botnet-based-click.html"&gt;AdWords Blog&lt;/a&gt; post, and you can access Neil's paper, which he co-wrote with Mike Stoppelman and other team members, &lt;a href="http://www.usenix.org/events/hotbots07/tech/full_papers/daswani/daswani.pdf"&gt;here&lt;/a&gt;. Incidentally, Neil is also the author of the recently published &lt;a href="http://www.amazon.com/Foundations-Security-Every-Programmer-Experts/dp/1590597842"&gt;"Foundations of Security: What Every Programmer Needs to Know"&lt;/a&gt;, which is a great reference as well as an introduction to security methods.&lt;img src="http://feeds.shumans.com/~r/shumans/~4/147576434" height="1" width="1"/&gt;</content><feedburner:origLink>http://shumans.com/articles/000051.php</feedburner:origLink></entry><entry><title>Google's Click Quality Team</title><link rel="alternate" type="text/html" href="http://feeds.shumans.com/~r/shumans/~3/147576435/000050.php" /><dc:subject xmlns:dc="http://purl.org/dc/elements/1.1/">Technology</dc:subject><author><name>Shuman Ghosemajumder</name></author><issued>2007-02-05T00:20:57-06:00</issued><modified>2007-02-05T00:20:57-06:00</modified><id>http://shumans.com/articles/000050.php</id><content type="text/html" mode="escaped">&lt;span class=dropcap&gt;R&lt;/span&gt;eaders of this blog know that click fraud is an issue we take very seriously at Google. We throw out a significant percentage of ad clicks (on average in the single digits) every day to protect our advertisers. Because of our investment in click fraud protection systems, we are able to manage this issue very well and prevent it from having an impact on the vast majority of AdWords advertisers.&lt;P&gt;However, click fraud is real, and it’s definitely one of the main concerns of advertisers who contact me and our Click Quality team. 
But in the click fraud sessions at the &lt;a href="http://www.searchenginestrategies.com/"&gt;Search Engine Strategies&lt;/a&gt; conferences and elsewhere, I often hear from advertisers who tell me that they don’t know how to correctly diagnose whether they’ve been affected by click fraud, or how to contact Google to request an investigation.&lt;P&gt;

Well, there’s a &lt;a href="http://adwords.blogspot.com/2007/02/meet-click-quality-team.html"&gt;great post on the AdWords blog&lt;/a&gt; about just that from Julian, a long-time member of our Click Quality team. In it, he describes many common cases which can actually be misdiagnosed as click fraud &amp;ndash; such as normal traffic or ROI fluctuations (caused by other sources), web log discrepancies due to technical issues, or multiple clicks from the same IP address due to large shared ISP proxies. The Click Quality team helps diagnose these cases every day, but you as an advertiser can diagnose them too. The most important thing to remember is that undetected click fraud shows up as a drop in ROI that you can’t explain by other causes. The more carefully and granularly you track your campaign, the better you can optimize its performance &amp;ndash; including managing issues related to click fraud.&lt;P&gt;
 
In the rare event you find that your campaign may have been affected by undetected click fraud, our Click Quality team definitely wants to hear from you. There’s a link at the end of the post to the form you can use to contact them, which I’ll repeat for good measure &lt;a href="https://adwords.google.com/support/bin/request.py?clickquality=1&amp;ctx=clickqual"&gt;here&lt;/a&gt;.&lt;img src="http://feeds.shumans.com/~r/shumans/~4/147576435" height="1" width="1"/&gt;</content><feedburner:origLink>http://shumans.com/articles/000050.php</feedburner:origLink></entry><entry><title>Why Third-Party Click Fraud Estimates Don't Add Up - 2</title><link rel="alternate" type="text/html" href="http://feeds.shumans.com/~r/shumans/~3/147576436/000049.php" /><dc:subject xmlns:dc="http://purl.org/dc/elements/1.1/">Technology</dc:subject><author><name>Shuman Ghosemajumder</name></author><issued>2007-01-31T14:13:45-06:00</issued><modified>2007-01-31T14:13:45-06:00</modified><id>http://shumans.com/articles/000049.php</id><content type="text/html" mode="escaped">&lt;span class=dropcap&gt;I&lt;/span&gt;n &lt;a href="http://shumans.com/articles/000048.php"&gt;part 1&lt;/a&gt;, I wrote about the recent press releases from click fraud consulting firms on industry click fraud rates. In this post, I'd like to follow up on some of the issues we covered in our August 
  report, &lt;em&gt;&lt;a href="http://www.google.com/adwords/ReportonThird-PartyClickFraudAuditing.pdf"&gt;&amp;quot;How 
  Fictitious Clicks Occur in Third-Party Click Fraud Audit Reports&amp;quot;&lt;/a&gt;&lt;/em&gt;, 
  and explain why click fraud firms are still making egregious mistakes in (a) 
  click counting, and even more egregious mistakes in (b) click fraud estimation.
&lt;p&gt;To begin, where do third-party click fraud numbers come from? At Google, whenever 
  we detect malicious activity against an advertiser's account, we mark those 
  clicks as invalid, and thus don't charge the advertiser for them. We utilize 
  a number of different automated techniques and algorithms, as well as proactive 
  manual analysis, to do this, analyzing hundreds of different factors. The analysis 
  that we see from third-party auditing firms (including ClickForensics) seems 
  to essentially rely on just one factor, which we call IP frequency. IP frequency 
  is the number of times an IP address clicks within a certain time window. If 
  it clicks too many times, it could be click fraud. On our end, this is a very 
  simple rule which runs in an automated fashion, protecting Google advertisers 
  24/7. Third-party firms sometimes find the same suspicious IP frequency patterns 
  that our systems do, and include them in their click fraud reports - leading 
  advertisers to request refunds for clicks they were never charged for in the 
  first place.
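&lt;p&gt;To make the IP frequency idea concrete, here is a minimal Python sketch of such a rule. It is only an illustration, not Google's implementation: the function name flag_ip_frequency, the 60-second window and the threshold of 3 clicks are all assumptions chosen for the example.

```python
from collections import defaultdict, deque


def flag_ip_frequency(clicks, window_seconds=60, max_clicks=3):
    """Return the set of IPs with more than max_clicks inside one window.

    clicks: iterable of (timestamp_seconds, ip) pairs sorted by timestamp.
    The window and threshold defaults are illustrative assumptions.
    """
    recent = defaultdict(deque)  # ip -> timestamps still inside the window
    flagged = set()
    for ts, ip in clicks:
        stamps = recent[ip]
        stamps.append(ts)
        # Drop clicks that are older than the window ending at ts.
        while ts - stamps[0] >= window_seconds:
            stamps.popleft()
        if len(stamps) > max_clicks:
            flagged.add(ip)
    return flagged


sample = [(0, "a"), (5, "b"), (10, "a"), (20, "a"), (30, "a"), (100, "b")]
suspicious = flag_ip_frequency(sample)
```

&lt;p&gt;In this sample, address "a" clicks four times within 30 seconds and is flagged, while "b" clicks twice, 95 seconds apart, and is not. A rule this simple is easy to run automatically, which is why a report built on it alone tends to rediscover clicks that were already marked invalid.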
&lt;p&gt;But that is actually not even the most common problem with their analyses. 
  What is far more common is that the reports we receive from them ask for refunds 
  for clicks which do not even exist. This more serious problem comes from the 
  issues we addressed in our August report on fictitious clicks. In that report, 
  we demonstrated the limits of web log based analysis for any analytics purpose 
  (including click fraud analysis) due to the way Internet Explorer, Firefox and 
  other browsers work. Unfortunately, that was a very technical report, which 
  was difficult for many readers to parse. I'll try to provide a simpler explanation 
  here.
&lt;p&gt;Here's the problem: web logs, whether generated by an advertiser, or by third-party 
  code on an advertiser's site, cannot directly track ad clicks. Instead, they 
  track visits to a special landing page URL on the advertiser's site (e.g. &lt;font size="2" face="Courier New, Courier, mono"&gt;http://example.com/?adwords&lt;/font&gt; 
  ) as a proxy for how many ad clicks occurred. The assumption they're relying 
  upon is that each visit to that URL corresponds to a unique click, and vice 
  versa. But in practice this is not the case. Once a user visits that page, they 
  often browse through the site, navigating through sub pages, and then return 
  to the original landing page by hitting the back button. When the landing page 
  is reloaded in the browser, it appears in the web log as though additional ad 
  &amp;quot;clicks&amp;quot; are occurring. Google can count ad clicks reliably because a click 
  on a Google ad causes the web browser to contact Google, and we then redirect 
  it to the advertiser's landing page. A reload of the advertiser's landing page does 
  not contact Google again. In addition, when users hit the back button, the browser 
  passes the original, cached referrer URL (which says the page came from an ad 
  click), so no analysis based on logs alone can resolve this. This 
  is where the fictitious clicks come from.
&lt;p&gt;When we analyze data from web logs under these default conditions, we find 
  that on average it leads to a 40% inflation of click estimates. You can think 
  of it this way: if an average of 1000 clicks occurred, a log-based analysis 
  would estimate on average that there were 1400 clicks, 400 of which are fictitious 
  and did not actually occur.
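&lt;p&gt;The effect can be sketched in a few lines of Python. This is a toy model under stated assumptions: the log is just a list of requested paths, the landing page path and the helper names count_landing_loads and inflation are hypothetical, and the session is invented for illustration.

```python
LANDING = "/?adwords"  # landing page path, modeled on the example URL above


def count_landing_loads(log):
    """Naive log-based 'click' count: every load of the landing page."""
    return sum(1 for path in log if path == LANDING)


def inflation(actual_clicks, logged_loads):
    """Fraction by which the log-based count overstates real clicks."""
    return (logged_loads - actual_clicks) / actual_clicks


# One real ad click, then browsing with two back-button returns to the landing page.
session = [LANDING, "/products", LANDING, "/about", LANDING]
real_clicks = 1
logged = count_landing_loads(session)
fictitious = logged - real_clicks

# The averages quoted above: 1000 real clicks logged as 1400.
average_inflation = inflation(1000, 1400)
```

&lt;p&gt;For this single session the log shows three landing page loads for one real click, so two of the "clicks" are fictitious. Applied to the averages quoted above, the same calculation gives (1400 - 1000) / 1000 = 0.4, the 40% inflation.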
&lt;p&gt;Now consider the principal analytical tool of third-party click fraud firms: 
  IP frequency. When they see a user browsing through the site, and reloading 
  the landing page multiple times in a short time window, they will classify it 
  as click fraud - even though those &amp;quot;clicks&amp;quot; do not actually exist. 
  It also results in the misclassification of advertisers' best users (the ones 
  who are spending time browsing through their sites) as &amp;quot;fraudulent&amp;quot;.
&lt;p&gt;Thus, while click estimates were inflated by 40% on average, click fraud estimates 
  were inflated by much, much higher amounts. As we detailed in our report, we 
  found cases of firms reporting click fraud rates &lt;em&gt;above 100%&lt;/em&gt; in some 
  instances due to this problem. We also found that in other instances, clicks 
  classified as &amp;quot;click fraud&amp;quot; by third-party firms produced sales at 
  the same rate as the &amp;quot;good&amp;quot; clicks. In other words, the identification 
  of click fraud by third-party firms was much worse than imprecise - it was not 
  even in the right ballpark, with nearly all of the &amp;quot;bad&amp;quot; clicks they 
  identified actually being fictitious.
&lt;p&gt;The net result was that advertisers were consistently being given false data 
  from reports they trusted, which would actually hurt their advertising campaigns 
  if they acted on them. For example, if an advertiser is told certain keywords 
  have higher &amp;quot;fraud rates&amp;quot;, they are likely to change their campaign 
  to eliminate spending on those keywords in favor of others, which hurts the performance 
  of their campaigns when this information is false. The damage this can do to 
  advertisers' businesses can be quite large.
&lt;p&gt;So is there a solution to this? Yes. Third-party analytics (not click fraud) 
  firms have been aware of the page reload issue for many years, and generally 
  use redirects (rather than web log based tracking) to avoid it. If one is tied 
  to using web site logs (or landing page code that generates logs), however, the only 
  solution is to use the &lt;a href="http://www.google.com/adwords/learningcenter/text/31854.html"&gt;AdWords 
  auto-tagging feature&lt;/a&gt;. Auto-tagging has been available since 2005, and is 
  a feature which appends a unique ID to the landing page URL for every click, 
  so that the cases of (a) multiple clicks and (b) multiple reloads of the landing 
  page can be easily distinguished.
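&lt;p&gt;As a rough sketch of how a log-based tool could use that unique ID, the Python below counts distinct IDs instead of page loads. The parameter name gclid is an assumption for this example (check what your own tagged URLs contain), and the function name count_unique_clicks and the sample URLs are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def count_unique_clicks(urls, id_param="gclid"):
    """Count distinct click IDs among logged landing page URLs.

    id_param is the query parameter holding the per-click ID; the
    default name is an assumption made for this example.
    """
    seen = set()
    for url in urls:
        values = parse_qs(urlparse(url).query).get(id_param)
        if values:
            seen.add(values[0])
    return len(seen)


hits = [
    "http://example.com/?gclid=abc123",  # first ad click
    "http://example.com/?gclid=abc123",  # back-button reload of the same click
    "http://example.com/?gclid=def456",  # second, separate ad click
    "http://example.com/products",       # ordinary browsing, no click ID
]
unique_clicks = count_unique_clicks(hits)
```

&lt;p&gt;Here the log holds three tagged landing page loads but only two distinct IDs, so the reload is no longer counted as an extra click.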
&lt;p&gt;Two of the three firms we identified in our report, AdWatcher and ClickFacts, 
  have not made any changes we're aware of. That's discouraging to say the least. 
  ClickForensics claims to have fixed this problem a couple of months ago by requiring 
  their AdWords clients to use auto-tagging, yet despite such a significant change 
  in methodology, their new numbers are nearly the same as their old numbers. 
  Perhaps it hasn't yet been fully or correctly utilized, so the significant corrective 
  drop in their numbers is yet to come. Or perhaps their network is heavily skewed 
  toward non-Google advertisers, and thus they still cannot correct the problem 
  until Yahoo, MSN and others implement their own versions of auto-tagging. Until 
  then, considering that the total number of clicks they're counting could be 
  off by as much as 40%, and their click fraud estimates could be off by much 
  more, there's very little meaning in a difference of 0.1% from Q2 to Q4 - or 
  in any of their other inferred statistics. But most importantly, the fact that 
  they don't take into account the amount that Google already protects advertisers 
  against means that they're not even trying to measure actual click fraud.&lt;img src="http://feeds.shumans.com/~r/shumans/~4/147576436" height="1" width="1"/&gt;</content><feedburner:origLink>http://shumans.com/articles/000049.php</feedburner:origLink></entry></feed>
