<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atomfull.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.shumans.com/~d/styles/itemcontent.css"?><feed xmlns="http://purl.org/atom/ns#" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="0.3" xml:lang="en-US"><title>Shuman Ghosemajumder</title><link rel="alternate" type="text/html" href="http://shumans.com/" /><tagline type="text/html" mode="escaped">Business, media, and technology articles by Shuman Ghosemajumder</tagline><copyright>Copyright 2009 Shuman Ghosemajumder</copyright><modified>2009-01-05T08:05:12+00:00</modified><generator>http://shumans.com</generator><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.shumans.com/shumans" /><feedburner:info uri="shumans" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><creativeCommons:license>http://creativecommons.org/licenses/by/3.0/</creativeCommons:license><link rel="start" type="application/atom+xml" href="http://feeds.shumans.com/shumans" /><feedburner:feedFlare href="http://add.my.yahoo.com/rss?url=http%3A%2F%2Ffeeds.shumans.com%2Fshumans" src="http://us.i1.yimg.com/us.yimg.com/i/us/my/addtomyyahoo4.gif">Subscribe with My Yahoo!</feedburner:feedFlare><feedburner:feedFlare href="http://www.newsgator.com/ngs/subscriber/subext.aspx?url=http%3A%2F%2Ffeeds.shumans.com%2Fshumans" src="http://www.newsgator.com/images/ngsub1.gif">Subscribe with NewsGator</feedburner:feedFlare><feedburner:feedFlare href="http://feeds.my.aol.com/add.jsp?url=http%3A%2F%2Ffeeds.shumans.com%2Fshumans" src="http://o.aolcdn.com/favorites.my.aol.com/webmaster/ffclient/webroot/locale/en-US/images/myAOLButtonSmall.gif">Subscribe with My AOL</feedburner:feedFlare><feedburner:feedFlare 
href="http://www.bloglines.com/sub/http://feeds.shumans.com/shumans" src="http://www.bloglines.com/images/sub_modern11.gif">Subscribe with Bloglines</feedburner:feedFlare><feedburner:feedFlare href="http://www.netvibes.com/subscribe.php?url=http%3A%2F%2Ffeeds.shumans.com%2Fshumans" src="http://www.netvibes.com/img/add2netvibes.gif">Subscribe with Netvibes</feedburner:feedFlare><feedburner:feedFlare href="http://fusion.google.com/add?feedurl=http%3A%2F%2Ffeeds.shumans.com%2Fshumans" src="http://buttons.googlesyndication.com/fusion/add.gif">Subscribe with Google</feedburner:feedFlare><feedburner:feedFlare href="http://www.pageflakes.com/subscribe.aspx?url=http%3A%2F%2Ffeeds.shumans.com%2Fshumans" src="http://www.pageflakes.com/ImageFile.ashx?instanceId=Static_4&amp;fileName=ATP_blu_91x17.gif">Subscribe with Pageflakes</feedburner:feedFlare><feedburner:feedFlare href="http://www.plusmo.com/add?url=http%3A%2F%2Ffeeds.shumans.com%2Fshumans" src="http://plusmo.com/res/graphics/fbplusmo.gif">Subscribe with Plusmo</feedburner:feedFlare><feedburner:feedFlare href="http://www.thefreedictionary.com/_/hp/AddRSS.aspx?http%3A%2F%2Ffeeds.shumans.com%2Fshumans" src="http://img.tfd.com/hp/addToTheFreeDictionary.gif">Subscribe with The Free Dictionary</feedburner:feedFlare><feedburner:feedFlare href="http://www.bitty.com/manual/?contenttype=rssfeed&amp;contentvalue=http%3A%2F%2Ffeeds.shumans.com%2Fshumans" src="http://www.bitty.com/img/bittychicklet_91x17.gif">Subscribe with Bitty Browser</feedburner:feedFlare><feedburner:feedFlare href="http://www.newsalloy.com/?rss=http%3A%2F%2Ffeeds.shumans.com%2Fshumans" src="http://www.newsalloy.com/subrss3.gif">Subscribe with NewsAlloy</feedburner:feedFlare><feedburner:feedFlare href="http://www.live.com/?add=http%3A%2F%2Ffeeds.shumans.com%2Fshumans" src="http://tkfiles.storage.msn.com/x1piYkpqHC_35nIp1gLE68-wvzLZO8iXl_JMledmJQXP-XTBOLfmQv4zhj4MhcWEJh_GtoBIiAl1Mjh-ndp9k47If7hTaFno0mxW9_i3p_5qQw">Subscribe with 
Live.com</feedburner:feedFlare><feedburner:feedFlare href="http://mix.excite.eu/add?feedurl=http%3A%2F%2Ffeeds.shumans.com%2Fshumans" src="http://image.excite.co.uk/mix/addtomix.gif">Subscribe with Excite MIX</feedburner:feedFlare><feedburner:feedFlare href="http://www.yourminis.com/subscribe.aspx?u=http%3A%2F%2Ffeeds.shumans.com%2Fshumans" src="http://www.yourminis.com/images/addtoyourminisbadge.gif">Subscribe with Yourminis.com</feedburner:feedFlare><feedburner:feedFlare href="http://download.attensa.com/app/get_attensa.html?feedurl=http%3A%2F%2Ffeeds.shumans.com%2Fshumans" src="http://www.attensa.com/blogs/attensa/WindowsLiveWriter/BadgeredintoBadges_10C02/attensa_feed_button5.gif">Subscribe with Attensa for Outlook</feedburner:feedFlare><feedburner:feedFlare href="http://www.webwag.com/wwgthis.php?url=http%3A%2F%2Ffeeds.shumans.com%2Fshumans" src="http://www.webwag.com/images/wwgthis.gif">Subscribe with Webwag</feedburner:feedFlare><feedburner:feedFlare href="http://hub.netomat.net/account/account.autoSubscribe.jspa?urls=http%3A%2F%2Ffeeds.shumans.com%2Fshumans" src="http://www.netomat.net/blogger/images/icon_netomat_feedbutton.gif">Subscribe with netomat Hub</feedburner:feedFlare><feedburner:feedFlare href="http://www.podcastready.com/oneclick_bookmark.php?url=http%3A%2F%2Ffeeds.shumans.com%2Fshumans" src="http://www.podcastready.com/images/podcastready_button.gif">Subscribe with Podcast Ready</feedburner:feedFlare><feedburner:feedFlare href="http://www.flurry.com/pushRssFeed.do?r=fb&amp;url=http%3A%2F%2Ffeeds.shumans.com%2Fshumans" src="http://www.flurry.com/images/flurry_rss_logo2.gif">Subscribe with Flurry</feedburner:feedFlare><feedburner:feedFlare href="http://www.wikio.com/subscribe?url=http%3A%2F%2Ffeeds.shumans.com%2Fshumans" src="http://www.wikio.com/shared/img/add2wikio.gif">Subscribe with Wikio</feedburner:feedFlare><feedburner:feedFlare href="http://www.dailyrotation.com/index.php?feed=http%3A%2F%2Ffeeds.shumans.com%2Fshumans" 
src="http://www.dailyrotation.com/rss-dr2.gif">Subscribe with Daily Rotation</feedburner:feedFlare><entry><title>Slumdog Millionaire, a Cinderella Story about a Cinderella Story</title><link rel="alternate" type="text/html" href="http://feeds.shumans.com/~r/shumans/~3/hE3XJY7Ll5E/slumdog-millionaire.php" /><author><name>Shuman Ghosemajumder</name></author><issued>2009-01-05T00:05:12-08:00</issued><modified>2009-01-05T00:05:12-08:00</modified><id>http://shumans.com/articles/slumdog-millionaire.php</id><content type="text/html" mode="escaped">&lt;p&gt;I'm pulling for &lt;a href="http://www.imdb.com/title/tt1010048/"&gt;Slumdog Millionaire&lt;/a&gt; to win Best Picture at the Academy Awards. Sure, the nominations have not yet been announced, but seeing as how Slumdog has more momentum behind it than perhaps any other potential nominee, I think it's one of the frontrunners.&lt;/p&gt;

&lt;p&gt;Slumdog has a quality I've never seen before in a movie. It has a stunning contemporary style which still manages to feel authentically Indian in every frame. It's a new kind of movie, and I want to see more like it. It doesn't feel at all like the films made by either Western directors or Indian directors. Perhaps this is because it's made by an accomplished Western director, Danny Boyle, collaborating with an Indian first-time co-director, Loveleen Tandan.&lt;/p&gt;

&lt;p&gt;Boyle's previous masterwork was 1996's &lt;a href="http://www.imdb.com/title/tt0117951/"&gt;Trainspotting&lt;/a&gt;, which launched the careers of Ewan McGregor, Robert Carlyle, and essentially everyone else in the cast. But while it garnered an Oscar nomination for best adapted screenplay and a smattering of minor best film awards, it didn't win any big prizes due to its controversial content. That was a shame, since it was one of the most creative and powerful movies that year, just as Slumdog is this year.&lt;/p&gt;

&lt;p&gt;Fortunately for Slumdog, it doesn't have Trainspotting's key disadvantages. Thematically, a paean to true love is a much easier sell to audiences than a collection of short stories about heroin addiction. And despite much of it being in Hindi, it is likely more comprehensible to more people than the rapid Scottish dialogue in the earlier movie. What it does have going against it is an R rating (although I'm unclear on what makes it less suitable for children than, say, &lt;a href="http://www.imdb.com/title/tt0468569/"&gt;The Dark Knight&lt;/a&gt;) and a much smaller release than any of the other movies in contention.&lt;/p&gt;

&lt;p&gt;In fact, Slumdog Millionaire almost didn't make it to theaters. With an estimated $15MM budget (which would not cover the salary of even one A-list Hollywood star) it was always going to be a relatively modest production. It was backed by Warner Independent, but when that division was closed by Warner Bros., its entire distribution was in &lt;a href="http://articles.latimes.com/2008/sep/18/entertainment/et-word18"&gt;jeopardy&lt;/a&gt;. At one point they were reportedly considering a direct-to-video release. Finally,  Fox Searchlight picked it up, and it has been their brightest star of the season since then.&lt;/p&gt;

&lt;p&gt;That a small movie was able to overcome major setbacks and not only make it to theaters, but receive widespread praise, is itself a great story. That it's actually good enough to merit that acclaim is an even better one. I would love to see it carry Oscar night and give Danny Boyle the recognition he deserves.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/shumans/~4/hE3XJY7Ll5E" height="1" width="1"/&gt;</content><feedburner:origLink>http://shumans.com/articles/slumdog-millionaire.php</feedburner:origLink></entry><entry><title>Switching to the Mac: Problems and Solutions</title><link rel="alternate" type="text/html" href="http://feeds.shumans.com/~r/shumans/~3/PHHr9F9ZxFQ/switching-to-the-mac-problems.php" /><author><name>Shuman Ghosemajumder</name></author><issued>2008-08-05T07:01:01-07:00</issued><modified>2008-08-05T07:01:01-07:00</modified><id>http://shumans.com/articles/switching-to-the-mac-problems.php</id><content type="text/html" mode="escaped">After more than 20 years of being a PC user (including all versions of Windows and MS-DOS before that), I switched over to a MacBook Pro last October. It was not an easy adjustment, and I seriously considered going back to my ThinkPad. Fortunately, I made it past that stage, love it now, and can't imagine ever using Windows (which needed to meditate for two minutes before even turning off) as my primary operating system again. Nonetheless, the first few weeks were rough, and the switch pretty much killed my productivity during that time.

&lt;P&gt;Here are some of the software and operating system challenges I experienced, as well as how they were resolved. Hopefully some of this might be helpful to those who have recently made the switch or are considering it. 

&lt;P&gt;&lt;strong&gt;It was terrible for doing real work.&lt;/strong&gt; This was the big one. Sure, there are great applications like &lt;a href="http://www.apple.com/finalcutstudio/finalcutpro/"&gt;Final Cut Pro&lt;/a&gt; that are only available on the Mac, and Adobe's suite of products runs just as well on both platforms, but the vast majority of regular people don't use those applications. Nearly everyone needs to use a word processor or spreadsheet semi-regularly, and most business folks need to work with Microsoft Office documents specifically. Unfortunately, using Office 2004 on my Mac was a poor imitation of using any version of Office on any Windows computer. The main problems were speed and stability. Since Office 2004 was written for the PowerPC platform, it runs through the &lt;a href="http://www.apple.com/rosetta/"&gt;Rosetta&lt;/a&gt; translation layer in order to work on Intel Macs. There was really no good option on this front. I tried NeoOffice, OpenOffice, Apple's own suite, and even running Windows Office through &lt;a href="http://www.vmware.com/products/fusion/"&gt;VMware Fusion&lt;/a&gt;. All of these solutions were horrible. &lt;strong&gt;Solution&lt;/strong&gt;: Microsoft came out with &lt;a href="http://www.microsoft.com/mac/"&gt;Office 2008&lt;/a&gt; in January, and life is so much better. It still feels a tad slower than Office 2003 on my older ThinkPad did, but I understand that Office 2007 on Windows is no picnic either. The main benefit is that Office just works now, and my biggest potential reason for switching back to Windows is gone. Thank you, Microsoft!

&lt;P&gt;&lt;strong&gt;The web is a little broken.&lt;/strong&gt; Web sites look different on the Mac than on Windows. One reason is that &lt;em&gt;everything&lt;/em&gt; on a Mac looks a bit different than on Windows because fonts are rendered differently, with &lt;a href="http://www.joelonsoftware.com/items/2007/06/12.html"&gt;more aggressive default anti-aliasing&lt;/a&gt;. I find it makes most type look better (to my eyes at least; I know many folks who find the Mac's type to look "blurry" by comparison). Another reason the web looks different is because most sites are designed to work with Internet Explorer, and modern versions of IE are not available on the Mac. There are huge debates in the Mac community about which browser is best, but Apple's Safari is the market leader, and is the one I prefer for various reasons. However, many web sites don't render properly under Safari, and a small number of sites don't work at all. This is just pathetic when one considers that HTML was designed to specifically work across platforms. In fact, it's one of the primary reasons I convinced myself that switching to a Mac would be okay, since I use mostly web-based applications these days. I was particularly disappointed when I discovered that even many Google products didn't run as well on Safari, or on any Mac browser, as they did on IE or Firefox for Windows. &lt;strong&gt;Solution&lt;/strong&gt;: Things are getting better. Thanks perhaps to efforts like the &lt;a href="http://www.webstandards.org/"&gt;Acid&lt;/a&gt; tests which highlight and embarrass non-compliant browsers, it really seems like browser developers - &lt;a href="http://blogs.msdn.com/ie/archive/2007/12/19/internet-explorer-8-and-acid2-a-milestone.aspx"&gt;including the IE8 team&lt;/a&gt; - are listening and the experience across browsers is becoming more similar. &lt;a href="http://webkit.org/"&gt;WebKit&lt;/a&gt; becoming a cross-platform standard is helping too.

&lt;P&gt;&lt;strong&gt;My mouse was completely broken.&lt;/strong&gt; This one was bizarre and completely unexpected. For some reason, my mouse didn't feel right on OS X. At first I thought it could be a device problem, so I bought a new mouse. The new mouse didn't feel any better, so I thought perhaps I wasn't yet used to the new mouse's shape, and I should get a different mouse that was shaped more like my old one. That didn't work either. I had no idea what was going on. After a little &lt;a href="http://www.google.com/search?q=mac+acceleration+curve"&gt;Googling&lt;/a&gt; I learned that OS X uses a different mouse pointer "acceleration curve" than Windows. Windows uses a flatter curve, which makes the mouse respond more naturally, whereas OS X's curve accelerates quicker for speed but slower for smaller, precise movements. The theory is fine; unfortunately, the reality just doesn't work at all, with the pointer always feeling too fast or too slow. &lt;strong&gt;Solution&lt;/strong&gt;: There are numerous solutions for this one, including buying a Microsoft Mouse which includes a driver with the Windows acceleration curve. I ended up buying &lt;a href="http://plentycom.jp/en/steermouse/"&gt;SteerMouse&lt;/a&gt;, which lets you modify the curve manually. Some people also don't notice this at all, so for them it's a non-issue.

&lt;P&gt;&lt;strong&gt;My phone didn't sync.&lt;/strong&gt; I have a Windows Mobile phone, and Microsoft doesn't make ActiveSync for OS X. &lt;strong&gt;Solution&lt;/strong&gt;: There are third-party applications which can sync with your Windows Mobile device, such as &lt;a href="http://www.markspace.com/downloads.html"&gt;Missing Sync&lt;/a&gt;. I ended up using ActiveSync under VMware Fusion.

&lt;P&gt;&lt;strong&gt;There's no standard uninstall application&lt;/strong&gt;. Initially I thought OS X's installation system was brilliant. You just drag an application into the Applications folder, and it's installed. If you want to uninstall, you delete it from the same folder. Unfortunately, there are various files outside of that folder which some applications will modify, which of course will not be reverted if you just delete the application package. Most well-behaved applications provide their own uninstall utility to clean up these files; however, some don't. &lt;strong&gt;Solution&lt;/strong&gt;: Again, there are third-party applications such as &lt;a href="http://www.appzapper.com/"&gt;AppZapper&lt;/a&gt; which fill this need. I've found that not installing misbehaving applications, which are definitely in the minority, is an even simpler solution.

&lt;P&gt;&lt;strong&gt;CTRL-X and CTRL-V don't work for cutting and pasting.&lt;/strong&gt; For some reason, Apple thinks these keystrokes ought to be COMMAND-X and COMMAND-V. In fact, a lot of what one does with CTRL on Windows is done instead with COMMAND on the Mac. This might make sense if it weren't for the fact that Mac keyboards also have a CTRL key. &lt;em&gt;[Note: As has been pointed out by several folks, the reason that Apple uses COMMAND-X instead of CTRL-X is because Apple invented this shortcut, and Microsoft copied it and used CTRL instead of COMMAND. Of course, now 95% of the world uses CTRL-X, which one must use on web-based applications even using a Mac.]&lt;/em&gt; &lt;strong&gt;Solution&lt;/strong&gt;: OS X lets you swap the COMMAND and CTRL keys, which is what I've done. Unfortunately there are a small number of applications for which this doesn't work, and for those you just have to remember to do the reverse. 

&lt;P&gt;&lt;strong&gt;The HOME and END keys don't work correctly.&lt;/strong&gt; I actually didn't discover this problem until I hooked up my external keyboard, since the MacBook Pro doesn't even have home and end keys! When I did start to use those keys, I discovered that not only do they not behave the way they do on Windows, but they actually behave differently from application to application on the Mac. In most applications, home and end move to the beginning and end of the page. But in some applications and contexts, they behave like they do on Windows, going to the beginning or end of the line. This is just ridiculous, especially if you use those keys a lot. &lt;strong&gt;Solution&lt;/strong&gt;: I didn't actually find any perfect solutions to this. There are keyboard remapping techniques that you can use but these don't appear to work for all applications (or even all contexts within the same application). I ended up ditching my initial external keyboard for the Apple wireless keyboard, which actually doesn't have home and end keys. As a result, I finally migrated over to the Apple equivalent keystrokes: COMMAND-left and COMMAND-right (or in my case, CTRL-left and CTRL-right).

&lt;P&gt;&lt;strong&gt;Importing email is painful.&lt;/strong&gt; Before Gmail, I stored all of my personal and work mail locally, in multi-gigabyte Outlook PST files. I was stunned to discover that Microsoft doesn't make Outlook for the Mac. To make matters worse, Entourage, their Mac Office equivalent, can't import Outlook PST files. After playing around with Entourage and comparing it to Apple Mail, I decided to use the latter. Unfortunately, Apple Mail didn't provide any simple import solutions either. &lt;strong&gt;Solution&lt;/strong&gt;: I ended up buying a $10 application called &lt;a href="http://www.littlemachines.com/"&gt;O2M&lt;/a&gt; which did the trick. Unfortunately, because my files were so large, it took more than a day, with lots of stopping and restarting, to complete the conversion.

&lt;P&gt;&lt;strong&gt;It's just as buggy as Windows.&lt;/strong&gt; No, OS X is not generally unstable. It's a very solid operating system, as most UNIX flavors tend to be. But I'm one of the rare users that didn't have many stability problems with Windows XP. When it would crash, it was typically an application problem, not an operating system issue. Of course, applications crash on OS X also, and some crash quite a bit. &lt;strong&gt;Solution&lt;/strong&gt;: There's not much to say here except to hope that all software applications, on all operating systems, become more stable over time. That's a nice thought. 

&lt;P&gt;Aside from the above issues, there are countless additional quirks of the Mac that it takes time to get used to, but I would say there are a lot more of these which pleasantly surprise me than frustrate me. If you have any useful switching tips, especially any better suggestions than what I've listed above, I'd love to hear them.&lt;img src="http://feeds.feedburner.com/~r/shumans/~4/PHHr9F9ZxFQ" height="1" width="1"/&gt;</content><feedburner:origLink>http://shumans.com/articles/switching-to-the-mac-problems.php</feedburner:origLink></entry><entry><title>Fair Isaac Finds Less Click Fraud than Headlines Report</title><link rel="alternate" type="text/html" href="http://feeds.shumans.com/~r/shumans/~3/RnJl_ZDAuF0/000053.php" /><author><name>Shuman Ghosemajumder</name></author><issued>2007-05-18T22:16:15-07:00</issued><modified>2007-05-18T22:16:15-07:00</modified><id>http://shumans.com/articles/000053.php</id><content type="text/html" mode="escaped">Fair Isaac is one of the leading fraud detection companies in the world, and is an organization I have a great deal of respect for. I have spoken with them in the past, and they told us they have been trying to determine if click fraud detection might be a viable business for them. At Google, we're very happy to see organizations with scientific backgrounds in anomaly detection getting into this space and conducting research. Fair Isaac put out a press release yesterday which has gotten coverage in a number of media outlets. The headlines indicate that Fair Isaac conducted a study which showed that 10-15% of all clicks on online advertising were fraudulent. It turns out this is not true. I spoke with Joe Milana, chief scientist at Fair Isaac, today to find out what the real story was. He told me that most of the headlines and stories were wrong.
&lt;P&gt;First of all, he said Fair Isaac has &lt;em&gt;&lt;strong&gt;not &lt;/strong&gt;&lt;/em&gt;come up with an estimate of click fraud in the industry - and in fact only analyzed data from a handful of advertisers (fewer than ten). And even this finding pertains only to the syndication networks and not search engines, where the majority of pay-per-click advertising occurs. In fact, they found that the rate of "pathological activity" on search engines was "negligible" ("a few percent or less"). This would imply a combined &lt;strong&gt;click fraud rate in the single digits&lt;/strong&gt; even in their sample set - which they said they would certainly not generalize to the entire industry.
&lt;P&gt;
Fair Isaac indicated that they needed a lot more data before they could conduct a meaningful study. They also recognized the need for clean data, acknowledging the importance of using auto-tagging to remove fictitious clicks as we had mentioned to them previously. Unfortunately none of the advertisers in their initial survey were using auto-tagging to fix this problem, which results in inflated click fraud estimates.
&lt;P&gt;
We're continuing to talk and I hope we'll be able to help them further understand the challenges relating to click fraud detection, which is completely different from fraud detection in other industries. The biggest difference is the fact that it requires unsupervised analysis, something they told us they are aware of. They won't share their methodologies with us to protect their intellectual property of course, but I get the feeling that they may not be aware of many other factors relating to the specific behavior of the Internet, web browsers, etc., which make this much more than just a generic task for existing fraud tools from other industries. I'm looking forward to talking to them more as their study progresses, and hopefully takes these and other issues into account.
&lt;P&gt;
&lt;i&gt;Update: Search Engine Watch has additional details on this at &lt;a href="http://blog.searchenginewatch.com/blog/070519-082108"&gt;"Fair Isaac Click Fraud Report Spreads False Alarm"&lt;/a&gt;.&lt;/i&gt;&lt;img src="http://feeds.feedburner.com/~r/shumans/~4/RnJl_ZDAuF0" height="1" width="1"/&gt;</content><feedburner:origLink>http://shumans.com/articles/000053.php</feedburner:origLink></entry><entry><title>Advertiser Requests on Invalid Clicks</title><link rel="alternate" type="text/html" href="http://feeds.shumans.com/~r/shumans/~3/LEaQ9XfdMBQ/000052.php" /><author><name>Shuman Ghosemajumder</name></author><issued>2007-04-30T00:20:27-07:00</issued><modified>2007-04-30T00:20:27-07:00</modified><id>http://shumans.com/articles/000052.php</id><content type="text/html" mode="escaped">The Click Quality Council is a group of
  advertisers which meets regularly to discuss click fraud. A few days ago they
  came out with their "Cornerstone Principles for Pay-Per-Click Quality
  Improvement" &amp;ndash; eight requests from advertisers, similar to Jeffrey Rohrs'
  Sausage Manifesto, which also collected and presented advertiser requests in
  January. I thought folks might be interested in
  where Google stands on some of their requests. Overall, depending on how they
  would define some of these items, it looks like we're already doing these things. Let's take a look.
&lt;BLOCKQUOTE&gt;
  &lt;p&gt; &lt;em&gt;&lt;strong&gt;1) Advertisers should never pay for double clicks or repeat 
    clicks from the same session. &lt;/strong&gt;&lt;/em&gt;
  &lt;p&gt;I agree that advertisers should not be charged for double clicks. While the 
    activity of comparison shopping is a common reason that multiple clicks to 
    the same ad can occur within a short period of time, if the clicks occur so
    close together that they could only be caused by double-clicking or malicious 
    repeated clicking, the extra clicks clearly provide no value to the advertiser. 
    &lt;p&gt; But "same session" is not defined here, and it would be bad for advertisers 
    to define it in a way that would exclude comparison shopping. For example, 
    if publishers and search engines decided not to charge for multiple clicks on 
    an ad within the same day, they would redesign their ad systems to not show 
    that advertiser's ad the second time a user searched on the same keyword, 
    since showing ads which produce no revenue is not desirable. But this would 
    deny that advertiser the opportunity to have a user who was comparison shopping 
    revisit their site, and that would rob them of sales opportunities.
  &lt;p&gt; &lt;em&gt;&lt;strong&gt;2) Advertisers should never pay for traffic from bots. &lt;/strong&gt;&lt;/em&gt;
&lt;p&gt; This request surprised me, since I am not aware of any company in the entire 
    industry which has a policy of charging for clicks made by known bots. We obviously 
    monitor for bot activity and have lists of known bots which we maintain. The 
    difficulty is in knowing whether something is a bot. There are bots which 
    are easily identifiable (for example, if their User-Agent value announces 
    them as a bot) but there are also bots which nobody can identify. We have 
    systems and processes to detect and identify bots (as well as other click 
    fraud attempt methods, such as click farms), but even in cases where traffic 
    cannot be identified as coming from a specific method, our overall detection 
    approach is still effective because it is based on analyzing data related 
    to the clicks themselves.
&lt;p&gt; &lt;em&gt;&lt;strong&gt;3) Advertisers should have control over where, when and to whom 
    ads are distributed. &lt;/strong&gt;&lt;/em&gt;
  &lt;p&gt;Definitely. We provide multiple levels of control, ranging from the coarse 
    granularity offered by geotargeting, or opting in or out of syndication or 
    the content network, to more detailed controls such as opting out of specific 
    URLs, which we're the only major search engine to provide at the moment. We 
    are also going to be releasing the ability to prevent ads from showing to 
    specified IP addresses (see #4) in the next month.
&lt;p&gt; &lt;em&gt;&lt;strong&gt;4) Domain and IP exclusion lists from search providers should 
    be easy to use and maintain.&lt;/strong&gt;&lt;/em&gt;
&lt;p&gt; I agree. We currently have URL/domain exclusion features and will be launching 
    IP exclusion in the next month. We have worked hard, and will continue to work hard, to 
    ensure features like these are easy to use. At the same time, it is important 
    to provide advertisers with more accurate information about domains and IPs so they 
    can make informed decisions and are not misled into thinking that Google expects 
    them to maintain such lists in order to protect against click fraud. These 
    are features which provide targeting controls to advertisers and are more 
    similar to geotargeting than anything related to invalid click detection.&lt;br/&gt;
    &lt;br/&gt;
    &lt;em&gt;&lt;strong&gt;5) Search providers should provide advertisers detailed referrer 
    information on all traffic that is billed. &lt;/strong&gt;&lt;/em&gt;
  &lt;p&gt;I agree with this, and we are currently working on ways to provide advertisers 
    with more transparency into where their ads are placed. Advertisers can already 
    obtain referrer URLs from their own web logs, of course.&lt;br/&gt;
    &lt;br/&gt;
    &lt;em&gt;&lt;strong&gt;6) Advertisers should never pay for traffic originating outside 
    the specified geo-targeted settings. &lt;/strong&gt;&lt;/em&gt;
  &lt;p&gt; I agree with this also, but we need to be clear on what geotargeting is. 
    Geotargeting is based on IP address and other signals and works very well, 
    but is not perfect. There are some instances of IP addresses where geographic 
    location cannot be determined. In addition, when an advertiser targets a specific 
    country, our policy is to show their ads to users who are in that country 
    as well as to users who opt into results from that country. For example, if 
    a user chooses to use a country-specific Google site such as our French site 
    &lt;a href="http://www.google.fr/"&gt;www.google.fr&lt;/a&gt;, we will show them ads geotargeted 
    to France even if their computer is located elsewhere. (A side note: Google 
    does not have a US-specific site, and using Google.com from non-US countries 
    will not result in the user opting into US results and ads. Instead, the geotargeting 
    in that case will be based only on their machine location). Another example 
    of user choice taking precedence over machine location is when a user actually 
    types in a query which indicates they are interested in ads relevant to a 
    specific geography, such as "paris france travel". 
&lt;p&gt; &lt;em&gt;&lt;strong&gt;7) Search engines should adopt third-party validation for click 
    quality as other media companies have done for their audience validation. 
    &lt;/strong&gt;&lt;/em&gt;
&lt;p&gt; We are in favor of submitting our systems to an audit by a trusted third party, 
    and are working with the other members of the &lt;a href="http://adwords.blogspot.com/2006/08/creating-standards-for-clicks.html" title="IAB Click Measurement Working Group"&gt;IAB 
    Click Measurement Working Group&lt;/a&gt; to set this up. The audit will likely 
    be administered through the Media Rating Council, the organization which 
    audits Nielsen and Arbitron. Third-party click fraud auditing firms should 
    also be audited through the MRC to ensure they do not repeat the types of 
    errors that have happened in the past, when fictitious clicks were included 
    in advertiser reports. Those reports misled advertisers and advised them to 
    make decisions which could significantly damage their businesses. 
  &lt;p&gt; A simple example of continuing serious accounting issues with third parties: 
    several firms have admitted to overcounting errors in the past due to fictitious 
    clicks and have adopted Google's auto-tagging support in their systems to 
    begin to correct the problem for analysis they do for Google advertisers. 
    While they claim to have dealt with the problem of fictitious clicks, some 
    of the same firms continue to publicize estimates of industry click fraud 
    rates which include networks (such as Yahoo and MSN) where it is not yet possible 
    to distinguish between fictitious clicks and real clicks (due to lack of support 
    similar to Google's auto-tagging).
  &lt;p&gt; &lt;em&gt;&lt;strong&gt;8) Search providers should provide an easy mechanism to reconcile 
    paid clicks on a monthly basis. &lt;/strong&gt;&lt;/em&gt; 
&lt;p&gt; Definitely. Google provides this through &lt;a href="https://adwords.google.com/support/bin/answer.py?answer=31216&amp;amp;query=auto-" title="auto-tagging"&gt;auto-tagging&lt;/a&gt;, 
    which allows advertisers (and third party analytics firms, including click 
    fraud auditing firms) to reconcile the clicks they see in their logs with 
    the number of clicks in their AdWords reports. Using auto-tagging, advertisers 
    (and third-party firms) are able to get accurate information on how many clicks 
    occurred on their campaigns and how those figures compare to the activity 
    seen in their logs. This allows them to properly count clicks and avoid the 
    problem of fictitious clicks we have discussed &lt;a href="http://shumans.com/articles/000048.php"&gt;before&lt;/a&gt;.
  &lt;p&gt; Google also provides our advertisers with reports of the &lt;a href="http://adwords.blogspot.com/2006/07/estimating-invalid-clicks.html"&gt;daily 
    number of invalid clicks&lt;/a&gt; on their campaigns, which is what they (and third-party 
    auditing firms) need to verify whether the number of clicks they thought were 
    suspicious was less than or equal to the number of clicks we already filtered 
    out for them that day.
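  &lt;p&gt; As a rough sketch of that daily check, an advertiser could compare its own 
    count of suspicious clicks with the invalid-click count reported for the same 
    day. The function name and the dictionary layout are hypothetical, not part 
    of any AdWords report format. 

```python
def unexplained_suspicious_clicks(suspicious_by_day, invalid_by_day):
    """Return the days on which suspected clicks exceed clicks already filtered.

    Both arguments map a day label to a click count. A day is returned, with the
    size of the gap, only when the advertiser's suspicious count is larger than
    the number of invalid clicks already filtered (and not charged) that day.
    """
    gaps = {}
    for day, suspicious in suspicious_by_day.items():
        filtered = invalid_by_day.get(day, 0)
        if suspicious > filtered:
            gaps[day] = suspicious - filtered
    return gaps
```

  &lt;p&gt; An empty result means every suspicious click could already be covered by 
    the filtered total; a non-empty one points to the days worth a closer look. 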
  &lt;p&gt;

  We are the only company in the industry
  that currently provides either of these features,
  but we have been working on
  evangelizing them to our competitors and the industry overall. MSN has
  announced that they will be releasing their version of invalid clicks
  reporting later this year, but none of the other major search engines has yet
  adopted a feature like auto-tagging. We hope both of these will become part of
  the IAB standards.
  
  We have
  also been working on plans to share detailed click information, similar to a
  phone bill as many in the industry have pointed out. It would contain
  information such as the IP addresses, time, and cost associated with
  individual clicks. It would not contain flags for which specific
  clicks were detected as invalid (and not charged for), since that would make
  it simple for a fraudster to pose as an advertiser, run an experiment with
  millions of clicks, and then attempt to reverse engineer our system.
  But this type of report would give advertisers further
  transparency into which clicks occurred on their ads and help them more easily identify
  discrepancies between their systems and ours.
&lt;/BLOCKQUOTE&gt;
&lt;p&gt;

  Many thanks to the advertisers who
  provided their suggestions, as well as to all of the other groups that send us
  ideas regularly. We benefit greatly
  from the feedback our advertisers provide us, as it helps
  us constantly improve our systems and customer service, and we would
  always like to get more.
  In fact, we are hosting our
  first advertiser forum dedicated exclusively to invalid clicks
  at Google headquarters this coming week. In it, we will be meeting
  with several dozen advertisers, both large and not-so-large, to
  discuss their concerns, share information about our invalid click
  detection methods and policies, and come up with ways to continue to deliver a
  great advertising experience on Google.&lt;img src="http://feeds.feedburner.com/~r/shumans/~4/LEaQ9XfdMBQ" height="1" width="1"/&gt;</content><feedburner:origLink>http://shumans.com/articles/000052.php</feedburner:origLink></entry><entry><title>Structure of a Click Fraud Botnet</title><link rel="alternate" type="text/html" href="http://feeds.shumans.com/~r/shumans/~3/sdMq1Av_pDw/000051.php" /><author><name>Shuman Ghosemajumder</name></author><issued>2007-04-11T13:19:14-07:00</issued><modified>2007-04-11T13:19:14-07:00</modified><id>http://shumans.com/articles/000051.php</id><content type="text/html" mode="escaped">One question I frequently get asked by some of our more advanced advertisers is "what is Google doing about click fraud from botnets?" Analyzing botnets is an important activity in both our Click Quality and Security Teams. Yesterday, Dr. &lt;a href="http://www.neildaswani.com/"&gt;Neil Daswani&lt;/a&gt;, a member of both teams, presented a &lt;a href="http://www.usenix.org/events/hotbots07/tech/full_papers/daswani/daswani.pdf"&gt;paper&lt;/a&gt; at the &lt;a href="http://www.usenix.org/events/hotbots07/tech/"&gt;HotBots 2007&lt;/a&gt; workshop on a case study of one such botnet we examined last year called Clickbot.A. The paper provides an in-depth look at how a fraudster was attempting to utilize 100,000 machines to execute a low-noise click fraud attack through syndicated search ads.
&lt;P&gt;
Botnets have of course been around for many years, and have been used most commonly for activities like denial of service attacks. We have also seen them used for click fraud. There are many different ways that click fraud is attempted, and the use of botnets generally represents one of the more sophisticated methods. At a basic level, the main benefit of a botnet to fraudsters is the use of many diverse IP addresses and other machine-specific signals. By utilizing thousands of hijacked IPs, a fraudster hopes that their attack will be difficult to catch. Of course, IP address is only one of hundreds of factors we analyze when looking for evidence of click fraud. Some sophisticated fraudsters realize this, and program their botnets to behave in more complex and subtle ways than just randomizing IPs (as Clickbot.A demonstrates).
&lt;P&gt;
One reason we're publishing this paper is to continue to share more information on the types of analysis we do to protect our advertisers against click fraud. But an even more important reason is to provide greater understanding of a challenging area the entire Internet community should work together to manage. The bad guys share their information with each other, and so should we. We hope to be able to discuss more of this work publicly ourselves in the future, and we also hope that other security-related companies will share similar case studies and findings, which will end up benefiting everyone. The concluding observations and recommendations from the paper are worth repeating here:
&lt;ul&gt;
&lt;li&gt; Search engines need to investigate botnets that might be used to issue automated, distributed click fraud attacks.
&lt;li&gt; ISPs need to protect their web hosting and customer accounts from being compromised. Many of the domains and hosts involved in conducting the attack described in this paper were compromised.
&lt;li&gt; Malware detection rates may need to be improved. Only 7 out of 24 of the anti-virus scanners run as part of Virus-TOTAL detected Clickbot.A around the time the attack was publicly reported.
&lt;li&gt; Web site publishers, financial institutions, and advertisers can encourage their users and customers to proactively install anti-virus tools.
&lt;li&gt; Users can run anti-virus software to help prevent their computer from participating in a botnet. There are several free offerings available to users in the market.
&lt;li&gt; Security researchers and corporate IT departments can proactively and more aggressively share data and publish results to help the white-hat community prevent, detect, contain, and recover from attacks conducted by miscreants in the underground Internet economy.
&lt;/ul&gt;
&lt;P&gt;You can read more about the Clickbot.A case at our &lt;a href="http://adwords.blogspot.com/2007/04/new-case-study-on-botnet-based-click.html"&gt;AdWords Blog&lt;/a&gt; post, and you can access Neil's paper, which he co-wrote with Mike Stoppelman and other team members, &lt;a href="http://www.usenix.org/events/hotbots07/tech/full_papers/daswani/daswani.pdf"&gt;here&lt;/a&gt;. Incidentally, Neil is also the author of the recently published &lt;a href="http://www.amazon.com/Foundations-Security-Every-Programmer-Experts/dp/1590597842"&gt;"Foundations of Security: What Every Programmer Needs to Know"&lt;/a&gt;, which is a great reference as well as introduction to security methods.&lt;img src="http://feeds.feedburner.com/~r/shumans/~4/sdMq1Av_pDw" height="1" width="1"/&gt;</content><feedburner:origLink>http://shumans.com/articles/000051.php</feedburner:origLink></entry><entry><title>Why Third-Party Click Fraud Estimates Don't Add Up - Part 2</title><link rel="alternate" type="text/html" href="http://feeds.shumans.com/~r/shumans/~3/fUaQuDm4f7A/000049.php" /><author><name>Shuman Ghosemajumder</name></author><issued>2007-01-31T12:13:45-08:00</issued><modified>2007-01-31T12:13:45-08:00</modified><id>http://shumans.com/articles/000049.php</id><content type="text/html" mode="escaped">In &lt;a href="http://shumans.com/articles/why-thirdparty-click-fraud-est.php"&gt;part 1&lt;/a&gt;, I wrote about the recent press releases from click fraud consulting firms on industry click fraud rates. In this post, I'd like to follow up on some of the issues we covered in our August 
  report, &lt;em&gt;&lt;a href="http://www.google.com/adwords/ReportonThird-PartyClickFraudAuditing.pdf"&gt;&amp;quot;How 
  Fictitious Clicks Occur in Third-Party Click Fraud Audit Reports&amp;quot;&lt;/a&gt;&lt;/em&gt;, 
  and explain why click fraud firms are still making egregious mistakes in (a) 
  click counting, and even more egregious mistakes in (b) click fraud estimation.
&lt;p&gt;To begin, where do third-party click fraud numbers come from? At Google, whenever 
  we detect malicious activity against an advertiser's account, we mark those 
  clicks as invalid, and thus don't charge the advertiser for them. We utilize 
  a number of different automated techniques and algorithms, as well as proactive 
  manual analysis, to do this, analyzing hundreds of different factors. The analysis 
  that we see from third-party auditing firms (including ClickForensics) seems 
  to essentially rely on just one factor, which we call IP frequency. IP frequency 
  is the number of times an IP address clicks within a certain time window. If 
  it clicks too many times, it could be click fraud. On our end, this is a very 
  simple rule which runs in an automated fashion, protecting Google advertisers 
  24/7. Third-party firms sometimes find the same suspicious IP frequency patterns 
  that our systems do, and include them in their click fraud reports - leading 
  advertisers to request refunds for clicks they were never charged for in the 
  first place.
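&lt;p&gt;For readers who want to see what an IP frequency rule looks like, here is 
  a minimal sketch. It is not Google's filter or any firm's actual method: the 
  function name, the one-hour window, and the threshold of three clicks are all 
  hypothetical values chosen for illustration.

```python
from collections import defaultdict, deque


def flag_frequent_ips(clicks, window_seconds=3600, max_clicks=3):
    """Return the IPs with more than max_clicks clicks inside a sliding window.

    clicks is an iterable of (timestamp_seconds, ip) pairs sorted by timestamp.
    The window length and threshold are illustrative assumptions.
    """
    recent = defaultdict(deque)  # ip -> timestamps still inside the window
    flagged = set()
    for ts, ip in clicks:
        window = recent[ip]
        window.append(ts)
        # Drop clicks that have aged out of the window ending at this click.
        while ts - window[0] >= window_seconds:
            window.popleft()
        if len(window) > max_clicks:
            flagged.add(ip)
    return flagged
```

&lt;p&gt;A rule this simple looks at one signal only, which is why it is best treated 
  as one factor among many rather than as a click fraud detector on its own.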
&lt;p&gt;But that is actually not even the most common problem with their analyses. 
  What is far more common is that the reports we receive from them ask for refunds 
  for clicks which do not even exist. This more serious problem comes from the 
  issues we addressed in our August report on fictitious clicks. In that report, 
  we demonstrated the limits of web log based analysis for any analytics purpose 
  (including click fraud analysis) due to the way Internet Explorer, Firefox and 
  other browsers work. Unfortunately, that was a very technical report, which 
  was difficult for many readers to parse. I'll try to provide a simpler explanation 
  here.
&lt;p&gt;Here's the problem: web logs, whether generated by an advertiser or by third-party 
  code on an advertiser's site, cannot directly track ad clicks. Instead, they 
  track visits to a special landing page URL on the advertiser's site (e.g. &lt;font size="2" face="Courier New, Courier, mono"&gt;http://example.com/?adwords&lt;/font&gt; 
  ) as a proxy for how many ad clicks occurred. The assumption they're relying 
  upon is that each visit to that URL corresponds to a unique click, and vice 
  versa. But in practice this is not the case. Once a user visits that page, they 
  often browse through the site, navigating through sub pages, and then return 
  to the original landing page by hitting the back button. When the landing page 
  is reloaded in the browser, it appears in the web log as though additional ad 
  &amp;quot;clicks&amp;quot; are occurring. Google can count ad clicks reliably because a click 
  on a Google ad causes the web browser to contact Google, which then redirects 
  it to the advertiser's landing page. A reload of the advertiser's landing page does 
  not contact Google again. In addition, the referrer URL which is passed by the 
  browser when users hit the back button is actually the original referrer URL 
  (which says the page came from an ad click) which gets cached, so there is no 
  analysis which can be done based on logs alone which can resolve this. This 
  is where the fictitious clicks come from.
&lt;p&gt;When we analyze data from web logs under these default conditions, we find 
  that on average it leads to a 40% inflation of click estimates. You can think 
  of it this way: if an average of 1000 clicks occurred, a log based analysis 
  would estimate on average that there were 1400 clicks, 400 of which are fictitious 
  and did not actually occur.
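&lt;p&gt;The arithmetic can be checked with a few lines of code. This is a sketch 
  using the 40% average from above; the function name is hypothetical. Note that 
  400 fictitious clicks out of a log count of 1400 means roughly 29% of everything 
  the log reports never happened.

```python
def log_based_estimate(actual_clicks, inflation_percent=40):
    """Apply an average inflation rate to a true click count.

    Returns (estimated clicks seen in the log, fictitious clicks,
    fictitious clicks as a share of the log-based estimate).
    Integer arithmetic is used for the counts to avoid rounding noise.
    """
    estimated = actual_clicks * (100 + inflation_percent) // 100
    fictitious = estimated - actual_clicks
    share_of_log_count = fictitious / estimated
    return estimated, fictitious, share_of_log_count


estimated, fictitious, share = log_based_estimate(1000)
print(estimated, fictitious, round(share, 3))
```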
&lt;p&gt;Now consider the principal analytical tool of third-party click fraud firms: 
  IP frequency. When they see a user browsing through the site, and reloading 
  the landing page multiple times in a short time window, they will classify it 
  as click fraud - even though those &amp;quot;clicks&amp;quot; do not actually exist. 
  It also results in the misclassification of advertisers' best users (the ones 
  who are spending time browsing through their sites) as &amp;quot;fraudulent&amp;quot;.
&lt;p&gt;Thus, while click estimates were inflated by 40% on average, click fraud estimates 
  were inflated by much, much higher amounts. As we detailed in our report, we 
  found cases of firms reporting click fraud rates &lt;em&gt;above 100%&lt;/em&gt; in some 
  instances due to this problem. We also found that in other instances, clicks 
  classified as &amp;quot;click fraud&amp;quot; by third-party firms produced sales at 
  the same rate as the &amp;quot;good&amp;quot; clicks. In other words, the identification 
  of click fraud by third-party firms was much worse than imprecise - it was not 
  even in the right ballpark, with nearly all of the &amp;quot;bad&amp;quot; clicks they 
  identified actually being fictitious.
&lt;p&gt;The net result was that advertisers were consistently being given false data 
  from reports they trusted, which would actually hurt their advertising campaigns 
  if they acted on them. For example, if an advertiser is told certain keywords 
  have higher &amp;quot;fraud rates&amp;quot;, they are likely to change their campaign 
  to eliminate spending on those keywords in favor of others, hurting the performance 
  on their campaigns when this information is false. The damage this can do to 
  advertisers' businesses can be quite large.
&lt;p&gt;So is there a solution to this? Yes. Third-party analytics (not click fraud) 
  firms have been aware of the page reload issue for many years, and generally 
  use redirects (rather than web log based tracking) to avoid it. If one is tied 
  to using web site logs (or landing page code generating logs) however, the only 
  solution is to use the &lt;a href="http://www.google.com/adwords/learningcenter/text/31854.html"&gt;AdWords 
  auto-tagging feature&lt;/a&gt;. Auto-tagging has been available since 2005, and is 
  a feature which appends a unique ID to the landing page URL for every click, 
  so that the cases of (a) multiple clicks and (b) multiple reloads of the landing 
  page can be easily distinguished.
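&lt;p&gt;To make the idea concrete, here is a minimal sketch of counting clicks 
  from landing-page log entries once auto-tagging is on. It assumes the unique 
  ID arrives as a &lt;font size="2" face="Courier New, Courier, mono"&gt;gclid&lt;/font&gt; 
  query parameter, which is the parameter AdWords auto-tagging uses but which 
  is not named above; the function name and sample URLs are hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def count_ad_clicks(landing_page_urls, click_id_param="gclid"):
    """Count distinct ad clicks among logged landing-page requests.

    Each real click carries its own ID, so a reload of the landing page
    (for example via the back button) repeats an ID already seen and is
    not counted again. Requests without the parameter are ignored.
    """
    seen = set()
    for url in landing_page_urls:
        ids = parse_qs(urlparse(url).query).get(click_id_param)
        if ids:
            seen.add(ids[0])
    return len(seen)


log = [
    "http://example.com/?gclid=A1",  # first ad click
    "http://example.com/?gclid=A1",  # same visitor reloading the landing page
    "http://example.com/?gclid=B2",  # a second, separate ad click
]
print(count_ad_clicks(log))
```

&lt;p&gt;A plain count of landing-page hits would report three &amp;quot;clicks&amp;quot; 
  for this log; counting distinct IDs reports the two that actually occurred.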
&lt;p&gt;Two of the three firms we identified in our report, AdWatcher and ClickFacts, 
  have not made any changes we're aware of. That's discouraging to say the least. 
  ClickForensics claims to have fixed this problem a couple of months ago by requiring 
  their AdWords clients to use auto-tagging, yet despite such a significant change 
  in methodology, their new numbers are nearly the same as their old numbers. 
  Perhaps it hasn't yet been fully or correctly utilized, so the significant corrective 
  drop in their numbers is yet to come. Or perhaps their network is heavily skewed 
  toward non-Google advertisers, and thus they still cannot correct the problem 
  until Yahoo, MSN and others implement their own versions of auto-tagging. Until 
  then, considering that the total number of clicks they're counting could be 
  off by as much as 40%, and their click fraud estimates could be off by much 
  more, there's very little meaning in a difference of 0.1% from Q2 to Q4 - or 
  in any of their other inferred statistics. But most importantly, the fact that 
  they don't take into account the amount that Google already protects advertisers 
  against means that they're not even trying to measure actual click fraud.&lt;img src="http://feeds.feedburner.com/~r/shumans/~4/fUaQuDm4f7A" height="1" width="1"/&gt;</content><feedburner:origLink>http://shumans.com/articles/000049.php</feedburner:origLink></entry><entry><title>Why Third-Party Click Fraud Estimates Don't Add Up</title><link rel="alternate" type="text/html" href="http://feeds.shumans.com/~r/shumans/~3/HPLO_ikX8TI/000048.php" /><author><name>Shuman Ghosemajumder</name></author><issued>2007-01-31T12:00:12-08:00</issued><modified>2007-01-31T12:00:12-08:00</modified><id>http://shumans.com/articles/000048.php</id><content type="text/html" mode="escaped">I want to thank everyone who has written to me with questions since I started blogging about the work we do at Google to protect advertisers against click   fraud. I'll be catching up on some of those questions in the next week, but today I want to address some of the more recent items in the media on click fraud rates.
&lt;p&gt;There was a &lt;a href="http://www.clickforensics.com/news/pressreleases/01-30-07.html"&gt;press release&lt;/a&gt; yesterday from ClickForensics stating that their quarterly measure of click fraud for Q4 was 14.2%. They also stated that this was the year's &amp;quot;highest 
  level&amp;quot; (up from 14.1% in Q2) and that the click fraud rate for search engine content networks was 19.2%. This morning there was a competing &lt;a href="http://home.businesswire.com/portal/site/google/index.jsp?ndmViewId=news_view&amp;newsId=20070131005771&amp;newsLang=en"&gt;press release&lt;/a&gt; from Incremental Advantage and several other click fraud firms, stating that &amp;quot;Click Fraud Cost Internet Advertisers $666 Million in 2006&amp;quot;.

&lt;p&gt;On a basic level, these numbers are much higher than what we see at Google, and are not at all representative of the actual statistics of our network. Most savvy advertisers and industry pundits are already aware of this (see &lt;a href="http://www.webpronews.com/blogtalk/blogtalk/wpn-58-20070130WhyWeCantTrustClickFraudNumbers.html"&gt;&lt;em&gt;&amp;quot;Why 
  We Can't Trust Click Fraud Numbers&amp;quot;&lt;/em&gt;&lt;/a&gt; in yesterday's WebProNews), and generally haven't paid much attention to these estimates for a while.

&lt;p&gt;However, these stats are still out there and there are some things everyone should keep in mind when reviewing them. Specifically:
&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Many third parties have not even counted clicks properly&lt;br&gt;
    &lt;/strong&gt;We did an analysis of Click Forensics and other click fraud consultants 
    back in August 2006 to see why their numbers were so inflated (see &lt;em&gt;&lt;a href="http://www.google.com/adwords/ReportonThird-PartyClickFraudAuditing.pdf"&gt;&amp;quot;How 
    Fictitious Clicks Occur in Third-Party Click Fraud Audit Reports&amp;quot;&lt;/a&gt;&lt;/em&gt; 
    on the Google AdWords Blog). We found serious flaws in their counting of clicks 
    - a more fundamental issue than their counting of click fraud. They were making 
    basic counting mistakes and inflating the number of clicks by an average of 
    40%. The source of this problem is incorrectly counting page views &amp;ndash; from 
    users browsing through an advertiser's site &amp;ndash; as clicks.&lt;br&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Inflated click counts result in even more inflated &amp;quot;click 
    fraud&amp;quot; estimates&lt;br&gt;
    &lt;/strong&gt;This over-counting problem results in an even more dramatic inflation 
    of click fraud estimates, in fact consistently classifying an advertiser's 
    best users (the ones spending time browsing their site) as fraudulent. As 
    a result, conclusions based on this data are flawed and the small differences 
    in overall percentages they report are not meaningful. And instead of protecting 
    their businesses against click fraud, advertisers can actually harm their 
    businesses by acting on recommendations from these reports.&lt;br&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Even if they fixed those problems, they're not actually measuring 
    click fraud&lt;br&gt;
    &lt;/strong&gt;Even if they were counting clicks correctly, they are still trying 
    to measure only activity (attempted click fraud) and not advertiser impact 
    (actual click fraud). That is, even if they corrected the basic engineering 
    and accounting problems contributing to the above problems, they would still 
    be counting clicks we filter (and do not charge to advertisers) in their click 
    fraud estimates. They admit this.&lt;br&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Industry metrics (in any area of our business) are not necessarily 
    the same as Google's metrics&lt;/strong&gt;&lt;br&gt;
    The advertisers in their sample are part of many different networks and not 
    all of these networks have invested as heavily as Google in click fraud protection.&lt;br&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;ROI on the content network is the same as it is on search&lt;/strong&gt;&lt;br&gt;
    We know there is a more direct incentive for fraud on the content network 
    and we do much more to protect advertisers, ban bad publishers, and improve 
    ROI through &lt;a href="http://adsense.blogspot.com/2005/10/facts-about-smart-pricing.html"&gt;SmartPricing 
    discounts&lt;/a&gt;. As a result, average ROI on our content network is nearly the 
    same as on Google.com. Yes, you read that right. ROI is the same on average 
    - and not by accident, but because we automatically provide discounts to advertisers 
    to make it so. &lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The key point here is not that their numbers are &amp;quot;too high&amp;quot;. The 
  point is that their data collection methods are inherently flawed and any resemblance 
  their numbers could have to reality would be coincidental. Even so, given that 
  they are not measuring click fraud (see point #3), they apparently don't intend 
  their numbers to reflect reality.
&lt;p&gt;Click fraud protection is something we take very seriously at Google, and it 
  requires a high level of scientific rigor to do well. It's frustrating to see 
  basic mistakes being made by firms selling &amp;quot;additional protection&amp;quot; 
  to AdWords advertisers - in essence, charging them money for advice which can 
  actually hurt their businesses. I've spoken with many firms and a number of 
  academics interested in this area, and the ones who are investing in serious 
  R&amp;amp;D efforts recognize the limitations of their data and analysis and have 
  not been focusing on publicizing unsupportable and flawed numbers such as the 
  above. We're very supportive of those efforts (and of scientific research in 
  this area in general) and we'll continue to work closely with them.
&lt;p&gt;For more information about Google's actual metrics, you can see my previous 
  posts &lt;a href="http://shumans.com/articles/000044.php"&gt;here&lt;/a&gt; and &lt;a href="http://shumans.com/articles/000045.php"&gt;here&lt;/a&gt;.
&lt;P&gt;&lt;i&gt;Update: I've posted a &lt;a href="http://shumans.com/articles/000049.php"&gt;second part &lt;/a&gt;to this post, with more technical details on points #1 and #2.&lt;/i&gt;&lt;img src="http://feeds.feedburner.com/~r/shumans/~4/HPLO_ikX8TI" height="1" width="1"/&gt;</content><feedburner:origLink>http://shumans.com/articles/000048.php</feedburner:origLink></entry><entry><title>Time's Person of the Year</title><link rel="alternate" type="text/html" href="http://feeds.shumans.com/~r/shumans/~3/k7CLR5jwhuw/000046.php" /><author><name>Shuman Ghosemajumder</name></author><issued>2006-12-16T10:07:43-08:00</issued><modified>2006-12-16T10:07:43-08:00</modified><id>http://shumans.com/articles/000046.php</id><content type="text/html" mode="escaped">Time's annual Person of the Year (POTY) issue is coming out on Monday, and guess what. It's me. Well, me, you, and apparently, everyone else in the world. &lt;a href="http://www.time.com/time/magazine/article/0,9171,1569514,00.html?aid=434&amp;from=o&amp;to=http%3A//www.time.com/time/magazine/article/0%2C9171%2C1569514%2C00.html"&gt;This year's POTY is "You"&lt;/a&gt;, complete with a mirrored cover to let you look at yourself and contemplate what you'll do next. After receiving this honor, I had to ask myself (and you might too), are we worthy? Time's unusual decision was apparently due to the massive growth of social networking services and online communities in the past year. Obviously the recognition of sites like YouTube, MySpace, Wikipedia, and basically all of Web 2.0, is remarkable. Although only coined as a term in 2004 by O'Reilly, Web 2.0 established its business clout this year, and represents a technological realization of the power of the individual. However, when you peel away some layers (metaphorically and technologically), Web 2.0 stands on the shoulders of many giant technologies and ideas. In fact, those giants have also been about leveraging the power of the individual. Think of e-mail. FTP. IM. 
The web itself. In addition to being information technologies, these have always been communications technologies focused on connecting people. And BBS sites were doing this even before the general population was using the Internet.
&lt;P&gt;Immense user effort realized the potential of each of these technologies. The difference with Web 2.0 is actually not you, or me, or the rest of the world. The difference is the tools. It's not that everyone is creating so much more content (though they are); it's that the newest tools have enabled the distributed creation of high-quality content. Digg's view of the web is far more than the sum of its votes. The network effects of social networking services are more directly visible and usable than the more significant and plentiful user-initiated connections on the unstructured web, and that makes them more useful in their specific context.

&lt;P&gt;Internet technologies represent the cutting edge of how we as a civilization build things. From non-mechanical tools we fashioned mechanical devices. The mechanical gave rise to the electrical, the electrical to the electronic, and with the advent of computers, the "machines" transcended physical limitations. With software, the exact same physical machinery could be used to build and operate an unlimited set of applications. We could build virtual skyscrapers in the sky, tear them down, and rebuild them, over and over again. Computer networking, and eventually the Internet, allowed that same software to be distributed at zero marginal cost, further increasing its power, flexibility, and usefulness. The web then gave us direct access to a virtually infinite number of software applications and allowed us to leverage far more computing power than any of us could individually afford.

&lt;P&gt;Now, with Web 2.0 technologies, we have established meaningful modes of interaction, including both data and social protocols, to allow non-engineers to collectively contribute to the creation of the newest virtual tools and mechanisms. And that is a remarkable achievement, the credit for which goes to the creators of those systems (and sorry, not me, you, and everyone else). But I can see how picking 50 entrepreneurs wouldn't work, as well as why Web 2.0 itself couldn't be POTY. Although "The Computer" was chosen in 1982, I don't think Time or its readers are ready to choose a category of software applications as the most influential "thing" of the year. Still, even the recognition of these technologies through its users is a striking choice, one which will hopefully further catalyze innovation in this area.&lt;img src="http://feeds.feedburner.com/~r/shumans/~4/k7CLR5jwhuw" height="1" width="1"/&gt;</content><feedburner:origLink>http://shumans.com/articles/000046.php</feedburner:origLink></entry><entry><title>MediaPost on Click Fraud</title><link rel="alternate" type="text/html" href="http://feeds.shumans.com/~r/shumans/~3/cNe97YTrz8c/000045.php" /><author><name>Shuman Ghosemajumder</name></author><issued>2006-12-15T09:55:35-08:00</issued><modified>2006-12-15T09:55:35-08:00</modified><id>http://shumans.com/articles/000045.php</id><content type="text/html" mode="escaped">I was interviewed earlier this week by &lt;a href="http://www.outofmygord.com/"&gt;Gord Hotchkiss&lt;/a&gt;, president of Enquiro, for an article on MediaPost's Search Insider about click fraud. &lt;a href="http://publications.mediapost.com/index.cfm?fuseaction=Articles.showArticleHomePage&amp;art_aid=52554"&gt;&lt;em&gt;"The Elusive Click Fraud Issue: Google's Side Of The Story"&lt;/em&gt;&lt;/a&gt; provides a look at some of the major issues in this area, and has a bunch of useful information for anyone interested in really understanding click fraud and how we manage it at Google. 
It also reiterates and clarifies some of the facts around metrics I discussed in my last post.
&lt;P&gt;I was glad to see that the article covered some of the significant problems with third-party estimates &amp;ndash; for example, a frequently-cited study from Outsell is debunked &amp;ndash; and it also highlights how what we do at Google with respect to fighting click fraud may be considerably different from what is done at our competitors.
&lt;P&gt;One part of the article that a few people asked me to explain further was the example of an advertiser with a $100K per month budget. The article states that it &lt;em&gt;"assume[s] that the clicks this advertiser receives are representative of the total Google network."&lt;/em&gt; As such, it's clear that this is a hypothetically constructed model of an advertiser meant to scale the dynamics of our network into smaller dollar figure terms, to make them easily understandable. So it does not represent the experience of a typical advertiser. Why doesn't it? One of the most significant reasons is that the vast majority of advertisers are likely not actually affected by undetected click fraud. Our reactive refund figures are based on those who are and who also write in to us for investigations. So the percentage amounts refunded to that much smaller group of advertisers can obviously not be extrapolated to a much larger set of advertisers who are not affected by click fraud to begin with. Remember, all network-wide stats are averages, and every advertiser is unique.
&lt;P&gt;
Gord has spent the time to understand this well, and the article is definitely worth a read. I'm happy to see that we're making progress in terms of furthering understanding on both the science and accounting of this issue. I also want to thank everyone who has been asking questions on e-mail and on this blog and others. I'm looking forward to continuing our dialogue.
&lt;P&gt;
&lt;B&gt;&lt;I&gt;Update 12/18:&lt;/I&gt;&lt;/B&gt;  Gord has posted &lt;a href="http://www.outofmygord.com/archive/2006/12/15/Interview-with-Shuman-Ghosemajumder-about-Click-Fraud.aspx"&gt;some additional notes&lt;/a&gt; which never made it into the article on his blog.&lt;img src="http://feeds.feedburner.com/~r/shumans/~4/cNe97YTrz8c" height="1" width="1"/&gt;</content><feedburner:origLink>http://shumans.com/articles/000045.php</feedburner:origLink></entry><entry><title>Google, Click Fraud, and Invalid Clicks</title><link rel="alternate" type="text/html" href="http://feeds.shumans.com/~r/shumans/~3/U94_Na9PQEo/000044.php" /><author><name>Shuman Ghosemajumder</name></author><issued>2006-12-12T14:40:47-08:00</issued><modified>2006-12-12T14:40:47-08:00</modified><id>http://shumans.com/articles/000044.php</id><content type="text/html" mode="escaped">Yesterday, &lt;a href="http://www.marketingpilgrim.com/andy-beal-online-marketing-expert/"&gt;Andy Beal&lt;/a&gt; posted a &lt;a href="http://www.marketingpilgrim.com/2006/12/google-click-fraud-rate-two-percent.html"&gt;detailed story&lt;/a&gt; on Google and click fraud, in which I was quoted as saying that Google's click fraud rate is less than 2%. Did I really say that? Not quite.
&lt;P&gt;First, some background. Andy and I met during the Search Engine Strategies conference in Chicago last week, and we spent an hour talking about our systems, methods, and policies for fighting click fraud. As everyone who has ever spoken to me about this knows by now, this is an issue we take very seriously, and have dedicated extensive resources to managing effectively. Unfortunately, there is a great deal of misinformation on this topic (mainly from third parties with an incentive to exaggerate the issue), so we have been exploring ways to become more transparent ourselves. Our top priority is to protect advertisers, so that means not disclosing any proprietary methods which would allow click fraud perpetrators to reverse-engineer our systems. However, there is still a great deal of information we can share. I and others on our team have spent literally hundreds of hours on communications and sharing such information outside Google. The goal is to improve the level of understanding of this issue to arm everyone against the FUD out there.
&lt;P&gt;Andy's story provides a great summary of some of the key facts at Google:
&lt;ul&gt;
&lt;li&gt; Invalid clicks and click fraud are separate but related concepts (invalid clicks simply being the clicks for which we do not charge advertisers)
&lt;li&gt; We have a four-stage process which detects the vast majority of invalid clicks before they affect advertisers
&lt;li&gt; The total percentage of clicks we mark as invalid in our system is consistently in the single digits
&lt;/ul&gt;
&lt;P&gt;
We had a limited amount of time to cover a lot of ground, and of course, some miscommunication can result when discussing an issue of this complexity. Unfortunately, the most significant fact that seems to have been misrepresented is the one in the headline. Specifically, I never said that our click fraud rate is less than 2%. 
&lt;P&gt;
Instead, what I said is that the quantity of invalid clicks which we detect as a result of reactive investigations is a "negligible proportion" of the total number of invalid clicks. Andy asked me if that percentage is less than 2%. I told him that I was not able to provide a bound, but yes, "negligible" certainly means less than 2% of invalid clicks.
&lt;P&gt;
However, more significantly, this is quite different from saying that our "click fraud rate" is less than 2%. When we mark clicks as invalid because of suspected malicious activity, the vast majority of the time we do so proactively, and none of those cases are included in the reactive figure in question. We proactively discard a single-digit percentage of our revenue, primarily by filtering traffic before it impacts an advertiser's budget and, less significantly, through off-line banning of AdSense publishers, which leads to refunds to advertisers. The difference between proactive and reactive detection is the difference between the "attempted click fraud" caught by us and the click fraud which actually affects an advertiser in a way that requires their action to correct (by asking for an investigation). Obviously it is the second category which advertisers actually care about, and I think that is the spirit in which Andy wrote his headline.
&lt;P&gt;
So what is our overall "click fraud rate"? As noted in the diagram in the story, it is virtually impossible to know the intent of every click. However, we can do a very effective job using statistical techniques to detect potentially malicious behavior, and the total number of invalid clicks we detect &amp;ndash; whether for suspected malicious or non-malicious intent &amp;ndash; is in the single-digit percentages. So third-party estimates which say that click fraud is 15% or higher appear to be substantial exaggerations.
&lt;P&gt;
I gave Andy this feedback, and he was able to make a few updates and corrections, but unfortunately was not able to change the headline. With the aforementioned caveats in mind, I would invite everyone to read Andy's article, as it does provide a great overview of the basic structure of our systems and philosophies about fighting click fraud.&lt;img src="http://feeds.feedburner.com/~r/shumans/~4/U94_Na9PQEo" height="1" width="1"/&gt;</content><feedburner:origLink>http://shumans.com/articles/000044.php</feedburner:origLink></entry><entry><title>BlackJacks, Treos, and the Problems with Styli</title><link rel="alternate" type="text/html" href="http://feeds.shumans.com/~r/shumans/~3/eRhSyUaZofE/000043.php" /><author><name>Shuman Ghosemajumder</name></author><issued>2006-11-27T08:23:36-08:00</issued><modified>2006-11-27T08:23:36-08:00</modified><id>http://shumans.com/articles/000043.php</id><content type="text/html" mode="escaped">This past weekend I picked up a &lt;a href="http://www.samsungblackjack.com/"&gt;Samsung BlackJack&lt;/a&gt; to replace my Treo. The BlackJack promised a significantly slimmer form factor, faster connection speeds, and well, better style than the blocky, toy-like Treo. I wasn't disappointed in any of these areas, but the real surprise to me was the fact that I did not miss the Treo's stylus at all. A stylus seems like such a good idea, especially in a mobile device. It would be nearly impossible to construct a true mobile mouse and a decent-sized trackball would require additional physical space on a device, so the combination of a stylus and a touchscreen seems like a great way to allow a mobile device to function like a full-size computer. The problem is that this leads to suboptimal user interface design. The first aspect of this is that device operation becomes a two-handed activity &amp;ndash; one hand to hold the device, and the other hand to hold the stylus. 
This, as BlackBerry users know, is nowhere near as convenient as a side scrollwheel which can be operated by the thumb of the same hand that is holding the device.
&lt;P&gt;The second aspect of this is that the software begins to behave as though this is OK. In other words, menus do not have a well-designed tab order, and some functions are completely inaccessible except through the stylus. The net result is that using the Treo is frustrating, confusing, and worst of all, slow, while using the BlackJack is straightforward and fast.
&lt;P&gt;
With all the negative buzz I had heard about Windows Mobile Edition, I was prepared to hate it. However, it seems that Samsung has implemented it reasonably well and the one-handed accessibility of all functions makes the device significantly more usable for email as well as most other mobile applications. I am still waiting to discover more of its quirks over time, but so far it looks very, very good.&lt;img src="http://feeds.feedburner.com/~r/shumans/~4/eRhSyUaZofE" height="1" width="1"/&gt;</content><feedburner:origLink>http://shumans.com/articles/000043.php</feedburner:origLink></entry><entry><title>Welcome to YouTube</title><link rel="alternate" type="text/html" href="http://feeds.shumans.com/~r/shumans/~3/2OoinX3wIo4/000041.php" /><author><name>Shuman Ghosemajumder</name></author><issued>2006-11-14T12:11:17-08:00</issued><modified>2006-11-14T12:11:17-08:00</modified><id>http://shumans.com/articles/000041.php</id><content type="text/html" mode="escaped">Now that Google has completed its &lt;a href="http://investor.google.com/releases/20061114.html"&gt;acquisition of YouTube&lt;/a&gt;, I'd like to restart my blog and also join everyone in welcoming the YouTube folks. I've been a big fan and heavy user of YouTube since its launch. YouTube, along with Google Video, pioneered a new way to experience video content. This experience has a few defining attributes, including user-uploaded content, a search-based interface, a wide variety of short video clips, and most importantly, an embedded, fast-loading flash-based player which caches content locally but begins playing it instantly.
&lt;P&gt;Speed matters. One of the many reasons Windows Media Player and RealPlayer-based video services never became as popular as YouTube &amp;ndash; aside from the comparative dearth of varied, user-generated content &amp;ndash; was that the experience of viewing videos on those services was too spasmodic. Not only would each video take many seconds to start playing, but it would generally stall several times while playing &amp;ndash; as opposed to the smoother playback on YouTube. This wasn't accomplished by magic, since the bit rate of YouTube videos is clearly lower than that of many other services. But it turns out that providing an experience as close as possible to flipping channels is particularly well-suited to navigating short clips of user-generated content &amp;ndash; which often are not very high quality to begin with.
&lt;P&gt;
Of course, it was ultimately the user-generated content which was the most instrumental in building a community. The fact that uploading clips was extremely easy allowed YouTube to quickly become the biggest video site on the web. Its repository became a superset of others &amp;ndash; and thus it became the first place everyone would go to search for a given video. This was the network effect. In addition, the nature of the content was essentially an extension of the reality TV trend we've been seeing over the past several years.
&lt;P&gt;
Many folks have compared YouTube to America's Funniest Home Videos, and of course many videos fall into that category. But even more interesting are the videos made by YouTube users specifically for dissemination through YouTube. So it's not just about people watching people &amp;ndash; it's people watching people who want to be watched online. And they don't seem to have gotten tired of it yet.
&lt;P&gt;
Overall, it's been fascinating to watch YouTube become part of our culture in the last year, and I look forward to seeing how they'll continue to grow and evolve as part of Google.&lt;img src="http://feeds.feedburner.com/~r/shumans/~4/2OoinX3wIo4" height="1" width="1"/&gt;</content><feedburner:origLink>http://shumans.com/articles/000041.php</feedburner:origLink></entry><entry><title>Everybody Needs an iPod</title><link rel="alternate" type="text/html" href="http://feeds.shumans.com/~r/shumans/~3/jgtWBO9KVCs/000040.php" /><author><name>Shuman Ghosemajumder</name></author><issued>2004-10-31T21:53:26-08:00</issued><modified>2004-10-31T21:53:26-08:00</modified><id>http://shumans.com/articles/000040.php</id><content type="text/html" mode="escaped">As I opened the streamlined, immaculate white box in which it arrived last week, I felt like I must be the last person on Earth &amp;ndash; or at least in Silicon Valley &amp;ndash; to get an &lt;a href="http://www.amazon.com/exec/obidos/ASIN/B000A3WS84/scienceshuman-20"&gt;iPod&lt;/a&gt;.  Fortunately it was an iPod Mini, which is still considered the best-looking digital music player in the world today. Of course, I'd played with iPods before, but using one as my primary music player revealed some benefits &amp;ndash; and limitations &amp;ndash; of which I wasn't previously aware. The product is a marvel of design. Aside from its impressively compact profile and slick blue aluminum casing, its click wheel is what sets it apart from competitors from a hardware perspective. It produces a very different, and vastly superior, user experience. It operates much like a laptop touch pad and is certainly the fastest way to scroll through hundreds of tracks on a small screen while also providing enough precision to slow down and select the exact song you're looking for. It took me a little longer to become accustomed to using it to control the volume, but the simplicity of the interface was worth it.
&lt;P&gt;
Being able to hold 1000 tracks (vs. the 60 tracks that my 256MB flash player holds) dramatically changes when and how I use the iPod. It's not enough space to download my entire music collection, but it's enough for me to carry an extremely wide variety of music, and certainly enough for any trip. The sound quality and output power are also outstanding, and it was able to drive my Sony MDR-V6 headphones at high volumes without any problems. That was important, since the sound quality of the included earbuds &amp;ndash; while probably fine for listening to music outdoors &amp;ndash; didn't really satisfy my standards for home listening. Of course, using the larger Sonys consumed far more power, and revealed the only significant shortcoming I found &amp;ndash; that the battery life can be disappointingly short for a dedicated device.
&lt;P&gt;
Like the iPod overall, the iTunes for Windows software was excellent. It converted my WMA files into AAC format rapidly, synchronized my library with the iPod even faster, and provided a great interface for managing and annotating my collection. As an iPod user, I'm compelled to use iTunes to manage my music, but even if I wasn't, I would certainly consider it. Its capabilities greatly outstrip both Windows Media Player and WinAmp, and its interface is far simpler to use than programs like MusicMatch Jukebox. And it looks great too.
&lt;P&gt;
The biggest disappointment was the iTunes music download service. I have previously analyzed iTunes and other music download services from a number of angles and concluded that they cannot be successful solutions for mass audiences. I thought that my perspective might be different if I were an iPod user myself, and so I decided to give the service the benefit of the doubt. My opinion hasn't changed, and I'll be writing a report of my iTunes experience next.
&lt;P&gt;
But overall I was impressed and very happy with my new toy. And did I mention it looks great? I suspect this reason alone fuels the majority of iPod Mini sales. I'm amazed that Apple's competitors can produce devices which are comparable or superior on nearly every engineering metric, but fall down flat when it comes to making something aesthetically pleasing and easy to use. Has Apple hired all of the world's best interface designers, leaving other high tech companies with the design sensibilities of 1970's auto manufacturers? Competing products from Sony and Creative resemble a cassette player and a Fisher Price toy respectively.  And Dell's Digital Jukebox looks less like a competing product and more like an iPod storage container. Whatever their malfunction is, they need to rectify it soon, because Apple is rapidly becoming the only name in electronics that people associate with beautiful products.&lt;img src="http://feeds.feedburner.com/~r/shumans/~4/jgtWBO9KVCs" height="1" width="1"/&gt;</content><feedburner:origLink>http://shumans.com/articles/000040.php</feedburner:origLink></entry><entry><title>Good News for File Sharing?</title><link rel="alternate" type="text/html" href="http://feeds.shumans.com/~r/shumans/~3/0rVgkBo5_wI/000038.php" /><author><name>Shuman Ghosemajumder</name></author><issued>2004-08-03T03:55:22-07:00</issued><modified>2004-08-03T03:55:22-07:00</modified><id>http://shumans.com/articles/000038.php</id><content type="text/html" mode="escaped">A recent &lt;a href="http://hbswk.hbs.edu/item.jhtml?id=4206&amp;t=innovation"&gt;business school study&lt;/a&gt; has provided the first substantive proof that "Internet music piracy not only doesn't hurt legitimate CD sales, it may even boost sales of some types of music."  Or so it claims.  
The research, conducted by Felix Oberholzer-Gee of Harvard Business School and Koleman Strumpf of the University of North Carolina, does show that CD sales can withstand extensive contemporaneous file sharing, but it does not address the second order and long-term effects of free exchanges.  To be fair, the fault appears to lie more with how the study is being marketed than in its original intentions.
&lt;P&gt;The theoretical methodology of comparing online music sales with downloads is legitimate enough.  Having that data furthers the understanding of behavior of the aggregate of individuals who are continually faced with the choice between either buying a song or downloading it. When those individuals choose to download, CD sales decline, and vice-versa. In reality, however, it is nearly impossible to collect this data accurately.  The main problem is that there are too many file sharing systems, with uncertain levels of market share. In order to assemble a representative data set, each of the largest file sharing systems would need to be monitored using sources correcting for geographic biases, and the resulting data weighted according to share. In addition, the majority of these systems are very difficult to monitor accurately.  The study only collected data from two OpenNap servers, at a time (Fall 2002) when KaZaA was the leading file sharing network. While they address this issue of assembling a representative data set, their defense -- comparing their OpenNap data with P2P activity from Expand Networks and finding they correlate -- is not very convincing, since the geographic distribution of traffic running through 
each of these systems is never documented.  So the validity of the data collected is dubious. 

&lt;P&gt;But suppose that the data which could be collected using this method was, in fact, good enough. What would we know then?  This would only provide the number of tracks downloaded on the Internet.  The number of tracks which are downloaded once, and then burned onto CDs (which are copied) or passed around through private networks, is not captured in this metric at all.  And the greater the proportion of the world's back catalog that has already been downloaded into the private collections of file sharers, the less likely the overall file sharing population is to download those particular tracks.

&lt;P&gt;Of course, this would tend to indicate an underreporting of music piracy rates, which would make any resulting sales even more impressive. This seems paradoxical, as file sharing activity and sales would, prima facie, seem to be inversely proportional.  In this, the study agrees with other similar studies which found that users who download more often buy more music as well. The idea here is that file sharing stimulates societal interest in music, and thereby increases demand for it.

&lt;P&gt;So does this mean that file sharing has a neutral, or even positive, impact on music sales?  Of course not. The increased demand for music has resulted in almost unquantifiable levels of piracy and a negative compound annual growth rate for the CD industry (at least until Q1 '04). Online music stores are still only generating statistically irrelevant amounts of revenue, and the market is, as ever, in dire need of a legitimate, commercial file sharing architecture.  When such systems are launched, the industry will be able to rejuvenate itself, provided it hasn't damaged itself irreparably in the meantime. Oddly enough, research such as this may help ease irrational fear in the executive ranks of the industry and drive companies toward providing such a solution.&lt;img src="http://feeds.feedburner.com/~r/shumans/~4/0rVgkBo5_wI" height="1" width="1"/&gt;</content><feedburner:origLink>http://shumans.com/articles/000038.php</feedburner:origLink></entry><entry><title>On the Northeast Blackout</title><link rel="alternate" type="text/html" href="http://feeds.shumans.com/~r/shumans/~3/6ye9-JZdCl8/northeast-blackout.php" /><author><name>Shuman Ghosemajumder</name></author><issued>2003-08-15T22:17:18-07:00</issued><modified>2003-08-15T22:17:18-07:00</modified><id>http://shumans.com/articles/northeast-blackout.php</id><content type="text/html" mode="escaped">&lt;p&gt;After a month of inactivity, it is only fitting that this site comes back online after the biggest power failure in North American history. Actually, in terms of the total number of megawatts, I imagine this must have been the biggest power failure in the history of the world.  The past 24 hours saw midtown Manhattan &amp;ndash; where all power was restored only at 8pm on Friday night  &amp;ndash; beginning to resemble a scene from &lt;i&gt;28 Days Later&lt;/i&gt;. I was there the day before at 4:15PM when the lights went out. 
Instantly, word spread that the UN  &amp;ndash; only a few blocks away  &amp;ndash; had also lost power. This made people understandably uneasy. Some mobile phones were working while others were not. Those with working phones started sharing that the entire city, as well as neighboring states, had also lost power. Since it was daytime, there was no sense of panic, although it was around then that traffic seemed to stop moving at the midtown tunnel. As darkness fell, people congregated at friends' apartments, took walks with flashlights, and seemed to make the most of it.&lt;/p&gt;

&lt;p&gt;The next day, I noticed people starting to buy up the unrefrigerated inventory of local delis. Power had been restored north of 40th street, and there was a 30 minute lineup at the McDonald's which extended well beyond the restaurant entrance. As all power in the city was restored before darkness finally fell, the uneasiness people were beginning to feel started to dissipate as everyone realized the situation had gone from being a potential crisis to a mere inconvenience.&lt;/p&gt;

&lt;p&gt;&lt;B&gt;A High Profile Software Failure&lt;/B&gt;&lt;br /&gt;
It is remarkable that even now authorities seem to have no idea how this could have happened. Explanations in the media have been very simplistic &amp;ndash; that the failure of one station instantly offloaded demand to another station which was unable to compensate, so it too failed, and the sequence was repeated across the Northeastern United States, Ontario and Quebec.  You hear that story and think "Gee, why didn't anyone think of that?"  Of course, they did, and electric companies have a century of experience distributing power loads amongst disparate facilities very successfully.&lt;/p&gt;

&lt;p&gt;Many are speculating on whether this could have been the work of a malicious hacker or terrorist group.  While such a theory drastically overestimates the hacking capabilities of terrorists as well as the maliciousness of hackers, it isn't impossible.  But it seems unlikely, considering the number of systems which such a group would need undetected access to and the sophistication of the hypothetical exploit.  If this is a hack, insiders with very high levels of access would need to be involved.  In the current political climate, that seems very unlikely -- and it seems unlikelier still that anyone would go to the trouble, succeed, and then not claim responsibility immediately.&lt;/p&gt;

&lt;p&gt;So what went wrong?  While the original cause of one plant going down could have been anything, the successive cascading blackouts point directly to a computer software failure.  The specific time and manner in which the problem originated created a set of circumstances that the software designers had not anticipated. Safety is the paramount concern at power plants  &amp;ndash; especially nuclear facilities &amp;ndash; so an automatic shutdown can be triggered by anything the software identifies as an unsafe scenario. The trick is to successfully identify all problems which do not require shutdowns and implement solutions instantaneously.&lt;/p&gt;

&lt;p&gt;The implications of the power grid failure highlight not only the frailty of the national infrastructure, but also the fact that we're at the mercy of our computer software for sustaining our way of life. Movies often give science fiction scenarios of machines running amok and turning against humankind.  In the real world, while the software controlling our essential services could not (really) act maliciously, its failure can sometimes have the same effect as an AI robot trying to harm us.&lt;/p&gt;

&lt;p&gt;Just as each generation of software is more powerful than the last, it also increases in complexity. The number of unknown scenarios for which the systems are untested will only increase.  At the same time, knowledge of the fundamental vulnerabilities of computer systems is the only real defense against both malicious attacks and unintentional failures. I only hope that the government does not create a bill requiring the software industry to take on financial responsibility for the effects of its products failing. Software, and software development, are getting better. I think a high profile failure like this will only serve to reinforce the importance of well-designed and thoroughly tested software in mission critical systems.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/shumans/~4/6ye9-JZdCl8" height="1" width="1"/&gt;</content><feedburner:origLink>http://shumans.com/articles/northeast-blackout.php</feedburner:origLink></entry></feed>
