<?xml version="1.0"?>

<rdf:RDF 
  xmlns="http://purl.org/rss/1.0/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
>

<channel rdf:about="http://simon.incutio.com/syndicate/blogging/rss1.0">
  <title>Blogging</title>
  <link>http://simon.incutio.com/</link>
  <description>Simon Willison's Blogging category</description>
  <language>en-uk</language>
  <webMaster>simon@incutio.com</webMaster>
  <items>
    <rdf:Seq>
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2006/12/15/moving" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2004/12/02/delicious" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2004/10/29/keeping" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2004/08/26/milestone" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2004/08/23/snarky" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2004/05/11/approved" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2004/04/05/blogsAsFilters" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2004/04/05/aPositiveEntryAboutMicrosoft" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2004/03/27/omit" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2004/03/21/avoiding" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2004/03/06/ghostTown" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2004/03/05/attribution" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2004/02/06/dangers" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2004/01/28/solvingCommentSpam" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2004/01/21/mtPageRankKiller" />
    </rdf:Seq>
  </items>
</channel>

<item rdf:about="http://simon.incutio.com/archive/2006/12/15/moving">
  <title>New weblog, new location</title>
  <description>&lt;p&gt;I've just launched my new weblog over at &lt;a href=&quot;http://simonwillison.net/&quot;&gt;simonwillison.net&lt;/a&gt;. I will no longer be updating simon.incutio.com, and will be putting redirects in place for old content over the next few days. You can read more about the new site (powered by &lt;a href=&quot;http://www.djangoproject.com/&quot;&gt;Django&lt;/a&gt;, naturally) in &lt;a href=&quot;http://simonwillison.net/2006/Dec/15/upgrade/&quot;&gt;this entry&lt;/a&gt;.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2006/12/15/moving</link>
  <dc:subject>Blogging</dc:subject>
  <dc:date>2006-12-15T14:47:06-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2004/12/02/delicious">
  <title>Blogmarks on del.icio.us</title>
  <description>&lt;p id=&quot;p-0&quot;&gt;I'm horribly ill again: having defeated the mumps I now seem to have come down with some kind of 'flu thing. Lovely. In between whinging about my state of health and watching episodes of Frasier I've been playing with &lt;a href=&quot;http://del.icio.us/&quot;&gt;del.icio.us&lt;/a&gt; as part of my research in to web annotation. The connection between the two isn't particularly strong but it's clear that something very exciting is happening over there.&lt;/p&gt;

&lt;p id=&quot;p-1&quot;&gt;This evening I wrote a script to import &lt;a href=&quot;http://simon.incutio.com/blogmarks/&quot;&gt;my blogmarks&lt;/a&gt; in to del.icio.us. I don't plan to replace them with a feed from the site for a couple of reasons: firstly, I like to keep my data somewhere I control and secondly, del.icio.us doesn't support my &quot;via&quot; fields. I will however be adding tag support to my blogmarks and some kind of functionality to ensure that anything I post to them is added to del.icio.us as well.&lt;/p&gt;

&lt;p id=&quot;p-2&quot;&gt;The problem I have now is that I've added &lt;a href=&quot;http://del.icio.us/simonw/blogmarks&quot;&gt;nearly 1200 untagged links&lt;/a&gt; to del.icio.us, and anyone who has played with the service for more than a few minutes knows that it's the tags that make it so much fun. Does anyone know of a good tool for bulk-tagging of items in del.icio.us? I've tried &lt;a href=&quot;http://www.scifihifi.com/cocoalicious/&quot;&gt;Cocoal.icio.us&lt;/a&gt; but unfortunately it only lets you assign tags to items one at a time; what I really want to do is run full text searches against my del.icio.us links and mass-apply tags to dozens of items at a time. If one doesn't exist I can always knock up a custom tool with the &lt;a href=&quot;http://del.icio.us/doc/api&quot;&gt;lovely API&lt;/a&gt; but I'd rather not duplicate the effort if I don't have to.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2004/12/02/delicious</link>
  <dc:subject>Blogging, Information Architecture</dc:subject>
  <dc:date>2004-12-02T00:13:22-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2004/10/29/keeping">
  <title>Keeping up appearances</title>
  <description>&lt;p id=&quot;p-0&quot;&gt;Wow, I think this is the longest gap in my blogging since I started! I wish I could say I've been enjoying the sunshine or &lt;a href=&quot;http://diveintomark.org/archives/2004/10/18/exit&quot; title=&quot;Every exit&quot;&gt;taking up a new hobby&lt;/a&gt;, but the truth is that the weather's been horrible and I've just been run off my feet readjusting to life in England and at University.&lt;/p&gt;

&lt;p id=&quot;p-1&quot;&gt;One thing I haven't been doing is reading blogs. My aggregator (currently the excellent &lt;a href=&quot;http://www.newsfirerss.com/&quot;&gt;NewsFire&lt;/a&gt;, although &lt;a href=&quot;http://ranchero.com/netnewswire/whatsnew/netnewswire20.php&quot; title=&quot;What's New in NetNewsWire 2.0&quot;&gt;NetNewsWire 2.0&lt;/a&gt; could easily steal my affections) has been lying dormant, and aside from occasionally checking a few sites (congratulations Matt &lt;a href=&quot;http://photomatt.net/2004/10/28/press-and-cnet/&quot; title=&quot;Houston Press and CNET&quot;&gt;on the new job&lt;/a&gt;!) I've been reading more academic papers than weblogs. My final year project is tentatively titled &quot;Collaborative annotation of web resources&quot; and looks set to take up a big chunk of my time over the next six months or so.&lt;/p&gt;

&lt;p id=&quot;p-2&quot;&gt;I certainly miss the information flood of blogging, but there's something very liberating about dipping in every now and then rather than following several hundred constant streams of consciousness, &lt;a href=&quot;http://www.kunal.org/scoble/&quot; title=&quot;Robert Scoble's link blog&quot;&gt;Scoble style&lt;/a&gt;. I guess you could say I've been re-evaluating my priorities. I'll certainly be scaling back some commitments in the near future, though which ones and by how much I have yet to decide.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2004/10/29/keeping</link>
  <dc:subject>Blogging, Uni Life, Personal</dc:subject>
  <dc:date>2004-10-29T11:41:39-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2004/08/26/milestone">
  <title>1000th Blogmark</title>
  <description>&lt;p id=&quot;p-0&quot;&gt;I just posted my 1000th &lt;a href=&quot;http://simon.incutio.com/blogmarks/&quot;&gt;blogmark&lt;/a&gt;. I can't emphasize enough how much of an impact this &lt;a href=&quot;http://simon.incutio.com/archive/2003/11/24/blogmarks&quot;&gt;15 minute hack&lt;/a&gt; has had on both my browsing and my blogging habits. While I still tend to leave browser windows open for days at a time, I now at least have a procedure for getting rid of the ones that still interest me. More importantly, having blogmarks has eliminated the temptation to write a full blog entry (with quotation) just to share a link. This has dramatically reduced my posting rate, but has meant that when I do post an entry I usually have something moderately interesting to say.&lt;/p&gt;

&lt;p id=&quot;p-1&quot;&gt;To celebrate this personal milestone, I've linked up the rudimentary LIKE query search engine I've been using for a while on the blogmarks index page. My long term aim is still to integrate them with my main content and add comments in the style of &lt;a href=&quot;http://photomatt.net/&quot;&gt;photomatt&lt;/a&gt;, but that would require more time spent hacking on my blogging system (or switching to &lt;a href=&quot;http://wordpress.org/&quot;&gt;WordPress&lt;/a&gt;) than I have to spend right now.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2004/08/26/milestone</link>
  <dc:subject>Blogging</dc:subject>
  <dc:date>2004-08-26T00:30:54-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2004/08/23/snarky">
  <title>A snarky note from the administrator</title>
  <description>&lt;p id=&quot;p-0&quot;&gt;No, you &lt;a href=&quot;http://simon.incutio.com/archive/2004/06/18/invites&quot;&gt;can't have a Gmail invite&lt;/a&gt;. No, I &lt;a href=&quot;http://simon.incutio.com/archive/2002/12/05/rememberingPasswords&quot;&gt;won't hack your email account&lt;/a&gt; for you. And if you &lt;a href=&quot;http://simon.incutio.com/archive/2003/12/02/gotmail&quot;&gt;can't find your hotmail inbox&lt;/a&gt;, you shouldn't be using a computer.&lt;/p&gt;

&lt;p id=&quot;p-1&quot;&gt;Semantic &lt;acronym title=&quot;HyperText Markup Language&quot;&gt;HTML&lt;/acronym&gt; is a two-edged sword.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2004/08/23/snarky</link>
  <dc:subject>Blogging</dc:subject>
  <dc:date>2004-08-23T14:59:35-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2004/05/11/approved">
  <title>Google approved PageRank stripping</title>
  <description>&lt;p&gt;Blogger are now using the redirect-without-PageRank technique to &lt;a href=&quot;http://help.blogger.com/bin/answer.py?answer=808&quot; title=&quot;Why do links on blogger.com (and in comments) redirect through google.com and blogger.com?&quot;&gt;protect their hosted blogs against comment spam&lt;/a&gt; (also &lt;a href=&quot;http://www.movabletype.org/news/2004_01.shtml#000882&quot; title=&quot; Version 2.66 Released&quot;&gt;used by Movable Type&lt;/a&gt;). At the risk of sounding incredibly pleased with myself (which I am), &lt;a href=&quot;http://simon.incutio.com/archive/2003/10/13/linkRedirects&quot; title=&quot;New anti-comment-spam measure&quot;&gt;that was my idea&lt;/a&gt;! Sweet.&lt;/p&gt;

&lt;p&gt;Here's the real kicker: the &lt;acronym title=&quot;Universal Republic of Love&quot;&gt;URL&lt;/acronym&gt; redirector they are using is hosted on Google's primary domain. This is great news for people like myself who are running their own redirector, as the problem with having a redirect on your site is that it can be abused to make it look like people are visiting a specific site from a link on your domain: or even worse, to trick people in to visiting an unpleasant link (see &lt;a href=&quot;http://slashdot.org/&quot;&gt;Slashdot&lt;/a&gt; comments, where links are displayed alongside the domain on which the &quot;real&quot; site is hosted). Now I can point my own PageRank stripper at &lt;samp&gt;http://www.google.com/url?sa=D&amp;amp;q=URL&lt;/samp&gt; and let Google handle the redirects for me. Lovely.&lt;/p&gt;

&lt;p&gt;Peter van Dijck recently &lt;a href=&quot;http://poorbuthappy.com/ease/archives/002901.html&quot;&gt;joined&lt;/a&gt; the ranks of victims of wiki spam. Let's start rolling this technique out on Wikis as well. The trade-off in lost PageRank for linked sites is more than worth it.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2004/05/11/approved</link>
  <dc:subject>Blogging</dc:subject>
  <dc:date>2004-05-11T07:56:08-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2004/04/05/blogsAsFilters">
  <title>Personalisation? We've already got it</title>
  <description>&lt;p&gt;Vin Crosbie, a highly respected commentator on the online news industry, recently published his long awaited essay &lt;a href=&quot;http://www.ojr.org/ojr/business/1078349998.php&quot;&gt;What Newspapers and Their Web Sites Must Do to Survive&lt;/a&gt;. It's long but captivating and well researched; if you have any interest in the role of traditional newspapers on the web you should take the time to read it.&lt;/p&gt;

&lt;p&gt;Vin believes that customisation of both online and offline editions to serve readers' individual interests is critical to the survival of big-J media. He makes an excellent case; the problem is we are already there.&lt;/p&gt;

&lt;p&gt;I would guess that at least 90% of my news intake comes from reading blogs - the 130 blogs you can see listed in the sidebar on my front page. That's not to say that I only read 130 sites - blogging is about linking, and those 130 sites in turn link me out to a huge network of news sources around the web. The blogs I read work as the ultimate personalised filtering mechanism: I read them because their authors have similar interests to me, and are the people most likely to direct me to content that I will personally find interesting.&lt;/p&gt;

&lt;p&gt;No computerised system could possibly compete with 130 hand-picked human editors, working around the clock to channel interesting information in my direction. Blogs are conversations, but they are also filters. Never before in my life have I had to invest so little effort in finding so much diverse  and fulfilling content. As a certified infovore I don't know how I survived without them.&lt;/p&gt;

&lt;p&gt;What this means for traditional news media is anyone's guess. It's certainly not going to die out: someone has to collect the news and there's only so much unpaid bloggers can do in that regard. We certainly live in interesting times.&lt;/p&gt;

&lt;p&gt;As an aside, if you're interested in online media you should read Steve Yelvington's essay &lt;a href=&quot;http://www.yelvington.com/item.php?id=404&quot;&gt;Ten years in new media: Looking back, looking forward&lt;/a&gt;, which includes a call to arms to the online news industry to get over the novelty of new technology and finally start taking advantage of it.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2004/04/05/blogsAsFilters</link>
  <dc:subject>Blogging, Online News</dc:subject>
  <dc:date>2004-04-05T01:20:41-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2004/04/05/aPositiveEntryAboutMicrosoft">
  <title>Microsoft "get" blogging</title>
  <description>&lt;p&gt;Who would have thought a year ago that Microsoft would be the company that took corporate blogging to the next level? Say what you like about the company itself, you can't fault the quality and quantity of bloggers coming out of Redmond at the moment. Yesterday I stumbled across this &lt;a href=&quot;http://blogs.msdn.com/jobsblog/&quot; title=&quot;Technical Careers @ Microsoft&quot;&gt;fascinating blog&lt;/a&gt; that provides an insight in to Microsoft's recruitment techniques. If you're looking for a job at a high-tech company you can't afford &lt;em&gt;not&lt;/em&gt; to read this - they already have &lt;a href=&quot;http://weblogs.asp.net/jobsblog/category/4364.aspx&quot; title=&quot;Resume Tips&quot;&gt;a bunch of resume advice&lt;/a&gt;, tips on &lt;a href=&quot;http://blogs.msdn.com/jobsblog/archive/2004/04/01/105933.aspx&quot; title=&quot;What to wear ...aka Recruiter eye for the Tech guy&quot;&gt;what to wear to interviews&lt;/a&gt; and posts on subjects such as employee referrals, international recruiting, phone screening and more.&lt;/p&gt;

&lt;p&gt;The authors are two Microsoft recruiters and display an incredible enthusiasm for their work, which is probably what makes the blog such a great read. It's fun to compare their more recent entries with &lt;a href=&quot;http://weblogs.asp.net/jobsblog/archive/2004/03/12/88834.aspx&quot; title=&quot;Welcome to the &amp;quot;Technical Careers @ Microsoft&amp;quot; blog!&quot;&gt;their opening entry&lt;/a&gt;, which Robert Scoble rightly &lt;a href=&quot;http://radio.weblogs.com/0001011/2004/03/12.html#a6970&quot; title=&quot;I'm too arrogant, friend says, and new Microsoft bloggers pop up&quot;&gt;criticised&lt;/a&gt; as being far too corporate. The great tragedy of the official &lt;a href=&quot;http://www.georgewbush.com/blog/&quot; title=&quot;GeorgeWBush.com :: Official Blog&quot;&gt;Bush&lt;/a&gt; and &lt;a href=&quot;http://blog.johnkerry.com/&quot; title=&quot;John Kerry for President Blog&quot;&gt;Kerry&lt;/a&gt; blogs is that they frequently read like glorified press releases. One of the core messages of the &lt;a href=&quot;http://www.cluetrain.com/&quot;&gt;Cluetrain Manifesto&lt;/a&gt; is that people want to be communicated with in a human voice. I don't know if the Microsoft recruiters have read it, but they've certainly taken that core value to heart.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2004/04/05/aPositiveEntryAboutMicrosoft</link>
  <dc:subject>Blogging</dc:subject>
  <dc:date>2004-04-05T00:11:06-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2004/03/27/omit">
  <title>Omit needless words, codified</title>
  <description>&lt;p&gt;I continue to try to improve my writing. &quot;Omit needless words&quot; is all well and good, but identifying needless words can be a difficult task for the untrained eye. Paul Ford's &lt;a href=&quot;http://www.ftrain.com/ThePassivator.html&quot;&gt;Passivator&lt;/a&gt; bookmarklet highlights adverbs and passive verbs, both of which can indicate weak writing.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2004/03/27/omit</link>
  <dc:subject>Blogging, DHTML and Javascript</dc:subject>
  <dc:date>2004-03-27T22:06:00-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2004/03/21/avoiding">
  <title>Avoiding protracted debates</title>
  <description>&lt;p&gt;I love &lt;a href=&quot;http://fishbowl.pastiche.org/&quot;&gt;Charles Miller's Fishbowl&lt;/a&gt;. His latest entry introduces his &lt;a href=&quot;http://fishbowl.pastiche.org/2004/03/21/charles_rules_of_argument&quot;&gt;rules for argument&lt;/a&gt;. Read them, follow them and save a truck-load of time avoiding protracted debates in the future. Heck, if everyone stuck to them the overall productivity of the internet would probably increase by a factor of ten.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2004/03/21/avoiding</link>
  <dc:subject>Blogging</dc:subject>
  <dc:date>2004-03-21T04:48:25-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2004/03/06/ghostTown">
  <title>Ghost town, sponsored by Google</title>
  <description>&lt;p&gt;Via &lt;a href=&quot;http://boingboing.net/2004_03_01_archive.html#107852164183349516&quot; title=&quot;Photoblogging Chernobyl &quot;&gt;Boing Boing&lt;/a&gt;, this &lt;a href=&quot;http://www.angelfire.com/extreme4/kiddofspeed/page2.html&quot;&gt;fascinating and utterly chilling&lt;/a&gt; photographic journey through the abandoned ruins of the Chernobyl dead zone.&lt;/p&gt;

&lt;p&gt;As an aside, the free hosting provider used by the site appears to be inserting Google ads. Make sure you look out for them as you explore the site; the relevance algorithm gets stretched to the limit.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2004/03/06/ghostTown</link>
  <dc:subject>Blogging, Google</dc:subject>
  <dc:date>2004-03-06T00:30:00-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2004/03/05/attribution">
  <title>Attribution</title>
  <description>&lt;p&gt;Via &lt;a href=&quot;http://fury.com/article/1966.php&quot; title=&quot;Memegraphing&quot;&gt;Kevin Fox&lt;/a&gt;, Wired are running &lt;a href=&quot;http://www.wired.com/news/culture/0,1284,62537,00.html&quot;&gt;an article&lt;/a&gt; that claims that &lt;q cite=&quot;http://fury.com/article/1966.php&quot;&gt;authors of popular blog sites regularly borrow topics from lesser-known bloggers -- and they often do so without attribution&lt;/q&gt;.&lt;/p&gt;

&lt;p&gt;The first part - that bloggers borrow topics from each other - isn't a new observation. Link commentary is one of the classic forms of blogging, right up there with filling in quizzes and posting pictures of your cat. What's worrying is the lack of attribution.&lt;/p&gt;

&lt;p&gt;Attribution is a critical part of good blogging etiquette. It provides &quot;discovery credit&quot; to the blogger who directed you to something, but more importantly it acts as a tool for communication. It's easy to tell when someone is linking to you, via referral logs or services such as &lt;a href=&quot;http://www.technorati.com/&quot;&gt;Technorati&lt;/a&gt;. By attributing something to another blogger you make two useful statements: I read your blog (or at least stumbled across it somehow), and I'm interested in that particular entry. This is valuable feedback.&lt;/p&gt;

&lt;p&gt;My favoured attribution method is the &quot;via&quot; link, as demonstrated at the start of this entry; I even use it in my blogmarks. One problem that I used to have with attributing interesting links, &lt;a href=&quot;http://blog.meriwilliams.com/archives/000084.html&quot; title=&quot;Kids with No Friends and Other Everyday Things&quot;&gt;described here&lt;/a&gt; by Meri, is that when you browse with multiple tabs or browser windows it's easy to lose track of how you got to a certain page thanks to a &quot;broken&quot; back button. Thankfully there's a simple solution to this: the &lt;a href=&quot;javascript:void(prompt('Referer:', document.referrer));&quot;&gt;show referrer&lt;/a&gt; bookmarklet (adapted from &lt;a href=&quot;http://www.squarefree.com/bookmarklets/misc.html&quot; title=&quot;go to referrer&quot;&gt;a similar bookmarklet&lt;/a&gt; by Jesse Ruderman) which shows the page that referred you to the current page in an easily copy-and-pastable Javascript prompt.&lt;/p&gt;

&lt;p&gt;This still doesn't help you if you clicked on a link in an aggregator such as NetNewsWire, but I've started habitually opening the real entry in a browser before clicking any links to ensure I maintain referral information. This is particularly important as the bookmarklet I use for updating my blogroll grabs the referrer for automatic &quot;via&quot; link generation.
&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2004/03/05/attribution</link>
  <dc:subject>Blogging</dc:subject>
  <dc:date>2004-03-05T23:13:12-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2004/02/06/dangers">
  <title>The dangers of PageRank</title>
  <description>&lt;p&gt;A well documented side effect of the weblog format is that it brings Google PageRank in almost absurd quantities. I'm now the 5th result for &lt;a href=&quot;http://www.google.com/search?q=simon&quot; title=&quot;Google Search: simon&quot;&gt;simon&lt;/a&gt; on Google, and I've been the top result for &lt;a href=&quot;http://www.google.com/search?q=simon+willison&quot;&gt;simon willison&lt;/a&gt; almost since the day I launched. High rankings however are not always a good thing, especially when combined with a comment system. A growing number of bloggers have found themselves at the top position for terms of little or no relevance to the rest of their sites, which in turn can attract truly surreal comments from visitors from search engines who may never have encountered a blog before.&lt;/p&gt;

&lt;p&gt;I know of a couple of entries on my own blog that are attracting this kind of traffic. The most interesting is probably &lt;a href=&quot;http://simon.incutio.com/archive/2003/08/13/artificialDiamonds&quot;&gt;this entry&lt;/a&gt; on &lt;a href=&quot;http://www.google.com/search?q=artificial+diamonds&quot; title=&quot;Google Search: artificial diamonds&quot;&gt;artificial diamonds&lt;/a&gt;, which has attracted comments from both buyers and sellers of artificial gems. My &lt;a href=&quot;http://simon.incutio.com/archive/2002/12/09/badInterfaceDesignFromMicrosof&quot;&gt;entry&lt;/a&gt; on MSN messenger usability problems from 2002 has drawn a steady stream of hilarious comments, no doubt caused in part by its top rating on Google for &lt;a href=&quot;http://www.google.com/search?q=msn+messenger+sucks&quot; title=&quot;Google Search: msn messenger sucks&quot;&gt;msn messenger sucks&lt;/a&gt;. Amusingly, for a long time &lt;a href=&quot;http://search.msn.com/&quot;&gt;Microsoft's own search engine&lt;/a&gt; was giving my page a high rank for a wide variety of less negative messenger related terms.&lt;/p&gt;

&lt;p&gt;My own experiences of this phenomenon pale into insignificance next to some of the others I've seen. The most impressive example has to be Jason Kottke's &lt;a href=&quot;http://www.kottke.org/03/05/the-matrix-reloaded&quot;&gt;brief review&lt;/a&gt; of the Matrix Reloaded, which drew over 900 comments from Google strays, developed its own micro-community and resulted in Jason pondering &lt;a href=&quot;http://www.kottke.org/03/06/own-conversation&quot;&gt;who owns the conversation on my web site?&lt;/a&gt; Jason eventually decided to close and archive the thread after the page grew to more than a megabyte in size.&lt;/p&gt;

&lt;p&gt;The problem can take on a far more disturbing twist. I won't link directly to these entries for fear of adding to their predicaments, but searches for &lt;a href=&quot;http://www.google.com/search?q=crime+scene+cleanup&quot; title=&quot;Google Search: crime scene cleanup&quot;&gt;crime scene cleanup&lt;/a&gt; and &lt;a href=&quot;http://www.google.com/search?q=suicide+chat+rooms&quot; title=&quot;Google Search: suicide chat rooms&quot;&gt;suicide chat rooms&lt;/a&gt; both return blogs in the first two results. The former thread is mostly crime scene cleanup companies marketing their services, but the latter is quite frankly disturbing. It's certainly led me to double check the titles of my entries before posting them.&lt;/p&gt;

&lt;p&gt;Thankfully, avoiding this kind of unwanted comment traffic is pretty simple. One way is to simply disable comments for entries older than a certain time (generally a couple of weeks), although personally I like to see the occasional comment on old entries. A neater solution proposed by Russell Beattie last year is to simply &lt;a href=&quot;http://www.beattie.info/notebook/1003990.html&quot; title=&quot;Googler Comments&quot;&gt;hide comments from search engine referrals&lt;/a&gt;, thus ensuring that random strays won't leave their mark without understanding the nature of your site first.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2004/02/06/dangers</link>
  <dc:subject>Blogging, Google</dc:subject>
  <dc:date>2004-02-06T16:58:23-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2004/01/28/solvingCommentSpam">
  <title>Solving comment spam</title>
  <description>&lt;p&gt;There are two main schools of thought concerning comment spam: the optimists and the defeatists. Optimists believe that comment spam can be beaten with technology; defeatists (maybe I should call them pessimists) believe that comments are as doomed as email and we're all &lt;a href=&quot;http://diveintomark.org/archives/2003/11/15/more-spam&quot; title=&quot;dive into mark: Weblog spam&quot;&gt;going to hell in a hand basket&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;The story so far&lt;/h4&gt;

&lt;p&gt;I fall squarely in to the techno-optimist category. Back in September I started &lt;a href=&quot;http://simon.incutio.com/archive/2003/09/02/blacklisting&quot; title=&quot;Blacklisting Comment Spam&quot;&gt;blacklisting domains&lt;/a&gt; linked to from spam comments, defending against return visits from spammers and allowing others to syndicate my block list to run on their own site. Then in October I tweaked my comment system to &lt;a href=&quot;http://simon.incutio.com/archive/2003/10/13/linkRedirects&quot; title=&quot;New anti-comment-spam measure&quot;&gt;eliminate PageRank&lt;/a&gt; from links in comments, making spamming for search engine optimisation a futile exercise. Of course, this measure only works if spammers realise it's there (I know &lt;a href=&quot;http://msittig.blogspot.com/2003_11_01_msittig_archive.html#106981977439323929&quot; title=&quot;and they called ME a greedy bastard!&quot;&gt;at least one has&lt;/a&gt;) which is why I'm personally very happy to see that the latest release of Movable Type has &lt;a href=&quot;http://simon.incutio.com/archive/2004/01/21/mtPageRankKiller&quot; title=&quot;Moveable Type now kills PageRank on comment links&quot;&gt;adopted the technique&lt;/a&gt; - to mixed reviews from the &lt;acronym title=&quot;Movable Type&quot;&gt;MT&lt;/acronym&gt; community.&lt;/p&gt;

&lt;p&gt;There have been a whole bunch of other technological innovations over the past few months. Sam Ruby has &lt;a href=&quot;http://www.intertwingly.net/blog/1647.html&quot; title=&quot;Comment Throttle&quot;&gt;implemented throttling&lt;/a&gt; to ban people who post three consecutive comments, and has some great ideas about &lt;a href=&quot;http://www.intertwingly.net/blog/1699.html&quot;&gt;guarding against strangers&lt;/a&gt;. Jay Allen's &lt;a href=&quot;http://www.jayallen.org/projects/mt-blacklist/&quot;&gt;MT-Blacklist&lt;/a&gt; makes the blacklisting concept available to a wide audience. Meanwhile, James Seng's &lt;a href=&quot;http://james.seng.cc/archives/000152.html&quot; title=&quot;Bayesian filter for MT&quot;&gt;MT-Bayesian&lt;/a&gt; introduces trainable spam filters adapted from the fight against email spam.&lt;/p&gt;

&lt;h4&gt;The challenges ahead&lt;/h4&gt;

&lt;p&gt;So those are the solutions so far; the critical question is whether they work. The amount of spam I've been getting has definitely decreased, but as I run a completely custom blogging system I'm safe from the automated scripts that target more widespread systems - other sites make easier targets. Now that the less ethical search engine optimisers have started to catch on to the potential of comment spam to improve their PageRank the amount of spam can only increase. Some bloggers have already &lt;a href=&quot;http://www.simplebits.com/archives/2004/01/22/comments_are_down.html&quot; title=&quot;Comments are Down&quot;&gt;started to disable comments entirely&lt;/a&gt; (thankfully Dan turned them back on again shortly afterwards), setting a worrying precedent for the elimination of the two-way interactions comments allow between bloggers and non-bloggers.&lt;/p&gt;

&lt;p&gt;I'll put it in writing now: I will never disable comments on this blog. In the past few months the comments here have proved far more interesting and valuable than my actual posts, and I really appreciate the quality of the discussions that have arisen here. I will take whatever steps are necessary to keep this a useful environment for discussion.&lt;/p&gt;

&lt;p&gt;Many people have hailed user registration as the ultimate solution to spam. It isn't, because the value of PageRank is just too high - and writing a script to automatically create accounts (even with email confirmation required) is child's play to anyone who is competent in an internet-aware scripting language. Even accessibility-impeding &lt;a href=&quot;http://www.captcha.net/&quot;&gt;captchas&lt;/a&gt; are no defence against spammers who can afford to employ cheap labour to defeat them - and with search engine rankings as critical as they are there's no shortage of spam dollars.&lt;/p&gt;

&lt;p&gt;With those ruled out, let's look at the remaining solutions:&lt;/p&gt;

&lt;h4&gt;The killer&lt;/h4&gt;

&lt;p&gt;Without links, comment spam has no purpose. To eliminate spam, eliminate links. Redirecting them through a PageRank killer already achieves this, but proves too subtle for spammers intent on spreading their links as widely as they can. To truly eliminate spam, strip out links and anything that even &lt;em&gt;looks&lt;/em&gt; like a &lt;acronym title=&quot;Universal Republic of Love&quot;&gt;URL&lt;/acronym&gt; and force the spammer to preview their carefully crafted advertisement before hitting submit. Seeing as hyperlinks are the single most important feature of the web this may seem draconian - and indeed it is. But on a site that serves more as a discussion forum than a link farm, and where the alternative to killing links is killing comments entirely, this could be the saving factor.&lt;/p&gt;
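
&lt;p&gt;As a rough illustration of stripping anything URL-like, here is a minimal Python sketch. It assumes the comment has already been reduced to plain text; the pattern, the short list of top-level domains and the function names are hypothetical choices for this example, not a description of what this site actually runs.&lt;/p&gt;

```python
import re

# Hypothetical pattern: anything with a scheme or a www. prefix, plus bare
# domains ending in a few common top-level domains (an illustrative list).
URL_LIKE = re.compile(
    r"(?:https?://|www\.)\S+"
    r"|\b[\w-]+(?:\.[\w-]+)*\.(?:com|net|org|info|biz)\b\S*",
    re.IGNORECASE,
)


def strip_links(text, placeholder="[link removed]"):
    """Replace every URL-like run of characters in plain text with a placeholder."""
    return URL_LIKE.sub(placeholder, text)


def looks_like_spam_bait(text):
    """Return True if the text contains something that looks like a URL."""
    return URL_LIKE.search(text) is not None


# Prints: Cheap pills at [link removed] today
print(strip_links("Cheap pills at http://spam.example.com/buy today"))
```

&lt;p&gt;Running the stripped text through the compulsory preview would show the spammer exactly what survived. A pattern this blunt will probably catch the odd innocent filename too, which is part of what makes the approach draconian.&lt;/p&gt;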

&lt;p&gt;For most blogs, however, links are an essential part of the discourse - I certainly wouldn't want to disable them here. Not only do they add huge value to the discussions, but more importantly they act as a &quot;signature&quot; for many commenters - knowing a comment is by &quot;Dan&quot; is far less useful than knowing that it's by Dan from &lt;a href=&quot;http://www.simplebits.com/&quot;&gt;www.simplebits.com&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;Finding a compromise&lt;/h4&gt;

&lt;p&gt;Draconian measures such as the above wouldn't be necessary if spammers would wise up to the fact that their carefully crafted missives were having no effect on their precious PageRank. The real challenge then is to make anti-PageRank measures obvious to even the most brain-addled Viagra peddlers. I've taken the first step towards this by turning on compulsory previewing for comments, which should have the added benefit of reminding legitimate commenters to use paragraph tags. I'll be working on ways of making the anti-PageRank measures more obvious over the next few days, as and when work permits.&lt;/p&gt;

&lt;p&gt;I've seen people argue that depriving legitimate commenters of PageRank is a poor compromise. I disagree: if the only cost of eliminating the incentive to spam is the loss of some Google ego then I see it as a price well worth paying. Of course, I say that as someone who's already built up their &lt;a href=&quot;http://www.google.com/search?q=simon&quot; title=&quot;Number 5, baby!&quot;&gt;Google ego&lt;/a&gt;, but at the end of the day it's my blog, my rules. One solution I've considered is creating a whitelist of sites that frequent commenters use in their signatures, causing them to be displayed without a redirect.&lt;/p&gt;
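
&lt;p&gt;The whitelist idea could look something like the following Python sketch. The redirect endpoint, the whitelist entries and the &lt;code&gt;comment_link_href&lt;/code&gt; name are all made up for illustration, and it assumes the redirect endpoint is one that search engines do not pass PageRank through.&lt;/p&gt;

```python
from urllib.parse import quote, urlsplit

# Hypothetical values, for illustration only.
REDIRECT_ENDPOINT = "http://example.com/redirect?url="
WHITELIST = {"www.simplebits.com", "www.intertwingly.net"}


def comment_link_href(url, whitelist=WHITELIST):
    """Return the href to publish for a link that appears in a comment.

    Links to whitelisted hosts are shown directly; everything else is
    sent through the redirect so it earns the commenter no PageRank.
    """
    host = (urlsplit(url).hostname or "").lower()
    if host in whitelist:
        return url
    # Percent-encode the whole target so it survives as one query value.
    return REDIRECT_ENDPOINT + quote(url, safe="")


print(comment_link_href("http://www.simplebits.com/"))
print(comment_link_href("http://spam.example.com/pills"))
```

&lt;p&gt;Matching on the exact host name keeps the check simple, at the cost of having to list each variant of a trusted site separately.&lt;/p&gt;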

&lt;p&gt;Comment spam is a solvable problem. Furthermore, blogging about comment spamming is almost as dull as blogging about blogging. Let's hurry up and solve it so we can go back to blogging about &lt;a href=&quot;http://www.feedster.com/search.php?q=blogging+about+cats&quot;&gt;cats&lt;/a&gt;.&lt;/p&gt;

&lt;!-- I hope the irony of a blogger with a blogging category complaining about dull posts about comment spam and blogging in a post ABOUT comment spam and blogging isn't lost on people. It made me smile anyway. --&gt;</description>
  <link>http://simon.incutio.com/archive/2004/01/28/solvingCommentSpam</link>
  <dc:subject>Blogging, Online Issues</dc:subject>
  <dc:date>2004-01-28T03:24:56-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2004/01/21/mtPageRankKiller">
  <title>Movable Type now kills PageRank on comment links</title>
  <description>&lt;p&gt;This is pretty cool: &lt;a href=&quot;http://www.movabletype.org/news/2004_01.shtml#000882&quot;&gt;Movable Type 2.661 is out&lt;/a&gt; and includes a whole bunch of comment spam fighting features, including one inspired by &lt;a href=&quot;http://simon.incutio.com/archive/2003/10/13/linkRedirects&quot; title=&quot;New anti-comment-spam measure&quot;&gt;my own anti-spam measure&lt;/a&gt; of disabling PageRank on links from comments by sending them through a redirect. This is great news for me as the redirect acts as a deterrent, and deterrents are only worthwhile if people know about them in the first place. With the most popular blogging system (at least amongst comment spammers) now featuring the same deterrent, hopefully &lt;acronym title=&quot;Search Engine Optimisation&quot;&gt;SEO&lt;/acronym&gt; spammers will start to get the message.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2004/01/21/mtPageRankKiller</link>
  <dc:subject>Blogging, Online Issues</dc:subject>
  <dc:date>2004-01-21T20:05:17-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>

</rdf:RDF>