<?xml version="1.0"?>

<rdf:RDF 
  xmlns="http://purl.org/rss/1.0/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
>

<channel rdf:about="http://simon.incutio.com/syndicate/google/rss1.0">
  <title>Google</title>
  <link>http://simon.incutio.com/</link>
  <description>Simon Willison's Google category</description>
  <language>en-gb</language>
  <webMaster>simon@incutio.com</webMaster>
  <items>
    <rdf:Seq>
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2005/11/16/base" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2005/07/08/toolbar" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2005/05/06/bad" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2005/02/24/cruft" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2005/02/08/maps" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2005/01/17/relNoFollow" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2004/12/14/googlePrint" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2004/06/18/invites" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2004/05/13/supplementalResult" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2004/05/02/google98" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2004/04/05/whatIsGoogle" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2004/04/01/googleWebmail" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2004/03/06/ghostTown" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2004/02/06/dangers" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2003/10/22/googleBlogs" />
    </rdf:Seq>
  </items>
</channel>

<item rdf:about="http://simon.incutio.com/archive/2005/11/16/base">
  <title>Google Base is interesting</title>
  <description>&lt;p id=&quot;p-0&quot;&gt;I'm still trying to get my head around &lt;a href=&quot;http://base.google.com/&quot;&gt;Google Base&lt;/a&gt;. Here's a brain-dump of my thinking so far. First, some links.&lt;/p&gt;

&lt;ul&gt;
 &lt;li&gt;&lt;a href=&quot;http://base.google.com/base/about.html&quot;&gt;Google Base FAQ&lt;/a&gt;&lt;/li&gt;
 &lt;li&gt;&lt;a href=&quot;http://googleblog.blogspot.com/2005/11/first-base.html&quot;&gt;Google Base introduction&lt;/a&gt; on the Google Blog (includes testimonials)&lt;/li&gt;
 &lt;li&gt;&lt;a href=&quot;http://www.plasticbag.org/archives/2005/11/in_which_google_base_launches.shtml&quot;&gt;Tom's first impressions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p id=&quot;p-1&quot;&gt;Base is a very interesting product for a whole bunch of reasons. The data model is surprisingly simple on the surface: all items have a title, description, (optional) external URL, a &quot;type&quot; and a set of labels (a.k.a. tags) and &quot;attributes&quot;. Attributes are something for tag enthusiasts to get excited by - they're name/value pairs that are kind of like tags in that you can apply them to anything, but more structured and with a greater level of implied meaning.&lt;/p&gt;

&lt;p id=&quot;p-2&quot;&gt;Attributes instantly made me think of geotagging on &lt;a href=&quot;http://www.flickr.com/&quot;&gt;Flickr&lt;/a&gt;, where tags are overloaded to store latitude and longitude values (example &lt;a href=&quot;http://www.flickr.com/photos/clagnut/60936865/&quot;&gt;here&lt;/a&gt;). Having first class support for this kind of extensible data is a very powerful concept.&lt;/p&gt;
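
&lt;p&gt;As a rough sketch of what first class attributes buy you over overloaded tags, here is a Base-style item as a plain JavaScript object. The field names, the values and the &lt;code&gt;matchesAttribute&lt;/code&gt; helper are my own assumptions for illustration, not Google's actual schema or API:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;javascript&quot;&gt;// Hypothetical item: field names and values are made up for illustration.
var item = {
  title: 'Photo of the pier',
  description: 'Taken on a windy afternoon',
  url: null,                   // the external URL is optional
  type: 'photo',
  labels: ['pier', 'seaside'], // free-form tags
  attributes: { latitude: 50.82, longitude: -0.14 } // structured name/value pairs
};

// Unlike a tag, an attribute can be matched by name and typed value.
function matchesAttribute(item, name, value) {
  return item.attributes[name] === value;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A check like &lt;code&gt;matchesAttribute(item, 'latitude', 50.82)&lt;/code&gt; works on the value directly, with no need to parse it back out of a tag name.&lt;/p&gt;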

&lt;p id=&quot;p-3&quot;&gt;Another interesting problem that the Google Base data model could be used to tackle is Wikipedia's &lt;a href=&quot;http://en.wikipedia.org/wiki/Wikipedia:Wikiproject&quot;&gt;WikiProjects&lt;/a&gt;. If you look at any US Navy ship entry on Wikipedia (&lt;a href=&quot;http://en.wikipedia.org/wiki/USS_Iwo_Jima_%28LHD-7%29&quot;&gt;example&lt;/a&gt;) you'll see a table on the right hand side of standard attributes relating to that ship - things like Length, Displacement, Armament and so on. This data isn't really structured - it's just a wiki table, manually maintained by participants of the &lt;a href=&quot;http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Ships&quot;&gt;Ships WikiProject&lt;/a&gt;.&lt;/p&gt;

&lt;p id=&quot;p-4&quot;&gt;Obviously this data would be more valuable if it was structured in a way that allowed queries to be made against it. Base-like attributes provide a way of doing this.&lt;/p&gt;

&lt;p id=&quot;p-5&quot;&gt;There's definitely a trend towards this kind of loose data model at the moment. JotSpot allows all pages within a wiki to have as many extra name/value attribute pairs as you like (even the wiki body itself is internally implemented as a special attribute), and Ning works along similar lines.&lt;/p&gt;

&lt;p id=&quot;p-6&quot;&gt;Base currently allows bulk importing of data using tab delimited files, RSS or Atom. There are no outward bound APIs which is a notable omission - I wouldn't be at all surprised to see them added in the next few weeks.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2005/11/16/base</link>
  <dc:subject>Google</dc:subject>
  <dc:date>2005-11-16T12:34:46-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2005/07/08/toolbar">
  <title>Dissecting the Google Firefox Toolbar</title>
  <description>&lt;p id=&quot;p-0&quot;&gt;Google have finally released a Firefox version of the &lt;a href=&quot;http://toolbar.google.com/&quot;&gt;Google Toolbar&lt;/a&gt;, with some nice &lt;a href=&quot;http://googleblog.blogspot.com/2005/07/platypus-of-internet.html&quot; title=&quot;The platypus of the Internet&quot;&gt;praise for XUL&lt;/a&gt; into the bargain. Of course, the most interesting part of the toolbar from a geeky point of view is the bit that queries Google's servers for PageRank. Sure enough, if you download the &lt;code&gt;google-toolbar.xpi&lt;/code&gt; file, unzip it, then unzip the &lt;code&gt;google-toolbar.jar&lt;/code&gt; file within, there's a file called &lt;code&gt;pagerank.js&lt;/code&gt; with all of the juicy details.&lt;/p&gt;

&lt;p id=&quot;p-1&quot;&gt;To query PageRank, the toolbar makes a standard HTTP request to &lt;code&gt;toolbarqueries.google.com&lt;/code&gt;, with the page to query in a parameter along with a hash (presumably to discourage scraping). &lt;code&gt;pagerank.js&lt;/code&gt; includes the hash algorithm, with some amusing implementation details:&lt;/p&gt;

&lt;p id=&quot;p-2&quot;&gt;&lt;code class=&quot;javascript&quot;&gt;var GPR_HASH_SEED = &quot;Mining PageRank is AGAINST GOOGLE'S TERMS OF SERVICE. Yes, I'm talking to you, scammer.&quot;;&lt;/code&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;javascript&quot;&gt;function GPR_awesomeHash(value) {
  var &lt;a href=&quot;http://www.imdb.com/title/tt0094012/quotes#qt0152593&quot; title=&quot;Spaceballs quotes&quot;&gt;kindOfThingAnIdiotWouldHaveOnHisLuggage&lt;/a&gt; = 16909125;
  ...
}&lt;/code&gt;&lt;/pre&gt;

&lt;p id=&quot;p-3&quot;&gt;The spell check feature (&lt;code&gt;spellcheck.js&lt;/code&gt;) is interesting as well. When you click the &quot;Check&quot; button, the toolbar packages any content in form fields up in XML and POSTs it to http://www.google.com/tbproxy/spell. It gets back a simple XML document providing the offset, length and confidence for each spelling error along with a list of suggested alternatives. The user interface stuff is all handled by the extension.&lt;/p&gt;
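
&lt;p&gt;To show why offset and length are enough to patch the text, here is a minimal sketch that applies the first suggestion for each error. The error objects are hypothetical stand-ins for the parsed response (I haven't reproduced the real XML format here), and &lt;code&gt;applyCorrections&lt;/code&gt; is my own name, not something from the toolbar source:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;javascript&quot;&gt;// Hypothetical error objects: { offset, length, confidence, suggestions }.
function applyCorrections(text, errors) {
  // Work from the end of the string so earlier offsets stay valid.
  var sorted = errors.slice().sort(function (a, b) {
    return b.offset - a.offset;
  });
  sorted.forEach(function (err) {
    if (err.suggestions.length === 0) {
      return; // nothing to substitute for this error
    }
    text = text.slice(0, err.offset) +
      err.suggestions[0] +
      text.slice(err.offset + err.length);
  });
  return text;
}

var fixed = applyCorrections('I recieve mail', [
  { offset: 2, length: 7, confidence: 1, suggestions: ['receive'] }
]); // 'I receive mail'&lt;/code&gt;&lt;/pre&gt;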

&lt;p id=&quot;p-4&quot;&gt;If you want to watch the toolbar in action, I recommend the fantastic &lt;a href=&quot;http://livehttpheaders.mozdev.org/&quot;&gt;LiveHTTPHeaders extension&lt;/a&gt;.&lt;/p&gt;
</description>
  <link>http://simon.incutio.com/archive/2005/07/08/toolbar</link>
  <dc:subject>Google, Mozilla</dc:subject>
  <dc:date>2005-07-08T10:10:12-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2005/05/06/bad">
  <title>Fighting RFCs with RFCs</title>
  <description>&lt;p id=&quot;p-0&quot;&gt;Google's recently released &lt;a href=&quot;http://webaccelerator.google.com/&quot;&gt;Web Accelerator&lt;/a&gt; apparently has some &lt;a href=&quot;http://www.37signals.com/svn/archives2/google_web_accelerator_hey_not_so_fast_an_alert_for_web_app_designers.php&quot; title=&quot;Google Web Accelerator: Hey, not so fast - an alert for web app designers&quot;&gt;scary side-effects&lt;/a&gt;. It's been spotted pre-loading links in password-protected applications, which can amount to clicking on every &quot;delete this&quot; link  -  bypassing even the JavaScript prompt you carefully added to give people the chance to think twice.&lt;/p&gt;

&lt;p id=&quot;p-1&quot;&gt;&quot;Aah,&quot; I hear you cry, &quot;but &lt;a href=&quot;http://www.ietf.org/rfc/rfc2616.txt&quot; title=&quot;Hypertext Transfer Protocol -- HTTP/1.1&quot;&gt;RFC 2616&lt;/a&gt; clearly states that you shouldn't perform state changing operations with a GET or HEAD method!&quot;&lt;/p&gt;

&lt;blockquote cite=&quot;http://www.ietf.org/rfc/rfc2616.txt&quot;&gt;&lt;p id=&quot;p-2&quot;&gt;In particular, the convention has been established that the GET and
   HEAD methods SHOULD NOT have the significance of taking an action
   other than retrieval.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p id=&quot;p-3&quot;&gt;I'll see your RFC 2616 and raise you an &lt;a href=&quot;http://www.ietf.org/rfc/rfc2119.txt&quot;&gt;RFC 2119&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote cite=&quot;http://www.ietf.org/rfc/rfc2119.txt&quot;&gt;&lt;p id=&quot;p-4&quot;&gt;
SHOULD NOT   This phrase, or the phrase &quot;NOT RECOMMENDED&quot; mean that
   there may exist valid reasons in particular circumstances when the
   particular behavior is acceptable or even useful, but the full
   implications should be understood and the case carefully weighed
   before implementing any behavior described with this label.
&lt;/p&gt;&lt;/blockquote&gt;

&lt;p id=&quot;p-5&quot;&gt;Hiding your dangerous delete links behind an authentication scheme is a perfectly acceptable compromise. Web Accelerator is &lt;a href=&quot;http://www.tbray.org/ongoing/When/200x/2002/09/10/Good%20Technology&quot; title=&quot;Broken As Designed&quot;&gt;B.A.D&lt;/a&gt;.&lt;/p&gt;

&lt;p id=&quot;p-6&quot;&gt;&lt;strong&gt;Update:&lt;/strong&gt; Be sure to read the &lt;a href=&quot;http://simon.incutio.com/archive/2005/05/06/bad#comments&quot;&gt;excellent discussion&lt;/a&gt; brewing in the comments. Hiding behind authentication may not be as acceptable a compromise as I had first thought.&lt;/p&gt;

&lt;p id=&quot;p-7&quot;&gt;&lt;strong&gt;Update 2:&lt;/strong&gt; If you haven't been following the comments, I've had a change of heart. Even in the absence of Web Accelerator, hiding behind authentication leaves your application open to some very nasty security vulnerabilities (malicious pages can piggy-back your session and cause havoc making dangerous GET calls). I still think the RFC language covers people who thought long and hard before implementing a dangerous GET, but if you haven't thought about security and accelerating caching proxies such as Web Accelerator you haven't been thinking hard enough.&lt;/p&gt;
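
&lt;p&gt;A minimal sketch of the safer arrangement, where the destructive action only runs for POST. The request object and the &lt;code&gt;deleteItem&lt;/code&gt; callback are hypothetical stand-ins rather than any particular framework's API. This only stops tools that follow links; it does nothing about forged requests:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;javascript&quot;&gt;// Hypothetical handler: request is assumed to look like { method, itemId }.
function handleDelete(request, deleteItem) {
  // GET and HEAD should only retrieve, so refuse to change state for them.
  if (request.method !== 'POST') {
    return { status: 405, allow: 'POST' }; // 405 Method Not Allowed
  }
  deleteItem(request.itemId);
  return { status: 200 };
}&lt;/code&gt;&lt;/pre&gt;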

&lt;p id=&quot;p-8&quot;&gt;&lt;strong&gt;Update 3:&lt;/strong&gt; So, it turns out using POST is no defence at all against &lt;a href=&quot;http://www.squarefree.com/securitytips/web-developers.html#CSRF&quot;&gt;CSRF&lt;/a&gt; attacks. I've been learning a whole bunch of interesting stuff this evening.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2005/05/06/bad</link>
  <dc:subject>Google, Online Issues</dc:subject>
  <dc:date>2005-05-06T20:39:45-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2005/02/24/cruft">
  <title>Google cruft</title>
  <description>&lt;p id=&quot;p-0&quot;&gt;New Google feature: &lt;a href=&quot;http://www.google.com/googleblog/2005/02/google-movies-now-playing.html&quot;&gt;Google Movies&lt;/a&gt;. Displays aggregated movie reviews (like &lt;a href=&quot;http://www.rottentomatoes.com/&quot;&gt;Rotten Tomatoes&lt;/a&gt;), looks up local movie times based on your zip code saved in Google Local (more evidence of the fabled Google cookie), and even handles &lt;a href=&quot;http://www.google.com/search?q=movie:tomantic+zombie+movie&quot;&gt;recommendations&lt;/a&gt;.&lt;/p&gt;

&lt;p id=&quot;p-1&quot;&gt;The downside is that it's yet more cruft for the search results page. Here's a screenshot, with the cruft in red and the actual search results in green:&lt;/p&gt;

&lt;p id=&quot;p-2&quot; class=&quot;img&quot;&gt;&lt;img src=&quot;http://simon.incutio.com/images/2005/google-cruft.png&quot; alt=&quot;A screenshot of a Google search for &amp;quot;in good company&amp;quot;, showcases how much of the page is now taken up by results from Google news, Google print and Google movies.&quot; /&gt;&lt;/p&gt;

&lt;p id=&quot;p-3&quot;&gt;Ben Hammersley has &lt;a href=&quot;http://www.benhammersley.com/weblog/2005/02/24/google_movies_yet_another_category_killer.html&quot; title=&quot;Google Movies. Yet Another Category Killer.&quot;&gt;more commentary&lt;/a&gt;.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2005/02/24/cruft</link>
  <dc:subject>Google</dc:subject>
  <dc:date>2005-02-24T00:34:00-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2005/02/08/maps">
  <title>Google Maps and XSL</title>
  <description>&lt;p id=&quot;p-0&quot;&gt;I'll probably write more on this later, but it seems that &lt;a href=&quot;http://maps.google.com/&quot;&gt;Google Maps&lt;/a&gt; is using &lt;acronym title=&quot;eXtensible Stylesheet Language&quot;&gt;XSL&lt;/acronym&gt;. I spotted it loading the following pages while sniffing its activity with &lt;a href=&quot;http://livehttpheaders.mozdev.org/&quot;&gt;LiveHTTPHeaders&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
 &lt;li&gt;&lt;a href=&quot;http://maps.google.com/mapfiles/homepanel.xsl&quot;&gt;homepanel.xsl&lt;/a&gt;&lt;/li&gt;
 &lt;li&gt;&lt;a href=&quot;http://maps.google.com/mapfiles/localinfo.xsl&quot;&gt;localinfo.xsl&lt;/a&gt;&lt;/li&gt;
 &lt;li&gt;&lt;a href=&quot;http://maps.google.com/mapfiles/localpanel.xsl&quot;&gt;localpanel.xsl&lt;/a&gt;&lt;/li&gt;
 &lt;li&gt;&lt;a href=&quot;http://maps.google.com/mapfiles/geocodepanel.xsl&quot;&gt;geocodepanel.xsl&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p id=&quot;p-1&quot;&gt;This is in addition to the (now expected) XMLHttpRequest stuff. There even appears to be some of Microsoft's weird &lt;a href=&quot;http://msdn.microsoft.com/workshop/author/vml/default.asp&quot; title=&quot;Introduction to Vector Markup Language (VML)&quot;&gt;VML&lt;/a&gt;, although as I'm on a Mac I don't have access to &lt;acronym title=&quot;Internet Explorer&quot;&gt;IE&lt;/acronym&gt;/Windows to see what it's doing with it.&lt;/p&gt;

&lt;p id=&quot;p-2&quot;&gt;The bulk of the Google Maps JavaScript appears to be hidden away in &lt;a href=&quot;http://www.google.com/mapfiles/maps.1.js&quot;&gt;maps.1.js&lt;/a&gt;, which becomes a lot more readable if you feed it through &lt;a href=&quot;http://www.prettyprinter.de/&quot;&gt;PrettyPrinter.de&lt;/a&gt;.&lt;/p&gt;

&lt;p id=&quot;p-3&quot;&gt;As for Google Maps itself, it's an amazing piece of work but it's a shame they didn't follow &lt;a href=&quot;http://map.search.ch/&quot;&gt;map.search.ch&lt;/a&gt;'s lead in degrading gracefully to a static version for unsupported browsers.&lt;/p&gt;

&lt;p id=&quot;p-4&quot;&gt;If anyone has any further insights into how it all works, please post them in a comment.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2005/02/08/maps</link>
  <dc:subject>Google, XML, DHTML and Javascript</dc:subject>
  <dc:date>2005-02-08T14:19:37-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2005/01/17/relNoFollow">
  <title>rel="nofollow"</title>
  <description>&lt;p&gt;Reading between the lines (which in this case isn't particularly hard), &lt;a href=&quot;http://archive.scripting.com/2005/01/14#When:11:45:23AM&quot; title=&quot;Scripting News&quot;&gt;this&lt;/a&gt; and &lt;a href=&quot;http://www.bloggercon.org/2005/01/15#a3294&quot; title=&quot;Placeholder&quot;&gt;this&lt;/a&gt; (don't forget to view source) suggest that Google are soon to announce that they won't be calculating PageRank for links with a &lt;code class=&quot;html&quot;&gt;rel=&quot;nofollow&quot;&lt;/code&gt; attribute. Finally, an official way of fighting the economics of comment spam by denying PageRank on user-submitted link content. Sam Ruby &lt;a href=&quot;http://www.intertwingly.net/blog/2005/01/16/rel-nofollow&quot;&gt;points&lt;/a&gt; to Mark Pilgrim's &lt;a href=&quot;http://www.intertwingly.net/blog/2003/11/17/Comment-Throttle#c1069204247&quot;&gt;prediction&lt;/a&gt; that spammers won't care - they'll spam anyway, on the offchance that they hit somewhere undefended. I'm optimistic - if the major weblog (and wiki) vendors get behind this one it could help stem the tide.&lt;/p&gt;
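
&lt;p&gt;For weblog software, supporting this should amount to very little code. Here is a minimal sketch that marks a user-submitted link with &lt;code class=&quot;html&quot;&gt;rel=&quot;nofollow&quot;&lt;/code&gt; while keeping any existing &lt;code&gt;rel&lt;/code&gt; values. Links are plain objects here rather than real DOM nodes, and &lt;code&gt;markUntrusted&lt;/code&gt; is a name I made up for illustration:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;javascript&quot;&gt;// Hypothetical link shape: { href, text, rel } where rel is optional.
function markUntrusted(link) {
  // rel holds a space-separated list, so keep whatever is already there.
  var rels = link.rel ? link.rel.split(' ') : [];
  if (rels.indexOf('nofollow') === -1) {
    rels.push('nofollow');
  }
  return { href: link.href, text: link.text, rel: rels.join(' ') };
}&lt;/code&gt;&lt;/pre&gt;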

&lt;p&gt;As an aside, I have exams starting in a week and plenty to revise, so I'll probably be on hiatus until the end of the month.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2005/01/17/relNoFollow</link>
  <dc:subject>Google, Online Issues</dc:subject>
  <dc:date>2005-01-17T01:39:26-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2004/12/14/googlePrint">
  <title>Google Print</title>
  <description>&lt;p&gt;I'm probably late to the party on this one, but I just noticed that Google Print results are now &lt;a href=&quot;http://www.google.com/search?q=books+on+css&quot; title=&quot;Google Search: books on css&quot;&gt;included&lt;/a&gt; in any Google search that starts with &quot;books on&quot;. I can't say I like the lousy discoverability of the interface much - a search box at &lt;a href=&quot;http://print.google.com/&quot;&gt;print.google.com&lt;/a&gt; would be a welcome addition - but the results are pretty impressive. It's also a shame that they're using a nasty obfuscation technique to disable copying and printing (based on serving book pages up as background images), if only because it will fuel yet more questions from newbie web developers asking how to do exactly that. Still, with &lt;a href=&quot;http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2004/12/14/BUGADABBS91.DTL&quot; title=&quot;Google, 5 big libraries team to offer books&quot;&gt;today's announcement&lt;/a&gt; that Google are to team up with five leading libraries to scan more books, this service is going to get a whole lot more important over the next few years.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2004/12/14/googlePrint</link>
  <dc:subject>Google</dc:subject>
  <dc:date>2004-12-14T15:44:38-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2004/06/18/invites">
  <title>I have some gmail invites</title>
  <description>&lt;p&gt;I have four &lt;a href=&quot;http://gmail.google.com/&quot;&gt;gmail&lt;/a&gt; invites left. The first four people to leave a comment with their email address can have them (put it in the email field and fill in a &lt;acronym title=&quot;Universal Republic of Love&quot;&gt;URL&lt;/acronym&gt; as well if you're worried about spam harvesters - the email won't be displayed but I'll still have access to it). No &lt;a href=&quot;http://www.gmailswap.com/&quot; title=&quot;Gmail swap&quot;&gt;random gift&lt;/a&gt; or &lt;a href=&quot;http://www.jluster.org/node/view/134&quot; title=&quot;Do some good&quot;&gt;good deed&lt;/a&gt; necessary - consider it a &quot;thank you&quot; for reading.&lt;/p&gt;

&lt;p&gt;In related news, Adrian's &lt;a href=&quot;http://www.holovaty.com/blog/archive/2004/06/18/1751&quot; title=&quot;Accessing your Gmail inbox with Python&quot;&gt;Python gmail export script&lt;/a&gt; is really rather neat.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; They all went in under 15 minutes! To avoid being bombarded with requests, I'll state now that I have no plans of handing out any more in this manner. Still, it was a fun way of drawing some of the lurkers out of hiding.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2004/06/18/invites</link>
  <dc:subject>Google</dc:subject>
  <dc:date>2004-06-18T23:57:15-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2004/05/13/supplementalResult">
  <title>Supplemental Results</title>
  <description>&lt;p&gt;Does anyone know what Google means when it says that something is a &lt;a href=&quot;http://www.google.com/search?q=tortured+artist+skater+goth+jock&quot;&gt;&quot;Supplemental Result&quot;&lt;/a&gt;?&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2004/05/13/supplementalResult</link>
  <dc:subject>Google</dc:subject>
  <dc:date>2004-05-13T07:22:11-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2004/05/02/google98">
  <title>Google, circa 1998</title>
  <description>&lt;p&gt;Thanks to the ever impressive &lt;a href=&quot;http://www.archive.org/&quot;&gt;Internet Archive&lt;/a&gt; (did you know they host &lt;a href=&quot;http://www.archive.org/movies/prelinger.php&quot; title=&quot;Prelinger Archives&quot;&gt;old public information films&lt;/a&gt; as well?) here's &lt;a href=&quot;http://web.archive.org/web/19980502040303/google.stanford.edu/&quot;&gt;Google's homepage from 1998&lt;/a&gt;. Their searchable index was slightly less than 25 million pages, their hardware was &lt;a href=&quot;http://web.archive.org/web/19980502040406/google.stanford.edu/googlehardware.html&quot; title=&quot;Google Hardware&quot;&gt;less than a dozen machines&lt;/a&gt; apparently held together with lego and their crawler &lt;a href=&quot;http://web.archive.org/web/19980502040427/google.stanford.edu/FAQ.html&quot; title=&quot;Google/BackRub Frequently Asked Questions&quot;&gt;was called BackRub&lt;/a&gt;. Following the links will take you to Sergey and Larry's homepages, where digging a little deeper will even uncover the &lt;a href=&quot;http://google.blogspace.com/archives/001199&quot; title=&quot;Google Weblog: Sergey Brin in Drag - EXCLUSIVE&quot;&gt;now infamous&lt;/a&gt; Sergey-in-drag photo.&lt;/p&gt;

&lt;p&gt;My favourite insight though comes from the &lt;a href=&quot;http://web.archive.org/web/19990218090824/www-pcd.stanford.edu/~page/lego.html&quot;&gt;Legos page&lt;/a&gt; (why do Americans insist on adding an 's'?) on Larry's site:&lt;/p&gt;

&lt;blockquote cite=&quot;http://web.archive.org/web/19990218090824/www-pcd.stanford.edu/~page/lego.html&quot;&gt;&lt;p&gt;I attriubute a great deal of my understanding and ability with mechanical devices to Legos and similar construction toys.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;I've often thought that the most important factor in leading me to geek-hood, apart from a C64, was a serious obsession with Lego Technic from an early age.&lt;/p&gt;

&lt;p&gt;I've avoided posting about &lt;a href=&quot;http://battellemedia.com/archives/000627.php&quot; title=&quot;John Battelle's analysis&quot;&gt;the Google IPO&lt;/a&gt; because I know nothing of the world of finance, but you've gotta love that Google's initial &lt;acronym title=&quot;Initial Public Offering&quot;&gt;IPO&lt;/acronym&gt; valuation is &lt;em&gt;e * 10 ^ 9&lt;/em&gt;.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2004/05/02/google98</link>
  <dc:subject>Google</dc:subject>
  <dc:date>2004-05-02T20:43:05-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2004/04/05/whatIsGoogle">
  <title>What is Google?</title>
  <description>&lt;p&gt;Via &lt;a href=&quot;http://battellemedia.com/archives/000536.php&quot; title=&quot;Skrenta Groks The Google Platform OS&quot;&gt;John Battelle&lt;/a&gt;, Rich Skrenta's &lt;a href=&quot;http://blog.topix.net/archives/000016.html&quot; title=&quot;The Secret Source of Google's Power&quot;&gt;remarkable piece&lt;/a&gt; on what Google have actually built. They don't just have the world's best search engine, they have the world's largest and most scalable platform for developing huge web-based applications.&lt;/p&gt;

&lt;blockquote cite=&quot;http://blog.topix.net/archives/000016.html&quot;&gt;
&lt;p&gt;Google has taken the last 10 years of systems software research out of university labs, and built their own proprietary, production quality system. What is this platform that Google is building? It's a distributed computing platform that can manage web-scale datasets on 100,000 node server clusters. It includes a petabyte, distributed, fault tolerant filesystem, distributed RPC code, probably network shared memory and process migration. And a datacenter management system which lets a handful of ops engineers effectively run 100,000 servers. Any of these projects could be the sole focus of a startup.&lt;/p&gt;

&lt;p&gt;[ ... ]&lt;/p&gt;

&lt;p&gt;While competitors are targeting the individual applications Google has deployed, Google is building a massive, general purpose computing platform for web-scale programming.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Fascinating stuff.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2004/04/05/whatIsGoogle</link>
  <dc:subject>Google</dc:subject>
  <dc:date>2004-04-05T08:06:09-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2004/04/01/googleWebmail">
  <title>1GB of webmail from Google</title>
  <description>&lt;p&gt;Provided &lt;a href=&quot;http://news.com.com/2100-1032_3-5182805.html?tag=nefd_top&quot; title=&quot;Google to offer gigabyte of free e-mail&quot;&gt;this story&lt;/a&gt; about a new 1 GB webmail service from Google isn't a lame early April fool, I'm really psyched about it. A decent amount of space combined with Google's search technology could really help me keep up to date with my email. Just off the top of my head, here's my ideal hosted webmail feature list:&lt;/p&gt;

&lt;ul&gt;
 &lt;li&gt;Costs between two and ten dollars a month. This is a critical service I'm entrusting to a company - I want the reassurance that there's cold hard cash on the line if they screw anything up. Paying for a no-ads option is fine too.&lt;/li&gt;
 &lt;li&gt;Smart filtering. I'm on a bunch of different mailing lists and can't live without filters.&lt;/li&gt;
 &lt;li&gt;&quot;Virtual folders&quot; - things like &quot;all emails from this person&quot;, &quot;emails sent to this person&quot; should be on hand at all times.&lt;/li&gt;
 &lt;li&gt;Import and export mail as mbox files. Don't lock me in!&lt;/li&gt;
 &lt;li&gt;Decent spam filtering, obviously.&lt;/li&gt;
 &lt;li&gt;Send plain text, receive plain text. No &lt;acronym title=&quot;HyperText Markup Language&quot;&gt;HTML&lt;/acronym&gt; in either direction.&lt;/li&gt;
 &lt;li&gt;Kick-arse search.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I don't really have very complex needs, which is why I'm so frustrated that so far nothing I've tried has done what I need it to. Here's hoping Google can hit the mark.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; The Google &lt;a href=&quot;http://www.google.com/press/pressrel/gmail.html&quot; title=&quot;Google Gets the Message, Launches Gmail&quot;&gt;press release&lt;/a&gt; is dated April 1st. Given &lt;a href=&quot;http://www.google.com/technology/pigeonrank.html&quot; title=&quot;Pigeon Rank&quot;&gt;their history&lt;/a&gt;, it's almost certainly a hoax. Aah well, we can dream.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2004/04/01/googleWebmail</link>
  <dc:subject>Google</dc:subject>
  <dc:date>2004-04-01T02:35:15-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2004/03/06/ghostTown">
  <title>Ghost town, sponsored by Google</title>
  <description>&lt;p&gt;Via &lt;a href=&quot;http://boingboing.net/2004_03_01_archive.html#107852164183349516&quot; title=&quot;Photoblogging Chernobyl &quot;&gt;Boing Boing&lt;/a&gt;, this &lt;a href=&quot;http://www.angelfire.com/extreme4/kiddofspeed/page2.html&quot;&gt;fascinating and utterly chilling&lt;/a&gt; photographic journey through the abandoned ruins of the Chernobyl dead zone.&lt;/p&gt;

&lt;p&gt;As an aside, the free hosting provider used by the site appears to be inserting Google ads. Make sure you look out for them as you explore the site; the relevance algorithm gets stretched to the limit.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2004/03/06/ghostTown</link>
  <dc:subject>Blogging, Google</dc:subject>
  <dc:date>2004-03-06T00:30:00-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2004/02/06/dangers">
  <title>The dangers of PageRank</title>
  <description>&lt;p&gt;A well documented side effect of the weblog format is that it brings Google PageRank in almost absurd quantities. I'm now the 5th result for &lt;a href=&quot;http://www.google.com/search?q=simon&quot; title=&quot;Google Search: simon&quot;&gt;simon&lt;/a&gt; on Google, and I've been the top result for &lt;a href=&quot;http://www.google.com/search?q=simon+willison&quot;&gt;simon willison&lt;/a&gt; almost since the day I launched. High rankings however are not always a good thing, especially when combined with a comment system. A growing number of bloggers have found themselves at the top position for terms of little or no relevance to the rest of their sites, which in turn can attract truly surreal comments from visitors from search engines who may never have encountered a blog before.&lt;/p&gt;

&lt;p&gt;I know of a couple of entries on my own blog that are attracting this kind of traffic. The most interesting is probably &lt;a href=&quot;http://simon.incutio.com/archive/2003/08/13/artificialDiamonds&quot;&gt;this entry&lt;/a&gt; on &lt;a href=&quot;http://www.google.com/search?q=artificial+diamonds&quot; title=&quot;Google Search: artificial diamonds&quot;&gt;artificial diamonds&lt;/a&gt;, which has attracted comments from both buyers and sellers of artificial gems. My &lt;a href=&quot;http://simon.incutio.com/archive/2002/12/09/badInterfaceDesignFromMicrosof&quot;&gt;entry&lt;/a&gt; on MSN messenger usability problems from 2002 has drawn a steady stream of hilarious comments, no doubt caused in part by its top rating on Google for &lt;a href=&quot;http://www.google.com/search?msn+messenger+sucks&quot; title=&quot;Google Search: msn messenger sucks&quot;&gt;msn messenger sucks&lt;/a&gt;. Amusingly, for a long time &lt;a href=&quot;http://search.msn.com/&quot;&gt;Microsoft's own search engine&lt;/a&gt; was giving my page a high rank for a wide variety of less negative messenger related terms.&lt;/p&gt;

&lt;p&gt;My own experiences of this phenomenon pale into insignificance next to some of the others I've seen. The most impressive example has to be Jason Kottke's &lt;a href=&quot;http://www.kottke.org/03/05/the-matrix-reloaded&quot;&gt;brief review&lt;/a&gt; of the Matrix Reloaded, which drew over 900 comments from Google strays, developed its own micro-community and resulted in Jason pondering &lt;a href=&quot;http://www.kottke.org/03/06/own-conversation&quot;&gt;who owns the conversation on my web site?&lt;/a&gt; Jason eventually decided to close and archive the thread after the page grew to more than a megabyte in size.&lt;/p&gt;

&lt;p&gt;The problem can take on a far more disturbing twist. I won't link directly to these entries for fear of adding to their predicaments, but searches for &lt;a href=&quot;http://www.google.com/search?q=crime+scene+cleanup&quot; title=&quot;Google Search: crime scene cleanup&quot;&gt;crime scene cleanup&lt;/a&gt; and &lt;a href=&quot;http://www.google.com/search?q=suicide+chat+rooms&quot; title=&quot;Google Search: suicide chat rooms&quot;&gt;suicide chat rooms&lt;/a&gt; both return blogs in the first two results. The former thread is mostly crime scene cleanup companies marketing their services, but the latter is quite frankly disturbing. It's certainly led me to double check the titles of my entries before posting them.&lt;/p&gt;

&lt;p&gt;Thankfully, avoiding this kind of unwanted comment traffic is pretty simple. One way is to simply disable comments for entries older than a certain time (generally a couple of weeks), although personally I like to see the occasional comment on old entries. A neater solution proposed by Russell Beattie last year is to simply &lt;a href=&quot;http://www.beattie.info/notebook/1003990.html&quot; title=&quot;Googler Comments&quot;&gt;hide comments from search engine referrals&lt;/a&gt;, thus ensuring that random strays won't leave their mark without understanding the nature of your site first.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2004/02/06/dangers</link>
  <dc:subject>Blogging, Google</dc:subject>
  <dc:date>2004-02-06T16:58:23-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2003/10/22/googleBlogs">
  <title>Google's Internal Blogs</title>
  <description>&lt;p&gt;&lt;a href=&quot;http://news.com.com/2008-1025-5094753.html?tag=nefd_acpro&quot; title=&quot;Blog on&quot;&gt;Evan Williams&lt;/a&gt; on Google's intranet weblogs:&lt;/p&gt;

&lt;blockquote cite=&quot;http://news.com.com/2008-1025-5094753.html?tag=nefd_acpro&quot;&gt;
&lt;dl&gt;
&lt;dt&gt;How many people blog at Google?&lt;/dt&gt;
&lt;dd&gt;Not sure what the count is, but I know there's a couple hundred or more. It's really interesting to see the network grow from scratch.&lt;/dd&gt;
&lt;dt&gt;Do you use that to get to know one another or to keep up-to-date on projects?&lt;/dt&gt;
&lt;dd&gt;A lot of people use it to keep up-to-date on projects and to share pointers or expertise. I've heard people comment on how it's way easier to know what's going on internally now. You can find out what's going on when you go there or when you're curious about it, but you don't have to be deluged or distracted from your normal day.&lt;/dd&gt;
&lt;/dl&gt;
&lt;/blockquote&gt;

&lt;p&gt;Markup question: I used a definition list in the above quotation - was this appropriate? Is there a better way of marking up this information?&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2003/10/22/googleBlogs</link>
  <dc:subject>Blogging, Google, [X]HTML and CSS</dc:subject>
  <dc:date>2003-10-22T04:58:32-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>

</rdf:RDF>