<?xml version="1.0"?>

<rdf:RDF 
  xmlns="http://purl.org/rss/1.0/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
>

<channel rdf:about="http://simon.incutio.com/syndicate/contentmanagement/rss1.0">
  <title>Content Management</title>
  <link>http://simon.incutio.com/</link>
  <description>Simon Willison's Content Management category</description>
  <language>en-gb</language>
  <webMaster>simon@incutio.com</webMaster>
  <items>
    <rdf:Seq>
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2005/08/03/django" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2003/12/13/staticContentGeneration" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2003/12/05/simple" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2003/10/30/nvu" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2003/09/15/newBlog" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2003/08/06/cleanURLtip" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2003/08/03/futureProotContent" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2003/07/09/textile2" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2003/07/02/timeline" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2003/04/25/siteSearch" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2003/03/18/phpAndJavascriptSpellChecker" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2003/03/12/moreNukes" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2003/03/08/spellCheckInWebApplications" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2003/03/01/vectorSearchEngines" />
      <rdf:li rdf:resource="http://simon.incutio.com/archive/2003/02/23/safeHtmlChecker" />
    </rdf:Seq>
  </items>
</channel>

<item rdf:about="http://simon.incutio.com/archive/2005/08/03/django">
  <title>Exciting developments with Django</title>
  <description>&lt;p id=&quot;p-0&quot;&gt;The amount of activity surrounding the &lt;a href=&quot;http://www.djangoproject.com/&quot;&gt;Django web framework&lt;/a&gt; since its not-quite release a few weeks ago is amazing. Adrian, Jacob and Wilson have been working overtime, with 395 check-ins to source control since the 13th of July. They've added &lt;a href=&quot;http://code.djangoproject.com/file/django/trunk/django/core/handlers/wsgi.py&quot;&gt;WSGI support&lt;/a&gt;, a &lt;a href=&quot;http://www.djangoproject.com/weblog/2005/jul/18/local_server/&quot;&gt;development web server&lt;/a&gt;, &lt;a href=&quot;http://www.djangoproject.com/weblog/2005/jul/29/model_examples/&quot;&gt;unit-tests&lt;/a&gt;, a &lt;a href=&quot;http://www.djangoproject.com/documentation/&quot;&gt;ton of documentation&lt;/a&gt;, &lt;a href=&quot;http://www.djangoproject.com/weblog/2005/jul/21/sqlite3/&quot;&gt;SQLite support&lt;/a&gt;, &lt;a href=&quot;http://code.djangoproject.com/changeset/384&quot;&gt;database introspection&lt;/a&gt; and dozens of other feature tweaks and bug fixes. Check out the &lt;a href=&quot;http://code.djangoproject.com/timeline&quot;&gt;project Timeline&lt;/a&gt; for an idea of just how frenetic things have been.&lt;/p&gt;

&lt;p id=&quot;p-1&quot;&gt;The emerging &lt;a href=&quot;http://www.djangoproject.com/community/&quot;&gt;Django community&lt;/a&gt; has been kicking in as well. There's a significant community-led initiative to get &lt;a href=&quot;http://code.djangoproject.com/wiki/InterNationalization&quot;&gt;internationalisation&lt;/a&gt; and &lt;a href=&quot;http://code.djangoproject.com/wiki/Localization&quot;&gt;localisation&lt;/a&gt; going, and a large number of unofficial tutorials have emerged to complement &lt;a href=&quot;http://www.djangoproject.com/documentation/tutorial1/&quot;&gt;the one on the site&lt;/a&gt;.&lt;/p&gt;

&lt;p id=&quot;p-2&quot;&gt;Here's where things get really interesting: changes at the Journal-World have kick-started the Django job market. Rob Curley, formerly in charge of the World Company's web activities, recently &lt;a href=&quot;http://www.digitaledge.org/DigArtPage.cfm?AID=7083&quot;&gt;took up a new position&lt;/a&gt; at the &lt;a href=&quot;http://www.naplesnews.com/&quot;&gt;Naples Daily News&lt;/a&gt; in Florida. Rob has &lt;a href=&quot;http://eric.themoritzfamily.com/?p=48&quot;&gt;just hired Eric Moritz&lt;/a&gt;, a regular on the #django IRC channel, to work on Django-powered projects there.&lt;/p&gt;

&lt;p id=&quot;p-3&quot;&gt;Meanwhile, Adrian Holovaty has &lt;a href=&quot;http://www.poynter.org/column.asp?id=31&amp;amp;aid=86489&quot;&gt;taken a new job&lt;/a&gt; at the &lt;a href=&quot;http://www.washingtonpost.com/&quot;&gt;Washington Post&lt;/a&gt; as &quot;Editor, Editorial Innovations&quot; - a role that is sure to involve some very innovative use of Django (Adrian built &lt;a href=&quot;http://chicagocrime.org/&quot;&gt;chicagocrime.org&lt;/a&gt;). Adrian's departure means that the Journal-World are &lt;a href=&quot;http://www.holovaty.com/blog/archive/2005/08/03/0202&quot;&gt;looking for a new developer&lt;/a&gt; - here's &lt;a href=&quot;http://simon.incutio.com/archive/2004/06/29/job&quot; title=&quot;Fancy a job?&quot;&gt;why you should apply&lt;/a&gt;.&lt;/p&gt;

&lt;p id=&quot;p-4&quot;&gt;One thing's for certain: we're going to see some very exciting Django-powered sites in the next few months.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2005/08/03/django</link>
  <dc:subject>Python, Open Source, Content Management, Django</dc:subject>
  <dc:date>2005-08-03T16:56:34-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2003/12/13/staticContentGeneration">
  <title>Static content generation</title>
  <description>&lt;p&gt;Ian Bicking has an interesting piece on &lt;a href=&quot;http://blog.colorstudy.com/ianb/weblog/2003/12/12.html#P45&quot; title=&quot;CMS and static publishing&quot;&gt;using static publishing&lt;/a&gt; in a &lt;acronym title=&quot;Content Management System&quot;&gt;CMS&lt;/acronym&gt;. The choice between static and dynamic when building software for the web is a critical one, and one that I think deserves in-depth discussion.&lt;/p&gt;

&lt;p&gt;In a dynamic site, pages are assembled &quot;on the fly&quot; as and when they are requested. Most &lt;acronym title=&quot;PHP: Hypertext Preprocessor&quot;&gt;PHP&lt;/acronym&gt;-powered sites work this way, as &lt;acronym title=&quot;PHP: Hypertext Preprocessor&quot;&gt;PHP&lt;/acronym&gt; as a technology actively encourages dynamic content creation. Generating pages dynamically allows for all sorts of clever applications, from &lt;a href=&quot;http://www.natbat.co.uk/quotecode.php&quot;&gt;random quote generators&lt;/a&gt; to full-on web applications such as Hotmail.&lt;/p&gt;

&lt;p&gt;In a static publishing system, &lt;acronym title=&quot;HyperText Markup Language&quot;&gt;HTML&lt;/acronym&gt; pages are pre-generated by the publishing software and stored as flat files on the web server, ready to be served. This approach is less flexible than dynamic generation in many ways and is often ignored as an option as a result, but in fact the vast majority of content sites consist of primarily static pages and could be powered by static content generation without any loss of functionality to the end user.&lt;/p&gt;

&lt;p&gt;The most widespread example of a static publishing system I've seen is &lt;a href=&quot;http://www.movabletype.org/&quot;&gt;Movable Type&lt;/a&gt;, which rebuilds static files for a site each time a weblog entry is added or modified - although it can be configured to serve content dynamically instead.&lt;/p&gt;

&lt;p&gt;At first glance, the benefits of dynamic publishing are obvious. What is frequently ignored are the benefits of static publishing, at least for content-driven sites which don't have any heavy need for dynamic features. The most obvious benefit is performance; serving static files is what web servers such as Apache are optimised to do, and they can do it &lt;em&gt;fast&lt;/em&gt;. A second advantage is reliability, as Ian &lt;a href=&quot;http://blog.colorstudy.com/ianb/weblog/2003/12/12.html#P45&quot; title=&quot;CMS and static publishing&quot;&gt;explains&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote cite=&quot;http://blog.colorstudy.com/ianb/weblog/2003/12/12.html#P45&quot;&gt;&lt;p&gt;A big part is that it takes the pressure off of going live. I can be sure before going live that the public website is correct. The actual CMS may explode in flames, but the site will be fine. Going live with a web application is always a stressful process, and anything that reduces the stress of that is a great benefit. As time goes on, static publishing is also a big stress reduction for the system administrator, since a simple Apache configuration is a lot more reliable under different loads and configurations than any dynamic site will be.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;I've been developing dynamic sites almost exclusively for the past two or three years, but a couple of my most recent projects were static rather than dynamic. These were the &lt;a href=&quot;http://coupons.lawrence.com/&quot;&gt;LJWorld.com Coupons site&lt;/a&gt; and the &lt;a href=&quot;http://www.kusports.com/multimedia/photogalleries/basketball/03-04/&quot; title=&quot;Men's basketball 2003-04 season photo galleries&quot;&gt;KUSports.com photo galleries&lt;/a&gt;. I wanted to write both of these in Python, because doing so would make the process of transferring them over to our new mod_python powered &lt;acronym title=&quot;Content Management System&quot;&gt;CMS&lt;/acronym&gt; (currently in development) far less involved. Unfortunately our main production servers don't currently have mod_python configured, and we weren't overly keen on setting it up there for the sake of a couple of small projects. Instead I decided to write the administration interfaces using Python &lt;acronym title=&quot;Common Gateway Interface&quot;&gt;CGI&lt;/acronym&gt; scripts, but generate the actual front end pages (which would see far heavier traffic) as static files.&lt;/p&gt;

&lt;p&gt;In addition to the performance and reliability benefits, static generation provides a simple &quot;staging area&quot; feature for free. Both the coupons and the gallery interfaces allow users to make multiple changes to site content safe in the knowledge that none of the changes will become visible until the &quot;Publish Site&quot; button is selected. At first I was worried that this extra step could prove confusing, but in practice it allows our content producers to make changes in a safe environment, without fear of accidentally breaking the public site while they are working.&lt;/p&gt;
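
&lt;p&gt;For illustration only, here is a minimal Python sketch of what a &quot;Publish Site&quot; step can look like; it is not the code behind the coupons or gallery sites. It assumes the administration scripts write finished pages into a staging directory, that the public document root is a symbolic link, and that it runs on a POSIX system, where renaming a new symlink over an old one is atomic. The &lt;code&gt;publish&lt;/code&gt; function and the directory layout are hypothetical.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
import os
import shutil
import tempfile
from pathlib import Path


def publish(staging: Path, releases: Path, live_link: Path) -> Path:
    """Snapshot the staging area into a new release directory, then
    repoint live_link (the hypothetical public document root symlink) at it."""
    releases.mkdir(parents=True, exist_ok=True)
    # A fresh, uniquely named directory holds this snapshot of the site.
    release = Path(tempfile.mkdtemp(prefix="release-", dir=releases)).resolve()
    # mkdtemp already created the directory, so copy into it (Python 3.8+).
    shutil.copytree(staging, release, dirs_exist_ok=True)
    # mkdtemp makes the directory private; the web server needs to read it.
    release.chmod(0o755)

    # Build the new symlink under a temporary name first...
    tmp_link = live_link.with_name(live_link.name + ".tmp")
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    tmp_link.symlink_to(release, target_is_directory=True)
    # ...then rename it over the old link. On POSIX the rename is atomic, so
    # visitors see either the old site or the new one, never a half-copied mix.
    os.replace(tmp_link, live_link)
    return release
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Editing files in the staging directory leaves the live site untouched until &lt;code&gt;publish&lt;/code&gt; runs again. Old release directories pile up in this sketch, so a real setup would prune them, and it assumes the web server is happy to serve from a symlinked document root.&lt;/p&gt;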

&lt;p&gt;Static content generation certainly isn't appropriate for every project, but for plain content sites that don't need dynamic features, it's a much more viable option than many people think.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2003/12/13/staticContentGeneration</link>
  <dc:subject>Content Management</dc:subject>
  <dc:date>2003-12-13T01:10:55-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2003/12/05/simple">
  <title>Simpler content management</title>
  <description>&lt;p&gt;&lt;a href=&quot;http://www.smh.com.au/articles/2003/12/01/1070127346271.html&quot;&gt;Perls of wisdom in a sea of site mismanagement&lt;/a&gt;, via the ever-excellent &lt;a href=&quot;http://www.steptwo.com.au/columntwo/archives/001013.html#001013&quot; title=&quot;Perls of wisdom in a sea of site mismanagement&quot;&gt;Column Two&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote cite=&quot;http://www.smh.com.au/articles/2003/12/01/1070127346271.html&quot;&gt;
&lt;p&gt;The great surprise of the past five years of content management is that, despite all the hundreds of systems, no clear winners have emerged. Instead, there's a growing dissatisfaction with the ongoing technical burden that such systems impose.&lt;/p&gt;

&lt;p&gt;Some influential voices are starting to argue that many sites should, in effect, wait out this immature phase of website management. For the moment, they should content themselves with limited automation.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The article concludes with the idea that many sites can do perfectly well with a few simple Perl scripts and maybe a relational database on the back end, rather than investing in an expensive super-package that claims to be able to do anything you could possibly want. This is very sound advice. The simple fact of the matter is that many sites really don't need a complex content management platform with support for templating, user logins, workflow, versioning and a dozen other high-end features. Most sites just need someone to be able to update them easily when necessary. This is why Macromedia Contribute has been such a success - people want the ability to hit &quot;Edit This Page&quot;, make a few changes and publish straight to their site.&lt;/p&gt;

&lt;p&gt;I've worked on my fair share of content management systems (in fact I'm helping develop one at the moment) and out of all of the ones I've been involved in, the one I got the biggest kick out of took the shortest time to develop. It was based on &lt;a href=&quot;http://tavi.sourceforge.net/&quot;&gt;Tavi Wiki&lt;/a&gt;, and consisted of a password protected Tavi install for the back end and a slightly modified separate install for the front end. Both installs pointed to the same database, but the front end was altered to disable all editing features and make the site look less like a Wiki. You can see the end result &lt;a href=&quot;http://simon.incutio.com/uni/scheme/&quot; title=&quot;Home Page - Scheme&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;All in all, the &lt;acronym title=&quot;Content Management System&quot;&gt;CMS&lt;/acronym&gt; took less than an hour to put together from start to finish. It made it easy enough for contributors with no previous knowledge of HTML to update the site (using Wiki markup) and provided us with full versioning on all content contained within the project. The final site gives very few clues that the underlying engine is a Wiki, and thanks to Tavi's ease of customisation the site design can be easily changed to look even less wiki-like. It's close to the simplest thing that could possibly work and it works just fine.&lt;/p&gt;

&lt;p&gt;Of course, if you don't have a competent server-side programmer to hand, your only option is to buy a pre-made solution, but with a half-decent programmer and a good set of tools, a simple home-built &lt;acronym title=&quot;Content Management System&quot;&gt;CMS&lt;/acronym&gt; customised to fit your needs could be a much better investment than some $100,000 one-size-fits-all monstrosity.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2003/12/05/simple</link>
  <dc:subject>PHP, Content Management</dc:subject>
  <dc:date>2003-12-05T01:36:28-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2003/10/30/nvu">
  <title>Nvu</title>
  <description>&lt;p&gt;Launched today by &lt;a href=&quot;http://www.lindows.com/&quot;&gt;Lindows&lt;/a&gt;, &lt;a href=&quot;http://www.nvu.com/&quot;&gt;Nvu&lt;/a&gt; is a new project to develop a complete &quot;web authoring system&quot; (aka Dreamweaver/Frontpage style &lt;acronym title=&quot;What You See Is What You Get&quot;&gt;WYSIWYG&lt;/acronym&gt; editor) for the Linux platform. Reading past the marketing hyperbole, what it actually &lt;em&gt;is&lt;/em&gt; is a standalone version of Mozilla's Composer with a whole bunch of improvements and extra features, scheduled for release in early 2004.&lt;/p&gt;

&lt;p&gt;The really good news is that Lindows have hired &lt;a href=&quot;http://webperso.easyconnect.fr/danielglazman/weblog/newarchive/2003_10_26_glazblogarc.html#s106745691981265156&quot; title=&quot;More news about Lindows.com and Composer&quot;&gt;Daniel Glazman&lt;/a&gt; to work on the project full time. Daniel is the principal maintainer of Mozilla Composer and it's great to see his work being funded in this way. Improvements to Composer made for Nvu seem certain to make it back into the main Mozilla trunk.&lt;/p&gt;

&lt;p&gt;The not-so-good news is that the Nvu site and &lt;a href=&quot;http://www.nvu.com/faq.html&quot;&gt;FAQ&lt;/a&gt; make no mention of a Windows version of the software. Since development is being sponsored by Lindows this shouldn't come as a surprise, but it is disappointing because the great strength of the Mozilla framework is that it supports cross-platform development.&lt;/p&gt;

&lt;p&gt;It's also going to be interesting to see how this pans out from a web standards point of view. On the one hand, Daniel is certainly on the side of web standards. On the other hand, the current Nvu site's markup seems to come right out of 1998. I very much doubt it was created with the tool it promotes but it doesn't give off a particularly reassuring message. A free &lt;acronym title=&quot;What You See Is What You Get&quot;&gt;WYSIWYG&lt;/acronym&gt; &lt;acronym title=&quot;HyperText Markup Language&quot;&gt;HTML&lt;/acronym&gt; editor with strong standards support out of the box would be a valuable tool in helping authors create standards compliant sites. As it is, Dreamweaver is getting close but Dreamweaver also costs nearly $400.&lt;/p&gt;

&lt;p&gt;Wild conjecturing apart, the most interesting thing about Nvu (the name's pretty terrible but at least they got a three letter domain name) may well be the features that are added to the current Composer code base. The &lt;a href=&quot;http://www.nvu.com/screenshots.html&quot;&gt;screenshots&lt;/a&gt; already hint at intelligent FTP publishing support; what would be &lt;em&gt;really&lt;/em&gt; interesting would be the integration of a simple templating system and/or client side &lt;acronym title=&quot;Content Management System&quot;&gt;CMS&lt;/acronym&gt; capabilities - something like a cross between Fog Creek's &lt;a href=&quot;http://www.fogcreek.com/CityDesk/&quot;&gt;CityDesk&lt;/a&gt; and Macromedia's &lt;a href=&quot;http://www.macromedia.com/software/contribute/&quot;&gt;Contribute&lt;/a&gt;. Interested developers are invited to &lt;a href=&quot;http://www.nvu.com/developers.php&quot;&gt;sign up for a mailing list&lt;/a&gt; for notification of when the first development builds are made available. This is going to be an interesting project to watch.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2003/10/30/nvu</link>
  <dc:subject>Web Standards, Content Management</dc:subject>
  <dc:date>2003-10-30T04:46:03-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2003/09/15/newBlog">
  <title>New content management blog</title>
  <description>&lt;p&gt;&lt;a href=&quot;http://www.nmpub.com/blog/&quot;&gt;Ideas in Technology and Publishing&lt;/a&gt; is a great new blog covering content management, &lt;acronym title=&quot;eXtensible Markup Language&quot;&gt;XML&lt;/acronym&gt; and other publishing related technologies. It's less than a month old so it's still possible to read through the archives in full, which I've just done and recommend to anyone with an interest in content management.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2003/09/15/newBlog</link>
  <dc:subject>XML, Content Management</dc:subject>
  <dc:date>2003-09-15T11:06:25-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2003/08/06/cleanURLtip">
  <title>Neat tip for clean URLs</title>
  <description>&lt;p&gt;Here's one of the neatest tips for clean URLs I've seen yet, &lt;a href=&quot;http://www.vandervossen.net/2003/07/clean_url&quot; title=&quot;Clean URLs&quot;&gt;from Thijs van der Vossen&lt;/a&gt;. He's come up with a mod_rewrite rule that checks whether a file exists at the requested path with .html added to the end and, if it does, serves that file instead, so a request for /about can be answered by about.html without the extension ever appearing in the URL. I'm posting the full code snippet here because it's just too good to risk losing to link-rot in the distant future:&lt;/p&gt;

&lt;blockquote cite=&quot;http://www.vandervossen.net/2003/07/clean_url&quot;&gt;&lt;pre&gt;&lt;code&gt;RewriteEngine on 
RewriteBase /
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule (.*) $1\.html [L] 
&lt;/code&gt;&lt;/pre&gt;&lt;/blockquote&gt;</description>
  <link>http://simon.incutio.com/archive/2003/08/06/cleanURLtip</link>
  <dc:subject>Content Management</dc:subject>
  <dc:date>2003-08-06T20:18:04-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2003/08/03/futureProotContent">
  <title>XHTML for future-proof content</title>
  <description>&lt;p&gt;Don Park &lt;a href=&quot;http://www.docuverse.com/blog/donpark/2003/07/31.html#a773&quot;&gt;questions&lt;/a&gt; the benefits of emitting &lt;acronym title=&quot;eXtensible HyperText Markup Language&quot;&gt;XHTML&lt;/acronym&gt;. In one sense, Don is right; publishing a whole site using &lt;acronym title=&quot;eXtensible HyperText Markup Language&quot;&gt;XHTML&lt;/acronym&gt; in this day and age brings very little benefit and can cause a great deal of grief. But just because &lt;acronym title=&quot;eXtensible HyperText Markup Language&quot;&gt;XHTML&lt;/acronym&gt; doesn't provide advantages when publishing whole sites does not mean it should be written off entirely. As I've said on this blog many times before, &lt;acronym title=&quot;eXtensible HyperText Markup Language&quot;&gt;XHTML&lt;/acronym&gt; offers an excellent format for future-proofing site content, especially chunks of content kept in a database. Keith D. Robinson makes some excellent points along the same lines in his latest essay, &lt;a href=&quot;http://www.7nights.com/asterisk/archives/standards_semantic_markup_distributed_authorship_and_knowledge_management.php&quot;&gt;Standards, Semantic Markup, Distributed Authorship and Knowledge Management&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote cite=&quot;http://www.7nights.com/asterisk/archives/standards_semantic_markup_distributed_authorship_and_knowledge_management.php&quot;&gt;&lt;p&gt;
XHTML is, at it's most basic, much simpler and easier to learn that traditional HTML 4.0.  With a simple style guide, standard markup and CSS styles you can accomplish almost all the formatting a content author would need, just by knowing a handful of markup tags.  Instead of trusting the CMS to sort out code from Word, for example, you can hand a content owner a cheat sheet with the basic tags outlined and trust that they can code their own content.  I mean, really, how hard is is to learn 10 or so tags?  Team this technique with a tool like Contribute and you've got a nice, simple and cheap process that, while doesn't store you content in a database, keeps it in a clean, standard form you can repurpose down the road.
&lt;/p&gt;&lt;/blockquote&gt;
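
&lt;p&gt;Keith's cheat-sheet approach pairs naturally with a server-side check on whatever authors submit. As a rough illustration, here is a Python sketch built on the standard library's &lt;code&gt;html.parser&lt;/code&gt;. It only checks that tags come from a small whitelist and are properly nested and closed; it is not full XHTML validation, and the tag list and the &lt;code&gt;check_fragment&lt;/code&gt; function are hypothetical, not what this site's comment system actually uses.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
from html.parser import HTMLParser
from typing import List

# Hypothetical "cheat sheet": the handful of tags content authors may use.
ALLOWED_TAGS = {"p", "em", "strong", "a", "code", "blockquote", "ul", "ol", "li", "br"}


class FragmentChecker(HTMLParser):
    """Collects problems in a fragment: unknown tags, bad nesting, unclosed tags."""

    def __init__(self) -> None:
        super().__init__()
        self.open_tags: List[str] = []
        self.errors: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            self.errors.append(f"disallowed tag: {tag}")
        self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags (a br written with a trailing slash) open nothing.
        if tag not in ALLOWED_TAGS:
            self.errors.append(f"disallowed tag: {tag}")

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.errors.append(f"unexpected closing tag: {tag}")


def check_fragment(fragment: str) -> List[str]:
    """Return a list of problems; an empty list means the fragment passed."""
    checker = FragmentChecker()
    checker.feed(fragment)
    checker.close()
    # Anything still open at the end was never closed, innermost first.
    checker.errors.extend(f"unclosed tag: {t}" for t in reversed(checker.open_tags))
    return checker.errors
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because only self-closing tags skip the stack, a bare br without the trailing slash is reported as unclosed, which matches XHTML's stricter rules.&lt;/p&gt;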

&lt;p&gt;As for ensuring entered &lt;acronym title=&quot;eXtensible HyperText Markup Language&quot;&gt;XHTML&lt;/acronym&gt; is valid, I think this site's comment system does a pretty good job of showing how that can be achieved with only a small amount of server side effort.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2003/08/03/futureProotContent</link>
  <dc:subject>[X]HTML and CSS, Content Management</dc:subject>
  <dc:date>2003-08-03T21:39:09-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2003/07/09/textile2">
  <title>Textile 2</title>
  <description>&lt;p&gt;Textile 2 is now &lt;a href=&quot;http://www.textism.com/tools/textile/&quot; title=&quot;Textism: Textile&quot;&gt;available for testing&lt;/a&gt;, courtesy of Dean Allen. Textile is one of the more popular structured-text style markup languages, which convert simple plain-text markup into &lt;acronym title=&quot;HyperText Markup Language&quot;&gt;HTML&lt;/acronym&gt;. What made the original Textile special was that it concentrated squarely on structural markup, providing intuitive shortcuts for most structural &lt;acronym title=&quot;eXtensible HyperText Markup Language&quot;&gt;XHTML&lt;/acronym&gt; elements. Textile 2 takes this further, but also introduces a number of presentational effects such as block alignment. Beta &lt;acronym title=&quot;PHP: Hypertext Preprocessor&quot;&gt;PHP&lt;/acronym&gt; code is available in addition to the demo.&lt;/p&gt;

&lt;p&gt;That said, the killer feature of Textile 2 in my opinion is the fact that it has been developed as a collaborative effort between Dean Allen and Brad Choate. Brad created a Perl port of the original Textile (Mark Pilgrim &lt;a href=&quot;http://diveintomark.org/projects/pytextile/&quot; title=&quot;PyTextile&quot;&gt;did the same thing&lt;/a&gt; for Python), but this time round they have been &lt;a href=&quot;http://www.bradchoate.com/past/001653.php&quot; title=&quot;Thither MT-Textile 2 (beta)&quot;&gt;working together&lt;/a&gt; to define the format. If they are successful, Textile could become a useful mini-standard for authoring structural markup. At any rate, since Textile is intended to be a shorthand technique to complement &lt;acronym title=&quot;HyperText Markup Language&quot;&gt;HTML&lt;/acronym&gt; rather than replace it, it is well worth a look.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2003/07/09/textile2</link>
  <dc:subject>Content Management</dc:subject>
  <dc:date>2003-07-09T00:08:04-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2003/07/02/timeline">
  <title>Knowledge Representation Timeline</title>
  <description>&lt;p&gt;This is pretty impressive: A &lt;a href=&quot;http://www.robotwisdom.com/ai/timeline/0000.html&quot;&gt;Timeline of knowledge-representation&lt;/a&gt; that starts at the dawn of the Universe and continues through the whole of human history right up to the present day.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2003/07/02/timeline</link>
  <dc:subject>Content Management</dc:subject>
  <dc:date>2003-07-02T16:27:00-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2003/04/25/siteSearch">
  <title>Site search finally available</title>
  <description>&lt;p&gt;I've finally got around to adding a &lt;a href=&quot;/search&quot;&gt;search page&lt;/a&gt; to this site. It uses MySQL's &lt;a href=&quot;http://www.mysql.com/doc/en/Fulltext_Search.html&quot;&gt;full text indexing&lt;/a&gt;, which is extremely fast and provides good results but comes at the expense of flexibility. Search terms shorter than four letters are ignored, and multi-word searches are handled using OR rather than AND. This nearly put me off using it, but the relevancy algorithm is excellent, which I think outweighs the disadvantage of not being able to use pure AND queries.&lt;/p&gt;
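
&lt;p&gt;To make those two restrictions concrete, here is a toy Python model of them: words shorter than four characters are dropped from the query, and a post matches if it contains &lt;em&gt;any&lt;/em&gt; of the remaining words. It is only a sketch. MySQL's real relevance ranking is far more sophisticated than the crude word counting used here, and MySQL also skips stopwords and words found in half or more of the rows, which this model doesn't attempt. The function names are made up for illustration.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
import re
from typing import Dict, List, Tuple

MIN_WORD_LEN = 4  # mirrors MySQL's default ft_min_word_len


def index_terms(text: str) -> List[str]:
    """Lower-case words, ignoring any shorter than MIN_WORD_LEN."""
    return [w for w in re.findall(r"\w+", text.lower()) if len(w) >= MIN_WORD_LEN]


def or_search(query: str, docs: Dict[str, str]) -> List[Tuple[str, int]]:
    """Return (doc_id, score) pairs for documents containing ANY query term,
    best first. The score is a crude count of matching words, not MySQL's formula."""
    wanted = set(index_terms(query))
    results = []
    for doc_id, body in docs.items():
        score = sum(1 for w in index_terms(body) if w in wanted)
        if score:  # one matching term is enough: OR, not AND
            results.append((doc_id, score))
    # Highest score first; ties broken by id so the order is stable.
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return results
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A query like &quot;php python&quot; behaves exactly like &quot;python&quot; here, because the three-letter word never reaches the index.&lt;/p&gt;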

&lt;p&gt;MySQL 4.0 introduces far more powerful boolean mode full text searches which allow all kinds of modifiers and extra syntax, but this site currently runs on 3.23.54 so I can't play with those just yet. Jeremy Zawodny's &lt;a href=&quot;http://www.linux-mag.com/2003-01/mysql_03.html&quot; title=&quot;http://www.linux-mag.com/2003-01/mysql_03.html&quot;&gt;article on MySQL 4&lt;/a&gt; explains boolean mode and describes many other exciting new MySQL features as well.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2003/04/25/siteSearch</link>
  <dc:subject>Content Management, Search Engines</dc:subject>
  <dc:date>2003-04-25T16:55:02-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2003/03/18/phpAndJavascriptSpellChecker">
  <title>PHP and Javascript spell checker</title>
  <description>&lt;p&gt;Last week I &lt;a href=&quot;http://simon.incutio.com/archive/2003/03/08/#spellCheckInWebApplications&quot;&gt;commented&lt;/a&gt; that &lt;a href=&quot;http://www.intertwingly.net/blog/1247.html&quot;&gt;Sam Ruby's spell checking feature&lt;/a&gt; could be made even funkier with the addition of a javascript powered &quot;corrections&quot; menu. I spent a few hours this afternoon playing with the idea, and I've now got quite a nice proof of concept:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://simon.incutio.com/demos/spellcheck/&quot;&gt;Spell Checker Demo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I've tested it on Phoenix and &lt;acronym title=&quot;Internet Explorer&quot;&gt;IE&lt;/acronym&gt;5 on Windows - I'll check it on &lt;acronym title=&quot;Internet Explorer&quot;&gt;IE&lt;/acronym&gt;6 later on this evening. &lt;acronym title=&quot;Internet Explorer&quot;&gt;IE&lt;/acronym&gt;5 gets the menus in the wrong place but other than that it seems to work fine in both browsers. I adapted Sam's &lt;a href=&quot;http://www.intertwingly.net/code/spellcheck.py&quot; title=&quot;spellcheck.py&quot;&gt;Python code&lt;/a&gt; for &lt;acronym title=&quot;PHP: Hypertext Preprocessor&quot;&gt;PHP&lt;/acronym&gt; on the server side, while the client side bit is a whole lot of messing around with the &lt;acronym title=&quot;Document Object Model&quot;&gt;DOM&lt;/acronym&gt;.&lt;/p&gt;
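
&lt;p&gt;The server-side half boils down to finding each unknown word and proposing near matches. The sketch below is not Sam's code or the &lt;acronym title=&quot;PHP: Hypertext Preprocessor&quot;&gt;PHP&lt;/acronym&gt; port; as a stand-in for a proper spelling engine such as Pspell, it uses Python's standard &lt;code&gt;difflib.get_close_matches&lt;/code&gt; against a caller-supplied word list. The &lt;code&gt;check_spelling&lt;/code&gt; name is hypothetical. It returns each problem word's offset so the client side can locate it again.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
import difflib
import re
from typing import Iterable, List, Tuple


def check_spelling(text: str, dictionary: Iterable[str],
                   max_suggestions: int = 3) -> List[Tuple[int, str, List[str]]]:
    """Return (offset, word, suggestions) for each word not found in dictionary."""
    known = {w.lower() for w in dictionary}
    report = []
    for match in re.finditer(r"[A-Za-z']+", text):
        word = match.group()
        if word.lower() in known:
            continue
        # Closest dictionary entries by difflib's similarity ratio (default cutoff 0.6).
        suggestions = difflib.get_close_matches(word.lower(), known, n=max_suggestions)
        report.append((match.start(), word, suggestions))
    return report
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Each tuple carries everything the client needs to mark up the word and build its corrections menu.&lt;/p&gt;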

&lt;p&gt;If you want to nose around the source code, take a look at this lot:&lt;/p&gt;

&lt;ul&gt;
 &lt;li&gt;&lt;a href=&quot;/demos/spellcheck/SpellChecker.class.php.txt&quot;&gt;SpellChecker.class.php&lt;/a&gt;&lt;/li&gt;
 &lt;li&gt;&lt;a href=&quot;/demos/spellcheck/index.php.txt&quot;&gt;index.php&lt;/a&gt;&lt;/li&gt;
 &lt;li&gt;&lt;a href=&quot;/demos/spellcheck/speling.js&quot;&gt;speling.js&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are still a few bugs, and I haven't quite worked out an elegant way to get the menus to behave more like menus, but on the whole it's worked out pretty well.&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2003/03/18/phpAndJavascriptSpellChecker</link>
  <dc:subject>PHP, DHTML and Javascript, Content Management</dc:subject>
  <dc:date>2003-03-18T21:50:40-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2003/03/12/moreNukes">
  <title>More nukes</title>
  <description>&lt;p&gt;[PHP|Post|myPHP]-Nuke has to be one of the most-forked open source projects in history! &lt;a href=&quot;http://www.xaraya.org/&quot;&gt;Xaraya&lt;/a&gt; appears to be a fork from &lt;a href=&quot;http://www.postnuke.com/&quot;&gt;Post-Nuke&lt;/a&gt;, which itself forked from &lt;a href=&quot;http://php-nuke.org/&quot;&gt;PHP-Nuke&lt;/a&gt; several years ago (and I'm pretty sure there are more). They've got &lt;a href=&quot;http://docs.xaraya.com/docs/rfcs/rfcindex.html&quot; title=&quot;&quot;&gt;an interesting set of RFCs&lt;/a&gt; on how they intend to build the next big open source content / community management system (nothing about generating pretty URLs yet). While browsing their site I found a link to &lt;a href=&quot;http://phpxref.sourceforge.net/&quot;&gt;PHPXref&lt;/a&gt;, a powerful looking tool for generating &lt;acronym title=&quot;PHP: Hypertext Preprocessor&quot;&gt;PHP&lt;/acronym&gt; source code documentation. Unsurprisingly for such a lot of text munging, it's written in Perl ;)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; I obviously wasn't paying attention: &lt;a href=&quot;http://docs.xaraya.com/docs/rfcs/rfc0023.html&quot;&gt;RFC 0023: Short URL Support&lt;/a&gt;&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2003/03/12/moreNukes</link>
  <dc:subject>Open Source, Content Management</dc:subject>
  <dc:date>2003-03-12T23:55:54-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2003/03/08/spellCheckInWebApplications">
  <title>Spell check in web applications</title>
  <description>&lt;p&gt;Sam Ruby has &lt;a href=&quot;http://www.intertwingly.net/blog/1247.html&quot; title=&quot;Preview now with spellcheck&quot;&gt;enabled spell checking&lt;/a&gt; for the preview comment tool on his blog. I wonder how it works... I've lost track of the scripting language Sam uses for Intertwingly (&lt;acronym title=&quot;PHP: Hypertext Preprocessor&quot;&gt;PHP&lt;/acronym&gt;? Python? Perl?) but I know &lt;acronym title=&quot;PHP: Hypertext Preprocessor&quot;&gt;PHP&lt;/acronym&gt; can be compiled with support for the &lt;a href=&quot;http://www.php.net/manual/en/ref.pspell.php&quot;&gt;Pspell&lt;/a&gt; module.&lt;/p&gt;

&lt;p&gt;Sam's user interface is pretty neat - misspelled words are marked up with a span, underlined in dashed red and have suggested spellings listed in the span's title attribute. Theoretically, it should be possible to build a Javascript right-click menu offering alternatives instead (preferably dynamically generated from the list of words in the title attribute using the &lt;acronym title=&quot;Document Object Model&quot;&gt;DOM&lt;/acronym&gt;). Actually modifying the preview textarea text based on the menu selection would be quite a lot harder - it could be done with a simple search-and-replace operation, but doing so might change other words with the same &quot;incorrect&quot; spelling without the user realising.&lt;/p&gt;
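
&lt;p&gt;One way around the search-and-replace problem is to replace by position rather than by value. A minimal Python sketch of that idea, assuming the menu can report which occurrence of the word was clicked (the &lt;code&gt;index&lt;/code&gt; argument is a hypothetical detail of such a menu): only that occurrence is rewritten, so other identical misspellings are left alone.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
import re


def word_offsets(text: str, word: str) -> list[int]:
    """Return the start offset of every whole-word occurrence of word."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return [match.start() for match in re.finditer(pattern, text)]


def replace_occurrence(text: str, word: str, index: int, suggestion: str) -> str:
    """Replace only the index-th whole-word occurrence of word with suggestion."""
    offsets = word_offsets(text, word)
    if index not in range(len(offsets)):
        raise IndexError(f"occurrence {index} of {word!r} not found")
    start = offsets[index]
    end = start + len(word)
    # Splice the suggestion in at that one position; the rest of the text is untouched.
    return text[:start] + suggestion + text[end:]
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For example, &lt;code&gt;replace_occurrence(&quot;teh cat and teh dog&quot;, &quot;teh&quot;, 1, &quot;the&quot;)&lt;/code&gt; returns &lt;code&gt;&quot;teh cat and the dog&quot;&lt;/code&gt;.&lt;/p&gt;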

&lt;p&gt;It would be fun to integrate something like this with a rich text editor, such as the recently announced &lt;a href=&quot;http://www.interactivetools.com/staff/ben/htmlarea3_demo/example.html&quot;&gt;htmlArea 3.0&lt;/a&gt;, which works with Mozilla 1.3b as well as &lt;acronym title=&quot;Internet Explorer&quot;&gt;IE&lt;/acronym&gt; (more information &lt;a href=&quot;http://www.interactivetools.com/iforum/Open_Source_C3/htmlArea_v3.0_-_Alpha_Release_F14/htmlArea_3%3A_Alpha_release_P7101/&quot; title=&quot;htmlArea 3: Alpha release&quot;&gt;here&lt;/a&gt;).&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2003/03/08/spellCheckInWebApplications</link>
  <dc:subject>Mozilla, DHTML and Javascript, Content Management</dc:subject>
  <dc:date>2003-03-08T17:23:17-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2003/03/01/vectorSearchEngines">
  <title>Vector search engines</title>
  <description>&lt;p&gt;&lt;a href=&quot;http://www.perl.com/pub/a/2003/02/19/engine.html&quot;&gt;Building a Vector Space Search Engine in Perl&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote cite=&quot;http://www.perl.com/pub/a/2003/02/19/engine.html&quot;&gt;
&lt;p&gt;Vector-space search engines use the notion of a &lt;strong&gt;term space&lt;/strong&gt;, where each document is represented as a vector in a high-dimensional space. There are as many dimensions as there are unique words in the entire collection. Because a document's position in the term space is determined by the words it contains, documents with many words in common end up close together, while documents with few shared words end up far apart.&lt;/p&gt;

&lt;p&gt;To search our collection, we project a query into this term space and calculate the distance from the query vector to all the document vectors in turn. Those documents that are within a certain threshold distance get added to our result set. If all this sounds like gobbledygook to you, then don't worry - it will become clearer when we write the code.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Having taken a course on Linear Algebra last term, I find it interesting to see how it can be applied to the search problem. The technique also lends itself well to finding &quot;similar documents&quot;, since documents with similar word content end up &quot;near&quot; each other when projected onto the vector space.&lt;/p&gt;
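
&lt;p&gt;The core of the technique fits in a few lines of Python. This is a minimal sketch, not the article's Perl code: it uses plain word counts as vector components (no stemming or stop-word removal), stores vectors sparsely as &lt;code&gt;Counter&lt;/code&gt; objects, and measures closeness with cosine similarity (higher means nearer), with a hypothetical default threshold of 0.1.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """One dimension per distinct lower-cased word; the component is its count."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term vectors (1.0 means same direction)."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is not close to anything
    return dot / (norm_a * norm_b)


def search(query: str, documents: dict[str, str], threshold: float = 0.1) -> list[tuple[str, float]]:
    """Project the query into the term space and keep documents scoring at least threshold."""
    q = term_vector(query)
    scored = [(name, cosine(q, term_vector(text))) for name, text in documents.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)  # most similar first
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because &lt;code&gt;cosine&lt;/code&gt; accepts any two term vectors, calling it on two documents rather than a query and a document gives the kind of &quot;similar documents&quot; score described above.&lt;/p&gt;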

&lt;p&gt;The article is also yet another demonstration of how Perl's modules make it such a powerful tool. Lingua::Stem is used to find word &quot;stems&quot;, providing a ready-made algorithm for conflating related words like cat and cats. The performance overhead of using Perl arrays to represent large vectors is avoided with the &lt;acronym title=&quot;Perl Data Language&quot;&gt;PDL&lt;/acronym&gt; module, which implements a whole set of matrix algebra functions in compiled C for high performance. Without these two modules the technique described would be a great deal less powerful. Of course, neither of them is available for &lt;acronym title=&quot;PHP: Hypertext Preprocessor&quot;&gt;PHP&lt;/acronym&gt; or Python, my scripting languages of choice.&lt;/p&gt;
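
&lt;p&gt;To see why stemming matters to the term space, here is a deliberately naive suffix stripper in Python. It is a stand-in for illustration only, not the algorithm Lingua::Stem uses; the suffix list and the three-letter minimum are arbitrary assumptions. Once &lt;em&gt;cats&lt;/em&gt; is reduced to &lt;em&gt;cat&lt;/em&gt;, both words count towards the same dimension.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
import re
from collections import Counter

# Hypothetical, deliberately tiny suffix list; real stemmers apply many more rules.
SUFFIXES = ("ing", "es", "ed", "s")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three letters of the word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stemmed_vector(text: str) -> Counter:
    """Term vector whose dimensions are stems rather than raw words."""
    return Counter(naive_stem(w) for w in re.findall(r"[a-z]+", text.lower()))
```
&lt;/code&gt;&lt;/pre&gt;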
</description>
  <link>http://simon.incutio.com/archive/2003/03/01/vectorSearchEngines</link>
  <dc:subject>Content Management, Search Engines</dc:subject>
  <dc:date>2003-03-01T13:07:18-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>
<item rdf:about="http://simon.incutio.com/archive/2003/02/23/safeHtmlChecker">
  <title>Safe HTML checker</title>
  <description>&lt;p&gt;I've finally enabled a subset of &lt;acronym title=&quot;HyperText Markup Language&quot;&gt;HTML&lt;/acronym&gt; in my comments. In doing so, I had several requirements that needed to be fulfilled:&lt;/p&gt;

&lt;ol&gt;
 &lt;li&gt;Entered markup must be valid to &lt;acronym title=&quot;eXtensible HyperText Markup Language&quot;&gt;XHTML&lt;/acronym&gt; strict, to stop comments from breaking validation and keep things nice and tidy.&lt;/li&gt;
 &lt;li&gt;No presentational markup! I want to maintain control over how things look via my stylesheets - comments posted should only be able to use structural &lt;acronym title=&quot;HyperText Markup Language&quot;&gt;HTML&lt;/acronym&gt; elements.&lt;/li&gt;
 &lt;li&gt;Attributes should be restricted to those that add semantic meaning. Javascript event attributes and &lt;acronym title=&quot;Cascading Style Sheets&quot;&gt;CSS&lt;/acronym&gt;-related attributes should not be allowed.&lt;/li&gt;
 &lt;li&gt;I should retain full control over the tags and attributes allowed in the comments.&lt;/li&gt;
 &lt;li&gt;Submitted &lt;acronym title=&quot;HyperText Markup Language&quot;&gt;HTML&lt;/acronym&gt; must be kept free from anything that could pose a security risk, such as &lt;code&gt;javascript:&lt;/code&gt; &lt;acronym title=&quot;Uniform Resource Locators&quot;&gt;URL&lt;/acronym&gt;s.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The system I have implemented works by running submitted posts through an &lt;acronym title=&quot;eXtensible Markup Language&quot;&gt;XML&lt;/acronym&gt; parser, which checks that each element is in my list of allowed elements, is nested correctly (you can't put a &lt;code&gt;blockquote&lt;/code&gt; inside a &lt;code&gt;p&lt;/code&gt;, for example) and doesn't have any illegal attributes. My initial tests have shown it to work pretty well, but if anyone wants to have a go at breaking it, please be my guest.&lt;/p&gt;
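
&lt;p&gt;The same whitelist approach can be sketched in Python with the standard library's expat bindings. This is a minimal illustration, not the SafeHtmlChecker class itself: the element and attribute whitelist in &lt;code&gt;RULES&lt;/code&gt;, the wrapper element name, and the decision to accept only absolute http, https and mailto URLs (which also rejects relative links) are all assumptions. Each start tag is checked against its parent on a stack, so disallowed elements, nesting, attributes and URL schemes raise an error, while malformed markup makes the parser itself fail.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
import xml.parsers.expat

# Hypothetical whitelist: element -> (elements allowed directly inside it, allowed attributes).
INLINE = {"a", "em", "strong", "code", "acronym"}
RULES = {
    "#document": ({"comment"}, set()),  # parser-level parent of the wrapper element
    "comment": ({"p", "blockquote", "ul", "ol"}, set()),  # the wrapper itself
    "p": (INLINE, set()),
    "blockquote": ({"p"}, {"cite"}),
    "ul": ({"li"}, set()),
    "ol": ({"li"}, set()),
    "li": (INLINE, set()),
    "a": (INLINE - {"a"}, {"href", "title"}),
    "em": (INLINE - {"em"}, set()),
    "strong": (INLINE - {"strong"}, set()),
    "code": (set(), set()),
    "acronym": (set(), {"title"}),
}
URL_ATTRIBUTES = {"href", "cite"}
SAFE_SCHEMES = ("http:", "https:", "mailto:")
# "\x3c" is the less-than sign; the wrapper lets several top-level paragraphs parse as one document.
WRAP_OPEN, WRAP_CLOSE = "\x3ccomment>", "\x3c/comment>"


class UnsafeHtml(ValueError):
    """Raised when submitted markup breaks a whitelist rule or is not well-formed."""


def check_comment(fragment: str) -> None:
    stack = ["#document"]  # open elements, innermost last
    parser = xml.parsers.expat.ParserCreate()

    def start(name: str, attrs: dict[str, str]) -> None:
        if name not in RULES:
            raise UnsafeHtml(f"element not allowed: {name}")
        if name not in RULES[stack[-1]][0]:
            raise UnsafeHtml(f"{name} is not allowed inside {stack[-1]}")
        for attr, value in attrs.items():
            if attr not in RULES[name][1]:
                raise UnsafeHtml(f"attribute not allowed: {attr} on {name}")
            # Whitelist URL schemes rather than blacklisting javascript: and friends.
            if attr in URL_ATTRIBUTES and not value.strip().lower().startswith(SAFE_SCHEMES):
                raise UnsafeHtml(f"unsafe URL: {value}")
        stack.append(name)

    def end(name: str) -> None:
        stack.pop()  # expat has already verified that start and end tags match

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    try:
        parser.Parse(WRAP_OPEN + fragment + WRAP_CLOSE, True)
    except xml.parsers.expat.ExpatError as err:
        raise UnsafeHtml(f"not well-formed: {err}") from err


def is_safe(fragment: str) -> bool:
    try:
        check_comment(fragment)
    except UnsafeHtml:
        return False
    return True
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because every check is driven by &lt;code&gt;RULES&lt;/code&gt;, allowing an extra element or attribute is a one-line change, in the spirit of the fourth requirement above.&lt;/p&gt;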

&lt;p&gt;The code for the main class is available here: &lt;a href=&quot;/code/php/SafeHtmlChecker.class.php.txt&quot;&gt;SafeHtmlChecker.class.php&lt;/a&gt;&lt;/p&gt;</description>
  <link>http://simon.incutio.com/archive/2003/02/23/safeHtmlChecker</link>
  <dc:subject>XML, Content Management</dc:subject>
  <dc:date>2003-02-23T15:04:37-00:00</dc:date>
  <dc:creator>Simon Willison</dc:creator>
</item>

</rdf:RDF>