Search Engines
Microdoc News have a poorly researched story suggesting that Google have been engineering their search results to favour their own properties:
It could be argued that the most important site that s...
Tim Bray's series On Search now has a table of contents page linking to each of the previous entries. The most recent article covers metadata, and includes some insightful commentary on the huge probl...
Slate: Digging for Googleholes:
Type in the make and model of a new DVD player, and you'll get dozens of online electronic stores in the top results, all of them eager to sell you the item. But y...
This is pretty cool: Scott's taken Nat's time-since function and added it to Feedster, giving a quick indication of how long ago an item was posted....
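A "time-since" helper of this kind can be sketched in a few lines of Python. This is only an illustration of the idea, not Nat's function or Feedster's code: the name `time_since`, the choice of units and the output wording are all assumptions.

```python
from datetime import datetime

def time_since(posted, now):
    """Describe how long ago `posted` was, relative to `now` (hypothetical helper)."""
    seconds = int((now - posted).total_seconds())
    if seconds < 0:
        return "in the future"
    # Largest unit first, so the coarsest non-zero unit is reported.
    units = [("day", 86400), ("hour", 3600), ("minute", 60)]
    for name, size in units:
        count = seconds // size
        if count:
            plural = "" if count == 1 else "s"
            return f"{count} {name}{plural} ago"
    return "just now"

if __name__ == "__main__":
    print(time_since(datetime(2003, 1, 1, 12, 0), datetime(2003, 1, 1, 15, 30)))
```

Reporting only the largest unit keeps the label short enough to sit next to each search result.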
I love it when bloggers stick to their word. The other day, while describing a quick Perl hack that really impressed a major client a few years ago, Tim Bray mentioned the following:
Then I turne...
Feedster finally supports AND as the default search operator. This is a very good thing. I've decided to leave this site's search engine as using OR, mainly because I feel for a small search set (appr...
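The difference between the two defaults is easy to show with a toy matcher. The sketch below is an assumption-laden illustration rather than how Feedster or this site's search works: `matches` is a hypothetical function that splits on whitespace and ignores case.

```python
def matches(query, text, operator="AND"):
    """Return True if `text` satisfies `query` under the given default operator."""
    terms = query.lower().split()
    words = set(text.lower().split())
    if not terms:
        return False
    if operator == "AND":
        # Every query term must appear in the document.
        return all(term in words for term in terms)
    if operator == "OR":
        # Any single query term is enough.
        return any(term in words for term in terms)
    raise ValueError(f"unknown operator: {operator}")

if __name__ == "__main__":
    doc = "A CSS layout tutorial"
    print(matches("css tables", doc, "AND"))
    print(matches("css tables", doc, "OR"))
```

AND narrows the result set with every extra term, while OR widens it.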
While browsing around my phoenix/ directory I spotted a sub-directory called searchplugins, which appears to control the list of search engines available in the very useful search box at the top right...
I've finally got around to adding a search page to this site. It uses MySQL's full text indexing, which is extremely fast and provides good results but comes at the expense of flexibility. Search term...
100 random AltaVista pictures is fascinating, if not guaranteed work-safe....
In all the fuss about Yahoo's new search interface over the past few days, the extensive use of CSS in the results pages was almost completely overlooked, probably because the page still contains a sm...
Unsurprisingly, the new Yahoo is generating a whole load of commentary. There's a good thread going on Signal vs. Noise, and ia/ has coverage as well. I've been playing with it a bit and it's definite...
New York Times: Yahoo Plans Improvements in Effort to Regain Lost Ground. I'm guessing this is what it's going to look like (via thelist)....
I'm finding myself slightly confused about the Google backlash washing around the blogosphere, which is summarised quite well by Gavin Sheridan. Most of the arguments against using Google unsurprising...
Scott Johnson has put together a blog search engine with a difference: it indexes RSS feeds rather than crawling the blogs themselves. Roogle is still under heavy development (and Scott is blogging it...
Building a Vector Space Search Engine in Perl:
Vector-space search engines use the notion of a term space, where each document is represented as a vector in a high-dimensional space. There are as...
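The core idea can be sketched briefly in Python. This is not the code from the Perl article: it assumes raw term counts as vector components, whitespace tokenising and cosine similarity for ranking, and the names `to_vector`, `cosine` and `search` are made up for the example.

```python
import math
from collections import Counter

def to_vector(text):
    """Map a text to a sparse term-count vector (term -> count)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def search(query, documents):
    """Return (score, doc_id) pairs with a non-zero score, best match first."""
    q = to_vector(query)
    scored = [(cosine(q, to_vector(text)), doc_id)
              for doc_id, text in documents.items()]
    return sorted((pair for pair in scored if pair[0] > 0), reverse=True)

if __name__ == "__main__":
    docs = {
        "a": "cat sat on the mat",
        "b": "dog chased the cat cat",
        "c": "fish swim",
    }
    for score, doc_id in search("cat", docs):
        print(doc_id, round(score, 3))
```

Documents that share no terms with the query score zero and are dropped. A fuller engine would typically weight terms and remove stop words before comparing vectors.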
Douglas Bowman provides some background to the new HotBot redesign, which uses CSS for layout and almost, but not quite, validates. It was all looking great until the HotBot Skins page told me I shou...
I've suspected this for ages, but finally it can be categorically announced that search engines just don't care about the meta keywords tag. The only major engine that still notices it is Inktom...
How the Wayback Machine Works is a must read for anyone geeky enough to be interested in cheap clustered databases on a huge scale. The interview includes some fascinating details on the cost effectiv...
Sam Buchanan: The Netscape Google mystery. A user complains of a non-functional web application, and when asked what browser they are using replies "Netscape Google". Sam suspects that this is because...
AlltheWeb.com introduced an innovative feature called Alchemist a while ago which allows visitors to customise the site by specifying the URL to their own style sheet. They have now announced a CSS de...
Christina Wodtke: Mind your phraseology!, a tutorial on controlled vocabularies. The concept is very similar to that used by TopicMaps - relationships are defined between terms that take into account...
The ODP require you to display an attribution on any page that reuses ODP data. The recommended attribution fails to validate as XHTML, so I created an XHTML-compliant alternative which looks visually...
Having looked at some of these tools for syndicating content from the ODP, it seems that the standard method is to grab and parse the actual HTML files from the site rather than grabbing the huge RDF ...
I've had my application for editorship of the DMOZ University of Bath Category accepted. Bath's main site has notoriously bad navigation, so hopefully I'll be able to use DMOZ to build an alternative....
The Register: Tiscali to launch Excite across Europe. From Excite.co.uk: We are back - and we have created the most complete search channel on the web for you. Take advantage of our collection of 2,100...