Archive

HotStuff Project

After killing off Hot Stuff due to a server upgrade, I find that I’m kinda missing it!

So, I’ve decided to have a second stab at the problem and this time the code is much cleaner and faster. In particular, I’m using Bloglines to handle fetching all of the feeds and then grabbing the new posts via the Bloglines API.

It’s too early for the code to start spotting new keywords and topics yet, so it’ll be early in the new year before it launches fully. In the meantime, feel free to check that your favourite library/librarian blogs are included in the list of sites I’m pulling content from: http://www.bloglines.com/public/liblogs.

Please post a comment with the URL of any blogs you’d like me to include!

I’m hoping to make the new code a little more visual, so expect to see things like these…

[images: final6_50_1, final_015]

[edit] HotStuff 2.0 is gradually appearing here: http://www.daveyp.com/hotstuff/

As one final “hurrah” from the Hot Stuff service, I thought it would be fun to put all of the data into Wordle. Every day, for the last 2 years or so, my code has saved away the top 100 words from all of the new blog posts from around 500 librarian blogs…

[image: hot001]
http://wordle.net/gallery/wrdl/109981/biblioblogosphere

…so, from all of this painstaking research we can clearly see that librarian bloggers love to talk about books! 😉
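For anyone curious how a daily "top 100 words" tally like the one described above might work, here is a minimal Python sketch. It is not the actual Hot Stuff code; the function names `top_words` and `merge_days`, and the very simple tokenising rule, are assumptions made purely for illustration.

```python
import re
from collections import Counter


def top_words(posts, n=100):
    """Return the n most common words across one day's posts.

    `posts` is a list of plain-text strings (hypothetical input format).
    """
    counts = Counter()
    for text in posts:
        # Lower-case the text and keep runs of letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts.most_common(n)


def merge_days(daily_tops):
    """Add up many days of (word, count) lists into one overall tally."""
    totals = Counter()
    for day in daily_tops:
        for word, count in day:
            totals[word] += count
    return totals
```

Feeding the merged totals into a word cloud tool would then give each word a size in proportion to its overall count.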

Apologies for the spam words that are currently appearing in the hot topics cloud.

It looks like the BlogJunction blog has been hacked: if you view the page source for the blog, you’ll find multiple hidden links to gambling sites (the links are currently being hosted by the Universitat Oberta de Catalunya, UOC).

I’ve removed BlogJunction from the list of sites used for the cloud, so the spam should disappear in the next 48 hours.

Apologies to anyone who’s picking up the “hot stuff” tag cloud feed of library/librarian weblogs — unfortunately one of the blog feeds that it aggregates has gone down with a nasty case of spam…

Rest assured that the Monty Python Vikings are currently rampaging their way through the blog feed database and masticating all of that lovely spam…

“Spam spam spam spam. Lovely spam! Wonderful spam! Spam spa-a-a-a-a-am spam spa-a-a-a-a-am spam. Lovely spam! Lovely spam! Lovely spam! Lovely spam! Lovely spam! Spam spam spam spam!”

This is a variation of the previous cloud which attempts to show which words have been used more frequently in the last couple of days compared to previous days.
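One way to compare recent usage against earlier usage is to divide a word's average daily count over the last couple of days by its average over the previous days. The Python sketch below does that with add-one smoothing so that brand new words don't cause a division by zero. The scoring formula, the function name `hot_words` and the input format (one word-to-count dictionary per day) are all assumptions for illustration, not necessarily what the cloud code does.

```python
from collections import Counter


def hot_words(recent_days, earlier_days, top=10):
    """Rank words by how much more often they appear recently.

    Each day is a dict mapping word -> count (hypothetical format).
    """
    recent, earlier = Counter(), Counter()
    for day in recent_days:
        recent.update(day)
    for day in earlier_days:
        earlier.update(day)

    # Avoid dividing by zero if either list of days is empty.
    n_recent = max(len(recent_days), 1)
    n_earlier = max(len(earlier_days), 1)

    scores = {}
    for word, count in recent.items():
        recent_avg = count / n_recent
        earlier_avg = earlier[word] / n_earlier
        # Add-one smoothing keeps the ratio finite for new words.
        scores[word] = (recent_avg + 1) / (earlier_avg + 1)

    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]
```

A word that is used steadily every day scores close to 1, whilst a word that has only just started appearing scores much higher.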

I’ve added a lot of the more common words to the stop word list (e.g. “librar*” and “google”) to try and allow some of the less frequently used words to gain importance.
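A stop word list with wildcard entries like “librar*” could be handled with shell-style pattern matching. This is only a sketch in Python: the `STOP_PATTERNS` list contains just the two examples mentioned above, and the helper names are made up for illustration.

```python
from fnmatch import fnmatchcase

# Only the two examples from the post; a real list would be much longer.
STOP_PATTERNS = ["librar*", "google"]


def is_stop_word(word, patterns=STOP_PATTERNS):
    """True if the lower-cased word matches any stop pattern."""
    word = word.lower()
    return any(fnmatchcase(word, pattern) for pattern in patterns)


def remove_stop_words(words, patterns=STOP_PATTERNS):
    """Drop every word that matches the stop list."""
    return [w for w in words if not is_stop_word(w, patterns)]
```

With these patterns, “library” and “librarians” are both dropped by the single “librar*” entry, whereas “google” only matches that exact word.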

[image: blogcloud2]

So, why is Mozart back in vogue? Several bloggers have recently posted about NMA Online (including Peter Scott’s Library Blog and ResourceShelf).

If a word is used several times in a post (e.g. “segala” and “liszen”) then that can make the word appear “hotter” than it perhaps should be, and some posts are appearing more than once (e.g. those from ResourceShelf) — I’ll try and fix that.
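One possible fix for both problems is to count each word at most once per post and to skip any post that has already been seen. The Python sketch below assumes that repeated posts share the same URL, which may not be how the duplicates actually arise; the function name `post_counts` and the (url, text) input format are also hypothetical.

```python
import re
from collections import Counter


def post_counts(posts):
    """Count how many distinct posts mention each word.

    `posts` is a list of (url, text) pairs (hypothetical format).
    """
    seen_urls = set()
    counts = Counter()
    for url, text in posts:
        # Skip posts that have already been counted (assumes same URL).
        if url in seen_urls:
            continue
        seen_urls.add(url)
        # A set means a word repeated within one post counts only once.
        counts.update(set(re.findall(r"[a-z']+", text.lower())))
    return counts
```

Counting posts rather than raw occurrences means a single post that repeats “segala” a dozen times contributes the same as a post that mentions it once.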

You can click on any of the words in the cloud to see links to relevant blog posts.

http://161.112.232.18/cgi-bin/sl/cloud2.pl

I’ll continue to tweak the code, so it might change over the next few days…