"whatcha talking 'bout?" version 4!

Ok, this is pretty much the final definitive version of analysing the latest hot topics from library/librarian blogs…


The page is updated approximately every 15 minutes and uses the following methodology…

1) posts older than 48 hours are analysed and the frequency by which every unique word appears is calculated

2) posts from the last 48 hours are analysed in the same way and the word frequency is compared to the older posts

3) when a word has become noticeably more frequently used in the last 48 hours, it’ll appear in the word cloud — the bigger the increase in frequency, the larger the size in the cloud

4) if a word appears in multiple new blog posts then the shading is darker

5) if a word appears lots of times, but only in a small number of new posts, then the shading will be lighter
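
For anyone who wants to play with the idea, here is a rough Python sketch of the five steps above. It is not the code running behind the page. The tokenizer, the `min_ratio` threshold, the `floor` value and the way "frequency" is defined (a word's share of all word occurrences) are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9'’]+", text.lower())


def word_frequencies(posts):
    # Share of all word occurrences taken up by each unique word.
    counts = Counter()
    for post in posts:
        counts.update(tokenize(post))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def hot_words(old_posts, new_posts, min_ratio=2.0, floor=1e-4):
    # min_ratio and floor are hypothetical tuning knobs, not values from the post.
    old_freq = word_frequencies(old_posts)   # step 1: older posts
    new_freq = word_frequencies(new_posts)   # step 2: recent posts

    # Count how many new posts mention each word at least once (for shading).
    posts_containing = Counter()
    for post in new_posts:
        posts_containing.update(set(tokenize(post)))

    cloud = {}
    for word, freq in new_freq.items():
        # The floor stops brand-new words from dividing by zero.
        ratio = freq / max(old_freq.get(word, 0.0), floor)
        if ratio >= min_ratio:               # step 3: noticeably more frequent
            cloud[word] = {
                "size": ratio,               # bigger increase, bigger word
                # steps 4 and 5: more new posts, darker shade
                "shade": posts_containing[word] / len(new_posts),
            }
    return cloud


old = ["the library is open", "the library has new books",
       "the catalogue is in the library"]
new = ["proquest sale shocks the library", "the proquest sale",
       "disallow disallow disallow the robots"]
for word, info in sorted(hot_words(old, new).items()):
    print(word, info)
```

With this toy data "proquest" appears in two of the three new posts and so gets a darker shade than "disallow", which is repeated three times but only in one post. Words such as "the" and "library" are about as common in the old posts as in the new ones, so they never reach the threshold.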

So, in the last 24 hours, the usage of “2007” has increased substantially.  In new posts the word has a frequency of 28%, but only 7% in older posts.

Several bloggers have picked up on the sale of ProQuest — if other bloggers talk about it today, then it will grow in size and be shaded darker.

The usage of the word “disallow” has also increased, but it only appears in a single blog post (by the Baby Boomer Librarian) and is therefore shaded lightly.

Unlike the previous versions, this one doesn’t require a stop word list — words like “library” and “the” tend to have a high frequency of usage in both old and new posts, so the relative difference in usage is usually too small for the words to appear in the cloud.

The other cool thing is that this version uses the “network effect”: the more posts it has to work with, the better the cloud becomes at delivering the latest hot topics.  For example, Stephen Abram’s RSS feed is currently delivering posts from the last 3 days and he usually ends them with “Stephen”, so he’s currently making a strong appearance in the cloud.  However, over time, the number of older archived posts with the word will increase which means he’ll no longer (relatively speaking) be a hot topic in the cloud …although not in real life, of course!
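
To see why a sign-off like “Stephen” should fade from the cloud, here is a small Python simulation. It is only a sketch under made-up assumptions: every number in it (words published per day, how often the word is used, the size of the existing archive) is hypothetical, and it measures "hotness" as new frequency divided by archived frequency, which may not be exactly how the page does it.

```python
def hotness_over_time(days, words_per_day=10_000, hits_per_day=20,
                      archive_days_without_word=30):
    """Ratio of new to archived frequency for a signature word, day by day.

    All parameters are hypothetical: the word is used at a steady rate in
    new posts, and the archive starts with a backlog from other feeds in
    which the word never appears.
    """
    new_freq = hits_per_day / words_per_day
    ratios = []
    for day in range(days):
        # The archive is the backlog plus every day's posts that have aged out.
        archived_words = (archive_days_without_word + day + 1) * words_per_day
        archived_hits = (day + 1) * hits_per_day
        ratios.append(new_freq / (archived_hits / archived_words))
    return ratios


ratios = hotness_over_time(365)
for day in (0, 29, 364):
    print(f"day {day + 1}: new usage is {ratios[day]:.2f}x the archived usage")
```

In this toy model the word starts out roughly 31 times more common in new posts than in the archive, drops to about twice as common after a month, and creeps towards 1 after a year, at which point it would no longer stand out.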

  1. I’ve spent a couple of hours sifting through dozens of blogrolls and the code is now picking up feeds from around 450 library-related blogs.

    Given the increase in blogs, I’ve changed the page so that it updates every 30 minutes.

  2. I spotted this morning that the aggregator was choking on some of the atom feeds, so I’ve fixed that. It does mean that there’s been a huge influx of new posts and it’ll take a few days for some of the more common words (like “it’s” and “new”) to drop in frequency.

  3. Whilst we wait for the archive of older posts to mature, I’ve tweaked the cloud to show words that rarely appear in the archived posts in green — these are potentially new buzzwords and new topics of conversation.

    Also, given the large number of feeds I’m pulling in, I’ve dropped to only updating the cloud every hour. This is partly because the physical server the code is running on is already overdue for retirement, but I’m in the middle of prep’ing a replacement.

  4. What a great tool. It cuts right to the main trends and topics and picks up on the terminology, too. Thank you!

  5. Steve — many thanks! If I’ve coded it right, then it should get better at picking out the trends and topics over time.

  6. Hi Christine! Many thanks for that — I’ve updated the blog details, so it should pick up your new feed today 🙂

  7. A little late to notice it, but I just noticed: Walt at Random isn’t in your master list. I may not be a librarian, but I’ll assert that W.a.R. is part of “libraryland.” Your call, of course.
