OPAC keyword cloud

This is crying out to be done as a visual word map like the one in AquaBrowser, but for now here’s a browseable tag cloud based on data from nearly 2 million keyword searches on our OPAC.

shakespeare performance

The code looks for other keywords that were entered as part of the same search (e.g. “ethics of nursing care”) to draw out the words most commonly used together. For example, the keyword most often used with “performance” is “management”. The size of each word in the cloud reflects how often it appears alongside the search keyword.
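Here's a minimal Python sketch of that co-occurrence counting and sizing. It isn't the original cloud.pl (which is Perl and hands the font sizing to HTML::TagCloud); the small stop-word set, the tokenising regex, the function names and the log-scaled 12 to 36 px font range are all my own assumptions for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the original script reads its list from stopwords.txt.
STOP_WORDS = {"a", "and", "for", "in", "of", "the", "to"}


def tokenize(search: str) -> list[str]:
    """Lowercase a search and split it into words."""
    return re.findall(r"[a-z0-9']+", search.lower())


def cooccurring_words(searches: list[str], keyword: str) -> Counter:
    """Count the other words entered in the same search as `keyword`."""
    keyword = keyword.lower()
    counts: Counter = Counter()
    for search in searches:
        words = set(tokenize(search))  # count each word at most once per search
        if keyword in words:
            counts.update(words - {keyword} - STOP_WORDS)
    return counts


def font_sizes(counts: Counter, min_px: int = 12, max_px: int = 36) -> dict[str, int]:
    """Map counts to pixel sizes on a log scale so one very common word doesn't dwarf the rest."""
    if not counts:
        return {}
    lo = math.log(min(counts.values()))
    hi = math.log(max(counts.values()))
    span = (hi - lo) or 1.0  # avoid dividing by zero when every count is equal
    return {
        word: round(min_px + (math.log(n) - lo) / span * (max_px - min_px))
        for word, n in counts.items()
    }


searches = [
    "performance management",
    "performance management systems",
    "shakespeare performance",
    "ethics of nursing care",
]
counts = cooccurring_words(searches, "performance")
print(counts.most_common(1))  # [('management', 2)]
sizes = font_sizes(counts)
print(sizes["management"], sizes["systems"])  # 36 12
```

The log scale is a common choice for tag clouds because search terms tend to follow a long-tailed distribution; a linear scale would shrink almost everything to the minimum size.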

nursing

Originally I hadn’t removed keywords that generated zero search results, so the cloud for “acrobat” included “abode”. Update: I’ve now removed zero-result searches.

I’ll have to have a play to see if there’s a way of incorporating the cloud into the OPAC. For example, if you used a vague or general keyword such as “health”, maybe the OPAC could suggest more specific searches such as “health care”, “mental health” or “health promotion”?
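One way the suggestion idea could work, sketched in Python: count the two-word phrases (adjacent word pairs) containing the vague keyword and offer the most common ones. Treating suggestions as adjacent word pairs is my assumption, not necessarily how the OPAC does it, and `suggest_phrases` is a hypothetical name.

```python
import re
from collections import Counter


def tokenize(search: str) -> list[str]:
    """Lowercase a search and split it into words."""
    return re.findall(r"[a-z0-9']+", search.lower())


def suggest_phrases(searches: list[str], keyword: str, limit: int = 3) -> list[str]:
    """Return the most common adjacent word pairs that contain `keyword`."""
    keyword = keyword.lower()
    phrases: Counter = Counter()
    for search in searches:
        words = tokenize(search)
        # zip(words, words[1:]) walks over each adjacent pair of words
        for first, second in zip(words, words[1:]):
            if keyword in (first, second):
                phrases[f"{first} {second}"] += 1
    # Counter.most_common keeps first-seen order for ties
    return [phrase for phrase, _ in phrases.most_common(limit)]


searches = [
    "health care", "health care policy", "health care ethics",
    "mental health", "mental health",
    "health promotion", "health promotion",
    "health",
]
print(suggest_phrases(searches, "health"))
# ['health care', 'mental health', 'health promotion']
```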

  1. Would you be willing to share the code? I’d *love* to feed this a record of our patron searches (both catalog and website), and build off of existing code rather than start from scratch.

  2. Hi Josh

    It’s just a crappy prototype at the moment, but I’ll tidy up the code today and upload it to the blog.

  3. Here’s a cleaned up version of the code:

    http://www.daveyp.com/blog/stuff/keywordcloud/

    The Perl script is cloud.pl and it uses a list of stop words (i.e. words to ignore) from stopwords.txt.

    The list of keyword searches needs to have one search per line. I’ve included a short sample file (newcache.txt) to give you the idea. You can speed up the code by removing any single keyword searches from the file (i.e. any line containing just one word), as you’re only interested in searches where multiple keywords were used.

    The main chunk of ugliness in the original code was working out the font sizes, so I’ve removed that and replaced it with the HTML::TagCloud module (which you’ll need to install).

    You can see the new code in action here.

  4. I’ve added the suggestions to our OPAC. They only appear if you’ve done a single keyword search…
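The two practical tips from the comments (strip single keyword searches from the log before processing, and only offer suggestions after a single keyword search) can be sketched in Python as below. The lowercasing, whitespace normalisation, function names and output filename are my assumptions; the original Perl script may handle the file differently.

```python
from pathlib import Path
from typing import Iterable, Iterator


def multi_keyword_searches(lines: Iterable[str]) -> Iterator[str]:
    """Yield searches with at least two words; single-word searches have no co-occurring words."""
    for line in lines:
        search = " ".join(line.split())  # collapse whitespace and drop the trailing newline
        if len(search.split()) >= 2:
            yield search.lower()


def filter_log(src: Path, dst: Path) -> int:
    """Copy only multi-keyword searches from src to dst, one per line; return how many were kept."""
    kept = 0
    with src.open(encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        for search in multi_keyword_searches(fin):
            fout.write(search + "\n")
            kept += 1
    return kept


def should_show_suggestions(query: str) -> bool:
    """Offer suggestions only when the user searched on a single keyword."""
    return len(query.split()) == 1


# Example (output filename is hypothetical):
# filter_log(Path("newcache.txt"), Path("multi_keyword_searches.txt"))
```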
