HIPpie — how to build a dictionary

Many thanks to those of you who’ve tested the code from yesterday! Those of you outside of the UK might want to see if this version works slightly faster for you:

hippie_spellcheck_v0.02.txt

The next thing I’ll be looking at is how to optimise the spellchecker dictionary for each library. Some of you will already have read this in the email I sent out this morning or in the comment I left previously, but I’m thinking of attacking it this way:

1) Start off with a standard word list (e.g. the 1000 most commonly used English words) to create the spellcheck dictionary for your library, as the vast majority should match something on your catalogue.

2) Add some extra code to your HIP so that all successful keyword searches get logged. Those keywords can then be added to your dictionary.

Starting with an empty dictionary might even prove more effective (i.e. skip step 1): just let the “network effect” of your users searching your OPAC generate the dictionary from scratch (how “2.0” is that?!)
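To make the idea concrete, here is a minimal sketch of building a dictionary from a seed word list plus the keywords of successful searches. It is written in Python purely for illustration. The function names, the tokenising rule and the sample data are all assumptions, not part of the actual HIPpie code.

```python
import re


def tokenize(query):
    """Split a search string into lowercase keywords.

    Assumption: a keyword is any run of word characters or apostrophes.
    """
    return re.findall(r"[\w']+", query.lower())


def build_dictionary(seed_words, successful_searches):
    """Return the dictionary as a set: seed words plus every keyword
    taken from a search that returned at least one result."""
    dictionary = {word.lower() for word in seed_words}
    for query in successful_searches:
        dictionary.update(tokenize(query))
    return dictionary


# Hypothetical sample data
seed = ["the", "history", "of"]
searches = ["History of Yorkshire", "knitting patterns"]

with_seed = build_dictionary(seed, searches)   # step 1 + step 2
from_scratch = build_dictionary([], searches)  # step 2 only, empty start
```

Passing an empty seed list gives the "from scratch" variant, so the same function covers both options.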

To avoid any privacy issues, the code for capturing the successful keywords could be hosted locally on your own web server (I should be able to knock up suitable Perl and PHP scripts for you to use). Then, periodically, you’d upload your keyword list to HIPpie so that it can add the words to your spellchecker dictionary.
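A local capture script along these lines might look like the following sketch. It is in Python rather than Perl or PHP, and the file name, function names and log format are all hypothetical. It stores only the keywords themselves (no IP address, session or timestamp) and only for searches that returned hits. How the resulting list would be uploaded to HIPpie is not specified here, so the sketch stops at producing the list.

```python
import os
import tempfile
from collections import Counter
from pathlib import Path


def log_successful_search(log_path, query, hit_count):
    """Append the query's keywords to a local log file,
    but only if the search actually found something."""
    if hit_count <= 0:
        return
    with open(log_path, "a", encoding="utf-8") as log:
        for keyword in query.lower().split():
            log.write(keyword + "\n")


def export_keyword_list(log_path):
    """Collapse the log into (keyword, count) pairs, most frequent
    first, ready for a periodic upload."""
    path = Path(log_path)
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    counts = Counter(line.strip() for line in lines if line.strip())
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


# Small demonstration using a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    log_file = os.path.join(tmp, "keywords.log")
    log_successful_search(log_file, "Yorkshire history", hit_count=42)
    log_successful_search(log_file, "yorkshrie histroy", hit_count=0)  # no hits, so not logged
    log_successful_search(log_file, "history of art", hit_count=7)
    keyword_list = export_keyword_list(log_file)
```

Keeping the counts as well as the words would let the dictionary favour the more frequently searched terms, although whether HIPpie would use them is an open question.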

What if you don’t have SirsiDynix HIP? Well, as mentioned previously, the spellchecker has been implemented as a web service (more info here), and the HIP spellchecker makes use of that web service to get a suggestion. At the moment it only returns text or XML, but I’m planning to add JSON as an option soon. Also, if you have a look at the HIP stylesheet changes, you can see the general flow of the code:

1) insert a div with an id of “hippie_spellchecker” into the HTML

2) make a call to “http://library.hud.ac.uk/hippie_perl/spellchecker2.pl” with your library ID (currently “demo”) and the search term(s) as the parameters

3) the call to “spellchecker2.pl” returns JavaScript to update the div from step 1

4) clicking on the spelling suggestion triggers the “hippie_search” JavaScript function which is responsible for creating a search URL suitable for the OPAC (which might include things like a session ID or an index to search)

None of the above 4 steps are specifically tied to the SirsiDynix HIP and should be transferable to other OPACs. I’ve put together a small sample HTML page that does nothing apart from pull in a suggestion using those 4 steps:

example001.html
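The URL handling in steps 2 and 4 can be sketched as below. The real implementation runs as JavaScript in the browser, so this Python version only illustrates how the two URLs might be assembled. The query parameter names ("library", "term", "index", "session") and the OPAC base URL are assumptions made for the example; the actual names used by spellchecker2.pl and by your OPAC may well differ.

```python
from urllib.parse import urlencode

SPELLCHECKER_URL = "http://library.hud.ac.uk/hippie_perl/spellchecker2.pl"


def spellchecker_request_url(library_id, search_terms):
    """Step 2: build the call to the spellchecker web service.

    NOTE: "library" and "term" are hypothetical parameter names.
    """
    query = urlencode({"library": library_id, "term": search_terms})
    return SPELLCHECKER_URL + "?" + query


def opac_search_url(base_url, suggestion, session_id=None, index=None):
    """Step 4: build a search URL for the OPAC from the suggestion,
    optionally adding an index to search and a session ID.

    NOTE: the parameter names here are placeholders for whatever
    your own OPAC expects.
    """
    params = {"term": suggestion}
    if index:
        params["index"] = index
    if session_id:
        params["session"] = session_id
    return base_url + "?" + urlencode(params)


request_url = spellchecker_request_url("demo", "yorkshrie histroy")
search_url = opac_search_url(
    "http://opac.example.org/search",  # hypothetical OPAC address
    "yorkshire history",
    session_id="abc123",
    index="keyword",
)
```

Only the second function should need to change from one OPAC to another, which is why the steps are not tied to HIP.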

If you do want to have a go with your own OPAC, please let me know — at some point I’ll need people to register their libraries so that each can have their own dictionary, and I might start limiting the number of requests that any single IP address can make using the “demo” account. Also, it would be good to build up a collection of working implementations for different OPACs.

11 comments
  1. CH said:

    Is there any library that uses your scripts with PICA catalogues? I wonder if and how it works.

  2. Hi Chip — that’s really cool! As soon as the scripts are ready for building custom dictionaries, I’ll let you know.

    I’m not aware of any PICA catalogues using the script yet, but please feel free to try and figure out a way of making it work 🙂

  3. Dave,

    This is very, very nice! I’ve added it to our development box and, with Admin’s blessing, plan to have it available to our patrons. You are a gem.

  4. Thanks Chip — I’ve never heard of Jaunter before! It looks like they’re storing details of all successful searches.

  5. lare said:

    dave, i’ve got this running on our test server calling your perl webservice, and it looks fine, but i was interested in implementing it with our particular set of keywords … any update on this post in terms of local dictionary use, or local implementation for those of us with in-house resources? thanks! — lare mischo

  6. John said:

    Hi Dave,
    The link to hippie_spellcheck_v0.02.txt is broken – is this code still available?
    Thanks
    John

  7. Dave Pattern said:

    Hi John

    Thanks for the “heads up”! Just moved the site to a new server, so it looks like I forgot to copy across the “hippie” stuff. I’ll see if I can find it again.

  8. Judy Fuss said:

    Dave –

    I was so pleased to find your HIPpie spellcheck after a frustrating year working with Jaunter trying to implement their spellcheck piece. I hope you are able to get this back up & running. I would also be interested in more information about putting in a local dictionary for this. I was not successful in following the blog posts on this from last year.

    Thanks.
    Judy

  9. David Wongwai said:

    Dave,

    We are looking at implementing your spell check in our Horizon HIP (roundrocklibrary.org). Will I be able to get what I need from the blog posts? I haven’t read through all of them yet.
