Monthly Archives: November 2005

Jenny Levine has linked to an excellent article by Roy Tennant on the Library Journal web site:

What I Wish I Had Known

I love Roy’s statement that:

I wish I had known that the solution for needing to teach our users how to search our catalog was to create a system that didn’t need to be taught — and that we would spend years asking vendors for systems that solved our problems but did little to serve our users.

A few minutes later, I stumbled across Jennifer Matthews’ blog – she’s a student of English and Comparative Literary Studies at the University of Warwick:

So I figure that the library is evil. And it hates me.
(The Library)


Do your 856 URLs show up in a big font size that doesn’t seem to quite fit in with the rest of the text on the full bib page?

The quickest way to fix it is to fire up the Horizon table editor, select marc_map, and then locate the marc_map that you use for your 856 URLs.

In the “HTML format (Info Portal only)” field, insert class="smallAnchor" before the href. For example, if your HTML format looks like this:

<a href="$_">{<img src="$9">|$y|$_}</a>

…then change it to:

<a class="smallAnchor" href="$_">{<img src="$9">|$y|$_}</a>

Save the change, and then restart JBoss and the 856 links should pick up the formatting of the “smallAnchor” element from your HIP cascading style sheets (CSS).

And, for the more adventurous – if you’d like to know which 856 links your users are clicking on, then you can set your marc_map up to redirect to a CGI script that logs the URL and then redirects the user’s web browser to the true 856 link.

Once you’ve got your CGI script ready (in this case, I’ve called it …), you just need to change the 856 marc_map to link to the script – e.g.

<a href="$_">{<img src="$9">|$y|$_}</a>

Once you’ve saved that and restarted JBoss, your 856 URLs look like this in HIP:

Your CGI script just needs to take the contents of the QUERY_STRING environment variable (in the above, it’s …), append it to your log, and then issue a redirect to that URL.

(disclaimer: all of the above was done with Horizon 7.32 UK and HIP 3.04 – your mileage may vary depending on which versions you’ve got!)
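For anyone who wants a starting point for the logging redirect, here is a minimal sketch in Python rather than the language of the original script. The function name `log_and_redirect`, the log file name, and the tab-separated log layout are all assumptions made for illustration. It also assumes the 856 URL arrives unencoded as the whole query string, and it adds a basic check that the target is an http(s) URL.

```python
from datetime import datetime, timezone


def log_and_redirect(environ, log_path):
    """Append the requested URL to a log file and return a CGI redirect response.

    `environ` is a mapping such as os.environ; `log_path` is a hypothetical
    log file location chosen for this sketch.
    """
    target = environ.get("QUERY_STRING", "")

    # Only redirect to http(s) URLs, and refuse line breaks that could be
    # used to smuggle extra headers into the response.
    is_web_url = target.startswith(("http://", "https://"))
    if not is_web_url or "\r" in target or "\n" in target:
        return "Status: 400 Bad Request\r\nContent-Type: text/plain\r\n\r\nBad URL\n"

    # One line per click: UTC timestamp, a tab, then the 856 URL.
    stamp = datetime.now(timezone.utc).isoformat()
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{stamp}\t{target}\n")

    # A Location header tells the browser to go to the real 856 link.
    return f"Status: 302 Found\r\nLocation: {target}\r\n\r\n"
```

In a real CGI script you would write the returned string to standard output, e.g. `sys.stdout.write(log_and_redirect(os.environ, "856_clicks.log"))`.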

One of the things we’re trying to do this year at Huddersfield is to make better use of our data archives:

…as each student goes through a library turnstile, data is written away…
…as each student borrows a book, more data is quietly written away…
…as each student uses an electronic resource, data is written away…
…as each student logs onto a PC, yet another piece of data is…

…okay, enough already – you get the idea!

We’re not particularly interested in what an individual student has done, but we’d like to see the broader pictures. For example, we open the Library 24/7 at certain times of the year (e.g. Easter) – we’d like to know more about the kinds of students who come in late at night and leave early the next morning:

  • are certain ethnic groups more likely to use the Library outside of the standard opening hours?
  • do we get more male or female students using the Library in the wee small hours?
  • are students coming in to use the computers, to issue/return items, or to sit quietly in a corner and study?

The answers to those kinds of questions tend to be found in several databases. The Sentry database tells us when someone entered the Library, but it doesn’t tell us if they are male or female, Asian or Caucasian – that kind of information is stored in the Student Records System. Also, the Sentry database doesn’t tell us what the student actually did – Circ transactions are in Horizon and PC usage info is stored in other databases.

So, long term we’re looking at ways of trying to combine data from all of those sources into meaningful and enlightening stats.

“What has this got to do with showing borrowing suggestions in HIP?”, I hear you ask!

Well, once I’d had a hunt around in our circ_tran table in Horizon, it seemed like a great use of all that historical Circ data would be to do an Amazon-like “patrons who borrowed this book also borrowed…”.

Before I proceed with the “how to”, I’ve got a hunch that not everyone has got a circ_tran table – it might be something that SirsiDynix needs to set up for you, rather than a default table that ships with Horizon (can anyone confirm this?)

The circ_tran table contains (amongst other things) two very useful bits of information – the borrower# and the item# of the item they borrowed. You can use the item# to look up the bib# of that item (using the item table).

Once you’ve got the borrower# and bib#, you can use that to create two lists of data:

  • a list of all the bib#s that a specific borrower has ever borrowed
  • a list of all the borrower#s who have borrowed a specific bib#

To build the list of borrowing suggestions, you start with a bib# and:

  • 1) build the list of all the borrower#s who have borrowed that bib#
  • 2) for each of those borrower#s, compile all the bib#s of all the items they’ve borrowed into a single big list of bib#s
  • 3) take that big list and count how many times each bib# appears in the list
  • 4) sort your list of individual bib#s by the count of how many times they appear in the big list

…those bib#s that appear the most times in the big list are therefore the most appropriate ones to suggest.
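The four steps above can be sketched in a few lines of Python. This is an illustration rather than the actual script: the function name `build_suggestions` and the input format, a list of (borrower#, bib#) pairs already resolved via the item table, are assumptions. The sketch also makes three choices of its own: repeat loans of the same bib# by the same borrower count once, the starting bib# is never suggested for itself, and ties are broken by bib# so the output is repeatable.

```python
from collections import Counter, defaultdict


def build_suggestions(loans, max_suggestions=10):
    """Map each bib# to the bib#s most often borrowed by the same people.

    `loans` is an iterable of (borrower, bib) pairs.
    """
    bibs_by_borrower = defaultdict(set)   # everything a borrower has borrowed
    borrowers_by_bib = defaultdict(set)   # everyone who has borrowed a bib
    for borrower, bib in loans:
        bibs_by_borrower[borrower].add(bib)
        borrowers_by_bib[bib].add(borrower)

    suggestions = {}
    for bib, borrowers in borrowers_by_bib.items():
        counts = Counter()
        for borrower in borrowers:                     # step 1
            counts.update(bibs_by_borrower[borrower])  # steps 2 and 3
        del counts[bib]  # assumption: don't suggest the book being viewed
        # Step 4: highest count first, then lowest bib# to break ties.
        ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        suggestions[bib] = [other for other, _ in ranked[:max_suggestions]]
    return suggestions


loans = [(1, 100), (1, 200), (1, 300), (2, 100), (2, 200), (3, 100), (3, 400)]
print(build_suggestions(loans)[100])  # [200, 300, 400]
```

Bib 200 comes first for bib 100 because two of its three borrowers also borrowed it, while bibs 300 and 400 were each borrowed by only one of them.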

Unfortunately those 4 steps can take some serious CPU time, so it’s not possible to do it on the fly as each of your patrons brings up a full bib page in HIP. Therefore, you need to pre-process each of your bib#s to generate a list of other suggested bib#s.

I wrote a Perl script this evening (which I’ll make available soon) that slurps up the entire circ_tran table into your PC’s memory and then processes each of the bib#s to create up to 10 other suggested bib#s. Each of those suggestions is then pumped into a MySQL database where it will sit until a patron views that bib#’s page in HIP.

A single line of JavaScript added to the fullnonmarcbib.xsl stylesheet then pulls in dynamic content from a Perl CGI script. That CGI script simply fetches the list of suggested bib#s from the MySQL database, quickly runs them via the title table in Horizon, and then displays a random selection of them underneath the copy/holding info:

The only real drawback is that it’s not working with your circ_tran data in real time – the list of 10 possible suggestions per bib# won’t change until I run the slurping Perl script again to rebuild all of the suggestions. On our database of 2,046,180 circ_tran entries, that took about 3 hours to process. So, in theory, you could schedule it to run once a week or once a month.
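The store-then-pick arrangement described above might look something like the following sketch. It uses Python's built-in sqlite3 module purely as a stand-in for MySQL so that the example runs anywhere. The table name `suggestion`, its columns, and both function names are made up for illustration. The lookup of titles in Horizon is left out.

```python
import random
import sqlite3


def store_suggestions(conn, suggestions):
    """Replace the stored suggestions with a freshly built set.

    `suggestions` maps a bib# to a list of suggested bib#s.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS suggestion (bib INTEGER, suggested_bib INTEGER)"
    )
    conn.execute("DELETE FROM suggestion")  # full rebuild on every run
    rows = [(bib, other) for bib, others in suggestions.items() for other in others]
    conn.executemany("INSERT INTO suggestion VALUES (?, ?)", rows)
    conn.commit()


def pick_suggestions(conn, bib, how_many=3):
    """Return a random selection of the stored suggestions for one bib#."""
    rows = conn.execute(
        "SELECT suggested_bib FROM suggestion WHERE bib = ?", (bib,)
    ).fetchall()
    stored = [row[0] for row in rows]
    return random.sample(stored, min(how_many, len(stored)))


conn = sqlite3.connect(":memory:")
store_suggestions(conn, {100: [200, 300, 400, 500], 200: [100]})
print(pick_suggestions(conn, 100))  # three of the four stored bib#s, in random order
```

Because the pick is random, repeat visits to the same bib page can show different suggestions even though the stored list only changes when the rebuild runs.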

Wow! Fame and glory – hopefully the untold riches will be just around the corner! 😉

For anyone who wants to have a go with their Horizon/HIP, I’ve uploaded the script to here:

I’ve done a little bit of tweaking, and the final keyword list now looks like this.

You’ll need to download the Perl script and the sample config.txt file.

As with many of the other scripts I’ve uploaded, you’ll need a working ODBC connection to your Horizon database – if you’re running ReportSmith or EasyAsk, then you’ll know all about that. You’ll also need to have Perl installed, along with the DBD::ODBC module from CPAN.

The config.txt file has three columns:

  • the first column defines the range for each keyword count, and this works with the $threshold variable to select the font size & colour for each keyword in the HTML output
  • the second column defines the font size – you should be able to use any valid CSS value (e.g. 50%, 10px, or x-small, etc)
  • the final column defines the font colour – in the example file I’ve gone for a blue gradient (#006 through #77D), but if you prefer a single colour then just change all the entries to that (e.g. #00F) – again, you should be able to use any valid CSS value (#123456, red, etc)
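To make the three columns a little more concrete, here is a rough Python sketch of how a file like this could be read and applied. The exact layout of config.txt isn't shown above, so this assumes whitespace-separated columns and treats the first column as the lowest keyword count that gets that row's style. The sample values, the function names, and the choice to give the darkest colour to the biggest counts are all invented for the example, and the interaction with $threshold is not modelled.

```python
def parse_config(text):
    """Parse lines of 'min_count font_size colour' into a sorted list of tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        rows.append((int(fields[0]), fields[1], fields[2]))
    return sorted(rows)


def style_for(count, rows):
    """Return (font_size, colour) from the highest row whose min_count <= count."""
    chosen = None
    for min_count, size, colour in rows:
        if count >= min_count:
            chosen = (size, colour)
    return chosen


# Made-up sample values, not the contents of the real config.txt.
SAMPLE = """\
0     70%   #77D
200   100%  #44A
1000  150%  #006
"""

rows = parse_config(SAMPLE)
print(style_for(381, rows))   # ('100%', '#44A')
print(style_for(1237, rows))  # ('150%', '#006')
```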

To run the script, just put it in the same directory as the config.txt file and run it (e.g. perl getsubjects.txt). The HTML output file should get created in the same directory.

There are a few variables that you can tweak:

  • $minimumBibs – this is used in the initial SQL query on the subject table, so a lower value means more subjects will be included for processing, but the query might take longer to run and/or hit your Horizon server harder
  • $threshold – once all the subjects have been processed, any whose total number of matching bibs falls below the threshold value will be excluded from the output – if you’d prefer a smaller list of keywords in the HTML output, then choose a higher value and vice versa
  • $spacing – this is a string of characters to insert between each keyword
  • $hipUrl – unless you really want to link to our HIP, then you’ll need to tweak this URL

Have fun with the script!

Jenny emailed me to ask if the script could work with other systems (e.g. Innovative), so I’m going to have a go writing a smaller version of the script that will take a list of keywords and counts, such as the example below, and then create the same output:

1237 American poetry
381 Java
857 World Wide Web

…so, as long as you can query your system to get something in the above format, then it should work.
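As a rough idea of what processing that format involves, here is a small Python sketch that reads "count keyword" lines and turns them into sized links. It is not the script itself: the function names, the linear 80% to 200% font scale, and the `HIP_URL` search address are placeholders made up for this example.

```python
from html import escape
from urllib.parse import quote_plus

HIP_URL = "https://hip.example.org/search?term="  # hypothetical placeholder URL


def parse_subjects(text):
    """Turn lines like '1237 American poetry' into (count, keyword) pairs."""
    pairs = []
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2 and parts[0].isdigit():
            pairs.append((int(parts[0]), parts[1].strip()))
    return pairs


def render_keywords(pairs, spacing=" "):
    """Build alphabetical HTML links, sized in proportion to each count."""
    biggest = max((count for count, _ in pairs), default=0) or 1
    links = []
    for count, keyword in sorted(pairs, key=lambda pair: pair[1].lower()):
        size = 80 + round(120 * count / biggest)  # assumed scale: 80% to 200%
        links.append(
            f'<a href="{HIP_URL}{quote_plus(keyword)}" '
            f'style="font-size:{size}%" title="{count} bibs">{escape(keyword)}</a>'
        )
    return spacing.join(links)


sample = "1237 American poetry\n381 Java\n857 World Wide Web\n"
print(render_keywords(parse_subjects(sample), spacing=" | "))
```

Splitting on the first run of whitespace means tab-separated output should parse the same way as the space-separated example.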

[one quick sandwich and Coke later]

…and here is a more general version of the script that should work with other systems:

As well as the Perl script, you’ll need to download the config.txt file. Also, you’ll need to create your own subjects.txt file – I’ve included a sample one so you can get a rough idea of the layout.

As before, you can do a bit of tweaking with the variables and the config.txt file to customise the final HTML output.

Horizon users who don’t want to faff around with getting the first script to use ODBC can generate their own subjects.txt by copying the output of running the following SQL statement in SQL Advantage (or similar):

select n_bibs,processed from subject where n_bibs > 50

…however, you won’t get the advantage of the way the first script collapses sub-subjects together.

Inspired by Jenny Levine’s mock up of an OPAC with keyword tags, I’ve gone a step further and used our Horizon database (the “subject” table in particular) to generate a real page based on subject keywords with more than 10 bibs:

I did a bit of tweaking so that sub-subjects (is that a real word?) are collapsed into the parent subject – if you hover your mouse pointer over one of the links, then you should get a better idea of what I mean. Once I’d got the totals for each parent subject, I excluded anything with fewer than 100 bibs.
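The collapsing step could be sketched like this in Python. How sub-subjects are actually marked in the subject table isn't described here, so the " -- " separator is purely an assumption, as are the function name and the sample headings.

```python
from collections import defaultdict

SEPARATOR = " -- "  # assumed delimiter between a subject and its sub-subjects


def collapse_subjects(pairs, minimum_total=100):
    """Sum bib counts per parent subject and drop totals under the minimum.

    `pairs` is an iterable of (count, heading) tuples.
    """
    totals = defaultdict(int)
    for count, heading in pairs:
        parent = heading.split(SEPARATOR, 1)[0].strip()
        totals[parent] += count
    return {
        parent: total for parent, total in totals.items() if total >= minimum_total
    }


sample = [
    (60, "java -- programming"),
    (50, "java"),
    (40, "poetry -- 20th century"),
    (30, "poetry"),
]
print(collapse_subjects(sample))  # {'java': 110}
```

One caveat with simply adding the counts: a bib that carries two sub-subjects of the same parent would be counted twice, so the totals are best read as approximate.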

Sadly the two “spare days” after the end of CODI 2005 have flown by and tomorrow morning we’re setting off back to the UK. By the way, just in case anyone wants to know what “the sun going down on CODI 2005” looked like, here it is/was:

sun set on Wednesday evening

We spent most of yesterday in St Paul (the sibling to Minneapolis in the title “The Twin Cities”). As some of the Horizon mailing list regulars will know, we were keen to visit the Catbus exhibit at the Minnesota Children’s Museum – and just to prove we did, here’s Bryony in the front seat:

driving the Catbus

…and here’s Totoro himself:


…and there’s more pictures here!

For lunch, we walked about a mile out of St Paul to Red’s Pizza Savoy. After the cosmopolitan Minneapolis, Red’s felt like a taste of true Americana.

We even made it back in time to go and give the Loring Park squirrels their tea: