A library dating service

In my UKSG presentation, I briefly touched on the need for library services (perhaps the OPAC, but perhaps not) to start joining users together in the same way that sites like Facebook do.

In the same way that a “people who borrowed this, also borrowed…” service starts exposing the hidden links between items on shelves, I think we need to start finding the connections between our users.

Using circulation data, we can start to locate clusters of users who’ve borrowed the same books. In an academic environment, these may be students who are studying on the same course. However, what if we discovered that two separate courses being run in different parts of the university had a strong overlap in borrowing? Would value be gained from introducing those students to each other?

No sooner had I tweeted that I was thinking about this kind of thing than Tony Hirst sent a response:

…a library dating service, then? Heh heh 😉

I’m keen to know what your first reaction to Tony’s comment is!

What if you were a lonely researcher who wanted to find someone similar to yourself, in order to collaborate on a project? By mining the circulation data and/or OpenURL article access data, a library could find your ideal partner — someone who’d been looking at the same books and resources that you’d been using. If libraries were aggregating their usage data at a national level, that perfect partner could well be a researcher at another institution.

To test this out, I tweaked our “people who borrowed this” code to generate the links between users (rather than the books). As an aside, I’ve been trying all day to figure out what the user equivalent of “people who borrowed this, also borrowed…” is, but haven’t been able to wrap my head around the logical linguistics of it!
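
The post doesn't say how the “people who borrowed this” code scores its links, so here's a minimal sketch of one way to link users rather than books. It assumes circulation data is available as (user_id, item_id) loan pairs and uses Jaccard similarity (items in common divided by all items either user borrowed) as the closeness measure. The function names are hypothetical and not taken from our code.

```python
from collections import defaultdict

def items_by_user(loans):
    """Group (user_id, item_id) loan records into {user_id: set of item_ids}."""
    borrowed = defaultdict(set)
    for user_id, item_id in loans:
        borrowed[user_id].add(item_id)
    return borrowed

def jaccard(a, b):
    """Jaccard similarity: items in common divided by items borrowed by either user."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def closest_matches(loans, user_id, top_n=5):
    """Rank other users by how similar their borrowing is to user_id's (hypothetical helper)."""
    borrowed = items_by_user(loans)
    mine = borrowed.get(user_id, set())
    scores = [
        (other, jaccard(mine, items))
        for other, items in borrowed.items()
        if other != user_id and mine & items  # skip users with nothing in common
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]
```

Dividing by the union, rather than just counting shared items, stops heavy borrowers from looking like everyone's closest match.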

Data Protection obviously means that I can't share that prototype with you, but it did throw up some interesting results. My partner Bryony's closest match was one of her colleagues who works in the same department as her; they share similar craft-related interests, so have borrowed similar books. However, what if her closest match had been someone working in another department? Maybe they'd want to meet up over a coffee and swap crafty ideas.

I also tried the same for one of my colleagues, who’s a lecturer, and found that his ideal match is himself! Or rather, the closest match for his current library account (as a member of staff) was his old library account from when he was a student. In other words, since becoming a lecturer, he’s re-borrowed quite a few of the books he used as a student.

Although I can’t show you the data for individuals, we can step back a level and look at the borrowing at the course level. I’ve put together a quick and dirty prototype to play with. The prototype will pick a course at random and then show the courses that have the closest matches in terms of book borrowing — if you’re unlucky and get an empty list (i.e. no matches were found), try refreshing the page.

Taking the BSC Applied Criminology course as an example — 59.3% of the books borrowed by students on that course were also borrowed by students on the BSC Behavioural Sciences course (HB100). The other top matches all seem to be related to criminology: psychology, social work, police studies, child protection, probation work, etc. However, there also appears to be some synergy with books borrowed by midwifery, history and hospitality management students.
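
The percentage is relative to the starting course's own borrowing, so it isn't symmetric between two courses. A minimal sketch of that calculation, assuming each course is represented as a set of borrowed title IDs (the function names and data layout are hypothetical, not the prototype's actual code):

```python
def overlap_percentage(titles_a, titles_b):
    """Percentage of course A's borrowed titles that were also borrowed on course B.

    The denominator is course A's own total, so swapping the arguments keeps
    the same number of shared titles but usually gives a different percentage.
    """
    if not titles_a:
        return 0.0
    return 100.0 * len(titles_a & titles_b) / len(titles_a)

def top_matches(course, titles_by_course, n=50):
    """Other courses ranked by overlap with `course`, dropping zero overlaps."""
    mine = titles_by_course[course]
    scored = [
        (other, overlap_percentage(mine, titles))
        for other, titles in titles_by_course.items()
        if other != course
    ]
    scored = [pair for pair in scored if pair[1] > 0]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]
```

Using the figures given in the comments (1,629 of the 2,747 titles borrowed on Applied Criminology were also borrowed on HB100), 100 × 1629 / 2747 ≈ 59.3%.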

I’ll try to add some extra code tomorrow to show the most popular books in those course intersections.
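
One way to find the most popular books in a course intersection is to count loans per title on each course, keep titles that clear a minimum on both sides, and rank by combined loans. This is a hypothetical sketch: it assumes each course's loans are a list of title IDs (one entry per loan), and the `min_loans` cut-off is an arbitrary illustrative value.

```python
from collections import Counter

def popular_shared_titles(loans_a, loans_b, min_loans=2, top_n=10):
    """Titles borrowed at least min_loans times on both courses, most-borrowed first.

    loans_a and loans_b are lists of title IDs with one entry per loan.
    Returns (title, loans on course A, loans on course B) tuples.
    """
    counts_a, counts_b = Counter(loans_a), Counter(loans_b)
    shared = [
        title for title in counts_a
        if title in counts_b
        and counts_a[title] >= min_loans
        and counts_b[title] >= min_loans
    ]
    # Highest combined loans first; ties broken alphabetically by title.
    shared.sort(key=lambda title: (-(counts_a[title] + counts_b[title]), title))
    return [(title, counts_a[title], counts_b[title]) for title in shared[:top_n]]
```

Ranking by total loans favours titles that are heavily borrowed on one course; a key such as min(a, b) / max(a, b) would instead favour titles borrowed at similar levels on both.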

12 comments
  1. This is a great idea. I think it’d be even more powerful with the OpenURL data than the book borrowing data. Depending on discipline, some researchers don’t do a lot of book borrowing. But the OpenURL data is a LOT of data to track, and we don’t generally track it; there’s also a non-trivial de-dup problem, and there might be little overlap because the domain of all articles is so large.

    It’s interesting that the OpenURL link resolver, which is really a hacky solution to a lack of ‘web’ functionality, also potentially gives us a hook into article usage data, in a service controlled by us, that we wouldn’t have otherwise. Making use of that data was actually part of rsinger’s original goal with Umlaut, although my Umlaut implementation never really followed up on that.

    Hmm, but you know what, Scopus has assigned keywords (don’t know if human or machine, but they’re not bad) for all the articles it tracks. Not sure if those keywords are revealed by the Scopus API or not though, I’ll have to remember to check when I’m back in the office. What if you could capture -keywords- on the articles requested from the link resolver, and use THAT to find similar users! THAT could be awesome!

    I suppose in any incarnation of this service, it really ought to be ‘opt-in’ though. If a user opts in, they are available for matching, and can see matches among the other opt-ins.

    Overall, I think a user matching service is a great idea.

  2. And the first example I got was:

    BL420 — MA LEGAL STUDIES

    27.1% SC100 UG CHEMICAL SCIENCES FT/SW

    That certainly isn’t what I’d have expected. Was that a legal course focusing on the law of chemicals or something?

  3. As a postgrad, I used to try to find out from the Library who kept recalling the books I had out on loan (it seemed as if no sooner had I taken a book out than someone else wanted it), but the Library desk couldn’t help…

    I resorted to putting pieces of paper in the books along the lines of: “Hi, I’m currently doing a PhD on blah, and I couldn’t help noticing you’re after this book too. Fancy a chat sometime?” and sometimes scribbled contact details in the inside cover of the book in pencil: “people who borrowed this book – want to chat about it?”, giving my email address…

    Chatting to Library folk over the years, I think I understand why they don’t want to share who’s borrowing books, and maybe why they don’t even want to know. They (you) lay themselves open to becoming thought police…

    Btw, could you run a search for me on who’s borrowing fundamentalist politics, civil unrest and direct action books…? Ta.

    (Btw 2, I’m with you on thinking the “topic buddy” service is a good one; I’m just relaying one of the arguments I’ve heard against it… Maybe having an opt-in pathway, and giving users control via a personal account page, would be one way to accommodate Data Protection and privacy concerns?)

  4. Having taken the dog for a quick walk to mull this over further (!), here are a few more thoughts:

    1) aggregated data is nice – finding courses that overlap in terms of book loans could be fed back to course leaders;

    2) aggregated loans info gives you a hook into relevance marketing (maybe?!), e.g. on Amazon, or on OER sites that reference books in your loans profiles. Amazon “people who bought” data maybe gives you insight into precursor books? (Hmmm, or does Amazon ‘people who bought’ only suggest books that were bought AFTER the book you’re looking at?);

    3) loans lists are not just lists – they are ORDERED lists. So you can maybe suggest “people who borrowed this usually borrowed that before it”. This is fraught with issues of course; e.g. the “that” book might be crap (and so needs removing from reading lists?).

    4) do you offer course leaders ‘library analytics’ back, e.g. reports showing how often (and when) books on course recommended reading lists are borrowed? (Again, just the aggregated data would be fine here.)

  5. Thanks Jonathan and Tony — some really good comments and I suspect I won’t be spending the weekend sipping piña coladas in the garden… there’s some coding to be done!!!

    Re: Legal Studies/Chemical Sciences, some of the courses might have low borrowing figures (maybe they only ran for a single year with a small number of students), which could give higher percentage overlaps. If you click through to the SC100 course, you’ll spot that the BL420 legal course doesn’t appear. I’m only showing the top 50 matches, so the overlap must be lower than 3.1%. I’ll get the code tweaked today to show which book titles were common to both groups. Although the number of books in common will be the same in both directions, the percentages are relative to the total number of books borrowed by the students on the starting course.

    Tony, borrowing after & before is something I’ve been pondering off and on for a while now. Back in 2006, I was thinking along the lines of lending paths: by analysing the data, you can make an educated guess as to what the best book to borrow afterwards (or before) would be. If you extrapolate that, you get a path of books. The idea that you could hold a book in your hand and then predict the future kinda appealed to me!!! At the moment, we’re using that data to make a personalised “we think you might be interested in…” recommendation when someone logs into the OPAC: we look at the last few books they borrowed and find the books most often borrowed after those.

    Another possible use is when you’ve picked up a book on a topic and, after flicking through it, decided it looked a little too advanced. Can we combine the usage data with something like Dewey or LCSH to give a precursor recommendation? For example, you’ve picked up “Advanced Java”, so the code might recommend you start with “Java for Dummies”.

    Data Protection is definitely an issue here, so I’d see this as an “opt-in”. Wouldn’t it be great if there was a link between a user’s library account ID and their Facebook ID? As well as being able to tap into their Facebook network of friends, the library might be able to begin recommending new friends.

    In terms of tracking civil unrest, direct action books, etc, maybe we need to approach the problem from a US perspective. In order for a US library to begin tracking circulation history, they would need a way of anonymising the borrower whilst still assigning an ID (so that future transactions by that borrower can be linked together). Things like salted MD5 hashes spring to mind. Of course, there’s a whole other issue as to why someone would borrow those books — am I planning a terrorist outrage or (more likely) am I conducting research into the topic?

    We do have “in-house” library analytics: a more granular version of the public stats that we include in the OPAC (e.g. for Sociology by Giddens, 5th ed.), which includes loans per course. It had never occurred to me before that the course leaders would find that data useful (d’oh!)

  6. Okay, I’ve just added some code to show the overlapping titles.

    If we take BSC Applied Criminology as the starting point again, you can now see that students on that course have borrowed 2,747 titles. The largest overlap is with the BSC Behavioural Sciences course (HB100), whose students borrowed 1,629 of those 2,747 books. If you click on the 1629 link, you can see the books common to both courses.

    I can’t decide what the best method of ranking the common books should be, so I’ve gone with an ordering where similar levels of borrowing rank higher. The two figures in brackets show the number of times the book was borrowed on each course.

    Where the borrowing figure was low on either course, the book won’t get shown. Therefore, you might come across occasional “sorry – not enough overlap” messages. That means there was borrowing of common titles, but not by enough students to make it significant.

    Not surprisingly, some of the common books aren’t subject specific (e.g. books on how to write a dissertation or carry out research). Also, you’ll probably come across items of equipment that the library lends out (dictaphones, digital cameras, etc).

    Looking through the overlapping titles, it seems to be throwing out some useful data. For example, here’s the overlap between MA Victorian Studies and BSC Criminology…

    1) Prostitution and victorian society: women, class and the state
    2) Prostitution: prevention and reform in England, 1860-1914
    3) The reform of prisoners 1830-1900
    4) Reconstructing the criminal: culture, law, and policy in England, 1830-1914

    …you could imagine there would be times on each course when it might be beneficial to bring together both sets of students.

  7. Great stuff! In terms of data protection and the Facebook idea, if we are going to go beyond aggregated data, one option is to offer borrowers the ability to publish their own borrowing history. I guess this is a bit like the way you can allow Facebook apps access to various parts of your profile.

    I suspect that making borrowing history portable is something we ought to do anyway – it is their data after all.

  8. “did anyone ever respond to your notes?”

    I don’t think so… hmmm 😉

    Though I did track down one person who was borrowing the same books (not sure I remember how, tho…?)

  9. One thing that jumped out at me when I watched Ross’ video was that it would be useful to track which books a user clicks on after doing a keyword search.

  10. Agree. Also see the COPAC beta, where if you add a book to your ‘My References’, it automatically gets tagged with your search terms. These tags don’t seem to feed back into the general search (in the beta at the moment), but what a great idea: if I used those terms to find that book, surely someone else would as well? It also gives an extra dimension for relevance ranking?
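
The first comment's idea of matching users on the keywords attached to the articles they request through the link resolver, restricted to people who have opted in, could be sketched like this. It assumes the keywords have already been captured for each request (whether the Scopus API exposes them is left open in that comment) and uses cosine similarity over keyword counts; all names here are hypothetical.

```python
import math
from collections import Counter

def cosine(p, q):
    """Cosine similarity between two keyword-count profiles."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

def keyword_matches(requests, opted_in, user_id, top_n=5):
    """Match user_id against other opted-in users by the keywords on their article requests.

    requests maps user_id -> list of keywords (one entry per keyword per request).
    Users who haven't opted in are neither matched nor shown matches.
    """
    if user_id not in opted_in:
        return []
    profiles = {u: Counter(kws) for u, kws in requests.items() if u in opted_in}
    mine = profiles.get(user_id, Counter())
    scored = [(u, cosine(mine, prof)) for u, prof in profiles.items() if u != user_id]
    scored = [pair for pair in scored if pair[1] > 0]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```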
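
Because loan histories are ordered lists, “borrowed next” counts can be built from consecutive pairs of loans, and repeatedly following the most common successor gives a simple lending path. This is a hypothetical sketch, not the OPAC's recommendation code; it assumes each user's loans are available as a list of item IDs in date order.

```python
from collections import Counter, defaultdict

def successor_counts(histories):
    """Count how often each item is borrowed immediately after another.

    histories maps user_id -> list of item IDs in the order they were borrowed.
    """
    follows = defaultdict(Counter)
    for items in histories.values():
        for current, nxt in zip(items, items[1:]):
            if current != nxt:  # skip back-to-back loans of the same item
                follows[current][nxt] += 1
    return follows

def lending_path(follows, start, steps=3):
    """Greedily follow the most common 'borrowed next' item, never revisiting an item."""
    path = [start]
    for _ in range(steps):
        successors = follows.get(path[-1], Counter())
        candidates = [item for item, _count in successors.most_common() if item not in path]
        if not candidates:
            break
        path.append(candidates[0])
    return path
```

Running `successor_counts` over reversed histories gives “usually borrowed before it” counts instead, which is one possible starting point for precursor recommendations.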
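
For pseudonymous borrower IDs, the salt has to stay the same (and secret) across transactions, otherwise a borrower's loans can no longer be linked together. The sketch below swaps the salted MD5 mentioned in the comments for HMAC-SHA256 from Python's standard library, since MD5 is no longer considered a strong hash; the key value is a placeholder assumption.

```python
import hashlib
import hmac

# Placeholder: in practice a long random secret, stored separately from the loans data.
SECRET_KEY = b"replace-with-a-long-random-secret"

def pseudonymise(borrower_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable token for borrower_id that can't be recomputed without the key.

    The same ID and key always give the same token, so future transactions by
    that borrower can still be linked, but nobody without the key can reverse
    it by hashing every possible borrower ID.
    """
    return hmac.new(key, borrower_id.encode("utf-8"), hashlib.sha256).hexdigest()
```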
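
The last two comments suggest treating the search terms that led someone to a book as tags for that book. A minimal sketch, assuming clicks are logged as (query, item_id) pairs and that simple whitespace tokenisation is good enough; the functions are hypothetical.

```python
from collections import Counter, defaultdict

def record_click(tag_counts, query, item_id):
    """Tag item_id with each distinct term of the query that led to it."""
    for term in set(query.lower().split()):
        tag_counts[item_id][term] += 1

def tag_boost(tag_counts, query, item_id):
    """Count how often earlier searchers reached item_id using these terms."""
    tags = tag_counts.get(item_id, Counter())
    return sum(tags[term] for term in set(query.lower().split()))

def rerank(results, tag_counts, query):
    """Reorder search results by tag boost; sorted() is stable, so ties keep their order."""
    return sorted(results, key=lambda item: tag_boost(tag_counts, query, item), reverse=True)

# item_id -> Counter of search terms
tag_counts = defaultdict(Counter)
```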
