
Usage Data

Just a little follow on from the previous blog post

Spurred on by comments from Lisa, I’m exploring if we can filter the recommendations so that they become more relevant to students in a specific academic school, or even to students on a specific course, and the initial results look fairly promising :-)

Let’s look at a couple of examples:

International Journal of Sociology and Social Policy (ISSN 0144-333X)

Here are the recommendations based on usage by all users. A quick browse through the items shows a range of subject areas (social exclusion, economics, human resources, etc.), and a student would need to sift through them to spot the items relevant to their subject area.

Now let’s filter the recommendations so that they’re only based on usage by students in a specific academic school:

…hopefully you can spot that the recommendations suddenly jump to becoming much more relevant to courses in that particular school.
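As a rough illustration of what "filtering by school" means, here's a minimal Python sketch. It assumes usage events are simple (user, item) pairs and that there's a lookup from each user to their academic school; the function names, item IDs and school mapping are all made up, and this isn't the actual prototype code. The only change for the filtered version is that events from users outside the school are ignored before counting.

```python
from collections import Counter, defaultdict

def co_usage_counts(events, seed, users=None):
    """For each item other than `seed`, count the users who used both it and `seed`.

    `events` is an iterable of (user_id, item_id) pairs. If `users` is given,
    only events by those users are counted (e.g. students in one school).
    """
    items_by_user = defaultdict(set)
    for user, item in events:
        if users is None or user in users:
            items_by_user[user].add(item)
    counts = Counter()
    for items in items_by_user.values():
        if seed in items:
            counts.update(items - {seed})
    return counts

def users_in_school(user_school, school):
    """Hypothetical lookup: user_school maps user_id -> academic school."""
    return {user for user, s in user_school.items() if s == school}

# Made-up example data, not real usage.
events = [
    ("u1", "ijssp"), ("u1", "social-exclusion"),
    ("u2", "ijssp"), ("u2", "social-exclusion"),
    ("u3", "ijssp"), ("u3", "hr-management"),
    ("u4", "ijssp"), ("u4", "hr-management"),
    ("u5", "ijssp"), ("u5", "economics"),
]
user_school = {"u1": "Human and Health Sciences", "u2": "Human and Health Sciences",
               "u3": "Business", "u4": "Business", "u5": "Business"}

all_users = co_usage_counts(events, "ijssp")
business = co_usage_counts(events, "ijssp", users_in_school(user_school, "Business"))
print(all_users.most_common())
print(business.most_common())
```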

Managerial Auditing Journal (ISSN 0268-6902)

Let’s drill a bit deeper this time and look at courses in the Business school:

Without knowing how our course codes are created, you can probably guess that courses starting with “BA…” are mostly accountancy & finance, and that those starting with “BM…” are to do with leadership and management.

We’ve had serendipity suggestions on the OPAC for nearly 7 years now, but they’ve been based entirely around the physical collection in the library.

After Friday’s Skype chat to the SPLURGE Hackfest, I got to thinking about how we can hook the e-stuff into the recommendations, so I’ve spent the weekend gathering data from our library management system, our link resolver and our EZProxy logs to see what happens if they all go into the same melting pot.

It’s a very rough & ready “crappy prototype”, but you can have a play around with it here. If you get an empty page, click on the “pick random item” link until something interesting happens.

At the moment, the recommendations are being built from a database of just over 5 million events (approx 70% of those are item loans and the rest are accesses of online journals). If you take the “Midwifery” journal as a starting point, you’ll get a list of the other books and journals that people have looked at. The algorithm behind it is the same one I’ve discussed previously.

If you hover over a title, you’ll see the usage info breakdown, e.g. “42 / 56” means that 56 different users in total have looked at the recommended item, and 42 of those also looked at the item we’re generating the recommendations for.

I’ve not done any de-duping, so you might get the same journal title being repeated (once for the print ISSN and once for the e-ISSN), and I’ve not included any ebook usage data yet. I’ve also avoided merging the two lists together until I can figure out a suitable way of weighting book loans against online journal usage.

Picking random items, it’s apparent that some courses lean more towards book borrowing (i.e. very few journal recommendations), whilst students studying other subjects are heavy online journal users (i.e. very few book recommendations).

So, what do you think — is it useful to be able to show more than just book recommendations to students?

Nigel’s comment on the “5 years of book loans and grades” post reminded me that I did do a breakdown by discipline of the same data.

One of the caveats with this is that it represents nearly a decade’s worth of usage and, during that time, the seven academic schools at Huddersfield have changed — e.g. some courses and subjects have moved from one school to another.

Music, Humanities and Media

In terms of books, the students in this school rack up the highest average number of loans.

Business

Business students’ borrowing is much lower, but considerably more stable across the 5 years of graduates.

Computing and Engineering

I guess there are no surprises here — when I did my HND in Computing at Huddersfield in the 1990s, I only visited the library once :-D

Education and Professional Development

There’s something interesting going on here: the borrowing levels for firsts and thirds are very similar, with 2:1 and 2:2 being lower. Very curious!

Human & Health Sciences

Applied Sciences

From memory, Applied Sciences students make much heavier use of journals than of books.

Art, Design and Architecture

The art stock is much more likely to be used within the library, rather than loaned.

I’m journeying down to Llandrindod Wells tomorrow to give a presentation about usage data to the Welsh Libraries, Archives and Museums Conference (hashtag #cilipw11). I’ve been promised that there’ll be real ale there :-)

You can grab a draft copy of my presentation (“If you want to get laid, go to college…”) from here (15MB).

The main web links in the presentation are:

- JISC Library Impact Data Project
- JISC Activity Data Programme (including a list of the projects)
- Rufus Pollock (Open Data and Componentization, XTech 2007)
- Paul Walk (“The coolest thing to do with your data will be thought of by someone else”)
- University of Huddersfield – Open Data Release (from Dec 2008)

I’m just starting to pull our data out for the JISC Library Impact Data Project and I thought it might be interesting to look at 5 years of grades and book loans. Unfortunately, our e-resource usage data and our library visits data only go back as far as 2005, but our book loan data goes back to the mid-1990s, so we can look at a full 3 years of loans for each graduating student.

The following graph shows the average number of books borrowed by undergrad students who graduated with a specific honour (1, 2:1, 2:2 or 3) in that particular academic year…


…and, to try and tease out any trends, here’s a line graph version…


Just a couple of general comments:

  • the usage & grade correlation (see original blog post) for books seems to be fairly consistent over the last 5 years, although the gap between usage by the lowest and highest grades is widening
  • the usage by 2:2 and 3 students seems to be in gradual decline, whilst usage by those who gain the highest grade (1) seems to be on the increase
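For anyone wanting to try something similar with their own data, here's a minimal sketch of how the yearly averages could be calculated. It assumes one record per graduate holding their academic year, final grade and number of loans over the final 3 years; that record layout and the numbers below are made up for illustration.

```python
from collections import defaultdict

def average_loans(graduates):
    """Average loans per (academic year, honours grade).

    `graduates` is an iterable of (year, grade, loans) tuples, where `loans`
    is the number of books a student borrowed over their final 3 years.
    """
    sums = defaultdict(int)
    counts = defaultdict(int)
    for year, grade, loans in graduates:
        sums[(year, grade)] += loans
        counts[(year, grade)] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Made-up example data for illustration only.
graduates = [
    ("2005/06", "1", 80), ("2005/06", "1", 60), ("2005/06", "3", 20),
    ("2006/07", "2:1", 50), ("2006/07", "2:1", 40), ("2006/07", "3", 10),
]
for (year, grade), avg in sorted(average_loans(graduates).items()):
    print(year, grade, round(avg, 1))
```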

At a recent event in Edinburgh, I was asked about how we generate the “people who borrowed this, also borrowed…” suggestions in our OPAC and whether or not there are privacy issues with generating them.

Last week, I popped over to Manchester for a meeting of the JISC-funded SALT (Surfacing the Academic Long Tail) project, which is one of the recently funded Activity Data projects. Part of the discussion at the meeting was around how to generate recommendations for items that haven’t circulated many times.

At both events, I promised to put together a blog post detailing the method we use, so here it is!

To generate recommendations for book A, we find every person who’s borrowed that book. Just to simplify things, let’s say only 4 people have borrowed that book. We then find every book that those 4 people have borrowed. As a Venn diagram, where each set represents the books borrowed by that person, it’d look like…

To generate useful and relevant recommendations (and also to help protect privacy), we set a threshold and ignore anything below that. So, if we decide to set the threshold at 3 or more, we can ignore anything in the red and orange segments, and just concentrate on the yellow and green intersections…

There’ll always be at least one book in the green intersection — the book we’re generating the recommendations for, so we can ignore that.

If we sort the books that appear in those intersections by how many borrowers they have in common (in descending order), we should get a useful list of recommendations. For example, if we do this for “Social determinants of health” (ISBN 9780198565895), we get the following titles (the figures in square brackets are the number of people who borrowed both books and the total number of loans for the suggested book)…

  1. Health promotion: foundations for practice [43 / 1312]
  2. The helping relationship: process and skills [41 / 248]
  3. Skilled interpersonal communication: research, theory and practice [31 / 438]
  4. Public health and health promotion: developing practice [29 / 317]
  5. The sociology of health and illness [29 / 188]
  6. Promoting health: a practical guide [28 / 704]
  7. Sociology: themes and perspectives [28 / 612]
  8. Understanding social problems: issues in social policy [28 / 300]
  9. Psychology: the science of mind and behaviour [27 / 364]
  10. Health policy for health care professionals [25 / 375]
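Here's a minimal Python sketch of the counting, thresholding and sorting steps described above. It assumes the loan history is available as simple (borrower, book) pairs; the function name and the data are made up for illustration, and it's not the actual OPAC code. The tiny dataset mirrors the Venn diagram example: four people borrowed book "A", plus one person who didn't (and so has no effect on the counts).

```python
from collections import Counter, defaultdict

def recommend(loans, book, threshold=3):
    """Rank books by how many borrowers they share with `book`.

    `loans` is an iterable of (borrower_id, book_id) pairs. Books shared by
    fewer than `threshold` borrowers are dropped, which also helps protect
    privacy. The seed book itself is excluded.
    """
    books_by_borrower = defaultdict(set)
    for borrower, b in loans:
        books_by_borrower[borrower].add(b)
    common = Counter()
    for books in books_by_borrower.values():
        if book in books:            # only people who borrowed the seed book
            common.update(books - {book})
    ranked = [(b, n) for b, n in common.items() if n >= threshold]
    # Most borrowers in common first; ties broken alphabetically for stable output.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked

# Four made-up borrowers of book "A", plus one who never borrowed it.
loans = [
    ("p1", "A"), ("p1", "B"), ("p1", "C"), ("p1", "D"),
    ("p2", "A"), ("p2", "B"), ("p2", "C"),
    ("p3", "A"), ("p3", "B"), ("p3", "D"),
    ("p4", "A"), ("p4", "B"), ("p4", "E"),
    ("p5", "B"), ("p5", "C"),
]
print(recommend(loans, "A", threshold=3))
print(recommend(loans, "A", threshold=2))
```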

When we trialled generating suggestions this way, we found a couple of issues:

  • more often than not, the suggested books tend to be ones that are popular and circulate well already. Is there a danger that this creates a closed loop, where more relevant but less popular books don’t get recommended?
  • the suggested books are often more general — e.g. the suggestions for a book on MySQL might be ones that cover databases in general, rather than specifically just MySQL

To try and address those concerns, we tweaked the sorting to take into account the total number of times the suggested book has been borrowed. So, if 10 people have borrowed book A and book B, and book B has only been borrowed by 12 people in total, we could infer that there’s a strong link between both books.

If we divide the number of common borrowers (10) by the total number of people who’ve borrowed the suggested book (12), we’ll end up with a figure between 0 and 1 (here, roughly 0.83) that we can use to sort the titles. Here’s a list that uses 15 and above as the threshold…

…and if we used a lower threshold of 5, we’d get…

  1. Status syndrome : how your social standing directly affects your health [15 / 33]
  2. What is the real cost of more patient choice? [5 / 12]
  3. Interpersonal helping skills [5 / 12]
  4. Coaching and mentoring in higher education : a learning-centred approach [6 / 15]
  5. Understanding social policy [5 / 13]
  6. Managing and leading in inter-agency settings [11 / 29]
  7. Read, reflect, write : the elements of flexible reading, fluent writing, independent learning [5 / 14]
  8. Community psychology : in pursuit of liberation and well-being [6 / 20]
  9. Communication skills for health and social care [9 / 32]
  10. How effective have National Healthy School Standards and the National Healthy School programme been, in contributing to improvements in children’s health? [5 / 18]
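A minimal sketch of the ratio-based sorting, assuming we already have the two figures for each candidate (common borrowers, total borrowers of the suggestion). The figures for the real titles are taken from the list above (titles shortened); "Hypothetical title" is made up to show a high-ratio item being dropped because it falls below the threshold.

```python
def rank_by_ratio(candidates, threshold=5):
    """Sort suggestions by common borrowers / total borrowers of the suggestion.

    `candidates` maps title -> (common_borrowers, total_borrowers). Titles with
    fewer than `threshold` common borrowers are dropped before sorting.
    """
    kept = {title: (common, total) for title, (common, total) in candidates.items()
            if common >= threshold}
    return sorted(kept, key=lambda title: kept[title][0] / kept[title][1], reverse=True)

# Figures from the list above, plus one made-up title below the threshold.
candidates = {
    "Communication skills for health and social care": (9, 32),
    "Status syndrome": (15, 33),
    "Managing and leading in inter-agency settings": (11, 29),
    "Coaching and mentoring in higher education": (6, 15),
    "Hypothetical title": (4, 5),
}
for title in rank_by_ratio(candidates):
    common, total = candidates[title]
    print(f"{title} [{common} / {total}] -> {common / total:.2f}")
```

Lowering the threshold lets rarer items with high ratios through, which is what pushes the suggestions further along the tail.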

If you think of the 3 sets of suggestions in terms of the Long Tail, the first set favours popular items that will mostly appear in the green (“head”) section, the second will be further along the tail, and the third, even further along.

As we move along the tail, we begin to favour books that haven’t been borrowed as often and we also begin to see a few more eclectic suggestions appearing (e.g. the “How effective have National Healthy School Standards…” literature-based study).

One final factor that we include in our OPAC suggestions is whether or not the suggested book belongs to the same stock collection in the library — if it does, then the book gets a slight boost.
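The exact size of that boost isn't stated, so the sketch below just uses an arbitrary 1.1 multiplier on the ratio score as a placeholder. The book names and figures are made up; the point is only that a small boost can lift a same-collection book above one that would otherwise rank slightly higher.

```python
def score(common, total, same_collection, boost=1.1):
    """Ratio score with an optional boost for books in the same stock collection.

    The 1.1 multiplier is a made-up placeholder, not the real OPAC value.
    """
    ratio = common / total
    return ratio * boost if same_collection else ratio

# (title, common borrowers, total borrowers, same collection as the seed book?)
suggestions = [
    ("Book X", 5, 12, False),
    ("Book Y", 6, 15, True),
]
ranked = sorted(suggestions, key=lambda s: score(s[1], s[2], s[3]), reverse=True)
print([title for title, *_ in ranked])
```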

I’m chuffed to bits that the Library Impact Data bid that Huddersfield submitted, along with 7 project partner institutions, was one of the successful ones in the JISC Activity Data Programme and the project will kick off on Tuesday this week!

… the aim of this project is to prove a statistically significant correlation between library usage and student attainment. The project will collect anonymised data from University of Bradford, De Montfort University, University of Exeter, University of Lincoln, Liverpool John Moores University, University of Salford, Teesside University as well as Huddersfield. By identifying subject areas or courses which exhibit low usage of library resources, service improvements can be targeted. Those subject areas or courses which exhibit high usage of library resources can be used as models of good practice.

If you’re interested, keep an eye on the project blog: http://library.hud.ac.uk/blogs/projects/lidp/

Following on from the last blog post, I’ve done some coding to see how well (or not!) a course level new journal article feed might work.

The process behind the code is…

  1. for a given course, identify the most frequently accessed journal titles
  2. use JournalTOCs to fetch the latest articles from each journal’s RSS feed
  3. for each course, create a list of articles (sorted by date, newest first)

…and you can see the initial output from the code here: http://www.daveyp.com/blog/stuff/journals/
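The three steps can be sketched roughly as follows. This is a simplified, offline version: it assumes access events have already been reduced to (course, journal) pairs, and instead of fetching each journal's RSS feed from JournalTOCs it takes the latest articles as a ready-made dictionary. The course codes, journal IDs and articles are all made up.

```python
from collections import Counter
from datetime import date

def top_journals(accesses, course, n=3):
    """Most frequently accessed journals for a course.

    `accesses` is an iterable of (course_code, journal_id) pairs, e.g. built
    by joining link resolver logs to student records.
    """
    counts = Counter(journal for c, journal in accesses if c == course)
    return [journal for journal, _ in counts.most_common(n)]

def course_feed(accesses, course, articles_by_journal, n=3):
    """Merge the latest articles from a course's top journals, newest first.

    `articles_by_journal` maps journal_id -> [(published_date, title), ...].
    In practice these would come from each journal's TOC RSS feed; they are
    passed in here so the sketch runs without network access.
    """
    articles = []
    for journal in top_journals(accesses, course, n):
        articles.extend(articles_by_journal.get(journal, []))
    return sorted(articles, reverse=True)

# Made-up course codes, journals and articles.
accesses = ([("CRIM", "J1")] * 5 + [("CRIM", "J2")] * 3 +
            [("CRIM", "J3")] * 1 + [("EDU", "J4")] * 4)
articles_by_journal = {
    "J1": [(date(2010, 3, 1), "Article a"), (date(2010, 1, 15), "Article b")],
    "J2": [(date(2010, 2, 10), "Article c")],
    "J3": [(date(2010, 4, 1), "Article d")],
}
for published, title in course_feed(accesses, "CRIM", articles_by_journal, n=2):
    print(published.isoformat(), title)
```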

For some courses (e.g. Educational Administration) it looks like usage is focused on a single journal, but most seem to bring in content from multiple titles — for example, BSc Criminology is bringing in content from:

One of the opportunities here is to use the journal usage data to identify potentially relevant journals that aren’t being used on a course and include those in the feed. In the above example, the Journal of Criminal Justice might be such a journal.
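One possible way of spotting those journals is sketched below. The approach is an assumption rather than something we've built: look at students on *other* courses who read at least one of this course's journals, and count the journals they read that this course never touches. The user IDs, course codes and journal IDs are made up, and each user is assumed to be on a single course.

```python
from collections import Counter, defaultdict

def suggest_unused_journals(accesses, course, n=3):
    """Suggest journals a course hasn't used, based on wider usage patterns.

    `accesses` is an iterable of (user_id, course_code, journal_id) triples.
    Assumed approach: for users on other courses who read at least one of
    this course's journals, count the journals they read that the course
    never accessed.
    """
    journals_by_user = defaultdict(set)
    course_of = {}
    for user, c, journal in accesses:
        journals_by_user[user].add(journal)
        course_of[user] = c
    used = set()
    for user, journals in journals_by_user.items():
        if course_of[user] == course:
            used |= journals
    scores = Counter()
    for user, journals in journals_by_user.items():
        if course_of[user] != course and journals & used:
            scores.update(journals - used)
    return [journal for journal, _ in scores.most_common(n)]

# Made-up example data.
accesses = [
    ("u1", "CRIM", "J1"), ("u1", "CRIM", "J2"),
    ("u2", "CRIM", "J1"),
    ("u3", "LAW", "J1"), ("u3", "LAW", "J5"),
    ("u4", "SOC", "J2"), ("u4", "SOC", "J5"), ("u4", "SOC", "J6"),
    ("u5", "EDU", "J7"),
]
print(suggest_unused_journals(accesses, "CRIM"))
```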

A mega quick blog post before the afternoon session kicks off!

Lynn Connaway’s talk mentioned that they’d found that students wanted the library/librarian to provide a filtered feed of relevant stuff, so here’s our idea…

1) capture OpenURL usage data along with user data (so you know who’s looking at which journals)

2) identify the most popular journals for individual courses

3) for each course, use TicTOCs/JournalTOCs to provide an aggregated feed of new articles for those journals

Whilst chatting to one of the delegates at yesterday’s “Gaining business intelligence from user activity data” event (my PowerPoint slides can be grabbed from here) about non-usage and low usage of library services/resources, I began wondering how that relates to final grades.

In the previous blog post, we’ve seen that there appears to be evidence of a correlation between usage and grades, but that doesn’t really give an indication of how many students are non/low users. For example, if we happened to know that 25% of all students never borrow anything from the library, does that mean that 25% of students who gain the highest grades don’t borrow a book?

Let’s churn the data again :-)

In the following 3 graphs, we’re looking at:

  • X axis: bands of usage (zero usage, then incremental bands of 20, then everything over 180 uses)
  • Y axis: as a percentage, what proportion of the students who achieved a particular grade are in each band
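The banding and percentages could be worked out roughly as below. It's a sketch assuming one (grade, usage count) record per student; the data is invented, and the last line just shows how a "twice as likely to be a non-user" style comparison between grades falls out of the percentages.

```python
from collections import Counter, defaultdict

def usage_band(uses):
    """Zero, then bands of 20 (1-20, 21-40, ..., 161-180), then over 180."""
    if uses == 0:
        return "0"
    if uses > 180:
        return "181+"
    low = (uses - 1) // 20 * 20 + 1
    return f"{low}-{low + 19}"

def band_percentages(students):
    """Map grade -> {band: percentage of that grade's students in the band}.

    `students` is an iterable of (grade, uses) pairs.
    """
    by_grade = defaultdict(Counter)
    for grade, uses in students:
        by_grade[grade][usage_band(uses)] += 1
    return {grade: {band: 100 * n / sum(bands.values()) for band, n in bands.items()}
            for grade, bands in by_grade.items()}

# Made-up example: four first-class and four third-class graduates.
students = [("1", 0), ("1", 25), ("1", 30), ("1", 200),
            ("3", 0), ("3", 0), ("3", 5), ("3", 15)]
pct = band_percentages(students)
print(pct)
print("non-user ratio, 3 vs 1:", pct["3"]["0"] / pct["1"]["0"])
```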

You can click on the graphs to view a full-sized version.

One of the things to look for is which grade peaks in each band of usage.

Borrowing

The usage bands represent the number of items borrowed from the library during the final 3 years of study…

caveat: we have a lot of distance learners across the world and we wouldn’t expect them to borrow anything from the library

In terms of non-usage (i.e. never borrowing an item), there’s a marked difference between those who get the two highest grades (1 and 2:1) and those who get the lowest honours grade (3). It seems that those who get a third-class honours degree are twice as likely to be non-users as those who get a first-class or 2:1 degree.

E-Resource Usage

The usage bands represent the number of times the student logged into MetaLib (or AthensDA) during the final 3 years of study…

caveat: this is a relatively crude measure of e-resource usage, as it doesn’t measure what the student accessed or how long they accessed each e-resource

Even at a quick glance, we can see that this graph tells a different story to the previous one: the number of non-users is lower, but there’s a huge (worrying?) amount of low usage (the “1-20” band). I can only speculate on the reasons:

  • did students try logging in but found the e-resources too difficult to use?
  • how much of an impact do the barriers to off-campus access (e.g. having to know when & how to authenticate using Athens or Shibboleth) have on repeat usage?
  • are students finding the materials they need for their studies outside of the subscription materials?

As I mentioned previously, Summon is a different kettle of fish to MetaLib, so it’s unlikely we’ll be able to capture comparative usage data — if you’ve tried using Summon, you’ll know that you don’t need to log in to use it (authentication only kicks in when you try to access the full-text). However, we’re confident that Summon’s ease-of-use and the work we’ve done to improve off-campus access will result in a dramatic increase in e-resource usage.

As before, we see it’s those students who graduate with a third-class honour who are the most likely to be non or low-users of e-resources.

Visits to the Library

The usage bands represent the number of visits to the library during the final 3 years of study…

caveat: we have a lot of distance learners across the world and we wouldn’t expect them to visit the library

Again, the graph shows that those who gain a third-class degree are twice as likely never to visit the library as those who gain a first-class or 2:1.
