Archive

Summon

After yesterday’s blog post, I thought I’d have a go at narrowing down my definition of a “separate search”.

If a user enters some search terms, and then uses 2 facets to refine the search before clicking on a result, I was classing that as 3 separate searches — what niggled me overnight was that this approach might inflate the facet use statistics …after all, “30.6% of all searches used at least one facet” felt a little high given that I’m forever hearing staff moan that students never use the facets, no?!

For today’s blog post, I’ve removed all searches that didn’t lead to a result click. (There’s a small caveat that my jQuery code currently doesn’t capture a result click for links to the OPAC where the user clicked on the availability message (highlighted in red below) — this is because my jQuery code that captures the result clicks runs once the page has loaded, but before the AJAX’d availability information has been retrieved. When I get some time, I’ll see if I can find a way around that.)

So, let’s see how much of a difference that makes to yesterday’s stats…

  • 29.4% of searches used at least one facet to refine the results
  • 10.4% of searches were refined using the content type facet (e.g. newspaper articles, book reviews, books/ebooks, journal articles, etc)
  • 7.8% of searches were refined to just items with full text available online
  • 9.4% of searches were refined by publication date
  • 5.6% of searches were refined to just articles from scholarly publications (including peer-review)
  • 3.7% of searches were refined using the language facets
  • 2.5% of searches were refined using the subject term facets
  • 2.1% of searches used a Boolean operator, with nearly all of them being AND

So, that overall figure for the % of searches which used at least one facet hasn’t dropped by much from yesterday’s figure of 30.6%.
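To make the two ways of counting concrete, here’s a minimal sketch. It assumes each logged search is a record with a list of facets and a flag for whether it led to a result click (the field names `facets` and `clicked`, and the sample data, are made up for illustration and aren’t the real log format):

```javascript
// Hypothetical log records: one per search request sent to Summon.
// `facets` and `clicked` are illustrative field names, not the real log format.
function facetUsage(searches, onlyClicked) {
  // Optionally keep just the searches that led to a result click.
  var kept = searches.filter(function (s) {
    return !onlyClicked || s.clicked;
  });
  if (kept.length === 0) { return 0; }
  var withFacet = kept.filter(function (s) {
    return s.facets.length > 0;
  });
  // Percentage of the kept searches that used at least one facet.
  return 100 * withFacet.length / kept.length;
}

var sampleLog = [
  { query: 'dogs cats', facets: [], clicked: false },
  { query: 'dogs cats', facets: ['ContentType'], clicked: false },
  { query: 'dogs cats', facets: ['ContentType', 'IsFullText'], clicked: true },
  { query: 'germany', facets: [], clicked: true },
  { query: 'africa aid', facets: [], clicked: true }
];
```

Calling `facetUsage(sampleLog, false)` counts every search (yesterday’s approach), whilst `facetUsage(sampleLog, true)` only counts the searches that ended in a result click (today’s approach).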

Anyone who follows me on Twitter will know that I like to cheekily mock the importance of Boolean and the data from the last 7 days reveals a few things:

  1. no-one who used a Boolean NOT in their search clicked on a result
  2. only 0.07% of searches (that’s just 7 searches in every 10,000!) used a Boolean OR, which is arguably the most useful operator to use
  3. unless you’re using a search that includes one of the other Boolean operators, the use of AND is pretty much redundant as it’s the default Boolean operator in a search (i.e. the search “dogs AND cats” is the same as “dogs cats”)… so why are we telling students to use it in Summon?

After poking a bit of fun at someone for entering a 356 word search query yesterday, I can reveal that the longest search in the last 7 days that resulted in a result click was 98 words (it was a paragraph copied and pasted from a journal article).

I guess the big question here is why the disconnect between the “students don’t use facets” mantra and the actual usage data?

Finally, I thought I’d figure out how many results are clicked on after a search…

[ graph: number of result clicks per search ]

[ update: slightly revised stats are available here! ]

We’ve just started collecting in-depth data about how students are searching Summon (keywords entered, facets selected, etc) and I thought some of you might be interested in an early analysis from the last 7 days (just under 40,000 separate searches by 2,807 students)…

  • On average, students used 4.5 keywords per search (the mode is 3 keywords and the majority of searches used 3 keywords or less — view graph) [1]
  • 30.6% of searches used at least one facet to refine the results [2]
  • 11.7% of searches were refined using the content type facet (e.g. newspaper articles, book reviews, books/ebooks, journal articles, etc)
  • 9.5% of searches were refined to just items with full text available online
  • 9.2% of searches were refined by publication date [3]
  • 7.2% of searches were refined to just articles from scholarly publications (including peer-review)
  • 3.4% of searches were refined using the language facets [4]
  • 2.6% of searches were refined using the subject term facets
  • 2.3% of searches used a Boolean operator, with AND being by far the most common (2.23% of searches) [5]
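For anyone wanting to work out the same kind of keyword figures from their own logs, here’s a rough sketch of how the mean and mode might be calculated. It assumes a “keyword” is simply anything separated by whitespace, which may not match exactly how the figures above were counted:

```javascript
// Count whitespace-separated keywords in a query string.
function keywordCount(query) {
  var trimmed = query.trim();
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length;
}

// Return the mean and mode of keywords per search.
function keywordStats(queries) {
  var total = 0, tally = {}, mode = null;
  queries.forEach(function (q) {
    var n = keywordCount(q);
    total += n;
    tally[n] = (tally[n] || 0) + 1;
    // Keep track of the keyword count seen most often so far.
    if (mode === null || tally[n] > tally[mode]) { mode = n; }
  });
  return { mean: queries.length ? total / queries.length : 0, mode: mode };
}
```

A handful of very long copy & paste searches will drag the mean upwards without shifting the mode, which is one reason the two figures can differ.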

notes:

[1] – One student copied & pasted the following 356 word title & abstract into the search box!

Peter J. Shaw, David J. Rawlins Article first published online 2 AUG 2011 DOI:10.1111/j.1365-2818.1991.tb03168.x 1991 Blackwell Science Ltd Issue Journal of Microscopy Volume 163, Issue 2, pages 151–165, August 1991 Additional Information(Show All) How to CiteAuthor InformationPublication History SEARCH Search Scope Search String Advanced >Saved Searches > ARTICLE TOOLS Get PDF (1119K) Save to My Profile E-mail Link to this Article Export Citation for this Article Get Citation Alerts Request Permissions Share Abstract References Cited By Get PDF (1119K) Keywords Confocal microscopy;three-dimensional fluorescence microscopy;point-spread function;deconvolution;computer image processing SUMMARY We have measured the point-spread function (PSF) for an MRC-500 confocal scanning laser microscope using subresolution fluorescent beads. PSFs were measured for two lenses of high numerical aperture—the Zeiss plan-neofluar 63 × water immersion and Leitz plan-apo 63 × oil immersion—at three different sizes of the confocal detector aperture. The measured PSFs are fairly symmetrical, both radially and axially. In particular there is considerably less axial asymmetry than has been demonstrated in measurements of conventional (non-confocal) PSFs. Measurements of the peak width at half-maximum peak height for the minimum detector aperture gave approximately 0·23 and 0·8 μm for the radial and axial resolution respectively (4·6 and 15·9 in dimensionless optical units). This increased to 0·38 and 1·5 μm (7·5 and 29·8 in dimensionless units) for the largest detector aperture examined. The resulting optical transfer functions (OTFs) were used in an iterative, constrained deconvolution procedure to process three-dimensional confocal data sets from a biological specimen—pea root cells labelled in situ with a fluorescent probe to ribosomal genes. The deconvolution significantly improved the clarity and contrast of the data. 
Furthermore, the loss in resolution produced by increasing the size of the detector aperture could be restored by the deconvolution procedure. Therefore for many biological specimens which are only weakly fluorescent it may be preferable to open the detector aperture to increase the strength of the detected signal, and thus the signal-to-noise ratio, and then to restore the resolution by deconvolution. Get PDF (1119K) More content like thisFind more content like this article Find more content written by Peter J. ShawDavid J. RawlinsAll Authors ABOUT USHELPCONTACT USA

…sadly, Summon failed to find a result for that as we don’t subscribe to the article!

[2] – Normally, you search Summon by entering your keywords then, after the results appear, you select facets to refine your search and each facet selection invokes a new search. So, if you ran a search and then selected 2 facets, that will be logged as 3 separate searches (1 without any facets, and 2 with).

[3] – Mostly, the publication date facet is being used to limit the search to the X most recent years.

[4] – The vast majority of the content in our Summon instance is in English and, apart from one search that refined the results to just Italian, every use of the language facet was to refine the results to English only.

[5] – Boolean operators have to be entered in UPPER CASE in Summon, with an invisible AND being implicit in any multi-keyword search that doesn’t include Boolean. Looking at the search queries that included a Boolean operator, 6% were entered entirely in upper case, implying that the user wasn’t consciously invoking a Boolean search.
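As a rough illustration of the check described in note [5], the sketch below looks for upper case AND, OR and NOT as whole words and flags queries typed entirely in upper case. It’s an approximation and doesn’t try to handle operators that appear inside quoted phrases:

```javascript
// Find upper-case Boolean operators in a query, and flag queries typed
// entirely in upper case (where the operator may not be deliberate).
function booleanUse(query) {
  var operators = query.match(/\b(AND|OR|NOT)\b/g) || [];
  var hasLetters = /[a-z]/i.test(query);
  var allUpper = hasLetters && query === query.toUpperCase();
  return { operators: operators, allUpper: allUpper };
}
```

So `booleanUse('dogs AND cats')` reports one operator, whereas `booleanUse('DOGS AND CATS')` reports the same operator but with `allUpper` set, so it can be counted separately.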

Inspired by the Summon result click stats that Matthew Reidsma has extracted (and, to be honest, I find myself being regularly inspired by what Matthew’s doing!), I’ve started tracking the clicks on our Summon instance too.

Anyone who’s had the misfortune to hear me present recently will know I’ve been waffling on about the importance of making e-resources easy to use and painless to access, and the fact that most of us are biologically programmed to follow the easiest route to information

…an information [seeker] will tend to use the most convenient search method, in the least exacting mode available. Information seeking behaviour stops as soon as minimally acceptable results are found.
Wikipedia, Principle of least effort

Why will our students not get up and walk a hundred meters to access a key journal article in the library? … the overwhelming propensity of most people is to invest as absolutely little effort into information seeking as they possibly can.
Prof Marcia J. Bates, “Toward an Integrated Model of Information Seeking & Searching” (2002)

…numerous studies have shown users are often willing to sacrifice information quality for accessibility. This fast food approach to information consumption drives librarians crazy. “Our information is healthier and tastes better too” they shout. But nobody listens. We’re too busy Googling.
Peter Morville, “Ambient Findability” (O’Reilly 2005)

As early as 2004, in a focus group for one of my research studies, a college freshman bemoaned, “Why is Google so easy and the library so hard?”
Carol Tenopir, “Visualize the Perfect Search” (Library Journal 2009)

The present findings indicated that the principle of least effort prevailed in the respondents’ selection and use of information sources.
Liu & Yang, “Factors Influencing Distance-Education Graduate Students’ Use of Information Sources: A User Study” (2004)

People do not just use information that is easy to find; they even use information that they know to be of poor quality and less reliable — so long as it requires little effort to find — rather than using information they know to be of high quality and reliable, though harder to find.
Jason Vaughan, “Web Scale Discovery Services” (ALA TechSource 2011)


If you’re looking at Discovery Services, demand a trial and don’t get distracted by how many options the advanced search page has, how well it handles complex Boolean queries, or how many obscure specialist subject headings it supports — to misquote Obi-Wan Kenobi, “these are not the features you are looking for”. The real questions you should be asking are:

  • Can students use the skills they’ve already picked up from a lifetime of searching Google to use this thing?
  • If I pluck 2 or 3 vaguely relevant keywords out of the air and type them in (possibly misspelling them), do I get useful and relevant results?
  • If I choose some slightly more carefully considered keywords, are the first 5 results on the first page all relevant?
  • Does the interface look uncluttered, straightforward to use and, if I wanted to, is it obvious how to refine the search?
  • Does this product work with EZProxy (or similar) to provide easy off-campus access to articles?

…in fact, and please don’t take this the wrong way, you’re possibly not the best person to be answering some of those questions as your neural pathways have been severely damaged by years of using poorly designed journal database interfaces and you have an unhealthy (bordering on the sexually perverse) obsession with “advanced” search pages ;-)

Instead, grab some of your newest students (ideally ones who look blankly at you when you ask them if they know what a Boolean operator is) and let them play with it — the more Information Illiterate they are, the better! Treat their comments as pearls of wisdom (“out of the mouth of babes…”) and try to see the library’s e-resource world through their eyes for what it really is: a scary alien landscape of weird library terminology, perplexing login screens, and unnecessary friction at every turn. Above all, never forget that “Libraries are a cruel mistress”!

Matt Borg nicely summed up the above when he cheekily said (and apologies for paraphrasing you, Matt!)…

The trouble with Summon is that students don’t need to be taught how to use it, but librarians do

In other words, you shouldn’t have to be an Information Professional to use a Discovery Service and you shouldn’t have to become a mini-librarian just in order to figure out how the damn thing works. If the interface looks comfortable and familiar to you, it’s probably been designed for librarians to use and will scare the bejebus out of most of your students. Swallow hard, gird your loins and remember that you’re not buying this product to make your life easier (although chances are it will), you’re buying it to make life easier for your users.

Or, to put it another way, if a Discovery Service looks like a journal database and acts like a journal database, then it probably is a journal database and not a Discovery Service. There’s a very good reason Summon looks more like Google and less like <insert name of your favourite database here> :-D

(If your idea of a “good time” is to scare undergraduates in training sessions by showing them journal database interfaces — “it’s OK, I’m a friendly librarian and I’m here to show you just how hard it can be to find an article!” — then it’s probably high time you sought medical counselling ;-))

OK, so why am I ranting on about all this stuff? It’s simply because I’ve been pulling out some usage stats from our Summon instance…

  • The library’s print collection accounts for just 0.3% of the items, but accounts for 10.3% of the result clicks — I think our users are trying to tell us that they think our OPAC sucks and they’d rather use Summon to search for books
  • 89% of the results clicked on appeared on the first page of results — as with Google, users rarely delve any further than page 1 of the results
  • Only 2% of result clicks came from beyond the 4th page of the results — very few users will explore the long tail of results
  • 50.5% of result clicks were for the first 4 results on page 1 — the majority of users won’t even bother to scroll down the page!
  • 72.3% of searches used 3 keywords or less — students are using their Google skills
  • Since launching Summon, we’ve seen increases of 300% to 1000% in the COUNTER full-text download stats for many of the journal platforms we subscribe to — although “cost per use” can be a crude measure, we’re getting much better value out of our e-resource subscriptions now
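If you’re tracking result clicks yourself, percentages like the ones above could be worked out along the following lines. This is only a sketch, and it assumes each click has been logged with the results page number and the position on that page (`page` and `position` are names I’ve made up for illustration):

```javascript
// Each click is { page: <1-based results page>, position: <1-based rank on that page> }.
// These field names are assumptions for illustration.
function clickShares(clicks) {
  var total = clicks.length;
  function share(test) {
    return total ? 100 * clicks.filter(test).length / total : 0;
  }
  return {
    firstPage: share(function (c) { return c.page === 1; }),
    topFour: share(function (c) { return c.page === 1 && c.position <= 4; }),
    beyondPageFour: share(function (c) { return c.page > 4; })
  };
}
```

Each value returned is a percentage of all the logged clicks, so the three figures can be compared directly.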

All of the above tells me that Summon is doing all the things we originally bought it for and that the relevancy ranking is schmokin’!

“Yes”, there’s still a place for Information Literacy in all of this, and, “yes”, we need to be able to support researchers and Boolean Buffs, but the majority of students just want to whack in a few keywords and quickly find something that’s relevant — if you select a product that allows them to do just that, they will come :-)

OK, I’ll admit it, I’ve fallen in love with jQuery over the last 18 months :-)

I’ve ended up using quite a bit of jQuery in our new reading list software (“MyReading”), to add various bells and whistles, including dropping an “add to MyReading” option into the Summon interface.

Like they say, “when you’ve got a hammer, everything looks like a nail”, and once you know a bit of jQuery, every web page looks hackable, so I’ve been pondering what else might be fun and/or sensible to do. To be honest, I really like the Summon interface, so making any major changes to it feels a bit like drawing a moustache on the Mona Lisa (or Mr Graham Stone, for that matter).

So, rather than hack the interface around too much, you could use jQuery to start collecting usage data from Summon (“hmmmm… [drool] usage data!”)…

…or maybe add a helpful hint if a search brings back a silly number of results?

To do the above, you’ll need to host a JavaScript file on your own web server and then include a link to that file in the Summon Administration options, e.g.

Because Summon already uses jQuery, it means you can put jQuery code into your JavaScript file without having to worry about loading the jQuery library yourself. To do the above helpful hint, you could use the following 7 lines of code:

$(document).ready(function() {
  var count = $('#summary .highlight:last').html( );
  count = count.replace(/[^0-9]/g,'');
  if( count > 50000 ) {
    $('#summary').append('<div style="margin-top:5px;"><span id="refineSearchHelp" style="display:none; font-style:italic;">Too many results? Use the options below to refine your search...</span>&nbsp;</div>');
    $('#refineSearchHelp').delay(1000).fadeIn(1000);
} });

Let’s walk through each of those lines…

line 1

Typically, you don’t want your jQuery JavaScript to run until the web page has finished loading, so you’ll often see this line of code — it ensures what follows won’t be executed until after the web page has loaded. If you’ve coded JavaScript before, you’ll probably be familiar with using the onload event in the body HTML tag to do that.

line 2

jQuery lets you easily grab bits of the web page, typically by referencing id attributes (which should be unique) and/or class attributes (which can be repeated). In the same way that CSS uses “#” and “.” to style ids and classes, jQuery uses them to select elements of the page.

If you hunt through the source of a Summon results page, you’ll find something like the following bit of HTML…

<h1 id="summary">
<span class="label">Search Results:</span>
Your search for
<span class='highlight'>germany</span>
returned
<span class='highlight'>3,892,793</span>
results
</h1>

…so, the number of results (3,892,793) appears in a span with a class value of highlight, which itself is inside an h1 with an id of summary. Unfortunately, there’s another span that also has the same class value before it, so we need to use :last in the jQuery to make sure we fetch the HTML contents of the second (i.e. last) span.

line 3

OK, at this point, we should have a JavaScript variable named count that contains the string 3,892,793, so this line strips out the commas (in fact, it strips out anything that isn’t a digit), which should leave count containing 3892793.

line 4

How many results is too many results? Let’s say we’ll display the message for anything more than 50,000 results…

line 5

Time for some more jQuery! :-)

jQuery lets you add new bits of HTML to a page, so let’s create a new div — that will appear underneath the results summary message — by appending it to that existing h1. Just to show off, we’re going to have the helpful hint gradually fade in, so we’ll pop the text within its own span that has an id value of refineSearchHelp and we’ll style it so it’s initially hidden (display:none).

In case you’re wondering, I added that space character &nbsp; just so that the div contains something to start off with, which should ensure the page doesn’t suddenly jump as the hint fades in.

line 6

So, now that we’ve got our helpful hint in a hidden span, let’s wait a second (delay(1000) …OK, we’ll actually wait 1,000 milliseconds!) before letting the message gradually fade in (fadeIn(1000)).

line 7

We’ve got to balance the books, so for every brace and bracket we’ve opened, we need to close them, otherwise the web browser might get upset.
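One thing the 7 lines don’t cope with is a page where the summary span is missing (in which case html() returns nothing and the replace would fail). A slightly more defensive sketch, which pulls the number crunching out into two small helper functions (the names parseResultCount and tooManyResults are my own invention), might look like this:

```javascript
// Turn the text of the results summary (e.g. "3,892,793") into a number.
function parseResultCount(text) {
  var digits = String(text || '').replace(/[^0-9]/g, '');
  return digits === '' ? 0 : parseInt(digits, 10);
}

// Decide whether the "too many results" hint should be shown.
function tooManyResults(text, threshold) {
  return parseResultCount(text) > threshold;
}

// Only touch the page when jQuery is present (as it is inside Summon).
if (typeof $ === 'function') {
  $(document).ready(function () {
    if (tooManyResults($('#summary .highlight:last').html(), 50000)) {
      $('#summary').append('<div style="margin-top:5px;"><span id="refineSearchHelp" style="display:none; font-style:italic;">Too many results? Use the options below to refine your search...</span>&nbsp;</div>');
      $('#refineSearchHelp').delay(1000).fadeIn(1000);
    }
  });
}
```

Keeping the parsing separate from the page manipulation also means the threshold logic can be tried out without needing a Summon results page to hand.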

Disclaimer!

Dropping jQuery into Summon isn’t officially supported by Serials Solutions, so be sure to take full responsibility for anything you do and thoroughly test it to make sure you’ve not broken Summon for your users, otherwise they’ll be grumpy.

The other thing to be aware of is that Summon is in a state of continual development, so you’ll need to test any tweaks you’ve made after each update to make sure that they still work and that they don’t conflict with any changes Serials Solutions have made to the Summon HTML.

Addendum

By subverting the “Custom Link” option to insert the JavaScript file, you lose the opportunity to add in a normal custom link (this appears to the left of the “Help | About | Feedback” options at the top right of the Summon interface)… or do you?

Well, there’s absolutely no reason why you can’t use jQuery to do that and, in fact, rather than just having one custom link, you could add 2 or 3…

$('#topbar .link').prepend('<a href="http://library.hud.ac.uk/wiki/">A to Z List of Electronic Resources</a>');

The default links appear in a div with a class of link, which has a parent div with an id of topbar. To add in our new extra link before those existing links, we have to prepend it.
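To add 2 or 3 links rather than one, something like the following sketch might do the job. Only the first link is real; the second URL and label are placeholders, the helper name buildCustomLinks is made up, and the “ | ” separator is a guess at matching the look of the existing “Help | About | Feedback” links:

```javascript
// Build the HTML for several custom links. Only the first link comes from
// the example above; the second is a made-up placeholder.
function buildCustomLinks(links) {
  return links.map(function (link) {
    return '<a href="' + link.url + '">' + link.label + '</a>';
  }).join(' | ');
}

var customLinks = [
  { url: 'http://library.hud.ac.uk/wiki/', label: 'A to Z List of Electronic Resources' },
  { url: 'http://example.org/opening-hours', label: 'Opening Hours' }
];

// Only touch the page when jQuery is present (as it is inside Summon).
if (typeof $ === 'function') {
  $(document).ready(function () {
    $('#topbar .link').prepend(buildCustomLinks(customLinks) + ' | ');
  });
}
```

The labels and URLs are dropped straight into the HTML without any escaping, so this assumes they’re values you’ve typed in yourself rather than anything supplied by a user.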

Summon has a really cool new custom search box building widget that includes the ability to pre-limit a search to a specific discipline (or disciplines). The widget also allows you to pre-select which facets to apply to the search.

A question came up on the SummonClients mailing list asking if it was possible to exclude facets from the search — “[is there] a way to exclude newspapers AND book reviews (AND possibly Dissertations) from the initial search”? There isn’t an obvious way at the moment to do that, but I’m a shambrarian and I like to tweak and tinker with things :-D

So, to exclude a content type facet…

1) Go into the Search Box Builder widget and expand the Content Type selection:

2) Select any Content Types you want to exclude (e.g. Book Review, Dissertation/Thesis and Newspaper Article):

3) Make any other changes you want (appearance, other facets, etc) and click on Get Code to get the widget’s HTML:

At this point, we’ve got a search widget that will only find results that are Book Reviews, Dissertation/Thesis (Thesii? Thesissesses?) or Newspaper Articles. So, the final change to make is to tweak the HTML so that those 3 types are excluded, which you can do by adding a ,t to each of them:

...["ContentType,Dissertation,t",
"ContentType,Book Review,t",
"ContentType,Newspaper Article,t"]...

The result should be a custom search box that excludes the chosen content types:
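If you have a lot of facet values to flip, the ,t tweak could also be scripted rather than edited by hand. The sketch below assumes you’ve copied the facet strings out of the widget HTML into an array (the function name excludeFacets is made up):

```javascript
// Append ",t" to each facet value so the widget excludes it,
// leaving alone any value that is already marked.
function excludeFacets(facets) {
  return facets.map(function (facet) {
    return /,t$/.test(facet) ? facet : facet + ',t';
  });
}
```

For example, `excludeFacets(['ContentType,Dissertation'])` would give back `['ContentType,Dissertation,t']`, ready to paste back into the widget code.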


I had planned to go along to SummonCamp at ALA Midwinter on Sunday and talk about using the Summon API but, perhaps all too predictably, I ended up staying up waaaaay too late on Saturday night sampling some yummy US beers, forgot to set my alarm and overslept :-(

Anyway, here’s what I would have talked about if I hadn’t been asleep at the time…

MyReading Project

For the last 12 months, I’ve been working on developing reading list software for the University of Huddersfield (home page and blog). By making use of both the Summon and 360 Link APIs, I’ve been able to cut down development time and also improve the functionality of the software for both staff and students.

360 Link API

E-journals and e-journal articles make up about 15% of all the reading list references in the software. One of the primary issues was how to provide accurate links to that material and how to ensure those links are updated whenever we change e-journal subscriptions or database platforms. On top of that, we also needed to ensure that authentication was as seamless as possible. Seeing as our link resolver (360 Link) already does all of the above, it made sense to use that.

So, for journal and article references, we’re storing the OpenURL so that we can query the 360 Link API on-the-fly to fetch back current access links. As 360 Link also handles the creation of EZProxy URLs for authentication, the API will return EZProxy prepended URLs when relevant.

If we take this reference to Iodine status of UK schoolgirls: a cross-sectional survey from The Lancet, we’ve stored the OpenURL as part of the reference:

ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&rfr_id=info:sid/summon.serialssolutions.com&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Iodine+status+of+UK+schoolgirls:+a+cross-sectional+survey&rft.jtitle=The+Lancet&rft.au=Vanderpump,+Mark+PJ&rft.au=Holder,+Roger+L&rft.au=Lazarus,+John+H&rft.au=Boelaert,+Kristien&rft.au=Laurberg,+Peter&rft.au=Smyth,+Peter+P&rft.au=Franklyn,+Jayne+A&rft.date=2011-06-17&rft.pub=Elsevier+B.V&rft.issn=0140-6736&rft.volume=377&rft.issue=9782&rft.spage=2007&rft.epage=2012&rft_id=info:doi/10.1016/S0140-6736(11)60693-4&rft.externalDBID=GLAN&rft.externalDocID=10_1016_S0140_6736_11_60693_4
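An OpenURL like that is really just a string of key and encoded value pairs, so it’s easy to pick apart if you need the individual citation fields. Here’s a minimal sketch using the standard URLSearchParams object found in modern browsers and Node (the function name parseOpenURL is made up, and the OpenURL has been abbreviated to keep the example short):

```javascript
// Abbreviated copy of the stored OpenURL (several fields left out for brevity).
var storedOpenURL = 'ctx_ver=Z39.88-2004' +
  '&rft.genre=article' +
  '&rft.atitle=Iodine+status+of+UK+schoolgirls:+a+cross-sectional+survey' +
  '&rft.jtitle=The+Lancet' +
  '&rft.au=Vanderpump,+Mark+PJ&rft.au=Holder,+Roger+L' +
  '&rft.issn=0140-6736&rft.volume=377&rft.issue=9782' +
  '&rft_id=info:doi/10.1016/S0140-6736(11)60693-4';

// Pull a few citation fields out of an OpenURL key/value string.
function parseOpenURL(openURL) {
  var params = new URLSearchParams(openURL);
  var id = params.get('rft_id') || '';
  return {
    title: params.get('rft.atitle'),
    journal: params.get('rft.jtitle'),
    authors: params.getAll('rft.au'), // rft.au can be repeated
    issn: params.get('rft.issn'),
    doi: id.indexOf('info:doi/') === 0 ? id.slice('info:doi/'.length) : null
  };
}
```

Note that this only reads the stored string; it doesn’t call the 360 Link API itself.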

By calling the 360 Link API with the above OpenURL, we can get back a page of XML.

At the time of writing, the ssopenurl:linkGroups element contains a couple of ssopenurl:linkGroup elements of type holding which, in turn, contain the current article access links for SwetsWise Online Content and ScienceDirect Journals.

So, as long as we’ve got an accurate OpenURL for a reference, we should be able to automatically insert the correct access links into the reading list. But, how do you get the OpenURL in the first place…?

Summon API

Once staff are logged into the reading list software, they’ll find an option to import any result from Summon as a reference into one of their reading lists…

Although Summon doesn’t officially support modifications like this, unofficially it’s possible to execute jQuery by hacking in a link to suitable JavaScript via the “Custom Link” option within the Summon Administration Console…

As doing this isn’t officially supported by Serials Solutions, it’s possible that it could stop working at any time. But, until that day comes, it’s a useful way of making minor tweaks to the Summon interface ;-)

I’m only a beginner with jQuery, so the following might not be the most efficient and/or elegant way of adding the custom links, but it does the job…

$(document).ready(function(){ doMyReading( ); });

function doMyReading( )
{
  $( '.metadata' ).each(function(intIndex)
  {
    var myReadingDocID = $( this ).parent().parent().parent().parent().parent().parent().parent().attr("id");

    if( myReadingDocID )
    {
      $( this ).append( '<div style="margin-top:3px;background:#004088;color:#ccf;padding:3px 8px;font-size:98%; white-space:nowrap;">item options: <a title="add this item to MyReading" style="color:#fff;" href="http://library.hud.ac.uk/myreading/perl/admin/import_summon.pl?id='+myReadingDocID+'">add to MyReading</a></div>' );
    }
  });
}

…the important bit is that we grab the document ID value for the result (myReadingDocID in the above), which we can then use to retrieve the exact same result via the Summon API.
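That long chain of parent() calls is a bit fragile, as it’ll break if Serials Solutions add or remove a wrapper element. One possible alternative is to walk up the page until an element with an id turns up, which is roughly what jQuery’s closest('[id]') would do. The sketch below shows the idea in plain JavaScript (nearestAncestorId is a made-up name), but it only gives the same answer if none of the wrappers in between has an id of its own, which would need checking against the real Summon HTML:

```javascript
// Walk up from a node until an ancestor with a non-empty id is found.
// Works on DOM elements, or on any objects with `id` and `parentNode`.
function nearestAncestorId(node) {
  var current = node ? node.parentNode : null;
  while (current) {
    if (current.id) { return current.id; }
    current = current.parentNode;
  }
  return null;
}
```

Inside the each() loop, the call would be `nearestAncestorId(this)` in place of the seven parent() calls.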

When the staff user clicks on the “add to MyReading” link, the reading list software uses the document ID to pull in the reference’s details from the Summon API and automatically populates the reference form…

…which includes the OpenURL and DOI, both of which can subsequently be used to query the 360 Link API to fetch access links :-)

We can also use the document ID to retrieve the article’s subject terms and abstract from Summon…

Summary

So, in summary, we’ve used the APIs to:

  1. avoid having to manually maintain links to e-journal content
  2. make it both quicker and easier for staff to add items from Summon (which currently encompasses over 600,000,000 items!) to reading lists
  3. enhance records by bringing in abstracts and subject terms from Summon

At the recent SummonCamp in New Orleans, there was a question about the local “Availability:” messages that appear in Summon for things like books, e.g.

Availability: available, Huddersfield (Loan Collection Floor 6 – 2 wk loan)

By default, Summon either scrapes your OPAC or makes use of an ILS/LMS API to get real time availability. If neither is available, or if the OPAC takes too long to respond, a “check availability” message appears instead (which typically links through to the item page on the OPAC).

Early on in our Summon implementation, we were concerned about the potential impact on our OPAC — SirsiDynix HIP — of screen scraping. In particular, HIP wasn’t designed to be scraped like this or to be indexed by search engines (many Horizon sites deliberately block Google et al from indexing their HIP) and it creates a new session ID for each request. As each new session takes up some of the OPAC server’s resources, there’s a theoretical limit to the number of concurrent sessions the OPAC can maintain before slowing down (or even crashing). Also, if you’ve done a search in Summon that delivers 25 book results, it takes time for the OPAC to respond to the 25 HTTP requests generated by Summon, and so you often end up getting the “check availability” message anyway.

So, working with Andrew Nagy at Serials Solutions, we implemented a very basic DLF XML web service (code and brief documentation available here) that bypasses our OPAC and pulls the live availability data straight from the Horizon database. Not only does it ensure the OPAC doesn’t take a performance hit, it’s also extremely fast (especially if you run it using mod_perl with a persistent database connection to Horizon) — you can see a typical response (for this book) here: library.hud.ac.uk/perl/summon/dlf.pl?497856

In his Code4Lib Journal article — “Hacking Summon” — Michael B. Klein talks about enhancing an availability API to include extra info and even embedded hyperlinks. This would also be a great way of including item level hold/request functionality into Summon.

At Huddersfield, we’ve done something similar to Michael for our e-resource/database level links, e.g.:

Availability: available, online resource (University network login required)

To help with known item searching, we’ve created some dummy MARC records on our library catalogue for most of the resources listed on our e-resources wiki and these get pushed out to Summon (in the same way that book MARC records do). If the user clicks on the result, they get passed through to the relevant wiki page. However, we also decided we wanted to try and save the user a mouse-click by embedding the actual URL to the resource into the availability message.

To do this, we extended the DLF script so that it detects when an incoming availability request from Summon is for one of the dummy MARC records (rather than a book). The script then does the following:

  1. as the link to the wiki page for that resource is part of the dummy MARC record (the 856 field), it extracts that URL from the record in Horizon
  2. it then web scrapes that wiki page to extract the actual link to the e-resource (in this particular case, it’s an EZproxy’d link)
  3. the DLF XML is then generated, including the link: library.hud.ac.uk/perl/summon/dlf.pl?646531

One thing that we’ve not done yet, but plan to do, is to include an extra step that queries our E-Resources Blog to check if there are any known problems for that e-resource. If there were, then a link through to the relevant blog post would also be included.

Yesterday, Tim Fletcher tweeted me a question about Summon:

How does Summon rank results? is there a logic?

…it’s not the kind of question that you can answer in 140 characters, but I quickly knocked off an email to Tim. This morning David F. Flanders suggested I should also blog the response.

So, first of all, a quick caveat: much of the following was gleaned from various presentations over the last couple of years or so and may not be 100% accurate (I’m particularly good at misunremembering stuff!)

The first time I saw Summon (back in early 2009), I believe Serials Solutions were still using the default relevancy ranking that comes with the Open Source Lucene software (which is documented here). In a nutshell, Lucene generates a score for each indexed item (that matches the search query) and then those items are sorted by score (in descending order) to produce the ranked results.

I’ve read quite a few times that the relevancy ranking engine in Lucene is regarded as one of the best, which might be one of the reasons why SirsiDynix recently moved Enterprise from using Brainware to Lucene.

When you mention Lucene, chances are Solr won’t be too far behind. Solr (which is also Open Source) extends Lucene to provide a host of extra features, including facets.

As Summon has developed, and in response to customer feedback, Serials Solutions have gradually tweaked the way their Lucene installation generates the scores by giving each result an additional boost (or reduction) depending on a variety of factors, including:

  • Currency – newer items are given a slight boost over older items
  • Content type – books, ebooks and journal articles get a boost to their scores, whilst newspaper articles and book reviews have their scores reduced
  • Local collections – things that come from the user’s library (e.g. books, repository items, local archives, etc) get a little boost
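To show roughly how boosts like these might combine, here is a minimal Python sketch. Every number in it is invented (Serials Solutions haven't published their actual weightings, as far as I know), and the `Item` class, the field names and the multiplicative approach are all assumptions made purely for illustration.

```python
from dataclasses import dataclass

# All values below are hypothetical, chosen only to illustrate the idea.
CONTENT_TYPE_BOOST = {
    "book": 1.2,
    "ebook": 1.2,
    "journal article": 1.2,
    "newspaper article": 0.8,
    "book review": 0.8,
}
RECENT_YEARS = 5      # items this new (or newer) count as "current"
RECENT_BOOST = 1.05   # slight boost for newer items
LOCAL_BOOST = 1.1     # little boost for the library's own collections


@dataclass
class Item:
    title: str
    base_score: float   # the score the search engine produced before boosting
    content_type: str
    year: int
    is_local: bool = False


def boosted_score(item, this_year):
    """Multiply the base score by each factor that applies to the item."""
    score = item.base_score
    score *= CONTENT_TYPE_BOOST.get(item.content_type, 1.0)
    if this_year - item.year <= RECENT_YEARS:
        score *= RECENT_BOOST
    if item.is_local:
        score *= LOCAL_BOOST
    return score


items = [
    Item("Old newspaper piece", 1.0, "newspaper article", 1995),
    Item("Recent local book", 0.9, "book", 2010, is_local=True),
]

ranked = sorted(items, key=lambda i: boosted_score(i, this_year=2011), reverse=True)
print([i.title for i in ranked])
```

In this made-up example the newspaper article starts with the higher base score, but the book overtakes it once the content type, currency and local collection factors have been applied.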

Additionally, the Summon search engine handles certain words and phrases differently. For example, Lucene normally treats the singular and plural versions of words as the same, so searches for “africa hospital” and “africas hospitals” both bring back roughly the same number of results. However, Summon understands that “africa aid” isn’t the same thing as “africa aids”.
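One simple way to get that kind of behaviour is to keep a list of "protected" words that are never reduced to their singular form. The Python sketch below is only a guess at the general idea, not how Summon actually does it: `naive_stem` is a crude stand-in for a real stemmer, and the `PROTECTED` list is hypothetical.

```python
# Hypothetical list of words that must be left exactly as typed.
PROTECTED = {"aids"}


def naive_stem(word):
    """Strip a trailing 's' (a crude stand-in for a real stemmer)."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def normalise(query, protected=PROTECTED):
    """Lower-case the query and stem every term that isn't protected."""
    terms = query.lower().split()
    return [t if t in protected else naive_stem(t) for t in terms]


print(normalise("africas hospitals"))   # same terms as "africa hospital"
print(normalise("africa aids"))         # "aids" is left alone
print(normalise("africa aids", protected=set()))  # without protection
```

With the protected list in place, “africas hospitals” and “africa hospital” end up as the same search terms, whilst “africa aids” and “africa aid” stay distinct. Without it, the two would be treated as the same search.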

Given that few users go beyond the first page of results (I was told the exact figure last week, but it’s slipped from my memory — I think it was less than 5%?), Serials Solutions put a lot of effort into trying to ensure that the most relevant results appear on that first page. Given that the Summon master index is fast approaching 1,000,000,000 items, that’s no trivial task!

As they say, the proof of the pudding is in the eating, so feel free to run some searches on our Summon instance to see how well you think it ranks the results.

I’m jetlagged (this is the first time I’ve had jetlag that feels like being drunk) and still coming down from an ALA-induced high, but here goes a blog post!

I’m currently fortunate enough to be a member of the Serials Solutions Summon Advisory Board, and last week saw the fourth pre-ALA meeting, this time in the one and only New Orleans, the home of hurricane cocktails, shrimp po’boys, high heat & humidity and more seafood than you can shake a stick at…

(seafood platter at the Grand Isle Restaurant)

Summon Advisory Board notes

  • there are now more than 250 Summon customers around the world
  • the company is currently concentrating on comprehensiveness (in terms of coverage and seamless access to articles)
  • gone are the days when Serials Solutions had to approach publishers and argue the case for them to make their content available in Summon; most publishers now realise the value and are approaching the company directly to have their content added
  • John Law’s mantra is currently “relevancy, relevancy, relevancy!” With 800,000,000 items in Summon, relevancy is key to ensuring the user gets the right articles on the first page of results
  • it wasn’t until I saw some demo searches that the awesomeness of the deal with HathiTrust Collection integration began to sink in. Librarians of the world, this truly is a game changer! (On a practical note, it’s going to take Serials Solutions a little while to complete the indexing of the entire HathiTrust Collection.)
  • a pilot with JSTOR means that a Summon search box is integrated into the JSTOR web site interface — it appears when a JSTOR search produces only a small number (or zero) results, so that the user’s search can be expanded to other journal platforms
  • due to being en route from the UK to New Orleans, I’d missed this announcement, but the long-awaited deal with Elsevier has been signed
  • for journal articles, Serials Solutions create “super records” that combine the best metadata from multiple sources — this is de-duping on steroids!
  • coming soon — discipline searching (currently 63 subject disciplines have been defined, which work at the journal title and journal article level)
  • coming soon — new article linking improvements (when relevant, Summon results will link directly to the article abstract page on the supplier’s platform, instead of using OpenURLs)
  • Daniel Forsman (Chalmers University of Technology, Gothenburg, Sweden) suggested that we should promote Summon to our users as being more comprehensive than Google Scholar
  • although librarians often get hung up on what’s not in Summon, some analysis by a Summon customer indicated that the non-indexed content is often low quality “filler material” added by aggregator platforms to bump up journal totals

(a bourbon nightcap after the Advisory Board Meeting)
