# Miskatonic University Press

## Better ways of using R on LibStats (3): Fifteen minutes of research support

Another update about using R to analyze the reference desk statistics that we keep at York University Libraries with LibStats. (See also Better ways of using R on LibStats (1) and Better ways of using R on LibStats (2): durations, which is about how long we spend helping people at the desk.) This is more about time spent, and it overhauls what I wrote up in May 2011 in Ref desk 5: Fifteen minutes for under one per cent. There I said:

Put those two charts together and it shows that during term time we spend on average about fifteen minutes a week giving research help to each of under one per cent of our students.

That’s still true, but now I can calculate it faster and make better charts to show it. And I figure it monthly: weekly showed some nice variations, but monthly is very easy to deal with, and nicely handles the three busy months per term.

As before, I get the data ready.

Thanks to a suggestion from Hadley Wickham (whose `dplyr` and `ggplot2` packages I use all the time), I calculate the estimated durations of each encounter by using `inner_join` from `dplyr`, which does the same thing as `merge` but faster and more tidily.

The data frame is too big to look at nicely, so I’ll pick out three columns.

I want to add a new column, `est.duration`, that turns the `time.spent` column into an estimate of how many minutes each interaction took. 5-10 minutes becomes 8, 10-20 minutes becomes 15, etc. So I make a `durations` data frame and do an `inner_join` with `l` that adds a new column and puts the right value in every row.
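A minimal sketch of that join, on a toy stand-in for `l`. The rows, the `question.type` values and the "0-1 minute" label are invented for illustration; only the 8 and 15 minute estimates come from the text above.

```r
library(dplyr)

# Toy stand-in for the LibStats data frame `l`; every row here is invented.
l <- data.frame(
  question.type = c("research", "directional", "research"),
  time.spent    = c("5-10 minutes", "0-1 minute", "10-20 minutes"),
  stringsAsFactors = FALSE
)

# Lookup table: one row per time.spent label, with its estimate in minutes.
# "0-1 minute" and its value of 1 are assumptions, not taken from the post.
durations <- data.frame(
  time.spent   = c("0-1 minute", "5-10 minutes", "10-20 minutes"),
  est.duration = c(1, 8, 15),
  stringsAsFactors = FALSE
)

# Match each row of l to its estimate by the shared time.spent column.
l <- inner_join(l, durations, by = "time.spent")
```

One thing to watch with an inner join: any row in `l` whose `time.spent` label is missing from `durations` is dropped, so it is worth comparing `nrow(l)` before and after.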

The rows got reordered, but that doesn’t matter. The `est.duration` column does actually have the right numbers in it; it’s not just all 1s.

I’ll skip over some of the rest of the preparation, which I explained last time, and get on to figuring out about research questions. First, I use `dplyr` to make a new data frame that collapses just the research questions into monthly summaries of how many were asked and how long they took.
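A monthly summary along those lines might look like the following sketch. The column names `library.name`, `question.type` and `timestamp`, the "research" label, "Branch B" and all the rows are assumptions made up for the example, not the real LibStats export.

```r
library(dplyr)

# Invented sample of encounters, already carrying est.duration.
l <- data.frame(
  library.name  = c("Scott", "Scott", "Scott", "Branch B"),
  question.type = c("research", "research", "directional", "research"),
  timestamp     = as.Date(c("2014-01-14", "2014-01-28", "2014-01-30", "2014-02-03")),
  est.duration  = c(15, 8, 1, 25),
  stringsAsFactors = FALSE
)

research <- l %>%
  # Keep only the research questions.
  filter(question.type == "research") %>%
  # Collapse each date to the first day of its month.
  mutate(month = as.Date(format(timestamp, "%Y-%m-01"))) %>%
  # One row per branch per month: how many questions, how many minutes.
  group_by(library.name, month) %>%
  summarise(questions = n(), minutes = sum(est.duration), .groups = "drop")
```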

Slight decline in the numbers of research questions asked each month, year to year. Why? I don’t know. We need to investigate.

(The images look a bit grotty because I shrank them down, but each is a link to a full-size version.)

This shows that sometimes having fewer questions doesn’t mean less time spent helping people with research. But knowing how many questions were asked and how long they all took, it’s trivial to divide to find the average.
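As a sketch of that division, here are two invented months for one branch in which the question count falls while the total time rises. The numbers are made up to show the effect and are not from the real data.

```r
# Invented monthly totals: fewer questions in the second month, more minutes.
research <- data.frame(
  month     = as.Date(c("2013-10-01", "2013-11-01")),
  questions = c(120, 100),
  minutes   = c(1800, 1900)
)

# Average minutes per research question in each month.
research$minutes.per.question <- research$minutes / research$questions
```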

Some variation there among the smaller branches, but Scott, which has the most students and is by far the busiest, stays very consistent at 15–20 minutes per research help session.

In another post I explained “home users”: the number of students who (in theory) use a given library, which should be especially true for research questions. Here I set up a data frame with the branch names and home user numbers for each year.

Before we match things up we need to align things by academic year. Academic 2013–2014, which I’d label 2014, started last May (2013-05-01) and ends today (2014-04-30). An easy way to calculate the academic year of a given date is to push it ahead by as many days as separate 1 May from the following 1 January (245) and then use the year of the resulting date. Anything from January to April stays in the same year, but everything later gets knocked ahead a year.
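The date trick can be sketched in base R. The function name `academic_year` is made up for this example.

```r
# Academic year of a date, where the year runs 1 May to 30 April and is
# labelled by the calendar year it ends in.
academic_year <- function(d) {
  # Days from 1 May to the following 1 January (always 245, since
  # February falls outside that span).
  offset <- as.integer(as.Date("2014-01-01") - as.Date("2013-05-01"))
  # Shift the date forward and take the calendar year of the result.
  as.integer(format(as.Date(d) + offset, "%Y"))
}

academic_year(c("2013-04-30", "2013-05-01", "2014-04-30"))
```

The first date is the last day of academic 2013; the other two are the first and last days of academic 2014.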

Assuming each research question is asked by a different person, this shows that each month we see 2-3% of home users about research. Of course, if some people ask more than one question, that ratio is lower. (There is research underway to find out who these users are—the hypothesis is that it isn’t 3% of all students that get research help, but a much higher proportion of a much smaller set of students: who are they, what kind of help do they need, and how can we change what we offer to be of more use?) I’m curious about the situation at other libraries and what their numbers show.
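A sketch of how that percentage could be worked out: join the monthly research counts to the home user counts by branch and academic year, then divide. All the numbers, and the column names `academic.year`, `users` and `pct.of.users`, are invented for illustration.

```r
library(dplyr)

# Invented research question counts for one month in each of two academic years.
research <- data.frame(
  library.name  = c("Scott", "Scott"),
  academic.year = c(2013, 2014),
  questions     = c(750, 600),
  stringsAsFactors = FALSE
)

# Invented home user counts; the real figures are not reproduced here.
home.users <- data.frame(
  library.name  = c("Scott", "Scott"),
  academic.year = c(2013, 2014),
  users         = c(30000, 30000),
  stringsAsFactors = FALSE
)

research <- research %>%
  # Attach the matching home user count to each monthly row.
  inner_join(home.users, by = c("library.name", "academic.year")) %>%
  # Upper bound on the share of home users seen: one person per question.
  mutate(pct.of.users = 100 * questions / users)
```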

Back in May 2011 I said:

Put those two charts together and it shows that during term time we spend on average about fifteen minutes a week giving research help to each of under one per cent of our students.

Now I say: “During term time each month we give about fifteen minutes of research support to two or three per cent of our students.”

On average, each home user gets under one minute of research support each month. Again, this isn’t how it actually works—a smaller percentage of users get more help, though we don’t know the details yet—but again I’m curious to know how this compares to other libraries.
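The "under one minute" figure follows from the rounded numbers already quoted: multiply the share of home users seen each month by the length of a typical session. This is a back-of-the-envelope check, not a calculation on the real data.

```r
# Share of home users asking a research question each month (2-3%).
share <- c(low = 0.02, high = 0.03)

# Typical length of a research help session in minutes (15-20).
session.minutes <- c(low = 15, high = 20)

# Minutes of research support per home user per month, averaged over everyone.
per.user <- share * session.minutes
per.user
```

Both ends of the range come out well under one minute.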

That’s it for the R and library stats posts for now, I think. Try `dplyr`, it’s great.