Modeling library use

The Macalester College library inventory, as reported on July 30, 2015, consists of 430,002 items. The large majority of these are books, but there are many other formats as well. The following formats make up 99.9% of the library holdings.

Inventory %>%
  group_by(Material.Format) %>%
  tally() %>% 
  arrange(n) %>%
  filter(cumsum(n) > nrow(Inventory)/1000) %>%
  arrange(desc(n)) %>%
  as.data.frame(.)
Material.Format        n
Book              362871
Jrnl               30240
Video_DVD          13812
Music_CD            5399
Video_VHS           5176
MsScr               4180
Object              2458
Map                 1699
Book_Mic            1505
Book_thsis          1052
Book_Digital         720
Video_Bluray         157
News                 150
Video                137
Book_LargePrint      104
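
The filter on the cumulative sum in the command above can be hard to read at first. The sketch below runs the same pipeline on a small made-up inventory (the counts and the "Rare" format are invented for illustration, not taken from the library data). Sorting from rarest to most common and keeping only rows whose running total exceeds one thousandth of all items drops the formats that together account for the rarest 0.1% of holdings.

```r
library(dplyr)

# Toy inventory with made-up counts: 10,000 items in 4 formats.
toy <- data.frame(
  Material.Format = rep(c("Book", "Jrnl", "Map", "Rare"),
                        times = c(9000, 900, 95, 5)),
  stringsAsFactors = FALSE
)

common_formats <- toy %>%
  group_by(Material.Format) %>%
  tally() %>%
  arrange(n) %>%                             # rarest formats first
  filter(cumsum(n) > nrow(toy) / 1000) %>%   # drop the rarest 0.1% of items
  arrange(desc(n)) %>%                       # most common formats first
  as.data.frame()

print(common_formats)
```

Here the threshold is 10 items, so only the 5-item "Rare" format is dropped; the remaining formats cover at least 99.9% of the toy inventory.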

A record is kept of how many times each item has been checked out since the start of record-keeping, and how many times it has been checked out in the last year.[1] In addition, the date of the most recent checkout is recorded.

Inventory %>%
  group_by(Material.Format) %>%
  summarise(events_per_item_total = sum(Issued.Count, na.rm=TRUE)/n(),
            events_per_item_YTD = sum(Issued.Count.YTD, na.rm=TRUE)/n(), count=n()) %>%
  arrange(desc(events_per_item_total)) %>% 
  filter(count > 500) %>%
  as.data.frame()
Material.Format  events_per_item_total  events_per_item_YTD   count
Object                          116.45                36.20    2458
Video_DVD                        10.14                 0.77   13812
Music_CD                          9.62                 0.22    5399
Video_VHS                         3.31                 0.18    5176
Book                              1.98                 0.22  362871
MsScr                             0.86                 0.02    4180
Book_thsis                        0.64                 0.06    1052
Map                               0.62                 0.03    1699
Book_Digital                      0.11                 0.04     720
Book_Mic                          0.06                 0.00    1505
Jrnl                              0.04                 0.00   30240

Presumably, some kinds of items (e.g. newspapers) are not checked out at all. Others are in obsolete formats (e.g. audiobook cassettes). By far the most heavily used items in the inventory are “Objects” and “otr.” Books and journals, by far the largest components of the library’s collection, are used much less. The average book has been checked out about 2 times since January 2001. Journals are rarely checked out, either because they are seldom used or because they are used in the library without being checked out.

Books

The analysis presented above indicates that the “average book” has been used about 2 times since 2001, and about 0.2 times in the last year. It’s important to note, however, that book use is highly skewed. Some books are used many times; many books are not used at all. Among other things, this may depend on the subject of the book.

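CircBooks <-
  Inventory %>%
  # Keep circulating books: stacks on levels 2, 3, or 4, plus service-desk reserves.
  # [234] matches a single digit; the original [2|3|4] would also match a literal "|".
  filter(Material.Format == "Book", 
         grepl("Stacks - Level [234]$", Shelving.Location) |
         grepl("Service Desk - Res", Shelving.Location))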
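
One way to make the skew in book use concrete is to compare the mean with the median and to compute the share of books that have never been checked out. The following is a minimal sketch on made-up checkout counts standing in for the Issued.Count column; the data frame toy_books and the summary column names are hypothetical. The same summarise() call could be applied to CircBooks.

```r
library(dplyr)

# Made-up checkout counts standing in for CircBooks$Issued.Count.
toy_books <- data.frame(Issued.Count = c(0, 0, 0, 0, 0, 0, 1, 1, 2, 16))

skew_summary <- toy_books %>%
  summarise(
    mean_checkouts   = mean(Issued.Count, na.rm = TRUE),       # pulled up by a few heavily used books
    median_checkouts = median(Issued.Count, na.rm = TRUE),     # the "typical" book
    share_never_used = mean(Issued.Count == 0, na.rm = TRUE)   # fraction with zero checkouts
  )

print(skew_summary)
```

When the mean sits well above the median, as in this toy example, a small number of heavily used books is driving the average, and the average says little about the typical book.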

Analysis

Write a summary of library book usage with an eye toward advising the library administration on how many books can be archived or disposed of, along with an estimate of the yearly cost of providing alternative access through inter-library loan.

To make such an estimate, you’ll need a figure for the cost of an inter-library loan. That figure does not need to be exact: treat it as an input so that you can easily repeat the analysis with any new value.
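
A minimal sketch of how such an estimate could be parameterized follows. Everything here other than the Issued.Count column is an assumption made for illustration: the function name estimate_ill_cost, the archiving rule (archive books with at most max_checkouts checkouts on record), the default cost of 20 per loan, the 14.5 years of records (January 2001 to July 2015), and the idea that archived books would continue to be requested at their historical rate. Missing counts are treated as zero, which is also an assumption.

```r
library(dplyr)

# Hypothetical helper: estimate the yearly inter-library loan cost of archiving
# rarely used books. All defaults are placeholders, not real figures.
estimate_ill_cost <- function(books,
                              max_checkouts = 0,        # archive books with this many checkouts or fewer
                              cost_per_loan = 20,       # assumed cost of one inter-library loan
                              years_of_records = 14.5) {
  books %>%
    mutate(checkouts = ifelse(is.na(Issued.Count), 0, Issued.Count),  # assumption: NA means no checkouts
           archive   = checkouts <= max_checkouts) %>%
    summarise(
      n_archived              = sum(archive),
      expected_loans_per_year = sum(checkouts[archive]) / years_of_records,
      yearly_cost             = expected_loans_per_year * cost_per_loan
    )
}

# Made-up data standing in for CircBooks.
toy_books <- data.frame(Issued.Count = c(0, 0, 1, 2, NA, 40))
estimate <- estimate_ill_cost(toy_books, max_checkouts = 2)
print(estimate)
```

Because the cost per loan and the archiving threshold are arguments, the analysis can be repeated with different values by changing a single call, for example estimate_ill_cost(toy_books, max_checkouts = 5, cost_per_loan = 35).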

As part of your presentation, tabulate results separately for each of the Library of Congress subject codes. You can access the inventory data at http://tiny.cc/dcf/Inventory.rda. A very abbreviated list of book subjects, based on the Library of Congress number, is available at http://tiny.cc/dcf/LOC-Codes.csv.
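
The per-subject tabulation might look something like the sketch below. The column names used here are hypothetical, since the layout of the inventory's call-number field and of the subject-code file is not described in this section: call_number, code, and subject are invented for illustration, as are the toy rows. The sketch assumes that the subject is identified by the first letter of the Library of Congress call number.

```r
library(dplyr)

# Made-up books; call_number is a hypothetical column name.
toy_books <- data.frame(
  call_number  = c("QA276 .K35", "QA76 .B3", "PS3545 .I5", "PR6019 .O9", "HB171 .S6"),
  Issued.Count = c(0, 5, 0, 0, 2),
  stringsAsFactors = FALSE
)

# Made-up stand-in for the subject-code file; code and subject are hypothetical names.
toy_codes <- data.frame(
  code    = c("Q", "P", "H"),
  subject = c("Science", "Language and literature", "Social sciences"),
  stringsAsFactors = FALSE
)

by_subject <- toy_books %>%
  mutate(code = substr(call_number, 1, 1)) %>%   # first letter of the call number
  left_join(toy_codes, by = "code") %>%          # attach the subject name
  group_by(code, subject) %>%
  summarise(n_books      = n(),
            n_never_used = sum(Issued.Count == 0),
            .groups = "drop") %>%
  arrange(desc(n_books)) %>%
  as.data.frame()

print(by_subject)
```

Using left_join() keeps books whose code is missing from the abbreviated subject list; they would appear with an NA subject rather than silently dropping out of the totals.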


  1. Judging from the dates in the records, record keeping for the data studied here started in January 2001.