access to subjects

[Note: I started writing this as a comment on this post on “Bookworm and Library Search” over at Benjamin Schmidt’s Sapping Attention, but Blogger wouldn’t let me post it, probably for length reasons. Since my comment turned out to be mostly a standalone discussion and defense of subject headings, I figure I can post it here without causing too much confusion. I’ve added some context to make it clearer what I’m responding to, but it will probably make more sense if you read Schmidt’s post first.]

Summary: As I understand it, Bookworm is a new tool, still in an alpha/beta stage, for accessing and interacting with public domain books using both catalog data and full-text word data. As Schmidt describes it, Bookworm “straddles the space between something like Ngrams and the more traditional library catalog.” I’ve only played around with it briefly, and it looks pretty neat. My post below really has little to do with Bookworm itself.

Instead, I’ve used part of Schmidt’s post as a jumping off place to talk about subject headings and why I think they’re still useful in a full-text world. As an introduction to what Bookworm can do, Schmidt describes four ways of accessing books. The first two have their origins in the pre-digital era. First, subject headings:

1) Use the subject headings in a library card catalog.

Subject headings are the best resource for a particular topic, but a lot of the time they won’t work; your subject may not exist in the catalog, you may not know what it’s called, and librarians may not have assigned some relevant books to the subject heading you’re using.

As will become clear below, I think this is too narrow a way of characterizing subject searching/browsing, and it is essentially all Schmidt writes about the method. I actually do quite a bit of subject browsing – when the catalog is set up in a way that makes it possible.

Next up:

2) Find one book in the stacks that you find interesting, and see what’s next to it on the shelves.

Since books in libraries are arranged on shelves according to a classification system (usually the Library of Congress or Dewey) that was designed to group similar books near each other, this is more or less another kind of browsing by subject. Schmidt covers this one pretty well; when I write about subjects below, I mean specifically subject headings, not classifications or call numbers, and even more specifically the Library of Congress Subject Headings (LCSH), which are the main ones in use in North American libraries, especially academic libraries.

Schmidt doesn’t think the development of electronic catalogs made much of a difference for these two search strategies. Summing up, he writes, “so, this is the pre-digitized library; even electronic card catalogs don’t change this balance in any important way.” You can probably guess by now that I’m going to argue that it did make a difference for how people can use subject headings.

Instead, Schmidt writes, the big change came with full-text access:

3) Search the full text of thousands of books for a word or phrase of interest.

For scholarly journals and newspapers, two fields where full-text search is older than for books, this is probably the most important way of finding texts. For most purposes, full-text search obliterates method (1) above; where before you had to find a subject vaguely connected to your interests, now you can identify your topic as precisely as you can describe it in language.

I’m not going to disagree with the view that full-text searching represents a huge change in how people can access works – how could I? – but I don’t think it “obliterates” many of the uses of subject searching. More on that below. My take is that at least for now and for books, subject searching and full-text searching work alongside each other. Journal articles and newspaper articles may be a different story; historically, these have been dealt with according to different indexing and cataloging techniques, and in some cases have not been given subject headings in any form. Length of a work can make a difference too, although the best full-text search designs try to account for this.

For completeness, I’m including Schmidt’s fourth method in this summary:

4) Organize the library according to your personal principles, and browse it from arbitrary points.

It’s worth your time to go over to his post and read about this; I don’t have much to say on this point, except to say that I’m all for it. To the extent that it ties in with what I say about subjects, I’m really just advocating for the continued use of subjects as an additional “arbitrary point” from which you can browse. As long as they exist and continue to be assigned to books, there’s value in making them available for people to use (or not) as they so choose when accessing library and book data.

And with that long, and I hope fair, introduction out of the way, here’s what I tried to post as a comment.

___

I’m looking forward to digging around in Bookworm, but at the risk of missing the larger point, I want to speak up for subject searching/browsing. Subject headings are by no means perfect, but I’ve found them quite useful. They don’t seem to get much support when I hear people talk about them (not too often, but it happens), so I feel obligated to defend them, or at least to point out additional ways to use them.

In the card era, which I remember but which was long enough ago that I never did much searching that way, you did indeed have to pick subjects ahead of time, hope they would work out, and go searching. But ideally a search would not be just a series of sequential [subject –> titles] lookups. What I remember being taught was that once you find a relevant book – maybe even a marginally relevant book – you should check its subjects and write down the ones you don’t already have. Indeed, if you already have a known book to start with, look that up first and begin building your subjects from there. Then you repeat this process until you think you’re done.

This is essentially the subject heading equivalent of your method 2) about classification. It’s more laborious, in that (I assume) this involved opening and re-opening a bunch of drawers in the catalog instead of being able to browse in a relatively confined area in the stacks — and then, of course, you would still have to take your list of books and go to the stacks. But it’s still a way of browsing relationally.
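The lookup loop described above (find a book, copy down its subjects, look up each new subject, repeat) can be sketched in a few lines of Python. This is only an illustration under invented assumptions: the `CATALOG` dictionary, its titles, and the `snowball` function are all hypothetical stand-ins for a real catalog, not anything a library system actually exposes.

```python
from collections import deque

# Hypothetical toy catalog: title -> subject headings assigned to it.
# Titles and assignments are invented for illustration.
CATALOG = {
    "Book A": {"United States -- Congress -- History"},
    "Book B": {"United States -- Congress -- History",
               "Legislators -- United States"},
    "Book C": {"Legislators -- United States",
               "Political culture -- United States"},
    "Book D": {"Cooking"},
}


def snowball(catalog, start_title):
    """Alternate between books and subjects until nothing new turns up."""
    seen_subjects = set()
    seen_titles = {start_title}
    queue = deque([start_title])
    while queue:
        title = queue.popleft()
        # Only follow subjects we have not already written down.
        for subject in catalog[title] - seen_subjects:
            seen_subjects.add(subject)
            # "Open the drawer" for this subject: every title filed under it.
            for other, subjects in catalog.items():
                if subject in subjects and other not in seen_titles:
                    seen_titles.add(other)
                    queue.append(other)
    return seen_titles, seen_subjects


titles, subjects = snowball(CATALOG, "Book A")
print(sorted(titles))
print(sorted(subjects))
```

Starting from one known book, the loop reaches every book that shares a chain of subjects with it and never touches the unrelated one. In practice you would stop much earlier, judging relevance at each step rather than following every heading.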

I must admit that I’m glad I never really did this with cards, but I’ve done and still do the electronic equivalent quite a lot. It’s not unusual for me to end up, starting from one “base” search and continuing through subjects, pulling books from two or three or more classification letters in the stacks (where I also browse the shelves). I haven’t found full-text searching to obliterate this but instead to augment it.* I could see developing a full-text search that attempts actual subject analysis and then tries to identify related works for a subject created on the fly, but raw relevance-ranked searches don’t really do that.

I also think the transition from cards to electronic catalog records did represent a fairly significant shift in how people could search library collections. In terms of subjects, it obviously made the strategies I described above a lot easier. Now you could put in a search – and this meant any (enabled) search: title, author, and so on, not just subject – and then pull out your next set(s) of subjects from the relevant results without having to physically traverse a bunch of card drawers. That’s not exactly revolutionary, but it was still an advance.

The big differences, to my mind, were the ability to browse subject hierarchies and the implementation of keyword searching. In the abstract, you could always browse the entirety of the Library of Congress Subject Headings by using the big printed (red) books of headings put out by the LOC – if your library had them in the reference collection. But there was no guarantee that your catalog would have the book(s) that correspond to a particular heading. The nice thing about subject browsing in the electronic catalog was that 1) you were looking at what your library held, 2) you could look at the headings themselves without having to look at each particular (card) catalog record for each heading, and 3) you could more easily identify the hierarchies within the headings, which is the key to browsing. So now you could start with, say, “United States — Congress — History” and quickly see that you might also want “United States — Congress — History — Anecdotes” and “United States — Congress — House — History” and so on. (There are a lot of Congress and history headings.) And then you could call up the records for each heading right there in the catalog. This eventually got a bit easier on the web when you could simply click through to the results.
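As a rough sketch of what browsing a heading hierarchy amounts to, the snippet below treats each heading as a sequence of subdivisions and lists everything filed beneath a chosen starting point. The `HEADINGS` list, the ASCII `--` separator, and the `parts` and `narrower` helpers are assumptions made for illustration; a real catalog stores headings in structured records rather than flat strings.

```python
# Hypothetical sample of headings held by one library (illustration only).
HEADINGS = [
    "United States -- Congress",
    "United States -- Congress -- History",
    "United States -- Congress -- History -- Anecdotes",
    "United States -- Congress -- House -- History",
    "United States -- History",
]


def parts(heading):
    """Split a heading into its main heading and subdivisions."""
    return tuple(p.strip() for p in heading.split("--"))


def narrower(headings, base):
    """Return headings that extend `base` with further subdivisions."""
    base_parts = parts(base)
    n = len(base_parts)
    return sorted(
        h for h in headings
        if len(parts(h)) > n and parts(h)[:n] == base_parts
    )


for heading in narrower(HEADINGS, "United States -- Congress"):
    print(heading)
```

Because the list is built from headings the library actually uses, every line it prints corresponds to at least one record you could call up.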

Furthermore, keyword searching meant that you no longer had to know the order of headings – there are rules to this, but most users aren’t going to know them – just the terms you’re looking for in the headings. So [United States Congress history] would lead you to the same subjects as [history United States Congress], and the results would include all subjects using those words in any order, not just the subjects that use only those words.
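A minimal sketch of that order-independent matching, assuming headings are plain strings and that a match means every query word appears somewhere in the heading. The `HEADINGS` sample and the `tokens` and `keyword_search` helpers are hypothetical; real catalogs add stemming, stop words, and field-specific indexes.

```python
import re

# Hypothetical sample of headings (illustration only).
HEADINGS = [
    "United States -- Congress -- History",
    "United States -- Congress -- History -- Anecdotes",
    "United States -- Congress -- House -- History",
    "United States -- History",
]


def tokens(text):
    """Lowercase a string and reduce it to a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_search(headings, query):
    """Return headings containing every query word, in any order."""
    wanted = tokens(query)
    return [h for h in headings if wanted <= tokens(h)]


a = keyword_search(HEADINGS, "United States Congress history")
b = keyword_search(HEADINGS, "history United States Congress")
print(a == b)
for heading in a:
    print(heading)
```

Both queries return the same three headings, including the longer ones that contain extra words, while “United States -- History” drops out because it lacks “Congress.”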

I suppose even that’s still not so revolutionary, but eventually keyword searching began to run across all catalog fields (as it does now): instead of having to run both a title keyword and a subject keyword search for the same terms, something I remember doing quite often, you could run one search and turn up all records that contained your terms anywhere in the record. A far cry from full-text search, sure, but still a major improvement in searching power. In a card catalog, you might have to deal with subject cards and author cards and title cards. (Have I mentioned I’m glad I did little with cards?)

Keyword searching did one more thing for subjects that I don’t think has gotten much attention and, to some people, might seem like just a curiosity: it made visible and searchable the subheadings that exist in LCSH that pretty much never show up as top-level headings.

The subheading “Homes and Haunts” is a fun example**: as of today, a subject keyword search in the Library of Congress catalog for “Homes and Haunts” (with the quotes) turns up 9853 headings, starting from A to beyond Z (i.e., some use non-Roman scripts). But not a single one of these uses a top-level subject of “Homes and Haunts.”*** This is because “Homes and Haunts” is one of a set of established headings that are used only to subdivide broader headings. Without the electronic catalog, it would be essentially impossible to track how it was being assigned across all subjects. Granted, I’ve never actually made this kind of heading a subject of research, but I find them sort of fascinating nonetheless. I’d love to see an analysis of which people’s homes and haunts have been considered subject-worthy and, by implication, book-worthy.
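To make the top-level versus subdivision distinction concrete, here is a small sketch that checks where a term appears within each heading. It uses the three headings quoted in the footnote, written with an ASCII `--` separator; the `split_heading` and `usage` helpers are hypothetical, and the kind of analysis I have in mind would need the full set of headings, which this sample does not pretend to be.

```python
# The three headings from the footnote, with "--" as the separator.
HEADINGS = [
    "Homer, Art--Homes and haunts--Ozark Mountains Region",
    "Homer--Homes and haunts--Greece",
    "Homer, Winslow, 1836-1910--Homes and haunts--Maine--Prouts Neck",
]


def split_heading(heading):
    """Split a heading into its main heading and subdivisions."""
    return [p.strip() for p in heading.split("--")]


def usage(headings, term):
    """Separate headings using `term` as main heading vs. as subdivision."""
    term = term.lower()
    top = [h for h in headings if split_heading(h)[0].lower() == term]
    sub = [h for h in headings
           if term in (p.lower() for p in split_heading(h)[1:])]
    return top, sub


top, sub = usage(HEADINGS, "Homes and haunts")
print(len(top), len(sub))
# Whose homes and haunts? The main heading of each match.
print([split_heading(h)[0] for h in sub])
```

In this sample the term never appears as a main heading and appears three times as a subdivision, which is the same pattern the catalog search shows at a much larger scale.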

Unfortunately, my anecdotal impression is that subject headings are not widely valued and there seems to be a trend towards library catalog interfaces making it harder and harder to use them effectively. Whether this (if my impression is indeed correct) is by design or simply the byproduct of other changes, I don’t know, but it’s a big part of my motivation for writing such a long comment. (I don’t mean to pick on your post; it just seemed like a good jumping off point. I’ve been meaning to write about subject searching/browsing for a while.) I wholeheartedly support developing more and more sophisticated ways to access library collections, but I think it would be a shame if subject headings got left behind because no one worked out a way to maximize their potential usefulness.

*I’m sure I could get more proficient at full-text book searches, but I often find a frustrating number of false positives when I do that kind of search on a large collection, and not being able to re-order a result set can drive me crazy.

**I owe this example to my dad, who worked in library automation during the transition from paper cards to electronic catalogs.

***The closest you get are:

Homer, Art–Homes and haunts–Ozark Mountains Region.

Homer–Homes and haunts–Greece.

Homer, Winslow, 1836-1910–Homes and haunts–Maine–Prouts Neck.

2 Responses to access to subjects

  1. Ben Schmidt says:

    I think you’re completely right that LCSH headings are a vastly under-used resource nowadays; all the thought put into them makes them far better where they exist than full-text search.

    But beyond just me, I think there’s definitely a trend towards libraries using an imitation of Google’s single-box search that mashes together full-text, titles, and catalog headings in mysterious ways. I think users are getting less and less willing to use the constrained vocabulary of subject headings in searches, and they’re getting somewhat endangered as a result. But I may just be describing some of my own research shortcomings, here.

    A really browsable version of subject headings for a lot of these collections that integrates them with browsing would be very valuable, and something I’ve been thinking about a bit. I think I partly was so hard on subject headings b/c I haven’t put any into bookworm yet for boring technical reasons.

    Also, I think the LC hasn’t made them completely available for public data manipulation, which has kept them from playing as big a role as they might in the open web ecosystem.

  2. andrew says:

    I’m actually taking a course in cataloging this term. It’s more about how to form catalog records for individual items than it is about electronic catalog design, but I’m hoping we’ll cover some of the issues surrounding how catalogs make use of the data they get. We haven’t gotten to subjects yet, so everything I’ve been saying is pretty much based on my own experience and searching behavior. It’ll be interesting to see how this looks from the subject-assigner side of things.

    Overall, I think the trend towards providing a single search box as a starting point has made catalogs a lot easier to use, especially if you’re doing a quick search for a known item, title, author, or even subject. And there’s almost always an “advanced search” that lets you do subject keywords. I think the issue is more towards the browsing end of things. Because there are usually multiple subjects for any given item, you can’t really do a simple sort by subject in the way that a lot of catalogs allow you to sort by title/author/date. You generally have to see the headings instead of only a list of items that fall under the headings.

    The trend I’ve been seeing has been towards making it harder to get to the headings list from a given item. On some catalogs if you click on a link for a heading, you get an alphabetical, browsable list showing the headings nearby which you can then use to move back and forth between items and headings. My university catalog has now changed this so that clicking through shows you only the subject you clicked on and any narrower subjects within the hierarchy. You can still get to the headings another way through a dedicated browse function, but this requires extra steps. Other catalogs might take you to a list of results, as if you searched for that specific subject.

    On the other hand, just because I use the headings lists a lot clearly doesn’t mean that that’s the best way to make use of the headings data. It certainly doesn’t seem to be a popular method. So I’m not at all tied to keeping catalogs the way they’ve been in the past. I can imagine that there could be more intuitive and interactive ways of using the data that don’t require so much backtracking between item records and headings lists. Some catalogs seem to be adding sidebars that give more options for narrowing or expanding searches. In my limited experience, I’ve found it hard to figure out how they’re using subjects (or “topics” or “keywords”), but I could see this way of searching overtaking the older methods.

    As for the LC data, I wonder why they might not be as forthcoming with subject headings. I wonder if it’s more of a technical challenge or if there’s some other reason to keep more control over them. I noticed when I was clicking through to a couple of records on Open Library from Bookworm that the Open Library records don’t show subjects – maybe there’s a full view somewhere? – although subjects are included in WorldCat and in the LC’s own catalog. HathiTrust shows them in catalog records, but of course they could be getting them from their own member libraries.