access to subjects

4 October 2011

[Note: I started writing this as a comment on this post on “Bookworm and Library Search” over at Benjamin Schmidt’s Sapping Attention, but Blogger wouldn’t let me post it, probably for length reasons. Since my comment turned out to be mostly a standalone discussion and defense of subject headings, I figure I can post it here without causing too much confusion. I’ve added some context to make it clearer what I’m responding to, but it will probably make more sense if you read Schmidt’s post first.]

Summary: As I understand it, Bookworm is a new tool, still in an alpha/beta stage, for accessing and interacting with public domain books using both catalog data and full-text word data. As Schmidt describes it, Bookworm “straddles the space between something like Ngrams and the more traditional library catalog.” I’ve played around with it only briefly, and it looks pretty neat. My post below really has little to do with Bookworm itself.

Instead, I’ve used part of Schmidt’s post as a jumping off place to talk about subject headings and why I think they’re still useful in a full-text world. As an introduction to what Bookworm can do, Schmidt describes four ways of accessing books. The first two have their origins in the pre-digital era. First, subject headings:

1) Use the subject headings in a library card catalog.

Subject headings are the best resource for a particular topic, but a lot of the time they won’t work; your subject may not exist in the catalog, you may not know what it’s called, and librarians may not have assigned some relevant books to the subject heading you’re using.

As will become clear below, I think this is too narrow a way of characterizing subject searching/browsing; this is essentially all Schmidt writes about the method. I actually do quite a bit of subject browsing – when the catalog is set up in a way that makes it possible.

Next up:

2) Find one book in the stacks that you find interesting, and see what’s next to it on the shelves.

Since books in libraries are arranged on shelves according to a classification system (usually the Library of Congress or Dewey) that was designed to group similar books near each other, this is more or less another kind of browsing by subject. Schmidt covers this one pretty well; when I write about subjects below, I mean specifically subject headings, not classifications or call numbers, and even more specifically the Library of Congress Subject Headings (LCSH), which are the main ones in use in North American libraries, especially academic libraries.

Schmidt doesn’t think the development of electronic catalogs made much of a difference for these two search strategies. Summing up, he writes, “so, this is the pre-digitized library; even electronic card catalogs don’t change this balance in any important way.” You can probably guess by now that I’m going to argue that it did make a difference for how people can use subject headings.

Instead, Schmidt writes, the big change came with full-text access:

3) Search the full text of thousands of books for a word or phrase of interest.

For scholarly journals and newspapers, two fields where full-text search is older than for books, this is probably the most important way of finding texts. For most purposes, full-text search obliterates method (1) above; where before you had to find a subject vaguely connected to your interests, now you can identify your topic as precisely as you can describe it in language.

I’m not going to disagree with the view that full-text searching represents a huge change in how people can access works – how could I? – but I don’t think it “obliterates” many of the uses of subject searching. More on that below. My take is that at least for now and for books, subject searching and full-text searching work alongside each other. Journal articles and newspaper articles may be a different story; historically, these have been dealt with according to different indexing and cataloging techniques, and in some cases have not been given subject headings in any form. Length of a work can make a difference too, although the best full-text search designs try to account for this.

For completeness, I’m including Schmidt’s fourth method in this summary:

4) Organize the library according to your personal principles, and browse it from arbitrary points.

It’s worth your time to go over to his post and read about this; I don’t have much to say on this point, except to say that I’m all for it. To the extent that it ties in with what I say about subjects, I’m really just advocating for the continued use of subjects as an additional “arbitrary point” from which you can browse. As long as they exist and continue to be assigned to books, there’s value in making them available for people to use (or not) as they so choose when accessing library and book data.

And with that long, and I hope fair, introduction out of the way, here’s what I tried to post as a comment.

___

I’m looking forward to digging around in Bookworm, but at the risk of missing the larger point, I want to speak up for subject searching/browsing. Subject headings are by no means perfect, but I’ve found them quite useful. They don’t seem to get much support when I hear people talk about them (not too often, but it happens), so I feel obligated to defend them, or at least to point out additional ways to use them.

In the card era, which I remember but which was long enough ago that I never did much searching that way, you did indeed have to pick subjects ahead of time, hope they would work out, and go searching. But ideally a search would not be just a series of sequential [subject –> titles] lookups. What I remember being taught was that once you find a relevant book – maybe even a marginally relevant book – you should check its subjects and write down the ones you don’t already have. Indeed, if you already have a known book to start with, look that up first and begin building your subjects from there. Then you repeat this process until you think you’re done.
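As a rough sketch of that loop, here is the repeat-until-done process written out in Python. Everything in it is made up for illustration: the `CATALOG` dictionary, the book titles, the headings, and the `snowball` function name are all assumptions, not anything a real catalog exposes. A person stops when the results stop looking relevant; the code simply runs until no new subjects turn up.

```python
from collections import deque

# Hypothetical toy catalog: title -> set of assigned subject headings.
CATALOG = {
    "Book A": {"Congress -- History", "Legislators"},
    "Book B": {"Legislators", "Political culture"},
    "Book C": {"Political culture"},
    "Book D": {"Gardening"},
}


def snowball(start_title, catalog):
    """Start from one known book, collect its subjects, look up the other
    books filed under those subjects, and repeat until nothing new appears."""
    seen_titles = {start_title}
    seen_subjects = set()
    queue = deque([start_title])
    while queue:
        title = queue.popleft()
        for subject in catalog[title]:
            if subject in seen_subjects:
                continue  # already on the list of subjects
            seen_subjects.add(subject)
            # "Open the drawer" for this subject and note any new books.
            for other, subjects in catalog.items():
                if subject in subjects and other not in seen_titles:
                    seen_titles.add(other)
                    queue.append(other)
    return seen_titles, seen_subjects


titles, subjects = snowball("Book A", CATALOG)
print(sorted(titles))
print(sorted(subjects))
```

Starting from "Book A", the walk reaches "Book C" only by way of a subject picked up from "Book B", which is the point of writing down the subjects you don't already have.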

This is essentially the subject heading equivalent of your method 2) about classification. It’s more laborious, in that (I assume) this involved opening and re-opening a bunch of drawers in the catalog instead of being able to browse in a relatively confined area in the stacks — and then, of course, you would still have to take your list of books and go to the stacks. But it’s still a way of browsing relationally.

I must admit that I’m glad I never really did this with cards, but I’ve done and still do the electronic equivalent quite a lot. It’s not unusual for me to end up, starting from one “base” search and continuing through subjects, pulling books from two or three or more classification letters in the stacks (where I also browse the shelves). I haven’t found full-text searching to obliterate this but instead to augment it.* I could see developing a full-text search that attempts actual subject analysis and then tries to identify related works for a subject created on the fly, but raw relevance-ranked searches don’t really do that.

I also think the transition from cards to electronic catalog records did represent a fairly significant shift in how people could search library collections. In terms of subjects, it obviously made the strategies I described above a lot easier. Now you could put in a search – and this meant any (enabled) search: title, author, and so on, not just subject – and then pull out your next set(s) of subjects from the relevant results without having to physically traverse a bunch of card drawers. That’s not exactly revolutionary, but it was still an advance.

The big differences, to my mind, were the ability to browse subject hierarchies and the implementation of keyword searching. In the abstract, you could always browse the entirety of the Library of Congress Subject Headings by using the big printed (red) books of headings put out by the LOC – if your library had them in the reference collection. But there was no guarantee that your catalog would have the book(s) that correspond to a particular heading. The nice thing about subject browsing in the electronic catalog was that 1) you were looking at what your library held, 2) you could look at the headings themselves without having to look at each particular (card) catalog record for each heading, and 3) you could more easily identify the hierarchies within the headings, which is the key to browsing. So now you could start with, say, “United States — Congress — History” and quickly see that you might also want “United States — Congress — History — Anecdotes” and “United States — Congress — House — History” and so on. (There are a lot of Congress and history headings.) And then you could call up the records for each heading right there in the catalog. This eventually got a bit easier on the web when you could simply click through to the results.

Furthermore, keyword searching meant that you no longer had to know the order of headings – there are rules to this, but most users aren’t going to know them – just the terms you’re looking for in the headings. So [United States Congress history] would lead you to the same subjects as [history United States Congress], and the results would include all subjects using those words in any order, not just the subjects that use only those words.
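To make the order-independence concrete, here is a minimal sketch in Python. It assumes the simplest possible matching rule (every query word must appear somewhere in the heading); real catalogs surely do more than this. The `terms` and `keyword_match` helpers are hypothetical names, the headings are written with "--" as the separator, and the last heading in the list is invented as a non-match.

```python
import re

HEADINGS = [
    "United States -- Congress -- History",
    "United States -- Congress -- History -- Anecdotes",
    "United States -- Congress -- House -- History",
    "United States -- History",  # invented non-matching example
]


def terms(text):
    """Lowercase a string and reduce it to its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_match(query, headings):
    """Return every heading that contains all of the query's words,
    regardless of the order in which those words were typed."""
    wanted = terms(query)
    return [h for h in headings if wanted <= terms(h)]


first = keyword_match("United States Congress history", HEADINGS)
second = keyword_match("history United States Congress", HEADINGS)
print(first == second)
for heading in first:
    print(heading)
```

Because both the query and the heading are reduced to sets of words, the two searches return the same three Congress headings, including the longer ones that contain extra words.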

I suppose even that’s still not so revolutionary, but eventually keyword searching began to run across all catalog fields (as it does now): instead of having to run both a title keyword and a subject keyword search for the same terms, something I remember doing quite often, you could run one search and turn up all records that contained your terms anywhere in the record. A far cry from full-text search, sure, but still a major improvement in searching power. In a card catalog, you might have to deal with subject cards and author cards and title cards. (Have I mentioned I’m glad I did little with cards?)

Keyword searching did one more thing for subjects that I don’t think has gotten much attention and, to some people, might seem like just a curiosity: it made visible and searchable the subheadings that exist in LCSH that pretty much never show up as top-level headings.

The subheading “Homes and Haunts” is a fun example**: as of today, a subject keyword search in the Library of Congress catalog for “Homes and Haunts” (with the quotes) turns up 9853 headings, starting from A to beyond Z (i.e., some use non-Roman scripts). But not a single one of these uses a top-level subject of “Homes and Haunts.”*** This is because “Homes and Haunts” is one of a set of established headings that are used only to subdivide broader headings. Without the electronic catalog, it would be essentially impossible to track how it was being assigned across all subjects. Granted, I’ve never actually made this kind of heading a subject of research, but I find them sort of fascinating nonetheless. I’d love to see an analysis of which people’s homes and haunts have been considered subject-worthy and, by implication, book-worthy.
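Here is a small Python sketch of that kind of check: given a list of headings, separate the ones that use a term as the top-level subject from the ones that use it only as a subdivision. The three "Homes and haunts" headings are the ones listed in the footnote; the "--" separator, the fourth heading, and the `split_heading` and `usage` helpers are assumptions made for illustration.

```python
HEADINGS = [
    "Homer, Art--Homes and haunts--Ozark Mountains Region.",
    "Homer--Homes and haunts--Greece.",
    "Homer, Winslow, 1836-1910--Homes and haunts--Maine--Prouts Neck.",
    "United States--Congress--History",  # included as a non-match
]


def split_heading(heading, sep="--"):
    """Split a heading into its top-level subject and its subdivisions."""
    return [part.strip() for part in heading.split(sep)]


def usage(term, headings):
    """Return (top_level, subdivision_only): headings where the term is the
    top-level subject, and headings where it appears only as a subdivision."""
    wanted = term.lower()
    top_level, subdivision_only = [], []
    for heading in headings:
        parts = [p.lower() for p in split_heading(heading)]
        if parts[0] == wanted:
            top_level.append(heading)
        elif wanted in parts[1:]:
            subdivision_only.append(heading)
    return top_level, subdivision_only


top, sub = usage("Homes and haunts", HEADINGS)
print(len(top), len(sub))
```

Grouping the subdivision-only list by its first element would be one way to start asking whose homes and haunts have been considered subject-worthy.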

Unfortunately, my anecdotal impression is that subject headings are not widely valued and there seems to be a trend towards library catalog interfaces making it harder and harder to use them effectively. Whether this (if my impression is indeed correct) is by design or simply the byproduct of other changes, I don’t know, but it’s a big part of my motivation for writing such a long comment. (I don’t mean to pick on your post; it just seemed like a good jumping off point. I’ve been meaning to write about subject searching/browsing for a while.) I wholeheartedly support developing more and more sophisticated ways to access library collections, but I think it would be a shame if subject headings got left behind because no one worked out a way to maximize their potential usefulness.

*I’m sure I could get more proficient at full-text book searches, but I often find a frustrating number of false positives when I do that kind of search on a large collection, and not being able to re-order a result set can drive me crazy.

**I owe this example to my dad, who worked in library automation during the transition from paper cards to electronic catalogs.

***The closest you get are:

Homer, Art–Homes and haunts–Ozark Mountains Region.

Homer–Homes and haunts–Greece.

Homer, Winslow, 1836-1910–Homes and haunts–Maine–Prouts Neck.


the social construction of reality

10 January 2010

This is more interesting as a social phenomenon than as a sports clip. At least that’s what I’m telling myself.

J.R. Smith is taking a lot of (deserved) criticism for walking out of bounds with the ball as if the shot had gone in, but what’s really remarkable is how almost everyone acts as if the shot had been made* until the referee blows the whistle. It’s like those cartoons where the coyote doesn’t fall until he acknowledges gravity.

*I suppose an alternative theory is that people thought the ball had gone out of bounds, which is not uncommon for shots that miss the rim from that range.


following leaders

29 January 2009

0.2:

Fifty-five Bostonians, including the president of Harvard, A. Lawrence Lowell, signed a petition accusing Brandeis of lacking the “judicial temperament.” It was the kind of campaign that could get people muttering that if those guys didn’t like Brandeis, maybe he was no good.

[Image: teotaw-brandeis-chart]

One of Brandeis’s allies drew up a chart pointing out that the fifty-five anti-Brandeisians all belonged to the same clubs, worked in the same State Street banks, and lived in the same neighborhoods. As Walter Lippmann wrote, “All the smoke of ill-repute which had been gathered around Mr. Brandeis originated in the group psychology of these gentlemen and because they are men of influence it seemed ominous. But it is smoke without any fire except that of personal or group antagonism.”

_______________

2.0:

What is this thing?

We often describe LittleSis as an involuntary facebook for powerful people, in that the database includes information on the various relationships of politicians, CEOs, and their friends — what boards they sit on, where they work, who they give money to. All of this information is public record, but it is scattered across a wide range of websites and resources. LittleSis is an attempt to organize it in a way that meaningfully exposes the social networks that wield disproportionate influence over this country’s public policy.

I’m not sure if you can create maps, graphs, trees, and charts on LittleSis right now, but hopefully it will be possible to do things like this in the future.


activism and web 0.2

21 November 2008

It seems that Obama has been reading Lincoln; this is encouraging. Some of Obama’s reading about Lincoln may be less encouraging. I am not getting ready to lead a nation or form a cabinet; I have been reading Wendell Phillips. Phillips, you may recall, is the only non-politician profiled in Hofstadter’s American Political Tradition: his type is the “agitator” (something Obama once was, not so incidentally).

I’m reading Phillips mostly because I’ve meant to do so ever since I was a TA in a course on the nineteenth-century U.S. whose instructor confessed admiration for him, and I saw a reference to Phillips in something else I read just recently. Many of his speeches are now online, but unfortunately not in easily copy-and-pastable form. I’ve only read a few so far, but I can already say that Phillips is worth a look for anyone interested not just in his history, but in political and social movements and agitation in general.

Here’s Phillips in “Public Opinion” (1851) on the theory of change:

We are apt to feel ourselves overshadowed in the presence of colossal institutions. We are apt, in coming up to a meeting of this kind, to ask what a few hundred or a few thousand persons can do against the weight of government, the mountainous odds of majorities, the influence of the press, the power of the pulpit, the organization of parties, the omnipotence of wealth. At times, to carry a favorite purpose, leading statesmen have endeavored to cajole the people into the idea that this age was like the past, and that a “rub-a-dub agitation,” as ours is contemptuously styled, was only to be despised.

The time has been when, as our friend observed, from the steps of the Revere House — yes, and from the depots of New York railroads — Mr. Webster has described this Antislavery Movement as a succession of lectures in school houses, — the mere efforts of a few hundred men and women to talk together, excite each other, arouse the public, and its only result a little noise. He knew better. He knew better the times in which he lived. No matter where you meet a dozen earnest men pledged to a new idea — wherever you have met them, you have met the beginning of a Revolution.

Revolutions are not made: they come. A revolution is as natural a growth as an oak. It comes out of the past. Its foundations are laid far back. The child feels; he grows into a man, and thinks; another, perhaps, speaks, and the world acts out the thought. And this is the history of modern society. Men undervalue the Antislavery Movement, because they imagine you can always put your finger on some illustrious moment in history, and say, here commenced the great change which has come over the nation. Not so. The beginning of great changes is like the rise of the Mississippi. A child must stoop and gather away the pebbles to find it. But soon it swells broader and broader, bears on its ample bosom the navies of a mighty republic, fills the Gulf, and divides a Continent.

“Rub-a-dub agitation” might be sort of a mid-nineteenth-century version of “bloggers in their pajamas.” Later in that same speech Phillips takes up the subject of technology and organization:

In working these great changes, in such an age as ours, the so-called statesman has far less influence than the many little men who, at various points, are silently maturing a regeneration of public opinion. This is a reading and thinking age, and great interests at stake quicken the general intellect. Stagnant times have been when a great mind, anchored in error, might snag the slow-moving current of society. Such is not our era. Nothing but Freedom, Justice and Truth is of any permanent advantage to the mass of mankind. To these society, left to itself, is always tending.

In our day, great questions about them have called forth all the energies of the common mind. Error suffers sad treatment in the shock of eager intellects. “Everybody,” said Talleyrand, “is cleverer than anybody”; and any name, however illustrious, which links itself to abuses, is sure to be overwhelmed by the impetuous current of that society which, (thanks to the press and a reading public) is potent, always, to clear its own channel. Thanks to the Printing Press, the people now do their own thinking, and statesmen, as they are styled — men in office, — have ceased to be either the leaders or the clogs of society.

_____

Note: I have broken up the speech text into more readable paragraph lengths. The hyperlinks, of course, are in the original – that’s just how far ahead of his times Phillips was.