Miskatonic University Press

Code4Lib 2009 notes


I went to Code4Lib2009 in late February and had a fantastic time, not just because Jodi Schneider and I gave a talk called What We Talk About When We Talk About FRBR. The whole conference was a blast and it was great to hang out with people I knew and meet new people.

Ed Summers posted his conference notes, Jonathan Rochkind doesn't like the Code4Lib award idea Eric Morgan put forward, Karen Coombs posted notes, Jay Luker did a cool IRC channel timeline, Terry Reese did ten things to take away from it, and Jon Phipps wasn't enchanted with it all.

Here are my notes, pretty much raw. I didn't take notes on the linked data pre-conference, the morning of which was excellent. The afternoon got a little too loose and I got distracted with some annoying Ruby on Rails problem.

Tuesday 24 February 2009

Roy Tennant: Introduction

Mark Matienzo: How to Meet People and Have Fun at Code4LibCon

Stefano Mazzocchi: A Bookless Future for the Libraries?

TODO: Listen to Jon Udell interview

Interesting talk. Began by talking about how marginal costs of communication have been dropping over history, from cave drawings to clay tablets to the printing press to electronic publishing.

Pros to this, but cons now, too, like a "degraded consumption experience": low screen resolution, batteries required, poor network access.

Business models are being disrupted. Institutions like libraries are being disrupted. Almost-zero marginal costs are here to stay.

Libraries vs museums of books. Non-unique books can all go online in electronic versions. (Unique books still require special attention.)

No more shelves. Nearly infinite storage space. (He's getting into a lot of "the book is dead" stuff---tide turning against him in IRC.)

Do we still need metadata? (Shouts of "yes!"). Does metadata have to be made by people or can it be done by computers with statistical analysis?

Information is fragmented. Can the library mindset still work across a spectrum of such fragmented information, hyperlinks, journals, networks? Networks of relational assertions = the web of data.

He asked a few questions ("who's the youngest 2008 Academy Award winner") that were basically trivia questions. Where would we find the answers? His answer: Freebase. He found the answer to that question easily enough, but his talk ran into trouble on the next one when he tried to show how Freebase made the answer easy to find and the whole thing failed due to network slowness. He lost some of the audience here.

Showed FMDB, a granular linked-data version of IMDB data. Showed a translator. Showed http://typewriter.freebaseapps.com/, where they crowdsource specifying unspecified data. Showed the Genderizer (!?).

[A few days later Mazzocchi posted Post-Mortem of a Dissonant Keynote: he went into the #code4lib channel logs to see what people were saying while he was talking. That took guts. Great post.]

Anders Soderback: Why Libraries Should Embrace Linked Data

(I talked to Anders for a while after this. Very nice fellow, and Libris is a great piece of work.)

Libris, the Swedish national union catalogue.

Less technical version of the talk he gave the day before. Martin Malmsten was supposed to be here but couldn't make it. He talked about the web, that it's social, that it's a network, etc. He spent a while getting to the actual catalogue. When he did, he showed the basics of it: the RDF version, the FRBR-related parts, different representations, etc.

Their blog: http://blog.libris.kb.se/semweb/

Ross Singer, Like a Can Opener Through Your Data Silo: Simple Access Through AtomPub and Jangle


Problems with library systems APIs: not very good. OAI-PMH etc. Atom + REST = AtomPub, an Atom publishing model. Ed Summers pointed out a couple of years ago how well this would work. Google, Microsoft, IBM, WordPress, Drupal, Movable Type, etc., all use it.

Jangle applies a common data model to library information through AtomPub. Four kinds of entities: resources, items, actors (users etc.), collections.
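
To get a feel for what reading library data over AtomPub might look like, here is a minimal sketch of my own, not something Ross showed. The feed is made up for illustration: the example.org URLs, ids and titles are hypothetical and are not real Jangle output. All it assumes is that a collection of resources comes back as an ordinary Atom feed.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical feed for a "resources" collection; not real Jangle output.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>resources</title>
  <entry>
    <id>http://example.org/resources/1</id>
    <title>Moby Dick</title>
  </entry>
  <entry>
    <id>http://example.org/resources/2</id>
    <title>Walden</title>
  </entry>
</feed>"""


def list_entries(feed_xml):
    """Return (id, title) pairs for every Atom entry in the feed."""
    root = ET.fromstring(feed_xml)
    return [
        (entry.findtext(ATOM + "id"), entry.findtext(ATOM + "title"))
        for entry in root.iter(ATOM + "entry")
    ]


for entry_id, title in list_entries(SAMPLE_FEED):
    print(entry_id, title)
```

The appeal is that nothing here is library-specific: any Atom-aware client could do the same.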

Jangle vocabulary: http://jangle.org/vocab/

Ross is building connectors to various systems so that (ideally) it doesn't matter what kind of ILS you're using: you can always talk to it with Jangle.

Godmar Back complimented the idea and the importance of interoperability.

Glen Newton, LuSQL: (Quickly and Easily) Getting Your Data from your DBMS into Lucene

Glen's from CISTI in the Digital Library Research Group. They have 8.5 million full-text articles and want to index them all to make them easy to search. They use Lucene but not Solr; they use LuSQL. He showed how to do some queries with LuSQL, talked about how the indexing was done by Lucene, showed some output.

Terence Ingram, RESTafarian-ism at the National Library of Australia

My attention was caught when he said they used VuFind and asked how many people here used it. Laughs when Andrew Nagy stuck up his hand. Ingram said they just needed something so had a quick look around and chose VuFind. Very casual. All those people who look to the NLA as a guide in their own choice of VuFind will be surprised!

Birkin James Diana, The Dashboard Initiative


Showed some widgets that make a dashboard for showing all of the activity going on at a library, with widgets for circ, ILL activity, other kinds of things. Has some nice visualizations in there using some kind of Google visualizer. People in channel also mentioned Flot. The code is available for download. Open source!


Ed Summers and Michael Giarlo, Open Up Your Repository with a SWORD


"SWORD is a lightweight protocol for depositing content from one location to another. It stands for Simple Web-service Offering Repository Deposit and is a profile of the Atom Publishing Protocol (known as APP or ATOMPUB)." Lets you deposit into repositories without worrying exactly what kind of repository it is: Fedora, DSpace, whatever. Ed explained it and Mike showed some examples. Looks easy to use and very useful if you need to do what it does.

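Since SWORD is a profile of AtomPub, a deposit is at heart an HTTP POST of a package to a collection URL. Here is a rough sketch of mine, not Ed and Mike's examples. The collection URL and filename are hypothetical, the exact form of the Content-Disposition header is my assumption, and the SWORD profile defines further headers that are not shown. The request is only built, not sent.

```python
import urllib.request


def build_deposit_request(collection_url, package_bytes, filename):
    """Build (but do not send) an AtomPub-style POST of a zipped package."""
    return urllib.request.Request(
        collection_url,
        data=package_bytes,
        method="POST",
        headers={
            "Content-Type": "application/zip",
            # Assumed header form; check the SWORD profile before relying on it.
            "Content-Disposition": f"filename={filename}",
        },
    )


# Hypothetical repository endpoint and placeholder package bytes.
request = build_deposit_request(
    "http://repository.example.org/sword/collection",
    b"PK\x03\x04",
    "thesis.zip",
)
print(request.get_method(), request.full_url)
# To actually deposit: urllib.request.urlopen(request)
```

The same request would go to Fedora or DSpace alike, which is the point.
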
Mark Matienzo, How I Failed to Present on Using DVCS for Archival Metadata

He proposed to talk about how DVCS would work for archival metadata but found it was too complicated. Slide: "I failed, epically." He looked at bzr and git but then went with Mercurial. Problem: It works on line diffs (for code) but EAD is in XML. How do you diff the XML? There are various tools, but in the end the whole thing turned into a complicated mess.
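
The line-diff problem is easy to reproduce. This sketch is mine, not what Mark tried: it canonicalizes the XML before diffing, so that attribute order and indentation stop showing up as changes. The EAD-like fragments are made up, and splitting on "><" is a crude way to get one tag per line. It needs Python 3.8 or later for ET.canonicalize.

```python
import difflib
import xml.etree.ElementTree as ET

# Made-up EAD-like fragments. NEW_SAME differs only in attribute order and
# whitespace; NEW_CHANGED has a real change to the title.
OLD = '<c level="series" id="s1"><unittitle>Letters</unittitle></c>'
NEW_SAME = '<c id="s1"  level="series">\n  <unittitle>Letters</unittitle>\n</c>'
NEW_CHANGED = '<c id="s1" level="series"><unittitle>Letters, 1900-1910</unittitle></c>'


def normalize(xml_text):
    """Canonical XML (C14N 2.0): sorted attributes, surrounding whitespace stripped."""
    return ET.canonicalize(xml_text, strip_text=True)


def xml_diff(a, b):
    """Unified diff of two documents after canonicalization, one tag per line."""
    def lines(text):
        return normalize(text).replace("><", ">\n<").splitlines()
    return list(difflib.unified_diff(lines(a), lines(b), lineterm=""))


print(xml_diff(OLD, NEW_SAME))      # formatting-only change: empty diff
print("\n".join(xml_diff(OLD, NEW_CHANGED)))
```

This only tames the noise. It does nothing for moved or reordered elements, which is presumably where the complicated mess starts.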

Godmar Back, LibX 2.0


Summarized LibX 1.0. A toolbar is great, but what about emerging technology trends (mashups, SOA) and educational trends (online tutorials, social tagging, visualizations)?

In 2.0, librarians can create Libapps (?) and users can add them into pages where they want. LibX runs/houses them. Libapps are made from reusable modules.

How to make a Libapp? A Module is JS plus a metadata description, a Libapp is a group of modules, a Package is a folder of Libapps.

Modules: Named at a URL, published with AtomPub. They use tuples in JSON. { isbn: "074322670" }

Example: user goes to ACM site. There's a LibX button. Click on it and a YouTube video shows up, with person giving some help! E.g. Annette Bailey.

He showed the code to implement that, just a bit of JS. Simple. Some other bits of code needed, but all pretty easy.

LibX 2.0: been rewritten in full browser-independent OO. about:libx gives built-in documentation. They have unit tests and it's all hot updatable (?).

Roles in the LibX 2.0 world: developers, adapters, user community.

LibX community repository will be built, to hold modules, packages, etc. Coming over the next two years.

Gradual transition from LibX 1.5, not a huge major release.

Kevin Clarke and John Fereira, Djatoka for Djummies

Kevin gave an overview of Djatoka, then John did some demos.

Could we use this for the big maps that are in Maps? Instead of Zoomify? Lots of overhead, though---runs on Java, requires Tomcat, etc. But worth a look.

Breakout sessions

LibX 2.0, Zope3/Grok/Plone, Fedora, Solr, Jangle. I went upstairs and napped.

Lightning Talks

David Lindahl, XC

Quick overview of XC, but he lost me by showing a couple of short cartoons while he was talking, which distracted me.

Casey Bisson, Scriblio

The internal data model is improved and now you can do original cataloguing in it. It's a digital library out of the box now, too, and people are using it for that. Some kind of tie-in to LibraryThing's Common Knowledge, too.

Mark Matienzo, enjoysthin.gs

Showed this new social bookmarking site.

Emily Lynema, E-Matrix

ERM she's working on at NCSU. FRBRy data model. E-resources can be "narrowly related" or "core" to a subject/discipline. Interesting! Gradations of relevancy to a program or subject. List them on subject guides.

Eric Lease Morgan, Alex4


Goal: facilitate a person's ongoing liberal arts education.

Geoffrey Bilder, "Cool URIs Must Die."

He's from CrossRef. They're fighting linkrot. "Persistence isn't a technical issue, it's a social issue."

John Law, Summon

http://www.serialssolutions.com/summon/. Searches everything a library has: physical, virtual, etc. One search box.

Erik Hatcher, LucidFind


They're searching everything on lucene.apache.org: mailing lists, bugs, etc.

Mike Taylor (Index Data), Making Distributed Configuration Simple with the Torus

Metasearch engine: http://indexdata.com/pazpar2/

They've done a way of simply specifying exactly where you want to search and what fields you want to show in the results. Translucent Record Store = Torus

Jakub Skoczen, also from Index Data, on how Torus is implemented

Michael Klein and Jonathan Brinley on zoia's FOAF support


Andy Ashton, Biblio

He's at the Scholarly Technology Group at Brown. One of many projects called Biblio. Bibliography project.

Naomi Dushay, VuFind at Stanford



searchworks-test.stanford.edu: They did a shelf-browse across all of the libraries, including storage. They show it in a right nav, which is a good idea. Should use this.

Mike Beccaria, Zoom Zoom Zoom

Paul Smith's College: paulsmiths.edu They've scanned in old yearbooks and put them in ContentDM. Didn't like it.

Showed Microsoft's Deep Zoom. Looked cool. Worth checking---for Maps? He put a whole yearbook into one image and then put it into the system, so you can zoom from a full overview right into something very small.

Photosynth: took pictures of the stacks and showed how you can move around them and zoom in on spines. Could link to the catalogue?

Dan Chudnov, BagIt

http://digitalpreservation.gov/ uses BagIt.

Wednesday 25 February 2009

Sebastian Hammer, Index Data

Talked about Index Data and their history. Got into books and libraries in general. Whither libraries, with Google? They're digitizing things better than we are. Libraries could be swallowed by whipper-snapper technology.

Why (Local) Libraries: bearers and preservers of cultural heritage; conveyers of authoritative information; supporters of learning and research; pillars of democracy

"Even if we end up dying, we can't go quietly."

Libraries need to stay local but also come together: consortia, a group of libraries working together on web sites, turn into a "super-robot."

Quoted Lorcan Dempsey on "stitching costs." Most so-called APIs are loyalty schemes rather than interoperability devices. Standardization is hard/boring but essential for collaboration. Need more collaboration and working together, while keeping our business models. Systems and organizations need to surrender our data freely. Library hackers must become advocates within their orgs.

"MARC is a joke. Z39.50 is for old people. SRU is dumb. NCIP ... forget about it."

Tim McGeary, OLE Project: A New Frontier

OLE is "working to redefine the business processes for libraries."

They want to be "flexible, adaptable, community-developed," they want "improvement beyond the ILS." Format and resource agnostic. No need for a separate ERM. This is not just an ILS replacement: special collections, video, DRM, everything. They want a service-oriented architecture.

Community-based software development and governance. Raise the Library system to the enterprise level. Complement human interaction.

He reviewed the project timeline. Ah, project timelines.

They're using the National Library of Australia Services Framework. (What is this?)

In March: OLE core services will be defined. "Final design document" to be published in July, and then another group will do stuff.

Bess Sadler, Blacklight

http://blacklightopac.org/, http://blacklight.rubyforge.org/

Give up on the idea that we can do a single interface that will work well for everyone. Different kinds of students have different needs.

Top question at music ref desk: I play violin, my boyfriend plays piano. What can we play together? Their old system wasn't indexing this information. Not everyone would want to search for that, but music students would. Should let them do it. Another q: can I find things by era?

They have a music portal: new books on home page are all music books.

Special behaviour for special objects: any musical recording, they go to Musicbrainz, get the unique ID, then go to other services to get metadata that key on that ID.

"College students are broke and they like music. If we are spending lots of money on music and not letting them know about it, that's a fail."

Blacklight runs on Rails. Uses Rails Engines so it's easier to keep up with the code base but make local changes.

TODO: marc4j is now much better ... look it up.

TODO: Install Blacklight and try it out. Look at Rails code.

Joshua Ferraro, biblios.net

He's from LibLime. Explained about their business. Open Library, Open Data Commons License.

Look at: extJS

"biblios.net is LibLime's free browser-based cataloguing service." 35 million freely-licensed records.

TODO: Make OpenFRBR query biblios.net (requires biblios.net account)

Look at: bws.biblios.net: APIs that let you interact with the database

He downloaded a MARCXML record from Blacklight and loaded it up into Biblios, using curl at the command line.

They're working with ONIX records, too, but keeping them separate from MARC because they're made with different rules.

Authority files are available. Useful?

This is a great project. York should be uploading everything there.

Chris Catalfa, Adding Functionality to Biblios with CouchDB

UI: extjs.com + jquery.org

1. Implementing an editor for Dublin Core. He showed a bit of XML that constructs an editor. (Faint code, couldn't read.)

2. Then they use CouchDB on the backend.

He was showing a lot of code snippets and a lot of unreadable XML.

Toke Eskildsen, Complete Faceting



Erik Hatcher, The Rising Sun: Making the Most of Solr Power

9. Performance
8. Memory
7. Query parser: default is to show the raw query parser
6. Data import: bring data into Solr with Data Import Handler, Solr Cell, CSV, LuSQL, APIs.
5. Request handlers: leverage Solr's configurability
4. Solr and IR toolkit
3. LocalSolr
2. Faceting: can do multi-select faceting
1. The interface is the app. (Solritas, Velocity)
0. Community
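
On point 2: as I understand it, multi-select faceting in Solr works by tagging a filter query and then excluding that tag when the facet counts are computed, so the other values of the facet stay visible after one is picked. A sketch of building such a query follows. It is mine, not from Erik's slides: the host, the "format" field and the "Book" value are hypothetical, and the {!tag=...} and {!ex=...} local params need a Solr version that supports them.

```python
from urllib.parse import urlencode


def multiselect_facet_params(query, field, selected_value):
    """Query params that filter on one facet value but still count the others."""
    tag = field  # reuse the field name as the tag label
    return [
        ("q", query),
        ("facet", "true"),
        # Tag the filter so it can be excluded when counting the facet.
        ("fq", f"{{!tag={tag}}}{field}:{selected_value}"),
        # Count this facet as if the tagged filter were not applied.
        ("facet.field", f"{{!ex={tag}}}{field}"),
    ]


params = multiselect_facet_params("whales", "format", "Book")
# Hypothetical local Solr instance.
print("http://localhost:8983/solr/select?" + urlencode(params))
```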

Lucid Imagination: they do support and training for Solr.

TODO: Look for screencast Erik Hatcher did showing off Solr and how to get it working.

Chris Shoemaker, FreeCite: An Open Source Free-text Citation Parser


API URL: POST to http://freecite.library.brown.edu/citations/create

JSON responses implemented today. Give FreeCite an unformatted citation and it will give you back a formatted citation. Nice piece of work ... except it gives back almost-OpenURL?
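
A sketch of what a call might look like. The URL is the one above; the "citation" form parameter and the Accept header for JSON are my assumptions about the API, not something confirmed in my notes. The request is built but not sent.

```python
import urllib.parse
import urllib.request

FREECITE_URL = "http://freecite.library.brown.edu/citations/create"


def build_freecite_request(citation):
    """Build (but do not send) a POST carrying one unformatted citation."""
    # "citation" as the form field name is an assumption.
    body = urllib.parse.urlencode({"citation": citation}).encode("utf-8")
    return urllib.request.Request(
        FREECITE_URL,
        data=body,
        method="POST",
        headers={"Accept": "application/json"},  # assumed way to ask for JSON
    )


request = build_freecite_request("Melville, H. Moby Dick. 1851.")
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request)
```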

Richard Wallis, Squeezing More from the OPAC

People may like facets and relevance, but they also want links to Amazon and Google Books and other places outside the catalogue.

JUICE: Showed injecting Javascript into pages on the fly, in the browser, so that when looking at something in a catalogue, it added links to WorldCat/Google Books/LibraryThing/Open Library/etc., either as text or as images.


Inheriting CSS of the page means things look the same.

Just a matter of pasting in a bit of Javascript, built on a template, looks pretty easy to build.

Sean Hannan, Freebasing for Fun and Profit

Intro to Freebase. They slurp up info from Wikipedia, Musicbrainz, etc., plus people can also edit it.

REST API: http://api.freebase.com/

Example: embeddable snippet of code that links Academy Awards data to entries in your own catalogue. Hmm! I could use this.

Example code: http://code4lib.mrdys.user.dev.freebaseapps.com/

Lightning Talks

Lots of good talks here. I was a bit distracted through some of them because I was getting people to sign the release forms and was keeping track of who had signed what.

Chris Fitzpatrick: thinkbase? Check it out.

Thursday 26 February 2009

Ian Davis, If You Love Something ... Set It Free

The Semantic Web has fundamentally changed how people, computers and information interact, not because of the semantics, but because of the web and how easy it now is to connect information.

Conjecture 1: Data outlasts code. Therefore open data is more important than open source.

Conjecture 2: There is more structured data in the world than unstructured.

Conjecture 3: Most of the value in our data will be unexpected and unintended. Therefore we should engineer for serendipity.

"The goal is not to build a web of data. The goal is to enrich lives through access to information."

"Technology grows exponentially, but society adapts linearly."

Warning: We should exchange and share information, but we must be careful that there are people involved, and some data should be kept private and protected. But most of it doesn't need to be.

Ed Corrado, The Open Platform Strategy: What It Means for Library Developers

Marshall Breeding: TCO of open source is roughly equivalent to commercial. But a big ARL found that open source software would cost 30% (?) more than commercial over a few years (cite: Grant?).

Talked about Ex Libris's Open Platform program.

Adam Soroka, A Modern Open Webservice-Based GIS Infrastructure

Chris Beer and Courtney Michael, Visualizing Media Archives

Cool demo built on the conference FOAF data: http://ratherinsane.com/~chris/c4l09/index2.php

Lightning Talks

Christopher Morgan, http://bookgenius.org/beta

Shows visual structure of subject headings. Try "evolution." He likes Thomas Mann's book on library research and emphasis on use of LCSH.

Ross Shanley-Roberts, Extracting Data from III with Expect

Rosalyn Metz

Showed how easy it is to set up a server in Amazon's EC2. Really only takes a few minutes! Not cheap, though. Great talk!

Heikki Levanto

Another Index Data guy talking about their stuff. This guy talked about regexes and parsing web pages.

Mikkel Erlandsen, Summa

Jonathan Rochkind, Umlaut

Richard Wallis, Open Catalogue Crawling Protocol

Xiaoming Liu, checking links and URIs

TODO: look at Firefox plugins: LinkChecker, LinkEvaluator.

TODO: look at command-line: linkchecker
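
In the meantime, a bare-bones link check is easy enough to sketch. This is my own rough version, not how those plugins or linkchecker work and not what was presented: it pulls href values out of a page and sends a HEAD request to each. Treating any status of 400 or above (or no response at all) as broken is my simplification, and some servers answer HEAD differently from GET.

```python
from html.parser import HTMLParser
import urllib.error
import urllib.request


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


def head_status(url, timeout=10):
    """Return the HTTP status of a HEAD request, or None if unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def broken_links(html, fetch_status=head_status):
    """Return the links whose status is missing or 400 and above."""
    broken = []
    for url in extract_links(html):
        status = fetch_status(url)
        if status is None or status >= 400:
            broken.append(url)
    return broken
```

Passing a different fetch_status function makes it possible to try the logic without touching the network.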