Miskatonic University Press

At Their Fingertips

22 June 2009 talks

My colleague Sarah Coysh, e-learning librarian at York University Libraries, and I gave a talk at the 2009 Canadian Library Association Conference on 1 June in Montreal: At Their Fingertips: Customized Library Resources in Course Management Systems and Student Portals. Slides (with notes) and audio (I put my MP3 player on the lectern) are available. If you’re interested in seeing screenshots of another university’s Moodle and student portal then have a look.


Video of What We Talk About When We Talk About FRBR

22 May 2009 talks frbr

In late February Jodi Schneider and I did a talk at Code4Lib 2009, and I posted the slides on The FRBR Blog: What We Talk About When We Talk About FRBR.

The video of the talk is now online!

Thanks to Talis and Karen Schneider and the Code4Lib organizers and Brown University for doing all the work.

UPDATE 22 June 2009: The video is also up at the Internet Archive, along with the rest of the C4L 2009 talks.


Stoic podcasts and interviews

15 May 2009 stoicism


"It would perhaps increase my acquaintance, the thing which I chiefly study to decline."

03 May 2009 quotes

Quote from a letter from Sir Isaac Newton to one Collins, 18 February 1669/70:

The solution of the annuity problem, if it will be of any use, you have my leave to insert it into the Philosophical Transactions, so it be without my name to it. For I see not what there is desirable in public esteem, were I able to acquire and maintain it. It would perhaps increase my acquaintance, the thing which I chiefly study to decline.

Found when I flipped through Correspondence of Scientific Men of the Seventeenth Century, edited by Stephen Jordan Rigaud (with an index by Augustus de Morgan!).


Footnotes in The Spellman Files

03 May 2009 footnotes

I added Lisa Lutz’s The Spellman Files to the list of fictional footnotes. You can download the footnotes:

For those of you on the Kindle, or eReaders, you’ll be happy to know that we’re now making available the footnotes for all the books in a downloadable PDF (which you can use for quick reference): Download Footnotes for The Spellman Files. [PDF]

Are the footnotes not available in e-book versions of the novel? Is this generally true? I’ve never seen a Kindle, much less examined how it handles footnotes.


isbn2marc is back

28 April 2009 code

My Ruby script isbn2marc is back. I thought I needed to use the advice in Configuring .htaccess to ignore specific subfolders but it turned out the Options Indexes line in /src/.htaccess was choking things and causing permissions errors. Beats me why, but at least it’s fixed. Sorry about the disruption.


Code4Lib 2009 notes

22 April 2009 code4lib

I went to Code4Lib2009 in late February and had a fantastic time, not just because Jodi Schneider and I gave a talk called What We Talk About When We Talk About FRBR. The whole conference was a blast and it was great to hang out with people I knew and meet new people.

Ed Summers posted his conference notes, Jonathan Rochkind doesn’t like the Code4Lib award idea Eric Morgan put forward, Karen Coombs posted notes, Jay Luker did a cool IRC channel timeline, Terry Reese did ten things to take away from it, and Jon Phipps wasn’t enchanted with it all.

Here are my notes, pretty much raw. I didn’t take notes on the linked data pre-conference, the morning of which was excellent. The afternoon got a little too loose and I got distracted with some annoying Ruby on Rails problem.

Tuesday 24 February 2009

Roy Tennant: Introduction

Mark Matienzo: How to Meet People and Have Fun at Code4LibCon

Stefano Mazzocchi: A Bookless Future for the Libraries?

TODO: Listen to Jon Udell interview

Interesting talk. Began by talking about how marginal costs of communication have been dropping over history, from cave drawings to clay tablets to the printing press to electronic publishing.

Pros to this, but cons now, too, like a “degraded consumption experience”: low screen resolution, batteries required, poor network access.

Business models are being disrupted. Institutions like libraries are being disrupted. Almost-zero marginal costs are here to stay.

Libraries vs museums of books. Non-unique books can all go online in electronic versions. (Unique books still require special attention.)

No more shelves. Nearly infinite storage space. (He’s getting into a lot of “the book is dead” stuff—tide turning against him in IRC.)

Do we still need metadata? (Shouts of “yes!”). Does metadata have to be made by people or can it be done by computers with statistical analysis?

Information is fragmented. Can the library mindset still work across a spectrum of such fragmented information, hyperlinks, journals, networks? Networks of relational assertions = the web of data.

He asked a few questions (“who’s the youngest 2008 Academy Award winner”) that were basically trivia questions. Where would we find the answers? His answer: Freebase. He found the answer to that question easily enough, but his talk ran into trouble on the next one when he tried to show how Freebase made the answer easy to find and the whole thing failed due to network slowness. He lost some of the audience here.

Showed FMDB, a granular linked-data version of IMDB data. Showed a translator. Showed http://typewriter.freebaseapps.com/, where they crowdsource specifying unspecified data. Showed the Genderizer (!?).

[A few days later Mazzocchi posted Post-Mortem of a Dissonant Keynote: he went into the #code4lib channel logs to see what people were saying while he was talking. That took guts. Great post.]

Anders Soderback: Why Libraries Should Embrace Linked Data

(I talked to Anders for a while after this. Very nice fellow, and Libris is a great piece of work.)

Libris, the Swedish national union catalogue.

Less technical version of the talk he gave the day before. Martin Malmsten was supposed to be here but couldn’t make it. He talked about the web, that it’s social, that it’s a network, etc. He spent a while getting to the actual catalogue. When he did, he showed the basics of it: the RDF version, the frbr-related, different representations, etc.

Their blog: http://blog.libris.kb.se/semweb/

Ross Singer, Like a Can Opener Through Your Data Silo: Simple Access Through AtomPub and Jangle

http://jangle.org/

Problems with library systems APIs: not very good. OAI-PMH etc. Atom + REST = AtomPub, an Atom publishing model. Ed Summers pointed out a couple of years ago how well this would work. Google, Microsoft, IBM, WordPress, Drupal, Movable Type, etc., all use it.

Jangle applies a common data model to library information through AtomPub. Four kinds of entities: resources, items, actors (users etc.), collections.

Jangle vocabulary: http://jangle.org/vocab/

Ross is building connectors to various systems so that, ideally, it doesn’t matter what kind of ILS you’re using: you can always talk to it with Jangle.

Godmar Back complimented the idea and the importance of interoperability.

Glen Newton, LuSQL: (Quickly and Easily) Getting Your Data from your DBMS into Lucene

Glen’s from CISTI in the Digital Library Research Group. They have 8.5 million full-text articles and want to index them all to make them easy to search. They use Lucene but not Solr; they use LuSQL. He showed how to do some queries with LuSQL, talked about how the indexing was done by Lucene, showed some output.

Terence Ingram, RESTafarian-ism at the National Library of Australia

My attention was caught when he said they used VuFind and asked how many people here used it. Laughs when Andrew Nagy stuck up his hand. Ingram said they just needed something so had a quick look around and chose VuFind. Very casual. All those people who look to the NLA as a guide in their own choice of VuFind will be surprised!

Birkin James Diana, The Dashboard Initiative

http://library.brown.edu/dashboard/info/

Showed some widgets that make a dashboard for showing all of the activity going on at a library, with widgets for circ, ILL activity, other kinds of things. Has some nice visualizations in there using some kind of Google visualizer. People in channel also mentioned Flot. The code is available for download. Open source!

Lunch

Ed Summers and Michael Giarlo, Open Up Your Repository with a SWORD

http://swordapp.org/

“SWORD is a lightweight protocol for depositing content from one location to another. It stands for Simple Web-service Offering Repository Deposit and is a profile of the Atom Publishing Protocol (known as APP or ATOMPUB).” Lets you deposit into repositories without worrying exactly what kind of repository it is: Fedora, DSpace, whatever. Ed explained it and Mike showed some examples. Looks easy to use and very useful if you need to do what it does.

Mark Matienzo, How I Failed to Present on Using DVCS for Archival Metadata

He proposed to talk about how DVCS would work for archival metadata but found it was too complicated. Slide: “I failed, epically.” He looked at bzr and git but then went with Mercurial. Problem: It works on line diffs (for code) but EAD is in XML. How do you diff the XML? There are various tools, but in the end the whole thing turned into a complicated mess.
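One partial workaround for the line-diff problem (my own sketch, not something from the talk) is to normalize the XML before committing it, so that attribute order and incidental whitespace stop showing up as changes. This assumes Python 3.9 or later for ET.canonicalize and ET.indent, and it treats whitespace between elements as insignificant, which is not safe for the mixed content that real EAD files contain. The element names in the sample are made up for illustration.

```python
import difflib
import xml.etree.ElementTree as ET


def normalize_xml(xml_text: str) -> list[str]:
    """Return a canonical, one-element-per-line rendering of an XML document."""
    # C14N puts attributes in a fixed order; strip_text drops leading and
    # trailing whitespace in text nodes (ASSUMPTION: it is not significant).
    canonical = ET.canonicalize(xml_data=xml_text, strip_text=True)
    root = ET.fromstring(canonical)
    ET.indent(root, space="  ")  # pretty-print so each element gets its own line
    return ET.tostring(root, encoding="unicode").splitlines()


def xml_diff(old: str, new: str) -> list[str]:
    """Line diff of two XML documents after normalization."""
    return list(
        difflib.unified_diff(
            normalize_xml(old), normalize_xml(new), "old", "new", lineterm=""
        )
    )


# Hypothetical sample: same record, different layout and attribute order,
# with one real change (the date).
old_doc = '<ead><title  type="main" id="t1">Papers</title><date>1900</date></ead>'
new_doc = '<ead>\n  <title id="t1" type="main">Papers</title>\n  <date>1901</date>\n</ead>'

for line in xml_diff(old_doc, new_doc):
    print(line)
```

With both versions normalized the same way, only the changed date element shows up in the diff. It does nothing for the harder cases, like elements being moved or re-nested, where a tree-aware diff tool is still needed.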

Godmar Back, LibX 2.0

LibX

Summarized LibX 1.0. A toolbar is great, but what about emerging technology trends (mashups, SOA) and educational trends (online tutorials, social tagging, visualizations)?

In 2.0, librarians can create Libapps (?) and users can add them into pages where they want. LibX runs/houses them. Libapps are made from reusable modules.

How to make a Libapp? A module is JS plus a metadata description, a Libapp is a group of modules, and a Package is a folder of Libapps.

Modules: Named at a URL, published with AtomPub. They use tuples in JSON. { isbn: "074322670" }

Example: user goes to ACM site. There’s a LibX button. Click on it and a YouTube video shows up, with person giving some help! E.g. Annette Bailey.

He showed the code to implement that, just a bit of JS. Simple. Some other bits of code needed, but all pretty easy.

LibX 2.0: been rewritten in full browser-independent OO. about:libx gives built-in documentation. They have unit tests and it’s all hot updatable (?).

Roles in the LibX 2.0 world: developers, adapters, user community.

LibX community repository will be built, to hold modules, packages, etc. Coming over the next two years.

Gradual transition from LibX 1.5, not a huge major release.

Kevin Clarke and John Fereira, Djatoka for Djummies

Kevin gave an overview of Djatoka, then John did some demos.

Could we use this for the big maps that are in Maps? Instead of Zoomify? Lots of overhead, though—runs on Java, requires Tomcat, etc. But worth a look.

Breakout sessions

LibX 2.0, Zope3/Grok/Plone, Fedora, Solr, Jangle. I went upstairs and napped.

Lightning Talks

David Lindahl, XC

Quick overview of XC, but he lost me by showing a couple of short cartoons while he was talking, which distracted me.

Casey Bisson, Scriblio

The internal data model is improved and now you can do original cataloguing in it. It’s a digital library out of the box now, too, and people are using it for that. Some kind of tie-in to LibraryThing’s Common Knowledge, too.

Mark Matienzo, enjoysthin.gs

Showed this new social bookmarking site.

Emily Lynema, E-Matrix

ERM she’s working on at NCSU. FRBRy data model. E-resources can be “narrowly related” or “core” to a subject/discipline. Interesting! Gradations of relevancy to a program or subject. List them on subject guides.

Eric Lease Morgan, Alex4

http://infomotions.com/sandbox/alex4/

Goal: facilitate a person’s ongoing liberal arts education.

Geoffrey Bilder, “Cool URIs Must Die.”

He’s from CrossRef. They’re fighting linkrot. “Persistence isn’t a technical issue, it’s a social issue.”

John Law, Summon

http://www.serialssolutions.com/summon/. Searches everything a library has: physical, virtual, etc. One search box.

Erik Hatcher, LucidFind

http://www.lucidimagination.com/search/

They’re searching everything on lucene.apache.org: mailing lists, bugs, etc.

Mike Taylor (Index Data), Making Distributed Configuration Simple with the Torus

Metasearch engine: http://indexdata.com/pazpar2/

They’ve done a way of simply specifying exactly where you want to search and what fields you want to show in the results. Translucent Record Store = Torus

Jakub Skoczen, also from Index Data, on how Torus is implemented

Michael Klein and Jonathan Brinley on zoia’s FOAF support

Fun!

Andy Ashton, Biblio

He’s at the Scholarly Technology Group at Brown. One of many projects called Biblio. Bibliography project.

Naomi Dushay, VuFind at Stanford

http://searchworks.stanford.edu/

Related results. Showed FROGS OF AUSTRALIA page. OK. LITTLE BEAR, good. ASSAULT WITH A DEADLY DONUT: bad!

searchworks-test.stanford.edu: They did a shelf-browse across all of the libraries, including storage. They show it in a right nav, which is a good idea. Should use this.

Mike Beccaria, Zoom Zoom Zoom

Paul Smith’s College: paulsmiths.edu. They’ve scanned in old yearbooks and put them in ContentDM. Didn’t like it.

Showed Microsoft’s Deep Zoom. Looked cool. Worth checking—for Maps? He put a whole yearbook into one image and then put it into the system, so you can zoom from a full overview right into something very small.

Photosynth: took pictures of the stacks and showed how you can move around them and zoom in on spines. Could link to the catalogue?

Dan Chudnov, BagIt

http://digitalpreservation.gov/ uses BagIt.

Wednesday 25 February 2009

Sebastian Hammer, Index Data

Talked about Index Data and their history. Got into books and libraries in general. Whither libraries, with Google? They’re digitizing things better than we are. Libraries could be swallowed by whipper-snapper technology.

Why (Local) Libraries: bearers and preservers of cultural heritage; conveyers of authoritative information; supporters of learning and research; pillars of democracy

“Even if we end up dying, we can’t go quietly.”

Libraries need to stay local but also come together: consortia, a group of libraries working together on web sites, turn into a “super-robot.”

Quoted Lorcan Dempsey on “stitching costs.” Most so-called APIs are loyalty schemes rather than interoperability devices. Standardization is hard/boring but essential for collaboration. Need more collaboration and working together, while keeping our business models. Systems and organizations need to surrender our data freely. Library hackers must become advocates within their orgs.

“MARC is a joke. Z39.50 is for old people. SRU is dumb. NCIP … forget about it.”

Tim McGeary, OLE Project: A New Frontier

OLE is “working to redefine the business processes for libraries.”

They want to be “flexible, adaptable, community-developed,” they want “improvement beyond the ILS.” Format and resource agnostic. No need for a separate ERM. This is not just an ILS replacement: special collections, video, DRM, everything. They want a service-oriented architecture.

Community-based software development and governance. Raise the Library system to the enterprise level. Complement human interaction.

He reviewed the project timeline. Ah, project timelines.

They’re using the National Library of Australia Services Framework. (What is this?)

In March: OLE core services will be defined. “Final design document” to be published in July, and then another group will do stuff.

Bess Sadler, Blacklight

http://blacklightopac.org/, http://blacklightrubyforge.org/

Give up on the idea that we can do a single interface that will work well for everyone. Different kinds of students have different needs.

Top question at music ref desk: I play violin, my boyfriend plays piano. What can we play together? Their old system wasn’t indexing this information. Not everyone would want to search for that, but music students would. Should let them do it. Another q: can I find things by era?

They have a music portal: new books on home page are all music books.

Special behaviour for special objects: any musical recording, they go to Musicbrainz, get the unique ID, then go to other services to get metadata that key on that ID.

“College students are broke and they like music. If we are spending lots of money on music and not letting them know about it, that’s a fail.”

Blacklight runs on Rails. Uses Rails Engines so it’s easier to keep up with the code base but make local changes.

TODO: marc4j is now much better … look it up.

TODO: Install Blacklight and try it out. Look at Rails code.

Joshua Ferraro, biblios.net

He’s from LibLime. Explained about their business. Open Library, Open Data Commons License.

Look at: extJS

“biblios.net is LibLime’s free browser-based cataloguing service.” 35 million freely-licensed records.

TODO: Make OpenFRBR query biblios.net (requires biblios.net account)

Look at: bws.biblios.net: APIs that let you interact with the database

He downloaded a MARCXML record from Blacklight and loaded it up into Biblios, using curl at the command line.

They’re working with ONIX records, too, but keeping them separate from MARC because they’re made with different rules.

Authority files are available. Useful?

This is a great project. York should be uploading everything there.

Chris Catalfa, Adding Functionality to Biblios with CouchDB

UI: extjs.com + jquery.org

1. Implementing an editor for Dublin Core. He showed a bit of XML that constructs an editor. (Faint code, couldn’t read.)

2. Then they use CouchDB on the backend.

He was showing a lot of code snippets and a lot of unreadable XML.

Toke Eskildsen, Complete Faceting

Summa

Lunch

Erik Hatcher, The Rising Sun: Making the Most of Solr Power

10.
9. Performance
8. Memory
7. Query parser: default is to show the raw query parser
6. Data import: bring data into Solr with Data Import Handler, Solr Cell, CSV, LuSQL, APIs.
5. Request handlers: leverage Solr’s configurability
4. Solr and IR toolkit
3. LocalSolr
2. Faceting: can do multi-select faceting
1. The interface is the app. (Solritas, Velocity)
0. Community

Lucid Imagination: they do support and training for Solr.

TODO: Look for screencast Erik Hatcher did showing off Solr and how to get it working.

Chris Shoemaker, FreeCite: An Open Source Free-text Citation Parser

http://freecite.library.brown.edu/

API URL: POST to http://freecite.library.brown.edu/citations/create

JSON responses implemented today. Give FreeCite an unformatted citation and it will give you back a formatted citation. Nice piece of work … except it gives back almost-OpenURL?
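Here is a rough sketch of what calling that endpoint from Python might look like. Only the URL and the use of POST come from my notes; the form field name citation and the use of an Accept header to ask for JSON are assumptions on my part, so check the FreeCite documentation before relying on them. The function only builds the request and does not send anything.

```python
import urllib.parse
import urllib.request

# From the talk: the API endpoint to POST to.
FREECITE_URL = "http://freecite.library.brown.edu/citations/create"


def build_freecite_request(citation: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one raw citation."""
    # ASSUMPTION: the form field is called "citation".
    body = urllib.parse.urlencode({"citation": citation}).encode("ascii")
    return urllib.request.Request(
        FREECITE_URL,
        data=body,
        # ASSUMPTION: JSON output is selected with an Accept header.
        headers={"Accept": "application/json"},
        method="POST",
    )


request = build_freecite_request(
    "Rigaud, Stephen Jordan, ed. Correspondence of Scientific Men "
    "of the Seventeenth Century."
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request).read()
```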

Richard Wallis, Squeezing More from the OPAC

People may like facets and relevance, but they also want links to Amazon and Google Books and other places outside the catalogue.

JUICE: Showed injecting Javascript into pages on the fly, in the browser, so that when looking at something in a catalogue, it added links to WorldCat/Google Books/LibraryThing/Open Library/etc., either as text or as images.

liblists.sussex.ac.uk/lists/v3029.html

Inheriting CSS of the page means things look the same.

Just a matter of pasting in a bit of Javascript, built on a template, looks pretty easy to build.

Sean Hannan, Freebasing for Fun and Profit

Intro to Freebase. They slurp up info from Wikipedia, Musicbrainz, etc., plus people can also edit it.

REST API: http://api.freebase.com/

Example: embeddable snippet of code that links Academy Awards data to entries in own catalogue. Hmm! I could use this.

Example code: http://code4lib.mrdys.user.dev.freebaseapps.com/

Lightning Talks

Lots of good talks here. I was a bit distracted through some of them because I was getting people to sign the release forms and was keeping track of who had signed what.

Chris Fitzpatrick: thinkbase? Check it out.

Thursday 26 February 2009

Ian Davis, If You Love Something … Set It Free

The Semantic Web has fundamentally changed how people, computers and information interact, not because of the semantics, but because of the web and how easy it now is to connect information.

Conjecture 1: Data outlasts code. Therefore open data is more important than open source.

Conjecture 2: There is more structured data in the world than unstructured.

Conjecture 3: Most of the value in our data will be unexpected and unintended. Therefore we should engineer for serendipity.

“The goal is not to build a web of data. The goal is to enrich lives through access to information.”

“Technology grows exponentially, but society adapts linearly.”

Warning: We should exchange and share information, but we must be careful that there are people involved, and some data should be kept private and protected. But most of it doesn’t need to be.

Ed Corrado, The Open Platform Strategy: What It Means for Library Developers

Marshall Breeding: TCO of open source is roughly equivalent to commercial. But a big ARL found that open source software would cost 30% (?) more than commercial over a few years (cite: Grant?).

Talked about Ex Libris’s Open Platform program.

Adam Soroka, A Modern Open Webservice-Based GIS Infrastructure

Chris Beer and Courtney Michael, Visualizing Media Archives

Cool demo built on the conference FOAF data: http://ratherinsane.com/~chris/c4l09/index2.php

Lightning Talks

Christopher Morgan, http://bookgenius.org/beta

Shows visual structure of subject headings. Try “evolution.” He likes Thomas Mann’s book on library research and emphasis on use of LCSH.

Ross Shanley-Roberts, Extracting Data from III with Expect

Rosalyn Metz

Showed how easy it is to set up a server in Amazon’s EC2. Really only takes a few minutes! Not cheap, though. Great talk!

Heikki Levanto

Another Index Data guy talking about their stuff. This guy talked about regexes and parsing web pages.

Mikkel Erlandsen, Summa

Jonathan Rochkind, Umlaut

Richard Wallis, Open Catalogue Crawling Protocol

Xiaoming Liu, checking links and URIs

TODO: look at Firefox plugins: LinkChecker, LinkEvaluator.

TODO: look at command-line: linkchecker


How I use Emacs for Getting Things Done

21 April 2009 emacs gtd

I use the Getting Things Done system to keep track of what I’m doing. It works very well for me. My personal stuff I keep track of on paper in a Filofax, but I have a lot more detail to track at work at York University, so I use text files. Here’s my system.

The files

I have three files to manage what I’m doing, plus a monthly work diary:

  • next-actions.outline.txt
  • waiting-for.outline.txt
  • projects.outline.txt
  • work-diary-200904.outline.txt, in which I jot down notes about what I did that day and what’s on my mind.

(I’ll explain about Emacs and outline mode below.)

git to manage the files

I use the distributed version control system git to manage these files. First, I set up a basic repository on a Unix host where I do my personal e-mail.

$ mkdir -p york/gtd
$ cd york/gtd
$ git init
Initialized empty Git repository in /home/buff/york/gtd/.git/

Here I edited next-actions.outline.txt. More on its format below. Right now it only matters that the file exists.

$ git status
# On branch master
#
# Initial commit
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       next-actions.outline.txt
nothing added to commit but untracked files present (use "git add" to track)

$ git add next-actions.outline.txt
$ git commit -m 'getting started'
[master (root-commit)]: created e98f811: "getting started"
 1 files changed, 4 insertions(+), 0 deletions(-)
 create mode 100644 next-actions.outline.txt
$ git log
commit e98f811b6b67ffd354ff33ef5df3da872a8e7059
Author: William Denton <wtd@pobox.com>
Date:   Tue Apr 21 21:12:44 2009 -0400

    getting started

Now I have a git repository with one file sitting on a Unix server I can get to from anywhere: work, home, anywhere with an Internet connection. I make a copy of it on my home machine:

$ cd york
$ git clone picketfence:york/gtd/
Initialized empty Git repository in /home/buff/york/gtd/.git/
remote: Counting objects: 3, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (3/3), 268 bytes, done.

I already had a york directory where I kept stuff. I was able to specify the server and file path with just picketfence:york/gtd because I already have ssh set up to save me time in ~/.ssh/config:

Host picketfence
Hostname picketfence.server.com
User buff

This lets me just say ssh picketfence and I connect. I’ve got my keys set up so no password is required, either.

Back to cloning a local copy of the repository.

$ cd gtd
$ ls -l
total 2
-rw-r--r--  1 buff  wheel  44 21 Apr 21:13 next-actions.outline.txt

Here I can edit next-actions.outline.txt again and add a task. When I’m done, I do this:

$ git commit -m 'update' next-actions.outline.txt
[master]: created 15f2969: "update"
 1 files changed, 1 insertions(+), 0 deletions(-)
$ git status
# On branch master
# Your branch is ahead of 'origin/master' by 1 commit.
#
nothing to commit (working directory clean)
$ git push
Counting objects: 5, done.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 326 bytes, done.
Total 3 (delta 0), reused 0 (delta 0)
warning: updating the currently checked out branch; this may cause confusion,
as the index and working tree do not reflect changes that are now in HEAD.
To picketfence:york/gtd
   e98f811..15f2969  master -> master

The next day at work I did the same thing and made a local copy of the repository there. I made my lists and so on, and at the end of the day I committed all of the files. At home in the evening, I ran git pull and it downloaded all of the changes. I could edit them, do a git push, and then the next day do another git pull first thing at work.

Now I have an easy way of keeping my GTD files in synch across various machines. They’re not in the cloud so I can work on them without Internet access.
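The daily cycle described above (pull first thing, commit and push at the end of the day) could be wrapped up in a couple of small functions. This is only a sketch: the function names and the run parameter are hypothetical, and it assumes git is on the PATH and that the repository was cloned as shown earlier. Note that git commit -a only picks up files git already tracks, so a new file still needs a git add first.

```python
import subprocess


def start_of_day(repo, run=subprocess.run):
    """Fetch and merge whatever was pushed from the other machine."""
    run(["git", "pull"], cwd=repo, check=True)


def end_of_day(repo, message="update", run=subprocess.run):
    """Commit changes to tracked files, then push them to the server."""
    # git commit exits non-zero when there is nothing to commit, so that
    # step is allowed to fail; a failed push should stop us, though.
    run(["git", "commit", "-a", "-m", message], cwd=repo, check=False)
    run(["git", "push"], cwd=repo, check=True)


# Typical use (paths are examples only):
#   start_of_day("/home/buff/york/gtd")
#   ... edit the outline files ...
#   end_of_day("/home/buff/york/gtd")
```

Passing in run makes it possible to try the functions with a stand-in that just records the commands instead of executing them.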

Emacs and outline mode

I’m a Unix-loving geek, so of course I keep my text in text files. To manage the GTD files I settled on outline mode, which is built into Emacs.

That page explains what outline mode is. Here’s an example of mine:

* E-mail
** Catherine: accurate collection stats for Wikipedia entry
** LCC: will be away for next meeting
** Peter R: is Joomla in use anywhere at York?
If so, could I get a test account to see what it's like?

To prevent having to run M-x outline-mode every time I open one of these files, I added this to .emacs:

;; outline-mode
(add-to-list 'auto-mode-alist '("\\.outline\\.txt\\'" . outline-mode))
(add-hook 'outline-mode-hook 'hide-body)

Now when I open next-actions.outline.txt Emacs automatically goes into outline mode and hides the bodies of all the entries so I just see the tasks.
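Because the format is just asterisks and plain text, it is also easy to read outside Emacs. As an illustration, here is a minimal Python sketch (needs Python 3.9 or later; the names Entry, parse_outline and visible_lines are my own) that splits an outline into headings and bodies and shows roughly what is left on screen after hide-body. It only handles headings that start with asterisks followed by a space, which is how I write mine.

```python
import re
from typing import NamedTuple

# A heading is one or more asterisks, whitespace, then the heading text.
HEADING = re.compile(r"^(\*+)\s+(.*)$")


class Entry(NamedTuple):
    level: int        # number of asterisks
    heading: str      # heading text without the asterisks
    body: list[str]   # lines under the heading, up to the next heading


def parse_outline(text: str) -> list[Entry]:
    """Split outline text into entries; lines before the first heading are dropped."""
    entries: list[Entry] = []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            entries.append(Entry(len(match.group(1)), match.group(2), []))
        elif entries:
            entries[-1].body.append(line)
    return entries


def visible_lines(text: str) -> list[str]:
    """Heading lines only, roughly what hide-body leaves visible."""
    return ["*" * e.level + " " + e.heading for e in parse_outline(text)]


sample = """* E-mail
** Catherine: accurate collection stats for Wikipedia entry
** LCC: will be away for next meeting
** Peter R: is Joomla in use anywhere at York?
If so, could I get a test account to see what it's like?
"""

for line in visible_lines(sample):
    print(line)
```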

It works for me

This system works really well for me. When I’m jotting down notes on what I did that day, if I remember I need to do something (e-mail someone, read something, fix something, whatever) I can switch buffers to the next actions list and put it down there. If I have a new project on the go, I copy notes to the projects list. If I have a next action of e-mailing someone, when I’ve e-mailed them I just copy the line to the waiting for list and add the date I sent the e-mail.
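I do that last step by hand in Emacs, but it is mechanical enough to sketch in code. The function below is hypothetical, not part of my setup, and it makes a few assumptions: it removes the entry from the next actions text rather than only copying it, it records the date as "(sent YYYY-MM-DD)", and it appends to the end of the waiting for text instead of filing the entry under a matching heading.

```python
from datetime import date


def move_to_waiting(next_actions: str, waiting_for: str, needle: str, sent: date):
    """Move the first heading containing `needle`, plus its body, to waiting-for.

    Returns the updated (next_actions, waiting_for) texts.
    """
    lines = next_actions.splitlines()
    for start, line in enumerate(lines):
        if line.startswith("*") and needle in line:
            break
    else:
        raise ValueError(f"no next action matching {needle!r}")

    # The body is every following line up to the next heading.
    end = start + 1
    while end < len(lines) and not lines[end].startswith("*"):
        end += 1

    # ASSUMPTION: the sent date is appended to the heading line itself.
    moved = [f"{lines[start]} (sent {sent.isoformat()})"] + lines[start + 1:end]
    remaining = lines[:start] + lines[end:]
    updated_waiting = waiting_for.splitlines() + moved
    return "\n".join(remaining) + "\n", "\n".join(updated_waiting) + "\n"


next_text = (
    "* E-mail\n"
    "** LCC: will be away for next meeting\n"
    "** Peter R: is Joomla in use anywhere at York?\n"
    "If so, could I get a test account to see what it's like?\n"
)
new_next, new_waiting = move_to_waiting(
    next_text, "* E-mail\n", "Peter R", date(2009, 4, 21)
)
print(new_waiting, end="")
```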

Plain text, Emacs to edit it, outline mode to give it some structure, and git so I always have a current copy of the files I need. I’m really happy with this.