Miskatonic University Press

MARC magic for file


I was chuffed to see Kevin Ford report that the Unix file utility now recognizes MARC records:

$ file 101015_001.mp3
101015_001.mp3: Audio file with ID3 version 2.3.0, contains: MPEG ADTS, layer III, v1, 192 kbps, 44.1 kHz, Stereo
$ file my-cats.jpg
my-cats.jpg: JPEG image data, JFIF standard 1.02
$ file OL.20100104.01.mrc
OL.20100104.01.mrc: MARC21 Bibliographic

If you download the source code for file and look at magic/Magdir/marc21, you’ll see what makes it work. Each file type is identified by its “magic”: tests for telltale bytes at known offsets in the file. Here are the tests for MARC 21:

# marc21: file(1) magic for MARC 21 Format
# Kevin Ford (kefo@loc.gov)
# MARC21 formats are for the representation and communication
# of bibliographic and related information in machine-readable
# form.  For more info, see http://www.loc.gov/marc/

# leader position 20-21 must be 45
20      string  45

# leader starts with 5 digits, followed by codes specific to MARC format
>0      regex/1 (^[0-9]{5})[acdnp][^bhlnqsu-z]  MARC21 Bibliographic
!:mime  application/marc
>0      regex/1 (^[0-9]{5})[acdnosx][z] MARC21 Authority
!:mime  application/marc
>0      regex/1 (^[0-9]{5})[cdn][uvxy]  MARC21 Holdings
!:mime  application/marc
>0      regex/1 (^[0-9]{5})[acdn][w]    MARC21 Classification
!:mime  application/marc
>0      regex/1 (^[0-9]{5})[cdn][q]     MARC21 Community
!:mime  application/marc

# leader position 22-23, should be "00" but is it?
>0      regex/1 (^.{22})([^0].|.[^0])   (non-conforming)
!:mime  application/marc
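To make the leader checks concrete, here is a rough Python sketch of what the marc21 magic tests for. It is an illustration under stated assumptions, not how file works internally: it looks only at the first 24 bytes (the MARC leader), requires "45" at positions 20-21, tries the five record-type patterns copied from the magic file, and flags the record if positions 22-23 are not "00", as the magic file's comment describes. The function name identify_marc and the sample leaders are hypothetical.

```python
import re
from typing import Optional

# Record-type patterns transcribed from the marc21 magic file. Each is matched
# at the start of the 24-character MARC leader:
# positions 0-4 = record length (digits), 5 = record status, 6 = type of record.
RECORD_TYPES = [
    ("MARC21 Bibliographic", re.compile(r"[0-9]{5}[acdnp][^bhlnqsu-z]")),
    ("MARC21 Authority", re.compile(r"[0-9]{5}[acdnosx]z")),
    ("MARC21 Holdings", re.compile(r"[0-9]{5}[cdn][uvxy]")),
    ("MARC21 Classification", re.compile(r"[0-9]{5}[acdn]w")),
    ("MARC21 Community", re.compile(r"[0-9]{5}[cdn]q")),
]


def identify_marc(data: bytes) -> Optional[str]:
    """Return a file(1)-style description of a MARC record, or None (hypothetical helper)."""
    if len(data) < 24:  # a MARC leader is always 24 characters
        return None
    # ASCII decoding with "replace" yields one character per byte,
    # so string offsets line up with byte offsets.
    leader = data[:24].decode("ascii", errors="replace")
    # Top-level test: leader positions 20-21 must be "45".
    if leader[20:22] != "45":
        return None
    # The type-of-record codes don't overlap, so at most one pattern matches.
    for name, pattern in RECORD_TYPES:
        if pattern.match(leader):
            break
    else:
        return None
    # Positions 22-23 should be "00"; flag the record if they are not.
    if leader[22:24] != "00":
        return name + " (non-conforming)"
    return name


print(identify_marc(b"00714cam a2200205 a 4500"))  # MARC21 Bibliographic
print(identify_marc(b"00000nz  a2200000n  4500"))  # MARC21 Authority
print(identify_marc(b"00714cam a2200205 a 4512"))  # MARC21 Bibliographic (non-conforming)
```

To try it on a real file, pass it the start of the record, e.g. open(path, "rb").read(24). One simplification: the sketch returns None whenever no record-type pattern matches, even if the leader passes the "45" test.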

It’s a small victory that a basic Unix/Linux utility can now recognize a key library file format, but as Kyle Banerjee put it on the Code4Lib mailing list, “I’m not sure whether to laugh or cry that it’s a sign of progress that a 40 year old utility designed to identify file types is now just beginning to be able to recognize a format that’s been around for almost 50 years.”