@sandsfish
Created February 12, 2013 18:54
R function to extract a list of values for a specific field/subfield from MARCXML
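```r
library(XML)

# Return a list of the values of every <subfield code="..."> inside every
# <datafield tag="..."> of every record in a MARCXML collection.
getMARCField <- function(marc_doc, tag, code) {
  xpath <- paste0("/m:collection/m:record/m:datafield[@tag='", tag,
                  "']/m:subfield[@code='", code, "']")
  # namespaces = c("m") binds the prefix "m" to the document's default namespace
  xpathApply(marc_doc, xpath, xmlValue, namespaces = c("m"))
}

vs <- xmlRoot(xmlParse("vail-first-3500.xml"))
field100a <- getMARCField(vs, "100", "a")
```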
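getMARCField returns one flat list for the whole collection, so records that lack the field simply contribute nothing and the values no longer line up record by record. Below is a minimal sketch of a per-record variant, assuming you want exactly one value per record (the first match, or NA). `getMARCFieldByRecord` is a hypothetical name and the two-record sample is made up for illustration; the prefix is bound explicitly to the standard MARCXML namespace URI.

```r
library(XML)

# Standard MARCXML namespace, bound explicitly to the prefix "m"
marc_ns <- c(m = "http://www.loc.gov/MARC21/slim")

# Hypothetical helper: the first tag/code value in each record, NA where absent
getMARCFieldByRecord <- function(marc_doc, tag, code) {
  records <- getNodeSet(marc_doc, "/m:collection/m:record", namespaces = marc_ns)
  # Relative path, evaluated with each <record> node as the context node
  xpath <- sprintf("./m:datafield[@tag='%s']/m:subfield[@code='%s']", tag, code)
  vapply(records, function(rec) {
    vals <- xpathSApply(rec, xpath, xmlValue, namespaces = marc_ns)
    if (length(vals) == 0) NA_character_ else vals[[1]]
  }, character(1), USE.NAMES = FALSE)
}

# Made-up two-record sample: only the first record has a 100$a
sample_xml <- '<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Doe, Jane.</subfield></datafield></record>
  <record><datafield tag="245" ind1="0" ind2="0"><subfield code="a">Untitled</subfield></datafield></record>
</collection>'
doc <- xmlParse(sample_xml, asText = TRUE)

getMARCFieldByRecord(doc, "100", "a")
```

Returning a character vector of fixed length makes it easy to combine several fields into a data.frame, one row per record.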
@sandsfish
Author

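After fighting with R's XPath support for a bit, I figured out how to bind a prefix to the document's default namespace, which it doesn't do automagically. XPath 1.0 has no notion of a default namespace: an unprefixed name like `collection` only matches elements in no namespace, while MARCXML elements live in `http://www.loc.gov/MARC21/slim`, so every step needs a prefix. Passing a single unnamed prefix maps it to the document's default namespace: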

```r
namespaces = c("m")
```
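For comparison, here is a small sketch on a made-up inline document showing the unnamed form used above next to an explicit named mapping to the MARCXML namespace URI. Both should select the same nodes; the explicit form avoids depending on which namespace the document happens to declare as its default.

```r
library(XML)

# Made-up MARCXML-style document whose elements sit in a default namespace
doc <- xmlParse('<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record/><record/></collection>', asText = TRUE)

# Unnamed prefix: "m" is mapped to the document's default namespace
n_default <- length(getNodeSet(doc, "/m:collection/m:record", namespaces = c("m")))

# Named prefix: "m" is mapped explicitly to the MARCXML namespace URI
n_explicit <- length(getNodeSet(doc, "/m:collection/m:record",
                                namespaces = c(m = "http://www.loc.gov/MARC21/slim")))

c(n_default, n_explicit)
```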

Also, using xmlParse turns out to be key to using XPath: it keeps the nodes as C-level (libxml2) objects, which is what XPath operates on. xmlTreeParse does not (unless useInternalNodes = TRUE is specified, which makes it equivalent to xmlParse), and is also much slower.

```r
vs = xmlRoot(xmlParse('vail-first-3500.xml'))
```
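A quick way to check which representation a parser gave you is to look at the document's class. This sketch uses a made-up inline string; xmlParse and xmlTreeParse with useInternalNodes = TRUE should both yield an internal (libxml2-backed) document that XPath functions accept, while the default xmlTreeParse result is an R-level tree.

```r
library(XML)

txt <- '<collection xmlns="http://www.loc.gov/MARC21/slim"><record/></collection>'

a <- xmlParse(txt, asText = TRUE)                              # internal (C-level) nodes
b <- xmlTreeParse(txt, asText = TRUE, useInternalNodes = TRUE) # same representation
r <- xmlTreeParse(txt, asText = TRUE)                          # default: R-level node list

# Only the internal documents can be queried with xpathApply()/getNodeSet()
c(xmlParse = inherits(a, "XMLInternalDocument"),
  internalTree = inherits(b, "XMLInternalDocument"),
  defaultTree = inherits(r, "XMLInternalDocument"))
```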
