metsrw API Documentation
METSDocument
- class metsrw.METSDocument[source]
Bases:
object
- all_files()[source]
Return a set of all FSEntrys in this METS document.
- Returns:
Set containing all FSEntry objects in this METS document, including descendants of those explicitly added.
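The "including descendants" behaviour can be pictured as a walk over the entry tree. The sketch below is not the library's implementation: Entry is a hypothetical stand-in for FSEntry, and collect_all is an illustrative helper name. It only shows how a set of explicitly added roots expands to cover every descendant.

```python
class Entry:
    """Hypothetical stand-in for an FSEntry: a label plus child entries."""

    def __init__(self, label, children=None):
        self.label = label
        self.children = list(children or [])


def collect_all(roots):
    """Return the set of all entries reachable from the given root entries."""
    found = set()
    stack = list(roots)
    while stack:
        entry = stack.pop()
        if entry in found:
            continue  # already visited; each entry is counted once
        found.add(entry)
        stack.extend(entry.children)
    return found


# A small tree: package/ contains objects/ (with one file) and a README.
readme = Entry("README.txt")
image = Entry("image.tif")
objects = Entry("objects", children=[image])
root = Entry("package", children=[objects, readme])

labels = sorted(entry.label for entry in collect_all([root]))
```

Only root is passed in, yet labels also lists objects, image.tif and README.txt, because they are reached through the children of root.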
- append(fs_entry)
Adds an FSEntry object to this METS document’s tree. Any of the represented object’s children will also be added to the document.
A given FSEntry object can only be included in a document once, and any attempt to add an object the second time will be ignored.
- Parameters:
fs_entry (metsrw.mets.FSEntry) – FSEntry to add to the METS document
- append_file(fs_entry)[source]
Adds an FSEntry object to this METS document’s tree. Any of the represented object’s children will also be added to the document.
A given FSEntry object can only be included in a document once, and any attempt to add an object the second time will be ignored.
- Parameters:
fs_entry (metsrw.mets.FSEntry) – FSEntry to add to the METS document
- classmethod fromfile(path)[source]
Creates a METS by parsing a file.
- Parameters:
path (str) – Path to a METS document.
- classmethod fromstring(string)[source]
Create a METS by parsing a string.
- Parameters:
string (str) – String containing a METS document.
- classmethod fromtree(tree)[source]
Create a METS from an ElementTree or Element.
- Parameters:
tree (ElementTree) – ElementTree to build a METS document from.
- get_file(**kwargs)[source]
Return the FSEntry that matches parameters.
- Parameters:
file_uuid (str) – UUID of the target FSEntry.
label (str) – structMap LABEL of the target FSEntry.
type (str) – structMap TYPE of the target FSEntry.
- Returns:
FSEntry
that matches parameters, or None.
- get_subsections_counts()[source]
Return a dictionary with the count of each of the following subsections: dmdSec, amdSec, techMD, rightsMD, digiprovMD and sourceMD.
- Returns:
Dict with subsections counts.
- classmethod read(source)[source]
Read source into a METSDocument instance. This is an instance constructor. The source may be a path to a METS file, a file-like object, or a string of XML.
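As an illustration of how one argument can stand for three kinds of input, the sketch below classifies a source value. It is an assumption about one reasonable dispatch order, not a description of what read() actually does, and classify_source is a hypothetical helper name.

```python
import io
import os


def classify_source(source):
    """Classify a read()-style argument as 'file-like', 'path' or 'xml-string'."""
    # Anything exposing .read() is treated as an open file or stream.
    if hasattr(source, "read"):
        return "file-like"
    # A string that names an existing filesystem entry is treated as a path.
    if isinstance(source, (str, bytes)) and os.path.exists(source):
        return "path"
    # Otherwise assume the value is the XML itself.
    return "xml-string"


stream_kind = classify_source(io.StringIO("<mets/>"))
text_kind = classify_source("<mets xmlns='http://www.loc.gov/METS/'/>")
```

Checking for a file-like object first avoids passing a stream to os.path.exists, and checking for an existing path before falling back to XML keeps the fallback cheap.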
- remove(fs_entry)
Removes an FSEntry object from this METS document.
Any children of this FSEntry will also be removed. The FSEntry will also be removed as a child of its parent, if any.
- Parameters:
fs_entry (metsrw.mets.FSEntry) – FSEntry to remove from the METS
- remove_entry(fs_entry)[source]
Removes an FSEntry object from this METS document.
Any children of this FSEntry will also be removed. The FSEntry will also be removed as a child of its parent, if any.
- Parameters:
fs_entry (metsrw.mets.FSEntry) – FSEntry to remove from the METS
- serialize(fully_qualified=True, normative_structmap=True)[source]
Returns this document serialized to an xml Element.
- Returns:
Element for this document
- tostring(fully_qualified=True, pretty_print=True, encoding='UTF-8')[source]
Serialize and return a string of this METS document.
To write to file, see write().
The default encoding is UTF-8. This method will return a unicode string when encoding is set to unicode.
- Returns:
String of this document
- write(filepath, fully_qualified=True, pretty_print=False, encoding='UTF-8')[source]
Serialize and write this METS document to filepath.
The default encoding is UTF-8. This method will return a unicode string when encoding is set to unicode.
- Parameters:
filepath (str) – Path to write the METS document to
FSEntry
- class metsrw.FSEntry(path=None, fileid=None, label=None, use='original', type='Item', children=None, file_uuid=None, derived_from=None, checksum=None, checksumtype=None, transform_files=None, mets_div_type=None)[source]
Bases:
DependencyPossessor
A class representing a filesystem entry - either a file or a directory.
When passed to a metsrw.mets.METSDocument instance, the tree of FSEntry objects will be used to construct the <fileSec> and <structMap> elements of a METS document.
Unless otherwise specified, an FSEntry object is assumed to be a file; pass the type value as ‘Directory’ to specify that the object is instead a directory.
An FSEntry object must be instantiated with a path as the first argument to the constructor, which represents its path on disk.
An FSEntry object which is a Directory may have one or more children, representing files or directories contained within itself. Directory trees are designed for top-to-bottom traversal. Files cannot have children, and attempting to instantiate a file FSEntry object with children will raise a ValueError.
Any FSEntry object may have one or more metadata entries associated with it; these can take the form of either references to other XML files on disk, which should be wrapped in MDRef objects, or wrapped copies of those XML files, which should be wrapped in MDWrap objects.
- Parameters:
path (str) – Path to the file on disk, as a bytestring. This will populate FLocat @xlink:href
label (str) – Label in the structMap. If not provided, will be populated with the basename of path
fileid (str) – Provides a mechanism to assign a FILEID to an FSEntry when a pointer file is being created, that is, when a METS file is being written for a package file type such as an AIP. The FILEID is an XML NCName (non-colonized name), so callers must understand the restricted character set of that type to use it properly. This attribute is currently not validated on generation.
use (str) – Use for the fileGrp. Items with identical uses will be grouped together.
type (str) – Type of FSEntry this is. This will appear in the structMap.
children (list) – List of metsrw.fsentry.FSEntry that are direct children of this element in the structMap. Only allowed if type is ‘Directory’.
file_uuid (str) – UUID of this entry. Will be used to construct the FILEID used in the fileSec and structMap, and GROUPID. Only required if type is ‘Item’.
derived_from (metsrw.fsentry.FSEntry) – FSEntry that this FSEntry is derived_from. This is used to set the GROUPID in the fileSec.
checksum (str) – Value of the file’s checksum. Required if checksumtype passed.
checksumtype (str) – Type of the checksum. Must be one of FSEntry.ALLOWED_CHECKSUMS. Required if checksum passed.
transform_files (list) – A list of dicts representing METS transform file elements, which provide “a means to access any subsidiary files listed below a <file> element by indicating the steps required to ‘unpack’ or transform the subsidiary files.”
- Raises:
ValueError – if children passed when type is not ‘Directory’
ValueError – if only one of checksum or checksumtype passed
ValueError – if checksumtype is not in FSEntry.ALLOWED_CHECKSUMS
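The three ValueError conditions above can be read as a short sequence of argument checks. The sketch below restates them in plain Python; check_entry_args and is_accepted are hypothetical helpers written for illustration, the tuple is copied from the ALLOWED_CHECKSUMS value documented for this class, and the real constructor may order or word its checks differently.

```python
# Copied from the documented FSEntry.ALLOWED_CHECKSUMS value.
ALLOWED_CHECKSUMS = (
    "Adler-32", "CRC32", "HAVAL", "MD5", "MNP",
    "SHA-1", "SHA-256", "SHA-384", "SHA-512", "TIGER WHIRLPOOL",
)


def check_entry_args(type="Item", children=None, checksum=None, checksumtype=None):
    """Raise ValueError for the argument combinations the docs list as invalid."""
    if children and type != "Directory":
        raise ValueError("children passed when type is not 'Directory'")
    # checksum and checksumtype must be given together or not at all.
    if (checksum is None) != (checksumtype is None):
        raise ValueError("only one of checksum or checksumtype passed")
    if checksumtype is not None and checksumtype not in ALLOWED_CHECKSUMS:
        raise ValueError("checksumtype is not in ALLOWED_CHECKSUMS")


def is_accepted(**kwargs):
    """Return True if check_entry_args raises nothing for these arguments."""
    try:
        check_entry_args(**kwargs)
    except ValueError:
        return False
    return True


ok = is_accepted(checksum="d41d8cd98f00b204e9800998ecf8427e", checksumtype="MD5")
missing_type = is_accepted(checksum="d41d8cd98f00b204e9800998ecf8427e")
```

Here ok is True because both checksum arguments are present and the type is allowed, while missing_type is False because a checksum was given without its type.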
- ALLOWED_CHECKSUMS = ('Adler-32', 'CRC32', 'HAVAL', 'MD5', 'MNP', 'SHA-1', 'SHA-256', 'SHA-384', 'SHA-512', 'TIGER WHIRLPOOL')
- PREMIS_AGENT = 'PREMIS:AGENT'
- PREMIS_EVENT = 'PREMIS:EVENT'
- PREMIS_OBJECT = 'PREMIS:OBJECT'
- PREMIS_RIGHTS = 'PREMIS:RIGHTS'
- add_child(child)[source]
Add a child FSEntry to this FSEntry.
Only FSEntrys with a type of ‘Directory’ can have children.
Cyclic parent/child relationships are not detected by this method, but they will cause problems.
- Parameters:
child (metsrw.fsentry.FSEntry) – FSEntry to add as a child
- Returns:
The newly added child
- Raises:
ValueError – If this FSEntry cannot have children.
ValueError – If the child and the parent are the same
- add_dmdsec(md, mdtype, mode='mdwrap', **kwargs)[source]
Add dmdsec.
Extension of _add_metadata_element that adds a dmdSec and updates the previous dmdSecs with the same MDTYPE and OTHERMDTYPE attribute values, marking them as “superseded” and using the same group_id for all of them.
- property admids
Returns a list of ADMIDs for this entry.
- property children
- delete_dmdsec(mdtype, othermdtype='')[source]
Mark latest dmdsec of mdtype_othermdtype as deleted.
It doesn’t delete the dmdsec from the METS. It only sets its status attribute to “deleted”.
- property dmdids
Returns a list of DMDIDs for this entry.
- get_path()[source]
Return the relative path to this FSEntry.
If the path is not set, it’s generated from the ancestor labels. Raises an AttributeError if the path cannot be generated. Returns None for the top level FSEntry.
- has_dmdsec(mdtype, othermdtype='')[source]
Check if a dmdsec of mdtype_othermdtype exists for this entry.
- property is_aip
- property is_empty_dir
Returns True if this fs item is a directory with no children or a directory with only other empty directories as children.
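The definition above is recursive: a directory counts as empty when every child is itself an empty directory. A minimal sketch of that rule follows, using a hypothetical Entry class in place of FSEntry; it is not taken from the library's source.

```python
class Entry:
    """Hypothetical stand-in for an FSEntry with a type and children."""

    def __init__(self, type="Item", children=None):
        self.type = type
        self.children = list(children or [])


def is_empty_dir(entry):
    """True for a directory whose children, if any, are all empty directories."""
    if entry.type != "Directory":
        return False
    # all() over no children is True, which covers the childless directory.
    return all(is_empty_dir(child) for child in entry.children)


nested_empty = Entry("Directory", [Entry("Directory"), Entry("Directory", [Entry("Directory")])])
holds_a_file = Entry("Directory", [Entry("Directory", [Entry("Item")])])

nested_result = is_empty_dir(nested_empty)
file_result = is_empty_dir(holds_a_file)
```

A single file anywhere below a directory makes every ancestor non-empty, which is why holds_a_file does not qualify.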
- premis_agent_class
alias of PREMISAgent
- premis_event_class
alias of PREMISEvent
- premis_object_class
alias of PREMISObject
- premis_rights_class
alias of PREMISRights
- remove_child(child)[source]
Remove a child from this FSEntry.
If child is not actually a child of this entry, nothing happens.
- Parameters:
child – Child to remove
- serialize_filesec()[source]
Return the file Element for this file, appropriate for use in a fileSec.
If this is not an Item or has no use, return None.
- Returns:
fileSec element for this FSEntry
- serialize_md_inst(md_inst, md_class)[source]
Serialize object md_inst by transforming it into an lxml.etree._ElementTree. If it already is such, return it. If not, make sure it is the correct type and return the output of calling serialize() on it.
- serialize_structmap(recurse=True, normative=False)[source]
Return the div Element for this file, appropriate for use in a structMap.
If this FSEntry represents a directory, its children will be recursively appended to itself. If this FSEntry represents a file, it will contain a <fptr> element.
- Parameters:
recurse (bool) – If true, serialize and append all children. Otherwise, only serialize this element but not any children.
normative (bool) – If true, we are creating a “Normative Directory Structure” logical structmap, in which case we add div elements for empty directories and do not add fptr elements for files.
- Returns:
structMap element for this FSEntry
Metadata classes
Classes for metadata sections of the METS, including amdSec, dmdSec, techMD, rightsMD, sourceMD, digiprovMD, mdRef and mdWrap.
- class metsrw.metadata.AMDSec(section_id=None, subsections=None, tree=None)[source]
Bases:
object
An object representing a section of administrative metadata in a document.
This is ordinarily created by metsrw.mets.METSDocument instances and does not have to be instantiated directly.
- Parameters:
section_id (str) – ID of the section. If not provided, will be generated from ‘amdSec’ and a random number.
subsections (list) – List of metsrw.metadata.SubSection that are part of this amdSec
tree (Element) – An lxml.Element that is an externally generated amdSec. This will overwrite any automatic serialization. If passed, section_id must also be passed.
- classmethod get_current_id_count()[source]
Returns the current count of AMDSec objects, for id generation purposes.
- classmethod parse(root)[source]
Create a new AMDSec by parsing root.
- Parameters:
root – Element or ElementTree to be parsed into an object.
- serialize(now=None)[source]
Serialize this amdSec and all children to lxml Element and return it.
- Parameters:
now (str) – Default value for CREATED in children if none set
- Returns:
amdSec Element with all children
- tag = 'amdSec'
- class metsrw.metadata.Agent(role, **kwargs)[source]
Bases:
object
An object representing an agent with a relationship to the METS record.
This is ordinarily created by metsrw.mets.METSDocument instances and does not have to be instantiated directly.
- Parameters:
role (str) – Agent role, e.g. ‘CREATOR’.
id (str) – Optional unique identifier for an agent.
type (str) – Optional agent type, e.g. ‘ORGANIZATION’.
name (str) – Optional agent name, e.g. ‘9461beb-22eb-4942-88af-848cfc3462b2’.
notes (List[str]) – Optional agent notes, e.g. ‘Archivematica dashboard UUID’.
- AGENT_TAG = <lxml.etree.QName object>
- NAME_TAG = <lxml.etree.QName object>
- NOTE_TAG = <lxml.etree.QName object>
- ROLES = ('CREATOR', 'EDITOR', 'ARCHIVIST', 'PRESERVATION', 'DISSEMINATOR', 'CUSTODIAN', 'IPOWNER')
- TYPES = ('INDIVIDUAL', 'ORGANIZATION')
- classmethod parse(element)[source]
Create a new Agent by parsing element.
- Parameters:
element – Element to be parsed into an Agent.
- Raises:
exceptions.ParseError – If element is not a valid agent.
- class metsrw.metadata.AltRecordID(alt_record_id, **kwargs)[source]
Bases:
object
An object representing an alternative record identifier in the METS document (alternatives to the OBJID).
This is ordinarily created by metsrw.mets.METSDocument instances and does not have to be instantiated directly.
- Parameters:
id (str) – Optional unique identifier for the identifier.
type (str) – Optional identifier type, e.g. ‘Accession number’.
- ALT_RECORD_ID_TAG = <lxml.etree.QName object>
- classmethod parse(element)[source]
Create a new AltRecordID by parsing element.
- Parameters:
element – Element to be parsed into an AltRecordID.
- Raises:
exceptions.ParseError – If element is not a valid altRecordID.
- class metsrw.metadata.IdGenerator(prefix)[source]
Bases:
object
Helper class to generate unique, sequential ids.
- class metsrw.metadata.MDRef(target, mdtype, loctype, label=None, otherloctype=None, xptr=None, othermdtype=None)[source]
Bases:
object
An object representing an external XML document, typically associated with a metsrw.fsentry.FSEntry object.
- Parameters:
target (str) – Path to the external document. MDRef does not validate the existence of this target.
mdtype (str) – The string representing the mdtype of XML document being enclosed. Examples include “PREMIS:OBJECT” and “PREMIS:EVENT”.
label (str) – Optional LABEL for the mdRef element
loctype (str) – LOCTYPE of the mdRef. Must be one of ‘ARK’, ‘URN’, ‘URL’, ‘PURL’, ‘HANDLE’, ‘DOI’ or ‘OTHER’.
otherloctype (str) – OTHERLOCTYPE of the mdRef. Should be provided if loctype is OTHER.
- VALID_LOCTYPE = ('ARK', 'URN', 'URL', 'PURL', 'HANDLE', 'DOI', 'OTHER')
- class metsrw.metadata.MDWrap(document, mdtype, othermdtype=None)[source]
Bases:
object
An object representing an XML document enclosed in a METS document. The entirety of the XML document will be included; to reference an external document, use the MDRef class.
- Parameters:
document (str) – A string copy of the document, which will be parsed into an ElementTree at the time of instantiation.
mdtype (str) – The MDTYPE of XML document being enclosed. Examples include “PREMIS:OBJECT”, “PREMIS:EVENT”, “DC” and “OTHER”.
othermdtype (str) – The OTHERMDTYPE of the XML document. Should be set if mdtype is “OTHER”.
- classmethod parse(root)[source]
Create a new MDWrap by parsing root.
- Parameters:
root – Element or ElementTree to be parsed into a MDWrap.
- Raises:
exceptions.ParseError – If mdWrap does not contain MDTYPE
exceptions.ParseError – If xmlData contains no children
- class metsrw.metadata.SubSection(subsection, contents, section_id=None)[source]
Bases:
object
An object representing a metadata subsection in a document.
This is usually created automatically and does not have to be instantiated directly.
- Parameters:
subsection (str) – Tag name for the subsection to be created. Should be one of ‘techMD’, ‘rightsMD’, ‘sourceMD’ or ‘digiprovMD’ if contained in an amdSec, or ‘dmdSec’.
contents (MDWrap or MDRef) – The MDWrap or MDRef contained in this subsection.
section_id (str) – ID of the section. If not provided, will be generated from subsection tag and a random number.
- ALLOWED_SUBSECTIONS = ('techMD', 'rightsMD', 'sourceMD', 'digiprovMD', 'dmdSec')
- classmethod get_current_id_count(subsection_type)[source]
Returns the current count of SubSection objects of the type provided, for id generation purposes.
- get_status()[source]
Returns the STATUS when serializing.
Calculates based on the subsection type and if it’s replacing anything.
- Returns:
None or the STATUS string.
- classmethod parse(root)[source]
Create a new SubSection by parsing root.
- Parameters:
root – Element or ElementTree to be parsed into an object.
- Raises:
exceptions.ParseError – If root’s tag is not in SubSection.ALLOWED_SUBSECTIONS.
exceptions.ParseError – If the first child of root is not mdRef or mdWrap.
- replace_with(new_subsection)[source]
Replace this SubSection with new_subsection.
The replacing SubSection must be of the same type. That is, you can only replace a dmdSec with another dmdSec, or a rightsMD with a rightsMD, etc.
- Parameters:
new_subsection (SubSection) – Updated version of this SubSection
Validation
- metsrw.validate.get_schematron(sct_path)[source]
Return an lxml isoschematron.Schematron() instance using the schematron file at sct_path.
- metsrw.validate.get_xmlschema(xmlschema, mets_doc)[source]
Return an lxml.etree.XMLSchema instance given the path to the XMLSchema (.xsd) file in xmlschema and the lxml.etree._ElementTree instance mets_doc representing the METS file being parsed. The complication here is that the METS file to be validated via the .xsd file may reference additional schemata via xsi:schemaLocation attributes. We have to find all of these and import them from within the returned XMLSchema.
For the solution that this is based on, see: http://code.activestate.com/recipes/578503-validate-xml-with-schemalocation/
For other descriptions of the problem, see:
- https://groups.google.com/forum/#!topic/archivematica/UBS1ay-g_tE
- https://stackoverflow.com/questions/26712645/xml-type-definition-is-absent
- https://stackoverflow.com/questions/2979824/in-document-schema-declarations-and-lxml
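To make the xsi:schemaLocation problem concrete, the sketch below gathers every namespace and schema location pair declared anywhere in a document. It uses the standard library's xml.etree.ElementTree rather than lxml, the helper name collect_schema_locations and the sample document are made up for illustration, and it covers only the discovery step, not the construction of the XMLSchema itself.

```python
import xml.etree.ElementTree as ET

# Clark-notation name of the xsi:schemaLocation attribute.
XSI_SCHEMA_LOCATION = "{http://www.w3.org/2001/XMLSchema-instance}schemaLocation"


def collect_schema_locations(root):
    """Map namespace URI -> schema URL for every xsi:schemaLocation in the tree."""
    locations = {}
    for element in root.iter():  # iter() yields root itself and all descendants
        value = element.get(XSI_SCHEMA_LOCATION)
        if not value:
            continue
        tokens = value.split()
        # The attribute holds whitespace-separated (namespace, location) pairs.
        for namespace, location in zip(tokens[0::2], tokens[1::2]):
            locations.setdefault(namespace, location)
    return locations


SAMPLE = """<mets xmlns="http://www.loc.gov/METS/"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd">
  <dmdSec ID="dmdSec_1">
    <mdWrap MDTYPE="DC">
      <xmlData>
        <dc xmlns="http://purl.org/dc/terms/"
            xsi:schemaLocation="http://purl.org/dc/terms/ http://dublincore.org/schemas/xmls/qdc/2008/02/11/dcterms.xsd"/>
      </xmlData>
    </mdWrap>
  </dmdSec>
</mets>"""

locations = collect_schema_locations(ET.fromstring(SAMPLE))
```

The nested dc element declares a schema that a validator loaded with only the METS .xsd would not know about; each collected pair is a candidate for an import in the schema used for validation.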
- metsrw.validate.report_string(report)[source]
Return a human-readable string representation of all of the validation errors.
- metsrw.validate.schematron_validate(mets_doc, schematron='resources/archivematica_mets_schematron.xml')[source]
Validate a METS file using a schematron schema. Return a boolean indicating validity and a report as an lxml.ElementTree instance.
- metsrw.validate.sct_report_string(report)[source]
Return a human-readable string representation of the error report returned by lxml’s schematron validator.
- metsrw.validate.validate(mets_doc, xmlschema='resources/mets.xsd', schematron='resources/archivematica_mets_schematron.xml')[source]
Validate a METS file using both an XMLSchema (.xsd) schema and a schematron schema, the latter of which typically places additional constraints on what a METS file can look like.
Exceptions
Exceptions for metsrw.
All exceptions generated by this library will descend from MetsError.