metsrw API Documentation

METSDocument

class metsrw.METSDocument[source]

Bases: object

all_files()[source]

Return a set of all FSEntrys in this METS document.

Returns:

Set containing all FSEntry in this METS document, including descendants of ones explicitly added.

append(fs_entry)

Adds an FSEntry object to this METS document’s tree. Any of the represented object’s children will also be added to the document.

A given FSEntry object can only be included in a document once; any attempt to add the same object a second time is ignored.

Parameters:

fs_entry (metsrw.mets.FSEntry) – FSEntry to add to the METS document

append_file(fs_entry)[source]

Adds an FSEntry object to this METS document’s tree. Any of the represented object’s children will also be added to the document.

A given FSEntry object can only be included in a document once; any attempt to add the same object a second time is ignored.

Parameters:

fs_entry (metsrw.mets.FSEntry) – FSEntry to add to the METS document

classmethod fromfile(path)[source]

Create a METS by parsing a file.

Parameters:

path (str) – Path to a METS document.

classmethod fromstring(string)[source]

Create a METS by parsing a string.

Parameters:

string (str) – String containing a METS document.

classmethod fromtree(tree)[source]

Create a METS from an ElementTree or Element.

Parameters:

tree (ElementTree) – ElementTree to build a METS document from.

get_file(**kwargs)[source]

Return the FSEntry that matches parameters.

Parameters:
  • file_uuid (str) – UUID of the target FSEntry.

  • label (str) – structMap LABEL of the target FSEntry.

  • type (str) – structMap TYPE of the target FSEntry.

Returns:

FSEntry that matches parameters, or None.

get_subsections_counts()[source]

Return a dictionary with the count of each of the following subsections: dmdSec, amdSec, techMD, rightsMD, digiprovMD and sourceMD.

Returns:

Dict with subsections counts.
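As a rough illustration of what such a count involves, the sketch below tallies the same six element names in a small METS string using only the standard library. It does not call metsrw; the helper name count_subsections, the sample document, and the use of bare tag names as dictionary keys are assumptions made for the example, and the keys returned by the real method may differ.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
SUBSECTION_TAGS = ("dmdSec", "amdSec", "techMD", "rightsMD", "digiprovMD", "sourceMD")


def count_subsections(xml_string):
    """Count METS metadata sections by tag name (hypothetical helper)."""
    root = ET.fromstring(xml_string)
    return {
        tag: len(root.findall(f".//{{{METS_NS}}}{tag}"))
        for tag in SUBSECTION_TAGS
    }


# Made-up sample document, for illustration only.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:dmdSec ID="dmdSec_1"/>
  <mets:amdSec ID="amdSec_1">
    <mets:techMD ID="techMD_1"/>
    <mets:digiprovMD ID="digiprovMD_1"/>
    <mets:digiprovMD ID="digiprovMD_2"/>
  </mets:amdSec>
</mets:mets>"""

counts = count_subsections(SAMPLE)
```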

classmethod read(source)[source]

Read source into a METSDocument instance. This is an instance constructor. The source may be a path to a METS file, a file-like object, or a string of XML.
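The three accepted kinds of source can be told apart with a few simple checks. The sketch below is one plausible way to do so and is not taken from the library; classify_source is a hypothetical helper, and the order of the checks is an assumption.

```python
import io
import os


def classify_source(source):
    """Guess which kind of input `source` is (hypothetical helper)."""
    # Anything with a read() method is treated as a file-like object.
    if hasattr(source, "read"):
        return "file-like"
    # A string or bytes value naming something on disk is treated as a path.
    if isinstance(source, (str, bytes)) and os.path.exists(source):
        return "path"
    # Everything else is assumed to be XML text.
    return "xml-string"


kinds = [
    classify_source(io.StringIO("<mets/>")),
    classify_source("<mets/>"),
]
```

A caller that already knows what it holds can skip the guesswork and use fromfile() or fromstring() directly.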

remove(fs_entry)

Removes an FSEntry object from this METS document.

Any children of this FSEntry will also be removed. The FSEntry will also be removed as a child of its parent, if any.

Parameters:

fs_entry (metsrw.mets.FSEntry) – FSEntry to remove from the METS

remove_entry(fs_entry)[source]

Removes an FSEntry object from this METS document.

Any children of this FSEntry will also be removed. The FSEntry will also be removed as a child of its parent, if any.

Parameters:

fs_entry (metsrw.mets.FSEntry) – FSEntry to remove from the METS

serialize(fully_qualified=True, normative_structmap=True)[source]

Returns this document serialized to an xml Element.

Returns:

Element for this document

tostring(fully_qualified=True, pretty_print=True, encoding='UTF-8')[source]

Serialize and return a string of this METS document.

To write to file, see write().

The default encoding is UTF-8. This method will return a unicode string when encoding is set to unicode.

Returns:

String of this document
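The difference between a named encoding and the special value unicode can be seen with the standard library's ElementTree, which follows the same convention. The sketch below uses it as a stand-in so that it runs without metsrw installed; it is assumed, not confirmed here, that tostring() behaves the same way for these two values.

```python
import xml.etree.ElementTree as ET

root = ET.Element("mets")

# A named encoding such as UTF-8 produces bytes.
as_bytes = ET.tostring(root, encoding="UTF-8")

# The special value "unicode" produces a text string instead.
as_text = ET.tostring(root, encoding="unicode")
```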

write(filepath, fully_qualified=True, pretty_print=False, encoding='UTF-8')[source]

Serialize and write this METS document to filepath.

The default encoding is UTF-8. This method will return a unicode string when encoding is set to unicode.

Parameters:

filepath (str) – Path to write the METS document to

FSEntry

class metsrw.FSEntry(path=None, fileid=None, label=None, use='original', type='Item', children=None, file_uuid=None, derived_from=None, checksum=None, checksumtype=None, transform_files=None, mets_div_type=None)[source]

Bases: DependencyPossessor

A class representing a filesystem entry - either a file or a directory.

When passed to a metsrw.mets.METSDocument instance, the tree of FSEntry objects will be used to construct the <fileSec> and <structMap> elements of a METS document.

Unless otherwise specified, an FSEntry object is assumed to be a file; pass the type value as ‘Directory’ to specify that the object is instead a directory.

An FSEntry object must be instantiated with a path as the first argument to the constructor, which represents its path on disk.

An FSEntry object which is a Directory may have one or more children, representing files or directories contained within itself. Directory trees are designed for top-to-bottom traversal. Files cannot have children, and attempting to instantiate a file FSEntry object with children will raise a ValueError.

Any FSEntry object may have one or more metadata entries associated with it; these can take the form of either references to other XML files on disk, which should be wrapped in MDRef objects, or wrapped copies of those XML files, which should be wrapped in MDWrap objects.

Parameters:
  • path (str) – Path to the file on disk, as a bytestring. This will populate FLocat @xlink:href

  • label (str) – Label in the structMap. If not provided, will be populated with the basename of path

  • fileid (str) – Provides a mechanism to assign a FILEID to an FSEntry when a pointer file is being created, that is, when a METS file is being written for a package-type file such as an AIP. The FILEID is an XML NCName (non-colonized name), so callers must understand the restricted character set of that type to use it properly. There is currently no validation of this attribute on generation.

  • use (str) – Use for the fileGrp. Items with identical uses will be grouped together.

  • type (str) – Type of FSEntry this is. This will appear in the structMap.

  • children (list) – List of metsrw.fsentry.FSEntry that are direct children of this element in the structMap. Only allowed if type is ‘Directory’

  • file_uuid (str) – UUID of this entry. Will be used to construct the FILEID used in the fileSec and structMap, and GROUPID. Only required if type is ‘Item’.

  • derived_from (metsrw.fsentry.FSEntry) – FSEntry that this FSEntry is derived_from. This is used to set the GROUPID in the fileSec.

  • checksum (str) – Value of the file’s checksum. Required if checksumtype passed.

  • checksumtype (str) – Type of the checksum. Must be one of FSEntry.ALLOWED_CHECKSUMS. Required if checksum passed.

  • transform_files (list) – a list of dicts representing METS transform file elements, which provide “a means to access any subsidiary files listed below a <file> element by indicating the steps required to ‘unpack’ or transform the subsidiary files.”

Raises:
  • ValueError – if children passed when type is not ‘Directory’

  • ValueError – if only one of checksum or checksumtype passed

  • ValueError – if checksumtype is not in FSEntry.ALLOWED_CHECKSUMS
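Because fileid is not validated and checksum must always be paired with checksumtype, callers may want to prepare these values before constructing an FSEntry. The sketch below shows one way to do that with the standard library. The helper names, the ASCII-only pattern (real NCNames also allow many non-ASCII characters), and the mapping from checksum labels to hashlib algorithm names are all assumptions made for the example.

```python
import hashlib
import re

# ASCII-only approximation of an XML NCName: starts with a letter or
# underscore, then letters, digits, '.', '-' or '_'. No colon is allowed.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

# Assumed mapping from some documented checksum labels to hashlib names.
HASHLIB_NAMES = {
    "MD5": "md5",
    "SHA-1": "sha1",
    "SHA-256": "sha256",
    "SHA-384": "sha384",
    "SHA-512": "sha512",
}


def is_ascii_ncname(value):
    """Return True if `value` matches the ASCII NCName approximation."""
    return NCNAME_ASCII.fullmatch(value) is not None


def checksum_kwargs(data, checksumtype="SHA-256"):
    """Return checksum and checksumtype together, so neither is passed alone."""
    digest = hashlib.new(HASHLIB_NAMES[checksumtype], data).hexdigest()
    return {"checksum": digest, "checksumtype": checksumtype}


kwargs = checksum_kwargs(b"")
```

The resulting dictionary could then be unpacked into the FSEntry constructor alongside path and a fileid that passes the check.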

ALLOWED_CHECKSUMS = ('Adler-32', 'CRC32', 'HAVAL', 'MD5', 'MNP', 'SHA-1', 'SHA-256', 'SHA-384', 'SHA-512', 'TIGER WHIRLPOOL')
PREMIS_AGENT = 'PREMIS:AGENT'
PREMIS_EVENT = 'PREMIS:EVENT'
PREMIS_OBJECT = 'PREMIS:OBJECT'
PREMIS_RIGHTS = 'PREMIS:RIGHTS'
add_child(child)[source]

Add a child FSEntry to this FSEntry.

Only FSEntrys with a type of ‘Directory’ can have children.

This does not detect cyclic parent/child relationships, but that will cause problems.

Parameters:

child (metsrw.fsentry.FSEntry) – FSEntry to add as a child

Returns:

The newly added child

Raises:
  • ValueError – If this FSEntry cannot have children.

  • ValueError – If the child and the parent are the same

add_digiprovmd(md, mdtype, mode='mdwrap', **kwargs)[source]
add_dmdsec(md, mdtype, mode='mdwrap', **kwargs)[source]

Add dmdsec.

Extension of _add_metadata_element that adds a dmdSec and updates the previous dmdSecs with the same MDTYPE and OTHERMDTYPE attribute values, marking them as “superseded” and using the same group_id for all of them.

add_dublin_core(md, mode='mdwrap', **kwargs)[source]
add_premis_agent(md, mode='mdwrap')[source]
add_premis_event(md, mode='mdwrap')[source]
add_premis_object(md, mode='mdwrap')[source]
add_premis_rights(md, mode='mdwrap')[source]
add_rightsmd(md, mdtype, mode='mdwrap', **kwargs)[source]
add_techmd(md, mdtype, mode='mdwrap', **kwargs)[source]
property admids

Returns a list of ADMIDs for this entry.

property children
delete_dmdsec(mdtype, othermdtype='')[source]

Mark latest dmdsec of mdtype_othermdtype as deleted.

It doesn’t delete the dmdsec from the METS. It only sets its status attribute to “deleted”.

classmethod dir(label, children)[source]

Return FSEntry directory object.

property dmdids

Returns a list of DMDIDs for this entry.

file_id()[source]

Returns the fptr @FILEID if this is not a Directory.

classmethod from_fptr(label, type_, fptr)[source]

Return FSEntry object.

get_path()[source]

Return the relative path to this FSEntry.

If the path is not set, it’s generated from the ancestor labels. Raises an AttributeError if the path cannot be generated. Returns None for the top level FSEntry.

get_premis_agents()[source]
get_premis_event(event_uuid)[source]
get_premis_events()[source]
get_premis_objects()[source]
get_premis_rights()[source]
get_premis_rights_statement(rights_statement_uuid)[source]
get_subsections_of_type(mdtype, md_class)[source]
group_id()[source]

Returns the @GROUPID.

If derived_from is set, returns that group_id.

has_dmdsec(mdtype, othermdtype='')[source]

Check if a dmdsec of mdtype_othermdtype exists for this entry.

property is_aip
property is_empty_dir

Returns True if this fs item is a directory with no children or a directory with only other empty directories as children.
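The definition above is recursive, which a short sketch makes concrete. Node is a hypothetical stand-in that models only type and children; it is not the library's FSEntry, and the function is an illustration of the rule rather than the actual implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for an FSEntry; only type and children are modeled."""
    type: str = "Item"
    children: List["Node"] = field(default_factory=list)


def is_empty_dir(node):
    """True for a directory whose children, if any, are all empty directories."""
    if node.type != "Directory":
        return False
    # all() over no children is True, which covers the childless directory.
    return all(is_empty_dir(child) for child in node.children)


empty = Node("Directory")
nested_empty = Node("Directory", [Node("Directory"), Node("Directory")])
with_file = Node("Directory", [Node("Directory", [Node("Item")])])
```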

premis_agent_class

alias of PREMISAgent

premis_event_class

alias of PREMISEvent

premis_object_class

alias of PREMISObject

premis_rights_class

alias of PREMISRights

remove_child(child)[source]

Remove a child from this FSEntry.

If child is not actually a child of this entry, nothing happens.

Parameters:

child – Child to remove

serialize_filesec()[source]

Return the file Element for this file, appropriate for use in a fileSec.

If this is not an Item or has no use, return None.

Returns:

fileSec element for this FSEntry

serialize_md_inst(md_inst, md_class)[source]

Serialize object md_inst by transforming it into an lxml.etree._ElementTree. If it already is such, return it. If not, make sure it is the correct type and return the output of calling serialize() on it.

serialize_structmap(recurse=True, normative=False)[source]

Return the div Element for this file, appropriate for use in a structMap.

If this FSEntry represents a directory, its children will be recursively appended to itself. If this FSEntry represents a file, it will contain a <fptr> element.

Parameters:
  • recurse (bool) – If true, serialize and append all children. Otherwise, serialize only this element and none of its children.

  • normative (bool) – If true, we are creating a “Normative Directory Structure” logical structmap, in which case we add div elements for empty directories and do not add fptr elements for files.

Returns:

structMap element for this FSEntry
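The shape of the resulting div tree can be sketched with the standard library. The example below is a simplified illustration, not the library's code: Entry and to_div are hypothetical names, only the TYPE, LABEL and FILEID attributes are written, and the normative option is not modeled.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional

METS_NS = "http://www.loc.gov/METS/"


@dataclass
class Entry:
    """Hypothetical stand-in for an FSEntry (not the library class)."""
    label: str
    type: str = "Item"
    file_id: Optional[str] = None
    children: List["Entry"] = field(default_factory=list)


def to_div(entry, recurse=True):
    """Build a div for entry: directories nest child divs, files get an fptr."""
    div = ET.Element(f"{{{METS_NS}}}div", {"TYPE": entry.type, "LABEL": entry.label})
    if entry.type == "Directory":
        if recurse:
            for child in entry.children:
                div.append(to_div(child, recurse=True))
    elif entry.file_id is not None:
        ET.SubElement(div, f"{{{METS_NS}}}fptr", {"FILEID": entry.file_id})
    return div


# Made-up tree: one directory containing one file.
tree = Entry("objects", type="Directory", children=[
    Entry("image.tif", file_id="file-0001"),
])
div = to_div(tree)
shallow = to_div(tree, recurse=False)
```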

Metadata classes

Classes for metadata sections of the METS. These include amdSec, dmdSec, techMD, rightsMD, sourceMD, digiprovMD, mdRef and mdWrap.

class metsrw.metadata.AMDSec(section_id=None, subsections=None, tree=None)[source]

Bases: object

An object representing a section of administrative metadata in a document.

This is ordinarily created by metsrw.mets.METSDocument instances and does not have to be instantiated directly.

Parameters:
  • section_id (str) – ID of the section. If not provided, will be generated from ‘amdSec’ and a random number.

  • subsections (list) – List of metsrw.metadata.SubSection that are part of this amdSec

  • tree (Element) – An lxml.Element that is an externally generated amdSec. This will overwrite any automatic serialization. If passed, section_id must also be passed.

classmethod get_current_id_count()[source]

Returns the current count of AMDSec objects, for id generation purposes.

classmethod parse(root)[source]

Create a new AMDSec by parsing root.

Parameters:

root – Element or ElementTree to be parsed into an object.

serialize(now=None)[source]

Serialize this amdSec and all children to lxml Element and return it.

Parameters:

now (str) – Default value for CREATED in children if none set

Returns:

amdSec Element with all children

tag = 'amdSec'
class metsrw.metadata.Agent(role, **kwargs)[source]

Bases: object

An object representing an agent with a relationship to the METS record.

This is ordinarily created by metsrw.mets.METSDocument instances and does not have to be instantiated directly.

Parameters:
  • role (str) – Agent role, e.g. ‘CREATOR’.

  • id (str) – Optional unique identifier for an agent.

  • type (str) – Optional agent type, e.g. ‘ORGANIZATION’.

  • name (str) – Optional agent name, e.g. ‘9461beb-22eb-4942-88af-848cfc3462b2’.

  • notes (List[str]) – Optional agent notes, e.g. ‘Archivematica dashboard UUID’.

AGENT_TAG = <lxml.etree.QName object>
NAME_TAG = <lxml.etree.QName object>
NOTE_TAG = <lxml.etree.QName object>
ROLES = ('CREATOR', 'EDITOR', 'ARCHIVIST', 'PRESERVATION', 'DISSEMINATOR', 'CUSTODIAN', 'IPOWNER')
TYPES = ('INDIVIDUAL', 'ORGANIZATION')
classmethod parse(element)[source]

Create a new Agent by parsing element.

Parameters:

element – Element to be parsed into an Agent.

Raises:

exceptions.ParseError – If element is not a valid agent.

serialize()[source]
class metsrw.metadata.AltRecordID(alt_record_id, **kwargs)[source]

Bases: object

An object representing an alternative record identifier in the METS document (alternatives to the OBJID).

This is ordinarily created by metsrw.mets.METSDocument instances and does not have to be instantiated directly.

Parameters:
  • id (str) – Optional unique identifier for the identifier.

  • type (str) – Optional identifier type, e.g. ‘Accession number’.

ALT_RECORD_ID_TAG = <lxml.etree.QName object>
classmethod parse(element)[source]

Create a new AltRecordID by parsing element.

Parameters:

element – Element to be parsed into an AltRecordID.

Raises:

exceptions.ParseError – If element is not a valid altRecordID.

serialize()[source]
class metsrw.metadata.IdGenerator(prefix)[source]

Bases: object

Helper class to generate unique, sequential ids.

clear()[source]
register_id(id_string)[source]

Register a manually assigned id as used, to avoid collisions.

class metsrw.metadata.MDRef(target, mdtype, loctype, label=None, otherloctype=None, xptr=None, othermdtype=None)[source]

Bases: object

An object representing an external XML document, typically associated with a metsrw.fsentry.FSEntry object.

Parameters:
  • target (str) – Path to the external document. MDRef does not validate the existence of this target.

  • mdtype (str) – The string representing the mdtype of XML document being enclosed. Examples include “PREMIS:OBJECT” and “PREMIS:EVENT”.

  • label (str) – Optional LABEL for the mdRef element

  • loctype (str) – LOCTYPE of the mdRef. Must be one of ‘ARK’, ‘URN’, ‘URL’, ‘PURL’, ‘HANDLE’, ‘DOI’ or ‘OTHER’.

  • otherloctype (str) – OTHERLOCTYPE of the mdRef. Should be provided if loctype is OTHER.

VALID_LOCTYPE = ('ARK', 'URN', 'URL', 'PURL', 'HANDLE', 'DOI', 'OTHER')
classmethod parse(root)[source]

Create a new MDRef by parsing root.

Parameters:

root – Element or ElementTree to be parsed into an MDRef.

serialize()[source]
class metsrw.metadata.MDWrap(document, mdtype, othermdtype=None)[source]

Bases: object

An object representing an XML document enclosed in a METS document. The entirety of the XML document will be included; to reference an external document, use the MDRef class.

Parameters:
  • document (str) – A string copy of the document; it will be parsed into an ElementTree at the time of instantiation.

  • mdtype (str) – The MDTYPE of XML document being enclosed. Examples include “PREMIS:OBJECT”, “PREMIS:EVENT”, “DC” and “OTHER”.

  • othermdtype (str) – The OTHERMDTYPE of the XML document. Should be set if mdtype is “OTHER”.

classmethod parse(root)[source]

Create a new MDWrap by parsing root.

Parameters:

root – Element or ElementTree to be parsed into a MDWrap.

Raises:
serialize()[source]
class metsrw.metadata.SubSection(subsection, contents, section_id=None)[source]

Bases: object

An object representing a metadata subsection in a document.

This is usually created automatically and does not have to be instantiated directly.

Parameters:
  • subsection (str) – Tag name for the subsection to be created. Should be one of ‘techMD’, ‘rightsMD’, ‘sourceMD’ or ‘digiprovMD’ if contained in an amdSec, or ‘dmdSec’.

  • contents (MDWrap or MDRef) – The MDWrap or MDRef contained in this subsection.

  • section_id (str) – ID of the section. If not provided, will be generated from subsection tag and a random number.

ALLOWED_SUBSECTIONS = ('techMD', 'rightsMD', 'sourceMD', 'digiprovMD', 'dmdSec')
classmethod get_current_id_count(subsection_type)[source]

Returns the current count of SubSection objects of the type provided, for id generation purposes.

get_status()[source]

Returns the STATUS when serializing.

Calculates based on the subsection type and if it’s replacing anything.

Returns:

None or the STATUS string.

classmethod parse(root)[source]

Create a new SubSection by parsing root.

Parameters:

root – Element or ElementTree to be parsed into an object.

Raises:
replace_with(new_subsection)[source]

Replace this SubSection with new_subsection.

The replacing SubSection must be of the same type. That is, you can only replace a dmdSec with another dmdSec, or a rightsMD with a rightsMD, etc.

Parameters:

new_subsection (SubSection) – Updated version of this SubSection

serialize(now=None)[source]

Serialize this SubSection and all children to lxml Element and return it.

Parameters:

now (str) – Default value for CREATED if none set

Returns:

dmdSec/techMD/rightsMD/sourceMD/digiprovMD Element with all children

Validation

metsrw.validate.get_schematron(sct_path)[source]

Return an lxml isoschematron.Schematron() instance using the schematron file at sct_path.

metsrw.validate.get_xmlschema(xmlschema, mets_doc)[source]

Return an lxml.etree.XMLSchema instance given the path to the XMLSchema (.xsd) file in xmlschema and the lxml.etree._ElementTree instance mets_doc representing the METS file being parsed. The complication here is that the METS file to be validated via the .xsd file may reference additional schemata via xsi:schemaLocation attributes. We have to find all of these and import them from within the returned XMLSchema.

For the solution that this is based on, see: http://code.activestate.com/recipes/578503-validate-xml-with-schemalocation/

For other descriptions of the problem, see:
  • https://groups.google.com/forum/#!topic/archivematica/UBS1ay-g_tE

  • https://stackoverflow.com/questions/26712645/xml-type-definition-is-absent

  • https://stackoverflow.com/questions/2979824/in-document-schema-declarations-and-lxml
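The first half of that task, finding every xsi:schemaLocation reference in a document, can be sketched with the standard library alone. The helper below is a hypothetical illustration and not the function's actual code; it only collects the namespace and location pairs, and does not build the importing XMLSchema. The sample document is made up for the example.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def schema_locations(xml_string):
    """Collect (namespace, location) pairs from every xsi:schemaLocation."""
    root = ET.fromstring(xml_string)
    pairs = []
    for element in root.iter():
        # The attribute value is a whitespace-separated list of
        # namespace/location pairs.
        tokens = element.get(f"{{{XSI_NS}}}schemaLocation", "").split()
        pairs.extend(zip(tokens[0::2], tokens[1::2]))
    return pairs


SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd">
  <mets:dmdSec ID="dmdSec_1"/>
</mets:mets>"""

locations = schema_locations(SAMPLE)
```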

metsrw.validate.report_string(report)[source]

Return a human-readable string representation of all of the validation errors.

metsrw.validate.schematron_validate(mets_doc, schematron='resources/archivematica_mets_schematron.xml')[source]

Validate a METS file using a schematron schema. Return a boolean indicating validity and a report as an lxml.ElementTree instance.

metsrw.validate.sct_report_string(report)[source]

Return a human-readable string representation of the error report returned by lxml’s schematron validator.

metsrw.validate.validate(mets_doc, xmlschema='resources/mets.xsd', schematron='resources/archivematica_mets_schematron.xml')[source]

Validate a METS file using both an XMLSchema (.xsd) schema and a schematron schema, the latter of which typically places additional constraints on what a METS file can look like.

metsrw.validate.xsd_error_log_string(xsd_error_log)[source]

Return a human-readable string representation of the error log returned by lxml’s XMLSchema validator.

metsrw.validate.xsd_validate(mets_doc, xmlschema='resources/mets.xsd')[source]

Exceptions

Exceptions for metsrw.

All exceptions generated by this library will descend from MetsError.

exception metsrw.exceptions.MetsError[source]

Bases: Exception

Base Exception for this module.

exception metsrw.exceptions.ParseError[source]

Bases: MetsError

Error parsing a METS file.

exception metsrw.exceptions.SerializeError[source]

Bases: MetsError

Error serializing a METS file.
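Since every exception descends from MetsError, a caller can handle a specific failure first and fall back to the base class for anything else. The sketch below uses local stand-in classes that mirror the documented hierarchy so that it runs without the library installed; with metsrw available, the same pattern would presumably apply to the classes in metsrw.exceptions. The describe_failure helper and the error messages are made up for the example.

```python
class MetsError(Exception):
    """Stand-in mirroring metsrw.exceptions.MetsError."""


class ParseError(MetsError):
    """Stand-in mirroring metsrw.exceptions.ParseError."""


class SerializeError(MetsError):
    """Stand-in mirroring metsrw.exceptions.SerializeError."""


def describe_failure(operation):
    """Run operation and report which kind of error, if any, it raised."""
    try:
        operation()
    except ParseError as err:
        # Most specific handler first.
        return f"parse failed: {err}"
    except MetsError as err:
        # Catches SerializeError and any other library error.
        return f"metsrw failed: {err}"
    return "ok"


def bad_parse():
    raise ParseError("not a METS document")


def bad_serialize():
    raise SerializeError("cannot serialize")
```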