DeDuplication of EBU-TT-Live documents¶
When documents are ReSequenced, duplication of style and region elements
and attributes can occur. To address this, the DeDuplicator node processes the
document(s) with the
ebu_tt_live.node.deduplicator.DeDuplicatorNode.remove_duplication() function.
After copying styling and layout into a list() and setting them up for new
style and region elements, respectively, the list() is passed through the
ebu_tt_live.node.deduplicator.DeDuplicatorNode.CollateUniqueVals() function.
Because style and region elements can have style attributes, these
are deduplicated first. At this stage, it’s possible that where two identical elements
that differed only in their style references, these may end up looking the same.
Each element is then passed through the
ebu_tt_live.node.deduplicator.ComparableElement class, which processes
each attribute, omitting the xml:id and using the
ebu_tt_live.node.deduplicator.ReplaceNone() function to replace empty
attributes with a non-legal character to avoid collisions, and assigns a hash
to each element. The hash is then stored in old_id_dict as a key-value pair,
where the xml:id is the key and the hash is the value. The hash is also stored
in hash_dict where the hash is the key, and the value is the contents of the element.
The list() is then passed through the ebu_tt_live.node.deduplicator.DeDuplicatorNode.AppendNewElements()
function, which takes in the list of elements, the path to the parent element
(styling or layout), old_id_dict, new_id_dict and hash_dict
The function iterates through the key-value pairs of hash_dict and the contents
of the list of elements; where the xml:id of both matches, the element is appended to
the parent element. The hash and xml:id is then stored in new_id_dict,
where the hash is the key and the xml:id is the value.
In the final step, before emitting the document, the document as it is at this stage is passed
through the ebu_tt_live.node.deduplicator.ReplaceStylesAndRegions class.
This utilises RecursiveOperation i.e. this class is used to recursively fix
the style and region attribute values, by addressing where a style or region
attribute has been declared, and replaces the reference to the old xml:id with
the new one stored in new_id_dict, which is done by taking the reference to
the old xml:id, matching it to the key in old_id_dict to find the hash
value, then matching the hash to the key in new_id_dict to get the
new xml:id reference. While doing this, it also deduplicates instances where
multiple style attributes have been referenced, removing duplicates while
maintaining the hierarchy in which they were declared.