DeDuplication of EBU-TT-Live documents

When documents are ReSequenced, duplication of style and region elements and attributes can occur. To address this, the DeDuplicator node processes the document(s) with the ebu_tt_live.node.deduplicator.DeDuplicatorNode.remove_duplication() function.

The contents of styling and layout are first copied into a list() each, and styling and layout are prepared to receive the new style and region elements, respectively. Each list() is then passed through the ebu_tt_live.node.deduplicator.DeDuplicatorNode.CollateUniqueVals() function.

Because style and region elements can themselves carry style attributes, these references are deduplicated first. At this stage, two elements that originally differed only in their style references may become identical. Each element is then passed through the ebu_tt_live.node.deduplicator.ComparableElement class, which processes each attribute except the xml:id, uses the ebu_tt_live.node.deduplicator.ReplaceNone() function to replace empty attributes with a non-legal character to avoid collisions, and assigns a hash to the element. The hash is stored in old_id_dict, where the xml:id is the key and the hash is the value. It is also stored in hash_dict, where the hash is the key and the contents of the element are the value.
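The hashing step can be illustrated with a minimal sketch. It is not the ebu_tt_live implementation: elements are modelled as plain dictionaries, and the names replace_none, element_hash, collate_unique_vals and NONE_PLACEHOLDER are hypothetical stand-ins. The choice of placeholder character, and keeping the first element seen for each hash, are assumptions made for illustration.

```python
# Hypothetical stand-ins for illustration only; not the ebu_tt_live code.
NONE_PLACEHOLDER = "\x00"  # assumed placeholder: a character not legal in XML attribute values


def replace_none(value):
    """Swap a missing attribute for a placeholder that no real value can equal."""
    return NONE_PLACEHOLDER if value is None else value


def element_hash(attributes, attribute_names):
    """Hash the attributes in a fixed order, leaving out xml:id."""
    return hash(tuple(
        (name, replace_none(attributes.get(name)))
        for name in attribute_names
        if name != "xml:id"
    ))


def collate_unique_vals(elements, attribute_names):
    """Build old_id_dict (xml:id -> hash) and hash_dict (hash -> element)."""
    old_id_dict = {}
    hash_dict = {}
    for element in elements:
        digest = element_hash(element, attribute_names)
        old_id_dict[element["xml:id"]] = digest
        # Assumption: the first element seen for a given hash is the one kept.
        hash_dict.setdefault(digest, element)
    return old_id_dict, hash_dict


styles = [
    {"xml:id": "s1", "tts:color": "white", "tts:fontSize": None},
    {"xml:id": "s2", "tts:color": "white", "tts:fontSize": None},
    {"xml:id": "s3", "tts:color": "red", "tts:fontSize": "100%"},
]
old_id_dict, hash_dict = collate_unique_vals(
    styles, ["xml:id", "tts:color", "tts:fontSize"]
)
```

In this sketch s1 and s2 share a hash because they differ only in xml:id, so hash_dict holds two entries while old_id_dict still has one entry per original xml:id.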

The list() is then passed through the ebu_tt_live.node.deduplicator.DeDuplicatorNode.AppendNewElements() function, which takes the list of elements, the path to the parent element (styling or layout), old_id_dict, new_id_dict and hash_dict.

The function iterates through the key-value pairs of hash_dict and the contents of the list of elements; where the xml:id of both matches, the element is appended to the parent element. The hash and xml:id are then stored in new_id_dict, where the hash is the key and the xml:id is the value.
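As a rough sketch of this matching step, assuming elements are plain dictionaries, the parent is a plain list, and the hashes are readable placeholder strings: the function name append_new_elements and its reduced signature are hypothetical and do not reproduce the real AppendNewElements() function.

```python
def append_new_elements(elements, parent, new_id_dict, hash_dict):
    """Append each unique element to parent and record hash -> xml:id."""
    for digest, unique_element in hash_dict.items():
        for element in elements:
            # Only the element whose xml:id matches the stored one is kept.
            if element["xml:id"] == unique_element["xml:id"]:
                parent.append(element)
                new_id_dict[digest] = element["xml:id"]


# Illustrative data; "hash-white" and "hash-red" stand in for real hash values.
elements = [
    {"xml:id": "s1", "tts:color": "white"},
    {"xml:id": "s2", "tts:color": "white"},
    {"xml:id": "s3", "tts:color": "red"},
]
hash_dict = {"hash-white": elements[0], "hash-red": elements[2]}

parent = []       # stands in for the styling or layout element
new_id_dict = {}
append_new_elements(elements, parent, new_id_dict, hash_dict)
```

Here the duplicate s2 is never appended, so the parent ends up holding only s1 and s3, and new_id_dict maps each hash to the xml:id that survived.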

In the final step, before the document is emitted, it is passed through the ebu_tt_live.node.deduplicator.ReplaceStylesAndRegions class. This class uses RecursiveOperation to walk the document recursively and fix the style and region attribute values. Wherever a style or region attribute has been declared, the reference to the old xml:id is replaced with the new one: the old xml:id is matched to its key in old_id_dict to find the hash, and the hash is then matched to its key in new_id_dict to get the new xml:id. Where an attribute references multiple styles, duplicate references are also removed while the order in which they were declared is maintained.
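The two-step lookup can be sketched as follows. This is an illustration under stated assumptions rather than the ReplaceStylesAndRegions code: replace_references is a hypothetical helper, the hashes are placeholder strings, the recursive traversal is left out, and keeping the first occurrence of a repeated reference is an assumption.

```python
def replace_references(references, old_id_dict, new_id_dict):
    """Map old xml:id references to new ones, dropping repeats in declared order."""
    result = []
    for old_id in references:
        digest = old_id_dict[old_id]   # old xml:id -> hash
        new_id = new_id_dict[digest]   # hash -> new xml:id
        # Assumption: the first occurrence of a repeated reference is kept.
        if new_id not in result:
            result.append(new_id)
    return result


# Illustrative data; "hash-white" and "hash-red" stand in for real hash values.
old_id_dict = {"s1": "hash-white", "s2": "hash-white", "s3": "hash-red"}
new_id_dict = {"hash-white": "s1", "hash-red": "s3"}

# An element that referenced styles "s2 s3 s1" before deduplication.
style_refs = replace_references(["s2", "s3", "s1"], old_id_dict, new_id_dict)
```

In this example s2 and s1 both resolve to s1, so the three original references collapse to s1 followed by s3.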