Grouping of Update operation in Delta Report

Grouping create and remove operations in the delta report is fairly straightforward, because these operations do not lead to complex scenarios. If a data node is added, all its child nodes are also added. Likewise, if a parent node is removed, all its child nodes are also removed. Similarly, combinations of operations on a parent and its child, such as create+update, create+remove, remove+update and remove+create, are not possible: once a parent is added or removed, its child cannot separately be added, removed or updated. Hence, grouping data nodes that are either added or removed is simpler.
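Because an added or removed data node always brings its whole subtree with it, one way to group these operations is to keep only the top-most added (or removed) node and let its descendants travel inside it. The following is a minimal sketch, not the CPS implementation. It assumes data nodes are identified by slash-separated xpaths, and the function name `top_level_only` and the example xpaths are hypothetical.

```python
def top_level_only(xpaths):
    """Return the xpaths that have no ancestor in the same collection.

    Sorting guarantees that an ancestor (a proper prefix) is visited before
    any of its descendants, so each descendant is checked against the
    ancestors that have already been kept.
    """
    kept = []
    for xpath in sorted(xpaths):
        if not any(xpath.startswith(ancestor + "/") for ancestor in kept):
            kept.append(xpath)
    return kept


# Hypothetical xpaths that exist only in the target data (i.e. were created).
created = {
    "/parent/new-child",
    "/parent/new-child/grandchild",
    "/parent/other-child",
}
print(top_level_only(created))  # ['/parent/new-child', '/parent/other-child']
```

Each surviving xpath would become one create entry carrying its complete subtree; the same helper applies unchanged to removed xpaths.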

The same does not apply to the update operation, because several complex scenarios can occur when data nodes are updated. This document discusses a few of these scenarios and how to handle them.

Core Concept

The basic idea is to report every updated data node individually. There are two main reasons for this:

  • To find the values that have been updated between two data nodes, each leaf node under the source data node is accessed and compared to the corresponding leaf node under the target data node. To make this comparison easier, the algorithm first flattens all the data nodes into a linear structure, and in doing so all information about each data node's relation to other data nodes is lost.

  • The algorithm then matches source and target data nodes based on their xpaths.

  • When it finds a source and a target data node with the same xpath, it fetches the individual leaf nodes of each data node. These leaves are then compared, and if any changes are detected, the changed leaves are added to the delta report.
    Since the data is accessed at a low level (leaf nodes) to generate the delta report, it becomes impossible to reconstruct the data node and represent the updated data in the original JSON/XML structure. Therefore, the best approach is to report the delta of each updated node individually, regardless of whether the nodes are in a parent-child relationship.

  • Reconstructing the data node to represent the original data structure would require additional data manipulation, which is not suitable because it could introduce discrepancies in the generated delta.

  • The process is also described in the flowchart below.
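The comparison steps above (match data nodes by xpath, compare their leaves, emit one entry per changed data node) can be sketched as follows. This is a minimal illustration, not the CPS code. It assumes the data nodes have already been flattened into a dictionary keyed by xpath, and the entry fields `action`, `xpath`, `source-data` and `target-data`, as well as the function name `compare_flattened`, are hypothetical.

```python
def compare_flattened(source, target):
    """Compare two flattened data sets of the form {xpath: {leaf: value}}.

    Data nodes are matched by xpath; for each match, only the leaves whose
    values differ are reported, producing one update entry per data node.
    """
    delta = []
    for xpath in sorted(source.keys() & target.keys()):
        src_leaves, tgt_leaves = source[xpath], target[xpath]
        changed = sorted(
            name for name in src_leaves.keys() | tgt_leaves.keys()
            if src_leaves.get(name) != tgt_leaves.get(name)
        )
        if changed:
            delta.append({
                "action": "update",
                "xpath": xpath,
                "source-data": {n: src_leaves[n] for n in changed if n in src_leaves},
                "target-data": {n: tgt_leaves[n] for n in changed if n in tgt_leaves},
            })
    return delta


# The common starting data, flattened by hand (hypothetical xpaths).
source = {
    "/parent": {"leaf-1": "leaf-1 data", "leaf-2": "leaf-2 data"},
    "/parent/child": {"child-1 data": "child-1 data", "child-2 data": "child-2 data"},
}
target = {
    "/parent": {"leaf-1": "leaf-1 data", "leaf-2": "leaf-2 updated data"},
    "/parent/child": {"child-1 data": "child-1 updated data", "child-2 data": "child-2 data"},
}
for entry in compare_flattened(source, target):
    print(entry)
```

Because the entries are keyed only by xpath, nothing in the output records that `/parent/child` sits inside `/parent`, which is why updated nodes are reported individually.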

All of the following scenarios use this source and target data as a common starting point:

Source Data:

[ { "parent": { "leaf-1": "leaf-1 data", "leaf-2": "leaf-2 data", "child": { "child-1 data": "child-1 data", "child-2 data": "child-2 data" } } } ]

Target Data:

[ { "parent": { "leaf-1": "leaf-1 data", "leaf-2": "leaf-2 updated data", "child": { "child-1 data": "child-1 updated data", "child-2 data": "child-2 data" } } } ]

Mechanism for comparison

Flattening of Data Nodes before comparing

Flowchart: Mechanism for comparison of Data Nodes to check for updates
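The flattening step can be illustrated with a short sketch that turns the common source data into a map from each node's xpath to that node's own leaves. It is an assumption-based illustration: the xpath format `/parent/child` and the helpers `flatten` and `flatten_all` are hypothetical, and the actual implementation may operate on data node objects rather than raw JSON.

```python
import json


def flatten(node, xpath=""):
    """Flatten a nested JSON object into {xpath: {leaf name: leaf value}}.

    Nested objects become their own entries; every other value (including
    lists, in this sketch) is kept as a leaf of the object containing it.
    """
    flat = {}
    leaves = {}
    for name, value in node.items():
        if isinstance(value, dict):
            flat.update(flatten(value, f"{xpath}/{name}"))
        else:
            leaves[name] = value
    if xpath:  # leaves placed directly on the unnamed root are ignored here
        flat[xpath] = leaves
    return flat


def flatten_all(data):
    """Flatten every top-level object of a JSON array into one mapping."""
    flat = {}
    for item in data:
        flat.update(flatten(item))
    return flat


source_json = (
    '[ { "parent": { "leaf-1": "leaf-1 data", "leaf-2": "leaf-2 data", '
    '"child": { "child-1 data": "child-1 data", "child-2 data": "child-2 data" } } } ]'
)
for xpath, leaves in sorted(flatten_all(json.loads(source_json)).items()):
    print(xpath, leaves)
```

The nested `child` object ends up as a sibling entry of `/parent` in the flat map, so its position in the original tree survives only as a prefix of its xpath.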

 

Scenario 1: Parent is updated; child is updated

If a parent and its child are both updated, they should be reported as two separate update operations. This is because, as explained above, the parent-child relation between two updated data nodes cannot be maintained when comparing them.
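Applied to the common starting data, a minimal sketch of the per-node approach yields two independent update entries, one for `/parent` (leaf-2 changed) and one for `/parent/child` (child-1 data changed), rather than one nested entry. The helper names and the entry shape are hypothetical, and update detection is simplified to comparing each node's leaves as a whole.

```python
import json


def flatten(node, xpath=""):
    """Map each nested object's xpath to its own scalar leaves."""
    flat, leaves = {}, {}
    for name, value in node.items():
        if isinstance(value, dict):
            flat.update(flatten(value, f"{xpath}/{name}"))
        else:
            leaves[name] = value
    if xpath:
        flat[xpath] = leaves
    return flat


def updated_nodes(source_data, target_data):
    """Return one update entry per data node whose leaves changed."""
    source, target = {}, {}
    for item in source_data:
        source.update(flatten(item))
    for item in target_data:
        target.update(flatten(item))
    return [
        {"action": "update", "xpath": xpath}
        for xpath in sorted(source.keys() & target.keys())
        if source[xpath] != target[xpath]
    ]


source_data = json.loads(
    '[ { "parent": { "leaf-1": "leaf-1 data", "leaf-2": "leaf-2 data", '
    '"child": { "child-1 data": "child-1 data", "child-2 data": "child-2 data" } } } ]'
)
target_data = json.loads(
    '[ { "parent": { "leaf-1": "leaf-1 data", "leaf-2": "leaf-2 updated data", '
    '"child": { "child-1 data": "child-1 updated data", "child-2 data": "child-2 data" } } } ]'
)
print(updated_nodes(source_data, target_data))
```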

Source Data

Target Data

[ { "parent": { "leaf-1": "leaf-1 data", "leaf-2": "leaf-2 data", "child": { "child-1 data": "child-1 data", "child-2 data": "child-2 data" } } } ]

Scenario 2: Parent is updated, child is added or removed

If a parent node is updated and its child nodes are added or removed, each node must be reported separately and grouped according to its operation. For example, if a parent is updated and two child nodes are added, the delta report should contain two entries: first, an update entry containing the details of the updated parent node, and second, a create entry containing both child nodes grouped together.
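The grouping rule for this scenario can be sketched as follows: updated nodes each get their own entry, while added or removed nodes are collected into one entry per operation. Grouping the added or removed nodes under their common parent xpath is an assumption made for this example, as are the helper names, the child xpaths and the entry fields. The input is flattened data keyed by xpath.

```python
from collections import defaultdict


def top_level(xpaths):
    """Drop xpaths whose ancestor is also in the set (it carries its subtree)."""
    kept = []
    for xpath in sorted(xpaths):
        if not any(xpath.startswith(ancestor + "/") for ancestor in kept):
            kept.append(xpath)
    return kept


def build_delta(source, target):
    """Build a delta from flattened data of the form {xpath: {leaf: value}}."""
    delta = []
    # Updated nodes: one entry per data node, never grouped.
    for xpath in sorted(source.keys() & target.keys()):
        if source[xpath] != target[xpath]:
            delta.append({"action": "update", "xpath": xpath})
    # Added/removed nodes: grouped under their common parent xpath.
    for action, xpaths in (("create", target.keys() - source.keys()),
                           ("remove", source.keys() - target.keys())):
        groups = defaultdict(list)
        for xpath in top_level(xpaths):
            groups[xpath.rsplit("/", 1)[0]].append(xpath)
        for parent in sorted(groups):
            delta.append({"action": action, "parent": parent, "nodes": groups[parent]})
    return delta


source = {"/parent": {"leaf-2": "leaf-2 data"}}
target = {
    "/parent": {"leaf-2": "leaf-2 updated data"},
    "/parent/child-a": {"name": "a"},
    "/parent/child-b": {"name": "b"},
}
for entry in build_delta(source, target):
    print(entry)
```

Swapping the arguments turns the same two children into a single grouped remove entry.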

Source Data

Target Data

Scenario 3: Parent is updated, child 1 is added or removed, child 2 is updated (Siblings scenario) 

If the parent and one of its children are updated while another child is added or removed, the parent and the updated child are reported individually as two update operations, while the added or removed child is reported as a separate create or remove operation.
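For the sibling scenario, a minimal sketch that classifies each flattened data node shows how the parent and the updated child end up as two separate update entries while the added child forms its own create entry. The child xpaths such as `/parent/child-1` and the helper `classify` are hypothetical.

```python
def classify(source, target):
    """Assign an operation to every xpath in flattened {xpath: {leaf: value}} data."""
    operations = {}
    for xpath in sorted(source.keys() | target.keys()):
        if xpath not in source:
            operations[xpath] = "create"
        elif xpath not in target:
            operations[xpath] = "remove"
        elif source[xpath] != target[xpath]:
            operations[xpath] = "update"
        # Unchanged nodes get no operation and do not appear in the delta.
    return operations


source = {
    "/parent": {"leaf-2": "leaf-2 data"},
    "/parent/child-2": {"value": "old"},
}
target = {
    "/parent": {"leaf-2": "leaf-2 updated data"},
    "/parent/child-1": {"value": "new"},
    "/parent/child-2": {"value": "changed"},
}
operations = classify(source, target)
updates = [x for x, op in operations.items() if op == "update"]
creates = [x for x, op in operations.items() if op == "create"]
print("update entries:", updates)  # each reported on its own
print("create entry:", creates)    # grouped into a single entry
```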

Scenario 4: Parent is updated, child remains unchanged, grandchild is updated

This scenario further justifies reporting every update operation separately. If a parent and its grandchild are updated while the child node remains unchanged, the delta report contains no information about the child node, because it was not updated. Without the child node, the parent-child-grandchild relation and the original JSON/XML structure cannot be maintained in the delta report, which makes grouping impossible in this scenario.
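The gap can be made concrete with a small sketch: after comparing flattened data, the unchanged child has no entry, so the updated grandchild's direct parent is missing from the delta and there is nothing to nest it under. The helper names `updated_xpaths` and `orphaned` and the grandchild data are hypothetical.

```python
def updated_xpaths(source, target):
    """Xpaths of data nodes present in both data sets whose leaves differ."""
    return [x for x in sorted(source.keys() & target.keys()) if source[x] != target[x]]


def orphaned(updated):
    """Updated non-top-level nodes whose direct parent is not itself in the delta."""
    return [
        x for x in updated
        if x.count("/") > 1 and x.rsplit("/", 1)[0] not in updated
    ]


source = {
    "/parent": {"leaf-1": "leaf-1 data"},
    "/parent/child": {"child-1 data": "child-1 data"},
    "/parent/child/grandchild": {"value": "old"},
}
target = {
    "/parent": {"leaf-1": "leaf-1 updated data"},
    "/parent/child": {"child-1 data": "child-1 data"},  # unchanged
    "/parent/child/grandchild": {"value": "new"},
}
updated = updated_xpaths(source, target)
print(updated)            # ['/parent', '/parent/child/grandchild']
print(orphaned(updated))  # ['/parent/child/grandchild']
```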