Commit 7826035

Improve forking
1 parent 8c45fc2 commit 7826035

7 files changed

Lines changed: 192 additions & 152 deletions

CHANGELOG.md

Lines changed: 9 additions & 2 deletions
@@ -1,7 +1,14 @@
11
# Changelogs
22

3-
## Version 1.2.0
4-
(Released : April 30, 2021) : tag 1.2.0
3+
## Version 1.2.2
4+
(Released : April 30, 2021) : tag 1.2.2
5+
* Improve forking mechanism to match with sub_graph
6+
* Improve fork/sub_graph documentation
7+
* Forward kwargs on standard visitor to be able to use BaseVisitor constructor
8+
* Add replace node utility
9+
10+
## Version 1.2.0 / Version 1.2.1
11+
(Released : April 30, 2021) : tag 1.2.0/1.2.1
512
* Add sub_graph
613
* Fix bug (forced to set an id on GraphNode constructor)
714
* Fix bug (standalone node without graph visitation failed)

README.md

Lines changed: 50 additions & 18 deletions
@@ -18,8 +18,8 @@ It provide:
1818
* [Template Example](#template-example)
1919
* [Graph creation](#graph-creation)
2020
* [Graph Node](#graph-node)
21-
* [Fork mechanism](#fork)
2221
* [Sub graph](#sub-graph)
22+
* [Fork mechanism](#fork)
2323
* [Visitors](#visitors)
2424
* [Visitor hook](#abstract-visitor-hooks)
2525
* [Standard visitors](#standard-visitor-provided-by-freexgraph)
@@ -212,6 +212,52 @@ execution_graph.add_node(GraphNode(id_graph, parents={id1}, graph=inner_graph))
212212
execution_graph.add_node(NodeForTest(idc, parents={id1, id_graph}))
213213
```
214214

215+
### Sub-Graph
216+
217+
A method exists in the `FreExGraph` class to extract a sub graph from the graph.
218+
The signature is the following:
219+
```python
220+
def sub_graph(
221+
self, from_node_id: str, to_nodes_id: Optional[List[str]] = None
222+
) -> FreExGraph:
223+
```
224+
225+
Provide a node from which the sub graph starts and, optionally, a list of nodes at which the sub graph should end. If no node matching the ``to_nodes_id`` is encountered, the sub graph extends to the leaf nodes of the graph.
226+
The sub graph nodes are deep copies of the nodes of the initial graph, so modifying a sub graph does not affect the original graph.
227+
This feature may be used as a fork mechanism (see the Fork section). It is easier to manipulate and monitor.
228+
229+
Usage Example:
230+
231+
```python
233+
# id0
234+
# |
235+
# .___ id1 ___.______.
236+
# / | \ \
237+
# id2 id3 id4 ida
238+
# | / \ |
239+
# | id5 id6 |
240+
# \ \ / /
241+
# `---- id7 ----'
242+
# / \
243+
# id8 id9
244+
#
245+
246+
# If we extract the sub graph starting at node id1 and ending at node id7
247+
sub_graph: FreExGraph = execution_graph.sub_graph(from_node_id="id1", to_nodes_id=["id7"])
248+
249+
# The sub graph would be the following
250+
251+
# .___ id1 ___.______.
252+
# / | \ \
253+
# id2 id3 id4 ida
254+
# | / \ |
255+
# | id5 id6 |
256+
# \ \ / /
257+
# `---- id7 ----'
258+
```
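
The traversal rule described above can also be sketched without the library. The snippet below is only an illustration on a plain `{node_id: parent_ids}` dictionary: `sketch_sub_graph` is a hypothetical helper, not part of FreExGraph, and the edges in the usage note are one reading of the diagram.

```python
from typing import Dict, List, Optional, Set


def sketch_sub_graph(
    parents_of: Dict[str, Set[str]],
    from_node_id: str,
    to_nodes_id: Optional[List[str]] = None,
) -> Dict[str, Set[str]]:
    """Hypothetical stand-in for sub_graph on a plain {node: parents} mapping."""
    stop_at = set(to_nodes_id or [])

    kept: Set[str] = set()
    stack = [from_node_id]
    while stack:
        current = stack.pop()
        if current in kept:
            continue
        kept.add(current)
        if current in stop_at:
            # an end node is part of the sub graph, but its children are not followed
            continue
        # children of `current` are the nodes that list it as a parent
        stack.extend(n for n, ps in parents_of.items() if current in ps)

    # build fresh parent sets and drop the links that leave the sub graph
    return {n: {p for p in parents_of[n] if p in kept} for n in kept}
```

With `id1` as the start and `["id7"]` as the end nodes, `id0`, `id8` and `id9` are left out, and `id1` ends up with no parent because `id0` is outside the sub graph. Since new sets are built, changing the result leaves the input mapping untouched.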
259+
260+
215261
### Fork
216262

217263
FreExGraph provides a fork mechanism: an easy way to duplicate a graph from a given node until the end of the graph (or up to a specific node that is used as a join).
@@ -252,27 +298,13 @@ execution_graph.fork_from_node(FreExNode(uid="id4", fork_id="f1"))
252298
```
253299
Obviously, if you want the node id4::f1 (which is of type FreExNode, as requested) to ever be visited by one of your visitors, you should provide your own node implementation.
254300

255-
It is also possible to provide a join node. It will be a node used as join for the fork in order to not have to fork the whole graph from the source of the fork. It is usefull if you have to multiply a big chunk of execution graph because one node has to change some internal values (in experimentation fields, it can be useful for parameter explorations).
301+
It is also possible to provide a join node. The join node marks where the fork ends, so that the whole graph does not have to be forked from the source of the fork.
302+
This is useful if you have to duplicate a large part of the execution graph because one node has to change some internal values (for example, for parameter exploration in experiments).
256303

257-
> **Try avoiding forks** : This is a mecanism that can be useful in certain cases (the main one would be parameter exploration on an experimentation) But when it comes to map reduce for example, it is advised to manually fo the nodes you want instead (improve readibility of what you are doing when making your graph). A chaining of fork can start being very hard to understand for the user.
304+
Under the hood, fork uses `sub_graph` and reconstructs the links of the sub graph directly in the graph (extending the ids of the forked nodes and of their forked parents with the fork id).
258305

259-
But if you want to do a map reduce with a fork, it is do-able by setting the join_id to the `fork_from_node` method. The join_id has to be an existing node on which, for every parents that are part of the fork has only this join node as child.
260306
See [test using this mechanism](https://github.com/FreeYourSoul/FreExGraph/blob/ae707cf0fcb8486bde783cd0c7fe67217a56b3d2/test/fork_test.py#L41-L66) for more details
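
As a rough illustration of the renaming described above, here is a library-free sketch on a plain `{node_id: parent_ids}` dictionary. `sketch_fork` is a hypothetical helper, not the FreExGraph API, and step 3 (linking the join node to the forked branch) is an assumption about what "join" means rather than something confirmed here.

```python
from typing import Dict, Optional, Set


def sketch_fork(
    parents_of: Dict[str, Set[str]],
    forked_id: str,
    fork_id: str,
    join_id: Optional[str] = None,
) -> Dict[str, Set[str]]:
    """Hypothetical illustration of a fork on a plain {node: parents} mapping."""
    # 1. collect the nodes reachable from forked_id, stopping before join_id
    to_copy: Set[str] = set()
    stack = [forked_id]
    while stack:
        current = stack.pop()
        if current in to_copy or current == join_id:
            continue
        to_copy.add(current)
        stack.extend(n for n, ps in parents_of.items() if current in ps)

    def renamed(node_id: str) -> str:
        return f"{node_id}::{fork_id}"

    # 2. duplicate them as "<id>::<fork_id>": parents inside the fork are
    #    renamed as well, parents outside the fork are kept as they are
    result = {n: set(ps) for n, ps in parents_of.items()}
    for node in to_copy:
        result[renamed(node)] = {
            renamed(p) if p in to_copy else p for p in parents_of[node]
        }

    # 3. (assumption) the join node also depends on the forked copies
    if join_id is not None:
        result[join_id] |= {renamed(p) for p in parents_of[join_id] if p in to_copy}
    return result
```

Forking `id2` with fork id `f1` and join `id4` in a diamond `id1 -> (id2, id3) -> id4` would add a single node `id2::f1` whose parent is still `id1`, since `id1` is outside the forked part.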
261307

262-
### Sub-Graph
263-
264-
A method exist in the `FreExGraph` class in order to make sub graph out of the graph.
265-
The signature is the following:
266-
```python
267-
def sub_graph(
268-
self, from_node_id: str, to_nodes_id: Optional[List[str]] = None
269-
) -> FreExGraph:
270-
```
271-
272-
Providing a node to start the subgraph and optionally a set of node where you want the sub graph to end. If no node matching the ``to_nodes_id`` is encountered, the subgraph go until the leaf nodes of the graph.
273-
274-
This feature may be used as a fork mechanism (as seen above). It is easier to manipulate and monitor.
275-
276308
## Visitors
277309

278310
### Abstract Visitor hooks

freexgraph/freexgraph.py

Lines changed: 77 additions & 99 deletions
@@ -20,8 +20,9 @@
2020
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
2121
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
2222
# SOFTWARE.
23-
from copy import copy
24-
from typing import Optional, Union, Set, List, Any
23+
24+
from copy import copy, deepcopy
25+
from typing import Optional, Union, Set, List, Any, Tuple
2526

2627
import networkx as nx
2728

@@ -37,7 +38,7 @@ class FreExNode:
3738
"""Representation of the content of a node in the execution graph """
3839

3940
parents: Set[str]
40-
"""Parents of the node to add """
41+
"""Parents of the node to add"""
4142

4243
extension_node: bool
4344
""" """
@@ -121,6 +122,9 @@ def graph_ref(self) -> nx.DiGraph:
121122
return self._graph_ref
122123

123124

125+
RemovedParent = Tuple[FreExNode, Set[str]]
126+
127+
124128
class RootNode(FreExNode):
125129
pass
126130

@@ -163,29 +167,6 @@ def _remove_duplicated_node(nodes: List[FreExNode]) -> List[FreExNode]:
163167
return filtered_list
164168

165169

166-
def _continue_fork(
167-
join_node_id: Optional[str], node: FreExNode, successors: List[str]
168-
) -> bool:
169-
"""check that the fork has to continue for the given node
170-
:return: return true if the next is not the join node and thus fork continue, False otherwise
171-
"""
172-
if join_node_id is None:
173-
return True
174-
if node.id == join_node_id:
175-
return False
176-
177-
assert (
178-
len(successors) > 0
179-
), f"Fork Join error {join_node_id}: node {node.id} reached (doesn't link with the join node)"
180-
181-
if any([join_node_id in ss for ss in successors]):
182-
assert (
183-
len(successors) == 1
184-
), f"Fork Join error {join_node_id} : all element from a fork should be joining uniquely the fork"
185-
return False
186-
return True
187-
188-
189170
class FreExGraph:
190171
"""Execution Graph main class"""
191172

@@ -201,6 +182,8 @@ def __init__(self):
201182
@staticmethod
202183
def _make_node_id_with_fork(node_id: str, fork_id: str) -> str:
203184
"""make a unique id for the new fork"""
185+
if node_id == root_node:
186+
return node_id
204187
return f"{node_id}::{fork_id}"
205188

206189
def add_nodes(self, nodes: List[AnyFreExNode]) -> None:
@@ -290,14 +273,20 @@ def get_node(self, node_id: str) -> Optional[AnyFreExNode]:
290273
return self._graph.nodes[node_id]["content"]
291274

292275
def sub_graph(
293-
self, from_node_id: str, to_nodes_id: Optional[List[str]] = None
294-
) -> "FreExGraph":
276+
self,
277+
from_node_id: str,
278+
to_nodes_id: Optional[List[str]] = None,
279+
return_removed_parents: bool = False,
280+
) -> Union["FreExGraph", Tuple["FreExGraph", List[RemovedParent]]]:
295281
"""Utility method to retrieve a subgraph from a given node until the end of the graph or until one of the
296282
provided node is encountered.
297283
298284
:param from_node_id: node from which the sub graph start
299285
:param to_nodes_id: nodes on which the sub graph stop, if none encountered, subgraph go until the leaf nodes
300-
:return: a sub graph delimited by the provided nodes id
286+
:param return_removed_parents: false by default; if set to true, a second value is returned that contains the
287+
parent links removed from the nodes of the sub graph
288+
:return: a sub graph delimited by the provided nodes id; if return_removed_parents is set to true, also return a
289+
list of tuples of the form (node_that_got_parents_deleted, deleted_parent_links_set)
301290
"""
302291
from_node: FreExNode = self.get_node(from_node_id)
303292
assert (
@@ -310,7 +299,7 @@ def sub_graph(
310299
def add_node_in_subgraph(current_node: FreExNode):
311300
if current_node.id in nodes_in_subgraph_id:
312301
return
313-
nodes_in_subgraph.append(current_node)
302+
nodes_in_subgraph.append(deepcopy(current_node))
314303
nodes_in_subgraph_id.add(current_node.id)
315304
if to_nodes_id is not None and current_node.id in to_nodes_id:
316305
return
@@ -324,12 +313,20 @@ def add_node_in_subgraph(current_node: FreExNode):
324313

325314
add_node_in_subgraph(from_node)
326315

316+
saved_removal: List[RemovedParent] = []
317+
327318
# cleanup parents
328319
for n in nodes_in_subgraph:
320+
if return_removed_parents:
321+
saved_removal.append(
322+
(n, {p for p in n.parents if p not in nodes_in_subgraph_id})
323+
)
329324
n.parents = {p for p in n.parents if p in nodes_in_subgraph_id}
330325

331326
sub_graph = FreExGraph()
332327
sub_graph.add_nodes(nodes_in_subgraph)
328+
if return_removed_parents:
329+
return sub_graph, saved_removal
333330
return sub_graph
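
The parent cleanup with `return_removed_parents` can be pictured with a small stand-alone sketch. It works on a plain `{node_id: parent_ids}` dictionary and node ids instead of `FreExNode` objects; `sketch_cleanup_parents` and `RemovedParentIds` are hypothetical names used only for this illustration.

```python
from typing import Dict, List, Set, Tuple

# (node id, ids of the parents that were cut because they are outside the sub graph)
RemovedParentIds = Tuple[str, Set[str]]


def sketch_cleanup_parents(
    parents_of: Dict[str, Set[str]], kept: Set[str]
) -> Tuple[Dict[str, Set[str]], List[RemovedParentIds]]:
    """Keep only parents inside `kept` and remember the ones that were dropped."""
    cleaned: Dict[str, Set[str]] = {}
    removed: List[RemovedParentIds] = []
    for node_id in sorted(kept):
        inside = {p for p in parents_of[node_id] if p in kept}
        outside = parents_of[node_id] - inside
        cleaned[node_id] = inside
        # one entry per node, even when nothing was removed
        removed.append((node_id, outside))
    return cleaned, removed
```

Keeping the removed links next to the cleaned ones is what lets a caller restore the connections to the rest of the graph later on.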
334331

335332
def fork_from_node(
@@ -345,21 +342,8 @@ def fork_from_node(
345342
> It is the user responsibility to ensure that those id doesn't collide.
346343
347344
If the provided join_node doesn't exist, an exception is thrown.
348-
If a join node is provided, all node from the provided one to the join node is duplicated. All last forked node
349-
will depend on the join_node. The join node HAS to be the only node linking all node to be forked.
350-
351-
example:
352-
353-
||
354-
. id1 .
355-
// \\
356-
id2 id3
357-
\\ //
358-
` id4 ` id5
359-
||
360-
361-
With the graph above. if we fork from id1 and id4 as join, it works as id2 and id3 are joined on it. But if id5
362-
where to be linked with id3 or id2, as id4 wouldn't be the only link possible. An assertion error would arise.
345+
If a join node is provided, all nodes from the provided one until the join node is encountered are duplicated. If
346+
the join node is not encountered, nodes are duplicated until the leaves.
363347
364348
side_note:
365349
':' is used as a separator for the id and the fork_id to ensure a unique name. This is the reason why '::'
@@ -374,69 +358,63 @@ def fork_from_node(
374358
assert self._graph.has_node(
375359
forked_node.id
376360
), f"Error fork of node {forked_node.id}, node to fork has to be in the execution graph"
377-
assert not isinstance(
378-
self.get_node(forked_node.id), GraphNode
379-
), f"Error fork of node {forked_node.id}: cannot fork a graph node"
380361
assert (
381362
forked_node.fork_id
382363
), f"Error fork of node {forked_node.id}: doesn't have fork_id"
383364
assert join_id is None or self._graph.has_node(
384365
join_id
385366
), f"Error fork of node {forked_node.id} with join_id {join_id}: join_id node doesn't exist in graph "
386367

387-
# list of all node that will need to be created (all are created at once with add_nodes)
388-
all_forked_nodes_to_add: List[FreExNode] = []
389-
390-
# recursive internal function to copy a node and then copying all children
391-
def fork_node(id_new_forked_node: str, to_fork: FreExNode) -> None:
392-
new_node = type(to_fork)(
393-
uid=id_new_forked_node,
394-
parents=(
395-
to_fork.parents
396-
if to_fork.id != forked_node.id
397-
else self._graph.nodes[forked_node.id]["content"].parents
398-
),
399-
graph_ref=forked_node.graph_ref,
400-
)
401-
new_node._fork_id = forked_node.fork_id
402-
new_node._depth = to_fork.depth
403-
all_forked_nodes_to_add.append(new_node)
404-
all_suc = list(self._graph.successors(to_fork.id))
405-
if not _continue_fork(join_id, to_fork, all_suc):
406-
return
407-
for successor in all_suc:
408-
n = self.get_node(successor)
409-
id_next_fork = self._make_node_id_with_fork(n.id, forked_node.fork_id)
410-
if not self._graph.has_node(id_next_fork):
411-
fork_node(id_next_fork, n)
412-
413-
fork_node(
414-
self._make_node_id_with_fork(forked_node.id, forked_node.fork_id),
415-
forked_node,
368+
join_node_list = [join_id] if join_id else []
369+
sub_graph, removed_parents = self.sub_graph(
370+
from_node_id=forked_node.id,
371+
to_nodes_id=join_node_list,
372+
return_removed_parents=True,
416373
)
417-
all_forked_nodes_to_add = _remove_duplicated_node(all_forked_nodes_to_add)
418-
419-
# update parents links of all forks (to target their homologue forked and not the root one)
420-
all_forked_id = [n.id for n in all_forked_nodes_to_add]
421-
for node in all_forked_nodes_to_add:
422-
if node.id != forked_node.id:
423-
forked_parents = set()
424-
for p in node.parents:
425-
# if parent is in the list of node that has been forked, add the fork name
426-
if any([n.startswith(p) for n in all_forked_id]):
427-
forked_parents.add(
428-
self._make_node_id_with_fork(p, forked_node.fork_id)
429-
)
430-
else:
431-
forked_parents.add(p)
432-
node.parents = forked_parents
433-
434-
self.add_nodes(all_forked_nodes_to_add)
435-
436-
# Add links on the join node if any is set
374+
sub_graph._graph.remove_node(root_node)
437375
if join_id is not None:
438-
for node_id in all_forked_id:
439-
self._graph.add_edge(node_id, join_id)
376+
sub_graph._graph.remove_node(join_id)
377+
378+
# renaming all nodes (and remove useless root_node)
379+
for node_to_fork_rename in [
380+
sub_graph.get_node(n) for n in sub_graph.graph.nodes
381+
]:
382+
if root_node in node_to_fork_rename.parents:
383+
initial_graph_node = self.get_node(node_to_fork_rename.id)
384+
if root_node not in initial_graph_node.parents:
385+
node_to_fork_rename.parents.remove(root_node)
386+
387+
node_to_fork_rename._fork_id = forked_node.fork_id
388+
node_to_fork_rename._id = self._make_node_id_with_fork(
389+
node_to_fork_rename.id, forked_node.fork_id
390+
)
391+
node_to_fork_rename.parents = {
392+
self._make_node_id_with_fork(p, forked_node.fork_id)
393+
for p in node_to_fork_rename.parents
394+
}
395+
396+
# redo linking
397+
for node, removed_parents in removed_parents:
398+
node.parents.update(removed_parents)
399+
400+
self.add_nodes([sub_graph.get_node(n) for n in sub_graph.graph.nodes])
401+
402+
def replace_node(self, to_replace: FreExNode) -> None:
403+
"""Replace the node defined by the provided node id with the provided node, keeping the important data needed
404+
for the graph consistency between the initial node and the new one.
405+
406+
:param to_replace: node to replace (use id property in order to retrieve the proper node to replace)
407+
"""
408+
previous_node = self.get_node(to_replace.id)
409+
assert (
410+
previous_node is not None
411+
), f"Cannot replace node {to_replace.id} that is not in the graph"
412+
413+
to_replace._fork_id = previous_node.fork_id
414+
to_replace._graph_ref = previous_node.graph_ref
415+
to_replace._depth = previous_node.depth
416+
to_replace.parents = previous_node.parents
417+
self._graph.nodes[to_replace.id]["content"] = to_replace
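
The idea behind `replace_node` (swap the node content while keeping the bookkeeping that ties it to the graph) can be sketched on a plain dictionary. `SketchNode` and `sketch_replace_node` are hypothetical stand-ins written for this illustration, not the library's classes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class SketchNode:
    """Hypothetical, simplified node: graph bookkeeping plus a user payload."""

    uid: str
    parents: Set[str] = field(default_factory=set)
    depth: int = 0
    fork_id: Optional[str] = None
    payload: str = ""


def sketch_replace_node(nodes: Dict[str, SketchNode], replacement: SketchNode) -> None:
    """Replace the node sharing replacement.uid, keeping its graph bookkeeping."""
    previous = nodes.get(replacement.uid)
    if previous is None:
        raise KeyError(f"Cannot replace node {replacement.uid} that is not in the graph")
    # carry over what keeps the graph consistent, the payload comes from the new node
    replacement.parents = previous.parents
    replacement.depth = previous.depth
    replacement.fork_id = previous.fork_id
    nodes[replacement.uid] = replacement
```

Only the payload of the new node survives in this sketch; parents, depth and fork id are taken from the node being replaced.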
440418

441419
def __find_current_depth(self, parents: Set[str]) -> int:
442420
"""
