GraphABC

by Matteo Dell'Amico

This is meant to be a proposal for an Abstract Base Class representing graphs. There are lots of graph libraries for Python, and many people think it would be good if they were able to talk to each other. From Python versions 2.6 and 3.0, Abstract Base Classes are the way to represent objects having a common interfaces in Python.

Please consider this as a rough draft: the intent is defining a clean interface. Please comment! My email is della at linux dot it.

The Rationale

A graph is, mathematically, a set V of nodes (or vertices) and a set E of edges contained in V*V. In a very common case, edges are labeled.

With the new abstract base classes in the collections module of the standard library, it is easy to implement collections with a rich interface. The proposal is to exploit this by defining graph nodes as a set and graph edges as a mapping from pairs of nodes to edge labels. The two containers are available as attributes for all instances of the Graph class, and they are paired so that the invariant "E is a subset of V*V" is maintained. In particular, removing nodes also removes all incident edges; adding edges also ensures that the corresponding nodes are added.

A single node can be any Python object; edges are represented by pairs of nodes for directed graphs and unordered pairs (i.e., frozensets) for undirected graphs.

The advantages of such a representation are:

Python users are already familiar with the set and mapping semantics, so the API, while rich, is easy to graphs;

NodesView and EdgesView are easy to implement, since most of the methods are implemented as mixin: an implementor only needs to write NodesView.__contains__, NodesView.__len__, NodesView.__iter__, EdgesView.__contains__, EdgesView.__len__, EdgesView.__iter__, EdgesView.__getitem__. For mutable graphs, NodesView.add, NodesView.discard, EdgesView.__setitem__, EdgesView.__delitem__ are needed as well.

Interface Details

A common Graph class is specialized by two possible instances: Directed and Undirected. All graphs should have two attributes: nodes and edges, respectively of type Graph.NodesView and Graph.EdgesView. Graph.NodesView is a subclass of collections.Set, and has the same abstract and mixin methods. Graph.EdgesView is a subclass of collections.Set, and provides the additional mixin methods add and discard. MutableGraph subclasses Graph; MutableGraph.NodesView and MutableGraph.EdgesView are respectively subclasses of collections.MutableSet and collections.MutableMapping.

For more details, see the graphs.py source file.

Use Cases

Unweighted Edges

GraphABC supports labeled edges by default; a user who doesn't care about edges will use the default of 1. g.edges.add(n1, n2) and g.edges.discard(n1, n2) are provided as alternatives to respectively g.edges[n1, n2]=1 and del g.edges[n1, n2], mapping better to a Set-like interface. For a real Set instance, use g.edges.keys().

Implementations that don't support edge labels should raise a ValueError if __setitem__(self, edge, label) is not called with label=1.

Labeled and/or Colored Nodes

Adding annotations to a graph doesn't need to be supported by the API IMHO: it is sufficient to add attributes to class instances, doing stuff like g.colors={1: 'red', 2: 'green', 3:'blue'}

Multigraphs

I think it is not necessary to cover multigraphs (i.e., those where more than one edge is allowed between two nodes) explicitly in the API, since most of the time an algorithm does not need what to do in presence of multiple edges between the same nodes. When edges are unlabeled, it is sufficient use as label the number of edges: as a bonus, algorithms such as those which calculate the maximal flow will automatically do the right thing.

Conversion between formats

When a graph algorithm is costly (i.e., when the cost of running the algorithm exceeds the cost of converting it to a different representation), it is actually advisable to convert to a reasonable format. For example, to convert to the supplied "directed dict of lists" format, it would be possible to do so: g = Directed_DictOfLists.from_graph(g).data. This way, it is possible to access the desired representation regardless of the original representation of the input graph.

Example: An Interactive Session

Python 3.0.1+ (r301:69556, Apr 15 2009, 17:25:52) 
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import dict_of_dicts
>>> g = dict_of_dicts.Undirected_DictOfDicts()
>>> g.edges.add(1,3)
>>> g.edges.add(1,2)
>>> g.edges[2,3] = 2 # Weighted edge this time
>>> list(g.nodes)
[1, 2, 3]
>>> g.nodes.add(4) # New node with no incident edges
>>> g.data # Access the internal representation
{1: {2: 1, 3: 1}, 2: {1: 1, 3: 2}, 3: {1: 1, 2: 2}, 4: {}}
>>> g.nodes &= {1, 3, 5} # Extraction of a subgraph
>>> g.data
{1: {3: 1}, 3: {1: 1}}
>>> g.edges[1, 3] # Access to the edge value
1
>>> 1 in g.nodes # Membership test for nodes
True
>>> {1, 4} in g.edges # Membership test for edges
False
>>> (1, 3) in g.edges # edges are tested with tuple unpacking, so the specific container passed to __contains__ doesn't matter.
True

Source Code

This is code written for Python 3.0. Backporting to 2.6 should be quite easy. Beware: the interface is up for discussion, and code may have bugs.

graphs.py: The source code for the ABC.

simple.py: A simple and inefficient implementation where nodes are coded as a set and edges are coded as a dictionary.

dict_of_lists.py: An implementation of the format described in Guido's famous essay.

dict_of_dicts.py: A dictionary-of-dictionary reasonably efficent implementation, similar to the one adopted in NetworkX.

Thanks

Thanks to Magnus Lie Hetland, Geremy Condra, and Nick Vatamaniuc for their feedback!