the sugiyama framework: laying out dags in four phases

2026-05-30

force-directed layouts (parts 1 and 2) discover structure by simulating physics. you define forces, run the simulation, and whatever shape emerges is the answer. that works because most graphs have no intrinsic "up". but some graphs do. a build dependency graph, a class hierarchy, a state machine, a flowchart: these have direction baked into their meaning, and a layout that ignores it is worse than useless. for those graphs you want every edge to point the same way, layers stacked top to bottom, and as few edges crossing as the structure allows.

that is what the Sugiyama framework produces. it is not one algorithm. it is a pipeline of four phases, each solving a self-contained subproblem and feeding its output to the next:

   input: a directed graph (maybe with cycles)
     |
     v
  [1] cycle removal        reverse a few edges to make the graph acyclic
     |
     v
  [2] layer assignment     give each node an integer rank (its row)
     |
     v
  [3] crossing reduction    order nodes within each layer to cut crossings
     |
     v
  [4] coordinate assignment assign real x-coordinates; straighten edges
     |
     v
   output: a layered drawing

the framework comes from Sugiyama, Tagawa, and Toda in 1981. it is the engine inside Graphviz dot, inside dagre, and inside ELK's layered algorithm. if you have ever seen a clean top-down dependency diagram, you have seen its output.

the defining property of the pipeline is that it is non-commutative and non-reversible. each phase commits decisions the later phases take as fixed. layer assignment cannot undo a reversed edge. crossing reduction cannot move a node to a different layer. coordinate assignment cannot reorder a layer. that rigidity is what makes each phase tractable: the subproblem is small because the previous phases already pinned down everything else. it is also the source of the framework's failure modes, which we will get to.

throughout, the reference implementation is dagrejs/dagre at commit 4713b59b. dagre is written in TypeScript and its source maps almost one file to one phase, which makes it an unusually readable place to watch the pipeline run.

phase 1: break the cycles

the rest of the pipeline assumes the graph is a directed acyclic graph: every edge points from a lower layer to a higher one. but real input graphs have cycles. a state machine loops back. a dependency graph has a retry edge. the first phase makes the graph acyclic by reversing a small set of edges, and remembers which ones so it can flip them back at the very end for rendering.

the question is which edges to reverse. you want the smallest set, because every reversed edge is an edge drawn against the grain (an arrow that points up in a top-down drawing). that set is called a feedback arc set, and finding the minimum one is NP-hard. so we use a heuristic.

the simplest heuristic is a depth-first search. walk the graph depth-first. keep a set of nodes currently on the recursion stack. when you follow an edge to a node that is already on the stack, that edge closes a cycle: it is a back edge. mark it for reversal. every other edge is part of the DFS tree or points to an already-finished node, so it cannot be part of a cycle once the back edges are gone.

   a --> b --> c --> d
         ^           |
         |           |
         +-----------+
         (d -> b closes a cycle)

   DFS from a: visit a, b, c, d. at d, follow d -> b.
   b is still on the stack, so d -> b is a back edge.
   reverse it. the graph is now acyclic:

   a --> b --> c --> d
         ^___________|   (drawn reversed, restored at the end)

dagre's acyclic.ts does exactly this. its run function picks between a greedy heuristic and the DFS version, then reverses each edge in the feedback set by removing it and re-adding it with its endpoints swapped, tagging the label with reversed: true so the final undo pass can restore the original direction.

function dfsFAS(graph: Graph<GraphLabel, NodeLabel, EdgeLabel>): Edge[] {
    const fas: Edge[] = [];
    const stack: { [key: string]: boolean } = {};
    const visited: { [key: string]: boolean } = {};

    function dfs(v: string): void {
        if (Object.hasOwn(visited, v)) {
            return;
        }
        visited[v] = true;
        stack[v] = true;
        graph.outEdges(v)!.forEach(e => {
            if (Object.hasOwn(stack, e.w)) {
                fas.push(e);
            } else {
                dfs(e.w);
            }
        });
        delete stack[v];
    }

    graph.nodes().forEach(dfs);
    return fas;
}

the two dictionaries are the whole trick. visited prevents reprocessing a node. stack tracks the active recursion path: stack[v] = true on entry, delete stack[v] on exit. the test Object.hasOwn(stack, e.w) asks "is the edge's target currently an ancestor of where i am". if yes, the edge points backward up the active path and closes a cycle, so it goes into the feedback arc set.

the greedy alternative dagre offers (Eades, Lin, Smyth 1993) usually reverses fewer edges. it repeatedly peels off sinks and sources, and for the remaining nodes reverses edges based on the difference between out-degree and in-degree. it is the better default when minimizing reversed edges matters. the DFS version is faster and simpler and is fine when the cycle count is low, which it usually is.

either way, the output of phase 1 is a DAG plus a record of which edges were flipped. the minimization is heuristic, but it does not need to be optimal: a couple of extra reversed edges costs a little visual clarity, not correctness.

phase 2: assign layers

now that the graph is acyclic, give every node an integer rank. the rank is its row in the final drawing. the constraint is that for every edge from v to w, rank(w) must be strictly greater than rank(v), so every edge points downward. among all rankings that satisfy that, you want one that keeps edges short: a long edge that spans many layers is harder to follow and forces dummy nodes (next section).

the trivial algorithm is longest-path. the rank of a node is the length of the longest path from any source to it. equivalently, process nodes so that a node is ranked one below its deepest descendant. it is linear time, O(V + E), and always produces a valid layering. its weakness is that it pushes nodes as far down as possible, leaving the bottom layers crowded and many edges longer than they need to be.

dagre's rank/util.ts implements longest-path with a memoized DFS. note the sign convention: it ranks from the sinks upward, taking the minimum over successors, so smaller numbers are deeper. the orientation is flipped later; what matters is that connected nodes land on adjacent integer levels.

function longestPath(graph: Graph<GraphLabel, NodeLabel, EdgeLabel>): void {
    const visited: { [key: string]: boolean } = {};

    function dfs(v: string): number {
        const label: NodeLabel = graph.node(v);
        if (Object.hasOwn(visited, v)) {
            return label.rank!;
        }
        visited[v] = true;

        const outEdges = graph.outEdges(v);
        const outEdgesMinLens: number[] = outEdges ? outEdges.map(e => {
            if (e == null) {
                return Number.POSITIVE_INFINITY;
            }

            return dfs(e.w) - graph.edge(e).minlen!;
        }) : [];

        let rank: number = applyWithChunking(Math.min, outEdgesMinLens);

        if (rank === Number.POSITIVE_INFINITY) {
            rank = 0;
        }

        return (label.rank = rank);
    }

    graph.sources().forEach(dfs);
}

the minlen on each edge is the minimum number of layers it must span, usually 1. the recursion sets a node's rank to min over successors of (successor.rank - minlen), so the node sits exactly minlen above its tightest successor. a node with no successors bottoms out at rank 0.

longest-path is only the starting point. the better algorithm, and dagre's default, is network simplex, from the Graphviz dot paper (Gansner, Koutsofios, North, Vo 1993). it treats layer assignment as a linear program: minimize the total weighted edge length (sum of rank(w) - rank(v) over all edges) subject to every edge spanning at least its minlen. the key concept is slack. the slack of an edge is how much longer it is than its minimum:

   slack(v -> w) = rank(w) - rank(v) - minlen(v -> w)

an edge with zero slack is tight: it spans exactly its minimum and cannot be shortened without moving its endpoints. network simplex builds a spanning tree of tight edges, then repeatedly finds a tree edge with negative "cut value" (one whose removal would let a whole component shift to shorten other edges) and swaps it for a non-tree edge, tightening the layout each time. it terminates at a ranking that minimizes total edge length. dagre exposes the slack helper right next to longest-path:

function slack(graph, edge): number {
    return graph.node(edge.w).rank - graph.node(edge.v).rank - graph.edge(edge).minlen;
}

network simplex is the right default: it produces tight, balanced layerings and runs fast in practice despite the theory. longest-path stays useful as the cheap initial feasible ranking that network simplex refines.

interlude: dummy nodes for long edges

after phase 2, an edge can span more than one layer. a node at rank 0 connected to a node at rank 3 has an edge crossing layers 1 and 2 with nothing in between. that breaks the assumptions of the next two phases, which both reason layer by layer about edges that connect adjacent layers only.

the fix is to replace every long edge with a chain of dummy nodes, one per intermediate layer, so the edge becomes a sequence of unit-length segments:

   before normalization:        after normalization:

   rank 0:   A                   rank 0:   A
                                            |
   rank 1:                       rank 1:   d1   (dummy)
                                            |
   rank 2:                       rank 2:   d2   (dummy)
                                            |
   rank 3:   B                   rank 3:   B

   one edge A -> B               three unit edges A->d1->d2->B

from here on, every node (real or dummy) sits on exactly one layer and every edge connects adjacent layers. crossing reduction treats a dummy node like any other node when counting crossings. coordinate assignment treats the dummy chain as a thing to keep straight, which is how you get long edges drawn as clean vertical or gently-bent lines instead of diagonals. at the end, the dummy nodes are removed and their positions become the bend points of the original edge.

dagre's normalize.ts walks every edge and splits the long ones:

function normalizeEdge(graph: Graph<GraphLabel, NodeLabel, EdgeLabel>, e: Edge): void {
    let v: string = e.v;
    let vRank: number = graph.node(v).rank!;
    const w: string = e.w;
    const wRank: number = graph.node(w).rank!;
    const name: string | undefined = e.name;
    const edgeLabel: EdgeLabel = graph.edge(e);
    const labelRank: number | undefined = edgeLabel.labelRank;

    if (wRank === vRank + 1) return;

    graph.removeEdge(e);

    let dummy: string;
    let attrs: Partial<NodeLabel>;
    let i: number;
    for (i = 0, ++vRank; vRank < wRank; ++i, ++vRank) {
        edgeLabel.points = [];
        attrs = {
            width: 0,
            height: 0,
            edgeLabel: edgeLabel,
            edgeObj: e,
            rank: vRank
        };
        dummy = addDummyNode(graph, "edge", attrs, "_d");
        if (vRank === labelRank) {
            attrs.width = edgeLabel.width;
            attrs.height = edgeLabel.height;
            attrs.dummy = "edge-label";
            attrs.labelpos = edgeLabel.labelpos;
        }
        graph.setEdge(v, dummy, {weight: edgeLabel.weight}, name);
        if (i === 0) {
            graph.graph().dummyChains!.push(dummy);
        }
        v = dummy;
    }

    graph.setEdge(v, w, {weight: edgeLabel.weight}, name);
}

the guard if (wRank === vRank + 1) return; short-circuits edges that already span one layer: most edges in a tight layering, so most edges are left alone. for the rest, the loop walks rank by rank from just below v up to w, dropping a zero-size dummy node on each intermediate layer and stitching unit edges between them. one subtlety: edge labels need room too, so the dummy that sits at the label's rank is given the label's width and height instead of being zero-size. that reserves space so the label does not overlap other nodes. the first dummy in each chain is recorded in dummyChains so the chain can be found and collapsed later.

phase 3: reduce crossings with barycenters

now every layer has a set of nodes and we get to choose the left-to-right order within each layer. that ordering determines how many edges cross. minimizing crossings is the phase that does the most for readability, and it is the hardest: even the two-layer version, where you fix one layer's order and only reorder the adjacent one, is NP-hard.

the standard heuristic is the barycenter method. fix one layer. for each node in the adjacent layer, compute the average position (the barycenter) of its neighbors in the fixed layer. then sort the adjacent layer by that average. the intuition: if a node's neighbors are mostly on the left of the fixed layer, the node wants to be on the left too, so its edges run roughly straight down instead of slashing across.

   fixed layer (positions 0..3):   A0   B1   C2   D3

   movable node X connects to A and C   -> barycenter = (0 + 2) / 2 = 1.0
   movable node Y connects to D         -> barycenter = 3 / 1       = 3.0
   movable node Z connects to A and B   -> barycenter = (0 + 1) / 2 = 0.5

   sort the movable layer by barycenter: Z (0.5), X (1.0), Y (3.0)

you do not do this once. you sweep: order layer 2 against fixed layer 1, then layer 3 against the now-fixed layer 2, down to the bottom, then sweep back up, fixing each layer against its neighbor in turn. each sweep can only reduce or hold the crossing count for the pair it touches. you run several sweeps and keep the ordering that produced the fewest total crossings, because the heuristic can oscillate and a later sweep is not guaranteed to beat an earlier one.

dagre's order/barycenter.ts is the computation in its purest form. it takes the movable layer and, for each node, averages the order of its in-neighbors weighted by edge weight:

export default function barycenter(graph: Graph, movable: string[] = []): BarycenterEntry[] {
    return movable.map(v => {
        const inV = graph.inEdges(v);
        if (!inV || !inV.length) {
            return {v: v};
        } else {
            const result = inV.reduce((acc, e) => {
                const edge = graph.edge(e);
                const nodeU = graph.node(e.v);
                return {
                    sum: acc.sum + (edge.weight * nodeU.order),
                    weight: acc.weight + edge.weight
                };
            }, {sum: 0, weight: 0});

            return {
                v: v,
                barycenter: result.sum / result.weight,
                weight: result.weight
            };
        }
    });
}

the weighted form, sum of (edge.weight * neighbor.order) / sum of edge.weight, matters because dummy nodes carry higher edge weights in dagre. that biases long edges toward straightness: a dummy chain pulls hard on its barycenter, so the heuristic keeps long edges vertical rather than letting them weave. a node with no in-edges returns just {v} with no barycenter, which the sort treats as "stay where you are" relative to its neighbors.

one piece this excerpt does not show is how crossings are counted to compare orderings. counting crossings naively is O(E^2). dagre uses the accumulator-tree method (Barth, Jünger, Mutzel 2002) in order/cross-count.ts, which counts them in O(E log V) by walking the edges sorted by their southern endpoint and summing, in a binary index tree, how many already-placed edges land to the right. that lets the sweep afford to count crossings after every pass and keep the best.

phase 4: assign x-coordinates (brandes-köpf)

after phase 3 we know the order of nodes in every layer, but not their actual x-coordinates. the layer's vertical position is its rank; the horizontal coordinate is still free. we want three things at once, and they pull against each other: nodes in a layer should not overlap and should sit in their assigned order; long edges (dummy chains) should be as straight as possible, ideally vertical lines; and a parent should sit roughly centered over its children. balancing those is the last phase.

the algorithm dagre uses is Brandes-Köpf (2001), which is fast (linear in the graph size) and produces good results without solving a linear program. the idea is to align nodes into vertical blocks. each node tries to align with its median neighbor in the next layer, so that an edge to that neighbor becomes a straight vertical segment. conflicts arise when two such alignments would cross, and the algorithm resolves them by priority, favoring the alignment of long edges (the dummy chains) so they come out straight.

the clever part is that aligning toward the median has a bias: do you break ties leftward or rightward, and do you align looking up the layers or down them. rather than pick one, Brandes-Köpf computes all four combinations: up-left, up-right, down-left, down-right. each produces a complete set of x-coordinates with a different lean. then it combines them: align the four candidates to the one with the smallest width, and set each node's final x to the average (in the original paper, the median) of its four values. the four biases cancel and you get a balanced, symmetric layout.

dagre's position/bk.ts is a faithful implementation. the positionX function is where the four candidates are generated and merged:

function positionX(graph: Graph<GraphLabel, NodeLabel, EdgeLabel>): PositionMap {
    const layering: string[][] = util.buildLayerMatrix(graph);
    const conflicts: Conflicts = Object.assign(
        findType1Conflicts(graph, layering),
        findType2Conflicts(graph, layering));

    const xss: XssMap = {};
    let adjustedLayering: string[][];
    ["u", "d"].forEach(vert => {
        adjustedLayering = vert === "u" ? layering : Object.values(layering).reverse();
        ["l", "r"].forEach(horiz => {
            if (horiz === "r") {
                adjustedLayering = adjustedLayering.map(inner => {
                    return Object.values(inner).reverse();
                });
            }

            const neighborFn = (v: string): string[] => {
                const result = vert === "u" ? graph.predecessors(v) : graph.successors(v);
                return result || [];
            };
            const align: AlignmentResult = verticalAlignment(graph, adjustedLayering, conflicts, neighborFn);
            let xs: PositionMap = horizontalCompaction(graph, adjustedLayering,
                align.root, align.align, horiz === "r");
            if (horiz === "r") {
                xs = util.mapValues(xs, x => -x);
            }
            xss[vert + horiz] = xs;
        });
    });


    const smallestWidth: PositionMap = findSmallestWidthAlignment(graph, xss);
    alignCoordinates(xss, smallestWidth);
    return balance(xss, graph.graph().align);
}

the nested ["u", "d"] over ["l", "r"] loop is the four candidates made literal. the vertical direction vert decides whether neighborFn looks at predecessors (align upward) or successors (align downward); the horizontal direction horiz is handled by reversing each layer's order so the same "align to median, break ties left" code produces a rightward lean. each iteration runs verticalAlignment to form the blocks, then horizontalCompaction to push the blocks together without overlapping, and stores the result in xss keyed by the two-letter combination ("ul", "ur", "dl", "dr").

the last three lines are the merge. findSmallestWidthAlignment picks the narrowest of the four. alignCoordinates shifts the other three to line up with it (otherwise averaging coordinates from layouts with different left edges would be meaningless). balance produces the final coordinate per node from the four aligned candidates. the findType1Conflicts and findType2Conflicts computed up front are the inner-segment crossings the alignment step must avoid: a type-1 conflict is a regular edge crossing a long-edge (dummy) segment, and the algorithm always sacrifices the regular edge to keep the long edge straight.

putting it together

walk a tiny graph through the whole pipeline. five nodes: a -> b, a -> c, b -> d, c -> d, d -> a (the last edge makes it cyclic).

   phase 1, cycle removal:
     DFS finds d -> a is a back edge. reverse it. now acyclic.

   phase 2, layer assignment (network simplex):
     rank 0:  a
     rank 1:  b   c
     rank 2:  d
     (the reversed edge a..d now spans ranks 0 to 2)

   interlude, dummy nodes:
     the long edge a -> d gets a dummy at rank 1:
     rank 0:  a
     rank 1:  b   c   d1
     rank 2:  d

   phase 3, crossing reduction:
     order rank 1 by barycenter so b, c, and the dummy d1
     sit in an order that minimizes crossings. d1 wants to be
     near a (above) and d (below), so it lands at an edge.

   phase 4, coordinate assignment (brandes-kopf):
     assign x so a centers over its children, the a->d dummy
     chain runs straight down one side, and nothing overlaps.

   final: a clean top-down drawing, with the original d -> a
   edge drawn reversed (an upward arrow) and routed through d1.

every phase took the previous phase's output as immutable and solved exactly one thing. that is the framework. it is also why swapping any two phases breaks it: you cannot count crossings before you have layers, and you cannot assign coordinates before you have an order.

complexity and where each phase bites

the phases have very different costs, and the bottleneck depends on the graph.

cycle removal is O(V + E) for the DFS heuristic, O(V + E) amortized for the greedy one. cheap. it is never the bottleneck. its cost is qualitative: a poor feedback arc set means more upward-pointing edges in the final drawing.

layer assignment is where the algorithm choice shows. longest-path is O(V + E). network simplex has bad worst-case bounds but is fast in practice, near-linear on the graphs people actually draw; Gansner et al. reported it dominates real workloads despite the theory. this phase rarely bites on time. it bites on quality: a bad layering makes long edges that inflate the next two phases.

crossing reduction is usually the real cost. each sweep reorders every layer and counts crossings; with the accumulator-tree counter that is O(E log V) per sweep, times a fixed number of sweeps (dagre runs a handful by default). the heuristic is also where quality is most variable: barycenter is good but not optimal, and on dense graphs the crossing count it settles at can be far above the true minimum. this is the phase to tune if a layout looks tangled.

coordinate assignment with Brandes-Köpf is O(V + E): four linear-time alignment passes plus a linear merge. fast. where it bites is the interaction with dummy nodes: a graph with many long edges has many dummy nodes, and dummy nodes are real nodes for the purpose of this phase, so a graph that looks small but has long-range edges can have a node count several times its visible size.

the cross-cutting cost is the dummy nodes themselves. a single edge spanning k layers adds k - 1 nodes. a graph with a few hubs that connect to distant layers can balloon. every phase after layer assignment pays for them. it is why long edges are the enemy of layered layout, and why network simplex (which minimizes total edge length, hence total dummy count) is worth its complexity over longest-path.

reading the source: dagre

dagre maps the framework to files almost exactly, which makes it the best place to read the pipeline end to end. at commit 4713b59b:

   phase 1  cycle removal       lib/acyclic.ts        (+ lib/greedy-fas.ts)
   phase 2  layer assignment    lib/rank/index.ts     (+ network-simplex.ts,
                                                          rank/util.ts)
   interlude  dummy nodes        lib/normalize.ts
   phase 3  crossing reduction  lib/order/index.ts    (+ barycenter.ts,
                                                          cross-count.ts)
   phase 4  coordinate assign   lib/position/bk.ts    (+ position/index.ts)

   the conductor               lib/layout.ts

lib/layout.ts is the conductor: it calls each phase in order, inserts and removes the dummy nodes, and runs the undo passes that restore reversed edges and collapse dummy chains into edge bend points. reading it top to bottom is the fastest way to see the whole pipeline as one function. the per-phase files above are where the actual algorithms live, and each is small enough to read in one sitting. the source is TypeScript, so the types double as documentation: Conflicts, AlignmentResult, and PositionMap in bk.ts tell you what Brandes-Köpf is passing between its stages without reading the bodies.

if a phase's dagre implementation is too terse, ELK (eclipse-elk/elk) has the same pipeline in Java with heavier documentation, and Graphviz dot has the original C. but for learning the shape of the framework, dagre's file-per-phase layout is hard to beat.

Sugiyama, K., Tagawa, S., and Toda, M. (1981). "Methods for Visual Understanding of Hierarchical System Structures." IEEE Transactions on Systems, Man, and Cybernetics 11(2), pp. 109-125. DOI: 10.1109/TSMC.1981.4308636. the original framework. dense and of its era, but it is where the four-phase decomposition is introduced and argued for. read it for the framing: the insight that layout of a hierarchy can be split into independent subproblems is the whole contribution, and it has survived forty years unchanged.

Gansner, E. R., Koutsofios, E., North, S. C., and Vo, K.-P. (1993). "A Technique for Drawing Directed Graphs." IEEE Transactions on Software Engineering 19(3), pp. 214-230. DOI: 10.1109/32.221135. the Graphviz dot paper, and the better first read. it is modern Sugiyama in practice: network simplex for layer assignment, a clean treatment of the iterated barycenter sweep, and a priority method for coordinate assignment that predates Brandes-Köpf. the writing is excellent and the algorithms are the ones still in use. if you read one paper here, read this one.

Brandes, U., and Köpf, B. (2001). "Fast and Simple Horizontal Coordinate Assignment." Graph Drawing 2001, Springer LNCS 2265, pp. 31-44. DOI: 10.1007/3-540-45848-4_3. the coordinate-assignment algorithm from phase 4, in full. the four-alignment trick and the block-compaction method are laid out precisely, and the paper is genuinely short and readable. worth it to understand why the layout comes out balanced and why the whole phase stays linear.

next: part 4 leaves directed graphs behind for trees, where the layout problem changes shape entirely. trees are planar by construction, so there are no crossings to fight; the question becomes how to pack a recursive structure tidily, and the answer (Reingold-Tilford) is one of the most elegant linear-time algorithms in the field.