serawi — frontend engineer

graphs are abstract. nodes and edges have no inherent position. any layout you see was invented: a set of coordinates chosen by an algorithm according to some criteria. the whole field exists because different criteria give radically different pictures of the same data, and the wrong picture hides the structure you're trying to see.

consider a dependency graph for a large software project: 400 packages, 1,200 edges. place all nodes randomly and you get a tangle. apply force-directed and you get clusters that roughly correspond to subsystems. apply a layered layout and you get a top-to-bottom flow that makes build order legible but hides the cluster structure. both are correct representations of the same graph. they just answer different questions.

that's the real problem. it's not "how do I draw a graph." it's "what does readable mean for this graph, and which algorithm encodes those assumptions."

seven families of algorithms have settled into common use. they disagree with each other about what readable means, and they're each right for different data.

the aesthetic criteria

in 1998, Bennett, Ryall, Spalteholz, and Zyda catalogued what practitioners considered desirable in a graph drawing. the list:

minimize edge crossings
minimize edge length, and minimize variance in edge length
maximize symmetry
maximize angular resolution at each node (spread edges evenly around a node's circumference)
preserve the mental map when the graph changes (small edits should produce small visual changes)
avoid node overlaps

none of these criteria are binary. most involve trade-offs with each other. and many are NP-hard to optimize exactly: minimizing edge crossings alone is NP-complete in the general case.

the trade-offs are structural, not just mathematical. minimizing edge length pulls connected nodes close together, which creates clusters, which increases local crossing density where clusters border each other. maximizing symmetry often requires edge lengths to be non-uniform (some edges span levels; others don't). maximizing angular resolution at high-degree nodes stretches edges outward, which increases overall layout area, which reduces apparent density and makes spatial relationships harder to read from a distance.

every layout algorithm picks a subset of this list to optimize and accepts that the others will suffer. that's not a bug; it's the design decision. knowing which criteria an algorithm prioritizes tells you more than its name does.

seven families

force-directed. Eades 1984, Fruchterman and Reingold 1991, Kamada and Kawai 1989. model nodes as charged particles that repel each other. model edges as springs that pull connected nodes together. run a physics simulation until the system settles into low energy. the result looks organic, clusters emerge naturally, and the algorithm needs no prior knowledge of graph structure.

the non-determinism is a specific consequence of gradient descent on a non-convex energy landscape. the energy function has many local minima that are all "low energy" by the algorithm's measure but look different to human eyes. which minimum you land in depends on starting positions, which are usually randomized. seed the random number generator with a fixed value and you get a deterministic layout. seed it differently and you get a valid but visibly different one.

the cost: O(n²) pairwise repulsion calculations in the naive form, local minima, non-determinism, and visible jitter during simulation. parts 1 and 2 cover the algorithm and the scaling tricks that made it practical.

hierarchical (sugiyama framework). Sugiyama, Tagawa, and Toda 1981. built for directed graphs that are "almost" DAGs: dependency trees, org charts, build pipelines. a four-phase pipeline: break cycles (reverse a small feedback arc set, which then appear as upward arrows in the final drawing), assign nodes to horizontal layers (long-span edges get dummy nodes inserted at intermediate layers), reduce crossings within each layer (NP-hard in general, but good heuristics exist), then assign x-coordinates within each layer. the result makes direction legible: cause reads before effect, top-to-bottom. the failure mode is dense general graphs: if the data doesn't have hierarchy, the algorithm invents one, and the result looks wrong. part 3.

tree layouts. Reingold and Tilford 1981. trees are planar by construction, which makes them a tractable special case. the recursive idea: place children, then center the parent above them. the key insight in Reingold-Tilford is that subtrees can be computed independently, then packed together as close as possible without overlap, in a single left-to-right pass followed by a right-to-left adjustment pass. done carefully, this runs in O(n) and produces tidy, non-overlapping trees where subtrees at the same depth are consistently spaced. variations multiply: radial layouts place the root at the center and fan children outward in rings (good for organizational hierarchies where depth is secondary to breadth), hyperbolic layouts use a fish-eye projection to compress distant subtrees while keeping the focus region large, treemaps fill a fixed rectangle with area-proportional tiles (every node has an area, not a point). part 4.

spectral. Hall 1970, Koren 2003. compute eigenvectors of the graph Laplacian matrix and use them directly as x and y coordinates. pure linear algebra, no iteration. the Laplacian encodes local connectivity: its second-smallest eigenvector (the Fiedler vector) partitions the graph at its sparsest cut. use the second and third eigenvectors as x and y, and you get a layout where graph distance is approximately preserved in Euclidean distance. the strengths are mathematical: fast, closed-form, globally consistent. the weakness is aesthetic: spectral layouts often look strange to human readers. the eigenvectors don't know about edge crossings or angular resolution. they optimize connectivity preservation, not the visual criteria that practitioners find important. part 5.

constraint-based. cola.js, IPSepCoLa. express layout as a system of constraints: node A is at least 30px left of B, these nodes align on a horizontal axis, these groups of nodes don't overlap. formally, this is quadratic programming, solved in practice by gradient projection. the energy function is similar to force-directed (minimize a weighted sum of spring tensions), but certain constraints are hard: they must be satisfied regardless of the energy cost. this is the key distinction. force-directed treats everything as soft (it will accept minor overlaps if the global energy is lower). constraint-based enforces node non-overlap as a hard constraint, which means clusters stay readable even under dense conditions. the strength is that domain knowledge slots in naturally: a constraint that "these nodes represent a team and should cluster" is trivial to express. the weakness is complexity: constraint conflicts require resolution policy, and the solver is slower than pure force. part 5, alongside spectral.

orthogonal. Tamassia 1987. edges are drawn only as horizontal and vertical segments with right-angle bends. the algorithm works in three stages: find a planar embedding (the topology), assign angles to edges at each vertex (the shape), and then find actual coordinates that realize those angles with minimum total bends (the metrics). that third stage is the min-cost flow problem, where each edge bend costs one unit, and the algorithm finds a coordinate assignment that minimizes total cost. the result is the rectilinear diagrams common in technical drawings, circuit schematics, and network topology maps, where right angles signal structure. useless for organic data. slow on large graphs due to the combinatorial constraints in the shape and metrics phases. part 6.

specialized: circular, arc, concentric, chord. folklore and Wattenberg 2002 (arc), Krzywinski 2009 (chord / Circos). the deterministic, non-iterative family. position comes from a formula: every node gets an angle proportional to its rank in some ordering. the only real algorithmic question is node ordering. order by degree and high-degree nodes cluster at one side of the circle. order by community membership and edges between communities become visible as distinct chord bundles. the wrong ordering produces a visual mess of crossing arcs even though the graph hasn't changed. the right ordering is the algorithm. part 7.

the dimensions that actually matter when choosing

the family name tells you the algorithmic approach. these dimensions tell you whether it fits your situation.

determinism. does the same input always produce the same output? force-directed: usually not (random initial positions, gradient descent, local minima). layered: yes. spectral: yes. circular: yes. this matters for reproducibility and for user trust. a dashboard that shows a different layout every refresh fails the mental-map criterion before the user has even formed one.

scalability. 100 nodes, 10,000 nodes, 1,000,000 nodes? force-directed at O(n²) fails around 1,000 nodes without approximations. spectral scales well but needs sparse matrix solvers. circular layouts scale arbitrarily because they're arithmetic, not iterative.

online updates. can the algorithm incorporate node additions and edge changes without recomputing from scratch? force-directed can restart from the previous positions, which helps. constraint-based is designed for incremental updates. layered pipelines typically require full recompute.

constraint support. can you pin a node's position, enforce alignment, or prevent specific nodes from overlapping? constraint-based: yes, by definition. force-directed: partially, via pinning. tree and layered: no.

domain shape. does your graph have inherent hierarchy? trees, layered, and concentric layouts read that structure and make it legible. does it have cycles? force-directed handles cycles naturally; Sugiyama has to break them first by reversing selected edges, which adds artificial direction. is it sparse (few edges per node) or dense (many)? spectral works best on sparse graphs because the Laplacian stays manageable. is it planar (embeddable in a plane without crossings)? that opens orthogonal and additional exact algorithms not covered in this series.

stability. when a single node or edge changes, how much does the layout move? a fully-connected graph with 50 nodes: add one edge. in a fresh force-directed run, the change can cascade across the entire layout because the global energy minimum shifted. in constraint-based layout with fixed anchors, only the local neighborhood moves. instability destroys mental map preservation. incremental force-directed with fixed seeds helps. constraint-based with careful initialization helps. most algorithms have some stability/quality trade-off. tuning them to be more stable usually means accepting a worse layout quality for the static case.

the one insight

there is no universal layout algorithm. each family encodes a theory of what makes a graph readable. picking one is picking that theory. force-directed says "readable means organic clustering." sugiyama says "readable means top-to-bottom flow." orthogonal says "readable means right angles signal structure." circular says "readable means seeing all connection density at a glance."

when the choice is wrong, no parameter tuning fixes it. the algorithm is solving for criteria your data doesn't satisfy. the fix is switching families.

this also means that benchmarks comparing layout algorithms are usually misleading. a benchmark that scores layouts on edge crossing count will favor algorithms that optimize for edge crossings. a benchmark that scores on stress (how well Euclidean distance matches graph distance) will favor spectral and force-directed with Kamada-Kawai energy. neither benchmark tells you what matters for your domain. that judgment requires understanding what each family is actually optimizing.

knowing the families, and knowing which criteria each one optimizes, is the prerequisite for making that choice correctly. that's what this series is for.

what's in this series

eight posts, one algorithm family per part. each post covers the algorithm, why it works, and when to use or avoid it.

part 0: the graph layout series: a survey (this post). what graph layout is, why it's hard, and the seven families that solve it.
part 1: force-directed: springs and charges. the spring-and-charge model, the fruchterman-reingold cooling schedule, and why two runs can give two different layouts.
part 2: force-directed at scale: barnes-hut and multilevel. the quadtree approximation that turned force-directed from o(n²) to o(n log n), and the coarsening trick that scales it to a million nodes.
part 3: the sugiyama framework: laying out dags in four phases. cycle removal, layer assignment, crossing reduction, coordinate assignment.
part 4: tree layouts: tidy, radial, hyperbolic, treemap. reingold-tilford in o(n), the polar-coordinate variant, hyperbolic focus+context, and squarified treemaps.
part 5: spectral and constraint-based layouts. two mathematical approaches: the laplacian eigenvectors that give you coordinates for free, and the constraint solvers that let you bring your own.
part 6: orthogonal layouts: right angles and bend minimization. the topology-shape-metrics framework and the min-cost flow that minimizes bends.
part 7: specialized layouts: circular, arc, chord, concentric. four deterministic layouts where the only real algorithm is choosing the node order.

the graph layout series: a survey

the aesthetic criteria

seven families

the dimensions that actually matter when choosing

the one insight

what's in this series