A few weeks ago I wrote a post explaining computer science terminology to musicians. This is the inverse: an explanation of music notation for programmers who assume the domain is simpler than it is. That assumption is common. It's also expensive. Most notation software has been written by very competent engineers who underestimated the problem. The result is systems that work impressively until they don't – at which point they fail in ways that seem arbitrary to users but are, in fact, architectural. This post explains why.

Notation Is Not MIDI with Graphics

The most damaging assumption programmers bring to this domain is that music notation is a visual layer over time-stamped events. Work with audio software, game engines, or digital music for long enough, and this seems obvious: notes have start times, durations, and pitches. Notation just draws them nicely. This is wrong in a way that corrupts everything downstream.

Consider two representations of identical duration:
These produce the same acoustic event. They do not mean the same thing. The first is a single rhythmic impulse sustained. The second implies a specific re-articulation point encoded in the notation itself. Performers read these differently. Conductors might even beat them differently. The semantic distinction is not ornamental.

MIDI cannot represent this distinction. It knows only NOTE_ON, NOTE_OFF, and duration. Notation systems built on MIDI-like assumptions inherit this blindness and spend their entire existence fighting it. Music notation encodes musical meaning, not acoustic events. The implications of this single fact occupy the rest of this post.

Time Is Not Integers

Programmers reach instinctively for integer representations. MIDI uses ticks – typically 480 or 960 per quarter note. DAWs use samples. The assumption is that any temporal precision can be achieved by choosing a sufficiently small quantum. This is false for music notation.

Consider a simple case: a quarter-note triplet against regular eighth notes. The triplet divides the beat into thirds; the eighths divide it into halves. To represent both exactly in a tick-based system, you need a tick count divisible by both 2 and 3. Fine: use 6 subdivisions. Now add a quintuplet in another voice. You need divisibility by 2, 3, and 5: LCM = 30.

Now consider what real music does. Chopin writes nested tuplets. Ferneyhough writes 7:6 against 5:4 against 3:2. Elliott Carter writes simultaneous streams at different tempi. The required tick resolution grows combinatorially: k · lcm(d₁, d₂, …, dₙ), where the dᵢ are the tuplet denominators in play and k is your base resolution. For complex contemporary music, this can exceed 10^6 ticks per beat, at which point integer overflow becomes a real concern and accumulated floating-point error in tempo calculations produces audible drift.

The solution is obvious once stated: rational arithmetic. A quarter-note triplet is exactly 1/3 of a half note. No approximation. No accumulated error.
The arithmetic is exact because the representation matches the domain.
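A minimal illustration in Python, whose `fractions.Fraction` plays the role Clojure's native ratio type plays in Ooloi (the durations and resolutions chosen here are illustrative, not taken from any particular score):

```python
from fractions import Fraction
from math import lcm

# Tick-based systems need a resolution divisible by every tuplet
# denominator in use; the requirement grows multiplicatively.
print(lcm(2, 3))                 # triplets against eighths: 6
print(lcm(2, 3, 5))              # add a quintuplet: 30
print(lcm(2, 3, 5, 7, 11, 13))   # contemporary nested tuplets: 30030

# Rational arithmetic sidesteps the problem entirely: a quarter-note
# triplet is exactly one third of a half note, with no approximation.
triplet_quarter = Fraction(1, 3) * Fraction(1, 2)    # 1/6 of a whole note
quintuplet_eighth = Fraction(1, 5) * Fraction(1, 2)  # 1/10 of a whole note

# Durations sum exactly: three triplet quarters fill a half note,
# five quintuplet eighths fill the same half note. No drift, ever.
assert 3 * triplet_quarter == Fraction(1, 2)
assert 5 * quintuplet_eighth == Fraction(1, 2)
```

The point is closure: adding, scaling, and comparing rational durations always yields another exact rational, so no quantum ever needs to be chosen in advance.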
These don't share a finite decimal expansion. They don't need to. Rationals are closed under the operations music requires. Clojure provides this natively. Most notation software doesn't use it. The consequences appear everywhere timing matters – which in notation is everywhere.

The Structure Is Not a Tree

Programmers expect hierarchical data. XML, JSON, ASTs, DOM trees – the mental model is that complex structures decompose into nested containers. Parent nodes own child nodes. Traversal is well-defined. Music notation has multiple simultaneous hierarchies that do not nest cleanly. A single note participates in:
These hierarchies intersect. They do not nest.

Consider cross-staff notation: a chord where some notes appear on the treble staff and others on the bass. Which staff 'owns' the chord? Neither. Both. The question assumes a tree structure that doesn't exist.

Or consider voice exchange: soprano and alto swap pitches mid-phrase. The melodic line (a horizontal relationship) contradicts the voice assignment (a vertical relationship). Both are musically meaningful. Neither subsumes the other.

Grace notes exist precisely at the intersection of conflicting hierarchies. They have pitch (placing them in a voice) but steal time from adjacent notes (making their metric position contextual). They may or may not take accidentals from the main-note context. The rules vary by historical period. Systems that force music into a single hierarchy spend enormous effort on edge cases that are only 'edge' cases because the data model is wrong.

Context Is Non-Local: The Accidental Problem

Here is a concrete example of why music notation is harder than it looks. Question: Does this F need a sharp sign? Answer: It depends on:
A naive algorithm checks each note against previous notes in the measure. This is O(n) per note, O(n^2) per measure. Fine for simple music.

Now add multiple voices. The F in the soprano might be affected by an F♯ in the alto, depending on conventions. You need to track alterations across voices. If voices cross staves (as in keyboard music), the interactions multiply.

Now add simultaneity. What happens when two voices sound the same pitch at the same instant but with different accidentals? Yet another edge case, difficult to catch when imprisoned in the Jail of Recursive Descent.

Now add grace notes. A grace note at the beginning of a measure might be notated at the barline but sound at the end of the previous measure. Does its accidental affect the next measure? Does the next measure's key signature affect it?

The dependency graph for accidentals is not a simple sequence. It's a directed acyclic graph with edges determined by temporal position, voice membership, staff assignment, and notational category. Correct handling requires topological sorting with domain-specific edge semantics. Most notation software handles this with heuristics: rules of thumb that cover common cases and fail on uncommon ones. The failures aren't bugs in the usual sense. They're consequences of architectural decisions made decades ago, when the problem was less well understood.

Ooloi handles this through what I call the 'timewalker' – a transducer that processes musical events in semantic order (not storage order), allowing full temporal context accumulation. The accidental algorithm becomes stateless over its input stream: given the same sequence of musical facts, it produces the same decisions. Deterministically. Every time. The complexity is still O(n · v) for n events and v voices, but the constant factor drops dramatically because traversal and processing are separated. The transducer handles navigation; the algorithm handles meaning.
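The separation can be sketched in miniature. This is a toy model in Python, not Ooloi's actual timewalker: the event shape, the single-measure scope, and the choice to let alterations persist across voices are all simplifying assumptions made for illustration.

```python
from fractions import Fraction

# Toy events: (position, voice, step, octave, alter) within one measure.
# 'alter' is the written chromatic alteration: 0 natural, 1 sharp, -1 flat.

def accidental_decisions(events, key_alterations):
    """Decide which notes need a printed accidental.

    Events are consumed in semantic (temporal) order, not storage order.
    The only state is what has accumulated from the stream itself, so
    the same sequence of musical facts always yields the same decisions.
    """
    state = {}       # alteration currently in force per (step, octave)
    decisions = []
    for pos, voice, step, octave, alter in sorted(events):
        in_force = state.get((step, octave), key_alterations.get(step, 0))
        needs_sign = alter != in_force
        decisions.append((pos, voice, step, octave, needs_sign))
        state[(step, octave)] = alter   # accidentals persist to the barline
    return decisions

# F major (B flat in the key signature), one measure, two voices:
events = [
    (Fraction(0), 1, 'B', 4, -1),    # B flat: key already supplies it
    (Fraction(1, 2), 1, 'B', 4, 0),  # B natural: contradicts the key
    (Fraction(1, 2), 2, 'F', 4, 1),  # F sharp in the alto
    (Fraction(3, 4), 1, 'F', 4, 1),  # soprano F sharp: already in force
]
decisions = accidental_decisions(events, {'B': -1})
```

Traversal (the `sorted` stream) and meaning (the decision function) are separate concerns, which is exactly what makes the algorithm deterministic over its input.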
Concurrency Is Not Optional

Here is where most notation software fails silently. A modern computer has multiple cores. Users expect responsiveness: type a note, see it appear immediately. But notation isn't like a word processor, where inserting a character affects only nearby layout. Changing one note can alter accidentals throughout a measure, reflow an entire system, or invalidate cached engraving calculations across pages.

The traditional approach is to serialise everything. One thread handles all mutations. The UI blocks or queues. This is why notation software freezes when you paste a large passage, and why collaborative editing remains essentially unsolved in this domain.

The problem deepens when you consider what 'correct' means during an edit. If a user changes an F♮ to an F♯ in measure 4, the accidental state of measure 5 depends on this change. But if the rendering thread is currently drawing measure 5, what does it see? The old state? The new state? Some inconsistent mixture? Mutable state with locks doesn't solve this; it creates a new category of bugs. Deadlocks. Race conditions. Heisenbugs that appear only under specific timing. A very difficult work environment indeed, and one where diminishing returns become a real impediment.

The functional approach is different. If musical data is immutable, the question dissolves. The rendering thread has a consistent snapshot – always. The editing thread produces a new snapshot. There's no moment of inconsistency because there's no mutation.

This is what Software Transactional Memory provides in Clojure. Transactions compose. Reads are always consistent. Writes are atomic. The system guarantees that no thread ever sees a half-updated score. The performance implications are substantial. Ooloi's STM handles heavy contention without the coordination overhead that serialised systems pay constantly. The bottleneck shifts from synchronisation to actual work.
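The snapshot idea can be shown without any STM machinery at all. This is a deliberately minimal Python sketch – immutable values and structural copying only; Clojure's refs add transactional coordination and structural sharing on top of this, and the `Measure`/`Score` shapes here are invented for the example:

```python
from dataclasses import dataclass, replace

# Immutable score fragments: 'editing' produces a new value rather
# than mutating the old one in place.
@dataclass(frozen=True)
class Measure:
    number: int
    notes: tuple

@dataclass(frozen=True)
class Score:
    measures: tuple

def change_note(score, measure_idx, note_idx, new_pitch):
    """Return a NEW score with one note changed; the old score is untouched."""
    m = score.measures[measure_idx]
    notes = m.notes[:note_idx] + (new_pitch,) + m.notes[note_idx + 1:]
    measures = (score.measures[:measure_idx]
                + (replace(m, notes=notes),)
                + score.measures[measure_idx + 1:])
    return Score(measures)

v1 = Score((Measure(4, ('F',)), Measure(5, ('G',))))

# A 'rendering thread' that grabbed v1 keeps a consistent snapshot...
snapshot = v1
v2 = change_note(v1, 0, 0, 'F#')   # ...while an edit produces v2.

assert snapshot.measures[0].notes == ('F',)   # old snapshot unchanged
assert v2.measures[0].notes == ('F#',)        # new snapshot sees the edit
```

There is no instant at which any reader can observe measure 4 half-changed: a reader holds either `v1` or `v2`, both internally consistent.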
Real-time collaboration becomes architecturally possible. Two users editing the same passage don't corrupt each other's state; they produce alternative futures that the system can merge or present as conflicts. This isn't a feature bolted on afterwards. It's a consequence of representing music correctly from the start.

Engraving Is Constraint Satisfaction

Programmers sometimes assume that once you have the musical data, displaying it is straightforward. Calculate positions, draw glyphs, done. This underestimates the problem by roughly an order of magnitude.

Consider accidental placement in a dense chord. Multiple accidentals must avoid colliding with each other, with noteheads, with stems, with ledger lines, and with accidentals in adjacent chords. The placement rules aren't arbitrary – they encode centuries of engraving practice optimised for readability. This is a constraint satisfaction problem. In the general case, it's NP-hard. Real systems use heuristics, but the heuristics are subtle. Gould's Behind Bars – the standard reference on notation practice – devotes many pages to accidental stacking alone.

Now multiply by every other engraving decision: beam angles, stem lengths, tie curves, slur routing, dynamic placement, tuplet brackets, ottava lines, system breaks, page breaks. Each interacts with the others. Change a stem length and the beam angle needs adjustment; beam ends need to follow sit-straddle-hang rules and might both move. Change a beam angle and adjacent voices may collide.

LilyPond's engraving quality comes precisely from treating this as a serious optimisation problem. Its page-breaking algorithm runs a variant of the Knuth-Plass line-breaking algorithm (developed for TeX). Its spacing uses spring-based constraint systems. The 'simple' rendering layer contains some of the most sophisticated code in the system. Interactive notation software faces an additional challenge: these calculations must be fast enough for real-time editing.
You can't spend ten seconds optimising layout every time the user adds a note. The tension between quality and responsiveness has driven much of the architectural evolution in this field.

Historical Conventions Are Not Bugs

A programmer encountering music notation for the first time will find countless apparent inconsistencies. Surely these could be rationalised? No. Or rather: not without destroying information.

A fermata on a note means 'sustain this note'. A fermata on a barline means 'pause here'. A fermata on a rest means something slightly different again. These distinctions emerged from practice. Composers use them. Performers understand them. A system that normalises them into a single 'fermata event' loses musical meaning.

Grace notes in Bach are performed differently from grace notes in Chopin. The notation looks similar. The performance practice differs. A system that treats all grace notes identically will produce correct output that performers find misleading.

Clefs change meaning by context. A C-clef on the third line is an alto clef; on the fourth line, a tenor clef. Same symbol, different staff position, different meaning. This isn't legacy cruft – orchestral scores routinely use multiple C-clef positions because different instruments read different clefs.

Mensural notation (pre-1600) has entirely different rules. The same notehead shapes mean different things. Accidentals follow different conventions. Barlines serve different purposes. A truly comprehensive notation system must handle this or explicitly disclaim it.

The temptation to clean this up, to impose regularity, to treat historical conventions as technical debt – this temptation must be resisted. These conventions encode musical knowledge. Destroying them in the name of consistency destroys the knowledge.

The Performer Is the Point

All of this serves a purpose that has nothing to do with computers. Notation exists to communicate with humans.
Specifically: performers who bring interpretation, physical constraints, and centuries of inherited practice to their reading. This isn't sentimental. It has engineering consequences.

A page turn in the wrong place forces a pianist to stop playing or memorise a passage. This is a functional failure, not an aesthetic preference. The page-breaking algorithm must allow this to be modelled.

Awkward beam groupings obscure metric structure. The performer reads beats through beam patterns. Wrong beaming means wrong emphasis. The beaming algorithm must model this based on the current time signature.

Unclear voice leading forces performers to decode the texture before they can interpret it. The system must make voices visually distinct in ways that match musical distinctness.

Engraving choices that are mathematically equivalent may not be performatively equivalent. Two layouts might use the same space and avoid all collisions, but one reads naturally and the other requires active deciphering. The difference matters. This is why notation is not a solved problem. The specification is 'communicate musical meaning to human performers'. That's not a format spec. It's a design constraint that touches every decision in the system.

Conclusion

Music notation looks like a data format problem. It is actually a meaning representation problem with a millennium of accumulated context. The systems that fail – and most fail in some way – do so because they treat notation as something to be processed rather than something to be understood. They model the symbols without modelling what the symbols mean.

The systems that succeed take the domain seriously. They represent time exactly. They handle multiple hierarchies without forcing false containment. They process context correctly even when context is non-local. They treat engraving as a real problem, not a rendering detail. They preserve historical conventions because those conventions carry meaning.
This is why Ooloi's architecture looks the way it does. Rational arithmetic isn't ideology; it's the only way to represent tuplets exactly. Immutability isn't fashion; it's the only way to guarantee that accidental decisions are deterministic. Transducers aren't showing off; they're the cleanest and most efficient way to process a semantic stream with accumulated context. STM isn't over-engineering; it's the only way to achieve both correctness and responsiveness without sacrificing either. The domain dictates the tools. Not the reverse.
We have a new year, and it's time to make plans. Ooloi's architecture is now closed, and the work enters a new phase. What follows is a road map, not a schedule – no dates, no promises – but a logical progression.

These stages are different in nature from what has gone before. They are conceptually simpler than the deep semantic work (what the music is, how it's represented and manipulated) or the infrastructure (server/client transports and roundtrips, monitoring, certificates). Everything below is comprehensively architected and implementation-ready. The thinking is done; what remains is execution.

The Preparations

Event-Driven Client Architecture: The nervous system that lets the interface respond without freezing. (ADR-0022, ADR-0031)
Windowing System: Windows, menus, palettes, dialogs, notifications, connection status. The application becomes something you can see and touch. (ADR-0005, ADR-0038)

Multi-Mode Clients: Standalone operation, peer-to-peer connection, shared server connections. (ADR-0036)

Skija / GPU Rendering: The drawing substrate. GPU-accelerated graphics, paintlist caching, lazy fetching. The machinery for putting marks on screen. (ADR-0005, ADR-0038)

Hierarchical Rendering Pipeline: The transformation from musical structure to visual layout. (ADR-0028, ADR-0037)

Plugin System: First-class access from any JVM language. (ADR-0003, ADR-0028)

MusicXML: The first, limited version, implemented as a canonical plugin. Real scores entering the system. (ADR-0030)

And then – and only then – will the first staff line appear.

To implement what was described in the last blog post deterministically, ADR-0028, on the hierarchical rendering pipeline, and ADR-0037, on measure distribution, have both been updated. The rendering pipeline now has six stages rather than five. The additional stage enables the behaviour discussed in the previous post, but also allows slurs, lyrics, dynamics, and other elements that require vertical space to adapt dynamically without introducing layout jitter.

What follows is a brief tour of the six pipeline stages. Each has a clearly defined role, a fixed direction of information flow, and no hidden feedback paths. Together, they transform musical structure into pages without iteration. If you are a musician, you should be able to follow how the music moves from essential meaning to visual expression. If you are a developer, the architectural choices should be readable between the lines. In both cases, the ADRs contain the full technical detail.
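The 'fixed direction of information flow' can be pictured as plain function composition. This is a schematic Python sketch: the stage names follow the post, but the payloads are placeholders, not Ooloi's actual data model.

```python
# Each stage is a pure function from the previous stage's output to an
# enriched value; nothing ever flows backwards through the pipeline.
def atoms(state):            return dict(state, atoms='per-measure atoms and widths')
def alignment(state):        return dict(state, grid='stack-wide rhythmic grid')
def system_formation(state): return dict(state, breaks='system breaks and scale factors')
def positioning(state):      return dict(state, coords='absolute atom coordinates')
def spanners(state):         return dict(state, heights='actual system heights')
def page_formation(state):   return dict(state, pages='final page distribution')

PIPELINE = (atoms, alignment, system_formation,
            positioning, spanners, page_formation)

def layout(score):
    state = dict(score)
    for stage in PIPELINE:   # fixed order, no feedback paths, no iteration
        state = stage(state)
    return state

result = layout({'music': 'the semantic score'})
```

Because each stage only adds information and never revises an earlier stage's output, the whole transformation runs once, in order, with no convergence loop.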
Stage 1: Atoms

Each measure is processed independently and turned into engraving atoms: indivisible musical units that already contain noteheads, augmentation dots, accidentals, stems, articulations – everything that belongs together. A note or a chord is an atom, including everything at a fixed distance from that note or chord. Once an atom is computed, its constituent parts never move relative to each other, though they can be adjusted manually or converted by the user to pure graphics for Illustrator-like manipulation.

From these atoms, the system derives three facts about the measure stack: a minimum width, where atoms are simply lined up one after another (guaranteeing that nothing ever collides); an ideal width, based on proportional spacing from rhythmic density; and a gutter width – extra space required only if the measure begins a system. Courtesy accidentals caused by system breaks are accounted for here as space requirements, not as symbols. Ordinary courtesy accidentals are already part of the atom.

Stage 2: Rhythmical Alignment

All measures in a stack contribute a rhythmic raster: the sequence of atom positions implied by their rhythmic content at ideal spacing. Stage 2 takes these per-staff rasters and aligns them into a single, stack-wide rhythmic grid, so that 'the beat structure' is shared across staves rather than each staff carrying its own local spacing. Because the system has both the rasters and the atom extents from Stage 1, it can adjust the aligned positions so that collisions are avoided while still preserving the intended proportional structure.

Stage 3: System Formation

With rhythmical alignment complete, the score collapses to a one-dimensional sequence of measure stacks, each described only by stack minimum width, stack ideal width, and possible stack system-start gutter.
System breaks are chosen using dynamic programming, and a scale factor is computed for each system so that ideal proportions are preserved as well as possible within the available width. At the same time, system preambles (clefs, key signatures) are fully determined. Apart from these, the stage produces only break decisions and scale factors; no measure content is positioned yet.

Stage 4: Atom Positioning

The abstract decisions from system formation are applied. Each atom is positioned horizontally by scaling its ideal spacing according to the scale factor of its system. Because atoms were defined with purely relative internal geometry, they can be moved without recomputation. This stage is a pure projection from musical proportions to absolute coordinates.

Stage 5: Spanners

All connecting elements are resolved against the fixed atom positions: ties, slurs, hairpins, ottavas, lyric extenders. They adapt to the horizontal geometry they are given and never influence spacing upstream. System-start decorations live here as well: courtesy accidentals now appear in the gutter space that was explicitly reserved earlier. Actual system heights emerge at this stage as a consequence of what must be drawn – slurs, ties, lyrics, pedalling indications, hairpins – each claiming vertical space as required to avoid collisions. In this stage, braces, brackets, and instrument name indications are also computed for the left margin, as all information required to determine system grouping and extent is now available.

Stage 6: Page Formation

Systems are distributed across pages using dynamic programming, this time over the actual system heights produced by the previous stage. As in Stage 3, the distribution is optimal with respect to the available space. Once page breaks are chosen, layout is locked and no earlier stage is revisited. At this point, all rendering decisions have been made. The score can now be drawn directly from the cached paintlists.
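The dynamic programmes in Stages 3 and 6 are both instances of the same optimal-partitioning pattern, in the spirit of Knuth-Plass. Here is a toy Python sketch over ideal widths only, with a hypothetical squared-stretch penalty – the actual cost model (minimum widths, gutters, system heights) lives in the ADRs.

```python
import functools

def break_into_systems(ideal_widths, available):
    """Partition a sequence of measure widths into systems, minimising
    the total squared deviation of each system's scale factor from 1.
    An O(n^2) dynamic programme over break positions."""
    n = len(ideal_widths)

    def badness(i, j):
        # Cost of putting measures i..j-1 on one system (toy penalty:
        # how far the required scale factor strays from 1.0).
        total = sum(ideal_widths[i:j])
        if total > 1.5 * available:      # too crushed: forbid outright
            return float('inf')
        scale = available / total
        return (scale - 1.0) ** 2

    @functools.lru_cache(None)
    def best(i):
        # Minimal cost of laying out measures i..n-1, plus the break chosen.
        if i == n:
            return (0.0, None)
        return min((badness(i, j) + best(j)[0], j) for j in range(i + 1, n + 1))

    # Recover the optimal break points from the memoised decisions.
    systems, i = [], 0
    while i < n:
        j = best(i)[1]
        systems.append((i, j))
        i = j
    return systems

print(break_into_systems([40, 50, 45, 60, 30, 55], available=100))
# → [(0, 2), (2, 4), (4, 6)]
```

Stage 6 runs the same scheme vertically, with the actual system heights from Stage 5 taking the place of ideal widths.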
Author: Peter Bengtson
Archives: January 2026
Ooloi is a modern, open-source desktop music notation software designed to produce professional-quality engraved scores, with responsive performance even for the largest, most complex scores. The core functionality includes inputting music notation, formatting scores and their parts, and printing them. Additional features can be added as plugins, allowing for a modular and customizable user experience.
Ooloi is currently under development. No release date has been announced.