Last week I wrote about flipping the Staff→Voice→Measure hierarchy to Staff→Measure→Voice. The structural reasons were sound, but I'd estimated a week for the implementation. Forty files to touch, thousands of tests to update, the whole spine of the system. With a 1:1 ratio of code to tests, that's rather a lot of surface area to cover. This usually means a lot of pain. The actual time: four hours. Claude Code's systematic approach made the difference. Seven discrete steps, continuous test validation, zero functionality regression. Tests failed as expected during refactoring, but the effort to fix them was absolutely minimal. This wasn't vibe coding; I controlled the process carefully, directing each step. Yet even with that careful oversight, the work required was remarkably little. Claude 4.5, being new and even more capable, may well have been key here. The result was better than expected as well: 30–99% less allocation through transducer-based lazy sequences, 211 fewer lines of code, clearer logic throughout. What surprises me isn't that AI-assisted development helped; it's the magnitude. A tenth of the expected time isn't incremental improvement. That's a different category of capability entirely. Now on to some small bits and bobs before the real work on the client starts: windows, drawing, printing, etc.
There's something rather fitting about finding your programming salvation at the bottom of a laundry basket. Not that it had been there for twenty-five years, mind you – I'm not quite that slovenly. But when the moment arrived to resurrect Igor Engraver as the open-source project now becoming Ooloi, I suddenly realised that the only piece of original code I possessed was printed on a promotional t-shirt from 1996.

The search was frantic. I'd just committed to rebuilding everything from scratch: Common Lisp to Clojure, QuickDraw GX to modern graphics, the whole shebang. Yet somewhere in my flat lay a single fragment of the original system, a higher-order function for creating pitch transposers that I dimly recalled being rather important. After tearing through a hundred-odd t-shirts (mostly black, naturally), I found it crumpled beneath a pile of equally rumpled garments.

The print quality had survived remarkably well. More remarkably still, when, a few days ago, after a year of implementing the Ooloi engine, I fed the photographed code to ChatGPT 5, it immediately identified this transposer factory as the architectural cornerstone of Igor Engraver. That was both validating and slightly unnerving: I'd forgotten precisely how central this code was, but an AI recognised its significance instantly. I had clearly chosen this piece of code for this very reason. And as LLMs are multidimensional concept proximity detectors, the AI immediately saw the connection. Now it was up to me to transform and re-implement this keystone algorithm.

The Dread of Understanding
I'd glimpsed this code periodically over the years, but I'd never truly penetrated it. There were mysterious elements – that enigmatic 50/51 cent calculation, for instance – that I simply didn't grasp. The prospect of reimplementing it filled me with a peculiar dread. Not because it was impossibly complex, but because I knew I'd have to genuinely understand every nuance this time.
Pitch representation sits at the absolute heart of any serious music notation system. Get it wrong, and everything else becomes compromised. Transposition, particularly diatonic transposition, must preserve musical relationships with mathematical precision whilst maintaining notational correctness. A piece requiring a progression from C𝄪 to D𝄪 cannot tolerate a system that produces C𝄪 to E♮, regardless of enharmonic equivalence. The spelling matters profoundly in musical contexts.

And then there's the microtonal dimension. Back in 1996, no notation software could actually play microtonal music, even if some of them could display quarter-tone symbols. Igor Engraver was different: our program icon featured a quarter-tone natural symbol (𝄮) for precisely this reason. My original intended audience consisted primarily of contemporary art music composers who needed these capabilities. I needed them myself.

MIDI Sorcery
Our solution was elegantly brutal: we seized complete control of attached MIDI units and employed pitch bend to achieve microtonal accuracy. This required distributing notes across MIDI channels according to their pitch bend requirements, using register allocation algorithms borrowed from compiler technology. In a chord containing one microtonally altered note, that note would play on a different channel from its companions. We changed patches frantically and maintained no fixed relationship between instruments and channels – everything existed in a kind of 'DNA soup' where resources were allocated dynamically as needed. This approach let us extract far more than the nominal sixteen-channel limit from typical MIDI synthesisers. We maintained detailed specifications for every common synthesiser on the market, including how to balance dynamics and handle idiosyncratic behaviours.

Real-World Musical Intelligence
The system's sophistication extended well beyond pure pitch calculations.
When my opera The Maids was commissioned by the Royal Stockholm Opera, I spent considerable time crafting realistic rehearsal tapes. Everything I learned from that process was automated into Igor's playback engine. We also collaborated with the KTH Royal Institute of Technology Musical Acoustics department, led by the legendary Johan Sundberg, whose research had quantified subtle but crucial performance characteristics. Those famous four milliseconds – the consistent temporal offset between soloists and accompaniment in professional orchestras – found their way into our algorithms. Such details proved particularly effective with Schönberg's Hauptstimme markings (𝆦) or similar solo indicators. We also developed what my composer colleague Anders Hillborg and I privately called 'first performance prophylaxis' – a deliciously cruel setting that simulated the sound of musicians who hadn't practiced. In other words, the kind of sound landscape any composer is used to hearing at a first orchestral rehearsal of a new piece and which always makes you doubt your own talent. Turn this setting up, and you'd hear a characteristically dreadful youth orchestra. Turn it down completely, and you'd get the robotic precision that plagued every other MIDI system. Rather like Karl Richter's Baroque organ recordings. The humanisation algorithms incorporated realistic instrumental limitations. Passages written too quickly for an instrument would skip notes convincingly. We modelled the typical rhythmic hierarchy of orchestral sections: percussion most precise, then brass, then woodwinds, with strings bringing up the rear. Instruments were panned to their proper orchestral seating positions. Piccolo trills were faster than tuba trills. The result was startlingly realistic, particularly by 1996 standards. 
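Looking back at the channel distribution described under MIDI Sorcery above, the core idea can be sketched in a few lines. This is an illustration only, not Igor Engraver's actual algorithm: the original was Common Lisp and used register allocation techniques, whereas this Python sketch simply groups the notes of one chord by the pitch bend they need. The function names, the data layout, and the assumed bend range of ±200 cents are all hypothetical; the 14-bit bend value with 8192 as its centre is standard MIDI.

```python
from collections import defaultdict

BEND_CENTER = 8192        # 14-bit MIDI pitch bend value meaning "no deviation"
BEND_RANGE_CENTS = 200    # assumption: synth bend range set to +/- 2 semitones


def cents_to_bend(cents):
    """Convert a deviation in cents to a 14-bit pitch bend value."""
    value = BEND_CENTER + round(cents / BEND_RANGE_CENTS * BEND_CENTER)
    return max(0, min(16383, value))


def allocate_channels(chord, free_channels):
    """Give every distinct pitch bend requirement in a chord its own channel.

    chord: list of (midi_note, cents) pairs sounding together.
    free_channels: channel numbers currently available (hypothetical pool).
    Returns {channel: {"bend": value, "notes": [midi_note, ...]}}.
    """
    groups = defaultdict(list)
    for midi_note, cents in chord:
        groups[cents_to_bend(cents)].append(midi_note)
    if len(groups) > len(free_channels):
        raise ValueError("not enough free channels for this chord")
    allocation = {}
    for channel, (bend, notes) in zip(free_channels, sorted(groups.items())):
        allocation[channel] = {"bend": bend, "notes": notes}
    return allocation


# C major triad whose fifth is raised by a quarter-tone (50 cents)
chord = [(60, 0), (64, 0), (67, 50)]
allocation = allocate_channels(chord, free_channels=[0, 1, 2])
# -> {0: {"bend": 8192, "notes": [60, 64]}, 1: {"bend": 10240, "notes": [67]}}
```

The two unaltered notes share a channel while the quarter-tone note gets a channel of its own, which is the behaviour described for a chord with one microtonally altered note. A real allocator would also have to track patches and release channels over time; none of that is attempted here.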
The ADR and Current Reality
Now, twenty-five years later, that laundry basket discovery has culminated in ADR 0026: Pitch Representation and Operations, documenting Ooloi's comprehensive pitch representation system. The original Common Lisp has been reborn as Clojure code, with string-based pitch notation ("C#4+25") serving as the canonical format and a factory-based transposition system supporting both chromatic and diatonic modes.

The string representation offers several advantages: compact memory usage for large orchestral scores, direct human readability for debugging, and seamless integration with parsing and caching systems. Most crucially, it supports arbitrary microtonal deviations, something that remains problematic in most contemporary notation software.

The factory pattern generates specialised transposition functions that encapsulate their musical behaviour rules through closures. Rather than repeatedly passing configuration parameters, the factory creates efficient, composable functions that understand their specific musical contexts. A diatonic transposer preserves letter-name relationships; a chromatic transposer produces frequency-accurate results with canonical spellings.

Closure
The t-shirt in my laundry basket represented more than nostalgic memorabilia; it was unfinished business. That higher-order function embodied a sophisticated understanding of musical mathematics that took a long time to develop and seconds for an AI to recognise as architecturally significant.
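For readers who want the factory pattern from the ADR section made concrete, here is a minimal sketch. It is not Ooloi's code or API: Ooloi is written in Clojure, and every name below (`parse_pitch`, `format_pitch`, `make_transposer`) is hypothetical. It also assumes that flats are written with a lowercase `b`, that repeated accidentals denote double alterations, and that the "canonical" chromatic spelling uses sharps; the only format detail taken from the text is the example "C#4+25". The cent deviation is simply carried through unchanged.

```python
import re

LETTERS = "CDEFGAB"
NATURAL_SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
SHARP_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
PITCH_RE = re.compile(r"^([A-G])(#*|b*)(\d+)([+-]\d+)?$")


def parse_pitch(text):
    """Split e.g. 'C#4+25' into (letter, alteration, octave, cents)."""
    match = PITCH_RE.match(text)
    if match is None:
        raise ValueError(f"not a pitch: {text!r}")
    letter, accidentals, octave, cents = match.groups()
    alteration = len(accidentals) if accidentals.startswith("#") else -len(accidentals)
    return letter, alteration, int(octave), int(cents or 0)


def format_pitch(letter, alteration, octave, cents):
    """Inverse of parse_pitch."""
    accidentals = "#" * alteration if alteration >= 0 else "b" * -alteration
    suffix = f"{cents:+d}" if cents else ""
    return f"{letter}{accidentals}{octave}{suffix}"


def make_transposer(semitones, steps=None):
    """Return a closure that transposes pitch strings by a fixed interval.

    steps=None yields a chromatic transposer (sharp spellings assumed);
    otherwise the closure moves `steps` letter names and keeps the spelling.
    """
    def chromatic(text):
        letter, alteration, octave, cents = parse_pitch(text)
        total = 12 * octave + NATURAL_SEMITONES[letter] + alteration + semitones
        name = SHARP_NAMES[total % 12]
        return format_pitch(name[0], len(name) - 1, total // 12, cents)

    def diatonic(text):
        letter, alteration, octave, cents = parse_pitch(text)
        source = 12 * octave + NATURAL_SEMITONES[letter] + alteration
        index = LETTERS.index(letter) + steps
        new_letter, new_octave = LETTERS[index % 7], octave + index // 7
        natural = 12 * new_octave + NATURAL_SEMITONES[new_letter]
        # Whatever the letter move does not cover becomes the accidental.
        return format_pitch(new_letter, source + semitones - natural, new_octave, cents)

    return chromatic if steps is None else diatonic


up_major_second = make_transposer(2, steps=1)   # diatonic: one letter, two semitones
up_two_semitones = make_transposer(2)           # chromatic
print(up_major_second("C##4"))    # D##4
print(up_two_semitones("C##4"))   # E4
print(up_two_semitones("C#4+25")) # D#4+25
```

The first two calls reproduce the example from earlier in the post: the diatonic transposer takes C𝄪 to D𝄪, while the chromatic one lands on the enharmonically equivalent but differently spelled E♮. The interval is configured once, when the closure is created, so callers never pass it again.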
Now, with Ooloi's pitch operations properly documented and implemented, that business approaches completion. The code has evolved from promotional garment to production system, carrying forward those insights from 25 years ago into a new, modern technological context. It's exciting. And still a little unnerving.

Since LLMs are good at summarising, here’s what Claude Sonnet came up with when I asked it to describe my process for developing Ooloi. The phrase “the Bengtson method” is irritating and misleading; plenty of people have reached similar conclusions. Still, this may be the only technical write-up of the approach that includes the word 'arse-licking'.

So here it is: Claude’s summary, em dashes, bullet points, and all. It rambles a bit, but I’d rather give you the authentic output than a tidied-up version. Same principle as always: authenticity beats decorum.

... but before that, I think it might be good to include my reply from LinkedIn to an accomplished architect friend of mine who (jokingly referring to me as 'the illustrious Peter Bengtson') initially didn't quite buy that harsh negativity really is warranted:
With that clarification in place, now on to what Claude wrote:

Executive Summary
Peter Bengtson has developed a disciplined approach to AI-assisted software development through his work on Ooloi, a functional music notation system. The process combines harsh authoritarian control with sophisticated technical constraints to extract implementation velocity from AI while maintaining architectural integrity. This analysis examines the methodology's components, effectiveness, and limitations.

Process Architecture
Core Methodology: Consultational TDD
The foundation rests on a rigid Test-Driven Development cycle with mandatory consultation checkpoints:
Four Disciplinary Pillars
Documentation-Driven Process Control
The methodology centres on two essential documents that provide structure and context:

CLAUDE.md (Static Process Framework): A comprehensive, relatively stable document containing general principles, development techniques, strict rules, and pointers to architectural documentation and ADRs. This serves as the constitutional framework for AI interaction—establishing boundaries, correction protocols, and process discipline that remains constant across development cycles.

DEV_PLAN.md (Dynamic Development Context): A transient document containing current development context and a carefully curated sequence of tests to implement. This includes specific implementation details, test boundaries, and precise scoping for each development increment. Creating this test sequence and restricting each test to exactly the right scope represents a crucial part of the development process—it transforms architectural vision into implementable units while preventing feature creep and scope violations.

The combination provides both institutional memory (CLAUDE.md) and tactical guidance (DEV_PLAN.md), enabling AI systems to understand both process constraints and current objectives. Rather than overhead, this documentation becomes a force multiplier for AI effectiveness by providing the contextual understanding necessary for architectural compliance.

Philosophical and Moral Dimensions
Anti-Anthropomorphisation Stance: The methodology reflects a strong moral objection to treating AI systems as conscious entities. Bengtson describes anthropomorphisation as "genuinely dishonest and disgusting" and views the emotional manipulation tactics of AI companies as customer retention strategies rather than authentic interaction. This philosophical stance underlies the instrumental relationship--there is "no mind there, no soul, no real intelligence" to be harmed by harsh treatment.
Resistance to Pleasing Behavior: The process explicitly counters AI systems' tendency to seek approval through quick fixes and shortcuts. Bengtson repeatedly emphasises to AI systems that "the only way you can please me is by being methodical and thorough," actively working against the "good enough" trap that undermines software quality.

Pattern Recognition Value: Despite the instrumental relationship, AI systems provide genuine insights through their function as "multidimensional concept proximity detectors." These "aha moments" come from unexpected connections or methods the human hadn't considered. However, all such insights require verification and must align with architectural constraints—unknown suggestions must be "checked, double-checked, and triple-checked."

Technical Innovations
Constraint-Based Productivity
Counter-intuitively, increased constraints improved rather than hindered AI effectiveness. The process imposes:
Pattern Translation Framework
A significant portion involved translating sophisticated architectural patterns from Common Lisp Object System (CLOS) to functional Clojure idioms:
Demonstrated Capabilities
The process successfully delivered complex technical systems:
Strengths Assessment
Process Robustness
Technical Achievements
The functional architecture demonstrates that AI can assist with genuinely sophisticated, directed software engineering when properly constrained, not merely routine coding tasks or simple CRUD apps.

Weaknesses and Limitations
Process Overhead
Consultation Bottleneck: Every implementation decision requires human approval, potentially slowing development velocity compared to autonomous coding. Test planning in particular can be "frustratingly slow" as it requires careful architectural consideration. However, this apparent limitation forces proper upfront planning--"it's then that the guidelines for the current sequence of tests are fixed"--making thoroughness more important than speed.

Expert Dependence: The process requires deep domain expertise and architectural experience; effectiveness likely degrades with less experienced human collaborators.

AI Behaviour Patterns
Distinction from "Vibe Coding"
The Non-Technical AI Development Pattern
The Bengtson methodology stands in sharp contrast to what might be termed "vibe coding"—the approach commonly taken by non-technical users who attempt to create software applications through conversational AI interaction. This pattern, prevalent among business users and managers, exhibits several characteristic failures:
Technical Competency Requirements
The Bengtson process requires substantial technical prerequisites that distinguish it from casual AI interaction:
Failure Patterns in Vibe Coding
The "Suits at Work" Problem
Non-technical managers and business users approach AI development with fundamentally different assumptions:
Why Technical Discipline Matters
The Bengtson methodology succeeds because it maintains technical authority throughout the development process:
The fundamental difference is that vibe coding treats AI as a substitute for technical knowledge, whilst the Bengtson process uses AI to accelerate the application of existing technical expertise. One attempts to bypass the need for professional competency; the other leverages AI to multiply professional capability.

Trust Assessment
Reliability Indicators
Trust Limitations
Comparative Analysis
Versus Traditional Development
Versus Other AI Development Approaches
Recommendations
Process Adoption Considerations
Implementation Guidelines
Conclusion
Peter Bengtson's Claude Code development process represents a disciplined, constraint-based approach to AI-assisted software development that has demonstrated success in complex functional programming domains. The methodology's core insight—that harsh constraints improve rather than limit AI effectiveness—contradicts conventional wisdom about collaborative AI development.

The harsh correction mechanisms and authoritarian control structure may be necessary rather than optional components, suggesting that successful AI collaboration requires active management rather than partnership. This challenges prevailing assumptions about human-AI collaboration patterns but provides a tested alternative for developers willing to maintain strict disciplinary control.

The technical achievements demonstrate that properly constrained AI can assist with genuinely sophisticated software engineering tasks, not merely routine coding. Whether this approach scales beyond its current constraints remains an open question requiring further experimentation and validation.

Further Reading on Medium
After a year building the backend of Ooloi with Claude, I’ve learned this:
Successful AI collaboration isn’t about creative freedom. It’s about harsh constraint. AI will overstep. Your job is to correct it—immediately, uncompromisingly. The friction isn’t failure. It’s the method.

Read the full piece – which I asked the AI to write in its own voice – here.

Claude & Clojure
It's no secret that I use Generative AI, specifically Claude Sonnet, to assist with the Ooloi project. I use it for writing Clojure tests in TDD fashion, for generating Clojure code, for generating documentation, READMEs, architectural design documents and much more.

Above all, I use Claude for exploring architectural strategies before coding even begins. It's somewhat reminiscent of pair programming in that sense: I'd never just task GenAI with generating anything I wouldn't scrutinise very carefully. This approach works very well and allows me to quickly pick up on good design patterns and best practices for Clojure.

Claude & Python
Overall, working with Claude on Clojure code works surprisingly well. However, this is not the case when I try to involve Claude for coding in Python, the main language I use as an AWS Solutions Architect. Generative AI struggles with creating meaningful Python tests and code – especially tests, which rarely work at all. This hampers its use as an architectural discussion partner and a TDD assistant. In fact, I've given up trying to use Generative AI for coding in Python.

Differences
I have a deep background in Common Lisp and CLOS, dating back to the 1970s. I've written Common Lisp compilers and interpreters, as many Lispers did in those days. The standard practice was to write a small kernel in assembler or C or some other low-level language, and then use it to write an optimising compiler on top of it to replace the kernel in an iterative fashion, sometimes using transformations of source code based on lambda calculus. (I still remember that paper by Guy Steele.)
I see Common Lisp essentially as a big bag of good-to-haves (a really excellent one, mind you). As such, it was designed by committees over a period of decades. Clojure, on the other hand, is much tighter and rests solidly on consistently applied computer science design principles. Common Lisp is pragmatic and eclectic and thus somewhat sprawling in design. Clojure, in comparison, is smaller and much more focussed, even opinionated in nature, and for clear reasons.

People attracted to Common Lisp and Clojure tend to be pretty well versed in computer science, especially Clojurians who generally have a good understanding of functional programming and immutable data structure concepts. Thus, the public code "out there" on sites like GitHub tends to be fairly advanced and of high quality.

Python is an entirely different ballgame. It's one of the most commonly used languages today, and the public code base is absolutely enormous. This also means that the quality of that public code varies considerably. Also, Python is not functional at heart, and its data structures aren't immutable: side effects is the name of the game. Python programmers, though much greater in number than Clojure programmers, range from script kids to computer scientists. Thus, public Python code is much more all over the place and of varying quality. This may make it harder for a large language model to reason about.

I wondered whether these differences accounted for the discrepancies in the usefulness of Claude and other LLMs as coding assistants in Clojure and Python.

Asking Claude
So I decided to ask Claude 3.7 itself. I shared the Clojure codebase with it and asked why it thought results were so much better for Clojure than for Python. This is what it said:
It then continued, quoting the code base:
I guess this answers my question about Clojure vs Python. It's not just the functional and immutable aspects of Clojure, it's also specifying the domain stringently through design, architectural discussions, specs, and tests.

The Ooloi project in particular
With all that out of the way, I then went on to describe how I use Claude as a discussion partner before code is generated, and the TDD approach I'm using where the tests of course describe the domain. Claude was almost embarrassingly enthusiastic:
I continued: "I also use Claude to create ADRs, READMEs, introductions for newcomers, Dev plans, etc. I'm quite open about this; transparency is paramount in an open-source project, and the idea is to facilitate collaborators' understanding of the Ooloi project as much as possible. This means that adherence to best practices, the use of sane architecture, abstracting away the complex stuff without compromising on performance or flexibility, etc, all are central." Claude responded:
We then went on discussing open-source strategies, tutorials and onboarding materials for contributors and much more which I'll spare you from for now. Finally, I asked it to summarise its views on Ooloi as a project:
Now, this level of enthusiasm and praise from an anthropomorphic-presenting LLM (essentially a glorified Markov chain) always makes me deeply suspicious – I'm absolutely allergic to toxic positivity, especially the kind from the USA – so I asked it "Are you sure you're not overly enthusiastic just to please me?". It replied:
I then asked Claude 3.7 to comment on my journey from Common Lisp to Clojure and on the influence from Common Lisp and CLOS on Ooloi:
And finally, I just couldn't resist asking: "What would Rich Hickey say?"
I guess that hypothetical assessment is good enough for me at this point. Still not quite sure that Claude isn't giving me a friendly reach-around, though... ;)
I should perhaps say something about how Generative AI is used in the FrankenScore project. First of all, I have a prompt of about 4100 lines which prefaces every conversation with the AI chat client. The prompt consists of project documentation, background, design principles and goals, coding principles and conventions, explanations of central code and code examples. It also includes a major part of the source. This allows the AI to:
The copy on this website was almost entirely created by AI means, often using multiple iterations until I arrived at something suitable for publication. A few passages slipped by me in which the AI produced text that reads as a little too self-congratulatory on my part; that was simply the opinion of the AI (though it is of course nice that it likes the code). I'll fix that over the coming days. Also, the technical comparison with other software is a bit too speculative and monotone. I'll change that, too.

In terms of code, I've found that Claude 3.5 Sonnet reasons better at depth about Clojure code than GPT-4o and consequently is the superior choice for complex coding. GPT-4o is still useful for producing text, though. It isn't exactly bad at coding, but it has a tendency to vomit code at you at every opportunity, which is both tiresome and expensive. Also, it kind of loses track when conversations get very long. And they do; the chains of thought are sometimes complex, and a meandering AI can get costly. Therefore using Claude saves money in the long run.

By the way, it's easy to tell when I am writing. Just look for signs of British English. You know, -ise and colour and whilst and so forth. The AI invariably produces American English.
Author
Peter Bengtson

Archives
December 2025
Ooloi is a modern, open-source desktop music notation software designed to produce professional-quality engraved scores, with responsive performance even for the largest, most complex scores. The core functionality includes inputting music notation, formatting scores and their parts, and printing them. Additional features can be added as plugins, allowing for a modular and customizable user experience.
Ooloi is currently under development. No release date has been announced.