Elegant Engineering at Plum

Elegance, maintainability, and sound constitution are properties of a robust software product. I know Haskell is a significant reason why I can say that our cloud services at Plum exhibit these properties; I also think another, harder-to-quantify set of contributing properties belongs to the implementors: their analytical density[1], cognitive flexibility, and cultivated intuition (concomitant with the paradigm shift a programmer experiences once their mental model of Haskell has reached escape velocity).

In this article I want to discuss the state of Plum’s software before my arrival, how we’re now using Haskell and why, what my experience so far has been with the technology, and why simple elegance is a stigmatized but causal factor in the production of sound software.

Arriving at Plum as a contractor, I was tasked with modernizing developer operations and server configuration automation. The timing was auspicious because my experience at Curb[2] provided me with a foundation in the technical challenges of internet-connected home devices and the Internet of Things market. The founders of Plum and I are also fellow Techstars founders, so we understand what each of us has been going through in building tech startups (hint: it’s incredibly stressful and camaraderie is a force multiplier).

One salient and relevant adventure at Curb was the wholesale re-engineering of every component in the software product, using Haskell for the big and critical components (the system as a whole had no conceptual integrity). That experience proved crucial in the months that followed, when I accepted full-time employment at Plum to lead the cloud services and mobile app teams.

Legacy

The original cloud services were a prototype: much of the system design was preliminary, the mobile app’s cloud HTTP API was changing often, and the early technology choices, made for prototyping and fleshing out ideas, weren’t scaling well and needed replacement before we went to full production. That’s often the point of no return for products; if it ain’t broke, even if limited, don’t rewrite Gen One, because doing so can kill your business before it’s mature enough to absorb the cost.

The legacy stack was Python, CherryPy, and MongoDB. I am now of the firm belief that any dynamically typed language, save for Erlang[3], is inappropriate for programs or systems of significant size and mission criticality. I will dig into why later.

MongoDB needed to go. It served the intent to build a prototype, but its mantra of being an indiscriminate bucket for bits is poisonous to software that needs a structured data model with non-trivial relationships (suitable for relational calculus). MongoDB foists the burden of data-model sanity onto the programmer, which is dangerous because humans are messy.

Before you say it, JSON Schema isn’t used and I also think it’s not the correct solution. One strongly and statically typed reference implementation, say in Haskell, from which clients are generated is a more dependable and safer strategy.

Rewrite! Rewrite!

The software required a significant refactoring because the default MongoDB Python library isn’t a true object-relational mapper; no abstraction of the data model was present to insulate any of the inter-dependent microservices, and that led to a pernicious ambiguity infecting nearly every unit of the cloud services code[4]. With no formal or explicit capture of effects and no abstraction of the data model, we possessed no conceptual integrity! As we relied on the prototype more and more, this caused it to become unmaintainable, opaque (as in, behavior surprised everyone regularly), and devoid of rigorous edge-case understanding.

In a system composed of three individually complex entities (cloud, mobile, and firmware), this is an untenable state to be in.

Part-way through refactoring the Python codebase it occurred to me that, as at Curb, I could refine the concept into a well-crafted and production-ready system by burning the world then rebuilding it with Haskell. I could wipe out whole classes of bugs that our beta users, the mobile team, and the firmware team were experiencing, guaranteeing that we would have an unfriable body of work, refactorable or extendable without fear of breaking something or introducing regressions. I would also need fewer engineers as we grow: bringing them up to speed is easier in Haskell because codified knowledge is in the types, and if one is diligent about communicating thought with idiomatic Haskell the code can be self-documenting.

I rebuilt the engine while driving, and well enough that it will grow with the acceleration of the business. The result is technical credit instead of debt: not that we are debt-free, but we are well within our allowable band, and any debt present in the cloud services code or design is minuscule enough for this stage of the product that we can just tag it and categorize it.

Circumscribing Effects

The rewrite went well: it took me a total of three weeks to replace the five distinct programs that compose Plum’s cloud services, entirely in Haskell. The end result had dramatically fewer bugs and handled a swathe of edge cases, and the migration was smooth.

The databases were in place and using an explicitly defined schema. The JSON schemas for the AMQP interface to the devices and the REST API for the mobile apps are stronger by virtue of Haskell’s algebraic data types and the brilliant Aeson library (you get schema validation for free by parsing into ADTs with instances and the Maybe / Either types).
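To make the “validation for free” point concrete, here is a minimal sketch assuming a hypothetical light-command message. The type names, field names, and the `decodeCommand` helper are illustrative assumptions, not Plum’s actual schema; only the Aeson functions (`eitherDecode`, `withObject`, `withText`, `.:`, `.:?`) are real library calls.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (FromJSON (..), eitherDecode, withObject, withText, (.:), (.:?))
import Data.ByteString.Lazy (ByteString)
import Data.Text (Text)

-- Hypothetical message types; not Plum's actual schema.
data Power = On | Off
  deriving (Show, Eq)

-- Only the two known strings are accepted; anything else is a parse failure.
instance FromJSON Power where
  parseJSON = withText "Power" $ \t -> case t of
    "on"  -> pure On
    "off" -> pure Off
    _     -> fail "expected \"on\" or \"off\""

data LightCommand = LightCommand
  { deviceId :: Text
  , power    :: Power
  , dimLevel :: Maybe Int  -- optional: an absent or null key parses to Nothing
  } deriving (Show, Eq)

instance FromJSON LightCommand where
  parseJSON = withObject "LightCommand" $ \o ->
    LightCommand
      <$> o .:  "deviceId"   -- required
      <*> o .:  "power"      -- required, validated by the Power instance
      <*> o .:? "dimLevel"   -- optional

-- Left carries a parse error message, Right carries a fully valid command.
decodeCommand :: ByteString -> Either String LightCommand
decodeCommand = eitherDecode
```

A payload with a missing `deviceId` or a `power` of `"dim"` comes back as a `Left`, so code downstream of `decodeCommand` never sees a half-valid command.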

We model data for PostgreSQL in our programs with the persistent package, so we have one common library defining the models and types that all of the programs interacting with the DB depend upon (the API servers, the administrative web application, the AMQP router, etc.). Joel has produced a thin semantic query layer on top of Esqueleto (a Haskell DSL for building type-safe SQL queries) for describing what we want from the DB instead of writing and duplicating queries. This further abstracts the interface to the data, so the model is consistent and the methods for querying it are easily understood and consistent across the different programs.

The explicit capture of effects is a crucial feature of Haskell that has made this real-world system robust, maintainable, well-understood, and safe. The closest software can get to being an engineering discipline!
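As a small illustration of what capturing effects in the types means, consider the sketch below. The function names and the dim-level domain are hypothetical, chosen only for the example; nothing here is taken from Plum’s code.

```haskell
-- Hypothetical example; the names are illustrative, not from Plum's code.

-- Pure: by its type this performs no I/O, so it can be tested and
-- refactored without worrying about hidden side effects.
clampDimLevel :: Int -> Int
clampDimLevel = max 0 . min 100

-- Effectful: the IO in the type tells every caller that this performs I/O.
applyDimLevel :: Int -> IO Int
applyDimLevel requested = do
  let level = clampDimLevel requested
  putStrLn ("setting dim level to " ++ show level)
  pure level
```

The compiler rejects any attempt to call `applyDimLevel` from inside `clampDimLevel`, so the boundary between logic and effects is enforced rather than merely documented.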

Elegance and Analytical Density

Humans may be fallible but we are also appreciators of elegance. Haskell not only has a modern type system, the benefits of which I outlined above, but it also takes the concept of composition seriously. Some programming language cultures eschew elegance, conflating it with cleverness; industries squash elegance for being a “waste of capital”, sacrificing good potential at the altar of “just get it done”. To be fair, the technology for writing elegant software safely and quickly didn’t become viable until around 2005. Elegance is the chunking of clear thought, simple composition of small, tractable units of logic, and interfaces that communicate intent without ambiguity or unnecessary complexity.
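Composition of small units can be sketched in a few lines. The example below is an assumption-laden illustration (the room-name rules and function names are invented for the purpose); it uses `>=>` from `Control.Monad` to chain checks that may each fail.

```haskell
import Control.Monad ((>=>))
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- Hypothetical validation rules, for illustration only.

-- Strip leading and trailing whitespace.
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

notBlank :: String -> Either String String
notBlank s
  | null s    = Left "name is empty"
  | otherwise = Right s

shortEnough :: String -> Either String String
shortEnough s
  | length s > 32 = Left "name is longer than 32 characters"
  | otherwise     = Right s

-- Each piece is trivial on its own; the whole is just their composition.
-- The first failing check short-circuits the rest.
validateRoomName :: String -> Either String String
validateRoomName = (notBlank >=> shortEnough) . trim
```

Here `validateRoomName "  Kitchen  "` evaluates to `Right "Kitchen"`, while a name made only of spaces evaluates to `Left "name is empty"`. Adding a rule means writing one more small function and adding it to the chain.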

Fault Tolerance

Taking humans out of the equation is underappreciated and absolutely a form of fault tolerance. Relieving people of the burden of checking types, building and deploying programs, configuring and maintaining servers, or diffusing knowledge makes a software product and its organization much more tolerant of human mistakes!

Taking account of my human fallibility and making the machine responsible for my imperfections frees me the programmer, me the human, to be creative and solution-oriented instead of burdened with cognitive work that can be automated with but a bit of knowledge and a fastidious up-front application of effort.


[1] Analytical Density
Not dense as in slow or dull, but dense as in compact and efficient, like the transistor density of modern CPUs.
[2] Leaving Curb
Curb, A Closed Chapter
[3] Erlang and Dynamic Types
I dislike dynamically typed languages, but in this one instance Erlang’s designers deliberately chose this model, not because “types are a pain in the ass” but because of hot code-reloading. In a virtual machine with distributed nodes running the same software but upgrading to the new code at unpredictable times, a statically typed program that is not valid unless its types are correct presents significant difficulties. Plus, Erlang does have type annotations and a useful static analyzer, Dialyzer.
[4] Loosey Goosey
JSON is loosey goosey, so are dynamic types, and so are humans. Combining the fallible human being with tools that allow that human being to be as loosey goosey as they want is a recipe for waste: writing unit tests to capture things a good type system can catch. I’m not shitting on tests, we write them in Haskell too, but we only write them for the things we cannot codify in the type system.