Architecting Functional Programs

Software architecture is a hazy term. Much like functional programming, it can be hard to define in exact terms. I like to think of Software Architecture as the techniques used to build large software systems involving many components and teams.

While ideas from the FP community are now becoming widespread in mainstream languages, there seems to be little cross-pollination when it comes to Software Architecture. Many practices that emerged over the past couple of decades have proven effective for building large systems, yet these topics are seldom discussed in functional programming circles.

The main issue, I think, is that Software Architecture is very much an inexact science and therefore not particularly appealing to the FP crowd. The tools we work with are optimized for software in the small but are far less effective in the large. When we can compile our program as a single unit, we benefit greatly from static typing. In a large system, however, a single compilation unit may represent only a tiny portion of the whole, making compile-time checks helpful but definitely not sufficient.

One area where this is apparent is the mantra of making "impossible states unrepresentable". The phrase gets thrown around a lot, and it perfectly describes what a team can aspire to when employing functional programming to build software in the small. Static typing and fancy types can definitely help here, letting us craft a model that no invalid runtime value can break. In large systems, such a worthy goal becomes unattainable: the very definition of "impossible" has to be reconsidered when many ever-evolving components are involved. Messages will be lost, or will arrive with missing information or in an unexpected format because of uncoordinated deployments. Assumptions will be broken. Yet the system must keep chugging along, and that's what makes Software Architecture so hard yet so fascinating.
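To make the contrast concrete, here is a minimal Python sketch, assuming a hypothetical order message that arrives as a plain dictionary. Inside the service the model stays strict: an order is either pending or confirmed, and a confirmed order always carries a timestamp. At the boundary, `decode_order` translates untrusted input into that model and sets aside what it can't handle instead of crashing, so the rest of the batch keeps flowing. All names and keys here (`PendingOrder`, `ConfirmedOrder`, `decode_order`, `process`, `order_id`, `status`, `confirmed_at`) are illustrative, not taken from any real system.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


# Hypothetical strict model used *inside* one service.
@dataclass(frozen=True)
class PendingOrder:
    order_id: str


@dataclass(frozen=True)
class ConfirmedOrder:
    order_id: str
    confirmed_at: str  # required: omitting it fails, and mypy rejects None here


Order = Union[PendingOrder, ConfirmedOrder]


def decode_order(message: dict) -> Optional[Order]:
    """Translate an untrusted message into the strict model, or return None."""
    order_id = message.get("order_id")
    if not isinstance(order_id, str):
        return None  # no usable entity ID
    status = message.get("status", "pending")
    if status == "pending":
        return PendingOrder(order_id)
    confirmed_at = message.get("confirmed_at")
    if status == "confirmed" and isinstance(confirmed_at, str):
        return ConfirmedOrder(order_id, confirmed_at)
    return None  # unknown status, or "confirmed" without a timestamp


def process(messages: List[dict]) -> Tuple[List[Order], List[dict]]:
    """Decode a batch, keeping good messages and setting bad ones aside."""
    accepted: List[Order] = []
    rejected: List[dict] = []
    for message in messages:
        order = decode_order(message)
        if order is None:
            rejected.append(message)  # e.g. park it for later inspection
        else:
            accepted.append(order)
    return accepted, rejected
```

The strict types still pay off inside the service; the boundary is simply where "impossible" messages are expected and handled rather than ruled out.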

At some point, when a system grows and starts incorporating different tech stacks and programming languages, there must be a set of guidelines that teams follow to avoid chaos. Compilers are insufficient at this stage, if not almost useless. Software in the large is not a technical problem anymore: it only works if the people and systems involved manage to communicate effectively and collaborate with each other.

To that end, a set of techniques that has gained widespread adoption is Domain-Driven Design (DDD). DDD puts a big emphasis on people and communication, considering them a key part of the software process. One core insight is that engineering teams should have a Shared Vocabulary with the business, so that everyone can understand and reason about the system. The other simple but powerful idea is the Bounded Context: a defined boundary within which a particular model applies. Within that boundary, the same terminology and concepts have a consistent meaning. Consider a "Customer" model: in a service that powers a website it may be extremely rich in information and business logic, while in a notification service that only sends out emails it may contain just a couple of fields.
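As a rough illustration of two bounded contexts, assuming hypothetical storefront and notification services, each context below defines its own customer type. The names `StorefrontCustomer`, `NotificationCustomer`, `can_redeem` and `from_message`, and the message keys they read, are invented for this sketch.

```python
from dataclasses import dataclass
from typing import Tuple


# Storefront context (hypothetical): a rich model with business logic.
@dataclass(frozen=True)
class StorefrontCustomer:
    customer_id: str
    name: str
    email: str
    shipping_addresses: Tuple[str, ...] = ()
    loyalty_points: int = 0

    def can_redeem(self, points: int) -> bool:
        """Business rule that only matters to the storefront."""
        return 0 < points <= self.loyalty_points


# Notification context (hypothetical): only what is needed to send an email.
@dataclass(frozen=True)
class NotificationCustomer:
    customer_id: str
    email: str

    @classmethod
    def from_message(cls, message: dict) -> "NotificationCustomer":
        # Read only the fields this context cares about and ignore the rest.
        return cls(customer_id=message["customer_id"], email=message["email"])
```

Neither type imports the other. The notification service depends only on the two fields it reads, so the storefront can keep enriching its own model without coordinating a change.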

The most powerful idea behind DDD is that events should be front and center when designing a system. An event describes something that happened, with some data and the ID of the entity involved. The classic example is the OrderConfirmed event in an e-commerce platform: instead of writing a row to the orders table, we just append an event to the event log so that anybody interested can consume this information.
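A minimal sketch of the idea, assuming an in-memory list as a stand-in for a durable log: `OrderConfirmed` carries the order ID plus some data, and the hypothetical `EventLog` class only exposes `append` and `read_from`, with no way to update or delete an existing entry. The payload fields `customer_id` and `total_cents` are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class OrderConfirmed:
    order_id: str      # ID of the entity involved
    customer_id: str   # hypothetical payload fields
    total_cents: int


class EventLog:
    """In-memory stand-in for a durable, append-only event log."""

    def __init__(self) -> None:
        self._events: List[object] = []

    def append(self, event: object) -> int:
        """Add an event at the end and return its offset."""
        self._events.append(event)
        return len(self._events) - 1

    def read_from(self, offset: int = 0) -> Iterator[Tuple[int, object]]:
        """Yield (offset, event) pairs starting at `offset`, oldest first."""
        for i in range(offset, len(self._events)):
            yield i, self._events[i]
```

Each consumer remembers the last offset it processed and resumes from the next one, while a brand-new consumer can start at offset 0 and see the full history.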

This simple idea is referred to as Event Sourcing. Its core tenets are that the event log is durable, so events are never lost, and that it is append-only. This opens up many possibilities, such as letting different services replay the log to extract data that wasn't previously relevant to them. Each service can choose the most appropriate representation for its read model, which is the data layer actually used for querying. An event log allows data to be stored in whatever system fits best, such as a database or a fast in-memory cache. Instead of performing destructive operations on an RDBMS, for example adding a column to a table and painfully backfilling all the relevant data, with Event Sourcing it might be easier and more convenient to build a new version of the same table with the updated schema, replaying the log to populate it from scratch.
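The replay idea can be sketched as two projections over the same log. Assume a hypothetical `OrderShipped` event that the first read model, `orders_v1`, simply ignores. When a later version of the table needs a `shipped` column, `orders_v2` is built from scratch by replaying the log rather than by altering and backfilling the old table. Event names, fields and the dictionary-as-table representation are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Union


@dataclass(frozen=True)
class OrderConfirmed:
    order_id: str
    customer_id: str
    total_cents: int


@dataclass(frozen=True)
class OrderShipped:  # hypothetical event, irrelevant to the first read model
    order_id: str


Event = Union[OrderConfirmed, OrderShipped]


def orders_v1(events: Iterable[Event]) -> Dict[str, dict]:
    """Read model v1: one 'row' per confirmed order; OrderShipped is ignored."""
    table: Dict[str, dict] = {}
    for event in events:
        if isinstance(event, OrderConfirmed):
            table[event.order_id] = {
                "customer_id": event.customer_id,
                "total_cents": event.total_cents,
            }
    return table


def orders_v2(events: Iterable[Event]) -> Dict[str, dict]:
    """Read model v2: same rows plus a 'shipped' column, rebuilt by replay."""
    table: Dict[str, dict] = {}
    for event in events:
        if isinstance(event, OrderConfirmed):
            table[event.order_id] = {
                "customer_id": event.customer_id,
                "total_cents": event.total_cents,
                "shipped": False,
            }
        elif isinstance(event, OrderShipped) and event.order_id in table:
            table[event.order_id]["shipped"] = True
    return table
```

Both versions can run side by side against the same log, and the old one can be dropped once consumers have switched over.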

The traditional alternative to this approach is to have services talk to each other via RPC. While RPC may seem easier on the surface, it actually has a number of drawbacks. First, it's common to pick a schema language such as Protocol Buffers and derive the appropriate data types for the programming language in use. This sounds great, but in practice there's very little typing going on: proto3 doesn't even allow fields to be declared as required. This is because a message may pass through a number of services, not all of which can decode every field in the request. Changing a schema becomes so cumbersome and error-prone that it's simply easier to say that all fields are optional. Compare this to events, which are strongly typed and whose schema can be relied upon.
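The following sketch illustrates the contrast without depending on the protobuf library itself. `CustomerResponse` is a hypothetical dataclass mimicking how proto3 treats a singular scalar field without explicit presence: if the sender omits it, the receiver just sees the default value, here an empty string. The hypothetical `CustomerRegistered` event, by contrast, requires both fields and validates them once, at construction time. All names and the validation rule are illustrative.

```python
from dataclasses import dataclass


# Mimics a proto3 message: every field has a default, so "absent" and
# "empty" look the same to the receiver.
@dataclass(frozen=True)
class CustomerResponse:
    customer_id: str = ""
    email: str = ""


def email_for(resp: CustomerResponse) -> str:
    # The type says "str", but "" may mean "not sent", so every consumer
    # has to re-check at runtime.
    if not resp.email:
        raise ValueError(f"no email for customer {resp.customer_id!r}")
    return resp.email


# Hypothetical event: both fields are required and checked once, on creation.
@dataclass(frozen=True)
class CustomerRegistered:
    customer_id: str
    email: str

    def __post_init__(self) -> None:
        if not self.customer_id or "@" not in self.email:
            raise ValueError("invalid CustomerRegistered event")
```

Once a `CustomerRegistered` value exists, downstream code can use its fields directly instead of repeating the defensive checks that `email_for` needs.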

Another issue with large systems communicating over RPC is that components tend to be tightly coupled. To gather the data it needs, a service asks multiple sources, each responsible for one piece. The big problem that often arises is that a service doesn't respond with the data in quite the shape the other side expects: maybe the format is different, or a key property is missing. This is a huge issue, because now one team has to convince the other to implement another endpoint or somehow include the missing data, even though it doesn't look particularly necessary from their perspective. Event Sourcing solves this because each team can consume the full stream of events, plucking things out and aggregating data as they see fit.
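As a minimal sketch of that freedom, assume two hypothetical upstream teams publishing `CustomerRegistered` and `OrderConfirmed` events. A downstream service that wants total spend keyed by email can compute it from the stream on its own, without asking either team for a new endpoint. Event names, fields and the in-memory list standing in for the stream are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Union


@dataclass(frozen=True)
class CustomerRegistered:  # published by a hypothetical accounts team
    customer_id: str
    email: str


@dataclass(frozen=True)
class OrderConfirmed:  # published by a hypothetical orders team
    order_id: str
    customer_id: str
    total_cents: int


Event = Union[CustomerRegistered, OrderConfirmed]


def spend_by_email(events: Iterable[Event]) -> Dict[str, int]:
    """A view no upstream team had to build: total spend keyed by email."""
    emails: Dict[str, str] = {}
    spend: Dict[str, int] = defaultdict(int)
    for event in events:
        if isinstance(event, CustomerRegistered):
            emails[event.customer_id] = event.email
        elif isinstance(event, OrderConfirmed):
            spend[event.customer_id] += event.total_cents
    # Only report customers whose registration we have actually seen.
    return {emails[cid]: total for cid, total in spend.items() if cid in emails}
```

If the view later needs another field, the consuming team changes this function and replays the stream; the publishing teams are not involved.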

There's a lot more to DDD and Event Sourcing, but I strongly believe they're the best tools available for architecting large systems. Many concepts in Event Sourcing in particular will resonate with FP practitioners, making these methodologies more approachable and worthwhile to learn about.
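One concept that should feel familiar: the current state of an entity is a left fold of a pure transition function over its events. A minimal sketch, assuming hypothetical order events and a state with a single `status` field, using `functools.reduce`:

```python
from dataclasses import dataclass, replace
from functools import reduce
from typing import Iterable, Union


@dataclass(frozen=True)
class OrderConfirmed:
    order_id: str


@dataclass(frozen=True)
class OrderShipped:
    order_id: str


@dataclass(frozen=True)
class OrderCancelled:
    order_id: str


Event = Union[OrderConfirmed, OrderShipped, OrderCancelled]


@dataclass(frozen=True)
class OrderState:
    status: str = "new"


def evolve(state: OrderState, event: Event) -> OrderState:
    """Pure transition: old state + event -> new state (no I/O, no mutation)."""
    if isinstance(event, OrderConfirmed):
        return replace(state, status="confirmed")
    if isinstance(event, OrderShipped):
        return replace(state, status="shipped")
    if isinstance(event, OrderCancelled):
        return replace(state, status="cancelled")
    return state  # unknown events leave the state unchanged


def current_state(events: Iterable[Event]) -> OrderState:
    """Replay the history from the initial state: a plain left fold."""
    return reduce(evolve, events, OrderState())
```

Because `evolve` is pure, replaying the same events always yields the same state, which is what makes rebuilding state from the log safe.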

