Understanding how monads work is a kind of mandatory rite of passage for any programmer aspiring to truly master the functional paradigm (of the advanced statically typed kind). Unfortunately it is an unforgiving trial: in spite of the large number of tutorials out there, the construction itself just seems to have some kind of irreducible complexity whose source is mostly opaque to neophytes.

Interestingly, although I mainly studied math when I was an undergrad, I first came to learn monads through the usual programmer’s way - see some hand-wavy explanation of what it is/does, see introductory examples of monads, read the abstract interface, read (and try to understand) more examples, struggle, then “think” that I understood it, realize (a few months to a year later) that I hadn’t - that I had missed some facet of it, rinse and repeat, despair at some point that I am probably never going to truly master it…

Until I studied monads from the perspective of a pure mathematician, with no thought given to applications. Then everything clicked. “Why didn’t you tell me this earlier?”

So as a service mainly to myself, I’m writing this series of posts retracing the story of the monad (cleaned up, of course). In clear violation of the warning of the infamous Monad Tutorial Fallacy. You have been warned.

In retrospect, the reasons that learning through examples/patterns didn’t work for me are these:

  1. The pattern emerging from the examples is not the monad itself (at least, not its interface nor its implementation). It is something like the mismatch between thinking in the problem domain versus thinking in the solution domain.
  2. Unlike interfaces in traditional OOP languages, there are variants of the monad interface with translation “formulas” between them (see the sketch after this list) - a situation where Haskell typeclasses shine but which adds confusion for beginners. More importantly, unlike other contexts where things decompose cleanly, the “moving pieces” of the monad interface seem tightly coupled to each other and look like the wrong decomposition, so that while it is possible to understand any one of its parts if I focus my mind on it for a while, it is hard to hold a mental conception/model of the whole thing in my head in one piece. It is exactly the situation where understanding a large, interlocking system is scary because you can’t really hold the full picture in your head and have to work on it piece-by-piece, always worrying about possible side effects.
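
To illustrate point 2, here is a minimal Haskell sketch of two standard presentations of the monad interface and the translation formulas between them (the class and method names are mine, chosen to avoid clashing with the Prelude):

```haskell
-- Presentation 1: the programmer-facing interface (return/bind),
-- modelled on the Prelude's Monad class.
class Functor m => MonadBind m where
  ret  :: a -> m a
  bind :: m a -> (a -> m b) -> m b

-- Presentation 2: the mathematician's interface (unit/join).
class Functor m => MonadJoin m where
  unit :: a -> m a
  join :: m (m a) -> m a

-- The translation "formulas" between the two presentations:
--   join mm  = mm `bind` id
--   bind m f = join (fmap f m)
```

The two presentations specify the same structure carved at different joints, which is part of why each one, viewed alone, feels like the wrong decomposition.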

Okay, let’s begin the story. (Prerequisite: ~~Bachelor’s degree in Pure Math~~ Knowledge of some category theory and basic abstract algebra)

(Warning to actual mathematicians: I am skipping all proofs, as this is a tech blog and not a math blog, and because you can find them elsewhere/do them yourself anyway. Or is that just a lame excuse on my part to not make this post ridiculously long?)

Before the Monad, there’s the Adjoint Functor. But even that is a pretty mysterious abstraction, as impenetrable as the monad itself. So let’s talk about a concrete prototypical example, which in our story will be the free algebra.

Recall that given any unstructured set $X$, the free algebra generated by/over $X$, denoted $F(X)$, is the algebra formed by taking all formal expressions over $X$, with the obvious algebraic operations, and where elements are identified if and only if they can be proved equal under the axioms of the algebraic system we are using. (So $(x + y) + z$ is the same element as $x + (y + z)$, say, if associativity is among our axioms.) The intuition for this construction is that it is a tautological one: the formal expressions represent things that simply have to exist in any algebra containing $X$ (e.g. if $x, y$ are in $X$, and we have some algebra containing $X$, then necessarily $x + y$ is in it by the closure law), and then we impose the minimal constraints on the generated algebra by identifying only what has to be identified (those expressions that can be proved to be equivalent).
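
To make the construction concrete, here is a minimal Haskell sketch (the data type and function names are my own invention) of formal expressions for the free monoid over a set of variables, where `normalize` implements exactly the identification forced by the axioms:

```haskell
-- Formal expressions over a set of variables x, for the theory of monoids.
data Expr x = Var x              -- a generator from the underlying set
            | Unit               -- the formal identity element
            | Expr x :*: Expr x  -- formal multiplication
  deriving Show

-- Two formal expressions are identified iff the monoid axioms force them
-- to be equal; normalizing to a list of generators decides exactly that.
normalize :: Expr x -> [x]
normalize (Var a)   = [a]
normalize Unit      = []
normalize (e :*: f) = normalize e ++ normalize f
```

For instance, `(Var 'a' :*: Unit) :*: Var 'b'` and `Var 'a' :*: Var 'b'` both normalize to `"ab"`, so they name the same element of $F(X)$.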

Now the categorical way to think of free algebras is that they are a universal construction: there is an obvious embedding of $X$ into $F(X)$, and for any other algebra $A$ that contains $X$ (through an embedding $f : X \to A$), we always have a unique homomorphism $\bar{f} : F(X) \to A$ so that the diagram below commutes:
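
In symbols, a minimal tikz-cd sketch of the triangle, writing $\iota : X \to F(X)$ for the obvious embedding:

```latex
% Universal property of the free algebra (requires \usepackage{tikz-cd}):
\begin{tikzcd}
X \arrow[r, "\iota"] \arrow[dr, "f"'] & F(X) \arrow[d, dashed, "\exists!\,\bar{f}"] \\
                                      & A
\end{tikzcd}
% i.e. the triangle commutes: \bar{f} \circ \iota = f
```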

If you actually read a category theory textbook you may find the above diagram doesn’t look quite right: the problem is that we have been a bit loose and didn’t clearly distinguish which object belongs to which category: we can’t really just write $f : X \to A$, as $X$ lives in the category of sets while $A$ lives in the category of algebras. We can fix this (at the cost of making the diagram more complicated looking) by using the forgetful functor $U$, which just strips away all the algebraic operations, leaving only the set of elements. So the embedding becomes $\eta_X : X \to U(F(X))$, a map/morphism between sets. And voila - this $\eta$ is the natural transformation called the unit.
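
For the free monoid, the unit is just the singleton embedding; a quick Haskell rendering (with lists of generators standing in for $U(F(X))$, the underlying set of normalized expressions):

```haskell
-- The underlying set of the free monoid over x: normalized formal
-- expressions, i.e. lists of generators.
type FreeMonoid x = [x]

-- The unit eta_X : X -> U(F(X)): regard a variable as a one-term expression.
eta :: x -> FreeMonoid x
eta x = [x]
```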

There is also a nifty idea that will become critical later on: adjoint functors are a theory of duality, and whenever there is such an adjunction, the properties outlined above also work if we flip all the arrows (i.e. take the dual, in category-speak). In particular there would naturally be a morphism $\epsilon_A : F(U(A)) \to A$ for any algebra $A$, called the counit. But what is this? $F(U(A))$ is the free algebra whose underlying set of variables is $U(A)$, which is just the elements of $A$. So $F(U(A))$’s elements are formal expressions whose variables take values in $A$, and the algebraic operations on those values are well defined since $A$ is an algebra. $\epsilon_A$ is an expression evaluator!
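
Continuing the free-monoid sketch, the counit can be written down directly, using Haskell's `Monoid` class in the role of the algebra $A$ (my choice of rendering):

```haskell
import Data.Monoid (Sum (..))

-- The counit epsilon_A : F(U(A)) -> A: evaluate a formal expression whose
-- "variables" are themselves elements of the algebra A, using A's own
-- operations. (Recall FreeMonoid a = [a] from the sketch above.)
epsilon :: Monoid a => [a] -> a
epsilon = mconcat

-- For example: epsilon [Sum 1, Sum 2, Sum 3]  ==  Sum 6
```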

But there’s still one more thing (at least). The duality/symmetry of an adjunction becomes apparent once we express it in terms of a natural isomorphism between two hom-sets. (In fact this representation is the key ingredient in the proof of the duality mentioned above anyway.) Packaged up in a nice formula, it reads: $\mathrm{Hom}_{\mathbf{Alg}}(F(X), A) \cong \mathrm{Hom}_{\mathbf{Set}}(X, U(A))$. What does it mean? Well, similar to how any linear transformation is uniquely determined by the values it takes on a basis of the source vector space, any homomorphism from the free algebra $F(X)$ to any other algebra $A$ is uniquely determined by the values it takes on a natural basis, which is just the underlying set of variables $X$. Equivalently, any map $g : X \to U(A)$ induces a full homomorphism $\bar{g} : F(X) \to A$ (and not just a map of sets) by the obvious algebra laws (e.g. if $g(x) = a$ and $g(y) = b$, then $\bar{g}(x + y)$ can be defined as $a + b$). The reason for the appearance of the forgetful functor $U$ is the same as in the discussion of the universal property.
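
Specialized to free monoids, both directions of this isomorphism fit in a couple of lines of Haskell; the right-to-left direction is exactly the Prelude's `foldMap` (the names `lift` and `restrict` are mine):

```haskell
-- Hom_Alg(F(X), A)  ~  Hom_Set(X, U(A)), specialized to free monoids.

-- Right to left: a mere function on the generators extends uniquely to a
-- monoid homomorphism out of the free monoid.
lift :: Monoid a => (x -> a) -> ([x] -> a)
lift = foldMap

-- Left to right: restrict a homomorphism to the generators (the "basis").
restrict :: ([x] -> a) -> (x -> a)
restrict h x = h [x]
```

For any $g$, `restrict (lift g) = g`, and for any monoid homomorphism $h$, `lift (restrict h)` agrees with `h` - that is the bijection.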

Now we mostly understand Adjoint Functors (excluding the more theoretical side of things, such as the Special and General Adjoint Functor Theorems for the existence/construction of $F$ from $U$), but so what? Here comes the important part: we can interpret (typed) pure functional programming through category theory, and a suitably comprehensive interpretive framework serves as the foundation for understanding monads, among other things.

In the simplest situation, a Category is a typed version of the composition higher-order function. The objects of the Category are the types of the system, morphisms from $A$ to $B$ are the pure (computable) functions of type $A \to B$, and composition of arrows (in the category) corresponds to function composition (in our programming model). Even though it appears to be trivial and useless, this naive model can be extended and changed in various ways, and you may even specify your own Category/Programming Model! (I promise to give examples of these in Part 3 or so of this series.)
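
This naive model is literally the function instance of a `Category` typeclass (the real thing lives in `Control.Category` in `base`); a minimal sketch:

```haskell
import Prelude hiding (id, (.))

-- A Category: typed identity and typed composition, subject to the usual
-- identity and associativity laws (which the types cannot enforce).
class Category cat where
  id  :: cat a a
  (.) :: cat b c -> cat a b -> cat a c

-- The naive programming model: objects are Haskell types, morphisms are
-- pure functions, and categorical composition is function composition.
instance Category (->) where
  id x = x
  (g . f) x = g (f x)
```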

Adjoint Functors represent the duality between an unstructured, raw computational model/domain and a structured, contextual computational model/domain. So we first design and specify a custom-made Category $\mathcal{C}$ (where all its elements - objects, morphisms and composition - need not be the standard/usual ones) that captures/models the particular context we want, while the Category $\mathcal{D}$ remains the standard one from the last paragraph. Then the functors $F : \mathcal{D} \to \mathcal{C}$ and $U : \mathcal{C} \to \mathcal{D}$ provide the natural translation between these two contrasting models of computation. In particular, $F$ is an embedding functor: any computation can be trivially considered a computation with context by simply carrying out the computation without using or depending on the context at all. (In practice, though, describing $F$ is not trivial, since the category $\mathcal{C}$ is itself not trivial.) On the other hand, any computation with context is still a computation (Church-Turing thesis), so $U$ interprets it as plain computation by forgetting/stripping away the extra structure (but no information is lost - it is the marking of context, and not the content of the context, that is being stripped). In more silly terms, $F$ tells you that you can technically write assembly in any high-level (Turing complete) language, while $U$ tells you that ultimately any such high-level language will be compiled down to assembly anyway.
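
As a teaser for the examples promised above, here is a hedged sketch of what the $F$/$U$ pair can look like in Haskell, taking the Kleisli category of a monad as my choice of "category with context":

```haskell
-- A computation "with context m": a Kleisli arrow of the monad m.
newtype Kleisli m a b = Kleisli { runKleisli :: a -> m b }

-- F: embed a plain computation as a contextual one that never actually
-- uses the context ("writing assembly in a high-level language").
embed :: Monad m => (a -> b) -> Kleisli m a b
embed f = Kleisli (return . f)

-- U: interpret a contextual computation as a plain function between the
-- context-marked types ("compiling back down to assembly").
forget :: Monad m => Kleisli m a b -> (m a -> m b)
forget (Kleisli g) = (>>= g)
```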