---
title: On the Design of a Parser
published: 2016-01-12
tags: Thermoprint
---

The concrete application we’ll be walking through is a naive parser for [bbcode](https://en.wikipedia.org/wiki/BBCode)
-- more specifically the contents of the directory `bbcode` in the
[git repo](git://git.yggdrasil.li/thermoprint#rewrite).

In a manner consistent with designing software as
[compositions of simple morphisms](https://en.wikipedia.org/wiki/Tacit_programming) we start by determining the type of
our solution (as illustrated by the following mockup):

~~~ {.haskell}
-- | Our target structure -- a rose tree with an explicit terminal constructor
data DomTree = Element Text (Map Text Text) [DomTree]
             | Content Text
             deriving (Show, Eq)
			 
type Errors = ()

bbcode :: Text -> Either Errors DomTree
-- ^ Parse BBCode
~~~

Writing a parser capable of dealing with `Text` directly from scratch would be unnecessarily abstruse, we’ll be using
the [attoparsec](https://hackage.haskell.org/package/attoparsec/docs/Data-Attoparsec-Text.html) family of parser
combinators instead (the exact usage of attoparsec is out of scope -- we'll just mockup the types).

~~~ {.haskell}
data Token = BBOpen Text
	       | BBClose Text
		   | BBStr Text

tokenize :: Parser [Token]

type Error' = ()

runTokenizer :: Text -> Either Error' [Token]
~~~

We have now reduced the Problem to `[Token] -> DomTree`.
We quickly see that the structure of the problem is that of a
[fold](https://hackage.haskell.org/package/base/docs/Data-Foldable.html).

Having realised this we now require an object of type `DomTree -> Token -> DomTree` to recursively build up our target
structure.