--- title: On the Design of a Parser published: 2016-01-12 tags: Thermoprint --- The concrete application we’ll be walking through is a naive parser for [bbcode](https://en.wikipedia.org/wiki/BBCode) -- more specifically the contents of the directory `bbcode` in the [git repo](git://git.yggdrasil.li/thermoprint#rewrite). In a manner consistent with designing software as [compositions of simple morphisms](https://en.wikipedia.org/wiki/Tacit_programming) we start by determining the type of our solution (as illustrated by the following mockup): ~~~ {.haskell} -- | Our target structure -- a rose tree with an explicit terminal constructor data DomTree = Element Text (Map Text Text) [DomTree] | Content Text deriving (Show, Eq) type Errors = () bbcode :: Text -> Either Errors DomTree -- ^ Parse BBCode ~~~ Writing a parser capable of dealing with `Text` directly from scratch would be unnecessarily abstruse, we’ll be using the [attoparsec](https://hackage.haskell.org/package/attoparsec/docs/Data-Attoparsec-Text.html) family of parser combinators instead (the exact usage of attoparsec is out of scope -- we'll just mockup the types). ~~~ {.haskell} data Token = BBOpen Text | BBClose Text | BBStr Text tokenize :: Parser [Token] type Error' = () runTokenizer :: Text -> Either Error' [Token] ~~~ We have now reduced the Problem to `[Token] -> DomTree`. We quickly see that the structure of the problem is that of a [fold](https://hackage.haskell.org/package/base/docs/Data-Foldable.html). Having realised this we now require an object of type `DomTree -> Token -> DomTree` to recursively build up our target structure.