summaryrefslogtreecommitdiff
path: root/provider/posts/thermoprint-4.md
blob: 6b2a022b06c6f56f15efeb876eeb31f46b8a3aa0 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
---
title: On the Design of a Parser
published: 2016-01-12
tags: Thermoprint
---

The concrete application we’ll be walking through is a naive parser for [bbcode](https://en.wikipedia.org/wiki/BBCode)
-- more specifically the contents of the directory `bbcode` in the
[git repo](git://git.yggdrasil.li/thermoprint#rewrite).

In a manner consistent with designing software as
[compositions of simple morphisms](https://en.wikipedia.org/wiki/Tacit_programming) we start by determining the type of
our solution (as illustrated by the following mockup):

~~~ {.haskell}
-- | Our target structure -- a rose tree with an explicit terminal constructor
data DomTree = Element Text (Map Text Text) [DomTree]
             | Content Text
             deriving (Show, Eq)
			 
type Errors = ()

bbcode :: Text -> Either Errors DomTree
-- ^ Parse BBCode
~~~

Writing a parser capable of dealing with `Text` directly from scratch would be unnecessarily abstruse, we’ll be using
the [attoparsec](https://hackage.haskell.org/package/attoparsec/docs/Data-Attoparsec-Text.html) family of parser
combinators instead (the exact usage of attoparsec is out of scope -- we'll just mockup the types).

~~~ {.haskell}
data Token = BBOpen Text
	       | BBClose Text
		   | BBStr Text

tokenize :: Parser [Token]

type Error' = ()

runTokenizer :: Text -> Either Error' [Token]
~~~

We have now reduced the Problem to `[Token] -> DomTree`.
We quickly see that the structure of the problem is that of a
[fold](https://hackage.haskell.org/package/base/docs/Data-Foldable.html).

Having realised this we now require an object of type `DomTree -> Token -> DomTree` to recursively build up our target
structure.