blob: 6b2a022b06c6f56f15efeb876eeb31f46b8a3aa0 (
plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
|
---
title: On the Design of a Parser
published: 2016-01-12
tags: Thermoprint
---
The concrete application we’ll be walking through is a naive parser for [bbcode](https://en.wikipedia.org/wiki/BBCode)
-- more specifically the contents of the directory `bbcode` in the
[git repo](git://git.yggdrasil.li/thermoprint#rewrite).
In a manner consistent with designing software as
[compositions of simple morphisms](https://en.wikipedia.org/wiki/Tacit_programming) we start by determining the type of
our solution (as illustrated by the following mockup):
~~~ {.haskell}
-- | Our target structure -- a rose tree with an explicit terminal constructor
data DomTree = Element Text (Map Text Text) [DomTree]
| Content Text
deriving (Show, Eq)
type Errors = ()
bbcode :: Text -> Either Errors DomTree
-- ^ Parse BBCode
~~~
Writing a parser capable of dealing with `Text` directly from scratch would be unnecessarily abstruse, we’ll be using
the [attoparsec](https://hackage.haskell.org/package/attoparsec/docs/Data-Attoparsec-Text.html) family of parser
combinators instead (the exact usage of attoparsec is out of scope -- we'll just mockup the types).
~~~ {.haskell}
data Token = BBOpen Text
| BBClose Text
| BBStr Text
tokenize :: Parser [Token]
type Error' = ()
runTokenizer :: Text -> Either Error' [Token]
~~~
We have now reduced the Problem to `[Token] -> DomTree`.
We quickly see that the structure of the problem is that of a
[fold](https://hackage.haskell.org/package/base/docs/Data-Foldable.html).
Having realised this we now require an object of type `DomTree -> Token -> DomTree` to recursively build up our target
structure.
|