Diffstat (limited to 'provider/posts')
-rw-r--r-- provider/posts/thermoprint-4.md | 48
1 files changed, 48 insertions, 0 deletions
diff --git a/provider/posts/thermoprint-4.md b/provider/posts/thermoprint-4.md
new file mode 100644
index 0000000..6b2a022
--- /dev/null
+++ b/provider/posts/thermoprint-4.md
@@ -0,0 +1,48 @@
---
title: On the Design of a Parser
published: 2016-01-12
tags: Thermoprint
---

The concrete application we’ll be walking through is a naive parser for [BBCode](https://en.wikipedia.org/wiki/BBCode)
-- more specifically the contents of the directory `bbcode` in the
[git repo](git://git.yggdrasil.li/thermoprint#rewrite).

In a manner consistent with designing software as
[compositions of simple morphisms](https://en.wikipedia.org/wiki/Tacit_programming), we start by determining the type of
our solution (as illustrated by the following mockup):

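~~~ {.haskell}
import Data.Map (Map)
import Data.Text (Text)

-- | Our target structure -- a rose tree with an explicit terminal constructor
data DomTree = Element Text (Map Text Text) [DomTree] -- ^ Tag name, attributes, children
             | Content Text                           -- ^ Plain text leaf
             deriving (Show, Eq)

-- | Placeholder error type for the mockup
type Errors = ()

-- | Parse BBCode
bbcode :: Text -> Either Errors DomTree
bbcode = undefined -- mockup: only the type matters at this point
~~~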
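
To make the target type more tangible, here is a small sketch of the value one might expect for the input `[b]hello [i]world[/i][/b]`. The input string, the name `example`, and the helper `plainText` are illustrative assumptions and not part of the repository.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Text (Text)

-- Same shape as the mockup above
data DomTree = Element Text (Map Text Text) [DomTree]
             | Content Text
             deriving (Show, Eq)

-- | Hypothetical result for the input @[b]hello [i]world[/i][/b]@
example :: DomTree
example =
  Element "b" Map.empty
    [ Content "hello "
    , Element "i" Map.empty [Content "world"]
    ]

-- | Hypothetical helper: concatenate all text leaves, left to right
plainText :: DomTree -> Text
plainText (Content t)        = t
plainText (Element _ _ kids) = foldMap plainText kids
```

`Content` is the only constructor without children, which is what the comment in the mockup means by an explicit terminal constructor.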

Writing a parser that deals with `Text` directly, from scratch, would be unnecessarily abstruse; we’ll be using
the [attoparsec](https://hackage.haskell.org/package/attoparsec/docs/Data-Attoparsec-Text.html) family of parser
combinators instead (the exact usage of attoparsec is out of scope -- we'll just mock up the types).

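~~~ {.haskell}
import Data.Attoparsec.Text (Parser)
import Data.Text (Text)

data Token = BBOpen Text  -- ^ Opening tag
           | BBClose Text -- ^ Closing tag
           | BBStr Text   -- ^ Plain text between tags

-- | Split the whole input into tokens
tokenize :: Parser [Token]
tokenize = undefined -- mockup: implementation out of scope

-- | Placeholder error type for the mockup
type Error' = ()

runTokenizer :: Text -> Either Error' [Token]
runTokenizer = undefined -- mockup: would run 'tokenize' on the input
~~~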
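
For readers who would like something concrete behind these types, the following is one possible way to fill them in. It is a rough sketch, not necessarily what the repository does: the helper `token`, the derived `Show`/`Eq` instances, and the decision to treat everything between `[` and `]` as the tag text are all assumptions made for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative (many, (<|>))
import Data.Attoparsec.Text (Parser, char, endOfInput, parseOnly, string, takeWhile1)
import Data.Text (Text)

data Token = BBOpen Text
           | BBClose Text
           | BBStr Text
           deriving (Show, Eq) -- assumption: derived only so results can be inspected

-- | Hypothetical helper: one closing tag, opening tag, or run of plain text
token :: Parser Token
token = closeTag <|> openTag <|> str
  where
    closeTag = BBClose <$> (string "[/" *> takeWhile1 (/= ']') <* char ']')
    openTag  = BBOpen  <$> (char '[' *> takeWhile1 (/= ']') <* char ']')
    str      = BBStr   <$> takeWhile1 (/= '[')

tokenize :: Parser [Token]
tokenize = many token <* endOfInput

type Error' = ()

-- | Run the tokenizer, discarding attoparsec's error message
runTokenizer :: Text -> Either Error' [Token]
runTokenizer = either (const (Left ())) Right . parseOnly tokenize
```

With these definitions, `runTokenizer "[b]hi[/b]"` yields the three tokens `BBOpen "b"`, `BBStr "hi"`, `BBClose "b"`, while an unterminated tag such as `"[b"` is rejected.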

We have now reduced the problem to `[Token] -> DomTree`.
We quickly see that the structure of the problem is that of a
[fold](https://hackage.haskell.org/package/base/docs/Data-Foldable.html).

Having realised this, we now require an object of type `DomTree -> Token -> DomTree` to recursively build up our target
structure.
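
To illustrate only the shape of such a fold, here is a deliberately simplistic sketch. The step function `flatStep` and the wrapper `buildFlat` are hypothetical: they ignore tags altogether and append every piece of text to a root element, so they show how the types line up rather than how nesting would actually be handled.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.List (foldl')
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Text (Text)

data DomTree = Element Text (Map Text Text) [DomTree]
             | Content Text
             deriving (Show, Eq)

data Token = BBOpen Text
           | BBClose Text
           | BBStr Text

-- | Hypothetical step function of the required type.
-- It appends text to the root's children and ignores tags,
-- so it does NOT reconstruct any nesting.
flatStep :: DomTree -> Token -> DomTree
flatStep (Element tag attrs kids) (BBStr t) = Element tag attrs (kids ++ [Content t])
flatStep tree _                             = tree

-- | Fold the token list into a tree, starting from an empty root element
buildFlat :: [Token] -> DomTree
buildFlat = foldl' flatStep (Element "root" Map.empty [])
```

The sketch also hints at the difficulty ahead: a bare `DomTree` accumulator carries no record of which element is currently open, so a realistic step function needs some way of tracking the position in the tree.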