summaryrefslogtreecommitdiff
path: root/provider
diff options
context:
space:
mode:
Diffstat (limited to 'provider')
-rw-r--r--provider/posts/thermoprint-4.md90
1 files changed, 79 insertions, 11 deletions
diff --git a/provider/posts/thermoprint-4.md b/provider/posts/thermoprint-4.md
index 6b2a022..2606c06 100644
--- a/provider/posts/thermoprint-4.md
+++ b/provider/posts/thermoprint-4.md
@@ -17,32 +17,100 @@ our solution (as illustrated by the following mockup):
17data DomTree = Element Text (Map Text Text) [DomTree] 17data DomTree = Element Text (Map Text Text) [DomTree]
18 | Content Text 18 | Content Text
19 deriving (Show, Eq) 19 deriving (Show, Eq)
20
21type Errors = ()
22 20
23bbcode :: Text -> Either Errors DomTree 21bbcode :: Text -> Maybe DomTree
24-- ^ Parse BBCode 22-- ^ Parse BBCode
25~~~ 23~~~
26 24
27Writing a parser capable of dealing with `Text` directly from scratch would be unnecessarily abstruse, we’ll be using 25Writing a parser capable of dealing with `Text` directly from scratch would be unnecessarily abstruse, we’ll be using
28the [attoparsec](https://hackage.haskell.org/package/attoparsec/docs/Data-Attoparsec-Text.html) family of parser 26the [attoparsec](https://hackage.haskell.org/package/attoparsec/docs/Data-Attoparsec-Text.html) family of parser
29combinators instead (the exact usage of attoparsec is out of scope -- we'll just mockup the types). 27combinators instead.
28
29We reproduce an incomplete version of the lexer below (it’s missing tag attributes and self-closing tags).
30
31We introduce `escapedText`, a helper function for extracting text until we reach one of a set of delimiting characters
32(exclusive).
33While doing this we also parse any delimiting character iff it's prefixed with an escape character (we use `\\`) -- the
34escape character itself needs only be escaped if encountered directly before one of the delimiting characters.
30 35
31~~~ {.haskell} 36~~~ {.haskell}
32data Token = BBOpen Text 37data Token = BBOpen Text -- ^ "[open]"
33 | BBClose Text 38 | BBClose Text -- ^ "[/close]"
34 | BBStr Text 39 | BBStr Text -- ^ "text"
35 40
36tokenize :: Parser [Token] 41token :: Parser [Token]
42token = BBClose <$ "[/" <*> escapedText' [']'] <* "]"
43 <|> BBOpen <$ "[" <*> escapedText' [']'] <* "]"
44 <|> BBStr <$> escapedText ['[']
45
46escapedText' :: [Char] -> Parser Text
47escapedText' = option "" . escapedText
37 48
38type Error' = () 49escapedText :: [Char] -> Parser Text
50escapedText [] = takeText -- No delimiting characters -- parse all remaining input
51escapedText cs = recurse $ choice [ takeWhile1 (not . special) -- a series of characters we don't treat as special
52 , escapeSeq -- an escaped delimiter
53 , escapeChar' -- the escape character
54 ]
55 where
56 escapeChar = '\\'
57 special = inClass $ escapeChar : cs
58 escapeChar' = string $ T.singleton escapeChar
59 escapeSeq = escapeChar' *> (T.singleton <$> satisfy special) -- escape character followed by a special character (which encludes the escape character)
60 recurse p = mappend <$> p <*> escapedText' cs -- parse a prefix and optionally append another chunk of escapedText
39 61
40runTokenizer :: Text -> Either Error' [Token] 62runTokenizer :: Text -> Maybe [Token]
63runTokenizer = either (const Nothing) Just . parseOnly (many token <* endOfInput)
41~~~ 64~~~
42 65
43We have now reduced the Problem to `[Token] -> DomTree`. 66We have now reduced the Problem to `[Token] -> DomTree`.
44We quickly see that the structure of the problem is that of a 67We quickly see that the structure of the problem is that of a
45[fold](https://hackage.haskell.org/package/base/docs/Data-Foldable.html). 68[fold](https://hackage.haskell.org/package/base/docs/Data-Foldable.html).
46 69
47Having realised this we now require an object of type `DomTree -> Token -> DomTree` to recursively build up our target 70Having realised this we require a function of type `Token -> DomTree -> DomTree` to recursively build up our target
48structure. 71structure.
72
73In general we’ll want to not only keep track of the `DomTree` during recursion but also maintain a reference to the
74position at which we’ll be inserting new tokens.
75This kind of problem is well understood and solved idiomatically by using a
76[zipper](https://en.wikipedia.org/wiki/Zipper_(data_structure))
77([a cursory introduction](http://learnyouahaskell.com/zippers)).
78
79Writing zippers tends to be tedious. We’ll therefore introduce an
80[additional intermediate structure](https://hackage.haskell.org/package/containers/docs/Data-Tree.html) for which an
81[implementation](https://hackage.haskell.org/package/rosezipper) is available readily.
82The morphism from this new structure (`Forest BBLabel`) to our `DomTree` will be almost trivial.
83
84~~~ {.haskell}
85import Data.Tree.Zipper (TreePos, Empty, Full)
86import qualified Data.Tree.Zipper as Z
87
88data BBLabel = BBTag Text
89 | BBPlain Text
90
91rose :: [BBToken] -> Maybe (Forest BBLabel)
92rose = Z.toForest <$> foldM (flip rose') (Z.fromForest [])
93
94rose' :: BBToken -> TreePos Empty BBLabel -> Maybe (TreePos Empty BBLabel)
95rose' (BBStr t) = return . Z.nextSpace . Z.insert (Node (BBPlain t) []) -- insert a node with no children and move one step to the right in the forest we’re currently viewing
96rose' (BBOpen t) = return . Z.children . Z.insert (Node (BBTag t) []) -- insert the node and move into position to insert it's first child
97rose' (BBClose t) = close t -- haskell complains if multiple equations for the same function have a differing number of arguments, therefore: 'close'
98 where
99 close :: Text -> TreePos Empty BBLabel -> Maybe (TreePos Empty BBLabel)
100 close tag pos = do
101 pos' <- Z.parent pos -- fail if we're trying to close a tag that does not have a parent (this indicates imbalanced tags)
102 let
103 pTag = (\(BBTag t) -> t) $ Z.label pos' -- yes, this will fail unceremoniously if the parent is not a tag, this poses no problem since we're constructing the structure ourselves. The proof that this failure mode does not occur is left as an exercise for the reader.
104 guard (pTag == tag) -- The structure shows that this mode of failure (opening tags content does not match the closing tags) is not logically required -- it only serves as a *notification* to the user
105 return $ Z.nextSpace pos' -- move one level up and to point at the next sibling of the parent
106~~~
107
108All that is left to do now is present our final morphism:
109
110~~~ {.haskell}
111dom :: Forest BBLabel -> [DomTree]
112dom = map dom'
113 where
114 dom' (Node (BBPlain t) []) = Content t
115 dom' (Node (BBTag t) ts = Element t $ map dom' ts
116~~~