1 files changed, 116 insertions, 0 deletions
diff --git a/provider/posts/thermoprint/4.md b/provider/posts/thermoprint/4.md
new file mode 100644
index 0000000..756c166
--- /dev/null
+++ b/provider/posts/thermoprint/4.md
@@ -0,0 +1,116 @@
+---
+title: On the Design of a Parser
+published: 2016-01-12
+tags: Thermoprint
+---
+The concrete application we’ll be walking through is a naive parser for [bbcode](https://en.wikipedia.org/wiki/BBCode)
+-- more specifically the contents of the directory `bbcode` in the
+[git repo](https://git.yggdrasil.li/thermoprint/tree/bbcode?h=rewrite&id=dc99dae).
+
+In a manner consistent with designing software as
+[compositions of simple morphisms](https://en.wikipedia.org/wiki/Tacit_programming), we start by determining the type of
+our solution (as illustrated by the following mockup):
+
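+~~~ {.haskell}
+import Data.Map (Map)
+import Data.Text (Text)
+
+-- | Our target structure -- a rose tree with an explicit terminal constructor
+data DomTree = Element Text (Map Text Text) [DomTree] -- ^ Tag name, attributes and children
+             | Content Text -- ^ Plain text
+             deriving (Show, Eq)
+
+bbcode :: Text -> Maybe [DomTree]
+-- ^ Parse BBCode into a list of top-level nodes, failing on malformed input
+~~~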
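+
+As a concrete picture of the target type, this is the value we would expect for the input `[b]bold[/b] plain`.
+It is written out by hand, so no parser is involved. The attribute map is empty because the lexer below does not
+handle attributes. Since the input has two top-level nodes, we write it as a list. `example` and `textContent` are
+hypothetical names used only for illustration.
+
+~~~ {.haskell}
+{-# LANGUAGE OverloadedStrings #-}
+
+import Data.Map (Map)
+import qualified Data.Map as Map
+import Data.Text (Text)
+import qualified Data.Text as T
+
+-- | Target structure as in the mockup above.
+data DomTree = Element Text (Map Text Text) [DomTree]
+             | Content Text
+             deriving (Show, Eq)
+
+-- | Hypothetical hand-written value for the input "[b]bold[/b] plain".
+-- The attribute map is empty since the lexer does not parse attributes.
+example :: [DomTree]
+example =
+  [ Element "b" Map.empty [Content "bold"]
+  , Content " plain"
+  ]
+
+-- | Hypothetical helper: concatenate all text content, dropping the markup.
+textContent :: DomTree -> Text
+textContent (Content t) = t
+textContent (Element _ _ kids) = T.concat (map textContent kids)
+~~~
+
+`textContent` shows the usual shape of a function over this rose tree. It has one equation per constructor and
+recurses into the children of an `Element`.
+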
+
+Writing a parser for `Text` from scratch would be unnecessarily abstruse, so we’ll use
+the [attoparsec](https://hackage.haskell.org/package/attoparsec/docs/Data-Attoparsec-Text.html) family of parser
+combinators instead.
+
+We reproduce an incomplete version of the lexer below (it lacks tag attributes and self-closing tags).
+
+We introduce `escapedText`, a helper function that extracts text up to, but excluding, the first of a set of
+delimiting characters.
+While doing so it also accepts a delimiting character iff it is prefixed with an escape character (we use `\`). The
+escape character itself only needs to be escaped when it directly precedes one of the delimiting characters.
+
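+
+To pin the escaping rule down independently of attoparsec, here is a minimal pure model on `String`s. It is a sketch
+of the intended semantics, not the lexer itself. The hypothetical `splitEscaped delims s` returns the unescaped text
+before the first unescaped delimiter, together with the rest of the input starting at that delimiter. Like the lexer,
+it assumes that `\` is the escape character and is not itself one of the delimiters.
+
+~~~ {.haskell}
+-- | Hypothetical pure model of the escaping rule (not part of the thermoprint code).
+-- The escape character is a backslash; we assume it is not one of the delimiters.
+splitEscaped :: [Char] -> String -> (String, String)
+splitEscaped delims = go
+  where
+    escapeChar = '\\'
+    special c = c == escapeChar || c `elem` delims
+
+    -- An escape character followed by a special character yields that character.
+    go (e : c : rest)
+      | e == escapeChar && special c = let (xs, rest') = go rest in (c : xs, rest')
+    -- An unescaped delimiter ends the text; it stays in the remaining input.
+    go s@(c : _)
+      | c `elem` delims = ([], s)
+    -- Anything else, including a lone escape character, is copied verbatim.
+    go (c : rest) = let (xs, rest') = go rest in (c : xs, rest')
+    go [] = ([], [])
+~~~
+
+For instance, `splitEscaped "]" "b\\]old]rest"` yields `("b]old", "]rest")`. The escaped bracket becomes part of the
+text, while the unescaped one ends it. A backslash in front of an ordinary character is kept as-is. `escapedText` fails
+when it cannot consume at least one character, whereas the model simply returns an empty prefix.
+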
+~~~ {.haskell}
+{-# LANGUAGE OverloadedStrings #-}
+
+import Control.Applicative ((<|>), many)
+import Data.Attoparsec.Text
+import Data.Text (Text)
+import qualified Data.Text as T
+
+data Token = BBOpen Text -- ^ "[open]"
+           | BBClose Text -- ^ "[/close]"
+           | BBStr Text -- ^ "text"
+           deriving (Show, Eq)
+
+token :: Parser Token -- a single token; 'many token' below collects the list
+token = BBClose <$ "[/" <*> escapedText' [']'] <* "]"
+        <|> BBOpen <$ "[" <*> escapedText' [']'] <* "]"
+        <|> BBStr <$> escapedText ['[']
+
+escapedText' :: [Char] -> Parser Text
+escapedText' = option "" . escapedText
+
+escapedText :: [Char] -> Parser Text
+escapedText [] = takeText -- No delimiting characters -- parse all remaining input
+escapedText cs = recurse $ choice [ takeWhile1 (not . special) -- a series of characters we don't treat as special
+                                  , escapeSeq -- an escaped delimiter
+                                  , escapeChar' -- the escape character
+                                  ]
+  where
+    escapeChar = '\\'
+    special = inClass $ escapeChar : cs
+    escapeChar' = string $ T.singleton escapeChar
+    escapeSeq = escapeChar' *> (T.singleton <$> satisfy special) -- escape character followed by a special character (which includes the escape character)
+    recurse p = mappend <$> p <*> escapedText' cs -- parse a prefix and optionally append another chunk of escapedText
+
+runTokenizer :: Text -> Maybe [Token]
+runTokenizer = either (const Nothing) Just . parseOnly (many token <* endOfInput)
+~~~
+
+We have now reduced the problem to `[Token] -> Maybe [DomTree]`.
+We quickly see that the structure of the problem is that of a
+[fold](https://hackage.haskell.org/package/base/docs/Data-Foldable.html).
+
+Having realised this, we need a function of type `Token -> DomTree -> DomTree` to build up our target
+structure one token at a time.
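+
+The fold shape already shows up in a simpler task: checking that tags are balanced. The following sketch folds over a
+token list with `foldM` in `Maybe`, using a stack of currently open tag names as the accumulator. `Nothing` aborts
+the fold as soon as a closing tag does not match. The sketch builds no tree, and its `Token` type is a `String`-based
+stand-in for the one defined above.
+
+~~~ {.haskell}
+import Control.Monad (foldM)
+
+-- | String-based stand-in for the post's Token type.
+data Token = BBOpen String | BBClose String | BBStr String
+  deriving (Show, Eq)
+
+-- | One step of the fold. The accumulator is the stack of open tag names,
+-- innermost first; returning Nothing aborts the whole fold.
+step :: [String] -> Token -> Maybe [String]
+step open (BBStr _) = Just open
+step open (BBOpen t) = Just (t : open)
+step (o : open) (BBClose t)
+  | o == t = Just open -- the innermost open tag is the one being closed
+step _ (BBClose _) = Nothing -- nothing is open, or the wrong tag is being closed
+
+-- | Tags are balanced iff the fold succeeds and leaves no tag open.
+balanced :: [Token] -> Bool
+balanced ts = foldM step [] ts == Just []
+~~~
+
+Here the stack records *where* new tokens belong, which is exactly the kind of position the tree-building fold also
+has to keep track of.
+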
+
+In general we’ll want not only to keep track of the `DomTree` during the fold but also to maintain a reference to the
+position at which new tokens will be inserted.
+This kind of problem is well understood and solved idiomatically by using a
+[zipper](https://en.wikipedia.org/wiki/Zipper_(data_structure))
+([a cursory introduction](http://learnyouahaskell.com/zippers)).
+
+Writing zippers tends to be tedious. We’ll therefore introduce an
+[additional intermediate structure](https://hackage.haskell.org/package/containers/docs/Data-Tree.html) for which an
+[implementation](https://hackage.haskell.org/package/rosezipper) is readily available.
+The morphism from this new structure (`Forest BBLabel`) to our `DomTree` will be almost trivial.
+
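+
+To get a feel for what the library saves us, here is a deliberately minimal hand-written zipper over `Data.Tree`
+forests. It is a simplified model, not rosezipper’s implementation. It only represents empty positions (the
+counterpart of `TreePos Empty`), and it fuses each insertion with the subsequent move. The hypothetical `insertAfter`
+therefore plays the role of `Z.nextSpace . Z.insert`, and `insertInto` plays the role of `Z.children . Z.insert`.
+`up` returns the label of the node it closes together with the position right after it.
+
+~~~ {.haskell}
+import Data.Tree (Forest, Tree (..))
+
+-- | One level of context: the siblings left of the enclosing node (nearest
+-- first), the enclosing node's label, and the siblings right of it.
+data Crumb a = Crumb (Forest a) a (Forest a)
+
+-- | An empty position ("hole") in a forest: the trees before it (nearest
+-- first), the trees after it, and the path back up to the top level.
+data Pos a = Pos (Forest a) (Forest a) [Crumb a]
+
+-- | The hole at the start of a forest.
+fromForest :: Forest a -> Pos a
+fromForest ts = Pos [] ts []
+
+-- | Insert a tree at the hole and move to the hole right after it.
+insertAfter :: Tree a -> Pos a -> Pos a
+insertAfter t (Pos ls rs cs) = Pos (t : ls) rs cs
+
+-- | Insert a childless node at the hole and move into its (empty) children.
+insertInto :: a -> Pos a -> Pos a
+insertInto x (Pos ls rs cs) = Pos [] [] (Crumb ls x rs : cs)
+
+-- | Close the current level: rebuild the enclosing node from the trees around
+-- the hole and return its label plus the hole right after it.
+-- Fails at the top level, where there is nothing to close.
+up :: Pos a -> Maybe (a, Pos a)
+up (Pos _ _ []) = Nothing
+up (Pos ls rs (Crumb pls x prs : cs)) =
+  Just (x, Pos (Node x (reverse ls ++ rs) : pls) prs cs)
+
+-- | Rebuild the whole forest, closing any nodes that are still open.
+toForest :: Pos a -> Forest a
+toForest p@(Pos ls rs _) = case up p of
+  Just (_, p') -> toForest p'
+  Nothing -> reverse ls ++ rs
+~~~
+
+On its way back to the top level, `toForest` closes any nodes that are still open, so input with unclosed tags still
+yields a forest in this model.
+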
+~~~ {.haskell}
+import Control.Monad (foldM, guard)
+import Data.Text (Text)
+import Data.Tree (Forest, Tree (..))
+import Data.Tree.Zipper (TreePos, Empty, Full)
+import qualified Data.Tree.Zipper as Z
+
+data BBLabel = BBTag Text
+             | BBPlain Text
+
+rose :: [Token] -> Maybe (Forest BBLabel)
+rose = fmap Z.toForest . foldM (flip rose') (Z.fromForest [])
+
+rose' :: Token -> TreePos Empty BBLabel -> Maybe (TreePos Empty BBLabel)
+rose' (BBStr t) = return . Z.nextSpace . Z.insert (Node (BBPlain t) []) -- insert a node with no children and move one step to the right in the forest we’re currently viewing
+rose' (BBOpen t) = return . Z.children . Z.insert (Node (BBTag t) []) -- insert the node and move into position to insert its first child
+rose' (BBClose t) = close t -- Haskell requires all equations of a function to take the same number of arguments, hence the helper 'close'
+  where
+    close :: Text -> TreePos Empty BBLabel -> Maybe (TreePos Empty BBLabel)
+    close tag pos = do
+      pos' <- Z.parent pos -- fail if we're trying to close a tag that does not have a parent (this indicates imbalanced tags)
+      let
+        pTag = (\(BBTag t) -> t) $ Z.label pos' -- yes, this will fail unceremoniously if the parent is not a tag; this poses no problem since we're constructing the structure ourselves. The proof that this failure mode does not occur is left as an exercise for the reader.
+      guard (pTag == tag) -- The structure shows that this mode of failure (the opening tag's name does not match the closing tag's) is not logically required -- it only serves as a *notification* to the user
+      return $ Z.nextSpace pos' -- move one level up and point at the empty space after the parent
+~~~
+
+All that is left to do now is to present our final morphism:
+
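+~~~ {.haskell}
+import qualified Data.Map as Map
+
+dom :: Forest BBLabel -> [DomTree]
+dom = map dom'
+  where
+    dom' (Node (BBPlain t) _) = Content t -- plain text nodes are always inserted without children
+    dom' (Node (BBTag t) ts) = Element t Map.empty (map dom' ts) -- no attributes yet: the lexer does not parse them
+~~~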
