diff options
author | Gregor Kleen <gkleen@yggdrasil.li> | 2016-01-12 23:41:01 +0100 |
---|---|---|
committer | Gregor Kleen <gkleen@yggdrasil.li> | 2016-01-12 23:41:01 +0100 |
commit | 4c4f2dcb8ff5c2dfd7758f6dc4d2324f9b344103 (patch) | |
tree | e01b06f22d9f98d419a4a755317df1fa19d62560 /provider | |
parent | d160ffd8f52eb5840bbd215ef30bf72da2dbdeea (diff) | |
download | dirty-haskell.org-4c4f2dcb8ff5c2dfd7758f6dc4d2324f9b344103.tar dirty-haskell.org-4c4f2dcb8ff5c2dfd7758f6dc4d2324f9b344103.tar.gz dirty-haskell.org-4c4f2dcb8ff5c2dfd7758f6dc4d2324f9b344103.tar.bz2 dirty-haskell.org-4c4f2dcb8ff5c2dfd7758f6dc4d2324f9b344103.tar.xz dirty-haskell.org-4c4f2dcb8ff5c2dfd7758f6dc4d2324f9b344103.zip |
thermoprint-4
Diffstat (limited to 'provider')
-rw-r--r-- | provider/posts/thermoprint-4.md | 90 |
1 files changed, 79 insertions, 11 deletions
diff --git a/provider/posts/thermoprint-4.md b/provider/posts/thermoprint-4.md index 6b2a022..2606c06 100644 --- a/provider/posts/thermoprint-4.md +++ b/provider/posts/thermoprint-4.md | |||
@@ -17,32 +17,100 @@ our solution (as illustrated by the following mockup): | |||
17 | data DomTree = Element Text (Map Text Text) [DomTree] | 17 | data DomTree = Element Text (Map Text Text) [DomTree] |
18 | | Content Text | 18 | | Content Text |
19 | deriving (Show, Eq) | 19 | deriving (Show, Eq) |
20 | |||
21 | type Errors = () | ||
22 | 20 | ||
23 | bbcode :: Text -> Either Errors DomTree | 21 | bbcode :: Text -> Maybe DomTree |
24 | -- ^ Parse BBCode | 22 | -- ^ Parse BBCode |
25 | ~~~ | 23 | ~~~ |
26 | 24 | ||
27 | Writing a parser capable of dealing with `Text` directly from scratch would be unnecessarily abstruse, we’ll be using | 25 | Writing a parser capable of dealing with `Text` directly from scratch would be unnecessarily abstruse, we’ll be using |
28 | the [attoparsec](https://hackage.haskell.org/package/attoparsec/docs/Data-Attoparsec-Text.html) family of parser | 26 | the [attoparsec](https://hackage.haskell.org/package/attoparsec/docs/Data-Attoparsec-Text.html) family of parser |
29 | combinators instead (the exact usage of attoparsec is out of scope -- we'll just mockup the types). | 27 | combinators instead. |
28 | |||
29 | We reproduce an incomplete version of the lexer below (it’s missing tag attributes and self-closing tags). | ||
30 | |||
31 | We introduce `escapedText`, a helper function for extracting text until we reach one of a set of delimiting characters | ||
32 | (exclusive). | ||
33 | While doing this we also parse any delimiting character iff it's prefixed with an escape character (we use `\\`) -- the | ||
34 | escape character itself needs only be escaped if encountered directly before one of the delimiting characters. | ||
30 | 35 | ||
31 | ~~~ {.haskell} | 36 | ~~~ {.haskell} |
32 | data Token = BBOpen Text | 37 | data Token = BBOpen Text -- ^ "[open]" |
33 | | BBClose Text | 38 | | BBClose Text -- ^ "[/close]" |
34 | | BBStr Text | 39 | | BBStr Text -- ^ "text" |
35 | 40 | ||
36 | tokenize :: Parser [Token] | 41 | token :: Parser [Token] |
42 | token = BBClose <$ "[/" <*> escapedText' [']'] <* "]" | ||
43 | <|> BBOpen <$ "[" <*> escapedText' [']'] <* "]" | ||
44 | <|> BBStr <$> escapedText ['['] | ||
45 | |||
46 | escapedText' :: [Char] -> Parser Text | ||
47 | escapedText' = option "" . escapedText | ||
37 | 48 | ||
38 | type Error' = () | 49 | escapedText :: [Char] -> Parser Text |
50 | escapedText [] = takeText -- No delimiting characters -- parse all remaining input | ||
51 | escapedText cs = recurse $ choice [ takeWhile1 (not . special) -- a series of characters we don't treat as special | ||
52 | , escapeSeq -- an escaped delimiter | ||
53 | , escapeChar' -- the escape character | ||
54 | ] | ||
55 | where | ||
56 | escapeChar = '\\' | ||
57 | special = inClass $ escapeChar : cs | ||
58 | escapeChar' = string $ T.singleton escapeChar | ||
59 | escapeSeq = escapeChar' *> (T.singleton <$> satisfy special) -- escape character followed by a special character (which encludes the escape character) | ||
60 | recurse p = mappend <$> p <*> escapedText' cs -- parse a prefix and optionally append another chunk of escapedText | ||
39 | 61 | ||
40 | runTokenizer :: Text -> Either Error' [Token] | 62 | runTokenizer :: Text -> Maybe [Token] |
63 | runTokenizer = either (const Nothing) Just . parseOnly (many token <* endOfInput) | ||
41 | ~~~ | 64 | ~~~ |
42 | 65 | ||
43 | We have now reduced the Problem to `[Token] -> DomTree`. | 66 | We have now reduced the Problem to `[Token] -> DomTree`. |
44 | We quickly see that the structure of the problem is that of a | 67 | We quickly see that the structure of the problem is that of a |
45 | [fold](https://hackage.haskell.org/package/base/docs/Data-Foldable.html). | 68 | [fold](https://hackage.haskell.org/package/base/docs/Data-Foldable.html). |
46 | 69 | ||
47 | Having realised this we now require an object of type `DomTree -> Token -> DomTree` to recursively build up our target | 70 | Having realised this we require a function of type `Token -> DomTree -> DomTree` to recursively build up our target |
48 | structure. | 71 | structure. |
72 | |||
73 | In general we’ll want to not only keep track of the `DomTree` during recursion but also maintain a reference to the | ||
74 | position at which we’ll be inserting new tokens. | ||
75 | This kind of problem is well understood and solved idiomatically by using a | ||
76 | [zipper](https://en.wikipedia.org/wiki/Zipper_(data_structure)) | ||
77 | ([a cursory introduction](http://learnyouahaskell.com/zippers)). | ||
78 | |||
79 | Writing zippers tends to be tedious. We’ll therefore introduce an | ||
80 | [additional intermediate structure](https://hackage.haskell.org/package/containers/docs/Data-Tree.html) for which an | ||
81 | [implementation](https://hackage.haskell.org/package/rosezipper) is available readily. | ||
82 | The morphism from this new structure (`Forest BBLabel`) to our `DomTree` will be almost trivial. | ||
83 | |||
84 | ~~~ {.haskell} | ||
85 | import Data.Tree.Zipper (TreePos, Empty, Full) | ||
86 | import qualified Data.Tree.Zipper as Z | ||
87 | |||
88 | data BBLabel = BBTag Text | ||
89 | | BBPlain Text | ||
90 | |||
91 | rose :: [BBToken] -> Maybe (Forest BBLabel) | ||
92 | rose = Z.toForest <$> foldM (flip rose') (Z.fromForest []) | ||
93 | |||
94 | rose' :: BBToken -> TreePos Empty BBLabel -> Maybe (TreePos Empty BBLabel) | ||
95 | rose' (BBStr t) = return . Z.nextSpace . Z.insert (Node (BBPlain t) []) -- insert a node with no children and move one step to the right in the forest we’re currently viewing | ||
96 | rose' (BBOpen t) = return . Z.children . Z.insert (Node (BBTag t) []) -- insert the node and move into position to insert it's first child | ||
97 | rose' (BBClose t) = close t -- haskell complains if multiple equations for the same function have a differing number of arguments, therefore: 'close' | ||
98 | where | ||
99 | close :: Text -> TreePos Empty BBLabel -> Maybe (TreePos Empty BBLabel) | ||
100 | close tag pos = do | ||
101 | pos' <- Z.parent pos -- fail if we're trying to close a tag that does not have a parent (this indicates imbalanced tags) | ||
102 | let | ||
103 | pTag = (\(BBTag t) -> t) $ Z.label pos' -- yes, this will fail unceremoniously if the parent is not a tag, this poses no problem since we're constructing the structure ourselves. The proof that this failure mode does not occur is left as an exercise for the reader. | ||
104 | guard (pTag == tag) -- The structure shows that this mode of failure (opening tags content does not match the closing tags) is not logically required -- it only serves as a *notification* to the user | ||
105 | return $ Z.nextSpace pos' -- move one level up and to point at the next sibling of the parent | ||
106 | ~~~ | ||
107 | |||
108 | All that is left to do now is present our final morphism: | ||
109 | |||
110 | ~~~ {.haskell} | ||
111 | dom :: Forest BBLabel -> [DomTree] | ||
112 | dom = map dom' | ||
113 | where | ||
114 | dom' (Node (BBPlain t) []) = Content t | ||
115 | dom' (Node (BBTag t) ts = Element t $ map dom' ts | ||
116 | ~~~ | ||