summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
-rw-r--r--provider/posts/beuteltier-1.lhs325
1 files changed, 325 insertions, 0 deletions
diff --git a/provider/posts/beuteltier-1.lhs b/provider/posts/beuteltier-1.lhs
new file mode 100644
index 0000000..c8f3e6c
--- /dev/null
+++ b/provider/posts/beuteltier-1.lhs
@@ -0,0 +1,325 @@
1---
2title: On the Design of Overly Complicated Feedreaders
3published: 2015-08-04
4---
5
6I like feedreaders.
7Thus, of course, I had to implement my own, because, as always, all existing software does
8not fullfill my exceedingly unrealistic expectations with respect to customizability and
9extendability.
10
11This post marks the start of a series describing and documenting the design of the current
12iteration of `Beuteltier` (a derivation of [Newsbeuter](https://newsbeuter.org) and the
13german `Beutel`, meaning bag, to mean in conjunction:
14[Marsupial](https://en.wikipedia.org/wiki/Marsupial)).
15
16It should be noted that the library described here is not finished or ready for use in any
17sense of the word (at the time of writing a "trivial" implementation of a `Beutel` shipped
18with the library supports only `run`, `search`, and `delete`). Searching a way to
19procrastinate implementing the more arduous `insert` (it requires nubbing—deduplication in
20the backstore) I decided to, instead, start this series of posts and put the thought that
21went into the library so far in a form that I can read again for later reference.
22
23We begin, as is to be expected for a haskell project, with type definitions and, thus,
24design philosophy.
25
26This post in particular reproduces the file `beuteltier/Beuteltier/Types.hs` from the
27[git repo](git://git.yggdrasil.li/beuteltier) with annotiations to provide some
28motivation.
29
30The `Beuteltier` library itself only provides primitives for (and a default implementation
31of) access to what we call a backstore. A backstore is, to us, an instance of the
32typeclass `Beutel` which contains the most primitive of primitives for storing, searching
33for and deleting representations of the objects we care about from the store.
34
35It is recommended that reader not try to follow the rest of this post linearly but start
36at the end with the definition of the `Beutel` class and work their way backwards.
37
38> {-# LANGUAGE FlexibleInstances, StandaloneDeriving, KindSignatures, MultiParamTypeClasses, TypeFamilies #-}
39>
40> module Beuteltier.Types
41> ( -- * Types
42> Object
43> , ObjectGen(..)
44> , SubObject(..)
45> , MetaData(..)
46> , Thunk(..)
47> , ThunkState(..)
48> , ThunkResult(..)
49> , Tag
50> , Flag(..)
51> , SubObjectName
52> , ThunkName
53> , SearchQuery
54> , Predicate
55> , Beutel(..)
56> ) where
57
58`Flag` ends up being a [sum type](https://en.wikipedia.org/wiki/Sum_type) holding values
59such as `Seen`, `Old`, or `Hidden`.
60We define it externally.
61
62> import Beuteltier.Types.Flags
63
64The `Identity` functor serves as basis for many a Monadtransformer-stack.
65
66> import Data.Functor.Identity
67> import Data.Functor.Classes ()
68
69Binary contents are encoded as `ByteStrings`
70
71> import qualified Data.ByteString.Lazy as Lazy (ByteString)
72> import qualified Data.ByteString.Lazy as LBS
73
74Unicode text as `Text`
75
76> import Data.Text (Text)
77
78Long unicode text as lazy `Text`
79
80> import qualified Data.Text.Lazy as Lazy (Text)
81> import qualified Data.Text.Lazy as LT
82>
83> import Data.Set (Set)
84>
85> import Data.Map (Map)
86>
87> import Data.Time (UTCTime)
88>
89> import Data.Function (on)
90> import Data.Ord (comparing)
91> import Control.Applicative
92
93`Data.Default` provides some convenience when constructing extensive record structures.
94
95> import Data.Default
96
97The `boolexpr` package provides us with a structure for representing boolean expressions
98supporting functor operations and evaluation.
99
100> import Data.BoolExpr
101
102Previous iterations of Beuteltier acted on Objects that were kept completely in RAM during
103all operations.
104This proved to be unsustainable, not only because nubbing (deduplication in the store of
105all objects) tended to exceed all RAM constraints (>4GiB for a few hundred objects), but
106also because cheaper operations on objects, like presentation to the user, got painfully
107slow once large `SubObject`s (like videos) were introduced into the store.
108
109The straight forward solution was to enrich the `Object` structure with provisions for
110explicit lazyness and partial construction.
111
112> -- | We deal in, at runtime, partially retrieved Objects
113> data ObjectGen (f :: * -> *) = ObjectGen
114> { _oMeta :: f MetaData
115> -- ^ An undetermined set of Metainformation
116> , _oContent :: f (Map SubObjectName (f SubObject))
117> -- ^ A list of undetermined length of undetermined
118> 'SubObject's with guaranteed unique 'SubObjectName's
119> , _oThunks :: f [f Thunk]
120> -- ^ A list of undetermined length of undetermined Thunks.
121> -- There is such a thing as thunk colissions (i.e.: two
122> -- thunks promise or even create 'SubObject's with the
123> -- same name).
124> -- Precedence in such a case is to be as suggested by
125> -- the list structure (later thunks override earlier ones).
126> }
127>
128> instance Monad f => Default (ObjectGen f) where
129> def = ObjectGen { _oContent = return def
130> , _oThunks = return def
131> , _oMeta = return def
132> }
133
134It is straight forward to collapse the more advanced representation of `Object`s back to
135the old behaviour by parametrising over the Identity functor, which is simply a newtype
136wrapper over the contained structure.
137
138> -- | An entirely retrieved Object
139> type Object = ObjectGen Identity
140>
141> -- -- | The default 'Object' is empty except for metadata
142> -- instance Default Object where
143> -- def = ObjectGen { _oContent = return def
144> -- , _oThunks = return def
145> -- , _oMeta = return def
146> -- }
147>
148> -- | Equality simply gets deferred to all subcomponents
149> deriving instance Eq Object
150>
151> -- | 'Object's compare as their 'MetaData'
152> instance Ord Object where
153> compare = comparing _oMeta
154
155We would like to associate some set of meta information with all objects.
156Therefore, we do.
157
158> -- | Metadata associated with an Object
159> data MetaData = MetaData
160> { _mRetrieved :: UTCTime -- ^ Time of creation
161> , _mTags :: Set Tag -- ^ Tags such as the name of the author,
162> -- the title of the work represented in
163> -- the 'Object', ….
164> -- We use something like @show . _mTags@
165> -- to identify an 'Object' to the user
166> , _mFlags :: Set Flag -- ^ Flags such as \"Read\" or \"Spam\"
167> } deriving (Show, Ord)
168> -- | Tags are unicode text
169> type Tag = Text
170>
171> -- | 'MetaData' equates as the contained tags
172> instance Eq MetaData where
173> (==) = (==) `on` _mTags
174>
175> -- | The default MetaData has no tags, no flags, and an undefined timestamp
176> instance Default MetaData where
177> def = MetaData { _mFlags = def
178> , _mTags = def
179> , _mRetrieved = undefined -- There really is no such thing as a default time
180> }
181
182Objects are no fun if they don´t contain anything of interest in the end.
183
184Below we see a remnant of an older model of associating names to `SubObject`s. We switched
185to using a `Map` for reasons of deduplication. Inserting into a `Map` carries some
186guarantees that keys end up being unique.
187
188Note below: creation of a `SubObject` is an update. It is thus expected, that `SubObject`s
189created at the same time as the `Object` they are associated to contain encode an update
190time that matches the `Object`s creation time.
191
192> -- | Contents of an object
193> data SubObject = SubObject
194> -- { _sId :: SubObjectName
195> -- ^ We associate a name to every chunk of content to determine
196> -- how to present an object to the user
197> { _sContent :: Lazy.ByteString
198> , _sUpdates :: [UTCTime]
199> -- ^ Times of witnessed updates to this 'SubObject'
200> } deriving (Show)
201>
202> -- | No content, no witnessed updates
203> instance Default SubObject where
204> def = SubObject { _sContent = def
205> , _sUpdates = def
206> }
207>
208> -- | Extensionality for 'SubObject's:
209> --
210> -- > (==) = (==) `on` _sContent
211> instance Eq SubObject where
212> (==) = (==) `on` _sContent
213
214The distinguishing feature of Beuteltier is it´s support for `Thunk`s. They are, as the
215name suggests, loosly based on the concept of lazy evaluation. They are, however, less
216transparent and thus more explicit than implementations as they are used in, for example
217haskell.
218
219As far as Beuteltier is concerned `Thunk`s are executables that are expected to produce
220files in the directory they are executed in in a pure manner. That is to say they do not
221access external resources, where possible. A `Thunk` that downloads a video from the
222internet will, of course, access the internet and can thus fail. We expect it, however, to
223not to try and access the users home directory to look for e.g. credentials for
224authentication it intends to use to its primary job.
225
226When a `Thunk`s executable gets executed the files it creates (excluding itself) get
227translated to `SubObject`s with the filenames (directories stripped of course) as their
228`SubObjectName`s and the file contents as their… well, their contents. It is understood,
229that not all possible `SubObjectName`s can be created thus (we restrict ourselves to valid
230filenames on whatever system we happen to be on). We do not consider this to be a great
231loss.
232
233The advanced equality checks mentioned below are, in fact, implemented and will be explained
234in more detail in a later post concerned with the file `beuteltier/Beuteltier/Types/Util.hs`.
235
236> -- | Thunks are at runtime not yet known parts of an object
237> data Thunk = Thunk
238> { _tId :: ThunkName -- ^ For debugging
239> , _tScript :: Lazy.ByteString
240> -- ^ A Thunk is, in the end, a shell script that is expected to generate
241> -- 'SubObject's
242> , _tPromises :: Maybe [SubObjectName]
243> -- ^ Maybe we already know what our script is going to generate?
244> -- This would enable us to do some more advanced equality checks under
245> -- the assumption that scripts are pure
246> , _tState :: ThunkState
247> }
248> deriving (Show)
249>
250> -- | Empty id, empty script, promises nothing, and with default state
251> instance Default Thunk where
252> def = Thunk { _tId = def
253> , _tScript = def
254> , _tPromises = def
255> , _tState = def
256> }
257>
258> -- | Equality on 'Thunk's ignores '_tState' and '_tId'
259> instance Eq Thunk where
260> a == b = and $ [ (==) `on` _tScript
261> , (==) `on` _tPromises
262> ] <*> pure a <*> pure b
263>
264> -- | The states in which a 'Thunk' can be encountered.
265> data ThunkState = NotExecuted
266> | Executed [SubObjectName] ThunkResult
267> deriving (Show)
268>
269> -- | Return the default 'ThunkResult' upon forcing
270> instance Default ThunkState where
271> def = NotExecuted
272>
273> -- | Thunks generate some data during execution
274> data ThunkResult = ThunkResult
275> { _rOutErr, _rOutStd :: Lazy.Text
276> , _rExit :: Integer -- ^ Numerical exit code (0 usually means success)
277> }
278> deriving (Show)
279>
280> -- | Empty output, and with undefined exit code (no execution took place and we can´t
281> -- encode this in a numerical exit code)
282> instance Default ThunkResult where
283> def = ThunkResult { _rOutErr = LT.empty, _rOutStd = LT.empty
284> , _rExit = undefined
285> }
286>
287> -- | We expect identifiers for 'SubObject's to be short, thus 'String'
288> type SubObjectName = String
289> -- | We expect identifiers for 'Thunk's to be short, thus 'String'
290> type ThunkName = String
291>
292> -- | @LBS.empty@
293> instance Default (Lazy.ByteString) where
294> def = LBS.empty
295
296What good is a library for managing a backstore if it does not support search operations?
297We consider the answer to be "very little" and, thus, support searches.
298
299> type SearchQuery f = BoolExpr (Predicate f)
300> -- data Predicate f = Prim (ObjectGen f -> f Bool)
301> -- | Meta (MetaData -> Bool)
302> type Predicate f = ObjectGen f -> f Bool
303
304The heart of the `Beuteltier` library is the typeclass reproduced below. We expect
305implementations of backstores to be `Monad`s so that we may be able to construct
306complicated actions that act on the backstore in question.
307Once we have constructed such an action using the three primitives `search`, `insert`, and
308`delete` we additionally require a way to execute that action from within the `IO`
309`Monad`.
310
311Additional primitives, such as those for "forcing" and resetting thunks, are provided in
312additional libraries and, thus, later posts.
313
314> -- | We have the user supply the functions we use to interact with whatever backstore
315> -- she uses
316> class Monad functor => Beutel (functor :: * -> *) where
317> data Config :: *
318> run :: Config -> functor a -> IO a
319> -- ^ Actually run whatever action we constructed against the backstore
320> search :: SearchQuery functor -> functor [ObjectGen functor]
321> -- ^ Perform a search
322> insert :: Object -> functor ()
323> -- ^ Insert an object
324> delete :: SearchQuery functor -> functor ()
325> -- ^ Delete the results of a search