summaryrefslogtreecommitdiff
path: root/provider/posts/beuteltier-1.lhs
blob: 9cc23648448d51d69b87a4327e5cf0a98bc48462 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
---
title: On the Design of Overly Complicated Feedreaders
published: 2015-08-04
tags: Beuteltier
---

I like feedreaders.
Thus, of course, I had to implement my own, because, as always, all existing software does
not fullfill my exceedingly unrealistic expectations with respect to customizability and
extendability.

This post marks the start of a series describing and documenting the design of the current
iteration of `Beuteltier` (a derivation of [Newsbeuter](https://newsbeuter.org) and the
german `Beutel`, meaning bag, to mean in conjunction:
[Marsupial](https://en.wikipedia.org/wiki/Marsupial)).

It should be noted that the library described here is not finished or ready for use in any
sense of the word (at the time of writing a "trivial" implementation of a `Beutel` shipped
with the library supports only `run`, `search`, and `delete`). Searching a way to
procrastinate implementing the more arduous `insert` (it requires nubbing—deduplication in
the backstore) I decided to, instead, start this series of posts and put the thought that
went into the library so far in a form that I can read again for later reference.

We begin, as is to be expected for a haskell project, with type definitions and, thus,
design philosophy.

This post in particular reproduces the file `beuteltier/Beuteltier/Types.hs` from the
[git repo](git://git.yggdrasil.li/beuteltier) with annotiations to provide some
motivation.

The `Beuteltier` library itself only provides primitives for (and a default implementation
of) access to what we call a backstore. A backstore is, to us, an instance of the
typeclass `Beutel` which contains the most primitive of primitives for storing, searching
for and deleting representations of the objects we care about from the store.

It is recommended that the reader not try to follow the rest of this post linearly but start
at the end with the definition of the `Beutel` class and work their way backwards.

> {-# LANGUAGE FlexibleInstances, StandaloneDeriving, KindSignatures, MultiParamTypeClasses, TypeFamilies #-}
> 
> module Beuteltier.Types
>        ( -- * Types
>          Object
>        , ObjectGen(..)
>        , SubObject(..)
>        , MetaData(..)
>        , Thunk(..)
>        , ThunkState(..)
>        , ThunkResult(..)
>        , Tag
>        , Flag(..)
>        , SubObjectName
>        , ThunkName
>        , SearchQuery
>        , Predicate
>        , Beutel(..)
>        ) where

`Flag` ends up being a [sum type](https://en.wikipedia.org/wiki/Sum_type) holding values
such as `Seen`, `Old`, or `Hidden`.
We define it externally.

> import Beuteltier.Types.Flags

The `Identity` functor serves as basis for many a Monadtransformer-stack.

> import Data.Functor.Identity
> import Data.Functor.Classes ()

Binary contents are encoded as `ByteStrings`

> import qualified Data.ByteString.Lazy as Lazy (ByteString)
> import qualified Data.ByteString.Lazy as LBS

Unicode text as `Text`

> import Data.Text (Text)

Long unicode text as lazy `Text`

> import qualified Data.Text.Lazy as Lazy (Text)
> import qualified Data.Text.Lazy as LT
> 
> import Data.Set (Set)
> 
> import Data.Map (Map)
> 
> import Data.Time (UTCTime)
> 
> import Data.Function (on)
> import Data.Ord (comparing)
> import Control.Applicative

`Data.Default` provides some convenience when constructing extensive record structures.

> import Data.Default

The `boolexpr` package provides us with a structure for representing boolean expressions
supporting functor operations and evaluation.

> import Data.BoolExpr

Previous iterations of Beuteltier acted on Objects that were kept completely in RAM during
all operations.
This proved to be unsustainable, not only because nubbing (deduplication in the store of
all objects) tended to exceed all RAM constraints (>4GiB for a few hundred objects), but
also because cheaper operations on objects, like presentation to the user, got painfully
slow once large `SubObject`s (like videos) were introduced into the store.

The straight forward solution was to enrich the `Object` structure with provisions for
explicit lazyness and partial construction.

> -- | We deal in, at runtime, partially retrieved Objects
> data ObjectGen (f :: * -> *) = ObjectGen
>                                { _oMeta :: f MetaData
>                                  -- ^ An undetermined set of Metainformation
>                                , _oContent :: f (Map SubObjectName (f SubObject))
>                                  -- ^ A list of undetermined length of undetermined
>                             'SubObject's with guaranteed unique 'SubObjectName's
>                                , _oThunks :: f [f Thunk]
>                                  -- ^ A list of undetermined length of undetermined Thunks.
>                                  --   There is such a thing as thunk colissions (i.e.: two
>                                  --   thunks promise or even create 'SubObject's with the
>                                  --   same name).
>                                  --   Precedence in such a case is to be as suggested by
>                                  --   the list structure (later thunks override earlier ones).
>                                }
> 
> instance Monad f => Default (ObjectGen f) where
>   def = ObjectGen { _oContent = return def
>                   , _oThunks = return def
>                   , _oMeta = return def
>                   }

It is straight forward to collapse the more advanced representation of `Object`s back to
the old behaviour by parametrising over the Identity functor, which is simply a newtype
wrapper over the contained structure.

> -- | An entirely retrieved Object
> type Object = ObjectGen Identity
> 
> -- -- | The default 'Object' is empty except for metadata
> -- instance Default Object where
> --   def = ObjectGen { _oContent = return def
> --                   , _oThunks = return def
> --                   , _oMeta = return def
> --                   }
> 
> -- | Equality simply gets deferred to all subcomponents
> deriving instance Eq Object
> 
> -- | 'Object's compare as their 'MetaData'
> instance Ord Object where
>   compare = comparing _oMeta

We would like to associate some set of meta information with all objects.
Therefore, we do.

> -- | Metadata associated with an Object
> data MetaData = MetaData
>                 { _mRetrieved :: UTCTime -- ^ Time of creation
>                 , _mTags :: Set Tag -- ^ Tags such as the name of the author,
>                                     --   the title of the work represented in
>                                     --   the 'Object', ….
>                                     --   We use something like @show . _mTags@
>                                     --   to identify an 'Object' to the user
>                 , _mFlags :: Set Flag -- ^ Flags such as \"Read\" or \"Spam\"
>                 } deriving (Show, Ord)
> -- | Tags are unicode text
> type Tag = Text
> 
> -- | 'MetaData' equates as the contained tags
> instance Eq MetaData where
>   (==) = (==) `on` _mTags
> 
> -- | The default MetaData has no tags, no flags, and an undefined timestamp
> instance Default MetaData where
>   def = MetaData { _mFlags = def
>                  , _mTags = def
>                  , _mRetrieved = undefined -- There really is no such thing as a default time
>                  }

Objects are no fun if they don´t contain anything of interest in the end.

Below we see a remnant of an older model of associating names to `SubObject`s. We switched
to using a `Map` for reasons of deduplication. Inserting into a `Map` carries some
guarantees that keys end up being unique.

Note below: creation of a `SubObject` is an update. It is thus expected, that `SubObject`s
created at the same time as the `Object` they are associated to encode an update
time that matches the `Object`s creation time.

> -- | Contents of an object
> data SubObject = SubObject
>                  -- { _sId :: SubObjectName
>                       -- ^ We associate a name to every chunk of content to determine
>                       --   how to present an object to the user
>                  { _sContent :: Lazy.ByteString
>                  , _sUpdates :: [UTCTime]
>                    -- ^ Times of witnessed updates to this 'SubObject'
>                  } deriving (Show)
> 
> -- | No content, no witnessed updates
> instance Default SubObject where
>   def = SubObject { _sContent = def
>                   , _sUpdates = def
>                   }
> 
> -- | Extensionality for 'SubObject's:
> --   
> --   > (==) = (==) `on` _sContent
> instance Eq SubObject where
>   (==) = (==) `on` _sContent

The distinguishing feature of Beuteltier is it´s support for `Thunk`s. They are, as the
name suggests, loosly based on the concept of lazy evaluation. They are, however, less
transparent and thus more explicit than implementations as they are used in, for example
haskell.

As far as Beuteltier is concerned `Thunk`s are executables that are expected to produce
files in the directory they are executed in in a pure manner. That is to say they do not
access external resources, where possible. A `Thunk` that downloads a video from the
internet will, of course, access the internet and can thus fail. We expect it, however, to
not to try and access the users home directory to look for e.g. credentials for
authentication it intends to use to its primary job.

When a `Thunk`s executable gets executed the files it creates (excluding itself) get
translated to `SubObject`s with the filenames (directories stripped of course) as their
`SubObjectName`s and the file contents as their… well, their contents. It is understood,
that not all possible `SubObjectName`s can be created thus (we restrict ourselves to valid
filenames on whatever system we happen to be on). We do not consider this to be a great
loss.

The advanced equality checks mentioned below are, in fact, implemented and will be explained
in more detail in a later post concerned with the file `beuteltier/Beuteltier/Types/Util.hs`.

> -- | Thunks are at runtime not yet known parts of an object
> data Thunk = Thunk
>              { _tId :: ThunkName -- ^ For debugging
>              , _tScript :: Lazy.ByteString
>                -- ^ A Thunk is, in the end, a shell script that is expected to generate
>                --  'SubObject's
>              , _tPromises :: Maybe [SubObjectName]
>                -- ^ Maybe we already know what our script is going to generate?
>                --   This would enable us to do some more advanced equality checks under
>                --   the assumption that scripts are pure
>              , _tState :: ThunkState
>              }
>            deriving (Show)
> 
> -- | Empty id, empty script, promises nothing, and with default state
> instance Default Thunk where
>   def = Thunk { _tId = def
>               , _tScript = def
>               , _tPromises = def
>               , _tState = def
>               }
> 
> -- | Equality on 'Thunk's ignores '_tState' and '_tId'
> instance Eq Thunk where
>   a == b = and $ [ (==) `on` _tScript
>                  , (==) `on` _tPromises
>                  ] <*> pure a <*> pure b
> 
> -- | The states in which a 'Thunk' can be encountered.
> data ThunkState = NotExecuted
>                 | Executed [SubObjectName] ThunkResult
>                 deriving (Show)
> 
> -- | Return the default 'ThunkResult' upon forcing
> instance Default ThunkState where
>   def = NotExecuted
> 
> -- | Thunks generate some data during execution
> data ThunkResult = ThunkResult
>                    { _rOutErr, _rOutStd :: Lazy.Text
>                    , _rExit :: Integer -- ^ Numerical exit code (0 usually means success)
>                    }
>                  deriving (Show)
> 
> -- | Empty output, and with undefined exit code (no execution took place and we can´t
> --   encode this in a numerical exit code)
> instance Default ThunkResult where
>   def = ThunkResult { _rOutErr = LT.empty, _rOutStd = LT.empty
>                     , _rExit = undefined
>                     }
> 
> -- | We expect identifiers for 'SubObject's to be short, thus 'String'
> type SubObjectName = String
> -- | We expect identifiers for 'Thunk's to be short, thus 'String'
> type ThunkName = String
> 
> -- | @LBS.empty@
> instance Default (Lazy.ByteString) where
>   def = LBS.empty

What good is a library for managing a backstore if it does not support search operations?
We consider the answer to be "very little" and, thus, support searches.

> type SearchQuery f = BoolExpr (Predicate f)
> -- data Predicate f = Prim (ObjectGen f -> f Bool)
> --                  | Meta (MetaData -> Bool)
> type Predicate f = ObjectGen f -> f Bool

The heart of the `Beuteltier` library is the typeclass reproduced below. We expect
implementations of backstores to be `Monad`s so that we may be able to construct
complicated actions that act on the backstore in question.
Once we have constructed such an action using the three primitives `search`, `insert`, and
`delete` we additionally require a way to execute that action from within the `IO`
`Monad`.

Additional primitives, such as those for "forcing" and resetting thunks, are provided in
additional libraries and, thus, later posts.

> -- | We have the user supply the functions we use to interact with whatever backstore
> --   she uses
> class Monad functor => Beutel (functor :: * -> *) where
>   data Config :: *
>   run :: Config -> functor a -> IO a
>   -- ^ Actually run whatever action we constructed against the backstore
>   search :: SearchQuery functor -> functor [ObjectGen functor]
>   -- ^ Perform a search
>   insert :: Object -> functor ()
>   -- ^ Insert an object
>   delete :: SearchQuery functor -> functor ()
>   -- ^ Delete the results of a search