From 46cf2aa1fb307b194660ba3b2bcd8fe400a49704 Mon Sep 17 00:00:00 2001
From: Gregor Kleen <gkleen@yggdrasil.li>
Date: Tue, 4 Aug 2015 15:35:21 +0200
Subject: beuteltier-1

---
 provider/posts/beuteltier-1.lhs | 325 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 325 insertions(+)
 create mode 100644 provider/posts/beuteltier-1.lhs

diff --git a/provider/posts/beuteltier-1.lhs b/provider/posts/beuteltier-1.lhs
new file mode 100644
index 0000000..c8f3e6c
--- /dev/null
+++ b/provider/posts/beuteltier-1.lhs
@@ -0,0 +1,325 @@
+---
+title: On the Design of Overly Complicated Feedreaders
+published: 2015-08-04
+---
+
+I like feedreaders.
+Thus, of course, I had to implement my own, because, as always, all existing software does
+not fullfill my exceedingly unrealistic expectations with respect to customizability and
+extendability.
+
+This post marks the start of a series describing and documenting the design of the current
+iteration of `Beuteltier` (a derivation of [Newsbeuter](https://newsbeuter.org) and the
+german `Beutel`, meaning bag, to mean in conjunction:
+[Marsupial](https://en.wikipedia.org/wiki/Marsupial)).
+
+It should be noted that the library described here is not finished or ready for use in any
+sense of the word (at the time of writing a "trivial" implementation of a `Beutel` shipped
+with the library supports only `run`, `search`, and `delete`). Searching a way to
+procrastinate implementing the more arduous `insert` (it requires nubbing—deduplication in
+the backstore) I decided to, instead, start this series of posts and put the thought that
+went into the library so far in a form that I can read again for later reference.
+
+We begin, as is to be expected for a haskell project, with type definitions and, thus,
+design philosophy.
+
+This post in particular reproduces the file `beuteltier/Beuteltier/Types.hs` from the
+[git repo](git://git.yggdrasil.li/beuteltier) with annotiations to provide some
+motivation.
+
+The `Beuteltier` library itself only provides primitives for (and a default implementation
+of) access to what we call a backstore. A backstore is, to us, an instance of the
+typeclass `Beutel` which contains the most primitive of primitives for storing, searching
+for and deleting representations of the objects we care about from the store.
+
+It is recommended that reader not try to follow the rest of this post linearly but start
+at the end with the definition of the `Beutel` class and work their way backwards.
+
+> {-# LANGUAGE FlexibleInstances, StandaloneDeriving, KindSignatures, MultiParamTypeClasses, TypeFamilies #-}
+> 
+> module Beuteltier.Types
+>        ( -- * Types
+>          Object
+>        , ObjectGen(..)
+>        , SubObject(..)
+>        , MetaData(..)
+>        , Thunk(..)
+>        , ThunkState(..)
+>        , ThunkResult(..)
+>        , Tag
+>        , Flag(..)
+>        , SubObjectName
+>        , ThunkName
+>        , SearchQuery
+>        , Predicate
+>        , Beutel(..)
+>        ) where
+
+`Flag` ends up being a [sum type](https://en.wikipedia.org/wiki/Sum_type) holding values
+such as `Seen`, `Old`, or `Hidden`.
+We define it externally.
+
+> import Beuteltier.Types.Flags
+
+The `Identity` functor serves as basis for many a Monadtransformer-stack.
+
+> import Data.Functor.Identity
+> import Data.Functor.Classes ()
+
+Binary contents are encoded as `ByteStrings`
+
+> import qualified Data.ByteString.Lazy as Lazy (ByteString)
+> import qualified Data.ByteString.Lazy as LBS
+
+Unicode text as `Text`
+
+> import Data.Text (Text)
+
+Long unicode text as lazy `Text`
+
+> import qualified Data.Text.Lazy as Lazy (Text)
+> import qualified Data.Text.Lazy as LT
+> 
+> import Data.Set (Set)
+> 
+> import Data.Map (Map)
+> 
+> import Data.Time (UTCTime)
+> 
+> import Data.Function (on)
+> import Data.Ord (comparing)
+> import Control.Applicative
+
+`Data.Default` provides some convenience when constructing extensive record structures.
+
+> import Data.Default
+
+The `boolexpr` package provides us with a structure for representing boolean expressions
+supporting functor operations and evaluation.
+
+> import Data.BoolExpr
+
+Previous iterations of Beuteltier acted on Objects that were kept completely in RAM during
+all operations.
+This proved to be unsustainable, not only because nubbing (deduplication in the store of
+all objects) tended to exceed all RAM constraints (>4GiB for a few hundred objects), but
+also because cheaper operations on objects, like presentation to the user, got painfully
+slow once large `SubObject`s (like videos) were introduced into the store.
+
+The straight forward solution was to enrich the `Object` structure with provisions for
+explicit lazyness and partial construction.
+
+> -- | We deal in, at runtime, partially retrieved Objects
+> data ObjectGen (f :: * -> *) = ObjectGen
+>                                { _oMeta :: f MetaData
+>                                  -- ^ An undetermined set of Metainformation
+>                                , _oContent :: f (Map SubObjectName (f SubObject))
+>                                  -- ^ A list of undetermined length of undetermined
+>                             'SubObject's with guaranteed unique 'SubObjectName's
+>                                , _oThunks :: f [f Thunk]
+>                                  -- ^ A list of undetermined length of undetermined Thunks.
+>                                  --   There is such a thing as thunk colissions (i.e.: two
+>                                  --   thunks promise or even create 'SubObject's with the
+>                                  --   same name).
+>                                  --   Precedence in such a case is to be as suggested by
+>                                  --   the list structure (later thunks override earlier ones).
+>                                }
+> 
+> instance Monad f => Default (ObjectGen f) where
+>   def = ObjectGen { _oContent = return def
+>                   , _oThunks = return def
+>                   , _oMeta = return def
+>                   }
+
+It is straight forward to collapse the more advanced representation of `Object`s back to
+the old behaviour by parametrising over the Identity functor, which is simply a newtype
+wrapper over the contained structure.
+
+> -- | An entirely retrieved Object
+> type Object = ObjectGen Identity
+> 
+> -- -- | The default 'Object' is empty except for metadata
+> -- instance Default Object where
+> --   def = ObjectGen { _oContent = return def
+> --                   , _oThunks = return def
+> --                   , _oMeta = return def
+> --                   }
+> 
+> -- | Equality simply gets deferred to all subcomponents
+> deriving instance Eq Object
+> 
+> -- | 'Object's compare as their 'MetaData'
+> instance Ord Object where
+>   compare = comparing _oMeta
+
+We would like to associate some set of meta information with all objects.
+Therefore, we do.
+
+> -- | Metadata associated with an Object
+> data MetaData = MetaData
+>                 { _mRetrieved :: UTCTime -- ^ Time of creation
+>                 , _mTags :: Set Tag -- ^ Tags such as the name of the author,
+>                                     --   the title of the work represented in
+>                                     --   the 'Object', ….
+>                                     --   We use something like @show . _mTags@
+>                                     --   to identify an 'Object' to the user
+>                 , _mFlags :: Set Flag -- ^ Flags such as \"Read\" or \"Spam\"
+>                 } deriving (Show, Ord)
+> -- | Tags are unicode text
+> type Tag = Text
+> 
+> -- | 'MetaData' equates as the contained tags
+> instance Eq MetaData where
+>   (==) = (==) `on` _mTags
+> 
+> -- | The default MetaData has no tags, no flags, and an undefined timestamp
+> instance Default MetaData where
+>   def = MetaData { _mFlags = def
+>                  , _mTags = def
+>                  , _mRetrieved = undefined -- There really is no such thing as a default time
+>                  }
+
+Objects are no fun if they don´t contain anything of interest in the end.
+
+Below we see a remnant of an older model of associating names to `SubObject`s. We switched
+to using a `Map` for reasons of deduplication. Inserting into a `Map` carries some
+guarantees that keys end up being unique.
+
+Note below: creation of a `SubObject` is an update. It is thus expected, that `SubObject`s
+created at the same time as the `Object` they are associated to contain encode an update
+time that matches the `Object`s creation time.
+
+> -- | Contents of an object
+> data SubObject = SubObject
+>                  -- { _sId :: SubObjectName
+>                       -- ^ We associate a name to every chunk of content to determine
+>                       --   how to present an object to the user
+>                  { _sContent :: Lazy.ByteString
+>                  , _sUpdates :: [UTCTime]
+>                    -- ^ Times of witnessed updates to this 'SubObject'
+>                  } deriving (Show)
+> 
+> -- | No content, no witnessed updates
+> instance Default SubObject where
+>   def = SubObject { _sContent = def
+>                   , _sUpdates = def
+>                   }
+> 
+> -- | Extensionality for 'SubObject's:
+> --   
+> --   > (==) = (==) `on` _sContent
+> instance Eq SubObject where
+>   (==) = (==) `on` _sContent
+
+The distinguishing feature of Beuteltier is it´s support for `Thunk`s. They are, as the
+name suggests, loosly based on the concept of lazy evaluation. They are, however, less
+transparent and thus more explicit than implementations as they are used in, for example
+haskell.
+
+As far as Beuteltier is concerned `Thunk`s are executables that are expected to produce
+files in the directory they are executed in in a pure manner. That is to say they do not
+access external resources, where possible. A `Thunk` that downloads a video from the
+internet will, of course, access the internet and can thus fail. We expect it, however, to
+not to try and access the users home directory to look for e.g. credentials for
+authentication it intends to use to its primary job.
+
+When a `Thunk`s executable gets executed the files it creates (excluding itself) get
+translated to `SubObject`s with the filenames (directories stripped of course) as their
+`SubObjectName`s and the file contents as their… well, their contents. It is understood,
+that not all possible `SubObjectName`s can be created thus (we restrict ourselves to valid
+filenames on whatever system we happen to be on). We do not consider this to be a great
+loss.
+
+The advanced equality checks mentioned below are, in fact, implemented and will be explained
+in more detail in a later post concerned with the file `beuteltier/Beuteltier/Types/Util.hs`.
+
+> -- | Thunks are at runtime not yet known parts of an object
+> data Thunk = Thunk
+>              { _tId :: ThunkName -- ^ For debugging
+>              , _tScript :: Lazy.ByteString
+>                -- ^ A Thunk is, in the end, a shell script that is expected to generate
+>                --  'SubObject's
+>              , _tPromises :: Maybe [SubObjectName]
+>                -- ^ Maybe we already know what our script is going to generate?
+>                --   This would enable us to do some more advanced equality checks under
+>                --   the assumption that scripts are pure
+>              , _tState :: ThunkState
+>              }
+>            deriving (Show)
+> 
+> -- | Empty id, empty script, promises nothing, and with default state
+> instance Default Thunk where
+>   def = Thunk { _tId = def
+>               , _tScript = def
+>               , _tPromises = def
+>               , _tState = def
+>               }
+> 
+> -- | Equality on 'Thunk's ignores '_tState' and '_tId'
+> instance Eq Thunk where
+>   a == b = and $ [ (==) `on` _tScript
+>                  , (==) `on` _tPromises
+>                  ] <*> pure a <*> pure b
+> 
+> -- | The states in which a 'Thunk' can be encountered.
+> data ThunkState = NotExecuted
+>                 | Executed [SubObjectName] ThunkResult
+>                 deriving (Show)
+> 
+> -- | Return the default 'ThunkResult' upon forcing
+> instance Default ThunkState where
+>   def = NotExecuted
+> 
+> -- | Thunks generate some data during execution
+> data ThunkResult = ThunkResult
+>                    { _rOutErr, _rOutStd :: Lazy.Text
+>                    , _rExit :: Integer -- ^ Numerical exit code (0 usually means success)
+>                    }
+>                  deriving (Show)
+> 
+> -- | Empty output, and with undefined exit code (no execution took place and we can´t
+> --   encode this in a numerical exit code)
+> instance Default ThunkResult where
+>   def = ThunkResult { _rOutErr = LT.empty, _rOutStd = LT.empty
+>                     , _rExit = undefined
+>                     }
+> 
+> -- | We expect identifiers for 'SubObject's to be short, thus 'String'
+> type SubObjectName = String
+> -- | We expect identifiers for 'Thunk's to be short, thus 'String'
+> type ThunkName = String
+> 
+> -- | @LBS.empty@
+> instance Default (Lazy.ByteString) where
+>   def = LBS.empty
+
+What good is a library for managing a backstore if it does not support search operations?
+We consider the answer to be "very little" and, thus, support searches.
+
+> type SearchQuery f = BoolExpr (Predicate f)
+> -- data Predicate f = Prim (ObjectGen f -> f Bool)
+> --                  | Meta (MetaData -> Bool)
+> type Predicate f = ObjectGen f -> f Bool
+
+The heart of the `Beuteltier` library is the typeclass reproduced below. We expect
+implementations of backstores to be `Monad`s so that we may be able to construct
+complicated actions that act on the backstore in question.
+Once we have constructed such an action using the three primitives `search`, `insert`, and
+`delete` we additionally require a way to execute that action from within the `IO`
+`Monad`.
+
+Additional primitives, such as those for "forcing" and resetting thunks, are provided in
+additional libraries and, thus, later posts.
+
+> -- | We have the user supply the functions we use to interact with whatever backstore
+> --   she uses
+> class Monad functor => Beutel (functor :: * -> *) where
+>   data Config :: *
+>   run :: Config -> functor a -> IO a
+>   -- ^ Actually run whatever action we constructed against the backstore
+>   search :: SearchQuery functor -> functor [ObjectGen functor]
+>   -- ^ Perform a search
+>   insert :: Object -> functor ()
+>   -- ^ Insert an object
+>   delete :: SearchQuery functor -> functor ()
+>   -- ^ Delete the results of a search
-- 
cgit v1.2.3