From 46cf2aa1fb307b194660ba3b2bcd8fe400a49704 Mon Sep 17 00:00:00 2001 From: Gregor Kleen Date: Tue, 4 Aug 2015 15:35:21 +0200 Subject: beuteltier-1 --- provider/posts/beuteltier-1.lhs | 325 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 325 insertions(+) create mode 100644 provider/posts/beuteltier-1.lhs (limited to 'provider/posts') diff --git a/provider/posts/beuteltier-1.lhs b/provider/posts/beuteltier-1.lhs new file mode 100644 index 0000000..c8f3e6c --- /dev/null +++ b/provider/posts/beuteltier-1.lhs @@ -0,0 +1,325 @@ +--- +title: On the Design of Overly Complicated Feedreaders +published: 2015-08-04 +--- + +I like feedreaders. +Thus, of course, I had to implement my own, because, as always, all existing software does +not fullfill my exceedingly unrealistic expectations with respect to customizability and +extendability. + +This post marks the start of a series describing and documenting the design of the current +iteration of `Beuteltier` (a derivation of [Newsbeuter](https://newsbeuter.org) and the +german `Beutel`, meaning bag, to mean in conjunction: +[Marsupial](https://en.wikipedia.org/wiki/Marsupial)). + +It should be noted that the library described here is not finished or ready for use in any +sense of the word (at the time of writing a "trivial" implementation of a `Beutel` shipped +with the library supports only `run`, `search`, and `delete`). Searching a way to +procrastinate implementing the more arduous `insert` (it requires nubbing—deduplication in +the backstore) I decided to, instead, start this series of posts and put the thought that +went into the library so far in a form that I can read again for later reference. + +We begin, as is to be expected for a haskell project, with type definitions and, thus, +design philosophy. + +This post in particular reproduces the file `beuteltier/Beuteltier/Types.hs` from the +[git repo](git://git.yggdrasil.li/beuteltier) with annotiations to provide some +motivation. + +The `Beuteltier` library itself only provides primitives for (and a default implementation +of) access to what we call a backstore. A backstore is, to us, an instance of the +typeclass `Beutel` which contains the most primitive of primitives for storing, searching +for and deleting representations of the objects we care about from the store. + +It is recommended that reader not try to follow the rest of this post linearly but start +at the end with the definition of the `Beutel` class and work their way backwards. + +> {-# LANGUAGE FlexibleInstances, StandaloneDeriving, KindSignatures, MultiParamTypeClasses, TypeFamilies #-} +> +> module Beuteltier.Types +> ( -- * Types +> Object +> , ObjectGen(..) +> , SubObject(..) +> , MetaData(..) +> , Thunk(..) +> , ThunkState(..) +> , ThunkResult(..) +> , Tag +> , Flag(..) +> , SubObjectName +> , ThunkName +> , SearchQuery +> , Predicate +> , Beutel(..) +> ) where + +`Flag` ends up being a [sum type](https://en.wikipedia.org/wiki/Sum_type) holding values +such as `Seen`, `Old`, or `Hidden`. +We define it externally. + +> import Beuteltier.Types.Flags + +The `Identity` functor serves as basis for many a Monadtransformer-stack. + +> import Data.Functor.Identity +> import Data.Functor.Classes () + +Binary contents are encoded as `ByteStrings` + +> import qualified Data.ByteString.Lazy as Lazy (ByteString) +> import qualified Data.ByteString.Lazy as LBS + +Unicode text as `Text` + +> import Data.Text (Text) + +Long unicode text as lazy `Text` + +> import qualified Data.Text.Lazy as Lazy (Text) +> import qualified Data.Text.Lazy as LT +> +> import Data.Set (Set) +> +> import Data.Map (Map) +> +> import Data.Time (UTCTime) +> +> import Data.Function (on) +> import Data.Ord (comparing) +> import Control.Applicative + +`Data.Default` provides some convenience when constructing extensive record structures. + +> import Data.Default + +The `boolexpr` package provides us with a structure for representing boolean expressions +supporting functor operations and evaluation. + +> import Data.BoolExpr + +Previous iterations of Beuteltier acted on Objects that were kept completely in RAM during +all operations. +This proved to be unsustainable, not only because nubbing (deduplication in the store of +all objects) tended to exceed all RAM constraints (>4GiB for a few hundred objects), but +also because cheaper operations on objects, like presentation to the user, got painfully +slow once large `SubObject`s (like videos) were introduced into the store. + +The straight forward solution was to enrich the `Object` structure with provisions for +explicit lazyness and partial construction. + +> -- | We deal in, at runtime, partially retrieved Objects +> data ObjectGen (f :: * -> *) = ObjectGen +> { _oMeta :: f MetaData +> -- ^ An undetermined set of Metainformation +> , _oContent :: f (Map SubObjectName (f SubObject)) +> -- ^ A list of undetermined length of undetermined +> 'SubObject's with guaranteed unique 'SubObjectName's +> , _oThunks :: f [f Thunk] +> -- ^ A list of undetermined length of undetermined Thunks. +> -- There is such a thing as thunk colissions (i.e.: two +> -- thunks promise or even create 'SubObject's with the +> -- same name). +> -- Precedence in such a case is to be as suggested by +> -- the list structure (later thunks override earlier ones). +> } +> +> instance Monad f => Default (ObjectGen f) where +> def = ObjectGen { _oContent = return def +> , _oThunks = return def +> , _oMeta = return def +> } + +It is straight forward to collapse the more advanced representation of `Object`s back to +the old behaviour by parametrising over the Identity functor, which is simply a newtype +wrapper over the contained structure. + +> -- | An entirely retrieved Object +> type Object = ObjectGen Identity +> +> -- -- | The default 'Object' is empty except for metadata +> -- instance Default Object where +> -- def = ObjectGen { _oContent = return def +> -- , _oThunks = return def +> -- , _oMeta = return def +> -- } +> +> -- | Equality simply gets deferred to all subcomponents +> deriving instance Eq Object +> +> -- | 'Object's compare as their 'MetaData' +> instance Ord Object where +> compare = comparing _oMeta + +We would like to associate some set of meta information with all objects. +Therefore, we do. + +> -- | Metadata associated with an Object +> data MetaData = MetaData +> { _mRetrieved :: UTCTime -- ^ Time of creation +> , _mTags :: Set Tag -- ^ Tags such as the name of the author, +> -- the title of the work represented in +> -- the 'Object', …. +> -- We use something like @show . _mTags@ +> -- to identify an 'Object' to the user +> , _mFlags :: Set Flag -- ^ Flags such as \"Read\" or \"Spam\" +> } deriving (Show, Ord) +> -- | Tags are unicode text +> type Tag = Text +> +> -- | 'MetaData' equates as the contained tags +> instance Eq MetaData where +> (==) = (==) `on` _mTags +> +> -- | The default MetaData has no tags, no flags, and an undefined timestamp +> instance Default MetaData where +> def = MetaData { _mFlags = def +> , _mTags = def +> , _mRetrieved = undefined -- There really is no such thing as a default time +> } + +Objects are no fun if they don´t contain anything of interest in the end. + +Below we see a remnant of an older model of associating names to `SubObject`s. We switched +to using a `Map` for reasons of deduplication. Inserting into a `Map` carries some +guarantees that keys end up being unique. + +Note below: creation of a `SubObject` is an update. It is thus expected, that `SubObject`s +created at the same time as the `Object` they are associated to contain encode an update +time that matches the `Object`s creation time. + +> -- | Contents of an object +> data SubObject = SubObject +> -- { _sId :: SubObjectName +> -- ^ We associate a name to every chunk of content to determine +> -- how to present an object to the user +> { _sContent :: Lazy.ByteString +> , _sUpdates :: [UTCTime] +> -- ^ Times of witnessed updates to this 'SubObject' +> } deriving (Show) +> +> -- | No content, no witnessed updates +> instance Default SubObject where +> def = SubObject { _sContent = def +> , _sUpdates = def +> } +> +> -- | Extensionality for 'SubObject's: +> -- +> -- > (==) = (==) `on` _sContent +> instance Eq SubObject where +> (==) = (==) `on` _sContent + +The distinguishing feature of Beuteltier is it´s support for `Thunk`s. They are, as the +name suggests, loosly based on the concept of lazy evaluation. They are, however, less +transparent and thus more explicit than implementations as they are used in, for example +haskell. + +As far as Beuteltier is concerned `Thunk`s are executables that are expected to produce +files in the directory they are executed in in a pure manner. That is to say they do not +access external resources, where possible. A `Thunk` that downloads a video from the +internet will, of course, access the internet and can thus fail. We expect it, however, to +not to try and access the users home directory to look for e.g. credentials for +authentication it intends to use to its primary job. + +When a `Thunk`s executable gets executed the files it creates (excluding itself) get +translated to `SubObject`s with the filenames (directories stripped of course) as their +`SubObjectName`s and the file contents as their… well, their contents. It is understood, +that not all possible `SubObjectName`s can be created thus (we restrict ourselves to valid +filenames on whatever system we happen to be on). We do not consider this to be a great +loss. + +The advanced equality checks mentioned below are, in fact, implemented and will be explained +in more detail in a later post concerned with the file `beuteltier/Beuteltier/Types/Util.hs`. + +> -- | Thunks are at runtime not yet known parts of an object +> data Thunk = Thunk +> { _tId :: ThunkName -- ^ For debugging +> , _tScript :: Lazy.ByteString +> -- ^ A Thunk is, in the end, a shell script that is expected to generate +> -- 'SubObject's +> , _tPromises :: Maybe [SubObjectName] +> -- ^ Maybe we already know what our script is going to generate? +> -- This would enable us to do some more advanced equality checks under +> -- the assumption that scripts are pure +> , _tState :: ThunkState +> } +> deriving (Show) +> +> -- | Empty id, empty script, promises nothing, and with default state +> instance Default Thunk where +> def = Thunk { _tId = def +> , _tScript = def +> , _tPromises = def +> , _tState = def +> } +> +> -- | Equality on 'Thunk's ignores '_tState' and '_tId' +> instance Eq Thunk where +> a == b = and $ [ (==) `on` _tScript +> , (==) `on` _tPromises +> ] <*> pure a <*> pure b +> +> -- | The states in which a 'Thunk' can be encountered. +> data ThunkState = NotExecuted +> | Executed [SubObjectName] ThunkResult +> deriving (Show) +> +> -- | Return the default 'ThunkResult' upon forcing +> instance Default ThunkState where +> def = NotExecuted +> +> -- | Thunks generate some data during execution +> data ThunkResult = ThunkResult +> { _rOutErr, _rOutStd :: Lazy.Text +> , _rExit :: Integer -- ^ Numerical exit code (0 usually means success) +> } +> deriving (Show) +> +> -- | Empty output, and with undefined exit code (no execution took place and we can´t +> -- encode this in a numerical exit code) +> instance Default ThunkResult where +> def = ThunkResult { _rOutErr = LT.empty, _rOutStd = LT.empty +> , _rExit = undefined +> } +> +> -- | We expect identifiers for 'SubObject's to be short, thus 'String' +> type SubObjectName = String +> -- | We expect identifiers for 'Thunk's to be short, thus 'String' +> type ThunkName = String +> +> -- | @LBS.empty@ +> instance Default (Lazy.ByteString) where +> def = LBS.empty + +What good is a library for managing a backstore if it does not support search operations? +We consider the answer to be "very little" and, thus, support searches. + +> type SearchQuery f = BoolExpr (Predicate f) +> -- data Predicate f = Prim (ObjectGen f -> f Bool) +> -- | Meta (MetaData -> Bool) +> type Predicate f = ObjectGen f -> f Bool + +The heart of the `Beuteltier` library is the typeclass reproduced below. We expect +implementations of backstores to be `Monad`s so that we may be able to construct +complicated actions that act on the backstore in question. +Once we have constructed such an action using the three primitives `search`, `insert`, and +`delete` we additionally require a way to execute that action from within the `IO` +`Monad`. + +Additional primitives, such as those for "forcing" and resetting thunks, are provided in +additional libraries and, thus, later posts. + +> -- | We have the user supply the functions we use to interact with whatever backstore +> -- she uses +> class Monad functor => Beutel (functor :: * -> *) where +> data Config :: * +> run :: Config -> functor a -> IO a +> -- ^ Actually run whatever action we constructed against the backstore +> search :: SearchQuery functor -> functor [ObjectGen functor] +> -- ^ Perform a search +> insert :: Object -> functor () +> -- ^ Insert an object +> delete :: SearchQuery functor -> functor () +> -- ^ Delete the results of a search -- cgit v1.2.3