2016-09-23 12:02:36 +03:00

In a few different conversations I've had with people, the idea of
reskinning some of the surface syntax of the
[conduit library](https://github.com/snoyberg/conduit#readme) has come
up, and I wanted to share the idea here. I call this "reskinning"
since all of the core functionality of conduit would remain unchanged
in this proposal; we'd just be changing operators and functions a bit.

The idea is this: conduit borrowed the operator syntax of `$$`, `=$`,
and `$=` from enumerator, which made sense at the beginning of its
lifecycle. However, for quite a while now conduit has had a single
unified type, `ConduitM`, for `Source`s, `Conduit`s, and `Sink`s, and
the disparity of operators adds more confusion than it may be worth.
So without further ado, let's compare a few examples of conduit usage
with the current skin:
```haskell
import Conduit
import qualified Data.Conduit.Binary as CB

main :: IO ()
main = do
    -- copy files
    runResourceT $ CB.sourceFile "source.txt" $$ sinkFile "dest.txt"

    -- sum some numbers
    print $ runIdentity $ enumFromToC 1 100 $$ sumC

    -- print a bunch of numbers
    enumFromToC 1 100 $$ mapC (* 2) =$ takeWhileC (< 100) =$ mapM_C print
```
With a proposed reskin:
```haskell
import Conduit2
import qualified Data.Conduit.Binary as CB

main :: IO ()
main = do
    -- copy files
    runConduitRes $ CB.sourceFile "source.txt" .| sinkFile "dest.txt"

    -- sum some numbers
    print $ runConduitPure $ enumFromToC 1 100 .| sumC

    -- print a bunch of numbers
    runConduit $ enumFromToC 1 100 .| mapC (* 2) .| takeWhileC (< 100) .| mapM_C print
```
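To make the "unified type" point concrete: a source, a conduit, and a sink are all values of `ConduitM i o m r`, differing only in their type parameters, which is why one fusion operator can join any adjacent pair and the result is again a `ConduitM`. The sketch below is illustrative only. It assumes just `Data.Conduit` and `Data.Conduit.List`, uses the existing `fuse` function (the one the module below builds `.|` from), and the names `numbers`, `doubled`, `total`, and `doubledNumbers` are hypothetical.

```haskell
import Data.Conduit (ConduitM, fuse, runConduit)
import qualified Data.Conduit.List as CL
import Data.Functor.Identity (runIdentity)
import Data.Void (Void)

-- Illustrative sketch: all names below are hypothetical examples.

-- A "source": takes no real input (()), yields Ints, returns ().
numbers :: Monad m => ConduitM () Int m ()
numbers = CL.sourceList [1 .. 10]

-- A "conduit": consumes Ints, yields Ints, returns ().
doubled :: Monad m => ConduitM Int Int m ()
doubled = CL.map (* 2)

-- A "sink": consumes Ints, yields nothing (Void), returns a result.
total :: Monad m => ConduitM Int Void m Int
total = CL.fold (+) 0

-- Fusing a source with a conduit is just another source...
doubledNumbers :: Monad m => ConduitM () Int m ()
doubledNumbers = numbers `fuse` doubled

-- ...and fusing that with a sink gives a complete pipeline to run.
result :: Int
result = runIdentity (runConduit (doubledNumbers `fuse` total))
```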
This reskin is easily defined with this module:
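```haskell
{-# LANGUAGE FlexibleContexts #-}
module Conduit2
    ( module Conduit
    , module Conduit2
    ) where

import Conduit hiding (($$), (=$), ($=), (=$=))
import Data.Void (Void)

-- Right-associative and binding tighter than ($), so
-- `runConduit $ a .| b .| c` needs no parentheses.
infixr 2 .|

-- | Fusion: feed the upstream component's output into the downstream
-- component, keeping the downstream component's result.
(.|) :: Monad m
     => ConduitM a b m ()
     -> ConduitM b c m r
     -> ConduitM a c m r
(.|) = fuse

-- | Run a pure pipeline (in Identity) and extract its result.
runConduitPure :: ConduitM () Void Identity r -> r
runConduitPure = runIdentity . runConduit

-- | Run a pipeline inside ResourceT, releasing any allocated
-- resources (such as file handles) when it finishes.
runConduitRes :: MonadBaseControl IO m
              => ConduitM () Void (ResourceT m) r
              -> m r
runConduitRes = runResourceT . runConduit
```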
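The `infixr 2` declaration is what lets `.|` chain after `runConduit $` without parentheses: precedence 2 binds tighter than `$` (precedence 0), and right association means `a .| b .| c` parses as `a .| (b .| c)`. Because fusion is associative, grouping the same pipeline to the left should give the same output. The sketch below checks this for one small pure pipeline; it assumes only `Data.Conduit` (imported qualified to avoid any name clash) and `Data.Conduit.List`, repeats the `.|` definition so it stands alone, and `runPure`, `rightNested`, and `leftNested` are hypothetical names.

```haskell
import qualified Data.Conduit as C
import qualified Data.Conduit.List as CL
import Data.Functor.Identity (Identity, runIdentity)
import Data.Void (Void)

-- Same definition as in the Conduit2 module, repeated so this
-- snippet is self-contained.
infixr 2 .|
(.|) :: Monad m
     => C.ConduitM a b m ()
     -> C.ConduitM b c m r
     -> C.ConduitM a c m r
(.|) = C.fuse

-- Hypothetical helper: run a pure pipeline and return its result.
runPure :: C.ConduitM () Void Identity r -> r
runPure = runIdentity . C.runConduit

-- No parentheses needed; this parses as
-- sourceList .| (map .| (filter .| consume)).
rightNested :: [Int]
rightNested =
  runPure $ CL.sourceList [1 .. 10] .| CL.map (* 3) .| CL.filter even .| CL.consume

-- The same stages with explicit left nesting.
leftNested :: [Int]
leftNested =
  runPure $ ((CL.sourceList [1 .. 10] .| CL.map (* 3)) .| CL.filter even) .| CL.consume
```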
To put this in words:

* Replace the `$=`, `=$`, and `=$=` operators, which are all synonyms
  of each other, with the `.|` operator. This borrows intuition from
  the Unix shell, where the pipe operator denotes piping data from one
  process to another. The analogy holds really well for conduit, so
  why not borrow it? (We call all of these operators "fusion.")
* Get rid of the `$$` operator (also known as the "connect" or
  "fuse-and-run" operator) entirely. Instead of having this two-in-one
  action, separate it into `.|` and `runConduit`. The advantage is that
  no one needs to decide between `$$` and a fusion operator, as happens
  today. (Note that `runConduit` is available in the conduit library
  today; it's just not very well promoted.)
* Now that `runConduit` is a first-class citizen, add some helper
  functions for two common use cases: running with `ResourceT` and
  running a pure conduit.
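
Dropping `$$` loses no expressiveness: connecting a source to a sink with `$$` should give the same result as fusing them and then calling `runConduit`. A minimal sketch against today's `Data.Conduit` and `Data.Conduit.List`, using the current `=$=` operator for fusion; `connectAndRun`, `viaConnect`, and `viaFuseAndRun` are hypothetical names introduced only to make the two-step decomposition explicit.

```haskell
import Data.Conduit (ConduitM, runConduit, ($$), (=$=))
import qualified Data.Conduit.List as CL
import Data.Functor.Identity (runIdentity)
import Data.Void (Void)

-- Hypothetical helper: "connect" expressed as fusion followed by running.
connectAndRun :: Monad m => ConduitM () a m () -> ConduitM a Void m r -> m r
connectAndRun src sink = runConduit (src =$= sink)

-- The two-in-one operator...
viaConnect :: Int
viaConnect = runIdentity (CL.sourceList [1 .. 100] $$ CL.fold (+) 0)

-- ...and the same pipeline split into its two steps.
viaFuseAndRun :: Int
viaFuseAndRun = runIdentity (connectAndRun (CL.sourceList [1 .. 100]) (CL.fold (+) 0))
```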

The goals here are to improve consistency, readability, and intuition
about the library. Of course, there are some downsides:

* There's a slight performance advantage (unfortunately not benchmarked
  recently) to `foo $$ bar` versus `runConduit $ foo =$= bar`, since
  the former combines both sets of actions into one. We may be able to
  gain some of this back with GHC rewrite rules, but my experience with
  rewrite rules in conduit has been less than reliable.
* Inertia: there's a lot of code and material out there using the
  current set of operators. While we don't ever need to remove (or
  even deprecate) the current operators, having two ways of writing
  conduit code in the wild can be confusing.
* Conflicting operator: a
  [quick Hoogle search](https://www.stackage.org/lts-7.0/hoogle?q=.%7C)
  reveals that the parallel package already uses `.|`. We could choose
  a different operator instead
  ([`|.`](https://www.stackage.org/lts-7.0/hoogle?q=%7C.), for
  instance, seems unclaimed), but generally I get nervous any time I'm
  defining new operators.
* For simple cases like `source $$ sink`, code is now quite a few
  keystrokes longer: `runConduit $ source .| sink`.

Code-wise, this is a trivial change to implement, and updating the docs
to follow the new convention wouldn't be too difficult either. The
question is: is this a good idea?