247 lines
7.4 KiB
Markdown
247 lines
7.4 KiB
Markdown
|
This blog post addresses a long-standing FIXME in the
|
|||
|
conduit-combinators documentation, as well as
|
|||
|
[a question on Twitter](https://twitter.com/bitemyapp/status/860919651957710849). This
|
|||
|
blog post will assume familiarity with the Conduit streaming data
|
|||
|
library; if you'd like to read up on it first, please
|
|||
|
[check out the tutorial](https://haskell-lang.org/library/conduit). The
|
|||
|
full executable snippet is at the end of this blog post, but we'll
|
|||
|
build up intermediate bits along the way. First, the
|
|||
|
[Stack](https://haskell-lang.org/get-started) script header, import
|
|||
|
statement, and some minor helper functions.
|
|||
|
|
|||
|
```haskell
|
|||
|
#!/usr/bin/env stack
|
|||
|
--stack --resolver lts-8.12 script
|
|||
|
import Conduit
|
|||
|
|
|||
|
src10 :: Monad m => ConduitM i Int m ()
|
|||
|
src10 = yieldMany [1..10]
|
|||
|
|
|||
|
remaining :: MonadIO m => ConduitM i o m ()
|
|||
|
remaining = lengthC >>= \x -> liftIO (putStrLn ("Remaining: " ++ show x))
|
|||
|
```
|
|||
|
|
|||
|
`src10` just provides the numbers 1 through 10 as a source, and
|
|||
|
`remaining` tells you how many values are remaining from
|
|||
|
upstream. Cool.
|
|||
|
|
|||
|
Now let's pretend that the Conduit libraries completely forgot to
|
|||
|
provide a `drop` function. That is, a function that will take an `Int`
|
|||
|
and discard that many values from the upstream. We could write one
|
|||
|
ourselves pretty easily:
|
|||
|
|
|||
|
```haskell
|
|||
|
dropSink :: Monad m => Int -> ConduitM i o m ()
|
|||
|
dropSink cnt
|
|||
|
| cnt <= 0 = return ()
|
|||
|
| otherwise = await >> dropSink (cnt - 1)
|
|||
|
```
|
|||
|
|
|||
|
(Bonus points to readers: this function is inefficient in the case
|
|||
|
that upstream has less than `cnt` values, optimize it.)
|
|||
|
|
|||
|
This function will drop a certain number of elements from upstream, so
|
|||
|
the next component we monadically bind with can pick it up. Let's see
|
|||
|
how that looks:
|
|||
|
|
|||
|
```haskell
|
|||
|
goodDropSink :: IO ()
|
|||
|
goodDropSink = runConduit
|
|||
|
$ src10
|
|||
|
.| (dropSink 5 >> remaining)
|
|||
|
```
|
|||
|
|
|||
|
All well and good. But notice two things:
|
|||
|
|
|||
|
* I called this `dropSink`. Why sink?
|
|||
|
* I stressed that we had to monadically bind. Why?
|
|||
|
|
|||
|
Well, there's another formulation of this drop function. Instead of
|
|||
|
letting the next monadically bound component pick up remaining values,
|
|||
|
we could _pass the remaining values downstream_. Fortunately it's
|
|||
|
really easy to implement this function in terms of `dropSink`:
|
|||
|
|
|||
|
```haskell
|
|||
|
dropTrans :: Monad m => Int -> ConduitM i i m ()
|
|||
|
dropTrans cnt = dropSink cnt >> mapC id
|
|||
|
```
|
|||
|
|
|||
|
(For more meaningless bonus points, feel free to implement this
|
|||
|
without `dropSink`, or for a greater challenge, implement `dropSink`
|
|||
|
in terms of `dropTrans`.) Anyway, this function can be used easily as:
|
|||
|
|
|||
|
```haskell
|
|||
|
goodDropTrans :: IO ()
|
|||
|
goodDropTrans = runConduit
|
|||
|
$ src10
|
|||
|
.| dropTrans 5
|
|||
|
.| remaining
|
|||
|
```
|
|||
|
|
|||
|
Many may argue that this is more natural. To some extent, it mirrors
|
|||
|
the behavior of `take` more closely, as `take` passes the initial
|
|||
|
values downstream. On the other hand, `dropTrans` cannot _guarantee_
|
|||
|
that the values will be removed from the stream; if instead of
|
|||
|
`dropTrans 5 .| remaining` I simply did `dropTrans 5 .| return ()`,
|
|||
|
then the `dropTrans` would never have a chance to fire, since
|
|||
|
execution is driven from downstream. Also, as demonstrated, it's
|
|||
|
really easy to capture this transformer behavior from the sink
|
|||
|
behavior; the other way is trickier.
|
|||
|
|
|||
|
My point here is that we have two legitimate definitions of a
|
|||
|
function. And from my experience, different people expect different
|
|||
|
behavior for the function. In fact, some people (myself included)
|
|||
|
intuitively expect different behavior _depending on the circumstance_!
|
|||
|
This is what earns `drop` the title of worst function in conduit.
|
|||
|
|
|||
|
To make it even more clear how bad this is, let's see how you can
|
|||
|
misuse these functions unintentionally.
|
|||
|
|
|||
|
```haskell
|
|||
|
badDropSink :: IO ()
|
|||
|
badDropSink = runConduit
|
|||
|
$ src10
|
|||
|
.| dropSink 5
|
|||
|
.| remaining
|
|||
|
```
|
|||
|
|
|||
|
This code looks perfectly reasonable, and if we just replaced
|
|||
|
`dropSink` with `dropTrans`, it would be correct. But instead of
|
|||
|
saying, as expected, that we have 5 values remaining, this will
|
|||
|
print 0. The reason: `src10` yields 10 values to
|
|||
|
`dropSink`. `dropSink` drops 5 of those and leaves the remaining 5
|
|||
|
untouched. But `dropSink` never itself yields a value downstream, so
|
|||
|
`remaining` receives nothing.
|
|||
|
|
|||
|
Because of the type system, it's slightly trickier to misuse
|
|||
|
`dropTrans`. Let's first do the naive thing of just assuming it's
|
|||
|
`dropSink`:
|
|||
|
|
|||
|
```haskell
|
|||
|
badDropTrans :: IO ()
|
|||
|
badDropTrans = runConduit
|
|||
|
$ src10
|
|||
|
.| (dropTrans 5 >> remaining)
|
|||
|
```
|
|||
|
|
|||
|
GHC does not like this one bit:
|
|||
|
|
|||
|
```
|
|||
|
error:
|
|||
|
• Couldn't match type ‘Int’ with ‘Data.Void.Void’
|
|||
|
Expected type: ConduitM () Data.Void.Void IO ()
|
|||
|
Actual type: ConduitM () Int IO ()
|
|||
|
```
|
|||
|
|
|||
|
The problem is that `runConduit` expects a pipeline where the final
|
|||
|
output value is `Void`. However, `dropTrans` has an output value of
|
|||
|
type `Int`. And if it's yielding `Int`s, so must `remaining`. This is
|
|||
|
definitely an argument in favor of `dropTrans` being the better
|
|||
|
function: the type system helps us a bit. (It's also an argument in
|
|||
|
favor of keeping
|
|||
|
[the type signature of `runConduit` as-is](http://www.snoyman.com/blog/2017/04/generalizing-type-signatures).)
|
|||
|
|
|||
|
However, it's still possible to accidentally screw things up in bigger
|
|||
|
pipelines, e.g.:
|
|||
|
|
|||
|
```haskell
|
|||
|
badDropTrans :: IO ()
|
|||
|
badDropTrans = runConduit
|
|||
|
$ src10
|
|||
|
.| (dropTrans 5 >> remaining)
|
|||
|
.| (sinkList >>= liftIO . print)
|
|||
|
```
|
|||
|
|
|||
|
This code may look a bit contrived, but in real-world Conduit code
|
|||
|
it's not at all uncommon to deeply nest these components in such a way
|
|||
|
that the error would not be present. You may be surprised to hear that
|
|||
|
the output of this program is:
|
|||
|
|
|||
|
```
|
|||
|
Remaining: 0
|
|||
|
[6,7,8,9,10]
|
|||
|
```
|
|||
|
|
|||
|
The reason is that the `sinkList` is downstream from `dropTrans`, and
|
|||
|
grabs all of its output. `dropTrans` itself will drain all output from
|
|||
|
`src10`, leaving nothing behind for `remaining` to grab.
|
|||
|
|
|||
|
The Conduit libraries use the `dropSink` variety of function. I wish
|
|||
|
there was a better approach here that felt more intuitive to
|
|||
|
everyone. The closest I can think of to that is deprecating `drop` and
|
|||
|
replacing it with more explicitly named `dropSink` and `dropTrans`,
|
|||
|
but I'm not sure how I feel about that (feedback welcome, and other
|
|||
|
ideas _certainly_ welcome).
|
|||
|
|
|||
|
* * *
|
|||
|
|
|||
|
Full code
|
|||
|
|
|||
|
```haskell
|
|||
|
#!/usr/bin/env stack
|
|||
|
--stack --resolver lts-8.12 script
|
|||
|
import Conduit
|
|||
|
|
|||
|
dropSink :: Monad m => Int -> ConduitM i o m ()
|
|||
|
dropSink cnt
|
|||
|
| cnt <= 0 = return ()
|
|||
|
| otherwise = await >> dropSink (cnt - 1)
|
|||
|
|
|||
|
dropTrans :: Monad m => Int -> ConduitM i i m ()
|
|||
|
dropTrans cnt = dropSink cnt >> mapC id
|
|||
|
|
|||
|
src10 :: Monad m => ConduitM i Int m ()
|
|||
|
src10 = yieldMany [1..10]
|
|||
|
|
|||
|
remaining :: MonadIO m => ConduitM i o m ()
|
|||
|
remaining = lengthC >>= \x -> liftIO (putStrLn ("Remaining: " ++ show x))
|
|||
|
|
|||
|
goodDropSink :: IO ()
|
|||
|
goodDropSink = runConduit
|
|||
|
$ src10
|
|||
|
.| (dropSink 5 >> remaining)
|
|||
|
|
|||
|
badDropSink :: IO ()
|
|||
|
badDropSink = runConduit
|
|||
|
$ src10
|
|||
|
.| dropSink 5
|
|||
|
.| remaining
|
|||
|
|
|||
|
goodDropTrans :: IO ()
|
|||
|
goodDropTrans = runConduit
|
|||
|
$ src10
|
|||
|
.| dropTrans 5
|
|||
|
.| remaining
|
|||
|
|
|||
|
badDropTrans :: IO ()
|
|||
|
badDropTrans = runConduit
|
|||
|
$ src10
|
|||
|
.| (dropTrans 5 >> remaining)
|
|||
|
.| (sinkList >>= liftIO . print)
|
|||
|
|
|||
|
main :: IO ()
|
|||
|
main = do
|
|||
|
putStrLn "Good drop sink"
|
|||
|
goodDropSink
|
|||
|
putStrLn "Bad drop sink"
|
|||
|
badDropSink
|
|||
|
putStrLn "Good drop trans"
|
|||
|
goodDropTrans
|
|||
|
putStrLn "Bad drop trans"
|
|||
|
badDropTrans
|
|||
|
```
|
|||
|
|
|||
|
Full output
|
|||
|
|
|||
|
```
|
|||
|
Good drop sink
|
|||
|
Remaining: 5
|
|||
|
Bad drop sink
|
|||
|
Remaining: 0
|
|||
|
Good drop trans
|
|||
|
Remaining: 5
|
|||
|
Bad drop trans
|
|||
|
Remaining: 0
|
|||
|
[6,7,8,9,10]
|
|||
|
```
|