From c0e47f3793a9e875072bbae018ccb15a78cb53bb Mon Sep 17 00:00:00 2001 From: Michael Snoyman Date: Mon, 2 Jan 2017 07:00:27 +0200 Subject: [PATCH] Import Foldable mapM_ content from SoH --- posts.yaml | 3 + ...able-mapm-maybe-and-recursive-functions.md | 209 ++++++++++++++++++ 2 files changed, 212 insertions(+) create mode 100644 posts/foldable-mapm-maybe-and-recursive-functions.md diff --git a/posts.yaml b/posts.yaml index 4be27c5..6de193f 100644 --- a/posts.yaml +++ b/posts.yaml @@ -1,3 +1,6 @@ +- file: posts/foldable-mapm-maybe-and-recursive-functions.md + title: Foldable.mapM_, Maybe, and recursive functions + day: 2017-01-10 - file: posts/functors-applicatives-and-monads.md title: Functors, Applicatives, and Monads day: 2017-01-03 diff --git a/posts/foldable-mapm-maybe-and-recursive-functions.md b/posts/foldable-mapm-maybe-and-recursive-functions.md new file mode 100644 index 0000000..1109a73 --- /dev/null +++ b/posts/foldable-mapm-maybe-and-recursive-functions.md @@ -0,0 +1,209 @@ +__NOTE__ This content originally appeared on [School of Haskell]( +https://www.schoolofhaskell.com/user/snoyberg/general-haskell/basics/foldable-mapm-maybe-and-recursive-functions). + +I've +[run into this issue myself](https://github.com/snoyberg/conduit/commit/11877684b3adb7ca422ae5000fab1ebeb3fbe142), +and seen others hit it too. Let's start off with some very simple +code: + +```haskell +#!/usr/bin/env stack +-- stack --resolver lts-7.14 --install-ghc runghc +sayHi :: Maybe String -> IO () +sayHi mname = + case mname of + Nothing -> return () + Just name -> putStrLn $ "Hello, " ++ name + +main :: IO () +main = sayHi $ Just "Alice" +``` + +There's nothing amazing about this code, it's pretty straight-forward +pattern matching Haskell. And at some point, many Haskellers end up +deciding that they don't like the explicit pattern matching, and +instead want to use a combinator. So the code above might get turned +into one of the following: + +```haskell +#!/usr/bin/env stack +-- stack --resolver lts-7.14 --install-ghc runghc +import Data.Foldable (forM_) + +hiHelper :: String -> IO () +hiHelper name = putStrLn $ "Hello, " ++ name + +sayHi1 :: Maybe String -> IO () +sayHi1 = maybe (return ()) hiHelper + +sayHi2 :: Maybe String -> IO () +sayHi2 = mapM_ hiHelper + +main :: IO () +main = do + sayHi1 $ Just "Alice" + sayHi2 $ Just "Bob" + -- or often times this: + forM_ (Just "Charlie") hiHelper +``` + +The theory is that all three approaches (`maybe`, `mapM_`, and +`forM_`) will end up being identical. We can fairly conclusively state +that `forM_` will be the exact same thing as `mapM_`, since +[it's just `mapM_` flipped](https://www.stackage.org/haddock/lts-7.14/base-4.9.0.0/src/Data-Foldable.html#forM_). So +the question is: will the `maybe` and `mapM_` approaches do the same +thing? In this case, the answer is yes, but let's spice it up a bit +more. First, the `maybe` version: + + +```haskell +#!/usr/bin/env stack +-- stack --resolver lts-7.14 --install-ghc exec -- ghc -with-rtsopts -s +import Control.Monad (when) + +uncons :: [a] -> Maybe (a, [a]) +uncons [] = Nothing +uncons (x:xs) = Just (x, xs) + +printChars :: Int -> [Char] -> IO () +printChars idx str = maybe (return ()) (\(c, str') -> do + when (idx `mod` 100000 == 0) + $ putStrLn $ "Character #" ++ show idx ++ ": " ++ show c + printChars (idx + 1) str') (uncons str) + +main :: IO () +main = printChars 1 $ replicate 5000000 'x' +``` + +You can compile and run this by saving to a `Main.hs` file and running +`stack Main.hs && ./Main`. On my system, it prints out the following +memory statistics, which from the maximum residency you can see runs +in constant space: + +``` + 2,200,270,200 bytes allocated in the heap + 788,296 bytes copied during GC + 44,384 bytes maximum residency (2 sample(s)) + 24,528 bytes maximum slop + 1 MB total memory in use (0 MB lost due to fragmentation) +``` + +While constant space is good, the usage of `maybe` makes this a bit +ugly. This is a common time to use `forM_` to syntactically clean +things up. So let's give that a shot: + + +```haskell +#!/usr/bin/env stack +-- stack --resolver lts-7.14 --install-ghc exec -- ghc -with-rtsopts -s +import Control.Monad (when, forM_) + +uncons :: [a] -> Maybe (a, [a]) +uncons [] = Nothing +uncons (x:xs) = Just (x, xs) + +printChars :: Int -> [Char] -> IO () +printChars idx str = forM_ (uncons str) $ \(c, str') -> do + when (idx `mod` 100000 == 0) + $ putStrLn $ "Character #" ++ show idx ++ ": " ++ show c + printChars (idx + 1) str' + +main :: IO () +main = printChars 1 $ replicate 5000000 'x' +``` + +The code is arguablycleaner and easier to follow. However, when I run +it, I get the following memory stats: + +``` + 3,443,468,248 bytes allocated in the heap + 632,375,152 bytes copied during GC + 132,575,648 bytes maximum residency (11 sample(s)) + 2,348,288 bytes maximum slop + 331 MB total memory in use (0 MB lost due to fragmentation) +``` + +Notice how max residency has balooned up from 42kb to 132mb! And if +you increase the size of the generated list, that number grows. In +other words: we have _linear_ memory usage instead of constant, +clearer something we want to avoid. + +The issue is that the implementation of `mapM_` in `Data.Foldable` is +not tail recursive, at least for the case of `Maybe`. As a result, +each recursive call ends up accumulating a bunch of "do nothing" +actions to perform after completing the recursive call, which all +remain resident in memory until the entire list is traversed. + +Fortunately, solving this issue is pretty easy: write a tail-recursive +version of `forM_` for `Maybe`: + +```haskell +#!/usr/bin/env stack +-- stack --resolver lts-7.14 --install-ghc exec -- ghc -with-rtsopts -s +import Control.Monad (when) + +uncons :: [a] -> Maybe (a, [a]) +uncons [] = Nothing +uncons (x:xs) = Just (x, xs) + +forM_Maybe :: Monad m => Maybe a -> (a -> m ()) -> m () +forM_Maybe Nothing _ = return () +forM_Maybe (Just x) f = f x + +printChars :: Int -> [Char] -> IO () +printChars idx str = forM_Maybe (uncons str) $ \(c, str') -> do + when (idx `mod` 100000 == 0) + $ putStrLn $ "Character #" ++ show idx ++ ": " ++ show c + printChars (idx + 1) str' + +main :: IO () +main = printChars 1 $ replicate 5000000 'x' +``` + +This implementation once again runs in constant memory. + +There's one slight difference in the type of `forM_Maybe` and `forM_` +specialized to `Maybe`. The former takes a second argument of type `a +-> m ()`, while the latter takes a second argument of type `a -> m +b`. This difference is unfortunately necessary; if we try to get back +the original type signature, we have to add an extra action to wipe +out the return value, which again reintroduces the memory leak: + +```haskell +forM_Maybe :: Monad m => Maybe a -> (a -> m b) -> m () +forM_Maybe Nothing _ = return () +forM_Maybe (Just x) f = f x >> return () +``` + +Try swapping in this implementation into the above program, and once +again you'll get your memory leak. + +## mono-traversable + +Back in 2014, I raised this same issue +[about the mono-traversable library](https://github.com/snoyberg/mono-traversable/issues/28), +and ultimately decided to change the type signature of the `omapM_` +function to the non-overflowing demonstrated above. You can see that +this in fact works: + +```haskell +#!/usr/bin/env stack +-- stack --resolver lts-7.14 --install-ghc exec --package mono-traversable -- ghc -with-rtsopts -s +import Control.Monad (when) +import Data.MonoTraversable (oforM_) + +uncons :: [a] -> Maybe (a, [a]) +uncons [] = Nothing +uncons (x:xs) = Just (x, xs) + +printChars :: Int -> [Char] -> IO () +printChars idx str = oforM_ (uncons str) $ \(c, str') -> do + when (idx `mod` 100000 == 0) + $ putStrLn $ "Character #" ++ show idx ++ ": " ++ show c + printChars (idx + 1) str' + +main :: IO () +main = printChars 1 $ replicate 5000000 'x' +``` + +As we'd hope, this runs in constant memory.