---
title: What Makes Haskell Unique
---

### What Makes Haskell Unique

* Michael Snoyman
* VP of Engineering, FP Complete
* F(by) 2017
---

## Why uniqueness matters

* Programmers have lots of options
* Need to know what distinguishes programming languages
* Need to understand what makes Haskell different from other languages

----

## Is Haskell functional?

* What even is a functional language?
* Lax definition
  * First class functions
  * Higher order functions
* Wait... is C functional?

----

__Haskell may be functional, but that doesn't make it unique__

----

## Let's Describe Haskell

* Functional
* Statically typed
* Pure
* Lazy
* Strongly typed
* Green threads
* Native executables
* Garbage collected
* Immutability

----

__It's the combination of these features that makes Haskell unique__

* Example: purity + strong typing + functional style leads to:
  * Easy to write
  * Easy to read
  * Easy to modify
  * Efficient
    * We'll get to this later
* Now: lots of examples!
* Different here is usually better, but some downsides

---

## Async I/O and Concurrency

What's wrong with this?

```
json1 := httpGet(url1)
json2 := httpGet(url2)
useJsonBodies(json1, json2)
```

* Hint: it's in the title of this slide
* Ties up an entire system thread on blocking I/O calls
* We want to be more efficient with resources, so...

----

## Callbacks

```
httpGetA(url1, |json1| =>
  httpGetA(url2, |json2| =>
    useJsonBodies(json1, json2)
  )
)
```

* Aka "callback hell"
* Lots of techniques to work around it, e.g. promises/futures
* "Oh, promises form a monad!" Not even going there today :)

----

## Asynchronous Haskell version

```haskell
json1 <- httpGet url1
json2 <- httpGet url2
useJsonBodies json1 json2
```

* But that looks just like the blocking code! Exactly
* Runtime converts to async system calls
* Runtime schedules threads
  * Sleeps them when waiting for data
  * Wakes them up when data is available
* Not only Haskell: Erlang and Go do this too
* Therefore....

----

## Concurrency

* Why wait for `url1` before starting `url2`?
* Need to fork threads, write to mutable variables, do some locking
* Or be awesome
```haskell
(json1, json2) <- concurrently
  (httpGet url1)
  (httpGet url2)
useJsonBodies json1 json2
```
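
A minimal runnable sketch of `concurrently` from the `async` package; the `fetch` helper and URLs below are made up, with `threadDelay` standing in for real HTTP calls:

```haskell
import Control.Concurrent (threadDelay)
import Control.Concurrent.Async (concurrently)

-- Hypothetical stand-in for httpGet: pretend each fetch takes a second
fetch :: String -> IO String
fetch url = do
  threadDelay 1000000 -- simulate network latency (microseconds)
  pure ("body of " ++ url)

main :: IO ()
main = do
  -- Both fetches run at the same time, so the total wait is ~1 second, not ~2
  (json1, json2) <- concurrently (fetch "url1") (fetch "url2")
  putStrLn json1
  putStrLn json2
```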
----

## Canceling

* We only want one of the responses
* Take whichever comes first

```
promise1 := httpGet(url1)
promise2 := httpGet(url2)
result := newMutex()
promise1.andThen(|json1| =>
  result.set(json1)
  promise2.cancel())
promise2.andThen(|json2| =>
  result.set(json2)
  promise1.cancel())
useJsonBody(result.get())
```

----

## Canceling in Haskell

```haskell
eitherJson <- race
  (httpGet url1)
  (httpGet url2)
case eitherJson of
  Left  json1 -> useJsonBody1 json1
  Right json2 -> useJsonBody2 json2
```

* More than just a well designed API
* Depends on asynchronous exceptions
* Cancel any other running thread

----

## Not just about I/O

* Thread scheduling, sleeping, and killing work for CPU-bound tasks too!
* Don't need to worry about a heavy computation starving other threads
* No need to offload your heavy tasks to a different microservice, do it all in Haskell
* A runnable `timeout` sketch follows the summary slide

```haskell
let tenSeconds = 10 * 1000 * 1000
timeout tenSeconds expensiveComputation
```

----

## Summary: concurrency and async I/O

Advantages

* Write simple, blocking-style code and get asynchronous I/O underneath
* Cheap green threads: `concurrently`, `race`, and `timeout` are ordinary library functions
* Works for CPU-bound tasks as well as I/O

Disadvantages

* Cancellation relies on asynchronous exceptions, which take some care to reason about

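A runnable sketch of the `timeout` idea above, using a deliberately expensive pure computation (the two-second limit is arbitrary):

```haskell
import Control.Exception (evaluate)
import System.Timeout (timeout)

expensiveComputation :: Integer
expensiveComputation = sum [1 .. 200000000]

main :: IO ()
main = do
  let twoSeconds = 2 * 1000 * 1000 -- timeout takes microseconds
  -- evaluate forces the pure result inside IO so the runtime can interrupt it
  result <- timeout twoSeconds (evaluate expensiveComputation)
  case result of
    Nothing -> putStrLn "Computation timed out"
    Just n  -> putStrLn ("Sum: " ++ show n)
```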
---

## Immutability and purity

* Most languages default to mutable values
* Haskell differs in two ways:
  * Immutable by default, with explicit opt-in mutability
  * Mutating is an effect, tracked by the type system

----

## Mutable Haskell

Impossible

```haskell
let mut total = 0
    loop i =
      if i > 1000000
        then total
        else total += i; loop (i + 1)
 in loop 1
```

Real and tedious

```haskell
total <- newIORef 0
let loop i =
      if i > 1000000
        then readIORef total
        else do
          modifyIORef total (+ i)
          loop (i + 1)
loop 1
```

----

## Better Haskell

Recursive and immutable, much better!

```haskell
let loop i total =
      if i > 1000000
        then total
        else loop (i + 1) (total + i)
 in loop 1 0
```

But why does this matter?

----

## Reasoning about code

Guess the output

```
// results.txt
Alice,32
Bob,55
Charlie,22

func main() {
  results := readResultsFromFile("results.txt")
  printScoreRange(results)
  print("First result was by: " + results[0].name)
}

func printScoreRange(results: Vector) { ... }
```

----

## Expected output

```
Lowest: 22
Highest: 55
First result was by: Alice
```

----

## What's in printScoreRange?

```
func printScoreRange(results: Vector) {
  results.sortBy(|result| => result.score)
  print("Lowest: " + results[0].score)
  print("Highest: " + results[results.len() - 1].score)
}
```
Actual output:

```
Lowest: 22
Highest: 55
First result was by: Charlie
```

Non-local changes broke our guessed result
----

## Do it in Haskell

```haskell
main :: IO ()
main = do
  results <- readResultsFromFile "results.txt"
  printScoreRange results
  putStrLn $ "First result was by: " ++ name (head results)

printScoreRange :: [TestResult] -> IO ()
printScoreRange results = do
  let results' = sortOn score results
  putStrLn $ "Lowest: " ++ show (score (head results'))
  putStrLn $ "Highest: " ++ show (score (last results'))
```

* Impossible for `printScoreRange` to modify results
* `printScoreRange` sorts into a local copy
* Have to think about less when writing `main`

----

## Data races

```haskell
main :: IO ()
main = do
  results <- readResultsFromFile "results.txt"
  concurrently_
    (printFirstResult results)
    (printScoreRange results)

printFirstResult results =
  putStrLn $ "First result was by: " ++ name (head results)

printScoreRange results = do
  let results' = sortOn score results
  putStrLn $ "Lowest: " ++ show (score (head results'))
  putStrLn $ "Highest: " ++ show (score (last results'))
```

* Concurrent data accesses? No problem!
* Concurrent data writes? Impossible!
* We'll come back to mutable, multithreaded data

----

## Mutability when needed

* In-place, mutable algorithms can be much faster
* Example: sorting a vector with only pure transformations is __slow__
* Haskell's answers:
  1. You can still have mutable data structures if you want them, but they're __explicit__
  2. Temporary mutable copy, then freeze it

----

## Freeze!

```haskell
sortMutable :: MutableVector a -> ST (MutableVector a)
sortMutable = ... -- normal sorting algorithm

sortImmutable :: Vector a -> Vector a
sortImmutable orig = runST $ do
  mutable <- newMutableVector (length orig)
  copyValues orig mutable
  sortMutable mutable
  freeze mutable
```

* `ST` is for temporary, local mutability
* Cannot be affected by the outside world, and cannot affect it
* Keeps the functional guarantee: same input ==> same output
* A concrete sketch with the `vector` package follows the summary slide

----

## Summary: immutability and purity
Advantages

* Easier to reason about code
* Avoid many cases of data races
* Functions are more reliable, returning the same output for the same input

Disadvantages

* Lots of ceremony if you actually want mutation
* Some runtime performance hit for mutable algorithms
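
For reference, the concrete sketch promised on the Freeze! slide, assuming the `vector` and `vector-algorithms` packages; `Data.Vector.modify` handles the copy/mutate/freeze steps for us:

```haskell
import qualified Data.Vector as V
import qualified Data.Vector.Algorithms.Intro as Intro

-- Pure on the outside, mutable on the inside: modify works on a copy of the
-- vector, sorts it in place inside ST, then freezes the result
sortImmutable :: Ord a => V.Vector a -> V.Vector a
sortImmutable = V.modify Intro.sort

main :: IO ()
main = print (sortImmutable (V.fromList [3, 1, 2 :: Int]))
```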
----

## Concurrent Mutation

What's wrong with this code?

```
runServer (|request| => {
  from := accounts.lookup(request.from)
  to := accounts.lookup(request.to)
  accounts.set(request.from, from - request.amt)
  accounts.set(request.to, to + request.amt)
})
```

Looks reasonable, but...

```
Thread 1: receive request: Alice gives $10
Thread 2: receive request: Alice receives $10
Thread 1: lookup that Alice has $50
Thread 2: lookup that Alice has $50
Thread 1: set Alice's account to $40
Thread 2: set Alice's account to $60
```

NOTE:
* What if you actually need to mutate values, and from multiple threads?
* *Describe slide*
* Alice ends up with either $40 or $60 instead of $50

----

## Locking

```
runServer (|request| => {
  accounts.lock(request.from)
  accounts.lock(request.to)
  // same code as before
  accounts.unlock(request.from)
  accounts.unlock(request.to)
})
```

```
Thread 1: receive request: $50 from Alice to Bob
Thread 2: receive request: $50 from Bob to Alice
Thread 1: lock Alice
Thread 2: lock Bob
Thread 1: try to lock Bob, but can't, so wait
Thread 2: try to lock Alice, but can't, so wait
```

__Deadlock!__

NOTE: The typical solution is locking, but it leads to other problems

----

## Software Transactional Memory

```haskell
runServer $ \request -> atomically $ do
  let fromVar = lookup (from request) accounts
      toVar   = lookup (to request) accounts
  origFrom <- readTVar fromVar
  writeTVar fromVar (origFrom - amt request)
  origTo <- readTVar toVar
  writeTVar toVar (origTo + amt request)
```

* Looks like it has a race condition
* But STM ensures transactions are atomic
* No explicit locking required
* `TVar` is an example of explicit mutation
* Alternatives: `IORef`, `MVar`

NOTE:
* There are helper functions to make this shorter (a sketch follows the summary slide)
* Want to make a point with the longer code
* STM will automatically retry when needed

----

## The role of purity

STM retries if a transaction isn't atomic. How many Bitcoins will I buy?

```haskell
atomically $ do
  buyBitcoins 3 -- side effects on my bank account
  modifyTVar myBitcoinCount (+ 3)
```

* Trick question! This code doesn't compile
* `atomically` only allows side effects on `TVar`s
* Other side effects (like my bank account) are disallowed
* Safe for the runtime to retry, thanks to purity

NOTE:
* `buyBitcoins` needs to go to an exchange and spend $100,000
* Due to retry, this code could spend $10m
* This is where purity steps in

----

## Summary of STM

Advantages

* Makes concurrent data modification much easier
* Bypass many race conditions and deadlocks

Disadvantages

* Depends on purity to work at all
  * Not really a disadvantage, you're already stuck with purity in Haskell
* Not really any other disadvantages, so just use it!
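
The shorter version mentioned in the STM slide's notes, as a self-contained sketch using `modifyTVar'` (the two-account setup is made up for illustration):

```haskell
import Control.Concurrent.STM

-- Move an amount between two balances in a single atomic transaction
transfer :: Int -> TVar Int -> TVar Int -> STM ()
transfer amt fromVar toVar = do
  modifyTVar' fromVar (subtract amt)
  modifyTVar' toVar (+ amt)

main :: IO ()
main = do
  alice <- newTVarIO 50
  bob   <- newTVarIO 50
  atomically (transfer 10 alice bob)
  balances <- atomically ((,) <$> readTVar alice <*> readTVar bob)
  print balances -- (40,60)
```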
---

## Laziness

*A double-edged sword*

Let's revisit our previous summing example

```haskell
let loop i total =
      if i > 1000000
        then total
        else loop (i + 1) (total + i)
 in loop 1 0
```

There are two problems with this code:

1. There's a major performance bug in it
2. It's much more cumbersome than it should be

NOTE: Kind of cheeky to hold off on laziness this long

----

## Space leaks

Consider `let foo = 1 + 2`

* `foo` isn't `3`, it's an instruction for how to create `3`
* `foo` is a _thunk_ until it's evaluated
* Storing thunks is more expensive than simple types like `Int`s
* Which values are evaluated in our `loop`?

```haskell
let loop i total =
      if i > 1000000
        then total
        else loop (i + 1) (total + i)
 in loop 1 0
```

NOTE:
* The bane of laziness is space leaks, which you may have heard about. Need to understand how laziness is implemented.
* Explain why `i` is forced and `total` is not
* Builds a tree, lots of CPU and memory pressure

----

## Explicit strictness

Need to tell the Haskell compiler to evaluate `total`. Bang!

```haskell
let loop i !total =
      if i > 1000000
        then total
        else loop (i + 1) (total + i)
 in loop 1 0
```

* Needing to do this is a downside of Haskell's laziness
* But do we get any benefits in return?

NOTE:
* Can be explicit about what needs to be evaluated
* This is one approach, there are others
* Just added a `!`

----

## Looping (1)

Let's write our `loop` in an imperative language:

```
total := 0
for (i := 1; i <= 1000000; i++) {
  total += i
}
```

Or just the evens

```
total := 0
for (i := 1; i <= 1000000; i++) {
  if (isEven(i)) {
    total += i
  }
}
```

----

## Looping (2)

Now add up the values modulo 13 (for some weird reason)

```
total := 0
for (i := 1; i <= 1000000; i++) {
  if (isEven(i)) {
    total += i % 13
  }
}
```

* Each modification is fine
* Getting harder to see the forest for the trees
* If our logic was more complicated, code reuse would be an issue

NOTE: Example of a more complicated use case: writing a lookahead parser

----

## Some better Haskell

Our original recursive implementation sucked

```haskell
let loop i !total =
      if i > 1000000
        then total
        else loop (i + 1) (total + i)
 in loop 1 0
```

But this is great

```haskell
sum [1..1000000]
```

* Doesn't it allocate 8mb of ints?
* Nope, laziness!
* Just a thunk telling us how to get the rest of the list
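
A tiny sketch of that idea: the list is only a thunk, so we can even start from an infinite list and pay only for the elements we actually demand:

```haskell
main :: IO ()
main = do
  -- The full list is never materialized; elements are produced
  -- one at a time as sum demands them
  print (sum [1 .. 1000000 :: Int])
  -- Laziness even lets us pull a few elements from an infinite list
  print (take 5 (filter even [1 ..] :: [Int]))
```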
----

## Composable Haskell

Just the evens?

```haskell
sum (filter even [1..1000000])
```

Modulo 13?

```haskell
sum (map (`mod` 13) (filter even [1..1000000]))
```

* Easy and natural to compose functions in a lazy context
* Avoids doing unnecessary work or using too much memory

NOTE:
* Never using more than a few machine words
* Other GHC optimizations avoid allocating any thunks
* Not covering that today
* Mixed bag, but functional + lazy = declarative, performant

----

## Short circuiting for free

In most languages, `&&` and `||` short circuit

```
foo() && bar()
```

* `bar` only called if `foo` returns `true`

In Haskell, we get that for free from laziness:

```haskell
False && _ = False
True  && x = x

True  || _ = True
False || x = x
```

See also: `and`, `or`, `all`, `any`

----

### Other downsides

* Laziness means an exception can be hiding in any thunk
  * Aka partial functions: `head []`
* Also, some inefficient functions are still available, `foldl` vs `foldl'` (a sketch follows the summary slide)

NOTE:
* Generally partial functions are frowned upon
* But they're still in the language

----

## Summary of laziness

Advantages

* More composable code
* Get efficient results with high-level code
* Short-circuiting no longer a special case

Disadvantages

* Need to worry about space leaks
* Exceptions can be hiding in many places
* Bad functions still linger
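
To make the `foldl` vs `foldl'` point concrete, a small sketch (the list size is arbitrary; try each line on its own):

```haskell
import Data.List (foldl')

main :: IO ()
main = do
  -- foldl' forces the accumulator at each step, so it runs in constant space
  print (foldl' (+) 0 [1 .. 10000000 :: Int])
  -- Lazy foldl builds a long chain of (+) thunks first; without optimizations
  -- this can use a huge amount of memory before producing a result
  print (foldl (+) 0 [1 .. 10000000 :: Int])
```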
----

## Side note: other languages

* Laziness is very similar to features in other languages
  * Python generators
  * Rust iterators
* In Haskell, it's far more prevalent, since it affects how all code works
* However, you can get a lot of the benefits of Haskell with these techniques

---

## Other examples

* Too much to talk about in 40 minutes!
* Two other topics I wanted to touch on
* Feel free to ask me about these at breaks

----

## Parser (and other) DSLs

* Operator overloading!
* Abstractions like `Alternative` are a natural fit
  * `parseXMLElement <|> parseXMLText`
* Able to reuse a huge number of existing library functions
  * `optional`, `many`, `foldMap`
* General purpose `do`-notation is great

----

## Parser example

```haskell
data Time = Time Hour Minutes Seconds (Maybe AmPm)
data AmPm = Am | Pm

parseAmPm :: Parser Time
parseAmPm = Time
  <$> decimal
  <*> (":" *> decimal)
  <*> (":" *> decimal)
  <*> optional (("AM" $> Am) <|> ("PM" $> Pm))
```

c/o [@queertypes](https://twitter.com/queertypes/status/941064338848100352)

----

## Advanced techniques

* Free monads
* Monad transformer stacks
* Lens, conduit, pipes, ...
* Lots of ways to do things in Haskell!
* It's a plus and a minus
* Recommendation: choose a useful subset of Haskell and its libraries, and define some best practices

---

## Conclusion

* Haskell combines a lot of uncommon features
* Very few of those features are unique
* Combining those features allows you to write code very differently than in other languages
* If you want readable, robust, easy to maintain code: I think it's a great choice
* Be aware of the sharp edges: they do exist!

----

## Questions?

Thanks everyone!