What Makes Haskell Unique
- Michael Snoyman
- VP of Engineering, FP Complete
- F(by) 2017
Why uniqueness matters
- Programmers have lots of options
- Need to know what distinguishes programming languages
- Need to understand what makes Haskell different from other languages
Is Haskell functional?
- What even is a functional language?
- Lax definition
- First class functions
- Higher order functions
- Wait... is C functional?
Haskell may be functional, but that doesn't make it unique
Let's Describe Haskell
- Functional
- Statically typed
- Pure
- Lazy
- Strongly typed
- Green threads
- Native executables
- Garbage collected
- Immutability
It's the combination of these features that makes Haskell unique
- Example: purity + strong typing + functional style leads to:
- Easy to write
- Easy to read
- Easy to modify
- Efficient
- We'll get to this later
- Now: lots of examples!
- Different here is usually better, but some downsides
Async I/O and Concurrency
What's wrong with this?
json1 := httpGet(url1)
json2 := httpGet(url2)
useJsonBodies(json1, json2)
- Hint: it's in the title of this slide
- Ties up an entire system thread on blocking I/O calls
- We want to be more efficient with resources, so...
Callbacks
httpGetA(url1, |json1| =>
httpGetA(url2, |json2| =>
useJsonBodies(json1, json2)
)
)
- Aka "callback hell"
- Lots of techniques to work around it, e.g. promises/futures
- "Oh, promises form a monad!" Not even going there today :)
Asynchronous Haskell version
json1 <- httpGet url1
json2 <- httpGet url2
useJsonBodies json1 json2
- But that looks just like the blocking code! Exactly
- Runtime converts to async system calls
- Runtime schedules threads
- Sleeps when waiting for data
- Wake them up when data is available
- Not only Haskell: Erlang and Go do this too
- Therefore....
Concurrency
- Why wait for `url1` before starting `url2`?
- Need to fork threads, write to mutable variables, do some locking
- Or be awesome
- Or be awesome
(json1, json2) <- concurrently
(httpGet url1)
(httpGet url2)
useJsonBodies json1 json2
- Cheap green thread implementation
- Wonderful `async` library
- Builds on the async I/O system
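The `async` library's `concurrently` can be approximated in a few lines of base using `forkIO` and `MVar` — a minimal sketch only (the real library also handles exceptions and cancelation), with a hypothetical `fakeHttpGet` standing in for a real HTTP call:

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Crude approximation of the async library's 'concurrently':
-- run both actions on their own green threads, wait for both results.
concurrentlySketch :: IO a -> IO b -> IO (a, b)
concurrentlySketch actA actB = do
  varA <- newEmptyMVar
  varB <- newEmptyMVar
  _ <- forkIO (actA >>= putMVar varA)
  _ <- forkIO (actB >>= putMVar varB)
  (,) <$> takeMVar varA <*> takeMVar varB

-- Hypothetical stand-in for httpGet: pretend each request takes 10ms.
fakeHttpGet :: String -> IO String
fakeHttpGet url = do
  threadDelay 10000
  pure ("response from " ++ url)
```

With the real library, the whole sketch collapses to `concurrently (httpGet url1) (httpGet url2)`.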
Canceling
- We only want one of the responses
- Take whichever comes first
promise1 := httpGet(url1)
promise2 := httpGet(url2)
result := newMutex()
promise1.andThen(|json1| =>
result.set(json1)
promise2.cancel())
promise2.andThen(|json2| =>
result.set(json2)
promise1.cancel())
useJsonBody(result.get())
Canceling in Haskell
eitherJson <- race
(httpGet url1)
(httpGet url2)
case eitherJson of
Left json1 -> useJsonBody1 json1
Right json2 -> useJsonBody2 json2
- More than just a well designed API
- Depends on asynchronous exceptions
- Cancel any other running thread
Not just about I/O
- Thread scheduling, sleeping, killing works for CPU bound tasks too!
- Don't need to worry about a heavy computation starving other threads
- No need to offload your heavy tasks to a different microservice, do it all in Haskell
let tenSeconds = 10 * 1000 * 1000
timeout tenSeconds expensiveComputation
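The `timeout` above is base's `System.Timeout.timeout`, which takes microseconds and returns `Nothing` if the action didn't finish in time. A small sketch, using `threadDelay` as a stand-in for the (hypothetical) `expensiveComputation`:

```haskell
import Control.Concurrent (threadDelay)
import System.Timeout (timeout)

-- Stand-in for a long-running task; pretend it takes 2 seconds.
slowTask :: IO Int
slowTask = threadDelay 2000000 >> pure 42

-- Give the task only 0.1 seconds: Nothing means it was cut off,
-- Just means it finished in time with a result.
runWithTimeout :: IO (Maybe Int)
runWithTimeout = timeout 100000 slowTask
```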
Summary: concurrency and async I/O
Advantages
- Cheap threads
- Simple API
- Highly responsive
Disadvantages
- Complicated runtime system
- Need to be aware of async exceptions when writing code
Immutability and purity
- Most languages default to mutable values
- Haskell differs in two ways:
- Immutable by default, explicit kind of mutability
- Mutating is an effect, tracked by the type system
Mutable Haskell
Impossible
let mut total = 0
loop i =
if i > 1000000
then total
else total += i; loop (i + 1)
in loop 1
Real and tedious
total <- newIORef 0
let loop i =
if i > 1000000
then readIORef total
else do
modifyIORef total (+ i)
loop (i + 1)
loop 1
Better Haskell
Recursive and immutable, much better!
let loop i total =
if i > 1000000
then total
else loop (i + 1) (total + i)
in loop 1 0
But why does this matter?
Reasoning about code
Guess the output
// results.txt
Alice,32
Bob,55
Charlie,22
func main() {
results := readResultsFromFile("results.txt")
printScoreRange(results)
print("First result was by: " + results[0].name)
}
func printScoreRange(results: Vector<TestResult>) {
...
}
Expected output
Lowest: 22
Highest: 55
First result was by: Alice
What's in printScoreRange?
func printScoreRange(results: Vector<TestResult>) {
results.sortBy(|result| => result.score)
print("Lowest: " + results[0].score)
print("Highest: " + results[results.len() - 1].score)
}
Actual output:
Lowest: 22
Highest: 55
First result was by: Charlie
Non-local changes broke our guessed result
Do it in Haskell
main :: IO ()
main = do
results <- readResultsFromFile "results.txt"
printScoreRange results
putStrLn $ "First result was by: " ++ name (head results)
printScoreRange :: [TestResult] -> IO ()
printScoreRange results = do
let results' = sortBy score results
putStrLn $ "Lowest: " ++ show (score (head results'))
putStrLn $ "Highest: " ++ show (score (last results'))
- Impossible for `printScoreRange` to modify `results`
- `printScoreRange` sorts into a local copy
- Have to think about less when writing `main`
Data races
main :: IO ()
main = do
results <- readResultsFromFile "results.txt"
concurrently_ (printFirstResult results) (printScoreRange results)
printFirstResult results =
putStrLn $ "First result was by: " ++ name (head results)
printScoreRange results = do
let results' = sortBy score results
putStrLn $ "Lowest: " ++ show (score (head results'))
putStrLn $ "Highest: " ++ show (score (last results'))
- Concurrent data accesses? No problem!
- Concurrent data writes? Impossible!
- We'll come back to mutable, multithreaded data
Mutability when needed
- In place, mutable algorithms can be much faster
- Example: sorting a vector with only pure transformations is slow
- Haskell's answers:
- You can still have mutable data structures if you want them, but they're explicit
- Temporary mutable copy, then freeze it
Freeze!
sortMutable :: MutableVector a -> ST (MutableVector a)
sortMutable = ... -- normal sorting algorithm
sortImmutable :: Vector a -> Vector a
sortImmutable orig = runST $ do
mutable <- newMutableVector (length orig)
copyValues orig mutable
sortMutable mutable
freeze mutable
- `ST` is for temporary, local mutability
- Cannot be affected by the outside world, and cannot affect it
- Keeps functional guarantee: same input ==> same output
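The same trick works for plain references, not just vectors. A minimal self-contained sketch using `Data.STRef` from base: the function mutates freely inside `runST`, but the reference can never escape, so from the outside it's pure:

```haskell
import Control.Monad (forM_)
import Control.Monad.ST (runST)
import Data.STRef (modifySTRef', newSTRef, readSTRef)

-- Pure from the outside, mutable on the inside: same input
-- always gives the same output.
sumTo :: Int -> Int
sumTo n = runST $ do
  total <- newSTRef 0
  forM_ [1 .. n] $ \i -> modifySTRef' total (+ i)
  readSTRef total
```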
Summary: immutability and purity
Advantages
- Easier to reason about code
- Avoid many cases of data races
- Functions are more reliable, returning the same output for the same input
Disadvantages
- Lots of ceremony if you actually want mutation
- Some runtime performance hit for mutable algorithms
Concurrent Mutation
What's wrong with this code?
runServer (|request| => {
from := accounts.lookup(request.from)
to := accounts.lookup(request.to)
accounts.set(request.from, from - request.amt)
accounts.set(request.to, to + request.amt)
})
Looks reasonable, but...
Thread 1: receive request: Alice gives $10
Thread 2: receive request: Alice receives $10
Thread 1: lookup that Alice has $50
Thread 2: lookup that Alice has $50
Thread 1: set Alice's account to $40
Thread 2: set Alice's account to $60
NOTE:
- What if you actually need to mutate values, and from multiple threads?
- Describe slide
- Alice ends up with either $40 or $60 instead of $50
Locking
runServer (|request| => {
accounts.lock(request.from)
accounts.lock(request.to)
// same code as before
accounts.unlock(request.from)
accounts.unlock(request.to)
})
Thread 1: receive request: $50 from Alice to Bob
Thread 2: receive request: $50 from Bob to Alice
Thread 1: lock Alice
Thread 2: lock Bob
Thread 1: try to lock Bob, but can't, so wait
Thread 2: try to lock Alice, but can't, so wait
Deadlock!
NOTE: Typical solution to this is to use locking, but it leads to other problems
Software Transactional Memory
runServer $ \request -> atomically $ do
let fromVar = lookup (from request) accounts
toVar = lookup (to request) accounts
origFrom <- readTVar fromVar
writeTVar fromVar (origFrom - amt request)
origTo <- readTVar toVar
writeTVar toVar (origTo + amt request)
- Looks like it has a race condition
- But STM ensures transactions are atomic
- No explicit locking required
- `TVar` is an example of explicit mutation
- Alternatives: `IORef`, `MVar`
NOTE:
- There are helper functions to make this shorter
- Want to make a point with the longer code
- STM will automatically retry when needed
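The transfer logic boils down to a few lines against the `stm` library that ships with GHC. A sketch with the account map simplified to two `TVar`s (the helper name `transfer` is ours, not from the slides):

```haskell
import Control.Concurrent.STM

-- Move money between two balances atomically: either both updates
-- happen or neither does, with no explicit locks and no deadlock.
transfer :: Int -> TVar Int -> TVar Int -> IO ()
transfer amt fromVar toVar = atomically $ do
  modifyTVar' fromVar (subtract amt)
  modifyTVar' toVar (+ amt)
```

If two threads race on the same accounts, the runtime detects the conflict and reruns one transaction, so no update is ever lost.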
The role of purity
STM retries a transaction if another thread invalidates it mid-flight. How many Bitcoins will I buy?
atomically $ do
buyBitcoins 3 -- side effects on my bank account
modifyTVar myBitcoinCount (+ 3)
- Trick question! Code doesn't compile
- `atomically` only allows side effects on `TVar`s
- Other side effects (like my bank account) are disallowed
- Safe for runtime to retry thanks to purity
NOTE:
- `buyBitcoins` needs to go to an exchange and spend $100,000
- Due to retry, this code could spend $10m
- This is where purity steps in
Summary of STM
Advantages
- Makes concurrent data modification much easier
- Bypass many race conditions and deadlocks
Disadvantages
- Depends on purity to work at all
- Not really a disadvantage, you're already stuck with purity in Haskell
- Not really any other disadvantages, so just use it!
Laziness
A double edged sword
Let's revisit our previous summing example
let loop i total =
if i > 1000000
then total
else loop (i + 1) (total + i)
in loop 1 0
There are two problems with this code:
- There's a major performance bug in it
- It's much more cumbersome than it should be
NOTE: Kind of cheeky to hold off on laziness this long
Space leaks
Consider `let foo = 1 + 2`
- `foo` isn't `3`, it's an instruction for how to create `3`
- `foo` is a thunk until it's evaluated
- Storing thunks is more expensive than simple types like `Int`s
- Which values are evaluated in our `loop`?
let loop i total =
if i > 1000000
then total
else loop (i + 1) (total + i)
in loop 1 0
NOTE:
- The bane of laziness is space leaks, which you may have heard about. Need to understand how laziness is implemented.
- Explain why `i` is forced and `total` is not
- Builds a tree, lots of CPU and memory pressure
Explicit strictness
Need to tell the Haskell compiler to evaluate `total`. Bang!
let loop i !total =
if i > 1000000
then total
else loop (i + 1) (total + i)
in loop 1 0
- Needing to do this is a downside of Haskell's laziness
- But do we get any benefits in return?
NOTE:
- Can be explicit about what needs to be evaluated
- This is one approach, there are others
- Just added a `!`
Looping (1)
Let's write our `loop` in an imperative language:
total := 0
for(i := 1; i <= 1000000; i++) {
total += i
}
Or just the evens
total := 0
for(i := 1; i <= 1000000; i++) {
if (isEven(i)) {
total += i
}
}
Looping (2)
Now add up the values modulus 13 (for some weird reason)
total := 0
for(i := 1; i <= 1000000; i++) {
if (isEven(i)) {
total += i % 13
}
}
- Each modification is fine
- Getting harder to see the forest for the trees
- If our logic was more complicated, code reuse would be an issue
NOTE: Example of more complicated use case, writing a lookahead parser
Some better Haskell
Our original recursive implementation sucked
let loop i !total =
if i > 1000000
then total
else loop (i + 1) (total + i)
in loop 1 0
But this is great
sum [1..1000000]
- Doesn't it allocate 8MB of ints?
- Nope, laziness!
- Just a thunk telling us how to get the rest of the list
Composable Haskell
Just the evens?
sum (filter even [1..1000000])
Modulus 13?
sum (map (`mod` 13) (filter even [1..1000000]))
- Easy and natural to compose functions in a lazy context
- Avoids doing unnecessary work or using too much memory
NOTE:
- Never using more than a few machine words
- Other GHC optimizations avoid allocating any thunks
- Not covering that today
- Mixed bag, but functional+lazy=declarative, performant
Short circuiting for free
In most languages, `&&` and `||` short circuit
foo() && bar()
- `bar` only called if `foo` returns `true`
In Haskell: we get that for free from laziness:
False && _ = False
True && x = x
True || _ = True
False || x = x
See also: `and`, `or`, `all`, `any`
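Because those functions are built on the lazy `&&` and `||`, they stop as soon as the answer is known — even on an infinite list. A small sketch:

```haskell
-- 'any' stops at the first match, so an infinite list is fine:
firstEvenExists :: Bool
firstEvenExists = any even [1 ..]

-- 'all' stops at the first failure (here, at 10), so this
-- terminates too:
notAllSmall :: Bool
notAllSmall = not (all (< 10) [1 ..])
```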
Other downsides
- Laziness means an exception can be hiding in any thunk
- Aka partial functions: `head []`
- Also, some inefficient functions still available: `foldl` vs `foldl'`
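The `foldl` vs `foldl'` difference in one line: plain `foldl` builds a chain of unevaluated `(+)` thunks, while `foldl'` from `Data.List` forces the accumulator at each step and runs in constant space:

```haskell
import Data.List (foldl')

-- Strict left fold: the accumulator is evaluated at every step.
-- With plain 'foldl' this would first build a million-deep thunk
-- chain before evaluating anything.
strictSum :: [Int] -> Int
strictSum = foldl' (+) 0
```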
NOTE:
- Generally partial functions are frowned upon
- But they're still in the language
Summary of laziness
Advantages
- More composable code
- Get efficient results with high level code
- Short-circuiting no longer a special case
Disadvantages
- Need to worry about space leaks
- Exceptions can be hiding in many places
- Bad functions still linger
Side note: other languages
- Laziness is very similar to features in other languages
- Python generators
- Rust iterators
- In Haskell, it's far more prevalent since it affects how all code works
- However, you can get a lot of the benefits of Haskell with these techniques
Other examples
- Too much to talk about in 40 minutes!
- Two other topics I wanted to touch on
- Feel free to ask me about these at breaks
Parser (and other) DSLs
- Operator overloading!
- Abstractions like `Alternative` a natural fit: `parseXMLElement <|> parseXMLText`
- Able to reuse a huge number of existing library functions: `optional`, `many`, `foldMap`
- General purpose `do`-notation is great
Parser example
data Time = Time Hour Minutes Seconds (Maybe AmPm)
data AmPm = Am | Pm
parseAmPm :: Parser Time
parseAmPm = Time
<$> decimal
<*> (":" *> decimal)
<*> (":" *> decimal)
<*> optional (("AM" $> Am) <|> ("PM" $> Pm))
c/o @queertypes
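The slide assumes a parser library such as attoparsec. The same applicative shape can be sketched self-contained with base's `Text.ParserCombinators.ReadP`, writing `decimal` by hand — an illustration under those assumptions, not the talk's exact code:

```haskell
import Control.Applicative (optional, (<|>))
import Data.Char (isDigit)
import Data.Functor (($>))
import Text.ParserCombinators.ReadP (ReadP, eof, munch1, readP_to_S, string)

data AmPm = Am | Pm deriving (Eq, Show)
data Time = Time Int Int Int (Maybe AmPm) deriving (Eq, Show)

-- Hand-rolled equivalent of a parser library's 'decimal'.
decimal :: ReadP Int
decimal = read <$> munch1 isDigit

-- Same applicative shape as the slide, using ReadP combinators.
parseTime :: ReadP Time
parseTime = Time
  <$> decimal
  <*> (string ":" *> decimal)
  <*> (string ":" *> decimal)
  <*> optional ((string "AM" $> Am) <|> (string "PM" $> Pm))

-- Run the parser, demanding a single complete parse.
runTimeParser :: String -> Maybe Time
runTimeParser s = case readP_to_S (parseTime <* eof) s of
  [(t, "")] -> Just t
  _         -> Nothing
```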
Advanced techniques
- Free monads
- Monad transformer stacks
- Lens, conduit, pipes, ...
- Lots of ways to do things in Haskell!
- It's a plus and a minus
- Recommendation: choose a useful subset of Haskell and its libraries, and define some best practices
Conclusion
- Haskell combines a lot of uncommon features
- Very few of those features are unique
- Combining those features allows you to write code very differently than in other languages
- If you want readable, robust, easy to maintain code: I think it's a great choice
- Be aware of the sharp edges: they do exist!
Questions?
Thanks everyone!