What Makes Haskell Unique
- Michael Snoyman
- VP of Engineering, FP Complete
- F(by) 2017
Why uniqueness matters
- Programmers have lots of options
- Need to know what distinguishes programming languages
- Need to understand what makes Haskell different from other languages
Is Haskell functional?
- What even is a functional language?
- Lax definition
- First class functions
- Higher order functions
- Wait... is C functional?
Haskell may be functional, but that doesn't make it unique
Let's Describe Haskell
- Functional
- Statically typed
- Pure
- Lazy
- Strongly typed
- Green threads
- Native executables
- Garbage collected
- Immutability
It's the combination of these features that makes Haskell unique
- Example: purity + strong typing + functional style leads to:
- Easy to write
- Easy to read
- Easy to modify
- Efficient
- We'll get to this later
- Now: lots of examples!
- Different here is usually better, but some downsides
Async I/O and Concurrency
What's wrong with this?
json1 := httpGet(url1)
json2 := httpGet(url2)
useJsonBodies(json1, json2)
- Hint: it's in the title of this slide
- Ties up an entire system thread on blocking I/O calls
- We want to be more efficient with resources, so...
Callbacks
httpGetA(url1, |json1| =>
httpGetA(url2, |json2| =>
useJsonBodies(json1, json2)
)
)
- Aka "callback hell"
- Lots of techniques to work around it, e.g. promises/futures
- "Oh, promises form a monad!" Not even going there today :)
Asynchronous Haskell version
json1 <- httpGet url1
json2 <- httpGet url2
useJsonBodies json1 json2
- But that looks just like the blocking code! Exactly
- Runtime converts to async system calls
- Runtime schedules threads
- Sleeps when waiting for data
- Wake them up when data is available
- Not only Haskell: Erlang and Go do this too
- Therefore....
Concurrency
- Why wait for `url1` before starting `url2`?
- Need to fork threads, write to mutable variables, do some locking
- Or be awesome
- Or be awesome
(json1, json2) <- concurrently
(httpGet url1)
(httpGet url2)
useJsonBodies json1 json2
- Cheap green thread implementation
- Wonderful `async` library
- Builds on the async I/O system
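The `async` library's `concurrently` can be approximated in a few lines of base using `forkIO` and `MVar` — a minimal sketch only (the real library also handles exceptions and cancelation), with a hypothetical `fakeHttpGet` standing in for a real HTTP call:

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Crude approximation of the async library's 'concurrently':
-- run both actions on their own green threads, wait for both results.
concurrentlySketch :: IO a -> IO b -> IO (a, b)
concurrentlySketch actA actB = do
  varA <- newEmptyMVar
  varB <- newEmptyMVar
  _ <- forkIO (actA >>= putMVar varA)
  _ <- forkIO (actB >>= putMVar varB)
  (,) <$> takeMVar varA <*> takeMVar varB

-- Hypothetical stand-in for httpGet: pretend each request takes 10ms.
fakeHttpGet :: String -> IO String
fakeHttpGet url = do
  threadDelay 10000
  pure ("response from " ++ url)
```

With the real library, the whole sketch collapses to `concurrently (httpGet url1) (httpGet url2)`.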
Canceling
- We only want one of the responses
- Take whichever comes first
promise1 := httpGet(url1)
promise2 := httpGet(url2)
result := newMutex()
promise1.andThen(|json1| =>
result.set(json1)
promise2.cancel())
promise2.andThen(|json2| =>
result.set(json2)
promise1.cancel())
useJsonBody(result.get())
Canceling in Haskell
eitherJson <- race
(httpGet url1)
(httpGet url2)
case eitherJson of
Left json1 -> useJsonBody1 json1
Right json2 -> useJsonBody2 json2
- More than just a well designed API
- Depends on asynchronous exceptions
- Cancel any other running thread
Not just about I/O
- Thread scheduling, sleeping, killing works for CPU bound tasks too!
- Don't need to worry about a heavy computation starving other threads
- No need to offload your heavy tasks to a different microservice, do it all in Haskell
let tenSeconds = 10 * 1000 * 1000
timeout tenSeconds expensiveComputation
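The `timeout` above is base's `System.Timeout.timeout`, which takes microseconds and returns `Nothing` if the action didn't finish in time. A small sketch, using `threadDelay` as a stand-in for the (hypothetical) `expensiveComputation`:

```haskell
import Control.Concurrent (threadDelay)
import System.Timeout (timeout)

-- Stand-in for a long-running task; pretend it takes 2 seconds.
slowTask :: IO Int
slowTask = threadDelay 2000000 >> pure 42

-- Give the task only 0.1 seconds: Nothing means it was cut off,
-- Just means it finished in time with a result.
runWithTimeout :: IO (Maybe Int)
runWithTimeout = timeout 100000 slowTask
```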
Summary: concurrency and async I/O
Advantages
- Cheap threads
- Simple API
- Highly responsive
Disadvantages
- Complicated runtime system
- Need to be aware of async exceptions when writing code
Immutability and purity
- Most languages default to mutable values
- Haskell differs in two ways:
- Immutable by default, explicit kind of mutability
- Mutating is an effect, tracked by the type system
Mutable Haskell
Impossible
let mut total = 0
loop i =
if i > 1000000
then total
else total += i; loop (i + 1)
in loop 1
Real and tedious
total <- newIORef 0
let loop i =
if i > 1000000
then readIORef total
else do
modifyIORef total (+ i)
loop (i + 1)
loop 1
Better Haskell
Recursive and immutable, much better!
let loop i total =
if i > 1000000
then total
else loop (i + 1) (total + i)
in loop 1 0
But why does this matter?
Reasoning about code
Guess the output
// results.txt
Alice,32
Bob,55
Charlie,22
func main() {
results := readResultsFromFile("results.txt")
printScoreRange(results)
print("First result was by: " + results[0].name)
}
func printScoreRange(results: Vector<TestResult>) {
...
}
Expected output
Lowest: 22
Highest: 55
First result was by: Alice
What's in printScoreRange?
func printScoreRange(results: Vector<TestResult>) {
results.sortBy(|result| => result.score)
print("Lowest: " + results[0].score)
print("Highest: " + results[results.len() - 1].score)
}
Actual output:
Lowest: 22
Highest: 55
First result was by: Charlie
Non-local changes broke our guessed result
Do it in Haskell
main :: IO ()
main = do
results <- readResultsFromFile "results.txt"
printScoreRange results
putStrLn $ "First result was by: " ++ name (head results)
printScoreRange :: [TestResult] -> IO ()
printScoreRange results = do
let results' = sortBy score results
putStrLn $ "Lowest: " ++ show (score (head results'))
putStrLn $ "Highest: " ++ show (score (last results'))
- Impossible for `printScoreRange` to modify `results`
- `printScoreRange` sorts into a local copy
- Have to think about less when writing `main`
Data races
main :: IO ()
main = do
results <- readResultsFromFile "results.txt"
concurrently_ (printFirstResult results) (printScoreRange results)
printFirstResult results =
putStrLn $ "First result was by: " ++ name (head results)
printScoreRange results = do
let results' = sortBy score results
putStrLn $ "Lowest: " ++ show (score (head results'))
putStrLn $ "Highest: " ++ show (score (last results'))
- Concurrent data accesses? No problem!
- Concurrent data writes? Impossible!
- We'll come back to mutable, multithreaded data
Mutability when needed
- In place, mutable algorithms can be much faster
- Example: sorting a vector with only pure transformations is slow
- Haskell's answers:
- You can still have mutable data structures if you want them, but they're explicit
- Temporary mutable copy, then freeze it
Freeze!
sortMutable :: MutableVector a -> ST (MutableVector a)
sortMutable = ... -- normal sorting algorithm
sortImmutable :: Vector a -> Vector a
sortImmutable orig = runST $ do
mutable <- newMutableVector (length orig)
copyValues orig mutable
sortMutable mutable
freeze mutable
- `ST` is for temporary, local mutability
- Cannot be affected by the outside world, and cannot affect it
- Keeps functional guarantee: same input ==> same output
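The same trick works for plain references, not just vectors. A minimal self-contained sketch using `Data.STRef` from base: the function mutates freely inside `runST`, but the reference can never escape, so from the outside it's pure:

```haskell
import Control.Monad (forM_)
import Control.Monad.ST (runST)
import Data.STRef (modifySTRef', newSTRef, readSTRef)

-- Pure from the outside, mutable on the inside: same input
-- always gives the same output.
sumTo :: Int -> Int
sumTo n = runST $ do
  total <- newSTRef 0
  forM_ [1 .. n] $ \i -> modifySTRef' total (+ i)
  readSTRef total
```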
Summary: immutability and purity
Advantages
- Easier to reason about code
- Avoid many cases of data races
- Functions are more reliable, returning the same output for the same input
Disadvantages
- Lots of ceremony if you actually want mutation
- Some runtime performance hit for mutable algorithms
Concurrent Mutation
What's wrong with this code?
runServer (|request| => {
from := accounts.lookup(request.from)
to := accounts.lookup(request.to)
accounts.set(request.from, from - request.amt)
accounts.set(request.to, to + request.amt)
})
Looks reasonable, but...
Thread 1: receive request: Alice gives $10
Thread 2: receive request: Alice receives $10
Thread 1: lookup that Alice has $50
Thread 2: lookup that Alice has $50
Thread 1: set Alice's account to $40
Thread 2: set Alice's account to $60
NOTE:
- What if you actually need to mutate values, and from multiple threads?
- Describe slide
- Alice ends up with either $40 or $60 instead of $50
Locking
runServer (|request| => {
accounts.lock(request.from)
accounts.lock(request.to)
// same code as before
accounts.unlock(request.from)
accounts.unlock(request.to)
})
Thread 1: receive request: $50 from Alice to Bob
Thread 2: receive request: $50 from Bob to Alice
Thread 1: lock Alice
Thread 2: lock Bob
Thread 1: try to lock Bob, but can't, so wait
Thread 2: try to lock Alice, but can't, so wait
Deadlock!
NOTE: Typical solution to this is to use locking, but it leads to other problems
Software Transactional Memory
runServer $ \request -> atomically $ do
let fromVar = lookup (from request) accounts
toVar = lookup (to request) accounts
origFrom <- readTVar fromVar
writeTVar fromVar (origFrom - amt request)
origTo <- readTVar toVar
writeTVar toVar (origTo + amt request)
- Looks like it has a race condition
- But STM ensures transactions are atomic
- No explicit locking required
- `TVar` is an example of explicit mutation
- Alternatives: `IORef`, `MVar`
NOTE:
- There are helper functions to make this shorter
- Want to make a point with the longer code
- STM will automatically retry when needed
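The transfer logic boils down to a few lines against the `stm` library that ships with GHC. A sketch with the account map simplified to two `TVar`s (the helper name `transfer` is ours, not from the slides):

```haskell
import Control.Concurrent.STM

-- Move money between two balances atomically: either both updates
-- happen or neither does, with no explicit locks and no deadlock.
transfer :: Int -> TVar Int -> TVar Int -> IO ()
transfer amt fromVar toVar = atomically $ do
  modifyTVar' fromVar (subtract amt)
  modifyTVar' toVar (+ amt)
```

If two threads race on the same accounts, the runtime detects the conflict and reruns one transaction, so no update is ever lost.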
The role of purity
STM retries a transaction if another thread invalidates it mid-flight. How many Bitcoins will I buy?
atomically $ do
buyBitcoins 3 -- side effects on my bank account
modifyTVar myBitcoinCount (+ 3)
- Trick question! Code doesn't compile
- `atomically` only allows side effects on `TVar`s
- Other side effects (like my bank account) are disallowed
- Safe for runtime to retry thanks to purity
NOTE:
- `buyBitcoins` needs to go to an exchange and spend $100,000
- Due to retry, this code could spend $10m
- This is where purity steps in
Summary of STM
Advantages
- Makes concurrent data modification much easier
- Bypass many race conditions and deadlocks
Disadvantages
- Depends on purity to work at all
- Not really a disadvantage, you're already stuck with purity in Haskell
- Not really any other disadvantages, so just use it!
Laziness
A double edged sword
Let's revisit our previous summing example
let loop i total =
if i > 1000000
then total
else loop (i + 1) (total + i)
in loop 1 0
There are two problems with this code:
- There's a major performance bug in it
- It's much more cumbersome than it should be
NOTE: Kind of cheeky to hold off on laziness this long
Space leaks
Consider `let foo = 1 + 2`
- `foo` isn't `3`, it's an instruction for how to create `3`
- `foo` is a thunk until it's evaluated
- Storing thunks is more expensive than simple types like `Int`s
- Which values are evaluated in our `loop`?
let loop i total =
if i > 1000000
then total
else loop (i + 1) (total + i)
in loop 1 0
NOTE:
- The bane of laziness is space leaks, which you may have heard about. Need to understand how laziness is implemented.
- Explain why `i` is forced and `total` is not
- Builds a tree, lots of CPU and memory pressure
Explicit strictness
Need to tell the Haskell compiler to evaluate `total`. Bang!
let loop i !total =
if i > 1000000
then total
else loop (i + 1) (total + i)
in loop 1 0
- Needing to do this is a downside of Haskell's laziness
- But do we get any benefits in return?
NOTE:
- Can be explicit about what needs to be evaluated
- This is one approach, there are others
- Just added a `!`
Looping (1)
Let's write our `loop` in an imperative language:
total := 0
for(i := 1; i <= 1000000; i++) {
total += i
}
Or just the evens
total := 0
for(i := 1; i <= 1000000; i++) {
if (isEven(i)) {
total += i
}
}
Looping (2)
Now add up the values modulus 13 (for some weird reason)
total := 0
for(i := 1; i <= 1000000; i++) {
if (isEven(i)) {
total += i % 13
}
}
- Each modification is fine
- Getting harder to see the forest for the trees
- If our logic was more complicated, code reuse would be an issue
NOTE: Example of more complicated use case, writing a lookahead parser
Some better Haskell
Our original recursive implementation sucked
let loop i !total =
if i > 1000000
then total
else loop (i + 1) (total + i)
in loop 1 0
But this is great
sum [1..1000000]
- Doesn't it allocate 8MB of ints?
- Nope, laziness!
- Just a thunk telling us how to get the rest of the list
Composable Haskell
Just the evens?
sum (filter even [1..1000000])
Modulus 13?
sum (map (`mod` 13) (filter even [1..1000000]))
- Easy and natural to compose functions in a lazy context
- Avoids doing unnecessary work or using too much memory
NOTE:
- Never using more than a few machine words
- Other GHC optimizations avoid allocating any thunks
- Not covering that today
- Mixed bag, but functional+lazy=declarative, performant
Short circuiting for free
In most languages, `&&` and `||` short circuit
foo() && bar()
- `bar` only called if `foo` returns `true`
In Haskell: we get that for free from laziness:
False && _ = False
True && x = x
True || _ = True
False || x = x
See also: `and`, `or`, `all`, `any`
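Because those functions are built on the lazy `&&` and `||`, they stop as soon as the answer is known — even on an infinite list. A small sketch:

```haskell
-- 'any' stops at the first match, so an infinite list is fine:
firstEvenExists :: Bool
firstEvenExists = any even [1 ..]

-- 'all' stops at the first failure (here, at 10), so this
-- terminates too:
notAllSmall :: Bool
notAllSmall = not (all (< 10) [1 ..])
```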
Other downsides
- Laziness means an exception can be hiding in any thunk
- Aka partial functions: `head []`
- Also, some inefficient functions still available: `foldl` vs `foldl'`
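The `foldl` vs `foldl'` difference in one line: plain `foldl` builds a chain of unevaluated `(+)` thunks, while `foldl'` from `Data.List` forces the accumulator at each step and runs in constant space:

```haskell
import Data.List (foldl')

-- Strict left fold: the accumulator is evaluated at every step.
-- With plain 'foldl' this would first build a million-deep thunk
-- chain before evaluating anything.
strictSum :: [Int] -> Int
strictSum = foldl' (+) 0
```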
NOTE:
- Generally partial functions are frowned upon
- But they're still in the language
Summary of laziness
Advantages
- More composable code
- Get efficient results with high level code
- Short-circuiting no longer a special case
Disadvantages
- Need to worry about space leaks
- Exceptions can be hiding in many places
- Bad functions still linger
Side note: other languages
- Laziness is very similar to features in other languages
- Python generators
- Rust iterators
- In Haskell, it's far more prevalent since it affects how all code works
- However, you can get a lot of the benefits of Haskell with these techniques
Other examples
- Too much to talk about in 40 minutes!
- Two other topics I wanted to touch on
- Feel free to ask me about these at breaks
Parser (and other) DSLs
- Operator overloading!
- Abstractions like `Alternative` a natural fit: `parseXMLElement <|> parseXMLText`
- Able to reuse a huge number of existing library functions: `optional`, `many`, `foldMap`
- General purpose `do`-notation is great
Parser example
data Time = Time Hour Minutes Seconds (Maybe AmPm)
data AmPm = Am | Pm
parseAmPm :: Parser Time
parseAmPm = Time
<$> decimal
<*> (":" *> decimal)
<*> (":" *> decimal)
<*> optional (("AM" $> Am) <|> ("PM" $> Pm))
c/o @queertypes
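The slide assumes a parser library such as attoparsec. The same applicative shape can be sketched self-contained with base's `Text.ParserCombinators.ReadP`, writing `decimal` by hand — an illustration under those assumptions, not the talk's exact code:

```haskell
import Control.Applicative (optional, (<|>))
import Data.Char (isDigit)
import Data.Functor (($>))
import Text.ParserCombinators.ReadP (ReadP, eof, munch1, readP_to_S, string)

data AmPm = Am | Pm deriving (Eq, Show)
data Time = Time Int Int Int (Maybe AmPm) deriving (Eq, Show)

-- Hand-rolled equivalent of a parser library's 'decimal'.
decimal :: ReadP Int
decimal = read <$> munch1 isDigit

-- Same applicative shape as the slide, using ReadP combinators.
parseTime :: ReadP Time
parseTime = Time
  <$> decimal
  <*> (string ":" *> decimal)
  <*> (string ":" *> decimal)
  <*> optional ((string "AM" $> Am) <|> (string "PM" $> Pm))

-- Run the parser, demanding a single complete parse.
runTimeParser :: String -> Maybe Time
runTimeParser s = case readP_to_S (parseTime <* eof) s of
  [(t, "")] -> Just t
  _         -> Nothing
```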
Advanced techniques
- Free monads
- Monad transformer stacks
- Lens, conduit, pipes, ...
- Lots of ways to do things in Haskell!
- It's a plus and a minus
- Recommendation: choose a useful subset of Haskell and its libraries, and define some best practices
Conclusion
- Haskell combines a lot of uncommon features
- Very few of those features are unique
- Combining those features allows you to write code very differently than in other languages
- If you want readable, robust, easy to maintain code: I think it's a great choice
- Be aware of the sharp edges: they do exist!
Questions?
Thanks everyone!