WIP whirlwind tour
parent
211fbe2333
commit
f20c93bdc6
1 changed files with 471 additions and 0 deletions
471
reveal/whirlwind-tour-core-haskell-libraries.md
Normal file
|
@ -0,0 +1,471 @@
|
|||
---
|
||||
title: Whirlwind Tour of Core Haskell Libraries
|
||||
---
|
||||
|
||||
# Whirlwind Tour of Core Haskell Libraries
|
||||
|
||||
* Michael Snoyman
|
||||
* LambdaConf Winter Retreat 2018
|
||||
|
||||
---
|
||||
|
||||
## Today's format
|
||||
|
||||
* Very informal
|
||||
* Interrupt!
|
||||
* Ask questions!
|
||||
* References to lots of external learning material
|
||||
* Can go into depth on any of that if desired
|
||||
* Happy to discuss the topics here at length
|
||||
* Mini hackathon as well? <https://github.com/snoyberg/codename-karka>
|
||||
|
||||
---
|
||||
|
||||
## Haskell's standard library
|
||||
|
||||
* Standard library is `base`
|
||||
* Includes standard prelude, `Prelude`
|
||||
* They both suck :(
|
||||
* Missing lots of functionality
|
||||
* Dangerous functions
|
||||
* Need to call out to other libraries for almost any program
|
||||
|
||||
----
|
||||
|
||||
## Downsides to weak base
|
||||
|
||||
* Which library to use for this functionality?
|
||||
* Dependency fear! I want my package to be lightweight
|
||||
* Mismatches in core types across ecosystem
|
||||
|
||||
----
|
||||
|
||||
## Bonus problem: patterns
|
||||
|
||||
* Other languages have "design patterns"
|
||||
* We don't need that in Haskell because types
|
||||
* Except: how do you handle effects like possible failure, or HTTP
|
||||
calls?
|
||||
* Throw it all in `IO`!
|
||||
* Concrete monad transformers
|
||||
* `mtl`-style typeclasses
|
||||
* Effect libraries...
|
||||
* Weak standard library => non-standard types => many different patterns
|
||||
|
||||
----
|
||||
|
||||
## End result
|
||||
|
||||
* Difficult for people new to the language to get started
|
||||
* Lack of standardization across team makes code bases difficult to
|
||||
maintain
|
||||
* Fear of dependencies ultimately leads to lots of reinvented
|
||||
functionality
|
||||
* Code bloat
|
||||
* More bugs
|
||||
|
||||
----
|
||||
|
||||
## Today's topic
|
||||
|
||||
* Cover a number of recommended libraries
|
||||
* Recommended by whom? Me :)
|
||||
* Discuss some best practices for putting projects together
|
||||
* Describe a new initiative to help bring this all together
|
||||
* Finally: how to help Haskell take over the world!
|
||||
|
||||
---
|
||||
|
||||
## Features to cover
|
||||
|
||||
* Data structures
|
||||
* I/O
|
||||
* Concurrency
|
||||
* Mutable data
|
||||
* Exception handling
|
||||
* External processes
|
||||
|
||||
Doesn't cover all needs, but most real programs will need almost all
|
||||
of these.
|
||||
|
||||
---
|
||||
|
||||
## Data structures
|
||||
|
||||
Three categories
|
||||
|
||||
* Sequential data
|
||||
* Map/Dictionary
|
||||
* Set
|
||||
|
||||
Sequential data is the most complicated, so let's knock out the other two first
|
||||
|
||||
---
|
||||
|
||||
## Maps
|
||||
|
||||
* Three core datatypes
|
||||
* `data Map key value`
|
||||
* `data IntMap value`
|
||||
* `data HashMap key value`
|
||||
* `IntMap` is a specialized, optimized `Map Int`
|
||||
* `Map` is a balanced binary tree, `HashMap` is (surprise!) a hash map
|
||||
* `Map` requires `Ord` on keys, `HashMap` requires `Hashable` and `Eq`
|
||||
* Generally: `HashMap` performs better
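A minimal sketch of the same kind of lookup against `Map` and `IntMap`, using only the `containers` package. `HashMap` (from `unordered-containers`) follows the same shape with `Hashable` keys. The names `ages` and `byId` are made up for illustration.

```haskell
import qualified Data.IntMap.Strict as IntMap
import qualified Data.Map.Strict as Map

-- Map only needs an Ord instance on the key type.
ages :: Map.Map String Int
ages = Map.fromList [("alice", 30), ("bob", 25)]

-- IntMap fixes the key type to Int.
byId :: IntMap.IntMap String
byId = IntMap.fromList [(1, "alice"), (2, "bob")]

main :: IO ()
main = do
  print (Map.lookup "alice" ages)
  print (Map.lookup "carol" ages)
  print (IntMap.lookup 2 byId)
```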
|
||||
|
||||
----
|
||||
|
||||
## Strict or lazy values
|
||||
|
||||
* Maps are always strict in their keys
|
||||
* Forcing a `Map` requires forcing all of its keys
|
||||
* You _can_ be lazy in the values if you want...
|
||||
* Usually: don't do that, use `Data.Map.Strict` et al
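A small sketch of the difference, assuming the `containers` implementations of `Data.Map.Lazy` and `Data.Map.Strict`. The helper names `lazySize` and `strictInsertFails` are hypothetical.

```haskell
import Control.Exception (SomeException, evaluate, try)
import qualified Data.Map.Lazy as ML
import qualified Data.Map.Strict as MS

-- The lazy API stores the value as an unevaluated thunk,
-- so asking for the size never touches it.
lazySize :: Int
lazySize = ML.size (ML.insert "k" (error "boom" :: Int) ML.empty)

-- The strict API evaluates the value (to WHNF) as part of the insert,
-- so forcing the resulting map raises the error.
strictInsertFails :: IO Bool
strictInsertFails = do
  r <- try (evaluate (MS.insert "k" (error "boom" :: Int) MS.empty))
         :: IO (Either SomeException (MS.Map String Int))
  pure (either (const True) (const False) r)

main :: IO ()
main = do
  print lazySize
  strictInsertFails >>= print
```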
|
||||
|
||||
----
|
||||
|
||||
## Mutability
|
||||
|
||||
* Unlike other languages, `Map`s are immutable
|
||||
* The less-used `hashtables` library provides in-place mutation
|
||||
* Immutable is nice: don't worry about data races
|
||||
* Stick it inside a `TVar`, `IORef`, etc
|
||||
* Downside: performance is not as good
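One way the "stick it inside an `IORef`" idea could look. This is a single-threaded sketch; with multiple threads, `atomicModifyIORef'` or a `TVar` would be the safer choice. `bump` is a hypothetical helper.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as Map

-- Pure update: returns a new map, the old one is untouched.
bump :: String -> Map.Map String Int -> Map.Map String Int
bump key = Map.insertWith (+) key 1

main :: IO ()
main = do
  -- The reference is mutable; the map it points at is not.
  ref <- newIORef Map.empty
  mapM_ (\k -> modifyIORef' ref (bump k)) ["a", "b", "a"]
  readIORef ref >>= print
```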
|
||||
|
||||
----
|
||||
|
||||
## Map API Overview
|
||||
|
||||
```haskell
import qualified Data.Map.Strict as Map
import qualified Data.IntMap.Strict as IntMap
import qualified Data.HashMap.Strict as HashMap

singleton :: k -> v -> Map k v
fromList :: [(k, v)] -> Map k v
toList :: Map k v -> [(k, v)]
lookup :: k -> Map k v -> Maybe v
insert :: k -> v -> Map k v -> Map k v
insertWith :: (v -> v -> v) -> k -> v -> Map k v -> Map k v
union :: Map k v -> Map k v -> Map k v
unionWith :: (v -> v -> v) -> Map k v -> Map k v -> Map k v
```
|
||||
|
||||
What about duplicates?
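A short sketch of how `Data.Map.Strict` treats duplicate keys, as far as I know its documented behavior. All top-level names here are illustrative.

```haskell
import qualified Data.Map.Strict as Map

-- fromList: the last value for a repeated key wins.
lastWins :: Map.Map Int String
lastWins = Map.fromList [(1, "a"), (1, "b")]

-- insert replaces; insertWith calls the function as f newValue oldValue.
replaced, combined :: Map.Map Int String
replaced = Map.insert 1 "new" lastWins
combined = Map.insertWith (++) 1 "new" lastWins

-- union is left-biased; unionWith lets the caller decide.
leftBiased, merged :: Map.Map Int String
leftBiased = Map.union (Map.fromList [(1, "left")]) (Map.fromList [(1, "right")])
merged = Map.unionWith (++) (Map.fromList [(1, "left")]) (Map.fromList [(1, "right")])

main :: IO ()
main = mapM_ print [lastWins, replaced, combined, leftBiased, merged]
```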
|
||||
|
||||
----
|
||||
|
||||
## Sets
|
||||
|
||||
* Just like `Map`s, but no values (or `()` is the value...)
|
||||
* No strict vs lazy difference... no values!
|
||||
* No worry about duplicate keys... no values!
|
||||
|
||||
----
|
||||
|
||||
## Set API Overview
|
||||
|
||||
```haskell
import qualified Data.Set as Set
import qualified Data.IntSet as IntSet
import qualified Data.HashSet as HashSet

singleton :: k -> Set k
fromList :: [k] -> Set k
toList :: Set k -> [k]
member :: k -> Set k -> Bool
insert :: k -> Set k -> Set k
union :: Set k -> Set k -> Set k
```
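A tiny usage sketch with `Data.Set` from `containers`. `uniqueSorted` is a made-up helper name.

```haskell
import qualified Data.Set as Set

-- Duplicates collapse, and toList returns elements in ascending order.
uniqueSorted :: [Int] -> [Int]
uniqueSorted = Set.toList . Set.fromList

main :: IO ()
main = do
  print (uniqueSorted [3, 1, 3, 2])
  print (Set.member 2 (Set.fromList [1, 2, 3 :: Int]))
```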
|
||||
|
||||
----
|
||||
|
||||
## Calculating frequency
|
||||
|
||||
```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.Map.Strict as Map

main :: IO ()
main = do
  -- Read all of stdin as a lazy ByteString
  lbs <- BL.getContents
  -- For each byte: start its count at 1, or add 1 to the existing count
  let add m w = Map.insertWith (+) w 1 m
  mapM_ print $ Map.toList $ BL.foldl' add Map.empty lbs
```
|
||||
|
||||
----
|
||||
|
||||
## More information
|
||||
|
||||
* https://haskell-lang.org/library/containers
|
||||
|
||||
---
|
||||
|
||||
## Sequential data
|
||||
|
||||
Everyone knows lists, right?
|
||||
|
||||
```haskell
(++) :: [a] -> [a] -> [a]
concat :: [[a]] -> [a]
map :: (a -> b) -> [a] -> [b]
break :: (a -> Bool) -> [a] -> ([a], [a])
splitAt :: Int -> [a] -> ([a], [a])
null :: [a] -> Bool
length :: [a] -> Int
reverse :: [a] -> [a]
intercalate :: [a] -> [[a]] -> [a]
foldl' :: (b -> a -> b) -> b -> [a] -> b
and :: [Bool] -> Bool
sum :: Num a => [a] -> a
replicate :: Int -> a -> [a]
```
|
||||
|
||||
----
|
||||
|
||||
## Lists: the good
|
||||
|
||||
* Polymorphic on any contained value
|
||||
* Lazy/infinite
|
||||
* Cheap prepend (singly linked list)
|
||||
* Pure data structure
|
||||
* Built in syntactic sugar
|
||||
* Easy to pattern match
|
||||
|
||||
----
|
||||
|
||||
## Lists: the bad
|
||||
|
||||
* Lots of memory overhead
|
||||
* Data constructor per cons
|
||||
* Pointer to value (1 word)
|
||||
* Pointer to rest of list (1 word)
|
||||
* O(n) indexing
|
||||
* Can hide bottom values (`1:2:undefined`)
|
||||
* Consider overhead of `[Word8]` and `[Char]`
|
||||
|
||||
What do?
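To make the hidden-bottom point concrete, here is a sketch using only `base`. The name `partial` is illustrative.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- A list whose spine turns into bottom after two elements.
partial :: [Int]
partial = 1 : 2 : undefined

main :: IO ()
main = do
  -- Only the first two cons cells are inspected, so this succeeds.
  print (take 2 partial)
  -- length walks the whole spine and runs into the bottom.
  r <- try (evaluate (length partial)) :: IO (Either SomeException Int)
  putStrLn (either (const "length failed") show r)
```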
|
||||
|
||||
----
|
||||
|
||||
## Other languages
|
||||
|
||||
Most languages have a few sequential data types
|
||||
|
||||
* Linked list/doubly linked lists
|
||||
* Queue/double-ended queue
|
||||
* Array/vector
|
||||
|
||||
----
|
||||
|
||||
## Haskell's plethora
|
||||
|
||||
* Lists/difference lists
|
||||
* Seq
|
||||
* Arrays (don't bother, use vector)
|
||||
* Vector: boxed, storable, unboxed
|
||||
* ByteString: strict, lazy
|
||||
* Text: strict, lazy
|
||||
* ShortByteString (seriously?)
|
||||
|
||||
Why so many?
|
||||
|
||||
----
|
||||
|
||||
## Haskell's memory model
|
||||
|
||||
We have four ways of storing sequential data
|
||||
|
||||
* Entirely as normal heap objects (list, Seq, diff lists)
|
||||
* Primitive boxed arrays (boxed vector)
|
||||
* Unpinned memory (Text, ShortByteString, unboxed vector)
|
||||
* Pinned memory (ByteString, storable vector)
|
||||
|
||||
----
|
||||
|
||||
## Heap objects
|
||||
|
||||
* Pointers, pointers everywhere
|
||||
* Memory overhead for the allocations/GC
|
||||
* CPU overhead for following pointers
|
||||
|
||||
----
|
||||
|
||||
## Primitive boxed arrays
|
||||
|
||||
* Packed representation of pointers
|
||||
* Still follow pointers to the values
|
||||
* Pointers can point to thunks, which is why they're value lazy
|
||||
* Less pointer overhead, but still some
|
||||
* Allows _any heap object_ to be stored
|
||||
|
||||
----
|
||||
|
||||
## Unpinned memory
|
||||
|
||||
* Byte array managed by garbage collector
|
||||
* GC can move it around
|
||||
* Reducing fragmentation
|
||||
* Can't pass it over FFI
|
||||
* Values stored as bytes
|
||||
* Must be representable as bytes
|
||||
* Must represent in fixed size
|
||||
* Cannot be lazy
|
||||
|
||||
----
|
||||
|
||||
## Pinned memory
|
||||
|
||||
* Standard `malloc`-style buffers
|
||||
* GC __can't__ move it around
|
||||
* Can fragment memory (don't hold for too long)
|
||||
* Can pass it over FFI
|
||||
* Values stored as bytes, same as unpinned
|
||||
|
||||
----
|
||||
|
||||
## Haskell's laziness
|
||||
|
||||
Four levels of laziness in these data structures
|
||||
|
||||
* Fully lazy, both the values and the structure itself are lazy (e.g.,
list `foo:bar:undefined`)
|
||||
* Spine strict: values can be lazy, but not the structure (e.g., boxed
|
||||
vector, `fromList [undefined]`)
|
||||
* Fully strict: nothing lazy (e.g., `ByteString`, this fails `fromList
|
||||
[undefined]`)
|
||||
* Semi-strict: lazy list of strict chunks (lazy `ByteString` and `Text`)
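A rough sketch of three of these levels using only libraries that ship with GHC. Note the substitution: `Seq` stands in for the boxed vector as the spine-strict, value-lazy example, since `vector` is a separate package. `seqLength` and `chunkCount` are hypothetical names.

```haskell
import Control.Exception (SomeException, evaluate, try)
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import qualified Data.Sequence as Seq

-- Spine strict, value lazy: the length is known without touching elements.
seqLength :: Int
seqLength = Seq.length (Seq.fromList [undefined, undefined :: Int])

-- Lazy ByteString: a lazy list of strict chunks.
chunkCount :: Int
chunkCount = length (BL.toChunks (BL.fromChunks [B.pack [1, 2], B.pack [3]]))

main :: IO ()
main = do
  print seqLength
  print chunkCount
  -- Fully strict: packing writes every byte, so a bottom element fails.
  r <- try (evaluate (B.pack [1, undefined]))
         :: IO (Either SomeException B.ByteString)
  putStrLn (either (const "pack failed") (const "pack succeeded") r)
```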
|
||||
|
||||
----
|
||||
|
||||
## Overlaps
|
||||
|
||||
That still leaves us with some overlaps
|
||||
|
||||
* List vs diff list vs `Seq`: different time complexity for some
|
||||
operations
|
||||
* `ShortByteString` vs unboxed `Vector Word8`: same thing
|
||||
* `ByteString` vs storable `Vector Word8`: also same thing
|
||||
* `Text` does not overlap: it's a `ShortByteString` containing UTF-16
code units with a `Char`-based API
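For the list vs diff list vs `Seq` point, a hand-rolled sketch. The `DList` type and its helpers are written here for illustration only (the `dlist` package offers a real one); `snocAll` is likewise a made-up name.

```haskell
import Data.Foldable (toList)
import Data.Sequence ((|>))
import qualified Data.Sequence as Seq

-- A difference list is a function that prepends its contents.
newtype DList a = DList ([a] -> [a])

fromListD :: [a] -> DList a
fromListD xs = DList (xs ++)

-- Append is function composition: O(1) regardless of length.
appendD :: DList a -> DList a -> DList a
appendD (DList f) (DList g) = DList (f . g)

-- Inspection means converting back to a plain list first.
toListD :: DList a -> [a]
toListD (DList f) = f []

-- Seq gives cheap append at the end and still allows indexing.
snocAll :: [a] -> Seq.Seq a
snocAll = foldl (|>) Seq.empty

main :: IO ()
main = do
  print (toListD (fromListD [1, 2] `appendD` fromListD [3 :: Int]))
  print (toList (snocAll "abc"))
  print (Seq.index (snocAll "abc") 1)
```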
|
||||
|
||||
----
|
||||
|
||||
## What to use?
|
||||
|
||||
* `ShortByteString`: smaller and long lived
|
||||
* `ByteString` interacting with FFI (I/O)
|
||||
* `Text` for storing textual values
|
||||
* If unboxed storage works and you don't need FFI: unboxed vector
|
||||
* If it works _and_ you need FFI: storable vector
|
||||
* Need spine laziness (e.g., infinite), use lists
|
||||
* Unusual optimizations
|
||||
* Cheap append _and_ inspection: `Seq`
|
||||
* Cheap append: difference lists
|
||||
* Otherwise: boxed vector
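A sketch of the first two bullets together: keep small, long-lived keys as `ShortByteString` and convert to `ByteString` only at the I/O boundary. `Index`, `remember`, and `keysAsBytes` are hypothetical names.

```haskell
import qualified Data.ByteString as B
import Data.ByteString.Short (ShortByteString, fromShort, toShort)
import qualified Data.Map.Strict as Map

-- Long-lived index keyed by small byte strings held in unpinned memory.
type Index = Map.Map ShortByteString Int

-- ByteString comes in from I/O; the ShortByteString copy is what gets stored.
remember :: B.ByteString -> Int -> Index -> Index
remember key = Map.insert (toShort key)

-- Convert back when handing bytes to I/O or FFI.
keysAsBytes :: Index -> [B.ByteString]
keysAsBytes = map fromShort . Map.keys

main :: IO ()
main = print (keysAsBytes (remember (B.pack [1, 2]) 7 Map.empty))
```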
|
||||
|
||||
----
|
||||
|
||||
## The string problem
|
||||
|
||||
* Lots of theoretical ways to represent string-like stuff
|
||||
* `[Char]`, strict/lazy `ByteString`/`Text`, `ShortByteString`,
|
||||
`Vector` of `Word8` or `Char`...
|
||||
* `Vector` of `Char` is always a bad idea: too much memory
|
||||
* Good that we have bytes vs text difference
|
||||
* Need to use `Text` instead of `String` everywhere... not there yet
|
||||
* Conclusion: use strict `ByteString` or `Text` almost everywhere,
|
||||
convert when necessary
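"Convert when necessary" might look like the following sketch, using `text` and `bytestring`. `toUtf8` and `fromUtf8` are illustrative helper names, and UTF-8 is an assumed encoding choice.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- String -> Text -> UTF-8 bytes.
toUtf8 :: String -> B.ByteString
toUtf8 = TE.encodeUtf8 . T.pack

-- Bytes -> Text can fail, so decodeUtf8' returns an Either.
fromUtf8 :: B.ByteString -> Maybe String
fromUtf8 bs = either (const Nothing) (Just . T.unpack) (TE.decodeUtf8' bs)

main :: IO ()
main = do
  let bytes = toUtf8 "h\233llo"
  -- Character count and byte count differ for non-ASCII text.
  print (length "h\233llo", B.length bytes)
  print (fromUtf8 bytes)
  -- 0xff is never valid in UTF-8.
  print (fromUtf8 (B.pack [0xff]))
```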
|
||||
|
||||
----
|
||||
|
||||
## Why not use...
|
||||
|
||||
* Lazy `ByteString` or `Text`
|
||||
* Useful sometimes, like lazily generating large data
|
||||
* Mostly used for lazy I/O, which I advise against (use conduit
|
||||
instead)
|
||||
* Unboxed/storable vectors instead of `ByteString`/`ShortByteString`
|
||||
* Hysterical raisins, probably the right thing
|
||||
* Lots of Rust envy here :(
|
||||
* Lists everywhere
|
||||
* Performance
|
||||
|
||||
----
|
||||
|
||||
## The good news
|
||||
|
||||
* You know lists, right?
|
||||
* You basically know all of these data structures
|
||||
* Don't get overwhelmed with the choices, just follow the advice above
|
||||
|
||||
----
|
||||
|
||||
## Qualified imports (the bad news?)
|
||||
|
||||
* Since these all have similar APIs, names conflict
|
||||
* Use qualified imports
|
||||
* Recommended naming
|
||||
|
||||
```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text.Lazy as TL
import qualified Data.Vector as V -- boxed
import qualified Data.Vector.Unboxed as VU
import qualified Data.Vector.Storable as VS
import Data.ByteString.Short
  (ShortByteString, toShort, fromShort)
```
|
||||
|
||||
----
|
||||
|
||||
## Further reading
|
||||
|
||||
* https://haskell-lang.org/library/vector
|
||||
* https://haskell-lang.org/tutorial/string-types
|
||||
|
||||
----
|
||||
|
||||
## I/O
|
||||
|
||||
* Not the monad, actual input and output
|
||||
* Console
|
||||
* Files
|
||||
* Network
|
||||
* Streaming and in memory
|
||||
|
||||
----
|
||||
|
||||
## The bad: character I/O
|
||||
|
||||
* Implicit character decoding
|
||||
* Newline handling
|
||||
* Environment variables affect things
|
||||
* For console: probably right
|
||||
* File I/O: probably wrong
|
||||
* Network I/O: lol nope
|
||||
|
||||
https://www.snoyman.com/blog/2016/12/beware-of-readfile
|
||||
|
||||
----
|
||||
|
||||
## The bad: lazy I/O
|
||||
|
||||
* Lazy `readFile` (et al)
|
||||
* Hides exceptions till later
|
||||
* Keeps file descriptors open longer than expected
|
||||
* Answer: use strict I/O operations
|
||||
* Dealing with large data? Use conduit
|
||||
* Exception to the rule: lazy `writeFile` is fine (not actually lazy
|
||||
I/O)
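A sketch of the strict alternative: read the bytes strictly, then decode explicitly. It assumes UTF-8 with lenient decoding; `decodeLenient`, `readFileUtf8`, and the file name `example.txt` are made up for the example.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Text.Encoding.Error (lenientDecode)
import qualified Data.Text.IO as TIO

-- Explicit decoding: invalid bytes are replaced instead of throwing.
decodeLenient :: B.ByteString -> T.Text
decodeLenient = TE.decodeUtf8With lenientDecode

-- Strict read: the whole file is in memory, and any read error
-- surfaces here rather than later in pure code.
readFileUtf8 :: FilePath -> IO T.Text
readFileUtf8 fp = decodeLenient <$> B.readFile fp

main :: IO ()
main = do
  B.writeFile "example.txt" (TE.encodeUtf8 (T.pack "hello\n"))
  readFileUtf8 "example.txt" >>= TIO.putStr
```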
|
||||
|
||||
----
|
||||
|
||||
## Reading a file
|
||||
|
||||
* Want bytes? `Data.ByteString.readFile`
|
||||
* Want text? Choose an encoding!
|
||||
* `decodeUtf8With lenientDecode <$> Data.ByteString.readFile`
|
||||
*