Vector package

parent c0c50e5068
commit 50b69ec8e5
2 changed files with 616 additions and 1 deletion

content/vector.md (Normal file, 615 additions)
@@ -0,0 +1,615 @@

---
title: The vector package
author: Michael Snoyman <michael@snoyman.com>
description: Overview and typical usage of the vector package
first-written: 2015-10-25
last-updated: 2015-10-25
last-reviewed: 2015-10-25
---

The de facto standard package in the Haskell ecosystem for integer-indexed
array data is the [vector package](http://www.stackage.org/package/vector).
This corresponds at a high level to arrays in C, or the vector class in C++'s
STL. However, the vector package offers quite a bit of functionality that will
be unfamiliar to those used to the options available in imperative and mutable
languages.

While the interface for vector is relatively straightforward, the abundance of
different modules can be daunting. This article will start off with an overview
of terminology to guide you, and then step through a number of concrete
examples of using the package.

## Example

Since we're about to jump into a few sections of descriptive text, let's kick
this off with a concrete example to whet your appetite. We're going to count
the frequency of different bytes that appear on standard input, and then
display this content.

Note that this example is purposely written in a very generic form. We'll build
up to handling this form throughout this article.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.Primitive (PrimMonad, PrimState)
import qualified Data.ByteString.Lazy as L
import qualified Data.Vector.Generic.Mutable as M
import qualified Data.Vector.Unboxed as U
import Data.Word (Word8)

main :: IO ()
main = do
    -- Get all of the contents from stdin
    lbs <- L.getContents

    -- Create a new 256-size mutable vector
    -- Fill the vector with zeros
    mutable <- M.replicate 256 0

    -- Add all of the bytes from stdin
    addBytes mutable lbs

    -- Freeze to get an immutable version
    vector <- U.unsafeFreeze mutable

    -- Print the frequency of each byte
    -- In newer vectors: we can use imapM_
    U.zipWithM_ printFreq (U.enumFromTo 0 255) vector

addBytes :: (PrimMonad m, M.MVector v Int)
         => v (PrimState m) Int
         -> L.ByteString
         -> m ()
addBytes v lbs = mapM_ (addByte v) (L.unpack lbs)

addByte :: (PrimMonad m, M.MVector v Int)
        => v (PrimState m) Int
        -> Word8
        -> m ()
addByte v w = do
    -- Read out the old count value
    oldCount <- M.read v index
    -- Write back the updated count value
    M.write v index (oldCount + 1)
  where
    -- Indices in vectors are always Ints. Our bytes come in as Word8, so we
    -- need to convert them.
    index :: Int
    index = fromIntegral w

printFreq :: Int -> Int -> IO ()
printFreq index count = putStrLn $ concat
    [ "Frequency of byte "
    , show index
    , ": "
    , show count
    ]
```

## Terminology

There are two different varieties of vectors: immutable and mutable. Immutable
vectors (such as those provided by the `Data.Vector` module) are essentially
swappable with normal lists in Haskell, though with drastically different
performance characteristics (discussed below). Their high-level API is similar
to that of lists, they implement common typeclasses like `Functor` and
`Foldable`, and they play quite nicely with parallel code.

By contrast, mutable vectors are much closer to C-style arrays. Operations
working on these values must live in the `IO` or `ST` monads (see `PrimMonad`
below for more details). Concurrent access from multiple threads has all of the
normal concerns of shared mutable state. And perhaps most importantly for
usage: mutable vectors can be *much* more efficient for certain use cases.

However, that's not the only dimension of choice you get in the vector package.
vector itself defines three flavors: boxed
(`Data.Vector`/`Data.Vector.Mutable`), storable (`Data.Vector.Storable` and
`Data.Vector.Storable.Mutable`), and unboxed (`Data.Vector.Unboxed` and
`Data.Vector.Unboxed.Mutable`). (There are also technically primitive vectors,
but in practice you should always prefer unboxed vectors; see the module
documentation for more information on the distinction here.)

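As a small illustration (a sketch, not code from the original article), here is the same data stored in each of the three flavors. Only the qualified import prefix changes. The hypothetical `labels` binding also shows that boxed vectors accept any element type.

```haskell
import qualified Data.Vector          as B  -- boxed
import qualified Data.Vector.Storable as S  -- storable
import qualified Data.Vector.Unboxed  as U  -- unboxed

-- The same numbers, stored in each of the three flavors.
boxed :: B.Vector Int
boxed = B.fromList [1 .. 10]

storable :: S.Vector Int
storable = S.fromList [1 .. 10]

unboxed :: U.Vector Int
unboxed = U.fromList [1 .. 10]

-- The API is (almost) identical: only the module prefix differs.
sums :: (Int, Int, Int)
sums = (B.sum boxed, S.sum storable, U.sum unboxed)

-- Boxed vectors can hold any type. S.fromList or U.fromList would be
-- rejected here, because String has no Storable or Unbox instance.
labels :: B.Vector String
labels = B.fromList ["one", "two"]

main :: IO ()
main = do
  print sums  -- (55,55,55)
  print labels
```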
And our final point: in addition to having these three flavors, the vector
package provides a typeclass-based interface which allows you to write code
that works in any of these three (plus other vector types that may be defined
in other packages, like
[hybrid-vectors](http://www.stackage.org/package/hybrid-vectors)). These
interfaces are in `Data.Vector.Generic` and `Data.Vector.Generic.Mutable`. When
using these interfaces, you must still eventually choose a concrete
representation, but your helper code can be agnostic to what it is.

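Here is a minimal sketch of such representation-agnostic helper code. It assumes only the `Data.Vector.Generic` interface, and `mean` is a hypothetical name chosen for illustration. The caller picks the concrete representation.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import qualified Data.Vector         as B
import qualified Data.Vector.Generic as G
import qualified Data.Vector.Unboxed as U

-- Written once against the generic interface: works for any vector
-- type that has a G.Vector instance for Double.
mean :: G.Vector v Double => v Double -> Double
mean v
  | G.null v  = 0  -- avoid dividing by zero on an empty vector
  | otherwise = G.sum v / fromIntegral (G.length v)

main :: IO ()
main = do
  -- The concrete representation is chosen at each call site.
  print (mean (B.fromList [1, 2, 3, 4 :: Double]))
  print (mean (U.fromList [1, 2, 3, 4 :: Double]))
```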
What's nice is that - with small differences - all four mutable modules have
the same interface, and all four immutable modules have the same interface.
This means you can focus on learning one type of vector, and almost for free
have that knowledge apply to other types as well. It then just becomes a
question of choosing the representation that best fits your use case, which
we'll get to shortly.

## Efficiency

Standard lists in Haskell are immutable, singly-linked lists. Every time you
add another value to the front of the list, it has to allocate another heap
object for that cell, create a pointer to the head of the original list, and
create a pointer to the value in the current cell. This takes up a lot of
memory for holding pointers, and makes it inefficient to index or traverse the
list (indexing to position N requires N pointer dereferences).

By contrast, vectors are stored in a packed format in memory, meaning indexing
is an O(1) operation, and the memory overhead per additional item in the vector
is much smaller (depending on the type of vector, which we'll cover in a
moment). However, compared to lists, appending an item to a vector is
relatively expensive: it requires creating a new buffer in memory, copying the
old values, and then adding the new value.

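To make the append cost concrete, here is an illustrative sketch with two hypothetical helpers that build the vector `[0, 1, ..., n-1]`. One uses repeated `snoc`, where each call copies the whole buffer, so the total is O(n^2). The other uses `generate`, which allocates once. The sketch also shows O(1) indexing with `!`.

```haskell
import Data.List (foldl')
import qualified Data.Vector.Unboxed as U

-- O(n^2) overall: every snoc allocates a new buffer and copies all
-- existing elements into it before adding the new one.
buildBySnoc :: Int -> U.Vector Int
buildBySnoc n = foldl' U.snoc U.empty [0 .. n - 1]

-- O(n): the final length is known up front, so a single buffer is
-- allocated and filled by index.
buildByGenerate :: Int -> U.Vector Int
buildByGenerate n = U.generate n id

-- Indexing is O(1), unlike (!!) on a list. Assumes a non-empty vector.
lastElem :: U.Vector Int -> Int
lastElem v = v U.! (U.length v - 1)

main :: IO ()
main = do
  print (buildBySnoc 5)
  print (buildByGenerate 5 == buildBySnoc 5)
  print (lastElem (buildByGenerate 1000000))
```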
There are other data structures that can be considered for list-like data, such
as `Seq` from containers, or in some cases a `Set`, `IntMap`, or `Map`. The
best choice for each use case can only be reliably determined via profiling.
But as a general rule: densely populated collections accessed by integer index
will be best served by vector.

Now let's talk about some of the other things that make vector so efficient.

### Boxed, storable and unboxed

Boxed vectors hold normal Haskell values. These can be _any_ values at all, and
are stored on the heap with pointers kept in the vector. The advantage is that
this works for all datatypes, but the extra memory overhead for the pointers
and the indirection of needing to dereference those pointers makes them
(relative to the next two types) inefficient.

Storable and unboxed vectors both store their data in a byte array, avoiding
pointer indirection. This is more memory efficient and allows better usage of
caches. The distinction between storable and unboxed vectors is subtle:

* Storable vectors require data which is an instance of the [`Storable` type
  class](http://haddock.stackage.org/lts-3.11/base-4.8.1.0/Foreign-Storable.html#t:Storable).
  This data is stored in `malloc`ed memory, which is *pinned* (the garbage
  collector can't move it around). This can lead to memory fragmentation, but
  allows the data to be shared over the C FFI.
* Unboxed vectors require data which is an instance of the `Unbox` type class;
  for basic types such as `Int` and `Double`, the storage is built on the
  [`Prim` type
  class](http://haddock.stackage.org/lts-3.11/primitive-0.6.1.0/Data-Primitive-Types.html).
  This data is stored in GC-managed *unpinned* memory, which helps avoid memory
  fragmentation. However, this data cannot be shared over the C FFI.

Both the `Storable` and `Prim` typeclasses provide a way to store a value as
bytes, and to load bytes into a value. The distinction is what type of
bytearray is used.

As usual, the only true measure of performance will be benchmarking. However,
as a general guideline:

* If you don't need to pass values to a C FFI, and you have an `Unbox`
  instance, use unboxed vectors.
* If you have a `Storable` instance, use a storable vector.
* Otherwise, use a boxed vector.

There are also other issues to consider, such as the fact that boxed vectors
are instances of `Functor` while storable and unboxed vectors are not.

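For example (a short sketch), `fmap` works directly on a boxed vector, while an unboxed vector needs the module's own `map`. The `Unbox` constraint on the element type is what rules out a `Functor` instance, because `fmap` has to work for every element type.

```haskell
import qualified Data.Vector         as B
import qualified Data.Vector.Unboxed as U

-- Boxed vectors are Functors, so fmap (and <$>) work directly.
doubledBoxed :: B.Vector Int
doubledBoxed = fmap (* 2) (B.fromList [1, 2, 3])

-- Unboxed vectors require an Unbox instance for the element type, so they
-- cannot be Functor instances; use the module's map function instead.
doubledUnboxed :: U.Vector Int
doubledUnboxed = U.map (* 2) (U.fromList [1, 2, 3])

-- fmap (* 2) (U.fromList [1, 2, 3 :: Int])  -- would not type check

main :: IO ()
main = do
  print doubledBoxed
  print doubledUnboxed
```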
### Stream fusion

Take a guess how much memory the following program will take to run:

```haskell
import qualified Data.Vector.Unboxed as V

main :: IO ()
main = print $ V.sum $ V.enumFromTo 1 (10^9 :: Int)
```

A valid guess may be `10^9 * sizeof int` bytes. However, when compiled with
optimizations (`-O2`) on my system, it allocates a total of only 52kb! How is
it possible to create a one-billion-integer array without using up 4-8GB of
memory?

The vector package has a powerful technique: stream fusion. Using GHC rewrite
rules, it's able to find many cases where creating a vector is unnecessary, and
instead create a tight inner loop. In our case, GHC will end up generating code
that can avoid touching system memory, and instead work on just the registers,
yielding not only a tiny memory footprint, but performance close to a for-loop
in C. This is one of the beauties of this library: you get to write high-level
code, and optimizations can churn out something much more CPU-friendly.

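One way to picture the result: fusion aims to turn something like the high-level `sumVector` below into something like the hand-written `sumLoop`. This is a conceptual sketch, not GHC's actual output. `sumVector` and `sumLoop` are hypothetical names, and whether fusion actually fires depends on the optimization flags.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.Vector.Unboxed as V

-- The high-level version from above, parameterized on n.
sumVector :: Int -> Int
sumVector n = V.sum (V.enumFromTo 1 n)

-- Roughly the shape of loop that fusion aims for: a strict counter and a
-- strict accumulator, with no intermediate vector ever allocated.
-- (Illustration only; this is not GHC's generated code.)
sumLoop :: Int -> Int
sumLoop n = go 1 0
  where
    go !i !acc
      | i > n     = acc
      | otherwise = go (i + 1) (acc + i)

main :: IO ()
main = print (sumVector 1000000, sumLoop 1000000)
```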
### Slicing

Above we discussed how expensive it is to append values to a vector. However,
one place where vector shines is with *slicing*, or taking a subset of the
vector. A slice simply refers to part of the original buffer, so taking one is
O(1) and copies nothing. When dealing with immutable vectors, slicing is a safe
operation, and slices can be shared freely between multiple threads. Slicing
also works with mutable vectors, but as usual you need to be a bit more careful.

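Here is a brief sketch of slicing with unboxed vectors. `U.slice`, `U.splitAt`, and the mutable `MU.slice` all share the parent's buffer. The last binding shows why mutable slices need care: a write through the slice changes the parent. The binding names are illustrative.

```haskell
import Control.Monad.ST (runST)
import qualified Data.Vector.Unboxed         as U
import qualified Data.Vector.Unboxed.Mutable as MU

-- Immutable slices are O(1): they share the parent's buffer.
middle :: U.Vector Int
middle = U.slice 2 3 (U.fromList [0 .. 9])  -- start index 2, length 3

-- take, drop and splitAt are O(1) slicing operations too.
halves :: (U.Vector Int, U.Vector Int)
halves = U.splitAt 5 (U.fromList [0 .. 9])

-- Mutable slices share memory as well: a write through the slice is
-- visible in the parent vector.
writeThroughSlice :: U.Vector Int
writeThroughSlice = runST $ do
  parent <- MU.replicate 5 0
  let sub = MU.slice 1 2 parent  -- views indices 1 and 2 of parent
  MU.write sub 0 42              -- index 0 of sub is index 1 of parent
  U.freeze parent

main :: IO ()
main = do
  print middle
  print halves
  print writeThroughSlice
```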
## Replacing lists

Enough talk! Let's start using vector. Assuming you're familiar with the list
API, this should look rather boring.

```haskell
import qualified Data.Vector as V

main :: IO ()
main = do
    let list = [1..10] :: [Int]
        vector = V.fromList list :: V.Vector Int
        vector2 = V.enumFromTo 1 10 :: V.Vector Int
    print $ vector == vector2 -- True
    print $ list == V.toList vector -- also True
    print $ V.filter odd vector -- 1,3,5,7,9
    print $ V.map (* 2) vector -- 2,4,6,...,20
    print $ V.zip vector vector -- (1,1),(2,2),...(10,10)
    print $ V.zipWith (*) vector vector -- 1,4,9,16,...,100
    print $ V.reverse vector -- 10,9,...,1
    print $ V.takeWhile (< 6) vector -- 1,2,3,4,5
    print $ V.takeWhile odd vector -- 1
    print $ V.takeWhile even vector -- []
    print $ V.dropWhile (< 6) vector -- 6,7,8,9,10
    print $ V.head vector -- 1
    print $ V.tail vector -- 2,3,4,...,10
    print $ V.head $ V.takeWhile even vector -- exception!
```

Hopefully there's nothing too surprising about this. Most `Prelude` functions
that apply to lists have a corresponding vector function. If you know what a
function does in `Prelude`, you probably know what it does in `Data.Vector`.
This is the simplest usage of the vector package: import `Data.Vector`
qualified, convert to/from lists with `V.fromList` and `V.toList`, and then
prefix your function calls with `V.`.

* Exercise 1: Try out some other functions available in the [`Data.Vector`
  module](http://haddock.stackage.org/lts-3.11/vector-0.10.12.3/Data-Vector.html).
  In particular, try some of the fold functions, which we haven't covered here.

* Exercise 2: Try using the `Functor`, `Foldable`, and `Traversable` versions of
  functions with a vector.

* Exercise 3: Use an unboxed (or storable) vector instead of the boxed vectors
  we were using above. What code did you have to change from the original
  example? Do your examples from exercise 2 all still work?

There are also a number of functions in the `Data.Vector` module with no
corresponding function in `Prelude`. Many of these are related to mutable
vectors (which we'll cover shortly). Others are present to provide more
efficient means of manipulating a vector, based on their special in-memory
representation.

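Two examples of such representation-aware functions are the bulk-update operators `(//)` and `accum`. Each applies many updates with a single copy of the vector. Here is a sketch; `updated` and `digitCounts` are hypothetical names.

```haskell
import qualified Data.Vector.Unboxed as U

-- Bulk update: (//) replaces several positions with one copy,
-- rather than one copy per update.
updated :: U.Vector Int
updated = U.replicate 5 0 U.// [(1, 10), (3, 30)]

-- accum combines a list of (index, value) pairs into a vector, again with
-- a single copy. Here it counts how often each digit 0..9 appears.
-- An index outside 0..9 would raise an out-of-bounds exception.
digitCounts :: [Int] -> U.Vector Int
digitCounts ds = U.accum (+) (U.replicate 10 0) [(d, 1) | d <- ds]

main :: IO ()
main = do
  print updated
  print (digitCounts [3, 1, 4, 1, 5, 9])
```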
## Mutable vectors

I want to test how fair the `System.Random` number generator is at generating
numbers between 0 and 9, inclusive. I want to generate 1,000,000 random values,
count the frequency of each result, and then print how often each value
appeared. Let's first implement this using immutable vectors:

```haskell
import Data.Vector.Unboxed ((!), (//))
import qualified Data.Vector.Unboxed as V
import System.Random (randomRIO)

main :: IO ()
main = do
    let v0 = V.replicate 10 (0 :: Int)

        loop v 0 = return v
        loop v rest = do
            i <- randomRIO (0, 9)
            let oldCount = v ! i
                v' = v // [(i, oldCount + 1)]
            loop v' (rest - 1)

    vector <- loop v0 (10^6)
    print vector
```

We've introduced the `!` operator for indexing, and the `//` operator for
updating. Other than that, this is fairly straightforward code. When I ran this
on my system, it had 48MB maximum memory residency, and took 1.968s to
complete. Surely we can do better.

This problem is inherently better suited to mutable state: instead of
generating a new immutable `Vector` for each random number generated, we'd like
to simply increment a piece of memory. Let's rewrite this to use a mutable,
unboxed vector:

```haskell
import Control.Monad (replicateM_)
import Data.Vector.Unboxed (freeze)
import qualified Data.Vector.Unboxed.Mutable as V
import System.Random (randomRIO)

main :: IO ()
main = do
    vector <- V.replicate 10 (0 :: Int)

    replicateM_ (10^6) $ do
        i <- randomRIO (0, 9)
        oldCount <- V.read vector i
        V.write vector i (oldCount + 1)

    ivector <- freeze vector
    print ivector
```

Once again, we use `replicate` to create a size-10 vector filled with 0. But
now we've created a mutable vector (note the change in import). We then use
`replicateM_` to perform the inner action 1,000,000 times, namely: generate a
random index, read the old value at that index, increment it, and write it
back.

After we're finished, we _freeze_ the vector (more on that in the next section)
and print it. The results are similar to those of the previous immutable
version (not identical, since we are dealing with random numbers here). But
instead of 48MB and 1.968s, this program has a maximum residency of 44KB and
runs in 0.247s! That's a significant improvement!

If we feel like being even more adventurous, we can replace our `read` and
`write` calls with `unsafeRead` and `unsafeWrite`. That will disable the
bounds checks before reading and writing. This can be a nice performance boost
in very tight loops, but has the potential to segfault your program, so caveat
emptor! For example, try replacing `replicate 10` with `replicate 9`, swap
`read` and `write` for `unsafeRead` and `unsafeWrite`, and run your program.
You'll see something like:

```
internal error: evacuate: strange closure type -1944718914
    (GHC version 7.10.2 for x86_64_unknown_linux)
    Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug
Aborted (core dumped)
```

The same logic applies to the other `unsafe` functions in vector. The
nomenclature means: `unsafe` may segfault your whole process, while
not-marked-unsafe may just throw an impure exception (also not great, but
certainly better than a segfault).

And if you were curious: on my system using `unsafeRead` and `unsafeWrite`
speeds the program up marginally, from 0.247s to 0.233s. In our example, most
of our time is spent on generating the random numbers, so taking off the safety
checks does not have a significant impact.

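The unsafe variants can be used soundly when the code itself guarantees that every index is in range. The sketch below is a hypothetical `histogram` helper, not part of vector. It performs its own range check, which makes the unchecked `unsafeRead`/`unsafeWrite` calls safe. It runs in `ST` (covered under `PrimMonad` below) so that the result is a pure value.

```haskell
import Control.Monad (forM_, when)
import Control.Monad.ST (runST)
import qualified Data.Vector.Unboxed         as U
import qualified Data.Vector.Unboxed.Mutable as MU

-- Counts the values that fall in [0, n) and ignores everything else.
-- The explicit range check is what makes unsafeRead/unsafeWrite sound.
histogram :: Int -> [Int] -> U.Vector Int
histogram n xs = runST $ do
  counts <- MU.replicate n 0
  forM_ xs $ \i ->
    when (i >= 0 && i < n) $ do
      old <- MU.unsafeRead counts i    -- no bounds check: i is known valid
      MU.unsafeWrite counts i (old + 1)
  U.freeze counts

main :: IO ()
main = print (histogram 3 [0, 1, 1, 2, 7, -1])
```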
## Freezing and thawing

We used the `freeze` function above. The behavior of this may not be
immediately obvious. When you freeze a mutable vector, what happens is:

1. A new mutable vector of the same size is created
2. Each value in the original mutable vector is copied to the new mutable vector
3. A new immutable vector is created out of the memory space used by the new mutable vector

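The three steps can be sketched with `new`, `copy`, and `unsafeFreeze`. `freezeCopy` is a hypothetical helper written only to illustrate the steps; in real code, just call `freeze`. The `demo` binding shows that a later write to the mutable vector does not affect the frozen copy.

```haskell
import Control.Monad.Primitive (PrimMonad, PrimState)
import Control.Monad.ST (runST)
import qualified Data.Vector.Unboxed         as U
import qualified Data.Vector.Unboxed.Mutable as MU

-- Illustrative only: mirrors the three steps of freeze.
freezeCopy :: (PrimMonad m, U.Unbox a)
           => MU.MVector (PrimState m) a -> m (U.Vector a)
freezeCopy mv = do
  fresh <- MU.new (MU.length mv)  -- 1. new mutable vector, same size
  MU.copy fresh mv                -- 2. copy every value (target, then source)
  U.unsafeFreeze fresh            -- 3. reuse fresh's memory as an immutable vector

demo :: (U.Vector Int, U.Vector Int)
demo = runST $ do
  mv <- MU.replicate 3 1
  frozen <- freezeCopy mv
  MU.write mv 0 99                -- does not affect `frozen`
  after <- freezeCopy mv
  return (frozen, after)

main :: IO ()
main = print demo
```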
Why not just freeze it in place? Two reasons, actually:

1. It has the potential to break referential transparency. Consider this code:

   ```haskell
   import Data.Vector.Unboxed (freeze)
   import qualified Data.Vector.Unboxed.Mutable as V

   main :: IO ()
   main = do
       vector <- V.replicate 1 (0 :: Int)
       V.write vector 0 1
       ivector <- freeze vector
       print ivector
       V.write vector 0 2
       print ivector
   ```

   If we froze the vector in-place in the call to `freeze`, then the second
   `write` call would modify our `ivector` value, meaning that the first and
   second call to `print ivector` would have different results!

2. When you freeze a mutable vector, its memory is marked differently for
   garbage collection purposes. Later trying to write to that same memory can
   lead to a segfault.

However, if you really want to avoid that extra buffer copy, and are certain
it's safe, you can use `unsafeFreeze`. And in fact, our random number example
above is a case where `freeze` can be safely replaced by `unsafeFreeze`, since
after the freeze, the original mutable vector is never used again.

* Exercise 1: Go ahead and make that swap and confirm that your program works
  as expected.
* Exercise 2: In the program just above (with `V.replicate 1 (0 :: Int)`),
  replace `freeze` with `unsafeFreeze`. What result do you see?

The opposite of `freeze` is `thaw`. Similar to `freeze`, `thaw` will copy to a
new mutable vector instead of exposing the current memory buffer. And also,
like `freeze`, there's an `unsafeThaw` that turns off the safety measures. Like
everything `unsafe`: caveat emptor!

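Here is a small sketch of `thaw`'s copying behavior: after thawing and mutating, the original immutable vector is unchanged. `original` and `modifiedCopy` are illustrative names. The final `freeze` could safely be `unsafeFreeze` here, since `mv` is not used afterwards.

```haskell
import Control.Monad.ST (runST)
import qualified Data.Vector.Unboxed         as U
import qualified Data.Vector.Unboxed.Mutable as MU

original :: U.Vector Int
original = U.fromList [1, 2, 3]

-- thaw copies the immutable vector into a fresh mutable buffer, so writes
-- to the mutable copy never affect `original`.
modifiedCopy :: U.Vector Int
modifiedCopy = runST $ do
  mv <- U.thaw original  -- copy: immutable -> mutable
  MU.write mv 0 100
  U.freeze mv            -- copy: mutable -> immutable

main :: IO ()
main = do
  print original
  print modifiedCopy
```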
(We'll cover some functions like `create` that provide safe wrappers around
`unsafeFreeze` and `unsafeThaw` later.)

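As a brief preview of `create` from `Data.Vector.Unboxed`: it runs an `ST` action that returns a mutable vector and then freezes that vector without copying. Because the mutable vector can never escape the action, the unsafe freeze cannot be observed. `squares` is a hypothetical example.

```haskell
import Control.Monad (forM_)
import qualified Data.Vector.Unboxed         as U
import qualified Data.Vector.Unboxed.Mutable as MU

-- create runs the ST action and freezes the resulting mutable vector
-- without a copy. The mutable vector is unreachable from outside.
squares :: Int -> U.Vector Int
squares n = U.create $ do
  mv <- MU.new n          -- uninitialized buffer of length n
  forM_ [0 .. n - 1] $ \i ->
    MU.write mv i (i * i) -- fill every slot before returning
  return mv

main :: IO ()
main = print (squares 5)
```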
## PrimMonad

If you look at the mutation functions we used above, like `read` and `write`,
you can tell that they were living in the `IO` monad. However, vector is more
generic than that, and will allow your mutations to live in any *primitive
monad*, meaning: `IO`, strict `ST s`, and transformers sitting on top of those
two. The type class controlling this is `PrimMonad`.

You can get more information on `PrimMonad` in the [Primitive
Haskell](primitive-haskell.md) article. Without diving into details: every
primitive monad also has an associated primitive state token type, which is
captured with `PrimState`. As a result, the type signatures for `read` and
`write` (for boxed vectors) look like:

```haskell
read :: PrimMonad m => MVector (PrimState m) a -> Int -> m a
write :: PrimMonad m => MVector (PrimState m) a -> Int -> a -> m ()
```

Every mutable vector takes two type parameters: the state token of the monad it
lives in, and the type of value it holds. These gymnastics may seem like
overkill now, but they are necessary for making mutable vectors both usable in
multiple monads and type safe.

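To see `PrimMonad` in action, here is a sketch of a hypothetical helper `addToAll`. It is written once against `PrimMonad`/`PrimState` and then used both directly in `IO` and from pure code via `runST`.

```haskell
import Control.Monad (forM_)
import Control.Monad.Primitive (PrimMonad, PrimState)
import Control.Monad.ST (runST)
import qualified Data.Vector.Unboxed         as U
import qualified Data.Vector.Unboxed.Mutable as MU

-- Works in any primitive monad: IO, ST s, or a transformer over them.
addToAll :: PrimMonad m => Int -> MU.MVector (PrimState m) Int -> m ()
addToAll k mv =
  forM_ [0 .. MU.length mv - 1] $ \i -> do
    x <- MU.read mv i
    MU.write mv i (x + k)

-- Used from pure code via ST.
pureResult :: U.Vector Int
pureResult = runST $ do
  mv <- U.thaw (U.fromList [1, 2, 3])
  addToAll 10 mv
  U.freeze mv

main :: IO ()
main = do
  -- The same helper, used directly in IO.
  mv <- U.thaw (U.fromList [1, 2, 3])
  addToAll 1 mv
  U.freeze mv >>= print
  print pureResult
```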
## modify and the ST monad

Let's check out a particularly complicated type signature (for unboxed vectors):

```haskell
modify :: Unbox a => (forall s. MVector s a -> ST s ()) -> Vector a -> Vector a
```

What this function does is:

1. Creates a new mutable buffer the same length as the original vector
2. Copies the values from the original vector into the new mutable vector
3. Runs the provided `ST` action on the new mutable vector
4. Unsafely freezes the mutable vector and returns it

What's great about this function is that it does the minimal amount of buffer
copying to be safe, and that it can be used from pure code (since all
side-effects are captured inside the `ST` action you provide).

* Exercise 1: Steps 1 and 2 should look pretty similar to a function we
  discussed above. Can you figure out which one it is?
* Exercise 2: Implement `modify` yourself using functions we've discussed and
  `runST` from `Control.Monad.ST`.

Let's use our new function to implement a Fisher-Yates shuffle. If we start
with a vector of size 20, we'll generate a random number between 0 and 19. Then
we'll swap position 19 with the position given by that random number. Then
we'll loop, but this time with a random number between 0 and 18, swapping with
position 18. We continue until we get down to 0.

```haskell
import Control.Monad.Primitive (PrimMonad, PrimState)
import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Unboxed.Mutable as M
import System.Random (StdGen, getStdGen, randomR)

shuffleM :: (PrimMonad m, V.Unbox a)
         => StdGen
         -> Int -- ^ count to shuffle
         -> M.MVector (PrimState m) a
         -> m ()
shuffleM _ i _ | i <= 1 = return ()
shuffleM gen i v = do
    M.swap v i' index
    shuffleM gen' i' v
  where
    (index, gen') = randomR (0, i') gen
    i' = i - 1

shuffle :: V.Unbox a
        => StdGen
        -> V.Vector a
        -> V.Vector a
shuffle gen vector = V.modify (shuffleM gen (V.length vector)) vector

main :: IO ()
main = do
    gen <- getStdGen
    print $ shuffle gen $ V.enumFromTo 1 (20 :: Int)
```

Notice how `shuffleM` is a mutable, side-effecting function. However, `shuffle`
itself is pure.

## Generic

After everything else we've dealt with, `Generic` is a relatively easy
addition. We introduce two new typeclasses:

```haskell
class MVector v a
class MVector (Mutable v) a => Vector v a
```

Said in English: an instance `MVector v a` is a mutable vector of type `v` that
can hold values of type `a`. The `Vector v a` is the immutable counterpart to
some mutable vector. You can find the mutable version with `Mutable v`.

One important thing to keep in mind is *kinds*. The kind of the `v` in
`MVector v a` is `* -> * -> *`, since it takes parameters for both the state
token and the value it holds. With the immutable `Vector v a`, the `v` is of
kind `* -> *`. Was that a little abstract? No problem, some type signatures
should help:

```haskell
length :: MVector v a => v s a -> Int
length :: Vector v a => v a -> Int

read :: (PrimMonad m, MVector v a) => v (PrimState m) a -> Int -> m a
```

It takes a bit of time to get used to these generic classes, but once you do
it's fairly easy to use them. The best advice is to practice! And as such:

* Exercise: modify the `shuffle` program above to work on a generic vector
  instead of specifically on an unboxed vector.

The final trick when working with generic vectors is that, ultimately, you will
need to provide a concrete type. If you forget to do so, you'll end up with
error messages that look like the following:

```
stream.hs:28:13:
    No instance for (V.Vector v0 Int) arising from a use of ‘shuffle’
    In the expression: shuffle gen
    In the second argument of ‘($)’, namely
      ‘shuffle gen $ V.enumFromTo 1 (20 :: Int)’
    In a stmt of a 'do' block:
      print $ shuffle gen $ V.enumFromTo 1 (20 :: Int)
```

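Here is a sketch of the fix, using a hypothetical generic helper `shiftToZero`: fix the concrete vector type at the call site, for example with an annotation. `G.convert` is shown as one way to switch representations when needed.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import qualified Data.Vector         as B
import qualified Data.Vector.Generic as G
import qualified Data.Vector.Unboxed as U

-- Subtract the minimum from every element. Nothing in this type says
-- which vector representation will be used.
shiftToZero :: G.Vector v Int => v Int -> v Int
shiftToZero v
  | G.null v  = v
  | otherwise = G.map (subtract (G.minimum v)) v

main :: IO ()
main = do
  -- Something like
  --   print (G.toList (shiftToZero (G.fromList [5, 7, 9])))
  -- is ambiguous: no concrete vector type is ever chosen.
  -- Annotating the input fixes the representation:
  print (shiftToZero (U.fromList [5, 7, 9] :: U.Vector Int))
  print (shiftToZero (B.fromList [5, 7, 9] :: B.Vector Int))
  -- G.convert copies between representations when you need to switch.
  print (G.convert (shiftToZero (U.fromList [5, 7, 9])) :: B.Vector Int)
```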
## vector-algorithms

A package of note is
[vector-algorithms](http://www.stackage.org/package/vector-algorithms), which
provides some algorithms (mostly sort) on mutable vectors. For example, let's
generate 100 random numbers and then sort them.

```haskell
import Data.Vector.Algorithms.Merge (sort)
import qualified Data.Vector.Generic.Mutable as M
import qualified Data.Vector.Unboxed as V
import System.Random (randomRIO)

main :: IO ()
main = do
    vector <- M.replicateM 100 $ randomRIO (0, 999 :: Int)
    sort vector
    V.unsafeFreeze vector >>= print
```

* Exercise 1: Write a helper function `sortImmutable` that uses `modify` and
  `sort` from vector-algorithms to sort an immutable vector safely.
* Exercise 2: Rewrite the main function above to use `sortImmutable` and only
  the immutable vector API.
* Exercise 3: Is your new version more efficient, less efficient, or the same?
  Explain.

## mwc-random

One final library to mention now is mwc-random, a random number generation
library built on top of vector and primitive. Its API can be a bit daunting
initially, but given your newfound understanding of the vector package, the API
might make a lot more sense now. It provides a `Gen s` type, where `s` is some
state token. You can then use `uniform` and `uniformR` to get random numbers
out of that generator.

As a final example, here's how we can shuffle the numbers 1-20 using
mwc-random.

```haskell
import Control.Monad.ST (ST)
import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Unboxed.Mutable as M
import System.Random.MWC (Gen, uniformR, withSystemRandom)

shuffleM :: V.Unbox a
         => Gen s
         -> Int -- ^ count to shuffle
         -> M.MVector s a
         -> ST s ()
shuffleM _ i _ | i <= 1 = return ()
shuffleM gen i v = do
    index <- uniformR (0, i') gen
    M.swap v i' index
    shuffleM gen i' v
  where
    i' = i - 1

main :: IO ()
main = do
    vector <- withSystemRandom $ \gen -> do
        mvector <- V.unsafeThaw $ V.enumFromTo 1 (20 :: Int)
        shuffleM gen (M.length mvector) mvector
        V.unsafeFreeze mvector
    print vector
```

@@ -37,7 +37,7 @@ follow the rest of this outline in particular.
 Covers some of the most commonly used data structures in Haskell, and the
 libraries providing them.

-* vector (cover vector-algorithms)
+* [vector](../content/vector.md)
 * containers
 * unordered-containers
 * text (cover text-icu)
