615 lines
24 KiB
Markdown
615 lines
24 KiB
Markdown
---
|
||
title: The vector package
|
||
author: Michael Snoyman <michael@snoyman.com>
|
||
description: Overview and typical usage of the vector package
|
||
first-written: 2015-10-25
|
||
last-updated: 2015-10-25
|
||
last-reviewed: 2015-10-25
|
||
---
|
||
|
||
The de facto standard package in the Haskell ecosystem for integer-indexed
|
||
array data is the [vector package](http://www.stackage.org/package/vector).
|
||
This corresponds at a high level to arrays in C, or the vector class in C++'s
|
||
STL. However, the vector package offers quite a bit of functionality not
|
||
familiar to those used to options in imperative and mutable languages.
|
||
|
||
While the interface for vector is relatively straightforward, the abundance of
|
||
different modules can be daunting. This article will start off with an overview
|
||
of terminology to guide you, and then step through a number of concrete
|
||
examples of using the package.
|
||
|
||
## Example
|
||
|
||
Since we're about to jump into a few section of descriptive text, let's kick
|
||
this off with a concrete example to whet your appetite. We're going to count
|
||
the frequency of different bytes that appear on standard output, and then
|
||
display this content.
|
||
|
||
Note that this example is purposely written in a very generic form. We'll build
|
||
up to handling this form throughout this article.
|
||
|
||
```haskell
|
||
{-# LANGUAGE FlexibleContexts #-}
|
||
import Control.Monad.Primitive (PrimMonad, PrimState)
|
||
import qualified Data.ByteString.Lazy as L
|
||
import qualified Data.Vector.Generic.Mutable as M
|
||
import qualified Data.Vector.Unboxed as U
|
||
import Data.Word (Word8)
|
||
|
||
main :: IO ()
|
||
main = do
|
||
-- Get all of the contents from stdin
|
||
lbs <- L.getContents
|
||
|
||
-- Create a new 256-size mutable vector
|
||
-- Fill the vector with zeros
|
||
mutable <- M.replicate 256 0
|
||
|
||
-- Add all of the bytes from stdin
|
||
addBytes mutable lbs
|
||
|
||
-- Freeze to get an immutable version
|
||
vector <- U.unsafeFreeze mutable
|
||
|
||
-- Print the frequency of each byte
|
||
-- In newer vectors: we can use imapM_
|
||
U.zipWithM_ printFreq (U.enumFromTo 0 255) vector
|
||
|
||
addBytes :: (PrimMonad m, M.MVector v Int)
|
||
=> v (PrimState m) Int
|
||
-> L.ByteString
|
||
-> m ()
|
||
addBytes v lbs = mapM_ (addByte v) (L.unpack lbs)
|
||
|
||
addByte :: (PrimMonad m, M.MVector v Int)
|
||
=> v (PrimState m) Int
|
||
-> Word8
|
||
-> m ()
|
||
addByte v w = do
|
||
-- Read out the old count value
|
||
oldCount <- M.read v index
|
||
-- Write back the updated count value
|
||
M.write v index (oldCount + 1)
|
||
where
|
||
-- Indices in vectors are always Ints. Our bytes come in as Word8, so we
|
||
-- need to convert them.
|
||
index :: Int
|
||
index = fromIntegral w
|
||
|
||
printFreq :: Int -> Int -> IO ()
|
||
printFreq index count = putStrLn $ concat
|
||
[ "Frequency of byte "
|
||
, show index
|
||
, ": "
|
||
, show count
|
||
]
|
||
```
|
||
|
||
## Terminology
|
||
|
||
There are two different varieties of vectors: immutable and mutable. Immutable
|
||
vectors (such as provided by the `Data.Vector` module) are essentially
|
||
swappable with normal lists in Haskell, though with drastically different
|
||
performance characteristics (discussed below). The high-level API is similar to
|
||
lists, it implements common typeclasses like `Functor` and `Foldable`, and
|
||
plays quite nicely with parallel code.
|
||
|
||
By contrast, mutable vectors are much closer to C-style arrays. Operations
|
||
working on these values must live in the `IO` or `ST` monads (see `PrimMonad`
|
||
below for more details). Concurrent access from multiple threads has all of the
|
||
normal concerns of shared mutable state. And perhaps most importantly for
|
||
usage: mutable vectors can be *much* more efficient for certain use cases.
|
||
|
||
However, that's not the only dimension of choice you get in the vector package.
|
||
vector itself defines three flavors: boxed
|
||
(`Data.Vector`/`Data.Vector.Mutable`), storable (`Data.Vector.Storable` and
|
||
`Data.Vector.Storable.Mutable`), and unboxed (`Data.Vector.Unboxed` and
|
||
`Data.Vector.Unboxed.Mutable`). (There's also technically primitive vectors,
|
||
but in practice you should always prefer unboxed vectors; see the module
|
||
documentation for more information on the distinction here.)
|
||
|
||
And our final point: in addition to having these three flavors, the vector
|
||
package provides a typeclass-based interface which allows you to write code
|
||
that works in any of these three (plus other vector types that may be defined
|
||
in other packages, like
|
||
[hybrid-vectors](http://www.stackage.org/package/hybrid-vectors)). These
|
||
interfaces are in `Data.Vector.Generic` and `Data.Vector.Generic.Mutable`. When
|
||
using these interfaces, you must still eventually choose a concrete
|
||
representation, but your helper code can be agnostic to what it is.
|
||
|
||
What's nice is that - with small differences - all four mutable modules have
|
||
the same interface, and all four immutable modules have the same interface.
|
||
This means you can focus on learning one type of vector, and almost for free
|
||
have that knowledge apply to other types as well. It then just becomes a
|
||
question of choosing the representation that best fits your use case, which
|
||
we'll get to shortly.
|
||
|
||
## Efficiency
|
||
|
||
Standard lists in Haskell are immutable, singly-linked lists. Every time you
|
||
add another value to the front of the list, it has to allocate another heap
|
||
object for that cell, create a pointer to the head of the original list, and
|
||
create a pointer to the value in the current cell. This takes up a lot of
|
||
memory for holding pointers, and makes it inefficient to index or traverse the
|
||
list (indexing to position N requires N pointer dereferences).
|
||
|
||
By contract, vectors are stored in a packed format in memory, meaning indexing
|
||
is an O(1) operation, and the memory overhead per additional item in the vector
|
||
is much smaller (depending on the type of vector, which we'll cover in a
|
||
moment). However, compared to lists, appending an item to a vector is
|
||
relatively expensive: it requires creating a new buffer in memory, copying the
|
||
old values, and then adding the new value.
|
||
|
||
There are other data structures that can be considered for list-like data, such
|
||
as `Seq` from containers, or in some cases a `Set`, `IntMap`, or `Map`.
|
||
Figuring out the best choice for each use case can only be reliably determined
|
||
via profiling. But as a general rule: densely populated lists with integral
|
||
access to the values will be best served by vector.
|
||
|
||
Now let's talk about some of the other things that make vector so efficient.
|
||
|
||
### Boxed, storable and unboxed
|
||
|
||
Boxed vectors hold normal Haskell values. These can be _any_ values at all, and
|
||
are stored on the heap with pointers kept in the vector. The advantage is that
|
||
this works for all datatypes, but the extra memory overhead for the pointers
|
||
and the indirection of needing to dereference those pointers makes them
|
||
(relative to the next two types) inefficient.
|
||
|
||
Storable and unboxed vectors both store their data in a byte array, avoiding
|
||
pointer indirection. This is more memory efficient and allows better usage of
|
||
caches. The distinction between storable and unboxed vectors is subtle:
|
||
|
||
* Storable vectors require data which is an instance of the [`Storable` type
|
||
class](http://haddock.stackage.org/lts-3.11/base-4.8.1.0/Foreign-Storable.html#t:Storable).
|
||
This data is stored in `malloc`ed memory, which is *pinned* (the garbage
|
||
collector can't move it around). This can lead to memory fragmentation, but
|
||
allows the data to be shared over the C FFI.
|
||
* Unboxed vectors require data which is an instance of the [`Prim` type
|
||
class](http://haddock.stackage.org/lts-3.11/primitive-0.6.1.0/Data-Primitive-Types.html).
|
||
This data is stored in GC-managed *unpinned* memory, which helps avoid memory
|
||
fragmentation. However, this data cannot be shared over the C FFI.
|
||
|
||
Both the `Storable` and `Prim` typeclasses provide a way to store a value as
|
||
bytes, and to load bytes into a value. The distinction is what type of
|
||
bytearray is used.
|
||
|
||
As usual, the only true measure of performance will be benchmarking. However,
|
||
as a general guideline:
|
||
|
||
* If you don't need to pass values to a C FFI, and you have a `Prim` instance,
|
||
use unboxed vectors.
|
||
* If you have a `Storable` instance, use a storable vector.
|
||
* Otherwise, use a boxed vector.
|
||
|
||
There are also other issues to consider, such as the fact that boxed vectors
|
||
are instances of `Functor` while storable and unboxed vectors are not.
|
||
|
||
### Stream fusion
|
||
|
||
Take a guess how much memory the following program will take to run:
|
||
|
||
```haskell
|
||
import qualified Data.Vector.Unboxed as V
|
||
|
||
main :: IO ()
|
||
main = print $ V.sum $ V.enumFromTo 1 (10^9 :: Int)
|
||
```
|
||
|
||
A valid guess may be `10^9 * sizeof int` bytes. However, when compiled with
|
||
optimizations (`-O2`) on my system, it allocates a total of only 52kb! How it
|
||
is possible to create a one billion integer array without using up 4-8GB of
|
||
memory?
|
||
|
||
The vector package has a powerful technique: stream fusion. Using GHC rewrite
|
||
rules, it's able to find many cases where creating a vector is unnecessary, and
|
||
instead create a tight inner loop. In our case, GHC will end up generating code
|
||
that can avoid touching system memory, and instead work on just the registers,
|
||
yielding not only a tiny memory footprint, but performance close to a for-loop
|
||
in C. This is one of the beauties of this library: you get to write high-level
|
||
code, and optimizations can churn out something much more CPU-friendly.
|
||
|
||
### Slicing
|
||
|
||
Above we discussed the problem of appending values to the front of a vector.
|
||
However, one place where vector shines is with *slicing*, or taking a subset of
|
||
the vector. When dealing with immutable vectors, slicing is a safe operation,
|
||
with slices being sharable with multiple threads. Slicing also works with
|
||
mutable vectors, but as usual you need to be a bit more careful.
|
||
|
||
## Replacing lists
|
||
|
||
Enough talk! Let's start using vector. Assuming you're familiar with the list
|
||
API, this should looke rather boring.
|
||
|
||
```haskell
|
||
import qualified Data.Vector as V
|
||
|
||
main :: IO ()
|
||
main = do
|
||
let list = [1..10] :: [Int]
|
||
vector = V.fromList list :: V.Vector Int
|
||
vector2 = V.enumFromTo 1 10 :: V.Vector Int
|
||
print $ vector == vector2 -- True
|
||
print $ list == V.toList vector -- also True
|
||
print $ V.filter odd vector -- 1,3,5,7,9
|
||
print $ V.map (* 2) vector -- 2,4,6,...,20
|
||
print $ V.zip vector vector -- (1,1),(2,2),...(10,10)
|
||
print $ V.zipWith (*) vector vector -- (1,4,9,16,...,100)
|
||
print $ V.reverse vector -- 10,9,...,1
|
||
print $ V.takeWhile (< 6) vector -- 1,2,3,4,5
|
||
print $ V.takeWhile odd vector -- 1
|
||
print $ V.takeWhile even vector -- []
|
||
print $ V.dropWhile (< 6) vector -- 6,7,8,9,10
|
||
print $ V.head vector -- 1
|
||
print $ V.tail vector -- 2,3,4,...,10
|
||
print $ V.head $ V.takeWhile even vector -- exception!
|
||
```
|
||
|
||
Hopefully there's nothing too surprising about this. Most `Prelude` functions
|
||
that apply to lists have a corresponding vector function. If you know what a
|
||
function does in `Prelude`, you probably know what it does in `Data.Vector`.
|
||
This is the simplest usage of the vector package: import `Data.Vector`
|
||
qualified, convert to/from lists with `V.fromList` and `V.toList`, and then
|
||
prefix your function calls with `V.`.
|
||
|
||
* Exercise 1: Try out some other functions available in the [`Data.Vector`
|
||
module](http://haddock.stackage.org/lts-3.11/vector-0.10.12.3/Data-Vector.html).
|
||
In particular, try some of the fold functions, which we haven't covered here.
|
||
|
||
* Exercise 2: Try using the `Functor`, `Foldable`, and `Traversable` versions of
|
||
functions with a vector
|
||
|
||
* Exercise 3: Use an unboxed (or storable) vector instead of the boxed vectors
|
||
we were using above. What code did you have to change from the original
|
||
example? Do your examples from exercise 2 all work still?
|
||
|
||
There are also a number of functions in the `Data.Vector` module with no
|
||
corresponding function in `Prelude`. Many of these are related to mutable
|
||
vectors (which we'll cover shortly). Others are present to provide more
|
||
efficient means of manipulating a vector, based on their special in-memory
|
||
representation.
|
||
|
||
## Mutable vectors
|
||
|
||
I want to test how fair the `System.Random` number generator is at generating
|
||
numbers between 0 and 9, inclusive. I want to generate 1,000,000 random values,
|
||
count the frequency of each result, and then print how often each value
|
||
appeared. Let's first implement this using immutable vectors:
|
||
|
||
```haskell
|
||
import Data.Vector.Unboxed ((!), (//))
|
||
import qualified Data.Vector.Unboxed as V
|
||
import System.Random (randomRIO)
|
||
|
||
main :: IO ()
|
||
main = do
|
||
let v0 = V.replicate 10 (0 :: Int)
|
||
|
||
loop v 0 = return v
|
||
loop v rest = do
|
||
i <- randomRIO (0, 9)
|
||
let oldCount = v ! i
|
||
v' = v // [(i, oldCount + 1)]
|
||
loop v' (rest - 1)
|
||
|
||
vector <- loop v0 (10^6)
|
||
print vector
|
||
```
|
||
|
||
We've introduced the `!` operator for indexing, and the `//` operator for
|
||
updating. Other than that, this is fairly straightforward code. When I ran this
|
||
on my system, it had 48MB maximum memory residency, and took 1.968s to
|
||
complete. Surely we can do better.
|
||
|
||
This problem is inherently better as a mutable state one: instead of generating
|
||
a new immutable `Vector` for each random number generated, we'd like to simply
|
||
increment a piece of memory. Let's rewrite this to use a mutable, unboxed
|
||
vector:
|
||
|
||
```haskell
|
||
import Control.Monad (replicateM_)
|
||
import Data.Vector.Unboxed (freeze)
|
||
import qualified Data.Vector.Unboxed.Mutable as V
|
||
import System.Random (randomRIO)
|
||
|
||
main :: IO ()
|
||
main = do
|
||
vector <- V.replicate 10 (0 :: Int)
|
||
|
||
replicateM_ (10^6) $ do
|
||
i <- randomRIO (0, 9)
|
||
oldCount <- V.read vector i
|
||
V.write vector i (oldCount + 1)
|
||
|
||
ivector <- freeze vector
|
||
print ivector
|
||
```
|
||
|
||
Once again, we use `replicate` to create a size-10 vector filled with 0. But
|
||
now we've created a mutable vector (note the change in import). We then use
|
||
`replicateM_` to perform the inner action 1,000,000 times, namely: generate a
|
||
random index, read the old value at that index, increment it, and write it
|
||
back.
|
||
|
||
After we're finished, we _freeze_ the vector (more on that in the next section)
|
||
and print it. The results are the same (or close - we are dealing with random
|
||
numbers here) to the previous immutable one. But instead of 48MB and 1.968s,
|
||
this program has a maximum residency of 44KB and runs in 0.247s! That's a
|
||
significant improvement!
|
||
|
||
If we feel like being even more adventurous, we can replace our `read` and
|
||
`write` calls with `unsafeRead` and `unsafeWrite`. That will disable some
|
||
bounds checks before reading and writing. This can be a nice performance boost
|
||
in very tight loops, but has the potential to segfault your program, so caveat
|
||
emptor! For example, try replacing `replicate 10` with `replicate 9`, change
|
||
the `read` for an `unsafeRead`, and run your program. You'll see something
|
||
like:
|
||
|
||
```
|
||
internal error: evacuate: strange closure type -1944718914
|
||
(GHC version 7.10.2 for x86_64_unknown_linux)
|
||
Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug
|
||
Aborted (core dumped)
|
||
```
|
||
|
||
The same logic applies to the other `unsafe` functions in vector. The
|
||
nomenclature means: `unsafe` may segfault your whole process, while
|
||
not-marked-unsafe may just throw an impure exception (also not great, but
|
||
certainly better than a segfault).
|
||
|
||
And if you were curious: on my system using `unsafeRead` and `unsafeWrite`
|
||
speeds the program up marginally, from 0.247s to 0.233s. In our example, most
|
||
of our time is spent on generating the random numbers, so taking off the safety
|
||
checks does not have a significant impact.
|
||
|
||
## Freezing and thawing
|
||
|
||
We used the `freeze` function above. The behavior of this may not be
|
||
immediately obvious. When you freeze a mutable vector, what happens is:
|
||
|
||
1. A new mutable vector of the same size is created
|
||
2. Each value in the original mutable vector is copied to the new mutable vector
|
||
3. A new immutable vector is created out of the memory space used by the new mutable vector
|
||
|
||
Why not just freeze it in place? Two reasons, actually:
|
||
|
||
1. It has the potential to break referential transparency. Consider this code:
|
||
|
||
```haskell
|
||
import Data.Vector.Unboxed (freeze)
|
||
import qualified Data.Vector.Unboxed.Mutable as V
|
||
|
||
main :: IO ()
|
||
main = do
|
||
vector <- V.replicate 1 (0 :: Int)
|
||
V.write vector 0 1
|
||
ivector <- freeze vector
|
||
print ivector
|
||
V.write vector 0 2
|
||
print ivector
|
||
```
|
||
|
||
If we froze the vector in-place in the call to `freeze`, then the second
|
||
`write` call would modify our `ivector` value, meaning that the first and
|
||
second call to `print ivector` would have different results!
|
||
|
||
2. When you freeze a mutable vector, its memory is marked different for
|
||
garbage collection purposes. Later trying to write to that same memory can
|
||
lead to a segfault.
|
||
|
||
However, if you really want to avoid that extra buffer copy, and are certain
|
||
it's safe, you can use `unsafeFreeze`. And in fact, our random number example
|
||
above is a case where `freeze` can be safely replaced by `unsafeFreeze`, since
|
||
after the freeze, the original mutable vector is never used again.
|
||
|
||
* Exercise 1: Go ahead and make that swap and confirm that your program works
|
||
as expected.
|
||
* Exercise 2: In the program just above (with `V.replicate 1 (0 :: Int)`),
|
||
replace `freeze` with `unsafeFreeze`. What result do you see?
|
||
|
||
The opposite of `freeze` is `thaw`. Similar to `freeze`, `thaw` will copy to a
|
||
new mutable vector instead of exposing the current memory buffer. And also,
|
||
like `freeze`, there's an `unsafeThaw` that turns off the safety measures. Like
|
||
everything `unsafe`: caveat emptor!
|
||
|
||
(We'll cover some functions like `create` that provide safe wrappers around
|
||
`unsafeFreeze` and `unsafeThaw` later.)
|
||
|
||
## PrimMonad
|
||
|
||
If you look at the mutation functions we used above like `read` and `write`,
|
||
you can tell that they were looking in the `IO` monad. However, vector is more
|
||
generic than that, and will allow your mutations to live in any *primitive
|
||
monad*, meaning: `IO`, strict `ST s`, and transformers sitting on top of those
|
||
two. The type class controlling this is `PrimMonad`.
|
||
|
||
You can get more information on `PrimMonad` in the [Primitive
|
||
Haskell](primitive-haskell.md) article. Without diving into details: every
|
||
primitive monad also has an associated primitive state token type, which is
|
||
captured with `PrimState`. As a result, the type signatures for `read` and
|
||
`write` (for boxed vectors) look like:
|
||
|
||
```haskell
|
||
read :: PrimMonad m => MVector (PrimState m) a -> Int -> m a
|
||
write :: PrimMonad m => MVector (PrimState m) a -> Int -> a -> m ()
|
||
```
|
||
|
||
Every mutable vector takes two type parameters: the state token of the monad it
|
||
lives in, and the type of value it holds. These gymnastics may seem overkill
|
||
now, but are necessary for making mutable vectors both versatile in multiple
|
||
monads, and type safe.
|
||
|
||
## modify and the ST monad
|
||
|
||
Let's check out a particularly complicated type signature (for unboxed vectors):
|
||
|
||
```haskell
|
||
modify :: Unbox a => (forall s. MVector s a -> ST s ()) -> Vector a -> Vector a
|
||
```
|
||
|
||
What this function does is:
|
||
|
||
1. Creates a new mutable buffer the same length as the original vector
|
||
2. Copies the values from the original vector into the new mutable vector
|
||
3. Runs the provided `ST` action on the provided mutable vector
|
||
4. Unsafely freezes the mutable vector and returns it.
|
||
|
||
What's great about this function is that it does the minimal amount of buffer
|
||
copying to be safe, and that it can be used from pure code (since all
|
||
side-effects are captured inside the `ST` action you provide).
|
||
|
||
* Exercise 1: Steps 1 and 2 should look pretty similar to a function we
|
||
discussed above. Can you figure out which one it is?
|
||
* Exercise 2: Implement `modify` yourself using functions we've discussed and
|
||
`runST` from `Control.Monad.ST`.
|
||
|
||
Let's use our new function to implement a Fisher-Yates shuffle. If we start
|
||
with a vector of size 20, we'll generate a random number between 0 and 19. Then
|
||
we'll swap position 19 with that generated random number. Then we'll loop, but
|
||
this time with a random number between 0 and 18 and swapping with position 18.
|
||
We continue until we get down to 0.
|
||
|
||
```haskell
|
||
import Control.Monad.Primitive (PrimMonad, PrimState)
|
||
import qualified Data.Vector.Unboxed as V
|
||
import qualified Data.Vector.Unboxed.Mutable as M
|
||
import System.Random (StdGen, getStdGen, randomR)
|
||
|
||
shuffleM :: (PrimMonad m, V.Unbox a)
|
||
=> StdGen
|
||
-> Int -- ^ count to shuffle
|
||
-> M.MVector (PrimState m) a
|
||
-> m ()
|
||
shuffleM _ i _ | i <= 1 = return ()
|
||
shuffleM gen i v = do
|
||
M.swap v i' index
|
||
shuffleM gen' i' v
|
||
where
|
||
(index, gen') = randomR (0, i') gen
|
||
i' = i - 1
|
||
|
||
shuffle :: V.Unbox a
|
||
=> StdGen
|
||
-> V.Vector a
|
||
-> V.Vector a
|
||
shuffle gen vector = V.modify (shuffleM gen (V.length vector)) vector
|
||
|
||
main :: IO ()
|
||
main = do
|
||
gen <- getStdGen
|
||
print $ shuffle gen $ V.enumFromTo 1 (20 :: Int)
|
||
```
|
||
|
||
Notice how `shuffleM` is a mutable, side-effecting function. However, `shuffle`
|
||
itself is pure.
|
||
|
||
## Generic
|
||
|
||
After everything else we've dealt with, `Generic` is a relatively easy
|
||
addition. We introduce two new typeclasses:
|
||
|
||
```haskell
|
||
class MVector v a
|
||
class MVector (Mutable v) a => Vector v a
|
||
```
|
||
|
||
Said in English: an instance `MVector v a` is a mutable vector of type `v` that
|
||
can hold values of type `a`. The `Vector v a` is the immutable counterpart to
|
||
some mutable vector. You can find the mutable version with `Mutable v`.
|
||
|
||
One important thing to keep in mind is *kinds*. The kind of the `v` is `MVector
|
||
v a` is `* -> * -> *`, since it takes parameters for both the state token and
|
||
the value it holds. With the immutable `Vector v a`, the `v` is of kind `* ->
|
||
*`. Was that a little abstract? No problem, some type signatures should help:
|
||
|
||
```haskell
|
||
length :: MVector v a => v s a -> Int
|
||
length :: Vector v a => v a -> Int
|
||
|
||
read :: (PrimMonad m, MVector v a) => v (PrimState m) a -> Int -> m a
|
||
```
|
||
|
||
It takes a bit of time to get used to these generic classes, but once you do
|
||
it's fairly easy to use them. The best advice is to practice! And as such:
|
||
|
||
* Exercise: modify the `shuffle` program above to work on a generic vector
|
||
instead of specifically on an unboxed vector.
|
||
|
||
The final trick when working with generic vectors is that, ultimately, you will
|
||
need to provide a concrete type. If you forget to do so, you'll end up with
|
||
error messages that look like the following:
|
||
|
||
```haskell
|
||
stream.hs:28:13:
|
||
No instance for (V.Vector v0 Int) arising from a use of ‘shuffle’
|
||
In the expression: shuffle gen
|
||
In the second argument of ‘($)’, namely
|
||
‘shuffle gen $ V.enumFromTo 1 (20 :: Int)’
|
||
In a stmt of a 'do' block:
|
||
print $ shuffle gen $ V.enumFromTo 1 (20 :: Int)
|
||
```
|
||
|
||
## vector-algorithms
|
||
|
||
A package of note is
|
||
[vector-algorithms](http://www.stackage.org/package/vector-algorithms), which
|
||
provides some algorithms (mostly sort) on mutable vectors. For example, let's
|
||
generate 100 random numbers and then sort them.
|
||
|
||
```haskell
|
||
import Data.Vector.Algorithms.Merge (sort)
|
||
import qualified Data.Vector.Generic.Mutable as M
|
||
import qualified Data.Vector.Unboxed as V
|
||
import System.Random (randomRIO)
|
||
|
||
main :: IO ()
|
||
main = do
|
||
vector <- M.replicateM 100 $ randomRIO (0, 999 :: Int)
|
||
sort vector
|
||
V.unsafeFreeze vector >>= print
|
||
```
|
||
|
||
* Exercise 1: write a helper function `sortImmutable` that uses `modify` and `sort` from vector-algorithms to sort an immutable vector safely
|
||
* Exercise 2: rewrite the main function above to use `sortImmutable` and only the immutable vector API
|
||
* Exercise 3: is your new version more efficient, less efficient, or the same? Explain.
|
||
|
||
## mwc-random
|
||
|
||
One final library to mention now is mwc-random, a random number generation
|
||
library built on top of vector and primitive. Its API can be a bit daunting
|
||
initially, but given your newfound understanding of the vector package, the API
|
||
might make a lot more sense now. It provides a `Gen s` type, where `s` is some
|
||
state token. You can then use `uniform` and `uniformR` to get random numbers
|
||
out of that generator.
|
||
|
||
As a final example, here's how we can shuffle the numbers 1-20 using
|
||
mwc-random.
|
||
|
||
```haskell
|
||
import Control.Monad.ST (ST)
|
||
import qualified Data.Vector.Unboxed as V
|
||
import qualified Data.Vector.Unboxed.Mutable as M
|
||
import System.Random.MWC (Gen, uniformR, withSystemRandom)
|
||
|
||
shuffleM :: V.Unbox a
|
||
=> Gen s
|
||
-> Int -- ^ count to shuffle
|
||
-> M.MVector s a
|
||
-> ST s ()
|
||
shuffleM _ i _ | i <= 1 = return ()
|
||
shuffleM gen i v = do
|
||
index <- uniformR (0, i') gen
|
||
M.swap v i' index
|
||
shuffleM gen i' v
|
||
where
|
||
i' = i - 1
|
||
|
||
main :: IO ()
|
||
main = do
|
||
vector <- withSystemRandom $ \gen -> do
|
||
vector <- V.unsafeThaw $ V.enumFromTo 1 (20 :: Int)
|
||
shuffleM gen (M.length vector) vector
|
||
V.unsafeFreeze vector
|
||
print vector
|
||
```
|