Antonio Garrote f247274dec fix doc
2010-02-28 13:35:04 +01:00

# clj-ml

A machine learning library for Clojure built on top of Weka and friends

## Usage

* I/O of data

Loading data from a file and saving it in another format:

    REPL>(use 'clj-ml.io)

    REPL>; Loading data from an ARFF file (XRFF and CSV are also supported)
    REPL>(def ds (load-instances :arff "file:///Applications/weka-3-6-2/data/iris.arff"))

    REPL>; Saving data in a different format
    REPL>(save-instances :csv ds)
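To make the CSV target format concrete, here is a small pure-Clojure sketch of the layout a CSV export has: a header line with the attribute names, then one line per instance. It does not call clj-ml or Weka, and `rows->csv` and `cell->str` are hypothetical helpers written only for this illustration.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helpers (not part of clj-ml). Keywords are written without
;; the leading colon; no quoting is done, so values must not contain commas.
(defn cell->str [v]
  (if (keyword? v) (name v) (str v)))

(defn rows->csv
  "Renders a header vector and a seq of row vectors as CSV text."
  [header rows]
  (str (str/join "\n"
                 (for [row (cons header rows)]
                   (str/join "," (map cell->str row))))
       "\n"))

(print (rows->csv [:length :width :kind] [[12 34 :good] [24 53 :bad]]))
;; length,width,kind
;; 12,34,good
;; 24,53,bad
```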

* Working with datasets

    REPL>(use 'clj-ml.data)

    REPL>; Defining a dataset
    REPL>(def ds (make-dataset "name" [:length :width {:kind [:good :bad]}] [[12 34 :good] [24 53 :bad]]))
    REPL>ds

    #<ClojureInstances @relation name

    @attribute length numeric
    @attribute width numeric
    @attribute kind {good,bad}

    @data
    12,34,good
    24,53,bad>

    REPL>; Using datasets like sequences
    REPL>(dataset-seq ds)

    (#<Instance 12,34,good> #<Instance 24,53,bad>)

    REPL>; Transforming instances into maps or vectors
    REPL>(instance-to-map (first (dataset-seq ds)))

    {:kind :good, :width 34.0, :length 12.0}

    REPL>(instance-to-vector (dataset-at ds 0))
    [12.0 34.0 :good]
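The map returned by `instance-to-map` pairs each attribute name from the `make-dataset` definition with the instance's value. The pure-Clojure sketch below reproduces that shape from a plain row vector; `row->map` and its helpers are hypothetical and only mirror the REPL output above, they are not how clj-ml implements the conversion.

```clojure
;; Hypothetical helpers (not part of clj-ml). Attribute specs follow the
;; make-dataset example: a keyword is a numeric attribute, a single-entry
;; map {name [values...]} is a nominal attribute.
(defn attribute-name [spec]
  (if (map? spec) (key (first spec)) spec))

(defn numeric-attribute? [spec]
  (not (map? spec)))

(defn row->map
  "Zips attribute names with row values; numeric values become doubles."
  [specs row]
  (into {}
        (map (fn [spec v]
               [(attribute-name spec)
                (if (numeric-attribute? spec) (double v) v)])
             specs row)))

(row->map [:length :width {:kind [:good :bad]}] [12 34 :good])
;; => {:length 12.0, :width 34.0, :kind :good}
```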

* Filtering datasets

    REPL>(use 'clj-ml.filters)

    REPL>(def ds (load-instances :arff "file:///Applications/weka-3-6-2/data/iris.arff"))

    REPL>; Discretizing a numeric attribute using an unsupervised filter
    REPL>(def discretize (make-filter :unsupervised-discretize {:dataset ds :attributes [0 2]}))

    REPL>(def filtered-ds (filter-process discretize ds))
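The `:unsupervised-discretize` filter presumably wraps Weka's unsupervised `Discretize` filter, whose default strategy is equal-width binning. The sketch below shows that technique in plain Clojure: the observed range is split into `k` intervals of equal width and each value is replaced by its interval index. `equal-width-bins` is a hypothetical helper; Weka's own filter also handles missing values and produces nominal range labels, which this sketch does not.

```clojure
;; Equal-width binning sketch (hypothetical helper, not Weka's code).
(defn equal-width-bins
  "Maps each value in xs to a bin index in [0, k)."
  [k xs]
  (let [lo    (apply min xs)
        hi    (apply max xs)
        width (/ (- hi lo) k)]
    (mapv (fn [x]
            (if (zero? width)
              0 ; all values equal: everything falls into one bin
              ;; the maximum would land in bin k, so clamp it into the last bin
              (min (dec k)
                   (long (Math/floor (double (/ (- x lo) width)))))))
          xs)))

(equal-width-bins 3 [0 1 2 3 4 5 6 9])
;; => [0 0 0 1 1 1 2 2]
```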

* Using classifiers

    REPL>(use 'clj-ml.classifiers)

    REPL>; Building a classifier using a C4.5 decision tree (J48 in Weka)
    REPL>(def classifier (make-classifier :decission-tree :c45))

    REPL>; We set the class attribute for the loaded dataset
    REPL>(dataset-set-class ds 4)

    REPL>; Training the classifier
    REPL>(classifier-train classifier ds)

     #<J48 J48 pruned tree
     ------------------

     petalwidth <= 0.6: Iris-setosa (50.0)
     petalwidth > 0.6
     |	petalwidth <= 1.7
     |	|   petallength <= 4.9: Iris-versicolor (48.0/1.0)
     |	|   petallength > 4.9
     |	|   |	petalwidth <= 1.5: Iris-virginica (3.0)
     |	|   |	petalwidth > 1.5: Iris-versicolor (3.0/1.0)
     |	petalwidth > 1.7: Iris-virginica (46.0/1.0)

     Number of Leaves  :		5

     Size of the tree :	9
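The printed J48 tree is a nested set of threshold tests on two attributes. Transcribed by hand into a plain Clojure function (a hypothetical illustration, not something clj-ml generates), the same decisions read as a single `cond`, with each branch ordered so that earlier tests have already ruled out the cases above it:

```clojure
;; Hand transcription of the J48 tree above (hypothetical, for reading the
;; tree only). Branch order matters: each test assumes the previous failed.
(defn classify-iris [{:keys [petalwidth petallength]}]
  (cond
    (<= petalwidth 0.6)  :Iris-setosa      ; leaf (50.0)
    (> petalwidth 1.7)   :Iris-virginica   ; leaf (46.0/1.0)
    (<= petallength 4.9) :Iris-versicolor  ; leaf (48.0/1.0)
    (<= petalwidth 1.5)  :Iris-virginica   ; leaf (3.0)
    :else                :Iris-versicolor)) ; leaf (3.0/1.0)

(classify-iris {:petalwidth 0.2 :petallength 1.4})
;; => :Iris-setosa
```

The five `cond` branches correspond to the five leaves reported by J48.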


    REPL>; Evaluating the classifier on a test dataset. The last argument should be
    REPL>; a separate test dataset; here the training data is reused, so the figures are optimistic
    REPL>(def evaluation (classifier-evaluate classifier :dataset ds ds))

     === Confusion Matrix ===

       a	 b  c	<-- classified as
      50	 0  0 |	 a = Iris-setosa
       0 49  1 |	 b = Iris-versicolor
       0	 2 48 |	 c = Iris-virginica

     === Summary ===

     Correctly Classified Instances	   147		     98	     %
     Incorrectly Classified Instances	     3		      2	     %
     Kappa statistic			     0.97
     Mean absolute error			     0.0233
     Root mean squared error		     0.108
     Relative absolute error		     5.2482 %
     Root relative squared error		    22.9089 %
     Total Number of Instances		   150

    REPL>(:kappa evaluation)

     0.97
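The accuracy and kappa values in the summary can be recomputed from the confusion matrix above. Cohen's kappa compares the observed agreement p_o (the diagonal share) with the agreement p_e expected by chance from the row and column totals: kappa = (p_o - p_e) / (1 - p_e). The helpers below are hypothetical and use Clojure ratios so the arithmetic is exact.

```clojure
;; Confusion matrix from the evaluation above: rows are actual classes,
;; columns are predicted classes (a = setosa, b = versicolor, c = virginica).
(def confusion [[50 0 0]
                [0 49 1]
                [0 2 48]])

(defn accuracy
  "Share of instances on the diagonal (correctly classified)."
  [m]
  (let [n       (reduce + (flatten m))
        correct (reduce + (map-indexed (fn [i row] (nth row i)) m))]
    (/ correct n)))

(defn kappa
  "Cohen's kappa: (p_o - p_e) / (1 - p_e)."
  [m]
  (let [n        (reduce + (flatten m))
        row-sums (map #(reduce + %) m)
        col-sums (apply map + m)
        p-o      (accuracy m)
        p-e      (/ (reduce + (map * row-sums col-sums)) (* n n))]
    (/ (- p-o p-e) (- 1 p-e))))

(accuracy confusion) ;; => 49/50, i.e. 98 %
(kappa confusion)    ;; => 97/100, i.e. 0.97
```

Here p_e = (50·50 + 50·51 + 50·49) / 150² = 1/3, which is why a 98 % accuracy yields a kappa of 0.97.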

    REPL>(:root-mean-squared-error evaluation)

     0.10799370769526968

    REPL>(:precision evaluation)

     {:Iris-setosa 1.0, :Iris-versicolor 0.9607843137254902, :Iris-virginica
      0.9795918367346939}
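Per-class precision is the number of correct predictions for a class divided by the number of times that class was predicted, i.e. the diagonal entry over its column total in the confusion matrix. A minimal sketch (hypothetical helper) that reproduces the `:precision` map as exact ratios:

```clojure
;; Same confusion matrix as the evaluation above (rows actual, columns predicted).
(def confusion [[50 0 0]
                [0 49 1]
                [0 2 48]])

(def labels [:Iris-setosa :Iris-versicolor :Iris-virginica])

(defn precision-by-class
  "Diagonal entry divided by column total, keyed by class label."
  [labels m]
  (let [col-sums (apply map + m)]
    (zipmap labels
            (map-indexed (fn [i col-sum] (/ (get-in m [i i]) col-sum))
                         col-sums))))

(precision-by-class labels confusion)
;; => {:Iris-setosa 1, :Iris-versicolor 49/51, :Iris-virginica 48/49}
```

As decimals, 49/51 ≈ 0.9608 and 48/49 ≈ 0.9796, matching the values returned by `(:precision evaluation)`.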

    REPL>; The classifier can also be evaluated using cross-validation
    REPL>(classifier-evaluate classifier :cross-validation ds 10)

     === Confusion Matrix ===

       a	 b  c	<-- classified as
      49	 1  0 |	 a = Iris-setosa
       0 47  3 |	 b = Iris-versicolor
       0	 4 46 |	 c = Iris-virginica

     === Summary ===

     Correctly Classified Instances	   142		     94.6667 %
     Incorrectly Classified Instances	     8		      5.3333 %
     Kappa statistic			     0.92
     Mean absolute error			     0.0452
     Root mean squared error		     0.1892
     Relative absolute error		    10.1707 %
     Root relative squared error		    40.1278 %
     Total Number of Instances		   150
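With 10-fold cross-validation every instance is used for testing exactly once, by a model trained on the other nine folds, which is why the summary still reports 150 instances. The sketch below shows plain round-robin fold assignment; `k-fold-splits` is a hypothetical helper. Weka's own cross-validation, which clj-ml presumably delegates to, also randomizes the data and stratifies folds by class, which this sketch does not do.

```clojure
;; Unstratified, unshuffled k-fold splitting (hypothetical helper).
(defn k-fold-splits
  "Returns k maps of {:train [...] :test [...]}; item i goes to fold (mod i k)."
  [k xs]
  (let [v (vec xs)]
    (for [f (range k)]
      {:test  (vec (keep-indexed (fn [i x] (when (= f (mod i k)) x)) v))
       :train (vec (keep-indexed (fn [i x] (when (not= f (mod i k)) x)) v))})))

(map (comp count :test) (k-fold-splits 10 (range 150)))
;; => ten folds of 15 test items each
```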

    REPL>; A trained classifier can be used to classify new instances
    REPL>(def to-classify (make-instance ds {:class :Iris-versicolor
                                             :petalwidth 0.2
                                             :petallength 1.4
                                             :sepalwidth 3.5
                                             :sepallength 5.1}))
    REPL>(classifier-classify classifier to-classify)

     0.0

    REPL>(classifier-label classifier to-classify)

     #<Instance 5.1,3.5,1.4,0.2,Iris-setosa>
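The `0.0` returned by `classifier-classify` is, assuming it follows Weka's `classifyInstance` convention for nominal classes, the index of the predicted value in the class attribute's declared value list. In `iris.arff` the values are declared as setosa, versicolor, virginica, so index 0 is `Iris-setosa`, the label that `classifier-label` attached. A minimal sketch of that lookup (`index->label` is hypothetical):

```clojure
;; Class values in the order iris.arff declares them.
(def iris-classes [:Iris-setosa :Iris-versicolor :Iris-virginica])

(defn index->label
  "Turns a predicted class index (returned as a double) into its label."
  [classes idx]
  (nth classes (long idx)))

(index->label iris-classes 0.0)
;; => :Iris-setosa
```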


    REPL>; The classifiers can be saved and restored later
    REPL>(use 'clj-ml.utils)

    REPL>(serialize-to-file classifier "/Users/antonio.garrote/Desktop/classifier.bin")
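Weka classifiers implement `java.io.Serializable`, so a plausible reading (an assumption, not confirmed here; check `clj-ml.utils`) is that `serialize-to-file` uses standard Java object serialization. The sketch below shows such a round trip with `ObjectOutputStream`/`ObjectInputStream`; `write-object!` and `read-object` are hypothetical helpers, and a `java.util.ArrayList` stands in for a trained classifier.

```clojure
(import '[java.io File FileInputStream FileOutputStream
          ObjectInputStream ObjectOutputStream])

;; Hypothetical helpers: plain Java serialization to and from a file.
(defn write-object! [^File f obj]
  (with-open [out (ObjectOutputStream. (FileOutputStream. f))]
    (.writeObject out obj)))

(defn read-object [^File f]
  (with-open [in (ObjectInputStream. (FileInputStream. f))]
    (.readObject in)))

;; Round trip through a temporary file; any Serializable object works.
(let [f (File/createTempFile "model" ".bin")]
  (.deleteOnExit f)
  (write-object! f (java.util.ArrayList. ["setosa" "versicolor"]))
  (read-object f))
```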

## Installation

To install the library you must first install Leiningen.
You also need the Weka 3.6.2 jar from the official Weka homepage.
If Maven complains that it cannot find Weka, follow the instructions it prints
to install the jar into your local repository manually.

### To install from source

* Clone the project with git
* $ lein deps
* $ lein compile
* $ lein compile-java
* $ lein uberjar

## License

MIT License