# clj-ml

A machine learning library for Clojure built on top of Weka and friends.

## Installation

To install the library, first install Leiningen. You should also download the Weka 3.6.2 jar from the official Weka homepage. If Maven complains that it cannot find Weka, follow its instructions to install the jar manually.

### To install from source

* git clone the project
* $ lein deps
* $ lein compile
* $ lein compile-java
* $ lein uberjar

### Installing from Clojars

    [clj-ml "0.0.3-SNAPSHOT"]

### Installing from Maven

Add the Clojars repository, then declare the dependency:

    <dependency>
      <groupId>clj-ml</groupId>
      <artifactId>clj-ml</artifactId>
      <version>0.0.3-SNAPSHOT</version>
    </dependency>

## Supported algorithms

* Filters
  - supervised discretize
  - unsupervised discretize
  - supervised nominal to binary
  - unsupervised nominal to binary
* Classifiers
  - C4.5 (J4.8)
  - naive Bayes
  - multilayer perceptron
* Clusterers
  - k-means

## Usage

* I/O of data

        REPL>(use 'clj-ml.io)

        REPL>; Loading data from an ARFF file; XRFF and CSV are also supported
        REPL>(def ds (load-instances :arff "file:///Applications/weka-3-6-2/data/iris.arff"))

        REPL>; Saving data in a different format
        REPL>(save-instances :csv "file:///Users/antonio.garrote/Desktop/iris.csv" ds)

* Working with datasets

        REPL>(use 'clj-ml.data)

        REPL>; Defining a dataset
        REPL>(def ds (make-dataset "name" [:length :width {:kind [:good :bad]}] [ [12 34 :good] [24 53 :bad] ]))
        REPL>ds
        #

        REPL>; Using datasets like sequences
        REPL>(dataset-seq ds)
        (# #)

        REPL>; Transforming instances into maps or vectors
        REPL>(instance-to-map (first (dataset-seq ds)))
        {:kind :good, :width 34.0, :length 12.0}
        REPL>(instance-to-vector (dataset-at ds 0))
        [12.0 34.0 :good]

* Filtering datasets

        REPL>(use 'clj-ml.filters)
        REPL>(def ds (load-instances :arff "file:///Applications/weka-3-6-2/data/iris.arff"))

        REPL>; Discretizing numeric attributes using an unsupervised filter
        REPL>(def discretize (make-filter :unsupervised-discretize {:dataset ds :attributes [0 2]}))
        REPL>(def filtered-ds (filter-process discretize ds))

* Using classifiers

        REPL>(use 'clj-ml.classifiers)

        REPL>; Building a classifier using a C4.5 decision tree
        REPL>(def classifier (make-classifier :decission-tree :c45))

        REPL>; Setting the class attribute for the loaded dataset
        REPL>(dataset-set-class ds 4)

        REPL>; Training the classifier
        REPL>(classifier-train classifier ds)
        # 0.6
        |   petalwidth <= 1.7
        |   |   petallength <= 4.9: Iris-versicolor (48.0/1.0)
        |   |   petallength > 4.9
        |   |   |   petalwidth <= 1.5: Iris-virginica (3.0)
        |   |   |   petalwidth > 1.5: Iris-versicolor (3.0/1.0)
        |   petalwidth > 1.7: Iris-virginica (46.0/1.0)

        Number of Leaves  :     5

        Size of the tree :      9

        REPL>; Evaluating the classifier against a test dataset.
        REPL>; The last parameter should be a separate test dataset; here we reuse the training data
        REPL>(def evaluation (classifier-evaluate classifier :dataset ds ds))

        === Confusion Matrix ===

          a  b  c   <-- classified as
         50  0  0 |  a = Iris-setosa
          0 49  1 |  b = Iris-versicolor
          0  2 48 |  c = Iris-virginica

        === Summary ===

        Correctly Classified Instances         147               98      %
        Incorrectly Classified Instances         3                2      %
        Kappa statistic                          0.97
        Mean absolute error                      0.0233
        Root mean squared error                  0.108
        Relative absolute error                  5.2482 %
        Root relative squared error             22.9089 %
        Total Number of Instances              150

        REPL>(:kappa evaluation)
        0.97
        REPL>(:root-mean-squared-error evaluation)
        0.10799370769526968
        REPL>(:precision evaluation)
        {:Iris-setosa 1.0, :Iris-versicolor 0.9607843137254902, :Iris-virginica 0.9795918367346939}

        REPL>; The classifier can also be evaluated using cross-validation
        REPL>(classifier-evaluate classifier :cross-validation ds 10)

        === Confusion Matrix ===

          a  b  c   <-- classified as
         49  1  0 |  a = Iris-setosa
          0 47  3 |  b = Iris-versicolor
          0  4 46 |  c = Iris-virginica

        === Summary ===

        Correctly Classified Instances         142               94.6667 %
        Incorrectly Classified Instances         8                5.3333 %
        Kappa statistic                          0.92
        Mean absolute error                      0.0452
        Root mean squared error                  0.1892
        Relative absolute error                 10.1707 %
        Root relative squared error             40.1278 %
        Total Number of Instances              150

        REPL>; A trained classifier can be used to classify new instances
        REPL>(def to-classify (make-instance ds {:class :Iris-versicolor, :petalwidth 0.2, :petallength 1.4, :sepalwidth 3.5, :sepallength 5.1}))
        REPL>(classifier-classify classifier to-classify)
        0.0
        REPL>(classifier-label to-classify)
        #

        REPL>; Classifiers can be saved and restored later
        REPL>(use 'clj-ml.utils)
        REPL>(serialize-to-file classifier
                                "/Users/antonio.garrote/Desktop/classifier.bin")

* Using clusterers

        REPL>(use 'clj-ml.clusterers)

        REPL>; Building a clusterer using k-means and three clusters
        REPL>(def kmeans (make-clusterer :k-means {:number-clusters 3}))

        REPL>; The class must be removed from the dataset
        REPL>; before using this clustering algorithm
        REPL>(dataset-remove-class ds)

        REPL>; Building the clusters
        REPL>(clusterer-build kmeans ds)
        REPL>kmeans
        #
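
The summary figures printed by `classifier-evaluate` in the classifier example can be checked by hand from the two confusion matrices. The sketch below is plain Clojure and does not call clj-ml or Weka. The helper names (`total`, `accuracy`, `kappa`, `precision`) are hypothetical and are not part of the library. It assumes that rows are actual classes and columns are predicted classes, and that the "Kappa statistic" line is Cohen's kappa; both assumptions agree with the numbers shown in the output.

```clojure
;; Standalone sketch: recompute evaluation metrics from a confusion matrix.
;; Nothing here is part of clj-ml; all names are illustrative.

(def train-matrix
  "Confusion matrix from the evaluation on the training data."
  [[50 0 0]
   [0 49 1]
   [0 2 48]])

(def cv-matrix
  "Confusion matrix from the 10-fold cross-validation."
  [[49 1 0]
   [0 47 3]
   [0 4 46]])

(defn total
  "Total number of instances in the matrix."
  [m]
  (reduce + (map #(reduce + %) m)))

(defn accuracy
  "Fraction of instances on the diagonal (correctly classified)."
  [m]
  (/ (reduce + (map-indexed (fn [i row] (nth row i)) m))
     (total m)))

(defn kappa
  "Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."
  [m]
  (let [n          (total m)
        row-sums   (map #(reduce + %) m)   ; actual class counts
        col-sums   (apply map + m)         ; predicted class counts
        p-observed (accuracy m)
        p-expected (/ (reduce + (map * row-sums col-sums)) (* n n))]
    (/ (- p-observed p-expected) (- 1 p-expected))))

(defn precision
  "Correct predictions for a class divided by all predictions of that class."
  [m class-idx]
  (/ (get-in m [class-idx class-idx])
     (reduce + (map #(nth % class-idx) m))))

(accuracy train-matrix)     ;=> 49/50  (98 %)
(kappa train-matrix)        ;=> 97/100 (0.97)
(kappa cv-matrix)           ;=> 23/25  (0.92)
(precision train-matrix 1)  ;=> 49/51  (about 0.9608, Iris-versicolor)
```

Because the inputs are integers, Clojure returns exact ratios; wrap a call in `double` to compare against the decimal values in the summary.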