262 commits

Author SHA1 Message Date
Joshua Eckroth
be08a41f33 Simplified docs-to-dataset function. 2013-12-24 08:07:40 -05:00
Joshua Eckroth
1e751dfae3 Keep original docs ordering in docs-to-dataset. 2013-10-12 21:39:27 -04:00
Joshua Eckroth
c220b141db Move to 0.5.0-SNAPSHOT 2013-10-12 21:30:57 -04:00
Joshua Eckroth
e11690dd93 Don't shuffle docs in doc-to-dataset. 2013-10-12 21:30:04 -04:00
Joshua Eckroth
c7c7cdd9f1 The option for the resample filter is :no-replacement not :replacement 2013-10-12 15:07:00 -04:00
Joshua Eckroth
e1295877e4 Typo. 2013-10-12 15:06:51 -04:00
Joshua Eckroth
9b56c2a53f Use clj-ml.artifice.cc for test arff/csv files 2013-09-21 20:01:31 -04:00
Joshua Eckroth
5626d45654 Fixed tests on Windows. Line-ending problem. 2013-09-21 19:53:54 -04:00
Joshua Eckroth
544701e462 Updated readme for v0.4.0. 2013-08-07 10:40:45 -04:00
Joshua Eckroth
a4223aab80 Updated history file for v0.4.0. 2013-08-07 10:39:49 -04:00
Joshua Eckroth
bc4294ae3f Bumped version to 0.4.0. 2013-08-07 10:34:57 -04:00
Joshua Eckroth
937b9bf87d Fixed some tests that broke when nominal attributes changed from string to keyword representations. 2013-08-07 10:32:50 -04:00
Joshua Eckroth
dcf6534ea4 Removed UI code (Weka can do that better) and some other unused or broken dependencies. 2013-08-07 10:32:24 -04:00
Joshua Eckroth
65a851341b Added regression example to readme and new function for regression, classifier-predict-numeric. 2013-08-07 10:24:45 -04:00
Joshua Eckroth
ddace20320 Added :replace-missing-values filter and updated readme. 2013-08-06 21:37:17 -04:00
Joshua Eckroth
a18cbbae19 Added example of using Titanic survival data from Kaggle https://www.kaggle.com/c/titanic-gettingStarted 2013-08-06 19:44:23 -04:00
Joshua Eckroth
c650f86c3a Bugfix for (classifier-label) 2013-08-06 19:43:46 -04:00
Joshua Eckroth
73953ef2fb Removed useless files. 2013-08-06 12:19:34 -04:00
Joshua Eckroth
c44917b0fc Fixed bulleted lists in readme. 2013-08-06 12:18:17 -04:00
Joshua Eckroth
5d7faa2b22 Grammer in readme. 2013-08-06 12:16:03 -04:00
Joshua Eckroth
914b65a5dc Added clusterer-cluster examples to readme. 2013-08-06 12:15:58 -04:00
Joshua Eckroth
81cc54f8a3 Fixed author links in readme. 2013-08-06 12:15:38 -04:00
Joshua Eckroth
8e15fcc3ee Updated authors in readme. 2013-08-06 12:08:41 -04:00
Joshua Eckroth
10310d74e8 Simpler usage for docs-to-dataset. 2013-08-06 03:42:20 -04:00
Joshua Eckroth
59f4cf3697 Improved some dataset functions that operate on the class attribute. 2013-08-06 03:42:03 -04:00
Joshua Eckroth
3064722b14 Formatting. 2013-08-06 03:41:21 -04:00
Joshua Eckroth
8aeed64130 Changed classifier-classify to produce the class label; updated classifier-label to avoid using classifier-classify. 2013-08-06 03:41:12 -04:00
Joshua Eckroth
53d141019f Added codox metadata to project.clj 2013-08-06 03:40:20 -04:00
Joshua Eckroth
7a450f2e04 Updated tutorial in readme. 2013-08-06 03:40:05 -04:00
Joshua Eckroth
7a90091fba Removed some debugging statements. 2013-08-04 08:58:57 -04:00
Joshua Eckroth
19d3772bc0 Wrapped some noisy operations in (capture-out-err) which captures and discards stdout and stderr. 2013-08-04 08:57:49 -04:00
Joshua Eckroth
db70ee980f Fixed indentation. 2013-07-31 06:50:59 -04:00
Joshua Eckroth
2945f082bb Use default 0 for :normalize param (rather than false) 2013-07-17 00:43:05 -04:00
Joshua Eckroth
b5f92c5ced Support :normalize option in docs-to-dataset 2013-07-17 00:41:02 -04:00
Joshua Eckroth
8d7c41c25b Typo. 2013-07-16 23:59:10 -04:00
Joshua Eckroth
0e9b0bdb14 Add :counts option for docs-to-dataset 2013-07-16 23:58:14 -04:00
Joshua Eckroth
1105dac7b8 Don't limit the size of fulltext in docs-to-dataset 2013-07-16 23:51:12 -04:00
Joshua Eckroth
3ead98c527 Added k-nearest neighbor classifier (:lazy :ibk) 2013-07-16 23:29:45 -04:00
Joshua Eckroth
26a9d69c05 Fixed saving/loading csv instances. 2013-07-11 07:40:32 -04:00
Joshua Eckroth
0da42ca0ea Filter out junk from text fields. 2013-07-11 00:24:24 -04:00
Joshua Eckroth
8b53ee681c Shuffle docs (with/without term) when making a dataset. 2013-07-04 15:32:07 -04:00
Joshua Eckroth
b8cc877c05 Bugfix and added empty test case to force loading/compiling of clj-ml.public-datasets. 2013-07-04 09:42:03 -04:00
Joshua Eckroth
3dd4d872cd Switching to 0.4.0-SNAPSHOT to support rapid changes. 2013-07-04 09:41:36 -04:00
Joshua Eckroth
7b71a3d0f0 Bumped to version 0.3.13. 2013-07-04 09:38:52 -04:00
Joshua Eckroth
cce83d924d Bugfix. 2013-07-04 09:36:31 -04:00
Joshua Eckroth
623e9a1ef3 New format for terms on public datasets. Bumped to version 0.3.12. 2013-07-04 09:34:53 -04:00
Joshua Eckroth
4a4ef5ea03 Starting support for public datasets. Added reuters21578 dataset handling. Bumped to version 0.3.11. 2013-07-04 09:00:48 -04:00
Joshua Eckroth
6f03716d0a Added function clj-ml.data/docs-to-dataset to support translating text documents (with title, fulltext, and terms) into wordvec datasets for binary classification. Bumped to version 0.3.10. 2013-07-04 08:53:45 -04:00
Joshua Eckroth
123cd1713c Switched to a modern Snowball stemmer implementation. Bumped version to 0.3.9. 2013-07-01 14:40:05 -04:00
Joshua Eckroth
edceff891b Fixed bug reading instance attribute values. Bumped to 0.3.8. 2013-07-01 14:08:01 -04:00