MySpider is an extension to Spider. Spider is an advanced data mining Matlab toolbox released under the GNU GPL v3 licence. It is an object-oriented set of classes that allows complex data analysis flows to be built. It includes tools for data preprocessing and a large set of classification algorithms, but also algorithms for regression, data clustering, feature selection, model selection, etc. Because Spider is object oriented, building a complex flow is very simple.
In MySpider some of the basic classes were modified or corrected in order to fix bugs. MySpider also changes the algorithm configuration procedure, so that there is now one common way of modifying any parameter.
MySpider can be obtained from Spider (this version also includes the Spider Toolbox), or the newest version, also with the full Spider, can be downloaded from the SVN repository. However, getting the SVN version requires access rights on the SVN server, so please contact us. SVN link: https://hpc.kzi.polsl.pl:8443/svn/myspider
The repository can be accessed with user: guest, passwd: guest
To install MySpider, simply download the zip file or check out the SVN version and put it in the spider folder. The Matlab path must then be updated. To do this, you can run the use_spider function available in spider\use_spider.m, which builds the folder structure and automatically adds it to the Matlab path. Alternatively, using the function spider_subdirs you can obtain the full path structure and then do whatever you want with it. The folder structure is returned as a cell array of strings. The myspider directory is a subdirectory of the spider folder.
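A minimal sketch of both ways of setting the path. It assumes that the toolbox was unpacked into a folder named spider below the current directory, and that spider_subdirs sits next to use_spider and can be called without arguments; neither detail is confirmed here, so check the help text of both functions first.

```matlab
% Assumed location of the unpacked toolbox; adjust to your own setup.
cd('spider');

% Option 1: let use_spider build the folder list and add it to the path.
use_spider;

% Option 2: fetch the folder list yourself and add it manually.
% Assumption: spider_subdirs takes no input arguments.
dirs = spider_subdirs;   % cell array of strings, one entry per folder
addpath(dirs{:});        % addpath accepts several folders in one call
```

Only one of the two options is needed. The second is useful when the list of folders has to be filtered or stored before it is added to the path.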
MySpider implements the following algorithms:
This is a first simple example of Spider and MySpider
Line 1 generates a dataset. In line 2 a chain is constructed so that, when data are processed, the CNN algorithm is executed first and then the knn one. Line 3 starts execution, and line 4 applies the already trained model to the test set.
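A rough sketch of such a flow, written with the standard Spider objects data, chain, knn, train and test. It is not the original listing, so the line numbers above do not apply to it. The class name cnn for the CNN algorithm is an assumption, and the random two-class data are made up for illustration only.

```matlab
% Illustrative two-class data: the label is the sign of a linear function.
Xtr = randn(100, 2);  Ytr = sign(Xtr(:, 1) + Xtr(:, 2));
Xte = randn(50, 2);   Yte = sign(Xte(:, 1) + Xte(:, 2));
dtrain = data(Xtr, Ytr);   % Spider data object for training
dtest  = data(Xte, Yte);   % Spider data object for testing

% Chain: the CNN step runs first, then the nearest-neighbour classifier.
% Assumption: 'cnn' is the MySpider class name of the CNN algorithm.
a = chain({cnn, knn});

[rtrain, a] = train(a, dtrain);  % trains every stage of the chain in order
rtest = test(a, dtest);          % applies the trained chain to the test set
```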
This example is similar to the previous one, except that cross validation is used to estimate the accuracy of the system.
Line 2 extends the chain by also adding normalization and further algorithms: ENN, CNN, ge_sel. In line 3 this chain is inserted into cross validation, and after training (line 4) the system calculates the mean and standard deviation.
This example shows how to optimize model parameters. Here the prototype initialization for the LVQ algorithm is optimized using a series of algorithms.
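The cross-validation pattern can be sketched with standard Spider objects only. This is not the original listing: the MySpider steps (ENN, CNN, ge_sel) are left out because their class names are not given here, the data are random and purely illustrative, and cv is used with its default settings.

```matlab
% Illustrative two-class data.
X = randn(120, 2);  Y = sign(X(:, 1) - X(:, 2));
d = data(X, Y);

a = chain({normalize, knn});  % preprocessing followed by a classifier
c = cv(a);                    % wrap the whole chain in cross validation
r = train(c, d);              % training a cv object runs all folds
m = get_mean(r);              % summarise the per-fold results
```

Because the whole chain sits inside cv, the preprocessing step is fitted again on the training part of every fold and not once on the full dataset.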
In this example, a model generator is used in line 4. The model generator creates new models, here the LVQ algorithm with different settings, and executes them. myparam is constructed from the name of the class we want to configure, a property of that class (proto), and then a list of possible values, here the ENN, CNN and FCM algorithms.
A simple process which compares the classification error of two algorithms
In this process, data normalization is performed in line 2, and in line 3 a group of two algorithms is constructed. This group is plugged into a cross-validation test, which is executed in line 5, and in line 6 we obtain the final results.
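A hedged sketch of such a comparison using the standard Spider objects normalize, group, cv and get_mean. It is not the original listing, so the line numbers above do not apply to it. Two knn settings stand in for the two algorithms, and the random data are illustrative only.

```matlab
% Illustrative two-class data.
X = randn(120, 2);  Y = sign(X(:, 1) - X(:, 2));
d = data(X, Y);

dn = train(normalize, d);                % normalise the data up front
g  = group({knn('k=1'), knn('k=5')});    % two models evaluated side by side
c  = cv(g);                              % cross validation around the group
r  = train(c, dn);                       % run the cross-validation test
res = get_mean(r);                       % summarise the results per model
```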