ISPR Information Selection

We develop an extension to RapidMiner that can be used for instance selection, and also for mining large datasets by shrinking them to a size acceptable to common data mining algorithms.


The Instance Selection and Prototype-Based Rule (ISPR) plugin can be installed directly from the RapidMiner Marketplace by selecting Help→Updates and Extensions from the menu.

A working copy and the most recent version of the library can be obtained from the GitHub repository: the checkout and update commands have public access.

To install it manually, simply copy the jar file into the RapidMinerX/lib/plugins folder. RapidMiner will then load the extension automatically.

Usage Examples

Materials with examples of how to use ISPR can be obtained from: These examples can also be imported directly into a RapidMiner project using the Community/MyExperiment RapidMiner extension.

A new book, „USE CASES WITH RAPIDMINER”, is forthcoming with a chapter describing the ISPR extension. A draft is currently available here

List of implemented algorithms

This library includes a large number of instance selection algorithms:

  • Natively implemented Instance selection algorithms
    • CNN - Condensed Nearest Neighbor Rule
    • ENN - Edited Nearest Neighbor Rule
    • RENN - Repeated Edited Nearest Neighbor Rule
    • All k-NN
    • RNG - Relative Neighbor Graph
    • GE - Gabriel Editing
    • ELH - Encoding Length Heuristic
    • RMHC – Random Mutation Hill Climbing
    • IB2 - Instance Based Learning v2
    • IB3 - Instance Based Learning v3
    • Drop1 - Decremental Reduction Optimization Procedure v1
    • Drop2 - Decremental Reduction Optimization Procedure v2
    • Drop3 - Decremental Reduction Optimization Procedure v3
    • Drop4 - Decremental Reduction Optimization Procedure v4
    • Drop5 - Decremental Reduction Optimization Procedure v5
    • ICF - Iterative Case Filtering
    • MC - Monte Carlo
    • Random Selection
    Note: most of these algorithms support regression problems with additive as well as multiplicative noise.
  • Algorithms wrapped from other libraries:
    • Developed for Weka:
      • Weka Drop1-5
      • Weka ICF
      • Weka BSE
      • Weka CNN
      • Weka ENN
      • Weka HMNE
      • Weka HMNEI
      • Weka MC
      • Weka MSS
      • Weka RNN
    • Developed for Keel project:
      • Keel CCIS
  • Ensembles of Instance Selection methods:
    • Ensemble Instance Selection by Bagging
    • Ensemble Instance Selection by Voting
    • Ensemble Instance Selection by Attribute Subsets
    • Ensemble Instance Selection by Noise
    • Ensemble Instance Selection by AdaBoost
  • Generalized Instance Selection (these methods allow any classifier to be used within instance selection)
    • Generalized ENN
    • Generalized CNN
  • Competitive learning-based Neural Networks:
    • LVQ1
    • LVQ2
    • LVQ2.1
    • LVQ3
    • OLVQ
    • Weighted LVQ
    • SNG – Supervised Neural Gas
    • Winner Takes Most LVQ
    • Generalized LVQ
  • Clustering algorithms:
    • Fuzzy c-means
    • Vector Quantization
    • Conditional Fuzzy c-Means
  • Feature set reduction algorithms
    • MDS - Multidimensional Scaling (uses an external library distributed under a Creative Commons license, which is not fully compatible with the AGPL)
    • Feature Selection based on the Infosel++ package (requires an external C++ library)
  • Performance metrics for:
    • Instance selection
    • Clustering
  • Preprocessing methods:
    • VDM-based feature transformation from categorical into numerical features
  • Noise estimation methods:
    • Gamma-Test
    • Delta-Test
    • Local Gamma-Test
    • Local Delta-Test
    • ENN based instance weighting
  • Some other useful tools
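To give a feel for what the instance selection operators listed above do, here is a minimal Python sketch of the ENN (Edited Nearest Neighbor) rule: an instance is discarded when the majority label among its k nearest neighbors disagrees with its own label. This is an illustrative re-implementation under simple assumptions (Euclidean distance, majority vote), not the plugin's Java code; the data and function names are made up for the example.

```python
# Illustrative sketch of the ENN (Edited Nearest Neighbor) rule.
# Not the ISPR plugin's implementation -- a plain-Python example only.
from collections import Counter

def enn(points, labels, k=3):
    """Return indices of instances kept by ENN: an instance survives only
    if the majority label of its k nearest neighbors (excluding itself)
    agrees with its own label."""
    keep = []
    for i, (p, y) in enumerate(zip(points, labels)):
        # squared Euclidean distances to every other instance
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), labels[j])
            for j, q in enumerate(points) if j != i
        )
        neighbor_labels = [lab for _, lab in dists[:k]]
        majority = Counter(neighbor_labels).most_common(1)[0][0]
        if majority == y:
            keep.append(i)
    return keep

# Two well-separated clusters plus one mislabeled point (class noise):
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
     (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
     (0.05, 0.05)]
y = ["a", "a", "a", "b", "b", "b", "b"]  # last point is "b" inside cluster "a"

print(enn(X, y, k=3))  # → [0, 1, 2, 3, 4, 5]: the mislabeled index 6 is removed
```

Operators such as RENN and All k-NN build on this same editing step by repeating it, or by varying k, to remove noise more aggressively.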
download/ispr.txt · Last modified: 2019/09/19 13:23 (external edit)