# Information filtering and selection

Data mining techniques are becoming increasingly popular. However, when building data-driven models we face several independent problems:

- model accuracy - the model should be as accurate as possible while preserving good generalization
- comprehensibility - the knowledge extracted from the model should be human-friendly, so that we can understand what the model has learned
- big data - limitations related to the size of the dataset and to computational complexity, i.e. restrictions on time and resource consumption

These goals usually conflict: building a comprehensible model usually means accepting reduced accuracy, and mining big datasets usually requires a simplified model. Still, all these challenges have one thing in common: they can be addressed by appropriate information selection techniques. Information selection includes instance filtering and attribute filtering, such that

- accuracy can be improved by rejecting outliers from the data
- accuracy can also be improved by feature selection techniques
- comprehensibility can be achieved by so-called prototype-based rules
- the big data challenge can also be addressed by preselecting information and reducing the size of the dataset until it becomes feasible to train a classical algorithm on it
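As a rough illustration of instance and attribute filtering, here is a minimal sketch in plain Python. It is not the method used by this project: the helper names (`reject_outliers`, `drop_constant_attributes`), the centroid-distance outlier criterion, and the variance cutoff are all assumptions chosen only to make the idea concrete.

```python
import math
from statistics import mean, median, pvariance


def reject_outliers(rows, max_ratio=3.0):
    """Instance filtering (hypothetical criterion): drop rows whose distance
    to the centroid exceeds max_ratio times the median distance."""
    centroid = [mean(col) for col in zip(*rows)]
    dists = [math.dist(r, centroid) for r in rows]
    limit = max_ratio * median(dists)
    return [r for r, d in zip(rows, dists) if d <= limit]


def drop_constant_attributes(rows, min_variance=1e-12):
    """Attribute filtering (hypothetical criterion): keep only the columns
    whose population variance exceeds min_variance."""
    keep = [j for j, col in enumerate(zip(*rows)) if pvariance(col) > min_variance]
    return [[r[j] for j in keep] for r in rows], keep


# Toy data: the last row is far from the others, the middle column is constant.
data = [
    [1.0, 5.0, 0.10],
    [1.2, 5.0, 0.20],
    [0.9, 5.0, 0.15],
    [1.1, 5.0, 0.12],
    [9.0, 5.0, 8.00],
]

inliers = reject_outliers(data)
reduced, kept_columns = drop_constant_attributes(inliers)
```

On this toy data the distant last row is rejected and the constant middle column is dropped, leaving a smaller dataset for a downstream learner.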

# Why Prototype-Based Rules

**Prototype-based rules**, or **P-Rules** for short, represent knowledge as a set of rules. Instead of classical propositional logic, however, we propose a logic based on reference examples, also called prototypes.

In this approach, a single rule is defined by a reference point (the prototype) and a distance measure, with or without a threshold.

In general **P-Rules** can be divided into two separate concepts:

- *Nearest neighbor rules* - this approach is based on a simple nearest neighbor algorithm
- *Prototype threshold rules* - where each rule is independent from the others, such that each rule is defined as a subspace surrounding a prototype and limited by the threshold
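
The two kinds of P-Rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the tuple layout of a rule, the Euclidean distance, and the choice of `<=` for the threshold test are all hypothetical and not taken from any particular implementation.

```python
import math


def nearest_neighbor_rule(x, prototypes, distance=math.dist):
    """Nearest neighbor rules: prototypes is a list of (point, label) pairs;
    the label of the closest prototype is returned."""
    _, label = min(prototypes, key=lambda p: distance(x, p[0]))
    return label


def threshold_rules(x, rules, distance=math.dist):
    """Prototype threshold rules: rules is a list of (point, threshold, label)
    triples; each rule is checked independently and fires when x lies within
    its threshold (assumed inclusive). Returns the labels of all firing rules."""
    return [label for point, threshold, label in rules
            if distance(x, point) <= threshold]


prototypes = [((0.0, 0.0), "A"), ((4.0, 4.0), "B")]
rules = [((0.0, 0.0), 1.5, "A"), ((4.0, 4.0), 2.0, "B")]

nn_label = nearest_neighbor_rule((3.0, 3.0), prototypes)
fired_near_a = threshold_rules((1.0, 1.0), rules)
fired_nowhere = threshold_rules((2.0, 2.0), rules)
```

Note the difference in behavior: a nearest neighbor rule set always assigns some label, whereas threshold rules are evaluated independently, so a point such as `(2.0, 2.0)` above may be covered by no rule at all.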