DecisionTrees

Decision trees

The if-statement problem

[Image: the decision scheme]

Task: Implement an algorithm made of if statements that follows the scheme above.
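The scheme itself is an image that is not reproduced here, so the following is only a minimal sketch built on a hypothetical scheme: the questions (`outlook`, `humidity`, `windy`), the threshold and the answers are all invented for illustration. What it shows is the general shape of the solution: every internal node of the scheme becomes an `if`, and every leaf becomes a `return`.

```python
def decide(outlook: str, humidity: int, windy: bool) -> str:
    """Hand-written decision tree for a hypothetical scheme (not the one in the image)."""
    if outlook == "sunny":
        # Second question on the "sunny" branch.
        if humidity > 70:
            return "stay in"
        return "go out"
    if outlook == "rainy":
        # Second question on the "rainy" branch.
        if windy:
            return "stay in"
        return "go out"
    # Any other outlook (for example "overcast") ends in the same leaf.
    return "go out"


print(decide("sunny", 80, False))  # stay in
print(decide("rainy", 40, False))  # go out
```

Writing the rules by hand like this works for a small scheme; a decision tree model learns the same kind of nested questions from data.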

The decision tree problem

(An image belongs here.)

TODO:


Classifying the mushroom dataset: practice

This dataset includes descriptions of hypothetical samples corresponding to 23 species of mushrooms. Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. This last class was combined with the poisonous one. The guide clearly states that there is no simple rule for determining whether a mushroom is edible.

Column descriptions

We will predict the "class" column, which can take 2 values:

  • 'e' - edible, or

  • 'p' - poisonous

We import a few required libraries

We load the dataset
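A minimal sketch of the loading step, assuming pandas is used. The original cell and the data file name are not shown, so the five rows printed in this section are inlined as CSV text here to keep the example runnable; with the real file you would pass its path to `pd.read_csv` instead (a name such as `mushrooms.csv` is a guess).

```python
import io

import pandas as pd

# First five rows of the dataset, inlined so the sketch runs without the file.
# With the real data: df = pd.read_csv("mushrooms.csv")  (hypothetical file name)
csv_text = """class,cap-shape,cap-surface,cap-color,bruises,odor,gill-attachment,gill-spacing,gill-size,gill-color,stalk-shape,stalk-root,stalk-surface-above-ring,stalk-surface-below-ring,stalk-color-above-ring,stalk-color-below-ring,veil-type,veil-color,ring-number,ring-type,spore-print-color,population,habitat
p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u
e,x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g
e,b,s,w,t,l,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,n,m
p,x,y,w,t,p,f,c,n,n,e,e,s,s,w,w,p,w,o,p,k,s,u
e,x,s,g,f,n,f,w,b,k,t,e,s,s,w,w,p,w,o,e,n,a,g
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)    # (rows, columns)
print(df.head())   # first rows, one letter code per cell
```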

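| | class | cap-shape | cap-surface | cap-color | bruises | odor | gill-attachment | gill-spacing | gill-size | gill-color | stalk-shape | stalk-root | stalk-surface-above-ring | stalk-surface-below-ring | stalk-color-above-ring | stalk-color-below-ring | veil-type | veil-color | ring-number | ring-type | spore-print-color | population | habitat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | p | x | s | n | t | p | f | c | n | k | e | e | s | s | w | w | p | w | o | p | k | s | u |
| 1 | e | x | s | y | t | a | f | c | b | k | e | c | s | s | w | w | p | w | o | p | n | n | g |
| 2 | e | b | s | w | t | l | f | c | b | n | e | c | s | s | w | w | p | w | o | p | n | n | m |
| 3 | p | x | y | w | t | p | f | c | n | n | e | e | s | s | w | w | p | w | o | p | k | s | u |
| 4 | e | x | s | g | f | n | f | w | b | k | t | e | s | s | w | w | p | w | o | e | n | a | g |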

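| | class | cap-shape | cap-surface | cap-color | bruises | odor | gill-attachment | gill-spacing | gill-size | gill-color | stalk-shape | stalk-root | stalk-surface-above-ring | stalk-surface-below-ring | stalk-color-above-ring | stalk-color-below-ring | veil-type | veil-color | ring-number | ring-type | spore-print-color | population | habitat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 8124 | 8124 | 8124 | 8124 | 8124 | 8124 | 8124 | 8124 | 8124 | 8124 | 8124 | 8124 | 8124 | 8124 | 8124 | 8124 | 8124 | 8124 | 8124 | 8124 | 8124 | 8124 | 8124 |
| unique | 2 | 6 | 4 | 10 | 2 | 9 | 2 | 2 | 2 | 12 | 2 | 5 | 4 | 4 | 9 | 9 | 1 | 4 | 3 | 5 | 9 | 6 | 7 |
| top | e | x | y | n | f | n | f | c | b | b | t | b | s | s | w | w | p | w | o | p | w | v | d |
| freq | 4208 | 3656 | 3244 | 2284 | 4748 | 3528 | 7914 | 6812 | 5612 | 1728 | 4608 | 3776 | 5176 | 4936 | 4464 | 4384 | 8124 | 7924 | 7488 | 3968 | 2388 | 4040 | 3148 |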

  • What conclusions can we draw?

  • What data types do we have?

  • Is any data missing?
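One possible way to approach these questions with pandas is sketched below. The three-row frame is a stand-in with two of the real columns, so that the snippet runs on its own; on the full dataset the same calls apply unchanged. In the summary table above, `count` is 8124 in every column, which suggests there are no empty (NaN) cells, and `veil-type` has a single unique value, so it cannot help separate the classes.

```python
import pandas as pd

# Three-row stand-in with two of the real columns; use the full frame in practice.
df = pd.DataFrame({"class": ["p", "e", "e"], "odor": ["p", "a", "l"]})

print(df.dtypes)           # column types; letter codes are stored as text
print(df.isnull().sum())   # number of missing (NaN) cells per column
print(df.nunique())        # number of distinct values per column

missing_total = int(df.isnull().sum().sum())
print(missing_total)       # 0
```

Note that `isnull` only detects real NaN cells. A placeholder letter or symbol used for unknown values would be counted as an ordinary category, so it is worth checking the column descriptions as well.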

Visualizing the data

[Figure: plot of the data]

Preprocessing the data

Categorical data: encoding

First we make a copy of the data.

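| | class | cap-shape | cap-surface | cap-color | bruises | odor | gill-attachment | gill-spacing | gill-size | gill-color | stalk-shape | stalk-root | stalk-surface-above-ring | stalk-surface-below-ring | stalk-color-above-ring | stalk-color-below-ring | veil-type | veil-color | ring-number | ring-type | spore-print-color | population | habitat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 5 | 2 | 4 | 1 | 6 | 1 | 0 | 1 | 4 | 0 | 3 | 2 | 2 | 7 | 7 | 0 | 2 | 1 | 4 | 2 | 3 | 5 |
| 1 | 0 | 5 | 2 | 9 | 1 | 0 | 1 | 0 | 0 | 4 | 0 | 2 | 2 | 2 | 7 | 7 | 0 | 2 | 1 | 4 | 3 | 2 | 1 |
| 2 | 0 | 0 | 2 | 8 | 1 | 3 | 1 | 0 | 0 | 5 | 0 | 2 | 2 | 2 | 7 | 7 | 0 | 2 | 1 | 4 | 3 | 2 | 3 |
| 3 | 1 | 5 | 3 | 8 | 1 | 6 | 1 | 0 | 1 | 5 | 0 | 3 | 2 | 2 | 7 | 7 | 0 | 2 | 1 | 4 | 2 | 3 | 5 |
| 4 | 0 | 5 | 2 | 3 | 0 | 5 | 1 | 1 | 0 | 4 | 1 | 3 | 2 | 2 | 7 | 7 | 0 | 2 | 1 | 0 | 3 | 0 | 1 |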

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html
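A minimal sketch of the encoding step with `LabelEncoder`, assuming each column is encoded independently on a copy of the frame. The five-row frame holds three of the real columns with the values from the first rows of the dataset; the variable names (`df_encoded`, `encoders`) are illustrative, not taken from the original notebook.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Three real columns, first five rows of the dataset.
df = pd.DataFrame({
    "class": ["p", "e", "e", "p", "e"],
    "cap-shape": ["x", "x", "b", "x", "x"],
    "odor": ["p", "a", "l", "p", "n"],
})

df_encoded = df.copy()   # keep the original letters untouched
encoders = {}            # one fitted encoder per column, useful for decoding later

for column in df_encoded.columns:
    encoder = LabelEncoder()
    # Letters are sorted alphabetically and replaced by 0, 1, 2, ...
    df_encoded[column] = encoder.fit_transform(df_encoded[column])
    encoders[column] = encoder

print(df_encoded)
print([str(c) for c in encoders["class"].classes_])  # ['e', 'p'], so e -> 0 and p -> 1
```

The integer assigned to a letter depends on which letters are present in the column, so codes obtained on this five-row sample differ from those on the full dataset (except for `class`, where both letters appear). The scikit-learn documentation describes `LabelEncoder` as intended for target values; `OrdinalEncoder` is the counterpart meant for feature columns.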

Separating the training features from the classes

First we print the column names.

  • X will contain the features: all columns except the class

  • Y will contain only the class labels

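| | cap-shape | cap-surface | cap-color | bruises | odor | gill-attachment | gill-spacing | gill-size | gill-color | stalk-shape | stalk-root | stalk-surface-above-ring | stalk-surface-below-ring | stalk-color-above-ring | stalk-color-below-ring | veil-type | veil-color | ring-number | ring-type | spore-print-color | population | habitat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5 | 2 | 4 | 1 | 6 | 1 | 0 | 1 | 4 | 0 | 3 | 2 | 2 | 7 | 7 | 0 | 2 | 1 | 4 | 2 | 3 | 5 |
| 1 | 5 | 2 | 9 | 1 | 0 | 1 | 0 | 0 | 4 | 0 | 2 | 2 | 2 | 7 | 7 | 0 | 2 | 1 | 4 | 3 | 2 | 1 |
| 2 | 0 | 2 | 8 | 1 | 3 | 1 | 0 | 0 | 5 | 0 | 2 | 2 | 2 | 7 | 7 | 0 | 2 | 1 | 4 | 3 | 2 | 3 |
| 3 | 5 | 3 | 8 | 1 | 6 | 1 | 0 | 1 | 5 | 0 | 3 | 2 | 2 | 7 | 7 | 0 | 2 | 1 | 4 | 2 | 3 | 5 |
| 4 | 5 | 2 | 3 | 0 | 5 | 1 | 1 | 0 | 4 | 1 | 3 | 2 | 2 | 7 | 7 | 0 | 2 | 1 | 0 | 3 | 0 | 1 |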

| | class |
|---|---|
| 0 | 1 |
| 1 | 0 |
| 2 | 0 |
| 3 | 1 |
| 4 | 0 |

Train test split

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
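A minimal sketch of separating features from labels and then splitting, using `train_test_split`. The ten-row frame is a small stand-in for the encoded data, and `test_size=0.2`, `random_state=42` and `stratify=y` are assumed choices; the values used in the original notebook are not shown.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for the encoded frame: ten rows, two features plus the class column.
df_encoded = pd.DataFrame({
    "class":   [1, 0, 0, 1, 0, 1, 0, 0, 1, 0],
    "odor":    [6, 0, 3, 6, 5, 6, 0, 3, 6, 5],
    "bruises": [1, 1, 1, 1, 0, 1, 1, 0, 1, 0],
})

X = df_encoded.drop(columns=["class"])   # features only
y = df_encoded["class"]                  # labels only

# Assumed settings: 20% held out for testing; stratify keeps the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```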

Building the model

Training the model
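Building and training can be sketched as follows. The data comes from `make_classification`, a synthetic stand-in for the encoded mushroom features, and `random_state=0` is an assumed setting used only to make the run repeatable.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded mushroom features and labels.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = DecisionTreeClassifier(random_state=0)  # build the model with default settings
model.fit(X_train, y_train)                     # train it on the training split only

print(model.get_depth(), model.get_n_leaves())  # size of the learned tree
train_accuracy = model.score(X_train, y_train)
print(train_accuracy)
```

With default settings the tree keeps growing until its leaves are pure, so accuracy on the training data is typically very high and says little about how the model generalizes.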

Plot decision tree
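As a sketch of how a fitted tree can be inspected, the example below prints its rules as text with `export_text`. The data is synthetic and the feature names `f0` to `f3` are hypothetical; `max_depth=2` is chosen only to keep the printout short.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data with four features.
X, y = make_classification(
    n_samples=300, n_features=4, n_informative=2, n_redundant=0, random_state=0
)
feature_names = ["f0", "f1", "f2", "f3"]  # hypothetical names

model = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# One line per node: a threshold test for internal nodes, a class for leaves.
rules = export_text(model, feature_names=feature_names)
print(rules)
```

For a graphical version, `sklearn.tree.plot_tree(model, feature_names=feature_names, filled=True)` draws the same tree with matplotlib.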

Evaluating the model
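A small worked example of common evaluation metrics, assuming accuracy, precision and recall are the ones of interest (the original cell is not shown). The labels are hand-made so the numbers can be checked by eye, with 1 standing for poisonous and 0 for edible, matching the encoding of the `class` column.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Hand-made labels: 1 = poisonous (positive class), 0 = edible.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

# For binary labels the flattened matrix is ordered TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 4 2 1 3

accuracy = accuracy_score(y_true, y_pred)     # (TP + TN) / all   = 7 / 10
precision = precision_score(y_true, y_pred)   # TP / (TP + FP)    = 3 / 5
recall = recall_score(y_true, y_pred)         # TP / (TP + FN)    = 3 / 4
print(accuracy, precision, recall)
```

For this task a false negative (a poisonous mushroom predicted as edible) is the costly mistake, so recall on the poisonous class deserves particular attention.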


Cross-validation score
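A minimal sketch of computing a cross-validation score with `cross_val_score`. The data is synthetic, and `cv=5` with accuracy as the metric is an assumed configuration rather than the one used in the original notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded mushroom features and labels.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

model = DecisionTreeClassifier(random_state=0)

# The data is cut into 5 folds; each fold serves once as the held-out part
# while the model is refitted on the other four.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")

print(scores)                        # one accuracy value per fold
print(scores.mean(), scores.std())   # average and spread across folds
```

Reporting the mean together with the spread gives a steadier estimate than a single train/test split, because every sample is used for testing exactly once.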


Tuning the model

Parameters

Which parameters does DecisionTreeClassifier have?

  • criterion: {"gini", "entropy"}, default="gini". The function to measure the quality of a split.

  • splitter: {"best", "random"}, default="best". The strategy used to choose the split at each node. Supported strategies are "best" to choose the best split and "random" to choose the best random split.

  • max_depth: int, default=None. The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

  • random_state: int, RandomState instance, default=None. Controls the randomness of the estimator.

  • max_leaf_nodes: int, default=None. Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None, then unlimited number of leaf nodes.

  • class_weight: dict, list of dict or "balanced", default=None. Weights associated with classes in the form {class_label: weight}. If None, all classes are supposed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.

More parameters here: https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
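To make the `criterion` parameter more concrete, here is a small sketch of the two impurity measures it selects between, written in plain Python using the standard definitions. The helper names `gini` and `entropy` are illustrative and are not scikit-learn functions.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: minus the sum of p * log2(p) over classes."""
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


mixed = ["e", "e", "p", "p"]   # half edible, half poisonous: maximally impure
pure = ["e", "e", "e", "e"]    # a single class: perfectly pure

print(gini(mixed))     # 0.5
print(entropy(mixed))  # 1.0
print(gini(pure))      # 0.0
```

A split is considered good when the groups it produces are purer (lower impurity) than the group it started from; both measures agree on the extremes and usually lead to similar trees.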

Train test validation split
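A three-way split can be sketched with two consecutive calls to `train_test_split`. The 60/20/20 proportions and the fixed `random_state` are assumptions; the data is a synthetic stand-in.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded mushroom features and labels.
X, y = make_classification(n_samples=1000, n_features=6, random_state=0)

# Step 1: set aside 20% of all rows as the final test set.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Step 2: take 25% of the remaining 80% as validation (20% of the total).
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0, stratify=y_rest
)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```

The validation set is used to compare parameter settings, while the test set is kept untouched until the very end so that the final score is not influenced by the tuning.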

We fit the model with various parameters

[Figures: three plots of the tuning results]

We choose the best parameters
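One simple way to do this, sketched under assumptions: train one tree per parameter combination, score each on the validation set, and keep the combination with the highest score. The grid of values below is arbitrary and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data, split into training and validation parts.
X, y = make_classification(n_samples=600, n_features=8, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

results = []  # tuples of (criterion, max_depth, validation accuracy)
for criterion in ("gini", "entropy"):
    for max_depth in (1, 2, 3, 5, 8, None):
        model = DecisionTreeClassifier(
            criterion=criterion, max_depth=max_depth, random_state=0
        )
        model.fit(X_train, y_train)
        results.append((criterion, max_depth, model.score(X_val, y_val)))

# Keep the combination with the highest validation accuracy.
best_criterion, best_depth, best_score = max(results, key=lambda r: r[2])
print(best_criterion, best_depth, round(best_score, 3))
```

scikit-learn's `GridSearchCV` automates the same search and uses cross-validation instead of a single validation set.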

Interpreting the results

[Figure: plot of the results]

Let's look at some examples
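A sketch of how such an inspection table might be built with pandas: true and predicted labels are placed next to the features, and a boolean `TP` column marks the true positives (poisonous and predicted poisonous). The five rows reuse the `odor`, `y_true` and `y_pred` values and the row indices listed in this section; in a real run the frame would be a copy of the test features with `y_true` taken from the test labels and `y_pred` from `model.predict`.

```python
import pandas as pd

# Five test rows from this section, reduced to one feature for brevity.
examples = pd.DataFrame(
    {
        "odor":   [5, 8, 2, 5, 7],
        "y_true": [0, 1, 1, 0, 1],
        "y_pred": [0, 1, 1, 0, 1],
    },
    index=[1971, 6654, 5606, 3332, 6988],
)

# True positive: the sample is poisonous (1) and was predicted poisonous (1).
examples["TP"] = (examples["y_true"] == 1) & (examples["y_pred"] == 1)

true_positives = examples[examples["TP"]]
print(true_positives)

# Summary statistics of the true positives, without the boolean helper column.
print(true_positives.drop(columns="TP").describe())
```

The same pattern with different conditions selects false negatives (`y_true == 1` and `y_pred == 0`) or false positives (`y_true == 0` and `y_pred == 1`), which are usually the more interesting rows to study.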

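| | cap-shape | cap-surface | cap-color | bruises | odor | gill-attachment | gill-spacing | gill-size | gill-color | stalk-shape | stalk-root | stalk-surface-above-ring | stalk-surface-below-ring | stalk-color-above-ring | stalk-color-below-ring | veil-type | veil-color | ring-number | ring-type | spore-print-color | population | habitat | y_true | y_pred |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1971 | 2 | 0 | 4 | 0 | 5 | 1 | 1 | 0 | 3 | 1 | 3 | 2 | 0 | 7 | 7 | 0 | 2 | 1 | 0 | 3 | 3 | 1 | 0 | 0 |
| 6654 | 2 | 2 | 2 | 0 | 8 | 1 | 0 | 1 | 0 | 1 | 0 | 2 | 2 | 6 | 6 | 0 | 2 | 1 | 0 | 7 | 4 | 2 | 1 | 1 |
| 5606 | 5 | 3 | 4 | 0 | 2 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 2 | 7 | 6 | 0 | 2 | 1 | 0 | 7 | 4 | 2 | 1 | 1 |
| 3332 | 2 | 3 | 3 | 1 | 5 | 1 | 0 | 0 | 5 | 1 | 1 | 2 | 2 | 3 | 6 | 0 | 2 | 1 | 4 | 3 | 5 | 0 | 0 | 0 |
| 6988 | 2 | 2 | 2 | 0 | 7 | 1 | 0 | 1 | 0 | 1 | 0 | 2 | 2 | 6 | 6 | 0 | 2 | 1 | 0 | 7 | 4 | 2 | 1 | 1 |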


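| | cap-shape | cap-surface | cap-color | bruises | odor | gill-attachment | gill-spacing | gill-size | gill-color | stalk-shape | stalk-root | stalk-surface-above-ring | stalk-surface-below-ring | stalk-color-above-ring | stalk-color-below-ring | veil-type | veil-color | ring-number | ring-type | spore-print-color | population | habitat | y_true | y_pred | TP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6654 | 2 | 2 | 2 | 0 | 8 | 1 | 0 | 1 | 0 | 1 | 0 | 2 | 2 | 6 | 6 | 0 | 2 | 1 | 0 | 7 | 4 | 2 | 1 | 1 | True |
| 5606 | 5 | 3 | 4 | 0 | 2 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 2 | 7 | 6 | 0 | 2 | 1 | 0 | 7 | 4 | 2 | 1 | 1 | True |
| 6988 | 2 | 2 | 2 | 0 | 7 | 1 | 0 | 1 | 0 | 1 | 0 | 2 | 2 | 6 | 6 | 0 | 2 | 1 | 0 | 7 | 4 | 2 | 1 | 1 | True |
| 5761 | 5 | 3 | 4 | 0 | 8 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 2 | 7 | 6 | 0 | 2 | 1 | 0 | 7 | 4 | 2 | 1 | 1 | True |
| 5798 | 5 | 2 | 3 | 1 | 2 | 1 | 0 | 0 | 3 | 1 | 1 | 2 | 0 | 7 | 7 | 0 | 2 | 1 | 4 | 1 | 3 | 5 | 1 | 1 | True |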

| | cap-shape | cap-surface | cap-color | bruises | odor | gill-attachment | gill-spacing | gill-size | gill-color | stalk-shape | stalk-root | stalk-surface-above-ring | stalk-surface-below-ring | stalk-color-above-ring | stalk-color-below-ring | veil-type | veil-color | ring-number | ring-type | spore-print-color | population | habitat | y_true | y_pred |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1302.000000 | 1302.000000 | 1302.000000 | 1302.000000 | 1302.000000 | 1302.000000 | 1302.000000 | 1302.000000 | 1302.000000 | 1302.000000 | 1302.000000 | 1302.000000 | 1302.000000 | 1302.000000 | 1302.000000 | 1302.0 | 1302.0 | 1302.000000 | 1302.000000 | 1302.000000 | 1302.00000 | 1302.000000 | 1302.0 | 1302.0 |
| mean | 3.447773 | 2.058372 | 4.404762 | 0.168203 | 3.951613 | 0.996160 | 0.026882 | 0.565284 | 2.920891 | 0.505376 | 0.710445 | 1.360215 | 1.402458 | 5.505376 | 5.506912 | 0.0 | 2.0 | 1.011521 | 1.559140 | 3.969278 | 4.02381 | 1.905530 | 1.0 | 1.0 |
| std | 1.439518 | 1.102566 | 2.640617 | 0.374190 | 2.571043 | 0.061874 | 0.161800 | 0.495910 | 3.303837 | 0.500163 | 0.809170 | 0.561416 | 0.588849 | 2.196962 | 2.193807 | 0.0 | 0.0 | 0.163614 | 1.565615 | 2.839110 | 0.59479 | 1.805573 | 0.0 | 0.0 |
| min | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 2.0 | 0.000000 | 0.000000 | 1.000000 | 1.00000 | 0.000000 | 1.0 | 1.0 |
| 25% | 2.000000 | 2.000000 | 2.000000 | 0.000000 | 2.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 1.000000 | 6.000000 | 6.000000 | 0.0 | 2.0 | 1.000000 | 0.000000 | 1.000000 | 4.00000 | 0.000000 | 1.0 | 1.0 |
| 50% | 3.000000 | 2.000000 | 4.000000 | 0.000000 | 2.000000 | 1.000000 | 0.000000 | 1.000000 | 2.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 6.000000 | 6.000000 | 0.0 | 2.0 | 1.000000 | 2.000000 | 3.000000 | 4.00000 | 1.000000 | 1.0 | 1.0 |
| 75% | 5.000000 | 3.000000 | 5.000000 | 0.000000 | 7.000000 | 1.000000 | 0.000000 | 1.000000 | 7.000000 | 1.000000 | 1.000000 | 2.000000 | 2.000000 | 7.000000 | 7.000000 | 0.0 | 2.0 | 1.000000 | 2.000000 | 7.000000 | 4.00000 | 4.000000 | 1.0 | 1.0 |
| max | 5.000000 | 3.000000 | 9.000000 | 1.000000 | 8.000000 | 1.000000 | 1.000000 | 1.000000 | 11.000000 | 1.000000 | 3.000000 | 2.000000 | 3.000000 | 7.000000 | 8.000000 | 0.0 | 2.0 | 2.000000 | 4.000000 | 7.000000 | 5.00000 | 5.000000 | 1.0 | 1.0 |

Boundaries plot
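A decision boundary plot needs exactly two features so that the regions can be drawn in a plane. The sketch below is therefore an illustration on synthetic two-feature data, not a reconstruction of the original figure; `max_depth=3` and the grid resolution are arbitrary choices. The idea is to predict the class for every point of a dense grid and colour the grid by the predicted class.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; not needed inside a notebook

import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Two synthetic features so the boundary can be drawn in a plane.
X, y = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0, random_state=0
)
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Dense grid covering the data, with a margin of 1 on each side.
xx, yy = np.meshgrid(
    np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
    np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200),
)

# Predict a class for every grid point and reshape back to the grid layout.
grid_pred = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

fig, ax = plt.subplots()
ax.contourf(xx, yy, grid_pred, alpha=0.3)          # coloured decision regions
ax.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")   # the samples themselves
ax.set_xlabel("feature 0")
ax.set_ylabel("feature 1")
```

Because every split of a decision tree tests a single feature against a threshold, the regions are always axis-aligned rectangles.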

[Figure: decision boundaries plot]
