init

commit 74650b6912

.instructions/INSTRUCTIONS.mdoc (new file, 210 lines)
@@ -0,0 +1,210 @@
{% steps %}

{% step title="Introduction to ML Model Generalization" %}

### Introduction

Welcome to the "Introduction to ML Model Generalization" lab!

This lab is designed to give you a foundational understanding of generalization in machine learning models and of how to prevent overfitting and underfitting.

### Learning Objectives

- Review generalization and why it is important not to overfit or underfit models.
- Practice implementing early stopping and learning rate decay.

### Prerequisites

Familiarity with basic ML principles and key concepts such as learning rates and model structure.

{% /step %}
{% step title="Synthetic Data Generation" %}
|
||||||
|
|
||||||
|
|
||||||
|
### Synthetic Data Generation
|
||||||
|
Provided below is a basic function to create some synthetic data for classification. This data will have 2000 samples, each with 20
|
||||||
|
features, where 5 features do not affect the outcome of the classification and 15 are directly correlated to the classification. The
|
||||||
|
options for correct classification will only be between two options. Most importantly, the random state is defined to allow repeatability
|
||||||
|
of the model generation.
|
||||||
|
```python
|
||||||
|
X, y = make_classification(n_samples=2000,
|
||||||
|
n_features=20,
|
||||||
|
n_classes=2,
|
||||||
|
n_informative=15,
|
||||||
|
n_redundant=5,
|
||||||
|
random_state=42)
|
||||||
|
scaler = StandardScaler()
|
||||||
|
X = scaler.fit_transform(X)
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|

### Train/Validation/Test Splits

To split the data, you will first separate a test set from a combined training/validation set. You will then split the training/validation data into its own training and validation sets, resulting in a data distribution of train (64%), validation (16%), and test (20%).

```python
from sklearn.model_selection import train_test_split

X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.2, random_state=42)
```

In practice, it is best to define your test dataset before model development begins and to keep it entirely separate from the rest of the workflow, to ensure there is no data leakage or overfitting to the test data.

{% /step %}
{% step title="Model and Generalization Feature Setup" %}
|
||||||
|
|
||||||
|
### Introduction
|
||||||
|
For this lab you will use a basic feed forward neural network because neural networks allow you to implement additional features
|
||||||
|
such as learning rate schedulers, and early stopping, that more traditional models such as linear regression do not have.
|
||||||
|
|
||||||
|
### Model Setup
|
||||||
|
For this
|
||||||
|
model you will use two dense layers of Relu activation functions, allowing for
|
||||||
|
more complex patterns to be learned, and ending the model with a sigmoid.
|
||||||
|
When setting up the model you could also include additional generalization techniques such as drop out, which
|
||||||
|
selectively turns off a certain percentage of neurons to ensure no single neuron within the neural net
|
||||||
|
learns to perform a single aspect of prediction.
|
||||||
|
|
||||||
|
**Note:** RELU is used to introduce non linearity into a neural networks learning, and sigmoid is used as a classification function
|
||||||
|
|
||||||
|
|
||||||
|
```python
|
||||||
|
model = tf.keras.Sequential([
|
||||||
|
tf.keras.layers.Dense(64, activation='relu', input_shape=(20,)),
|
||||||
|
tf.keras.layers.Dense(32, activation='relu'),
|
||||||
|
tf.keras.layers.Dense(1, activation='sigmoid')
|
||||||
|
])
|
||||||
|
```
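
If you want to experiment with dropout as well, here is a minimal sketch of the same architecture with dropout layers added. The 0.3 rate and the ```model_with_dropout``` name are illustrative choices, not part of the lab code.

```python
import tensorflow as tf

# Same architecture as above, with dropout after each hidden layer.
# rate=0.3 randomly zeroes 30% of the layer's outputs on each training step.
model_with_dropout = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(20,)),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
```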

### Loss Function and Model Initialization Parameters

For this lab you will be using the Adam optimizer, as Adam is a good starting optimizer for most problems. Here Adam is configured with a single argument, the initial learning rate; most models begin with a learning rate well under 0.3, and usually closer to 0.1 at most. You also define the loss function as ```binary_crossentropy```, which compares the predicted probabilities against the actual labels.

```python
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.05),
    loss='binary_crossentropy',
    metrics=['accuracy']
)
```
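
For intuition, binary cross-entropy penalizes confident wrong predictions far more heavily than confident correct ones. A minimal sketch of the per-example computation (illustrative only, not part of the lab code):

```python
import numpy as np

def binary_crossentropy(y_true, p_pred):
    """Per-example binary cross-entropy for true label y_true (0 or 1)
    and predicted probability p_pred."""
    return -(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))

print(binary_crossentropy(1, 0.9))  # ~0.105: confident and correct -> small loss
print(binary_crossentropy(1, 0.1))  # ~2.303: confident and wrong -> large loss
```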

#### Learning Rate Scheduler

For your learning rate scheduler in this lab you will use ```ReduceLROnPlateau``` from Keras, which reduces the learning rate whenever the monitored metric stops improving (plateaus), so the optimizer can keep making finer-grained progress without the learning rate ever dropping below a set floor. The parameters are defined below:

- ```monitor``` set to ```val_loss``` means the callback watches the loss on the validation set.
- ```factor``` is the factor by which the learning rate is multiplied when it is reduced.
- ```patience``` is the number of epochs without improvement to wait before reducing the learning rate.
- ```min_lr``` defines the lowest possible value of the learning rate.
- ```verbose``` set to 1 prints a message each time the learning rate is reduced; 0 keeps the callback silent.

```python
lr_scheduler = tf.keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss',   # metric to monitor
    factor=0.5,           # reduce by this factor
    patience=2,           # wait 2 epochs before reducing LR
    min_lr=1e-5,          # don't reduce below this
    verbose=1
)
```
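
To see what this schedule does in the worst case, here is a small illustrative sketch (not part of the lab code) of how the learning rate halves from the initial 0.05 each time the plateau condition triggers, until it reaches the ```min_lr``` floor:

```python
# Illustrative only: successive learning rates if ReduceLROnPlateau keeps firing.
lr, factor, min_lr = 0.05, 0.5, 1e-5
rates = [lr]
while lr * factor >= min_lr:
    lr *= factor
    rates.append(lr)
print(rates[:5])  # [0.05, 0.025, 0.0125, 0.00625, 0.003125]
```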

#### Early Stopping

Next you will implement the early stopping aspect of the model. This callback helps prevent overfitting by stopping training when the monitored value stops improving by a meaningful amount, and by rolling back to the best weights seen so far. In your case the monitored value is again the validation loss. A patience of 3 means the model is allowed 3 consecutive epochs without a large enough improvement before training stops. ```min_delta``` defines how much the monitored value needs to change to be counted as an improvement at all. Finally, setting ```restore_best_weights``` to true restores the weights from the epoch with the best monitored value, rather than keeping the weights from the final (worse) epochs. This functionality is important to ensure the model does not overfit to the training data and keeps some ability to generalize; a small sketch of the bookkeeping follows the code block.

```python
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=3,
    min_delta=0.01,            # minimum change to be considered an improvement
    restore_best_weights=True,
    verbose=1
)
```
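
As a rough mental model, the patience/min_delta bookkeeping works roughly as below. This is a simplified sketch, not Keras's actual implementation, and the loss values are rounded from the training log produced later in this lab.

```python
# Simplified sketch of the early-stopping decision, not the real Keras code.
def should_stop(val_losses, patience=3, min_delta=0.01):
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:      # improved by at least min_delta
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch             # stop training at this epoch
    return None                          # never triggered

val_losses = [0.227, 0.164, 0.148, 0.186, 0.170, 0.119, 0.105, 0.121, 0.121, 0.128]
print(should_stop(val_losses))  # -> 10, matching the lab run (best weights came from epoch 7)
```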

{% /step %}

{% step title="Model Training" %}

### Model Training

Finally, on to model training. You will use the basic ```fit``` method and pass the validation set as an argument; this lets ```val_loss``` be computed each epoch so it can be used by the learning rate scheduler and the early stopping mechanism. Here the number of epochs is set to 100 and ```verbose``` is set to 2: this gives early stopping plenty of epochs to end training early, and the line-by-line training output helps you see how ```val_loss``` changes per epoch. As you run the model, pay close attention to the changes in ```val_loss``` and how they correlate with when the model reduces the learning rate, stops early, and rolls back to the best weights.

```python
model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=100,
    callbacks=[early_stop, lr_scheduler],  # your custom early stopping + LR scheduler
    verbose=2
)
```

{% /step %}

{% step title="Evaluating Model Results" %}

### Evaluating Model Results

The following code provides a basic metric report for your neural network. Different levels of accuracy are acceptable depending on the model's domain. It is more important to see a considerable increase in prediction accuracy compared to existing methods than it is to hit a particular accuracy threshold. Validation accuracy above 99.5% can be a bit concerning, as it may be a sign of overfitting, while accuracy below previous methods may be a sign of underfitting.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Convert predicted probabilities to class labels with a 0.5 threshold
y_pred_probs = model.predict(X_test).flatten()
y_pred = (y_pred_probs >= 0.5).astype(int)

print("\n Test Set Evaluation:")
print(classification_report(y_test, y_pred))
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
```
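
One quick overfitting check that is not part of the lab code, but fits the discussion above, is to compare accuracy on the training set against accuracy on the test set: a large gap suggests overfitting, while low accuracy on both suggests underfitting. A minimal sketch, assuming the variables defined earlier in the lab:

```python
# Illustrative follow-up check using the model and splits defined above.
train_loss, train_acc = model.evaluate(X_train, y_train, verbose=0)
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f"train accuracy: {train_acc:.3f}  test accuracy: {test_acc:.3f}")
```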

{% /step %}

{% /steps %}

.vscode/settings.json (new file, vendored, 22 lines)
@@ -0,0 +1,22 @@
{
    "terminal.integrated.fontSize": 15,
    "editor.fontSize": 15,
    "terminal.integrated.defaultProfile.linux": "bash",
    "workbench.colorTheme": "Default Dark Modern",
    "workbench.startupEditor": "none",
    "files.associations": {
        "*.md": "markdoc"
    },
    "workspace": {
        "view": "readme",
        "terminals": [
            {
                "name": "Terminal",
                "active": false
            }
        ],
        "files": [
            "./lab.ipynb"
        ],
    }
}

README.md (new file, 136 lines)
@@ -0,0 +1,136 @@
# ML Model Generalization Lab

## Lab Objective

This lab demonstrates essential techniques for improving the **generalization** of a Machine Learning model and avoiding **overfitting**.

## Key Concepts

### 1. Data Split (Train/Validation/Test)

```
Total: 2000 samples
├── Train: 64% (1280 samples) - model training
├── Validation: 16% (320 samples) - hyperparameter tuning
└── Test: 20% (400 samples) - final evaluation
```

**Why 3 splits?**
- **Train**: learns the patterns
- **Validation**: detects overfitting during training
- **Test**: measures real performance on never-seen data

### 2. Early Stopping

```python
EarlyStopping(
    monitor='val_loss',
    patience=3,
    min_delta=0.01,
    restore_best_weights=True
)
```

**Role**: Stops training when the `val_loss` stops improving
- Avoids overfitting by stopping before the model "memorizes" the data
- Restores the best weights (epoch 7 in our case)

### 3. Learning Rate Scheduler

```python
ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.5,
    patience=2,
    min_lr=1e-5
)
```

**Role**: Reduces the learning rate when learning stagnates
- Initial learning rate: 0.05
- Halved after 2 epochs without improvement
- Allows finer convergence toward the optimum

### 4. Network Architecture

```
Input (20 features)
        ↓
Dense(64, relu)
        ↓
Dense(32, relu)
        ↓
Dense(1, sigmoid) → binary probability
```

A simple but effective architecture for binary classification.

## Results Obtained

### Performance Metrics

| Metric | Value |
|--------|-------|
| Accuracy | 97% |
| Precision (class 0) | 95% |
| Precision (class 1) | 99% |
| Recall (class 0) | 99% |
| Recall (class 1) | 95% |

### Confusion Matrix

```
              Predicted
               0     1
Actual  0   [205     2]
        1   [ 10   183]
```

- **Correct predictions** (diagonal): 205 + 183 = 388
- **Errors** (off-diagonal): 2 + 10 = 12
- **Error rate**: only 3% (recomputed in the sketch below)
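
These headline numbers can be recomputed directly from the confusion matrix counts reported above; a quick sketch:

```python
# Recomputing the reported metrics from the confusion matrix counts above.
tn, fp, fn, tp = 205, 2, 10, 183

accuracy = (tp + tn) / (tp + tn + fp + fn)   # 388 / 400 = 0.97
precision_class1 = tp / (tp + fp)            # 183 / 185 ≈ 0.99
recall_class1 = tp / (tp + fn)               # 183 / 193 ≈ 0.95
print(accuracy, precision_class1, recall_class1)
```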

## Key Takeaways

### ✅ Good Practices Applied

1. **Always split the data** into 3 distinct sets
2. **Use the validation set** to monitor overfitting in real time
3. **Early stopping** is crucial to avoid overfitting
4. **An adaptive learning rate** improves convergence
5. **Feature normalization** with StandardScaler stabilizes learning

### 📊 Signs of Good Generalization

- ✅ Similar performance on train and test
- ✅ Val_loss stabilizes without diverging
- ✅ The model stops before overfitting (epoch 10/100)
- ✅ Balanced metrics across classes

### ⚠️ Signs of Overfitting (absent here)

- ❌ Train accuracy >> Test accuracy
- ❌ Val_loss increases while train_loss decreases
- ❌ Degraded performance on new data

## Running the Lab

```bash
# Activate the virtual environment
source venv/bin/activate

# Launch Jupyter
jupyter notebook lab.ipynb
```

## Technologies Used

- **TensorFlow/Keras**: building and training the neural network
- **Scikit-learn**: data generation, preprocessing, metrics
- **Python 3.12**: programming language

## Conclusion

This lab shows that a well-regularized model with early stopping and learning rate scheduling can achieve excellent performance (97%) while generalizing correctly to unseen data.

**Fundamental principle**: A good model does not memorize the data, it learns the general patterns.

lab.ipynb (new file, 291 lines)
@@ -0,0 +1,291 @@
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025-11-12 17:18:24.255077: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.\n",
      "2025-11-12 17:18:24.312342: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n",
      "To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n",
      "2025-11-12 17:18:25.689783: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.\n"
     ]
    }
   ],
   "source": [
    "import tensorflow as tf\n",
    "from sklearn.datasets import make_classification\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "from sklearn.metrics import classification_report, confusion_matrix"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Synthetic Data Generation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# synthetic data generation of 2000 samples\n",
    "X, y = make_classification(n_samples=2000,\n",
    "                           n_features=20,\n",
    "                           n_classes=2,\n",
    "                           n_informative=15,\n",
    "                           n_redundant=5,\n",
    "                           random_state=42)\n",
    "scaler = StandardScaler()\n",
    "X = scaler.fit_transform(X)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Train/Validation/Test Splits"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Split into train (64%), val (16%), test (20%)\n",
    "X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n",
    "X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.2, random_state=42)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Model Setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Feed Forward Neural Network Initialization\n",
    "model = tf.keras.Sequential([\n",
    "    tf.keras.layers.Dense(64, activation='relu', input_shape=(20,)),\n",
    "    tf.keras.layers.Dense(32, activation='relu'),\n",
    "    tf.keras.layers.Dense(1, activation='sigmoid')\n",
    "])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Training Hyperparameters Setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# optimizer and loss setup\n",
    "model.compile(\n",
    "    optimizer=tf.keras.optimizers.Adam(learning_rate=0.05),\n",
    "    loss='binary_crossentropy',\n",
    "    metrics=['accuracy']\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Learning Rate Scheduler"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Learning rate scheduler\n",
    "lr_scheduler = tf.keras.callbacks.ReduceLROnPlateau(\n",
    "    monitor='val_loss',   # metric to monitor\n",
    "    factor=0.5,           # reduce by a factor\n",
    "    patience=2,           # wait 2 epochs before reducing LR\n",
    "    min_lr=1e-5,          # don't reduce below this\n",
    "    verbose=1\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Early Stopping Logic"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 3. Early stopping callback with patience and loss threshold\n",
    "early_stop = tf.keras.callbacks.EarlyStopping(\n",
    "    monitor='val_loss',\n",
    "    patience=3,\n",
    "    min_delta=0.01,            # minimum change to be considered an improvement\n",
    "    restore_best_weights=True,\n",
    "    verbose=1\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Model Training"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 1/100\n",
      "40/40 - 1s - 29ms/step - accuracy: 0.8359 - loss: 0.3691 - val_accuracy: 0.9187 - val_loss: 0.2269 - learning_rate: 0.0500\n",
      "Epoch 2/100\n",
      "40/40 - 0s - 3ms/step - accuracy: 0.9102 - loss: 0.2240 - val_accuracy: 0.9438 - val_loss: 0.1643 - learning_rate: 0.0500\n",
      "Epoch 3/100\n",
      "40/40 - 0s - 3ms/step - accuracy: 0.9477 - loss: 0.1400 - val_accuracy: 0.9531 - val_loss: 0.1484 - learning_rate: 0.0500\n",
      "Epoch 4/100\n",
      "40/40 - 0s - 3ms/step - accuracy: 0.9547 - loss: 0.1338 - val_accuracy: 0.9344 - val_loss: 0.1857 - learning_rate: 0.0500\n",
      "Epoch 5/100\n",
      "\n",
      "Epoch 5: ReduceLROnPlateau reducing learning rate to 0.02500000037252903.\n",
      "40/40 - 0s - 3ms/step - accuracy: 0.9555 - loss: 0.1402 - val_accuracy: 0.9219 - val_loss: 0.1695 - learning_rate: 0.0500\n",
      "Epoch 6/100\n",
      "40/40 - 0s - 3ms/step - accuracy: 0.9688 - loss: 0.0904 - val_accuracy: 0.9656 - val_loss: 0.1186 - learning_rate: 0.0250\n",
      "Epoch 7/100\n",
      "40/40 - 0s - 3ms/step - accuracy: 0.9812 - loss: 0.0491 - val_accuracy: 0.9688 - val_loss: 0.1048 - learning_rate: 0.0250\n",
      "Epoch 8/100\n",
      "40/40 - 0s - 4ms/step - accuracy: 0.9922 - loss: 0.0317 - val_accuracy: 0.9563 - val_loss: 0.1213 - learning_rate: 0.0250\n",
      "Epoch 9/100\n",
      "\n",
      "Epoch 9: ReduceLROnPlateau reducing learning rate to 0.012500000186264515.\n",
      "40/40 - 0s - 3ms/step - accuracy: 0.9922 - loss: 0.0220 - val_accuracy: 0.9625 - val_loss: 0.1212 - learning_rate: 0.0250\n",
      "Epoch 10/100\n",
      "40/40 - 0s - 3ms/step - accuracy: 0.9953 - loss: 0.0177 - val_accuracy: 0.9563 - val_loss: 0.1283 - learning_rate: 0.0125\n",
      "Epoch 10: early stopping\n",
      "Restoring model weights from the end of the best epoch: 7.\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<keras.src.callbacks.history.History at 0x7f6c3ff320c0>"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 4. Train the model\n",
    "model.fit(\n",
    "    X_train, y_train,\n",
    "    validation_data=(X_val, y_val),\n",
    "    epochs=100,\n",
    "    callbacks=[early_stop, lr_scheduler],  # your custom early stopping + LR scheduler\n",
    "    verbose=2\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Evaluation Metrics"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[1m13/13\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 4ms/step \n",
      "\n",
      " Test Set Evaluation:\n",
      "              precision    recall  f1-score   support\n",
      "\n",
      "           0       0.95      0.99      0.97       207\n",
      "           1       0.99      0.95      0.97       193\n",
      "\n",
      "    accuracy                           0.97       400\n",
      "   macro avg       0.97      0.97      0.97       400\n",
      "weighted avg       0.97      0.97      0.97       400\n",
      "\n",
      "Confusion Matrix:\n",
      "[[205   2]\n",
      " [ 10 183]]\n"
     ]
    }
   ],
   "source": [
    "# 5. Evaluate on test set\n",
    "y_pred_probs = model.predict(X_test).flatten()\n",
    "y_pred = (y_pred_probs >= 0.5).astype(int)\n",
    "\n",
    "\n",
    "print(\"\\n Test Set Evaluation:\")\n",
    "print(classification_report(y_test, y_pred))\n",
    "print(\"Confusion Matrix:\")\n",
    "print(confusion_matrix(y_test, y_pred))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}