Add reproducible-workflows lab with complete documentation

- Implemented the lab.ipynb notebook with complete code for building reproducible AI workflows
- Added a 600+ line pedagogical README.md in French
- Configured random seeds for reproducibility
- Implemented data generation, splitting, normalization, and saving
- Built and trained a neural network with TensorFlow/Keras
- Demonstrated reloading the model and verifying reproducibility

🤖 Generated with [Claude Code](https://claude.com/claude-code)
spham 2025-11-14 12:13:54 +01:00
commit 8dfd897cac
4 changed files with 1340 additions and 0 deletions


@@ -0,0 +1,102 @@
{% steps %}
{% step title="Introduction to Reproducible Workflows" %}
### Introduction
Welcome to the "Introduction to Reproducible Workflows" lab!
This lab is designed to give you a foundational understanding of how to create reproducible workflows for training an AI model,
why reproducibility matters, and where in model training to define fixed seeds.
### Learning Objectives
- Identify the key areas within AI model workflows where fixed seeds should be defined
- Review saving datasets after train/test splits
- Practice recovering models and training datasets to repeat results
### Prerequisites
- A high-level understanding of neural networks
- Experience training models
{% /step %}
{% step title="Creating Reproducible Datasets" %}
### Creating Reproducible Datasets
To create a repeatable workflow, you will need to start by defining the random seeds. The random seed determines how data is
generated, how it is split, and how the resulting train/test datasets are fed into the model. Fixing the seed at the start
helps ensure that every stage of the workflow is repeatable. The code below sets the random seed for the different libraries
used in model development.
```python
import random
import numpy as np
import tensorflow as tf

random.seed(42)         # Python's built-in random module
np.random.seed(42)      # NumPy
tf.random.set_seed(42)  # TensorFlow
```
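Fixing these three seeds covers the main sources of randomness in this lab. As an aside that goes beyond the lab's provided code: recent TensorFlow versions also offer a single helper that seeds all three libraries at once, plus an opt-in deterministic mode that removes residual run-to-run variation from parallel ops (this sketch assumes TensorFlow 2.9 or later; the deterministic mode may slow training):
```python
import tensorflow as tf

# Seeds Python's random, NumPy, and TensorFlow in one call (TF >= 2.7)
tf.keras.utils.set_random_seed(42)

# Force deterministic op implementations (TF >= 2.9); may slow training
tf.config.experimental.enable_op_determinism()
```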
From here you can define the synthetic data generation similar to how it is defined in other labs.
```python
from sklearn.datasets import make_classification

X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=15,
    n_redundant=5,
    n_classes=2,
    random_state=42
)
```
{% /step %}
{% step title="Training Reproducible Model" %}
### Training Reproducible Models
For the next step of model creation you will need to split your data into training and test datasets. The code provided below
allows for the initial dataset to be split into training and test sets. The key parameter for you to focus on here is the
```random_state``` which defines the random seed for the split. When you define the random seed, you are
ensuring future instances of the models training will result in the same training dataset being used, and therefore the
same model being created.
```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # fit on the training data, then transform it
X_test = scaler.transform(X_test)        # reuse the same scaling for the test data
```
You can also save the `train` and `test` data splits so the workflow remains reproducible even in cases where the dataset you
are using is a subset of a larger dataset.
```python
import joblib

joblib.dump((X_train, y_train), 'train_data.pkl')
joblib.dump((X_test, y_test), 'test_data.pkl')
```
With the datasets saved and reproducible, you can move on to defining and training your model. You will define and train
the model in this lab so you can evaluate it and compare it against a model trained on the same training dataset, confirming
that it is properly reproducible. All code for training and evaluating the model is provided within the lab; for reference,
the notebook's model definition is shown below.
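This is the small fully connected network that the lab notebook defines, trains, and evaluates (the same code you will run in `lab.ipynb`):
```python
from tensorflow import keras

# Same architecture as in lab.ipynb: two hidden layers, sigmoid output
model = keras.Sequential([
    keras.layers.Dense(32, activation='relu', input_shape=(X_train.shape[1],)),
    keras.layers.Dense(16, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=20, batch_size=32, validation_split=0.1)

loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```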
{% /step %}
{% step title="Saving Model and Dataset" %}
### Saving Model and Dataset
Now that your model has been trained and evaluated, you can save it with the following code:
```python
model.save('my_model.keras')
```
From there you can move on to reproducing the model, which mimics loading it into an entirely new environment.
You can load the previously saved model and datasets with the code below, including the required imports.
```python
from tensorflow.keras.models import load_model
import joblib
modelReloaded = load_model('my_model.keras')
X_train_reloaded, y_train_reloaded = joblib.load('train_data.pkl')
X_test_reloaded, y_test_reloaded = joblib.load('test_data.pkl')
```
When you re-evaluate the model on the exact same test set using the previously defined evaluation code, you should
see exactly the same results as for the previously evaluated model.
```python
loss, accuracy = modelReloaded.evaluate(X_test_reloaded, y_test_reloaded)
print(f"Test accuracy: {accuracy:.2f}")
```
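Matching aggregate accuracy is a good sign, but a stricter check is to compare per-example predictions. This sketch goes beyond the lab's provided code and assumes the original `model` object is still available in the same session:
```python
import numpy as np

# Compare per-example predictions of the original and reloaded models;
# with a reproducible workflow they match to floating-point tolerance.
# Assumes the original `model` is still in memory in this session.
preds_original = model.predict(X_test)
preds_reloaded = modelReloaded.predict(X_test_reloaded)

if np.allclose(preds_original, preds_reloaded):
    print("✓ Per-example predictions match: the workflow is reproducible.")
```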
{% /step %}
{% /steps %}

22
.vscode/settings.json vendored Normal file

@@ -0,0 +1,22 @@
{
    "terminal.integrated.fontSize": 15,
    "editor.fontSize": 15,
    "terminal.integrated.defaultProfile.linux": "bash",
    "workbench.colorTheme": "Default Dark Modern",
    "workbench.startupEditor": "none",
    "files.associations": {
        "*.md": "markdoc"
    },
    "workspace": {
        "view": "readme",
        "terminals": [
            {
                "name": "Terminal",
                "active": false
            }
        ],
        "files": [
            "./lab.ipynb"
        ]
    }
}

1050
README.md Normal file

File diff suppressed because it is too large

166
lab.ipynb Normal file

@@ -0,0 +1,166 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import tensorflow as tf\n",
"from tensorflow import keras\n",
"from sklearn.datasets import make_classification\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.preprocessing import StandardScaler\n",
"import random\n",
"import joblib"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Creating Reproducible Datasets"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "# 1. Set random seeds for reproducibility\n# Définir les graines aléatoires pour la reproductibilité\nrandom.seed(42) # Pour le module random\nnp.random.seed(42) # Pour NumPy\ntf.random.set_seed(42) # Pour TensorFlow\n\nprint(\"✓ Graines aléatoires définies avec succès !\")"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "# 2. Generate synthetic data\n# Générer des données synthétiques pour la classification\nX, y = make_classification(\n n_samples=1000, # 1000 exemples\n n_features=20, # 20 caractéristiques\n n_informative=15, # 15 caractéristiques utiles\n n_redundant=5, # 5 caractéristiques redondantes\n n_classes=2, # 2 classes (classification binaire)\n random_state=42 # Graine aléatoire pour reproductibilité\n)\n\nprint(f\"✓ Données générées : {X.shape[0]} exemples, {X.shape[1]} caractéristiques\")"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Training Reproducible Model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "# 3. Split and scale the data\n# Diviser les données en ensembles d'entraînement (80%) et de test (20%)\nX_train, X_test, y_train, y_test = train_test_split(\n X, y, \n test_size=0.2, # 20% pour le test\n random_state=42 # Important pour la reproductibilité !\n)\n\n# Normaliser les données (StandardScaler centre les données autour de 0)\nscaler = StandardScaler()\nX_train = scaler.fit_transform(X_train) # Apprendre et transformer les données d'entraînement\nX_test = scaler.transform(X_test) # Transformer les données de test\n\nprint(f\"✓ Données divisées :\")\nprint(f\" - Entraînement : {X_train.shape[0]} exemples\")\nprint(f\" - Test : {X_test.shape[0]} exemples\")"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "# Sauvegarder les données d'entraînement et de test pour la reproductibilité\njoblib.dump((X_train, y_train), 'train_data.pkl')\njoblib.dump((X_test, y_test), 'test_data.pkl')\n\nprint(\"✓ Données sauvegardées :\")\nprint(\" - train_data.pkl\")\nprint(\" - test_data.pkl\")"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Model Initalization and Training"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# 4. Build a neural network\n",
"model = keras.Sequential([\n",
" keras.layers.Dense(32, activation='relu', input_shape=(X_train.shape[1],)),\n",
" keras.layers.Dense(16, activation='relu'),\n",
" keras.layers.Dense(1, activation='sigmoid')\n",
"])\n",
"\n",
"model.compile(\n",
" optimizer='adam',\n",
" loss='binary_crossentropy',\n",
" metrics=['accuracy']\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# 5. Train the model\n",
"model.fit(X_train, y_train, epochs=20, batch_size=32, validation_split=0.1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# 6. Evaluate the model\n",
"loss, accuracy = model.evaluate(X_test, y_test)\n",
"print(f\"Test accuracy: {accuracy:.2f}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Saving Models"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "# 7. Save the model and scaler\n# Sauvegarder le modèle entraîné\nmodel.save('my_model.keras')\n\nprint(\"✓ Modèle sauvegardé : my_model.keras\")"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Reproducing Models"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "#8. Reloading model later\n# Recharger le modèle et les données sauvegardées\nfrom tensorflow.keras.models import load_model\n\nmodelReloaded = load_model('my_model.keras')\nX_train_reloaded, y_train_reloaded = joblib.load('train_data.pkl')\nX_test_reloaded, y_test_reloaded = joblib.load('test_data.pkl')\n\nprint(\"✓ Modèle et données rechargés avec succès !\")"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "# Vérifier que le modèle rechargé donne les mêmes résultats\nloss_reloaded, accuracy_reloaded = modelReloaded.evaluate(X_test_reloaded, y_test_reloaded)\nprint(f\"\\n🎯 Précision du modèle rechargé : {accuracy_reloaded:.2f}\")\nprint(\"\\n💡 Si la précision est identique à celle obtenue plus haut,\")\nprint(\" votre workflow est reproductible ! ✓\")"
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.11"
}
},
"nbformat": 4,
"nbformat_minor": 2
}