Add reproducible workflows lab with complete documentation

- Implement the lab.ipynb notebook with complete code for building reproducible AI workflows
- Add a 600+ line pedagogical README.md in French
- Configure random seeds for reproducibility
- Implement data generation, splitting, normalization, and saving
- Build and train a neural network with TensorFlow/Keras
- Demonstrate reloading the model and verifying reproducibility

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2025-11-14 12:13:54 +01:00 · 2025-11-14 12:13:54 +01:00 · 8dfd897cac
commit 8dfd897cac
4 changed files with 1340 additions and 0 deletions
--- a/.instructions/INSTRUCTIONS.mdoc
+++ b/.instructions/INSTRUCTIONS.mdoc
@@ -0,0 +1,102 @@
+{% steps %}
+{% step title="Introduction to Reproducible Workflows" %}
+
+### Introduction
+
+Welcome to the "Introduction to Reproducible Workflows" lab!
+This lab gives you a foundational understanding of how to build a reproducible workflow for training an AI model,
+why reproducibility matters, and where in the training process fixed seeds need to be defined.
+
+### Learning Objectives
+
+- Identify the key points in an AI model workflow where fixed seeds must be defined
+- Review how to save datasets after a train/test split
+- Practice recovering saved models and datasets to repeat results
+
+### Prerequisites
+
+- A high-level understanding of neural networks
+- Experience training models
+
+{% /step %}
+
+{% step title="Creating Reproducible Datasets" %}
+
+### Creating Reproducible Datasets
+To create a repeatable workflow, start by defining the random seeds. The seed determines how data is generated, how it
+is split, and how the train/test datasets are fed into the model, so fixing it up front
+helps make every later stage of the workflow repeatable. The code below sets the random seed for each of the
+libraries used in this lab: Python's `random` module, NumPy, and TensorFlow.
+```python
+random.seed(42)
+np.random.seed(42)
+tf.random.set_seed(42)
+```
+From here you can generate the synthetic dataset, much as in other labs. Passing `random_state` makes the generated data identical on every run.
+```python
+X, y = make_classification(
+    n_samples=1000, 
+    n_features=20, 
+    n_informative=15,
+    n_redundant=5, 
+    n_classes=2, 
+    random_state=42
+)
+```
+{% /step %}
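A minimal sketch of why a fixed seed makes later steps repeatable, using only Python's standard `random` module. The `shuffled_indices` helper is hypothetical and not part of the lab code; it stands in for any step that orders or samples data at random.

```python
import random


def shuffled_indices(n_samples: int, seed: int) -> list[int]:
    """Return the indices 0..n_samples-1 in an order determined by the seed."""
    rng = random.Random(seed)  # independent generator, leaves global state alone
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return indices


first_run = shuffled_indices(10, seed=42)
second_run = shuffled_indices(10, seed=42)

# The same seed reproduces the same ordering on every call.
print(first_run == second_run)
# Shuffling only reorders the indices; none are lost or duplicated.
print(sorted(first_run) == list(range(10)))
```

Using a dedicated `random.Random(seed)` instance, rather than the global generator, is one way to keep a step repeatable even when other code draws random numbers in between.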
+
+{% step title="Training Reproducible Model" %}
+
+### Training Reproducible Models
+For the next step of model creation, you need to split your data into training and test datasets. The code below
+splits the initial dataset into training and test sets. The key parameter to focus on is
+`random_state`, which defines the random seed for the split. By fixing this seed, you
+ensure that future runs of the model's training use the same training dataset, which is a prerequisite for
+reproducing the same model.
+```python
+X_train, X_test, y_train, y_test = train_test_split(
+    X, y, test_size=0.2, random_state=42
+)
+
+scaler = StandardScaler()
+X_train = scaler.fit_transform(X_train)
+X_test = scaler.transform(X_test)
+```
+You can also save the `train` and `test` splits so they can be reloaded exactly, which matters when the dataset you are
+using is a subset of a larger dataset. Note that the arrays saved here have already been scaled.
+```python
+joblib.dump((X_train, y_train), 'train_data.pkl')
+joblib.dump((X_test, y_test), 'test_data.pkl')
+```
+
+With the datasets saved and reproducible, you can move on to defining and training your model. You will define and train
+your model in this lab so that you can evaluate it and later compare those results against a model trained on the same
+training dataset, confirming that the workflow is reproducible. All code for
+training and evaluating the model is provided in the lab.
+{% /step %}
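One way to confirm that a saved split is still the file you wrote is to record a checksum alongside it. The sketch below is an assumption layered on top of the lab, not part of it: it uses the standard-library `pickle` and `hashlib` in place of `joblib` so it runs without extra packages, and `file_sha256` and the toy `train_split` are hypothetical.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


# Toy stand-in for (X_train, y_train); the lab's real splits are NumPy arrays.
train_split = ([[0.1, 0.2], [0.3, 0.4]], [0, 1])

with tempfile.TemporaryDirectory() as tmp:
    data_path = Path(tmp) / "train_data.pkl"
    data_path.write_bytes(pickle.dumps(train_split))

    # Record the digest when the file is written.
    recorded_digest = file_sha256(data_path)

    # Later, or in another environment: verify the file before loading it.
    digest_matches = file_sha256(data_path) == recorded_digest
    reloaded_split = pickle.loads(data_path.read_bytes())

print(digest_matches)
print(reloaded_split == train_split)
```

A mismatch between the recorded and recomputed digest signals that the file was regenerated or modified, so results obtained from it should not be expected to match earlier runs.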
+
+{% step title="Saving Model and Dataset" %}
+
+###  Saving Model and Dataset
+Now that your model has been trained and evaluated, you can save it with the following code:
+```python
+model.save('my_model.keras')
+```
+Next comes reproducing the model, which mimics loading it into an entirely new environment.
+You can load the previously saved model and datasets with the code below, which includes the required imports.
+```python
+from tensorflow.keras.models import load_model
+import joblib
+modelReloaded = load_model('my_model.keras')
+X_train_reloaded, y_train_reloaded = joblib.load('train_data.pkl')
+X_test_reloaded, y_test_reloaded = joblib.load('test_data.pkl')
+```
+As you re-evaluate the reloaded model on the exact same reloaded test set using the previously defined evaluation code, you should
+see results that are the same as those of the previously evaluated model.
+```python
+loss, accuracy = modelReloaded.evaluate(X_test_reloaded, y_test_reloaded)
+print(f"Test accuracy: {accuracy:.2f}")
+```
+
+{% /step %}
+{% /steps %}
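The evaluation code prints accuracy rounded to two decimals, which can hide small differences between runs. A stricter check compares the unrounded metrics. The sketch below is one possible approach under stated assumptions: `metrics_match` is a hypothetical helper, and the metric values are placeholders for illustration, not results from the lab.

```python
import math


def metrics_match(original: dict[str, float], reloaded: dict[str, float],
                  rel_tol: float = 1e-6) -> bool:
    """True when both runs report the same metric names with near-equal values."""
    if original.keys() != reloaded.keys():
        return False
    return all(
        math.isclose(original[name], reloaded[name], rel_tol=rel_tol)
        for name in original
    )


# Placeholder numbers for illustration only.
original_metrics = {"loss": 0.25, "accuracy": 0.9}
reloaded_metrics = {"loss": 0.25, "accuracy": 0.9}
drifted_metrics = {"loss": 0.25, "accuracy": 0.85}

print(metrics_match(original_metrics, reloaded_metrics))
print(metrics_match(original_metrics, drifted_metrics))
```

In the lab, the dictionaries would be filled from the `loss` and `accuracy` values returned by each `evaluate` call. The tolerance is a design choice: a small relative tolerance absorbs floating-point noise while still flagging a genuinely different model.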
--- a/.vscode/settings.json
+++ b/.vscode/settings.json
@@ -0,0 +1,22 @@
+{
+    "terminal.integrated.fontSize": 15,
+      "editor.fontSize": 15,
+      "terminal.integrated.defaultProfile.linux": "bash",
+      "workbench.colorTheme": "Default Dark Modern",
+      "workbench.startupEditor": "none",
+      "files.associations": {
+          "*.md": "markdoc"
+      },
+      "workspace": {
+          "view": "readme",
+          "terminals": [
+              {
+                  "name": "Terminal",
+                  "active": false
+              }
+          ],
+          "files": [
+              "./lab.ipynb"
+          ]
+      }
+  }
--- a/README.md
+++ b/README.md
--- a/lab.ipynb
+++ b/lab.ipynb
@@ -0,0 +1,166 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "import tensorflow as tf\n",
+    "from tensorflow import keras\n",
+    "from sklearn.datasets import make_classification\n",
+    "from sklearn.model_selection import train_test_split\n",
+    "from sklearn.preprocessing import StandardScaler\n",
+    "import random\n",
+    "import joblib"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Creating Reproducible Datasets"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": "# 1. Set random seeds for reproducibility\nrandom.seed(42)           # Python's random module\nnp.random.seed(42)        # NumPy\ntf.random.set_seed(42)    # TensorFlow\n\nprint(\"✓ Random seeds set successfully!\")"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": "# 2. Generate synthetic data for classification\nX, y = make_classification(\n    n_samples=1000,        # 1000 samples\n    n_features=20,         # 20 features\n    n_informative=15,      # 15 informative features\n    n_redundant=5,         # 5 redundant features\n    n_classes=2,           # 2 classes (binary classification)\n    random_state=42        # Random seed for reproducibility\n)\n\nprint(f\"✓ Data generated: {X.shape[0]} samples, {X.shape[1]} features\")"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Training Reproducible Model"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": "# 3. Split and scale the data\n# Split the data into training (80%) and test (20%) sets\nX_train, X_test, y_train, y_test = train_test_split(\n    X, y,\n    test_size=0.2,         # 20% for testing\n    random_state=42        # Important for reproducibility!\n)\n\n# Standardize the data (StandardScaler centers each feature at 0 and scales it to unit variance)\nscaler = StandardScaler()\nX_train = scaler.fit_transform(X_train)  # Fit on the training data, then transform it\nX_test = scaler.transform(X_test)        # Transform the test data with the same fit\n\nprint(\"✓ Data split:\")\nprint(f\"  - Training: {X_train.shape[0]} samples\")\nprint(f\"  - Test: {X_test.shape[0]} samples\")"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": "# Save the training and test data for reproducibility\njoblib.dump((X_train, y_train), 'train_data.pkl')\njoblib.dump((X_test, y_test), 'test_data.pkl')\n\nprint(\"✓ Data saved:\")\nprint(\"  - train_data.pkl\")\nprint(\"  - test_data.pkl\")"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Model Initialization and Training"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# 4. Build a neural network\n",
+    "model = keras.Sequential([\n",
+    "    keras.layers.Dense(32, activation='relu', input_shape=(X_train.shape[1],)),\n",
+    "    keras.layers.Dense(16, activation='relu'),\n",
+    "    keras.layers.Dense(1, activation='sigmoid')\n",
+    "])\n",
+    "\n",
+    "model.compile(\n",
+    "    optimizer='adam',\n",
+    "    loss='binary_crossentropy',\n",
+    "    metrics=['accuracy']\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# 5. Train the model\n",
+    "model.fit(X_train, y_train, epochs=20, batch_size=32, validation_split=0.1)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# 6. Evaluate the model\n",
+    "loss, accuracy = model.evaluate(X_test, y_test)\n",
+    "print(f\"Test accuracy: {accuracy:.2f}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Saving Models"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": "# 7. Save the trained model\nmodel.save('my_model.keras')\n\nprint(\"✓ Model saved: my_model.keras\")"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Reproducing Models"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": "# 8. Reload the model later\n# Reload the saved model and data\nfrom tensorflow.keras.models import load_model\n\nmodelReloaded = load_model('my_model.keras')\nX_train_reloaded, y_train_reloaded = joblib.load('train_data.pkl')\nX_test_reloaded, y_test_reloaded = joblib.load('test_data.pkl')\n\nprint(\"✓ Model and data reloaded successfully!\")"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": "# Check that the reloaded model gives the same results\nloss_reloaded, accuracy_reloaded = modelReloaded.evaluate(X_test_reloaded, y_test_reloaded)\nprint(f\"\\n🎯 Reloaded model accuracy: {accuracy_reloaded:.2f}\")\nprint(\"\\n💡 If the accuracy matches the value obtained above,\")\nprint(\"   your workflow is reproducible! ✓\")"
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.11"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
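The notebook's seed cell calls three libraries one after another. A possible consolidation, sketched here as an assumption rather than as the lab's method, is a single hypothetical helper, `set_global_seeds`, that seeds whichever of those libraries is installed. It uses only calls already present in the notebook (`random.seed`, `np.random.seed`, `tf.random.set_seed`).

```python
import random


def set_global_seeds(seed: int) -> list[str]:
    """Seed every available RNG library and return the names that were seeded."""
    random.seed(seed)
    seeded = ["random"]
    try:
        import numpy as np  # optional: skipped when NumPy is not installed
        np.random.seed(seed)
        seeded.append("numpy")
    except ImportError:
        pass
    try:
        import tensorflow as tf  # optional: skipped when TensorFlow is not installed
        tf.random.set_seed(seed)
        seeded.append("tensorflow")
    except ImportError:
        pass
    return seeded


set_global_seeds(42)
first_draw = random.random()
set_global_seeds(42)
second_draw = random.random()

# Re-seeding with the same value restarts the same sequence.
print(first_draw == second_draw)
```

Calling the helper once at the top of the notebook keeps the seed value in one place, so changing it for an experiment cannot leave one library on the old seed.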