{% steps %}
{% step title="Introduction to Reproducible Workflows" %}
### Introduction
Welcome to the "Introduction to Reproducible Workflows" lab!

This lab is designed to give you a foundational understanding of how to create reproducible workflows for training an AI model, why reproducibility matters, and where in model training to define fixed seeds.
### Learning Objectives

- Identify key areas within AI model workflows where fixed seeds should be defined
- Review how to save datasets after train/test splits
- Practice reloading saved models and datasets to reproduce results
### Prerequisites

- A high-level understanding of neural networks
- Experience training models

{% /step %}
{% step title="Creating Reproducible Datasets" %}
### Creating Reproducible Datasets

To create a repeatable workflow, you will need to start by defining the random seeds. The random seed determines how data is generated, how it is split, and how the resulting train/test datasets are fed into the model. Fixing your starting seed helps ensure that all aspects of the workflow are repeatable. The code below sets the random seed for several different libraries used in model development.

```python
import random
import numpy as np
import tensorflow as tf

random.seed(42)
np.random.seed(42)
tf.random.set_seed(42)
```
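
Note that fixed seeds alone may not be enough on some hardware, since certain operations (particularly on GPUs) use non-deterministic kernels. If you need stricter guarantees, recent versions of TensorFlow offer an opt-in deterministic mode; it is optional for this lab and can slow training down.

```python
# Optional: force TensorFlow to use deterministic kernels where available.
# Ops without a deterministic implementation will raise an error instead.
tf.config.experimental.enable_op_determinism()
```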

From here, you can define the synthetic data generation, similar to how it is defined in other labs.

```python
from sklearn.datasets import make_classification

X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=15,
    n_redundant=5,
    n_classes=2,
    random_state=42
)
```
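
As a quick, optional sanity check, regenerating the data with the same `random_state` should produce identical arrays (the `X_again`/`y_again` names here are just for illustration):

```python
# Regenerating with the same random_state yields identical data.
X_again, y_again = make_classification(
    n_samples=1000, n_features=20, n_informative=15,
    n_redundant=5, n_classes=2, random_state=42
)
assert (X == X_again).all() and (y == y_again).all()
```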
{% /step %}
{% step title="Training Reproducible Model" %}
### Training Reproducible Models

For the next step of model creation, you will need to split your data into training and test datasets. The code below splits the initial dataset into training and test sets. The key parameter to focus on here is `random_state`, which defines the random seed for the split. Fixing this seed ensures that future runs of training use the same training dataset, and therefore produce the same model.

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training data only, then apply it to both splits
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```

You can also save the `train` and `test` data splits to preserve reproducibility in cases where the dataset you are using may be a subset of a larger dataset.

```python
import joblib

joblib.dump((X_train, y_train), 'train_data.pkl')
joblib.dump((X_test, y_test), 'test_data.pkl')
```
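
If you plan to preprocess new data in a fresh environment, it can also help to persist the fitted scaler alongside the splits (the filename here is just an example):

```python
# Saving the fitted scaler lets a new environment apply the exact
# same normalization the model was trained with.
joblib.dump(scaler, 'scaler.pkl')
```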

With the datasets saved and reproducible, you can move on to defining and training your model. In this lab you will define and train a model, evaluate it, and compare it to a model trained on the same training dataset to confirm that it is properly reproducible. All of the code for training and evaluating the model is provided within the lab; a minimal sketch of what it might look like follows.
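
The architecture and hyperparameters below are illustrative assumptions, not the lab's actual model. With the seeds fixed earlier, weight initialization and batch shuffling are deterministic, so repeated runs produce the same trained model.

```python
from tensorflow import keras

# Illustrative architecture only; the lab provides the actual model code.
model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(16, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])

# Deterministic given the fixed seeds: same initial weights, same shuffling.
model.fit(X_train, y_train, epochs=10, batch_size=32, verbose=0)

loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```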
{% /step %}
{% step title="Saving Model and Dataset" %}
### Saving Model and Dataset

Now that your model has been trained and evaluated, you can save it with the following code:

```python
model.save('my_model.keras')
```

From there, you can move on to reproducing the model, which is meant to mimic loading the model into an entirely new environment. You can load the previously saved model and datasets with the code below, including the required imports.

```python
from tensorflow.keras.models import load_model
import joblib

# Reload the saved model and the exact train/test splits
model_reloaded = load_model('my_model.keras')
X_train_reloaded, y_train_reloaded = joblib.load('train_data.pkl')
X_test_reloaded, y_test_reloaded = joblib.load('test_data.pkl')
```

As you re-evaluate the model on the exact same training and test sets using the previously defined evaluation code, you should see results that exactly match those of the previously evaluated model.

```python
loss, accuracy = model_reloaded.evaluate(X_test_reloaded, y_test_reloaded)
print(f"Test accuracy: {accuracy:.2f}")
```
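
If the original model object is still in memory, you can make the comparison explicit. This is a minimal sketch, assuming `model`, `X_test`, and `y_test` from the earlier steps are still defined:

```python
# The reloaded model and data should reproduce the original metrics exactly.
orig_loss, orig_accuracy = model.evaluate(X_test, y_test)
assert abs(accuracy - orig_accuracy) < 1e-9
```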
{% /step %}
{% /steps %}