Logging Experiments with MLflow
```python
import logging
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")

# Load sample dataset
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Set experiment name (creates if doesn't exist)
experiment_name = "DiabetesRegression"
mlflow.set_experiment(experiment_name)
logging.info(f"Using experiment: {experiment_name}")

# Start an MLflow run
with mlflow.start_run() as run:
    # Define and train model
    alpha = 0.5
    logging.info(f"Training Ridge(alpha={alpha})")
    model = Ridge(alpha=alpha)
    model.fit(X_train, y_train)
    logging.info("Training complete")

    # Predict and calculate metric
    predictions = model.predict(X_test)
    mse = mean_squared_error(y_test, predictions)
    logging.info(f"Test MSE: {mse:.6f}")

    # Log parameter, metric, and model
    mlflow.log_param("alpha", alpha)
    mlflow.log_metric("mse", mse)
    mlflow.sklearn.log_model(model, "model")
    logging.info("Logged params, metrics, and model to MLflow")

    # Optionally, log data version or hash for reproducibility
    mlflow.log_param("data_version", "sklearn_diabetes_v1")

    # Report run information through the logger (stderr by default)
    run_id = run.info.run_id
    experiment_id = run.info.experiment_id
    artifact_uri = mlflow.get_artifact_uri()
    logging.info(f"Run ID: {run_id}")
    logging.info(f"Experiment ID: {experiment_id}")
    logging.info(f"Artifact URI: {artifact_uri}")
```
To see how experiment logging works in practice, walk through the code step by step. It first loads a sample dataset with load_diabetes from scikit-learn and splits it into training and test sets with train_test_split, using random_state=42 so the split is reproducible. It then calls mlflow.set_experiment, which selects the experiment with the given name or creates it if it does not exist yet.
The main part of the workflow begins with mlflow.start_run(), which starts a new run so that every parameter, metric, and artifact logged inside the with block is attached to that run. Because it is used as a context manager, the run is ended automatically when the block exits. Inside the run, a Ridge regression model is created with alpha=0.5 and fitted on the training data. The model then predicts on the test set, and the mean squared error (MSE) is computed as the performance metric.
MLflow's logging functions are then used to capture key aspects of the experiment. The alpha parameter is logged with mlflow.log_param, and the computed mse is logged as a metric using mlflow.log_metric. The trained model itself is saved as an artifact with mlflow.sklearn.log_model, making it easy to retrieve or deploy later. For reproducibility, the code also logs a data_version parameter, which records the origin or version of the dataset used for training.
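Warning: always log relevant metadata such as data version, random seed, and environment information. Without these, reproducing results or debugging issues becomes much harder. The example above logs only data_version. The random_state=42 used for the split is not logged, so consider recording it as well.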
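One way to gather this metadata is a small helper that builds a dictionary of string values. This is a sketch using only the standard library. The function name collect_run_metadata and the chosen keys are hypothetical, not part of MLflow or of the lesson's code.

```python
import platform
import sys


def collect_run_metadata(data_version: str, random_seed: int) -> dict[str, str]:
    """Gather reproducibility metadata as string key/value pairs.

    Hypothetical helper: the key names are an illustrative choice.
    """
    return {
        "data_version": data_version,
        "random_seed": str(random_seed),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "executable": sys.executable,
    }


metadata = collect_run_metadata("sklearn_diabetes_v1", 42)
for key, value in metadata.items():
    print(f"{key}={value}")
```

Inside an active run, the resulting dictionary could be passed to mlflow.set_tags(metadata) or mlflow.log_params(metadata) so that it is stored with the run's other records.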