Machine Learning Workflow | Machine Learning Concepts

Machine Learning Workflow

Let's look at the workflow you would go through to build a successful machine learning project.

Step 1. Get the data

In this step, you define the problem and decide what data is required. Then you choose a metric and decide what result would count as satisfactory.

Next, you gather the data, usually from several sources (such as databases), into a format suitable for further processing in Python.

Sometimes the data is already available as a .csv file and ready to be preprocessed, in which case the gathering part of this step can be skipped.

Example

A hospital provides you with historical patient records from its database and additional demographic information from a national health database, all compiled into a CSV file. The task is to predict patient readmissions, with accuracy (the percentage of total predictions that are correct) as the metric and a score above 80% as the threshold for a satisfactory result.
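As a rough illustration of this step, the sketch below loads a small CSV with pandas and separates the features from the target. The column names (`age`, `blood_pressure`, `race`, `readmitted`) and all values are hypothetical and are not taken from any real hospital dataset.

```python
import io

import pandas as pd

# Hypothetical stand-in for the hospital's CSV export; the column names
# and values are invented for illustration only.
csv_text = """age,blood_pressure,race,readmitted
63,140,A,1
45,,B,0
71,155,A,1
38,118,C,0
"""

# In practice you would pass a file path instead of a StringIO buffer.
patients = pd.read_csv(io.StringIO(csv_text))

# Separate the features from the target column.
X = patients.drop(columns="readmitted")
y = patients["readmitted"]

print(patients.shape)         # number of rows and columns
print(patients.isna().sum())  # missing values per column
```

Checking the shape and the missing-value counts right after loading gives an early view of what the preprocessing step will have to handle.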

Step 2. Preprocess the data

This step consists of:

  • Data cleaning: dealing with missing values, non-numerical data, etc.;
  • Exploratory data analysis (EDA): analyzing and visualizing the dataset to find patterns and relationships between features and, more generally, to learn how the training set can be improved;
  • Feature engineering: selecting, transforming, or creating new features based on EDA insights to improve the model's performance.

Example

For the hospital data, you might fill in missing values for essential metrics like blood pressure and convert categorical variables like race into numerical codes for analysis.
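The two cleaning operations from this example could look roughly like the sketch below, which uses scikit-learn's `SimpleImputer` and `OrdinalEncoder`. The data frame, the choice of mean imputation, and the `race_code` column name are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical patient records; values are invented for illustration.
patients = pd.DataFrame({
    "blood_pressure": [140.0, np.nan, 155.0, 118.0],
    "race": ["A", "B", "A", "C"],
})

# Replace missing blood pressure readings with the column mean.
imputer = SimpleImputer(strategy="mean")
patients["blood_pressure"] = imputer.fit_transform(
    patients[["blood_pressure"]]
).ravel()

# Map each category to a numerical code (categories are sorted, so
# A -> 0, B -> 1, C -> 2).
encoder = OrdinalEncoder()
patients["race_code"] = encoder.fit_transform(patients[["race"]]).ravel()

print(patients)
```

Integer codes imply an ordering between categories. For a feature with no natural order, one-hot encoding (`OneHotEncoder` in scikit-learn) is a common alternative.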

Step 3. Modeling

This step involves:

  • Choosing the model: at this stage, you choose the model, or a few models, that perform best on your problem. This requires both an understanding of the algorithms and experiments with different models to find the ones that suit your problem;
  • Hyperparameter tuning: finding the hyperparameter values that result in the best performance;
  • Evaluating the model: measuring the model's performance on unseen data.

Example

You select a classification model to predict patient readmissions, since the outcome is binary (readmitted or not). You then tune its hyperparameters to optimize the model's configuration. Finally, you evaluate the model's performance on a separate validation/test set to check that it generalizes beyond the training data.
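The three parts of the modeling step can be sketched as follows. This is a minimal example under stated assumptions: synthetic data from `make_classification` stands in for the preprocessed patient records, and the grid of `n_neighbors` values is arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary data standing in for the preprocessed patient records.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# Hold out a test set that is never used during training or tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Hyperparameter tuning: try several values of n_neighbors with
# 5-fold cross-validation on the training set only.
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [3, 5, 7, 9]},
    scoring="accuracy",
    cv=5,
)
search.fit(X_train, y_train)

# Evaluate the best model found by the search on the unseen test set.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Keeping the test set out of the search means the final accuracy estimates performance on data the model has never seen, which is the number to compare against the target chosen in step 1.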

Step 4. Deployment

Once you have a fine-tuned model that shows good performance, you can deploy it. But that's not where your job ends. Most of the time, you also want to monitor the deployed model's performance, find ways to improve it, and feed it new data as it is collected. This sends you back to step 1.

Example

Once the model predicts readmissions accurately, it's integrated into the hospital's database system to alert staff about high-risk patients upon admission, enhancing patient care.
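How a model is integrated depends entirely on the target system, which the example does not describe. One common building block is saving the fitted model to disk and loading it again in the serving application. The sketch below does this with `joblib`; the file name and the synthetic data are assumptions.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data standing in for the preprocessed patient records.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
model = KNeighborsClassifier(n_neighbors=5).fit(X, y)

# Save the fitted model to disk (hypothetical file name).
model_path = os.path.join(tempfile.mkdtemp(), "readmission_model.joblib")
joblib.dump(model, model_path)

# Later, in the serving application: load the model and score a new patient.
loaded = joblib.load(model_path)
new_patient = X[:1]  # one row with the same features used in training
risk = loaded.predict_proba(new_patient)[0, 1]  # probability of class 1
print(f"Estimated readmission risk: {risk:.2f}")
```

Logging these predictions together with the eventual outcomes is one way to monitor the deployed model and collect new training data.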

Data preprocessing and modeling steps can be completed using the scikit-learn (imported as sklearn) library. That is what the rest of the course is about.

We will learn some basic preprocessing steps and how to build pipelines. After that, we will discuss the modeling stage using the k-nearest neighbors algorithm (implemented as KNeighborsClassifier in sklearn) as an example model. This includes building a model, tuning hyperparameters, and evaluating the model.
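As a small preview, a pipeline that chains one preprocessing step with a k-nearest neighbors model might look like the sketch below. The synthetic data, the choice of `StandardScaler`, and `n_neighbors=5` are assumptions for illustration, not the course's exact setup.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data for illustration.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The pipeline scales the features, then passes them to the classifier.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
])

# fit() learns the scaling parameters and the model from training data only;
# score() applies the same scaling to the test data before predicting.
pipe.fit(X_train, y_train)
accuracy = pipe.score(X_test, y_test)
print(round(accuracy, 3))
```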

1. What is the primary purpose of the "Get the data" step in a machine learning project?
2. Which of the following best describes the importance of the "Data preprocessing" step in a machine learning project workflow?
Section 1. Chapter 5