Summary  
This chapter explains how scikit-learn estimators implement the .fit, .predict, and .score interface for training models, generating predictions, and evaluating their performance.

General domain of usage  
Predictive modeling (e.g., classification)

The fundamentals of data preprocessing and pipeline construction are now covered. The next step is **modeling**.


A **model** in scikit-learn is an **estimator** that provides `.predict()` and `.score()` methods in addition to the `.fit()` method that every estimator has.


## .fit() 

Once the data is preprocessed and ready for the model, the first step of building a model is **training** it. This is done with the `.fit(X, y)` method.

For **supervised learning** (regression, classification), `.fit()` requires both `X` and `y`.
For **unsupervised learning** (e.g., clustering), you call `.fit(X)` only. Passing `y` does not cause an error; it is simply ignored.
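
As a minimal sketch of the two cases above, the following fits one supervised and one unsupervised estimator on the same synthetic data. The choice of `LogisticRegression`, `KMeans`, and the `make_blobs` dataset is an assumption made for illustration only; any classifier or clusterer would follow the same pattern.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Synthetic data (illustrative): 150 samples, 2 features, 3 groups
X, y = make_blobs(n_samples=150, centers=3, random_state=0)

# Supervised: the classifier needs both the features and the targets
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)

# Unsupervised: the clusterer learns from X alone
km = KMeans(n_clusters=3, n_init=10, random_state=0)
km.fit(X)

# Passing y to a clusterer is allowed; it is accepted but not used.
# fit() returns the estimator itself, so `same` is the `km` object.
same = km.fit(X, y)
```

Because `.fit()` returns the fitted estimator, a call such as `model.fit(X, y).predict(X_new)` can be chained when that is more convenient.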

**Note:** During training, the model **learns** patterns needed for prediction. What it learns and how long training takes depend on the algorithm. Training is often the **slowest part** of ML, especially with large datasets.

## .predict()

After training, use `.predict()` to generate predictions:

```python
model.fit(X, y)
y_pred = model.predict(X_new)
```
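
For a concrete, runnable version of the snippet above, here is a small sketch. The Iris dataset, the `KNeighborsClassifier` model, and the two measurement rows in `X_new` are assumptions chosen for illustration; they are not part of this chapter's own dataset.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Load features (4 measurements per flower) and class labels (0, 1, 2)
X, y = load_iris(return_X_y=True)

model = KNeighborsClassifier(n_neighbors=5)
model.fit(X, y)

# Two hypothetical new flowers, with the same 4 features as the training data
X_new = [[5.1, 3.5, 1.4, 0.2], [6.7, 3.0, 5.2, 2.3]]

# predict() returns one predicted label per row of X_new
y_pred = model.predict(X_new)
print(y_pred)
```

`X_new` must have the same number of columns, in the same order, as the data passed to `.fit()`.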

## .score()

`.score()` evaluates a trained model, typically on a **test set**:

```python
model.fit(X_train, y_train)  # fit on the training split only
model.score(X_test, y_test)
```

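It compares the model's predictions with the true targets and returns a single number. For classifiers, the default metric is **accuracy** (the fraction of correct predictions); for regressors, it is the R² score.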

The **test set** is the subset of the dataset held out to evaluate a model's performance after training. `X_test` contains its **features** (input data), and `y_test` contains the corresponding **true labels**. Together, they assess how well the model predicts new, unseen data.
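
The sketch below puts the pieces together: it splits a dataset into training and test sets, fits on the training part, and scores on the test part. The Iris dataset, the `KNeighborsClassifier` model, and the 25% test size are assumptions for illustration. It also checks that, for a classifier, `.score()` gives the same value as computing accuracy from `.predict()` by hand.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hold out 25% of the rows as a test set the model never sees in training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# score() predicts on X_test internally and compares with y_test
test_accuracy = model.score(X_test, y_test)

# The same number, computed explicitly from the predictions
manual_accuracy = accuracy_score(y_test, model.predict(X_test))

print(test_accuracy, manual_accuracy)
```

Scoring on held-out data rather than on the training data gives a more honest estimate of how the model will behave on inputs it has not seen.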
