ML Introduction with scikit-learn
Training Set
In both supervised and unsupervised learning, the training set usually comes in the form of a table.
Consider the Diabetes Dataset, where the task is to predict whether a person has diabetes.
It holds information about 768 women, described by parameters such as age, body mass index, and blood pressure. These parameters are called features.
The dataset also records whether each person has diabetes in the 'Outcome' column. This is the value we want to predict, and it is called the target.
Each row of the table is called an instance (or a data point, or a sample). In this case, an instance holds the information about one woman.
The table (the training set) includes a target column, which means it is labeled.
The task is to train an ML model on this training set. Once trained, the model can predict whether other people (new instances) have diabetes based on their features alone.
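To make the terms concrete, here is a minimal sketch of such a table built with pandas. The column names (other than 'Outcome') and all values are made up for illustration; they are not taken from the real Diabetes Dataset.

```python
import pandas as pd

# Hypothetical miniature training set: the values below are invented
# for illustration and are NOT rows of the real Diabetes Dataset.
df = pd.DataFrame({
    "Age": [25, 47, 33, 52],            # feature
    "BMI": [22.1, 31.4, 27.8, 35.0],    # feature
    "BloodPressure": [70, 82, 76, 90],  # feature
    "Outcome": [0, 1, 0, 1],            # target: 1 = has diabetes, 0 = does not
})

# Each row is one instance; each column is a feature or the target.
n_instances, n_columns = df.shape
feature_names = [col for col in df.columns if col != "Outcome"]

# The table is labeled because it contains the target column.
is_labeled = "Outcome" in df.columns

print("Instances:", n_instances)
print("Features:", feature_names)
print("Labeled:", is_labeled)
```

In this sketch the table has four instances, three features, and one target column.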
Note
The training set should be as representative of the new instances as possible. For example, this diabetes dataset only contains information about females at least 21 years old, so a model trained on it may predict less accurately for men than for women.
While coding, the feature columns are usually assigned to X and the target column to y.
The features of new instances are assigned to X_new.
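The naming convention can be sketched as follows. This is only an illustration under stated assumptions: the tiny table and its values are made up, and KNeighborsClassifier is an arbitrary choice of model, since the text does not prescribe one.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical miniature training set (invented values, not the real dataset).
df = pd.DataFrame({
    "Age": [25, 47, 33, 52],
    "BMI": [22.1, 31.4, 27.8, 35.0],
    "BloodPressure": [70, 82, 76, 90],
    "Outcome": [0, 1, 0, 1],
})

# Feature columns go to X, the target column goes to y.
X = df.drop(columns="Outcome")
y = df["Outcome"]

# Train a model on the labeled training set.
# The classifier here is an arbitrary pick for the sake of the example.
model = KNeighborsClassifier(n_neighbors=1)
model.fit(X, y)

# A new instance has the same feature columns but no target value.
X_new = pd.DataFrame({"Age": [30], "BMI": [24.0], "BloodPressure": [72]})

# Predict the target for the new instance from its features only.
y_pred = model.predict(X_new)
print(y_pred)
```

Note that X_new must contain the same feature columns as X, in the same order, and no target column.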
In the next chapter, we will discuss the types of data a training set can contain and what problems we can face with our data.