ML Introduction with scikit-learn
Training Set
In both supervised and unsupervised learning, the training set is usually a table.
Consider the diabetes dataset, where the task is to predict whether a person has diabetes. It holds information about 768 females, with parameters such as age, body mass index, and blood pressure. These parameters are called features.
The dataset also records whether each person has diabetes in an 'Outcome' column, which is what we want to predict. This column is called the target.
Each row in the table is called an instance (or data point, or sample). In this case, it is information about one female.
The table (training set) has a target column in it, which means it is labeled.
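The vocabulary above can be illustrated with a tiny table. This is only a sketch: the rows are made-up values, not taken from the real dataset, and the feature column names Age, BMI, and BloodPressure are assumptions chosen for illustration. Only the 'Outcome' column name comes from the text.

```python
import pandas as pd

# Hypothetical rows for illustration only; not values from the real dataset.
# Feature column names are assumed; 'Outcome' is the target column named in the text.
df = pd.DataFrame({
    "Age": [50, 31, 32, 21],
    "BMI": [33.6, 26.6, 23.3, 28.1],
    "BloodPressure": [72, 66, 64, 66],
    "Outcome": [1, 0, 1, 0],
})

# Each row is one instance; each column except 'Outcome' is a feature.
n_instances, n_columns = df.shape
feature_names = [col for col in df.columns if col != "Outcome"]

# The table is labeled because it contains the target column.
is_labeled = "Outcome" in df.columns

print(n_instances)
print(feature_names)
print(is_labeled)
```

Here the table has four instances and three features, and it is labeled because every row carries a value in the target column.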
The task is to train the ML model on this training set, and once it is trained, it can predict for other people (new instances) whether they have diabetes based on features only.
While coding, the feature columns are usually assigned to X and the target column to y. The features of new instances are assigned to X_new.
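A minimal sketch of this naming convention with scikit-learn follows. The data is made up for illustration, the feature column names are assumptions, and KNeighborsClassifier is just one possible choice of model; the lesson itself does not prescribe one.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training set; values and feature column names are illustrative assumptions.
df = pd.DataFrame({
    "Age": [50, 31, 32, 21],
    "BMI": [33.6, 26.6, 23.3, 28.1],
    "BloodPressure": [72, 66, 64, 66],
    "Outcome": [1, 0, 1, 0],
})

# Features go to X, the target column goes to y.
X = df.drop(columns="Outcome")
y = df["Outcome"]

# A new instance has the same feature columns but no 'Outcome' value.
X_new = pd.DataFrame({"Age": [45], "BMI": [31.0], "BloodPressure": [70]})

# Train on the labeled table, then predict the target for the new instance.
model = KNeighborsClassifier(n_neighbors=1)  # model choice is an assumption
model.fit(X, y)
y_pred = model.predict(X_new)
print(y_pred)
```

Note that X_new must contain the same feature columns, in the same order, as X. The prediction comes back as an array with one entry per new instance.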