Explore the Linear Regression Using Python
Building the Regression Model Based on Several Variables
Often we need to make predictions based not on one but on several characteristics at once. For example, we might predict the length of a cat's tail from its weight and height. In that case, the equation looks like this:
tail_length = a + b * cat_weight + c * cat_height
Here we have to find three unknown parameters: the intercept a and the coefficients b and c. The more features we use for the forecast, the more unknown parameters there are. With two features, we fit a plane instead of a line.
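To make the idea of fitting a plane concrete, here is a minimal sketch that uses NumPy's least-squares solver rather than the scikit-learn model used in this lesson. The cat measurements and the values a = 10, b = 2, c = 0.5 are made up purely for illustration; because the tail lengths are generated from that exact plane, the solver should recover those three parameters.

```python
import numpy as np

# Hypothetical cat measurements (made-up numbers, for illustration only).
cat_weight = np.array([3.0, 4.0, 5.0, 4.5, 6.0])
cat_height = np.array([23.0, 25.0, 24.0, 28.0, 30.0])

# Generate tail lengths from a known plane: a = 10, b = 2, c = 0.5.
tail_length = 10.0 + 2.0 * cat_weight + 0.5 * cat_height

# Design matrix: a column of ones for the intercept, then one column per feature.
A = np.column_stack([np.ones_like(cat_weight), cat_weight, cat_height])

# Ordinary least squares: find (a, b, c) that minimize the squared error.
params, *_ = np.linalg.lstsq(A, tail_length, rcond=None)
a, b, c = params
print(a, b, c)
```

Each extra feature adds one more column to the design matrix and therefore one more parameter to estimate.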
So multiple regression is a way to predict a value based on two or more variables. Let's predict the amount of flavanoids from the amounts of total phenols and nonflavanoid phenols. With these two features, we will use the methods from the previous section to fit the model and find the missing parameters:
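# Imports needed by this snippet
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Preparing data (data is the DataFrame loaded earlier in the course)
X2 = data[['total_phenols', 'nonflavanoid_phenols']]
Y = data['flavanoids']

# Train and test model
X2_train, X2_test, Y_train, Y_test = train_test_split(X2, Y, test_size=0.3, random_state=1)
model2 = LinearRegression()
model2.fit(X2_train, Y_train)
print(model2.intercept_)
print(model2.coef_)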
We use the same random_state value as before so that the data is split in the same way.
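The effect of a fixed random_state can be checked with a small sketch. The arrays X_demo and y_demo below are made-up toy data, not the course dataset; the point is only that two separate calls with the same random_state select the same rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten toy rows with two features each (made-up data, for illustration only).
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

# Two independent calls with the same random_state.
X_tr1, X_te1, y_tr1, y_te1 = train_test_split(X_demo, y_demo, test_size=0.3, random_state=1)
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X_demo, y_demo, test_size=0.3, random_state=1)

# Both calls put the same rows into the test set.
same_split = np.array_equal(X_te1, X_te2) and np.array_equal(y_te1, y_te2)
print(same_split)  # True
```

Without a fixed random_state, the rows are shuffled differently on each run, so results from different models would not be directly comparable.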
Predictions are obtained exactly as in the previous chapter, where we used a single feature:
y_test_predicted2 = model2.predict(X2_test)
This code returns the predicted values for the test set. The trained model corresponds to the following equation:
flavanoids = 2.1 - 0.74 * total_phenols + 1.46 * nonflavanoid_phenols
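The link between predict() and an equation of this form can be illustrated with a short sketch. The data and the name demo_model are made up for the example and are unrelated to the course dataset; the sketch only shows that the library's predictions match the intercept plus each coefficient times its feature.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up feature matrix with two columns and a made-up target.
X_demo = np.array([[1.0, 0.2], [2.0, 0.1], [3.0, 0.4], [4.0, 0.3], [5.0, 0.5]])
y_demo = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

demo_model = LinearRegression()
demo_model.fit(X_demo, y_demo)

# Predictions from the library call.
predicted = demo_model.predict(X_demo)

# The same predictions written out as intercept + b * x1 + c * x2.
b, c = demo_model.coef_
manual = demo_model.intercept_ + b * X_demo[:, 0] + c * X_demo[:, 1]

print(np.allclose(predicted, manual))  # True
```

The order of the values in coef_ follows the order of the feature columns passed to fit().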
Task
Let’s predict the amount of total phenols from the amounts of flavanoids and nonflavanoid phenols.
- [Line #17] Set total_phenols as our target.
- [Line #24] Split the data 70-30 (70% of the data is for training and 30% is for testing) and use 1 as the random_state value.
- [Line #25] Initialize the linear regression model.
- [Line #26] Fit the model using your training data.
- [Lines #29-30] Print the model’s parameters (model2.intercept_ and model2.coef_) using the print() function twice.