In this lab you will work with cerealsTrainSetNov23 and
cerealsTestSetNov23 Excel files (sent to you by e-mail)
You will work with QlikView ONLY and document results in this Word
File: cerealResult.docx
Open QlikView
Load cerealsTrainSetNov23 Excel file. Save this QlikView file
as cerealTrain_YourInitials
There are 9 Independent Variables, the requirement will be to examine 3:
Calories
Sugars
Fiber
Build Scatter Chart for each variable vs Rating. Use Name as
Dimension, X - one of the independent variables, Y - Rating
Use Advanced Mode for Expression and add Trendline Average and
Linear
with Show Equation and R^2 for Rating
Document R^2 and Linear Regression Formula for each Model in the Word file cerealResult.docx
Remember, the general Linear Regression Fomula is PredictedRating = A*IndependentVariable+B
Rate the models based on R^2
Add Statistics Box for Rating with the following information: Average, Min and Max.
Document these values in the Word file. Pay attention, Average Rating is your baseline model. You would need this
value in Testing file.
Save QlikView file and close it.
Open NEW QlikView file and load
cerealsTestSetNov23 Excel file. This data we DIDN'T use when we built the models. Now you will TEST your models on this
test dataset.
Save this QlikView file
as cerealTest_YourInitials
In this file you would need to use Linear Regressin Coefficients you documented earlier and you would need to
use the Average Rating value you found earlier.
Remember, the general Linear regression fomula is PredictedRating = A*IndependentVariable+B
Build Straight Table Chart with Name as Dimension
Expressions:
only(Rating), Label RatingActual
A_Sugars*only(Sugars)+B_Sugars (you have to write the actual numbers you found earlier instead of A_Sugars
and
B_Sugars). Label: PredictedBySugars
(RatingActual-PredictedBySugars)*(RatingActual-PredictedBySugars), Label SSESugars, Select Sum of Rows in Total
Mode
(RatingActual-Average)*(RatingActual-Average) (YOU MUST SUBSTITUTE THE ACTUAL VALUE INSTEAD OF
AVERAGE), Label SST, Select Sum of Rows in Total Mode
A_Calories*only(Calories)+B_Calories (you have to write the actual numbers you found earlier instead of
A_Calories and
B_Calories). Label: PredictedByCalories
(PredictedByCalories-RatingActual)*(PredictedByCalories-RatingActual), Label SSECalories, Select Sum of Rows in
Total Mode
A_Fiber*only(Fiber)+B_Fiber(you have to write the actual numbers you found earlier instead of
A_Fiber and
B_Fiber). Label: PredictedByFiber
(PredictedByFiber-RatingActual)*(PredictedByFiber-RatingActual), Label SSEFiber, Select Sum of Rows in Total
Mode
IMPORTANT!!!For ALL expressions on Number Tab select Fixed to 2 Decimals
Find Test_R^2 = 1-SSE/SST for each model(Sugars, Calories, Fiber) on the Test Data
Document Test_R^2 for each model in the Word File
Find the model which performs the best on the test dataset
Explain.
Submit THREE FILES, 2 QlikView files and Word Summary
File
If time permits
Examine additional variables to find
significant ones:
Protein
Fat
Sodium
Carbo
Sugars
Potass
Vitamins
Open training file, cerealTrain_YourInitials
Add new sheet and build Combo Chart with Name as Dimension and the following expressions:
Only(Rating), label it ActualRating (make it Bar Chart)
Formula for Sugars: $(=LINEST_M(Rating, Sugars))*Only(Sugars)+$(=LINEST_B(Rating,Sugars)), label
it PredictedbySugars. (Making it Line Chart)
Formula for additional 2 most significant variables (replace Sugars with variable name in the
formula above)(Make these Line Charts)
Compare Actual and Predicted Rating values for each Model. Analyze the results. Is it possible
to identfy the best model by looking on the chart.
Find Test_R^2 for additional models. Compare the performance.
Create Combo Chart to compare Actual Rating with Predicted Rating for several models. Analyze the chart