Your submitted document should include the following items. Points will be deducted if the following are not included:
Type your Name, STAT 350 and Data Analysis Assignment #3 centered on the top of page 1 of your document.
Number your pages across your entire solutions document.
Your document should include the ANSWERS ONLY to the following FOUR questions with each answer labeled by its corresponding number and subpart. Keep the questions in order. Do NOT include the questions in your submitted document.
Generate all requested graphs and tables using JMP.
Upload your document to Blackboard as a Word or PDF document using the link provided by your instructor in Blackboard.
Elements of good technical writing:
Use complete and coherent sentences to answer the questions.
Graphs must be appropriately titled and should refer to the context of the question.
Graphical displays must include labels with units if appropriate for each axis.
Units should always be included when referring to numerical values.
When making a comparison you must use comparative language, such as “greater than”, “less than”, or “about the same as.”
Ensure that all graphs and tables appear on one page and are not split across two pages.
Show all mathematical calculations when directed to compute an answer ‘by-hand.’
When writing mathematical expressions into your document you may use either an equation editor or common shortcuts such as: √x can be written as sqrt(x), p̂ can be written as p-hat, x̄ can be written as x-bar.
All questions will require you to load data sets posted on the Blackboard course DA#3 site.
Load the RailsTrails dataset located in the Blackboard site. This dataset contains data from a sample of 104 homes that were sold in 2007 in the city of Northampton, Massachusetts. The goal was to see which variables might relate to selling price. One variable under consideration was the size of the house as it is commonly the case that larger homes sell for more money.
1a) Select Graph → Graph Builder in JMP to construct a scatterplot to determine if the size of a house (using the variable ‘squarefeet’, which measures the interior size of the home in thousands of square feet) is related to its price (using ‘price2007’, which is the 2007 selling price in thousands of dollars). Highlight and drag price2007 to the Y-axis, and squarefeet to the X-axis. Use the icons at the top to remove any curve overlays, leaving a basic scatterplot.
Copy and paste your completely labeled and titled scatterplot into your document.
1b) Use your scatterplot from 1a to describe the association (if any) between price and size of a home.
1c) Use JMP and Analyze → Fit Y by X and Fit Line to find the simple linear regression model for this relationship. Copy and paste the output for the “Summary of Fit” and “Parameter Estimates” tables into your document.
1d) State the linear regression equation and interpret in context the meaning of the slope value.
1e) State and explain the meaning of the R² value in the context of the question.
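As a cross-check on the quantities JMP reports in 1c through 1e, the least-squares intercept, slope, and R² can be sketched in Python. This is only an illustration and does not replace the required JMP output: the function name `fit_line` and the five data points are made up for the example and are not the RailsTrails data.

```python
def fit_line(x, y):
    """Return (intercept, slope, r_squared) for the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R^2 = 1 - SSE/SST: the share of variation in y explained by the line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1 - sse / sst


# Made-up values for illustration only (NOT the RailsTrails data).
size = [1.0, 1.5, 2.0, 2.5, 3.0]               # thousands of sq ft
price = [150.0, 210.0, 240.0, 330.0, 370.0]    # thousands of dollars

b0, b1, r2 = fit_line(size, price)
print(f"price-hat = {b0:.1f} + {b1:.1f} * squarefeet, R^2 = {r2:.3f}")
```

With both variables measured in thousands, the slope reads as the change in predicted price (in $000) for each additional thousand square feet.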
1f) Obtain a 95% confidence interval for the slope of the regression line by following these steps.
Right-click anywhere in the Parameter Estimates portion of the output.
Click on Columns and then select both “Upper 95%” and “Lower 95%”.
Copy and paste only the revised Parameter Estimate table into your document.
1g) Interpret the confidence interval that you obtained in part 1f in the context of the question.
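The “Lower 95%” and “Upper 95%” columns follow the usual form b1 ± t* × SE(b1), with n − 2 degrees of freedom. A minimal Python sketch of that calculation is shown here, assuming the critical value t* is looked up in a t-table and passed in. The function name `slope_ci` and the data are hypothetical and are not the RailsTrails data.

```python
import math


def slope_ci(x, y, t_star):
    """Return (lower, upper) for the slope: b1 +/- t_star * SE(b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))     # regression standard error
    se_b1 = s / math.sqrt(sxx)       # standard error of the slope
    return b1 - t_star * se_b1, b1 + t_star * se_b1


# Made-up values for illustration only (NOT the RailsTrails data).
size = [1.0, 1.5, 2.0, 2.5, 3.0]               # thousands of sq ft
price = [150.0, 210.0, 240.0, 330.0, 370.0]    # thousands of dollars

# t* = 3.182 is the table value for 95% confidence with n - 2 = 3 df.
lower, upper = slope_ci(size, price, t_star=3.182)
print(f"95% CI for the slope: ({lower:.1f}, {upper:.1f}) $000 per 1000 sq ft")
```

For the 104 homes in the RailsTrails sample there are 102 degrees of freedom, so the corresponding table value of t* is roughly 1.98.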
1h) Now use JMP to construct residuals plots by clicking on the red triangle by the Linear Fit under the scatterplot and selecting Plot Residuals. Copy and paste only the “Residual by Predicted Plot” and “Residual Normal Quantile Plot” into your document.
1i) Use these residual plots to determine if this linear model is appropriate.
1j) Report and interpret the regression standard error, s.
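The regression standard error is s = sqrt(SSE / (n − 2)) for a simple linear regression, and JMP lists it as “Root Mean Square Error” in the Summary of Fit table. The sketch below shows the calculation in Python. The function name `regression_std_error` and the observed and fitted values are made up for illustration and are not the RailsTrails data.

```python
import math


def regression_std_error(observed, predicted, n_coefficients=2):
    """s = sqrt(SSE / (n - number of estimated coefficients))."""
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    return math.sqrt(sse / (len(observed) - n_coefficients))


# Made-up observed and fitted prices in $000 (NOT the RailsTrails data).
observed = [150.0, 210.0, 240.0, 330.0, 370.0]
predicted = [148.0, 204.0, 260.0, 316.0, 372.0]

s = regression_std_error(observed, predicted)
print(f"s = {s:.2f} thousand dollars")
```

Because s is in the units of the response, it can be read as the typical distance between an observed selling price and the price predicted by the line.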
Continue using the RailsTrails data set. In this exercise use ‘price2007’ as the response variable and ‘distance’ (distance in miles from a bike trail) and ‘squarefeet’ (size of home) as two explanatory variables.
2a) Use JMP and Analyze → Fit Model to fit a regression model with both explanatory variables, as follows:
Analyze → Fit Model
Enter price2007 as the Y variable
Add squarefeet and distance in the “Construct Model Effects” box
Make sure Minimal Report appears in the Emphasis box.
Copy and paste the summary of fit and parameter estimates into your document.
2b) Has the addition of the extra variable ‘distance’ changed the value of adjusted R² from its value in Question 1? What does this imply?
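Adjusted R² penalizes R² for the number of explanatory variables: adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where p is the number of explanatory variables. A short Python sketch of that formula is given here. The function name `adjusted_r2` is hypothetical, and the R² value of 0.60 is a placeholder rather than a result from the RailsTrails data.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p explanatory variables fit to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# n = 104 homes comes from the assignment; R^2 = 0.60 is a placeholder value.
one_predictor = adjusted_r2(0.60, n=104, p=1)
two_predictors = adjusted_r2(0.60, n=104, p=2)
print(f"p = 1: {one_predictor:.4f}, p = 2: {two_predictors:.4f}")
```

If R² did not change at all, adding a predictor would lower adjusted R². So an increase in adjusted R² suggests the new variable adds explanatory value beyond the extra complexity.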
2c) Use the two-predictor model to calculate ‘by-hand’ a prediction for the 2007 price of a particular home that went on the market and that had 1,500 square feet of space and was 0.5 miles from a bike trail.
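A by-hand prediction substitutes the home's values into the fitted equation, taking care that each value is in the same units as the variable in the data set (‘squarefeet’ is recorded in thousands of square feet). The Python sketch below illustrates the arithmetic. The function name `predict_price` and all three coefficients are placeholders chosen for easy arithmetic, not estimates from the RailsTrails data.

```python
def predict_price(b0, b_squarefeet, b_distance, square_feet, distance_miles):
    """Predicted 2007 price in $000 from the two-predictor model."""
    squarefeet_thousands = square_feet / 1000.0   # match the units of 'squarefeet'
    return b0 + b_squarefeet * squarefeet_thousands + b_distance * distance_miles


# Placeholder coefficients (NOT from the JMP output); substitute your own estimates.
predicted = predict_price(b0=50.0, b_squarefeet=100.0, b_distance=-10.0,
                          square_feet=1500, distance_miles=0.5)
print(f"Predicted price: {predicted:.1f} thousand dollars")
```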
James Bond Films
Ian Fleming’s 007 novels achieved moderate success, but what catapulted James Bond into the pantheon of cultural icons was the introduction of the character portrayed by Sean Connery in 1962’s “Dr. No”. There have now been 24 official James Bond films, and six actors have donned the tuxedo as 007. Note: the 25th film has just been released.
The JMP data file Bond Films contains the name of each of the 24 films, the year the film was released, which actor portrayed James Bond, and the following continuous variables:
Worldwide Gross Earnings (adjusted for inflation) in $1,000’s
The film’s Budget (adjusted for inflation) in $1,000’s
Average Online Ratings (on a 1-10 scale)
Number of Kills by Bond (Bond Kills)
Number of Bond’s Romantic Liaisons
Number of Gadgets Used.
We will explore this data using graphics, one-way analysis of variance, correlation, and multiple regression.
3a) Select Graph → Graph Builder. Highlight and drag Adj WW Gross to the Y-axis, Release Year to the X-axis, and Actor to the Color box on the right. Use the icons at the top to remove any curve overlays and leave you with a color-coded scatter plot. Copy and paste this graph into your document, suitably titled.
3b) What features do you see in this plot?
3c) We will now use one-way ANOVA to test whether actors significantly affect the mean rating scores of their Bond films. Select Analyze → Fit Y by X, place Average Rating in the Y box, the categorical Actor variable in the X Factor box and hit OK. Copy and paste the resulting graphical display into your document.
3d) State the null and alternative hypotheses for this research in the context of the question.
Note: due to the small sample sizes of films by actor we will not conduct assumption checks at this point.
3e) Using the red triangle, select Means/Anova to conduct an analysis of variance on the data. Copy and paste the Analysis of Variance table into your document.
3f) Use your output from part 3e to make a decision concerning your hypotheses from part 3d. Justify your decision at the α = 0.05 level of significance.
3g) Now, using the red triangle, select Compare Means → Each Pair, Student’s t to determine which actors, if any, differ regarding mean rating scores of their films. Include the relevant output in your document.
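The F ratio in the Analysis of Variance table compares variation between the group means with variation within the groups: F = MS(between) / MS(within). A minimal Python sketch of that calculation follows. The function name `one_way_anova`, the actor labels, and the ratings are all made up for illustration and are not the Bond Films data.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a dict of {label: list of values}."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = 0.0
    ss_within = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        ss_between += len(values) * (mean - grand_mean) ** 2
        ss_within += sum((v - mean) ** 2 for v in values)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up ratings for three hypothetical actors (NOT the Bond Films data).
ratings = {
    "Actor A": [6.0, 7.0, 8.0],
    "Actor B": [5.0, 6.0, 7.0],
    "Actor C": [8.0, 9.0, 10.0],
}
f_stat, df1, df2 = one_way_anova(ratings)
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

The sketch stops at the F statistic. The p-value that JMP reports as “Prob > F” comes from the F distribution with these two degrees of freedom and is what gets compared with α = 0.05.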
James Bond Films
4a) We will now conduct a multiple regression analysis using Adj WW Gross as our dependent (response) variable, Y, and the following 6 variables as our explanatory variables:
Adjusted Budget, Average Rating, Bond Kills, Romantic Liaisons, Gadgets Used and Release Year. Since the release year is strongly associated with the actor portraying Bond, this continuous variable will act as a proxy for the actor.
Initially we will investigate correlations between the six explanatory variables by following the steps below:
Analyze → Multivariate Methods → Multivariate
Enter all six variables into the Y box
Copy and paste the Correlations table into your document.
Which two explanatory variables have the highest correlation, and which two have the lowest correlation? Give plausible explanations for these particular correlations.
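Each entry in the Correlations table is a Pearson correlation between one pair of variables, so six explanatory variables give 15 distinct pairs to compare. The Python sketch below computes and ranks pairwise correlations. The function name `pearson_r`, the column names, and the values are hypothetical and are not the Bond Films data.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up columns for illustration only (NOT the Bond Films data).
columns = {
    "var_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "var_b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "var_c": [5.0, 3.0, 4.0, 1.0, 2.0],
}

# One correlation per unordered pair of columns.
pairs = {
    (a, b): pearson_r(columns[a], columns[b])
    for a, b in combinations(columns, 2)
}
for (a, b), r in sorted(pairs.items(), key=lambda item: item[1], reverse=True):
    print(f"{a} vs {b}: r = {r:.3f}")
```

When ranking pairs, note that a correlation near −1 is as strong as one near +1; only values near 0 indicate a weak linear relationship.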
4b) Now fit a multiple regression model using all six explanatory variables by following the steps below:
Analyze → Fit Model
Enter Adj WW Gross as the Y variable
Add Adjusted Budget, Average Rating, Bond Kills, Romantic Liaisons, Gadgets Used and Release Year in the “Construct Model Effects” box
Make sure Minimal Report appears in the Emphasis box.
Copy and paste the following three tables into your document: “Summary of Fit”, “Analysis of Variance” and “Parameter Estimates” and use these tables to answer the following questions.
4c) State the value of the F-statistic and its p-value from the “Analysis of Variance” table and use these to comment on the adequacy of the overall model.
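The overall F-statistic is the ratio of the model mean square to the error mean square, with p and n − p − 1 degrees of freedom for p explanatory variables. The Python sketch below shows how the Analysis of Variance entries combine. The function name `overall_f` and the sums of squares are placeholders, not values from the Bond Films output.

```python
def overall_f(ss_model, ss_error, n, p):
    """Return (F, df_model, df_error) for a regression with p explanatory variables."""
    df_model = p
    df_error = n - p - 1
    f_stat = (ss_model / df_model) / (ss_error / df_error)
    return f_stat, df_model, df_error


# n = 24 films and p = 6 predictors come from the assignment;
# the sums of squares are placeholders (NOT from the Bond Films output).
f_stat, df_model, df_error = overall_f(ss_model=60.0, ss_error=17.0, n=24, p=6)
print(f"F({df_model}, {df_error}) = {f_stat:.2f}")
```

A large F with a small p-value indicates that at least one of the six explanatory variables is useful for predicting Adj WW Gross. It does not say which one.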
4d) Use the “Parameter Estimates” table to determine which of the explanatory variables are significant in the model. Use α = 0.05. Explain by referring to the p-values for the individual variables.
4e) Evaluate, numerically, how well the model fits the data using the “Summary of Fit” table.