2 changes: 1 addition & 1 deletion docs/analysis/correlation_heatmap.md
@@ -24,6 +24,6 @@ Displays correlation matrix heatmap.
## Example1 - The Boston Housing Dataset
[Multicollinearity](https://en.wikipedia.org/wiki/Multicollinearity) may exist when two or more of the predictors in a regression model are highly correlated. When it exists, the coefficient estimates of the multiple regression may return erroneous values. In this example, we create a correlation matrix heatmap to check the correlations between predictors in a linear model of the Boston housing dataset.

1. Follow the instruction of example 1 explained on [Myltiple regression analysis](./regression_analysis.md). Select[Multiple linear regression analysis] > [Correlation matrix heatmap] for [Analysis Type].
1. Follow the instructions of example 1 in [Multiple regression analysis](./regression_analysis.md). Select [Multiple linear regression analysis] > [Correlation matrix heatmap] for [Analysis Type].
2. From the correlation matrix heatmap, you can see that the correlation between `rad` and `tax` is high.
![correlation heatmap example1](./images/correlation_heatmap_example1.png)
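The multicollinearity check that the heatmap visualizes can be sketched outside Qlik. Below is a minimal pure-Python version of the pairwise Pearson correlation the heatmap cells represent; the `rad` and `tax` column values are made-up stand-ins for illustration, not values from the dataset:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy stand-ins for the rad and tax columns (not real dataset values):
rad = [1, 2, 3, 5, 8, 24]
tax = [296, 242, 222, 403, 666, 666]
r = pearson(rad, tax)
# An |r| close to 1 between two predictors flags potential multicollinearity.
```

Computing this coefficient for every pair of predictors gives exactly the matrix the heatmap colors.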
42 changes: 21 additions & 21 deletions docs/analysis/regression_analysis.md
@@ -17,35 +17,35 @@ Performs multiple regression analysis and fits linear models.
## Usage
1. Place the [Advanced Analytics Toolbox] extension on a sheet and select [Multiple linear regression Analysis] > [Multiple regression analysis] for [Analysis Type].
2. Select dimensions and measures
* Dimension: A field uniquely identifies each record (ex: ID, Code)
* Dimension: A field that uniquely identifies each record (examples: `ID`, `Code`)
* Measure 1: Response variable
* Measure 2-: Predictor variables
* Remaining measures: Predictor variables

## Options
* Confidence level - The confidence (tolerance) level used for the analysis.

## Example1 - The Boston Housing Dataset
The Boston housing dataset contains medv (median house value) for 506 neighborhoods around Boston. In this example, we seek to predict medev using predictors such as rm (average number of rooms per house), age (average age of house) and crim (per capita crime rate by town).
## Example - The Boston Housing Dataset
The Boston housing dataset contains `medv` (median house value) for 506 neighborhoods around Boston. In this example, we seek to predict `medv` using predictors such as `rm` (average number of rooms per house), `age` (average age of house) and `crim` (per capita crime rate by town).

1. Download the following sample file.
* Boston ( [Download file](./data/Boston.xlsx) | [Description on the dataset](http://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html) )
2. Load the downloaded file into a new Qlik Sense app.
3. Place the [Advanced Analytics Toolbox] extension on a sheet and select [Multiple linear regression Analysis] > [Multiple regression analysis] for [Analysis Type].
4. Select [id] for a dimension.
5. Select Sum([mdev]) for the first measure as a response(dependent) variable. This is the values we seek to predict.
6. We are adding 13 predictor(independent) variables. Press [+] button to add measure button, and select the following fields for these measures:
* crim - per capita crime rate by town
* zn - proportion of residential land zoned for lots over 25,000 sq.ft.
* indus - proportion of non-retail business acres per town
* chas - Charles River dummy variable (= 1 if tract bounds)
* nox - nitric oxides concentration (parts per 10 million)
* rm - average number of rooms per dwelling
* age - proportion of owner-occupied units built prior to 1940
* dis - weighted distances to five Boston employment centres
* rad - index of accessibility to radial highways
* tax - full-value property-tax rate per $10,000
* ptratio - pupil-teacher ratio by town
* black - 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
* lstat ^ percentage of lower status of the population
6. The following chart is displayed. Two or three stars on the Coefficients table represent small p-values, which indicates that there is a relationship between an predictor variable and mdev as the responsible variable. The R-squared provides a measurement on how well the model is fitting the actual data. In this example, the R-squared we get is 0.7406, which indicates that 74% of the response variable can be explained by the predictor variables.
4. Select `[id]` for a dimension.
5. Select `Sum([medv])` for the first measure as a response (dependent) variable. These are the values we seek to predict.
6. We add 13 predictor (independent) variables. Press the [+] button to add each measure, and select the following fields for these measures:
* `crim` - per capita crime rate by town
* `zn` - proportion of residential land zoned for lots over 25,000 sq.ft.
* `indus` - proportion of non-retail business acres per town
* `chas` - Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
* `nox` - nitric oxides concentration (parts per 10 million)
* `rm` - average number of rooms per dwelling
* `age` - proportion of owner-occupied units built prior to 1940
* `dis` - weighted distances to five Boston employment centres
* `rad` - index of accessibility to radial highways
* `tax` - full-value property-tax rate per $10,000
* `ptratio` - pupil-teacher ratio by town
* `black` - _1000(Bk - 0.63)^2_ where _Bk_ is the proportion of blacks by town
* `lstat` - percentage of lower status of the population
7. The following chart is displayed. Two or three stars in the Coefficients table indicate small p-values, which suggest a relationship between that predictor variable and `medv`, the response variable. The R-squared measures how well the model fits the actual data. In this example, the R-squared is 0.7406, indicating that about 74% of the variance in the response variable is explained by the predictor variables.
![regression analysis example1](./images/regression_analysis_example1.png)
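Under the hood, "fitting a linear model" means solving a least-squares problem. The following is a minimal pure-Python sketch of ordinary least squares via the normal equations, plus the R-squared statistic reported in the chart. It is illustrative only: the extension's actual implementation is not shown here, and the p-value/star computation (which needs coefficient standard errors) is omitted.

```python
def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.
    X is a list of predictor rows; an intercept column is prepended."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting on the augmented [X'X | X'y]
    a = [row[:] + [b] for row, b in zip(xtx, xty)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                for c in range(col, p + 1):
                    a[r][c] -= f * a[col][c]
    return [a[i][p] / a[i][i] for i in range(p)]  # [intercept, b1, b2, ...]

def r_squared(X, y, beta):
    """Share of the variance in y explained by the fitted model."""
    pred = [beta[0] + sum(b * x for b, x in zip(beta[1:], r)) for r in X]
    ybar = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot
```

Run over the 13 Boston predictors, this procedure yields the kind of coefficient estimates and R-squared (here, 0.7406) shown in the chart above.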
6 changes: 3 additions & 3 deletions src/lib/js/util/utils.js
@@ -58,7 +58,7 @@ define([
*/
displayReturnedDatasetToConsole(debugMode, dataset) {
if (debugMode) {
console.log('** Recieved data from engine:')
console.log('** Received data from engine:')
console.log(dataset);
}
},
@@ -214,7 +214,7 @@ define([
},

/**
* validateDimension - Recieve dimension object and return field value
* validateDimension - Receive dimension object and return field value
*
* @param {Object} dimension Dimension data (layout.props.dimensions[i])
*
@@ -232,7 +232,7 @@
return result;
},
/**
* validateMeasure - Recieve measure object and return measure expression value
* validateMeasure - Receive measure object and return measure expression value
*
* @param {Object} measure Measure data (layout.props.measures[i])
*