1. Analysis review
What we did.
- Focused on the attitude variables.
- Calculated a distance matrix using the general dissimilarity coefficient of Gower (1971).
- We then used multidimensional scaling (MDS) and partitioning around medoids (PAM) to find groups which were similar.
- Then we looked at financial and demographic variables of the groups to find differences and similarities
- We then created an app to visualise everything allowing users to explore the data further.
- This could be used by the BOE to explore different groups of people over time.
- They would need to standardise some of the questions.
- Give a list in the handout and justify why we chose these.
- We kept the financial and demographic variables separate to prevent finding artificial differences
- Example to explain this point: if we group people into those above and below 40 years of age, then we find that one of the groups is older. This is artificial because we have made the groups differ by age.
I think the easiest way of explaining this is to use cities as an example. Think of the globe with all the cities on it. We can calculate a distance matrix representing how far each city is from every other city. MDS takes the 3D globe and portrays it in 2D, just like a map. PAM then looks to split the cities into groups based on how close they are, maybe a bit like countries.
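- Gower's coefficient deals with mixed variable types by scaling each variable's contribution to the range [0, 1]: a categorical variable scores 0 for a match and 1 for a mismatch, and a numeric variable's difference is divided by its range.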
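To make the Gower idea concrete, here is a minimal sketch of the calculation for two people. The function name `gower_distance`, the variable names and the survey values are all made up for illustration; they are not taken from our data, and the real analysis would use a library implementation.

```python
def gower_distance(a, b, numeric_ranges):
    """Gower dissimilarity between two records given as dicts.

    numeric_ranges maps each numeric variable to its (max - min) range;
    any variable not listed there is treated as categorical.
    """
    total = 0.0
    for key in a:
        if key in numeric_ranges:
            # Numeric: absolute difference scaled by the variable's range.
            total += abs(a[key] - b[key]) / numeric_ranges[key]
        else:
            # Categorical: 0 for a match, 1 for a mismatch.
            total += 0.0 if a[key] == b[key] else 1.0
    # Average the per-variable scores so the result stays in [0, 1].
    return total / len(a)

# Hypothetical respondents (illustrative values only).
p1 = {"age": 30, "tenure": "rent", "confident": "yes"}
p2 = {"age": 50, "tenure": "own", "confident": "yes"}
ranges = {"age": 60}  # assumption: ages in the sample span 60 years

d = gower_distance(p1, p2, ranges)
print(round(d, 3))  # 0.444
```

Each variable contributes a score between 0 and 1, so no single variable dominates just because of its units.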
- A distance matrix is like a spreadsheet with cities in the top row and the first column. Each cell holds the number of miles between the two cities, and a distance of zero means that it is the same city.
- This could be a slide showing the distance from London to Paris etc. so it's easy for them to understand.
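A possible way to produce the numbers for that slide is sketched below. It assumes approximate latitude/longitude values for three cities and uses the haversine formula for straight-line (great-circle) distance in miles, so the figures are illustrative and not travel distances.

```python
import math

def haversine_miles(p, q):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(h))  # 3958.8 = mean Earth radius in miles

# Approximate city-centre coordinates (assumed for illustration).
cities = {
    "London": (51.5074, -0.1278),
    "Paris": (48.8566, 2.3522),
    "Berlin": (52.5200, 13.4050),
}
names = list(cities)

# Rows and columns are both cities; each cell is the distance between them.
matrix = [[haversine_miles(cities[a], cities[b]) for b in names] for a in names]

for name, row in zip(names, matrix):
    print(f"{name:8}", [round(d) for d in row])
```

The diagonal is zero (a city compared with itself) and the table is symmetric, which is exactly the shape of the matrix we built from the attitude variables.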
- MDS is a method for reducing the number of dimensions of your data.
- Example: it is similar to the way we produce a 2D map of a 3D world. In our situation we reduce from many dimensions to 2 dimensions so we can visualise the distances.
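For anyone who wants to see the mechanics, below is a rough sketch of classical (metric) MDS in plain Python. It is an illustration under assumptions, not the routine we ran: the function name `classical_mds` is made up, it uses simple power iteration to find the two leading components, and it assumes those components have positive eigenvalues (true for ordinary Euclidean distances, not guaranteed for every dissimilarity matrix).

```python
import math

def classical_mds(dist, dims=2, iters=500):
    """Place n points in `dims` dimensions so pairwise distances match `dist`."""
    n = len(dist)
    # Double-centre the squared distances to get the Gram matrix b.
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    total_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration converges to the dominant eigenvector of b.
        v = [math.sin(i + 1.0 + k) for i in range(n)]  # fixed arbitrary start
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        eigval = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i][k] = v[i] * scale
        # Deflate: remove this component so the next pass finds the next one.
        b = [[b[i][j] - eigval * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords

# Four corners of a 4-by-3 rectangle, described only by their distances.
corners = [(0, 0), (4, 0), (4, 3), (0, 3)]
dist = [[math.dist(p, q) for q in corners] for p in corners]
coords = classical_mds(dist)

# The recovered layout may be rotated or flipped, but distances are preserved.
print(round(math.dist(coords[0], coords[2]), 3))
```

The point for the audience is that MDS only ever sees the distance table, yet it can redraw the "map".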
- PAM is a technique for finding clusters (is "groups" an easier word to use?)
- It is more robust than k-means because it minimises a sum of dissimilarities instead of a sum of squared Euclidean distances.
- Example: we want to put 3 parks in a town, and we want as many people as possible to be close to a park.
- PAM would look for the points in the town that minimise the total distance from everyone to their nearest park.
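The parks example can be sketched in a few lines. This is a simplified version of PAM, assuming a naive starting choice (the first k points) followed by swaps; the full algorithm uses a smarter initial "build" step. The house positions and the names `pam` and `total_cost` are invented for illustration.

```python
def total_cost(dist, medoids):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))

def pam(dist, k):
    """Simplified PAM: start from the first k points, then accept any
    medoid/non-medoid swap that lowers the total cost until none does."""
    n = len(dist)
    medoids = list(range(k))
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for candidate in range(n):
                if candidate in medoids:
                    continue
                trial = medoids[:pos] + [candidate] + medoids[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    # Each point is labelled with the index of its nearest medoid.
    labels = [min(medoids, key=lambda m: dist[i][m]) for i in range(n)]
    return sorted(medoids), labels

# Hypothetical houses along one street (positions in arbitrary units).
houses = [0, 1, 2, 10, 11, 12, 20, 21, 22]
dist = [[abs(a - b) for b in houses] for a in houses]

medoids, labels = pam(dist, k=3)
print(medoids)  # [1, 4, 7], i.e. parks at positions 1, 11 and 21
```

Note that each "park" ends up at one of the existing houses: PAM always picks actual data points as the centres of its groups, which is part of why it works directly from a distance matrix.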
We then explored the data to look for patterns in the other types of variables and found 4 subgroups.
This is our visualisation
Here is how it could be taken forward.