diff --git a/posts/DACSS753_Final_KMuhammad.qmd b/posts/DACSS753_Final_KMuhammad.qmd new file mode 100644 index 0000000..dfaedd7 --- /dev/null +++ b/posts/DACSS753_Final_KMuhammad.qmd @@ -0,0 +1,499 @@ +--- +title: "DACSS753: Political and Social Networks Final" +author: "Kalimah Muhammad" +description: "Evaluating the Exchange of Advice in Two Organizations" +date: "05/20/2023" +format: + html: + toc: true + code-fold: true + code-copy: true + code-tools: true +# editor: visual +categories: + - final +--- + +```{r, warning=FALSE, include=FALSE} +library(igraph) +library(statnet) +library(sna) +library(readxl) +library(network) +library(ggplot2) +library(tidyverse) +library(DT) +``` + +```{r, load data sets} +#read in data sets +#load consulting firm's advice edgelist +cn_advice<- read_xlsx("_data/Consulting_Advice_Network.xlsx") + +#load research and development team's advice edgelist +rd_advice <- read_xlsx("_data/R&D_Advice.xlsx") +``` + + +## 1. Introduction: A Social Network Analysis of Advice Requests + +In the book The Hidden Power of Social Networks: Understanding How Work Really Gets Done in Organizations, Rob Cross and Andrew Parker conduct social network analyses of 60 organizations around the world. Cross and Parker suggest that managers often do not understand how their employees get work done, and they reveal hidden social networks that affect an organization's performance.[1] + +This project focuses on two companies, a consulting firm and a research and development team, and the frequency of advice exchanged within each network. The data were collected through survey questions and then compiled into two edge lists. You can find the source data and further details in the References section.[2] + +For the consulting firm, participants were asked, "Please indicate how often you have turned to this person for information or advice on work-related topics in the past three months." 
Options= 0: I Do Not Know This Person; 1: Never; 2: Seldom; 3: Sometimes; 4: Often; and 5: Very Often. + +For the research and development team, participants were asked, "Please indicate the extent to which the people listed below provide you with information you use to accomplish your work." Options= 0: I Do Not Know This Person/I Have Never Met this Person; 1: Very Infrequently; 2: Infrequently; 3: Somewhat Infrequently; 4: Somewhat Frequently; 5: Frequently; and 6: Very Frequently. + +This project analyzes the two networks to investigate trends in the frequency and concentration of advice exchanged. + +```{r, warning=FALSE} +#create igraph object for consulting firm +cn_advice.ig <- graph_from_data_frame(cn_advice) + +#create statnet object for consulting firm +cn_advice.stat <- as.network(cn_advice, loops = TRUE, multiple = TRUE) + +#create igraph object for research and development team +rd_advice.ig<- graph_from_data_frame(rd_advice) +#create statnet object for research and development team +rd_advice.stat <- as.network(rd_advice, loops = TRUE, multiple = TRUE) + +``` + +## 2. Data + +This project includes two edge lists with variables for the source listed as "From", the target node as "To", and an ordinal variable for frequency of advice as "Value." + +### Network Properties + +**Consulting Firm** + +```{r, consulting network properties} +#size of the data set +dim(cn_advice) + +#summarize consulting network attributes +print(cn_advice.stat) + +#check if network is weighted +print(str_c("Is the consulting network data set weighted? ", is_weighted(cn_advice.ig)," ")) + +#check if network is multiplex +print(str_c("Is the consulting network data set multiplex? ",is.multiplex(cn_advice.stat), " ")) +``` + +The consulting firm includes 879 edges/ties representing a connection between nodes and 46 nodes/vertices representing individual employees. The ties are directed based on who sought advice from whom. 
The network is neither bipartite nor weighted, and it is unclear from the source whether this data is a sample or covers the whole company. + +**Research and Development Team** + +```{r, research network properties} +#summarize research and development network attributes +print(rd_advice.stat) + +#check if network is weighted +print(str_c("Is the research and development network data set weighted? ", is_weighted(rd_advice.ig)," ")) + +#check if network is multiplex +print(str_c("Is the research and development network data set multiplex? ",is.multiplex(rd_advice.stat), " ")) +``` + +The research and development team includes 2228 edges/ties and 77 nodes/vertices. This network is also directed but not weighted or bipartite. It is unclear from the source whether this data set is a sample or covers the whole company. + +## 3. Network-Level Statistics + +This section reviews descriptive statistics for each social network. + +### Degree and Degree Distribution + +**Consulting Firm** + +```{r} +#create dataframe of node level stats +cn_advice.nodes<-data.frame(totdegree=sna::degree(cn_advice.stat,gmode="digraph")) + +hist(cn_advice.nodes$totdegree, main = paste("Histogram of Consulting Firm's Total Degree Distribution"), xlab = "Count of Total Degrees", ylab = "Frequency") +summary(cn_advice.nodes$totdegree) +``` + +The plot above shows a fairly flat distribution, except for a spike among nodes with 40-50 total degrees. This spike may suggest that certain roles or positions, such as middle managers or subject matter experts, have more connections and play an important part in the exchange of information in the firm. 
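
The total-degree histogram above combines advice requests sent and received. A minimal sketch of separating the two with igraph is shown here; it uses a small hypothetical edge list (`toy`, not the consulting data), and all object names are assumptions for illustration.

```r
library(igraph)

# Hypothetical toy edge list (NOT the consulting data):
# each row means "From turned to To for advice"
toy <- data.frame(From = c("a", "a", "b", "c", "d"),
                  To   = c("b", "c", "c", "a", "c"))
toy.ig <- graph_from_data_frame(toy, directed = TRUE)

# In-degree counts requests received, out-degree counts requests sent
toy.deg <- data.frame(
  indegree  = igraph::degree(toy.ig, mode = "in"),
  outdegree = igraph::degree(toy.ig, mode = "out"),
  totdegree = igraph::degree(toy.ig, mode = "all")
)
toy.deg
```

Plotting `indegree` and `outdegree` in separate histograms would show whether a spike in total degree comes from a few heavily consulted people or from a few frequent askers.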
+ +**Research and Development Team** + +```{r, rd histogram} +#create dataframe of node level stats +rd_advice.nodes<-data.frame(totdegree=sna::degree(rd_advice.stat,gmode="digraph")) + +#plot histogram of degree distribution +hist(rd_advice.nodes$totdegree, main = paste("Histogram of R&D Team's Total Degree Distribution"), xlab = "Count of Total Degrees", ylab = "Frequency") + +#list summary of total degree distribution +summary(rd_advice.nodes$totdegree) +``` + +The plot above shows a slightly skewed distribution with the highest concentration of total degrees around 40. The exceptions are nodes with 90+ total degrees. This long tail on the right side suggests status and hierarchy may play a role in advice given/received. + +### Centralization + +```{r, centralization} +#get network centralization score for consulting firm +centralization(cn_advice.stat, degree, cmode="indegree") +centralization(cn_advice.stat, degree, cmode="outdegree") + +#get network centralization score for R&D team +centralization(rd_advice.stat, degree, cmode="indegree") +centralization(rd_advice.stat, degree, cmode="outdegree") +``` + +In-degree centralization, which represents requests received, is fairly low for both the consulting firm (0.29) and the R&D team (0.33). Out-degree centralization, which represents requests sent, is higher for both the consulting firm (0.59) and the R&D team (0.63). This suggests that, at the network level, requests received are spread fairly evenly across nodes in both groups, while requests sent are concentrated among a smaller set of nodes. + +### Components and Density + +**Consulting Firm** + +```{r, components cn} +#get number of components +igraph::components(cn_advice.ig)$no + +#get size of each component +igraph::components(cn_advice.ig)$csize + +#network diameter +diameter(cn_advice.ig) + +#get network density with loops: igraph +graph.density(cn_advice.ig, loops=TRUE) +``` + +There is one giant component containing all 46 nodes. 
The diameter, or length of the longest geodesic, is 3. The network density is moderately low (0.42). This usually means there are fewer paths along which information can spread, and it is likely a contributor to the skewed degree distribution and influence in the network. + +**Research and Development Team** + +```{r, components rd} +#get number of components +igraph::components(rd_advice.ig)$no + +#get size of each component +igraph::components(rd_advice.ig)$csize + +#network diameter +diameter(rd_advice.ig, directed = TRUE) + +#get network density with loops: igraph +graph.density(rd_advice.ig, loops=TRUE) +``` + +There is one giant component in the research and development team's network including all 77 nodes. The diameter, the length of the longest geodesic, is the same as in the smaller consulting network, suggesting that paths between nodes remain short even in a larger social network. The network density for the research and development team is moderately low (0.38), which may mean many nodes have little social capital while a small number have much. + +### Dyads and Triads + +**Consulting Firm** + +```{r, network structure cn} +#Dyad census, triad census +#Classify all dyads in the network: statnet +sna::dyad.census(cn_advice.stat) + +#Classify all triads in the network: statnet +sna::triad.census(cn_advice.stat) +``` + +In the consulting firm's network, 47% of the 1,035 possible dyads (46 × 45 / 2) are null/absent (485). This is also reflected in the high number of unconnected triples in the triad census. There are 327 mutual or reciprocal dyads and 223 asymmetrical dyads. These findings may point towards a concentration of information within a subgroup that is shared in a hierarchical manner. 
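
As a sanity check on a dyad census, the mutual, asymmetric, and null counts should sum to the number of unordered node pairs, n(n-1)/2. A small sketch of that arithmetic, assuming the 46 nodes and the census counts reported above (327 mutual, 223 asymmetric, 485 null); the variable names are illustrative only.

```r
# Counts taken from the consulting firm's dyad census reported above
n <- 46
census <- c(mut = 327, asym = 223, null = 485)

# Number of unordered node pairs: n * (n - 1) / 2
possible_dyads <- choose(n, 2)

# The three categories should account for every possible dyad
sum(census) == possible_dyads

# Share of each dyad type among all possible dyads
round(census / possible_dyads, 2)
```

The same check applies to the research and development team by swapping in its node count and census values.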
+ +**Research and Development Team** + +```{r, network structure rd} +#Dyad census and triad census +#Classify all dyads in the network: statnet +sna::dyad.census(rd_advice.stat) + +#Classify all triads in the network: statnet +sna::triad.census(rd_advice.stat) +``` + +In the research and development network, most of the 2,926 possible dyads (77 × 76 / 2) are absent or null (1585). This trend is also reflected in the high number of unconnected triples in the triad census. + +There are 887 mutual or reciprocal dyads, suggesting a collaborative culture or work environment among those connected. This finding could also open up opportunity for more brokerage and collaboration among absent ties. Finally, the remaining 454 dyads are asymmetrical, likely due to hierarchy in the organization. + +### Transitivity + +**Consulting Firm** + +```{r, transitivity cn} +#transitivity +#get global clustering coefficient: igraph +transitivity(cn_advice.ig, type="global") + +#get average local clustering coefficient: igraph +transitivity(cn_advice.ig, type="average") +``` + +The average local transitivity (0.80) is higher than the overall network transitivity (0.72), suggesting that nodes within subgroups are more tightly connected to each other than the network is as a whole. + +**Research and Development Team** + +```{r, transitivity rd} +#transitivity +#get global clustering coefficient: igraph +transitivity(rd_advice.ig, type="global") + +#get average local clustering coefficient: igraph +transitivity(rd_advice.ig, type="average") +``` + +Similar to the consulting firm, the research and development team's average local transitivity (0.78) is higher than the overall network transitivity (0.67), suggesting once again that nodes within subgroups are more tightly connected to each other than the network is as a whole. + +## 4. 
Plot Networks + +**Consulting Firm** + +```{r, plot consulting network} + +#calculate degree for each node +cn.deg <- degree(cn_advice.stat, gmode="digraph") + +#plot the consulting network based on degree size +plot(cn_advice.ig, vertex.size=cn.deg*.3, edge.arrow.size = 0.5, edge.arrow.width = 0.5, arrow.mode=3, main="Network of a Consulting Firm's Internal Advice Requests", sub="Node Size Indicates Degree of Total Requests for the Employee") + +``` + +At first glance, three nodes (15, 24, and 30) sit far from the others but remain connected. There is also a cluster of larger nodes towards the center with increasingly smaller nodes going outwards. This suggests a central position for nodes 2, 16, 20, 22, and 45. Finally, a cluster of nodes appears to be emerging at the bottom, perhaps due to the influence of nodes 2, 20, and 45. + +**Research and Development Team** + +```{r, plot research network} + +#calculate degree for each node +rd.deg <- degree(rd_advice.stat,gmode="digraph") + +#plot the research and development network based on degree size +plot(rd_advice.ig, vertex.size=rd.deg*.3, edge.arrow.size = 0.5, edge.arrow.width = 0.5, arrow.mode=3, main="Network of a R&D Team's Internal Advice Requests", sub="Node Size Indicates Degree of Total Requests for the Employee", rescale=TRUE) + +``` + +The research and development team has at least five clusters of relationships. The central nodes, including 15, 28, 68, and 74, have the most ties, and node sizes decrease towards the perimeter of the network. Like the consulting firm, there is also another prominent cluster emerging with more ties than the other clusters. + +## 5. 
Prominent Roles: Node-Level Statistics + +**Consulting Firm** + +```{r, cn node level stats} +#create dataframe of node level stats +cn_advice.nodes<-data.frame(totdegree=sna::degree(cn_advice.stat,gmode="digraph"), + indegree=sna::degree(cn_advice.stat, cmode="indegree"), + outdegree=sna::degree(cn_advice.stat, cmode="outdegree"), + betweenness=sna::betweenness(cn_advice.stat, gmode="digraph"), + close=sna::closeness(cn_advice.stat, cmode="suminvdir"), + constraint=constraint(cn_advice.ig), + eigen=sna::evcent(cn_advice.stat, gmode="digraph", diag=TRUE) + ) +``` + +Below is a summary of the average node degrees and degree distribution. + +```{r, cn summarize node statistics} +#get summary statistics for node attributes +summary(cn_advice.nodes) +``` + +Most of the above metrics have a fairly normal distribution. In-/Out-degrees have similar averages but the maximum out-degree (45) appears much higher than the maximum in-degree (32). The betweenness scores appear skewed to a few high scorers and not reflective of the majority. Constraint is generally low among the nodes. Reflected centrality is low and derived centrality is high suggesting the nodes are primarily pure bridges. + +The next section includes details on each node. + +```{r, echo=TRUE} +cn.mat<-as.matrix(as_adjacency_matrix(cn_advice.ig)) + +#create dataframe of node level stats +cn.nodes<-data.frame(cn.mat) + +#square the adjacency matrix +cn.matsq<-t(cn.mat) %*% cn.mat + +#Calculate the proportion of reflected centrality. +cn_advice.nodes$rc<-diag(cn.matsq)/rowSums(cn.matsq) + +#Calculate the proportion of derived centrality. 
+cn_advice.nodes$dc<-1-diag(cn.matsq)/rowSums(cn.matsq) +#replace missing values (NaN) with 1 +cn_advice.nodes$dc<-ifelse(is.nan(cn_advice.nodes$dc),1,cn_advice.nodes$dc) + +#view node details as a data table +datatable(cn_advice.nodes) +``` + +Node #20 stands out with high values across most measures: total degree (65), in-degree (32), out-degree (32), degree weight (65), betweenness (97.92), closeness (0.87), and low constraint (0.10). This would suggest that Node #20 is among the most efficient or popular nodes based on total degree and that this node's connections are fairly mutual, with equal in-/out-degrees. The high betweenness score shows this node may be a gatekeeper or broker, and the low constraint score shows minimal redundant information between ties. Based on this information, it's expected that Node #20 would be a central player within this network. + +Node #12 has the highest betweenness score (138.02) and one of the highest closeness metrics (1.0). This person may be a key gatekeeper within the network. + +Node #30 had the lowest metrics with total degree (4), in-degree (3), out-degree (1), degree weight (4), betweenness (0), closeness (0.02), and constraint (0.12). This may be due to the node being fairly isolated in the role. + +Note, nodes #12 and #16 had a connection with each node in the network. + +**Research and Development Team** + +```{r, rd node level stats} +#create dataframe of node level stats +rd_advice.nodes<-data.frame(totdegree=sna::degree(rd_advice.stat,gmode="digraph"), + indegree=sna::degree(rd_advice.stat, cmode="indegree"), + outdegree=sna::degree(rd_advice.stat, cmode="outdegree"), + betweenness=sna::betweenness(rd_advice.stat, gmode="digraph"), + close=sna::closeness(rd_advice.stat, cmode="suminvdir"), + constraint=constraint(rd_advice.ig), + eigen=sna::evcent(rd_advice.stat, gmode="digraph", diag=TRUE) + ) +``` + + +Below is a summary of the node-level degrees and statistics for the research and development team. 
+ +```{r, rd summarize node statistics} +#get summary statistics for node attributes +summary(rd_advice.nodes) +``` + +Once again, most metrics have a fairly normal distribution. The average in-/out-degree is similar but the maximum out-degree (76) far exceeds the maximum in-degree (54). The constraint in the network is low. Reflected centrality is low and derived centrality is high, suggesting the nodes in this network are primarily pure bridges. +There is one extreme outlier (520) for the betweenness score. + +The next section includes details on each node. + +```{r, echo=TRUE} +rd.mat<-as.matrix(as_adjacency_matrix(rd_advice.ig)) + +#create dataframe of node level stats +rd.nodes<-data.frame(rd.mat) + +#square the adjacency matrix +rd.matsq<-t(rd.mat) %*% rd.mat + +#Calculate the proportion of reflected centrality. +rd_advice.nodes$rc<-diag(rd.matsq)/rowSums(rd.matsq) + +#Calculate the proportion of derived centrality. +rd_advice.nodes$dc<-1-diag(rd.matsq)/rowSums(rd.matsq) +#replace missing values (NaN) with 1 +rd_advice.nodes$dc<-ifelse(is.nan(rd_advice.nodes$dc),1,rd_advice.nodes$dc) + +#view node details as a data table +datatable(rd_advice.nodes) + +``` + +Node #68 stands out with the highest total degree (130), in-degree (54), out-degree (76), degree weight (130), betweenness (520.39), closeness (1.0), and low constraint (0.05). This would suggest that Node #68 may be a manager or other trusted colleague who has many connections within the organization and is often consulted for advice. Based on this information, it's expected that Node #68 is a broker, gatekeeper, or other highly skilled and respected employee who is well connected with the rest of the network, giving more advice than they receive. + +Node #73 had the lowest metrics with total degree (18), in-degree (18), out-degree (0), degree weight (18), betweenness (0), closeness (0), and constraint (0.14). 
This shows an uncommon dynamic in this network: all ties are directed toward the node and none are directed outward, as shown in the in-/out-degree. This node also sits on few paths between other nodes. Based on this, we may expect this node to be an independent contributing subject matter expert within the network. + +Nodes 15, 28, 49, 68, and 74 have outgoing connections to every other node in the network (76), making their positions more prominent and likely indicating they are at the top of their respective hierarchy. Advice received for this group is unremarkable except for #68 with the highest in-degree (54) in the network. This makes node 68 the most central, prominent node in the network. + +## 6. Community Detection + +To identify communities within each network, I used the conceptual approach of recognizing more connections within a subgraph than outside of it to determine the communities. The Walktrap algorithm was well suited for the directed, multiplex data. + +### Walktrap Community Detection + +**Consulting Firm** + +```{r, walktrap consulting} +#Run clustering algorithm: walktrap +cn.wt<-walktrap.community(cn_advice.ig) +#Inspect community membership +igraph::groups(cn.wt) + +#Run & inspect clustering algorithm: 20 steps +igraph::groups(walktrap.community(cn_advice.ig ,steps=20)) + +``` + +With both the default number of steps and a predetermined 20 steps, two distinct, equal-sized communities (23 nodes each) emerge. A plot of the subgraph communities is below. 
+ +```{r, CN Walktrap} +#calculate degree for each node +cn.deg <- degree(cn_advice.stat,gmode="digraph") + +#plot the consulting network based on degree size +cn.communityplot <- plot(cn.wt,cn_advice.ig, vertex.size=cn.deg*.3, edge.arrow.size = 0.5, edge.arrow.width = 0.5, arrow.mode=3, main="Community Detection of a Consulting Firm's Advice Requests", sub="Node Size Indicates Degree of Total Requests | Color Indicates Subgroup") + +cn.communityplot +``` + +Here we find two distinct communities, highlighted and circled in orange and blue. Many ties fall within the same community color; however, there are also ties between the two communities, as indicated by the red edges. This distinction may signal a hierarchy of communication in two ways. First, ties within a community may be between similarly positioned employees as well as between management and employees. Second, ties between communities could indicate key collaborators between the two groups. + +**Research and Development Team** + +```{r} +#Run clustering algorithm: walktrap +rd.wt<-walktrap.community(rd_advice.ig) +#Inspect community membership +igraph::groups(rd.wt) + +#Run & inspect clustering algorithm: 20 steps +igraph::groups(walktrap.community(rd_advice.ig, steps=20)) +``` + +The first Walktrap run, without a defined number of steps, shows six community clusters with varying numbers of nodes, while the second run with 20 steps identifies only 3 subgraph groups. + +A plot of the Research and Development communities is below. 
+ +```{r, RD Walktrap} +#calculate degree for each node +rd.deg <- degree(rd_advice.stat,gmode="digraph") + +#plot the research and development network based on degree size +rd.communityplot <- plot(rd.wt,rd_advice.ig, vertex.size=rd.deg*.2, edge.arrow.size = 0.5, edge.arrow.width = 0.5, arrow.mode=3, main="Community Detection of a R&D Team's Advice Requests", sub="Node Size Indicates Degree of Total Requests | Color Indicates Subgroup") + +rd.communityplot +``` +There are six distinct communities in the research and development network highlighted in green, yellow, orange, red, dark blue, and light blue. The dark blue and red communities are clustered at the center, displaying minimal ties within each group but frequent connections with other communities. This could signal a hierarchy of upper management and employees. Communities in green, yellow, and blue have far more connections within their community than outside, perhaps signaling another commonality in node attributes. + +### Comparing Community Modularity from the Walktrap Algorithm + +```{r} +#compare community partition modularity scores +modularity(cn.wt) +modularity(rd.wt) +``` + +Comparing the modularity scores of both networks, the research and development team has the higher modularity (0.28). This suggests its communities contain more within-community edges, relative to what would be expected by chance, than those of the consulting firm, whose score is 0.24. + +Overall, community detection made it easier to find communities within the larger network. This supports the authors' point that internal social networks are a likely contributor to performance. Details on the nature of these communities are not evident, however, without information on node attributes. + +## 7. Network Inference and Hypotheses + +Potential social network hypotheses from this project could compare network structures using a CUG-test. 
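
As a rough illustration of how a CUG test works, the sketch below runs `sna::cug.test` on a randomly generated toy graph rather than the project data. It conditions on the number of edges and tests transitivity; the object names (`toy.mat`, `toy.cug`), the graph size, and the number of replications are assumptions chosen only for illustration.

```r
library(sna)

set.seed(1)

# Hypothetical toy network: random directed graph with 20 nodes
toy.mat <- rgraph(20, tprob = 0.3)

# Compare observed transitivity to random graphs with the same
# number of nodes AND the same number of edges
toy.cug <- cug.test(toy.mat, FUN = gtrans, mode = "digraph",
                    cmode = "edges", reps = 100)

# Observed statistic and the share of simulated graphs with
# values at or below / at or above the observed one
toy.cug$obs.stat
toy.cug$plteobs
toy.cug$pgteobs
```

Conditioning on `"edges"` rather than `"size"` holds density fixed, so any difference from the simulated graphs reflects how ties are arranged rather than how many there are.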
+ +**Consulting Firm** + +```{r, inferential statistics} +#create matrix from edgelist +cn_advice.mat <- as.matrix(cn_advice) +rd_advice.mat <- as.matrix(rd_advice) + +#compare network density to null conditional on size +trans.cug<-cug.test(cn_advice.stat,FUN=gden,mode="digraph",cmode="size", reps = 500) +trans.cug + +#view plot of cug.test results +plot(trans.cug) +``` + +First, I compared the network density of the consulting firm to random networks of the same size. Here we see the observed value of network density (0.43) is lower than that of the simulated networks (0.50), meaning the consulting firm network is less dense than we'd find in a random network. + +**Research and Development Team** + +```{r} + +#compare network density to null conditional on size +trans.cug2<-cug.test(rd_advice.stat,FUN=gden,mode="digraph",cmode="size", reps = 500) +trans.cug2 + +#view plot of cug.test results +plot(trans.cug2) + +``` + +Once again, comparing the network density of the research and development team to random networks of the same size, we see the observed value of network density (0.38) is lower than that of the simulated networks (0.50), meaning the research and development team's network is less dense than we'd find in a random network. + +Further research for this data set could add node attributes such as job position, geography, or gender for further analysis. + +#### References +[1] Cross, R., Parker, A., 2004. The Hidden Power of Social Networks. Harvard Business School Press, Boston, MA. 
+[2]Data Set: diff --git a/posts/Week1_Challenge_kmuhammad.qmd b/posts/Week1_Challenge_kmuhammad.qmd new file mode 100644 index 0000000..0f934b2 --- /dev/null +++ b/posts/Week1_Challenge_kmuhammad.qmd @@ -0,0 +1,131 @@ +--- +title: "Week 1 Challenge" +author: "Kalimah Muhammad" +description: "Loading Data and Creating a Network" +date: "03/13/2023" +format: + html: + toc: true + code-fold: true + code-copy: true + code-tools: true +# editor: visual +categories: + - challenge_1 + # - railroads + # - faostat + # - wildbirds +--- + +```{r} +#| label: setup +#| include: false +``` + +```{r, include=FALSE} +library(statnet) +library(network) +library(igraph) +library(readr) +``` + +## Challenge Overview + +Today's challenge is to + +1) read in a data set, and + +2) create a network object + +## Load the Data + +Read in one (or more) of the following data sets, using the correct R package and command. + +- got_marriages.csv +- fish_encounters data set (available in the `tidyr` package) +- got_like_dislike.csv + +```{r, results='hide'} +#read in data set +got_marriages <- read_csv('_data/got/got_marriages.csv') +``` + +Show the top 10 observations from the marriage data set. + +```{r} +#show top results +head(got_marriages, 10) +``` + +## Create a Network + +**Instructions**: Load the package `igraph` and create an `igraph` object (i.e. a graph or network) in the form of an edge list. The command may vary whether the data is given as a list of connections or an adjacency matrix. Is the network directed or undirected; weighted or unweighted; unimodal or bipartite? Can you plot it? + +### Marriage network using `igraph` + +#### Preview of `igraph` object + +```{r} +#create igraph object +marriages.ig <- graph_from_data_frame(got_marriages) +#print igraph +print(marriages.ig) +``` + +#### Is the network directed? + +```{r} +is_directed(marriages.ig) +``` + +#### Is the network weighted? + +```{r} +is_weighted(marriages.ig) +``` + +#### Is the network bipartite? 
+ +```{r} +is_bipartite(marriages.ig) +``` + +#### Plot of marriage network in `igraph` + +```{r} +plot(marriages.ig, edge.arrow.size = 0) +``` + +### Marriage network using `statnet` + +```{r} +marriages.stat <- as.network(got_marriages, loops = TRUE, multiple = TRUE) +``` + +#### Summary of network attributes + +```{r} +print(marriages.stat) +``` + +The marriage network is directed and not bipartite. There are, however, loops and multiple edges between actors. This finding could suggest a few options: + +a) a re-occurrence of a relationship between actors over time, such as within or between a current or past `Generation`, or + +b) a change in the relationship `Type` such as engaged, married, or an affair between actors. + +Both suggestions are plausible and evidenced in the data frame sampled below, where we see reoccurring observations within the same `Generation` between *Martell* and *Essos*a and a change in their relationship `Type`. + +```{r} +head(got_marriages, 10) +``` + +These changes do not appear as the only triggers of an event/ occurrence; thus, including an interval of time may help interpret these events. + +#### Plot of marriage network in `statnet` + +```{r} +plot(marriages.stat) +``` + +This plot shows the direction of the relationship, indicated by the arrow lines, and the number of observations of the relationship, indicated in the varying weight of each edge. However, the actors' names, a critical piece of information, appear missing from the plot. 
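
One possible way to show actor names in a `statnet` plot is the `displaylabels` argument of `plot.network`. The sketch below uses a small hypothetical edge list (`toy`, with made-up ties, not the marriage data) to illustrate the idea; the label size is an arbitrary choice.

```r
library(network)

# Hypothetical toy edge list (NOT the marriage data)
toy <- data.frame(from = c("Stark", "Tully", "Stark"),
                  to   = c("Tully", "Arryn", "Arryn"))

# Build a directed network object from the edge list
toy.net <- as.network(toy, directed = TRUE)

# displaylabels = TRUE draws the vertex names next to each node
plot(toy.net, displaylabels = TRUE, label.cex = 0.8)
```

Applied to `marriages.stat`, the same argument would be expected to label each house, although the loops and multiple edges noted above may still make the plot crowded.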
diff --git a/posts/Week2_Challenge_kmuhammad.qmd b/posts/Week2_Challenge_kmuhammad.qmd new file mode 100644 index 0000000..f28311b --- /dev/null +++ b/posts/Week2_Challenge_kmuhammad.qmd @@ -0,0 +1,177 @@ +--- +title: "Week 2 Challenge" +author: "Kalimah Muhammad" +description: "Describing the Basic Structure of a Like/Dislike Network" +date: "03/13/2023" +format: + html: + toc: true + code-fold: true + code-copy: true + code-tools: true +# editor: visual +categories: + - challenge_2 +--- + +```{r} +#| label: setup +#| include: false +``` + +```{r, include= FALSE} +library(igraph) +library(statnet) +library(readr) +``` + +## Challenge Overview + +Describe the basic structure of a network following the steps in the week 2 tutorial, this time using a data set of your choice: for instance, you could use Marriages in Game of Thrones or Like/Dislike from week 1. + +## Describe the Network Data + +```{r, results='hide'} +#load Game of Thrones like and dislike data +like_dislike<- read_csv('_data/got/got_like_dislike.csv') +``` + +```{r, warning=FALSE} +#create IGRAPH object +like_dislike.ig <- graph_from_data_frame(like_dislike) +``` + +1. *List and inspect* List the objects to make sure the data files are working properly. + +```{r} +#list and inspect igraph object +ls() +``` + +Below is a plot of the like/dislike network. + +```{r} +plot(like_dislike.ig) +``` + +2. *Network Size* What is the size of the network? + +```{r} +#count nodes +vcount(like_dislike.ig) + +#count edges +ecount(like_dislike.ig) +``` + +There are 11 nodes and 46 edges. + +3. *Network features* Are these networks weighted, directed, and bipartite? + +```{r} +is_weighted(like_dislike.ig) +is_directed(like_dislike.ig) +is_bipartite(like_dislike.ig) +``` + +The network is directed but neither weighted nor bipartite. + +4. *Network Attributes* Listing the vertex and edge attributes. 
+ +```{r} +#display vertex attributes for igraph object +vertex_attr_names(like_dislike.ig) +``` +`Name` is the only vertex attribute. + +```{r} +#display edge attributes for igraph object +edge_attr_names(like_dislike.ig) +``` + +There are approximately 47 edge attributes. + +## Dyad and Triad Census + +5. Conduct a *dyad census* to determine the number of dyads where the relationship is: + +- Reciprocal (mutual), or `mut` +- Asymmetric (non-mutual), or `asym`, and +- Absent, or `null` + +```{r} +igraph::dyad.census(like_dislike.ig) +``` + +There is one mutual/ reciprocal dyad, 12 asymmetric dyads, and 42 absent relationships. + +6. Now, I'll find the *triad census*. + +```{r} +igraph::triad_census(like_dislike.ig) +``` +Here we see most triads are null as shown in the first classification. This result is consistent with the high absent relationships in the dyad census. The second classification with a result of 32 points toward a single directed edge, similar to the asymmetrical dyad relationships seen earlier. Finally in the fifth classification with a result of 22, there is a suggestion of an inward star as the third most frequent triad type. + + +## Global and Local Transitivity or Clustering + +Compute the global transitivity, local transitivity of specific nodes of your choice, and the average clustering coefficient. What is the distribution of node degree and how does it compare with the distribution of local transitivity? + +```{r, global transitivity} +#calculate the global transitivity +transitivity(like_dislike.ig, type="global") +``` + +The global transitivity is 0.237 suggesting a lower proportion of connected triads within the overall network. + +Next, I calculate the local transitivity. + +```{r, local transitivity} +#calculate the local transitivity using specific nodes +transitivity(like_dislike.ig, type ="local") + +``` +Then, I calculated the average local clustering coefficient. 
+ +```{r} +##get average local clustering coefficient: igraph +transitivity(like_dislike.ig, type="average") +``` + +The average clustering coefficient is 0.59, suggesting more connected triads among neighboring actors than across the entire network, as seen in the global transitivity score of 0.237. Here the emphasis on low-degree nodes suggests more ties between fewer actors. + + +## Path Length and Component Structure + +Compute the average path length and the _diameter_ of the network: + +```{r, shortest path} +#find average shortest path for network +average.path.length(like_dislike.ig,directed=TRUE) + +#find the network diameter +diameter(like_dislike.ig) +``` + +The average shortest path length for the network is 1.3 and the diameter is 2. + +Find the component structure of the network and identify the cluster membership of each node: + +```{r, network components} +#get number of components +igraph::components(like_dislike.ig)$no + +#get size of each component +igraph::components(like_dislike.ig)$csize +``` + +There is one component in the network and the size of the component is 11. + + + + + + + + + diff --git a/posts/Week3_Challenge_kmuhammad.qmd b/posts/Week3_Challenge_kmuhammad.qmd new file mode 100644 index 0000000..ffd1113 --- /dev/null +++ b/posts/Week3_Challenge_kmuhammad.qmd @@ -0,0 +1,188 @@ +--- +title: "Week 3 Challenge" +author: "Kalimah Muhammad" +description: "Degree and Density of a Network " +date: "05/1/2023" +format: + html: + toc: true + code-fold: true + code-copy: true + code-tools: true +# editor: visual +categories: + - challenge_3 +--- + +```{r} +#| label: setup +#| include: false +``` + +```{r, include= FALSE} +library(igraph) +library(statnet) +library(readr) +library(network) +``` + +## Challenge Overview + +Describe the many measures of degree, as well as density, of a network and compare. 
+ +## Describe the Network Data + +```{r, results='hide'} +#load Game of Thrones like and dislike data +like_dislike<- read_csv('_data/got/got_like_dislike.csv') +``` + +```{r, warning=FALSE} +#create iGraph object +like_dislike.ig <- graph_from_data_frame(like_dislike) +``` + +Below is a plot of the like/dislike network. + +```{r} +plot(like_dislike.ig) +``` + +On initial review, we see the majority of contacts are directed towards NA. However, Arryn and Tyrell's most direct relationship is with the contacts Tully and Baratheon respectively. Stark appears as one of the few egos with contacts directed out. Tully appears as a potential mediator between Arryn, Stark, and NA. + +*Network Size* + +```{r} +#count nodes +vcount(like_dislike.ig) + +#count edges +ecount(like_dislike.ig) +``` + +There are 11 vertices and 46 edges. + +*Network features* + +```{r} +is_weighted(like_dislike.ig) +is_directed(like_dislike.ig) +is_bipartite(like_dislike.ig) +``` + +The network is directed but neither weighted nor bipartite. + +*Network Attributes* + +```{r} +#display vertex attributes for igraph object +vertex_attr_names(like_dislike.ig) +``` +`Name` is the only vertex attribute. + +```{r} +#display edge attributes for igraph object +edge_attr_names(like_dislike.ig) +``` + +There are approximately 47 edge attributes. + +## Dyad and Triad Census + +*Dyad Census* + +```{r} +igraph::dyad.census(like_dislike.ig) +``` + +There is one mutual/ reciprocal dyad, 12 asymmetric dyads, and 42 absent relationships. + +*Triad Census* + +```{r, triad census} +igraph::triad_census(like_dislike.ig) +``` +Here we see most triads are null as shown in the first classification. This result is consistent with the high absent relationships in the dyad census. The second classification with a result of 32 points toward a single directed edge, similar to the asymmetrical dyad relationships seen earlier.
Finally, in the fifth classification, with a result of 22, there is a suggestion of an inward star as the third most frequent triad type. + + +## Degree + +Total Degrees + +```{r, total degrees} +#calculate total degree of each node: igraph +igraph::degree(like_dislike.ig) +``` + +Calculate In-degree + +```{r, in degree} +#calculate in-degree: igraph +igraph::degree(like_dislike.ig, mode="in") +``` + +Calculate Out-degree + +```{r, out degree} +#calculate out-degree: igraph +igraph::degree(like_dislike.ig, mode="out") +``` + +```{r} +#create a data frame with the degree values +like_dislike.nodes<-data.frame( + totdegree=igraph::degree(like_dislike.ig, loops=TRUE), + indegree=igraph::degree(like_dislike.ig, mode="in", loops=TRUE), + outdegree=igraph::degree(like_dislike.ig, mode="out", loops=TRUE)) + +like_dislike.nodes +``` + +Here we see a significant number of relationships between NA and NA. This is likely due to the networks tracing relationships between current and former houses in which 40 of the 46 observations do not include a former house and 8 observations do not include a current house. There is a high prevalence of in-degrees for NA houses compared to out-degrees, 39 to 8 respectively. For out-degrees, Lannister, followed by Stark, has the most relationships directed out, with 11 and 10 out-degrees respectively. Baratheon has the closest in- to out-degree relationships, 4 and 5, suggesting more mutual/ reciprocal contacts. + +## Density + +Compute the density of the network. Is this a global or local measure? Does it have a relationship with average degree? + +```{r, graph degree} +#get network density: igraph with loops +graph.density(like_dislike.ig, loops=TRUE) + +``` + +The density of the network is 0.38 which is relatively low. This suggests there is less spread of contacts among the group.
This supports our findings in the degree statistics, where actors had little contact with each other outside of a few popular egos: Lannister, Stark, Baratheon, and the NA houses. + +## Random Network + +```{r, random network} +#create vertices and edges variables based on like_dislike network +vertices<-11 +edges <- 46 + +#create a random network with the same number of nodes and edges +random_network<-sample_gnm(n=vertices,m=edges, directed = TRUE, loops = TRUE) + +``` + +Does the comparison tell us something about the network of your choice? + +First, let's plot the random graph. + +```{r, plot random network} +plot(random_network) +``` + +Upon initial review, the random network with the same number of edges (46) and vertices (11) appears more distributed than the like_dislike network, which displayed a high concentration toward a central node. We also see a higher prevalence of in-degree relationships than out-degrees suggesting a hierarchy in the relationships. + +```{r, degree statistics in random network} +#create a data frame with the degree values +random_network.nodes<-data.frame( + totdegree=igraph::degree(random_network, loops=TRUE), + indegree=igraph::degree(random_network, mode="in", loops=TRUE), + outdegree=igraph::degree(random_network, mode="out", loops=TRUE)) + +random_network.nodes +``` + +Once again, we find more distributed and mutual relationships in the random network than in the Game of Thrones like_dislike network. It is likely that we'd find more brokerage relationships in the random network as well.
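
To make the link between density and average degree concrete, here is a minimal base-R sketch that recomputes the density reported above (0.38) from the node and edge counts alone, then draws a random directed network of the same size by hand. The variable names and the seed are my own choices for illustration, and the sampling step only approximates what `sample_gnm()` does; it is not igraph's implementation.

```r
# Hand computation of directed density with self-loops allowed,
# using the node and edge counts reported above (11 nodes, 46 edges).
n_nodes <- 11
n_edges <- 46

# With loops allowed, a directed network has n * n possible ties.
density_with_loops <- n_edges / (n_nodes * n_nodes)
round(density_with_loops, 2)

# Average out-degree (equal to average in-degree) is edges per node.
avg_degree <- n_edges / n_nodes

# Density is the average degree divided by the number of possible targets.
all.equal(density_with_loops, avg_degree / n_nodes)

# Illustrative random network: choose 46 distinct cells of an 11 x 11
# adjacency matrix (assumed seed, for reproducibility only).
set.seed(1)
adj <- matrix(0, n_nodes, n_nodes)
adj[sample(n_nodes * n_nodes, n_edges)] <- 1

# Row sums are out-degrees, column sums are in-degrees.
out_degree <- rowSums(adj)
in_degree <- colSums(adj)
data.frame(indegree = in_degree, outdegree = out_degree)
```

Because density is just the average degree rescaled, it is a global measure: two networks with the same counts share the same density even when their degree distributions look very different.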
+ diff --git a/posts/Week4_Challenge_kmuhammad.qmd b/posts/Week4_Challenge_kmuhammad.qmd new file mode 100644 index 0000000..090bb20 --- /dev/null +++ b/posts/Week4_Challenge_kmuhammad.qmd @@ -0,0 +1,303 @@ +--- +title: "Week 4 Challenge" +author: "Kalimah Muhammad" +description: "Degree and Density of a Network " +date: "05/1/2023" +format: + html: + toc: true + code-fold: true + code-copy: true + code-tools: true +# editor: visual +categories: + - challenge_4 +--- + +```{r} +#| label: setup +#| include: false +``` + +```{r, warning= FALSE, include=FALSE} +library(igraph) +library(statnet) +library(readr) +library(network) +library(ggplot2) +library(tidyverse) +``` + +## Challenge Overview + +Describe the many measures of centrality of at least one network of your choice. + +## Describe the Network Data + +```{r, results='hide'} +#load Game of Thrones like and dislike data +like_dislike<- read_csv('_data/got/got_like_dislike.csv') +``` + +```{r, warning=FALSE} +#create iGraph object +like_dislike.ig <- graph_from_data_frame(like_dislike) +``` + +Below is a plot of the like/dislike network. + +```{r} +plot(like_dislike.ig) +``` + +On initial review, we see the majority of contacts are directed towards NA. However, Arryn and Tyrell's most direct relationship is with the contacts Tully and Baratheon respectively. Stark appears as one of the few egos with contacts directed out. Tully appears as a potential mediator between Arryn, Stark, and NA. + +*Network Size* + +```{r} +#count nodes +vcount(like_dislike.ig) + +#count edges +ecount(like_dislike.ig) +``` + +There are 11 vertices and 46 edges. + +*Network features* + +```{r} +is_weighted(like_dislike.ig) +is_directed(like_dislike.ig) +is_bipartite(like_dislike.ig) +``` + +The network is directed but neither weighted nor bipartite. + +*Network Attributes* + +```{r} +#display vertex attributes for igraph object +vertex_attr_names(like_dislike.ig) +``` +`Name` is the only vertex attribute.
+ +```{r} +#display edge attributes for igraph object +edge_attr_names(like_dislike.ig) +``` + +There are approximately 47 edge attributes. + +## Dyad and Triad Census + +*Dyad Census* + +```{r} +igraph::dyad.census(like_dislike.ig) +``` + +There is one mutual/ reciprocal dyad, 12 asymmetric dyads, and 42 absent relationships. + +*Triad Census* + +```{r, triad census} +igraph::triad_census(like_dislike.ig) +``` +Here we see most triads are null as shown in the first classification. This result is consistent with the high absent relationships in the dyad census. The second classification with a result of 32 points toward a single directed edge, similar to the asymmetrical dyad relationships seen earlier. Finally in the fifth classification with a result of 22, there is a suggestion of an inward star as the third most frequent triad type. + + +## Degree + +```{r} +#create a data frame with the degree values +like_dislike.nodes<-data.frame( + totdegree=igraph::degree(like_dislike.ig, loops=TRUE), + indegree=igraph::degree(like_dislike.ig, mode="in", loops=TRUE), + outdegree=igraph::degree(like_dislike.ig, mode="out", loops=TRUE)) + +like_dislike.nodes +``` + +Here we see a significant number of relationships between NA, the most popular node in the network. This is likely due to the networks tracing relationships between current and former houses in which 40 of the 46 observations do not include a former house and 8 observations do not include a current house. There is a high prevalence of in-degrees for NA houses compared to out-degrees, 39 to 8 respectively. For out-degrees, both Lannister and then Stark have the most relationships directed out with 11 and 10 out-degrees. Baratheon has the closest in- to out-degree relationships, 4 and 5, suggesting more mutual/ reciprocal contacts. + +## Density + +Compute the density of the network. 
+ +```{r, graph degree} +#get network density: igraph with loops +graph.density(like_dislike.ig, loops=TRUE) + +``` + +The density of the network is 0.38 which is relatively low. This suggests there is less spread of contacts among the group. This supports our findings in the degree statistics, where actors had little contact with each other outside of a few popular egos: Lannister, Stark, Baratheon, and the NA houses. + +## Centrality + +```{r, network centrality} +#identify network centrality using iGraph + +#calculate centralization score of in-degrees +centr_degree(like_dislike.ig, loops=TRUE, mode="in")$centralization + +#calculate centralization score of out-degrees +centr_degree(like_dislike.ig, loops=TRUE, mode="out")$centralization + +``` + +The in-degree centralization score is 3.48 and the out-degree centralization score is 0.68. As suspected, the prevalence of relationships directed to a central node is much higher than those directed outward or among the other nodes. + + +### **Calculate Closeness Centrality** + +```{r, node closeness centrality} +#calculate closeness centrality: igraph +igraph::closeness(like_dislike.ig) + +#add closeness centrality to node measures +like_dislike.nodes$closeness<-igraph::closeness(like_dislike.ig) +``` + +In the results above, there is a high closeness score for NA, Arryn, and Baratheon. +Interestingly, Tully does not have any out-degrees, only in, and has a NaN value for closeness. The other actors have varying levels of closeness, likely depending on whether the actor has a connection outside of NA or multiple loops between NA.
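
The NaN for a node with no outgoing ties can be reproduced on a toy example. The sketch below is a hand-rolled illustration in base R, assuming a three-node network of my own invention (A, B, C) and defining out-closeness as the inverse of the mean distance to reachable nodes. igraph's exact normalization may differ, so only the pattern, not the values, should be compared with the output above.

```r
# Toy directed network (hypothetical): A -> B, B -> C; C sends no ties.
nodes <- c("A", "B", "C")
adj <- matrix(0, 3, 3, dimnames = list(nodes, nodes))
adj["A", "B"] <- 1
adj["B", "C"] <- 1

# All-pairs shortest path lengths via Floyd-Warshall; Inf marks "unreachable".
dist_mat <- ifelse(adj == 1, 1, Inf)
diag(dist_mat) <- 0
for (k in nodes) {
  for (i in nodes) {
    for (j in nodes) {
      dist_mat[i, j] <- min(dist_mat[i, j], dist_mat[i, k] + dist_mat[k, j])
    }
  }
}

# Out-closeness: inverse of the mean distance to the nodes a sender can reach.
out_closeness <- sapply(nodes, function(i) {
  d <- dist_mat[i, setdiff(nodes, i)]
  reachable <- d[is.finite(d)]
  1 / mean(reachable)  # the mean of an empty vector is NaN
})
out_closeness
```

Node C reaches nobody, so there are no distances to average and the score is undefined, which mirrors Tully's position in the like/dislike network.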
+ + +### Betweenness Centrality + +```{r, betweenness} + +#calculate network betweenness centralization +centr_betw(like_dislike.ig,directed=T)$centralization + +#calculate node-level betweenness centrality +igraph::betweenness(like_dislike.ig, directed=TRUE) + +#add betweenness centrality to node measures +like_dislike.nodes$between<-igraph::betweenness(like_dislike.ig, directed=TRUE) +``` + +The network-level betweenness score is 0.054. At the node level, the betweenness score is similarly low among most of the nodes with the exception of NA (5) and Stark (1). Both Tully and Arryn are connected to the remaining network only through Stark, while NA is the most centralized node in the group. + +### Bonacich Power Centrality & Centralization + +```{r} +#calculate bon. power centrality for nodes: igraph +power_centrality(like_dislike.ig) + +#add Bonacich power centrality to node measures +like_dislike.nodes$bonpow<-power_centrality(like_dislike.ig) + +``` + + +### Eigenvector Centrality & Centralization + +```{r} +##calculate eigenvector centrality scores: igraph +temp<-centr_eigen(like_dislike.ig,directed=T) + +#identify names +names(temp) +#length +length(temp$vector) +#first 6 eigenvector scores +head(temp$vector) +#graph level centralization score +temp$centralization + +#add node-level eigenvector centrality ($vector, not the graph-level $centralization) to node measures +like_dislike.nodes$eigen<-temp$vector + +``` + +#### Derived and Reflected Centrality + +```{r} +#create adjacency matrix +ld.matrix<-as.matrix(as_adjacency_matrix(like_dislike.ig)) + +#square the adjacency matrix +ld.matrixsq<-t(ld.matrix) %*% ld.matrix + +#Calculate the proportion of reflected centrality.
+like_dislike.nodes$rc<-diag(ld.matrixsq)/rowSums(ld.matrixsq) +#replace missing values with 0 +like_dislike.nodes$rc<-ifelse(is.nan(like_dislike.nodes$rc),0,like_dislike.nodes$rc) + +#Calculate reflected eigenvalue centrality +like_dislike.nodes$eigen.rc<-like_dislike.nodes$eigen*like_dislike.nodes$rc + +#Calculate the proportion of derived centrality. +like_dislike.nodes$dc<-1-diag(ld.matrixsq)/rowSums(ld.matrixsq) +#replace missing values with 1 +like_dislike.nodes$dc<-ifelse(is.nan(like_dislike.nodes$dc),1,like_dislike.nodes$dc) +#Calculate derived eigenvalue centrality +like_dislike.nodes$eigen.dc<-like_dislike.nodes$eigen*like_dislike.nodes$dc +``` + + +### Dataframe of Centralization Scores + +```{r} +like_dislike.nodes +``` + +When investigating the reflected and derived centrality scores, we notice most actors have the same score for each measure with the exception of Baratheon, Stark, and NA. Baratheon has low reflected centrality (0.19) and high derived centrality (0.81), suggesting it is likely a peripheral actor. Stark has relatively low reflected centrality (0.33) and moderate derived centrality (0.67), denoting a bridge between nodes. NA has high reflected centrality (0.87) and low derived centrality (0.13), meaning it is likely a pure hub between actors. All other nodes score 0 for reflected centrality and 1 for derived centrality, suggesting the remaining nodes are pure bridges. + +### Network Constraint + +```{r, network constraint} +constraint(like_dislike.ig) +``` + +With most nodes, there is significant redundancy. This is likely due to most nodes being directly connected to NA. Tully is unique in that this node has in-degrees with Stark and Arryn but not directly to NA. Both Tyrell and Night's Watch have a constraint score above 1, signifying highly redundant contacts. This may be due to direct connections with NA and one other node that also has a direct connection with NA.
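
A constraint score above 1 can look surprising, so here is a small base-R sketch of Burt's constraint on two hypothetical toy networks: a closed triangle and an open star. The function `burt_constraint()` is my own hand-written helper, not igraph's `constraint()`, and it assumes an unweighted network with no isolated nodes. Its purpose is only to show why a node whose contacts are tied to one another scores above 1.

```r
# Burt's constraint for node i: sum over contacts j of
# (p_ij + sum_q p_iq * p_qj)^2, where p_ij is the share of i's
# total tie strength invested in j. Assumes no isolated nodes.
burt_constraint <- function(adj) {
  sym <- adj + t(adj)           # combine ties in both directions
  diag(sym) <- 0                # ignore self-loops
  p <- sym / rowSums(sym)       # proportional tie strengths
  n <- nrow(adj)
  sapply(seq_len(n), function(i) {
    contacts <- which(sym[i, ] > 0)
    sum(sapply(contacts, function(j) {
      q <- setdiff(seq_len(n), c(i, j))
      (p[i, j] + sum(p[i, q] * p[q, j]))^2
    }))
  })
}

# Closed triad: every contact is also tied to every other contact.
triangle <- matrix(c(0, 1, 1,
                     1, 0, 1,
                     1, 1, 0), nrow = 3, byrow = TRUE)
burt_constraint(triangle)

# Open star: the centre's contacts are not tied to one another.
star <- matrix(0, 4, 4)
star[1, 2:4] <- 1
star[2:4, 1] <- 1
burt_constraint(star)
```

In the triangle every node has two contacts who also know each other, and each node's score comes out above 1. In the star, the centre spans three structural holes and scores far lower than its single-contact leaves. That pattern is consistent with the reading above that Tyrell and Night's Watch sit in a closed triad with NA.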
+ +```{r, warning=FALSE} +#plot distribution of centralization scores +like_dislike.nodes%>% + gather() %>% + ggplot(aes(value)) + + geom_histogram() + + facet_wrap(~key, scales = "free") + + ggtitle("Game of Thrones Book Like/Dislikes") +``` + +## Random Network Comparison + +```{r, random network} +#create vertices and edges variables based on like_dislike network +vertices<-11 +edges <- 46 + +#create a random network with the same number of nodes and edges +random_network<-sample_gnm(n=vertices,m=edges, directed = TRUE, loops = TRUE) + +``` + +First, let's plot the random graph. + +```{r, plot random network} +plot(random_network) +``` + +Upon initial review, the random network with the same number of edges (46) and vertices (11) appears more distributed than the like_dislike network, which displayed a high concentration toward a central node. We also see a higher prevalence of in-degree relationships than out-degrees suggesting a hierarchy in the relationships. + +### Comparing Centralization Scores in a Random Network + +```{r, centralization statistics in random network} +#create a data frame with the degree values +random_network.nodes<-data.frame( + totdegree=igraph::degree(random_network, loops=TRUE), + indegree=igraph::degree(random_network, mode="in", loops=TRUE), + outdegree=igraph::degree(random_network, mode="out", loops=TRUE), + closeness=igraph::closeness(random_network), + betweenness=igraph::betweenness(random_network, directed=TRUE), + eigen= igraph::centr_eigen(random_network)$centralization, + bonpow= igraph::power_centrality(random_network), + constraint=igraph::constraint(random_network)) + +random_network.nodes +``` + +Overall in the random network, there is a more even distribution of values in the closeness and constraint scores. The closeness scores are still low (under 0.1), suggesting actors are not centralized at all.
Variability within the betweenness scores points towards multiple centralized hubs rather than only the two identified in the Game of Thrones like_dislike network. The network Eigenvector score of 0.45 denotes that nodes are moderately connected to other central nodes. This is a stark difference from the Game of Thrones network, where the Eigenvector score was 0.94, highlighting the highly centralized nature of that network. \ No newline at end of file diff --git a/posts/Week7_Interpretative_Assignment_kmuhammad.qmd b/posts/Week7_Interpretative_Assignment_kmuhammad.qmd new file mode 100644 index 0000000..6f6ad71 --- /dev/null +++ b/posts/Week7_Interpretative_Assignment_kmuhammad.qmd @@ -0,0 +1,306 @@ +--- +title: "Week 7 Interpretative Assignment" +author: "Kalimah Muhammad" +description: "Interpretative Assignment: Community Detection" +date: "05/13/2023" +format: + html: + toc: true + code-fold: true + code-copy: true + code-tools: true +# editor: visual +categories: + - Week7_assignment +--- + +```{r} +#| label: setup +#| include: false +``` + +```{r, warning=FALSE, include=FALSE} +library(igraph) +library(statnet) +library(readxl) +library(network) +library(ggplot2) +library(tidyverse) +``` + +## Assignment Overview + +Briefly describe the dataset you are using: identify initial network format, describe and identify the nodes (including how many nodes are in the dataset), what constitutes a tie or edge (including how many ties, whether ties are directed/undirected and weighted/binary, and how to interpret the value of the tie if any), whether or not there are edge attributes that might be used to subset data or stack multiple networks (e.g., tie type, year, etc). The goal should be interpretation of the data, not simply reporting results. + +Calculate community clusters using various algorithms in the attached syntax. Which communities make sense, and why? Do some algorithms assign nodes to the “wrong” communities.
How does changing the number of expected clusters affect community membership? Any other comments or observations? Do we observe the type of behavior we would expect, given community assignment? + +## Describe the Network Data + +```{r, results='hide'} +#read in data sets +#load consulting firm advice edgelist +cn_advice<- read_xlsx("_data/Consulting_Advice_Network.xlsx") + +#load research and development firm advice edgelist +rd_advice <- read_xlsx("_data/R&D_Advice.xlsx") +``` + +In the book, The Hidden Power of Social Networks: Understanding How Work Really Gets Done in Organizations, Rob Cross and Andrew Parker conduct social network analyses of 60 organizations around the world. Cross and Parker suggest that managers do not understand how their employees get work done and reveal that hidden social networks affect an organization's performance.[1] + +For this assignment I will focus on one of the two data sets, the consulting firm. The data was collected from a survey question and then compiled into edge lists. You can find the source data and further details in the References section.[2] + +For the consulting firm, participants were asked, "Please indicate how often you have turned to this person for information or advice on work-related topics in the past three months." Options= 0: I Do Not Know This Person; 1: Never; 2: Seldom; 3: Sometimes; 4: Often; and 5:Very Often. + +This project analyzes the network to investigate trends in the frequency and concentration of advice exchanged. In the edge list, the source node is listed as "From", the target node as "To", and an ordinal variable for frequency of advice as "Value."
+ +```{r, warning=FALSE, results='hide'} +#create igraph object for consulting firm +cn_advice.ig <- graph_from_data_frame(cn_advice) + +#create statnet object for consulting firm +cn_advice.stat <- as.network(cn_advice, loops = TRUE, multiple = TRUE) + +``` + +### Network Properties + +```{r, consulting network properties} +#summarize consulting network attributes +print(cn_advice.stat) + +#check if network is weighted +is_weighted(cn_advice.ig) +``` + +The consulting firm includes 879 edges/ties representing a connection between nodes and 46 nodes/vertices representing individual employees. The ties are directed based on who received advice from whom. The network is neither bipartite nor weighted. + +### Plot Networks + +**Consulting Firm** + +```{r, plot consulting network} +plot(cn_advice.ig, edge.arrow.size = 1) +``` + +At first glance, there are three nodes far from the other nodes but connected (15, 24 and 30). These nodes appear to have ties directed towards them but little to no ties directed outward. This would suggest these nodes receive advice requests but do not solicit advice from others. Overall, the network appears fairly dense and connected initially. + +### Network Structure + +```{r} + +#Dyad census, triad census +#Classify all dyads in the network: statnet +sna::dyad.census(cn_advice.stat) + +#Classify all triads in the network: statnet +sna::triad.census(cn_advice.stat) +``` + +In this network, 47% of the 1,035 possible dyads are null/absent (485). This is also reflected in the high number of unconnected triples in the triad census. There are 327 mutual (reciprocal) dyads, followed by 223 asymmetric dyads. These findings may point towards a concentration of information within a subset of the group that is shared in a hierarchical manner.
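
As a quick arithmetic check on the dyad census, the sketch below expresses the three counts quoted above (327 mutual, 223 asymmetric, 485 null) as shares of all possible dyads among 46 nodes, rather than as shares of edge-list rows. The variable names are mine, and the comment on the gap to 879 rows is an assumption, not something I verified in the data.

```r
# Dyad census counts reported above for the consulting network.
mutual <- 327
asymmetric <- 223
null_dyads <- 485
n_nodes <- 46

# An n-node network has n * (n - 1) / 2 unordered pairs (dyads).
total_dyads <- choose(n_nodes, 2)
total_dyads == mutual + asymmetric + null_dyads

# Share of each dyad type among all possible dyads.
dyad_shares <- c(mutual = mutual, asymmetric = asymmetric, null = null_dyads) / total_dyads
round(dyad_shares, 2)

# Directed ties implied by the census: two per mutual dyad, one per asymmetric dyad.
# Assumption: any gap to the 879 edge-list rows would come from self-loops
# or duplicated rows, which a dyad census does not count.
implied_ties <- 2 * mutual + asymmetric
implied_ties
```

Read this way, mutual dyads make up roughly a third of all pairs, which is a high level of reciprocity for an advice network of this size.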
+ +```{r} +#get number of components +igraph::components(cn_advice.ig)$no + +#get size of each component +igraph::components(cn_advice.ig)$csize + +#get network density with loops: igraph +graph.density(cn_advice.ig, loops=TRUE) +``` + +There is one giant component containing all 46 nodes. The network density is also moderately low (0.42). This usually means there is less opportunity for information to spread and is likely a contributor to the skewed degree distribution and influence in the network. + +```{r} +#get global clustering coefficient: igraph +transitivity(cn_advice.ig, type="global") + +#get average local clustering coefficient: igraph +transitivity(cn_advice.ig, type="average") + +summary(E(cn_advice.ig)$value) +``` + +The average local transitivity (0.80) is higher than the overall network transitivity (0.72) suggesting subgroups are more connected to each other than the group is to the whole organization. + +### Network Degree and Centrality + +```{r} +cn_advice.nodes<-data.frame(name=cn_advice.stat%v%"vertex.names", + degree=sna::degree(cn_advice.stat,gmode="digraph"), + degree.wt=strength(cn_advice.ig), + betweenness=sna::betweenness(cn_advice.stat, gmode="digraph"), + close=sna::closeness(cn_advice.stat, cmode="suminvdir"), + constraint=constraint(cn_advice.ig) + ) + +DT::datatable(cn_advice.nodes) +``` + +Node #20 stands out with the highest total degree (65), in-degree (32), out-degree (32), degree weight (65), betweenness (97.92), closeness (0.87), and constraint (0.10). This would suggest that Node #20 has the highest efficiency or popularity based on the total degree and that this node's connections are fairly mutual with equal in-/out-degrees. The high betweenness score shows this node may be a gatekeeper or broker, and the low constraint score shows minimal redundant information between ties. Based on this information, it's expected that Node #20 would be a central player within this network.
+ +Node #12 has the highest betweenness score (138.02) and one of the highest closeness metrics (1.0). This person may be a key gatekeeper within the network. + +Node #30 had the lowest metrics with total degree (4), in-degree (3), out-degree (1), degree weight (4), betweenness (0), closeness (0.02), and constraint (0.12). This may be due to the node being fairly isolated in the role. + +Note that nodes #12 and #16 had a connection with every other node in the network. + +## Community Clustering + +Calculate community clusters using various algorithms. + +### Fast and Greedy Algorithm +```{r, CN fast and greedy} +#Run clustering algorithm: fast_greedy +cn.fg<-cluster_fast_greedy(cn_advice.ig) + +#Inspect clustering object +names(cn.fg) +cn.fg + +#retrieve list of nodes in communities +igraph::groups(cn.fg) +``` + +The Fast and Greedy algorithm only works on undirected graphs and thus is not applicable to this directed network. + +### Walktrap Community Detection + +```{r} +#Run clustering algorithm: walktrap +cn.wt<-walktrap.community(cn_advice.ig) +#Inspect community membership +igraph::groups(cn.wt) + +#Run & inspect clustering algorithm: 10 steps +igraph::groups(walktrap.community(cn_advice.ig, steps=10)) +#Run & inspect clustering algorithm: 20 steps +igraph::groups(walktrap.community(cn_advice.ig ,steps=20)) +#Run & inspect clustering algorithm: 100 steps +igraph::groups(walktrap.community(cn_advice.ig, steps=100)) +``` + +With the default Walktrap settings and with 10 and 20 steps, two communities were detected with the same node membership. With 100 steps, five communities emerged. + +```{r, CN Walktrap} +#plot network with community coloring +plot(cn.wt,cn_advice.ig, edge.arrow.size = 0.5, edge.arrow.width = 0.5, arrow.mode=3, main="Walktrap Community Detection") +``` +Here we find two distinct communities, circled in red and blue. Many ties appear within the same community color, however, there are also numerous ties between the two communities.
This distinction may signal a hierarchy of communication within the department between similarly positioned employees as well as between management and non-management. + +**Modularity Score Walktrap Algorithm** + +```{r} +#collect modularity scores to compare +mods<-c(walktrap=modularity(cn.wt)) +mods + +``` + + +### Leading Label Propagation Community Detection + +**Consulting Firm** + +```{r} +#Leading label propagation +cn.lab<-label.propagation.community(cn_advice.ig) + +igraph::groups(cn.lab) + +cn_advice.nodes$comm.lab<-cn.lab$membership + +plot(cn.lab,cn_advice.ig, edge.arrow.size = 0.5, edge.arrow.width = 0.5, arrow.mode=3, main="Leading Label Propagation Community Detection") + +mods<-c(mods, label=modularity(cn.lab)) + +mods +``` + +Here we see one community detected. This algorithm is best suited for weighted edges and may not be appropriate for this data set. + +### Edge Betweenness Community Detection +```{r} +#edge betweenness community detection +cn.edge<-edge.betweenness.community(cn_advice.ig) +igraph::groups(cn.edge) + +cn_advice.nodes$cn.edge<-cn.edge$membership + +plot(cn.edge,cn_advice.ig, edge.arrow.size = 0.5, edge.arrow.width = 0.5, arrow.mode=3, main="Edge Betweenness Community Detection") + +mods<-c(mods, edge=modularity(cn.edge)) + +mods +``` + +Seventeen communities were detected using the edge betweenness algorithm. Here communities 3-17 each include only one node. + +### Eigenvector Community Detection +```{r} +#consulting firm eigen community detection +cn.eigen<-leading.eigenvector.community(cn_advice.ig) + +igraph::groups(cn.eigen) + +cn_advice.nodes$cn.eigen<-cn.eigen$membership + +plot(cn.eigen,cn_advice.ig, edge.arrow.size = 0.5, edge.arrow.width = 0.5, arrow.mode=3, main="Eigenvector Community Detection") + +mods<-c(mods, eigen=modularity(cn.eigen)) + +mods + +``` +The Eigenvector community detection found two communities similar to the Walktrap clustering.
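
Since the modularity scores collected in `mods` are what I use to compare the algorithms, it may help to see what the number measures. The sketch below computes modularity by hand in base R for a hypothetical toy network (two triangles joined by one bridge). It uses the undirected, unweighted definition, so it is an illustration of the idea rather than a reproduction of what `modularity()` returns for the directed consulting network. The helper name `modularity_by_hand()` is my own.

```r
# Modularity of a partition for an undirected, unweighted network:
# Q = sum over communities of (internal edges / m) - (community degree / (2 * m))^2
modularity_by_hand <- function(adj, membership) {
  m <- sum(adj) / 2                       # number of undirected edges
  degree <- rowSums(adj)
  sum(sapply(unique(membership), function(g) {
    members <- membership == g
    internal <- sum(adj[members, members]) / 2
    internal / m - (sum(degree[members]) / (2 * m))^2
  }))
}

# Hypothetical toy network: two triangles joined by a single bridge (3-4).
edges <- rbind(c(1, 2), c(1, 3), c(2, 3),
               c(4, 5), c(4, 6), c(5, 6),
               c(3, 4))
adj <- matrix(0, 6, 6)
adj[edges] <- 1
adj[edges[, 2:1]] <- 1                    # mirror each edge to keep adj symmetric

two_groups <- c(1, 1, 1, 2, 2, 2)         # one community per triangle
one_group  <- rep(1, 6)                   # everything in a single community

modularity_by_hand(adj, two_groups)
modularity_by_hand(adj, one_group)
```

Putting every node in a single community always gives a modularity of zero under this definition, which is one way to read the single-community label propagation result above: it carries no information about group structure.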
+ +### Spinglass Community Detection + +```{r} +giant.component <- function(graph) { + cl <- clusters(graph) + induced.subgraph(graph, which(cl$membership == which.max(cl$csize))) +} +``` + + +```{r} +#extract giant component +cn.giant<-giant.component(cn_advice.ig) +``` + + +```{r} +#consulting firm spinglass community detection +#extract giant component +cn.giant<-giant.component(cn_advice.ig) + +cn.spin<-spinglass.community(cn.giant) + +igraph::groups(cn.spin) + +cn_advice.nodes$cn.spin[which(cn_advice.nodes$name%in%V(cn.giant)$name)]<-cn.spin$membership + +plot(cn.spin,cn_advice.ig, edge.arrow.size = 0.5, edge.arrow.width = 0.5, arrow.mode=3, main="Spinglass Community Detection") + +mods<-c(mods, spin=modularity(cn.spin)) + +mods +``` + +The Spinglass algorithm detected two communities similar to the Walktrap and Eigenvector algorithms, but the modularity score is slightly higher (0.24). + +## Conclusion + +**Which communities make sense, and why?** +The Walktrap, Eigenvector, and Spinglass algorithms make the most sense. They divide the network into two distinct communities. + +**Do some algorithms assign nodes to the "wrong" communities?** +The Edge Betweenness algorithm appeared to incorrectly detect single nodes as communities, and the Leading Label Propagation only identified one community. + +**Do we observe the type of behavior we would expect, given community assignment?** +Yes. It makes sense that two communities would have the most ties among themselves and that there would be a subgroup that communicates with the other community. + +## References +[1] Cross, R., Parker, A., 2004. The Hidden Power of Social Networks. Harvard Business School Press, Boston, MA.
+[2]Data Set: diff --git a/posts/_data/Consulting_Advice_Network.xlsx b/posts/_data/Consulting_Advice_Network.xlsx new file mode 100644 index 0000000..8bb43b1 Binary files /dev/null and b/posts/_data/Consulting_Advice_Network.xlsx differ diff --git a/posts/_data/R&D_Advice.xlsx b/posts/_data/R&D_Advice.xlsx new file mode 100644 index 0000000..4f53586 Binary files /dev/null and b/posts/_data/R&D_Advice.xlsx differ