diff --git a/DESCRIPTION b/DESCRIPTION index 914302b..531c7fe 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,31 +1,23 @@ Package: networkflow Title: Functions For A Workflow To Manipulate Networks -Version: 0.1.0 +Version: 1.0.0 Date: 2022-11-25 Authors@R: c( person("Aurélien", "Goutsmedt", , "agoutsmedt@gmail.com", role = c("cre", "aut"), comment = c(ORCID = "0000-0002-3788-7237")), person("Alexandre", "Truc", , "alexandre.truc77@gmail.com", role = c("aut"), - comment = c(ORCID = "0000-0002-1328-7819")) + comment = c(ORCID = "0000-0002-1328-7819")), + person("Thomas", "Delcey", role = c("aut"), comment = c(ORCID = "0000-0003-0546-1474")) ) -Author: Aurélien Goutsmedt and Alexandre Truc. +Author: Aurelien Goutsmedt, Alexandre Truc and Thomas Delcey. Maintainer: Aurélien Goutsmedt -Description: This package proposes a series of function to make it easier - and quicker to work on networks. It mainly targets working on - bibliometric networks (see the - [biblionetwork](https://github.com/agoutsmedt/biblionetwork) package - for creating such networks). This package heavily relies on - [igraph](https://igraph.org/r/) and - [tidygraph](https://tidygraph.data-imaginist.com/index.html), and aims - at producing ready-made networks for projecting them using - [ggraph](https://ggraph.data-imaginist.com/). This package does not - invent nothing new, properly speaking, but it allows the users to - follow more quickly and easily the main steps of network manipulation, - from creating the graph to projecting it. It is inspired by what could - be done with [GEPHI](https://gephi.org/): the package allows the use - of the Leiden community detection algorithm, as well as of the Force - Atlas 2 layout, both being unavailable in igraph (and so in - tidygraph). +Description: Provides a workflow to build, analyze, and visualize projected + networks from tabular data. 
The package supports dynamic analysis across + time windows, including cluster detection and cross-period cluster matching. + It also covers network construction, interpretation, static plotting, and + interactive exploration through a 'shiny' app. Although designed for + projected networks (e.g., article -> reference), it can be used more + generally with 'tbl_graph' objects. License: MIT + file LICENSE URL: https://github.com/agoutsmedt/networkflow, https://agoutsmedt.github.io/networkflow/ @@ -71,6 +63,4 @@ Remotes: Encoding: UTF-8 LazyData: true Roxygen: list(markdown = TRUE) -RoxygenNote: 7.3.2 - - +RoxygenNote: 7.3.3 diff --git a/NAMESPACE b/NAMESPACE index 42329d3..f90e0ac 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -4,7 +4,6 @@ export("%>%") export(add_clusters) export(add_node_roles) export(build_dynamic_networks) -export(build_dynamic_networks2) export(build_network) export(color_alluvial) export(color_networks) diff --git a/NEWS.md b/NEWS.md index a15a654..a93f07f 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,4 +1,6 @@ -# networkflow 0.1.0 (Development) +# networkflow 1.0.0 + +First stable release. Deprecated and new functions: diff --git a/R/Authors_stagflation.R b/R/Authors_stagflation.R index d4f76e8..ac0f93a 100644 --- a/R/Authors_stagflation.R +++ b/R/Authors_stagflation.R @@ -4,11 +4,11 @@ #' the US stagflation and their authors (`Nodes_stagflation` just takes the first author; #' here is the complete list of authors per document). #' -#' @format A data frame with 558 rows and 7 variables: +#' @format A data frame with 231 rows and 3 variables: #' \describe{ -#' \item{ItemID_Ref}{Identifier of the document published by the author} -#' \item{Author}{Author of the document} -#' \item{Order}{Use this as a label for nodes} +#' \item{source_id}{Identifier of the document published by the author} +#' \item{author_name}{Author of the document} +#' \item{author_order}{Author order in the document author list} #' } #' @source Goutsmedt A. 
(2020) “From Stagflation to the Great Inflation: Explaining the 1970s US Economic #' Situation”. Revue d’Economie Politique, Forthcoming 2021. diff --git a/R/Edges_coupling.R b/R/Edges_coupling.R index fbeab5c..48ec4ae 100644 --- a/R/Edges_coupling.R +++ b/R/Edges_coupling.R @@ -1,10 +1,10 @@ #' Edges For Bibliographic Coupling Network Of Articles and Books Explaining the 1970s US Stagflation. #' #' A dataset containing the edges of the bibliographic coupling network of articles and books on stagflation. -#' Built by using [Ref_stagflation]: `biblionetwork::biblio_coupling(Ref_stagflation,"Citing_ItemID_Ref","ItemID_Ref")`. +#' Built by using [Ref_stagflation]: `biblionetwork::biblio_coupling(Ref_stagflation,"source_id","target_id")`. #' Could be used with [Nodes_coupling] to create a network with tidygraph. #' -#' @format A data frame with 154 rows and 6 variables: +#' @format A data frame with 2593 rows and 5 variables: #' \describe{ #' \item{from}{Identifier of the Source document on stagflation, in character format} #' \item{to}{Identifier of the Target document on stagflation, in character format} diff --git a/R/Nodes_coupling.R b/R/Nodes_coupling.R index ae4588e..16b3c6e 100644 --- a/R/Nodes_coupling.R +++ b/R/Nodes_coupling.R @@ -7,12 +7,12 @@ #' #' @format A data frame with 154 rows and 6 variables: #' \describe{ -#' \item{ItemID_Ref}{Identifier of the document on stagflation, in character format} -#' \item{Author}{Author of the document on stagflation} -#' \item{Author_date}{Use this as a label for nodes} -#' \item{Year}{Year of publication of the document} -#' \item{Title}{Title of the document} -#' \item{Journal}{Journal of publication of the document (if an article)} +#' \item{source_id}{Identifier of the document on stagflation, in character format} +#' \item{source_author}{Author of the document on stagflation} +#' \item{source_label}{Use this as a label for nodes} +#' \item{source_year}{Year of publication of the document} +#' 
\item{source_title}{Title of the document} +#' \item{source_journal}{Journal of publication of the document (if an article)} #' } #' @source Created from `Nodes_stagflation.rda` diff --git a/R/Nodes_stagflation.R b/R/Nodes_stagflation.R index 7e34f65..da86937 100644 --- a/R/Nodes_stagflation.R +++ b/R/Nodes_stagflation.R @@ -4,15 +4,15 @@ #' what happened in the US economy in the 1970s, as well as all the articles and books #' cited at least twice by the first set of articles and books (on the stagflation). #' -#' @format A data frame with 558 rows and 7 variables: +#' @format A data frame with 654 rows and 7 variables: #' \describe{ -#' \item{ItemID_Ref}{Identifier of the document} -#' \item{Author}{Author of the document} -#' \item{Author_date}{Use this as a label for nodes} -#' \item{Year}{Year of publication of the document} -#' \item{Title}{Title of the document} -#' \item{Journal}{Journal of publication of the document (if an article)} -#' \item{Type}{If "Stagflation", the document is listed as an explanation of the US stagflation. +#' \item{source_id}{Identifier of the document} +#' \item{source_author}{Author of the document} +#' \item{source_label}{Use this as a label for nodes} +#' \item{source_year}{Year of publication of the document} +#' \item{source_title}{Title of the document} +#' \item{source_journal}{Journal of publication of the document (if an article)} +#' \item{source_type}{If "Stagflation", the document is listed as an explanation of the US stagflation. #' If "Non-Stagflation", the document is cited by a document explaining the stagflation} #' } #' @source Goutsmedt A. 
(2020) “From Stagflation to the Great Inflation: Explaining the 1970s US Economic diff --git a/R/Ref_stagflation.R b/R/Ref_stagflation.R index 555a272..82e342d 100644 --- a/R/Ref_stagflation.R +++ b/R/Ref_stagflation.R @@ -6,12 +6,12 @@ #' #' @format A data frame with 4416 rows and 6 variables: #' \describe{ -#' \item{Citing_ItemID_Ref}{Identifier of the citing document} -#' \item{ItemID_Ref}{Identifier of the cited document} -#' \item{Author}{Author of the cited document} -#' \item{Year}{Year of publication of the cited document} -#' \item{Title}{Title of the cited document} -#' \item{Journal}{Journal of publication of the cited document (if an article)} +#' \item{source_id}{Identifier of the citing document} +#' \item{target_id}{Identifier of the cited document} +#' \item{target_author}{Author of the cited document} +#' \item{target_year}{Year of publication of the cited document} +#' \item{target_title}{Title of the cited document} +#' \item{target_journal}{Journal of publication of the cited document (if an article)} #' } #' @source Goutsmedt A. (2020) “From Stagflation to the Great Inflation: Explaining the 1970s US Economic #' Situation”. Revue d’Economie Politique, Forthcoming 2021. 
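
A quick sketch (not part of the patch) tying together the renamed dataset columns documented above. It rebuilds the `Edges_coupling` data from `Ref_stagflation` exactly as the updated roxygen states (`biblionetwork::biblio_coupling(Ref_stagflation, "source_id", "target_id")`), then combines it with `Nodes_coupling` into a tidygraph object, as the `Edges_coupling` docs suggest. The `tbl_graph()` call is an assumption about how the pieces fit, not code from the package:

```r
library(biblionetwork)
library(tidygraph)

# Bibliographic coupling edges from the renamed identifier columns
# (source/ref are the argument names used by biblionetwork functions).
edges <- biblionetwork::biblio_coupling(
  Ref_stagflation,
  source = "source_id",
  ref = "target_id"
)

# Assemble a tbl_graph; node_key tells tidygraph which node column the
# from/to endpoints of `edges` refer to. Illustrative, not from the package.
graph <- tidygraph::tbl_graph(
  nodes = Nodes_coupling,
  edges = edges,
  directed = FALSE,
  node_key = "source_id"
)
```
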
diff --git a/R/add_clusters.R b/R/add_clusters.R index 0d473cb..8334f2d 100644 --- a/R/add_clusters.R +++ b/R/add_clusters.R @@ -1,16 +1,23 @@ -add_clusters <- function(graphs, - weights = NULL, - clustering_method = c("leiden", "louvain", "fast_greedy", "infomap", "walktrap"), - objective_function = c("modularity", "CPM"), #leiden - resolution = 1, #leiden - n_iterations = 1000, #leiden - n_groups = NULL, #fast_greedy & walktrap - node_weights = NULL, #infomap & Leiden - trials = 10, #infomap - steps = 4, #walktrap - verbose = TRUE, - seed = NA - ){ +add_clusters <- function( + graphs, + weights = NULL, + clustering_method = c( + "leiden", + "louvain", + "fast_greedy", + "infomap", + "walktrap" + ), + objective_function = c("modularity", "CPM"), #leiden + resolution = 1, #leiden + n_iterations = 1000, #leiden + n_groups = NULL, #fast_greedy & walktrap + node_weights = NULL, #infomap & Leiden + trials = 10, #infomap + steps = 4, #walktrap + verbose = TRUE, + seed = NA +) { #' Detect and Add Clusters to Graphs #' #' @description @@ -93,7 +100,7 @@ add_clusters <- function(graphs, #' for the edges, called `cluster_leiden_from`, `cluster_leiden_to` and `cluster_leiden`. #' @details The function also #' automatically calculates the percentage of total nodes that are gathered in each - #' cluster, in the column `size_com`. + #' cluster, in the column `size_cluster_{clustering_method}`. #' @details To make plotting easier later, a zero is put before one-digit cluster identifier #' (cluster 5 becomes "05"; cluster 10 becomes "10"). 
Attributing a cluster identifier to edges #' allow for giving edges the same color of the nodes they are connecting together if the two nodes have the same color, @@ -106,18 +113,16 @@ add_clusters <- function(graphs, #' @examples #' library(networkflow) #' - #' nodes <- Nodes_stagflation |> - #' dplyr::rename(ID_Art = ItemID_Ref) |> - #' dplyr::filter(Type == "Stagflation") + #' nodes <- networkflow::Nodes_stagflation |> + #' dplyr::filter(source_type == "Stagflation") #' - #' references <- Ref_stagflation |> - #' dplyr::rename(ID_Art = Citing_ItemID_Ref) + #' references <- networkflow::Ref_stagflation #' #' temporal_networks <- build_dynamic_networks(nodes = nodes, #' directed_edges = references, - #' source_id = "ID_Art", - #' target_id = "ItemID_Ref", - #' time_variable = "Year", + #' source_id = "source_id", + #' target_id = "target_id", + #' time_variable = "source_year", #' cooccurrence_method = "coupling_similarity", #' time_window = 20, #' edges_threshold = 1, @@ -137,92 +142,116 @@ add_clusters <- function(graphs, #' @export #' - if(length(clustering_method) > 1){ - cli::cli_abort(c("You did not choose any clustering method! You have the choice between: ", - "*" = "\"leiden\";", - "*" = "\"louvain\";", - "*" = "\"fast_greedy\";", - "*" = "\"infomap\";", - "*" = "\"walktrap\".")) + if (length(clustering_method) > 1) { + cli::cli_abort(c( + "You did not choose any clustering method! You have the choice between: ", + "*" = "\"leiden\";", + "*" = "\"louvain\";", + "*" = "\"fast_greedy\";", + "*" = "\"infomap\";", + "*" = "\"walktrap\"." + )) } - if(! clustering_method %in% c("leiden", "louvain", "fast_greedy", "infomap", "walktrap")){ - cli::cli_abort("The method you have chosen is not implemented within the function.") + if ( + !clustering_method %in% + c("leiden", "louvain", "fast_greedy", "infomap", "walktrap") + ) { + cli::cli_abort( + "The method you have chosen is not implemented within the function." 
+    )
   }

-  if(length(objective_function) > 1 & clustering_method == "leiden"){
-    cli::cli_abort(c("You did not choose any objective function for the Leiden algorithm. You have the choice between: ",
-                     "*" = "\"CPM\";",
-                     "*" = "\"modularity\"."))
-    if(clustering_method %in% c("leiden", "louvain", "fast_greedy", "infomap", "walktrap")){
-      cli::cli_alert_info("You are using the {.emph {.strong {clustering_method}}} clustering method.")
+  if (length(objective_function) > 1 & clustering_method == "leiden") {
+    cli::cli_abort(c(
+      "You did not choose any objective function for the Leiden algorithm. You have the choice between: ",
+      "*" = "\"CPM\";",
+      "*" = "\"modularity\"."
+    ))
+  }
+
+  # Report the clustering method in use; this runs only once the method has
+  # been validated, rather than sitting unreachable after cli_abort().
+  if (
+    clustering_method %in%
+      c("leiden", "louvain", "fast_greedy", "infomap", "walktrap")
+  ) {
+    cli::cli_alert_info(
+      "You are using the {.emph {.strong {clustering_method}}} clustering method."
+    )
   }
-  }
-
-  if(!is.na(seed)){
+  if (!is.na(seed)) {
    set.seed(seed)
  }

-  if(inherits(graphs, "list")){
+  if (inherits(graphs, "list")) {
    list <- TRUE
-    cluster_list_graph <- lapply(graphs, function(graph) detect_cluster(graph,
-                                                                        weights = weights,
-                                                                        clustering_method = clustering_method,
-                                                                        objective_function = objective_function,
-                                                                        resolution = resolution,
-                                                                        n_iterations = n_iterations,
-                                                                        n_groups = n_groups,
-                                                                        node_weights = node_weights,
-                                                                        trials = trials,
-                                                                        steps = steps,
-                                                                        list = list,
-                                                                        verbose = verbose))
+    cluster_list_graph <- lapply(graphs, function(graph) {
+      detect_cluster(
+        graph,
+        weights = weights,
+        clustering_method = clustering_method,
+        objective_function = objective_function,
+        resolution = resolution,
+        n_iterations = n_iterations,
+        n_groups = n_groups,
+        node_weights = node_weights,
+        trials = trials,
+        steps = steps,
+        list = list,
+        verbose = verbose
+      )
+    })
    return(cluster_list_graph)
  }

-  if(inherits(graphs, "tbl_graph")){
+  if (inherits(graphs, "tbl_graph")) {
    list <- FALSE
-    cluster_graph <- detect_cluster(graphs,
-                                    weights = weights,
-                                    clustering_method =
clustering_method, - objective_function = objective_function, - resolution = resolution, - n_iterations = n_iterations, - n_groups = n_groups, - node_weights = node_weights, - trials = trials, - steps = steps, - list = list, - verbose = verbose) + cluster_graph <- detect_cluster( + graphs, + weights = weights, + clustering_method = clustering_method, + objective_function = objective_function, + resolution = resolution, + n_iterations = n_iterations, + n_groups = n_groups, + node_weights = node_weights, + trials = trials, + steps = steps, + list = list, + verbose = verbose + ) return(cluster_graph) } } # function in the tidygraph style to import Leiden community detection -group_leiden <- function(graph = graph, - objective_function = objective_function, - weights = weights, - resolution = resolution, - n_iterations = n_iterations, - node_weights = node_weights){ - igraph::cluster_leiden(graph, - resolution_parameter = resolution, - objective_function = objective_function, - weights = weights, - n_iterations = n_iterations, - vertex_weights = node_weights) %>% +group_leiden <- function( + graph = graph, + objective_function = objective_function, + weights = weights, + resolution = resolution, + n_iterations = n_iterations, + node_weights = node_weights +) { + igraph::cluster_leiden( + graph, + resolution_parameter = resolution, + objective_function = objective_function, + weights = weights, + n_iterations = n_iterations, + vertex_weights = node_weights + ) %>% igraph::membership() } # extracting the appropriate clustering function depending on the method chosen -extract_clustering_method <- function(clustering_method = clustering_method){ - . <- objective_function <- functions <- n_groups <- weights <- resolution <- n_iterations <- node_weights <- trials <- steps <- method <- graph <- NULL +extract_clustering_method <- function(clustering_method = clustering_method) { + . 
<- objective_function <- functions <- n_groups <- weights <- resolution <- n_iterations <- node_weights <- trials <- steps <- method <- graph <- NULL function_table <- dplyr::tribble( - ~ method, ~functions, - "leiden", rlang::expr(group_leiden(graph, objective_function = objective_function, weights = weights, resolution = resolution, n_iterations = n_iterations, node_weights = node_weights)), - "louvain", rlang::expr(tidygraph::group_louvain(weights = weights)), - "fast_greedy", rlang::expr(tidygraph::group_fast_greedy(weights = weights, n_groups = n_groups)), - "infomap", rlang::expr(tidygraph::group_infomap(weights = weights, node_weights = node_weights, trials = trials)), - "walktrap", rlang::expr(tidygraph::group_walktrap(weights = weights, steps = steps, n_groups = n_groups))) + ~method , ~functions , + "leiden" , rlang::expr(group_leiden(graph, objective_function = objective_function, weights = weights, resolution = resolution, n_iterations = n_iterations, node_weights = node_weights)) , + "louvain" , rlang::expr(tidygraph::group_louvain(weights = weights)) , + "fast_greedy" , rlang::expr(tidygraph::group_fast_greedy(weights = weights, n_groups = n_groups)) , + "infomap" , rlang::expr(tidygraph::group_infomap(weights = weights, node_weights = node_weights, trials = trials)) , + "walktrap" , rlang::expr(tidygraph::group_walktrap(weights = weights, steps = steps, n_groups = n_groups)) + ) fun <- function_table %>% dplyr::filter(method == clustering_method) %>% dplyr::pull(functions) %>% @@ -232,21 +261,23 @@ extract_clustering_method <- function(clustering_method = clustering_method){ } # function to detect the clusters on one graph -detect_cluster <- function(graph, - weights = weights, - clustering_method = clustering_method, - objective_function = objective_function, - resolution = resolution, - n_iterations = n_iterations, - n_groups = n_groups, - node_weights = node_weights, - trials = trials, - steps = steps, - list = list, - verbose = verbose){ 
+detect_cluster <- function( + graph, + weights = weights, + clustering_method = clustering_method, + objective_function = objective_function, + resolution = resolution, + n_iterations = n_iterations, + n_groups = n_groups, + node_weights = node_weights, + trials = trials, + steps = steps, + list = list, + verbose = verbose +) { . <- from <- to <- NULL - if(clustering_method %in% c("infomap", "leiden") & !is.null(node_weights)){ + if (clustering_method %in% c("infomap", "leiden") & !is.null(node_weights)) { node_weights <- graph %N>% dplyr::pull(node_weights) } @@ -258,18 +289,27 @@ detect_cluster <- function(graph, size_col <- paste0("size_cluster_", clustering_method) graph <- graph %N>% - dplyr::mutate({{ cluster_col }} := eval(fun), - {{ cluster_col }} := sprintf("%02d", eval(cluster_col)), - {{ size_col }} := dplyr::n()) %>% + dplyr::mutate( + {{ cluster_col }} := eval(fun), + {{ cluster_col }} := sprintf("%02d", eval(cluster_col)), + {{ size_col }} := dplyr::n() + ) %>% dplyr::group_by(dplyr::across({{ cluster_col }})) %>% - dplyr::mutate({{ size_col }} := dplyr::n()/eval(rlang::ensym(size_col))) %>% + dplyr::mutate( + {{ size_col }} := dplyr::n() / eval(rlang::ensym(size_col)) + ) %>% dplyr::ungroup() %E>% - dplyr::mutate("{ cluster_col }_from" := .N()[[cluster_col]][from], - "{ cluster_col }_to" := .N()[[cluster_col]][to], - {{ cluster_col }} := ifelse(eval(rlang::ensym(cluster_col_from)) == eval(rlang::ensym(cluster_col_to)), - eval(rlang::ensym(cluster_col_from)), - "00")) - if(verbose == TRUE){ + dplyr::mutate( + "{ cluster_col }_from" := .N()[[cluster_col]][from], + "{ cluster_col }_to" := .N()[[cluster_col]][to], + {{ cluster_col }} := ifelse( + eval(rlang::ensym(cluster_col_from)) == + eval(rlang::ensym(cluster_col_to)), + eval(rlang::ensym(cluster_col_from)), + "00" + ) + ) + if (verbose == TRUE) { nb_clusters <- graph %N>% dplyr::pull(cluster_col) %>% unique %>% @@ -278,10 +318,18 @@ detect_cluster <- function(graph, max_size <- graph %N>% 
dplyr::pull(size_col) %>% max() %>% - round(3) * 100 + round(3) * + 100 - if(list == TRUE) cli::cli_h1("Cluster detection for the {.val {graph %N>% as.data.frame() %>% dplyr::pull(time_window) %>% unique()}} period") - cli::cli_alert_info("The {.emph {clustering_method}} method detected {.val {nb_clusters}} clusters. The biggest cluster represents {.val {max_size}%} of the network.") + if (list == TRUE) { + cli::cli_h1( + "Cluster detection for the {.val {graph %N>% as.data.frame() %>% dplyr::pull(time_window) %>% unique()}} period" + ) + } + cli::cli_alert_info( + "The {.emph {clustering_method}} method detected {.val {nb_clusters}} clusters. The biggest cluster represents {.val {max_size}%} of the network." + ) } return(graph) } + diff --git a/R/add_node_roles.R b/R/add_node_roles.R index 93d64e4..72a7c66 100644 --- a/R/add_node_roles.R +++ b/R/add_node_roles.R @@ -36,18 +36,16 @@ #' @examples #' library(networkflow) #' -#' nodes <- Nodes_stagflation |> -#' dplyr::rename(ID_Art = ItemID_Ref) |> -#' dplyr::filter(Type == "Stagflation") +#' nodes <- networkflow::Nodes_stagflation |> +#' dplyr::filter(source_type == "Stagflation") #' -#' references <- Ref_stagflation |> -#' dplyr::rename(ID_Art = Citing_ItemID_Ref) +#' references <- networkflow::Ref_stagflation #' #' temporal_networks <- build_dynamic_networks(nodes = nodes, #' directed_edges = references, -#' source_id = "ID_Art", -#' target_id = "ItemID_Ref", -#' time_variable = "Year", +#' source_id = "source_id", +#' target_id = "target_id", +#' time_variable = "source_year", #' cooccurrence_method = "coupling_similarity", #' time_window = 20, #' edges_threshold = 1, @@ -274,3 +272,4 @@ add_node_roles_one <- function( dplyr::left_join(roles_tbl, by = ".node_id") %>% dplyr::select(-.node_id) } + diff --git a/R/build_dynamic_networks.R b/R/build_dynamic_networks.R index c7056d5..627ede4 100644 --- a/R/build_dynamic_networks.R +++ b/R/build_dynamic_networks.R @@ -1,346 +1,151 @@ -#' Creating One or Multiple 
Networks from a List of Nodes and Directed Edges +#' Build One or Multiple Networks from Bipartite Links #' #' @description #' `r lifecycle::badge("experimental")` #' -#' `build_network()` creates a network from a table of nodes and its -#' directed edges. That is a special case of the more general `build_dynamic_networks()`. -#' This function creates one or several tibble graphs (built with -#' [tidygraph](https://tidygraph.data-imaginist.com/)) from a table of nodes and its -#' directed edges. For instance, for bibliometric networks, you can give a list of -#' articles and the list of the references these articles cite. You can use it to -#' build a single network or multiple networks over different time windows. -#' -#' @param nodes -#' The table with all the nodes and their metadata. For instance, if your nodes are -#' articles, this table is likely to contain the year of publication, the name of the authors, -#' the title of the article, etc... The table must have one row per node. -#' -#' @param directed_edges -#' The table with of all the elements to which your nodes are connected. If your nodes are -#' articles, the `directed_edges` table can contain the list of the references cited -#' by these articles, the authors that have written these articles, or the affiliations -#' of the authors of these articles. -#' -#' @param source_id -#' The quoted name of the column with the unique identifier of each node. For instance, -#' for a bibliographic coupling network, the id of your citing documents. It corresponds -#' to the `source` argument of [biblionetwork](https://agoutsmedt.github.io/biblionetwork/) -#' functions. -#' -#' @param target_id -#' The quoted name of the column with the unique identifier of each element connected to the node (for -#' instance, the identifier of the reference cited by your node if the node is an article). -#' It corresponds to the `ref` argument of -#' [biblionetwork](https://agoutsmedt.github.io/biblionetwork/) functions. 
-#' -#' @param time_variable -#' The column with the temporal variable you want to use to build your windows for the -#' succession of networks. By default, `time_variable` is `NULL` and the function -#' will only build one network without taking into account any temporal variable. -#' -#' @param time_window -#' The length of your network relatively of the unity of the `time_variable` column. If you -#' use a variable in years as `time_variable` and you set `time_window` at 5, the function -#' will build network on five year windows. By default, `time_window` is `NULL` and the -#' function will only build one network. -#' -#' @param overlapping_window -#' Set to `FALSE` by default. If set to `TRUE`, and if `time_variable` and `time_window` not -#' `NULL`, the function will create a succession of networks for moving time windows. The windows are -#' moving one unit per one unit of the `time_variable`. For instance, for years, if `time_window` -#' set to 5, it creates networks for successive time windows like 1970-1974, 1971-1975, 1972-1976, etc. -#' -#' @param cooccurrence_method -#' Choose a cooccurrence method to build your indirect edges table. The function propose -#' three methods that depends on the [biblionetwork package](https://agoutsmedt.github.io/biblionetwork/) -#' and three methods that are implemented in it: -#' -#' - the coupling angle measure (see [biblionetwork::biblio_coupling()] for documentation); -#' - the coupling strength measure ([biblionetwork::coupling_strength()]); -#' - the coupling similarity measure ([biblionetwork:: coupling_similarity()]). -#' -#' @param backbone_method Method used to extract the network backbone. Choose between: +#' `build_dynamic_networks()` builds one or several `tbl_graph` networks from a +#' node table (`source_id`) and a bipartite link table (`source_id` -> `target_id`). `build_network()` is a wrapper for a single network. 
+#'
+#' It supports two backbone extraction methods:
+#' - structured filtering using coupling/cooccurrence measures from
+#' [biblionetwork](https://agoutsmedt.github.io/biblionetwork/);
+#' - statistical filtering using null models from
+#' [backbone](https://github.com/zpneal/backbone) \insertCite{neal2022}{networkflow}.
+#'
+#' The function can build a single network or multiple networks across time windows.
+#'
+#' @param nodes Table of nodes and their metadata. One row per node. For example, a table
+#' of articles with identifiers, authors, publication year, etc.
+#'
+#' @param directed_edges Table of bipartite links between `source_id` nodes and
+#' `target_id` entities (e.g., article -> reference, author -> paper).
+#'
+#' @param source_id Quoted name of the source-side node identifier.
+#'
+#' @param target_id Quoted name of the target-side identifier linked to each source node.
+#'
+#' @param time_variable Optional name of the column with a temporal variable (e.g., publication year).
+#'
+#' @param time_window Optional size of the time window (in units of `time_variable`) to construct temporal networks.
+#'
+#' @param projection_method Method used to extract the network backbone. Choose between:
 #' - `"structured"`: uses cooccurrence measures from the [biblionetwork](https://agoutsmedt.github.io/biblionetwork/) package;
-#' - `"statistical"`: uses statistical models from the [backbone](https://github.com/djmurphy533/backbone) package.
+#' - `"statistical"`: uses statistical models from the [backbone](https://github.com/zpneal/backbone) package.
+#' Defaults to `"structured"`. The `"statistical"` method can be computationally slow on large networks.
+
 #'
-#' @param statistical_method For `backbone_method = "statistical"`, select the null model: one of
-#' `"sdsm"`, `"fdsm"`, `"fixedfill"`, `"fixedfrow"`, `"fixedcol"`.
+#' @param model Statistical null model from [backbone](https://github.com/zpneal/backbone):
+#' one of `"sdsm"`, `"fdsm"`, `"fixedfill"`, `"fixedrow"`, `"fixedcol"`.
+#' Required if `projection_method = "statistical"`.
 #'
-#' @param alpha Significance threshold for statistical backbone extraction. Required if
-#' `backbone_method = "statistical"`.
+#' @param alpha Significance threshold for statistical backbone filtering. Required if
+#' `projection_method = "statistical"`. Lower values keep fewer edges.
 #'
-#' @param edges_threshold
-#' Threshold value for building your edges. With a higher threshold, only the stronger links
-#' will be kept. See the [biblionetwork package](https://agoutsmedt.github.io/biblionetwork/)
-#' documentation and the `cooccurrence_method` parameter.
+#' @param cooccurrence_method For `projection_method = "structured"`, choose the coupling method:
+#' - `"coupling_angle"`;
+#' - `"coupling_strength"`;
+#' - `"coupling_similarity"`.
 #'
-#' @param compute_size
-#' Set to `FALSE` by default. If `TRUE`, the function uses the `directed_edges` data
-#' to calculate how many directed edges a node receives (as a target). If `directed_edges`
-#' is a table of direct citations, the functions calculates the number of time a node
-#' is cited by the other nodes. You need to have the `target_id` in the `nodes` table
-#' to make the link with the targetted nodes in the `directed_edges` table.
+#' @param edges_threshold Threshold used to filter weak edges in structured mode.
 #'
-#' @param keep_singleton
-#' Set to `FALSE` by default. If `TRUE`, the function removes the nodes that have no
-#' undirected edges, i.e. no cooccurrence with any other nodes. In graphical terms,
-#' these nodes are alone in the network, with no link with other nodes.
+#' @param overlapping_window Logical. If `TRUE`, builds networks using rolling time windows.
 #'
-#' @param filter_components
-#' Set to `TRUE` if you want to run `networkflow::filter_components()`
-#' to filter the components of the network(s) and keep only the biggest component(s).
If -#' you don't change the defaults parameters of `networkflow::filter_components()`, -#' it will keep only the main component. +#' @param overlapping_window Logical. If `TRUE`, builds networks using rolling time windows. #' -#' @param ... -#' Additional arguments from `networkflow::filter_components()`. +#' @param compute_size Logical. If `TRUE`, computes the number of incoming edges per node (e.g., citation count). #' -#' @param verbose -#' Set to `FALSE` if you don't want the function to display different sort of information. +#' @param keep_singleton Logical. If `FALSE`, removes nodes with no edges in the final network. #' -#' @details `build_network()` has been added for convenience but it is just -#' a special case of the more general `build_dynamic_networks()`, with +#' @param filter_components Logical. If `TRUE`, keeps only the main component(s) using `networkflow::filter_components()`. #' +#' @param ... Additional arguments passed to `filter_components()`. #' +#' @param backbone_args Optional list of additional arguments passed to the +#' backbone extraction call. If `backbone_args` includes `alpha` or `model`, +#' those values override function arguments. #' -#' @return If `time_window` is `NULL`, the function computes only -#' one network and return a tidygraph object built with [tbl_graph()][tidygraph::tbl_graph()]. -#' If `time_variable` and `time_window` are not `NULL`, the function returns a list -#' of tidygraph networks, for each time window. +#' @param verbose Logical. If `TRUE`, displays progress messages. +#' +#' @details +#' The function uses bipartite links (`source_id` -> `target_id`) to produce +#' source-side networks. +#' +#' If `time_variable` and `time_window` are provided, it builds one network per +#' time window (rolling or non-overlapping). Otherwise it builds a single network. +#' +#' `projection_method = "structured"` applies coupling/cooccurrence filtering. 
+#' `projection_method = "statistical"` applies a statistical backbone model. #' #' @examples #' library(networkflow) #' -#' nodes <- Nodes_stagflation |> -#' dplyr::rename(ID_Art = ItemID_Ref) |> -#' dplyr::filter(Type == "Stagflation") +#' nodes <- networkflow::Nodes_stagflation |> +#' dplyr::filter(source_type == "Stagflation") #' -#' references <- Ref_stagflation |> -#' dplyr::rename(ID_Art = Citing_ItemID_Ref) +#' references <- networkflow::Ref_stagflation #' -#' temporal_networks <- build_dynamic_networks(nodes = nodes, +#' # Structured backbone (cooccurrence) +#' net_structured <- build_dynamic_networks( +#' nodes = nodes, #' directed_edges = references, -#' source_id = "ID_Art", -#' target_id = "ItemID_Ref", -#' time_variable = "Year", +#' source_id = "source_id", +#' target_id = "target_id", +#' time_variable = "source_year", +#' time_window = 20, +#' projection_method = "structured", #' cooccurrence_method = "coupling_similarity", +#' edges_threshold = 1 +#' ) +#' +#' # Statistical backbone (backbone package) +#' net_statistical <- build_dynamic_networks( +#' nodes = nodes, +#' directed_edges = references, +#' source_id = "source_id", +#' target_id = "target_id", +#' time_variable = "source_year", #' time_window = 20, -#' edges_threshold = 1, -#' overlapping_window = TRUE) +#' projection_method = "statistical", +#' model = "sdsm", +#' alpha = 0.05, +#' backbone_args = list(mtc = "holm") +#' ) #' -#' temporal_networks[[1]] +#' @return +#' - A single tidygraph object if `time_window` is `NULL`; +#' - A list of tidygraph objects (one per time window) otherwise. 
+#'
+#' @seealso [biblionetwork::biblio_coupling()], [backbone::backbone_from_projection()]
+#'
+#' @references
+#' \insertAllCited{}
 #'
 #' @export
-build_dynamic_networks <- function(nodes,
-                                   directed_edges,
-                                   source_id,
-                                   target_id,
-                                   time_variable = NULL,
-                                   time_window = NULL,
-                                   cooccurrence_method = c("coupling_angle","coupling_strength","coupling_similarity"),
-                                   overlapping_window = FALSE,
-                                   edges_threshold = 1,
-                                   compute_size = FALSE,
-                                   keep_singleton = FALSE,
-                                   filter_components = FALSE,
-                                   ...,
-                                   verbose = TRUE)
-{
-  size <- node_size <- N <- method <- NULL
-
-
-  # Making sure the table is a datatable
-  nodes <- data.table::data.table(nodes)
-  directed_edges <- data.table::data.table(directed_edges)
-  cooccurrence_methods <- c("coupling_angle","coupling_strength","coupling_similarity")
-
-  # Checking various problems: lacking method,
-  if(length(cooccurrence_method) > 1){
-    cli::cli_abort(c(
-      "You did not choose any method for cooccurrence computation. You have to choose between: ",
-      "*" = "\"coupling_angle\";",
-      "*" = "\"coupling_strength\";",
-      "*" = "\"coupling_similarity\"."))
-  }
-  if(!cooccurrence_method %in% cooccurrence_methods){
-    cli::cli_abort(c(
-      "You did not choose an existing method for cooccurrence computation. You have to choose between: ",
-      "*" = "\"coupling_angle\";",
-      "*" = "\"coupling_strength\";",
-      "*" = "\"coupling_similarity\"."))
-  }
-  if(nodes[, .N, source_id, env = list(source_id=source_id)][N > 1, .N] > 0){
-    cli::cli_alert_warning("Some identifiers in your column {.field {source_id}} in your nodes table are not unique. You need only one row per node.")
-  }
-
-  if(! is.null(time_window) & is.null(time_variable)){
-    cli::cli_abort("You cannot have a {.emph time_window} if you don't give any column with a temporal variable. 
- Put a column in {.emph time_variable} or remove the {.emph time_window}.") - } - - # giving information on the method - - if(verbose == TRUE){ - cli::cli_alert_info("The method use for co-occurence is the {.emph {cooccurrence_method}} method.") - cli::cli_alert_info("The edge threshold is: {.val {edges_threshold}}.") - if(keep_singleton == FALSE) cli::cli_alert_info("We remove the nodes that are alone with no edge. \n\n") - } - - # let's extract the information we need - Nodes_coupling <- data.table::copy(nodes) - Nodes_coupling[, source_id := as.character(source_id), - env = list(source_id = source_id)] - - if(is.null(time_variable)){ - time_variable <- "fake_column" - Nodes_coupling[, time_variable := 1, - env = list(time_variable = time_variable)] - } - - if(! target_id %in% colnames(Nodes_coupling) & compute_size == TRUE) - { - cli::cli_abort("You don't have the column {.field {target_id}} in your nodes table. Set {.emph compute_size} to {.val FALSE}.") - } - - if(compute_size == TRUE){ - Nodes_coupling[, target_id := as.character(target_id), - env = list(target_id = target_id)] - } - - Edges <- data.table::copy(directed_edges) - Edges <- Edges[, .SD, .SDcols = c(source_id, target_id)] - Edges[, c(source_id, target_id) := lapply(.SD, as.character), .SDcols = c(source_id, target_id)] - - ######################### Dynamics networks ********************* - - # Find the time_window - Nodes_coupling <- Nodes_coupling[order(time_variable), env = list(time_variable = time_variable)] - Nodes_coupling[, time_variable := as.integer(time_variable), - env = list(time_variable = time_variable)] - - first_year <- Nodes_coupling[, min(as.integer(time_variable)), - env = list(time_variable = time_variable)] - last_year <- Nodes_coupling[, max(as.integer(time_variable)), - env = list(time_variable = time_variable)] - - if(!is.null(time_window)){ - if(last_year - first_year + 1 < time_window){ - cli::cli_alert_warning("Your time window is larger than the number of distinct 
values of {.field {time_variable}}") - } - } - - if(is.null(time_window)){ - all_years <- first_year - time_window <- last_year - first_year + 1 - } else { - if(overlapping_window == TRUE){ - last_year <- last_year - time_window + 1 - all_years <- first_year:last_year - } else { - all_years <- seq(first_year, last_year, by = time_window) - if(all_years[length(all_years)] + (time_window - 1) > last_year){ - cli::cli_warn("Your last network is shorter than the other(s) because the cutting by time window does not give a round count. - The last time unity in your data is {.val {last_year}}, but the upper limit of your last time window is - {.val {all_years[length(all_years)] + (time_window - 1)}}.") - } - } - } - - # Prepare our list - tbl_coup_list <- list() - - for (Year in all_years) { - nodes_of_the_year <- Nodes_coupling[time_variable >= Year & time_variable < (Year + time_window), - env = list(time_variable = time_variable, Year = Year)] - - if(time_variable != "fake_column"){ - nodes_of_the_year[, time_window := paste0(Year, "-", Year + time_window - 1), - env = list(Year = Year)] - if(verbose == TRUE) cli::cli_h1("Creation of the network for the {.val {Year}}-{.val {Year + time_window - 1}} window.") - } else { - nodes_of_the_year <- nodes_of_the_year[, -c("fake_column")] - } - - edges_of_the_year <- Edges[source_id %in% nodes_of_the_year[, source_id], - env = list(source_id = source_id)] - - # size of nodes - if(compute_size == TRUE){ - nb_cit <- edges_of_the_year[source_id %in% nodes_of_the_year[, source_id], .N, target_id, - env = list(source_id = source_id, target_id = target_id)] - colnames(nb_cit)[colnames(nb_cit) == "N"] <- "node_size" - - if("node_size" %in% colnames(Nodes_coupling) == TRUE) - { - cli::cli_warn("You already have a column name {.field node_size}. 
The content of the column will be replaced.") - } - nodes_of_the_year <- data.table::merge.data.table(nodes_of_the_year, - nb_cit, - by = target_id, - all.x = TRUE) - nodes_of_the_year[is.na(node_size), node_size := 0] - } - - # coupling - biblio_functions <- data.table::data.table(biblio_function = c(rlang::expr(biblionetwork::biblio_coupling), - rlang::expr(biblionetwork::coupling_strength), - rlang::expr(biblionetwork::coupling_similarity)), - method = c("coupling_angle", - "coupling_strength", - "coupling_similarity")) - biblio_function <- biblio_functions[method == cooccurrence_method][["biblio_function"]][[1]] - edges_of_the_year <- rlang::expr((!!biblio_function)(dt = edges_of_the_year, - source = rlang::inject(source_id), - ref = rlang::inject(target_id), - weight_threshold = rlang::inject(edges_threshold))) %>% - eval() - - # remove nodes with no edges - if(keep_singleton==FALSE){ - nodes_of_the_year <- nodes_of_the_year[source_id %in% edges_of_the_year$from | source_id %in% edges_of_the_year$to, env=list(source_id=source_id)] - } - - # make tbl - if(length(all_years) == 1){ - tbl_coup_list <- tidygraph::tbl_graph(nodes = nodes_of_the_year, - edges = edges_of_the_year, - directed = FALSE, - node_key = source_id) - } else { - tbl_coup_list[[paste0(Year, "-", Year + time_window - 1)]] <- tidygraph::tbl_graph(nodes = nodes_of_the_year, - edges = edges_of_the_year, - directed = FALSE, - node_key = source_id) - } - } - if(filter_components == TRUE){ - tbl_coup_list <- filter_components(tbl_coup_list, ...) 
- } - return (tbl_coup_list) -} - -#' @rdname build_dynamic_networks -#' @export - -build_dynamic_networks2 <- function(nodes, - directed_edges, - source_id, - target_id, - time_variable = NULL, - time_window = NULL, - backbone_method = c("statistical", "structured"), - statistical_method = c("sdsm", "fdsm", "fixedfill", "fixedfrow", "fixedcol"), - alpha = NULL, - coupling_measure = c("coupling_angle", "coupling_strength", "coupling_similarity"), - edges_threshold = 1, - overlapping_window = FALSE, - compute_size = FALSE, - keep_singleton = FALSE, - filter_components = FALSE, - ..., - verbose = TRUE) { +#' +build_dynamic_networks <- function( + nodes, + directed_edges, + source_id, + target_id, + time_variable = NULL, + time_window = NULL, + projection_method = c("structured", "statistical"), + model = c("sdsm", "fdsm", "fixedfill", "fixedrow", "fixedcol"), + alpha = NULL, + cooccurrence_method = c( + "coupling_angle", + "coupling_strength", + "coupling_similarity" + ), + edges_threshold = 1, + overlapping_window = FALSE, + compute_size = FALSE, + keep_singleton = FALSE, + filter_components = FALSE, + ..., + backbone_args = list(), + verbose = TRUE +) { size <- node_size <- N <- method <- NULL # Making sure the table is a datatable @@ -348,41 +153,31 @@ build_dynamic_networks2 <- function(nodes, directed_edges <- data.table::data.table(directed_edges) # Checking the methods - backbone_methods = c("statistical", "structured") - - coupling_measures <- c("coupling_angle", - "coupling_strength", - "coupling_similarity") + projection_methods <- c("structured", "statistical") - statistical_methods <- c("sdsm", "fdsm", "fixedfill", "fixedfrow", "fixedcol") + cooccurrence_methods <- c( + "coupling_angle", + "coupling_strength", + "coupling_similarity" + ) + statistical_methods <- c("sdsm", "fdsm", "fixedfill", "fixedrow", "fixedcol") - if (length(backbone_method) > 1) { - cli::cli_abort( - c( - "You did not choose any method for extracting the backbone. 
You have to choose between: ",
-        "*" = "\"statistical\";",
-        "*" = "\"structured\"."
-      )
-    )
-  }
-
-  if (!backbone_method %in% backbone_methods) {
-    cli::cli_abort(
-      c(
-        "You did not choose any method for extracting the backbone. You have to choose between: ",
-        "*" = "\"statistical\";",
-        "*" = "\"structured\";"
-      )
-    )
-  }
+  method_supplied <- !missing(projection_method)
+  projection_method <- match.arg(projection_method, projection_methods)
+  if (verbose == TRUE && !method_supplied) {
+    cli::cli_alert_info(
+      "No projection_method provided. Defaulting to {.val {projection_method}}."
+    )
+  }
 
-  # check various setting for the structured methods
-
-  if (backbone_method == "structured") {
-
+  # check various settings for the structured/statistical methods
+  if (projection_method == "structured") {
     # Checking various problems: lacking method,
-    if (length(coupling_measure) > 1) {
+    if (length(cooccurrence_method) > 1) {
       cli::cli_abort(
         c(
           "For structured backbone extraction, you have to choose a coupling measure among: ",
@@ -393,7 +188,7 @@ build_dynamic_networks2 <- function(nodes,
       )
     }
 
-    if (!coupling_measure %in% coupling_measures) {
+    if (!cooccurrence_method %in% cooccurrence_methods) {
       cli::cli_abort(
         c(
           "For structured backbone extraction, you have to choose a coupling measure among: ",
@@ -403,49 +198,34 @@ build_dynamic_networks2 <- function(nodes,
         )
       )
     }
-
-  }
-
-  # check various setting for the statistical methods
-  if (backbone_method == "statistical") {
-    # check if a model is given
-    if (length(statistical_method) > 1) {
+  } else if (projection_method == "statistical") {
+    if (is.null(model) || length(model) > 1) {
       cli::cli_abort(
         c(
           "For statistical backbone extraction, you have to choose a model: ",
           "*" = "\"sdsm\";",
           "*" = "\"fdsm\";",
           "*" = "\"fixedfill\".",
-          "*" = "\"fixedfrow\".",
+          "*" = "\"fixedrow\".",
           "*" = "\"fixedcol\"." 
        )
      )
    }
 
-    if (!statistical_method %in% statistical_methods) {
-      cli::cli_abort(
-        c(
-          "For statistical backbone extraction, you have to choose a model: ",
-          "*" = "\"sdsm\";",
-          "*" = "\"fdsm\";",
-          "*" = "\"fixedfill\".",
-          "*" = "\"fixedfrow\".",
-          "*" = "\"fixedcol\"."
-        )
-      )
-    }
+    model <- match.arg(model, statistical_methods)
 
     # check if alpha is given
-    if (is.null(alpha)) {
+    if (is.null(alpha) && is.null(backbone_args$alpha)) {
       cli::cli_abort(
         "For statistical backbone extraction, you have to choose a significance level alpha."
       )
     }
-
   }
 
   # warning if the source_id is not unique
-  if (nodes[, .N, source_id, env = list(source_id = source_id)][N > 1, .N] > 0) {
+  if (
+    nodes[, .N, source_id, env = list(source_id = source_id)][N > 1, .N] > 0
+  ) {
     cli::cli_alert_warning(
       "Some identifiers in your column {.field {source_id}} in your nodes table are not unique. You need only one row per node."
     )
@@ -461,56 +241,82 @@ build_dynamic_networks2 <- function(nodes,
 
   # VERBOSE
 
   if (verbose == TRUE) {
-    if (length(statistical_method > 0))
-      cli::cli_alert_info(paste(
-        "We extract the network backbone using the",
-        backbone_method,
-        "method."
-      ))
+    cli::cli_alert_info(paste(
+      "Backbone method selected:",
+      projection_method
+    ))
 
-    if (keep_singleton == FALSE)
-      cli::cli_alert_info("Keep_singleton == FALSE: removing the nodes that are alone with no edge. \n\n")
+    if (keep_singleton == FALSE) {
+      cli::cli_alert_info(
+        "keep_singleton == FALSE: removing nodes with no edges. 
\n\n" + ) + } } - # CHECKING THE DATA # NODES nodes_coupling <- data.table::copy(nodes) - nodes_coupling[, source_id := as.character(source_id), env = list(source_id = source_id)] + nodes_coupling[, + source_id := as.character(source_id), + env = list(source_id = source_id) + ] if (is.null(time_variable)) { time_variable <- "fake_column" - nodes_coupling[, time_variable := 1, env = list(time_variable = time_variable)] + nodes_coupling[, + time_variable := 1, + env = list(time_variable = time_variable) + ] } - - if (!target_id %in% colnames(nodes_coupling) & - compute_size == TRUE) { + if ( + !target_id %in% colnames(nodes_coupling) & + compute_size == TRUE + ) { cli::cli_abort( "You don't have the column {.field {target_id}} in your nodes table. Set {.emph compute_size} to {.val FALSE}." ) } if (compute_size == TRUE) { - nodes_coupling[, target_id := as.character(target_id), env = list(target_id = target_id)] + nodes_coupling[, + target_id := as.character(target_id), + env = list(target_id = target_id) + ] } # EDGES edges <- data.table::copy(directed_edges) - edges <- edges[, .SD, .SDcols = c(source_id, target_id)] # we keep only the columns we need - edges <- unique(edges) # in case there are some duplicates - edges[, c(source_id, target_id) := lapply(.SD, as.character), .SDcols = c(source_id, target_id)] # we need to have character columns + edges <- data.table::data.table( + from = as.character(edges[[source_id]]), + to = as.character(edges[[target_id]]) + ) # canonical edge columns + edges <- unique(edges) ######################### Dynamics networks ********************* # define the time window - nodes_coupling <- nodes_coupling[order(time_variable), env = list(time_variable = time_variable)] - nodes_coupling[, time_variable := as.integer(time_variable), env = list(time_variable = time_variable)] - - first_year <- nodes_coupling[, min(as.integer(time_variable)), env = list(time_variable = time_variable)] - last_year <- nodes_coupling[, 
max(as.integer(time_variable)), env = list(time_variable = time_variable)] + nodes_coupling <- nodes_coupling[ + order(time_variable), + env = list(time_variable = time_variable) + ] + nodes_coupling[, + time_variable := as.integer(time_variable), + env = list(time_variable = time_variable) + ] + + first_year <- nodes_coupling[, + min(as.integer(time_variable)), + env = list(time_variable = time_variable) + ] + last_year <- nodes_coupling[, + max(as.integer(time_variable)), + env = list(time_variable = time_variable) + ] if (!is.null(time_window)) { if (last_year - first_year + 1 < time_window) { @@ -543,86 +349,105 @@ build_dynamic_networks2 <- function(nodes, tbl_coup_list <- list() for (year in all_years) { - nodes_of_the_year <- nodes_coupling[time_variable >= year & - time_variable < (year + time_window), env = list(time_variable = time_variable, year = year)] + nodes_of_the_year <- nodes_coupling[ + time_variable >= year & + time_variable < (year + time_window), + env = list(time_variable = time_variable, year = year) + ] if (time_variable != "fake_column") { - nodes_of_the_year[, time_window := paste0(year, "-", year + time_window - 1), env = list(year = year)] + nodes_of_the_year[, + time_window := paste0(year, "-", year + time_window - 1), + env = list(year = year) + ] - if (verbose == TRUE) + if (verbose == TRUE) { cli::cli_h1( "Generation of the network for the {.val {year}}-{.val {year + time_window - 1}} time window." 
) + } } else { nodes_of_the_year <- nodes_of_the_year[, -c("fake_column")] } - edges_of_the_year <- edges[source_id %in% nodes_of_the_year[, source_id], env = list(source_id = source_id)] + node_ids <- nodes_of_the_year[[source_id]] + edges_of_the_year <- edges[from %in% node_ids] # size of nodes if (compute_size == TRUE) { - nb_cit <- edges_of_the_year[source_id %in% nodes_of_the_year[, source_id], .N, target_id, env = list(source_id = source_id, target_id = target_id)] + nb_cit <- edges_of_the_year[from %in% node_ids, .N, by = to] + data.table::setnames(nb_cit, "to", target_id) colnames(nb_cit)[colnames(nb_cit) == "N"] <- "node_size" - if ("node_size" %in% colnames(nodes_coupling) == TRUE) - { + if ("node_size" %in% colnames(nodes_coupling) == TRUE) { cli::cli_warn( "You already have a column name {.field node_size}. The content of the column will be replaced." ) } - nodes_of_the_year <- data.table::merge.data.table(nodes_of_the_year, - nb_cit, - by = target_id, - all.x = TRUE) + nodes_of_the_year <- data.table::merge.data.table( + nodes_of_the_year, + nb_cit, + by = target_id, + all.x = TRUE + ) nodes_of_the_year[is.na(node_size), node_size := 0] } - - # backbone - if (backbone_method == "statistical") { - # prepare backbone function - backbone_functions <- - data.table::data.table( - biblio_function = c( - rlang::expr(backbone::sdsm), - rlang::expr(backbone::fdsm), - rlang::expr(backbone::fixedfrow), - rlang::expr(backbone::fixedcol), - rlang::expr(backbone::fixedfill) - ), - method = c("sdsm", "fdsm", "fixedfrow", "fixedcol", "fixedfill") - ) - - backbone_functions <- backbone_functions[method == statistical_method][["biblio_function"]][[1]] - + if (projection_method == "statistical") { # Evaluate the expression and catch internal errors to backbone package + tryCatch( + { + from_pref <- paste0("A:", edges_of_the_year$from) + to_pref <- paste0("B:", edges_of_the_year$to) + bip_graph <- igraph::graph_from_data_frame( + data.frame(from = from_pref, to = 
to_pref),
+            directed = FALSE
+          )
+          node_ids_pref <- paste0("A:", node_ids)
+          igraph::V(bip_graph)$type <- !(igraph::V(bip_graph)$name %in%
+            node_ids_pref)
 
-        tryCatch({
-          # using backbone with edgelist is simpler but lead to error in backbone function
-          edges_of_the_year <-
-            rlang::expr((!!backbone_functions)(
-              B = as.data.frame(edges_of_the_year),
-              alpha = rlang::inject(alpha)
-            )) %>%
-            eval() %>%
-            data.table::as.data.table()
-
-        }, error = function(e) {
-          stop(
-            "The backbone function failed with an error. Read the backbone documentation for more information. Error message: ",
-            e$message
-          )
-        })
-      }
+          # backbone_args entries override alpha/model, as documented, without
+          # triggering duplicate-argument errors in do.call()
+          backbone_call_args <- utils::modifyList(
+            list(B = bip_graph, alpha = alpha, model = model),
+            backbone_args
+          )
+          backbone_graph <- do.call(
+            backbone::backbone_from_projection,
+            backbone_call_args
+          )
+          if (inherits(backbone_graph, "igraph")) {
+            edges_of_the_year <- igraph::as_data_frame(
+              backbone_graph,
+              what = "edges"
+            )
+          } else if (!is.null(backbone_graph$backbone)) {
+            edges_of_the_year <- igraph::as_data_frame(
+              backbone_graph$backbone,
+              what = "edges"
+            )
+          } else {
+            stop("The backbone function returned an unexpected object type.")
+          }
+
+          edges_of_the_year <- data.table::as.data.table(edges_of_the_year)
+          edges_of_the_year[, from := sub("^A:", "", from)]
+          edges_of_the_year[, to := sub("^A:", "", to)]
+          edges_of_the_year[, from := sub("^B:", "", from)]
+          edges_of_the_year[, to := sub("^B:", "", to)]
+        },
+        error = function(e) {
+          stop(
+            "The backbone function failed with an error. Read the backbone documentation for more information. 
Error message: ", + e$message + ) + } + ) + } # coupling - if (backbone_method == "structured") { + if (projection_method == "structured") { biblio_functions <- data.table::data.table( biblio_function = c( @@ -637,42 +462,50 @@ build_dynamic_networks2 <- function(nodes, ) ) - biblio_function <- biblio_functions[method == coupling_measure][["biblio_function"]][[1]] + biblio_function <- biblio_functions[method == cooccurrence_method][[ + "biblio_function" + ]][[1]] # evaluate the expression and catch internal errors to biblionetwork package - tryCatch({ - edges_of_the_year <- - rlang::expr((!!biblio_function)( - dt = edges_of_the_year, - source = rlang::inject(source_id), - ref = rlang::inject(target_id), - weight_threshold = rlang::inject(edges_threshold) + tryCatch( + { + edges_for_biblio <- data.table::copy(edges_of_the_year) + data.table::setnames( + edges_for_biblio, + c("from", "to"), + c(source_id, target_id) ) - ) %>% - eval() - - }, error = function(e) { - stop( - "The coupling function failed with an error. Read the biblionetwork documentation for more information. Error message: ", - e$message - ) - }) - + edges_of_the_year <- + rlang::expr((!!biblio_function)( + dt = edges_for_biblio, + source = rlang::inject(source_id), + ref = rlang::inject(target_id), + weight_threshold = rlang::inject(edges_threshold) + )) %>% + eval() + }, + error = function(e) { + stop( + "The coupling function failed with an error. Read the biblionetwork documentation for more information. 
Error message: ", + e$message + ) + } + ) } - edges_of_the_year[, source_id := from] - edges_of_the_year[, target_id := to] - # remove nodes with no edges if (keep_singleton == FALSE) { - nodes_of_the_year <- nodes_of_the_year[source_id %in% edges_of_the_year$from | - source_id %in% edges_of_the_year$to, env = list(source_id = source_id)] + nodes_of_the_year <- nodes_of_the_year[ + source_id %in% + edges_of_the_year$from | + source_id %in% edges_of_the_year$to, + env = list(source_id = source_id) + ] } # make tbl - if (length(all_years) == 1) - { + if (length(all_years) == 1) { tbl_coup_list <- tidygraph::tbl_graph( nodes = nodes_of_the_year, edges = edges_of_the_year, @@ -693,33 +526,62 @@ build_dynamic_networks2 <- function(nodes, if (filter_components == TRUE) { tbl_coup_list <- filter_components(tbl_coup_list, ...) } - return (tbl_coup_list) + return(tbl_coup_list) } -#' @rdname build_dynamic_networks +#' Build a single network +#' +#' Convenience wrapper around [build_dynamic_networks()] for a single network. +#' +#' @inheritParams build_dynamic_networks +#' @param projection_method Method used to build the single network. Must be +#' one of `"structured"` or `"statistical"`. +#' @param cooccurrence_method Cooccurrence method used by the structured workflow. #' @export - -build_network <- function(nodes, - directed_edges, - source_id, - target_id, - cooccurrence_method = c("coupling_angle","coupling_strength","coupling_similarity"), - edges_threshold = 1, - compute_size = FALSE, - keep_singleton = FALSE, - filter_components = FALSE, - ...){ -graph <- build_dynamic_networks(nodes = nodes, - directed_edges = directed_edges, - source_id = source_id, - target_id = target_id, - cooccurrence_method = cooccurrence_method, - edges_threshold = edges_threshold, - compute_size = compute_size, - keep_singleton = keep_singleton, - filter_components = FALSE, - ..., - verbose = FALSE) -if(filter_components == TRUE) graph <- filter_components(graph, ...) 
-return(graph) +build_network <- function( + nodes, + directed_edges, + source_id, + target_id, + projection_method, + cooccurrence_method = c( + "coupling_angle", + "coupling_strength", + "coupling_similarity" + ), + edges_threshold = 1, + compute_size = FALSE, + keep_singleton = FALSE, + filter_components = FALSE, + ... +) { + if (missing(projection_method)) { + cli::cli_abort( + "Please provide {.arg projection_method}: either {.val structured} or {.val statistical}." + ) + } + projection_method <- match.arg( + projection_method, + c("structured", "statistical") + ) + + graph <- build_dynamic_networks( + nodes = nodes, + directed_edges = directed_edges, + source_id = source_id, + target_id = target_id, + projection_method = projection_method, + cooccurrence_method = cooccurrence_method, + edges_threshold = edges_threshold, + compute_size = compute_size, + keep_singleton = keep_singleton, + filter_components = FALSE, + ..., + verbose = FALSE + ) + if (filter_components == TRUE) { + graph <- filter_components(graph, ...) + } + graph } + diff --git a/R/build_dynamic_networks2.R b/R/build_dynamic_networks2.R deleted file mode 100644 index ecfb95b..0000000 --- a/R/build_dynamic_networks2.R +++ /dev/null @@ -1,536 +0,0 @@ -#' Creating One or Multiple Networks Using Structured or Statistical Backbone Extraction -#' -#' @description -#' `r lifecycle::badge("experimental")` -#' -#' `build_dynamic_networks2()` builds one or several networks (as tidygraph objects) -#' from a table of nodes and directed edges, with support for both structured cooccurrence -#' methods and statistical backbone extraction using the [backbone](https://github.com/zpneal/backbone) -#' package \insertCite{neal2022}{networkflow}. -#' The function is useful for constructing bibliometric or affiliation networks across -#' static or dynamic time windows. -#' -#' @param nodes Table of nodes and their metadata. One row per node. 
For example, a table -#' of articles with identifiers, authors, publication year, etc. -#' -#' @param directed_edges Table of edges representing the links between nodes and associated entities -#' (e.g., references, authors, affiliations). -#' -#' @param source_id Quoted name of the column giving the unique identifier of each node. -#' -#' @param target_id Quoted name of the column giving the identifier of the element linked to each node. -#' -#' @param time_variable Optional name of the column with a temporal variable (e.g., publication year). -#' -#' @param time_window Optional size of the time window (in units of `time_variable`) to construct temporal networks. -#' -#' @param backbone_method Method used to extract the network backbone. Choose between: -#' - `"structured"`: uses cooccurrence measures from the [biblionetwork](https://agoutsmedt.github.io/biblionetwork/) package; -#' - `"statistical"`: uses statistical models from the [backbone](https://github.com/djmurphy533/backbone) package. -#' Defaults to `"structured"`. -#' -#' @param model Null model used by the [backbone](https://github.com/zpneal/backbone) -#' package: one of `"sdsm"`, `"fdsm"`, `"fixedfill"`, `"fixedrow"`, `"fixedcol"`. Required if -#' `backbone_method = "statistical"`. These correspond to model names in `backbone` and are passed -#' through to the selected backbone function. -#' -#' -#' @param alpha Significance threshold for statistical backbone extraction. Required if -#' `backbone_method = "statistical"`. Lower values keep fewer edges (stricter filtering). -#' -#' @param coupling_measure For `backbone_method = "structured"`, choose the cooccurrence method: -#' - `"coupling_angle"` (biblio_coupling); -#' - `"coupling_strength"`; -#' - `"coupling_similarity"`. -#' -#' @param edges_threshold Threshold for edge weight filtering in structured methods. -#' -#' @param overlapping_window Logical. If `TRUE`, builds networks using rolling time windows. -#' -#' @param compute_size Logical. 
If `TRUE`, computes the number of incoming edges per node (e.g., citation count). -#' -#' @param keep_singleton Logical. If `FALSE`, removes nodes with no edges in the final network. -#' -#' @param filter_components Logical. If `TRUE`, keeps only the main component(s) using `networkflow::filter_components()`. -#' -#' @param ... Additional arguments passed to `filter_components()`. -#' -#' @param backbone_args Optional list of additional arguments passed to -#' [backbone::backbone_from_projection()]. Use this to set parameters like `mtc`, -#' `signed`, `missing_as_zero`, or `trials`. If `backbone_args` includes `alpha` or -#' `model`, those values override the corresponding function arguments. -#' -#' @param verbose Logical. If `TRUE`, displays progress messages. -#' -#' @details -#' `build_dynamic_networks2()` generalizes `build_dynamic_networks()` by adding support for -#' statistical backbone extraction using null models from the `backbone` package -#' \insertCite{neal2022}{networkflow}. The cooccurence methods used in -#' `build_dynamic_networks()` can be viewed as deterministic (structured) methods to extract -#' the network backbone. The backbone is defined as the significant edges in the network. -#' -#' As with `build_dynamic_networks()`, the function constructs networks for each time window. If `time_variable` and `time_window` are defined, the function constructs networks -#' for each time window (sliding or non-overlapping). Otherwise, it builds a single static network. -#' -#' If `backbone_method = "structured"`, cooccurrence edges are computed using bibliometric coupling -#' techniques. The term structured refers to deterministic methods based on thresholding cooccurrence measures. -#' If `backbone_method = "statistical"`, the function applies a `backbone` null model to the -#' edgelist for each time window and keeps only statistically significant edges at the chosen `alpha`. 
-#' The model is selected via `model` and follows `backbone`'s nomenclature: `"sdsm"`, `"fdsm"`, -#' `"fixedfill"`, `"fixedrow"`, or `"fixedcol"`. Only these models are currently supported. -#' -#' @examples -#' library(networkflow) -#' -#' nodes <- Nodes_stagflation |> -#' dplyr::rename(ID_Art = ItemID_Ref) |> -#' dplyr::filter(Type == "Stagflation") -#' -#' references <- Ref_stagflation |> -#' dplyr::rename(ID_Art = Citing_ItemID_Ref) -#' -#' # Structured backbone (cooccurrence) -#' net_structured <- build_dynamic_networks2( -#' nodes = nodes, -#' directed_edges = references, -#' source_id = "ID_Art", -#' target_id = "ItemID_Ref", -#' time_variable = "Year", -#' time_window = 20, -#' backbone_method = "structured", -#' coupling_measure = "coupling_similarity", -#' edges_threshold = 1 -#' ) -#' -#' # Statistical backbone (backbone package) -#' net_statistical <- build_dynamic_networks2( -#' nodes = nodes, -#' directed_edges = references, -#' source_id = "ID_Art", -#' target_id = "ItemID_Ref", -#' time_variable = "Year", -#' time_window = 20, -#' backbone_method = "statistical", -#' model = "sdsm", -#' alpha = 0.05, -#' backbone_args = list(mtc = "holm") -#' ) -#' -#' @return -#' - A single tidygraph object if `time_window` is `NULL`; -#' - A list of tidygraph objects (one per time window) otherwise. 
-#' -#' @seealso [biblionetwork::biblio_coupling()], [backbone::backbone_from_projection()] -#' -#' @references -#' \insertAllCited{} -#' -#' @export -#' - -build_dynamic_networks2 <- function( - nodes, - directed_edges, - source_id, - target_id, - time_variable = NULL, - time_window = NULL, - backbone_method = c("structured", "statistical"), - model = c("sdsm", "fdsm", "fixedfill", "fixedrow", "fixedcol"), - alpha = NULL, - coupling_measure = c( - "coupling_angle", - "coupling_strength", - "coupling_similarity" - ), - edges_threshold = 1, - overlapping_window = FALSE, - compute_size = FALSE, - keep_singleton = FALSE, - filter_components = FALSE, - ..., - backbone_args = list(), - verbose = TRUE -) { - size <- node_size <- N <- method <- NULL - - # Making sure the table is a datatable - nodes <- data.table::data.table(nodes) - directed_edges <- data.table::data.table(directed_edges) - - # Checking the methods - backbone_methods <- c("structured", "statistical") - - coupling_measures <- c( - "coupling_angle", - "coupling_strength", - "coupling_similarity" - ) - - statistical_methods <- c("sdsm", "fdsm", "fixedfill", "fixedrow", "fixedcol") - - if (length(backbone_method) > 1) { - backbone_method <- match.arg(backbone_method, backbone_methods) - if (verbose == TRUE && missing(backbone_method)) { - cli::cli_alert_info( - "No backbone_method provided. Defaulting to {.val {backbone_method}}." - ) - } - } else { - backbone_method <- match.arg(backbone_method, backbone_methods) - } - - # check various setting for the structured/statistical methods - if (backbone_method == "structured") { - # Checking various problems: lacking method, - if (length(coupling_measure) > 1) { - cli::cli_abort( - c( - "For structured backbone extraction, you have to choose a coupling measure among: ", - "*" = "\"coupling_angle\";", - "*" = "\"coupling_strength\";", - "*" = "\"coupling_similarity\"." 
- ) - ) - } - - if (!coupling_measure %in% coupling_measures) { - cli::cli_abort( - c( - "For structured backbone extraction, you have to choose a coupling measure among: ", - "*" = "\"coupling_angle\";", - "*" = "\"coupling_strength\";", - "*" = "\"coupling_similarity\"." - ) - ) - } - } else if (backbone_method == "statistical") { - if (is.null(model) || length(model) > 1) { - cli::cli_abort( - c( - "For statistical backbone extraction, you have to choose a model: ", - "*" = "\"sdsm\";", - "*" = "\"fdsm\";", - "*" = "\"fixedfill\".", - "*" = "\"fixedrow\".", - "*" = "\"fixedcol\"." - ) - ) - } - - model <- match.arg(model, statistical_methods) - - # check if alpha is given - if (is.null(alpha) && is.null(backbone_args$alpha)) { - cli::cli_abort( - "For statistical backbone extraction, you have to choose a significance level alpha." - ) - } - } - - # warning if the source_id is not unique - if ( - nodes[, .N, source_id, env = list(source_id = source_id)][N > 1, .N] > 0 - ) { - cli::cli_alert_warning( - "Some identifiers in your column {.field {source_id}} in your nodes table are not unique. You need only one row per node." - ) - } - - # check settings for intertemporal networks - if (!is.null(time_window) & is.null(time_variable)) { - cli::cli_abort( - "You cannot have a {.emph time_window} if you don't give any column with a temporal variable. Put a column in {.emph time_variable} or remove the {.emph time_window}." - ) - } - - # VERBOSE - - if (verbose == TRUE) { - if (!missing(backbone_method)) { - cli::cli_alert_info(paste( - "Backbone method selected:", - backbone_method - )) - } - - if (keep_singleton == FALSE) { - cli::cli_alert_info( - "Keep_singleton == FALSE: removing the nodes that are alone with no edge. 
\n\n" - ) - } - } - - # CHECKING THE DATA - - # NODES - nodes_coupling <- data.table::copy(nodes) - nodes_coupling[, - source_id := as.character(source_id), - env = list(source_id = source_id) - ] - - if (is.null(time_variable)) { - time_variable <- "fake_column" - nodes_coupling[, - time_variable := 1, - env = list(time_variable = time_variable) - ] - } - - if ( - !target_id %in% colnames(nodes_coupling) & - compute_size == TRUE - ) { - cli::cli_abort( - "You don't have the column {.field {target_id}} in your nodes table. Set {.emph compute_size} to {.val FALSE}." - ) - } - - if (compute_size == TRUE) { - nodes_coupling[, - target_id := as.character(target_id), - env = list(target_id = target_id) - ] - } - - # EDGES - - edges <- data.table::copy(directed_edges) - edges <- edges[, .( - from = as.character(get(source_id)), - to = as.character(get(target_id)) - )] # canonical edge columns - edges <- unique(edges) - - ######################### Dynamics networks ********************* - - # define the time window - nodes_coupling <- nodes_coupling[ - order(time_variable), - env = list(time_variable = time_variable) - ] - nodes_coupling[, - time_variable := as.integer(time_variable), - env = list(time_variable = time_variable) - ] - - first_year <- nodes_coupling[, - min(as.integer(time_variable)), - env = list(time_variable = time_variable) - ] - last_year <- nodes_coupling[, - max(as.integer(time_variable)), - env = list(time_variable = time_variable) - ] - - if (!is.null(time_window)) { - if (last_year - first_year + 1 < time_window) { - cli::cli_alert_warning( - "Your time window is larger than the number of distinct values of {.field {time_variable}}" - ) - } - } - - if (is.null(time_window)) { - all_years <- first_year - time_window <- last_year - first_year + 1 - } else { - if (overlapping_window == TRUE) { - last_year <- last_year - time_window + 1 - all_years <- first_year:last_year - } else { - all_years <- seq(first_year, last_year, by = time_window) - if 
(all_years[length(all_years)] + (time_window - 1) > last_year) { - cli::cli_warn( - "Your last network is shorter than the other(s) because the cutting by time window does not give a round count. - The last time unity in your data is {.val {last_year}}, but the upper limit of your last time window is - {.val {all_years[length(all_years)] + (time_window - 1)}}." - ) - } - } - } - - # Prepare our list - tbl_coup_list <- list() - - for (year in all_years) { - nodes_of_the_year <- nodes_coupling[ - time_variable >= year & - time_variable < (year + time_window), - env = list(time_variable = time_variable, year = year) - ] - - if (time_variable != "fake_column") { - nodes_of_the_year[, - time_window := paste0(year, "-", year + time_window - 1), - env = list(year = year) - ] - - if (verbose == TRUE) { - cli::cli_h1( - "Generation of the network for the {.val {year}}-{.val {year + time_window - 1}} time window." - ) - } - } else { - nodes_of_the_year <- nodes_of_the_year[, -c("fake_column")] - } - - node_ids <- nodes_of_the_year[[source_id]] - edges_of_the_year <- edges[from %in% node_ids] - - # size of nodes - if (compute_size == TRUE) { - nb_cit <- edges_of_the_year[from %in% node_ids, .N, by = to] - data.table::setnames(nb_cit, "to", target_id) - - colnames(nb_cit)[colnames(nb_cit) == "N"] <- "node_size" - - if ("node_size" %in% colnames(nodes_coupling) == TRUE) { - cli::cli_warn( - "You already have a column name {.field node_size}. The content of the column will be replaced." 
- ) - } - - nodes_of_the_year <- data.table::merge.data.table( - nodes_of_the_year, - nb_cit, - by = target_id, - all.x = TRUE - ) - - nodes_of_the_year[is.na(node_size), node_size := 0] - } - - # backbone - - if (backbone_method == "statistical") { - # Evaluate the expression and catch internal errors to backbone package - tryCatch( - { - from_pref <- paste0("A:", edges_of_the_year$from) - to_pref <- paste0("B:", edges_of_the_year$to) - bip_graph <- igraph::graph_from_data_frame( - data.frame(from = from_pref, to = to_pref), - directed = FALSE - ) - node_ids_pref <- paste0("A:", node_ids) - igraph::V(bip_graph)$type <- !(igraph::V(bip_graph)$name %in% - node_ids_pref) - - backbone_graph <- do.call( - backbone::backbone_from_projection, - c(list(B = bip_graph, alpha = alpha, model = model), backbone_args) - ) - - if (inherits(backbone_graph, "igraph")) { - edges_of_the_year <- igraph::as_data_frame( - backbone_graph, - what = "edges" - ) - } else if (!is.null(backbone_graph$backbone)) { - edges_of_the_year <- igraph::as_data_frame( - backbone_graph$backbone, - what = "edges" - ) - } else { - stop("The backbone function returned an unexpected object type.") - } - - edges_of_the_year <- data.table::as.data.table(edges_of_the_year) - edges_of_the_year[, from := sub("^A:", "", from)] - edges_of_the_year[, to := sub("^A:", "", to)] - edges_of_the_year[, from := sub("^B:", "", from)] - edges_of_the_year[, to := sub("^B:", "", to)] - }, - error = function(e) { - stop( - "The backbone function failed with an error. Read the backbone documentation for more information. 
Error message: ", - e$message - ) - } - ) - } - - # coupling - if (backbone_method == "structured") { - biblio_functions <- - data.table::data.table( - biblio_function = c( - rlang::expr(biblionetwork::biblio_coupling), - rlang::expr(biblionetwork::coupling_strength), - rlang::expr(biblionetwork::coupling_similarity) - ), - method = c( - "coupling_angle", - "coupling_strength", - "coupling_similarity" - ) - ) - - biblio_function <- biblio_functions[method == coupling_measure][[ - "biblio_function" - ]][[1]] - - # evaluate the expression and catch internal errors to biblionetwork package - - tryCatch( - { - edges_for_biblio <- data.table::copy(edges_of_the_year) - data.table::setnames( - edges_for_biblio, - c("from", "to"), - c(source_id, target_id) - ) - edges_of_the_year <- - rlang::expr((!!biblio_function)( - dt = edges_for_biblio, - source = rlang::inject(source_id), - ref = rlang::inject(target_id), - weight_threshold = rlang::inject(edges_threshold) - )) %>% - eval() - }, - error = function(e) { - stop( - "The coupling function failed with an error. Read the biblionetwork documentation for more information. Error message: ", - e$message - ) - } - ) - } - - # remove nodes with no edges - if (keep_singleton == FALSE) { - nodes_of_the_year <- nodes_of_the_year[ - source_id %in% - edges_of_the_year$from | - source_id %in% edges_of_the_year$to, - env = list(source_id = source_id) - ] - } - - # make tbl - if (length(all_years) == 1) { - tbl_coup_list <- tidygraph::tbl_graph( - nodes = nodes_of_the_year, - edges = edges_of_the_year, - directed = FALSE, - node_key = source_id - ) - } else { - tbl_coup_list[[paste0(year, "-", year + time_window - 1)]] <- - tidygraph::tbl_graph( - nodes = nodes_of_the_year, - edges = edges_of_the_year, - directed = FALSE, - node_key = source_id - ) - } - } - - if (filter_components == TRUE) { - tbl_coup_list <- filter_components(tbl_coup_list, ...) 
- } - return(tbl_coup_list) -} diff --git a/R/color_networks.R b/R/color_networks.R index 73ecb80..759fe07 100644 --- a/R/color_networks.R +++ b/R/color_networks.R @@ -58,18 +58,16 @@ #' @examples #' library(networkflow) #' - #' nodes <- Nodes_stagflation |> - #' dplyr::rename(ID_Art = ItemID_Ref) |> - #' dplyr::filter(Type == "Stagflation") + #' nodes <- networkflow::Nodes_stagflation |> + #' dplyr::filter(source_type == "Stagflation") #' - #' references <- Ref_stagflation |> - #' dplyr::rename(ID_Art = Citing_ItemID_Ref) + #' references <- networkflow::Ref_stagflation #' #' temporal_networks <- build_dynamic_networks(nodes = nodes, #' directed_edges = references, - #' source_id = "ID_Art", - #' target_id = "ItemID_Ref", - #' time_variable = "Year", + #' source_id = "source_id", + #' target_id = "target_id", + #' time_variable = "source_year", #' cooccurrence_method = "coupling_similarity", #' time_window = 20, #' edges_threshold = 1, @@ -272,3 +270,4 @@ color_alluvial <- function(alluv_dt, return(alluv_dt) } + diff --git a/R/dynamic_network_cooccurrence.R b/R/dynamic_network_cooccurrence.R index 6ff0705..2e0a8d3 100644 --- a/R/dynamic_network_cooccurrence.R +++ b/R/dynamic_network_cooccurrence.R @@ -94,18 +94,16 @@ dynamic_network_cooccurrence <- function(nodes = NULL, #' of tidygraph networks, for each time window. 
 #'
 #' @examples
- #' nodes <- Nodes_stagflation |>
- #' dplyr::rename(ID_Art = ItemID_Ref) |>
- #' dplyr::filter(Type == "Stagflation")
- #'
- #' references <- Ref_stagflation |>
- #' dplyr::rename(ID_Art = Citing_ItemID_Ref)
- #'
- #' temporal_networks <- dynamic_network_cooccurrence(nodes = nodes,
- #' directed_edges = references,
- #' source_column = "ID_Art",
- #' target_column = "ItemID_Ref",
- #' time_variable = "Year",
+#' nodes <- networkflow::Nodes_stagflation |>
+#' dplyr::filter(source_type == "Stagflation")
+#'
+#' references <- networkflow::Ref_stagflation
+#'
+#' temporal_networks <- dynamic_network_cooccurrence(nodes = nodes,
+#' directed_edges = references,
+#' source_column = "source_id",
+#' target_column = "target_id",
+#' time_variable = "source_year",
 #' cooccurrence_method = "coupling_similarity",
 #' time_window = NULL,
 #' edges_threshold = 1,
diff --git a/R/extract_tfidf.R b/R/extract_tfidf.R
index 11eb5fd..1b5e69c 100644
--- a/R/extract_tfidf.R
+++ b/R/extract_tfidf.R
@@ -94,18 +94,16 @@
 #' the top of your grouping variables.
#' #' @examples -#' nodes <- Nodes_stagflation |> -#' dplyr::rename(ID_Art = ItemID_Ref) |> -#' dplyr::filter(Type == "Stagflation") +#' nodes <- networkflow::Nodes_stagflation |> +#' dplyr::filter(source_type == "Stagflation") #' -#' references <- Ref_stagflation |> -#' dplyr::rename(ID_Art = Citing_ItemID_Ref) +#' references <- networkflow::Ref_stagflation #' #' temporal_networks <- build_dynamic_networks(nodes = nodes, #' directed_edges = references, -#' source_id = "ID_Art", -#' target_id = "ItemID_Ref", -#' time_variable = "Year", +#' source_id = "source_id", +#' target_id = "target_id", +#' time_variable = "source_year", #' cooccurrence_method = "coupling_similarity", #' time_window = 10, #' edges_threshold = 1, @@ -119,10 +117,10 @@ #' library(stopwords) #' tfidf <- extract_tfidf(temporal_networks, #' n_gram = 4, -#' text_columns = "Title", +#' text_columns = "source_title", #' grouping_columns = "cluster_leiden", #' grouping_across_list = TRUE, -#' clean_word_method = "lemmatise") +#' clean_word_method = "lemmatize") #' #' tfidf[[1]] #' @@ -205,3 +203,4 @@ extract_tfidf <- function(data, return(term_list) } + diff --git a/R/intertemporal_cluster_naming.R b/R/intertemporal_cluster_naming.R index 29079f5..cd9840f 100644 --- a/R/intertemporal_cluster_naming.R +++ b/R/intertemporal_cluster_naming.R @@ -58,20 +58,18 @@ intertemporal_cluster_naming <- function(list_graph = NA, #' @examples #' library(biblionetwork) #' library(magrittr) - #' library(tidygraph) - #' - #' nodes <- Nodes_stagflation %>% - #' dplyr::rename(ID_Art = ItemID_Ref) %>% - #' dplyr::filter(Type == "Stagflation") - #' - #' references <- Ref_stagflation %>% - #' dplyr::rename(ID_Art = Citing_ItemID_Ref) - #' - #' temporal_networks <- dynamic_network_cooccurrence(nodes = nodes, - #' directed_edges = references, - #' source_column = "ID_Art", - #' target_column = "ItemID_Ref", - #' time_variable = "Year", +#' library(tidygraph) +#' +#' nodes <- networkflow::Nodes_stagflation %>% +#' 
dplyr::filter(source_type == "Stagflation") +#' +#' references <- networkflow::Ref_stagflation +#' +#' temporal_networks <- dynamic_network_cooccurrence(nodes = nodes, +#' directed_edges = references, +#' source_column = "source_id", +#' target_column = "target_id", +#' time_variable = "source_year", #' cooccurrence_method = "coupling_similarity", #' time_window = 15, #' edges_threshold = 1, @@ -83,11 +81,11 @@ intertemporal_cluster_naming <- function(list_graph = NA, #' function(tbl) tbl %N>% #' mutate(clusters = tidygraph::group_louvain())) #' - #' intertemporal_cluster_naming(temporal_networks, - #' cluster_column = "clusters", - #' node_key = "ID_Art", - #' threshold_similarity = 0.51, - #' similarity_type = "partial") +#' intertemporal_cluster_naming(temporal_networks, +#' cluster_column = "clusters", +#' node_key = "source_id", +#' threshold_similarity = 0.51, +#' similarity_type = "partial") #' #' @export diff --git a/R/launch_network_app.R b/R/launch_network_app.R index ec3e0c7..96ac048 100644 --- a/R/launch_network_app.R +++ b/R/launch_network_app.R @@ -30,18 +30,18 @@ #' library(networkflow) #' library(dplyr) #' -#' nodes <- Nodes_stagflation |> -#' dplyr::filter(Type == "Stagflation") |> -#' dplyr::rename(ID_Art = ItemID_Ref) +#' nodes <- networkflow::Nodes_stagflation |> +#' dplyr::filter(source_type == "Stagflation") |> +#' dplyr::mutate(source_id = as.character(source_id)) #' -#' references <- Ref_stagflation |> -#' dplyr::rename(ID_Art = Citing_ItemID_Ref) +#' references <- networkflow::Ref_stagflation #' #' g <- build_network( #' nodes = nodes, #' directed_edges = references, -#' source_id = "ID_Art", -#' target_id = "ItemID_Ref", +#' source_id = "source_id", +#' target_id = "target_id", +#' projection_method = "structured", #' cooccurrence_method = "coupling_similarity", #' edges_threshold = 1, #' compute_size = FALSE, @@ -58,10 +58,10 @@ #' launch_network_app( #' graph_tbl = g, #' cluster_id = "cluster_leiden", -#' cluster_information = 
c("Author", "Title", "Year", "Journal"), +#' cluster_information = c("source_author", "source_title", "source_year", "source_journal"), #' cluster_tooltip = "Cluster", -#' node_id = "ID_Art", -#' node_tooltip = "Author_date", +#' node_id = "source_id", +#' node_tooltip = "source_label", #' node_size = NULL, #' color = NULL, #' layout = "kk" @@ -72,9 +72,9 @@ #' g_list <- build_dynamic_networks( #' nodes = nodes, #' directed_edges = references, -#' source_id = "ID_Art", -#' target_id = "ItemID_Ref", -#' time_variable = "Year", +#' source_id = "source_id", +#' target_id = "target_id", +#' time_variable = "source_year", #' time_window = 20, #' cooccurrence_method = "coupling_similarity", #' edges_threshold = 1, @@ -93,9 +93,9 @@ #' launch_network_app( #' graph_tbl = g_list, #' cluster_id = "cluster_leiden", -#' cluster_information = c("Author", "Title", "Year", "Journal"), -#' node_id = "ID_Art", -#' node_tooltip = "Author_date", +#' cluster_information = c("source_author", "source_title", "source_year", "source_journal"), +#' node_id = "source_id", +#' node_tooltip = "source_label", #' node_size = NULL, #' color = NULL, #' layout = "kk" diff --git a/R/layout_networks.R b/R/layout_networks.R index fa62c76..41e4756 100644 --- a/R/layout_networks.R +++ b/R/layout_networks.R @@ -51,18 +51,16 @@ #' @examples #' library(networkflow) #' -#' nodes <- Nodes_stagflation |> -#' dplyr::rename(ID_Art = ItemID_Ref) |> -#' dplyr::filter(Type == "Stagflation") +#' nodes <- networkflow::Nodes_stagflation |> +#' dplyr::filter(source_type == "Stagflation") #' -#' references <- Ref_stagflation |> -#' dplyr::rename(ID_Art = Citing_ItemID_Ref) +#' references <- networkflow::Ref_stagflation #' #' temporal_networks <- build_dynamic_networks(nodes = nodes, #' directed_edges = references, -#' source_id = "ID_Art", -#' target_id = "ItemID_Ref", -#' time_variable = "Year", +#' source_id = "source_id", +#' target_id = "target_id", +#' time_variable = "source_year", #' cooccurrence_method = 
"coupling_similarity", #' time_window = 20, #' edges_threshold = 1, @@ -70,7 +68,7 @@ #' filter_components = TRUE) #' #' temporal_networks <- layout_networks(temporal_networks, -#' node_id = "ID_Art", +#' node_id = "source_id", #' layout = "fr", #' compute_dynamic_coordinates = TRUE) #' @@ -184,3 +182,4 @@ join_coordinates <- function(graphs, } return(graphs) } + diff --git a/R/merge_dynamic_clusters.R b/R/merge_dynamic_clusters.R index 798d8cf..d7b77f1 100644 --- a/R/merge_dynamic_clusters.R +++ b/R/merge_dynamic_clusters.R @@ -57,18 +57,16 @@ #' @examples #' library(networkflow) #' -#' nodes <- Nodes_stagflation |> -#' dplyr::rename(ID_Art = ItemID_Ref) |> -#' dplyr::filter(Type == "Stagflation") +#' nodes <- networkflow::Nodes_stagflation |> +#' dplyr::filter(source_type == "Stagflation") #' -#' references <- Ref_stagflation |> -#' dplyr::rename(ID_Art = Citing_ItemID_Ref) +#' references <- networkflow::Ref_stagflation #' #' temporal_networks <- build_dynamic_networks(nodes = nodes, #' directed_edges = references, -#' source_id = "ID_Art", -#' target_id = "ItemID_Ref", -#' time_variable = "Year", +#' source_id = "source_id", +#' target_id = "target_id", +#' time_variable = "source_year", #' cooccurrence_method = "coupling_similarity", #' time_window = 10, #' edges_threshold = 1, @@ -81,7 +79,7 @@ #' #' temporal_networks <- merge_dynamic_clusters(temporal_networks, #' cluster_id = "cluster_leiden", -#' node_id = "ID_Art", +#' node_id = "source_id", #' threshold_similarity = 0.51, #' similarity_type = "partial") #' @@ -259,3 +257,4 @@ add_dynamic_cluster_to_edges <- function(graph, graph <- graph %E>% dplyr::left_join(cluster_correspondance, by = cluster_id) } + diff --git a/R/name_clusters.R b/R/name_clusters.R index e155d6a..c936fb7 100644 --- a/R/name_clusters.R +++ b/R/name_clusters.R @@ -91,18 +91,16 @@ #' @examples #' library(networkflow) #' -#' nodes <- Nodes_stagflation |> -#' dplyr::rename(ID_Art = ItemID_Ref) |> -#' dplyr::filter(Type == "Stagflation") 
+#' nodes <- networkflow::Nodes_stagflation |> +#' dplyr::filter(source_type == "Stagflation") #' -#' references <- Ref_stagflation |> -#' dplyr::rename(ID_Art = Citing_ItemID_Ref) +#' references <- networkflow::Ref_stagflation #' #' temporal_networks <- build_dynamic_networks(nodes = nodes, #' directed_edges = references, -#' source_id = "ID_Art", -#' target_id = "ItemID_Ref", -#' time_variable = "Year", +#' source_id = "source_id", +#' target_id = "target_id", +#' time_variable = "source_year", #' cooccurrence_method = "coupling_similarity", #' time_window = 20, #' edges_threshold = 1, @@ -121,7 +119,7 @@ #' method = "tidygraph_functions", #' name_merged_clusters = FALSE, #' cluster_id = "cluster_leiden", -#' label_columns = c("Author", "Year"), +#' label_columns = c("source_author", "source_year"), #' tidygraph_function = tidygraph::centrality_pagerank()) #' #' temporal_networks_with_names[[1]] @@ -130,7 +128,7 @@ #' #' temporal_networks <- merge_dynamic_clusters(temporal_networks, #' cluster_id = "cluster_leiden", -#' node_id = "ID_Art", +#' node_id = "source_id", #' threshold_similarity = 0.51, #' similarity_type = "partial") #' @@ -138,9 +136,9 @@ #' method = "tf-idf", #' name_merged_clusters = TRUE, #' cluster_id = "dynamic_cluster_leiden", -#' text_columns = "Title", +#' text_columns = "source_title", #' nb_terms_label = 5, -#' clean_word_method = "lemmatise") +#' clean_word_method = "lemmatize") #' #' temporal_networks_with_names[[1]] #' @@ -262,3 +260,4 @@ name_clusters <- function(graphs, return(graphs) } + diff --git a/R/networks_to_alluv.R b/R/networks_to_alluv.R index a9439c5..e9128ab 100644 --- a/R/networks_to_alluv.R +++ b/R/networks_to_alluv.R @@ -55,18 +55,16 @@ networks_to_alluv <- function(graphs, #' @examples #' library(networkflow) #' - #' nodes <- Nodes_stagflation |> - #' dplyr::rename(ID_Art = ItemID_Ref) |> - #' dplyr::filter(Type == "Stagflation") + #' nodes <- networkflow::Nodes_stagflation |> + #' dplyr::filter(source_type == 
"Stagflation") #' - #' references <- Ref_stagflation |> - #' dplyr::rename(ID_Art = Citing_ItemID_Ref) + #' references <- networkflow::Ref_stagflation #' #' temporal_networks <- build_dynamic_networks(nodes = nodes, #' directed_edges = references, - #' source_id = "ID_Art", - #' target_id = "ItemID_Ref", - #' time_variable = "Year", + #' source_id = "source_id", + #' target_id = "target_id", + #' time_variable = "source_year", #' cooccurrence_method = "coupling_similarity", #' time_window = 20, #' edges_threshold = 1, @@ -81,7 +79,7 @@ networks_to_alluv <- function(graphs, #' #' temporal_networks <- merge_dynamic_clusters(temporal_networks, #' cluster_id = "cluster_leiden", - #' node_id = "ID_Art", + #' node_id = "source_id", #' threshold_similarity = 0.51, #' similarity_type = "partial") #' @@ -89,9 +87,9 @@ networks_to_alluv <- function(graphs, #' method = "tf-idf", #' name_merged_clusters = TRUE, #' cluster_id = "dynamic_cluster_leiden", - #' text_columns = "Title", + #' text_columns = "source_title", #' nb_terms_label = 5, - #' clean_word_method = "lemmatise") + #' clean_word_method = "lemmatize") #' #' temporal_networks <- color_networks(graphs = temporal_networks, #' column_to_color = "dynamic_cluster_leiden", @@ -99,7 +97,7 @@ networks_to_alluv <- function(graphs, #' #' alluv_dt <- networks_to_alluv(temporal_networks, #' intertemporal_cluster_column = "dynamic_cluster_leiden", - #' node_id = "ID_Art") + #' node_id = "source_id") #' #' alluv_dt[1:5] #' @@ -155,3 +153,4 @@ networks_to_alluv <- function(graphs, return (alluv_dt) } + diff --git a/R/plot_networks.R b/R/plot_networks.R index cfb01e0..4dc7972 100644 --- a/R/plot_networks.R +++ b/R/plot_networks.R @@ -172,8 +172,8 @@ plot_network <- function(graph, graph <- graph %E>% dplyr::mutate(weight = 1) } - if(! node_size_column %in% colnames(graph %N>% as.data.frame()) | is.null(node_size_column)){ - cli::cli_alert_info("No column `weight` found in edges data. 
All weight will equal 1.") + if (is.null(node_size_column) || !node_size_column %in% colnames(graph %N>% as.data.frame())) { + cli::cli_alert_info("No `node_size_column` found in node data. All node sizes will be set to 1.") graph <- graph %N>% dplyr::mutate(node_size = 1) node_size_column <- "node_size" diff --git a/README.Rmd b/README.Rmd index b5cb0e8..d577fbc 100644 --- a/README.Rmd +++ b/README.Rmd @@ -3,10 +3,6 @@ output: github_document: toc: false toc_depth: 3 - -### -### Bibliography settings -### bibliography: ./inst/REFERENCES.bib csl: ./inst/chicago-author-date.csl suppress-bibliography: false @@ -31,32 +27,34 @@ knitr::opts_chunk$set( [![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental) -The goal of networkflow (a workflow for networks) is to propose a series of functions to make -it easier and quicker to manipulats networks. It mainly targets working on bibliometric networks -(see the [biblionetwork](https://github.com/agoutsmedt/biblionetwork) package for creating such networks). -This package heavily relies on [igraph](https://igraph.org/r/) and [tidygraph](https://tidygraph.data-imaginist.com/index.html), -and aims at producing ready-made networks for projecting them using [ggraph](https://ggraph.data-imaginist.com/). -This package aims at helping the users to follow more quickly and easily the main steps of network manipulation, -from creating the graph, through detecting clusters, to projecting it. Please see -`vignette("workflow-network")` for details on the workflow for dealing with a unique -network. +`networkflow` provides a complete workflow to build, structure, and explore +networks from tabular data. -Networkflow also proposes a worfklow to deal with a list of networks, in order to develop a -dynamic analysis. It implements a method to merge clusters across successive networks, -to identify inter-temporal clusters. 
It also develops corresponding visualisations to -display the evolution of clusters across networks. `vignette("exploring_dynamic_networks")` -gives an example of the workflow for dynamic networks. You can also find illustrations for this -method in ["An Independent European Macroeconomics? A History of European Macroeconomics -through the Lens of the European Economic Review](https://aurelien-goutsmedt.com/publication/eer-history/). +Its key feature is a built-in dynamic analysis workflow: the package can +build networks across time windows, detect clusters in each window, and link +clusters across periods to track their evolution. +More broadly, `networkflow` supports the full analysis pipeline, from network +construction to interpretation and visualization, including clustering, layout +and color preparation, static plotting, and interactive exploration with a +Shiny app. +The package was developed with projected networks in mind (for example, +article -> reference), but it can also be used more generally once data are +represented as `tbl_graph` objects. -You can cite this package as: +The package includes: -```{r} -citation("networkflow") -``` +- network construction (`build_network()`, `build_dynamic_networks()`), +- clustering and inter-temporal matching (`add_clusters()`, `merge_dynamic_clusters()`), +- interpretation (`name_clusters()`, `extract_tfidf()`), +- visualization (`layout_networks()`, `color_networks()`, `plot_networks()`), +- interactive exploration (`launch_network_app()`). 
+For a full walkthrough, see: + +- `vignette("networkflow_presentation")` +- https://agoutsmedt.github.io/networkflow/ ## Installation @@ -67,3 +65,41 @@ install.packages("devtools") devtools::install_github("agoutsmedt/networkflow") ``` +## Quick start + +```{r eval = FALSE} +library(networkflow) + +nodes <- subset(Nodes_stagflation, source_type == "Stagflation") +references <- Ref_stagflation + +g <- build_network( + nodes = nodes, + directed_edges = references, + source_id = "source_id", + target_id = "target_id", + projection_method = "structured", + cooccurrence_method = "coupling_similarity", + edges_threshold = 1, + keep_singleton = FALSE +) + +g <- add_clusters( + graphs = g, + clustering_method = "leiden", + objective_function = "modularity", + seed = 123 +) + +g <- layout_networks(g, node_id = "source_id", layout = "kk") +g <- color_networks(g, column_to_color = "cluster_leiden") + +plot_networks( + graphs = g, + x = "x", + y = "y", + cluster_label_column = "cluster_leiden", + node_size_column = NULL, + color_column = "color" +) +``` diff --git a/README.html b/README.html new file mode 100644 index 0000000..bd1dc19 --- /dev/null +++ b/README.html @@ -0,0 +1,683 @@ + + + + + + + + + + + + + + + + + + + + + +

+[683 lines of rendered HTML omitted: README.html duplicates the README.md content below]
+ + + diff --git a/README.md b/README.md index 2bd3eef..52d921c 100644 --- a/README.md +++ b/README.md @@ -10,52 +10,36 @@ experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental) -The goal of networkflow (a workflow for networks) is to propose a series -of functions to make it easier and quicker to manipulats networks. It -mainly targets working on bibliometric networks (see the -[biblionetwork](https://github.com/agoutsmedt/biblionetwork) package for -creating such networks). This package heavily relies on -[igraph](https://igraph.org/r/) and -[tidygraph](https://tidygraph.data-imaginist.com/index.html), and aims -at producing ready-made networks for projecting them using -[ggraph](https://ggraph.data-imaginist.com/). This package aims at -helping the users to follow more quickly and easily the main steps of -network manipulation, from creating the graph, through detecting -clusters, to projecting it. Please see `vignette("workflow-network")` -for details on the workflow for dealing with a unique network. - -Networkflow also proposes a worfklow to deal with a list of networks, in -order to develop a dynamic analysis. It implements a method to merge -clusters across successive networks, to identify inter-temporal -clusters. It also develops corresponding visualisations to display the -evolution of clusters across networks. -`vignette("exploring_dynamic_networks")` gives an example of the -workflow for dynamic networks. You can also find illustrations for this -method in [“An Independent European Macroeconomics? A History of -European Macroeconomics through the Lens of the European Economic -Review](https://aurelien-goutsmedt.com/publication/eer-history/). - -You can cite this package as: +`networkflow` provides a complete workflow to build, structure, and +explore networks from tabular data. 
-``` r -citation("networkflow") -#> -#> Pour citer le package 'networkflow' dans une publication, utilisez : -#> -#> Goutsmedt A, Truc A (2022). _networkflow: Functions For A Workflow To -#> Manipulate Networks_. https://github.com/agoutsmedt/networkflow, -#> https://agoutsmedt.github.io/networkflow/. -#> -#> Une entrée BibTeX pour les utilisateurs LaTeX est -#> -#> @Manual{, -#> title = {networkflow: Functions For A Workflow To Manipulate Networks}, -#> author = {Aurélien Goutsmedt and Alexandre Truc}, -#> year = {2022}, -#> note = {https://github.com/agoutsmedt/networkflow, -#> https://agoutsmedt.github.io/networkflow/}, -#> } -``` +Its key feature is a built-in dynamic analysis workflow: the package can +build networks across time windows, detect clusters in each window, and +link clusters across periods to track their evolution. + +More broadly, `networkflow` supports the full analysis pipeline, from +network construction to interpretation and visualization, including +clustering, layout and color preparation, static plotting, and +interactive exploration with a Shiny app. + +The package was developed with projected networks in mind (for example, +article -\> reference), but it can also be used more generally once data +are represented as `tbl_graph` objects. + +The package includes: + +- network construction (`build_network()`, `build_dynamic_networks()`), +- clustering and inter-temporal matching (`add_clusters()`, + `merge_dynamic_clusters()`), +- interpretation (`name_clusters()`, `extract_tfidf()`), +- visualization (`layout_networks()`, `color_networks()`, + `plot_networks()`), +- interactive exploration (`launch_network_app()`). 
+
+For a full walkthrough, see:
+
+- `vignette("networkflow_presentation")`
+- 
 
 ## Installation
 
@@ -66,3 +50,42 @@ You can install the development version from
 install.packages("devtools")
 devtools::install_github("agoutsmedt/networkflow")
 ```
+
+## Quick start
+
+``` r
+library(networkflow)
+
+nodes <- subset(Nodes_stagflation, source_type == "Stagflation")
+references <- Ref_stagflation
+
+g <- build_network(
+  nodes = nodes,
+  directed_edges = references,
+  source_id = "source_id",
+  target_id = "target_id",
+  projection_method = "structured",
+  cooccurrence_method = "coupling_similarity",
+  edges_threshold = 1,
+  keep_singleton = FALSE
+)
+
+g <- add_clusters(
+  graphs = g,
+  clustering_method = "leiden",
+  objective_function = "modularity",
+  seed = 123
+)
+
+g <- layout_networks(g, node_id = "source_id", layout = "kk")
+g <- color_networks(g, column_to_color = "cluster_leiden")
+
+plot_networks(
+  graphs = g,
+  x = "x",
+  y = "y",
+  cluster_label_column = "cluster_leiden",
+  node_size_column = NULL,
+  color_column = "color"
+)
+```
diff --git a/TODOLIST_DEV.md b/TODOLIST_DEV.md
new file mode 100644
index 0000000..04bcd5a
--- /dev/null
+++ b/TODOLIST_DEV.md
@@ -0,0 +1,17 @@
+# Todolist - Push to dev
+
+Here is the evaluation from GPT, which has access to the full package.
+
+The package is generally well structured (clear API, coherent workflow, docs in place), but several important quality points remain before a truly robust release.
+
+## Main points raised
+
+- No test suite (`tests/` is absent): high risk of regressions.
+- Inconsistent CI: the README displays an `R-CMD-check` badge, but the corresponding workflow is not present in `.github/workflows/` (only `pkgdown`).
+- Deprecation inconsistencies:
+  - `filter_components()` is documented as "deprecated" but carries an `experimental` badge.
+  - `tbl_main_component()` announces a replacement, `extract_main_component()`, which does not exist.
+- `NEWS.md` mentions `layout_clusters()`, but that function does not exist in `R/`.
+- Minor technical debt:
+  - `mixcolor()` is duplicated across two files,
+  - `rename_at()` is still used (a superseded `dplyr` function).
diff --git a/_pkgdown.yml b/_pkgdown.yml
index 2dcb569..fc605b4 100644
--- a/_pkgdown.yml
+++ b/_pkgdown.yml
@@ -10,15 +10,18 @@ reference:
     Functions for creating one or multiple networks and to filter
     the networks.
   contents:
   - build_dynamic_networks
+  - build_network
   - filter_components
 
 - title: "Step 2: Clustering"
   desc: >
-    Functions for detected clusters and manipulate them.
+    Functions for detecting clusters and interpreting them.
   contents:
   - add_clusters
   - merge_dynamic_clusters
   - name_clusters
+  - add_node_roles
+  - extract_tfidf
 
 - title: "Step 3: Plot networks"
   desc: >
@@ -31,16 +34,12 @@ reference:
   - prepare_label_networks
   - plot_alluvial
   - plot_networks
+  - launch_network_app
 
-- title: "Step 4: Analysis networks"
-  desc: >
-    Functions for exploring the content of the networks.
- contents: - - extract_tfidf - title: "Included datasets" desc: "Data on articles about stagflation from Goutsmedt (2021) 'From the Stagflation to the Great Inflation'" -- contents: + contents: - contains("stagflation") - ends_with("coupling") diff --git a/data/Authors_stagflation.rda b/data/Authors_stagflation.rda index f527d63..0debf64 100644 Binary files a/data/Authors_stagflation.rda and b/data/Authors_stagflation.rda differ diff --git a/data/Nodes_coupling.rda b/data/Nodes_coupling.rda index 302a2f2..5b2f7a4 100644 Binary files a/data/Nodes_coupling.rda and b/data/Nodes_coupling.rda differ diff --git a/data/Nodes_stagflation.rda b/data/Nodes_stagflation.rda index f5ca194..3c50833 100644 Binary files a/data/Nodes_stagflation.rda and b/data/Nodes_stagflation.rda differ diff --git a/data/Ref_stagflation.rda b/data/Ref_stagflation.rda index 3e14586..563bf67 100644 Binary files a/data/Ref_stagflation.rda and b/data/Ref_stagflation.rda differ diff --git a/inst/data-raw/creating_network_data.R b/inst/data-raw/creating_network_data.R index a56b442..5cf49b1 100644 --- a/inst/data-raw/creating_network_data.R +++ b/inst/data-raw/creating_network_data.R @@ -2,13 +2,13 @@ library(biblionetwork) library(data.table) Nodes_coupling <- as.data.table(Nodes_stagflation) -Nodes_coupling <- Nodes_coupling[Type == "Stagflation" & ItemID_Ref %in% Ref_stagflation$Citing_ItemID_Ref] -Nodes_coupling$ItemID_Ref <- as.character(Nodes_coupling$ItemID_Ref) -Nodes_coupling <- Nodes_coupling[,-"Type"] +Nodes_coupling <- Nodes_coupling[source_type == "Stagflation" & source_id %in% Ref_stagflation$source_id] +Nodes_coupling$source_id <- as.character(Nodes_coupling$source_id) +Nodes_coupling <- Nodes_coupling[,-"source_type"] -Edges_coupling <- biblio_coupling(Ref_stagflation, "Citing_ItemID_Ref", "ItemID_Ref") -Edges_coupling <- Edges_coupling[from %in% Nodes_coupling$ItemID_Ref] -Edges_coupling <- Edges_coupling[to %in% Nodes_coupling$ItemID_Ref] +Edges_coupling <- 
biblio_coupling(Ref_stagflation, "source_id", "target_id") +Edges_coupling <- Edges_coupling[from %in% Nodes_coupling$source_id] +Edges_coupling <- Edges_coupling[to %in% Nodes_coupling$source_id] use_data(Nodes_coupling, overwrite = TRUE) use_data(Edges_coupling, overwrite = TRUE) diff --git a/man/Authors_stagflation.Rd b/man/Authors_stagflation.Rd index 14abb5b..8d0b5b4 100644 --- a/man/Authors_stagflation.Rd +++ b/man/Authors_stagflation.Rd @@ -5,11 +5,11 @@ \alias{Authors_stagflation} \title{List Of Authors Of The Articles and Books Explaining the 1970s US Stagflation.} \format{ -A data frame with 558 rows and 7 variables: +A data frame with 231 rows and 3 variables: \describe{ -\item{ItemID_Ref}{Identifier of the document published by the author} -\item{Author}{Author of the document} -\item{Order}{Use this as a label for nodes} +\item{source_id}{Identifier of the document published by the author} +\item{author_name}{Author of the document} +\item{author_order}{Author order in the document author list} } } \source{ diff --git a/man/Edges_coupling.Rd b/man/Edges_coupling.Rd index a159d72..d89cc03 100644 --- a/man/Edges_coupling.Rd +++ b/man/Edges_coupling.Rd @@ -5,7 +5,7 @@ \alias{Edges_coupling} \title{Edges For Bibliographic Coupling Network Of Articles and Books Explaining the 1970s US Stagflation.} \format{ -A data frame with 154 rows and 6 variables: +A data frame with 2593 rows and 5 variables: \describe{ \item{from}{Identifier of the Source document on stagflation, in character format} \item{to}{Identifier of the Target document on stagflation, in character format} @@ -22,7 +22,7 @@ Edges_coupling } \description{ A dataset containing the edges of the bibliographic coupling network of articles and books on stagflation. -Built by using \link{Ref_stagflation}: \code{biblionetwork::biblio_coupling(Ref_stagflation,"Citing_ItemID_Ref","ItemID_Ref")}. 
+Built by using \link{Ref_stagflation}: \code{biblionetwork::biblio_coupling(Ref_stagflation,"source_id","target_id")}. Could be used with \link{Nodes_coupling} to create a network with tidygraph. } \keyword{datasets} diff --git a/man/Nodes_coupling.Rd b/man/Nodes_coupling.Rd index c7eeebb..fecea97 100644 --- a/man/Nodes_coupling.Rd +++ b/man/Nodes_coupling.Rd @@ -7,12 +7,12 @@ \format{ A data frame with 154 rows and 6 variables: \describe{ -\item{ItemID_Ref}{Identifier of the document on stagflation, in character format} -\item{Author}{Author of the document on stagflation} -\item{Author_date}{Use this as a label for nodes} -\item{Year}{Year of publication of the document} -\item{Title}{Title of the document} -\item{Journal}{Journal of publication of the document (if an article)} +\item{source_id}{Identifier of the document on stagflation, in character format} +\item{source_author}{Author of the document on stagflation} +\item{source_label}{Use this as a label for nodes} +\item{source_year}{Year of publication of the document} +\item{source_title}{Title of the document} +\item{source_journal}{Journal of publication of the document (if an article)} } } \source{ diff --git a/man/Nodes_stagflation.Rd b/man/Nodes_stagflation.Rd index 0c44245..0a2ffef 100644 --- a/man/Nodes_stagflation.Rd +++ b/man/Nodes_stagflation.Rd @@ -5,15 +5,15 @@ \alias{Nodes_stagflation} \title{Articles and Books Explaining the 1970s US Stagflation.} \format{ -A data frame with 558 rows and 7 variables: +A data frame with 654 rows and 7 variables: \describe{ -\item{ItemID_Ref}{Identifier of the document} -\item{Author}{Author of the document} -\item{Author_date}{Use this as a label for nodes} -\item{Year}{Year of publication of the document} -\item{Title}{Title of the document} -\item{Journal}{Journal of publication of the document (if an article)} -\item{Type}{If "Stagflation", the document is listed as an explanation of the US stagflation. 
+\item{source_id}{Identifier of the document} +\item{source_author}{Author of the document} +\item{source_label}{Use this as a label for nodes} +\item{source_year}{Year of publication of the document} +\item{source_title}{Title of the document} +\item{source_journal}{Journal of publication of the document (if an article)} +\item{source_type}{If "Stagflation", the document is listed as an explanation of the US stagflation. If "Non-Stagflation", the document is cited by a document explaining the stagflation} } } diff --git a/man/Ref_stagflation.Rd b/man/Ref_stagflation.Rd index 2299363..aba643f 100644 --- a/man/Ref_stagflation.Rd +++ b/man/Ref_stagflation.Rd @@ -7,12 +7,12 @@ \format{ A data frame with 4416 rows and 6 variables: \describe{ -\item{Citing_ItemID_Ref}{Identifier of the citing document} -\item{ItemID_Ref}{Identifier of the cited document} -\item{Author}{Author of the cited document} -\item{Year}{Year of publication of the cited document} -\item{Title}{Title of the cited document} -\item{Journal}{Journal of publication of the cited document (if an article)} +\item{source_id}{Identifier of the citing document} +\item{target_id}{Identifier of the cited document} +\item{target_author}{Author of the cited document} +\item{target_year}{Year of publication of the cited document} +\item{target_title}{Title of the cited document} +\item{target_journal}{Journal of publication of the cited document (if an article)} } } \source{ diff --git a/man/add_clusters.Rd b/man/add_clusters.Rd index 3faa272..218acdc 100644 --- a/man/add_clusters.Rd +++ b/man/add_clusters.Rd @@ -105,7 +105,7 @@ for the edges, called \code{cluster_leiden_from}, \code{cluster_leiden_to} and \ The function also automatically calculates the percentage of total nodes that are gathered in each -cluster, in the column \code{size_com}. +cluster, in the column \verb{size_cluster_\{clustering_method\}}. 
To make plotting easier later, a zero is put before one-digit cluster identifier (cluster 5 becomes "05"; cluster 10 becomes "10"). Attributing a cluster identifier to edges @@ -115,18 +115,16 @@ or a different color from both nodes, if the nodes belong to different clusters. \examples{ library(networkflow) -nodes <- Nodes_stagflation |> -dplyr::rename(ID_Art = ItemID_Ref) |> -dplyr::filter(Type == "Stagflation") +nodes <- networkflow::Nodes_stagflation |> + dplyr::filter(source_type == "Stagflation") -references <- Ref_stagflation |> -dplyr::rename(ID_Art = Citing_ItemID_Ref) +references <- networkflow::Ref_stagflation temporal_networks <- build_dynamic_networks(nodes = nodes, directed_edges = references, -source_id = "ID_Art", -target_id = "ItemID_Ref", -time_variable = "Year", +source_id = "source_id", +target_id = "target_id", +time_variable = "source_year", cooccurrence_method = "coupling_similarity", time_window = 20, edges_threshold = 1, diff --git a/man/add_node_roles.Rd b/man/add_node_roles.Rd index 3c1dc84..a72688c 100644 --- a/man/add_node_roles.Rd +++ b/man/add_node_roles.Rd @@ -45,18 +45,16 @@ The \code{z_threshold} parameter can be adjusted to change the sensitivity of hu \examples{ library(networkflow) -nodes <- Nodes_stagflation |> -dplyr::rename(ID_Art = ItemID_Ref) |> -dplyr::filter(Type == "Stagflation") +nodes <- networkflow::Nodes_stagflation |> + dplyr::filter(source_type == "Stagflation") -references <- Ref_stagflation |> -dplyr::rename(ID_Art = Citing_ItemID_Ref) +references <- networkflow::Ref_stagflation temporal_networks <- build_dynamic_networks(nodes = nodes, directed_edges = references, -source_id = "ID_Art", -target_id = "ItemID_Ref", -time_variable = "Year", +source_id = "source_id", +target_id = "target_id", +time_variable = "source_year", cooccurrence_method = "coupling_similarity", time_window = 20, edges_threshold = 1, diff --git a/man/build_dynamic_networks.Rd b/man/build_dynamic_networks.Rd index 073c041..a987ea0 100644 --- 
a/man/build_dynamic_networks.Rd +++ b/man/build_dynamic_networks.Rd @@ -2,9 +2,7 @@ % Please edit documentation in R/build_dynamic_networks.R \name{build_dynamic_networks} \alias{build_dynamic_networks} -\alias{build_dynamic_networks2} -\alias{build_network} -\title{Creating One or Multiple Networks from a List of Nodes and Directed Edges} +\title{Build One or Multiple Networks from Bipartite Links} \usage{ build_dynamic_networks( nodes, @@ -13,27 +11,10 @@ build_dynamic_networks( target_id, time_variable = NULL, time_window = NULL, - cooccurrence_method = c("coupling_angle", "coupling_strength", "coupling_similarity"), - overlapping_window = FALSE, - edges_threshold = 1, - compute_size = FALSE, - keep_singleton = FALSE, - filter_components = FALSE, - ..., - verbose = TRUE -) - -build_dynamic_networks2( - nodes, - directed_edges, - source_id, - target_id, - time_variable = NULL, - time_window = NULL, - backbone_method = c("structured", "statistical"), + projection_method = c("structured", "statistical"), model = c("sdsm", "fdsm", "fixedfill", "fixedrow", "fixedcol"), alpha = NULL, - coupling_measure = c("coupling_angle", "coupling_strength", "coupling_similarity"), + cooccurrence_method = c("coupling_angle", "coupling_strength", "coupling_similarity"), edges_threshold = 1, overlapping_window = FALSE, compute_size = FALSE, @@ -43,139 +24,132 @@ build_dynamic_networks2( backbone_args = list(), verbose = TRUE ) - -build_network( - nodes, - directed_edges, - source_id, - target_id, - cooccurrence_method = c("coupling_angle", "coupling_strength", "coupling_similarity"), - edges_threshold = 1, - compute_size = FALSE, - keep_singleton = FALSE, - filter_components = FALSE, - ... -) } \arguments{ -\item{nodes}{The table with all the nodes and their metadata. For instance, if your nodes are -articles, this table is likely to contain the year of publication, the name of the authors, -the title of the article, etc... 
The table must have one row per node.} - -\item{directed_edges}{The table with of all the elements to which your nodes are connected. If your nodes are -articles, the \code{directed_edges} table can contain the list of the references cited -by these articles, the authors that have written these articles, or the affiliations -of the authors of these articles.} - -\item{source_id}{The quoted name of the column with the unique identifier of each node. For instance, -for a bibliographic coupling network, the id of your citing documents. It corresponds -to the \code{source} argument of \href{https://agoutsmedt.github.io/biblionetwork/}{biblionetwork} -functions.} - -\item{target_id}{The quoted name of the column with the unique identifier of each element connected to the node (for -instance, the identifier of the reference cited by your node if the node is an article). -It corresponds to the \code{ref} argument of -\href{https://agoutsmedt.github.io/biblionetwork/}{biblionetwork} functions.} - -\item{time_variable}{The column with the temporal variable you want to use to build your windows for the -succession of networks. By default, \code{time_variable} is \code{NULL} and the function -will only build one network without taking into account any temporal variable.} - -\item{time_window}{The length of your network relatively of the unity of the \code{time_variable} column. If you -use a variable in years as \code{time_variable} and you set \code{time_window} at 5, the function -will build network on five year windows. By default, \code{time_window} is \code{NULL} and the -function will only build one network.} - -\item{cooccurrence_method}{Choose a cooccurrence method to build your indirect edges table. 
The function propose -three methods that depends on the \href{https://agoutsmedt.github.io/biblionetwork/}{biblionetwork package} -and three methods that are implemented in it: -\itemize{ -\item the coupling angle measure (see \code{\link[biblionetwork:biblio_coupling]{biblionetwork::biblio_coupling()}} for documentation); -\item the coupling strength measure (\code{\link[biblionetwork:coupling_strength]{biblionetwork::coupling_strength()}}); -\item the coupling similarity measure (\code{\link[biblionetwork:coupling_similarity]{biblionetwork:: coupling_similarity()}}). -}} +\item{nodes}{Table of nodes and their metadata. One row per node. For example, a table +of articles with identifiers, authors, publication year, etc.} -\item{overlapping_window}{Set to \code{FALSE} by default. If set to \code{TRUE}, and if \code{time_variable} and \code{time_window} not -\code{NULL}, the function will create a succession of networks for moving time windows. The windows are -moving one unit per one unit of the \code{time_variable}. For instance, for years, if \code{time_window} -set to 5, it creates networks for successive time windows like 1970-1974, 1971-1975, 1972-1976, etc.} +\item{directed_edges}{Table of bipartite links between \code{source_id} nodes and +\code{target_id} entities (e.g., article -> reference, author -> paper).} -\item{edges_threshold}{Threshold value for building your edges. With a higher threshold, only the stronger links -will be kept. See the \href{https://agoutsmedt.github.io/biblionetwork/}{biblionetwork package} -documentation and the \code{cooccurrence_method} parameter.} +\item{source_id}{Quoted name of the source-side node identifier.} -\item{compute_size}{Set to \code{FALSE} by default. If \code{TRUE}, the function uses the \code{directed_edges} data -to calculate how many directed edges a node receives (as a target). 
If \code{directed_edges}
-is a table of direct citations, the functions calculates the number of time a node
-is cited by the other nodes. You need to have the \code{target_id} in the \code{nodes} table
-to make the link with the targetted nodes in the \code{directed_edges} table.}
+\item{target_id}{Quoted name of the target-side identifier linked to each source node.}
-\item{keep_singleton}{Set to \code{FALSE} by default. If \code{TRUE}, the function removes the nodes that have no
-undirected edges, i.e. no cooccurrence with any other nodes. In graphical terms,
-these nodes are alone in the network, with no link with other nodes.}
+\item{time_variable}{Optional name of the column with a temporal variable (e.g., publication year).}
-\item{filter_components}{Set to \code{TRUE} if you want to run \code{networkflow::filter_components()}
-to filter the components of the network(s) and keep only the biggest component(s). If
-you don't change the defaults parameters of \code{networkflow::filter_components()},
-it will keep only the main component.}
+\item{time_window}{Optional size of the time window (in units of \code{time_variable}) to construct temporal networks.}
-\item{...}{Additional arguments from \code{networkflow::filter_components()}.}
+\item{projection_method}{Method used to extract the network backbone. Choose between:
+\itemize{
+\item \code{"structured"}: uses cooccurrence measures from the \href{https://agoutsmedt.github.io/biblionetwork/}{biblionetwork} package;
+\item \code{"statistical"}: uses statistical models from the \href{https://github.com/zpneal/backbone}{backbone} package.
+Defaults to \code{"structured"}. The \code{"statistical"} method can be computationally slow on large networks.
+}}
-\item{verbose}{Set to \code{FALSE} if you don't want the function to display different sort of information.}
+\item{model}{Statistical null model from \href{https://github.com/zpneal/backbone}{backbone}:
+one of \code{"sdsm"}, \code{"fdsm"}, \code{"fixedfill"}, \code{"fixedrow"}, \code{"fixedcol"}.
+Required if \code{projection_method = "statistical"}.}
-\item{backbone_method}{Method used to extract the network backbone. Choose between:
+\item{alpha}{Significance threshold for statistical backbone filtering. Required if
+\code{projection_method = "statistical"}. Lower values keep fewer edges.}
+
+\item{cooccurrence_method}{For \code{projection_method = "structured"}, choose the coupling method:
 \itemize{
-\item \code{"structured"}: uses cooccurrence measures from the \href{https://agoutsmedt.github.io/biblionetwork/}{biblionetwork} package;
-\item \code{"statistical"}: uses statistical models from the \href{https://github.com/djmurphy533/backbone}{backbone} package.
+\item \code{"coupling_angle"};
+\item \code{"coupling_strength"};
+\item \code{"coupling_similarity"}.
 }}
-\item{alpha}{Significance threshold for statistical backbone extraction. Required if
-\code{backbone_method = "statistical"}.}
+\item{edges_threshold}{Threshold used to filter weak edges in structured mode.}
+
+\item{overlapping_window}{Logical. If \code{TRUE}, builds networks using rolling time windows.}
-\item{statistical_method}{For \code{backbone_method = "statistical"}, select the null model: one of
-\code{"sdsm"}, \code{"fdsm"}, \code{"fixedfill"}, \code{"fixedfrow"}, \code{"fixedcol"}.}
+\item{compute_size}{Logical. If \code{TRUE}, computes the number of incoming edges per node (e.g., citation count).}
+
+\item{keep_singleton}{Logical. If \code{FALSE}, removes nodes with no edges in the final network.}
+
+\item{filter_components}{Logical.
If \code{TRUE}, keeps only the main component(s) using \code{networkflow::filter_components()}.} + +\item{...}{Additional arguments passed to \code{filter_components()}.} + +\item{backbone_args}{Optional list of additional arguments passed to the +backbone extraction call. If \code{backbone_args} includes \code{alpha} or \code{model}, +those values override function arguments.} + +\item{verbose}{Logical. If \code{TRUE}, displays progress messages.} } \value{ -If \code{time_window} is \code{NULL}, the function computes only -one network and return a tidygraph object built with \link[tidygraph:tbl_graph]{tbl_graph()}. -If \code{time_variable} and \code{time_window} are not \code{NULL}, the function returns a list -of tidygraph networks, for each time window. +\itemize{ +\item A single tidygraph object if \code{time_window} is \code{NULL}; +\item A list of tidygraph objects (one per time window) otherwise. +} } \description{ \ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#experimental}{\figure{lifecycle-experimental.svg}{options: alt='[Experimental]'}}}{\strong{[Experimental]}} -\code{build_network()} creates a network from a table of nodes and its -directed edges. That is a special case of the more general \code{build_dynamic_networks()}. -This function creates one or several tibble graphs (built with -\href{https://tidygraph.data-imaginist.com/}{tidygraph}) from a table of nodes and its -directed edges. For instance, for bibliometric networks, you can give a list of -articles and the list of the references these articles cite. You can use it to -build a single network or multiple networks over different time windows. +\code{build_dynamic_networks()} builds one or several \code{tbl_graph} networks from a +node table (\code{source_id}) and a bipartite link table (\code{source_id} -> \code{target_id}). \code{build_network()} is a wrapper for a single network. 
+ +It supports two backbone extraction methods: +\itemize{ +\item structured filtering using coupling/cooccurrence measures from +\href{https://agoutsmedt.github.io/biblionetwork/}{biblionetwork}; +\item statistical filtering using null models from +\href{https://github.com/zpneal/backbone}{backbone} \insertCite{neal2022}{networkflow}. +} + +The function can build a single network or multiple networks across time windows. } \details{ -\code{build_network()} has been added for convenience but it is just -a special case of the more general \code{build_dynamic_networks()}, with +The function uses bipartite links (\code{source_id} -> \code{target_id}) to produce +source-side networks. + +If \code{time_variable} and \code{time_window} are provided, it builds one network per +time window (rolling or non-overlapping). Otherwise it builds a single network. + +\code{projection_method = "structured"} applies coupling/cooccurrence filtering. +\code{projection_method = "statistical"} applies a statistical backbone model. 
} \examples{ library(networkflow) -nodes <- Nodes_stagflation |> -dplyr::rename(ID_Art = ItemID_Ref) |> -dplyr::filter(Type == "Stagflation") +nodes <- networkflow::Nodes_stagflation |> + dplyr::filter(source_type == "Stagflation") -references <- Ref_stagflation |> -dplyr::rename(ID_Art = Citing_ItemID_Ref) +references <- networkflow::Ref_stagflation -temporal_networks <- build_dynamic_networks(nodes = nodes, +# Structured backbone (cooccurrence) +net_structured <- build_dynamic_networks( +nodes = nodes, directed_edges = references, -source_id = "ID_Art", -target_id = "ItemID_Ref", -time_variable = "Year", -cooccurrence_method = "coupling_similarity", +source_id = "source_id", +target_id = "target_id", +time_variable = "source_year", time_window = 20, -edges_threshold = 1, -overlapping_window = TRUE) +projection_method = "structured", +cooccurrence_method = "coupling_similarity", +edges_threshold = 1 +) -temporal_networks[[1]] +# Statistical backbone (backbone package) +net_statistical <- build_dynamic_networks( +nodes = nodes, +directed_edges = references, +source_id = "source_id", +target_id = "target_id", +time_variable = "source_year", +time_window = 20, +projection_method = "statistical", +model = "sdsm", +alpha = 0.05, +backbone_args = list(mtc = "holm") +) } +\references{ +\insertAllCited{} +} +\seealso{ +\code{\link[biblionetwork:biblio_coupling]{biblionetwork::biblio_coupling()}}, \code{\link[backbone:backbone]{backbone::backbone()}} +} diff --git a/man/build_dynamic_networks2.Rd b/man/build_dynamic_networks2.Rd deleted file mode 100644 index 8b5324a..0000000 --- a/man/build_dynamic_networks2.Rd +++ /dev/null @@ -1,160 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/build_dynamic_networks2.R -\name{build_dynamic_networks2} -\alias{build_dynamic_networks2} -\title{Creating One or Multiple Networks Using Structured or Statistical Backbone Extraction} -\usage{ -build_dynamic_networks2( - nodes, - directed_edges, - 
source_id, - target_id, - time_variable = NULL, - time_window = NULL, - backbone_method = c("structured", "statistical"), - model = c("sdsm", "fdsm", "fixedfill", "fixedrow", "fixedcol"), - alpha = NULL, - coupling_measure = c("coupling_angle", "coupling_strength", "coupling_similarity"), - edges_threshold = 1, - overlapping_window = FALSE, - compute_size = FALSE, - keep_singleton = FALSE, - filter_components = FALSE, - ..., - backbone_args = list(), - verbose = TRUE -) -} -\arguments{ -\item{nodes}{Table of nodes and their metadata. One row per node. For example, a table -of articles with identifiers, authors, publication year, etc.} - -\item{directed_edges}{Table of edges representing the links between nodes and associated entities -(e.g., references, authors, affiliations).} - -\item{source_id}{Quoted name of the column giving the unique identifier of each node.} - -\item{target_id}{Quoted name of the column giving the identifier of the element linked to each node.} - -\item{time_variable}{Optional name of the column with a temporal variable (e.g., publication year).} - -\item{time_window}{Optional size of the time window (in units of \code{time_variable}) to construct temporal networks.} - -\item{backbone_method}{Method used to extract the network backbone. Choose between: -\itemize{ -\item \code{"structured"}: uses cooccurrence measures from the \href{https://agoutsmedt.github.io/biblionetwork/}{biblionetwork} package; -\item \code{"statistical"}: uses statistical models from the \href{https://github.com/djmurphy533/backbone}{backbone} package. -Defaults to \code{"structured"}. -}} - -\item{model}{Null model used by the \href{https://github.com/zpneal/backbone}{backbone} -package: one of \code{"sdsm"}, \code{"fdsm"}, \code{"fixedfill"}, \code{"fixedrow"}, \code{"fixedcol"}. Required if -\code{backbone_method = "statistical"}. 
These correspond to model names in \code{backbone} and are passed -through to the selected backbone function.} - -\item{alpha}{Significance threshold for statistical backbone extraction. Required if -\code{backbone_method = "statistical"}. Lower values keep fewer edges (stricter filtering).} - -\item{coupling_measure}{For \code{backbone_method = "structured"}, choose the cooccurrence method: -\itemize{ -\item \code{"coupling_angle"} (biblio_coupling); -\item \code{"coupling_strength"}; -\item \code{"coupling_similarity"}. -}} - -\item{edges_threshold}{Threshold for edge weight filtering in structured methods.} - -\item{overlapping_window}{Logical. If \code{TRUE}, builds networks using rolling time windows.} - -\item{compute_size}{Logical. If \code{TRUE}, computes the number of incoming edges per node (e.g., citation count).} - -\item{keep_singleton}{Logical. If \code{FALSE}, removes nodes with no edges in the final network.} - -\item{filter_components}{Logical. If \code{TRUE}, keeps only the main component(s) using \code{networkflow::filter_components()}.} - -\item{...}{Additional arguments passed to \code{filter_components()}.} - -\item{backbone_args}{Optional list of additional arguments passed to -\code{\link[backbone:backbone_from_projection]{backbone::backbone_from_projection()}}. Use this to set parameters like \code{mtc}, -\code{signed}, \code{missing_as_zero}, or \code{trials}. If \code{backbone_args} includes \code{alpha} or -\code{model}, those values override the corresponding function arguments.} - -\item{verbose}{Logical. If \code{TRUE}, displays progress messages.} -} -\value{ -\itemize{ -\item A single tidygraph object if \code{time_window} is \code{NULL}; -\item A list of tidygraph objects (one per time window) otherwise. 
-} -} -\description{ -\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#experimental}{\figure{lifecycle-experimental.svg}{options: alt='[Experimental]'}}}{\strong{[Experimental]}} - -\code{build_dynamic_networks2()} builds one or several networks (as tidygraph objects) -from a table of nodes and directed edges, with support for both structured cooccurrence -methods and statistical backbone extraction using the \href{https://github.com/zpneal/backbone}{backbone} -package \insertCite{neal2022}{networkflow}. -The function is useful for constructing bibliometric or affiliation networks across -static or dynamic time windows. -} -\details{ -\code{build_dynamic_networks2()} generalizes \code{build_dynamic_networks()} by adding support for -statistical backbone extraction using null models from the \code{backbone} package -\insertCite{neal2022}{networkflow}. The cooccurence methods used in -\code{build_dynamic_networks()} can be viewed as deterministic (structured) methods to extract -the network backbone. The backbone is defined as the significant edges in the network. - -As with \code{build_dynamic_networks()}, the function constructs networks for each time window. If \code{time_variable} and \code{time_window} are defined, the function constructs networks -for each time window (sliding or non-overlapping). Otherwise, it builds a single static network. - -If \code{backbone_method = "structured"}, cooccurrence edges are computed using bibliometric coupling -techniques. The term structured refers to deterministic methods based on thresholding cooccurrence measures. -If \code{backbone_method = "statistical"}, the function applies a \code{backbone} null model to the -edgelist for each time window and keeps only statistically significant edges at the chosen \code{alpha}. -The model is selected via \code{model} and follows \code{backbone}'s nomenclature: \code{"sdsm"}, \code{"fdsm"}, -\code{"fixedfill"}, \code{"fixedrow"}, or \code{"fixedcol"}. 
Only these models are currently supported. -} -\examples{ -library(networkflow) - -nodes <- Nodes_stagflation |> -dplyr::rename(ID_Art = ItemID_Ref) |> -dplyr::filter(Type == "Stagflation") - -references <- Ref_stagflation |> -dplyr::rename(ID_Art = Citing_ItemID_Ref) - -# Structured backbone (cooccurrence) -net_structured <- build_dynamic_networks2( -nodes = nodes, -directed_edges = references, -source_id = "ID_Art", -target_id = "ItemID_Ref", -time_variable = "Year", -time_window = 20, -backbone_method = "structured", -coupling_measure = "coupling_similarity", -edges_threshold = 1 -) - -# Statistical backbone (backbone package) -net_statistical <- build_dynamic_networks2( -nodes = nodes, -directed_edges = references, -source_id = "ID_Art", -target_id = "ItemID_Ref", -time_variable = "Year", -time_window = 20, -backbone_method = "statistical", -model = "sdsm", -alpha = 0.05, -backbone_args = list(mtc = "holm") -) - -} -\references{ -\insertAllCited{} -} -\seealso{ -\code{\link[biblionetwork:biblio_coupling]{biblionetwork::biblio_coupling()}}, \code{\link[backbone:backbone_from_projection]{backbone::backbone_from_projection()}} -} diff --git a/man/build_network.Rd b/man/build_network.Rd new file mode 100644 index 0000000..30e3096 --- /dev/null +++ b/man/build_network.Rd @@ -0,0 +1,49 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/build_dynamic_networks.R +\name{build_network} +\alias{build_network} +\title{Build a single network} +\usage{ +build_network( + nodes, + directed_edges, + source_id, + target_id, + projection_method, + cooccurrence_method = c("coupling_angle", "coupling_strength", "coupling_similarity"), + edges_threshold = 1, + compute_size = FALSE, + keep_singleton = FALSE, + filter_components = FALSE, + ... +) +} +\arguments{ +\item{nodes}{Table of nodes and their metadata. One row per node. 
For example, a table +of articles with identifiers, authors, publication year, etc.} + +\item{directed_edges}{Table of bipartite links between \code{source_id} nodes and +\code{target_id} entities (e.g., article -> reference, author -> paper).} + +\item{source_id}{Quoted name of the source-side node identifier.} + +\item{target_id}{Quoted name of the target-side identifier linked to each source node.} + +\item{projection_method}{Method used to build the single network. Must be +one of \code{"structured"} or \code{"statistical"}.} + +\item{cooccurrence_method}{Cooccurrence method used by the structured workflow.} + +\item{edges_threshold}{Threshold used to filter weak edges in structured mode.} + +\item{compute_size}{Logical. If \code{TRUE}, computes the number of incoming edges per node (e.g., citation count).} + +\item{keep_singleton}{Logical. If \code{FALSE}, removes nodes with no edges in the final network.} + +\item{filter_components}{Logical. If \code{TRUE}, keeps only the main component(s) using \code{networkflow::filter_components()}.} + +\item{...}{Additional arguments passed to \code{filter_components()}.} +} +\description{ +Convenience wrapper around \code{\link[=build_dynamic_networks]{build_dynamic_networks()}} for a single network. +} diff --git a/man/color_networks.Rd b/man/color_networks.Rd index 16f532a..0250929 100644 --- a/man/color_networks.Rd +++ b/man/color_networks.Rd @@ -67,18 +67,16 @@ the colors of the two palettes will be recycled. 
\examples{ library(networkflow) -nodes <- Nodes_stagflation |> -dplyr::rename(ID_Art = ItemID_Ref) |> -dplyr::filter(Type == "Stagflation") +nodes <- networkflow::Nodes_stagflation |> + dplyr::filter(source_type == "Stagflation") -references <- Ref_stagflation |> -dplyr::rename(ID_Art = Citing_ItemID_Ref) +references <- networkflow::Ref_stagflation temporal_networks <- build_dynamic_networks(nodes = nodes, directed_edges = references, -source_id = "ID_Art", -target_id = "ItemID_Ref", -time_variable = "Year", +source_id = "source_id", +target_id = "target_id", +time_variable = "source_year", cooccurrence_method = "coupling_similarity", time_window = 20, edges_threshold = 1, diff --git a/man/dynamic_network_cooccurrence.Rd b/man/dynamic_network_cooccurrence.Rd index 8abe218..2ff362a 100644 --- a/man/dynamic_network_cooccurrence.Rd +++ b/man/dynamic_network_cooccurrence.Rd @@ -91,18 +91,16 @@ articles and the list of the references these articles cite. You can use it to build a single network or multiple networks over different time windows. 
} \examples{ -nodes <- Nodes_stagflation |> -dplyr::rename(ID_Art = ItemID_Ref) |> -dplyr::filter(Type == "Stagflation") +nodes <- networkflow::Nodes_stagflation |> +dplyr::filter(source_type == "Stagflation") -references <- Ref_stagflation |> -dplyr::rename(ID_Art = Citing_ItemID_Ref) +references <- networkflow::Ref_stagflation temporal_networks <- dynamic_network_cooccurrence(nodes = nodes, directed_edges = references, -source_column = "ID_Art", -target_column = "ItemID_Ref", -time_variable = "Year", +source_column = "source_id", +target_column = "target_id", +time_variable = "source_year", cooccurrence_method = "coupling_similarity", time_window = NULL, edges_threshold = 1, diff --git a/man/extract_tfidf.Rd b/man/extract_tfidf.Rd index 95c5448..a889e79 100644 --- a/man/extract_tfidf.Rd +++ b/man/extract_tfidf.Rd @@ -103,18 +103,16 @@ The terms which occur only once are removed to avoid too rare terms to appear at the top of your grouping variables. } \examples{ -nodes <- Nodes_stagflation |> -dplyr::rename(ID_Art = ItemID_Ref) |> -dplyr::filter(Type == "Stagflation") +nodes <- networkflow::Nodes_stagflation |> + dplyr::filter(source_type == "Stagflation") -references <- Ref_stagflation |> -dplyr::rename(ID_Art = Citing_ItemID_Ref) +references <- networkflow::Ref_stagflation temporal_networks <- build_dynamic_networks(nodes = nodes, directed_edges = references, -source_id = "ID_Art", -target_id = "ItemID_Ref", -time_variable = "Year", +source_id = "source_id", +target_id = "target_id", +time_variable = "source_year", cooccurrence_method = "coupling_similarity", time_window = 10, edges_threshold = 1, @@ -128,10 +126,10 @@ clustering_method = "leiden") library(stopwords) tfidf <- extract_tfidf(temporal_networks, n_gram = 4, -text_columns = "Title", +text_columns = "source_title", grouping_columns = "cluster_leiden", grouping_across_list = TRUE, -clean_word_method = "lemmatise") +clean_word_method = "lemmatize") tfidf[[1]] diff --git 
a/man/intertemporal_cluster_naming.Rd b/man/intertemporal_cluster_naming.Rd index ed2c1d0..164e60f 100644 --- a/man/intertemporal_cluster_naming.Rd +++ b/man/intertemporal_cluster_naming.Rd @@ -63,18 +63,16 @@ library(biblionetwork) library(magrittr) library(tidygraph) -nodes <- Nodes_stagflation \%>\% -dplyr::rename(ID_Art = ItemID_Ref) \%>\% -dplyr::filter(Type == "Stagflation") +nodes <- networkflow::Nodes_stagflation \%>\% +dplyr::filter(source_type == "Stagflation") -references <- Ref_stagflation \%>\% -dplyr::rename(ID_Art = Citing_ItemID_Ref) +references <- networkflow::Ref_stagflation temporal_networks <- dynamic_network_cooccurrence(nodes = nodes, directed_edges = references, -source_column = "ID_Art", -target_column = "ItemID_Ref", -time_variable = "Year", +source_column = "source_id", +target_column = "target_id", +time_variable = "source_year", cooccurrence_method = "coupling_similarity", time_window = 15, edges_threshold = 1, @@ -88,7 +86,7 @@ temporal_networks <- lapply(temporal_networks, intertemporal_cluster_naming(temporal_networks, cluster_column = "clusters", -node_key = "ID_Art", +node_key = "source_id", threshold_similarity = 0.51, similarity_type = "partial") diff --git a/man/launch_network_app.Rd b/man/launch_network_app.Rd index 33f3649..be7fe87 100644 --- a/man/launch_network_app.Rd +++ b/man/launch_network_app.Rd @@ -53,18 +53,18 @@ If the graph does not contain a layout (columns \code{x} and \code{y}) the funct library(networkflow) library(dplyr) -nodes <- Nodes_stagflation |> - dplyr::filter(Type == "Stagflation") |> - dplyr::rename(ID_Art = ItemID_Ref) +nodes <- networkflow::Nodes_stagflation |> + dplyr::filter(source_type == "Stagflation") |> + dplyr::mutate(source_id = as.character(source_id)) -references <- Ref_stagflation |> - dplyr::rename(ID_Art = Citing_ItemID_Ref) +references <- networkflow::Ref_stagflation g <- build_network( nodes = nodes, directed_edges = references, - source_id = "ID_Art", - target_id = "ItemID_Ref", + 
source_id = "source_id", + target_id = "target_id", + projection_method = "structured", cooccurrence_method = "coupling_similarity", edges_threshold = 1, compute_size = FALSE, @@ -81,10 +81,10 @@ g <- add_clusters( launch_network_app( graph_tbl = g, cluster_id = "cluster_leiden", - cluster_information = c("Author", "Title", "Year", "Journal"), + cluster_information = c("source_author", "source_title", "source_year", "source_journal"), cluster_tooltip = "Cluster", - node_id = "ID_Art", - node_tooltip = "Author_date", + node_id = "source_id", + node_tooltip = "source_label", node_size = NULL, color = NULL, layout = "kk" @@ -95,9 +95,9 @@ launch_network_app( g_list <- build_dynamic_networks( nodes = nodes, directed_edges = references, - source_id = "ID_Art", - target_id = "ItemID_Ref", - time_variable = "Year", + source_id = "source_id", + target_id = "target_id", + time_variable = "source_year", time_window = 20, cooccurrence_method = "coupling_similarity", edges_threshold = 1, @@ -116,9 +116,9 @@ g_list <- add_clusters( launch_network_app( graph_tbl = g_list, cluster_id = "cluster_leiden", - cluster_information = c("Author", "Title", "Year", "Journal"), - node_id = "ID_Art", - node_tooltip = "Author_date", + cluster_information = c("source_author", "source_title", "source_year", "source_journal"), + node_id = "source_id", + node_tooltip = "source_label", node_size = NULL, color = NULL, layout = "kk" diff --git a/man/layout_networks.Rd b/man/layout_networks.Rd index f1176b4..c1f105b 100644 --- a/man/layout_networks.Rd +++ b/man/layout_networks.Rd @@ -61,18 +61,16 @@ if the layout used allows a parameter called \code{coord}. 
\examples{ library(networkflow) -nodes <- Nodes_stagflation |> -dplyr::rename(ID_Art = ItemID_Ref) |> -dplyr::filter(Type == "Stagflation") +nodes <- networkflow::Nodes_stagflation |> + dplyr::filter(source_type == "Stagflation") -references <- Ref_stagflation |> -dplyr::rename(ID_Art = Citing_ItemID_Ref) +references <- networkflow::Ref_stagflation temporal_networks <- build_dynamic_networks(nodes = nodes, directed_edges = references, -source_id = "ID_Art", -target_id = "ItemID_Ref", -time_variable = "Year", +source_id = "source_id", +target_id = "target_id", +time_variable = "source_year", cooccurrence_method = "coupling_similarity", time_window = 20, edges_threshold = 1, @@ -80,7 +78,7 @@ overlapping_window = TRUE, filter_components = TRUE) temporal_networks <- layout_networks(temporal_networks, -node_id = "ID_Art", +node_id = "source_id", layout = "fr", compute_dynamic_coordinates = TRUE) diff --git a/man/merge_dynamic_clusters.Rd b/man/merge_dynamic_clusters.Rd index b545edb..360d02c 100644 --- a/man/merge_dynamic_clusters.Rd +++ b/man/merge_dynamic_clusters.Rd @@ -66,18 +66,16 @@ the user: \code{threshold_similarity}, \code{cluster_colum} and \code{similarity \examples{ library(networkflow) -nodes <- Nodes_stagflation |> -dplyr::rename(ID_Art = ItemID_Ref) |> -dplyr::filter(Type == "Stagflation") +nodes <- networkflow::Nodes_stagflation |> + dplyr::filter(source_type == "Stagflation") -references <- Ref_stagflation |> -dplyr::rename(ID_Art = Citing_ItemID_Ref) +references <- networkflow::Ref_stagflation temporal_networks <- build_dynamic_networks(nodes = nodes, directed_edges = references, -source_id = "ID_Art", -target_id = "ItemID_Ref", -time_variable = "Year", +source_id = "source_id", +target_id = "target_id", +time_variable = "source_year", cooccurrence_method = "coupling_similarity", time_window = 10, edges_threshold = 1, @@ -90,7 +88,7 @@ clustering_method = "leiden") temporal_networks <- merge_dynamic_clusters(temporal_networks, cluster_id = 
"cluster_leiden", -node_id = "ID_Art", +node_id = "source_id", threshold_similarity = 0.51, similarity_type = "partial") diff --git a/man/name_clusters.Rd b/man/name_clusters.Rd index 4253e44..b6ef9a0 100644 --- a/man/name_clusters.Rd +++ b/man/name_clusters.Rd @@ -97,18 +97,16 @@ tibble graphs will share the same name. \examples{ library(networkflow) -nodes <- Nodes_stagflation |> -dplyr::rename(ID_Art = ItemID_Ref) |> -dplyr::filter(Type == "Stagflation") +nodes <- networkflow::Nodes_stagflation |> + dplyr::filter(source_type == "Stagflation") -references <- Ref_stagflation |> -dplyr::rename(ID_Art = Citing_ItemID_Ref) +references <- networkflow::Ref_stagflation temporal_networks <- build_dynamic_networks(nodes = nodes, directed_edges = references, -source_id = "ID_Art", -target_id = "ItemID_Ref", -time_variable = "Year", +source_id = "source_id", +target_id = "target_id", +time_variable = "source_year", cooccurrence_method = "coupling_similarity", time_window = 20, edges_threshold = 1, @@ -127,7 +125,7 @@ temporal_networks_with_names <- name_clusters(graphs = temporal_networks, method = "tidygraph_functions", name_merged_clusters = FALSE, cluster_id = "cluster_leiden", -label_columns = c("Author", "Year"), +label_columns = c("source_author", "source_year"), tidygraph_function = tidygraph::centrality_pagerank()) temporal_networks_with_names[[1]] @@ -136,7 +134,7 @@ temporal_networks_with_names[[1]] temporal_networks <- merge_dynamic_clusters(temporal_networks, cluster_id = "cluster_leiden", -node_id = "ID_Art", +node_id = "source_id", threshold_similarity = 0.51, similarity_type = "partial") @@ -144,9 +142,9 @@ temporal_networks_with_names <- name_clusters(graphs = temporal_networks, method = "tf-idf", name_merged_clusters = TRUE, cluster_id = "dynamic_cluster_leiden", -text_columns = "Title", +text_columns = "source_title", nb_terms_label = 5, -clean_word_method = "lemmatise") +clean_word_method = "lemmatize") temporal_networks_with_names[[1]] diff --git 
a/man/networkflow-package.Rd b/man/networkflow-package.Rd index 46d40bf..2df59f1 100644 --- a/man/networkflow-package.Rd +++ b/man/networkflow-package.Rd @@ -6,7 +6,7 @@ \alias{networkflow-package} \title{networkflow: Functions For A Workflow To Manipulate Networks} \description{ -This package proposes a series of function to make it easier and quicker to work on networks. It mainly targets working on bibliometric networks (see the [biblionetwork](https://github.com/agoutsmedt/biblionetwork) package for creating such networks). This package heavily relies on [igraph](https://igraph.org/r/) and [tidygraph](https://tidygraph.data-imaginist.com/index.html), and aims at producing ready-made networks for projecting them using [ggraph](https://ggraph.data-imaginist.com/). This package does not invent nothing new, properly speaking, but it allows the users to follow more quickly and easily the main steps of network manipulation, from creating the graph to projecting it. It is inspired by what could be done with [GEPHI](https://gephi.org/): the package allows the use of the Leiden community detection algorithm, as well as of the Force Atlas 2 layout, both being unavailable in igraph (and so in tidygraph). +Provides a workflow to build, analyze, and visualize projected networks from tabular data. The package supports dynamic analysis across time windows, including cluster detection and cross-period cluster matching. It also covers network construction, interpretation, static plotting, and interactive exploration through a 'shiny' app. Although designed for projected networks (e.g., article -> reference), it can be used more generally with 'tbl_graph' objects. 
} \seealso{ Useful links: @@ -23,6 +23,7 @@ Useful links: Authors: \itemize{ \item Alexandre Truc \email{alexandre.truc77@gmail.com} (\href{https://orcid.org/0000-0002-1328-7819}{ORCID}) + \item Thomas Delcey (\href{https://orcid.org/0000-0003-0546-1474}{ORCID}) } } diff --git a/man/networks_to_alluv.Rd b/man/networks_to_alluv.Rd index d7594a5..014004a 100644 --- a/man/networks_to_alluv.Rd +++ b/man/networks_to_alluv.Rd @@ -56,18 +56,16 @@ This function creates a data.frame that can be easily plotted with ggalluvial fr \examples{ library(networkflow) -nodes <- Nodes_stagflation |> -dplyr::rename(ID_Art = ItemID_Ref) |> -dplyr::filter(Type == "Stagflation") +nodes <- networkflow::Nodes_stagflation |> + dplyr::filter(source_type == "Stagflation") -references <- Ref_stagflation |> -dplyr::rename(ID_Art = Citing_ItemID_Ref) +references <- networkflow::Ref_stagflation temporal_networks <- build_dynamic_networks(nodes = nodes, directed_edges = references, -source_id = "ID_Art", -target_id = "ItemID_Ref", -time_variable = "Year", +source_id = "source_id", +target_id = "target_id", +time_variable = "source_year", cooccurrence_method = "coupling_similarity", time_window = 20, edges_threshold = 1, @@ -82,7 +80,7 @@ verbose = FALSE) temporal_networks <- merge_dynamic_clusters(temporal_networks, cluster_id = "cluster_leiden", -node_id = "ID_Art", +node_id = "source_id", threshold_similarity = 0.51, similarity_type = "partial") @@ -90,9 +88,9 @@ temporal_networks <- name_clusters(graphs = temporal_networks, method = "tf-idf", name_merged_clusters = TRUE, cluster_id = "dynamic_cluster_leiden", -text_columns = "Title", +text_columns = "source_title", nb_terms_label = 5, -clean_word_method = "lemmatise") +clean_word_method = "lemmatize") temporal_networks <- color_networks(graphs = temporal_networks, column_to_color = "dynamic_cluster_leiden", @@ -100,7 +98,7 @@ color = NULL) alluv_dt <- networks_to_alluv(temporal_networks, intertemporal_cluster_column = "dynamic_cluster_leiden", 
-node_id = "ID_Art") +node_id = "source_id") alluv_dt[1:5] diff --git a/vignettes/.gitignore b/vignettes/.gitignore index 097b241..47018d6 100644 --- a/vignettes/.gitignore +++ b/vignettes/.gitignore @@ -1,2 +1,5 @@ *.html *.R + +/.quarto/ +**/*.quarto_ipynb diff --git a/vignettes/exploring_dynamic_networks.Rmd b/vignettes/exploring_dynamic_networks.Rmd deleted file mode 100644 index 3b54133..0000000 --- a/vignettes/exploring_dynamic_networks.Rmd +++ /dev/null @@ -1,142 +0,0 @@ ---- -title: "Exploring dynamic networks" -author: "Aurélien Goutsmedt and Alexandre Truc" -description: "Introduction to the uses of the networkflow package for temporal networks" -output: - rmarkdown::html_vignette: - toc: true -vignette: > - %\VignetteIndexEntry{Exploring dynamic networks} - %\VignetteEngine{knitr::rmarkdown} - %\VignetteEncoding{UTF-8} ---- - -```{r, include = FALSE} -knitr::opts_chunk$set( - collapse = TRUE, - comment = "#>" -) -``` - - - -This vignette introduces you to some functions of the package with the [data integrated][Incorporated data] -in the package. Here, we are interested to the exploration of dynamic networks. 
- -# Building your list of networks - -```{r setup} -library(networkflow) -library(magrittr) -library(dplyr) -library(tidygraph) -``` - -```{r} -nodes <- Nodes_stagflation %>% - dplyr::rename(ID_Art = ItemID_Ref) %>% - dplyr::filter(Type == "Stagflation") - -references <- Ref_stagflation %>% - dplyr::rename(ID_Art = Citing_ItemID_Ref) -``` - - -```{r} -single_network <- dynamic_network_cooccurrence(nodes = nodes, - directed_edges = references, - source_column = "ID_Art", - target_column = "ItemID_Ref", - time_variable = NULL, - cooccurrence_method = "coupling_similarity", - time_window = NULL, - edges_threshold = 1, - compute_size = FALSE, - keep_singleton = FALSE, - overlapping_window = TRUE) -``` - -```{r} -network_list <- dynamic_network_cooccurrence(nodes = nodes, - directed_edges = references, - source_column = "ID_Art", - target_column = "ItemID_Ref", - time_variable = "Year", - cooccurrence_method = "coupling_similarity", - time_window = 15, - edges_threshold = 1, - compute_size = FALSE, - keep_singleton = FALSE, - overlapping_window = TRUE) - -network_list[[1]] -``` - -# Clustering and intertemporal naming - -```{r} -network_list <- lapply(network_list, - function(tbl) tbl %N>% mutate(clusters = group_louvain())) - -``` - -```{r} -network_list <- intertemporal_cluster_naming(list_graph = network_list, - cluster_column = "clusters", - node_key = "ID_Art", - threshold_similarity = 0.5001, - similarity_type = "partial") - -network_list[[1]] -``` - -# Building Alluvial - -```{r, eval = FALSE, fig.dim=c(10,8)} -library(ggplot2) -library(ggalluvial) - -alluv_dt <- networks_to_alluv(list_graph = network_list, - intertemporal_cluster_column = "intertemporal_name", - node_key = "ID_Art", - summary_cl_stats = FALSE) - -alluv_dt <- minimize_crossing_alluvial(alluv_dt = alluv_dt, - node_key = "ID_Art") -alluv_dt[,y_alluv:=1/.N, Window] - -ggplot(alluv_dt, aes(x = Window, y= y_alluv, stratum = intertemporal_name, alluvium = ID_Art, fill = intertemporal_name, label = 
intertemporal_name)) + - geom_stratum(alpha =1, size=1/12) + - geom_flow() + - theme(legend.position = "none") + - theme_minimal() + - theme(plot.background = element_rect(fill = 'white', colour = NA)) + - ggtitle("") -``` - -# exploring tf-idf - -```{r, eval=FALSE} -corpus <- merge(alluv_dt, - nodes, - by = "ID_Art", - all.x = TRUE) - -tf_idf <- extract_tfidf(data = corpus, - text_columns = "Title", - grouping_columns = "intertemporal_name", - n_gram = 3L, - stopwords = NULL, - stopwords_type = "smart", - clean_word_method = "lemmatize", - ngrams_filter = 2) - -tf_idf %>% - group_by(document) %>% - slice_max(order_by = tf_idf, n = 1, with_ties = FALSE) %>% - ungroup() %>% - arrange(intertemporal_name) %>% - select(-document) - -``` - diff --git a/vignettes/networkflow_presentation.Rmd b/vignettes/networkflow_presentation.Rmd new file mode 100644 index 0000000..54c30cd --- /dev/null +++ b/vignettes/networkflow_presentation.Rmd @@ -0,0 +1,627 @@ +--- +title: "Presentation of networkflow" +author: "" +description: "" +output: + rmarkdown::html_vignette: + toc: true +vignette: > + %\VignetteIndexEntry{Presentation of networkflow} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +# Introduction + +## Overview + +`networkflow` provides a complete workflow to build, structure, and explore networks from tabular data. + +Its key feature is a built-in dynamic analysis workflow: the package can build networks across time windows, detect clusters in each window, and link clusters across periods to track their evolution. + +More broadly, `networkflow` supports the full analysis pipeline, from network construction to interpretation and visualization, including clustering, layout and color preparation, static plotting, and interactive exploration with a Shiny app. + +The package was developed with projected networks in mind (for example, article -> reference), but it can also be used more generally once data are represented as `tbl_graph` objects. 
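Since every downstream function operates on `tbl_graph` objects, here is a minimal sketch of what such an object looks like when built directly with `tidygraph`. The toy node and edge tables are invented for illustration and are not part of the package data.

```{r tbl-graph-sketch, eval = FALSE}
# Toy illustration (hypothetical data): a tbl_graph assembled directly
# from a node table and an edge table, without networkflow's builders.
library(tidygraph)

toy_nodes <- data.frame(name = c("a", "b", "c"))
toy_edges <- data.frame(from = c(1, 2), to = c(2, 3), weight = c(1, 2))

toy_graph <- tbl_graph(nodes = toy_nodes, edges = toy_edges, directed = FALSE)
toy_graph
```

Printing the object displays both tables, which is the representation that `networkflow` functions manipulate.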
+ +## What this package does + +The package is organized in three main steps: + +1. create networks (static or dynamic) from tabular data; +2. detect and harmonize clusters across time windows; +3. prepare visualization and exploration outputs (layout, colors, labels, plotting, Shiny app). + + +## Typical workflow + +A typical workflow is: + +1. prepare `nodes` and `directed_edges` tables; +2. build the network with `build_network()` (or `build_dynamic_networks()` for temporal analyses); +3. detect clusters with `add_clusters()`; +4. prepare plotting attributes (`layout_networks()`, `color_networks()`); +5. inspect and interpret results with plots and `launch_network_app()`. + +# Data Requirements + +## Network objects: `tbl_graph` + +A **`tbl_graph`** is the core network object in `networkflow`. + +It comes from `tidygraph` and stores: + +1. a node table (attributes of entities); +2. an edge table (connections between entities). + +Most functions in this package take a `tbl_graph` (or a list of `tbl_graph`) as input. In practice, `build_network()` and `build_dynamic_networks()` are the main entry points that create these objects from tabular data. + +`networkflow` expects tabular inputs with explicit identifiers: + +1. `nodes`: one row per source entity (for example, one row per article), with a unique ID used as `source_id`; +2. `directed_edges`: links from `source_id` to `target_id` (for example, article -> reference, author -> paper); +3. `time_variable`: required only for dynamic analyses with `build_dynamic_networks()`. + + +## Input of `build_network()` and `build_dynamic_networks()` + +`build_network()` and `build_dynamic_networks()` are the main entry points to create `tbl_graph` objects in the package. They start from a bipartite relation (`source_id` -> `target_id`) and produce a one-mode weighted network on `source_id` entities. 
The bipartite relation is a table of directed edges from source to target entities (for example, article -> reference, author -> paper). + +If your data is already one-mode (for example, author -> author), you can build a `tbl_graph` directly and use downstream functions for clustering, layout, plotting, and exploration. Typical downstream functions in this case are `add_clusters()`, `layout_networks()`, `color_networks()`, and `launch_network_app()`. + +# Step 1: Creating networks + +Functions used: + +- `build_dynamic_networks()` +- `build_network()` +- `filter_components()` + +This step creates one static network (`tbl_graph`) or a list of temporal networks +(`list` of `tbl_graph`) from bipartite links (`source_id -> target_id`). +`build_network()` is the single-network wrapper around `build_dynamic_networks()`. + +### `build_network()` and `build_dynamic_networks()` + +`build_dynamic_networks()` builds a single network, or a list of time-window networks when the user provides a temporal variable. `build_network()` is a convenience wrapper equivalent to `build_dynamic_networks(time_variable = NULL)`. + +`build_dynamic_networks()` takes a bipartite relation as input and projects it into a one-mode network. It supports two filtering strategies for edge retention after projection: + +1. a structured strategy, which defines edge strength using measures derived from co-occurrence intensity; +2. a statistical strategy, which defines a null model of random co-occurrence and keeps edges based on statistical significance. + +In short, the first approach filters edges by observed tie strength, while the second filters them by statistical significance. The structured method is generally more computationally efficient, while the statistical method provides a more rigorous filter for connections beyond random chance.
For example, if `source_id` is article and `target_id` is reference, the structured method will keep article pairs with strong co-citation or bibliographic coupling, while the statistical method will keep article pairs whose co-citation or bibliographic coupling is significantly stronger than expected under a random model. + +Parameters common to both methods: + +- `nodes`: table with one row per source entity defined by `source_id`. +- `directed_edges`: table with directed edges from `source_id` to `target_id`. +- `source_id`, `target_id`: identifier columns used for projection. +- `projection_method`: `"structured"` or `"statistical"`. +- `compute_size`: if `TRUE`, computes `node_size`. +- `keep_singleton`: if `FALSE`, removes isolated nodes. + +Structured-only parameters (`projection_method = "structured"`) + +- uses cooccurrence/coupling measures from the `biblionetwork` package. +- `cooccurrence_method`: `"coupling_angle"`, `"coupling_strength"`, `"coupling_similarity"`. +- `edges_threshold`: minimum edge strength retained. + + +Statistical-only parameters (`projection_method = "statistical"`) + +- uses statistical backbone extraction (`backbone` package). +- `model`: `"sdsm"`, `"fdsm"`, `"fixedfill"`, `"fixedrow"`, `"fixedcol"`. +- `alpha`: significance threshold for edge retention. +- `backbone_args`: additional arguments passed to backbone routines. + +Dynamic-only parameters (`build_dynamic_networks()`): + +- `time_variable`: temporal column in `nodes` (for example publication year). +- `time_window`: width of each window. +- `overlapping_window`: rolling windows (`TRUE`) or disjoint windows (`FALSE`). In the first case, partition is done by rolling the time window by one unit (for example, 1990-2009, 1991-2010, etc.). In the second case, partition is done by fixed intervals (for example, 1990-1999, 2000-2009, etc.). + +The main output of these functions is a `tbl_graph` (or a list of `tbl_graph` for dynamic analyses). 
If `projection_method = "structured"`, the output edges are weighted by the selected cooccurrence/coupling measure. If `projection_method = "statistical"`, the output edges are unweighted and retained based on statistical significance. + +Examples: + +```{r static-build-setup} +library(networkflow) + +nodes <- subset(Nodes_stagflation, source_type == "Stagflation") + +references <- Ref_stagflation +``` + +```{r static-build-network} +g_static <- build_network( + nodes = nodes, + directed_edges = references, + source_id = "source_id", + target_id = "target_id", + projection_method = "structured", + cooccurrence_method = "coupling_similarity", + edges_threshold = 1, + compute_size = FALSE, + keep_singleton = FALSE +) +``` + +```{r dynamic-build-network-structured, eval = FALSE} +g_dynamic <- build_dynamic_networks( + nodes = nodes, + directed_edges = references, + source_id = "source_id", + target_id = "target_id", + time_variable = "source_year", + time_window = 20, + projection_method = "structured", + cooccurrence_method = "coupling_similarity", + edges_threshold = 1, + overlapping_window = TRUE, + compute_size = FALSE, + keep_singleton = FALSE +) +``` + +```{r dynamic-build-network-statistical, eval = FALSE} +g_dynamic_stat <- build_dynamic_networks( + nodes = nodes, + directed_edges = references, + source_id = "source_id", + target_id = "target_id", + time_variable = "source_year", + time_window = 20, + projection_method = "statistical", + model = "sdsm", + alpha = 0.05, + overlapping_window = TRUE, + compute_size = FALSE, + keep_singleton = FALSE +) +``` + +### `filter_components()` + +Use `filter_components()` to keep the main connected component(s): + +- `nb_components`: number of largest components to keep. +- `threshold_alert`: warning threshold when a removed component is still large. +- `keep_component_columns`: keep or remove helper columns on component IDs and sizes. 
+ +```{r static-filter-components, eval = FALSE} +g_static <- filter_components(g_static, nb_components = 1) +``` + + +# Step 2: Clustering + +Functions used: + +- `add_clusters()` +- `merge_dynamic_clusters()` +- `name_clusters()` +- `add_node_roles()` +- `extract_tfidf()` + +### `add_clusters()` + +Run community detection on a static or dynamic network. The function is a wrapper around the `group_*()` clustering functions of `tidygraph`. It also supports the `igraph` implementation of the Leiden algorithm, which is the default method. + +Main parameters: + +- `clustering_method`: the clustering algorithm to use. +- `weights`: the edge weight column to use, if any. +- `objective_function`, `resolution`, `n_iterations`: Leiden controls. +- `seed`: reproducibility for stochastic algorithms. + +The output is a `tbl_graph` (or a list of `tbl_graph`) with new columns: +- node column `cluster_{method}`. +- edge columns `cluster_{method}_from`, `cluster_{method}_to`, `cluster_{method}`. +- node column `size_cluster_{method}` with cluster shares. + +Example: + +```{r static-add-clusters-leiden} +g_static <- add_clusters( + graphs = g_static, + clustering_method = "leiden", + objective_function = "modularity", + resolution = 1, + n_iterations = 1000, + seed = 123 +) +``` + +```{r dynamic-add-clusters-leiden, eval = FALSE} +g_dynamic <- add_clusters( + graphs = g_dynamic, + clustering_method = "leiden", + objective_function = "modularity", + resolution = 1, + n_iterations = 1000, + seed = 123 +) +``` + +### `merge_dynamic_clusters()` (dynamic only) + +`add_clusters()` runs independently on each time window, so cluster IDs are not directly comparable across windows. `merge_dynamic_clusters()` links clusters from adjacent windows when node overlap is high enough, and assigns stable intertemporal IDs. + +Input requirements: + +- `list_graph` must be a list of at least two `tbl_graph`. +- the list order must be chronological (oldest to most recent window).
+ +Main parameters: + +- `cluster_id`: input cluster column (for example `cluster_leiden`). +- `node_id`: stable node identifier across windows. +- `threshold_similarity`: matching threshold in `(0.5, 1]`. +- `similarity_type`: `"complete"` or `"partial"`. + +`similarity_type` controls how overlap is computed: + +- `"complete"`: the overlap share is computed over all nodes in the compared clusters, including entries that exist only in one window. This is stricter when network size changes over time. +- `"partial"`: the overlap share is computed only on nodes present in both adjacent windows. This is often preferable when many new nodes enter over time. + +Output: + +- new node column `dynamic_{cluster_id}` (for example `dynamic_cluster_leiden`). The dynamic cluster IDs are assigned by propagation: unique IDs are first assigned to the clusters of the first time window, then propagated to later windows when a cluster match passes the similarity threshold; otherwise, a new dynamic ID is created. +- corresponding edge columns `dynamic_{cluster_id}_from`, `dynamic_{cluster_id}_to`, and `dynamic_{cluster_id}`. + +```{r dynamic-merge-clusters, eval = FALSE} +g_dynamic <- merge_dynamic_clusters( +  list_graph = g_dynamic, +  cluster_id = "cluster_leiden", +  node_id = "source_id", +  threshold_similarity = 0.51, +  similarity_type = "partial" +) +``` + +### `name_clusters()` + +Cluster IDs are not very informative in themselves. `name_clusters()` helps assign readable labels to clusters based on their content. The labels are not meant to be definitive cluster names, but rather a quick way to get a sense of cluster content. The function supports three methods: + +- `method = "tf-idf"`: labels clusters with the most distinctive terms extracted from the `text_columns` fields. This is usually the best default for thematic interpretation.
+- `method = "given_column"`: selects, within each cluster, the node with the +  highest value in `order_by`, then builds the label from `label_columns` of +  that node. Typically, you can use this method to label clusters with the title of a representative article (for example the most cited one). +- `method = "tidygraph_functions"`: computes a centrality measure with +  `tidygraph_function`, selects the most central node per cluster, then builds +  the label from `label_columns`. + +Main parameters: + +- `method`: `"tidygraph_functions"`, `"given_column"`, or `"tf-idf"`. +- `name_merged_clusters`: `TRUE` to name dynamic clusters across the list. Typically, you want to set this to `TRUE` when your `cluster_id` is the dynamic cluster column created by `merge_dynamic_clusters()`. +- `cluster_id`: column to name. +- `label_name`: output label column name (`"cluster_label"` by default). +- `text_columns`, `nb_terms_label`: key arguments for TF-IDF naming. + +```{r dynamic-name-clusters, eval = FALSE} +g_dynamic <- name_clusters( +  graphs = g_dynamic, +  method = "tf-idf", +  name_merged_clusters = TRUE, +  cluster_id = "dynamic_cluster_leiden", +  text_columns = "source_title", +  nb_terms_label = 3 +) +``` + +### `add_node_roles()` + +Nodes in a cluster can play different structural roles. `add_node_roles()` implements the Guimera-Amaral classification of node roles based on two measures: within-module degree (z-score) and participation coefficient. Run it after clustering to classify nodes by their structural position in modules; this helps distinguish peripheral nodes, connectors, and hubs. + +Main parameters: + +- `module_col`: cluster/module column used to compute roles. +- `weight_col`: edge weight column. +- `z_threshold`: hub threshold for within-module z-score. + +Main outputs: + +- `within_module_degree`. +- `within_module_z`. +- `participation_coeff`. +- `role_ga`.
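The two measures can be made concrete with a small base-R sketch of their textbook definitions (an illustration of the Guimera-Amaral measures, not the package code):

```{r node-roles-idea}
# Links of one node i towards each module (toy numbers; i belongs to mod1)
k_is <- c(mod1 = 6, mod2 = 2, mod3 = 2)
k_i <- sum(k_is)  # total degree of node i

# Participation coefficient: 1 - sum_s (k_is / k_i)^2
# ~0 when all links stay in i's own module, close to 1 when spread evenly
p_i <- 1 - sum((k_is / k_i)^2)

# Within-module z-score: i's internal degree compared to the internal
# degrees of the other nodes of its module
kappa <- c(6, 4, 5, 7, 3)  # internal degrees of the nodes of mod1
z_i <- (k_is[["mod1"]] - mean(kappa)) / sd(kappa)

round(c(participation = p_i, z = z_i), 2)  # 0.56 and 0.63
```

Nodes with a z-score above `z_threshold` count as hubs, while higher participation values flag connector nodes that bridge several clusters.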
+ +```{r static-node-roles, eval = FALSE} +g_static <- add_node_roles( +  graphs = g_static, +  module_col = "cluster_leiden", +  weight_col = "weight", +  z_threshold = 2.5 +) +``` + +### `extract_tfidf()` + +Use `extract_tfidf()` to characterize cluster content from textual metadata (for instance titles, abstracts, or keywords). In a static network, cluster IDs are usually unique within the graph, so `grouping_across_list = FALSE`. + +Main parameters: + +- `text_columns`: one or more text fields used to extract ngrams. +- `grouping_columns`: document units for TF-IDF (for example cluster IDs). +- `grouping_across_list`: helps disambiguate group IDs across windows. +- `n_gram`: maximum n for ngrams. +- `clean_word_method`: `"lemmatize"`, `"stemming"`, `"none"`. +- `ngrams_filter`: remove terms that are too rare globally. +- `nb_terms`: number of top terms returned per group. + +```{r static-tfidf, eval = FALSE} +tfidf_static <- extract_tfidf( +  data = g_static, +  text_columns = "source_title", +  grouping_columns = "cluster_leiden", +  grouping_across_list = FALSE, +  n_gram = 2, +  nb_terms = 5 +) +``` + +# Step 3: Plot networks + +Functions used: + +- `layout_networks()` +- `minimize_crossing_alluvial()` +- `color_networks()` +- `prepare_label_alluvial()` +- `prepare_label_networks()` +- `plot_alluvial()` +- `plot_networks()` +- `launch_network_app()` + +### `layout_networks()` + +Compute node coordinates before plotting. The function is a wrapper around `ggraph::create_layout()` and supports all its layout algorithms. For dynamic networks, coordinates are computed sequentially by window: the first window is computed with the selected layout, then subsequent windows are computed by reusing prior coordinates when `compute_dynamic_coordinates = TRUE`. + +Main parameters: + +- `node_id`: unique node ID column used to join coordinates. +- `layout`: layout algorithm accepted by `ggraph::create_layout()`. +- `compute_dynamic_coordinates`: reuse prior window coordinates.
+- `save_coordinates`: if `TRUE`, saves coordinates in node columns `{layout}_x` and `{layout}_y` (for example `kk_x`, `kk_y`). Typically, you want to set this to `TRUE` when testing different layouts for plotting. + +The output is a `tbl_graph` (or list of `tbl_graph`) with new node columns `{layout}_x` and `{layout}_y` (or `x` and `y` if `save_coordinates = FALSE`). + +Example: + +```{r static-layout} +g_static <- layout_networks( +  graphs = g_static, +  node_id = "source_id", +  layout = "kk" +) +``` + +```{r dynamic-layout, eval = FALSE} +g_dynamic <- layout_networks( +  graphs = g_dynamic, +  node_id = "source_id", +  layout = "fr", +  compute_dynamic_coordinates = TRUE +) +``` + +### `color_networks()` + +Assign colors to nodes and edges based on a categorical attribute `column_to_color` present in the node table. Typically, it is used to color clusters from `add_clusters()`. The function accepts several color inputs: a named vector of colors whose length equals the number of unique categories in `column_to_color`, or a data frame mapping categories to colors. If `color = NULL`, the function generates a color palette automatically. + +Main parameters: + +- `column_to_color`: node attribute used to define categories to color. +- `color`: a palette or a two-column data frame mapping categories to colors. +- `unique_color_across_list`: for dynamic networks only. It controls whether the same value of `column_to_color` should receive the same color in different time windows. If set to `FALSE`, a given category is treated as the same category across graphs and keeps a single color. If set to `TRUE`, the same category is treated as distinct in each graph and thus receives a different color. + +Output: + +- node column `color`. +- edge column `color` computed as a mix of source and target node colors.
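For reproducible figures you may want fixed colors instead of the automatic palette. Below is a sketch passing a category-to-color data frame to `color`; the two-column layout and the cluster IDs are assumptions here, so adapt them to your data:

```{r static-colors-manual, eval = FALSE}
# Hypothetical mapping: one row per cluster ID (assumed two-column layout)
manual_palette <- data.frame(
  cluster_leiden = c("1", "2", "3"),
  color = c("#1969B3", "#DA3E61", "#3CB95F")
)

g_static <- color_networks(
  graphs = g_static,
  column_to_color = "cluster_leiden",
  color = manual_palette
)
```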
+ +```{r static-colors} +g_static <- color_networks( +  graphs = g_static, +  column_to_color = "cluster_leiden", +  color = NULL +) +``` + +### `prepare_label_networks()` + +Create label coordinates (`label_x`, `label_y`) for positioning labels in network plots. The function computes the average coordinates of nodes within each cluster to position the label. + +Main parameters: + +- `x`, `y`: coordinate columns used to compute label centers. +- `cluster_label_column`: column used for grouping and label text. + +```{r static-labels} +g_static <- prepare_label_networks( +  graphs = g_static, +  x = "x", +  y = "y", +  cluster_label_column = "cluster_leiden" +) +``` + +The output is a `tbl_graph` (or list of `tbl_graph`) with new node columns `label_x` and `label_y` for label coordinates. + +### `plot_networks()` + +`plot_networks()` builds a ready-to-use network visualization from graph attributes (coordinates, colors, labels). For exploration and analysis, we strongly encourage using `launch_network_app()`. + +It requires node coordinates (`x`, `y`) and a cluster label column. Colors must either already exist or be generated by setting `color_networks = TRUE`. The user can also customize the plot by setting `print_plot_code = TRUE`, which prints the generated ggplot/ggraph code for manual adjustments. + +Main parameters: + +- `x`, `y`: node coordinates. +- `cluster_label_column`: displayed cluster labels. +- `node_size_column`: node size variable (`NULL` or missing column gives constant size). +- `color_column`: color column for nodes and edges. +- `color_networks`: if `TRUE`, applies `color_networks()` automatically with +  `cluster_label_column` as grouping variable. +- `color`: optional palette passed to `color_networks()` when +  `color_networks = TRUE`. +- `print_plot_code`: if `TRUE`, prints the generated ggplot/ggraph code for +  manual customization.
+ +Automatic behavior: + +- if label coordinates are missing, `prepare_label_networks()` is called + automatically. +- if edge weights are missing, a constant weight of `1` is used. +- if `node_size_column` is missing, a constant node size of `1` is used. + +The output is a ggplot object. For dynamic analyses, the function returns a list of ggplot objects (one per time window) stored in the `$plot` column of each list element. + +```{r static-plot, eval = FALSE} +plot_networks( + graphs = g_static, + x = "x", + y = "y", + cluster_label_column = "cluster_leiden", + node_size_column = NULL, + color_column = "color" +) +``` + +### `launch_network_app()` + +`launch_network_app()` extends `plot_networks()` by providing an interactive +Shiny interface for network exploration. It launches a local app with an +interactive network view. Users can click on clusters to display a table with +selected metadata and adjust visual settings (node size, edge width, labels, +edge visibility) to improve readability. For example, for a coupling network, the application allows users to explore article-level information in each cluster. + +The app expects a `tbl_graph` (or a list of `tbl_graph` for dynamic analysis), +cluster identifiers, and metadata columns to display. If the input is a list, +the app shows a dropdown menu to select the graph by list name (typically time +windows when graphs are built with `build_dynamic_networks()`). + +Main parameters: + +- `cluster_id`: node cluster column used for interaction. +- `cluster_information`: node metadata columns shown in the table (for example `c("source_author", "source_title", "source_year")`), present in node data. +- `node_id`: unique node ID. +- `node_tooltip`: optional node tooltip column for hover information. +- `node_size`: optional node size column. +- `color`: optional color column (`color_networks()` is applied if `NULL`). +- `layout`: layout algorithm available in `layout_networks()`. 
If `NULL`, the function assumes layout coordinates already exist in node columns `x` and `y`. + +```{r static-app, eval = FALSE} +launch_network_app( +  graph_tbl = g_static, +  cluster_id = "cluster_leiden", +  cluster_information = c("source_author", "source_title", "source_year", "source_journal"), +  node_id = "source_id", +  node_tooltip = "source_label", +  node_size = NULL, +  color = "color", +  layout = NULL +) +``` + +### `prepare_label_alluvial()` + +Prepare the label column used in alluvial plots, typically so that each dynamic cluster is labelled once rather than in every time window. + +### `minimize_crossing_alluvial()` + +Reorder clusters to reduce the number of crossings between alluvial flows, making the evolution of dynamic clusters easier to read. + +### `plot_alluvial()` + +Plot an alluvial diagram showing how dynamic clusters appear, grow, shrink, and disappear across time windows. + +# End-to-end executable example + +The chunk below runs a complete static workflow on bundled data: +network construction, clustering, layout, coloring, label preparation, and plotting. + +```{r static-end-to-end, message = FALSE, warning = FALSE} +set.seed(123) + +# 1) Input tables +nodes_ex <- subset(Nodes_stagflation, source_type == "Stagflation") + +references_ex <- Ref_stagflation + +# 2) Build network +g_pipeline <- build_network( +  nodes = nodes_ex, +  directed_edges = references_ex, +  source_id = "source_id", +  target_id = "target_id", +  projection_method = "structured", +  cooccurrence_method = "coupling_similarity", +  edges_threshold = 1, +  keep_singleton = FALSE +) + +# 3) Cluster +g_pipeline <- add_clusters( +  graphs = g_pipeline, +  clustering_method = "leiden", +  objective_function = "modularity", +  resolution = 1, +  n_iterations = 1000, +  seed = 123 +) + +# 4) Prepare plot attributes +g_pipeline <- layout_networks(g_pipeline, node_id = "source_id", layout = "kk") +g_pipeline <- color_networks(g_pipeline, column_to_color = "cluster_leiden") +g_pipeline <- prepare_label_networks( +  g_pipeline, +  x = "x", +  y = "y", +  cluster_label_column = "cluster_leiden" +) + +# 5) Quick checks on generated attributes +head( +  g_pipeline %>% +    tidygraph::activate(nodes) %>% +    as.data.frame() %>% +    subset(select = c(source_id, cluster_leiden, size_cluster_leiden, x, y, color, label_x, label_y)) +) + +head( +  g_pipeline %>% +    tidygraph::activate(edges) %>%
+ as.data.frame() %>% + subset(select = c(from, to, weight, color)) +) +``` + +```{r static-end-to-end-plot, fig.width = 8, fig.height = 6, fig.alt = "Static network plot showing clustered nodes colored by community with labels positioned near cluster centers."} +plot_networks( + graphs = g_pipeline, + x = "x", + y = "y", + cluster_label_column = "cluster_leiden", + node_size_column = NULL, + color_column = "color" +) +``` + + +# Included datasets + +- `Nodes_stagflation` +- `Ref_stagflation` +- `Authors_stagflation` +- `Nodes_coupling` +- `Edges_coupling` + +# Deprecated functions + +- `community_names()` -> use `name_clusters()` +- `community_labels()` -> use `prepare_label_networks()` +- `community_colors()` -> use `color_networks()` +- `dynamic_network_cooccurrence()` -> use `build_dynamic_networks()` +- `intertemporal_cluster_naming()` -> use `merge_dynamic_clusters()` +- `leiden_workflow()` -> use `add_clusters()` +- `networks_to_alluv()` +- `tbl_main_component()` -> use `filter_components()` +- `top_nodes()` diff --git a/vignettes/workflow-network.Rmd b/vignettes/workflow-network.Rmd deleted file mode 100644 index a95e7ff..0000000 --- a/vignettes/workflow-network.Rmd +++ /dev/null @@ -1,137 +0,0 @@ ---- -title: "A Workflow for network analysis" -author: "Aurélien Goutsmedt and Alexandre Truc" -description: > - Introduction to the standards function of the networkflow package for building, manipulating - plotting, and analysing a network. 
-output: - rmarkdown::html_vignette: - toc: true -vignette: > - %\VignetteIndexEntry{workflow-network} - %\VignetteEngine{knitr::rmarkdown} - %\VignetteEncoding{UTF-8} - - -### Bibliography settings -bibliography: REFERENCES.bib -csl: chicago-author-date.csl -suppress-bibliography: false -link-citations: true ---- - -```{r, include = FALSE} -knitr::opts_chunk$set( - collapse = TRUE, - comment = "#>" -) -``` - -In the following article, we will go other the different steps to create a bibliometric network, manipulate it, prepare the plotting and eventually plot it. - -## First step: creating the network and keeping the main component - -As a point of departure, you need your bibliometric data to be prepared in a certain format:^[See how to extract and clean data from scopus [here](https://aurelien-goutsmedt.com/post/extracting-biblio-data-1/) and from Dimensions [here](https://aurelien-goutsmedt.com/post/extracting-biblio-data-2/).] - -- you need a `nodes` table. For instance, it may be a list of articles with metadata (author(s), title, journal, etc.). Nodes must have a unique identifier and all the information about a node are gathered on only one row (in case of articles, you need one row per article). -- you need a `directed_edges` table, that is a table that links your nodes with another variable that will be used to build the edges between your nodes. For instance, the table could links articles (your nodes) with the references cited by these articles. It can also be a journal or a list of authors (if you are interested in collaboration). In your `directed_edges` table, you need the identifier of the nodes (also present in the `nodes` table), and the unique identifier of the categories (references cited, journals, authors...) the nodes are linked to. 
- -As soon as you have a nodes and a edges file (see the [biblionetwork](https://github.com/agoutsmedt/biblionetwork) package for creating such files), you can create a graph, using tidygraph and the [tbl_graph()](https://rdrr.io/cran/tidygraph/man/tbl_graph.html) function. The next step, as it is recurrent in many network analyses, notably in bibliometric netwoks like bibliographic coupling networks would be to keep only the [main component](https://en.wikipedia.org/wiki/Component_(graph_theory)) of your network. This could be done in one step using the `tbl_main_component()` function of `networkflow`. - -```{r creating graph} -library(networkflow) - -## basic example code - -graph <- tbl_main_component(nodes = Nodes_coupling, edges = Edges_coupling, directed = FALSE, node_key = "ItemID_Ref", nb_components = 1) -print(graph) - -``` - -The parameter `nb_components` allows you to choose the number of components you want to keep. For obvious reasons, it is settled to 1 by default. - -However, it could happen in some networks (for instance co-authorship networks) that the second biggest component of your network is quite large. To avoid removing too big components without knowing it, the `tbl_main_component()` function integrates a warning that happens when a secondary component gathering more than x% of the total number of nodes is removed. The `threshold_alert` parameter is set to 0.05 by default, but you can reduce it if you really want to avoid removing relatively big components. - -```{r components} - -## basic example code - -graph <- tbl_main_component(nodes = Nodes_coupling, edges = Edges_coupling, directed = FALSE, node_key = "ItemID_Ref", threshold_alert = 0.001) -print(graph) - -``` - -## Second step: finding communities - -Once you have you tidygraph graph, an important step is to run community detection algorithms to group the nodes depending on their links. 
This package uses the [leidenAlg](https://cran.r-project.org/web/packages/leidenAlg/index.html) package, and its `find_partition()` function, to implement the Leiden algorithm [@traag2019]. The `leiden_workflow()` function of our package runs the Leiden algorithm and attributes a community number to each node in the `Com_ID` column, but also to each edge (depending if the `from` and `to` nodes are within the same community). - -```{r leiden, eval = FALSE} - -# creating again the graph -graph <- tbl_main_component(nodes = Nodes_coupling, edges = Edges_coupling, directed = FALSE, node_key = "ItemID_Ref", nb_components = 1) - -# finding communities -graph <- leiden_workflow(graph) -print(graph) - -``` - -You can observe that the function also gives the size of the community, by calculating the share of total nodes that are in each community. - -The function also allows to play with the `resolution` parameter of leidenAlg [`find_partition()`]() function. Varying the resolution of the algorithm results in a different partition and different number of communities. A lower resolution means less communities, and conversely. The basic resolution of the `leiden_workflow()` is set by `res_1` and equals 1 by default. You can vary this parameter, but also try a second resolution with `res_2` and a third one with `res_3`: - -```{r resolution, eval = FALSE} - -# creating again the graph -graph <- tbl_main_component(nodes = Nodes_coupling, edges = Edges_coupling, directed = FALSE, node_key = "ItemID_Ref", nb_components = 1) - -# finding communities -graph <- leiden_workflow(graph, res_1 = 0.5, res_2 = 2, res_3 = 3) -print(graph) - -``` - -Once you have detected different communities in your network, you are well on the way of the projection of your graph, but two important steps should be implemented before. First, you have to attribute some colors to each community. These colors will be used for your nodes and edges when you will project your graph with `ggraph`. 
The function `community_colors` of the `networkflow` package allow to do that. You just have to give it a palette (with as many colors as the number of communities for a better visualisation).^[If two connected nodes are in the same community, their edge will take the same color. If they are in different communities, their edge will have a mix of the two communities colors.] - -```{r, eval = FALSE} -# loading a palette with many colors -palette <- c("#1969B3","#01A5D8","#DA3E61","#3CB95F","#E0AF0C","#E25920","#6C7FC9","#DE9493","#CD242E","#6F4288","#B2EEF8","#7FF6FD","#FDB8D6","#8BF9A9","#FEF34A","#FEC57D","#DAEFFB","#FEE3E1","#FBB2A7","#EFD7F2","#5CAADA","#37D4F5","#F5779B","#62E186","#FBDA28","#FB8F4A","#A4B9EA","#FAC2C0","#EB6466","#AD87BC","#0B3074","#00517C","#871B2A","#1A6029","#7C4B05","#8A260E","#2E3679","#793F3F","#840F14","#401C56","#003C65","#741A09","#602A2A","#34134A","#114A1B","#27DDD1","#27DD8D","#4ADD27","#D3DD27","#DDA427","#DF2935","#DD27BC","#BA27DD","#3227DD","#2761DD","#27DDD1") - -# creating again the graph -graph <- tbl_main_component(nodes = Nodes_coupling, edges = Edges_coupling, directed = FALSE, node_key = "ItemID_Ref", nb_components = 1) - -# finding communities -graph <- leiden_workflow(graph) - -# attributing colors -graph <- community_colors(graph, palette, community_column = "Com_ID") -print(graph) -``` - -What you want to do next is to give a name automatically to your community. The `community_names()` function allows you to do that: it gives to the community the label of the node, within the community, which has the highest score in the statistics you choose. In the next exemple, we will calculate the degree of each node, and each community will take as a name the label of its highest degree node. 
- -```{r naming, eval = FALSE} - -library(magrittr) -library(dplyr) -library(tidygraph) - -# calculating the degree of nodes - graph <- graph %>% - activate(nodes) %>% - mutate(degree = centrality_degree()) - -# giving names to communities - graph <- community_names(graph, ordering_column = "degree", naming = "Author_date", community_column = "Com_ID") - print(graph) - -``` - - -## Third step: plotting the network - -### Preparing the plot - - - -## References