
{networds} - a package to build graphs from text

THIS PACKAGE IS CURRENTLY UNDER DEVELOPMENT

Extracting co-occurrences and relations from text

This package extracts relations from plain text and builds and visualizes text networks as static and dynamic graphs.

It extracts graphs from plain text using:

  1. Word co-occurrence: words co-occurring in the same sentence or paragraph.
  2. Rule-based extraction: regular expressions to extract proper names and build a co-occurrence network.
  3. (Under development) Extraction using part-of-speech tagging of proper names and nouns and their co-occurrence.
  4. (Under development) Relation extraction using Large Language Models running locally with {rollama}.

Methods 1 and 2 are quick and easy to understand and explain, but have their limitations. Methods 3 and 4 are still under development; they are more powerful and can solve more complex problems. One such problem is disambiguation: the same word can have different meanings depending on the context. Another, addressable with methods 3 and 4, is what is called "anaphora resolution", needed when the same entity is referred to repeatedly with different words.
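To make method 1 concrete, here is a minimal base-R sketch of sentence-level word co-occurrence counting. This is an illustration only, not the package's actual implementation; the toy text and the choice to sort each pair alphabetically (so "a-b" and "b-a" are counted together) are assumptions for the example:

```r
# Toy text: split into sentences, then count word pairs within each sentence.
sentences <- unlist(strsplit("John met Mary. Mary met Peter.", "[.!?]\\s*"))

pair_list <- lapply(sentences, function(s) {
  words <- unique(tolower(unlist(strsplit(s, "\\s+"))))
  if (length(words) < 2) return(NULL)
  t(combn(sort(words), 2)) # sort so (a, b) and (b, a) collapse into one pair
})

pairs <- do.call(rbind, pair_list)
counts <- table(paste(pairs[, 1], pairs[, 2], sep = "-"))
counts # "mary-met" occurs in both sentences, all other pairs once
```

The resulting pair counts are exactly the weighted edge list of a co-occurrence graph: each pair is an edge and its count is the edge weight.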

For example, in the phrase "John Doe gave Mary a flower and she loved it," the pronoun "she" is the anaphor of "Mary" and "it" is the anaphor of "flower". The opposite case, when a pronoun precedes its referent, is called cataphora. We are working on the implementation of this feature to reduce the number of redundant nodes.
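Method 2 (rule-based extraction) can likewise be sketched in base R with a simple regex for runs of capitalized words. Again this is an assumed illustration, not the package's own rule set; note that such a naive pattern also picks up sentence-initial capitalized words, which is one of the limitations mentioned above:

```r
# Toy regex sketch: capture runs of capitalized words as candidate proper names.
txt2 <- "Barack Obama met Angela Merkel in Berlin. Angela Merkel smiled."
hits <- regmatches(txt2, gregexpr("[A-Z][a-z]+(\\s+[A-Z][a-z]+)*", txt2))[[1]]
table(hits) # "Angela Merkel" appears twice; co-occurrence in the same sentence
            # would then link "Barack Obama", "Angela Merkel" and "Berlin"
```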

Installation

You can install the development version of networds from GitHub with:

# install.packages("pak")
pak::pak("SoaresAlisson/networds")

Example

Ex. graph POS

Obama and Trump SOTU about China

Check the vignettes:

Basic usage

library(networds)
# a vector of texts from wikipedia
txt <- networds::txt_wiki
my_sw <- c("of", "the", "a", "in", "for", "and", "50", "also", "are", "as", "with", "is", "was", "do", "to", "him", "its", "on", "have", "they", "them", "be", "it", "had", "or", "an", "that", "he", "his", "this", "at", "from", "my", "their", "has", "but")
cooc <- cooccur_words(txt, sw = my_sw)
#> You provided a vector of 45 elements instead of one. No problem, but these will be collapsed into a single element, with a final punctuation mark added to each, to ensure it is treated as different sentences in the process of tokenization.
#> tokenizing sentences...
#> tokenizing words...
#>   |======================================================================| 100%
cooc |> plot_graph()
#> Only one value passed to edge_width. All edges will have the same width.

Same data, more customization:

txt <- txt |>
  tolower() |>
  gsub("new york", "New_York", x = _) |> # make "New York" a single token (checked in the text)
  gsub("'s\\b", "", x = _) # strip the possessive "'s"

cooc <- cooccur_words(txt, sw = my_sw)
#> You provided a vector of 45 elements instead of one. No problem, but these will be collapsed into a single element, with a final punctuation mark added to each, to ensure it is treated as different sentences in the process of tokenization.
#> tokenizing sentences...
#> tokenizing words...
#>   |======================================================================| 100%

# plotting the graph with more control (and there are more options)
cooc |>
  plot_graph2( # plot_graph2 also shows the frequency of individual words
    txt, # the original text, used to count word frequencies
    head_n = 250, # increase the number of pairs. Default: 30
    text_size = 4,
    text_contour_color = "white",
    edge_color = "darkblue",
    edge_alpha = 0.25,
    edge_bend = 0, # how much the links are bent. 0 = straight line.
    layout = layouts[4]
  ) +
  # you can use ggplot2 customizations:
  ggplot2::labs(
    title = "Title of my amazing semantic network with {networds} package",
    subtitle = "Text from wikipedia"
  )
#> You provided a vector of 45 elements instead of one. No problem, but these will be collapsed into a single element, with a final punctuation mark added to each.
#>   |======================================================================| 100%
#> Using node_size proportional to word frequency as no node_size was provided in parameters

Similar Projects

  • textnet - “textNet is a set of tools in the R language that uses part-of-speech tagging and dependency parsing to generate semantic networks from text data. It is compatible with Universal Dependencies and has been tested on English-language text data”.
  • textnets from Chris Bail. It captures proper names using UDPipe, plots word networks, and calculates the centrality/betweenness of words in the network.
