Overview
This repository contains my senior mathematics research project, which analyzes the social relationships and behavioral trends among adult female elephants in a protected nature preserve. The project applies methods from graph theory, ecology, and hypergraph modeling to better understand changing group dynamics, relationship strength, and patterns of interaction over time.
Objectives:
-Model elephant social interactions using mathematical and computational tools
-Identify patterns in group associations and relationship stability over time
-Explore trends in group size, centrality, and clustering over time
-Visualize and interpret social structures using Python-based data analysis and graph visualization tools
Methods & Tools
Languages: Python
Key Libraries: hypernetx, pandas, networkx, matplotlib
Mathematical Concepts: Graph theory, hypergraphs, modularity analysis, clustering algorithms
Data: Social association data collected from observations within a nature preserve
Results and Script Details
For simplicity, both algorithms have been run on the 'resident' individuals in the study and any elephants they are connected to. That is, the hypergraph is built from the elephants that had sightings in every year of the study and any elephants they were seen with. This script has not been run on the full population, as that would have complicated the development of this project. In theory, the algorithm would behave the same whether given the full population or not, but the results and further analysis in this project focus mainly on the resident hypergraphs.
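The resident filter described above can be sketched as follows. This is a minimal illustration rather than the project's actual script: the `sightings` layout (a list of `(year, group)` pairs) and the function name `resident_subpopulation` are assumptions made for the example.

```python
def resident_subpopulation(sightings, years):
    """Return (residents, kept), where kept is residents plus their companions.

    sightings: list of (year, set_of_elephant_ids) pairs (hypothetical layout).
    years: iterable containing every study year.
    """
    years = set(years)

    # Record the set of years in which each elephant was sighted.
    years_seen = {}
    for year, group in sightings:
        for elephant in group:
            years_seen.setdefault(elephant, set()).add(year)

    # Residents were sighted in every year of the study.
    residents = {e for e, seen in years_seen.items() if years <= seen}

    # Keep any elephant that shares at least one sighting with a resident.
    kept = set(residents)
    for _, group in sightings:
        if residents & set(group):
            kept |= set(group)
    return residents, kept


sightings = [
    (2007, {"A", "B"}),
    (2008, {"A", "C"}),
    (2008, {"B"}),
    (2007, {"D"}),
]
residents, kept = resident_subpopulation(sightings, years=[2007, 2008])
```

Here A and B are residents, C is kept because it was seen with A, and D is dropped because it never appears with a resident.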
text is an algorithm that creates and visualizes the hypergraph for each year of the study. The clustering is done by the HypernetX package and its provided clustering algorithms. The intra-cluster edges are color-coded to match the nodes of their cluster, while the inter-cluster edges are drawn in gray. There are no duplicate edges in the hypergraph due to the HypernetX function 'collapse_edges', which aggregates duplicate edges and sums their weights. Initially, each edge has a weight of 1 (representing 1 sighting), so an edge listed with weight n represents n sightings of the individuals in that hyperedge. Weights equal to 1 are omitted from the visualization for clarity. The visualizations, stored in text, are hard to read unless zoomed in, but once zoomed in they show the inner workings of each cluster in detail, making the nodes and their impact on the intra-cluster edges easier to see. The results, stored in text, indicate that the modularity greatly increased using the HypernetX algorithms, as did the number of communities detected. Each individual elephant's cluster per year can be found at text
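To make the duplicate-edge aggregation concrete, here is a small sketch in plain Python. It does not call HypernetX; it only mimics the effect the text attributes to 'collapse_edges' (identical hyperedges merged, weights summed). The function name `collapse_duplicate_edges` and the sample data are hypothetical.

```python
from collections import Counter


def collapse_duplicate_edges(sightings):
    """Merge identical hyperedges and sum their weights.

    sightings: iterable of groups of elephant ids, one group per sighting.
    Each sighting starts with weight 1, so the collapsed weight is the
    number of times exactly that group was seen together.
    """
    weights = Counter(frozenset(group) for group in sightings)
    return dict(weights)


year_sightings = [
    {"A", "B"},
    {"B", "A"},       # same members as the first sighting
    {"A", "B", "C"},
    {"A", "B"},
]
edges = collapse_duplicate_edges(year_sightings)

# Mirror the plotting rule in the text: only weights greater than 1 get a label.
labels = {edge: w for edge, w in edges.items() if w > 1}
```

Using `frozenset` as the key means member order does not matter, so `{"A", "B"}` and `{"B", "A"}` collapse into one edge of weight 3.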
text uses the same initial hypergraph and clustering algorithms as above. There are no duplicate edges in the hypergraph due to the HypernetX function 'collapse_edges', which aggregates duplicate edges and sums their weights. Initially, each edge has a weight of 1 (representing 1 sighting), so an edge listed with weight n represents n sightings of the individuals in that hyperedge. Weights equal to 1 are omitted from the visualization for clarity. As a further step, the intra-hyperedges and inter-hyperedges of each cluster were each aggregated into one hyperedge whose weight is the sum of all of the aggregated weights. The weight thus represents the total sightings of all elephants in the respective cluster for that year. For example, for a cluster containing elephants [A, B, C] where A was seen 3 times, B was seen 2 times, and C was seen 2 times, the intra-edge for this cluster would have a weight of 7. A similar method was used for the inter-edges: if there was a second cluster [D, E], and A and D were seen together 2 times, then the inter-hyperedge between the two clusters would have a weight of 2. The visualizations, stored in text, are clearer than the main clustering visualizations, but they lose some detail about the individual behaviors of the elephants. The results, stored in text, indicate that the modularity greatly increased using the HypernetX algorithms (though often a bit less than for the non-condensed clusters), as did the number of communities detected. Each individual elephant's cluster per year can be found at text
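The condensing step can be sketched as below. This is one possible reading of the description, not the project's code: it sums hyperedge weights, treating an edge as intra-cluster when all of its members share a cluster and as inter-cluster otherwise. The function name `condense_clusters`, the input layout, and the sample weights (chosen so the totals match the 7 and 2 in the example above) are assumptions.

```python
from collections import defaultdict


def condense_clusters(edge_weights, cluster_of):
    """Aggregate weighted hyperedges into one edge per cluster or cluster group.

    edge_weights: dict mapping frozenset of elephants -> weight.
    cluster_of: dict mapping elephant -> cluster label.
    Returns (intra, inter): intra[cluster] is the summed weight of edges
    lying entirely inside that cluster; inter[frozenset(clusters)] is the
    summed weight of edges spanning those clusters.
    """
    intra = defaultdict(int)
    inter = defaultdict(int)
    for members, weight in edge_weights.items():
        touched = frozenset(cluster_of[e] for e in members)
        if len(touched) == 1:
            (cluster,) = touched
            intra[cluster] += weight
        else:
            inter[touched] += weight
    return dict(intra), dict(inter)


edge_weights = {
    frozenset({"A", "B"}): 3,
    frozenset({"B", "C"}): 2,
    frozenset({"A", "C"}): 2,
    frozenset({"A", "D"}): 2,   # spans both clusters
    frozenset({"D", "E"}): 1,
}
cluster_of = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1}
intra, inter = condense_clusters(edge_weights, cluster_of)
```

With this input, cluster 0 gets an intra-edge weight of 7 and the edge between clusters 0 and 1 gets a weight of 2.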
The purpose of this script is to compare the results and consistency of the hypergraph model with those of the regular graph model. The results can be found at text. The ARI (adjusted Rand index) score and the mutual information score are independent of the cluster labels and measure the similarity of the groupings. The ARI score measures pairwise agreement, and the mutual information score measures the information overlap between the two groupings. Both metrics indicate very low similarity between the hypergraph clusterings and the graph clusterings. In addition, the confusion matrices found at
and
represent the basic pairwise agreement, similar to the ARI. Both the raw counts and the ratios are listed; they compare the true and false positives and negatives to assess the consistency between the hypergraph clusterings and the regular graph clusterings.
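As an illustration of pairwise agreement, the sketch below counts elephant pairs by whether each clustering groups them together and then derives the ARI from those counts. It is a plain-Python example under assumed inputs (two label lists in the same elephant order), not the project's script; it does not cover the mutual information score. scikit-learn's `adjusted_rand_score` computes the same index and is likely the more practical choice.

```python
from itertools import combinations


def pair_confusion(labels_a, labels_b):
    """Count element pairs by whether each clustering groups them together.

    Returns (tp, fp, fn, tn), treating labels_a as the reference:
      tp: together in both      fn: together in a only
      fp: together in b only    tn: apart in both
    """
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            tp += 1
        elif same_a:
            fn += 1
        elif same_b:
            fp += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index computed from the pair counts."""
    tp, fp, fn, tn = pair_confusion(labels_a, labels_b)
    if fn == 0 and fp == 0:
        return 1.0  # the two partitions are identical
    numerator = 2.0 * (tp * tn - fn * fp)
    denominator = (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    return numerator / denominator


# Same grouping under different label names: the score ignores label values.
same = adjusted_rand_index([0, 0, 1, 1], [5, 5, 9, 9])
# Groupings that disagree on every co-clustered pair.
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The first call shows the label independence mentioned above: relabeling the clusters leaves the score at 1.0.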
This script finds occurrence counts, marginal probabilities, and conditional probabilities. Given elephants A and B that are in the same cluster for n years, and an elephant C that was in the same cluster as A and B in the first year, it asks what the probability is that C will be in the same cluster as A and B across the rest of the study. The timeframe for how long A and B are together can be changed, as can the number of groups examined overall. Violin plots representing P(C | A & B) and P(A & B | C) are saved as well.
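One way to estimate these conditional probabilities from yearly cluster assignments is sketched below. The definitions are assumptions made for illustration: P(C | A & B) is taken as the fraction of years with A and B co-clustered in which C shares that cluster, and P(A & B | C) as the fraction of years with C assigned to any cluster in which all three are together. The `clusters` layout and function names are hypothetical.

```python
def together(clusters, year, *elephants):
    """True if all given elephants share one cluster label in that year."""
    labels = [clusters[year].get(e) for e in elephants]
    return None not in labels and len(set(labels)) == 1


def conditional_probs(clusters, a, b, c):
    """Estimate P(C | A & B) and P(A & B | C) as fractions of years."""
    years = list(clusters)
    n_ab = sum(together(clusters, y, a, b) for y in years)
    n_abc = sum(together(clusters, y, a, b, c) for y in years)
    n_c = sum(c in clusters[y] for y in years)
    p_c_given_ab = n_abc / n_ab if n_ab else None
    p_ab_given_c = n_abc / n_c if n_c else None
    return p_c_given_ab, p_ab_given_c


# Hypothetical data: year -> {elephant: cluster label}.
clusters = {
    2007: {"A": 0, "B": 0, "C": 0},
    2008: {"A": 1, "B": 1, "C": 2},
    2009: {"A": 0, "B": 0, "C": 0},
    2010: {"A": 0, "B": 1, "C": 1},
}
p_c_given_ab, p_ab_given_c = conditional_probs(clusters, "A", "B", "C")
```

In this toy data A and B share a cluster in three years and C joins them in two of those, so P(C | A & B) is 2/3 and P(A & B | C) is 2/4.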
This script analyzes elephant social associations across time by focusing on pairs of elephants A and B that were in the same cluster in 2007. For each pair, a third elephant C is selected from the same cluster in that year, and the script computes the probability that C will appear in the same cluster as A and B both in the baseline year and in subsequent years after a specified time jump. It calculates occurrences, marginal and conditional probabilities, including Bayesian probabilities. The number of years to examine after the time jump and the number of elephant pairs analyzed can be adjusted. Results are visualized using violin plots to compare P(C | A & B) in the baseline year versus after the time jump, and a histogram illustrates the distribution of differences in probabilities over time.
This script examines how often groups of three elephants (A, B, and C) are seen together over the years 2007–2011. It first loads the cluster information and daily sightings for each elephant. For each year, it picks triples where C is likely to be seen with A and B (probability of 0.5 or greater). It calculates probabilities for each triple, including P(C|AB), P(AB|C), and P(A or B | not C). Then it "removes" C from the sightings and recalculates these probabilities to see how the removal changes the relationships. All the results are saved to a text file found in the respective experiment folder within text. The script then generates violin plots comparing the distributions of each probability before and after C is removed, showing how removing a single elephant affects dyadic and triadic co-occurrence patterns over time.
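The before-and-after comparison can be sketched as follows. This is an illustrative reading, not the project's code: each probability is estimated as a ratio of sighting counts, and "removing" C is assumed to mean deleting C from every sighting and discarding sightings that become empty. The function names and sample sightings are hypothetical.

```python
def triple_probs(sightings, a, b, c):
    """Estimate co-occurrence probabilities from a list of sighting sets."""
    n_ab = sum(1 for s in sightings if a in s and b in s)
    n_c = sum(1 for s in sightings if c in s)
    n_abc = sum(1 for s in sightings if a in s and b in s and c in s)
    not_c = [s for s in sightings if c not in s]
    n_aorb_not_c = sum(1 for s in not_c if a in s or b in s)

    def ratio(num, den):
        # None marks a probability that is undefined (empty conditioning set).
        return num / den if den else None

    return {
        "P(C|AB)": ratio(n_abc, n_ab),
        "P(AB|C)": ratio(n_abc, n_c),
        "P(A or B|not C)": ratio(n_aorb_not_c, len(not_c)),
    }


def remove_elephant(sightings, elephant):
    """Drop one elephant from every sighting; discard emptied sightings."""
    stripped = [set(s) - {elephant} for s in sightings]
    return [s for s in stripped if s]


sightings = [{"A", "B", "C"}, {"A", "B"}, {"C"}, {"A", "B", "C"}, {"D"}]
before = triple_probs(sightings, "A", "B", "C")
after = triple_probs(remove_elephant(sightings, "C"), "A", "B", "C")
```

Under this reading, P(AB|C) is undefined after the removal because C no longer appears in any sighting, so a real analysis would need to decide how to treat that case.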
Acknowledgements
Special thanks to my mentors and the Mathematics Department at UC San Diego for guidance and support.