Centre for Computational Biology - University of Birmingham (03.2020 - 06.2020)
Data Visualization, NLP
use text as input and analyse it with NLP methods (tokenization, punctuation removal, stop-word removal, stemming, lemmatization, etc.)
text visualization
Find how much the text is reduced
Find how big the dictionary is
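The two measurements above can be sketched roughly as follows. This is a minimal illustration using only the standard library; the stop-word list, the function names and the sample sentence are assumptions made for the example, not the project's actual code, and stemming/lemmatizing are left out.

```python
import string

# Tiny illustrative stop-word list (an assumption; a real pipeline would use
# a fuller list such as the ones shipped with NLP libraries).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def preprocess(text):
    """Lowercase, strip punctuation, split on whitespace, drop stop words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


def text_reduction(text):
    """Return (raw token count, kept token count, fraction of tokens removed)."""
    raw = text.split()
    kept = preprocess(text)
    removed = 1 - len(kept) / len(raw) if raw else 0.0
    return len(raw), len(kept), removed


def dictionary_size(text):
    """Number of distinct tokens left after preprocessing."""
    return len(set(preprocess(text)))


# Made-up sample sentence, for illustration only.
sample = "The bacteria are resistant to the antibiotic, and the resistance is spreading."
raw_count, kept_count, removed_fraction = text_reduction(sample)
print(raw_count, kept_count, round(removed_fraction, 3))
print(dictionary_size(sample))
```

Here "reduction" is taken to mean the share of tokens dropped by preprocessing, which is one plausible reading of the item above.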
train word2vec models to find robust networks
measure the variation in word distances
vectorize words
sentence -> tokenize -> count frequency
train a word2vec neural network
visualize the results
n-dimensional vector -> 2-dimensional vector -> visualize
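A rough sketch of two of the steps above: counting token frequencies in a sentence, and reducing n-dimensional word vectors to 2 dimensions for plotting. The README does not say which reduction method was used, so PCA (via NumPy's SVD) is an assumption here, and the vectors are invented stand-ins for trained word2vec embeddings.

```python
from collections import Counter

import numpy as np


def count_frequency(sentence):
    """Tokenize on whitespace (lowercased) and count token frequencies."""
    return Counter(sentence.lower().split())


def project_to_2d(vectors):
    """Project n-dimensional row vectors onto their first two principal components."""
    X = np.asarray(vectors, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


# Hypothetical 4-dimensional "embeddings", made up for illustration only.
embeddings = {
    "antibiotic": [0.9, 0.1, 0.3, 0.0],
    "resistance": [0.8, 0.2, 0.4, 0.1],
    "hospital": [0.1, 0.9, 0.0, 0.5],
    "patient": [0.2, 0.8, 0.1, 0.6],
}
coords = project_to_2d(list(embeddings.values()))
for word, (x, y) in zip(embeddings, coords):
    print(f"{word}: ({x:.3f}, {y:.3f})")

print(count_frequency("resistance to antibiotic drives more resistance"))
```

The resulting 2-D coordinates can then be passed to any scatter-plot routine.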
fetch all articles from PubMed matching the keywords ("antibiotic resistant")
parse articles (title, year, abstract)
save data as json
Read all articles and find country count with geotext and pycountry
Filter by publication type and exclude reviews
Then train a word2vec model to vectorize the text and find word embeddings
visualise data and extract insights from data
Write an interactive geographical map to show the number of studies on the map
Use bokeh and seaborn to develop the visualization tool
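The parse-and-save step of the pipeline above might look roughly like this. It is a sketch under assumptions: the sample record is made up, it follows the element names used in PubMed's XML export (`ArticleTitle`, `AbstractText`, `PubDate/Year`), and `parse_articles` is a hypothetical helper name. Fetching from PubMed is not shown.

```python
import json
import xml.etree.ElementTree as ET

# Made-up record that follows the element layout of PubMed's XML export.
SAMPLE_XML = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <Article>
        <Journal><JournalIssue><PubDate><Year>2019</Year></PubDate></JournalIssue></Journal>
        <ArticleTitle>Example title about antibiotic resistance</ArticleTitle>
        <Abstract><AbstractText>Example abstract text.</AbstractText></Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""


def parse_articles(xml_text):
    """Extract title, year and abstract from each Article element."""
    root = ET.fromstring(xml_text.strip())
    articles = []
    for article in root.iter("Article"):
        title = article.findtext("ArticleTitle", default="")
        # Year can be missing in some records; it is then stored as None.
        year = article.findtext("Journal/JournalIssue/PubDate/Year")
        # An abstract may be split into several AbstractText sections.
        abstract = " ".join(
            "".join(node.itertext()).strip() for node in article.iter("AbstractText")
        )
        articles.append({"title": title, "year": year, "abstract": abstract})
    return articles


articles = parse_articles(SAMPLE_XML)
# json.dump(articles, file_handle) would write the same structure to disk.
print(json.dumps(articles, indent=2))
```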
How much is antimicrobial resistance reported at different geographical scales over time?
How does the emergence of AMR vary across time for different classes of antimicrobials?
Data preparation and Data integration
unsupervised clustering
simple linear regression
how to find the relationship between a country's publications and its GDP?
prepare input data
read all WDI Excel files, filter by country and merge them
create a correlation matrix and visualize it in R (Spearman test and hierarchical clustering)
correlation between vector distance and metadata difference
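For reference, Spearman's correlation is the Pearson correlation computed on ranks. The project did this step in R; the sketch below only illustrates the computation in plain Python, with invented numbers and hypothetical function names.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up numbers for illustration, not real WDI or PubMed data.
publications = [12, 40, 7, 95, 33]
gdp = [1.1, 2.7, 0.8, 20.5, 3.9]
rho = spearman(publications, gdp)
print(round(rho, 3))
```

Computing `spearman` for every pair of columns gives the entries of a correlation matrix.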
create/train random forest and decision tree models
RMSE – shows the prediction error of the model; it is used because this is regression, not classification, and the aim is to predict a value.
create feature importance – how often a feature is used by the model when making predictions
we want to see which parameters are more important: visualise the importance of features and how well the target can be predicted
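As a reminder of what RMSE measures, here is a small standard-library sketch. The numbers are invented and `rmse` is a hypothetical helper name, not taken from the project code.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up observed and predicted values, for illustration only.
observed = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
print(round(rmse(observed, predicted), 4))
```

RMSE is in the same units as the predicted quantity, so lower values mean predictions that are closer to the observed ones.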
find local importance using SHAP
first create a clustering algorithm to capture information about clusters, then apply a dimension reduction algorithm
check which factors are most important for each country
create SHAP summary plot
visualize local importance for some countries
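SHAP explains a single prediction by assigning each feature a Shapley value. To illustrate the idea without depending on the shap library, the sketch below computes exact Shapley values by brute force for a toy model. Everything here is an assumption for illustration: the model, the feature names, the inputs, and the choice to represent a "missing" feature by a fixed baseline value. It is only feasible for a handful of features.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one input; absent features take baseline values."""
    n = len(x)
    contributions = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        previous = predict(current)
        for i in order:
            # Switch feature i from its baseline value to its actual value
            # and credit it with the resulting change in the prediction.
            current[i] = x[i]
            value = predict(current)
            contributions[i] += value - previous
            previous = value
    return [c / factorial(n) for c in contributions]


# Hypothetical model with an interaction term, made up for illustration.
def toy_model(features):
    gdp, population, spending = features
    return 2.0 * gdp + 0.5 * population + gdp * spending


country = [3.0, 10.0, 2.0]   # made-up feature values for one country
baseline = [1.0, 4.0, 1.0]   # made-up reference values
phi = shapley_values(toy_model, country, baseline)
for name, value in zip(["gdp", "population", "spending"], phi):
    print(f"{name}: {value:.2f}")
```

The values sum to the difference between the prediction for this input and the prediction for the baseline, which is the property that makes them usable as a per-country breakdown.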