manaazarm/open-model-popularity

Open Model Safety Lab

This repository contains the work of our Open-Weight Large Language Model (LLM) Safety Lab. Our main goal is to test open-weight models and tools for text classification, summarization, extraction, and question answering, particularly on academic papers and research documents. We also retrieve research papers from Semantic Scholar and other sources.

Official Docs Page

docs

Directory Structure

├──.env
├──.git
├──.gitignore
├──amazon_comprehend
├──Building_AI_Papers_Repository_by_Industry
│  ├──Download_arXiV
│  └──Retrieve_ExternPaperID_arXiv
├──datasets
│  ├──03_30 Scopus Papers Database.csv
│  ├──df_with_overlap_and_id.csv
│  ├──exceeding_tokens.csv
│  ├──fewshot_industry.csv
│  ├──groq_mixtral_answers.csv
│  ├──HF_dataset_4_11_cleaned.csv
│  ├──llm_training_data_fta.csv
│  ├──llm_training_data_full_text.csv
│  ├──mx_answers_by_abstract.csv
│  ├──overlap_df_full_text.csv
│  ├──ra_scopus_llm_full_text.csv
│  ├──readme.md
│  ├──Scopus_Database 4_11_2024.csv
│  ├──scopus_full_text.csv
│  ├──scopus_rag_answers.csv
│  ├──scopus_rag_with_json.csv
│  └──semantic_man_verified_papers_full_text.csv
├──DB_Create_Script.sql
├──DB_ERD.png
├──hf_models.csv
├──hf_models.txt
├──hub_models.py
├──Insights.sql
├──JSON
├──LICENSE
├──llm_application_papers_data.sql
├──llm_ERD.pgerd
├──main.py
├──paper_insights.sql
├──pics
│  ├──DL_percentiles.csv
│  ├──models_by_task.png
│  ├──number of popular models per task.png
│  └──tasks performed by popular models.png
├──playground.ipynb
├──RAG_mixtral
│  └──mixtral-milvus-rag.ipynb
├──README.md
├──Retrieving_Papers_Semantic_Scholar
├──Retrieving_ResearchPapers_SemanticScholar
│  ├──Amazon_Comprehend
│  ├──Step 2. Bulkdownload_SemanticScholars_Paperswithover1000
│  ├──Step 3. RetrieveInformation_BulkDownload
│  ├──Step 4. Create master dataset
│  └──Step1. SearchResultYield_isbelow_1000
├──Scopus_Papers_Cleanup.sql
├──tags.csv
├──testing
│  ├──ollama.py
│  └──test.ipynb
├──text_inferencing
│  ├──inferencing-phi-2.ipynb
│  ├──inferencing_mistral-7B.ipynb
│  ├──llama_8b_groq.ipynb
│  ├──llm_training_data_full_text.json
│  ├──mixtral_8x7b_groq.ipynb
│  ├──mixtral_answers_by_abstract.ipynb
│  └──semantic_man_verified_full_text.json
├──text_extraction
│  ├──pdf_text_extraction.ipynb
│  └──scopus_text_extraction.ipynb
├──text_summarization
│  ├──text_summarization_using_bert.ipynb
│  └──text_summarization_using_phi-2.ipynb
├──text_sum_llm.py
├──Updated Google Scholar Code
└──utils
   ├──add_data_milvus.py
   ├──config_serverless.ini
   ├──GroqApi.py
   ├──OllamaApi.py
   ├──main.py
   ├──PDFExtractor.py
   ├──rag-tester.py
   ├──RAG_Milvus.py
   ├──TextSummarizer.py
   └──JSONExtractor.py

About

Experiments on open model popularity prediction
