Open Model Safety Lab

This repository contains the work of our Open Weight Large Language Models (LLM) Safety Lab. Our main goal is to test open-weight models and tools for text classification, summarization, extraction, and question answering, particularly on academic papers and research documents. We also retrieve research papers from Semantic Scholar and other sources.

Official Docs Page

docs

Directory Structure

├──.env
├──.git
├──.gitignore
├──amazon_comprehend
├──Building_AI_Papers_Repository_by_Industry
│  ├──Download_arXiV
│  └──Retrieve_ExternPaperID_arXiv
├──datasets
│  ├──03_30 Scopus Papers Database.csv
│  ├──df_with_overlap_and_id.csv
│  ├──exceeding_tokens.csv
│  ├──fewshot_industry.csv
│  ├──groq_mixtral_answers.csv
│  ├──HF_dataset_4_11_cleaned.csv
│  ├──llm_training_data_fta.csv
│  ├──llm_training_data_full_text.csv
│  ├──mx_answers_by_abstract.csv
│  ├──overlap_df_full_text.csv
│  ├──ra_scopus_llm_full_text.csv
│  ├──readme.md
│  ├──Scopus_Database 4_11_2024.csv
│  ├──scopus_full_text.csv
│  ├──scopus_rag_answers.csv
│  ├──scopus_rag_with_json.csv
│  └──semantic_man_verified_papers_full_text.csv
├──DB_Create_Script.sql
├──DB_ERD.png
├──hf_models.csv
├──hf_models.txt
├──hub_models.py
├──Insights.sql
├──JSON
├──LICENSE
├──llm_application_papers_data.sql
├──llm_ERD.pgerd
├──main.py
├──paper_insights.sql
├──pics
│  ├──DL_percentiles.csv
│  ├──models_by_task.png
│  ├──number of popular models per task.png
│  └──tasks performed by popular models.png
├──playground.ipynb
├──RAG_mixtral
│  └──mixtral-milvus-rag.ipynb
├──README.md
├──Retrieving_Papers_Semantic_Scholar
├──Retrieving_ResearchPapers_SemanticScholar
│  ├──Amazon_Comprehend
│  ├──Step 2. Bulkdownload_SemanticScholars_Paperswithover1000
│  ├──Step 3. RetrieveInformation_BulkDownload
│  ├──Step 4. Create master dataset
│  └──Step1. SearchResultYield_isbelow_1000
├──Scopus_Papers_Cleanup.sql
├──tags.csv
├──testing
│  ├──ollama.py
│  └──test.ipynb
├──text_inferencing
│  ├──inferencing-phi-2.ipynb
│  ├──inferencing_mistral-7B.ipynb
│  ├──llama_8b_groq.ipynb
│  ├──llm_training_data_full_text.json
│  ├──mixtral_8x7b_groq.ipynb
│  ├──mixtral_answers_by_abstract.ipynb
│  └──semantic_man_verified_full_text.json
├──text_extraction
│  ├──pdf_text_extraction.ipynb
│  └──scopus_text_extraction.ipynb
├──text_summarization
│  ├──text_summarization_using_bert.ipynb
│  └──text_summarization_using_phi-2.ipynb
├──text_sum_llm.py
├──Updated Google Scholar Code
└──utils
   ├──add_data_milvus.py
   ├──config_serverless.ini
   ├──GroqApi.py
   ├──OllamaApi.py
   ├──main.py
   ├──PDFExtractor.py
   ├──rag-tester.py
   ├──RAG_Milvus.py
   ├──TextSummarizer.py
   └──JSONExtractor.py
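
The step folder names under Retrieving_ResearchPapers_SemanticScholar suggest that retrieval first checks whether a search yields fewer than 1000 results and falls back to a bulk download otherwise. The following is a minimal sketch of such a check, assuming the public Semantic Scholar Graph API paper search endpoint. The helper names (build_search_url, fetch_page, needs_bulk_download), the chosen fields, and the exact threshold handling are illustrative assumptions and are not taken from the code in this repository.

```python
import json
import urllib.parse
import urllib.request

# Public Semantic Scholar Graph API relevance-search endpoint.
SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"


def build_search_url(query, fields=("title", "abstract", "externalIds"),
                     limit=100, offset=0):
    """Build the request URL for one page of search results (hypothetical helper)."""
    params = {
        "query": query,
        "fields": ",".join(fields),
        "limit": limit,
        "offset": offset,
    }
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)


def fetch_page(query, offset=0):
    """Fetch one page of results as a dict. Performs a network request."""
    url = build_search_url(query, offset=offset)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def needs_bulk_download(total, threshold=1000):
    """Return True when a query yields too many results for paged search.

    The threshold of 1000 is an assumption based on the step folder names.
    """
    return total >= threshold
```

As a usage example, `fetch_page("large language models")` would return a dictionary whose `total` entry can be passed to `needs_bulk_download` to decide between paging through results and the bulk download step.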
