
Evaluate AI

Unlocking AI Transparency: Empowering Trust Through Precision Evaluation.

By Team "Hackatlopi", Junction 2023, addressing the Outokumpu Sustainable AI challenge

The goals of our solution

We aim to build more advanced and sustainable AI experiences by achieving what no other tool sufficiently provides:

  1. Evaluations of the environmental impact of training and deploying LLMs*
  2. Evaluations of LLMs’ interpretability and explainability*
  3. Ways to check with AI if information generated by AI is correct or wrong

*features partially under development
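Goal 3 — using AI to check whether AI-generated information is correct — is often done with an "LLM-as-judge" prompt. Below is a minimal sketch of that idea; the `build_verifier_prompt` helper, the prompt wording, and the `gpt-4o-mini` model name are illustrative assumptions, not the project's actual implementation:

```python
def build_verifier_prompt(claim: str, context: str) -> str:
    """Compose an LLM-as-judge prompt asking a second model to
    verify a claim produced by the model under evaluation."""
    return (
        "You are a fact-checking assistant. Given the context below, "
        "answer only CORRECT or WRONG.\n\n"
        f"Context: {context}\n"
        f"Claim: {claim}\n"
        "Verdict:"
    )

# Sending the prompt to a judge model would look roughly like this
# (requires an OpenAI API key; the model name is an assumption):
#
#   from openai import OpenAI
#   client = OpenAI()
#   resp = client.chat.completions.create(
#       model="gpt-4o-mini",
#       messages=[{"role": "user",
#                  "content": build_verifier_prompt(claim, context)}],
#   )
#   verdict = resp.choices[0].message.content.strip()

prompt = build_verifier_prompt(
    "Water boils at 90 °C at sea level.",
    "At standard atmospheric pressure, water boils at 100 °C.",
)
```

Constraining the judge to a fixed vocabulary (CORRECT / WRONG) makes the verdict easy to parse and compare across models.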

How do we plan to achieve them?

Evaluate AI is a comprehensive solution designed to assess the reliability, interpretability, and resource utilization of any Large Language Model (LLM) tool currently in use. It aims to provide a thorough evaluation, ensuring that the LLM's trustworthiness is upheld, its interpretability is clear, and it uses resources optimally in a production environment, thereby supporting long-term planning.

The tool helps to test the trustworthiness and sustainability of an LLM model based on the following criteria:

  • Explainability
  • Reproducibility
  • Fairness
  • Factuality and precision*
  • CPU use / computer resources usage*
  • Query response time

*features partially under development

While building the prototype, we drew inspiration from several existing resources.
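Two of the criteria above — query response time and reproducibility — can be checked with a small timing harness. A minimal sketch, assuming the model is exposed as a plain callable; `toy_model` is a stand-in for a real LLM call:

```python
import time

def evaluate_response_time_and_reproducibility(model_fn, prompt, runs=3):
    """Time repeated calls to a model and check whether the
    output is stable across identical prompts."""
    outputs, timings = [], []
    for _ in range(runs):
        start = time.perf_counter()
        outputs.append(model_fn(prompt))
        timings.append(time.perf_counter() - start)
    return {
        "avg_response_time_s": sum(timings) / len(timings),
        "reproducible": len(set(outputs)) == 1,
    }

# Stand-in for a real LLM call (hypothetical, deterministic):
def toy_model(prompt):
    return prompt.upper()

report = evaluate_response_time_and_reproducibility(toy_model, "hello")
```

For a real LLM, reproducibility would also need to account for sampling temperature, since non-zero temperature makes differing outputs expected rather than a defect.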

Our prototype

Evaluate AI prototype

How did we build it?

Our team was pleased to have a wide range of diverse specialists, from full-stack development and AI/ML to project management and business. We used collaboration tools successfully and streamlined our teamwork.

The tech stack we used consists of:

  • Python - our main programming language
  • Llama Index - for deeper LLM understanding and insights
  • OpenAI tools - to power the intelligence and decision making
  • Docker - for scalable, containerized deployment
  • Vue (with Tailwind) - for a clean, appealing UI

Our future roadmap

  1. Develop a feature that generates suggestions on how to improve the LLM models tested with our tool.
  2. Improve the UI and front end of the tool so that it is easily accessible and usable by larger audiences.
  3. Add and improve the feature that analyzes the physical metrics of LLM models, specifically GPU and CPU consumption.
  4. Test the existing tool with at least 20 LLM models to gauge its efficiency, and make improvements based on the conclusions from testing.
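Roadmap item 3 could begin with Python's standard library before adding GPU-specific tooling (GPU metrics need external tools such as `nvidia-smi` and are out of scope here). A minimal sketch; the function name and the stand-in workload are illustrative:

```python
import time
import tracemalloc

def measure_resource_usage(fn, *args):
    """Measure CPU time and peak Python-heap memory of one call."""
    tracemalloc.start()
    cpu_start = time.process_time()
    result = fn(*args)
    cpu_seconds = time.process_time() - cpu_start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, {
        "cpu_seconds": cpu_seconds,
        "peak_mem_kib": peak_bytes / 1024,
    }

# Example with a stand-in workload instead of a real model call:
_, stats = measure_resource_usage(lambda n: sum(i * i for i in range(n)),
                                  100_000)
```

Note that `tracemalloc` only sees Python-level allocations; memory used inside native extensions (e.g. a model's tensors) would require a process-level tool such as `psutil`.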

Additional resources

Some more cool resources about our project:

  1. Video demo of our prototype

Thank you

Evaluate AI prototype
