Background

Training models is resource intensive and economically out of reach for most research groups. Yet these groups are the ones who hold most of the domain knowledge. As a result, those who need the models are often those with too few computational resources to produce advanced models trained on vast amounts of data.

Letting researchers construct tests that assign scores to models allows them to see which existing foundation models are the most suitable. It also gives ML companies an overview of the tests where models underperform, which can motivate them to improve their models in these areas.

The benchmarks can be found here: https://ml-peg.stfc.ac.uk/

Introduction

The Machine Learning Performance and Extrapolation Guide (ML-PEG) summarises different foundation models along with their scores in certain areas. Examples of areas at the time of writing are 'molecular reactions', 'electric field', and 'thermochemistry'.

Example: The tests are defined under each area, and each test has its own table and its own documentation. In one test under 'molecular crystals', the lattice energy of 13 different ice polymorphs is measured, and the mean absolute error (MAE) across this data set is reported for a number of foundation models. All such metrics are then combined into an overall score. There are more tests within the 'molecular crystals' area. Each foundation model is assigned a score for each of these tests, and these scores are combined to give the model a 'molecular crystals' score. The foundation models are then ranked by their scores in each area.
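
As a minimal sketch of the metric described above, the MAE for one model on one test can be computed as shown below. The energies are made-up placeholder values, not ML-PEG data, and the test names are hypothetical. This section does not specify how ML-PEG combines test scores into an area score, so the plain average used here is only an assumption for illustration.

```python
from statistics import mean


def mean_absolute_error(predictions, references):
    """Return the mean absolute error between two equal-length sequences."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        raise ValueError("at least one data point is required")
    return mean(abs(p - r) for p, r in zip(predictions, references))


# Placeholder lattice energies for three structures (made-up values, arbitrary units).
reference_energies = [-60.0, -58.5, -59.0]
predicted_energies = [-59.0, -59.0, -58.0]

mae = mean_absolute_error(predicted_energies, reference_energies)
print(f"MAE for this test: {mae:.3f}")

# Hypothetical per-test scores for one model within a single area.
# ASSUMPTION: a plain average stands in for the real combination rule.
test_scores = {"hypothetical_test_a": 0.8, "hypothetical_test_b": 0.6}
area_score = mean(test_scores.values())
print(f"Area score (assumed plain average): {area_score:.2f}")
```

A lower MAE means the model's predictions lie closer to the reference values for that test.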

The scores are presented in a table in which each entry is interactive. In general, a metric such as the MAE is linked to a correlation plot. In the example above, pressing an entry (an MAE value) brings up a correlation plot with labelled entries. Hovering over a data point reveals the prediction, the reference value and the polymorph. Pressing the data point brings up a 3D model of the polymorph, which is also interactive.

Installation

python3 -m pip install ml-peg

Methodology

In summary, the interactive test scores require an app. To have results to display, there must first be some analysis, which in turn requires calculations.

Thus, the order of operations is calculation > analysis > app. Here we go through the steps needed to add a new test and how to run each step.

Start by creating a new branch with a suitable name, clone it and adjust it according to the instructions given below.

ml_peg/calcs

This section controls the model evaluation. The data is downloaded from an external source, which means it first needs to be uploaded somewhere. Ideally the data should not be sensitive in any way, so only make available data from existing publications or data that anyone could have generated. The ml-peg suite currently offers functionality to download automatically from a GitHub source, and this is what should be used for now.

Setting up calculations

Alter the file ml_peg/calcs/TEMPLATE_AREA/TEMPLATE_TEST/calc_TEMPLATE_TEST.py according to your needs. Only change the parts marked by comments. The other parts should only be touched if absolutely needed.

Running calculations

Calculations are run using the following terminal commands:

ml_peg calc --test TEST_NAME
ml_peg calc --test TEST_NAME --models name_of_model

The second command evaluates only the specified models, whose tags can be found in ~/ml_peg/models/models.yml.
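
The two invocations above can also be assembled from a script, which may be convenient when looping over several tests. The sketch below is an illustration only: `build_calc_command` is a hypothetical helper, not part of ml-peg, and it uses only the `--test` and `--models` flags shown above.

```python
import shlex


def build_calc_command(test_name, models=None):
    """Build the argument list for an `ml_peg calc` invocation.

    Hypothetical helper for illustration; it is not provided by ml-peg.
    """
    command = ["ml_peg", "calc", "--test", test_name]
    if models is not None:
        # Restrict the evaluation to the given model tag.
        command += ["--models", models]
    return command


# TEST_NAME and name_of_model are placeholders, as in the commands above.
all_models = build_calc_command("TEST_NAME")
one_model = build_calc_command("TEST_NAME", models="name_of_model")

# Print the commands as they would be typed in a terminal.
print(shlex.join(all_models))
print(shlex.join(one_model))
```

To execute a command rather than print it, the list can be passed to `subprocess.run(command, check=True)` in an environment where ml-peg is installed.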

ml_peg/analysis

Alter the file ml_peg/analysis/TEMPLATE_AREA/TEMPLATE_TEST/analyse_TEMPLATE_TEST.py according to your needs. Only change the parts marked by comments. The other parts should only be touched if absolutely needed.

Running analyses

Analyses are run using the following terminal command, shown here for a test named scaling_pol:

ml_peg analyse --test scaling_pol
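
Because the analysis depends on the calculation results, the two steps can be chained so that the analysis is skipped if the calculation fails. The following is a sketch under stated assumptions: `run_calc_then_analyse` is a hypothetical wrapper, not part of ml-peg, and it uses only the `calc` and `analyse` commands shown in this document. The app step is left out.

```python
import subprocess


def run_calc_then_analyse(test_name, runner=subprocess.run):
    """Run the calculation step, then the analysis step, for one test.

    Hypothetical wrapper for illustration; it is not provided by ml-peg.
    `runner` is injectable so the sequence can be dry-run without ml-peg installed.
    """
    steps = [
        ["ml_peg", "calc", "--test", test_name],
        ["ml_peg", "analyse", "--test", test_name],
    ]
    for command in steps:
        # With check=True, subprocess.run raises CalledProcessError on a non-zero
        # exit code, so the analysis is not attempted if the calculation failed.
        runner(command, check=True)
    return steps


# Dry run: record the commands instead of executing them.
recorded = []
run_calc_then_analyse("scaling_pol", runner=lambda cmd, check: recorded.append(cmd))
for command in recorded:
    print(" ".join(command))
```

Calling `run_calc_then_analyse("scaling_pol")` without the `runner` argument would execute both commands for real.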

ml_peg/app

Alter the file ml_peg/app/TEMPLATE_AREA/TEMPLATE_TEST/analyse_TEMPLATE_TEST.py according to your needs. Only change the parts marked by comments. The other parts should only be touched if absolutely needed.

Running the app

The app is run using the following terminal command:

ml_peg ??
