Training models is resource intensive and economically out of reach for most research groups. Yet these research groups are the ones who hold most of the domain knowledge. The resulting problem is that those who need the models are those with limited computational resources for producing advanced models trained on vast amounts of data.
By letting researchers construct tests that give models different scores, researchers can see which existing foundation models are the most suitable. Furthermore, it gives the ML companies an overview of the tests where models underperform, which can motivate them to improve the models in these areas.
The benchmarks can be found here: https://ml-peg.stfc.ac.uk/
The Machine Learning Performance and Extrapolation Guide (ML-PEG) gives a summary of different foundation models along with their scores in certain areas. Examples of areas, at the time of writing, are 'molecular reactions', 'electric field', and 'thermochemistry'.
Example: The tests are defined under each area, and each test has its own table with its own documentation. In one test under 'molecular crystals', the lattice energy of 13 different ice polymorphs is measured, and the mean absolute error (MAE) across this data set is reported for a number of foundation models. All such metrics are then combined into an overall score. There are more tests within the 'molecular crystals' area; each foundation model is assigned a score for each of these tests, and these scores are combined to give the model a 'molecular crystals' score. The foundation models are then ranked by their scores in each area.
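As a rough illustration of the MAE metric mentioned above, the following Python sketch computes the mean absolute error between predicted and reference values. The function name and the numbers are made-up placeholders for illustration only. They are not real lattice energies and not part of the ML-PEG code base, and how ML-PEG combines metrics into scores is not shown here.

```python
def mean_absolute_error(predicted, reference):
    """Return the mean absolute error between two equal-length sequences."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if len(predicted) == 0:
        raise ValueError("at least one data point is required")
    total = sum(abs(p - r) for p, r in zip(predicted, reference))
    return total / len(predicted)


# Placeholder values (assumption): three fictitious structures, arbitrary units.
reference = [-60.0, -59.0, -58.0]
predicted = [-61.0, -59.5, -56.5]

mae = mean_absolute_error(predicted, reference)
print(f"MAE = {mae:.2f}")
```

A lower MAE means the model's predictions lie closer to the reference values across the data set.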
The scores are presented in a table in which each entry is interactive. In general, a metric such as the MAE is linked to a correlation plot. In the above example, clicking an entry (an MAE value) presents the user with a correlation plot with labelled entries. Hovering over a data point reveals the prediction, the reference value, and the polymorph. Clicking the data point presents the user with a 3D model of the polymorph, which is also interactive.
The package is installed using the following terminal command:
python3 -m pip install ml-peg
In summary, the interactive test scores require an app to make the results interactive. But in order to get results, there must be some analysis, which in turn requires calculations.
Thus, the order of operations is calculation > analysis > app. Here we go through the steps needed to add a new test and how to run it.
Start by creating a new branch with a suitable name, clone it, and adjust it according to the instructions given below.
The calculation step controls the model evaluation. Here, data is downloaded from some source, which means it first needs to be uploaded somewhere. Ideally the data should not be sensitive in any way, so only make accessible data from existing publications or data that could have been generated by anyone. The ml-peg suite currently offers functionality to automatically download from a GitHub source. This is what should be used for now.
Alter the file ml_peg/calcs/TEMPLATE_AREA/TEMPLATE_TEST/calc_TEMPLATE_TEST.py according to your needs. Only change where the comments are. The other parts should only be touched if absolutely needed.
Calculations are run using the following terminal commands:
ml_peg calc --test TEST_NAME
ml_peg calc --test TEST_NAME --models name_of_model
where the last command only evaluates the specified models, whose tags can be found in ~/ml_peg/models/models.yml.
Alter the file ml_peg/analysis/TEMPLATE_AREA/TEMPLATE_TEST/analyse_TEMPLATE_TEST.py according to your needs. Only change where the comments are. The other parts should only be touched if absolutely needed.
Analyses are run using the following terminal command:
ml_peg analyse --test scaling_pol
Alter the file ml_peg/app/TEMPLATE_AREA/TEMPLATE_TEST/analyse_TEMPLATE_TEST.py according to your needs. Only change where the comments are. The other parts should only be touched if absolutely needed.
The app is run using the following terminal command:
ml_peg ??
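Since the calculation must run before the analysis, the two commands given above can be chained from a small script. The following is a minimal sketch, assuming the `ml_peg` command is available on the PATH. The helper names `build_commands` and `run_pipeline` are hypothetical and not part of ml-peg. The app command is left out because it is not specified above.

```python
import subprocess


def build_commands(test_name, model=None):
    """Build the calc and analyse command lines, in the order they must run."""
    calc = ["ml_peg", "calc", "--test", test_name]
    if model is not None:
        # Restrict the calculation to one model tag, as with --models above.
        calc += ["--models", model]
    analyse = ["ml_peg", "analyse", "--test", test_name]
    return [calc, analyse]


def run_pipeline(test_name, model=None):
    """Run calculation, then analysis; stop at the first failing command."""
    for command in build_commands(test_name, model):
        subprocess.run(command, check=True)


# Show the commands without executing them.
for command in build_commands("TEST_NAME", model="name_of_model"):
    print(" ".join(command))
```

Calling `run_pipeline("TEST_NAME")` would execute both steps in order. With `check=True`, a failing calculation raises an exception, so the analysis is not run on missing results.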