Published November 11, 2020 | Version v1
Dataset | Open access

Au38Q MBTR-K3

Description

Purpose
Au38Q MBTR-K3 is intended for testing how well a machine learning regression model scales as the number of observations and the number of features vary.


Background
Au38Q MBTR-K3 was created from a trajectory file of the density functional theory (DFT) simulation of the Au38Q hybrid nanoparticle performed by Juarez-Mosqueda et al. in their paper "Ab initio molecular dynamics studies of Au38(SR)24 isomers under heating". The structures were described with the many-body tensor representation (MBTR) descriptor of Himanen et al., as presented in the paper "DScribe: Library of descriptors for machine learning in materials science". The MBTR was used with the default parameters for K=3 (angles between atoms) given on the website of DScribe version 0.4.0. The dataset was first used to probe the properties of the Minimal Learning Machine in the paper "Do Randomized Algorithms Improve the Efficiency of Minimal Learning Machine?" by Linja et al.


Description
The dataset comes in nine variants built in the same way. In each, an observation consists of the MBTR description of the angles between atoms in the Au38Q hybrid nanoparticle at a single timestep of the DFT simulation, together with the potential energy of the nanoparticle at that timestep. The MBTR description forms the input space and the potential energy forms the output space. "Features" therefore refers to the components of the MBTR output, which serve as the inputs.

The variants combine three numbers of observations with three descriptor accuracies. For the number of observations, we used RS-maximin to order the available observations so that the most dissimilar ones come first, and took the first 4000 and the first 8000 of them as the 4k and 8k variants. For the number of features, we used descriptor accuracy values of 2, 10 and 100, which produced descriptors of length 80, 400 and 4000, respectively. The number of features therefore reflects the resolution of the data description: the lower-dimensional variants were computed directly with the coarser settings, not obtained by downsampling the 4000-feature descriptor.
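The ordering step can be illustrated with a generic greedy maximin (farthest-point) selection. This is a minimal sketch, not the RS-maximin implementation used for the dataset: the function name `greedy_maximin_order` and its `start` parameter are hypothetical, and the exact RS-maximin procedure is described by Linja et al. Because the observations are added one at a time, any prefix of the ordering is itself a maximin selection, so the first 4000 observations are contained in the first 8000.

```python
import numpy as np


def greedy_maximin_order(X, n_select: int, start: int = 0) -> np.ndarray:
    """Order the rows of a 2-D array X (n_observations, n_features) greedily.

    Each step adds the row whose Euclidean distance to its nearest
    already-selected row is largest (farthest-point / maximin selection).
    Hypothetical helper for illustration; not the RS-maximin code itself.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 1 <= n_select <= n:
        raise ValueError("n_select must be between 1 and len(X)")
    selected = [start]
    # Distance from every row to its nearest selected row so far.
    min_dist = np.linalg.norm(X - X[start], axis=1)
    min_dist[start] = -np.inf  # never pick an already-selected row again
    for _ in range(n_select - 1):
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        # Update nearest-selected distances with the newly added row.
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
        min_dist[nxt] = -np.inf
    return np.array(selected)
```

For example, with a randomly chosen starting row (used here purely for illustration), `order = greedy_maximin_order(X, 8000, start=int(np.random.default_rng(0).integers(len(X))))` would give nested subsets `X[order[:4000]]` and `X[order[:8000]]`. Each step makes one pass over all observations, so the cost grows as n_select × n × n_features.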
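The stated feature counts are consistent with a simple product, under two assumptions that the dataset description does not state: that the nanoparticle contains four chemical elements (for example Au, S, C and H, as in Au38(SCH3)24), and that the "descriptor accuracy" is the number of grid points per angle distribution. A k=3 MBTR keeps one distribution per element triplet (A, B, C) with B as the central atom, and (A, B, C) is equivalent to (C, B, A), which gives 4 · (4 · 5 / 2) = 40 triplets. The sketch below, with hypothetical helper names, multiplies that count by the grid size.

```python
def k3_triplet_count(n_species: int) -> int:
    """Number of element triplets (A, B, C) in a k=3 MBTR, B being the central atom.

    (A, B, C) and (C, B, A) describe the same angle, so the outer pair is
    unordered: n_species * n_species * (n_species + 1) / 2 triplets.
    """
    return n_species * n_species * (n_species + 1) // 2


def mbtr_k3_length(n_species: int, n_grid: int) -> int:
    """Flattened descriptor length: one n_grid-point distribution per triplet.

    Assumption: the dataset's 'descriptor accuracy' equals n_grid.
    """
    return k3_triplet_count(n_species) * n_grid


if __name__ == "__main__":
    # Assumption: four elements, e.g. Au, S, C and H.
    for accuracy in (2, 10, 100):
        print(accuracy, mbtr_k3_length(4, accuracy))
    # 2 80
    # 10 400
    # 100 4000
```

The match with the listed lengths of 80, 400 and 4000 supports these assumptions but does not confirm them.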

Further details are presented in the paper "Do Randomized Algorithms Improve the Efficiency of Minimal Learning Machine?" by Linja et al.

Files

Au38Q_MBTR-K3-dat.zip

Files (1.3 GB)

  • md5:e8a06590b032de9714da648190dfc3c9 (896.4 MB)
  • md5:df993295c1bb8f85c89c2085c695eb64 (437.1 MB)
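The MD5 checksums can be used to check a download for corruption. Below is a minimal sketch using Python's standard `hashlib` module; the helper name `md5sum` and the script name `check_md5.py` are hypothetical. The listing does not say which checksum belongs to which file, so compare against the checksum whose size matches the downloaded file.

```python
import hashlib
import sys
from pathlib import Path


def md5sum(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as f:
        # Read in chunks so multi-GB archives need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_md5.py Au38Q_MBTR-K3-dat.zip <expected hex digest>
    path, expected = sys.argv[1], sys.argv[2].lower()
    actual = md5sum(path)
    print("OK" if actual == expected else f"MISMATCH: got {actual}")
```

Pass the expected digest without the `md5:` prefix shown in the listing.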

Additional details

Related works

Is supplement to
Journal article: 10.3390/make2040029 (DOI)

Funding

Structure prediction of hybrid nanoparticles via artificial intelligence (HNP-AI) / Consortium: HNP-AI 315550
Academy of Finland

References

  • Juarez-Mosqueda, R.; Malola, S.; Häkkinen, H. Ab initio molecular dynamics studies of Au38(SR)24 isomers under heating. The European Physical Journal D 2019, 73. doi:10.1140/epjd/e2019-90441-5
  • Himanen, L.; Jäger, M.O.J.; Morooka, E.V.; Federici Canova, F.; Ranawat, Y.S.; Gao, D.Z.; Rinke, P.; Foster, A.S. DScribe: Library of descriptors for machine learning in materials science. Computer Physics Communications 2020, 247, 106949. doi:10.1016/j.cpc.2019.106949
  • Pihlajamäki, A.; Hämäläinen, J.; Linja, J.; Nieminen, P.; Malola, S.; Kärkkäinen, T.; Häkkinen, H. Monte Carlo Simulations of Au38(SCH3)24 Nanocluster Using Distance-Based Machine Learning Methods. The Journal of Physical Chemistry A 2020. doi:10.1021/acs.jpca.0c01512