# Application and comparison of regression methods for force constant extraction using hiphive

Erik Fransson, **Fredrik Eriksson**, and Paul Erhart

Chalmers University of Technology, Sweden

The extraction of harmonic and anharmonic force constants from ab-initio calculations is often the bottleneck when calculating thermal properties.
The commonly used direct approach is robust but suffers from poor scaling with respect to system symmetry, size, and force constant expansion order.
During the last decade, regularized regression-based approaches have proved viable for efficient and accurate extraction of anharmonic force constants.
Even though computational effort is saved at the ab-initio level, the regression itself can still be a formidable task.
To date, no consensus exists on which regularization method to use, nor have different regression methods been comprehensively compared.
In order to use regression methods in, e.g., high-throughput computations, robust methods for sampling and fitting must be developed.
Thus, a flexible framework that can easily interface with other codes is needed in order to study and benchmark different methods.
Here, we present the *hiphive* package, which is entirely written in Python for easy accessibility and interfaces well with libraries such as scikit-learn, which provides a rich set of methods for linear regression and validation.
The core goal of *hiphive* is the extraction of force constants, while the analysis (e.g., phonon dispersions or thermal conductivity) is left to other, more specialized packages.
We show that ordinary least-squares, especially in connection with feature elimination, often yields the best performance in terms of convergence with respect to training set size and sparsity of the solution.
The automatic relevance determination regression (ARDR) method also shows promising performance.
Regression based on the least absolute shrinkage and selection operator (LASSO), on the other hand, while useful in some cases, tends to yield a larger number of features, with a noise level that has a detrimental effect on the prediction of, e.g., the thermal conductivity.
Finally, we also consider methods for predicting the temperature dependence of vibrational spectra from high-order force constant expansions via molecular dynamics simulations as well as self-consistent phonons.
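
To make the regression comparison concrete, the following is a minimal sketch of how ordinary least-squares, least-squares with recursive feature elimination, LASSO, and ARDR can be run side by side with scikit-learn. It does not use *hiphive* and is not the benchmark reported here: the sensing matrix `A`, the sparse parameter vector `x_true`, the noise level, the LASSO `alpha`, and the helper `fit_all` are all hypothetical choices for a synthetic linear problem that only mimics the structure of a force constant fit (forces as a linear function of a sparse set of parameters).

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import ARDRegression, Lasso, LinearRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for a force constant fit (assumed sizes and noise level):
# rows of A play the role of force components, columns the free parameters.
n_samples, n_params, n_nonzero = 200, 50, 8
A = rng.normal(size=(n_samples, n_params))
x_true = np.zeros(n_params)
x_true[:n_nonzero] = rng.uniform(1.0, 2.0, size=n_nonzero)
f = A @ x_true + 0.01 * rng.normal(size=n_samples)


def fit_all(A, f, n_features):
    """Fit four linear models and return their full-length parameter vectors."""
    models = {
        "OLS": LinearRegression(fit_intercept=False),
        "RFE+OLS": RFE(LinearRegression(fit_intercept=False),
                       n_features_to_select=n_features),
        "LASSO": Lasso(alpha=0.01, fit_intercept=False),
        "ARDR": ARDRegression(fit_intercept=False),
    }
    results = {}
    for name, model in models.items():
        model.fit(A, f)
        if name == "RFE+OLS":
            # RFE refits on the selected columns only; pad the rest with zeros.
            coef = np.zeros(A.shape[1])
            coef[model.support_] = model.estimator_.coef_
        else:
            coef = model.coef_
        results[name] = coef
    return results


results = fit_all(A, f, n_nonzero)

# Report sparsity and parameter error for each method.
for name, coef in results.items():
    n_used = int(np.count_nonzero(np.abs(coef) > 1e-6))
    rmse = float(np.sqrt(np.mean((coef - x_true) ** 2)))
    print(f"{name:8s} nonzero parameters: {n_used:3d}  parameter RMSE: {rmse:.4f}")
```

In a real application the number of retained features and the regularization strength would be chosen by cross-validation rather than fixed by hand, and a toy problem of this kind should not be expected to reproduce the relative performance of the methods described above.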

- E. Fransson, F. Eriksson, and P. Erhart, arXiv:1902.01271 [cond-mat.mtrl-sci] (2019)
- hiPhive — High-order force constants for the masses. https://hiphive.materialsmodeling.org