Combining the power of large language models with finetuning based on strategically collected human ratings: A case study about age-of-acquisition estimates of Spanish words

Eneko Sendín, Javier Conde, Pedro Reviriego, Juan Haro, Pilar Ferré, José A. Hinojosa, Marc Brysbaert

Psicológica (September 10, 2025), 46(2), e17563


Abstract

This study examined the ability of a large language model, GPT-4o mini, to predict age of acquisition (AoA) for Spanish words, compared with human ratings. We found a strong correlation (ρ = .75) between the model's AoA estimates and mean human ratings. This correlation was lower than the level of agreement observed among individual human raters (ρ = .85), but we found that finetuning the model on a relatively small dataset of 2,000 human AoA ratings can potentially raise its performance to a level comparable to human consensus. Consistent with theoretical expectations, our analyses confirmed that AoA estimates are meaningful only for words within an individual's vocabulary. Finally, we present a novel dataset of AoA estimates for 28,453 Spanish words likely known by adult speakers.
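The headline comparison above is a rank correlation (Spearman's ρ) between model AoA estimates and mean human ratings. The sketch below shows how such a correlation can be computed, assuming both sets of values are already aligned word by word. The function names, the five Spanish words, and their values are hypothetical illustrations, not data or code from the study; in practice `scipy.stats.spearmanr` computes the same statistic.

```python
"""Minimal sketch: Spearman's rho between model AoA estimates and mean human ratings.

All words and values below are hypothetical placeholders, not data from the study.
"""
from statistics import mean
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical mean human AoA ratings (in years) and model estimates.
    human = {"agua": 3.1, "perro": 3.5, "libro": 5.0, "ventana": 5.8, "democracia": 11.2}
    model = {"agua": 3.0, "perro": 2.5, "libro": 4.5, "ventana": 6.5, "democracia": 10.0}
    words = list(human)
    rho = spearman_rho([human[w] for w in words], [model[w] for w in words])
    print(f"rho = {rho:.2f}")  # prints "rho = 0.90"
```

Because only ranks enter the statistic, a model that systematically over- or underestimates ages but orders words the same way as humans still obtains a high ρ, which is why a rank correlation suits comparisons of AoA estimates on different scales.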


APA citation

Sendín, E., Conde, J., Reviriego, P., Haro, J., Ferré, P., Hinojosa, J. A., & Brysbaert, M. (2025). Combining the power of large language models with finetuning based on strategically collected human ratings: A case study about age-of-acquisition estimates of Spanish words. Psicológica, 46(2), e17563. https://doi.org/10.20350/digitalCSIC/17563
