SUBTLEX-CAT: Subtitle word frequencies and contextual diversity for Catalan

Roger Boada, Marc Guasch, Juan Haro, Josep Demestre, Pilar Ferré

Behavior Research Methods (February 2020), 52(1), 360-375

Abstract

SUBTLEX-CAT is a word frequency and contextual diversity database for Catalan, obtained from a 278-million-word corpus based on subtitles supplied from broadcast Catalan television. Like all previous SUBTLEX corpora, it comprises subtitles from films and TV series. In addition, it includes a wider range of TV shows (e.g., news, documentaries, debates, and talk shows) than has been included in most previous databases. Frequency metrics were obtained for the whole corpus, on the one hand, and only for films and fiction TV series, on the other. Two lexical decision experiments revealed that the subtitle-based metrics outperformed the previously available frequency estimates, computed from either written texts or texts from the Internet. Furthermore, the metrics obtained from the whole corpus were better predictors than the ones obtained from films and fiction TV series alone. In both experiments, the best predictor of response times and accuracy was contextual diversity.
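The two kinds of metric named in the abstract can be illustrated with a minimal sketch. It assumes the common definitions (frequency as occurrences per million tokens, contextual diversity as the percentage of documents in which a word appears); the function name `subtitle_metrics`, the simple tokenizer, and the toy corpus are hypothetical and are not taken from the SUBTLEX-CAT pipeline.

```python
import re
from collections import Counter


def subtitle_metrics(documents):
    """Return {word: (frequency per million tokens, contextual diversity in %)}.

    Assumption: each string in `documents` is the full subtitle text of one
    film or TV episode. The tokenizer below is a simplification that keeps
    runs of letters only and ignores Catalan-specific cases such as
    apostrophes and the middle dot in "l·l".
    """
    token_counts = Counter()  # total occurrences of each word
    doc_counts = Counter()    # number of documents containing each word
    for text in documents:
        tokens = re.findall(r"[^\W\d_]+", text.lower())
        token_counts.update(tokens)
        doc_counts.update(set(tokens))  # count each word once per document
    total_tokens = sum(token_counts.values())
    n_docs = len(documents)
    return {
        word: (count * 1_000_000 / total_tokens,
               100 * doc_counts[word] / n_docs)
        for word, count in token_counts.items()
    }


# Hypothetical toy corpus: three very short "documents".
toy_corpus = [
    "el gat dorm",
    "el gos corre i el gat mira",
    "la casa és gran",
]
metrics = subtitle_metrics(toy_corpus)
```

In this toy corpus "el" occurs three times and "gat" twice, yet both appear in two of the three documents, so they differ in frequency while sharing the same contextual diversity. That is the distinction the two metrics are meant to capture.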

APA citation

Boada, R., Guasch, M., Haro, J., Demestre, J., & Ferré, P. (2020). SUBTLEX-CAT: Subtitle word frequencies and contextual diversity for Catalan. Behavior Research Methods, 52(1), 360-375. https://doi.org/10.3758/s13428-019-01233-1