
Science Editor and Publisher


Author keywords and editorial terms in the abstract database: a statistical analysis of differences

Abstract

Author keywords, unlike terms assigned by professional indexers, are not governed by normative documents or controlled vocabularies. The aim of this study is to identify statistical differences between two sets of keywords (KWs): those assigned by authors and those assigned by the editors of the VINITI RAS abstract database. Confirming and understanding these differences may support a more rational use of keywords obtained from different sources. This study is the first to compare quantitative indicators of the novelty and lexical diversity of author and editorial KWs, and the first to compare the inclusion of author and editorial KWs in other metadata elements across several independent thematic samples. The methodological basis of the study is generalization: the identification and quantitative analysis of common features inherent in the data arrays under study. The empirical base consisted of five independent statistical samples ranging in size from 10,400 to 18,970 articles. The topics of the samples corresponded to five headings of the State Rubricator of Scientific and Technical Information: 52. Mining; 53. Metallurgy; 55. Mechanical Engineering; 61. Chemical Technology. Chemical Industry; 73. Transport. We selected Russian-language articles uploaded to the VINITI abstract database in 2021–2024 that contained all of the following non-empty metadata elements: title, author keywords, author abstract, editorial keywords, and an abstract specially prepared for the VINITI abstract database. For each sample, point estimates of the identified common features (lexical diversity, novelty, and inclusion of keywords in other metadata elements, namely the title and the abstract) were obtained separately for author and editorial KWs.
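The abstract does not state the formulas behind these point estimates. As a minimal sketch, assuming lexical diversity is measured as a type-token ratio (distinct keywords divided by all keyword occurrences in a sample) and inclusion as the share of keyword occurrences found as a case-insensitive substring of the title, the two figures could be computed as below. The `Record` structure, the function names, and the toy data are hypothetical, and the lemmatization that Russian-language metadata would normally require is deliberately omitted.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One article's metadata (hypothetical structure for illustration)."""
    title: str
    keywords: list[str]


def normalize(term: str) -> str:
    # Lowercase and collapse whitespace; no lemmatization (assumption).
    return " ".join(term.lower().split())


def type_token_ratio(records: list[Record]) -> float:
    """Lexical diversity as distinct keywords / keyword occurrences (assumed TTR)."""
    tokens = [normalize(kw) for r in records for kw in r.keywords]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def title_inclusion(records: list[Record]) -> float:
    """Share of keyword occurrences found as a substring of the article title."""
    hits = total = 0
    for r in records:
        title = normalize(r.title)
        for kw in r.keywords:
            total += 1
            if normalize(kw) in title:
                hits += 1
    return hits / total if total else 0.0


sample = [
    Record("Wear of drill bits in hard rock", ["drill bits", "wear", "tungsten carbide"]),
    Record("Flotation of copper ores", ["flotation", "copper ores", "wear"]),
]
print(type_token_ratio(sample))  # 5 distinct terms out of 6 occurrences
print(title_inclusion(sample))   # 4 of 6 occurrences appear in their titles
```

Running both functions once on the author keyword lists and once on the editorial lists of the same sample yields the kind of paired comparison the abstract reports; passing the abstract text instead of the title gives inclusion in the abstract. Note that a plain type-token ratio falls as the number of tokens grows, so length-corrected measures such as MTLD or HD-D may be preferable when the compared keyword sets differ in size.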
Similar statistical differences between author and editorial KWs were observed in all five thematic collections: author KWs show higher lexical diversity than editorial terms; the novelty coefficient of author KWs is higher than that of editorial terms; the novelty coefficient of author abstracts is higher than that of the abstracts prepared for the VINITI database; and author KWs are included in article titles to a lesser degree than editorial terms. The replication of these differences across five independent thematic samples, covering randomly selected fields of knowledge, suggests that they are statistically stable. The vocabulary of author KWs changes more over time than the more stable vocabulary of editorial terms, which may be useful for rapidly identifying new terminology and scientific frontiers. Unlike editorial KWs, author KWs cannot by themselves express the main themes and concepts of a document, because they supplement the terms that can already be extracted from publication titles.
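The novelty coefficient is not defined in the abstract either. One plausible reading, used here purely as an assumption, is the share of a year's distinct keywords that did not occur in any earlier year of the sample; a vocabulary that changes more over time then shows higher values in later years. The sketch below computes that year-by-year share; the function name `novelty_by_year` and the toy data are hypothetical, and the same function could be applied to the word tokens of abstracts.

```python
def novelty_by_year(keywords_by_year: dict[int, list[str]]) -> dict[int, float]:
    """Share of each year's distinct keywords unseen in all earlier years (assumed definition)."""
    seen: set[str] = set()
    result: dict[int, float] = {}
    for year in sorted(keywords_by_year):
        current = {kw.strip().lower() for kw in keywords_by_year[year]}
        new = current - seen
        # The first year has no history, so every term in it counts as new.
        result[year] = len(new) / len(current) if current else 0.0
        seen |= current
    return result


author_kws = {
    2021: ["flotation", "copper ores"],
    2022: ["flotation", "digital twin"],
    2023: ["digital twin", "machine learning", "flotation"],
}
print(novelty_by_year(author_kws))
# {2021: 1.0, 2022: 0.5, 2023: 0.3333333333333333}
```

Because the first year is trivially fully novel, a comparison between author and editorial vocabularies would more sensibly rely on the later years only.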

About the Authors

Oleg V. Fedorets
All-Russian Institute for Scientific and Technical Information, Russian Academy of Sciences (VINITI RAS), Moscow, Russian Federation

Cand. Sci. (Eng.), Head of the Automation Tools Laboratory



Nataliya S. Soloshenko
All-Russian Institute for Scientific and Technical Information, Russian Academy of Sciences (VINITI RAS), Moscow, Russian Federation

Cand. Sci. (Educ.), Head of the Acquisitions Department






For citations:


Fedorets O.V., Soloshenko N.S. Author keywords and editorial terms in the abstract database: a statistical analysis of differences. Science Editor and Publisher. (In Russ.)



ISSN 2542-0267 (Print)
ISSN 2541-8122 (Online)