A reviewer identification using machine learning methods

Denis Yu. Bolshakov

doi:10.24069/SEP-25-35


Abstract

This article addresses the task of automatically assigning peer reviewers based on historical data on previously submitted and reviewed manuscripts. In conventional editorial practice, reviewer selection relies heavily on the subjective judgment of editors, which can cause delays and uneven quality of expert evaluation. The purpose of this study is to show that simple natural language processing (NLP) models can automate this process efficiently and transparently. The dataset consists of both published and rejected articles submitted to the Journal of Almaz-Antey Air and Space Defence Corporation, enriched with information about the reviewers assigned to each manuscript. Methodologically, the approach relies on basic text preprocessing (lemmatization and removal of stop words and punctuation), followed by vectorization with bag-of-words (BoW) and term frequency-inverse document frequency (TF-IDF) models. Text similarity is computed as the cosine distance between the vectorized representations. The core assumption is that a newly submitted manuscript can be assigned to the reviewers of the most similar previously reviewed manuscript. The results indicate that simple frequency-based models (BoW, TF-IDF) achieve higher reviewer-assignment accuracy (up to 99%) than neural network approaches such as Doc2Vec, especially when enhanced with a reviewer co-review graph. The proposed method is interpretable, requires minimal computational resources, and runs on ordinary office computers. The model performs reliably under class imbalance and is applicable even to relatively small datasets, starting from about 30 manuscripts.
However, generalizing it to multi-journal editorial systems would require local adaptation, and predicting publication outcomes calls for substantially larger corpora and deep learning architectures. The approach can be integrated into digital editorial platforms, supporting faster decision-making, greater transparency in peer review, and a lighter workload for journal staff.
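The matching step described in the abstract (preprocess the text, vectorize it with BoW or TF-IDF, compare by cosine similarity, take the reviewers of the closest archived manuscript) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the stop-word list, the function names (`preprocess`, `bow`, `tfidf`, `cosine`, `suggest_reviewers`), the smoothed IDF formula and the toy archive are all hypothetical; lemmatization is omitted (Russian texts would need a morphological analyzer); and the reviewer co-review graph is not modeled.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (English) for illustration;
# the journal's corpus is Russian and would need a full Russian list.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def preprocess(text):
    """Lowercase, keep letter runs only (drops punctuation and digits), remove stop words.

    Lemmatization is skipped in this sketch; a real pipeline would map each
    token to its dictionary form here.
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def bow(tokens):
    """Bag-of-words vector: raw term counts."""
    return Counter(tokens)


def tfidf(corpus):
    """TF-IDF vectors (as dicts) for a list of token lists, with a smoothed IDF."""
    n = len(corpus)
    df = Counter(term for doc in corpus for term in set(doc))
    idf = {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}
    vectors = []
    for doc in corpus:
        counts = Counter(doc)
        total = sum(counts.values()) or 1  # guard against empty documents
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; cosine distance is 1 minus this."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def suggest_reviewers(new_text, archive, use_tfidf=True):
    """Return (reviewers, similarity) of the archived manuscript closest to new_text.

    archive: non-empty list of (text, reviewers) pairs for reviewed manuscripts.
    For simplicity the IDF statistics here include the new manuscript itself.
    """
    docs = [preprocess(text) for text, _ in archive] + [preprocess(new_text)]
    vectors = tfidf(docs) if use_tfidf else [bow(d) for d in docs]
    query, history = vectors[-1], vectors[:-1]
    scores = [cosine(query, v) for v in history]
    best = max(range(len(scores)), key=scores.__getitem__)
    return archive[best][1], scores[best]


# Toy usage with a hypothetical two-manuscript archive.
archive = [
    ("Radar signal processing for air defence systems", ["Reviewer A"]),
    ("Peer review workflow in scientific journals", ["Reviewer B", "Reviewer C"]),
]
reviewers, score = suggest_reviewers("Adaptive radar signal filtering", archive)
print(reviewers)  # → ['Reviewer A']
```

Picking the highest cosine similarity is equivalent to picking the smallest cosine distance. Passing `use_tfidf=False` switches to plain bag-of-words counts, the other frequency model compared in the study.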

Keywords

computational linguistics, cosine distance, bag-of-words model, TF-IDF model, machine learning, lemmatization, regular expressions, natural language processing

About the Author

Denis Yu. Bolshakov

http://www.almaz-antey.ru/zhurnal-vestnik-kontserna-pvo-almaz-antey/
Almaz-Antey Air and Space Defence Corporation

Cand. Sci. (Eng.), Head of the Department of Scientific and Technical Issues and Special Projects, Office of the Director General, Almaz-Antey Air and Space Defence Corporation, JSC; Deputy Editor-in-Chief of the Journal of “Almaz-Antey” Air and Space Defence Corporation


For citations:

Bolshakov D.Yu. A reviewer identification using machine learning methods. Science Editor and Publisher. 2025;10(1):32-49. (In Russ.) https://doi.org/10.24069/SEP-25-35

ISSN 2542-0267 (Print)
ISSN 2541-8122 (Online)
