DEVELOPING A METHODOLOGY FOR CLASSIFYING COMMITS IN GIT-REPOSITORIES USING MACHINE LEARNING

Dmitriy Kopeliovich; Mihail Kurguz

doi:10.30987/2658-6436-2026-2-27-33

Journal: AUTOMATION AND MODELING IN DESIGN AND MANAGEMENT, Volume 2026, No. 2, 2026

Rubrics: MATHEMATICAL AND COMPUTER MODELING

UDC 004.89

Dmitriy Kopeliovich ¹

Mihail Kurguz ²

Author and publication information

Authors:

1. Bryansk State Technical University (Engineering Center, Director)

Bryansk, Bryansk, Russian Federation

2. Bryansk State Technical University (Department of "Computer Technologies and Systems")
graduate student since 01.01.2024

Bryansk, Bryansk, Russian Federation

Type:

Article

DOI:

https://doi.org/10.30987/2658-6436-2026-2-27-33

Pages:

from 27 to 33

Status:

Published

Received:

24.03.2026

Accepted:

15.04.2026

Published:

20.04.2026

Language:

Russian

Abstract and keywords

Abstract:
This paper presents a methodology and accompanying software for automatically classifying commits in Git repositories using machine learning. The proposed approach combines TF-IDF text vectorization with a Multinomial Naive Bayes model to assign commits to categories. An active learning component lets the user correct the proposed classifications, so the model improves continuously. The methodology covers preprocessing commit descriptions, extracting semantic features, and building an adaptive classification model. The results can be used to improve the transparency of development processes, to analyze change histories, to analyze and optimize code, and to automate the testing and delivery of new project modules to stakeholders (Continuous Integration / Continuous Delivery).

Keywords:
machine learning, Git, commit classification, active learning, TF-IDF, Multinomial Naive Bayes
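
The pipeline summarized in the abstract (preprocessing, TF-IDF vectorization, Multinomial Naive Bayes, user corrections) can be illustrated with a minimal sketch. This is not the authors' software: the class name `CommitClassifier`, the category labels, the sample commit messages, and the choice to refit the whole model after each correction are all assumptions made for illustration. It uses only the Python standard library; libraries such as scikit-learn offer equivalent building blocks (`TfidfVectorizer`, `MultinomialNB`).

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(message):
    """Preprocessing: lowercase a commit message and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", message.lower())


class CommitClassifier:
    """Hypothetical sketch: TF-IDF weights fed into Multinomial Naive Bayes."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing constant
        self.messages = []  # labelled commit messages seen so far
        self.labels = []

    def fit(self, messages, labels):
        self.messages, self.labels = list(messages), list(labels)
        docs = [tokenize(m) for m in self.messages]
        n_docs = len(docs)
        # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1
        df = Counter(term for doc in docs for term in set(doc))
        self.idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
        # Per-class sums of TF-IDF weights act as fractional term counts.
        self.weights = defaultdict(lambda: defaultdict(float))
        for doc, label in zip(docs, self.labels):
            for term, value in self._tfidf(doc).items():
                self.weights[label][term] += value
        class_counts = Counter(self.labels)
        self.log_prior = {c: math.log(k / n_docs) for c, k in class_counts.items()}
        self.totals = {c: sum(self.weights[c].values()) for c in class_counts}
        return self

    def _tfidf(self, tokens):
        # Terms outside the training vocabulary are ignored.
        counts = Counter(t for t in tokens if t in self.idf)
        return {t: n * self.idf[t] for t, n in counts.items()}

    def predict(self, message):
        features = self._tfidf(tokenize(message))
        vocab_size = len(self.idf)
        scores = {}
        for label, prior in self.log_prior.items():
            denom = self.totals[label] + self.alpha * vocab_size
            score = prior
            for term, value in features.items():
                smoothed = self.weights[label].get(term, 0.0) + self.alpha
                score += value * math.log(smoothed / denom)
            scores[label] = score
        return max(scores, key=scores.get)

    def correct(self, message, label):
        """Active-learning step: store the user's corrected label and refit."""
        return self.fit(self.messages + [message], self.labels + [label])


# Hypothetical training data; the category names are illustrative only.
TRAINING = [
    ("fix null pointer crash in parser", "fix"),
    ("fix off by one error in pagination", "fix"),
    ("add export to csv feature", "feature"),
    ("add dark mode support", "feature"),
    ("update readme with install steps", "docs"),
    ("document the config options", "docs"),
]

model = CommitClassifier().fit(*zip(*TRAINING))
print(model.predict("fix crash on empty input"))

# The user rejects the proposed category and supplies a new one.
model.correct("bump lodash version", "chore")
print(model.predict("bump lodash version"))
```

Refitting from scratch on every correction keeps the sketch short but scales poorly; for larger histories an incremental update (for example, scikit-learn's `MultinomialNB.partial_fit`) would likely be a better fit.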
