RETRIEVAL-AUGMENTED GENERATION FOR FORENSIC LEGAL ANALYSIS: INTEGRATION OF UKRAINIAN CRIMINAL CODE WITH MOBILE DEVICE EVIDENCE
DOI:
https://doi.org/10.28925/2663-4023.2026.32.1196
Keywords:
Retrieval-Augmented Generation; Legal NLP; Ukrainian Criminal Code; Digital Forensics; Multilingual Embeddings; SLM; LLM; Mobile Forensics.
Abstract
Digital forensic investigations in Ukraine require analysts to classify mobile device evidence according to the Criminal Code, a process that is time-consuming and demands deep legal expertise. This paper presents the first retrieval-augmented generation (RAG) system for Ukrainian Criminal Code analysis, focusing on Section I (Crimes Against National Security). We construct a database of 9 articles covering treason, espionage, collaboration, and sabotage offenses, and evaluate the system on 60 synthetic forensic scenarios with deterministically derived ground truth. Our experiments compare four chunking strategies, three multilingual embedding models, and four large language models (both API-based and locally deployed). The best retrieval configuration achieves a mean reciprocal rank (MRR) of 0.588 using multilingual-e5-large embeddings with part-level chunking. For end-to-end classification, RAG with GPT-4o-mini achieves 54.2% article identification accuracy, outperforming a few-shot prompting baseline (29.2%, p=0.03) but showing no statistically significant improvement over direct LLM prompting (52.1%, p=0.89). We argue that RAG’s primary advantage for forensic applications lies not in classification accuracy but in grounding, transparency, and governance: retrieved legal provisions are traceable and verifiable, the knowledge base can be updated without retraining, and the system supports fully local deployment where evidence cannot leave the organization. Local LLMs reach 41.7% accuracy, about 77% of the accuracy of the API-based model (54.2%), confirming that on-premise deployment is feasible at reduced accuracy.
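For readers unfamiliar with the retrieval metric quoted in the abstract, MRR averages, over all test scenarios, the reciprocal of the rank at which the correct Criminal Code article first appears in the retrieved list. The sketch below is a minimal illustration of that standard definition, not the authors' evaluation code. The function name `mean_reciprocal_rank`, the placeholder article labels ("A" to "D"), and the convention that a scenario with no correct hit contributes 0 are assumptions made for this example.

```python
from typing import Sequence


def mean_reciprocal_rank(
    ranked_lists: Sequence[Sequence[str]],
    gold: Sequence[str],
) -> float:
    """Average of 1/rank of the first correct item per query.

    A query whose gold item never appears in its ranking contributes 0
    (an assumed convention for this sketch).
    """
    if len(ranked_lists) != len(gold):
        raise ValueError("ranked_lists and gold must have the same length")
    if not gold:
        return 0.0

    total = 0.0
    for ranking, target in zip(ranked_lists, gold):
        # Ranks are 1-based: the top retrieved chunk has rank 1.
        for position, article in enumerate(ranking, start=1):
            if article == target:
                total += 1.0 / position
                break
    return total / len(gold)


# Hypothetical toy data: three scenarios, each with gold article "A".
rankings = [
    ["A", "B", "C"],  # correct article at rank 1 -> contributes 1.0
    ["B", "A", "C"],  # correct article at rank 2 -> contributes 0.5
    ["C", "B", "D"],  # correct article not retrieved -> contributes 0.0
]
gold_articles = ["A", "A", "A"]

mrr = mean_reciprocal_rank(rankings, gold_articles)
print(f"MRR = {mrr:.3f}")
```

Under this reading, a value such as 0.588 indicates that the correct article typically appears near the top of the retrieved list, on average between the first and second position.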
License
Copyright (c) 2026 Тарас Фединишин, Ольга Партика

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.