A Vietnamese Multitask Language Understanding Benchmark Suite for Large Language Models
2025
About VMLU
VMLU is a human-centric benchmark suite designed to assess the overall capabilities of foundation models, with a particular focus on the Vietnamese language. The benchmark comprises four datasets, Vi-MQA, Vi-SQuAD, Vi-DROP, and Vi-Dialog, each targeting a different aspect of LLM performance, including general knowledge, reading comprehension, logical reasoning, and conversational ability. By providing comprehensive and diverse evaluation tasks, VMLU enriches the set of Vietnamese NLP evaluation benchmarks, supports the development of more robust foundation models, and encourages further research on LLMs.
Dataset
Vi-MQA is a multiple-choice question answering benchmark designed to evaluate general knowledge and reasoning capabilities. It includes questions that span difficulty levels from basic understanding to advanced professional expertise.
The dataset comprises 58 distinct subjects, with the majority containing approximately 200 questions. These subjects are systematically categorized into four primary domains: STEM, Humanities, Social Sciences, and a broad category designated as 'Others.'
The dataset is drawn primarily from examinations administered by a wide range of educational institutions, including elementary, middle, and high schools as well as universities. A further portion comes from high school graduation examinations organized and overseen by Vietnam's Ministry of Education and Training.
Subjects are classified into four difficulty tiers according to the depth of knowledge required: Elementary School, Middle School, High School, and Professional, the last of which covers both undergraduate and graduate examination standards.

Vi-MQA
STEM
- 1. Elementary Mathematics
- 2. Elementary Science
- 3. Middle School Biology
- 4. Middle School Chemistry
- 5. Middle School Mathematics
- 6. Middle School Physics
- 7. High School Biology
- 8. High School Chemistry
- 9. High School Mathematics
- 10. High School Physics
- 11. Applied Informatics
- 12. Computer Architecture
- 13. Computer Network
- 14. Discrete Mathematics
- 15. Electrical Engineering
- 16. Introduction to Chemistry
- 17. Introduction to Physics
- 18. Introduction to Programming
- 19. Metrology Engineer
- 20. Operating System
- 21. Statistics and Probability
Social Science
- 22. Middle School Civil Education
- 23. Middle School Geography
- 24. High School Civil Education
- 25. High School Geography
- 26. Business Administration
- 27. Ho Chi Minh Ideology
- 28. Macroeconomics
- 29. Microeconomics
- 30. Principles of Marxism and Leninism
- 31. Sociology
Humanities
- 32. Elementary History
- 33. Middle School History
- 34. Middle School Literature
- 35. High School History
- 36. High School Literature
- 37. Administrative Law
- 38. Business Law
- 39. Civil Law
- 40. Criminal Law
- 41. Economic Law
- 42. Education Law
- 43. History of World Civilization
- 44. Ideological and Moral Cultivation
- 45. Introduction to Laws
- 46. Introduction to Vietnam Culture
- 47. Logic
- 48. Revolutionary Policy of the Vietnamese Communist Party
- 49. Vietnamese Language and Literature
Others
- 50. Accountant
- 51. Clinical Pharmacology
- 52. Environmental Engineering
- 53. Internal Basic Medicine
- 54. Preschool Pedagogy
- 55. Tax Accountant
- 56. Tax Civil Servant
- 57. Civil Servant
- 58. Driving License Certificate
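The subject list above and the four difficulty tiers can be cross-referenced programmatically. The sketch below is a minimal illustration under one stated assumption: it infers a subject's tier from its name prefix (Elementary, Middle School, High School) and treats every other subject as Professional. This heuristic and the helper names `VI_MQA_DOMAINS`, `difficulty_tier`, and `tier_counts` are hypothetical and not part of the official VMLU code.

```python
from collections import Counter
from typing import Dict, List

# Subject names copied from the Vi-MQA list above, grouped by domain.
VI_MQA_DOMAINS: Dict[str, List[str]] = {
    "STEM": [
        "Elementary Mathematics", "Elementary Science",
        "Middle School Biology", "Middle School Chemistry",
        "Middle School Mathematics", "Middle School Physics",
        "High School Biology", "High School Chemistry",
        "High School Mathematics", "High School Physics",
        "Applied Informatics", "Computer Architecture", "Computer Network",
        "Discrete Mathematics", "Electrical Engineering",
        "Introduction to Chemistry", "Introduction to Physics",
        "Introduction to Programming", "Metrology Engineer",
        "Operating System", "Statistics and Probability",
    ],
    "Social Science": [
        "Middle School Civil Education", "Middle School Geography",
        "High School Civil Education", "High School Geography",
        "Business Administration", "Ho Chi Minh Ideology",
        "Macroeconomics", "Microeconomics",
        "Principles of Marxism and Leninism", "Sociology",
    ],
    "Humanities": [
        "Elementary History", "Middle School History",
        "Middle School Literature", "High School History",
        "High School Literature", "Administrative Law", "Business Law",
        "Civil Law", "Criminal Law", "Economic Law", "Education Law",
        "History of World Civilization",
        "Ideological and Moral Cultivation", "Introduction to Laws",
        "Introduction to Vietnam Culture", "Logic",
        "Revolutionary Policy of the Vietnamese Communist Party",
        "Vietnamese Language and Literature",
    ],
    "Others": [
        "Accountant", "Clinical Pharmacology", "Environmental Engineering",
        "Internal Basic Medicine", "Preschool Pedagogy", "Tax Accountant",
        "Tax Civil Servant", "Civil Servant", "Driving License Certificate",
    ],
}

# Heuristic (assumption): the tier is read off the subject-name prefix.
_TIER_PREFIXES = [
    ("Elementary", "Elementary School"),
    ("Middle School", "Middle School"),
    ("High School", "High School"),
]


def difficulty_tier(subject: str) -> str:
    """Guess a subject's difficulty tier from its name prefix."""
    for prefix, tier in _TIER_PREFIXES:
        if subject.startswith(prefix):
            return tier
    # No school-level prefix: assume undergraduate/graduate level.
    return "Professional"


def tier_counts() -> Counter:
    """Count subjects per (domain, tier) pair."""
    return Counter(
        (domain, difficulty_tier(subject))
        for domain, subjects in VI_MQA_DOMAINS.items()
        for subject in subjects
    )


if __name__ == "__main__":
    for (domain, tier), n in sorted(tier_counts().items()):
        print(f"{domain:<15} {tier:<18} {n}")
```

With the list as given, the domains hold 21 (STEM), 10 (Social Science), 18 (Humanities), and 9 (Others) subjects, 58 in total; under the prefix heuristic, 39 of them land in the Professional tier.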

Below are some sample questions for reference.
Elementary Science (in STEM)
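Tính chất nào sau đây không phải là tính chất của thủy tinh chất lượng cao: (Which of the following is not a property of high-quality glass?)
- A. Rất trong (Very transparent)
- B. Bền, khó vỡ (Durable, hard to break)
- C. Chịu được nóng, lạnh (Withstands heat and cold)
- D. Dễ cháy (Flammable)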
Answer: D
Middle School Geography (in Social Science)
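Việc phát triển nông-lâm-thủy sản tạo cơ sở nguyên liệu cho ngành phát triển công nghiệp nào? (The development of agriculture, forestry, and fisheries provides the raw-material base for which industry?)
- A. Công nghiệp năng lượng (Energy industry)
- B. Công nghiệp chế biến lương thực thực phẩm (Food processing industry)
- C. Công nghiệp hóa chất (Chemical industry)
- D. Công nghiệp sản xuất vật liệu xây dựng (Construction materials industry)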
Answer: B
High School History (in Humanities)
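Sự kiện nào sau đây đã tạo ra một cơ chế giải quyết các vấn đề liên quan đến hòa bình và an ninh ở châu Âu? (Which of the following events created a mechanism for resolving issues of peace and security in Europe?)
- A. Định ước Henxinki (08/1975) (The Helsinki Final Act, 08/1975)
- B. Liên Xô và Mỹ ký Hiệp định hạn chế vũ khí tiến công chiến lược (The Soviet Union and the United States sign the treaty limiting strategic offensive arms)
- C. Mỹ và Liên Xô tuyên bố chấm dứt Chiến tranh lạnh (The United States and the Soviet Union declare the end of the Cold War)
- D. Hiệp định về những cơ sở của quan hệ giữa Đông Đức và Tây Đức (The treaty on the basis of relations between East Germany and West Germany)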
Answer: A
Clinical Pharmacology (in Others)
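Khái niệm DƯỢC LỰC HỌC: (Definition of PHARMACODYNAMICS:)
- A. Động học của sự hấp thu, phân phối, chuyển hóa và thải trừ thuốc (The kinetics of drug absorption, distribution, metabolism, and elimination)
- B. Nghiên cứu tác động của thuốc trên cơ thể sống (The study of the effects of drugs on the living body)
- C. Nghiên cứu về tác động của cơ thể đến thuốc (The study of the effects of the body on drugs)
- D. Là môn khoa học nghiên cứu về thuốc (The science that studies drugs)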
Answer: B
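To score a model on items like these, each question and its options are typically rendered into a prompt and the chosen letter is parsed from the model's output. The following is a minimal zero-shot sketch; the item layout (`question`, `choices`, `answer` fields), the prompt template, and the regex-based letter extraction are assumptions for illustration, not the prompting scheme used in the VMLU repository.

```python
import re
from typing import Optional

# Hypothetical item layout, filled with the Clinical Pharmacology sample above.
ITEM = {
    "subject": "Clinical Pharmacology",
    "question": "Khái niệm DƯỢC LỰC HỌC:",
    "choices": [
        "A. Động học của sự hấp thu, phân phối, chuyển hóa và thải trừ thuốc",
        "B. Nghiên cứu tác động của thuốc trên cơ thể sống",
        "C. Nghiên cứu về tác động của cơ thể đến thuốc",
        "D. Là môn khoa học nghiên cứu về thuốc",
    ],
    "answer": "B",
}

# Matches a capital A-D that stands alone as a word (e.g. "B", "B.", "(B)").
_CHOICE_LETTER = re.compile(r"\b([ABCD])\b")


def build_prompt(item: dict) -> str:
    """Render a multiple-choice item as a zero-shot prompt (hypothetical template)."""
    # Question, one option per line, then an answer cue, mirroring the samples above.
    return "\n".join([item["question"], *item["choices"], "Answer:"])


def extract_choice(completion: str) -> Optional[str]:
    """Return the first standalone A-D letter in a model completion, or None."""
    match = _CHOICE_LETTER.search(completion)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(build_prompt(ITEM))
    # Stand-in for a real model call.
    model_output = "B. Nghiên cứu tác động của thuốc trên cơ thể sống"
    print(extract_choice(model_output) == ITEM["answer"])  # → True
```

Taking the first standalone letter is a deliberately simple choice; it can misfire on English completions that begin with the article "A", so stricter parsing, or constraining the model to emit a single letter, may be preferable in practice.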
Download
The VMLU datasets are available for download:
- Vi-MQA v1.5: Vietnamese Multiple-choice Question Answering
- Vi-SQuAD v1.0: Vietnamese Stanford Question Answering Dataset
- Vi-DROP v1.0: Vietnamese Discrete Reasoning Over Paragraphs
- Vi-Dialog v1.0: Vietnamese Dialogue Dataset
GitHub
The VMLU repository provides detailed information about the dataset, including the number of questions per subject, usage instructions, and sample code. It also reports benchmarking results for publicly available models and describes the prompting techniques and evaluation metrics used. Our benchmarking code is likewise available to support replication of the results and can be accessed via the following hyperlink.
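The metrics themselves are documented in the repository. As an illustration only, the sketch below assumes plain accuracy: it computes accuracy per domain by pooling all items within each domain, then reports an unweighted mean over domains as `Average`. Both this aggregation scheme and the `Prediction`/`accuracy_report` names are hypothetical and may differ from the official benchmarking code.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Optional


class Prediction(NamedTuple):
    domain: str               # e.g. "STEM", "Humanities"
    predicted: Optional[str]  # parsed answer letter, or None if unparseable
    gold: str                 # reference answer letter


def accuracy_report(preds: Iterable[Prediction]) -> Dict[str, float]:
    """Per-domain accuracy plus an unweighted mean over domains (assumed scheme)."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for p in preds:
        total[p.domain] += 1
        # A missing (None) prediction never equals the gold letter, so it counts as wrong.
        correct[p.domain] += int(p.predicted == p.gold)
    report = {domain: correct[domain] / total[domain] for domain in total}
    # Macro average: each domain contributes equally, regardless of its size.
    report["Average"] = sum(report.values()) / len(report) if report else 0.0
    return report


if __name__ == "__main__":
    preds = [
        Prediction("STEM", "A", "A"),
        Prediction("STEM", "C", "C"),
        Prediction("Others", None, "D"),
        Prediction("Others", "D", "D"),
    ]
    print(accuracy_report(preds))
    # → {'STEM': 1.0, 'Others': 0.5, 'Average': 0.75}
```

A macro average keeps small domains such as Others from being swamped by larger ones; pooling every item into one overall accuracy is the obvious alternative if item-level weighting is preferred.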