BeLEARN, Assess-med-BERT

“Assess-med-BERT” – An Algorithm for the Automated Generation of Distractors in German-Language Multiple-Choice Questions for Online Self-Assessments to Efficiently Support Learning

Using the example of the medical context, the project aims to improve learning by enabling educators to generate practice questions more efficiently.

Duration: September 2022 – October 2024
Status: Completed
Educational Level: Tertiary Level
Topic: Artificial Intelligence (AI), Digital Tools
Keywords: Deep Learning

Initial Situation

Across disciplines, there is a clear lack of practice tasks (i.e. self-assessments) for learners, as creating them is time-consuming. Since educators already invest significant time in preparing multiple-choice questions (MCQs) for exams, they often lack the capacity to offer additional practice materials. Existing research on automated generation of practice tasks is largely based on English-language datasets, which cannot be directly applied to the German language. This project seeks to address that gap.

Objectives

The goal of the project is to improve learning in the German-speaking medical field by enabling educators to generate practice questions more efficiently, so that learners, both students and continuing-education participants, can acquire new knowledge more effectively using these generated materials.

Method

The model development involves two key steps: the first is the construction of a medical corpus, a structured collection of medical texts and data. In the second step, the model is trained on this corpus to automatically generate distractors, i.e., incorrect answer options for MCQs. The question stem and correct answer serve as input variables. The resulting distractors are intended for use in self-assessments. A further part of the project involves implementing the self-assessments, including content validation by subject matter experts and practical testing with students.
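The input/output contract described above (question stem and correct answer in, ranked distractor candidates out) can be sketched in a few lines. The real project trains a BERT-based model on a medical corpus; the sketch below substitutes a simple string-similarity heuristic for that model and uses a hypothetical stem and candidate pool, purely to illustrate the pipeline shape (candidates similar to the correct answer, but not identical to it, make plausible distractors).

```python
import difflib

def similarity(a: str, b: str) -> float:
    """Surface similarity of two answer options (stand-in for model scoring).

    In the actual project this score would come from a BERT-based model
    trained on the medical corpus; here difflib's sequence ratio is used
    so the sketch runs without any model or external dependency.
    """
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

def generate_distractors(stem: str, correct: str,
                         candidates: list[str], k: int = 3) -> list[str]:
    """Return the k candidates most similar to the correct answer,
    excluding the correct answer itself."""
    pool = [c for c in candidates if c.lower() != correct.lower()]
    pool.sort(key=lambda c: similarity(c, correct), reverse=True)
    return pool[:k]

# Hypothetical German MCQ stem and candidate pool for illustration only.
stem = "Welches Medikament ist ein Betablocker?"
correct = "Metoprolol"
candidates = ["Amlodipin", "Bisoprolol", "Ibuprofen", "Metoprolol", "Propranolol"]
print(generate_distractors(stem, correct, candidates))
```

In the project itself, the candidate pool and ranking are learned from the corpus rather than drawn from a fixed list; the sketch only fixes the interface that the self-assessment platform would call.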

Results

This research project developed a model based on artificial intelligence and natural language processing (NLP) that allows educational stakeholders to offer learning opportunities (self-assessments) with significantly less effort than before. The results showed that the quality of the generated distractors differed only marginally from that of manually created ones, and learners selected the generated distractors about as often as the existing ones, with no significant difference found.

Translation into Practice

This collaborative project involved four institutions (Institute for Medical Education and Institute of Psychology at the University of Bern, University of Fribourg, and Bern University of Applied Sciences – Health) and focused on MCQs in the medical domain, where the project team had particular expertise and access to existing datasets. The planned publication of the results, including detailed methodology and key insights, is intended to enable broader adoption of these improvements to self-assessments among partners and in other German-speaking educational settings.

Before broader implementation, however, the model must first be optimised to reduce its hardware resource requirements.

Project Lead

Dr. Rabea Krings, Institute for Medical Education, University of Bern

Project Collaborators

Dr. Natalie Borter, Institute of Psychology, University of Bern
Prof. Dr. med. Sören Huwendiek, Institute for Medical Education, University of Bern
Stefan Pichelmann, Department of Psychology, University of Fribourg
Prof. Dr. Stefan Troche, Institute of Psychology, University of Bern
Dr. med. Eva Kathrin Hennel, Institute for Medical Education, University of Bern
Dr. Felicitas-Maria Lahner, School of Health Professions, BFH
Dr. Katja Schlegel, Institute of Psychology, University of Bern

Participating Institutions