Automated exam generation for General Computer Science lectures
Introduction
This is a dogfood use case from our academic practice, and it is (partially) used in day-to-day operation. The second author teaches a first-year, two-semester Introduction to Computer Science course at Jacobs University and, over the last six years, has accumulated a collection of about 1000 homework, quiz, and exam problems encoded in the XML-based OMDoc format. For the courses we need to prepare four regular exams, four “grand tutorial test exams”, and two make-up exams per year. While the homework problems are typically new (and add to the corpus of well-tested problems), we assemble the exams from the corpus semi-automatically with a VDoc Spec that generates a random exam sheet based on the list of topics we intend to cover in the exam.
There are two kinds of proper exams: midterms and finals. Midterms are usually meant to take 1 hour, although they sometimes run to 75 minutes or so, whereas finals are designed for 2 hours. We therefore also provide the exam duration as an input parameter to our exam VDoc. Changing only this parameter and the topic list gives us different exam sheets that do not exceed the given time and cover the desired topics. All necessary information is encoded in the problems as RDFa metadata annotations, and the XQuery of the VDoc Spec takes care of adjusting the total time as closely as possible to the provided limit.
Once the VDoc content is generated, it can be rendered with the XSLT stylesheets developed in our JOMDoc library for rendering MathML. Everything is embedded into TNTBase, and once an exam VDoc is installed it takes one click in the TNTBase web interface to get a unique human-readable exam sheet for the students.
The VDoc editing facilities also find an application in our use case. Before giving a generated exam to students, we test it on our teaching assistants, who may offer comments or suggestions on how to improve particular problems. We then edit the contents of the exam VDoc and commit it back; all modifications are automatically patched into the original XML sources. If we do not want a particular problem to be included in the exam, we can adjust a VDoc parameter that excludes it. The biggest advantage of the current exam generation approach is that we write a VDoc Spec once and reuse it the next semester by simply adjusting a few VDoc parameters. Once we are satisfied with the exam, it can be materialized and saved in the repository as a normal file that can be referenced later to keep track of how students performed on different assignments and to figure out what their weaknesses are.
Although the presented approach already meets our requirements, some aspects could be improved. For instance, we might want to take the total exam difficulty into account, generating exams that do not exceed a certain duration and whose difficulty lies in a certain range (difficulty information is again embedded in the problems' XML). This will lead to more complicated queries in the VDoc Spec, but is still feasible. Apart from generating exams, this use case might also serve students who want to sharpen their knowledge: they could generate practice sheets that start with easy tasks and end with complex ones. Parameters stored in, e.g., cookies could keep track of which exercises have already appeared in a practice sheet, so that the VDoc never shows them again. This could easily be done by providing dynamic parameters to a VDoc retrieval method, as described in the previous section.
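The practice-sheet idea can be illustrated outside TNTBase with a small Python sketch. It is not part of the system described here: the `Problem` record, the integer difficulty scale, and the `practice_sheet` function are hypothetical names introduced only for illustration, and the set of already seen identifiers stands in for whatever would be stored in a cookie.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Problem:
    xml_id: str      # stands in for the exercise's xml:id
    topic: str
    minutes: float   # stands in for prob:solvedinminutes
    difficulty: int  # hypothetical scale; the page does not specify one


def practice_sheet(problems, topics, seen_ids, size):
    """Pick up to `size` unseen problems on the given topics, easiest first."""
    candidates = [p for p in problems
                  if p.topic in topics and p.xml_id not in seen_ids]
    candidates.sort(key=lambda p: (p.difficulty, p.minutes))
    sheet = candidates[:size]
    # Return the updated "seen" set so the caller can persist it (e.g. in a cookie).
    return sheet, seen_ids | {p.xml_id for p in sheet}


corpus = [
    Problem("dmath.1", "dmath", 5, 1),
    Problem("dmath.2", "dmath", 10, 3),
    Problem("graphs.1", "graphs-trees", 8, 2),
    Problem("codes.1", "codes", 6, 1),
]

first, seen = practice_sheet(corpus, {"dmath", "graphs-trees"}, frozenset(), 2)
second, seen = practice_sheet(corpus, {"dmath", "graphs-trees"}, seen, 2)
print([p.xml_id for p in first])   # ['dmath.1', 'graphs.1']
print([p.xml_id for p in second])  # ['dmath.2']
```

Passing the seen set in and out keeps the function stateless, which matches the idea of supplying it as a dynamic parameter on each retrieval.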
Realization
The following XQuery generates an exam that covers the given topics and whose total solving time is adjusted to come close to the provided time limit without exceeding it:
declare default element namespace "http://omdoc.org/ns";
(: Stop improving once the exam is less than this many minutes below the limit. :)
declare variable $threshold := 5;
(: All exercises of a topic that carry points, a solving time and an xml:id. :)
declare function local:exercises($topic as xs:string) as element()* {
  collection('xml_content.dbxml')
    [starts-with(dbxml:metadata('tnt:path'), concat('/problems/', $topic, '/en/'))]
    //exercise[./metadata/meta/@property = 'prob:points'
               and ./metadata/meta/@property = 'prob:solvedinminutes'
               and ./@xml:id]
};
(: Solving times, in minutes, of the given exercises. :)
declare function tnt:minutes($ex as element()*) as xs:double* {
  for $i in $ex return
    number($i/metadata/meta[@property = 'prob:solvedinminutes']/text())
};
(: A random member of the sequence, or the empty sequence if there is none. :)
declare function tnt:random-element($els as element()*) as element()? {
  if (empty($els)) then ()
  else $els[tnt:random(count($els))]
};
(: Total solving time of the given exercises. :)
declare function tnt:ex-duration($exs as element()*) as xs:double {
  sum(tnt:minutes($exs))
};
(: One randomly chosen exercise per topic. :)
declare function tnt:topic-exercises($topics as xs:string*) as element()* {
  for $i in $topics return
    tnt:random-element(local:exercises($i))
};
(: Replace the item at position $pos by $to. :)
declare function tnt:substitute-item($els as item()*, $to as item(), $pos as xs:integer) as item()* {
  remove(insert-before($els, $pos + 1, $to), $pos)
};
(: Try to replace the exercise at $position: if $increase, by a longer one that adds
   less than $duration_diff minutes; otherwise by a shorter one.
   Returns $exs unchanged if no suitable exercise exists. :)
declare function tnt:change-problem($position as xs:integer, $topic as xs:string, $exs as element()*, $duration_diff as xs:double, $increase as xs:boolean) as element()* {
  let $cur_ex := $exs[$position]
  let $ex_mins := tnt:ex-duration($cur_ex)
  let $new_ex :=
    if ($increase) then
      tnt:random-element(local:exercises($topic)
        [number(./metadata/meta[@property = 'prob:solvedinminutes']/text()) > $ex_mins and
         number(./metadata/meta[@property = 'prob:solvedinminutes']/text()) < $ex_mins + $duration_diff])
    else
      tnt:random-element(local:exercises($topic)
        [number(./metadata/meta[@property = 'prob:solvedinminutes']/text()) < $ex_mins])
  return
    if (empty($new_ex)) then $exs
    else tnt:substitute-item($exs, $new_ex, $position)
};
(: One pass over all topics, adjusting one exercise per topic towards $duration. :)
declare function tnt:improve-timing($topics as xs:string*, $exs as element()*, $duration as xs:double) as element()* {
  tnt:improve-timing-helper(1, $topics, $exs, $duration)
};
declare function tnt:improve-timing-helper($pos as xs:integer, $topics as xs:string*, $exs as element()*, $duration as xs:double) as element()* {
  if ($pos > count($topics)) then $exs
  else
    let $cd := tnt:ex-duration($exs) return
      if ($duration > $cd) then
        (: The exam is too short: stop if it is close enough, otherwise try a longer exercise. :)
        if ($duration - $cd < $threshold) then $exs
        else
          let $updated_exs := tnt:change-problem($pos, $topics[$pos], $exs, $duration - $cd, true())
          return
            tnt:improve-timing-helper($pos + 1, $topics, $updated_exs, $duration)
      else
        (: The exam is too long: try a shorter exercise. :)
        let $updated_exs := tnt:change-problem($pos, $topics[$pos], $exs, $cd - $duration, false())
        return
          tnt:improve-timing-helper($pos + 1, $topics, $updated_exs, $duration)
};
declare function tnt:sequence-node-equal($seq1 as node()*, $seq2 as node()*) as xs:boolean {
  every $i in 1 to max((count($seq1), count($seq2)))
  satisfies $seq1[$i] is $seq2[$i]
};
declare function tnt:exam-improver($topics as xs:string*, $exs as element()*, $duration as xs:double) as element()* {
  tnt:exam-improver-helper($topics, $exs, $duration, 1)
};
declare function tnt:exam-improver-helper($topics as xs:string*, $exs as element()*, $duration as xs:double, $deep as xs:integer) as element()* {
  (: Although our recursion converges to a best exam in a couple of steps, DB XML sometimes gets stuck and consumes far too much memory.
     We therefore limit the recursion artificially; a recursion depth of 2 almost always gives the best results. :)
  if ($deep > 2) then $exs else
    let $upd_exs := tnt:improve-timing($topics, $exs, $duration)
    return
      if (tnt:sequence-node-equal($exs, $upd_exs)) then $exs
      else tnt:exam-improver-helper($topics, $upd_exs, $duration, $deep + 1)
};
declare function tnt:get-exam($topics as xs:string+, $duration as xs:double) {
  let $exs := tnt:topic-exercises($topics)
  return
    if (count($exs) = count($topics)) then
      tnt:exam-improver($topics, $exs, $duration)
    else
      error(
        QName('http://tntbase.mathweb.org/ns', 'ExamError'),
        'Cannot generate exam with given topics, make sure you provided right topics and those topics contain correct exercises.')
};
tnt:get-exam($topics, number($duration))
Note that the variables $topics and $duration are not declared in this query; they are supplied as parameters.
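To make the control flow of the query easier to follow, here is a rough Python model of the same strategy: pick one random exercise per topic, then make a bounded number of passes in which each position is swapped for a longer or shorter exercise from the same topic. It is a sketch under simplifying assumptions, not the TNTBase implementation: exercises are plain `(id, minutes)` tuples, `pools` is a hypothetical dictionary standing in for the database lookup, and Python's `random.Random` replaces `tnt:random`.

```python
import random

THRESHOLD = 5  # minutes; plays the role of $threshold in the query


def total(exam):
    """Total solving time of an exam given as a list of (id, minutes) tuples."""
    return sum(minutes for _, minutes in exam)


def change_problem(pos, pool, exam, diff, increase, rng):
    """Swap exam[pos] for a longer (or shorter) exercise from the same topic pool."""
    cur = exam[pos][1]
    if increase:
        # Longer than the current exercise, but adding less than `diff` minutes.
        candidates = [p for p in pool if cur < p[1] < cur + diff]
    else:
        candidates = [p for p in pool if p[1] < cur]
    if not candidates:
        return exam
    return exam[:pos] + [rng.choice(candidates)] + exam[pos + 1:]


def improve_timing(topics, pools, exam, duration, rng):
    """One pass over all topics, nudging the total time towards `duration`."""
    for pos, topic in enumerate(topics):
        cur = total(exam)
        if duration > cur:
            if duration - cur < THRESHOLD:
                return exam  # close enough to the limit
            exam = change_problem(pos, pools[topic], exam, duration - cur, True, rng)
        else:
            exam = change_problem(pos, pools[topic], exam, cur - duration, False, rng)
    return exam


def get_exam(topics, pools, duration, rng, max_passes=2):
    """Random exam with one exercise per topic, improved in at most `max_passes` passes."""
    if any(not pools.get(t) for t in topics):
        raise ValueError("every topic needs at least one exercise")
    exam = [rng.choice(pools[t]) for t in topics]
    for _ in range(max_passes):
        improved = improve_timing(topics, pools, exam, duration, rng)
        if improved == exam:
            break
        exam = improved
    return exam


pools = {
    "dmath": [("dmath.1", 5), ("dmath.2", 20), ("dmath.3", 60)],
    "codes": [("codes.1", 5), ("codes.2", 15)],
}
exam = get_exam(["dmath", "codes"], pools, 40, random.Random(0))
print(exam, total(exam))
```

In this model a swap towards a longer exercise can never push the total past the limit, because the replacement adds strictly less than the remaining time. An exam that starts out too long, however, is only shortened as far as the available exercises allow.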
The VDoc spec is simpler:
<tnt:virtualdocument xmlns:tnt="http://tntbase.mathweb.org/ns">
  <tnt:skeleton xml:id="exercises">
    <omdoc xmlns="http://omdoc.org/ns" xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Final exam for General Computer Science</dc:title>
      <dc:creator>Michael Kohlhase</dc:creator>
      <tnt:xqinclude query="current-date()">
        <tnt:return><dc:date><tnt:result/></dc:date></tnt:return>
      </tnt:xqinclude>
      <tnt:xqinclude>
        <tnt:query name="xq.exercises"/>
        <tnt:return><tnt:result/></tnt:return>
      </tnt:xqinclude>
    </omdoc>
  </tnt:skeleton>
  <tnt:query name="xq.exercises" href="tntbase:/vds/exam-query.xq"/>
  <tnt:params>
    <tnt:param name="topics">
      <tnt:value>dmath</tnt:value>
      <tnt:value>boolexp</tnt:value>
      <tnt:value>graphs-trees</tnt:value>
      <tnt:value>search</tnt:value>
      <tnt:value>codes</tnt:value>
    </tnt:param>
    <tnt:param name="duration">
      <tnt:value>60</tnt:value>
    </tnt:param>
  </tnt:params>
</tnt:virtualdocument>
Points worth mentioning:
- Alongside the complex exam generation query, we also provide an inline query that returns the current date and embeds it into the exam.
- We provide default parameters (topics and duration) inside the VDoc Spec, since these values are usual for a typical exam. However, they are "overridden" in the VDoc itself.
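The relationship between the defaults in the spec and the values supplied by a VDoc can be pictured with a short Python sketch. How TNTBase actually merges parameters is not described here, so the whole-parameter replacement in `effective_params` is an assumption, and both function names are hypothetical; the XML fragment is a shortened copy of the `tnt:params` block above.

```python
import xml.etree.ElementTree as ET

TNT = "{http://tntbase.mathweb.org/ns}"

SPEC_PARAMS = """<tnt:params xmlns:tnt="http://tntbase.mathweb.org/ns">
  <tnt:param name="topics">
    <tnt:value>dmath</tnt:value>
    <tnt:value>boolexp</tnt:value>
  </tnt:param>
  <tnt:param name="duration">
    <tnt:value>60</tnt:value>
  </tnt:param>
</tnt:params>"""


def read_params(xml_text):
    """Map each parameter name to its list of values, as text."""
    root = ET.fromstring(xml_text)
    return {
        param.get("name"): [value.text for value in param.findall(TNT + "value")]
        for param in root.iter(TNT + "param")
    }


def effective_params(defaults, overrides):
    """Assumed merge rule: an overriding parameter replaces the default as a whole."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged


defaults = read_params(SPEC_PARAMS)
final_exam = effective_params(defaults, {"duration": ["120"]})
print(final_exam)
```

Under this assumption, a final exam only needs to override `duration`, while the topic list falls back to the defaults from the spec.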
Links
Please note that, due to the large number of exercises and the long import chains that contain different mathematical notations, exam generation may take some time (3 to 6 seconds, depending on server load).
- Human-oriented presentation of an exam: http://alpha.tntbase.mathweb.org:8080/tntbase/eswc/XHTMLBasicBrowser/problems/dmath/en/exam.vd Note: due to a lack of notation definitions, some of the mathematical symbols might not be rendered correctly.
- VDoc content of XML exam representation: http://alpha.tntbase.mathweb.org:8080/tntbase/eswc/BasicBrowser/problems/dmath/en/exam.vd
- VDoc spec for exam VDoc: http://alpha.tntbase.mathweb.org:8080/tntbase/eswc/BasicBrowser/vds/exam-spec.xml
- Main XQuery for flexible exam generation: http://alpha.tntbase.mathweb.org/repos/eswc/vds/exam-query.xq
