Background: Software agents are becoming increasingly common in the engineering of software systems. We explore the use of human subjects to create benchmarks for evaluating software agents. Our case studies address the domain of instructable software agents (e-students) as proposed by the Bootstrapped Learning project [Oblinger, 2006]. Aim: We aim to define and refine requirements, problem-solving strategies, and evaluation methodologies for e-students, paving the way for rigorous experiments comparing e-student performance against human benchmarks. Method: Because little was known about which factors would prove critical, we adopt an exploratory case-study approach. In two studies covering three distinct groups, we use human subjects to develop an evaluation curriculum for e-students, collecting quantitative data through online quizzes and tests and qualitative data through observation. Results: Although we collect quantitative data, our most important results are qualitative. We uncover and address several intrinsic challenges in comparing software agents with humans, including humans' greater semantic understanding, the eidetic memory of e-students, and the sensitivity of human performance to study parameters such as timing and lesson complexity. Conclusions: Important future work is a set of controlled experiments informed by these case studies; these will establish benchmark human performance results for specific problem domains for comparison with e-student results.