Graduate student or postdoctoral researcher

Project

PREP0003676

The focus of this project is to conduct a pilot study and develop a demonstration of reproducible AI evaluations. The student will explore the different ways AI evaluations are conducted and the challenges of making them reproducible. The project aims to produce a study report and a working demonstration of reproducible AI evaluations, supporting NIST's broader work on AI measurement.

Developing a Demonstration of Reproducible AI Evaluations

Qualifications

Background in Computer Science, Software Engineering, Systems Engineering, Data Science, or related field.
Education level: graduate student or higher.
Strong interest in software development, AI measurement, and reproducibility.
Experience with software development in Python, version control systems, AI models, and the shell, as well as with scientific reading and technical writing.
Experience conducting AI evaluations and designing reproducible software experiments is preferred.

Research Proposal

Key Responsibilities

Conduct a literature survey on the state of the art in reproducible evaluation of software systems
Gain familiarity with existing AI evaluation frameworks
Contribute to a plan detailing a demonstration of reproducible AI evaluations
Design, implement, test, and document the software and systems used for the demonstration
Document the overall demonstration, including current limitations and challenges

Deliverables

Survey briefly describing key research on software experiment reproducibility
Summary report of existing AI evaluation frameworks
Working demonstration of reproducible AI evaluations
Report describing the demonstration and discussing the challenges of reproducing AI evaluations

Group

Information Access - HQ

Salary / Hourly Rate (Min)

$50,000.00

Schedule of Appointment

Full time

Start Date

2 September 2025

Work Location

Onsite NIST

Salary / Hourly Rate (Max)

$120,000.00

Total Hours per week

End Date

1 September 2026