Post-Bachelor Project
PREP0004176
Overview

This project focuses on using Large Language Models (LLMs) to annotate evaluation data (the "LLM-as-judge" approach) and on designing an Inter-Annotator Agreement study to assess the reliability of both human and LLM annotations. The candidate will explore how to define the indicators of a given AI-related risk, how to identify them, and how to provide annotators with examples so they can annotate the presence of various risks. The project aims to develop an annotation framework for AI risk assessment and to establish data-quality metrics for AI risk research, supporting broader work at NIST on assessing and measuring the validity and reliability of AI-related risks in data annotation.
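As a concrete illustration of the kind of Inter-Annotator Agreement measurement the study would involve, the sketch below computes Cohen's kappa between one human annotator and one LLM judge. This is a minimal, standard-library-only example; the label set and annotations are hypothetical and do not come from this project.

```python
# Sketch: Cohen's kappa as an Inter-Annotator Agreement metric for
# comparing a human annotator against an LLM judge on the same items.
# The labels below are hypothetical illustrations, not project data.
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators labeling the same items."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    # Observed agreement: fraction of items where both gave the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement by chance, from each annotator's label frequencies.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[label] * cb[label] for label in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

human = ["risk", "no_risk", "risk", "risk", "no_risk", "no_risk"]
llm   = ["risk", "no_risk", "no_risk", "risk", "no_risk", "risk"]
print(round(cohens_kappa(human, llm), 3))  # → 0.333
```

A kappa of 1.0 indicates perfect agreement and 0.0 indicates chance-level agreement; an actual study would typically use a chance-corrected metric suited to many annotators, such as Fleiss' kappa or Krippendorff's alpha.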

Reliability of Human and LLM Annotations for AI Risk Assessment

Qualifications
  • Background in Computer Science, Data Science, or a related field
  • Education level: Bachelor's or graduate degree
  • Strong interest in data annotation and AI risks
  • Familiarity with scientific reading and technical writing
  • U.S. citizenship preferred
Research Proposal

Key responsibilities include, but are not limited to:

  • Gain familiarity with the existing literature on data annotation and LLM-as-judge evaluation
  • Understand NIST's role and ongoing efforts in assessing and measuring the validity and reliability
    of AI-related risks in data annotation
  • Contribute to developing an annotation framework for AI risk assessment
  • Collaborate effectively with cross-functional and interdisciplinary stakeholders to ensure
    successful project outcomes

Deliverables

  • Contributions to a NIST report supporting ongoing NIST AI evaluation efforts focused on the
    design of an Inter-Annotator Agreement study to assess the reliability of both human and LLM
    annotations
NIST Sponsor
Razvan Amironesei
Group
Visualization and Usability Group
Schedule of Appointment
Full time
Start Date
Work Location
Remote
Salary / Hourly Rate (Max)
$66,000.00
Total Hours per week
40
End Date