Contractor – AI Agent & LLM Evaluation


Research Intern / Contractor – AI Agent & LLM Evaluation

Company: DeepReach

Remote | Flexible Hours

About Us

DeepReach is building the foundational data and evaluation infrastructure for the next generation of AI. We work with top LLM companies and research labs worldwide, providing expert-driven data annotation, human feedback loops, and cutting-edge model evaluation systems.

We are looking for a Research Intern or Contractor to join our team and help us push the boundaries of AI agent and large language model (LLM) evaluation.

Responsibilities

Conduct research on evaluation methodologies for LLMs and AI agents.
Design, implement, and refine benchmarks to measure real-world performance of advanced models.
Collaborate with industry-leading researchers to develop state-of-the-art evaluation frameworks.
Participate in daily research syncs.
Contribute to academic publications and research papers.

Requirements

Prior experience building LLM evaluation benchmarks.
(Optional but highly preferred) First-author publication(s) at top-tier AI/ML conferences (e.g., NeurIPS, ICML, ICLR, CVPR, ICCV).
Bachelor’s degree completed; currently pursuing or holding a Master’s or PhD in an AI/ML-related field.
Strong communication and writing skills.
Ability to work independently and manage time flexibly across global teams.

What You’ll Gain

Direct collaboration with industry-leading researchers working on the frontier of AI evaluation.
Exposure to real-world challenges faced by top LLM companies.
Opportunity to transition into a full-time founding researcher role at a fast-growing startup.
Strong support for academic publishing (papers, workshops, benchmarks).