Research Intern / Contractor – AI Agent & LLM Evaluation
Company: DeepReach
Remote | Flexible Hours
About Us
DeepReach is building the foundational data and evaluation infrastructure for the next generation of AI. We work with top LLM companies and research labs worldwide, providing expert-driven data annotation, human feedback loops, and cutting-edge model evaluation systems.
We are looking for a Research Intern or Contractor to join our team and help us push the boundaries of agent and large language model (LLM) evaluation.
Responsibilities
- Conduct research on evaluation methodologies for LLMs and AI agents.
- Design, implement, and refine benchmarks to measure real-world performance of advanced models.
- Collaborate with industry-leading researchers to develop state-of-the-art evaluation frameworks.
- Participate in daily research syncs.
- Contribute to academic publications and research papers.
Requirements
- Prior experience building LLM evaluation benchmarks.
- (Optional but highly preferred) First-author publication(s) at top-tier AI/ML conferences (e.g., NeurIPS, ICML, ICLR, CVPR, ICCV).
- Bachelor’s degree completed; currently pursuing or holding a Master’s or PhD in an AI/ML-related field.
- Strong communication and writing skills.
- Ability to work independently and manage time flexibly across global teams.
What You’ll Gain
- Direct collaboration with industry-leading researchers working on the frontier of AI evaluation.
- Exposure to real-world challenges faced by top LLM companies.
- Opportunity to transition into a full-time founding researcher role at a fast-growing startup.
- Strong support for academic publishing (papers, workshops, benchmarks).