About the Company
We are a seed-stage AI company building the industry standard for evaluating and benchmarking large language models on real enterprise tasks.
About the Role
As a Research Scientist, you will develop new benchmarks, methodologies, and evaluation pipelines that shape how cutting-edge models are assessed, compared, and deployed in production environments. Your work will directly influence model selection and safety decisions across foundation model labs, high-growth AI product companies, and Fortune 500 enterprises.
Responsibilities
Benchmarking & Model Analysis
Design New Benchmarks from Scratch
Advance Automated Evaluation Methodologies
Cross-functional Collaboration
Qualifications
Required Skills
Preferred Skills
Pay range and compensation package
Equal Opportunity Statement
Visa sponsorship available. Relocation support. Health & dental coverage. Lunch + dinner provided, snacks & coffee. Unlimited PTO. Weekly happy hours with community guests. Team events (bowling, hiking, rock climbing, etc.). Swag program (hats, etc.).
Work Environment & Culture
In-person at San Francisco HQ (required). Core hours: 9–5; some teammates extend voluntarily. Most team members work one weekend day per week (flexible). High-ownership, low-ego, collaborative. Live demos Mondays, team lunch Thursdays, community Fridays. Early-stage pace with an applied focus, not academic publishing.
Tech Environment
The role is research-focused, but exposure to the stack is beneficial. Backend: Python / Django. Frontend: React + TypeScript. Infra: AWS. Evaluation frameworks + internal tooling.
Why This Role Is Unique
The company already collaborates with foundation model labs, high-growth AI vertical product companies, and Fortune 500 enterprises (not publicly facing). Vals AI: $5M seed raised, runway of 2+ years at current burn. Only one research scientist is being hired, so the role carries true founding impact. Opportunity to define industry standards for model trust, reliability, and certification. Positioned to become the rating agency for generative AI.