About Arena Intelligence

Arena Intelligence is the open platform for evaluating how AI models perform in the real world. Created by researchers from UC Berkeley’s SkyLab, our mission is to measure and advance the frontier of AI for real-world use.

Millions of people use Arena Intelligence each month to explore how frontier systems perform, and we use our community’s feedback to build transparent, rigorous, and human-centered model evaluations. Leading enterprises and AI labs rely on our evaluations to understand real-world reliability, alignment, and impact. Our leaderboards are the gold standard for AI performance, trusted by leaders across the AI community and shaping the global conversation on model reliability and progress.

We’re a team of researchers, engineers, academics, and builders from places like UC Berkeley, Google, Stanford, DeepMind, and Discord. We seek truth, move fast, and value craftsmanship, curiosity, and impact over hierarchy. We’re building a company where thoughtful, curious people from all backgrounds can do their best work. Everyone on our team is a deep expert in their field, and our office radiates excellence, energy, and focus.

About the Role

Arena Intelligence is seeking a Security Engineer, Anti-Abuse to own platform misuse end-to-end. Arena's evaluations are only as trustworthy as the signal behind them, and that signal is under constant, creative attack. You will build the detection, enforcement, and investigation systems that keep Arena's leaderboards trustworthy, stop automated abuse across our services, and defend against the full spectrum of AI-era harms.

This is a founding builder role. You will set the strategy, write the code, and build the platform that future abuse, integrity, and trust & safety hires grow on top of.
You'll work shoulder-to-shoulder with product, infrastructure, model partners, policy, and leadership, and you'll be accountable for outcomes the whole company can see: is the leaderboard clean, are harmful uses caught, are our services safe to ship?

You’ll

- Own the abuse vision for Arena: what gets detected, what gets enforced, how fast, and with what false-positive budget
- Design and operate detection for bots, sybils, coordinated inauthentic voting, and rating-system manipulation; the integrity of Arena's leaderboards is the product
- Build enforcement primitives (rate limits, challenges, shadowbans, account actions, model-side refusals) that are reversible, auditable, and humane
- Detect and mitigate inference abuse and cost exploitation at the platform layer
- Build jailbreak and multi-provider misuse detection across the models Arena serves, and partner with model-provider trust & safety teams on signal-sharing and escalation
- Scope and implement abuse monitoring for every new product launch (web search, web fetch, live site deployment, and whatever's next) as part of the launch checklist, not after the fact
- Prototype detection, review, and enforcement systems for the highest-severity harms (CSAM/NCII, violent extremism, self-harm) and mature them into production, including the legal reporting pipeline (e.g., NCMEC)
- Build internal investigator tooling so policy, on-call, and future T&S analysts can triage incidents without an engineering bottleneck
- Partner with Security on shared surfaces: account takeover, credential stuffing, API-key abuse, and the identity/behavioral-signal platform
- Partner with policy, legal, and leadership on acceptable-use policy, enforcement escalations, and the public integrity narrative

You’ll have

- 6+ years of production software engineering experience, including building and operating systems under adversarial conditions
- Shipped experience in at least one of: trust & safety, anti-abuse, anti-fraud, anti-spam, integrity, or risk engineering
- Strong SQL and data-analysis skills; this role is 30%+ pattern-finding and investigation, not just shipping code
- An adversarial and investigative mindset: you can articulate a novel attack before designing the defense, and follow the evidence when a novel harm surfaces
- High judgment on false-positive cost, user harm, and the reversibility of enforcement actions
- Proficiency in a modern backend language (TypeScript/Node.js, Python, or Go)
- Excellent communication; you'll build alignment with engineering, product, policy, and leadership routinely

Bonus Experience

- Experience with LLM-specific adversarial inputs: jailbreaks, direct and indirect prompt injection, tool-use abuse
- Experience with agent safety, browser-automation abuse, or LLM-API abuse
- Background in securing voting, rating, reputation, or marketplace platforms against coordinated manipulation
- ML or ML-systems experience: feature engineering, online/offline evaluation, label acquisition, drift handling
- Experience building investigator or analyst tooling used by non-engineers
- Contributions to open-source trust & safety, abuse-detection, or adversarial-ML work
- Background in gaming integrity, ad-fraud, or financial-crime engineering at scale

What we offer

- Competitive compensation and equity aligned to the markets where our team members are based; the base salary range will depend on the candidate’s permanent work location
- Comprehensive health and wellness benefits, including medical, dental, vision, and additional support programs
- The opportunity to work on cutting-edge AI with a small, mission-driven team
- A culture that values transparency, trust, and community impact

Come help build the space where anyone can explore and help shape the future of AI.

Arena Intelligence provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, national origin, age, disability, genetics, sexual orientation, gender identity, or gender expression. We are committed to a diverse and inclusive workforce and welcome people from all backgrounds, experiences, perspectives, and abilities.