How We Handled the Surge in Spam Posts on Zenn Using an LLM

Published on 2024/10/31

Introduction

I'm Yoshikawa (dyoshikawa) from the Zenn team.

Around June 2024, spam posts on Zenn increased rapidly, so we built a system that uses an LLM (generative AI) to detect spam automatically.

Given the purpose of the system, we cannot disclose many details, but I would like to give an overview of this initiative to share technical insights and to make our operations team's work as open as possible to the community.

The Challenge

Around June 2024, spam posts increased rapidly on Zenn. We, the Zenn operations team, became aware of the situation through a rise in violation reports from users.

We did not want readers to encounter spam posts as a matter of course, and we felt that having users report each violation placed a significant burden on them, so we decided to take countermeasures.

The Solution

In response, we decided that we needed a system that detects spam posts with some degree of automation, and we chose to use an LLM for it.

Although I have no background in machine learning or natural language processing, with an LLM we can build a detection system of reasonable quality simply by writing prompts in natural language. This is one of the revolutionary aspects of LLMs 👍
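To make the idea of "detection by prompt" concrete, here is a minimal sketch of how such a prompt might be assembled and how the model's reply might be parsed. The actual prompt used on Zenn is not public, so the instruction text, the JSON reply format, and the names `PROMPT_TEMPLATE`, `Verdict`, `build_prompt`, and `parse_verdict` are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass

# Hypothetical instruction text; the real prompt is not disclosed in this article.
PROMPT_TEMPLATE = """You are a content moderator for a technical blogging platform.
Decide whether the article below is spam (for example, advertising unrelated to technology).
Answer with JSON only: {{"is_spam": true or false, "reason": "<one sentence>"}}

<article>
{body}
</article>"""


@dataclass
class Verdict:
    is_spam: bool
    reason: str


def build_prompt(body: str, max_chars: int = 4000) -> str:
    """Embed the (truncated) article body in the instruction template."""
    return PROMPT_TEMPLATE.format(body=body[:max_chars])


def parse_verdict(raw: str) -> Verdict:
    """Parse the model's reply; anything malformed is treated as 'not spam'."""
    try:
        data = json.loads(raw)
        if not isinstance(data, dict) or not isinstance(data.get("is_spam"), bool):
            raise ValueError("unexpected reply shape")
        return Verdict(data["is_spam"], str(data.get("reason", "")))
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return Verdict(False, "unparseable model output")
```

Falling back to "not spam" on a malformed reply is a deliberate choice in this sketch: a missed spam post can still be reported by a user, whereas a wrongly flagged article costs more.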

However, since this initiative is not part of Zenn's core value development, we wanted to minimize the effort required. We also wanted to bring the number of false positives, where legitimate articles are judged to be spam and penalized, as close to zero as possible. So we built on the existing violation reporting function.

[mermaid diagram]

The LLM patrols published content and files a violation report when it judges content to be spam. Operations team members then check the violation report and the actual content from the admin screen to confirm whether it is truly spam (according to the terms of service). Keeping a human in the loop reduces the risk of misjudgments by the LLM.

In this way, we were able to incorporate spam post handling without significantly changing existing routine operations.
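The flow described above can be sketched roughly as follows. This is an illustrative outline, not Zenn's actual implementation: the `Report` and `ReportQueue` types, the `patrol` and `review` functions, and the `"llm"` reporter label are hypothetical names, and the classifier is passed in as a plain function so the LLM call can be swapped for a stand-in.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class Report:
    content_id: str
    reason: str
    reporter: str = "llm"             # distinguishes AI-filed reports from user reports
    confirmed: Optional[bool] = None  # None until a human has reviewed the report


@dataclass
class ReportQueue:
    """Stand-in for the existing violation-report store (hypothetical)."""
    reports: List[Report] = field(default_factory=list)

    def file(self, report: Report) -> None:
        self.reports.append(report)

    def pending(self) -> List[Report]:
        return [r for r in self.reports if r.confirmed is None]


def patrol(
    contents: Iterable[Tuple[str, str]],
    classify: Callable[[str], Tuple[bool, str]],
    queue: ReportQueue,
) -> int:
    """File a report for each (content_id, body) the classifier flags.

    Nothing is penalized here; the function only adds to the review queue.
    Returns the number of reports filed.
    """
    filed = 0
    for content_id, body in contents:
        is_spam, reason = classify(body)
        if is_spam:
            queue.file(Report(content_id, reason))
            filed += 1
    return filed


def review(report: Report, is_really_spam: bool) -> None:
    """Record the human decision made from the admin screen."""
    report.confirmed = is_really_spam


# Usage with a keyword-based stand-in for the LLM call.
def fake_classify(body: str) -> Tuple[bool, str]:
    return ("casino" in body.lower(), "mentions gambling promotion")


queue = ReportQueue()
patrol([("a1", "Intro to Rust lifetimes"), ("a2", "Best online CASINO bonus")], fake_classify, queue)
for report in queue.pending():
    review(report, is_really_spam=True)
```

The point of this shape is that the LLM never acts on content directly. It only produces reports in the same queue that user reports go to, so the final decision stays with a human reviewer.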

[Image caption: Violation reports filed by LLM (AI)]

LLM Selection

Since Zenn is primarily built on Google Cloud, we decided to use the Vertex AI platform. We believe that consolidating resources into a single cloud service makes IaC and IAM permission management easier.

Although Gemini is Google's own LLM, we selected Anthropic's Claude this time.

There was no strong reason behind this selection. Our company has experience using Amazon Bedrock + Claude for generative AI projects outside the Zenn team, so we thought that choosing Claude here might let us share operational and prompt knowledge within the company. We may therefore change models in the future.

Discussion