Are you, as an educator, troubled by how easily students can now bypass their learning by using AI tools to write essays or solve homework? You are not alone. With the rise of Large Language Models (LLMs), particularly ChatGPT, a multitude of AI-powered applications can now produce coherent content from minimal input.
There have been numerous reported cases of students using AI models to complete their assignments, sparking widespread debate about the ethics of the practice. While it is not traditionally viewed as plagiarism, it certainly borders on academic dishonesty. OpenAI itself has addressed these concerns in a document that explores these emerging challenges in more depth.
In this guide, we will walk through building an AI text detector in Python that examines text and estimates the likelihood that it was machine-generated. Such a tool can help you judge whether the content you’re reviewing was written by a person or produced by a model.
Python Script for AI Text Detection
The objective here is to develop a streamlined Python script capable of:
- Accepting textual input.
- Returning a verdict, including a percentage indicating how likely the text was crafted by AI.
Below is a preview of what the final script might resemble:
```python
# app.py
content = """
Your essay content goes here
"""

### Configuration
config = {}

detector = AITextDetector(config)
response = detector.detect(content)
print(response)

### Sample Output
# {
#     "output": "The classifier suspects this text might be AI-generated.",
#     "confidence (%)": 95.13
# }
```
To meet our objective, we will:
- Create an instance of the AITextDetector class, passing a configuration object.
- Customize this configuration with parameters that link to different AI-detection APIs (such as ai-text-classifier and GLTR).
- Call the detect method of the AITextDetector class to analyze the content. This method contains the detection logic and returns the result.
- Use the confidence score in the result to judge how likely it is that the text under scrutiny was AI-generated.
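The article doesn’t show how the configuration object selects a detection service. As a minimal sketch, assuming a hypothetical `"backend"` key and plain Python functions standing in for the real OpenAI and GLTR detectors, the configuration could drive a simple dispatch table:

```python
from typing import Callable, Dict

# A backend takes the text and returns a result dictionary
Backend = Callable[[str], Dict[str, object]]


class ConfiguredDetector:
    """Hypothetical config-driven wrapper that routes text to one detection backend."""

    def __init__(self, config: Dict[str, str], backends: Dict[str, Backend]):
        self.config = config
        self.backends = backends

    def detect(self, text: str) -> Dict[str, object]:
        # The "backend" key and its default are assumptions, not the article's API
        name = self.config.get("backend", "ai-text-classifier")
        if name not in self.backends:
            raise ValueError(f"Unknown backend: {name!r}")
        return self.backends[name](text)


def stub_classifier(text: str) -> Dict[str, object]:
    """Stand-in for a real API call, returning a fixed placeholder result."""
    return {"output": "stub verdict", "confidence (%)": 50.0}


detector = ConfiguredDetector({}, {"ai-text-classifier": stub_classifier})
print(detector.detect("Your essay content goes here"))
```

Registering a GLTR-based function under a `"gltr"` key would then let `{"backend": "gltr"}` switch detectors without changing the calling code.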
Let’s get underway!
Project Setup for the AI Detector
To follow the examples in this guide, you’ll want to download and set up our pre-built environment, AI-Generated-Text-Detector, which includes:
- A stable version of Python (3.10).
- Pre-packaged dependencies for Windows, Mac, and Linux users:
  - Requests – essential for API calls.
  - python-dotenv – to handle environment variables.
For access, create a free ActiveState Platform account using either your email or GitHub credentials. Once registered, you’ll be able to download the complete environment and unlock additional dependency management benefits.
For Windows users:
Run the following at your command prompt after downloading the installer:
```shell
state activate --default Pizza-Team/AI-Generated-Text-Detector
```
For Mac and Linux users:
Run the following script to download and install the environment:
```shell
sh <(curl -q https://platform.activestate.com/dl/cli/_pdli01/install.sh) -c'state activate --default Pizza-Team/AI-Generated-Text-Detector'
```
Once your environment is ready, let’s proceed to create the Python script:
```shell
$ touch app.py
```
Building an AI Text Detection Tool with OpenAI’s Classifier
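To construct the detector, we’ll query OpenAI’s Completions API with a detector model and read the log probability it assigns to its first output token. From that value, we’ll compute the likelihood that the input text is AI-generated. Note that OpenAI withdrew its AI Text Classifier in July 2023, citing its low rate of accuracy, so the model used below may no longer respond and its scores should never be treated as proof.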
Configuring OpenAI API Access
Begin by generating an API key from OpenAI’s API page. Store it in a .env file:
```shell
# .env
OPENAI_API_KEY=<Your_API_Key>
```
Now, load this API key in Python using the following snippet:
```python
# app.py
import os

from dotenv import load_dotenv

load_dotenv()  # reads variables from .env into the process environment
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
```
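`os.environ.get` returns `None` when the variable is missing, which only surfaces later as an `Authorization: Bearer None` header and a rejected request. A small helper (the name `require_api_key` is hypothetical) can fail early with a clearer message:

```python
import os


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the named environment variable, raising if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

Call it right after `load_dotenv()`, for example `OPENAI_API_KEY = require_api_key()`.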
The AITextDetector Class Mechanics
At the heart of our project is the AITextDetector class. It is tasked with sending requests to OpenAI’s Completions API and assessing the likelihood of the text being AI-generated.
In this implementation, the constructor takes the API key (`token`) rather than a configuration object and uses it to set up the request headers:
```python
# app.py
class AITextDetector:
    def __init__(self, token):
        self.header = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        }
```
Now, let’s dissect the detect method. This method prepares the request payload, sends the request, and processes the response:
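```python
    # Method of AITextDetector; requires `import requests` at the top of app.py
    def detect(self, text):
        data = {
            # The prompt ends with the <|disc_score|> marker used by the detector model
            "prompt": text + ".\n<|disc_score|>",
            "max_tokens": 1,  # only the first output token is needed
            "temperature": 1,
            "top_p": 1,
            "n": 1,
            "logprobs": 5,  # return the top 5 token log probabilities
            "stop": "\n",
            "stream": False,
            "model": "model-detect-v2",
        }
        response = requests.post(
            "https://api.openai.com/v1/completions", headers=self.header, json=data
        )
```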
Once the response is received, we extract and interpret the probability of the text being AI-generated:
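```python
        # Continuation of detect(); requires `import math` at the top of app.py
        if response.status_code == 200:
            choices = response.json()["choices"][0]
            # Log probability of the "!" token, falling back to -10 if it is absent.
            # .get() avoids a KeyError and, unlike `or`, keeps a legitimate 0.0.
            key_prob = choices["logprobs"]["top_logprobs"][0].get("!", -10)
            prob = math.exp(key_prob)
            confidence = 100 * (1 - prob)
            assessment = self.get_assessment(confidence)
            return {
                "Verdict": f"The classifier judges this text to be {assessment}.",
                "AI-Generated Probability": confidence,
            }
        # Raise an HTTPError for 4xx/5xx responses instead of silently returning None
        response.raise_for_status()
```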
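The arithmetic in that snippet is easier to check in isolation. The API returns natural-log probabilities, so `math.exp` recovers P("!"), and the code reports 100 × (1 − P("!")) as the AI-generated probability, which implies the detector uses "!" to signal human-written text. The pure function below (hypothetical name `ai_probability`) reproduces that conversion without any network access, using the same -10 fallback:

```python
import math
from typing import Dict


def ai_probability(top_logprobs: Dict[str, float], fallback: float = -10.0) -> float:
    """Convert the first token's top log probabilities into an AI-generated percentage."""
    key_logprob = top_logprobs.get("!", fallback)
    prob_human = math.exp(key_logprob)  # natural-log probability -> probability
    return 100 * (1 - prob_human)


# log P("!") = ln(0.25) gives 100 * (1 - 0.25) = 75
print(round(ai_probability({"!": math.log(0.25)}), 2))  # prints 75.0
```

With the fallback, a missing "!" token yields exp(-10) ≈ 0.0000454, so the text is scored as almost certainly AI-generated.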
The get_assessment method, which is not shown in the excerpt above, turns the calculated probability into a human-readable verdict, such as “likely” or “unlikely” AI-generated.
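Since get_assessment isn’t shown, here is one possible version. The five labels and the percentage cutoffs are assumptions chosen for illustration, so adjust them to your own calibration. It is written as a standalone function but can be attached to the class as a `@staticmethod`:

```python
from bisect import bisect_right

# Assumed cutoffs (percent) and labels; not taken from the original article
THRESHOLDS = [10, 45, 90, 98]
LABELS = [
    "very unlikely AI-generated",
    "unlikely AI-generated",
    "unclear if it is AI-generated",
    "possibly AI-generated",
    "likely AI-generated",
]


def get_assessment(confidence: float) -> str:
    """Map an AI-generated percentage (0-100) to a human-readable label."""
    # bisect_right counts how many cutoffs the confidence meets or exceeds
    return LABELS[bisect_right(THRESHOLDS, confidence)]
```

Using `bisect` keeps the cutoffs in one sorted list, so adding or moving a band means editing data rather than a chain of `if` statements.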
Detecting AI Text with GLTR
For those who prefer to run detection locally, the Giant Language Model Test Room (GLTR) offers an alternative approach. It color-codes each word by how highly a language model such as GPT-2 ranked it among its predictions, letting a reader see whether the text leans heavily on predictable, high-ranked words, a pattern typical of machine-generated text.
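GLTR assigns each token one of four colors based on its rank in the model’s prediction: green for the top 10, yellow for the top 100, red for the top 1,000, and purple for everything else. The sketch below (helper names are hypothetical) maps 0-based ranks to those colors and summarizes a text by the share of tokens in each bucket:

```python
from collections import Counter
from typing import Dict, Iterable


def gltr_bucket(rank: int) -> str:
    """Map a 0-based token rank (0 = the model's top guess) to GLTR's color."""
    if rank < 10:
        return "green"  # top 10
    if rank < 100:
        return "yellow"  # top 100
    if rank < 1000:
        return "red"  # top 1,000
    return "purple"  # everything else


def bucket_fractions(ranks: Iterable[int]) -> Dict[str, float]:
    """Return the fraction of tokens in each color bucket (empty dict for no tokens)."""
    counts = Counter(gltr_bucket(r) for r in ranks)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {color: counts[color] / total for color in ("green", "yellow", "red", "purple")}
```

GLTR itself leaves the verdict to a human reader; any cutoff on, say, the green fraction would be your own assumption.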
To get started with GLTR, install the necessary dependencies (torch, transformers, and numpy) and preload models:
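```python
# preload.py
from transformers import GPT2LMHeadModel, GPT2Tokenizer


def run(model_name_or_path):
    # Download (or load from the local cache) the tokenizer and model weights
    GPT2Tokenizer.from_pretrained(model_name_or_path)
    GPT2LMHeadModel.from_pretrained(model_name_or_path)
    print("Loaded GPT-2 model!")


if __name__ == "__main__":
    run("gpt2")  # the smallest GPT-2 checkpoint on the Hugging Face Hub
```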
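Preloading only fetches the model. GLTR’s core measurement is the rank of each actual token within the model’s prediction at that position. The helper below (hypothetical name `token_ranks`) computes those ranks from plain score lists, so the logic can be checked without downloading GPT-2:

```python
from typing import List, Sequence


def token_ranks(logits: Sequence[Sequence[float]], token_ids: Sequence[int]) -> List[int]:
    """Return the 0-based rank of each actual next token under the model's scores.

    logits[i] holds the model's scores after reading token_ids[: i + 1], so it is
    compared against token_ids[i + 1]; the first token has no prediction to rank.
    """
    ranks = []
    for position, next_id in enumerate(token_ids[1:]):
        scores = logits[position]
        target = scores[next_id]
        # Rank = number of vocabulary entries scored strictly higher than the actual token
        ranks.append(sum(1 for s in scores if s > target))
    return ranks
```

In a real run you could take `input_ids = tokenizer(text, return_tensors="pt").input_ids`, pass `model(input_ids).logits[0].tolist()` as `logits` and `input_ids[0].tolist()` as `token_ids`, then feed the ranks into GLTR’s color buckets. Looping in pure Python over GPT-2’s roughly 50,000-entry vocabulary is slow; a vectorized comparison in torch would be much faster for long texts.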
Conclusion
In an age where AI-generated content is ubiquitous, tools that help detect machine-generated text are becoming important across many sectors, from academia to marketing. The methods detailed here provide useful signals rather than proof, and they represent only a fraction of the approaches available. As the world continues to grapple with the ethical implications of AI-generated content, tools like these can help preserve the integrity of human expression.