Content Integrity for Publishers
In the digital age, ensuring the originality of your content is more crucial than ever. With the vast amount of information available online, it is easy for content to be copied or plagiarized without proper attribution. This can lead to significant issues for publishers, including legal challenges, loss of credibility, and damage to brand reputation.
The Power of Internet-Wide Scanning
The Copyleaks Plagiarism Checker API provides a powerful solution for detecting internet plagiarism, allowing you to compare your content against billions of online sources, including websites, articles, and academic journals.
When you enable internet scanning, you are tapping into a vast and ever-growing database of online content. This allows you to:
- Verify Originality: Ensure that your content is original before publishing.
- Protect Your IP: Discover if your content has been plagiarized and published elsewhere without your permission.
- Maintain SEO Rankings: Avoid penalties from search engines for duplicate content.
Text Moderation for Safe Content
The Copyleaks Text Moderation API is designed to detect harmful content, including hate speech, adult content, and other forms of inappropriate material. This is particularly useful for publishers who want to ensure that their content adheres to community guidelines and standards.
📚 Before You Begin
Make sure you are familiar with Copyleaks scans by completing the Check for Plagiarism guide.
Verify Content Originality Against Online Sources
Enabling Internet Scanning
To scan your document against internet sources, set the properties.scanning.internet parameter to true. This enables scanning against all non-paywalled online sources, including a variety of academic journals.
For more information, see our documentation for URL scans, OCR scans, and File scans.
{
  "properties": {
    "scanning": {
      "internet": true
    }
  }
}
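As a minimal sketch, the fragment above can be merged into a larger submission body in Python. The helper name `with_internet_scanning` is hypothetical and not part of any Copyleaks SDK; only the `properties.scanning.internet` key comes from this guide.

```python
import copy
import json


def with_internet_scanning(submission, enabled=True):
    """Return a copy of `submission` with properties.scanning.internet set.

    Hypothetical helper for illustration; not part of any Copyleaks SDK.
    """
    result = copy.deepcopy(submission)  # leave the caller's dict untouched
    # Create the nested "properties" and "scanning" objects if they are missing.
    scanning = result.setdefault("properties", {}).setdefault("scanning", {})
    scanning["internet"] = enabled
    return result


if __name__ == "__main__":
    # Prints the same JSON fragment shown in this section.
    print(json.dumps(with_internet_scanning({}), indent=2))
```

Working on a copy keeps any other properties you have already set intact while adding the internet flag.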
Receiving Results
Once your scan is completed, you’ll receive the results through the completed webhook event. This webhook is triggered when the scan process finishes successfully and contains the output information from the scan.
The internet plagiarism results are located in the results.internet array within the webhook payload. Each internet match includes:
- id: Unique identifier for the match
- title: Title of the matched content
- url: Source URL where the match was found
- matchedWords: Number of words that matched
- metadata: Additional information about the source (author, organization, publish date, etc.)
Example payload structure
{
  "status": 0,
  "scannedDocument": {
    "scanId": "your-scan-id",
    "totalWords": 1250,
    "credits": 1
  },
  "results": {
    "internet": [
      {
        "id": "match-id",
        "title": "Source Title",
        "url": "https://example.com/source",
        "matchedWords": 45,
        "metadata": {
          "author": "Author Name",
          "organization": "Publisher",
          "publishDate": "2023-01-01"
        }
      }
    ]
  }
}
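One way a webhook handler might read this payload is sketched below. It assumes the payload has already been parsed from JSON into a dictionary with the fields shown in the example. The function name `summarize_internet_matches` and the derived `share` value (matched words divided by totalWords) are illustrative assumptions, not Copyleaks-defined fields.

```python
def summarize_internet_matches(payload):
    """List internet matches, largest overlap first.

    `share` is a derived value for illustration (matchedWords / totalWords),
    not a field returned by Copyleaks.
    """
    total_words = payload.get("scannedDocument", {}).get("totalWords", 0)
    matches = payload.get("results", {}).get("internet", [])

    summary = []
    for match in matches:
        matched = match.get("matchedWords", 0)
        summary.append({
            "url": match.get("url"),
            "title": match.get("title"),
            "matchedWords": matched,
            # Guard against a missing or zero word count.
            "share": matched / total_words if total_words else 0.0,
        })

    summary.sort(key=lambda item: item["matchedWords"], reverse=True)
    return summary


if __name__ == "__main__":
    example = {
        "scannedDocument": {"scanId": "your-scan-id", "totalWords": 1250},
        "results": {"internet": [
            {"id": "match-id", "title": "Source Title",
             "url": "https://example.com/source", "matchedWords": 45},
        ]},
    }
    for item in summarize_internet_matches(example):
        print(item["url"], item["matchedWords"], f"{item['share']:.1%}")
```

Sorting by matched words puts the sources most worth an editor's attention at the top of the list.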
Moderating Content for Safety
To ensure that your published content is safe and adheres to your community standards, you can use the Copyleaks Text Moderation API. This API allows you to scan text for harmful content across more than 10 categories, including hate speech, adult content, and other inappropriate material.
Submitting Content for Moderation
To moderate a piece of content, send a POST request to the /v1/text-moderation/{scanId}/check endpoint. In the request body, provide the text to be analyzed and specify which content moderation labels you want to check for.
For example, a publisher might want to check for toxicity, profanity, and hate speech:
{
  "text": "Your text content to be moderated goes here.",
  "labels": [
    { "id": "toxic-v1" },
    { "id": "profanity-v1" },
    { "id": "hate-speech-v1" }
  ]
}
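The request above could be assembled programmatically along these lines. This is a sketch only: `build_moderation_request` is a hypothetical helper, the path template and body shape are taken from this guide, and the API host and authentication headers are deliberately left out because this page does not specify them.

```python
import json


def build_moderation_request(scan_id, text, label_ids):
    """Return (path, body) for a text moderation check.

    Hypothetical helper: the path template and body shape follow this guide;
    the host name and auth headers are not covered here and are omitted.
    """
    path = f"/v1/text-moderation/{scan_id}/check"
    body = {
        "text": text,
        # Each requested label is sent as an object with an "id" key.
        "labels": [{"id": label_id} for label_id in label_ids],
    }
    return path, body


if __name__ == "__main__":
    path, body = build_moderation_request(
        "your-scan-id",
        "Your text content to be moderated goes here.",
        ["toxic-v1", "profanity-v1", "hate-speech-v1"],
    )
    print("POST", path)
    print(json.dumps(body, indent=2))
```

Keeping the label IDs in a plain list makes it easy to maintain one moderation policy per publication or section.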
Understanding the Results
The API will respond with a detailed analysis, pinpointing the exact segments of text that were flagged and for which categories. This allows you to build a workflow to automatically handle or review content that violates your policies.
For a complete list of supported categories, see the Content Moderation Labels documentation. To get started with your integration, follow the Moderate Text Content guide.
💬 Support
Should you require any assistance or have inquiries, please contact Copyleaks Support or ask a question on Stack Overflow with the copyleaks-api tag. We appreciate your interest in Copyleaks and look forward to supporting your efforts to maintain originality and integrity.
🚀 Next Steps
Schedule a Live Demo
Want to see how internet plagiarism detection works with your specific content? Our technical team can walk you through live examples of scanning against billions of online sources, including academic journals and websites.