Content Integrity for Publishers

Publishers need to confirm that content is original before it goes live. With so much material available online, text is easily copied or republished without attribution, which can expose publishers to legal challenges, loss of credibility, and damage to brand reputation.

The Copyleaks Plagiarism Checker API provides a powerful solution for detecting internet plagiarism, allowing you to compare your content against billions of online sources, including websites, articles, and academic journals.

When you enable internet scanning, you are tapping into a vast and ever-growing database of online content. This allows you to:

  • Verify Originality: Ensure that your content is original before publishing.
  • Protect Your IP: Discover if your content has been plagiarized and published elsewhere without your permission.
  • Maintain SEO Rankings: Avoid penalties from search engines for duplicate content.

The Copyleaks Text Moderation API is designed to detect harmful content, including hate speech, adult content, and other forms of inappropriate material. This is particularly useful for publishers who want to ensure that their content adheres to community guidelines and standards.

Make sure you are familiar with Copyleaks scans by completing the Check for Plagiarism guide.

Verify Content Originality Against Online Sources

To scan your document against internet sources, set the properties.scanning.internet parameter to true. This enables scanning against all non-paywalled online sources, including a variety of academic journals.

For more information, see our documentation for URL scans, OCR scans, and File scans.

Enable Internet Scanning
{
  "properties": {
    "scanning": {
      "internet": true
    }
  }
}
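As a minimal sketch, the fragment above could be merged into a request body you already build in Python. The helper name `enable_internet_scanning` and the `sandbox` property in the sample input are illustrative assumptions, not something this guide defines; only the `properties.scanning.internet` path comes from the text above.

```python
import copy
import json


def enable_internet_scanning(payload: dict) -> dict:
    """Return a copy of `payload` with properties.scanning.internet set to True.

    Hypothetical helper: it leaves any other keys in the payload untouched.
    """
    updated = copy.deepcopy(payload)
    # Create the nested objects only if they are missing.
    scanning = updated.setdefault("properties", {}).setdefault("scanning", {})
    scanning["internet"] = True
    return updated


# Assumed starting payload; "sandbox" is only a stand-in for existing settings.
existing = {"properties": {"sandbox": True}}
submit_payload = enable_internet_scanning(existing)
print(json.dumps(submit_payload, indent=2))
```

Working on a deep copy keeps the caller's original dictionary unchanged, which is convenient when the same base settings are reused across several scans.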

Once your scan is completed, you’ll receive the results through the completed webhook event. This webhook is triggered when the scan process finishes successfully and contains the output information from the scan.

The internet plagiarism results will be located in the results.internet array within the webhook payload. Each internet match includes:

  • id - Unique identifier for the match
  • title - Title of the matched content
  • url - Source URL where the match was found
  • matchedWords - Number of words that matched
  • metadata - Additional information about the source (author, organization, publish date, etc.)

{
  "status": 0,
  "scannedDocument": {
    "scanId": "your-scan-id",
    "totalWords": 1250,
    "credits": 1
  },
  "results": {
    "internet": [
      {
        "id": "match-id",
        "title": "Source Title",
        "url": "https://example.com/source",
        "matchedWords": 45,
        "metadata": {
          "author": "Author Name",
          "organization": "Publisher",
          "publishDate": "2023-01-01"
        }
      }
    ]
  }
}
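The payload above could be processed along these lines. This is a sketch that assumes the webhook body has already been received as a JSON string; the function name `summarize_internet_matches` and the derived `share` value (matched words divided by `totalWords`) are illustrative and are not fields returned by the API.

```python
import json

# Trimmed copy of the sample payload shown above.
SAMPLE_PAYLOAD = """
{
  "status": 0,
  "scannedDocument": {"scanId": "your-scan-id", "totalWords": 1250, "credits": 1},
  "results": {
    "internet": [
      {
        "id": "match-id",
        "title": "Source Title",
        "url": "https://example.com/source",
        "matchedWords": 45
      }
    ]
  }
}
"""


def summarize_internet_matches(payload: dict) -> list[dict]:
    """List internet matches, largest first, with their share of the document."""
    total_words = payload.get("scannedDocument", {}).get("totalWords", 0)
    matches = payload.get("results", {}).get("internet", [])
    summary = []
    for match in matches:
        matched = match.get("matchedWords", 0)
        summary.append({
            "title": match.get("title"),
            "url": match.get("url"),
            "matchedWords": matched,
            # Derived value, not part of the payload; guard against zero words.
            "share": matched / total_words if total_words else 0.0,
        })
    summary.sort(key=lambda item: item["matchedWords"], reverse=True)
    return summary


summary = summarize_internet_matches(json.loads(SAMPLE_PAYLOAD))
for item in summary:
    print(f'{item["url"]}: {item["matchedWords"]} words ({item["share"]:.1%})')
```

Sorting by `matchedWords` puts the most heavily overlapping sources first, which is usually what an editor wants to review before publishing.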

To ensure that your published content is safe and adheres to your community standards, you can use the Copyleaks Text Moderation API. This API allows you to scan text for harmful content across more than 10 categories, including hate speech, adult content, and other inappropriate material.

To moderate a piece of content, send a POST request to the /v1/text-moderation/{scanId}/check endpoint. In the request body, provide the text to analyze and the content moderation labels to check for.

For example, a publisher might want to check for toxicity, profanity, and hate speech:

Example Moderation Request
{
  "text": "Your text content to be moderated goes here.",
  "labels": [
    { "id": "toxic-v1" },
    { "id": "profanity-v1" },
    { "id": "hate-speech-v1" }
  ]
}
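As a hedged sketch, the request above could be assembled in Python with the standard library. The base URL `https://api.copyleaks.com` and the bearer-token `Authorization` header are assumptions that this guide does not state, and `build_moderation_request` is a hypothetical helper; only the endpoint path and the body shape come from the text above. The code builds the request without sending it.

```python
import json
import urllib.request

# Assumption: the API host is not given in this guide.
API_BASE = "https://api.copyleaks.com"


def build_moderation_request(scan_id: str, text: str, label_ids: list[str],
                             token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the text moderation endpoint."""
    body = {
        "text": text,
        "labels": [{"id": label_id} for label_id in label_ids],
    }
    url = f"{API_BASE}/v1/text-moderation/{scan_id}/check"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: authentication uses a bearer token.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_moderation_request(
    scan_id="my-scan-id",
    text="Your text content to be moderated goes here.",
    label_ids=["toxic-v1", "profanity-v1", "hate-speech-v1"],
    token="YOUR_ACCESS_TOKEN",
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request)
```

Keeping construction separate from sending makes the request easy to inspect or log before any network call is made.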

The API will respond with a detailed analysis, pinpointing the exact segments of text that were flagged and for which categories. This allows you to build a workflow to automatically handle or review content that violates your policies.

For a complete list of supported categories, see the Content Moderation Labels documentation. To get started with your integration, follow the Moderate Text Content guide.

If you need help or have questions, contact Copyleaks Support or ask a question on StackOverflow with the copyleaks-api tag. We appreciate your interest in Copyleaks and look forward to supporting your efforts to maintain originality and integrity.
