Prevent Self-Plagiarism and Author Conflicts

This guide helps you avoid situations where documents from the same author are flagged as plagiarism against each other. This matters most when authors submit multiple assignments or revisions and you want to suppress matches within one author’s work while still detecting matches across different authors.

When working with document databases, you may encounter scenarios where:

  • An author’s current document matches against their previous work from earlier submissions.
  • Multiple versions or drafts of the same document are flagged against each other.
  • Legitimate self-referencing or building upon previous work is incorrectly identified as plagiarism.

This guide provides strategies to prevent these false positives while maintaining effective plagiarism detection.

🛡️ Prevention Strategy: Smart Scan ID Structure

The most effective approach is to give each submission a structured scanId. A well-structured ID makes it easy to include or exclude specific groups of documents from a scan.

Basic Structure:

  • <AUTHOR_ID>-<DOCUMENT_ID> (e.g., author123-essay1, emp456-report2)

Extended Structure:

  • <ORGANIZATION_ID>-<AUTHOR_ID>-<DOCUMENT_ID> (e.g., acmeuni-author123-essay1, techcorp-emp456-proposal)

This structure enables you to:

  • Exclude by author: Use author123-* or emp456-*.
  • Include by organization: Use acmeuni-* or techcorp-*.
  • Focus on document types: Use *-final or *-report.
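
The ID structures above can be generated programmatically so that every submission follows the same convention. The following is a minimal Python sketch, not part of the Copyleaks SDK: the helper name build_scan_id and the restriction of each part to lowercase letters and digits are assumptions made for illustration, and the 36-character limit is the one mentioned in this guide.

```python
import re

MAX_SCAN_ID_LENGTH = 36  # limit mentioned in this guide

# Assumption: each part uses only lowercase letters and digits,
# so "-" remains an unambiguous separator between parts.
_PART = re.compile(r"[a-z0-9]+")


def build_scan_id(author_id, document_id, organization_id=None):
    """Return <AUTHOR_ID>-<DOCUMENT_ID>, optionally prefixed by <ORGANIZATION_ID>."""
    parts = [p for p in (organization_id, author_id, document_id) if p is not None]
    for part in parts:
        if not _PART.fullmatch(part):
            raise ValueError(f"invalid ID part: {part!r}")
    scan_id = "-".join(parts)
    if len(scan_id) > MAX_SCAN_ID_LENGTH:
        raise ValueError(f"scan ID exceeds {MAX_SCAN_ID_LENGTH} characters: {scan_id!r}")
    return scan_id


print(build_scan_id("author123", "essay1"))             # basic structure
print(build_scan_id("author123", "essay1", "acmeuni"))  # extended structure
```

Rejecting hyphens inside the individual parts is a design choice of this sketch: it keeps prefix patterns such as author123-* from accidentally matching a different author.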

Use the properties.scanning.exclude.idPattern parameter to exclude submissions whose scan IDs match a given pattern from your scan results. The * character acts as a wildcard.

{
  "properties": {
    "scanning": {
      "exclude": {
        "idPattern": "author123-*"
      }
    }
  }
}

This example excludes all submissions with IDs starting with author123-.

Use the properties.scanning.include.idPattern parameter to restrict your scan results to submissions whose scan IDs match a given pattern. This is useful for limiting comparisons to a specific group, such as an organization or a class.

{
  "properties": {
    "scanning": {
      "include": {
        "idPattern": "acmeuni-*"
      }
    }
  }
}

This example will only compare the submitted document against other documents with IDs starting with acmeuni-.
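
If you assemble request bodies in code, the two fragments above can be produced by one small helper. This is a sketch only: the function name scanning_properties is hypothetical, and it builds nothing beyond the properties.scanning.exclude.idPattern and properties.scanning.include.idPattern fields shown in this guide. Whether both patterns can be sent in the same request is not stated here, so treat that combination as untested.

```python
import json


def scanning_properties(exclude_pattern=None, include_pattern=None):
    """Build the "properties" fragment with optional exclude/include ID patterns."""
    scanning = {}
    if exclude_pattern is not None:
        scanning["exclude"] = {"idPattern": exclude_pattern}
    if include_pattern is not None:
        scanning["include"] = {"idPattern": include_pattern}
    return {"properties": {"scanning": scanning}}


# Reproduces the exclude fragment shown above.
print(json.dumps(scanning_properties(exclude_pattern="author123-*"), indent=2))
```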

Example 1: Exclude Same Author’s Previous Work

{
  "properties": {
    "scanning": {
      "copyleaksDb": { "includeMySubmissions": true, "includeOthersSubmissions": true },
      "exclude": { "idPattern": "author123-*" }
    }
  }
}
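
In practice the exclude pattern is usually derived from the scan ID of the document being submitted rather than typed by hand. The sketch below shows one way to do that in Python. The helper names are hypothetical, and it assumes organization and author IDs never contain a hyphen, so the author prefix is the first one or two hyphen-separated parts of the scan ID.

```python
def author_exclude_pattern(scan_id, has_organization=False):
    """Return a wildcard pattern covering every scan ID by the same author."""
    # Assumption: organization and author IDs contain no "-" characters.
    prefix_parts = 2 if has_organization else 1
    parts = scan_id.split("-")
    if len(parts) <= prefix_parts:
        raise ValueError(f"scan ID has no document part: {scan_id!r}")
    return "-".join(parts[:prefix_parts]) + "-*"


def self_exclusion_properties(scan_id, has_organization=False):
    """Build the Example 1 body for the author of the given scan ID."""
    return {
        "properties": {
            "scanning": {
                "copyleaksDb": {
                    "includeMySubmissions": True,
                    "includeOthersSubmissions": True,
                },
                "exclude": {
                    "idPattern": author_exclude_pattern(scan_id, has_organization)
                },
            }
        }
    }


print(author_exclude_pattern("author123-essay1"))
print(author_exclude_pattern("acmeuni-author123-essay1", has_organization=True))
```

With the extended structure, the pattern keeps the organization prefix (for example acmeuni-author123-*), so only that author within that organization is excluded.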

Example 2: Compare Only Within Same Organization

{
  "properties": {
    "scanning": {
      "repositories": [{ "id": "assignment_repository", "includeMySubmissions": true, "includeOthersSubmissions": true }],
      "include": { "idPattern": "acmeuni-*" }
    }
  }
}

Best Practices:

  • 📋 Plan your ID structure: Design scan ID patterns from the beginning.
  • 🎯 Be specific: Use precise patterns to avoid excluding too much or too little.
  • 📊 Test patterns: Verify your patterns work correctly with sample data.
  • 🔄 Document conventions: Maintain clear documentation of your ID structure for your team.
  • 📏 Keep it short: Scan IDs are limited to 36 characters, so keep each part of the ID compact.
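
One way to test patterns against sample data before sending them is to approximate the wildcard locally. The sketch below uses Python's standard fnmatch module as a stand-in. This is an assumption, not the service's matcher: fnmatchcase also gives special meaning to ? and [ ], and the service's case sensitivity is not stated in this guide, so treat the result as a sanity check only. The sample IDs are made up for illustration.

```python
from fnmatch import fnmatchcase


def matching_ids(pattern, scan_ids):
    """Return the scan IDs that a *-wildcard pattern would select (local approximation)."""
    return [scan_id for scan_id in scan_ids if fnmatchcase(scan_id, pattern)]


samples = [
    "author123-essay1",
    "author123-essay2",
    "author1234-essay1",
    "emp456-report2",
]

# The trailing "-" before "*" keeps author1234 out of author123's pattern.
print(matching_ids("author123-*", samples))

# A pattern without the separator is broader than intended.
print(matching_ids("author123*", samples))
```

The second call also selects author1234-essay1, which is the kind of over-broad match the "Be specific" advice is meant to prevent.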

If you need help implementing author conflict prevention, contact Copyleaks Support or ask a question on Stack Overflow with the copyleaks-api tag.