
Prevent Self-Plagiarism and Author Conflicts

This guide helps you avoid situations where documents from the same author are flagged as plagiarism against each other. This matters most when authors submit multiple assignments or revisions and you want to suppress matches within one author’s work while still detecting matches across different authors.

🔄 Understanding the Problem

When working with document databases, you may encounter scenarios where:

An author’s current document matches against their previous work from earlier submissions.
Multiple versions or drafts of the same document are flagged against each other.
Legitimate self-referencing or building upon previous work is incorrectly identified as plagiarism.

This guide provides strategies to prevent these false positives while maintaining effective plagiarism detection.

🛡️ Prevention Strategy: Smart Scan ID Structure

The most effective approach is to give each submission a structured scanId. A well-structured ID makes it easy to include or exclude specific groups of documents from a scan.

Example ID Structures

Basic Structure:

<AUTHOR_ID>-<DOCUMENT_ID> (e.g., author123-essay1, emp456-report2)

Extended Structure:

<ORGANIZATION_ID>-<AUTHOR_ID>-<DOCUMENT_ID> (e.g., acmeuni-author123-essay1, techcorp-emp456-proposal)

This structure enables you to:

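Exclude by author: Use author123-* or emp456-* with the Basic structure. With the Extended structure, the pattern must start with the organization prefix (e.g., acmeuni-author123-*).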
Include by organization: Use acmeuni-* or techcorp-*.
Focus on document types: Use *-final or *-report, provided your document IDs end with a type suffix such as -final or -report.
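
The ID structures above can be sketched as a small helper that assembles a scan ID and rejects values that would not fit the scheme. This is a minimal illustration, not part of any Copyleaks SDK: the function name build_scan_id and its validation rules are assumptions, and the 36-character limit is taken from the best practices later in this guide.

```python
from typing import Optional

# Limit taken from the "Keep it short" best practice in this guide.
MAX_SCAN_ID_LENGTH = 36


def build_scan_id(author_id: str, document_id: str,
                  organization_id: Optional[str] = None) -> str:
    """Hypothetical helper: join ID parts as [<ORG>-]<AUTHOR>-<DOCUMENT>."""
    parts = [author_id, document_id]
    if organization_id is not None:
        parts.insert(0, organization_id)
    for part in parts:
        # Reject empty parts and characters that would make patterns ambiguous.
        if not part or "-" in part or "*" in part:
            raise ValueError(f"invalid ID part: {part!r}")
    scan_id = "-".join(parts)
    if len(scan_id) > MAX_SCAN_ID_LENGTH:
        raise ValueError(f"{scan_id!r} exceeds {MAX_SCAN_ID_LENGTH} characters")
    return scan_id


print(build_scan_id("author123", "essay1"))             # author123-essay1
print(build_scan_id("author123", "essay1", "acmeuni"))  # acmeuni-author123-essay1
```

Disallowing hyphens inside individual parts keeps the hyphen as an unambiguous separator, so a prefix pattern for one author cannot accidentally match another author's IDs.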

🚫 Using Exclude Patterns

Use the properties.scanning.exclude.idPattern parameter to exclude specific patterns from your scan results. The * character acts as a wildcard.

Exclude by ID Pattern

{
  "properties": {
    "scanning": {
      "exclude": {
        "idPattern": "author123-*"
      }
    }
  }
}

This example excludes all submissions with IDs starting with author123-.
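
Before relying on a pattern, it can help to check locally which of your existing scan IDs it would match. The sketch below approximates the * wildcard with Python's standard fnmatch module. It is only a local approximation: the actual matching happens on the service side, fnmatch also treats ? and [ ] as special characters, and the helper name matching_ids and the sample IDs are made up for illustration.

```python
from fnmatch import fnmatchcase


def matching_ids(pattern, scan_ids):
    """Return the scan IDs that a '*' wildcard pattern would match (local approximation)."""
    return [scan_id for scan_id in scan_ids if fnmatchcase(scan_id, pattern)]


# Hypothetical sample data following the <AUTHOR_ID>-<DOCUMENT_ID> structure.
sample_ids = [
    "author123-essay1",
    "author123-essay2",
    "author1234-essay1",
    "author456-essay1",
]

print(matching_ids("author123-*", sample_ids))
# ['author123-essay1', 'author123-essay2']
```

Because the pattern includes the trailing hyphen, author1234-essay1 is not matched. Dropping the hyphen (author123*) would also exclude that other author's work.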

Exclude by Domain

{
  "properties": {
    "scanning": {
      "exclude": {
        "backlinksDomains": ["wikipedia.org", "example.edu"]
      }
    }
  }
}

This example excludes any internet results that contain backlinks to Wikipedia or example.edu.

Exclude by Text Phrases

{
  "properties": {
    "scanning": {
      "exclude": {
        "text": ["common reference phrase", "standard disclaimer"]
      }
    }
  }
}

This example excludes any results that contain the specified text phrases, which is useful for filtering out common boilerplate text or standard disclaimers.

✅ Using Include Patterns

Use the properties.scanning.include.idPattern parameter to restrict your scan results to IDs that match a specific pattern. This is useful for limiting comparisons to a particular group, such as an organization or a class.

{
  "properties": {
    "scanning": {
      "include": {
        "idPattern": "acmeuni-*"
      }
    }
  }
}

This example will only compare the submitted document against other documents with IDs starting with acmeuni-.

📝 Implementation Examples

Example 1: Exclude Same Author’s Previous Work

{
  "properties": {
    "scanning": {
      "copyleaksDb": { "includeMySubmissions": true, "includeOthersSubmissions": true },
      "exclude": { "idPattern": "author123-*" }
    }
  }
}
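
The payload in Example 1 can also be generated per submission, so that each new document automatically excludes its own author's earlier work. The following is a minimal sketch under stated assumptions: the helper name self_exclusion_properties is hypothetical, and it assumes the document ID is the last hyphen-separated part of the scan ID and contains no hyphen itself. It only builds the properties object and does not send any request.

```python
import json


def self_exclusion_properties(scan_id: str) -> dict:
    """Hypothetical helper: build properties that exclude the submitting author's own IDs."""
    # Everything before the last hyphen identifies the author
    # (including the organization prefix in the Extended structure).
    author_prefix, _, document_id = scan_id.rpartition("-")
    if not author_prefix or not document_id:
        raise ValueError(f"scan ID {scan_id!r} does not follow <AUTHOR_ID>-<DOCUMENT_ID>")
    return {
        "properties": {
            "scanning": {
                "copyleaksDb": {
                    "includeMySubmissions": True,
                    "includeOthersSubmissions": True,
                },
                "exclude": {"idPattern": f"{author_prefix}-*"},
            }
        }
    }


payload = self_exclusion_properties("author123-essay3")
print(json.dumps(payload, indent=2))
```

For an Extended-structure ID such as acmeuni-author123-essay1, the same helper produces the pattern acmeuni-author123-*, which keeps the exclusion scoped to that one author.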

Example 2: Compare Only Within Same Organization

{
  "properties": {
    "scanning": {
      "repositories": [{ "id": "assignment_repository", "includeMySubmissions": true, "includeOthersSubmissions": true }],
      "include": { "idPattern": "acmeuni-*" }
    }
  }
}

💡 Best Practices

📋 Plan your ID structure: Design scan ID patterns from the beginning.
🎯 Be specific: Use precise patterns to avoid excluding too much or too little.
📊 Test patterns: Verify your patterns work correctly with sample data.
🔄 Document conventions: Maintain clear documentation of your ID structure for your team.
📏 Keep it short: Remember the 36-character limit on scan IDs.

🚀 Next Steps

Submit File Documentation: Learn how to submit files with your custom scan IDs.

Compare Multiple Documents: Learn about cross-document comparison strategies.

Support

If you need help or have questions about implementing author conflict prevention, please contact Copyleaks Support or ask a question on Stack Overflow with the copyleaks-api tag.