Prevent Self-Plagiarism and Author Conflicts
This guide helps you prevent documents by the same author from being flagged as plagiarism against each other. This matters most when authors submit multiple assignments or revisions, or when you want to ignore matches within one author's work while still detecting matches across different authors.
🔄 Understanding the Problem
When working with document databases, you may encounter scenarios where:
- An author’s current document matches against their previous work from earlier submissions.
- Multiple versions or drafts of the same document are flagged against each other.
- Legitimate self-referencing or building upon previous work is incorrectly identified as plagiarism.
This guide provides strategies to prevent these false positives while maintaining effective plagiarism detection.
🛡️ Prevention Strategy: Smart Scan ID Structure
The most effective approach is to give each submission a structured scanId. A well-structured ID makes it easy to include or exclude specific groups of documents from a scan.
Example ID Structures
Basic Structure: <AUTHOR_ID>-<DOCUMENT_ID> (e.g., author123-essay1, emp456-report2)
Extended Structure: <ORGANIZATION_ID>-<AUTHOR_ID>-<DOCUMENT_ID> (e.g., acmeuni-author123-essay1, techcorp-emp456-proposal)
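A small helper can enforce the structure at the point where IDs are created. The sketch below is hypothetical (make_scan_id is not part of any Copyleaks SDK). It assumes hyphen-separated segments made only of lowercase letters and digits, so that "-" unambiguously separates segments, and it applies the 36-character limit mentioned under Best Practices.

```python
import re

MAX_SCAN_ID_LENGTH = 36  # scan ID length limit noted in this guide
# Assumption: segments contain only lowercase letters and digits, so "-"
# can serve unambiguously as the separator between segments.
_SEGMENT = re.compile(r"[a-z0-9]+")


def make_scan_id(*segments: str) -> str:
    """Join segments such as (org, author, document) into a scan ID.

    Hypothetical helper: raises ValueError if no segments are given, if a
    segment contains anything other than lowercase letters and digits
    (after trimming and lowercasing), or if the ID exceeds the limit.
    """
    if not segments:
        raise ValueError("at least one segment is required")
    normalized = [s.strip().lower() for s in segments]
    for seg in normalized:
        if not _SEGMENT.fullmatch(seg):
            raise ValueError(f"invalid segment: {seg!r}")
    scan_id = "-".join(normalized)
    if len(scan_id) > MAX_SCAN_ID_LENGTH:
        raise ValueError(
            f"scan ID too long ({len(scan_id)} > {MAX_SCAN_ID_LENGTH}): {scan_id}"
        )
    return scan_id


if __name__ == "__main__":
    print(make_scan_id("author123", "essay1"))             # author123-essay1
    print(make_scan_id("AcmeUni", "author123", "essay1"))  # acmeuni-author123-essay1
```

Rejecting bad IDs when they are created is simpler than discovering later that a pattern silently misses them.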
This structure enables you to:
- Exclude by author: use author123-* or emp456-* with the basic structure, or acmeuni-author123-* with the extended structure.
- Include by organization: use acmeuni-* or techcorp-*.
- Focus on document types: use *-final or *-report.
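To see which stored IDs a pattern would select before you use it in a scan, you can preview it locally against sample IDs. The sketch below assumes that * matches any run of characters (including none) and that every other character matches literally. It is a local approximation, not the Copyleaks matching implementation, and the sample IDs are made up. It also shows why author123-* does not match IDs in the extended structure: those IDs begin with an organization segment.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Compile an ID pattern where '*' is a wildcard for any characters.

    Assumption: '*' is the only special character; everything else,
    including '-', is matched literally.
    """
    parts = (re.escape(part) for part in pattern.split("*"))
    return re.compile(".*".join(parts))


def matching_ids(pattern: str, scan_ids: list[str]) -> list[str]:
    """Return the IDs that match the whole pattern, in input order."""
    regex = pattern_to_regex(pattern)
    return [sid for sid in scan_ids if regex.fullmatch(sid)]


if __name__ == "__main__":
    sample_ids = [
        "author123-essay1",
        "emp456-report",
        "acmeuni-author123-essay1",
        "techcorp-emp456-final",
    ]
    print(matching_ids("author123-*", sample_ids))          # ['author123-essay1']
    print(matching_ids("acmeuni-author123-*", sample_ids))  # ['acmeuni-author123-essay1']
    print(matching_ids("*-final", sample_ids))              # ['techcorp-emp456-final']
```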
🚫 Using Exclude Patterns
Use the properties.scanning.exclude.idPattern parameter to leave documents whose IDs match a pattern out of your scan results. The * character acts as a wildcard.
```json
{
  "properties": {
    "scanning": {
      "exclude": {
        "idPattern": "author123-*"
      }
    }
  }
}
```
This example excludes all submissions whose IDs start with author123-.
✅ Using Include Patterns
Use the properties.scanning.include.idPattern parameter to compare only against documents whose IDs match a pattern. This is useful for limiting comparisons to a specific group, such as an organization or a class.
```json
{
  "properties": {
    "scanning": {
      "include": {
        "idPattern": "acmeuni-*"
      }
    }
  }
}
```
This example compares the submitted document only against other documents whose IDs start with acmeuni-.
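Because the first segment of a structured ID identifies the author (basic structure) or the organization (extended structure), both filters can be derived from the ID of the document being submitted. The sketch below builds the properties object in the shape used by these examples. scan_filter is a hypothetical helper, and it assumes the hyphen-separated structures described earlier.

```python
import json


def scan_filter(mode: str, scan_id: str) -> dict:
    """Build a properties.scanning filter from a submission's scan ID.

    mode is "exclude" (e.g. skip the author's own work under the basic
    <AUTHOR_ID>-<DOCUMENT_ID> structure) or "include" (e.g. stay within
    the organization under the extended structure). In both cases the
    pattern is the ID's first segment followed by "-*".
    """
    if mode not in ("exclude", "include"):
        raise ValueError(f"mode must be 'exclude' or 'include', got {mode!r}")
    first_segment, sep, _ = scan_id.partition("-")
    if not sep or not first_segment:
        raise ValueError(f"scan ID has no leading segment: {scan_id!r}")
    return {"properties": {"scanning": {mode: {"idPattern": f"{first_segment}-*"}}}}


if __name__ == "__main__":
    # Serialize to JSON for the request body.
    print(json.dumps(scan_filter("exclude", "author123-essay2")))
    print(json.dumps(scan_filter("include", "acmeuni-author123-essay1")))
```

Deriving the pattern from the scan ID keeps the filter consistent with your ID convention, so a change in naming only has to be made in one place.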
📝 Implementation Examples
Example 1: Exclude Same Author’s Previous Work
```json
{
  "properties": {
    "scanning": {
      "copyleaksDb": {
        "includeMySubmissions": true,
        "includeOthersSubmissions": true
      },
      "exclude": {
        "idPattern": "author123-*"
      }
    }
  }
}
```
Example 2: Compare Only Within Same Organization
```json
{
  "properties": {
    "scanning": {
      "repositories": [
        {
          "id": "assignment_repository",
          "includeMySubmissions": true,
          "includeOthersSubmissions": true
        }
      ],
      "include": {
        "idPattern": "acmeuni-*"
      }
    }
  }
}
```
💡 Best Practices
- 📋 Plan your ID structure: Design scan ID patterns from the beginning.
- 🎯 Be specific: Use precise patterns to avoid excluding too much or too little.
- 📊 Test patterns: Verify your patterns work correctly with sample data.
- 🔄 Document conventions: Maintain clear documentation of your ID structure for your team.
- 📏 Keep it short: Scan IDs are limited to 36 characters, so keep each segment brief.
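One way to test patterns is to keep a few sample IDs with the outcome you expect for each pattern and check them before changing your configuration. The sketch below is hypothetical. Like any local check, it assumes * matches any run of characters and everything else matches literally, which you should confirm against real scan results.

```python
import re


def _compile(pattern: str) -> re.Pattern:
    # Assumption: '*' matches any run of characters; all else is literal.
    return re.compile(".*".join(re.escape(p) for p in pattern.split("*")))


def check_pattern(
    pattern: str, should_match: list[str], should_not_match: list[str]
) -> list[str]:
    """Return readable problems; an empty list means the pattern behaves as expected."""
    regex = _compile(pattern)
    problems = []
    for sid in should_match:
        if not regex.fullmatch(sid):
            problems.append(f"{pattern!r} should match {sid!r} but does not")
    for sid in should_not_match:
        if regex.fullmatch(sid):
            problems.append(f"{pattern!r} should not match {sid!r} but does")
    return problems


if __name__ == "__main__":
    # An author pattern that forgets the organization prefix of the extended structure.
    for problem in check_pattern(
        "author123-*",
        should_match=["acmeuni-author123-essay1"],
        should_not_match=["acmeuni-author999-essay1"],
    ):
        print(problem)
```

Running this reports that author123-* misses the extended-structure ID, which is the kind of mistake that otherwise shows up only as unexpected self-matches.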
Support
If you need help implementing author conflict prevention, contact Copyleaks Support or ask a question on Stack Overflow with the copyleaks-api tag.