Data Hubs
Copyleaks’ Data Hubs provide a powerful way to compare multiple documents against each other, allowing you to detect similarities and prevent plagiarism within a large batch of content.
This is particularly useful for educators who want to check whether students have shared work or submitted identical content across a batch of assignments, and for companies that need to find duplication across large document collections.
How It Works
Copyleaks provides two types of databases for storing and comparing documents:
- Shared Data Hub: Global database that contains millions of documents from institutions worldwide.
- Private Cloud Hub: Private database that is exclusive to your organization, ensuring that your documents remain confidential and secure.
You can contribute documents to those databases and compare your documents against them.
Understanding Your Database Options
You have two database options for storing and comparing your documents:
Shared Data Hub (Free)
- Contains millions of documents from institutions worldwide
- When you index a document, it becomes available for everyone to compare against
- Contributes to the global academic integrity community
- Your documents will be matched against submissions from other institutions
Private Cloud Hub (Paid)
- Creates a completely private database for your organization only
- Your documents stay within your private environment
- Perfect for sensitive or confidential documents
- Only you and your organization can access and compare against these documents
- Built for large organizations looking to securely store and manage documents
- Enables team collaboration with controlled access and user management
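As a rough illustration of how the two options map onto a submission, the sketch below builds the `properties.indexing` object for either hub or both. The field names `repositories` and `copyleaksDb` mirror the examples later in this guide; the helper name `indexing_properties` and the placeholder `my-repo-id` are hypothetical.

```python
from typing import Optional


def indexing_properties(private_repository_id: Optional[str] = None,
                        share_globally: bool = False) -> dict:
    """Build a `properties.indexing` object for the chosen hub(s).

    Hypothetical helper: field names follow this guide's request examples.
    """
    if private_repository_id is None and not share_globally:
        raise ValueError("Choose at least one hub to index into")
    indexing: dict = {}
    if private_repository_id is not None:
        # Private Cloud Hub: index into your organization's repository.
        indexing["repositories"] = [private_repository_id]
    if share_globally:
        # Shared Data Hub: make the document available for everyone to compare against.
        indexing["copyleaksDb"] = True
    return indexing


print(indexing_properties(private_repository_id="my-repo-id"))
print(indexing_properties(share_globally=True))
print(indexing_properties("my-repo-id", share_globally=True))
```

Requiring at least one target keeps a document from being submitted for indexing with nowhere to go.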
How Cross-Comparison Works
The process involves two main steps:
- 📥 Index your documents: Upload documents to your chosen database using IndexOnly mode.
- 🚀 Start the comparison: Run a scan that compares all indexed documents against each other and your selected databases.
This two-step approach ensures all documents are properly stored before the comparison begins.
🚀 Get Started
Before you begin
Before you start, ensure you have the following:
- An active Copyleaks account. If you don’t have one, sign up for free.
- Your API key, which you can find on the API Dashboard.
Installation
Choose your preferred method for making API calls.
You can interact with the API using any standard HTTP client.
For a quicker setup, we provide a Postman collection. See our Postman guide for instructions.
```bash
# cURL on Debian/Ubuntu (or download it from curl.se)
sudo apt-get install curl

# cURL on macOS with Homebrew
brew install curl

# Python SDK
pip install copyleaks

# Node.js SDK
npm install plagiarism-checker
```
Login
To perform a scan, first generate an access token using the login endpoint. Your API key can be found on the Copyleaks API Dashboard.
Upon successful authentication, you will receive a token that must be attached to subsequent API calls via the Authorization: Bearer <TOKEN> header. This token remains valid for 48 hours.

HTTP
```http
POST https://id.copyleaks.com/v3/account/login/api
Content-Type: application/json

{
  "email": "your-email@example.com",
  "key": "00000000-0000-0000-0000-000000000000"
}
```

cURL
```bash
export COPYLEAKS_EMAIL="your-email@example.com"
export COPYLEAKS_API_KEY="your-api-key-here"

curl --request POST \
  --url https://id.copyleaks.com/v3/account/login/api \
  --header 'Accept: application/json' \
  --header 'Content-Type: application/json' \
  --data "{\"email\": \"${COPYLEAKS_EMAIL}\", \"key\": \"${COPYLEAKS_API_KEY}\"}"
```

Python
```python
from copyleaks.copyleaks import Copyleaks

EMAIL_ADDRESS = "your-email@example.com"
API_KEY = "your-api-key-here"

# Login to Copyleaks
auth_token = Copyleaks.login(EMAIL_ADDRESS, API_KEY)
print("Logged successfully!\nToken:", auth_token)
```

Node.js
```javascript
const { Copyleaks } = require("plagiarism-checker");

const EMAIL_ADDRESS = "your-email@example.com";
const API_KEY = "your-api-key-here";
const copyleaks = new Copyleaks();

// Login function
function loginToCopyleaks() {
  return copyleaks.loginAsync(EMAIL_ADDRESS, API_KEY).then(
    (loginResult) => {
      console.log("Login successful!");
      console.log("Access Token:", loginResult.access_token);
      return loginResult;
    },
    (err) => {
      console.error("Login failed:", err);
      throw err;
    }
  );
}

loginToCopyleaks();
```

Java
```java
import com.copyleaks.sdk.api.Copyleaks;

String EMAIL_ADDRESS = "your-email@example.com";
String API_KEY = "00000000-0000-0000-0000-000000000000";

// Login to Copyleaks
try {
    String authToken = Copyleaks.login(EMAIL_ADDRESS, API_KEY);
    System.out.println("Logged successfully!\nToken: " + authToken);
} catch (CommandException e) {
    System.out.println("Failed to login: " + e.getMessage());
    System.exit(1);
}
```

Response
```json
{
  "access_token": "<ACCESS_TOKEN>",
  ".issued": "2025-07-31T10:19:40.0690015Z",
  ".expires": "2025-08-02T10:19:40.0690016Z"
}
```
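Because the token is valid for 48 hours, it can be cached and reused rather than requested before every call. The following is a minimal sketch of an expiry check based on the `.expires` field of the login response shown above; the helper names and the five-minute safety margin are assumptions, not part of the Copyleaks SDK.

```python
from datetime import datetime, timedelta, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse a UTC timestamp such as '2025-08-02T10:19:40.0690016Z'."""
    main, _, fraction = value.rstrip("Z").partition(".")
    # strptime's %f accepts at most 6 digits, so pad or truncate the fraction.
    fraction = (fraction + "000000")[:6]
    parsed = datetime.strptime(f"{main}.{fraction}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


def token_is_usable(login_response: dict, now: datetime,
                    margin: timedelta = timedelta(minutes=5)) -> bool:
    """Return True if the token will still be valid `margin` after `now`."""
    return now + margin < parse_timestamp(login_response[".expires"])


# Sample login response copied from this guide.
login_response = {
    "access_token": "<ACCESS_TOKEN>",
    ".issued": "2025-07-31T10:19:40.0690015Z",
    ".expires": "2025-08-02T10:19:40.0690016Z",
}
issued = parse_timestamp(login_response[".issued"])
print(token_is_usable(login_response, now=issued))                        # right after login
print(token_is_usable(login_response, now=issued + timedelta(hours=48)))  # at expiry
```

In a real client, `now` would be `datetime.now(timezone.utc)`, and a `False` result would trigger a fresh login.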
Index Your Documents
For each document you want to include in the comparison, submit it for indexing using one of the submit endpoints (submit-file, submit-url, or submit-ocr).
Set properties.action to 2 (IndexOnly) to store the document without scanning it immediately. This avoids consuming scan credits during the indexing phase. You also need to specify which repository to index the document into.

HTTP
```http
PUT https://api.copyleaks.com/v3/scans/submit/file/my-index-scan-1
Content-Type: application/json
Authorization: Bearer <YOUR_AUTH_TOKEN>

{
  "base64": "SGVsbG8gd29ybGQh",
  "filename": "document1.txt",
  "properties": {
    "action": 2,
    "indexing": {
      "repositories": ["my-repo-id"]
    },
    "sandbox": true
  }
}
```

cURL
```bash
curl --request PUT \
  --url https://api.copyleaks.com/v3/scans/submit/file/my-index-scan-1 \
  -H "Authorization: Bearer <YOUR_AUTH_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
    "base64": "SGVsbG8gd29ybGQh",
    "filename": "document1.txt",
    "properties": {
      "action": 2,
      "indexing": {
        "repositories": ["my-repo-id"]
      },
      "sandbox": true
    }
  }'
```

Python
```python
from copyleaks.copyleaks import Copyleaks
from copyleaks.models.submit.document import FileDocument
from copyleaks.models.submit.properties.scan_properties import ScanProperties
from copyleaks.models.submit.properties.indexing_properties import IndexingProperties

# auth_token is the token returned by the login step above
scan_id = "my-index-scan-1"

properties = ScanProperties()
properties.set_action(2)  # IndexOnly
properties.set_sandbox(True)

indexing = IndexingProperties()
indexing.add_repository("my-repo-id")
properties.set_indexing(indexing)

file_submission = FileDocument(
    base64="SGVsbG8gd29ybGQh",  # "Hello world!" in base64
    filename="document1.txt",
    properties=properties
)

response = Copyleaks.Scans.submit_file(auth_token, scan_id, file_submission)
print("Document indexed successfully!")
print("Scan ID:", scan_id)
print("Response:", response)
```

Node.js
```javascript
const { Copyleaks } = require('plagiarism-checker');

const EMAIL_ADDRESS = "your-email@example.com";
const API_KEY = "your-api-key-here";

async function indexDocumentToRepository() {
  const copyleaks = new Copyleaks();

  // Login first
  const authToken = await copyleaks.loginAsync(EMAIL_ADDRESS, API_KEY);
  console.log('Logged successfully!\nToken:', authToken);

  // Document to index
  const base64Content = "SGVsbG8gd29ybGQh"; // "Hello world!" in base64

  // Submit file for indexing only
  const scanId = "my-index-scan-1";
  const fileSubmission = {
    base64: base64Content,
    filename: "document1.txt",
    properties: {
      action: 2, // IndexOnly
      indexing: {
        repositories: ["my-repo-id"],
        copyleaksDb: true // Also index to shared database
      },
      scanning: {
        internet: true,
        repositories: ["my-repo-id"]
      },
      sandbox: true
    }
  };

  try {
    const result = await copyleaks.submitFileAsync(authToken, scanId, fileSubmission);
    console.log('Document indexed successfully!');
    console.log('Scan ID:', scanId);
    console.log('Repository ID: my-repo-id');
    console.log('Status: Indexed - waiting for IndexOnly webhook');
    return result;
  } catch (error) {
    console.error('Failed to index document:', error);
  }
}

indexDocumentToRepository();
```

Java
```java
import classes.Copyleaks;
import models.submissions.CopyleaksFileSubmissionModel;
import models.submissions.properties.*;

public class DataHubIndexingExample {
    private static final String EMAIL_ADDRESS = "your-email@example.com";
    private static final String API_KEY = "00000000-0000-0000-0000-000000000000";

    public static void main(String[] args) {
        try {
            // Login to Copyleaks
            String authToken = Copyleaks.login(EMAIL_ADDRESS, API_KEY);
            System.out.println("Logged successfully!\nToken: " + authToken);

            // Document content to index
            String base64Content = "SGVsbG8gd29ybGQh"; // "Hello world!" in base64

            // Configure submission properties for indexing
            SubmissionProperties properties = new SubmissionProperties();
            properties.setSandbox(true);
            properties.setAction(SubmissionActions.IndexOnly); // Action 2 = IndexOnly

            // Configure indexing to repositories
            SubmissionIndexing indexing = new SubmissionIndexing();
            indexing.addRepository("my-repo-id");
            indexing.setCopyleaksDb(true); // Also index to shared database
            properties.setIndexing(indexing);

            // Configure scanning settings (applied during indexing)
            SubmissionScanning scanning = new SubmissionScanning();
            scanning.setInternet(true);
            scanning.addRepository("my-repo-id");
            properties.setScanning(scanning);

            // Create file submission for indexing
            String scanId = "my-index-scan-1";
            CopyleaksFileSubmissionModel fileSubmission = new CopyleaksFileSubmissionModel(
                base64Content,
                "document1.txt",
                properties
            );

            // Submit file for indexing
            Copyleaks.submitFile(authToken, scanId, fileSubmission);
            System.out.println("Document indexed successfully!");
            System.out.println("Scan ID: " + scanId);
            System.out.println("Repository ID: my-repo-id");
            System.out.println("Status: Indexed - waiting for IndexOnly webhook");
            System.out.println("Next step: Wait for all documents to be indexed, then call /v3/scans/start");
        } catch (Exception e) {
            System.out.println("Failed to index document: " + e.getMessage());
            e.printStackTrace();
        }
    }
}
```

Wait for the IndexOnly webhook for each document to confirm it has been successfully indexed before proceeding to the next step.
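When indexing many files, the request body can be generated rather than written by hand. The sketch below builds the same JSON body as the HTTP example above from raw file bytes. The function name `build_index_payload` and the `INDEX_ONLY` constant are hypothetical; the field names and the action value 2 come from this guide.

```python
import base64
import json

INDEX_ONLY = 2  # properties.action value for IndexOnly, as described in this guide


def build_index_payload(content: bytes, filename: str,
                        repository_id: str, sandbox: bool = True) -> dict:
    """Build the JSON body for an IndexOnly file submission (hypothetical helper)."""
    return {
        "base64": base64.b64encode(content).decode("ascii"),
        "filename": filename,
        "properties": {
            "action": INDEX_ONLY,
            "indexing": {"repositories": [repository_id]},
            "sandbox": sandbox,
        },
    }


payload = build_index_payload(b"Hello world!", "document1.txt", "my-repo-id")
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be sent as the body of the PUT request to the submit-file endpoint with any HTTP client, using a unique scan ID in the URL for each document.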
Start Your Cross-Comparison
Once all your documents are indexed, make a PATCH request to the /v3/scans/start endpoint. This will begin the comparison scan for all the documents you indexed.
Provide the list of scanIds from the previous step in the trigger array.

HTTP
```http
PATCH https://api.copyleaks.com/v3/scans/start
Content-Type: application/json
Authorization: Bearer <YOUR_AUTH_TOKEN>

{
  "trigger": [
    "my-index-scan-1",
    "my-index-scan-2",
    "my-index-scan-3"
  ],
  "errorHandling": 0
}
```

cURL
```bash
curl --request PATCH \
  --url https://api.copyleaks.com/v3/scans/start \
  -H "Authorization: Bearer <YOUR_AUTH_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
    "trigger": [
      "my-index-scan-1",
      "my-index-scan-2",
      "my-index-scan-3"
    ],
    "errorHandling": 0
  }'
```

Python
```python
import requests

url = "https://api.copyleaks.com/v3/scans/start"
payload = {
    "trigger": [
        "my-index-scan-1",
        "my-index-scan-2",
        "my-index-scan-3"
    ],
    "errorHandling": 0
}
headers = {
    "Authorization": "Bearer <YOUR_AUTH_TOKEN>",
    "Content-Type": "application/json",
    "Accept": "application/json"
}

response = requests.patch(url, json=payload, headers=headers)
response.raise_for_status()  # stop here on an HTTP error instead of parsing an error body
result = response.json()

print("Cross-comparison started!")
print("Success:", result.get("success", []))
print("Failed:", result.get("failed", []))

if result.get("success"):
    print(f"Successfully started {len(result['success'])} scans")
    print("Watch for Completed webhooks for each scan")
```

Node.js
```javascript
const { Copyleaks } = require('plagiarism-checker');

const EMAIL_ADDRESS = "your-email@example.com";
const API_KEY = "your-api-key-here";

async function startCrossComparison() {
  const copyleaks = new Copyleaks();

  // Login first
  const authToken = await copyleaks.loginAsync(EMAIL_ADDRESS, API_KEY);
  console.log('Logged successfully!\nToken:', authToken);

  // Start cross-comparison for indexed documents
  const scanIds = [
    "my-index-scan-1",
    "my-index-scan-2",
    "my-index-scan-3"
  ];
  const startRequest = {
    trigger: scanIds,
    errorHandling: 0
  };

  try {
    const result = await copyleaks.startScansAsync(authToken, startRequest);
    console.log('Cross-comparison started successfully!');
    console.log('Success:', result.success || []);
    console.log('Failed:', result.failed || []);
    if (result.success && result.success.length > 0) {
      console.log(`Successfully started ${result.success.length} scans`);
      console.log('Watch for Completed webhooks for each scan');
    }
    return result;
  } catch (error) {
    console.error('Failed to start cross-comparison:', error);
  }
}

startCrossComparison();
```

Java
```java
import classes.Copyleaks;
import models.StartScanRequest;

public class DataHubStartScanExample {
    private static final String EMAIL_ADDRESS = "your-email@example.com";
    private static final String API_KEY = "00000000-0000-0000-0000-000000000000";

    public static void main(String[] args) {
        try {
            // Login to Copyleaks
            String authToken = Copyleaks.login(EMAIL_ADDRESS, API_KEY);
            System.out.println("Logged successfully!\nToken: " + authToken);

            // Prepare list of scan IDs to start
            String[] scanIds = {
                "my-index-scan-1",
                "my-index-scan-2",
                "my-index-scan-3"
            };

            // Create start scan request
            StartScanRequest startRequest = new StartScanRequest();
            startRequest.setTrigger(scanIds);
            startRequest.setErrorHandling(0);

            // Start cross-comparison
            var result = Copyleaks.startScans(authToken, startRequest);
            System.out.println("Cross-comparison started successfully!");
            System.out.println("Success: " + String.join(", ", result.getSuccess()));
            System.out.println("Failed: " + String.join(", ", result.getFailed()));
            if (result.getSuccess().length > 0) {
                System.out.println("Successfully started " + result.getSuccess().length + " scans");
                System.out.println("Watch for Completed webhooks for each scan");
            }
        } catch (Exception e) {
            System.out.println("Failed to start cross-comparison: " + e.getMessage());
            e.printStackTrace();
        }
    }
}
```
Interpreting The Results
A successful 200 OK response from the start endpoint will confirm which scans were started. The actual scan results for each document will be delivered asynchronously via the Completed webhook, just like a regular scan.
Example success response from /v3/scans/start:
```json
{
  "success": [
    "my-index-scan-1",
    "my-index-scan-2",
    "my-index-scan-3"
  ],
  "failed": []
}
```
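One way to act on this response is to compare it with the scan IDs that were requested. The sketch below is a hypothetical helper, not part of the Copyleaks SDK. It relies only on the `success` and `failed` keys shown above and passes `failed` entries through untouched, since this guide does not show their shape.

```python
from typing import Any, Dict, List


def summarize_start_response(requested: List[str],
                             response: Dict[str, Any]) -> Dict[str, Any]:
    """Compare the requested scan IDs with the start endpoint's response."""
    started = list(response.get("success", []))
    failed = list(response.get("failed", []))
    started_set = set(started)
    # Requested IDs that the response did not confirm as started.
    not_started = [scan_id for scan_id in requested if scan_id not in started_set]
    return {
        "started": started,
        "failed": failed,
        "not_started": not_started,
        "all_started": not not_started,
    }


requested = ["my-index-scan-1", "my-index-scan-2", "my-index-scan-3"]
response = {
    "success": ["my-index-scan-1", "my-index-scan-2", "my-index-scan-3"],
    "failed": [],
}
print(summarize_start_response(requested, response))
```

Any ID that ends up in `not_started` will not produce a Completed webhook, so it is worth logging or retrying before waiting on results.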
🎉Congratulations!
You have successfully started a cross-comparison scan between multiple documents in your Data Hub.
👥 Team Collaboration with Private Cloud Hub
Multiple users can access, scan against, and index to your Private Cloud Hub. Manage permissions and data masking settings through the admin dashboard.
💡 Best Practices
- Plan your scanning options: Configure settings during indexing.
- Monitor indexing progress: Wait for all IndexOnly webhooks before starting the comparison.
- Choose your database strategy: Decide whether to use Private, Shared, or both.
- Batch efficiently: Group related documents together.
- Respect API limits: Monitor your API dashboard.
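To make the "monitor indexing progress" practice concrete, here is a minimal sketch of a tracker that records which scan IDs have been confirmed and only produces the start request once all of them are in. The class and method names are hypothetical. How the scan ID is extracted from an incoming IndexOnly webhook is outside this sketch and depends on your webhook handler.

```python
from typing import Iterable, List, Set


class IndexingTracker:
    """Tracks which submitted scan IDs have been confirmed as indexed."""

    def __init__(self, scan_ids: Iterable[str]) -> None:
        # Keep submission order and drop duplicate IDs.
        self._expected: List[str] = list(dict.fromkeys(scan_ids))
        self._confirmed: Set[str] = set()

    def confirm(self, scan_id: str) -> None:
        """Call this when the IndexOnly webhook for `scan_id` arrives."""
        if scan_id not in self._expected:
            raise KeyError(f"Unknown scan ID: {scan_id}")
        self._confirmed.add(scan_id)

    def pending(self) -> List[str]:
        return [s for s in self._expected if s not in self._confirmed]

    def ready(self) -> bool:
        return not self.pending()

    def start_payload(self) -> dict:
        """Body for the /v3/scans/start request, available once all are indexed."""
        if not self.ready():
            raise RuntimeError(f"Still waiting for: {self.pending()}")
        return {"trigger": list(self._expected), "errorHandling": 0}


tracker = IndexingTracker(["my-index-scan-1", "my-index-scan-2"])
tracker.confirm("my-index-scan-1")
print(tracker.pending())
tracker.confirm("my-index-scan-2")
print(tracker.start_payload())
```

Raising on an unknown ID surfaces mismatches between submitted and confirmed documents early, instead of silently starting an incomplete comparison.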
Support
Should you require any assistance, please contact Copyleaks Support or ask a question on Stack Overflow with the copyleaks-api tag.
Schedule a Live Demo
Want to see how Data Hubs can help you manage and compare your documents? Our technical team can walk you through live examples of setting up a Private Cloud Hub, indexing large batches of content, and running cross-comparisons in a secure environment.