How It Works
Copyleaks provides two types of databases for storing and comparing documents:
- Shared Data Hub: Global database that contains millions of documents from institutions worldwide.
- Private Cloud Hub: Private database that is exclusive to your organization, ensuring that your documents remain confidential and secure.
You can use both databases simultaneously to maximize detection coverage while keeping sensitive documents private.
Understanding Your Database Options
You have two database options for storing and comparing your documents:
Shared Data Hub (Free)
- Contains millions of documents from institutions worldwide
- When you index a document, it becomes available for everyone to compare against
- Contributes to the global academic integrity community
- Your documents will be matched against submissions from other institutions
Private Cloud Hub (Paid)
- Creates a completely private database for your organization only
- Your documents stay within your private environment
- Perfect for sensitive or confidential documents
- Only you and your organization can access and compare against these documents
- Built for large organizations looking to securely store and manage documents
- Enables team collaboration with controlled access and user management
You can use both databases simultaneously. Your documents can be stored in your Private Cloud Hub while also being compared against the Shared Data Hub for maximum detection coverage.
How Cross-Comparison Works
The process involves two main steps:
- Index your documents: Upload documents to your chosen database using IndexOnly mode.
- Start the comparison: Run a scan that compares all indexed documents against each other and your selected databases.
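The two steps above can be sketched as two JSON request bodies. The `properties.action` value of 2 (IndexOnly) and the `trigger` array come from this guide; the `base64`, `filename`, and `properties.webhooks.status` field names and both helper names are assumptions for illustration, so check them against the API reference before relying on them. Repository selection is left out because this guide does not name the field for it.

```python
import base64
import json

INDEX_ONLY = 2  # properties.action value for IndexOnly, as described in this guide


def build_index_payload(content: bytes, filename: str, webhook_url: str) -> dict:
    """Step 1: body for a file submission that indexes without scanning.

    Every field name except properties.action is an assumption.
    """
    return {
        "base64": base64.b64encode(content).decode("ascii"),
        "filename": filename,
        "properties": {
            "action": INDEX_ONLY,
            "webhooks": {"status": webhook_url},
        },
    }


def build_start_payload(scan_ids: list[str]) -> dict:
    """Step 2: body for the start call; `trigger` lists the indexed scan IDs."""
    return {"trigger": list(scan_ids)}


if __name__ == "__main__":
    index_body = build_index_payload(
        b"Hello world", "essay-1.txt", "https://example.com/webhook"
    )
    print(json.dumps(index_body, indent=2))
    print(json.dumps(build_start_payload(["essay-1", "essay-2"])))
```

Keeping the payload builders free of network calls makes it easy to inspect what will be sent before any document is actually indexed.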
Get Started
Before you begin
Before you start, ensure you have the following:
- An active Copyleaks account. If you don’t have one, sign up for free.
- An API key, which you can find on the API Dashboard.
Installation
Choose your preferred method for making API calls.
HTTP needs no installation - call the API with any standard HTTP client, or import our Postman collection for a quicker start.
Login
To perform a scan, we first need to generate an access token. For that, we will use the login endpoint.
The API key can be found on the Copyleaks API Dashboard.
Upon successful authentication, you will receive a token that must be attached to subsequent API calls via the Authorization: Bearer <TOKEN> header.
Save this token. It remains valid for 48 hours and can be reused for subsequent API calls.
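A minimal sketch of the login step using only the Python standard library. The endpoint URL, the `email` and `key` request fields, and the `access_token` response field are assumptions that this guide does not state, so confirm them in the API reference. `build_login_request` only constructs the request; nothing is sent until `login` is called.

```python
import json
import urllib.request

# Assumed login URL; this guide only refers to "the login endpoint".
LOGIN_URL = "https://id.copyleaks.com/v3/account/login/api"


def build_login_request(email: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the login request."""
    body = json.dumps({"email": email, "key": api_key}).encode("utf-8")
    return urllib.request.Request(
        LOGIN_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def auth_headers(token: str) -> dict:
    """Header to attach to subsequent API calls, in the Bearer format above."""
    return {"Authorization": f"Bearer {token}"}


def login(email: str, api_key: str) -> str:
    """Send the login request and return the token (performs a network call)."""
    with urllib.request.urlopen(build_login_request(email, api_key)) as resp:
        return json.loads(resp.read())["access_token"]  # assumed field name
```

Because the token stays valid for 48 hours, it is usually worth caching the value returned by `login` rather than authenticating before every call.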
Index Your Documents
For each document you want to include in the comparison, submit it for indexing using one of the submit endpoints (submit-file, submit-url, or submit-ocr).
Set properties.action to 2 (IndexOnly) to store the document without scanning it immediately. This avoids consuming scan credits during the indexing phase. You also need to specify which repository to index the document into.
You will need to wait for the IndexOnly webhook for each document to confirm it has been successfully indexed before proceeding to the next step.
Start Your Cross-Comparison
Once all your documents are indexed, make a PATCH request to the /v3/scans/start endpoint. This will begin the comparison scan for all the documents you indexed.
Provide the list of scanIds from the previous step in the trigger array.
Interpreting The Results
A successful 200 OK response from the start endpoint will confirm which scans were started. The actual scan results for each document will be delivered asynchronously via the Completed webhook, just like a regular scan.
Example Success Response from /v3/scans/start:
Team Collaboration with Private Cloud Hub
Multiple users can access, scan against, and index to your Private Cloud Hub. Manage permissions and data masking settings through the admin dashboard.
Best Practices
- Plan your scanning options: Configure settings during indexing.
- Monitor indexing progress: Wait for all IndexOnly webhooks before starting the comparison.
- Choose your database strategy: Decide whether to use Private, Shared, or both.
- Batch efficiently: Group related documents together.
- Respect API limits: Monitor your API dashboard.
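The "monitor indexing progress" and "batch efficiently" practices can be sketched as a small tracker that records which scan IDs have reported their IndexOnly webhook and refuses to build the start request until the whole batch is in. The `/v3/scans/start` path, the PATCH method, and the `trigger` array come from this guide; the `IndexTracker` class, the `build_start_request` helper, and the `https://api.copyleaks.com` host are assumptions, and how webhooks reach `mark_indexed` depends on your own server.

```python
import json
import urllib.request

API_BASE = "https://api.copyleaks.com"  # assumed host; the guide gives only the path


class IndexTracker:
    """Tracks which submitted scans have confirmed indexing via webhook."""

    def __init__(self, scan_ids):
        self.pending = set(scan_ids)
        self.indexed = []

    def mark_indexed(self, scan_id: str) -> None:
        """Call from your webhook handler when an IndexOnly webhook arrives."""
        if scan_id in self.pending:
            self.pending.remove(scan_id)
            self.indexed.append(scan_id)

    def ready(self) -> bool:
        """True once every scan in the batch has been confirmed."""
        return not self.pending


def build_start_request(tracker: IndexTracker, token: str) -> urllib.request.Request:
    """Build (but do not send) the PATCH request for /v3/scans/start."""
    if not tracker.ready():
        raise RuntimeError(f"still waiting on: {sorted(tracker.pending)}")
    body = json.dumps({"trigger": tracker.indexed}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/v3/scans/start",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )
```

Using one tracker per batch of related documents keeps each comparison self-contained and makes a missing webhook easy to spot from the error message.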
Next Steps
Create Private Cloud Hub
Set up your own private database for document storage.
Support
Should you require any assistance, please contact Copyleaks Support or ask a question on Stack Overflow with the copyleaks-api tag.
Schedule a Live Demo
Want to see how Data Hubs can help you manage and compare your documents? Our technical team can walk you through live examples of setting up a Private Cloud Hub, indexing large batches of content, and running cross-comparisons in a secure environment.

