Submit File (OCR)
Scan images with textual content to find where the content has been used before and check its originality. Using submit-ocr you can scan various image file types for plagiarism and identify infringed content. Only the textual content in the picture will be scanned and not the graphics. See supported formats.
Request
Path Parameters
A unique scan ID provided by you. We recommend using the same ID in your own database to represent the scan in the Copyleaks database, which helps when debugging incidents. Reusing the same ID for the same file also helps you avoid duplicate scans of that file when network problems cause a request to be retried. Learn more about the criteria for creating a Scan ID.
>= 3 characters
<= 36 characters
Match pattern: [a-z0-9] !@$^&-+%=_(){}<>';:/.",~
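The length and character constraints above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the Copyleaks API: the helper names `is_valid_scan_id` and `new_scan_id` are hypothetical, and the sketch assumes that a space is not an allowed character, since the documented pattern is ambiguous on that point.

```python
import re
import uuid

# Symbols listed in the documented match pattern. Whether a space is also
# permitted is unclear from the pattern, so this sketch rejects spaces
# (an assumption).
ALLOWED_SYMBOLS = "!@$^&-+%=_(){}<>';:/.\",~"
SCAN_ID_RE = re.compile("[a-z0-9" + re.escape(ALLOWED_SYMBOLS) + "]{3,36}")


def is_valid_scan_id(scan_id: str) -> bool:
    """Return True if scan_id is 3 to 36 characters from the allowed set."""
    return SCAN_ID_RE.fullmatch(scan_id) is not None


def new_scan_id() -> str:
    """Generate a random lowercase UUID string (36 characters)."""
    return str(uuid.uuid4())
```

Storing the generated ID alongside the file in your own database before submitting makes it possible to reuse the same ID if the request has to be retried.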
Headers
Content-Type: application/json
Authorization: Bearer YOUR_LOGIN_TOKEN
Request Body
The request body is a JSON object containing the image file to scan and a properties object to configure the scan.
A base64 data string of a file. If you would like to scan plain text, encode it as base64 and submit it.
Example: aGVsbG8gd29ybGQ=
The name of the file as it will appear in the Copyleaks scan report. Make sure to include the right extension for your filetype.
<= 255 characters
Example: image.jpg
The language of the text in the image. See supported languages.
Example: en
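As a rough illustration of the three top-level fields described above, the sketch below assembles a request body from raw file bytes. The field names `base64`, `filename`, and `langCode` are taken from the example request at the end of this page; the helper name `build_body` is hypothetical.

```python
import base64


def build_body(file_bytes: bytes, filename: str, lang_code: str, properties: dict) -> dict:
    """Assemble the top-level request body for an OCR submission."""
    if len(filename) > 255:
        raise ValueError("filename must be at most 255 characters")
    return {
        # The file content is sent as a base64 string, not as raw bytes.
        "base64": base64.b64encode(file_bytes).decode("ascii"),
        "filename": filename,
        "langCode": lang_code,
        "properties": properties,
    }


body = build_body(b"hello world", "image.jpg", "en", {})
print(body["base64"])  # prints aGVsbG8gd29ybGQ=
```

In practice `file_bytes` would come from reading the image in binary mode, for example `open(path, "rb").read()`.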
properties object required
Configuration options for the scan.
The type of submission action.
- 0 - Scan: Start scan immediately.
- 1 - Check-Credits: Check how many credits will be used for this scan.
- 2 - Index Only: Only index the file in the Copyleaks internal database or Copyleaks Repository (depends on your submit request). No credits will be used.
By default, Copyleaks will present the report in text format. If set to true, Copyleaks will also include HTML format.
- True: results will be generated in HTML format, if possible. Otherwise, they will be generated in text format.
- False: results will be generated in text format.
Add custom developer payload that will then be provided on the webhooks.
<= 512 characters
You can test the integration with the Copyleaks API for free using the sandbox mode.
You will be able to submit content for a scan and get back mock results, simulating the way Copyleaks will work to make sure that you successfully integrated with the API.
Turn off this feature in production environments.
Rate Limiting: This method has a maximum call rate limit of 100 sandbox scans within 1 hour. See the 429 Response code section at the bottom of this page.
Specify the maximum life span of a scan in hours on the Copyleaks servers. When expired, the scan will be deleted and will no longer be accessible.
>= 1
<= 2880
Choose the algorithm goal. You can set this value depending on your use-case.
Available Options:
- 0 - MaximumCoverage: prioritize higher similarity score.
- 1 - MaximumResults: prioritize finding more sources.
Add custom properties that will be attached to your document in a Copyleaks repository.
If this document is found as a repository result, your custom properties will be added to the result.
Example:
[ { "key":"Test1", "value":"Test1" }, ...]
author object
A unique identifier for the author of the content.
course object
A unique course identifier for tracking analytics.
assignment object
A unique assignment identifier for tracking analytics.
institution object
A unique institution identifier for tracking analytics.
webhooks object
The webhooks object is where you define the callback URLs for Copyleaks to send notifications to. This object is required.
A URL that will be called when the scan status changes. Use the {STATUS} placeholder, which will be replaced with completed, error, creditsChecked, or indexed. Example: https://yoursite.com/webhook/{STATUS}
A URL that will be called when a new result is found during the scan.
Custom headers to add to the newResult webhook. Example: [["key", "value"]]
Custom headers to add to the status webhook. Example: [["key", "value"]]
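To see which callback URLs a status webhook template will produce, the {STATUS} substitution described above can be reproduced locally. This is only a sketch of the substitution as documented; the helper name `expand_status_urls` is hypothetical and nothing here calls the Copyleaks API.

```python
# The four status values named in the webhook description.
STATUSES = ("completed", "error", "creditsChecked", "indexed")


def expand_status_urls(template: str) -> dict:
    """Map each documented status to the URL produced by the template."""
    return {status: template.replace("{STATUS}", status) for status in STATUSES}


urls = expand_status_urls("https://yoursite.com/webhook/{STATUS}")
for status, url in urls.items():
    print(status, "->", url)
```

Listing the expanded URLs this way is a quick check that your server exposes a route for every status it may be called with.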
filters object
Fine-tune what kind of results are included in the scan report.
Enable matching of exact words.
Enable matching of nearly identical words (e.g., slow/slowly).
Enable matching of paraphrased content.
Only show results with at least this many copied words.
Block explicit adult content from scan results.
A list of domains to include or exclude from the scan.
0 to include domains, 1 to exclude them.
Allow results from the same domain as the submitted URL.
scanning object
Define the sources to compare your document against.
Compare your content with online sources.
Exclude submissions from results if their ID matches a pattern (e.g., abc*).
Include only submissions whose ID matches a pattern.
An array of Private Cloud Hubs objects to scan against. Each object needs a Private Cloud Hub id.
copyleaksDb object
Configure scanning against the Copyleaks Shared Data Hub.
crossLanguages object
Configure cross-language plagiarism detection.
indexing object
Configure where to index the submitted content.
Specify which repositories to index the scanned document to.
Add the submitted document to the Copyleaks Shared Data Hub.
exclude object
Configure what content to exclude from the scan.
Exclude quoted text from the scan.
Exclude citations from the scan.
Exclude referenced text from the scan.
Exclude table of contents from the scan.
When the scanned document is an HTML document, exclude titles from the scan.
When the scanned document is an HTML document, exclude irrelevant text that appears across the site like the website footer or header.
Exclude template text found in other documents. Provide an array of scan IDs (max 3).
pdf object
Configure and request a PDF report of the scan results.
Set to true to generate a PDF report for this scan.
Customize the title for the PDF report (max 256 chars).
A base64 encoded PNG image (max 100kb) to use as a logo in the report.
When set to true the text in the report will be aligned from right to left.
reportVersion. PDF version to generate (1, 2, or 3).
Specifies which version of the PDF report to generate (v1, v2, v3, latest). Overrides version.
colors object
Customize colors for titles, identical matches, minor changes, etc. in HEX format.
When specified, the PDF report will be generated in the selected language. Future updates may also apply this setting to the overview and other components.
Currently supported languages:
Code | Language |
---|---|
en | English |
es | Spanish |
de | German |
fr | French |
it | Italian |
pt | Portuguese |
Controls the trade-off between scan speed and plagiarism sensitivity. Choose 1 for a faster scan that reports the results containing the largest amounts of plagiarism; choose 5 for a slower, more comprehensive scan that also detects the smallest instances.
Optional Values:
Range from 1 (faster) to 5 (slower but more comprehensive)
When set to true, the submitted document will be checked for cheating. If cheating is detected, a scan alert will be added to the completed webhook.
aiGeneratedText object
Configure AI-generated text detection.
Detects whether the text was written by an AI.
Control the behavior of the AI detection (1-3).
explain object
Enable AI Logic feature for AI detection.
sensitiveDataProtection object
Mask driver’s license numbers from the scanned document with # characters. Available for users on a plan for 2500 pages or more.
Supported Types:
Type |
---|
Australia driver’s license number |
Canada driver’s license number |
United Kingdom driver’s license number |
USA driver’s license number |
Japan driver’s license number |
Spain driver’s license number |
Germany driver’s license number |
Mask credentials from the scanned document with # characters. Available for users on a plan for 2500 pages or more.
Supported Types:
Type |
---|
Authentication token |
Amazon Web Services credentials |
Azure JSON Web Token |
HTTP basic authentication header |
Google Cloud Platform service account credentials |
Google Cloud Platform API key |
JSON Web Token |
Encryption key |
Password |
Mask passports from the scanned document with # characters. Available for users on a plan for 2500 pages or more.
Supported Types:
Type |
---|
Canada passport number |
China passport number |
France passport number |
Germany passport number |
Ireland passport number |
Japan passport number |
Korea passport number |
Mexico passport number |
Spain passport number |
United Kingdom passport number |
USA passport number |
Netherlands passport number |
Poland passport |
Sweden passport number |
Australia passport number |
Singapore passport number |
Taiwan passport number |
Mask network identifiers from the scanned document with # characters. Available for users on a plan for 2500 pages or more.
Supported Types:
Type |
---|
IP address |
Local MAC address |
MAC address |
Mask URLs from the scanned document with # characters. Available for users on a plan for 2500 pages or more.
Mask email addresses from the scanned document with # characters. Available for users on a plan for 2500 pages or more.
Mask credit card numbers and credit card track numbers from the scanned document with # characters. Available for users on a plan for 2500 pages or more.
Mask phone numbers from the scanned document with # characters. Available for users on a plan for 2500 pages or more.
writingFeedback object
Configure the automated Writing Assistant.
Enable automated Writing Assistant for grammar, spelling, etc.
score object
Configure the weighting of different categories in the overall writing score.
Grammar correction category weight. Range: 0.0 to 1.0.
Mechanics correction category weight. Range: 0.0 to 1.0.
Sentence structure correction category weight. Range: 0.0 to 1.0.
Word choice correction category weight. Range: 0.0 to 1.0.
overview object
Enable Gen-AI Overview feature to extract key insights from the scan data.
Enable Gen-AI Overview feature to extract key insights from the scan data.
Ignore AI detection when generating the scan’s overview. Only applicable if AI detection was enabled.
Ignore plagiarism detection when generating the scan’s overview. Only applicable if plagiarism detection was enabled.
Ignore writing assistant when generating the scan’s overview. Only applicable if the writing assistant was enabled.
Ignore the author’s historical data when generating the scan’s overview. Only applicable if an author ID was added to the request.
aiSourceMatch object
The AI Source Match feature enhances plagiarism detection by identifying online sources that are suspected of containing AI-generated text. This allows you to find instances of potential plagiarism and understand if the matched source content itself might have been created by an AI.
Activates or deactivates the AI Source Match functionality.
Responses
The scan was successfully created and is now processing.
Response Schema
The response contains the following fields:
scannedDocument object
results object
notifications object
writingFeedback object
Example Response
A typical response from this endpoint:
{"scannedDocument": { "scanId": "scan-id32", "totalWords": 2, "totalExcluded": 0, "credits": 0, "expectedCredits": 1, "creationTime": "2025-08-05T07:19:08.181236Z", "metadata": { "filename": "file.txt" }, "enabled": { "plagiarismDetection": true, "aiDetection": false, "explainableAi": false,// ... truncated
The filename field is required.
Example Response
A typical response from this endpoint:
{ "filename": [ "The filename field is required." ]}
Authentication failed or API key is invalid.
Example Response
A typical response from this endpoint:
{ "type": "https://tools.ietf.org/html/rfc9110#section-15.5.2", "title": "Unauthorized", "status": 401, "traceId": "00-ef0db7690ced98431ac97782051edc77-2c4194d74ae6c08b-00"}
Conflict. A scan with the same ID already exists in the system.
Example Response
A typical response from this endpoint:
{ "type": "https://tools.ietf.org/html/rfc9110#section-15.5.10", "title": "Conflict", "status": 409, "traceId": "00-561fe3b2451eb51ce489557a2f34f247-3acd41c0b4ad1295-00"}
Rate limit exceeded. Please retry after the specified time.
Example Response
A typical response from this endpoint:
{ "error": "Rate limit exceeded" }
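A client that receives this response needs to decide how long to wait before retrying. The sketch below is one possible approach under stated assumptions: it assumes the wait time arrives in a `Retry-After` header expressed in seconds, which this page does not confirm, and falls back to capped exponential backoff otherwise. The helper name `retry_delay_seconds` is hypothetical.

```python
def retry_delay_seconds(headers: dict, attempt: int,
                        base: float = 1.0, cap: float = 60.0) -> float:
    """Pick how long to wait before retrying after a 429 response.

    Uses a Retry-After header given in seconds when present (an assumption
    about the response), otherwise base * 2**attempt, capped at `cap`.
    """
    value = headers.get("Retry-After")
    if value is not None:
        try:
            return max(0.0, float(value))
        except ValueError:
            pass  # Not a number of seconds; fall back to backoff.
    return min(cap, base * (2 ** attempt))


print(retry_delay_seconds({"Retry-After": "30"}, attempt=0))  # prints 30.0
print(retry_delay_seconds({}, attempt=3))                     # prints 8.0
```

The caller would sleep for the returned number of seconds and then resubmit with the same scan ID.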
Examples
PUT https://api.copyleaks.com/v3/scans/submit/ocr/my-scan-123
Content-Type: application/json
Authorization: Bearer YOUR_LOGIN_TOKEN

{
  "base64": "YOUR_BASE64_HERE",
  "filename": "image.jpg",
  "langCode": "en",
  "properties": {
    "webhooks": {
      "status": "https://my-server.com/webhook/{STATUS}"
    },
    "sandbox": true
  }
}
curl --request PUT \
  --url https://api.copyleaks.com/v3/scans/submit/ocr/my-scan-123 \
  --header 'Authorization: Bearer YOUR_LOGIN_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{
    "base64": "YOUR_BASE64_HERE",
    "filename": "image.jpg",
    "langCode": "en",
    "properties": {
      "webhooks": {
        "status": "https://my-server.com/webhook/{STATUS}"
      },
      "sandbox": true
    }
  }'
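The same request can be prepared in Python with only the standard library. This is a minimal sketch mirroring the curl example, not an official client: the helper name `build_submit_request` is hypothetical, and percent-encoding the scan ID in the path is an assumption made because the allowed ID characters include symbols such as `/`. The request is built but not sent.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.copyleaks.com/v3/scans/submit/ocr/"


def build_submit_request(scan_id: str, token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the PUT request shown in the curl example."""
    return urllib.request.Request(
        # Percent-encode the ID so symbols like "/" stay inside one path segment.
        url=API_URL + urllib.parse.quote(scan_id, safe=""),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,
        },
        method="PUT",
    )


body = {
    "base64": "YOUR_BASE64_HERE",
    "filename": "image.jpg",
    "langCode": "en",
    "properties": {
        "webhooks": {"status": "https://my-server.com/webhook/{STATUS}"},
        "sandbox": True,
    },
}
request = build_submit_request("my-scan-123", "YOUR_LOGIN_TOKEN", body)
# To send it: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes it easy to inspect the URL, headers, and body in tests before any credits are used.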