
Submit File (OCR)

PUT https://api.copyleaks.com/v3/scans/submit/ocr/{scanId}

Scan images that contain text to find where the content has been used before and check its originality. With submit-ocr, you can scan various image file types for plagiarism and identify infringed content. Only the text in the image is scanned; graphics are ignored. See supported formats.

Authentication Required

You need to log in with a user and API key to access this method. Add this HTTP header to your request:

Authorization: Bearer <Your-Login-Token>

Request

Path Parameters

scanId string Required

A unique scan ID that you provide. We recommend using the same ID that represents the scan in your own database, which makes incidents easier to debug. Using the same ID for the same file also helps you avoid duplicate scans of that file caused by network problems. Learn more about the criteria for creating a scan ID.

>= 3 characters <= 36 characters

Match pattern: lowercase letters and digits ([a-z0-9]) plus the special characters !@$^&-+%=_(){}<>';:/.",~|
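Checking a scan ID locally before submitting can catch mistakes early. The sketch below assumes the pattern above means "only lowercase letters, digits, and the listed symbols, 3 to 36 characters long"; the helper name is hypothetical and the server remains the authority on what it accepts.

```python
import re

# Assumption: allowed characters are a-z, 0-9 and the symbols from the
# documented match pattern; length must be 3-36 characters.
_SPECIALS = "!@$^&-+%=_(){}<>';:/.\",~|"
_SCAN_ID_RE = re.compile("[a-z0-9" + re.escape(_SPECIALS) + "]{3,36}")


def is_valid_scan_id(scan_id: str) -> bool:
    """Return True if scan_id uses only allowed characters and has a valid length."""
    return _SCAN_ID_RE.fullmatch(scan_id) is not None
```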

Headers

Content-Type: application/json
Authorization: Bearer YOUR_LOGIN_TOKEN
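A minimal sketch of preparing this PUT call with Python's standard library. The function name build_submit_request is hypothetical, and percent-encoding the scan ID in the path is an assumption made because the allowed characters include "/". The request is built but not sent.

```python
import json
import urllib.parse
import urllib.request

SUBMIT_OCR_URL = "https://api.copyleaks.com/v3/scans/submit/ocr/{scan_id}"


def build_submit_request(scan_id: str, token: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) the PUT request for the OCR submit endpoint."""
    # Percent-encode the scan ID so characters such as '/' stay inside one path segment.
    url = SUBMIT_OCR_URL.format(scan_id=urllib.parse.quote(scan_id, safe=""))
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="PUT",
    )
```

To send it, pass the request to urllib.request.urlopen and handle HTTP errors such as 429.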

Request Body

The request body is a JSON object containing the image file to scan and a properties object to configure the scan.

base64 string Required

The file content as a base64-encoded string. To scan plain text, encode it as base64 and submit it.

Example: aGVsbG8gd29ybGQ=
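Producing the base64 field takes one standard-library call. This sketch (helper names are hypothetical) reads the image bytes and returns the ASCII text to place in the request body; encoding the bytes of "hello world" yields the example value above.

```python
import base64
from pathlib import Path


def encode_bytes(data: bytes) -> str:
    """Return the base64 text expected in the request's base64 field."""
    return base64.b64encode(data).decode("ascii")


def encode_file(path: str) -> str:
    """Read a file (for example, an image) and base64-encode its content."""
    return encode_bytes(Path(path).read_bytes())
```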

filename string Required

The name of the file as it will appear in the Copyleaks scan report. Include the correct extension for your file type.

<= 255 characters

Example: image.jpg

langCode string Required

The language of the text in the image. See supported languages.

Example: en

properties object Required

Configuration options for the scan.

action integer default: "0"

The type of submission action.

0: Scan: Start scan immediately.
1: Check-Credits: Check how many credits will be used for this scan.
2: Index Only: Only index the file in the Copyleaks internal database or Copyleaks Repository (depending on your submit request). No credits will be used.
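Naming the action values in code avoids magic numbers. In this sketch the enum member names and the make_properties helper are our own; only the integer values 0, 1 and 2 come from the list above.

```python
from enum import IntEnum


class SubmitAction(IntEnum):
    """Values of properties.action (member names are hypothetical)."""
    SCAN = 0           # start the scan immediately
    CHECK_CREDITS = 1  # only report how many credits the scan would use
    INDEX_ONLY = 2     # index without scanning; no credits are used


def make_properties(action: SubmitAction, **extra) -> dict:
    """Build a properties object with the action serialized as a plain integer."""
    return {"action": int(action), **extra}
```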

includeHtml boolean

By default, Copyleaks presents the report in text format. If set to true, Copyleaks will also include the report in HTML format.

True: Results are generated in HTML format if possible; otherwise, they are generated in text format.
False: Results are generated in text format.

developerPayload string default: "null"

Add a custom developer payload that will be returned in the webhooks.

<= 512 characters

sandbox boolean default: "false"

You can test the integration with the Copyleaks API for free using the sandbox mode.

You will be able to submit content for a scan and get back mock results, simulating the way Copyleaks will work to make sure that you successfully integrated with the API.

Turn off this feature in your production environment.

Rate Limiting: This method is limited to a maximum of 100 sandbox scans within 1 hour. See the 429 Response code section at the bottom of this page.
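The limit is enforced by the server, but a client-side guard can keep a test suite from tripping it. This is a sketch of a sliding one-hour window; whether Copyleaks counts with a sliding or fixed window is an assumption, and the class name is hypothetical. The clock is injectable so the logic can be tested without waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any window_seconds span (client side only)."""

    def __init__(self, max_calls: int = 100, window_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock
        self._calls: Deque[float] = deque()

    def try_acquire(self) -> bool:
        """Record a call and return True if it fits in the window, else False."""
        now = self.clock()
        # Drop timestamps that have fallen out of the window.
        while self._calls and now - self._calls[0] >= self.window_seconds:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            return False
        self._calls.append(now)
        return True
```

Call try_acquire before each sandbox submission and wait or skip when it returns False.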

expiration integer default: "2880"

The maximum lifespan of a scan on the Copyleaks servers, in hours. When the scan expires, it is deleted and is no longer accessible.

>= 1 <= 2880

scanMethodAlgorithm integer default: "0"

Choose the goal of the scan algorithm according to your use case.

Available Options:

0 - MaximumCoverage: prioritize higher similarity score.
1 - MaximumResults: prioritize finding more sources.

customMetadata array default: "[]"

Add custom properties that will be attached to your document in a Copyleaks repository.

If this document is found as a repository result, your custom properties will be added to the result.

Example:

[
  {
    "key":"Test1",
    "value":"Test1"
  },
  ...
]
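If your application keeps metadata as a plain mapping, it can be converted into the key/value list shown above. The helper name is hypothetical.

```python
from typing import Dict, List


def to_custom_metadata(props: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert a plain mapping into the key/value list used by customMetadata."""
    return [{"key": k, "value": v} for k, v in props.items()]
```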

displayLanguage string default: "en"

When specified, the PDF report will be generated in the selected language. Future updates may also apply this setting to the overview and other components.

Currently supported languages:

Code	Language
`en`	English
`es`	Spanish
`de`	German
`fr`	French
`it`	Italian
`pt`	Portuguese

scanTimeZone string default: "User's country timezone"

Specify the timezone for the scan time displayed on the final PDF report. The value must be a valid, case-sensitive IANA Time Zone name (e.g., America/New_York). If unspecified, the timezone defaults to the user’s country, or UTC if their country is unknown.

Available Options: See the full List of IANA Time Zones.
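Before sending scanTimeZone, you can check that the name exists in an IANA database using Python's zoneinfo module (Python 3.9+). This is a sketch with a hypothetical helper name: it only checks the tz database available locally (system files or the tzdata package), which may differ from the one Copyleaks uses, and on case-insensitive file systems it may not enforce the case sensitivity the API requires.

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def is_known_timezone(name: str) -> bool:
    """Return True if name resolves in the local IANA time zone database."""
    try:
        ZoneInfo(name)
    except (ZoneInfoNotFoundError, ValueError):
        return False
    return True
```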

sensitivityLevel integer default: "3"

Controls the trade-off between scan speed and plagiarism sensitivity. Choose 1 for a faster scan that returns the results containing the most plagiarism, or 5 for a slower, more comprehensive scan that also detects the smallest instances.

Possible Values:

A range from 1 (faster) to 5 (slower but more comprehensive)

cheatDetection boolean default: "false"

When set to true, the submitted document is checked for cheating. If cheating is detected, a scan alert is added to the completed webhook.

author object

id string

A unique identifier for the author of the content.

course object

id string

A unique course identifier for tracking analytics.

assignment object

id string

A unique assignment identifier for tracking analytics.

institution object

id string

A unique institution identifier for tracking analytics.

webhooks object

Defines the callback URLs that Copyleaks sends notifications to. This object is required.

status string Required

A URL that will be called when the scan status changes. Use the {STATUS} placeholder, which will be replaced with completed, error, creditsChecked, or indexed. Example: https://yoursite.com/webhook/{STATUS}

newResult string

A URL that will be called when a new result is found during the scan.

newResultHeaders array

Custom headers to add to the newResult webhook. Example: [["key", "value"]]

statusHeaders array

Custom headers to add to the status webhook. Example: [["key", "value"]]
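On your side, the status webhook arrives at the URL with {STATUS} filled in. In practice your web framework's routing usually extracts it, but the sketch below shows the substitution and its reverse with plain string handling. Helper names are hypothetical, query strings are not handled, and the status set is the one listed for the status field.

```python
from typing import Optional

# Status values documented for the {STATUS} placeholder.
KNOWN_STATUSES = {"completed", "error", "creditsChecked", "indexed"}


def status_webhook_url(template: str, status: str) -> str:
    """Show the URL that would be called for a given status."""
    return template.replace("{STATUS}", status)


def parse_status(template: str, called_url: str) -> Optional[str]:
    """Recover the status from a received webhook URL, or None if it doesn't fit."""
    prefix, sep, suffix = template.partition("{STATUS}")
    if not sep or not called_url.startswith(prefix) or not called_url.endswith(suffix):
        return None
    status = called_url[len(prefix):len(called_url) - len(suffix)]
    return status if status in KNOWN_STATUSES else None
```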

filters object

Fine-tune what kind of results are included in the scan report.

identicalEnabled boolean default: "true"

Enable matching of exact words.

minorChangesEnabled boolean default: "true"

Enable matching of nearly identical words (e.g., slow/slowly).

relatedMeaningEnabled boolean default: "true"

Enable matching of paraphrased content.

minCopiedWords integer

Only show results with at least this many copied words.

safeSearch boolean default: "false"

Block explicit adult content from scan results.

domains array default: "[]"

A list of domains to include or exclude from the scan.

domainsMode integer default: "1"

0 to include domains, 1 to exclude them.

allowSameDomain boolean default: "false"

Allow results from the same domain as the submitted URL.
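Because domainsMode uses 0 for include and 1 for exclude, a small helper with a named boolean can make the intent clearer at the call site. This is a sketch; the helper name is hypothetical.

```python
from typing import Iterable


def domain_filter(domains: Iterable[str], include: bool) -> dict:
    """Build the domains/domainsMode part of the filters object."""
    # domainsMode: 0 = include the listed domains, 1 = exclude them.
    return {"domains": list(domains), "domainsMode": 0 if include else 1}
```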

scanning object

Define the sources to compare your document against.

internet boolean default: "true"

Compare your content with online sources.

exclude object

idPattern string default: "null"

Exclude submissions from results if their ID matches the supplied pattern. Matched submissions will be excluded from both internal database and repository results.

Supported pattern wildcards:

* - Matches any number of characters (including zero)
. - Matches exactly one non-whitespace character

Examples:

abc* - Excludes submissions with IDs starting with “abc”
ab.. - Excludes submissions with exactly 4-character IDs starting with “ab”
*test - Excludes submissions with IDs ending with “test”
user.* - Excludes submissions with IDs starting with “user” followed by at least one more character, such as “user.1” or “user42” (the . is a wildcard, not a literal period)

Use patterns carefully to avoid excluding legitimate comparisons. Test your patterns with a small dataset first.
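To test a pattern against a small set of your own IDs first, you can translate it into a regular expression. This sketch implements only the two wildcards listed above and assumes the pattern must match the whole ID, as the examples suggest; the Copyleaks implementation may differ, and the helper names are hypothetical.

```python
import re


def id_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate an idPattern: '*' -> any run of characters, '.' -> one non-whitespace."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == ".":
            parts.append(r"\S")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def id_matches(pattern: str, scan_id: str) -> bool:
    """Return True if the whole scan ID matches the pattern."""
    return id_pattern_to_regex(pattern).fullmatch(scan_id) is not None
```

For example, running id_matches over a list of existing scan IDs shows which submissions a pattern would affect.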

backlinksDomains array[string] default: "null"

Exclude results that contain backlinks to these specific domains.

Limits:

Maximum 10 domains
Each domain: 1-256 characters

Examples:

["google.com"] - Excludes results containing backlinks to google.com

text array[string] default: "null"

Exclude any results that contain text matching these phrases.

Limits:

Maximum 5 text phrases
Each phrase: 1-30 characters

Examples:

["google"] - Excludes results containing the word “google”
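The limits for backlinksDomains and text can be checked before submission. This sketch returns a list of problems instead of raising, so all issues can be reported at once; the function name is hypothetical and the limits are the ones stated above.

```python
from typing import List, Optional


def check_exclude_lists(backlinks_domains: Optional[List[str]] = None,
                        text: Optional[List[str]] = None) -> List[str]:
    """Return problems found in exclude.backlinksDomains and exclude.text."""
    problems = []
    if backlinks_domains is not None:
        if len(backlinks_domains) > 10:
            problems.append("backlinksDomains: more than 10 domains")
        for d in backlinks_domains:
            if not 1 <= len(d) <= 256:
                problems.append(f"backlinksDomains: {d!r} must be 1-256 characters")
    if text is not None:
        if len(text) > 5:
            problems.append("text: more than 5 phrases")
        for phrase in text:
            if not 1 <= len(phrase) <= 30:
                problems.append(f"text: {phrase!r} must be 1-30 characters")
    return problems
```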

include object

idPattern string default: "null"

Include results only if their scan ID matches the supplied pattern. Only matching submissions are included in the internal database and repository results.

Supported pattern wildcards:

* - Matches any number of characters (including zero)
. - Matches exactly one non-whitespace character

Examples:

abc* - Includes submissions with IDs starting with “abc”
ab.. - Includes submissions with exactly 4-character IDs starting with “ab”
*test - Includes submissions with IDs ending with “test”
user.* - Includes submissions with IDs starting with “user” followed by at least one more character, such as “user.1” or “user42” (the . is a wildcard, not a literal period)

Use patterns carefully to avoid excluding legitimate comparisons. Test your patterns with a small dataset first.

repositories array[object] default: "[]"

Specify which repositories to scan the document against.

id string default: "null"

ID of a repository to scan the submitted document against.

includeMySubmissions boolean default: "false"

Compare the scanned document against MY submissions in the repository.

includeOthersSubmissions boolean default: "false"

Compare the scanned document against OTHER users' submissions in the repository.

copyleaksDb object default: "null"

Configure scanning against the Copyleaks Shared Data Hub.

includeMySubmissions boolean default: "true"

When set to true: Copyleaks will also compare against content which was uploaded by YOU to the Copyleaks internal database. If true, it will also index the scan in the Copyleaks internal database.

includeOthersSubmissions boolean default: "true"

When set to true: Copyleaks will also compare against content which was uploaded by OTHERS to the Copyleaks internal database. If true, it will also index the scan in the Copyleaks internal database.

crossLanguages object

Configure cross-language plagiarism detection.

languages array[object] default: "[]"

Cross-language plagiarism detection. Choose which additional languages to scan your content against. For each additional language chosen, pages are deducted for every page submitted. The language of the original document is always scanned, so do not include it in the additional languages. See the supported languages list.

code string default: "null"

Language code for cross language plagiarism detection.
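Since the original language is always scanned and each extra language costs pages, it is worth dropping it (and any duplicates) from the list before submitting. A sketch with a hypothetical helper name, assuming the language codes are already in the form the API expects:

```python
from typing import Dict, Iterable, List


def cross_languages(original: str, extra: Iterable[str]) -> Dict[str, List[Dict[str, str]]]:
    """Build crossLanguages, skipping the document's own language and duplicates."""
    seen = set()
    languages = []
    for code in extra:
        # The original language is always scanned, so it is left out here.
        if code != original and code not in seen:
            seen.add(code)
            languages.append({"code": code})
    return {"languages": languages}
```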

indexing object

Configure where to index the submitted content.

repositories array default: "[]"

Specify the repositories to index the scanned document into.

copyleaksDb boolean default: "false"

Add the submitted document to the Copyleaks Shared Data Hub.

exclude object

Configure what content to exclude from the scan.

quotes boolean default: "false"

Exclude quoted text from the scan.

citations boolean default: "false"

Exclude citations from the scan.

references boolean default: "false"

Exclude referenced text from the scan.

tableOfContents boolean default: "false"

Exclude table of contents from the scan.

titles boolean default: "false"

When the scanned document is an HTML document, exclude titles from the scan.

htmlTemplate boolean default: "false"

When the scanned document is an HTML document, exclude irrelevant text that appears across the site, such as the website header or footer.

documentTemplateIds array default: "[]"

Exclude template text found in other documents. Provide an array of scan IDs (max 3).

pdf object

Configure and request a PDF report of the scan results.

create boolean default: "false"

Set to true to generate a PDF report for this scan.

title string

Customize the title for the PDF report (max 256 chars).

largeLogo string

A base64 encoded PNG image (max 100kb) to use as a logo in the report.
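A logo can be checked and encoded before it goes into the request. This sketch verifies the standard 8-byte PNG signature and a size cap; treating "100kb" as 100 × 1024 bytes of raw image data (rather than of the base64 text) is an assumption, and the function name is hypothetical.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAX_LOGO_BYTES = 100 * 1024  # assumption: limit applies to the raw PNG bytes


def encode_logo(png_bytes: bytes) -> str:
    """Validate a PNG logo and return it base64-encoded for pdf.largeLogo."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("largeLogo must be a PNG image")
    if len(png_bytes) > MAX_LOGO_BYTES:
        raise ValueError("largeLogo must not exceed 100 KB")
    return base64.b64encode(png_bytes).decode("ascii")
```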

rtl boolean default: "false"

When set to true, the text in the report is aligned from right to left.

version integer default: "1" Deprecated

Deprecated: use reportVersion instead. The PDF version to generate (1, 2, or 3).

reportVersion string default: "latest"

Specifies which version of the PDF report to generate (v1, v2, v3, latest). Overrides version.

colors object

Customize the highlight colors used in the PDF report.

titles string default: "#040F21"

Change the color of titles in the PDF.
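A quick format check for color values, assuming the API expects the 6-digit "#RRGGBB" form shown in the default; whether shorter or named forms are accepted is not stated here, so this sketch rejects them. The helper name is hypothetical.

```python
import re

# Assumption: colors use the 6-digit "#RRGGBB" form shown in the default value.
_HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def is_hex_color(value: str) -> bool:
    """Return True for strings like "#040F21"."""
    return _HEX_COLOR.fullmatch(value) is not None
```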