Submit File (OCR)

PUT https://api.copyleaks.com/v3/scans/submit/ocr/{scanId}

Scan images that contain text to find where that content has been used before and to check its originality. With submit-ocr, you can scan various image file types for plagiarism and identify infringed content. Only the text in the image is scanned; graphics are ignored. See supported formats.

Request

Path Parameters

scanId string required

A unique scan ID that you provide. We recommend using the same ID in your database to represent the scan in the Copyleaks database, which makes incidents easier to debug. Using the same ID for the same file also prevents network problems from producing multiple scans of the same file. Learn more about the criteria for creating a Scan ID.

Length: 3 to 36 characters.

Match pattern: [a-z0-9] plus the characters !@$^&-+%=_(){}<>';:/.",~|
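The length and character rules above can be checked before a request is sent. This is a minimal sketch that assumes the match pattern means lowercase ASCII letters, digits, and the listed symbols only; `is_valid_scan_id` is a hypothetical helper, not part of any Copyleaks SDK.

```python
import string

# Assumption: the allowed set is a-z, 0-9, and the symbols listed in the
# match pattern above. Uppercase letters and spaces are treated as invalid.
ALLOWED_SYMBOLS = "!@$^&-+%=_(){}<>';:/.\",~|"
ALLOWED_CHARS = set(string.ascii_lowercase + string.digits + ALLOWED_SYMBOLS)


def is_valid_scan_id(scan_id: str) -> bool:
    """Return True if scan_id is 3-36 characters long and uses only allowed characters."""
    if not 3 <= len(scan_id) <= 36:
        return False
    return all(ch in ALLOWED_CHARS for ch in scan_id)


print(is_valid_scan_id("my-scan-123"))  # True
print(is_valid_scan_id("ab"))           # False (too short)
print(is_valid_scan_id("My-Scan"))      # False (uppercase is outside the assumed pattern)
```

If your IDs come from another system, run them through a check like this before submitting so an invalid ID fails locally rather than as a 400 response.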

Headers

Content-Type: application/json
Authorization: Bearer YOUR_LOGIN_TOKEN

Request Body

The request body is a JSON object containing the image file to scan and a properties object to configure the scan.

base64 string required

A base64-encoded string of the file's contents. To scan plain text, encode it as base64 and submit it.

Example: aGVsbG8gd29ybGQ=
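For an image on disk, the value is the file's raw bytes encoded with standard base64. A minimal sketch using only the Python standard library; the helper names are illustrative.

```python
import base64
from pathlib import Path


def to_base64(data: bytes) -> str:
    """Encode raw bytes as a standard base64 ASCII string."""
    return base64.b64encode(data).decode("ascii")


def encode_file_base64(path: str) -> str:
    """Read a file (for example image.jpg) in binary mode and base64-encode it."""
    return to_base64(Path(path).read_bytes())


# The example value above is the base64 encoding of the text "hello world".
print(to_base64(b"hello world"))  # aGVsbG8gd29ybGQ=
```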

filename string required

The name of the file as it will appear in the Copyleaks scan report. Include the correct extension for the file type.

<= 255 characters

Example: image.jpg

langCode string required

The language of the text in the image. See supported languages.

Example: en

properties object required

Configuration options for the scan.

action integer default: "0"

The type of submission action.

0: Scan: Start scan immediately.
1: Check-Credits: Check how many credits will be used for this scan.
2: Index Only: Only index the file in the Copyleaks internal database or Copyleaks Repository (depends on your submit request). No credits will be used.
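In client code, the three action values are easier to read as named constants. A minimal sketch; `SubmitAction` and `build_properties` are hypothetical names, and the status webhook URL is a placeholder. The webhook is included because webhooks.status is required.

```python
from enum import IntEnum


class SubmitAction(IntEnum):
    """Values for properties.action (names are illustrative, not from an official SDK)."""
    SCAN = 0           # start the scan immediately
    CHECK_CREDITS = 1  # only report how many credits the scan would use
    INDEX_ONLY = 2     # index the file without scanning; no credits used


def build_properties(action: SubmitAction, status_webhook: str) -> dict:
    """Build a minimal properties object for the given action."""
    return {
        "action": int(action),
        "webhooks": {"status": status_webhook},
    }


props = build_properties(SubmitAction.CHECK_CREDITS, "https://yoursite.com/webhook/{STATUS}")
print(props["action"])  # 1
```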

includeHtml boolean

By default, Copyleaks presents the report in text format. If set to true, Copyleaks also includes an HTML version.

true: results are generated in HTML format if possible; otherwise, they are generated in text format.
false: results are generated in text format only.

developerPayload string default: "null"

A custom developer payload that Copyleaks includes in the webhook notifications for this scan.

<= 512 characters

sandbox boolean default: "false"

You can test the integration with the Copyleaks API for free using the sandbox mode.

You will be able to submit content for a scan and get back mock results, simulating the way Copyleaks will work to make sure that you successfully integrated with the API.

Turn off this feature in production environments.

Rate limiting: sandbox mode is limited to 100 scans per hour. See the 429 Too Many Requests response at the bottom of this page.

expiration integer default: "2880"

The maximum lifespan of the scan on the Copyleaks servers, in hours. Once it expires, the scan is deleted and is no longer accessible.

>= 1 <= 2880 (2880 hours = 120 days)
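Because expiration is expressed in hours, a retention period chosen in days has to be converted and kept within the documented range. A small sketch; `expiration_hours` is a hypothetical helper.

```python
def expiration_hours(days: float) -> int:
    """Convert a retention period in days to the expiration value in hours.

    Raises ValueError if the result falls outside the documented range of 1-2880 hours.
    """
    hours = round(days * 24)
    if not 1 <= hours <= 2880:
        raise ValueError(f"expiration must be between 1 and 2880 hours, got {hours}")
    return hours


print(expiration_hours(7))    # 168
print(expiration_hours(120))  # 2880, the maximum
```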

scanMethodAlgorithm integer default: "0"

The goal the scanning algorithm optimizes for. Choose the value that fits your use case.

Available Options:

0 - MaximumCoverage: prioritize higher similarity score.
1 - MaximumResults: prioritize finding more sources.

customMetadata array default: "[]"

Add custom properties that will be attached to your document in a Copyleaks repository.

If this document is found as a repository result, your custom properties will be added to the result.

Example:

[
  {
    "key":"Test1",
    "value":"Test1"
  },
  ...
]
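If your own metadata is kept as a dictionary, it can be converted into the key/value list format shown above. A minimal sketch; `to_custom_metadata` is a hypothetical helper, the example keys are made up, and sending every value as a string is an assumption.

```python
def to_custom_metadata(fields: dict) -> list:
    """Convert {"key": "value"} pairs into the customMetadata list format.

    Assumption: keys and values are sent as strings.
    """
    return [{"key": str(k), "value": str(v)} for k, v in fields.items()]


metadata = to_custom_metadata({"studentId": "12345", "term": "Fall"})
print(metadata)
```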

author object

id string

A unique identifier for the author of the content.

course object

id string

A unique course identifier for tracking analytics.

assignment object

id string

A unique assignment identifier for tracking analytics.

institution object

id string

A unique institution identifier for tracking analytics.

webhooks object

The webhooks object is where you define the callback URLs for Copyleaks to send notifications to. This object is required.

status string required

A URL that will be called when the scan status changes. Use the {STATUS} placeholder, which will be replaced with completed, error, creditsChecked, or indexed. Example: https://yoursite.com/webhook/{STATUS}

newResult string

A URL that will be called when a new result is found during the scan.

newResultHeaders array

Custom headers to add to the newResult webhook. Example: [["key", "value"]]

statusHeaders array

Custom headers to add to the status webhook. Example: [["key", "value"]]
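Copyleaks replaces {STATUS} in the status URL with one of the four values listed above, so your server should be ready to receive each resulting URL. The sketch below expands the template the same way, recovers the status from an incoming request path, and converts a header dictionary into the [["key", "value"]] format. The helper names, the /webhook/ route (taken from the example URL), and the X-Webhook-Secret header are all hypothetical.

```python
from typing import Optional

# The status values listed in the description of the status webhook.
STATUSES = ("completed", "error", "creditsChecked", "indexed")


def expected_webhook_urls(template: str) -> list:
    """Expand the {STATUS} placeholder into every URL the status webhook may call."""
    return [template.replace("{STATUS}", status) for status in STATUSES]


def status_from_path(path: str, prefix: str = "/webhook/") -> Optional[str]:
    """Return the status segment of an incoming webhook path, or None if unrecognized."""
    if not path.startswith(prefix):
        return None
    status = path[len(prefix):].strip("/")
    return status if status in STATUSES else None


def to_header_pairs(headers: dict) -> list:
    """Convert {"name": "value"} into the [["name", "value"], ...] header format."""
    return [[name, value] for name, value in headers.items()]


webhooks = {
    "status": "https://yoursite.com/webhook/{STATUS}",
    "statusHeaders": to_header_pairs({"X-Webhook-Secret": "change-me"}),
}
print(expected_webhook_urls(webhooks["status"]))
print(status_from_path("/webhook/creditsChecked"))  # creditsChecked
```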

filters object

Fine-tune what kind of results are included in the scan report.

identicalEnabled boolean default: "true"

Enable matching of exact words.

minorChangesEnabled boolean default: "true"

Enable matching of nearly identical words (e.g., slow/slowly).

relatedMeaningEnabled boolean default: "true"

Enable matching of paraphrased content.

minCopiedWords integer

Only show results with at least this many copied words.

safeSearch boolean default: "false"

Block explicit adult content from scan results.

domains array default: "[]"

A list of domains to include or exclude from the scan.

domainsMode integer default: "1"

0 to include only the listed domains, 1 to exclude them.

allowSameDomain boolean default: "false"

Allow results from the same domain as the submitted URL.
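As an illustration, the filter options above combine into a single object inside properties. This is a hypothetical configuration, not a recommended default; the plain-domain string format used for domains is an assumption.

```python
import json

# Hypothetical configuration: skip paraphrase matching, require at least
# 5 copied words per result, and limit results to two listed domains.
# The plain-domain string format for "domains" is an assumption.
filters = {
    "identicalEnabled": True,
    "minorChangesEnabled": True,
    "relatedMeaningEnabled": False,
    "minCopiedWords": 5,
    "safeSearch": True,
    "domains": ["example.com", "example.org"],
    "domainsMode": 0,  # 0 = include the listed domains, 1 = exclude them
}

properties_fragment = {"filters": filters}
print(json.dumps(properties_fragment, indent=2))
```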

scanning object

Define the sources to compare your document against.

internet boolean default: "true"

Compare your content with online sources.

exclude.idPattern string

Exclude submissions from results if their ID matches a pattern (e.g., abc*).

include.idPattern string

Include only submissions whose ID matches a pattern.

repositories array default: "[]"

An array of Private Cloud Hub objects to scan against. Each object must contain a Private Cloud Hub id.

copyleaksDb object

Configure scanning against the Copyleaks Shared Data Hub.

crossLanguages object

Configure cross-language plagiarism detection.

indexing object

Configure where to index the submitted content.

repositories array default: "[]"

Specify which repositories to index the scanned document to.

copyleaksDb boolean default: "false"

Add the submitted document to the Copyleaks Shared Data Hub.

exclude object

Configure what content to exclude from the scan.

quotes boolean default: "false"

Exclude quoted text from the scan.

citations boolean default: "false"

Exclude citations from the scan.

references boolean default: "false"

Exclude referenced text from the scan.

tableOfContents boolean default: "false"

Exclude table of contents from the scan.

titles boolean default: "false"

When the scanned document is an HTML document, exclude titles from the scan.

htmlTemplate boolean default: "false"

When the scanned document is an HTML document, exclude irrelevant text that appears across the site, such as the website header or footer.

documentTemplateIds array default: "[]"

Exclude template text found in other documents. Provide an array of scan IDs (max 3).
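A client can enforce the limit of 3 template scan IDs and catch misspelled option names before submitting. A minimal sketch; `build_exclude` is a hypothetical helper and "template-scan-1" is a placeholder scan ID.

```python
from typing import Sequence

# Boolean options of the exclude object documented above.
EXCLUDE_FLAGS = {"quotes", "citations", "references", "tableOfContents", "titles", "htmlTemplate"}


def build_exclude(template_scan_ids: Sequence[str] = (), **flags: bool) -> dict:
    """Build an exclude object, rejecting unknown flags and more than 3 template IDs."""
    unknown = set(flags) - EXCLUDE_FLAGS
    if unknown:
        raise ValueError(f"unknown exclude options: {sorted(unknown)}")
    if len(template_scan_ids) > 3:
        raise ValueError("documentTemplateIds accepts at most 3 scan IDs")
    exclude = dict(flags)
    if template_scan_ids:
        exclude["documentTemplateIds"] = list(template_scan_ids)
    return exclude


print(build_exclude(["template-scan-1"], quotes=True, citations=True))
```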

pdf object

Configure and request a PDF report of the scan results.

create boolean default: "false"

Set to true to generate a PDF report for this scan.

title string

Customize the title for the PDF report (max 256 chars).

largeLogo string

A base64-encoded PNG image (max 100 KB) to use as a logo in the report.

rtl boolean default: "false"

When set to true, the text in the report is aligned from right to left.

version integer default: "1" Deprecated

Deprecated: use reportVersion instead. The PDF version to generate (1, 2, or 3).

reportVersion string default: "latest"

Specifies which version of the PDF report to generate (v1, v2, v3, latest). Overrides version.

colors object

Customize colors for titles, identical matches, minor changes, etc. in HEX format.

displayLanguage string default: "en"

When specified, the PDF report will be generated in the selected language. Future updates may also apply this setting to the overview and other components.

Currently supported languages:

Code	Language
`en`	English
`es`	Spanish
`de`	German
`fr`	French
`it`	Italian
`pt`	Portuguese
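The pdf options above can be assembled with the documented limits checked locally. A minimal sketch; `build_pdf` is a hypothetical helper, and the language set mirrors the table above, so it may fall behind if Copyleaks adds languages.

```python
# Display languages listed in the table above.
SUPPORTED_DISPLAY_LANGUAGES = {"en", "es", "de", "fr", "it", "pt"}


def build_pdf(title: str, display_language: str = "en", rtl: bool = False) -> dict:
    """Request a PDF report, validating the title length and display language."""
    if len(title) > 256:
        raise ValueError("PDF title is limited to 256 characters")
    if display_language not in SUPPORTED_DISPLAY_LANGUAGES:
        raise ValueError(f"unsupported displayLanguage: {display_language!r}")
    return {
        "create": True,
        "title": title,
        "reportVersion": "latest",
        "displayLanguage": display_language,
        "rtl": rtl,
    }


print(build_pdf("Originality report", display_language="fr"))
```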

sensitivityLevel integer default: "3"

Controls the trade-off between scan speed and plagiarism sensitivity. Choose 1 for a faster scan that surfaces the most extensive plagiarism, or 5 for a slower, more comprehensive scan that also detects the smallest instances.

Range: 1 (faster) to 5 (slower but more comprehensive).

cheatDetection boolean default: "false"

When set to true, the submitted document is checked for cheating. If cheating is detected, a scan alert is added to the completed webhook.

aiGeneratedText object

Configure AI-generated text detection.

detect boolean default: "false"

Detects whether the text was written by an AI.

sensitivity integer default: "2"

Control the behavior of the AI detection (1-3).

explain object

enable boolean default: "false"

Enable the AI Logic feature for AI detection.

sensitiveDataProtection object

driversLicense boolean default: "false"

Mask driver’s license numbers from the scanned document with # characters. Available for users on a plan for 2500 pages or more.

Supported Types:

Type
Australia driver’s license number
Canada driver’s license number
United Kingdom driver’s license number
USA driver’s license number
Japan driver’s license number
Spain driver’s license number
Germany driver’s license number

credentials boolean default: "false"

Mask credentials from the scanned document with # characters. Available for users on a plan for 2500 pages or more.

Supported Types:

Type
Authentication token
Amazon Web Services credentials
Azure JSON Web Token
HTTP basic authentication header
Google Cloud Platform service account credentials
Google Cloud Platform API key
JSON Web Token
Encryption key
Password

passport boolean default: "false"

Mask passports from the scanned document with # characters. Available for users on a plan for 2500 pages or more.

Supported Types:

Type
Canada passport number
China passport number
France passport number
Germany passport number
Ireland passport number
Japan passport number
Korea passport number
Mexico passport number
Spain passport number
United Kingdom passport number
USA passport number
Netherlands passport number
Poland passport number
Sweden passport number
Australia passport number
Singapore passport number
Taiwan passport number

network boolean default: "false"

Mask network identifiers from the scanned document with # characters. Available for users on a plan for 2500 pages or more.

Supported Types:

Type
IP address
Local MAC address
MAC address

url boolean default: "false"

Mask URLs in the scanned document with # characters. Available for users on a plan for 2500 pages or more.

emailAddress boolean default: "false"

Mask email addresses from the scanned document with # characters. Available for users on a plan for 2500 pages or more.

creditCard boolean default: "false"

Mask credit card numbers and credit card track numbers from the scanned document with # characters. Available for users on a plan for 2500 pages or more.

phoneNumber boolean default: "false"

Mask phone numbers from the scanned document with # characters. Available for users on a plan for 2500 pages or more.
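To request masking for only some categories, every option can be listed explicitly so the intent is visible in the request. A minimal sketch; `build_sensitive_data_protection` is a hypothetical helper. As noted above, these options are available only on plans of 2500 pages or more.

```python
# Masking options of the sensitiveDataProtection object documented above.
MASKABLE = (
    "driversLicense", "credentials", "passport", "network",
    "url", "emailAddress", "creditCard", "phoneNumber",
)


def build_sensitive_data_protection(*categories: str) -> dict:
    """Enable masking for the named categories and explicitly disable the rest."""
    unknown = set(categories) - set(MASKABLE)
    if unknown:
        raise ValueError(f"unknown masking options: {sorted(unknown)}")
    return {name: name in categories for name in MASKABLE}


print(build_sensitive_data_protection("emailAddress", "phoneNumber"))
```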

writingFeedback object

Configure the automated Writing Assistant.

enable boolean default: "false"

Enable automated Writing Assistant for grammar, spelling, etc.

score object

Configure the weighting of different categories in the overall writing score.

grammarScoreWeight number default: "1.0"

Grammar correction category weight. Range: 0.0 to 1.0.

mechanicsScoreWeight number default: "1.0"

Mechanics correction category weight. Range: 0.0 to 1.0.

sentenceStructureScoreWeight number default: "1.0"

Sentence structure correction category weight. Range: 0.0 to 1.0.

wordChoiceScoreWeight number default: "1.0"

Word choice correction category weight. Range: 0.0 to 1.0.
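The four weights share the same 0.0 to 1.0 range, so they can be validated together. A minimal sketch; `build_writing_feedback` is a hypothetical helper, and weights that are not passed are simply omitted so the documented default of 1.0 applies.

```python
SCORE_WEIGHTS = {
    "grammarScoreWeight", "mechanicsScoreWeight",
    "sentenceStructureScoreWeight", "wordChoiceScoreWeight",
}


def build_writing_feedback(**weights: float) -> dict:
    """Enable the Writing Assistant with optional category weights in [0.0, 1.0]."""
    for name, value in weights.items():
        if name not in SCORE_WEIGHTS:
            raise ValueError(f"unknown score weight: {name}")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    feedback = {"enable": True}
    if weights:
        feedback["score"] = dict(weights)
    return feedback


# Hypothetical choice: count word choice at half weight; other categories keep the default 1.0.
print(build_writing_feedback(wordChoiceScoreWeight=0.5))
```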

overview object

Configure the Gen-AI Overview feature, which extracts key insights from the scan data.

enable boolean default: "false"

Enable Gen-AI Overview feature to extract key insights from the scan data.

ignoreAIDetection boolean default: "false"

Ignore AI detection when generating the scan’s overview. Only applicable if AI detection was enabled.

ignorePlagiarismDetection boolean default: "false"

Ignore plagiarism detection when generating the scan’s overview. Only applicable if plagiarism detection was enabled.

ignoreWritingFeedback boolean default: "false"

Ignore writing assistant when generating the scan’s overview. Only applicable if the writing assistant was enabled.

ignoreAuthorData boolean default: "false"

Ignore the author’s historical data when generating the scan’s overview. Only applicable if an author ID was added to the request.

aiSourceMatch object

The AI Source Match feature enhances plagiarism detection by identifying online sources that are suspected of containing AI-generated text. This allows you to find instances of potential plagiarism and understand if the matched source content itself might have been created by an AI.

enable boolean default: "false"

Activates or deactivates the AI Source Match functionality.

Responses

201 Created

The scan was successfully created and is now processing.

Response Schema

The response contains the following fields:

scannedDocument object

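An object containing details about the scanned document, such as the scan ID, word counts, credits, and the features that were enabled.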

results object

An object containing the plagiarism and AI detection results. See Results Object for more details.

notifications object

An object containing alerts and notifications about the scan. See Notifications Object for more details.

writingFeedback object

An object containing writing feedback and suggestions. See Writing Feedback for more details.

status integer

The status of the scan. A value of `0` indicates the scan is still in progress. See Scan Status for a complete list of statuses.

developerPayload string

A developer-provided string that is attached to the scan. This can be used to store your own internal identifiers. `256 characters`

Example Response

A typical response from this endpoint:


{
"scannedDocument": {
  "scanId": "scan-id32",
  "totalWords": 2,
  "totalExcluded": 0,
  "credits": 0,
  "expectedCredits": 1,
  "creationTime": "2025-08-05T07:19:08.181236Z",
  "metadata": {
    "filename": "file.txt"
  },
  "enabled": {
    "plagiarismDetection": true,
    "aiDetection": false,
    "explainableAi": false,
    "writingFeedback": false,
    "pdfReport": true,
    "cheatDetection": false,
    "aiSourceMatch": false
  },
  "detectedLanguage": "en"
},
"results": {
  "score": {
    "identicalWords": 1,
    "minorChangedWords": 0,
    "relatedMeaningWords": 0,
    "aggregatedScore": 50.0
  },
  "internet": [
    {
      "url": "http://example.com/",
      "id": "2a1b402420",
      "title": "Example Domain",
      "introduction": "Example Domain This domain is for use in illustrative examples in documents. You may use this domain in literature without...",
      "matchedWords": 1,
      "identicalWords": 1,
      "similarWords": 0,
      "paraphrasedWords": 0,
      "totalWords": 28,
      "metadata": {
        "authors": []
      },
      "tags": []
    }
  ],
  "database": [],
  "batch": [],
  "repositories": []
},
"notifications": {
  "alerts": [
    {
      "code": "suspected-ai-text",
      "title": "Potential AI-Generated Text Detected",
      "message": "We are unable to verify that the text was written by a human.",
      "severity": 4,
      "additionalData": "{"results": [{"classification": 2, "probability": 0.7307997032499992, "matches": [ {"text": {"chars": {"starts": [0], "lengths": [1453]}, "words": {"starts": [0], "lengths": [230]}}}]}], "summary": {"human": 0.0, "ai": 1.0}, "modelVersion": "v8.0", "translationProvider": 0, "explain": {"patterns": {"statistics": {"aiCount": [18.38596534729004, 1.4073526859283447, 2.917997121810913, 3.4344568252563477, 2.4015376567840576, 1.58811354637146, 2.324068546295166, 8.947664260864258, 1.6268479824066162, 3.9767396450042725, 1.3944411277770996, 4.648137092590332, 15.093534469604492, 4.389907360076904, 18.734575271606445, 1.4073526859283447, 3.3957223892211914, 4.196235179901123, 1.6397595405578613, 1.9367238283157349, 3.938005208969116, 9.980583190917969, 7.566134452819824, 85.03508758544922, 5.706879615783691, 428.94140625, 58.77622985839844, 35.015625, 47.52120590209961, 227.6015625], "humanCount": [1.1367287635803223, 0.3948078751564026, 0.5908869504928589, 0.0794915184378624, 0.0397457592189312, 0.20932766795158386, 0.2808700203895569, 1.5845309495925903, 0.041070617735385895, 0.041070617735385895, 0.45442652702331543, 0.076841801404953, 0.5604152083396912, 0.06226835772395134, 0.6399067044258118, 0.3948078751564026, 0.5352429151535034, 0.3020677864551544, 0.18283049762248993, 0.2676214575767517, 0.049019768834114075, 0.06094349920749664, 0.014573445543646812, 2.1263980865478516, 0.7286722660064697, 8.128008842468262, 5.878292083740234, 3.120574712753296, 7.039435863494873, 40.785186767578125], "proportion": [16.17445182800293, 3.5646519660949707, 4.938333988189697, 43.205326080322266, 60.4224853515625, 7.586734771728516, 8.274534225463867, 5.646884918212891, 39.610992431640625, 96.82687377929688, 3.068573474884033, 60.48969650268555, 26.932769775390625, 70.49980926513672, 29.277040481567383, 3.5646519660949707, 6.344264507293701, 13.89169979095459, 8.968741416931152, 7.2368035316467285, 80.335037231... [truncated]
      "category": 2
    }
  ]
},
"writingFeedback": {
  "textStatistics": {
    "sentenceCount": 5,
    "averageWordLength": 4.7,
    "averageSentenceLength": 12.8,
    "readingTimeSeconds": 21.0,
    "speakingTimeSeconds": 29.5
  },
  "score": {
    "grammarCorrectionsCount": 1,
    "grammarCorrectionsScore": 93,
    "grammarScoreWeight": 1.0,
    "mechanicsCorrectionsCount": 1,
    "mechanicsCorrectionsScore": 93,
    "mechanicsScoreWeight": 1.0,
    "sentenceStructureCorrectionsCount": 1,
    "sentenceStructureCorrectionsScore": 93,
    "sentenceStructureScoreWeight": 1.0,
    "wordChoiceCorrectionsCount": 0,
    "wordChoiceCorrectionsScore": 100,
    "wordChoiceScoreWeight": 1.0,
    "overallScore": 94
  },
  "readability": {
    "score": 95,
    "readabilityLevel": 1,
    "readabilityLevelText": "5th Grader",
    "readabilityLevelDescription": "Very easy to read"
  }
},
"status": 0,
"developerPayload": ""
}
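The headline numbers can be pulled out of a payload shaped like the example response above. A minimal sketch that assumes the payload has already been decoded with json.loads; `summarize_scan` is a hypothetical helper, and it uses .get() so that missing sections yield None or empty lists instead of raising.

```python
import json


def summarize_scan(payload: dict) -> dict:
    """Pull the headline numbers out of a scan result payload."""
    doc = payload.get("scannedDocument", {})
    results = payload.get("results", {})
    score = results.get("score", {})
    alerts = payload.get("notifications", {}).get("alerts", [])
    return {
        "scanId": doc.get("scanId"),
        "aggregatedScore": score.get("aggregatedScore"),
        "internetSources": [r["url"] for r in results.get("internet", [])],
        "alertCodes": [a["code"] for a in alerts],
    }


# A trimmed copy of the example response.
example = json.loads("""
{
  "scannedDocument": {"scanId": "scan-id32"},
  "results": {
    "score": {"aggregatedScore": 50.0},
    "internet": [{"url": "http://example.com/", "matchedWords": 1}]
  },
  "notifications": {"alerts": [{"code": "suspected-ai-text"}]}
}
""")
print(summarize_scan(example))
```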

400 Bad Request

The request is invalid; for example, a required field such as filename is missing.

Example Response

A typical response from this endpoint:

{
  "filename": [
    "The filename field is required."
  ]
}

401 Unauthorized

Authentication failed or API key is invalid.

Example Response

A typical response from this endpoint:

{
  "type": "https://tools.ietf.org/html/rfc9110#section-15.5.2",
  "title": "Unauthorized",
  "status": 401,
  "traceId": "00-ef0db7690ced98431ac97782051edc77-2c4194d74ae6c08b-00"
}

409 Conflict

Conflict. A scan with the same ID already exists in the system.

Example Response

A typical response from this endpoint:

{
  "type": "https://tools.ietf.org/html/rfc9110#section-15.5.10",
  "title": "Conflict",
  "status": 409,
  "traceId": "00-561fe3b2451eb51ce489557a2f34f247-3acd41c0b4ad1295-00"
}

429 Too Many Requests

Rate limit exceeded. Please retry after the specified time.

Example Response

A typical response from this endpoint:

{
  "error": "Rate limit exceeded"
}
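A client that hits this limit can wait and retry. The sketch below is one way to do that: it honors a numeric Retry-After header if one is present (whether Copyleaks sends this header is an assumption, not stated above) and otherwise falls back to exponential backoff with random jitter. `send_with_retry` is a hypothetical helper, and `send` stands in for whatever function performs your HTTP call.

```python
import random
import time
from typing import Callable, Optional, Tuple


def send_with_retry(
    send: Callable[[], Tuple[int, Optional[str]]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-429 status or the attempts run out.

    send() must return (status_code, retry_after_header_or_None).
    Returns the last status code received.
    """
    status = 429
    for attempt in range(max_attempts):
        status, retry_after = send()
        if status != 429 or attempt == max_attempts - 1:
            return status
        if retry_after is not None and retry_after.isdigit():
            # Server-specified wait in whole seconds (if the header is present).
            delay = float(retry_after)
        else:
            # Exponential backoff with random jitter: base, 2*base, 4*base, ...
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
        sleep(delay)
    return status
```

With urllib, a `send` adapter can catch `urllib.error.HTTPError` and return `(e.code, e.headers.get("Retry-After"))`. Injecting `sleep` keeps the function testable without real waiting.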

PUT https://api.copyleaks.com/v3/scans/submit/ocr/my-scan-123
Content-Type: application/json
Authorization: Bearer YOUR_LOGIN_TOKEN

{
  "base64": "YOUR_BASE64_HERE",
  "filename": "image.jpg",
  "langCode": "en",
  "properties": {
    "webhooks": {
      "status": "https://my-server.com/webhook/{STATUS}"
    },
    "sandbox": true
  }
}

curl --request PUT \
  --url https://api.copyleaks.com/v3/scans/submit/ocr/my-scan-123 \
  --header 'Authorization: Bearer YOUR_LOGIN_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{
    "base64": "YOUR_BASE64_HERE",
    "filename": "image.jpg",
    "langCode": "en",
    "properties": {
      "webhooks": {
        "status": "https://my-server.com/webhook/{STATUS}"
      },
      "sandbox": true
    }
  }'
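The same request can be built with Python's standard library. This sketch constructs the request without sending it; `build_submit_ocr_request` is a hypothetical helper, and the token, base64 data, and webhook URL are the placeholders from the examples above. Percent-encoding the scan ID is an assumption, added because allowed symbols such as "/" would otherwise change the URL path.

```python
import json
import urllib.parse
import urllib.request

SUBMIT_OCR_URL = "https://api.copyleaks.com/v3/scans/submit/ocr/{scan_id}"


def build_submit_ocr_request(scan_id: str, token: str, image_b64: str, filename: str,
                             lang_code: str, status_webhook: str,
                             sandbox: bool = True) -> urllib.request.Request:
    """Build (but do not send) the PUT request shown above."""
    body = {
        "base64": image_b64,
        "filename": filename,
        "langCode": lang_code,
        "properties": {
            "webhooks": {"status": status_webhook},
            "sandbox": sandbox,
        },
    }
    # Percent-encode the scan ID so reserved characters cannot alter the path.
    url = SUBMIT_OCR_URL.format(scan_id=urllib.parse.quote(scan_id, safe=""))
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="PUT",
    )


req = build_submit_ocr_request("my-scan-123", "YOUR_LOGIN_TOKEN", "YOUR_BASE64_HERE",
                               "image.jpg", "en", "https://my-server.com/webhook/{STATUS}")
```

Sending it with `urllib.request.urlopen(req)` returns the response on success; error responses such as 401 or 409 raise `urllib.error.HTTPError`, whose `code` attribute holds the status.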