Excluding and Preventing Indexing of Content

You have granular control over what content is scanned and what data is stored when you submit a document to Copyleaks. This guide covers two distinct types of exclusion:

  1. Excluding Parts of a Document from Scan Analysis: This allows you to refine the plagiarism scan by ignoring specific elements like quotes or code blocks.
  2. Preventing a Document from Being Indexed: This allows you to control whether the entire document is added to the Copyleaks Internal Database for future comparisons.

The exclude object can contain the following properties. Most are boolean flags; documentTemplateIds takes an array of strings:

  • quotes: If set to true, all text within quotation marks will be ignored.
  • citations: If set to true, citations and references will be ignored.
  • references: If set to true, the bibliography or reference list will be ignored.
  • tableOfContents: If set to true, the table of contents will be ignored.
  • titles: If set to true, titles and headings will be ignored.
  • code: If set to true, code blocks will be ignored.
  • documentTemplateIds: An array of unique identifiers for predefined templates stored in your Private Cloud Hub or the Shared Data Hub. These templates’ content is then excluded from the document and won’t count towards plagiarism or AI analysis.
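
As a minimal sketch of how the properties listed above combine, the helper below assembles an exclude object in Python. The names build_exclude and BOOLEAN_FLAGS are hypothetical and not part of any Copyleaks SDK. The code property is left out on purpose, because the request example later in this guide passes it as an object rather than a plain boolean.

```python
import json
from typing import Any, Dict, Iterable, Optional

# Boolean flags taken from the property list above.
BOOLEAN_FLAGS = ("quotes", "citations", "references", "tableOfContents", "titles")


def build_exclude(
    flags: Iterable[str] = (),
    template_ids: Optional[Iterable[str]] = None,
) -> Dict[str, Any]:
    """Build an `exclude` object containing only the options that are switched on."""
    exclude: Dict[str, Any] = {}
    for name in flags:
        if name not in BOOLEAN_FLAGS:
            # Fail early on typos instead of sending an unknown property.
            raise ValueError(f"unknown exclude flag: {name}")
        exclude[name] = True
    if template_ids:
        exclude["documentTemplateIds"] = list(template_ids)
    return exclude


if __name__ == "__main__":
    example = build_exclude(["quotes", "references"], ["my-template-index"])
    print(json.dumps(example, indent=2))
```

The resulting dictionary is meant to be placed under properties.exclude in a scan submission payload.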

The exclude property is a powerful feature with two primary uses:

  1. Active Scans: When submitting a document for scanning, you can include the exclude object in the request payload to specify which parts of the document should be ignored during analysis. For example, you can exclude quotes, citations, or code blocks to focus on the most relevant content.

  2. Exclude Template: An exclude template refines the analysis by removing sections that match a predefined template document. The template document can be stored in either your Private Cloud Hub or the Shared Data Hub. For example, you can exclude the exam questions from a student's completed exam.

  1. Before you start, ensure you have the following:

  2. Choose your preferred method for making API calls.

    You can interact with the API using any standard HTTP client.

    For a quicker setup, we provide a Postman collection. See our Postman guide for instructions.

  3. Before performing a scan, generate an access token by calling the login endpoint with your account email and API key. The API key can be found on the Copyleaks API Dashboard.

    Upon successful authentication, you will receive a token that must be attached to subsequent API calls via the Authorization: Bearer <TOKEN> header. This token remains valid for 48 hours.

    POST https://id.copyleaks.com/v3/account/login/api
    Headers
    Content-Type: application/json
    Body
    {
      "email": "<YOUR_EMAIL>",
      "key": "00000000-0000-0000-0000-000000000000"
    }

    Response

    {
      "access_token": "<ACCESS_TOKEN>",
      ".issued": "2025-07-31T10:19:40.0690015Z",
      ".expires": "2025-08-02T10:19:40.0690016Z"
    }
  4. Use the following example to index a document as a template in your Private Cloud Hub.

    PUT https://api.copyleaks.com/v3/scans/submit/file/my-template-index
    Content-Type: application/json
    Authorization: Bearer YOUR_LOGIN_TOKEN
    {
      "base64": "VGhpcyBpcyBhIHRlc3QgZG9jdW1lbnQu",
      "filename": "student_solved_exam",
      "properties": {
        "indexing": {
          "action": 2,
          "repositories": ["my_private_cloud_exam_template"]
        },
        "sandbox": true
      }
    }
  5. Include the exclude object in the request payload to specify which parts of the document should be ignored during analysis.

    PUT https://api.copyleaks.com/v3/scans/submit/file/my-scan-with-template
    Content-Type: application/json
    Authorization: Bearer YOUR_LOGIN_TOKEN
    {
      "base64": "VGhpcyBpcyBhIHRlc3QgZG9jdW1lbnQu",
      "filename": "document-to-scan.txt",
      "properties": {
        "exclude": {
          "documentTemplateIds": ["my-template-index"],
          "quotes": true,
          "citations": true,
          "references": true,
          "tableOfContents": true,
          "titles": true,
          "htmlTemplate": true,
          "code": {
            "comments": true
          }
        },
        "sandbox": true
      }
    }
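
The login and submission steps above can be sketched in Python with only the standard library. This is an illustration under stated assumptions, not an official client: the function names (login_request, submit_request, send, main) and the environment variable names are hypothetical, while the URLs, headers, and payload fields are copied from the examples in this guide. Each request is built separately from being sent, so the payload can be inspected without calling the API.

```python
import base64
import json
import os
import urllib.request
from typing import Any, Dict

LOGIN_URL = "https://id.copyleaks.com/v3/account/login/api"
SUBMIT_URL = "https://api.copyleaks.com/v3/scans/submit/file/{scan_id}"


def login_request(email: str, key: str) -> urllib.request.Request:
    """Build the POST request that exchanges an email and API key for a token."""
    body = json.dumps({"email": email, "key": key}).encode("utf-8")
    return urllib.request.Request(
        LOGIN_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def submit_request(
    token: str,
    scan_id: str,
    content: bytes,
    filename: str,
    properties: Dict[str, Any],
) -> urllib.request.Request:
    """Build the PUT request that submits a file under the given scan ID."""
    payload = {
        "base64": base64.b64encode(content).decode("ascii"),
        "filename": filename,
        "properties": properties,
    }
    return urllib.request.Request(
        SUBMIT_URL.format(scan_id=scan_id),
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def send(request: urllib.request.Request) -> Dict[str, Any]:
    """Send a request and decode the JSON body, if the server returned one."""
    with urllib.request.urlopen(request) as response:
        raw = response.read()
    return json.loads(raw) if raw else {}


def main() -> None:
    # COPYLEAKS_EMAIL and COPYLEAKS_KEY are hypothetical variable names.
    email = os.environ.get("COPYLEAKS_EMAIL")
    key = os.environ.get("COPYLEAKS_KEY")
    if not email or not key:
        print("Set COPYLEAKS_EMAIL and COPYLEAKS_KEY to run against the API.")
        return
    token = send(login_request(email, key))["access_token"]
    document = b"This is a test document."
    # Step 4: index the template, using the values from the example above.
    template_props = {
        "indexing": {"action": 2, "repositories": ["my_private_cloud_exam_template"]},
        "sandbox": True,
    }
    send(submit_request(token, "my-template-index", document,
                        "student_solved_exam", template_props))
    # Step 5: scan a document while excluding the template and quotes.
    scan_props = {
        "exclude": {"documentTemplateIds": ["my-template-index"], "quotes": True},
        "sandbox": True,
    }
    print(send(submit_request(token, "my-scan-with-template", document,
                              "document-to-scan.txt", scan_props)))
```

Calling main() performs the three calls in order. Error handling (for example, catching urllib.error.HTTPError on a rejected token) is omitted to keep the sketch short.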

When the scan is processed, the scannedDocument object in the response reports the number of excluded words in its totalExcluded field.

201 Created

The scan was successfully created and is now processing.

Example Response

A typical response from this endpoint:

{
  "scannedDocument": {
    "scanId": "my-scan-exclude-example",
    "totalWords": 8,
    "totalExcluded": 4,
    "credits": 0,
    "expectedCredits": 1,
    "creationTime": "2025-08-10T10:00:00.000000Z",
    "metadata": {
      "filename": "document-with-exclusions.txt"
    },
    "enabled": {
      "plagiarismDetection": true,
      "aiDetection": false,
      "explainableAi": false,
      // ... truncated
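
To show how the word counts in this response might be used, the sketch below computes the share of words that were excluded. The function name excluded_share is hypothetical, and the calculation assumes totalExcluded is counted as part of totalWords, which this guide does not state explicitly.

```python
from typing import Any, Dict


def excluded_share(response: Dict[str, Any]) -> float:
    """Return the fraction of scanned words that were excluded (0.0 if empty)."""
    document = response["scannedDocument"]
    total = document["totalWords"]
    excluded = document["totalExcluded"]
    # Guard against division by zero for an empty document.
    return excluded / total if total else 0.0


# Values copied from the example response above.
sample = {
    "scannedDocument": {
        "scanId": "my-scan-exclude-example",
        "totalWords": 8,
        "totalExcluded": 4,
    }
}

print(f"{excluded_share(sample):.0%} of words excluded")  # prints "50% of words excluded"
```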