# Technical Specifications

> Copyleaks API technical specifications: page definition (250 words = 1 page), file size limits, and scanning constraints.

This page describes the technical specifications of the Copyleaks API.

## Page Definition

A page is defined as up to **250 words**. This means that every 250 words (or portion thereof) in your document counts as one page for billing purposes.

How Page Counting Works:
- 1-250 words = 1 page
- 251-500 words = 2 pages
- 501-750 words = 3 pages
- and so on, with each additional 250 words (or portion thereof) adding one page
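The counting rule above is a ceiling division by 250. The following is a minimal sketch of that arithmetic; the function name `pages_for_words` is hypothetical, and returning 0 pages for an empty document is an assumption, since this page does not say how empty documents are billed.

```python
WORDS_PER_PAGE = 250  # page definition from this section


def pages_for_words(word_count: int) -> int:
    """Return the number of pages a document of `word_count` words counts as."""
    if word_count < 0:
        raise ValueError("word_count must be non-negative")
    # Integer ceiling division: every started block of 250 words is one page.
    # Assumption: an empty document (0 words) yields 0 pages.
    return (word_count + WORDS_PER_PAGE - 1) // WORDS_PER_PAGE


if __name__ == "__main__":
    for words in (1, 250, 251, 500, 501, 750):
        print(words, "words ->", pages_for_words(words), "page(s)")
```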

## Input Limits

### Supported Plagiarism File Types

| Type        | File Types |
| ----------- | ---------- |
| Textual     | `html`, `htm`, `txt`, `csv`, `rtf`, `xml`, `md` |
| Non-Textual | `pdf`, `docx`, `doc`, `pptx`, `ppt`, `odt`, `chm`, `epub`, `odp`, `ppsx`, `pages`, `xlsx`, `xls`, `LaTeX` |
| Source code | `ts`, `py`, `go`, `cs`, `c`, `h`, `idc`, `cpp`, `hpp`, `c++`, `h++`, `cc`, `hh`, `java`, `js`, `swift`, `rb`, `pl`, `php`, `sh`, `m`, `scala`, `css` |

<Note>
  You can access this list programmatically. For more information, [click here](/reference/actions/miscellaneous/supported-plagiarism-file-types).
</Note>

### Supported Textual File Types

All supported plagiarism file types are also supported when submitted online by URL.

### Supported Image Types (OCR)

The supported image file types are `pdf`, `docx`, `gif`, `png`, `bmp`, `jpg`, and `jpeg`. The files must contain textual content. These files can be submitted by upload only.

<Note>
  You can access this list programmatically. For more information, [click here](/reference/actions/miscellaneous/ocr-supported-languages).
</Note>

### Supported Plagiarism Languages

| Setting                      | Description  |
|------------------------------|-------------|
| **Supported Languages**      | All languages supported by Unicode, including English, Spanish, French, Portuguese, Arabic, Russian, German, Greek, Chinese, Japanese, and more. [More info](https://unicode.org/standard/supported.html). |
| **Supported OCR Languages**  | See full list [here](/reference/actions/miscellaneous/ocr-supported-languages). |
| **Supported Cross Languages** | See full list [here](/reference/actions/miscellaneous/supported-cross-languages). |
| **Maximum Document Length**  | The maximum length allowed is **2000 pages** (500K words). |

### File Size

| Description                                       | Max Upload File Size |
| ------------------------------------------------- | -------------------- |
| HTML files (`html`, `htm`, ...)                   | 5 MB                 |
| Text files (`txt`, `csv`) and source-code         | 3 MB                 |
| Non-Textual Documents (`pdf`, `doc`, `docx`, ...) | 50 MB                |
| Image Types (`jpg`, `png`, `bmp`, ...)            | 25 MB                |
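A client can check a file against these limits before uploading it. The sketch below is one way to do that and is not part of the Copyleaks API: the names `max_upload_bytes` and `fits_limit` are hypothetical, 1 MB is assumed to mean 1,000,000 bytes (the more conservative reading, since this page does not define the unit), and only extensions named explicitly in the size table and the source-code list are mapped. Extensions covered only by the table's `...` return `None` rather than a guessed limit.

```python
from pathlib import Path
from typing import Optional

MB = 1_000_000  # assumption: decimal megabytes; the page does not define the unit

# Extensions named explicitly in the File Size table and the source-code list.
HTML_EXTS = {"html", "htm"}
TEXT_EXTS = {"txt", "csv"}
SOURCE_EXTS = {
    "ts", "py", "go", "cs", "c", "h", "idc", "cpp", "hpp", "c++", "h++", "cc",
    "hh", "java", "js", "swift", "rb", "pl", "php", "sh", "m", "scala", "css",
}
DOCUMENT_EXTS = {"pdf", "doc", "docx"}
IMAGE_EXTS = {"jpg", "png", "bmp"}


def max_upload_bytes(filename: str) -> Optional[int]:
    """Return the upload limit in bytes, or None if the extension is not mapped."""
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext in HTML_EXTS:
        return 5 * MB
    if ext in TEXT_EXTS or ext in SOURCE_EXTS:
        return 3 * MB
    if ext in DOCUMENT_EXTS:
        return 50 * MB
    if ext in IMAGE_EXTS:
        return 25 * MB
    return None  # not listed explicitly; check the documentation instead of guessing


def fits_limit(filename: str, size_bytes: int) -> bool:
    """True only when the extension is mapped and the size is within its limit."""
    limit = max_upload_bytes(filename)
    return limit is not None and size_bytes <= limit
```

For a file on disk, the size can be read with `Path(path).stat().st_size` and passed as `size_bytes`.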

## Rate Limit

By default, each account is limited to 10 requests per second. If you need a higher rate, [contact us](https://help.copyleaks.com/s/contactsupport).

<Warning>
  **Rate limit exceeded:** If your host exceeds its API rate limit, you will receive HTTP error 429 (Too Many Requests) and will be unable to authenticate with the Copyleaks API for 5 minutes.
</Warning>
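Because a 429 response blocks authentication for 5 minutes, it can be worth throttling requests on the client side. The following is a minimal sketch of such a throttle, not a Copyleaks SDK feature: the class name `RequestThrottle` is hypothetical, and it spaces calls evenly (one every 0.1 seconds at the default rate), which is stricter than the stated limit because it allows no bursts. How the server measures the limit is not described here, so treat this as a conservative assumption.

```python
import time


class RequestThrottle:
    """Spaces calls so that at most `rate_per_second` requests start per second."""

    def __init__(self, rate_per_second: float = 10, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / rate_per_second  # minimum gap between requests
        self._clock = clock                     # injectable for testing
        self._sleep = sleep                     # injectable for testing
        self._next_allowed = 0.0                # earliest time the next call may start

    def wait(self) -> float:
        """Block until the next request may be sent; return the seconds waited."""
        now = self._clock()
        delay = self._next_allowed - now
        if delay > 0:
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self._interval
        return max(delay, 0.0)


if __name__ == "__main__":
    throttle = RequestThrottle()
    for i in range(3):
        waited = throttle.wait()  # call this right before each API request
        print(f"request {i}: waited {waited:.3f}s")
```

A single instance is meant to be shared by all code paths that call the API from one account. The class is not thread-safe as written.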

## Maintenance Periods

When our servers are under maintenance, you will receive a `503` HTTP status code. Please wait a full minute and try again.

For more information about the service status, see [Copyleaks System Status](https://status.copyleaks.com).
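The wait-and-retry advice for `503` responses can be sketched as a small helper. This is an illustration under assumptions, not an official client: `call_with_maintenance_retry` and its `max_attempts` parameter are hypothetical, and `send` stands for any function of yours that performs one API request and returns its HTTP status code.

```python
import time
from typing import Callable

MAINTENANCE_STATUS = 503
MAINTENANCE_WAIT_SECONDS = 60  # "wait a full minute" from the text above


def call_with_maintenance_retry(
    send: Callable[[], int],
    max_attempts: int = 3,  # hypothetical cap; the page does not suggest a number
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `send` until it stops returning 503 or the attempts run out.

    Returns the last HTTP status code received.
    """
    status = send()
    for _ in range(max_attempts - 1):
        if status != MAINTENANCE_STATUS:
            break
        sleep(MAINTENANCE_WAIT_SECONDS)  # wait a full minute before retrying
        status = send()
    return status
```

If the helper still returns 503 after the final attempt, the caller decides whether to queue the request for later or surface the error.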

## Time

| Setting                     | Value               |
|-----------------------------|---------------------|
| **Time Format**             | `dd/MM/yyyy HH:mm:ss` |
| **Time Zone**               | UTC                 |
| **Default HTTP Request Timeout** | 110 seconds        |
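The pattern `dd/MM/yyyy HH:mm:ss` corresponds to `%d/%m/%Y %H:%M:%S` in Python's `strptime`/`strftime` notation. The sketch below parses and formats such values as UTC. The helper names are hypothetical, and which request or response fields use this format is not stated on this page.

```python
from datetime import datetime, timezone

# Python equivalent of the documented pattern dd/MM/yyyy HH:mm:ss
API_TIME_FORMAT = "%d/%m/%Y %H:%M:%S"


def parse_api_time(value: str) -> datetime:
    """Parse a timestamp in the documented format into an aware UTC datetime."""
    return datetime.strptime(value, API_TIME_FORMAT).replace(tzinfo=timezone.utc)


def format_api_time(moment: datetime) -> str:
    """Format an aware datetime in the documented format, converted to UTC.

    Pass timezone-aware datetimes; Python treats naive ones as local time.
    """
    return moment.astimezone(timezone.utc).strftime(API_TIME_FORMAT)


if __name__ == "__main__":
    parsed = parse_api_time("03/04/2024 15:30:05")  # 3 April 2024, day comes first
    print(parsed.isoformat())
    print(format_api_time(parsed))
```

Note that the day precedes the month, so `03/04/2024` is 3 April, not 4 March.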

## Scan Expiration

Scans created using the [/v3/submit](/reference/actions/authenticity/submit-file) endpoints are stored on Copyleaks servers for a limited time. You can control the expiration of your scans in your submit request. Make sure you save your data before it expires:

| Type               | Hours |
| ------------------ | ----- |
| Max Expiration     | 2880  |
| Default Expiration | 2880  |
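For planning when to save results, the expiration values above convert directly to a deadline: 2880 hours is 120 days. The following is a minimal sketch with a hypothetical `expiry_time` helper. The lower bound of 1 hour is an assumption, since this page lists only the maximum and the default, and the name of the expiration field in the submit request is not given here.

```python
from datetime import datetime, timedelta, timezone

MAX_EXPIRATION_HOURS = 2880      # 120 days
DEFAULT_EXPIRATION_HOURS = 2880  # same as the maximum


def expiry_time(created_at: datetime, expiration_hours: int = DEFAULT_EXPIRATION_HOURS) -> datetime:
    """Return the moment a scan created at `created_at` would expire."""
    # Assumption: the minimum is 1 hour; only the maximum is documented here.
    if not 1 <= expiration_hours <= MAX_EXPIRATION_HOURS:
        raise ValueError(f"expiration_hours must be between 1 and {MAX_EXPIRATION_HOURS}")
    return created_at + timedelta(hours=expiration_hours)


if __name__ == "__main__":
    created = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print("expires at", expiry_time(created).isoformat())
```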

## Frequently Asked Questions

### How does Copyleaks count pages for billing?

A page is defined as up to 250 words. Every 250 words (or portion thereof) counts as one page, so 1-250 words is 1 page, 251-500 words is 2 pages, and so on.

### What is the maximum file size I can submit?

It depends on the file type: 50 MB for non-textual documents (PDF, DOC, DOCX), 25 MB for images submitted to OCR, 5 MB for HTML files, and 3 MB for text and source-code files.

### What is the maximum document length?

2000 pages, which corresponds to up to 500,000 words (2000 × 250).

### What is the Copyleaks API rate limit?

10 requests per second by default. Exceeding it returns HTTP 429 (Too Many Requests) and blocks authentication for 5 minutes. Contact Copyleaks if you need a higher rate.

### How long are scans stored before they expire?

Scans are stored for 2880 hours (120 days) by default, which is also the maximum. You can set a shorter expiration in the submit request, so save your results before they expire.
