OCR Supported Languages

Q: How do I get the current list of OCR languages?

Call GET https://api.copyleaks.com/v3/miscellaneous/ocr-languages-list. It is a public endpoint that needs no authentication. Copyleaks keeps adding languages, so load the list at runtime instead of hardcoding it.

Q: What language code format does OCR use?

ISO-639-1 codes (for example en, fr, ar), with zh-CN for Simplified Chinese and zh-TW for Traditional Chinese. A few entries in the table, such as ceb, haw, and hmn, are three-letter codes.

curl --request GET \
  --url https://api.copyleaks.com/v3/miscellaneous/ocr-languages-list

["af", "sq", "az", "...", "zu"]

Get a list of the supported languages for OCR

This list applies only to OCR file scans. It is not the list of languages supported by the API as a whole.

Response

200 OK - The supported language codes in ISO-639-1 standard.

["af", "sq", "az", "...", "zu"]
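As a rough illustration of loading the list at runtime, the following Python sketch calls the endpoint with the standard library and parses the JSON array. The names `parse_ocr_languages` and `fetch_ocr_languages`, the 10-second timeout, and the check on the response shape are assumptions made for this example; they are not part of the Copyleaks API.

```python
import json
import urllib.request

OCR_LANGUAGES_URL = "https://api.copyleaks.com/v3/miscellaneous/ocr-languages-list"


def parse_ocr_languages(body: bytes) -> list[str]:
    """Decode a response body that is expected to be a JSON array of codes."""
    codes = json.loads(body.decode("utf-8"))
    # Assumption: anything other than a flat array of strings is treated as an error.
    if not isinstance(codes, list) or not all(isinstance(c, str) for c in codes):
        raise ValueError("expected a JSON array of strings")
    return codes


def fetch_ocr_languages(url: str = OCR_LANGUAGES_URL, timeout: float = 10.0) -> list[str]:
    """Fetch the current OCR language codes. The endpoint needs no authentication."""
    request = urllib.request.Request(url, method="GET")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_ocr_languages(response.read())
```

Calling `fetch_ocr_languages()` once at startup and keeping the result in memory would avoid a request per scan. How often to refresh it is left to the caller.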

We keep adding new languages to the list, so we recommend loading it at runtime rather than copying it into your code.

These are the language codes supported by our OCR scan, in ISO-639-1 standard:

Code	Language	Code	Language
af	Afrikaans	am	Amharic
ar	Arabic	az	Azerbaijani
be	Belarusian	bg	Bulgarian
bn	Bengali	bs	Bosnian
ca	Catalan	ceb	Cebuano
co	Corsican	cs	Czech
cy	Welsh	da	Danish
de	German	el	Greek
en	English	eo	Esperanto
es	Spanish	et	Estonian
eu	Basque	fa	Persian
fi	Finnish	fr	French
fy	Frisian	ga	Irish
gd	Scottish Gaelic	gl	Galician
gu	Gujarati	ha	Hausa
haw	Hawaiian	hi	Hindi
hmn	Hmong	hr	Croatian
ht	Haitian Creole	hu	Hungarian
hy	Armenian	id	Indonesian
ig	Igbo	is	Icelandic
it	Italian	iw	Hebrew
ja	Japanese	jw	Javanese
ka	Georgian	kk	Kazakh
km	Khmer	kn	Kannada
ko	Korean	ku	Kurdish
ky	Kyrgyz	la	Latin
lb	Luxembourgish	lo	Lao
lt	Lithuanian	lv	Latvian
ma	Marathi	mg	Malagasy
mi	Maori	mk	Macedonian
ml	Malayalam	mn	Mongolian
mr	Marathi	ms	Malay
mt	Maltese	my	Burmese
ne	Nepali	nl	Dutch
no	Norwegian	ny	Chichewa
pl	Polish	ps	Pashto
pt	Portuguese	ro	Romanian
ru	Russian	sd	Sindhi
si	Sinhala	sk	Slovak
sl	Slovenian	sm	Samoan
sn	Shona	so	Somali
sq	Albanian	sr	Serbian
st	Sesotho	su	Sundanese
sv	Swedish	sw	Swahili
ta	Tamil	te	Telugu
tg	Tajik	th	Thai
tl	Tagalog	tr	Turkish
uk	Ukrainian	ur	Urdu
uz	Uzbek	vi	Vietnamese
xh	Xhosa	yi	Yiddish
yo	Yoruba	zh-CN	Chinese (Simplified)
zh-TW	Chinese (Traditional)	zu	Zulu
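Before submitting an OCR scan, a client may want to check a requested code against the loaded list. The sketch below is one hypothetical way to do that. The helper name `resolve_ocr_language` and the choice to ignore case and surrounding whitespace (so that `zh-cn` resolves to `zh-CN`) are assumptions for illustration, not documented API behavior.

```python
from typing import Iterable, Optional


def resolve_ocr_language(code: str, supported: Iterable[str]) -> Optional[str]:
    """Return the list's own spelling of `code`, or None if it is not listed.

    Assumption: matching ignores case and surrounding whitespace.
    The API itself may be stricter about the exact spelling.
    """
    wanted = code.strip().lower()
    for candidate in supported:
        if candidate.lower() == wanted:
            return candidate
    return None


# A small sample taken from the table above. In practice, pass the list
# loaded from the endpoint at runtime.
sample = ["en", "fr", "ar", "zh-CN", "zh-TW"]
print(resolve_ocr_language("zh-cn", sample))  # zh-CN
print(resolve_ocr_language("xx", sample))     # None
```

Returning the canonical spelling rather than a boolean lets the caller pass the exact code from the list on to later requests.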

Frequently asked questions

What are OCR supported languages used for?

They apply only to OCR scans, where Copyleaks extracts text from images and scanned documents. This is not the general language list for plagiarism or AI detection.

Does OCR support non-Latin scripts like Arabic, Chinese, and Hindi?

Yes. The OCR engine supports 100+ languages, including Arabic (ar), Chinese (zh-CN, zh-TW), Hindi (hi), Japanese (ja), Korean (ko), and many more.
