OCR your documents before indexing
Andy Scherzinger 8d22f0582f
Merge pull request #64 from nextcloud/misc/pr-feedback
pr-feedback
2023-10-06 16:49:33 +02:00

files_fulltextsearch_tesseract

OCR your documents before indexing

Installation / Setup

  • install Tesseract

  • download language files from: https://github.com/tesseract-ocr/tessdata

  • copy the language files into /usr/share/tessdata/ (or /usr/share/tesseract-ocr/tessdata/, depending on your distribution)

  • configure this app in the Full text search Admin panel

  • report bugs
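
To check the first three steps before configuring the app, a small script along these lines may help. It is only a sketch, not part of this app: it assumes the Tesseract binary is named `tesseract` and is on the PATH, that language files use the `.traineddata` extension as in the tessdata repository, and that they sit in one of the two directories listed above. The function names `find_language_files` and `check_setup` are made up for this example.

```python
import shutil
from pathlib import Path

# Candidate tessdata directories taken from the setup list above.
# Assumption: your distribution uses one of these two locations.
TESSDATA_DIRS = [
    Path("/usr/share/tessdata"),
    Path("/usr/share/tesseract-ocr/tessdata"),
]


def find_language_files(dirs=TESSDATA_DIRS):
    """Map each existing directory to the sorted language codes found in it."""
    found = {}
    for d in dirs:
        d = Path(d)
        if d.is_dir():
            # A file such as eng.traineddata yields the language code "eng".
            found[str(d)] = sorted(p.stem for p in d.glob("*.traineddata"))
    return found


def check_setup(dirs=TESSDATA_DIRS):
    """Return (path to the tesseract binary or None, language files per directory)."""
    return shutil.which("tesseract"), find_language_files(dirs)


if __name__ == "__main__":
    binary, languages = check_setup()
    print("tesseract binary:", binary or "not found on PATH")
    for directory, codes in languages.items():
        print(directory, "->", ", ".join(codes) or "no .traineddata files")
    if not languages:
        print("no tessdata directory found in the expected locations")
```

If the script reports no binary or no language files, revisit the corresponding step above before enabling OCR in the admin panel.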

More

Dev blog post about PDF and OCR: https://daita.github.io/files-fulltextsearch-tesseract-ocr-pdf/