* Add demo gifs
* Add gifs to documentation
Jianjie Liu 2021-07-29 18:39:26 -04:00, committed by GitHub
Parent adf3f3c457
Commit 530c1569df
No known key found for this signature
GPG key ID: 4AEE18F83AFDEB23
4 changed files: 45 additions and 0 deletions


@@ -4,6 +4,8 @@
Genalog is an open source, cross-platform python package for **gen**erating document images with synthetic noise that mimics scanned an**alog** documents (thus the name `genalog`). You can also add various text degradations to these images. The purpose of this tool is to provide a fast and efficient way to generate synthetic documents from text data by leveraging layout from templates that you create in simple HTML format.
![demo-gif](docs/genalog_docs/static/genalog_demo.gif)
Overview
-------------------------------------
Genalog has various capabilities:


@@ -11,6 +11,11 @@ pip install genalog
`genalog` is an open source, cross-platform python package for **gen**erating document images with synthetic noise that mimics scanned an**alog** documents (thus the name `genalog`). You can also add various text degradations to these images. The purpose of this tool is to provide a fast and efficient way to generate synthetic documents from text data by leveraging layout from templates that you can create in simple HTML format.
```{figure} static/genalog_demo.gif
:width: 80%
Generate documents and apply degradations
```
`genalog` provides several document templates as a start. You can alter the document layout using standard CSS properties like `font-family`, `font-size`, `text-align`, etc. Here are some of the example generated documents:
````{tab} Multi-Column

Binary data
docs/genalog_docs/static/genalog_demo.gif Normal file

Binary file not shown.

Size: 5.2 MiB

example/demo_generate.py Normal file

@@ -0,0 +1,38 @@
#%%
from genalog.pipeline import AnalogDocumentGeneration
from genalog.degradation.degrader import ImageState
sample_text = "sample/generation/example.txt"
# Common CSS properties
STYLE_COMBINATIONS = {
"font_family" : ["Times"], # sans-serif, Times, monospace, etc
"font_size" : ["12px"],
"text_align" : ["justify"], # left, right, center, justify
"language" : ["en_US"], # controls how words are hyphenated
"hyphenate" : [True],
}
# <columns|letter|text_block>.html.jinja
HTML_TEMPLATE = "columns.html.jinja"
# Degradation effects applied in sequence
DEGRADATIONS = [
("blur", {"radius": 5}), # needs to be an odd number
("bleed_through", {
"src": ImageState.CURRENT_STATE, "background": ImageState.ORIGINAL_STATE,
"alpha": 0.8,
"offset_y": 9, "offset_x": 12
}),
("morphology", {"operation": "open", "kernel_shape":(5,5)}),
("pepper", {"amount": 0.05}),
("salt", {"amount": 0.2}),
]
doc_generation = AnalogDocumentGeneration(styles=STYLE_COMBINATIONS, degradations=DEGRADATIONS)
img_array = doc_generation.generate_img(sample_text, HTML_TEMPLATE, target_folder=None)
import cv2
from IPython.display import Image, display
# Encode the generated image array as PNG and show it inline in the notebook
_, encoded_image = cv2.imencode('.png', img_array)
display(Image(data=encoded_image.tobytes(), width=600))
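The `DEGRADATIONS` list in the demo script pairs each effect name with its keyword arguments and applies the effects in order. The following is a minimal sketch of that pattern using only the standard library. It is not genalog's implementation: the `EFFECTS` registry, the `apply_in_sequence` helper, and the simplified `pepper` and `salt` functions are hypothetical names introduced for illustration, and the image is a plain nested list of grayscale values rather than an array.

```python
import random


def pepper(img, amount, rng):
    """Return a copy with roughly `amount` of the pixels set to black (0)."""
    return [[0 if rng.random() < amount else px for px in row] for row in img]


def salt(img, amount, rng):
    """Return a copy with roughly `amount` of the pixels set to white (255)."""
    return [[255 if rng.random() < amount else px for px in row] for row in img]


# Hypothetical registry mapping effect names to functions (an assumption,
# not genalog's internal lookup).
EFFECTS = {"pepper": pepper, "salt": salt}


def apply_in_sequence(img, degradations, seed=0):
    """Apply (name, kwargs) pairs one after another; each sees the previous output."""
    rng = random.Random(seed)
    for name, kwargs in degradations:
        img = EFFECTS[name](img, rng=rng, **kwargs)
    return img


# A tiny 3x4 mid-gray "image"
gray = [[128] * 4 for _ in range(3)]
noisy = apply_in_sequence(
    gray,
    [("pepper", {"amount": 0.05}), ("salt", {"amount": 0.2})],
)
```

Because each effect receives the output of the previous one, order matters: in this sketch, salting every pixel and then peppering every pixel leaves an all-black image, while the reverse order leaves an all-white one.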