mirror of https://github.com/microsoft/genalog.git
Laserprec/add-demo-gif (#36)
* Add demo gifs
* Add gifs to documentation
Parent adf3f3c457
Commit 530c1569df
@@ -4,6 +4,8 @@
Genalog is an open source, cross-platform python package for **gen**erating document images with synthetic noise that mimics scanned an**alog** documents (thus the name `genalog`). You can also add various text degradations to these images. The purpose of this tool is to provide a fast and efficient way to generate synthetic documents from text data by leveraging layout from templates that you create in simple HTML format.

![demo-gif](docs/genalog_docs/static/genalog_demo.gif)

Overview
-------------------------------------
Genalog has various capabilities:

@@ -11,6 +11,11 @@ pip install genalog
`genalog` is an open source, cross-platform python package for **gen**erating document images with synthetic noise that mimics scanned an**alog** documents (thus the name `genalog`). You can also add various text degradations to these images. The purpose of this tool is to provide a fast and efficient way to generate synthetic documents from text data by leveraging layout from templates that you can create in simple HTML format.

```{figure} static/genalog_demo.gif
:width: 80%
Generate documents and apply degradations
```

`genalog` provides several document templates as a start. You can alter the document layout using standard CSS properties like `font-family`, `font-size`, `text-align`, etc. Here are some of the example generated documents:

````{tab} Multi-Column

Binary file not shown.
Size: 5.2 MiB
@@ -0,0 +1,38 @@
#%%
from genalog.pipeline import AnalogDocumentGeneration
from genalog.degradation.degrader import ImageState

sample_text = "sample/generation/example.txt"

# Common CSS properties
STYLE_COMBINATIONS = {
    "font_family" : ["Times"], # sans-serif, Times, monospace, etc
    "font_size" : ["12px"],
    "text_align" : ["justify"], # left, right, center, justify
    "language" : ["en_US"], # controls how words are hyphenated
    "hyphenate" : [True],
}
# <columns|letter|text_block>.html.jinja
HTML_TEMPLATE = "columns.html.jinja"
# Degradation effects applied in sequence
DEGRADATIONS = [
    ("blur", {"radius": 5}), # needs to be an odd number
    ("bleed_through", {
        "src": ImageState.CURRENT_STATE, "background": ImageState.ORIGINAL_STATE,
        "alpha": 0.8,
        "offset_y": 9, "offset_x": 12
    }),
    ("morphology", {"operation": "open", "kernel_shape":(5,5)}),
    ("pepper", {"amount": 0.05}),
    ("salt", {"amount": 0.2}),
]

doc_generation = AnalogDocumentGeneration(styles=STYLE_COMBINATIONS, degradations=DEGRADATIONS)
img_array = doc_generation.generate_img(sample_text, HTML_TEMPLATE, target_folder=None)

import cv2
from IPython.core.display import Image, display

_, encoded_image = cv2.imencode('.png', img_array)
display(Image(data=encoded_image, width=600))

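The `DEGRADATIONS` list above pairs each effect name with its keyword arguments and, per its comment, applies the effects in sequence. The following is a minimal sketch of that pattern only, not genalog's implementation: `EFFECTS`, `apply_degradations`, and the pure-Python `salt` and `pepper` functions are hypothetical stand-ins that work on a nested list of grayscale values instead of an image array, and the reading of `amount` as a per-pixel probability is an assumption.

```python
import random


def salt(img, amount, rng):
    """Hypothetical stand-in: turn roughly `amount` of the pixels white (255)."""
    return [[255 if rng.random() < amount else px for px in row] for row in img]


def pepper(img, amount, rng):
    """Hypothetical stand-in: turn roughly `amount` of the pixels black (0)."""
    return [[0 if rng.random() < amount else px for px in row] for row in img]


# Hypothetical dispatch table mapping effect names to functions.
EFFECTS = {"salt": salt, "pepper": pepper}


def apply_degradations(img, degradations, seed=0):
    """Apply (name, kwargs) pairs in order; each effect sees the previous output."""
    rng = random.Random(seed)  # seeded so repeated runs give the same noise
    for name, kwargs in degradations:
        img = EFFECTS[name](img, rng=rng, **kwargs)
    return img


# An 8x8 mid-gray "image" as a nested list (assumption: one int per pixel).
gray = [[128] * 8 for _ in range(8)]
noisy = apply_degradations(
    gray,
    [("pepper", {"amount": 0.05}), ("salt", {"amount": 0.2})],
)
```

Because each effect receives the output of the previous one, the order of the list matters: a later effect can overwrite pixels changed by an earlier one.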