How to read this notebook:
jupyter lab
Also install the table of contents extension for Jupyter Lab, since this is a lengthy notebook.
You can run the code cells now.
Import the fusus package. If you do not have it yet, see get fusus.
from fusus.book import Book
B = Book(cd="~/github/among/fusus/example")
# cd to the book directory
!cd `pwd`
B.availablePages()
1.19s 18 pages: 47-48,58-59,63,67,101-102,111-113,121-122,131-132,200,300,400
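The page list above is shown in a compact form in which consecutive page numbers are collapsed into ranges. As a rough illustration of that notation only (the helper `format_pages` is hypothetical and is not part of fusus), a sorted list of page numbers can be compressed like this:

```python
def format_pages(pages):
    """Compress page numbers into a compact spec such as '47-48,58'.

    Hypothetical helper for illustration; not the fusus implementation.
    """
    parts = []
    start = prev = None
    for p in sorted(set(pages)):
        if start is None:
            start = prev = p          # open the first run
        elif p == prev + 1:
            prev = p                  # extend the current run
        else:
            # close the current run and open a new one
            parts.append(str(start) if start == prev else f"{start}-{prev}")
            start = prev = p
    if start is not None:
        parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


# The 18 page numbers reported by B.availablePages() above.
available = [47, 48, 58, 59, 63, 67, 101, 102, 111, 112, 113,
             121, 122, 131, 132, 200, 300, 400]
print(format_pages(available))
# 47-48,58-59,63,67,101-102,111-113,121-122,131-132,200,300,400
```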
lastPage = B.process(pages=47, doOcr=False)
0.00s Batch of 1 pages: 47
0.00s Start batch processing images
  | 0.87s 1 047.tif
0.88s all done
lastPage = B.process(pages="47,58", doOcr=False)
0.00s Batch of 2 pages: 47,58
0.00s Start batch processing images
  | 0.79s 1 047.tif
  | 0.80s 2 058.tif
1.59s all done
lastPage = B.process(pages="47,58,122-250", doOcr=False)
0.00s Batch of 6 pages: 47,58,122,131-132,200
0.00s Start batch processing images
  | 0.79s 1 047.tif
  | 0.72s 2 058.tif
  | 1.08s 3 122.jpg
  | 1.56s 4 131.jpg
  | 1.09s 5 132.jpg
  | 3.20s 6 200.tif
8.44s all done
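The three calls above suggest how the `pages` argument behaves: it accepts a single number or a comma-separated specification, and a range such as `122-250` selects only the pages in that range that actually exist. A minimal sketch of that selection logic, assuming this reading of the logs is right (the function `select_pages` is hypothetical and is not the fusus implementation):

```python
def select_pages(spec, available):
    """Return the available pages matched by a spec like '47,58,122-250'.

    Hypothetical helper for illustration; not the fusus implementation.
    """
    available = set(available)
    wanted = set()
    for part in str(spec).split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range keeps only the pages that exist between its bounds.
            low, high = (int(x) for x in part.split("-", 1))
            wanted.update(p for p in available if low <= p <= high)
        else:
            page = int(part)
            if page in available:
                wanted.add(page)
    return sorted(wanted)


available = [47, 48, 58, 59, 63, 67, 101, 102, 111, 112, 113,
             121, 122, 131, 132, 200, 300, 400]
print(select_pages(47, available))               # [47]
print(select_pages("47,58", available))          # [47, 58]
print(select_pages("47,58,122-250", available))  # [47, 58, 122, 131, 132, 200]
```

The last result corresponds to the batch of 6 pages reported in the log above.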
All pages, but no OCR:
lastPage = B.process(doOcr=False)
0.00s Batch of 18 pages: 47-48,58-59,63,67,101-102,111-113,121-122,131-132,200,300,400
0.00s Start batch processing images
  | 1.19s 1 047.tif
  | 1.24s 2 048.tif
  | 1.23s 3 058.tif
  | 1.24s 4 059.tif
  | 1.26s 5 063.tif
  | 1.29s 6 067.tif
  | 0.99s 7 101.jpg
  | 1.11s 8 102.jpg
  | 1.54s 9 111.jpg
  | 1.60s 10 112.jpg
  | 1.52s 11 113.jpg
  | 1.44s 12 121.jpg
  | 1.47s 13 122.jpg
  | 2.04s 14 131.jpg
  | 1.89s 15 132.jpg
  | 6.15s 16 200.tif
  | 5.44s 17 300.tif
  | 6.34s 18 400.tif
39s all done
Finally, all pages, the complete pipeline.
Note that the Kraken model for Arabic printed text is loaded on demand and then kept in memory.
lastPage = B.process()
0.00s Batch of 18 pages: 47-48,58-59,63,67,101-102,111-113,121-122,131-132,200,300,400
0.00s Start batch processing images
  | 0.96s Loading for Kraken: ~/github/among/fusus/model/arabic_generalized.mlmodel
  | 8.43s model loaded
  | 11s 1 047.tif
  | 2.80s 2 048.tif
  | 2.69s 3 058.tif
  | 3.71s 4 059.tif
  | 3.70s 5 063.tif
  | 3.58s 6 067.tif
  | 4.51s 7 101.jpg
  | 5.32s 8 102.jpg
  | 8.14s 9 111.jpg
  | 7.75s 10 112.jpg
  | 7.16s 11 113.jpg
  | 6.61s 12 121.jpg
  | 6.18s 13 122.jpg
  | 9.09s 14 131.jpg
  | 7.42s 15 132.jpg
  | 26s 16 200.tif
  | 16s 17 300.tif
  | 19s 18 400.tif
2m 30s all done