Export a Text-Fabric (TF) dataset to Pandas
from tf.app import use
A = use("CLARIAH/wp6-ferdinandhuyck:clone", checkout="clone", hoist=globals())
Locating corpus resources ...
Name | # of nodes | # slots/node | % coverage |
---|---|---|---|
text | 1 | 218025.00 | 100 |
body | 1 | 218018.00 | 100 |
div | 42 | 5190.90 | 100 |
chapter | 44 | 4963.18 | 100 |
fileDesc | 1 | 299.00 | 0 |
editionStmt | 1 | 268.00 | 0 |
p | 3725 | 58.04 | 99 |
chunk | 3833 | 56.93 | 100 |
lg | 41 | 23.34 | 0 |
ebook | 1 | 21.00 | 0 |
pod | 1 | 21.00 | 0 |
note | 9 | 20.89 | 0 |
sourceDesc | 1 | 16.00 | 0 |
bibl | 2 | 13.00 | 0 |
revisionDesc | 1 | 12.00 | 0 |
q | 27 | 9.04 | 0 |
head | 86 | 8.40 | 0 |
titleStmt | 1 | 8.00 | 0 |
l | 122 | 7.80 | 0 |
interpGrp | 1 | 7.00 | 0 |
change | 2 | 6.00 | 0 |
publicationStmt | 1 | 5.00 | 0 |
title | 3 | 5.00 | 0 |
item | 2 | 4.00 | 0 |
hi | 602 | 3.50 | 1 |
author | 3 | 3.00 | 0 |
imprint | 2 | 3.00 | 0 |
encodingDesc | 1 | 2.00 | 0 |
notesStmt | 1 | 2.00 | 0 |
order | 2 | 2.00 | 0 |
availability | 3 | 1.67 | 0 |
name | 268 | 1.21 | 0 |
idno | 9 | 1.11 | 0 |
blurb | 2 | 1.00 | 0 |
colofon | 2 | 1.00 | 0 |
date | 4 | 1.00 | 0 |
figure | 5 | 1.00 | 0 |
interp | 7 | 1.00 | 0 |
price | 2 | 1.00 | 0 |
pubPlace | 2 | 1.00 | 0 |
publisher | 2 | 1.00 | 0 |
respStmt | 2 | 1.00 | 0 |
titlepage | 2 | 1.00 | 0 |
xptr | 5 | 1.00 | 0 |
word | 218380 | 1.00 | 100 |
c1 = F.otype.s("chunk")[100]
c1
218538
A.plain(c1)
# inTypes="" limits the containment columns (in_...) to the section types
A.exportPandas(inTypes="")
0.00s Create tsv file ...
| 0.17s 5% 11363 nodes written
| 0.33s 10% 22726 nodes written
| 0.49s 15% 34089 nodes written
| 0.65s 20% 45452 nodes written
| 0.81s 25% 56815 nodes written
| 0.97s 30% 68178 nodes written
| 1.13s 35% 79541 nodes written
| 1.29s 40% 90904 nodes written
| 1.45s 45% 102267 nodes written
| 1.61s 50% 113630 nodes written
| 1.77s 55% 124993 nodes written
| 1.93s 60% 136356 nodes written
| 2.09s 65% 147719 nodes written
| 2.25s 70% 159082 nodes written
| 2.41s 75% 170445 nodes written
| 2.57s 80% 181808 nodes written
| 2.73s 85% 193171 nodes written
| 2.89s 90% 204534 nodes written
| 3.05s 95% 215897 nodes written
| 3.22s 95% 227255 nodes written and done
3.22s TSV file is ~/github/CLARIAH/wp6-ferdinandhuyck/_temp/data-0.1.tsv
3.22s Columns 32:
3.22s nd
3.22s otype
3.22s after
3.22s str
3.22s in_chapter
3.22s in_chunk
3.22s chapter
3.22s chunk
3.23s curr
3.23s empty
3.23s empty_lb
3.23s empty_link
3.23s empty_pb
3.23s empty_pb_n
3.23s is_meta
3.23s is_note
3.23s n
3.23s place
3.23s rend
3.23s rend_1tab
3.23s rend_b
3.23s rend_bq
3.23s rend_h2
3.23s rend_h3
3.23s rend_h4
3.23s rend_i
3.23s rend_sc
3.23s rend_spat
3.23s rend_sup
3.23s to
3.23s type
3.23s value
3.29s 227256 rows
3.29s 13520413 characters
3.29s Importing into Pandas ...
| 0.00s Reading tsv file ...
| 0.96s Done. Size = 7272160
| 0.96s Saving as Parquet file ...
| 1.12s Saved
4.41s PD in ~/github/CLARIAH/wp6-ferdinandhuyck/pandas/data-0.1.pd
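The export log reports a Parquet file at `~/github/CLARIAH/wp6-ferdinandhuyck/pandas/data-0.1.pd`. A minimal sketch of how that file might be read back and queried follows. It makes several assumptions: a Parquet engine such as `pyarrow` is installed, the `in_chunk` column holds the node number of the containing chunk, and the helper `words_per_chunk` is a hypothetical name introduced here for illustration. The small `sample` frame uses made-up values and only a few of the 32 exported columns, so the snippet also runs without the export on disk.

```python
from pathlib import Path

import pandas as pd

# Path taken from the last line of the export log; adjust to your own clone.
EXPORT_PATH = Path(
    "~/github/CLARIAH/wp6-ferdinandhuyck/pandas/data-0.1.pd"
).expanduser()


def words_per_chunk(df: pd.DataFrame) -> pd.Series:
    """Count rows of type "word" per containing chunk (hypothetical helper).

    Assumes `otype` holds the node type and `in_chunk` the chunk node.
    """
    words = df[df["otype"] == "word"]
    return words.groupby("in_chunk").size()


# Stand-in data with made-up values, using column names from the log.
sample = pd.DataFrame(
    {
        "nd": [1, 2, 3, 218538],
        "otype": ["word", "word", "word", "chunk"],
        "str": ["Het", "was", "avond", None],
        "in_chunk": [218538, 218538, 218539, None],
    }
)
counts = words_per_chunk(sample)
print(counts)

# Only touch the real export when it is present on this machine.
if EXPORT_PATH.exists():
    df = pd.read_parquet(EXPORT_PATH)  # assumes pyarrow or fastparquet
    print(df.shape)
    print(words_per_chunk(df).head())
```

Grouping on `in_chunk` is only one possible query; the same pattern should apply to `in_chapter` or to any of the feature columns listed in the log.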