We'll open a binary feature file and have a look inside.
ROOT = '/Users/dirk/laf/laf-fabric-data/etcbc4b/bin'
FILE = 'Fn0(etcbc4,ft,lex)'
The file is gzip-compressed and contains a serialized Python data structure. Python's way of serializing is called pickling, and the result is a pickled data structure. Pickle can serve many of the purposes that json serves, but pickled data is binary, so it is not human-readable. In exchange, pickle's binary protocols are typically more compact than json text, and pickle preserves Python types that json cannot represent directly, such as integer dictionary keys.
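To make the format concrete, here is a small round trip through the same gzip + pickle combination. It is a sketch with a made-up three-entry dictionary standing in for a real feature file, written to a temporary directory. It also shows one concrete difference from json: integer keys come back as integers from pickle, but as strings from json.

```python
import gzip
import json
import os
import pickle
import tempfile

# A toy stand-in for a feature: monad number -> lexeme (sample values only).
feature = {0: "B", 1: "R>CJT/", 2: "BR>["}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_feature")
    # Write: pickle the dict and gzip-compress the bytes on the way out.
    with gzip.open(path, "wb") as f:
        pickle.dump(feature, f)
    # Read back: decompress and unpickle in one go.
    with gzip.open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == feature)  # True: the int keys survive as ints
# json, by contrast, turns the integer keys into strings.
print(json.loads(json.dumps(feature)))  # {'0': 'B', '1': 'R>CJT/', '2': 'BR>['}
```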
import gzip
import os
import pickle

# Decompress the file and unpickle its contents into a Python object.
with gzip.open(os.path.join(ROOT, FILE), 'rb') as f:
    data = pickle.load(f)
There we are: the variable data now holds a big chunk of data.
print('type: {}'.format(type(data)))
type: <class 'dict'>
So it is a dictionary. Let's examine the keys.
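Before looking at individual entries, it can help to check what kind of keys we are dealing with. The helper below, describe_keys, is a hypothetical name introduced only for this sketch; it reports the key types, their range, and whether the keys are exactly the integers 0, 1, ..., n - 1. It is demonstrated on a toy dictionary; calling describe_keys(data) on the real feature would tell us whether its keys form such a contiguous run.

```python
def describe_keys(d):
    """Return (key type names, smallest key, largest key, contiguous?).

    contiguous is True only if the keys are exactly the ints 0 .. len(d) - 1.
    """
    key_types = sorted({type(k).__name__ for k in d})
    if key_types != ["int"]:
        # Non-int (or no) keys: range and contiguity are not meaningful here.
        return key_types, None, None, False
    lo, hi = min(d), max(d)
    # Dict keys are unique, so n int keys spanning 0 .. n - 1 cover each number once.
    contiguous = lo == 0 and hi == len(d) - 1
    return key_types, lo, hi, contiguous


toy = {0: "B", 1: "R>CJT/", 2: "BR>[", 3: ">LHJM/"}
print(describe_keys(toy))  # (['int'], 0, 3, True)
```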
print('{} keys'.format(len(data)))
426568 keys
That's a familiar number: the number of monads (words) in the Hebrew Bible. Let's show the first 20 keys and their values.
print('\n'.join('{:>6}: "{}"'.format(*x) for x in sorted(data.items())[0:20]))
     0: "B"
     1: "R>CJT/"
     2: "BR>["
     3: ">LHJM/"
     4: ">T"
     5: "H"
     6: "CMJM/"
     7: "W"
     8: ">T"
     9: "H"
    10: ">RY/"
    11: "W"
    12: "H"
    13: ">RY/"
    14: "HJH["
    15: "THW/"
    16: "W"
    17: "BHW/"
    18: "W"
    19: "XCK/"
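The code above sorts all 426568 items just to show the first 20. That works, but if only the smallest few keys are wanted, heapq.nsmallest from the standard library can pick them out without a full sort. This is a sketch; first_items is a hypothetical helper name, shown here on a toy dictionary with deliberately shuffled insertion order.

```python
import heapq


def first_items(d, n):
    """Return the n (key, value) pairs with the smallest keys, in key order,
    without sorting the whole dictionary."""
    return heapq.nsmallest(n, d.items(), key=lambda kv: kv[0])


toy = {3: ">LHJM/", 0: "B", 2: "BR>[", 1: "R>CJT/"}
for k, v in first_items(toy, 2):
    # Same layout as above: key right-aligned in a field of width 6.
    print('{:>6}: "{}"'.format(k, v))
```

On the real feature, first_items(data, 20) should yield the same 20 pairs as sorted(data.items())[0:20].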
Now without the numbers:
print('\n'.join(x[1] for x in sorted(data.items())[0:20]))
B
R>CJT/
BR>[
>LHJM/
>T
H
CMJM/
W
>T
H
>RY/
W
H
>RY/
HJH[
THW/
W
BHW/
W
XCK/
print('\n'.join(x[1] for x in sorted(data.items())[200000:200020]))
BN/
KL/
H
JWM/
B
JWM/
PC<[
>DWM/
MN
TXT/
JD/
JHWDH/
W
MLK[
<L
MLK/
W
<BR[
JWRM/
Y<JR=/
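Slicing sorted(data.items()) picks items by their position in key order. If the keys are exactly the monad numbers 0 .. 426567 (an assumption; the key count and the first 20 keys above are consistent with it, but they do not prove it), the same stretch can be read by looking up each monad directly, with no sorting at all. lexeme_window below is a hypothetical helper for this sketch; it joins the values into one line and marks any missing key with a placeholder.

```python
def lexeme_window(d, start, stop, missing="?"):
    """Join the values for keys start .. stop - 1 into one space-separated line.

    Assumes integer monad-number keys and looks each one up directly;
    keys absent from the dict are shown as `missing`.
    """
    return " ".join(d.get(m, missing) for m in range(start, stop))


toy = {0: "B", 1: "R>CJT/", 2: "BR>[", 3: ">LHJM/"}
print(lexeme_window(toy, 1, 5))  # R>CJT/ BR>[ >LHJM/ ?
```

Under that assumption, lexeme_window(data, 200000, 200020) should give the same 20 lexemes as the listing above, on a single line.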