github: https://github.com/ponyatov/pycat
Jupyter notebook render: https://nbviewer.jupyter.org/github/ponyatov/pycat/blob/master/pycat.ipynb
This manual is about learning Picat programming language by reimplementing it yourself from scratch.
This method is not for a total newbie in programming as you must know Python a bit to start, and have some entry-level programming skills. The Python was selected as it is the most popular programming language now for programming learning, and because this language implementation is so dynamic, as you can implement your own language system atop of PVM runtime. The advantage of learning by reimplementing is you are able not only embed this scripting engine to any software system you are doing but also deeply understand the original Picat.
But it is not the end as this Picat implementation uses not pure Python but the metaL
: homoiconic hypergraph model written itself in Python -- executable data structure (EDS) built from some sort of Marvin Minsky frames [minsky]. It was mostly selected to make this language to be not visual but visualizable: it is important for the newbies that a learning programming system to be Smalltalk-like interactive and deeply researchable.
Copyright (c) 2019 Dmitry Ponyatov <dponyatov@gmail.com>
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
~$ git clone -o gh https://github.com/ponyatov/pycat ; cd pycat
~/pycat$ make install
~/pycat$ .bin/activate
(venv)$ jupyter notebook pycat.ipynb &
[minsky] A Framework for Representing Knowledge (c) Marvin Minsky, 1974
Programming Python, 4th Edition (c) Mark Lutz, O'Reilly 2019
Picat code samples (c) Claudio Cesar de Sá
This Jupyter notebook is packed with graphviz
binding to show data structures we'll make later. To make it work locally, or via Binder from the master
branch on the GitHub your host system must be preinstalled with some APT packages:
! cat apt.txt
graphviz
! grep graphviz requirements.txt
graphviz==0.13.2
import graphviz
viz = graphviz.Digraph(graph_attr={'rankdir':'LR'})
viz.edge('Hello','World',label='pycat')
viz
The hypothesis: directed hypergraph is a universal knowledge representation that can describe anything including software specifications, programs, data and documentation in the same universal form. As we use the same structure for program and data representation, and this structure can be executed by interpretation, we have a homoiconic metaprogramming system.
The Frame
is a root object class that is implemented on ideas of a [minsky] book. The original frame model was extended to support ordered storage is definitely required for representing
The presence of an ordered collection is definitively required for representing program source code in any programming language, as this is very close to classical attribute grammar, parsing and abstract syntax trees, and graphs used in compiler design. The frame (object) hypergraph representation of a program as an executable form is clear and native for any work involved with source code transformations: synthesis, modifications, analysis, cross-language translation, etc.
In software design in a case when our programming language or computing platform does not provide some features, we can add a layer of interpretation for some data structure to get more dynamics or extra features such as metaprogramming, program self-modification, compact low memory bytecode, etc. For example, in the Lisp language, all programs are represented in the form of executable lists. Java uses bytecode as the main program form, which does not require, but often uses JIT for runtime compilation as a way to make the program run faster.
The metaL
homoiconic layer uses attributed hypergraph as an executable data structure:
nest[]
and named attributes in slot{}
s the way close to Prolog predicatesHomoiconicity (homoiconic) \
is a property of some programming languages in which data and program representation is the same and defined in core types of the same language. In other words, homoiconicity is when the program source code is written as the reflectable (and mutable) data structure and the programming language provides transparent access to the program as data in runtime.
In homoiconic language, metaprogramming is the main software development technique, which is also used to expand the language to the capabilities that a particular programmer needs. As the metaprogramming language, the Lisp language is the first sample, which was created to process data presented in the form of nested lists. Lisp programs are also recorded, stored, and executed as lists; as a result, it makes the running program able to access its own source code as a native data structure, as well as automatically change itself on the fly. Homoiconic languages, as a rule, include full support for syntactic macros, allowing the programmer to define new syntactic structures and express transformations of programs in a compact form.
viz = graphviz.Digraph(graph_attr={'rankdir':'LR'}) ; viz.node('Frame') ; viz
## Marvin Minsky's extended frame model
class Frame:
def __init__(self,V):
# type/class tag /required for lexer using PLY library/
self.type = self.__class__.__name__.lower()
# scalar value: frame name, string, number,..
self.val = V
# slots = attributes = vocabulary
self.slot = {}
# nested AST = universal ordered container = stack
self.nest = []
Frame('Hello World')
<__main__.Frame at 0x7fa3a4070048>
We'll use tree dump as an easy to use method to see any frame graph in a text form.
class Frame(Frame):
## dump
# print/str conversion
def __repr__(self):
return self.dump()
# full tree dump
def dump(self,depth=0,prefix=''):
# subtree header: tab padding and <T:V> header
tree = self._pad(depth) + self.head(prefix)
# block infinite recursion if we have cycles in a graph
if not depth: Frame._dump = [] # recursion root -- zero depth
if self in Frame._dump: return tree + ' _/'
else: Frame._dump.append(self)
# slot{}s: recursive traversal via hypergraph references
for i in self.slot:
tree += self.slot[i].dump(depth+1,'%s = '%i)
# nest[]ed: recursive traversal via ordered subgraphs
idx = 0
for j in self.nest:
tree += j.dump(depth+1,'%i: '%idx) ; idx +=1
# resulting subtree dump
return tree
# short <T:V> header-only dump
def head(self,prefix=''):
return '%s<%s:%s> id:%x' % (prefix,self.type,self._val(),id(self))
# tree dump padding
def _pad(self,depth):
return '\n' + '\t' * depth
# .val in dump must be overridable for strings, numbers,..
def _val(self):
return '%s' % self.val
print(Frame('Hello World').dump(prefix='prefix = '))
prefix = <frame:Hello World> id:7fa3a4070a58
<T:V>
header contains
T
type/class tagV
scalar valueprefix
used for slot names printing in tree dumpidx
counter shows index of every element in the universal ordered container mostly used as a stackid:hex
a pointer is unique for every object in a system lets to distinct objects with the same T:V
pairAs we pretend to make not visual but visualizable programming language, we must have a graph plotting tool to see any part of an EDS program graph as a diagram.
class Frame(Frame):
## plot
# recursive traversal with graphviz calls
def plot(self,dot=None,parent=None,link='',color='black'):
if not dot: # plotter must be initialized in a recursion root
dot = graphviz.Digraph(graph_attr={'rankdir':'LR'}) # horizontal layout
Frame._plot = [] # skip visited branches
# plot node and optional edge from parent / like header in .dump() /
dot.node('%s'%id(self),label='%s:%s'%(self.type,self._val())) # ref via id()
if parent: dot.edge('%s'%id(parent),'%s'%id(self),label='%s'%link,color=color)
# rings/infinite recursion block
if self in Frame._plot: return dot
else: Frame._plot.append(self)
# slot{}s with blue edges and slot names
for i in self.slot:
dot = self.slot[i].plot(dot,parent=self,link=i,color='blue')
# nest[]ed with green edges and integer indexes
idx = 0
for j in self.nest:
dot = j.plot(dot,parent=self,link=idx,color='green') ; idx += 1
# return resulting subtree plot
return dot
Frame('Hello World').plot()
To manipulate frames in a pythonic way we need to define some used operators to do frame combinations:
class Frame(Frame):
## operators
# automatic type cast
def cast(self,that):
if isinstance(that,Frame): return that
return {
str : lambda x:Str(x),
float: lambda x:Num(x),
int : lambda x:Int(x)
}[that.__class__](that)
# ` A[key] ` get frame by slot name (vocabulary lookup/fetch)
def __getitem__(self,key):
return self.slot[key]
# ` A[key] = B ` set/create slot with name and frame (vocabulary write)
def __setitem__(self,key,that):
self.slot[key] = self.cast(that) ; return self
# ` A << B ` set slot with operand type A[B.type] = B
def __lshift__(self,that):
that = self.cast(that) ; self[that.type] = that ; return self
# ` A << B ` set slot with operand value A[B.val] = B
def __rshift__(self,that):
that = self.cast(that) ; self[that.val] = that ; return self
# ` A // B ` push to nest[]ed
def __floordiv__(self,that):
self.nest.append(self.cast(that)) ; return self
viz.edge('Frame','Prim')
viz.edge('Prim','Sym')
viz.edge('Prim','Str')
viz.edge('Prim','Num')
viz.edge('Num','Int')
viz.edge('Int','Hex')
viz.edge('Int','Bin')
viz
class Prim(Frame): pass
class Sym(Prim): pass
class Str(Prim): pass
class Num(Prim): pass
class Int(Num): pass
class Hex(Int): pass
class Bin(Int): pass
hello = Frame('Hello') // Frame('World')
hello['some'] = Frame('attribute') << Frame('slot')
hello['self'] = hello // hello # self ring
hello['string'] = 'typecasted' ; hello // 'string'
print(hello)
hello.plot()
<frame:Hello> id:7fa3a407f048 self = <frame:Hello> id:7fa3a407f048 _/ string = <str:typecasted> id:7fa3a407f208 some = <frame:attribute> id:7fa3a4087fd0 frame = <frame:slot> id:7fa3a407f198 0: <frame:World> id:7fa3a407f0b8 1: <frame:Hello> id:7fa3a407f048 _/ 2: <str:string> id:7fa3a407f128
picat = graphviz.Digraph(graph_attr={'rankdir':'LR'})
picat.edge('Frame','Term')
picat.edge('Term','Var') ; picat.edge('Term','Atomic') ; picat.edge('Atomic','Atom')
picat.edge('Atomic','Number') ; picat.edge('Number','Integer') ; picat.edge('Number','Real')
picat.edge('Term','Compound') ; picat.edge('Compound','List') ; picat.edge('List','String')
picat.edge('Compound','Struct') ; picat.edge('Struct','Array') ; picat.edge('Struct','Map')
picat
Picat class tree is isolated from metaL
class tree and have some aliases in atoms. It is done to follow Pical model as close as possible, and then map resulting structures to metaL
classes the way like cross-model translation.
class Term(Frame): pass
class Var(Term): pass
class Atomic(Term): pass
class Atom(Atomic): pass
class Number(Atomic): pass
class Integer(Number): pass
class Real(Number): pass
class Compound(Term): pass
class List(Compound): pass
class String(List): pass
class Struct(Compound): pass
class Array(Struct): pass
class Map(Struct): pass
## Pycat syntax parser
__file__ = 'pycat.ipynb' # required to run PLY in Jupyter
import ply.lex as lex
import ply.yacc as yacc
# list of token types can be detected by the lexer
# the token can be any object (Frame-inherited in our case) but
# it must have set of required fields including .type
tokens = ['var']
t_ignore = ' \t\r'
t_ignore_comment = '%.*'
# end of lines (LF, line feed) must be counted obviously
def t_lf(t):
r'\n+'
t.lexer.lineno += len(t.value)
def t_var(t):
r'[A-Z][A-Za-z0-9_]*'
return Var(t.value)
def t_error(t):
raise SyntaxError(t)
lexer = lex.lex()
lexer.input('''
% line comment
Variable
''')
while True:
token = lexer.token()
if not token: break
print(token)
<var:Variable> id:7fa3a407e828