Further Reading: Parsing BSSN (Cartesian) Notebook
In the following section, we discuss lexical analysis (lexing) and syntax analysis (parsing). In lexical analysis, a lexical analyzer (or scanner) can tokenize a character string, called a sentence, using substring pattern matching. In syntax analysis, a syntax analyzer (or parser) can construct a parse tree, containing all syntactic information of the language (specified by a formal grammar), after receiving a token iterator from the lexical analyzer.
For LaTeX to SymPy conversion, we implemented a recursive descent parser that constructs a parse tree in preorder traversal, starting from the root nonterminal, using a right-recursive grammar (partially shown below in the canonical (extended) BNF notation); a minimal illustrative sketch of the technique follows the excerpt.
<EXPRESSION> -> <TERM> { ( '+' | '-' ) <TERM> }*
<TERM> -> <FACTOR> { [ '/' ] <FACTOR> }*
<FACTOR> -> <BASE> { '^' <EXPONENT> }*
<BASE> -> [ '-' ] ( <ATOM> | <SUBEXPR> )
<EXPONENT> -> <BASE> | '{' <BASE> '}' | '{' '{' <BASE> '}' '}'
<ATOM> -> <COMMAND> | <OPERATOR> | <NUMBER> | <TENSOR>
<SUBEXPR> -> '(' <EXPRESSION> ')' | '[' <EXPRESSION> ']' | '\' '{' <EXPRESSION> '\' '}'
<COMMAND> -> <FUNC> | <FRAC> | <SQRT> | <NLOG> | <TRIG>
⋮
Source: Robert W. Sebesta. Concepts of Programming Languages. Pearson Education Limited, 2016.
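To make the scanner/parser pipeline above concrete, here is a minimal, self-contained sketch, not the nrpylatex implementation, that tokenizes a sentence and parses the <EXPRESSION>/<TERM>/<FACTOR>/<BASE> fragment of the grammar excerpt by recursive descent; every name below (TOKEN_PATTERNS, MiniParser, etc.) is illustrative.
import re
import sympy as sp

# illustrative token patterns (a small subset of what a real scanner handles)
TOKEN_PATTERNS = [
    ('INTEGER', r'[0-9]+'),
    ('LETTER',  r'[a-zA-Z]'),
    ('PLUS',    r'\+'),
    ('MINUS',   r'-'),
    ('DIVIDE',  r'/'),
    ('CARET',   r'\^'),
    ('LPAREN',  r'\('),
    ('RPAREN',  r'\)')
]

def tokenize(sentence):
    # lexical analysis: map a sentence onto a (token, lexeme) stream
    position = 0
    while position < len(sentence):
        if sentence[position].isspace():
            position += 1; continue
        for token, pattern in TOKEN_PATTERNS:
            match = re.match(pattern, sentence[position:])
            if match:
                yield token, match.group()
                position += match.end(); break
        else:
            raise ValueError("unexpected '%s' at position %d" % (sentence[position], position))

class MiniParser:
    # syntax analysis: one method per nonterminal (recursive descent)
    def __init__(self, sentence):
        self.tokens = list(tokenize(sentence)) + [('EOF', '')]
        self.index = 0
    def peek(self):
        return self.tokens[self.index][0]
    def expect(self, token):
        if self.peek() != token:
            raise ValueError('expected token %s' % token)
        lexeme = self.tokens[self.index][1]
        self.index += 1
        return lexeme
    # <EXPRESSION> -> <TERM> { ( '+' | '-' ) <TERM> }*
    def expression(self):
        expr = self.term()
        while self.peek() in ('PLUS', 'MINUS'):
            sign = 1 if self.expect(self.peek()) == '+' else -1
            expr += sign * self.term()
        return expr
    # <TERM> -> <FACTOR> { [ '/' ] <FACTOR> }*
    def term(self):
        expr = self.factor()
        while self.peek() in ('DIVIDE', 'INTEGER', 'LETTER', 'LPAREN'):
            if self.peek() == 'DIVIDE':
                self.expect('DIVIDE')
                expr /= self.factor()
            else:
                expr *= self.factor()
        return expr
    # <FACTOR> -> <BASE> { '^' <BASE> }*   (exponent grammar simplified)
    def factor(self):
        expr = self.base()
        while self.peek() == 'CARET':
            self.expect('CARET')
            expr = expr ** self.base()
        return expr
    # <BASE> -> <INTEGER> | <LETTER> | '(' <EXPRESSION> ')'   (simplified)
    def base(self):
        if self.peek() == 'INTEGER':
            return sp.Integer(self.expect('INTEGER'))
        if self.peek() == 'LETTER':
            return sp.Symbol(self.expect('LETTER'))
        self.expect('LPAREN')
        expr = self.expression()
        self.expect('RPAREN')
        return expr

print(MiniParser(r'(1 + x/n)^n').expression())
The nrpylatex Scanner and Parser used below follow the same pattern at full scale.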
import sympy as sp
!pip install nrpylatex~=1.2 > /dev/null
!pip freeze | grep nrpylatex
from nrpylatex import *
nrpylatex==1.2.3
scanner = Scanner(); scanner.initialize(r'(1 + x/n)^n')
print(', '.join(token for token in scanner.tokenize()))
LPAREN, INTEGER, PLUS, LETTER, DIVIDE, LETTER, RPAREN, CARET, LETTER
expr = parse_latex(r'(1 + x/n)^n')
print(expr, '\n >>', sp.srepr(expr))
(1 + x/n)**n
 >> Pow(Add(Integer(1), Mul(Pow(Symbol('n', real=True), Integer(-1)), Symbol('x', real=True))), Symbol('n', real=True))
Grammar Derivation: (1 + x/n)^n
<EXPRESSION> -> <TERM>
-> <FACTOR>
-> <BASE>^<EXPONENT>
-> <SUBEXPR>^<EXPONENT>
-> (<EXPRESSION>)^<EXPONENT>
-> (<TERM> + <TERM>)^<EXPONENT>
-> (<FACTOR> + <TERM>)^<EXPONENT>
-> (<BASE> + <TERM>)^<EXPONENT>
-> (<ATOM> + <TERM>)^<EXPONENT>
-> (<NUMBER> + <TERM>)^<EXPONENT>
-> (<INTEGER> + <TERM>)^<EXPONENT>
-> (1 + <TERM>)^<EXPONENT>
-> (1 + <FACTOR> / <FACTOR>)^<EXPONENT>
-> ...
In the following section, we demonstrate the process for extending the parsing module to include a (previously) unsupported LaTeX command.
First, update the grammar dictionary in the Scanner class with the mapping regex ↦ token. Then add the production rule for the command, shown below for the \sqrt command, and implement the associated parsing method.
<SQRT> -> <SQRT_CMD> [ '[' <INTEGER> ']' ] '{' <EXPRESSION> '}'
def _sqrt(self):
    # <SQRT> -> <SQRT_CMD> [ '[' <INTEGER> ']' ] '{' <EXPRESSION> '}'
    self.expect('SQRT_CMD')
    if self.accept('LBRACK'):
        # optional bracket argument sets the root index, e.g. \sqrt[3]{x}
        integer = self.scanner.lexeme
        self.expect('INTEGER')
        root = Rational(1, integer)
        self.expect('RBRACK')
    else: root = Rational(1, 2)
    self.expect('LBRACE')
    expr = self._expression()
    self.expect('RBRACE')
    if root == Rational(1, 2):
        return sqrt(expr)
    return Pow(expr, root)
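As a quick, illustrative check of the <SQRT> production above (not part of the original tutorial output), the optional bracket argument should set the root index:
# expected (per the production above) to parse \sqrt[3]{x} into x**Rational(1, 3)
expr = parse_latex(r'\sqrt[3]{x}')
print(expr, '\n >>', sp.srepr(expr))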
In addition to expression parsing, we included support for equation parsing, which can produce a dictionary mapping LHS ↦ RHS, where the LHS must be a symbol, and insert that mapping into the global namespace of the previous stack frame, as demonstrated below.
parse_latex(r'\text{s_n} = \left(1 + \frac{1}{n}\right)^n')
('s_n',)
print('s_n =', s_n)
s_n = (1 + 1/n)**n
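Under the hood, inserting a mapping into the previous stack frame can be done with Python's inspect module; the following minimal sketch illustrates the idea (it is not the actual nrpylatex code, and inject_namespace and y_n are hypothetical names).
import inspect
import sympy as sp

def inject_namespace(mapping):
    # update the global namespace of the previous stack frame (the caller),
    # so that parsed symbols become available as ordinary Python variables
    inspect.currentframe().f_back.f_globals.update(mapping)

inject_namespace({'y_n': sp.Symbol('y_n', real=True)})
print(y_n)  # y_n now resolves in the notebook's global namespace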
Furthermore, we implemented robust error messaging using the custom ParseError exception, which should handle every conceivable case and identify, as precisely as possible, invalid syntax inside of a LaTeX sentence. The following are some runnable examples of possible error messages.
try: parse_latex(r'5x^{{4$}}')
except ScanError as e:
print(type(e).__name__ + ': ' + str(e))
ScanError: 5x^{{4$}} ^ unexpected '$' at position 6
try: parse_latex(r'\sqrt[0.1]{5x^{{4}}}')
except ParseError as e:
print(type(e).__name__ + ': ' + str(e))
ParseError: \sqrt[0.1]{5x^{{4}}} ^ expected token INTEGER at position 6
try: parse_latex(r'\int_0^5 5x^{{4}}dx')
except ParseError as e:
print(type(e).__name__ + ': ' + str(e))
ParseError: \int_0^5 5x^{{4}}dx ^ unsupported command '\int' at position 0
In the sandbox code cell below, you can experiment with converting LaTeX to SymPy using the wrapper function parse_latex(sentence), where sentence must be a Python raw string so that a backslash is interpreted as a literal character rather than an escape sequence. Alternatively, you can use the supported cell magic %%parse_latex, which automatically escapes every backslash and parses the entire cell (more convenient than parse_latex(sentence) in a notebook format).
# Write Sandbox Code Here
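For example, a snippet along the following lines (purely illustrative; the expression is arbitrary) could be pasted into the sandbox cell above:
# the raw-string prefix r'...' keeps each backslash literal for the parser
print(parse_latex(r'\frac{1}{2} x^2'))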
In the following section, we demonstrate parsing tensor notation using the Einstein summation convention. In each example, every tensor should appear either on the LHS of an equation or on the RHS of a vardef macro (invoked as % define in the examples below) before appearing on the RHS of an equation. Furthermore, an exception will be raised upon violation of the Einstein summation convention, i.e. the occurrence of an invalid free or bound index.
Configuration Grammar
<MACRO> -> <PARSE> | <SREPL> | <VARDEF> | <ATTRIB> | <ASSIGN> | <IGNORE>
<PARSE> -> <PARSE_MACRO> <ASSIGNMENT> { ',' <ASSIGNMENT> }* '\\'
<SREPL> -> <SREPL_MACRO> [ '-' <PERSIST> ] <STRING> <ARROW> <STRING> { ',' <STRING> <ARROW> <STRING> }*
<VARDEF> -> <VARDEF_MACRO> { '-' ( <OPTION> | <ZERO> ) }* <VARIABLE> [ '::' <DIMENSION> ]
{ ',' <VARIABLE> [ '::' <DIMENSION> ] }*
<ATTRIB> -> <ATTRIB_MACRO> ( <COORD_KWRD> ( <COORD> | <DEFAULT> ) | <INDEX_KWRD> ( <INDEX> | <DEFAULT> ) )
<ASSIGN> -> <ASSIGN_MACRO> { '-' <OPTION> }* <VARIABLE> { ',' <VARIABLE> }*
<IGNORE> -> <IGNORE_MACRO> <STRING> { ',' <STRING> }*
<OPTION> -> <CONSTANT> | <KRONECKER> | <METRIC> [ '=' <VARIABLE> ] | <WEIGHT> '=' <NUMBER>
| <DIFF_TYPE> '=' <DIFF_OPT> | <SYMMETRY> '=' <SYM_OPT>
<COORD> -> <COORD_KWRD> <LBRACK> <SYMBOL> [ ',' <SYMBOL> ]* <RBRACK>
<INDEX> -> ( <LETTER> | '[' <LETTER> '-' <LETTER> ']' ) '::' <DIMENSION>
parse_latex(r"""
% define hUD --dim 4
h = h^\mu{}_\mu
""", reset=True)
('hUD', 'h')
print('h =', h)
h = hUD00 + hUD11 + hUD22 + hUD33
parse_latex(r"""
% define gUU --dim 3 --metric
% define vD --dim 3
v^a = g^{ab} v_b
""", reset=True)
('vU', 'gUU', 'gDD', 'GammaUDD', 'epsilonDDD', 'vD', 'gdet')
print('vU =', vU)
vU = [gUU00*vD0 + gUU01*vD1 + gUU02*vD2, gUU01*vD0 + gUU11*vD1 + gUU12*vD2, gUU02*vD0 + gUU12*vD1 + gUU22*vD2]
parse_latex(r"""
% define vU wU --dim 3
u_i = \epsilon_{ijk} v^j w^k
""", reset=True)
('uD', 'vU', 'epsilonDDD', 'wU')
print('uD =', uD)
uD = [vU1*wU2 - vU2*wU1, -vU0*wU2 + vU2*wU0, vU0*wU1 - vU1*wU0]
The following are contextually inferred, dynamically generated, and injected into the global namespace for expansion of the covariant derivative $\nabla_\nu F^{\mu\nu}$:

$$\Gamma^\mu_{b a} = \frac{1}{2} g^{\mu c} \left(\partial_b g_{a c} + \partial_a g_{c b} - \partial_c g_{b a}\right)$$
$$\Gamma^\nu_{b a} = \frac{1}{2} g^{\nu c} \left(\partial_b g_{a c} + \partial_a g_{c b} - \partial_c g_{b a}\right)$$
$$\nabla_a F^{\mu\nu} = \partial_a F^{\mu\nu} + \Gamma^\mu_{b a} F^{b \nu} + \Gamma^\nu_{b a} F^{\mu b}$$
parse_latex(r"""
% define FUU --dim 4 --deriv dD --sym anti01
% define gDD --dim 4 --deriv dD --metric
% define k --const
J^\mu = (4\pi k)^{-1} \nabla_\nu F^{\mu\nu}
""", reset=True)
('gUU', 'FUU', 'gDD', 'JU', 'epsilonUUUU', 'GammaUDD', 'FUU_dD', 'k', 'gDD_dD', 'FUU_cdD', 'gdet')
parse_latex(r"""
% define FUU --dim 4 --deriv dD --sym anti01
% define ghatDD --dim 4 --deriv dD --metric
% define k --const
J^\mu = (4\pi k)^{-1} \hat{\nabla}_\nu F^{\mu\nu}
""", reset=True)
('ghatDD', 'JU', 'FUU', 'ghatDD_dD', 'epsilonUUUU', 'FUU_dD', 'k', 'FUU_cdhatD', 'ghatUU', 'ghatdet', 'GammahatUDD')
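Analogously to the expansion shown above, the hatted covariant derivative is generated from the Christoffel symbols of the hatted metric, which is why GammahatUDD and FUU_cdhatD appear in the namespace:

$$\hat{\nabla}_a F^{\mu\nu} = \partial_a F^{\mu\nu} + \hat{\Gamma}^\mu_{b a} F^{b \nu} + \hat{\Gamma}^\nu_{b a} F^{\mu b}$$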
%load_ext nrpylatex.extension
%%parse_latex --reset --ignore-warning
% coord [t, r, \theta, \phi]
% define gDD --dim 4 --zero
% define G M --const
% ignore "\begin{align}" "\end{align}"
\begin{align}
g_{t t} &= -\left(1 - \frac{2GM}{r}\right) \\
g_{r r} &= \left(1 - \frac{2GM}{r}\right)^{-1} \\
g_{\theta \theta} &= r^2 \\
g_{\phi \phi} &= r^2 \sin^2\theta
\end{align}
% assign gDD --metric
sp.Matrix(gDD)
%%parse_latex
% ignore "\begin{align}" "\end{align}"
\begin{align}
R^\alpha{}_{\beta\mu\nu} &= \partial_\mu \Gamma^\alpha_{\beta\nu} - \partial_\nu \Gamma^\alpha_{\beta\mu}
+ \Gamma^\alpha_{\mu\gamma}\Gamma^\gamma_{\beta\nu} - \Gamma^\alpha_{\nu\sigma}\Gamma^\sigma_{\beta\mu} \\
K &= R^{\alpha\beta\mu\nu} R_{\alpha\beta\mu\nu} \\
R_{\beta\nu} &= R^\alpha{}_{\beta\alpha\nu} \\
R &= g^{\beta\nu} R_{\beta\nu} \\
G_{\beta\nu} &= R_{\beta\nu} - \frac{1}{2}g_{\beta\nu}R
\end{align}
sp.simplify(sp.Matrix(RDD))
display(sp.Matrix(GammaUDD[0][:][:]))
display(sp.Matrix(GammaUDD[1][:][:]))
display(sp.Matrix(GammaUDD[2][:][:]))
display(sp.Matrix(GammaUDD[3][:][:]))
For the Schwarzschild metric, the Kretschmann scalar $K$ has the property that $K \to \infty$ as $r \to 0$, and hence the metric and spacetime itself are undefined at the point of infinite curvature $r = 0$, indicating the presence of a physical singularity since the Kretschmann scalar is an invariant quantity in general relativity.
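For reference, and as a check on the simplification below, the Kretschmann scalar of the Schwarzschild metric has the well-known closed form (in units where $c = 1$)

$$K = R^{\alpha\beta\mu\nu} R_{\alpha\beta\mu\nu} = \frac{48\, G^2 M^2}{r^6}.$$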
display(sp.simplify(K))
In a vacuum region, such as the spacetime described by the Schwarzschild metric, $T_{\mu\nu} = 0$ and hence $G_{\mu\nu} = 0$, since $G_{\mu\nu} = 8\pi G T_{\mu\nu}$ (Einstein Equations).
sp.simplify(sp.Matrix(GDD))
%%parse_latex --ignore-warning
% coord [r, \theta, \phi]
% ignore "\begin{align}" "\end{align}"
\begin{align}
\gamma_{ij} &= g_{ij} \\
% assign gammaDD --metric
\beta_i &= g_{r i} \\
\alpha &= \sqrt{\gamma^{ij}\beta_i\beta_j - g_{r r}} \\
K_{ij} &= \frac{1}{2\alpha}\left(\nabla_i \beta_j + \nabla_j \beta_i\right) \\
K &= \gamma^{ij} K_{ij}
\end{align}
For the Schwarzschild metric (defined in the previous example), the extrinsic curvature in the ADM formalism should evaluate to zero, as expected for a static spacetime foliated into time-independent spatial slices.
display(sp.Matrix(KDD))
%%parse_latex --ignore-warning
% ignore "\begin{align}" "\end{align}"
\begin{align}
R_{ij} &= \partial_k \Gamma^k_{ij} - \partial_j \Gamma^k_{ik}
+ \Gamma^k_{ij}\Gamma^l_{kl} - \Gamma^l_{ik}\Gamma^k_{lj} \\
R &= \gamma^{ij} R_{ij} \\
E &= \frac{1}{16\pi}\left(R + K^{{2}} - K_{ij}K^{ij}\right) \\
p_i &= \frac{1}{8\pi}\left(D_j \gamma^{jk} K_{ki} - D_i K\right)
\end{align}
Every solution to the Einstein Equations, including Schwarzschild, must satisfy the Hamiltonian constraint ($E = 0$) and the Momentum constraint ($p_i = 0$).
print('E = %s, pD = %s' % (sp.simplify(E), pD))
E = 0, pD = [0, 0, 0]
We extended our robust error messaging with the custom TensorError exception, which should handle any inconsistent tensor dimension and any violation of the Einstein summation convention: a bound index must appear exactly once as a superscript and exactly once as a subscript in any single term, and a free index must appear in every term in the same position and cannot be summed over in any term.
%%parse_latex --reset
% define TUD uD --dim 4
v^\mu = T^\mu_\nu u_\nu
TensorError: illegal bound index 'nu' in vU
%%parse_latex --reset
% define TUD uD --dim 4
v^\mu = T^\mu_\nu u_\mu
TensorError: unbalanced free indices {'nu', 'mu'} in vU
%%parse_latex --reset
% define TUD --dim 4
% define uD --dim 3
v_\nu = T^\mu_\nu u_\mu
ParseError: index out of range; change loop/summation range
%%parse_latex --reset
% define vD --dim 4
T_{\mu\nu} = v_\mu w_\nu
ParseError: T_{\mu\nu} = v_\mu w_\nu ^ cannot index undefined variable 'wD' at position 39
%%parse_latex --reset
% define FUU --dim 4 --sym anti01
% define k --const
J^\mu = (4\pi k)^{-1} \nabla_\nu F^{\mu\nu}
ParseError: J^\mu = (4\pi k)^{-1} \nabla_\nu F^{\mu\nu} ^ cannot generate covariant derivative without defined metric 'g'
The following code cell converts this Jupyter notebook into a proper, clickable LaTeX-formatted PDF file. After the cell is successfully run, the generated PDF may be found in the root NRPy+ tutorial directory, with filename Tutorial-SymPy_LaTeX_Interface.pdf. (Note that clicking on this link may not work; you may need to open the PDF file through another means.)
import cmdline_helper as cmd # NRPy+: Multi-platform Python command-line interface
cmd.output_Jupyter_notebook_to_LaTeXed_PDF("Tutorial-SymPy_LaTeX_Interface")
Created Tutorial-SymPy_LaTeX_Interface.tex, and compiled LaTeX file to PDF file Tutorial-SymPy_LaTeX_Interface.pdf