Prof. Bin Shan (bshan@mail.hust.edu.cn), Huazhong University of Science and Technology
Nano Materials Design and Manufacturing research center at HUST (www.materialssimulation.com)
Python is an easy to learn, powerful programming language. It has efficient high-level data structures and a simple but effective approach to object-oriented programming. Python’s elegant syntax and dynamic typing, together with its interpreted nature, make it an ideal language for scripting and rapid application development in many areas on most platforms.
Special Thanks: This notebook is a shortened version of the lectures by J.R. Johansson (jrjohansson at gmail.com). The latest version of the original IPython notebook can be found at http://github.com/jrjohansson/scientific-python-lectures.
Science has traditionally been divided into experimental and theoretical disciplines, but during the last several decades computing has emerged as a very important part of science. Scientific computing is often closely related to theory, but it also has many characteristics in common with experimental work. It is therefore often viewed as a new third branch of science. In most fields of science, computational work is an important complement to both experiments and theory, and nowadays a vast majority of both experimental and theoretical papers involve some numerical calculations, simulations or computer modeling.
In experimental and theoretical sciences there are well-established codes of conduct for how results and methods are published and made available to other scientists. For example, in theoretical sciences, derivations, proofs and other results are published in full detail, or made available upon request. Likewise, in experimental sciences, the methods used and the results are published, and all experimental data should be available upon request. It is considered unscientific to withhold crucial details of a theoretical proof or experimental method if doing so would hinder other scientists from replicating and reproducing the results.
In computational sciences there are not yet any well-established guidelines for how source code and generated data should be handled. For example, it is relatively rare that the source code used in simulations for published papers is provided to readers, in contrast to the open nature of experimental and theoretical work. It is also not uncommon for the source code of simulation software to be withheld, either because it is considered a competitive advantage or because publishing it is seen as unnecessary.
However, this issue has recently started to attract increasing attention, and a number of editorials in high-profile journals have called for increased openness in computational sciences. Some prestigious journals, including Science, have even started to require authors to provide readers, upon request, with the source code for simulation software used in publications.
Discussions are also ongoing on how to facilitate distribution of scientific software, for example as supplementary materials to scientific papers.
Reproducible Research in Computational Science, Roger D. Peng, Science 334, 1226 (2011).
Shining Light into Black Boxes, A. Morin et al., Science 336, 159-160 (2012).
The case for open computer programs, D.C. Ince, Nature 482, 485 (2012).
Replication and reproducibility are two of the cornerstones of the scientific method. With respect to numerical work, complying with these concepts has the following practical implications:
Replication: An author of a scientific paper that involves numerical calculations should be able to rerun the simulations and replicate the results upon request. Other scientists should also be able to perform the same calculations and obtain the same results, given the information about the methods used in a publication.
Reproducibility: The results obtained from numerical simulations should be reproducible with an independent implementation of the method, or using a different method altogether.
In summary: A sound scientific result should be reproducible, and a sound scientific study should be replicable.
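As a small illustration of replicability, the sketch below runs a toy Monte Carlo estimate of pi twice with the same random seed and checks that both runs give identical results. The function name estimate_pi, the sample count and the seed value are hypothetical choices made only for this example.

```python
import random


def estimate_pi(n_samples, seed):
    """Toy Monte Carlo estimate of pi using a seeded random number generator."""
    rng = random.Random(seed)  # private generator: the seed fully determines the run
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            inside += 1
    return 4.0 * inside / n_samples


first_run = estimate_pi(10_000, seed=42)
second_run = estimate_pi(10_000, seed=42)

# Same code, same seed, same number of samples: the result is replicated exactly.
print(first_run == second_run)
```

Recording the seed together with the code version is what makes it possible to rerun such a calculation later and obtain the same numbers.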
To achieve these goals, we need to:
Keep a record of exactly which source code, and which version of it, was used to produce the data and figures in published papers.
Record which versions of external software were used. Keep access to the environment that was used.
Make sure that old codes and notes are backed up and kept for future reference.
Be ready to give additional information about the methods used, and perhaps also the simulation codes, to an interested reader who requests it (even years after the paper was published!).
Ideally, codes should be published online, to make it easier for other scientists interested in the codes to access them.
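Part of this record keeping can be automated. The following is a minimal sketch, using only the standard library, that collects the interpreter version and platform into a JSON text that could be stored next to the data and figures it describes. The function name describe_environment and the choice of fields are assumptions made for this example; a real record would usually also list the versions of the external packages used.

```python
import json
import platform
import sys
from datetime import datetime, timezone


def describe_environment():
    """Collect basic facts about the interpreter and operating system."""
    return {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "python_version": sys.version,
        "platform": platform.platform(),
        "executable": sys.executable,
    }


environment = describe_environment()

# Serialize the record; in practice this text would be saved alongside the results.
record = json.dumps(environment, indent=2)
print(record)
```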
Python is a modern, general-purpose, object-oriented, high-level programming language.
General characteristics of Python:
Technical details:
Advantages:
Disadvantages:
Python has a strong position in scientific computing:
Extensive ecosystem of scientific libraries and environments
Great performance due to close integration with time-tested and highly optimized codes written in C and Fortran:
Good support for
Readily available and suitable for use on high-performance computing clusters.
No license costs, no unnecessary use of research budget.
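The performance point can be illustrated in miniature with the standard library alone. In CPython, the reference interpreter, the built-in sum is implemented in C, so it usually beats an equivalent loop written in Python; scientific libraries apply the same idea on a much larger scale. This is a rough sketch and not a benchmark: the helper name python_loop_sum is hypothetical, and the timings depend on the machine and the interpreter.

```python
import timeit


def python_loop_sum(values):
    """Add the values one by one in interpreted Python code."""
    total = 0
    for v in values:
        total += v
    return total


data = list(range(100_000))

# Both approaches must give the same answer.
assert python_loop_sum(data) == sum(data)

# Time 20 repetitions of each approach; the numbers vary from machine to machine.
loop_time = timeit.timeit(lambda: python_loop_sum(data), number=20)
builtin_time = timeit.timeit(lambda: sum(data), number=20)

print(f"Python loop:  {loop_time:.4f} s")
print(f"Built-in sum: {builtin_time:.4f} s")
```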
The standard way to use the Python programming language is to run Python code with the Python interpreter. The interpreter is a program that reads and executes the Python code in the files passed to it as arguments. At the command prompt, the command python is used to invoke the Python interpreter.
For example, to run a file my-program.py that contains Python code from the command prompt, use:
$ python my-program.py
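For illustration, my-program.py could contain something like the following. The contents are hypothetical; any file with valid Python code is run in the same way.

```python
# my-program.py (hypothetical contents, for illustration only)

def main():
    """Compute and print the sum of the first ten squares."""
    squares = [n * n for n in range(1, 11)]
    print("Sum of the first ten squares:", sum(squares))


# Run main() only when the file is executed as a script, not when it is imported.
if __name__ == "__main__":
    main()
```

Running the command above on this file prints "Sum of the first ten squares: 385".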
We can also start the interpreter by simply typing python at the command line, and interactively type Python code into the interpreter.
This is often how we want to work when developing scientific applications, or when doing small calculations. But the standard Python interpreter is not very convenient for this kind of work, due to a number of limitations.
IPython is an interactive shell that addresses the limitations of the standard Python interpreter, and it is a workhorse for scientific use of Python. It provides an interactive prompt to the Python interpreter with greatly improved user-friendliness.
Some of the many useful features of IPython include:
Jupyter notebook is an HTML-based notebook environment for Python, similar to Mathematica or Maple. It is based on the IPython shell, but provides a cell-based environment with great interactivity, where calculations can be organized and documented in a structured way.
Although they use a web browser as the graphical interface, Jupyter notebooks can be run locally, on the same computer that runs the browser. To start a new Jupyter notebook session, run the following command:
$ jupyter notebook
from a directory where you want the notebooks to be stored. This will open a new browser window (or a new tab in an existing window) with an index page where existing notebooks are shown and from which new notebooks can be created.
Spyder is a MATLAB-like IDE for scientific computing with Python. It has many of the advantages of a traditional IDE: code editing, execution and debugging are all carried out in a single environment, and work on different calculations can be organized as projects.
Some advantages of Spyder:
The best way to set up a scientific Python environment is to use the cross-platform package manager conda from Continuum Analytics (now Anaconda, Inc.).
You might not be aware of it, but you are already running a Jupyter notebook. The following lines show the Python version that you are running.
import sys
print("Python version")
print(sys.version)
Python version
3.7.6 (default, Jan 8 2020, 20:23:39) [MSC v.1916 64 bit (AMD64)]
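When a notebook relies on features of a particular Python release, the version can also be checked programmatically rather than read from the printed string. The sketch below uses sys.version_info for this; the minimum version (3, 6) is an arbitrary assumption chosen only for the example.

```python
import sys

# sys.version_info is a named tuple: (major, minor, micro, releaselevel, serial).
major, minor, micro = sys.version_info[:3]
print(f"Running Python {major}.{minor}.{micro}")

MINIMUM_VERSION = (3, 6)  # assumed requirement, chosen only for this example
if sys.version_info >= MINIMUM_VERSION:
    print("This interpreter meets the assumed minimum version.")
else:
    print("This interpreter is older than the assumed minimum version.")
```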
Now implement your favorite "Hello, World!" program within this jupyter notebook environment. Enjoy!
# Enter your code here