# A brief introduction to using Python in Astronomy

The purpose of this lesson is to introduce a few tools that can be very useful when working with Python. The benefits of some of these tools might not be immediately apparent, and you might choose not to use them in the ongoing courses. This is because their utility grows with the age of your code, the size of your code, and the number of people you are collaborating with. For a small course project that you work on alone for a couple of weeks, you could get away with ignoring a lot of what is introduced in this notebook. Your Master's project, however, will last several months, will most likely involve quite a lot of code, and will at the very least be done together with your supervisor. It is therefore highly recommended that you incorporate the tools discussed below (or their analogues, if you will be coding in some other language) into your workflow by that time.

## NOTE:
Some of the packages and tools used in this manual might not be installed by default. 

## Virtual environments

It is often a good idea to run Python in a virtual environment, whether it is set up through the basic Python [venv](https://docs.python.org/3/tutorial/venv.html), the convenient [Pipenv](https://pipenv.pypa.io/en/latest/), or Anaconda. Among many other benefits, this allows you to test your code in a clean environment to ensure that you have not forgotten to list any dependencies (which is important if anyone else ever tries to run your code). It also lets you install Python packages without requiring root access to your computer and without interfering with the Python packages installed at the system level. At the moment you are likely already running Python in a virtual environment created by Anaconda, so you should already be somewhat familiar with the topic.
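
If you are unsure whether your current interpreter is running inside a virtual environment, one quick check is to compare `sys.prefix` with `sys.base_prefix`. This is only a minimal sketch: the helper name `in_virtualenv` is our own, and the check applies to environments created with `venv` and tools built on it. Conda environments are organized differently and may report `False` here.

```python
import sys


def in_virtualenv():
    """Return True if the interpreter runs inside a venv-style environment."""
    # In a venv, sys.prefix points at the environment, while
    # sys.base_prefix points at the Python installation it was created from.
    return sys.prefix != sys.base_prefix


print("Interpreter prefix:", sys.prefix)
print("Inside a venv-style environment:", in_virtualenv())
```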

## PEP 8

[The Python Style Guide](https://www.python.org/dev/peps/pep-0008/), commonly referred to as PEP 8, was already mentioned in the first lesson. Here we introduce tools that allow you to check if your code is PEP 8 compliant. 

## Ruff

[Ruff](https://docs.astral.sh/ruff/) is a modern tool that both checks code style (a linter) and formats code (a formatter). The main difference between Ruff and earlier tools is that Ruff is much faster, making it possible for one tool to do work that might otherwise be split among several. Ruff can fix issues with obvious solutions and point out any others.

If you wish to check a Python script called 'helloworld.py' you would simply run
```
$ ruff check helloworld.py
```
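
For illustration, here is a hypothetical `helloworld.py`, written badly on purpose. With Ruff's default rule set we would expect complaints about the two imports on one line (E401), the unused imports (F401), the unused local variable (F841) and the comparison to `None` with `==` (E711). The exact messages depend on your Ruff version and configuration, so treat the codes in the comments as an expectation rather than a guarantee.

```python
# helloworld.py (hypothetical example file, written badly on purpose)
import os, sys  # two imports on one line, and neither is used


def greet(name=None):
    unused = 42  # assigned but never used
    if name == None:  # comparison to None should use "is"
        name = "World"
    print("Hello, " + name + "!")


greet()
```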

The example above will produce complaints, but all the error messages state exactly where the PEP 8 violations are located and what the problems are. This makes fixing them quite straightforward. It is also possible to get Ruff to automatically fix issues with simple solutions. To do this, run
```
$ ruff check --fix --show-fixes helloworld.py
```

The Ruff formatter can reformat code to make it consistent and easy to read.
It takes the burden of hand-formatting away from the user and lets them focus on the content. 

To run the Ruff formatter on all files in the current directory, run
```
$ ruff format
```
You can also specify a single file to format with
```
$ ruff format helloworld.py
```

To use Ruff, first [install](https://docs.astral.sh/ruff/installation/) it, then follow the [getting started](https://docs.astral.sh/ruff/tutorial/#getting-started) section of the documentation. You probably also want to go through the [configuration](https://docs.astral.sh/ruff/configuration/) to set up things like shell auto-complete. 

## Docstrings

Docstrings are an important documentation tool in Python. In fact, they are important enough that the conventions for writing them are not part of PEP 8 but have their own document, [PEP 257](https://www.python.org/dev/peps/pep-0257/). You can think of docstrings as special comments that, unlike normal comments, are accessible from the Python interpreter. They are often used by IDEs and also for automatically generating online documentation. Very basic use of docstrings is demonstrated below.

In [None]:
def hello_world():
    """Print 'Hello, World!'"""
    print("Hello, World!")


help(hello_world)
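
`help()` takes its text from the `__doc__` attribute of the object, which can also be read directly. A minimal sketch, repeating the function from the cell above so that it runs on its own:

```python
def hello_world():
    """Print 'Hello, World!'"""
    print("Hello, World!")


# The docstring is stored as a plain string on the function object.
print(hello_world.__doc__)
```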

If you are writing something that might be used by many people and wish to adhere to good docstring conventions then it might be a good idea to check out the [NumPy docstring convention](https://numpydoc.readthedocs.io/en/latest/format.html) and the [pydocstyle](https://github.com/PyCQA/pydocstyle/) tool.
A docstring from `numpy` is provided as an example.

In [None]:
from numpy import unique

help(unique)
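
For comparison, this is roughly what a short NumPy-style docstring could look like on a function of your own. The function `redshift_to_velocity` and its default value for `c` are made up for this illustration. Only the section layout (`Parameters`, `Returns`, `Examples`) follows the NumPy convention linked above.

```python
def redshift_to_velocity(z, c=299792.458):
    """Convert a redshift to a recession velocity (low-redshift limit).

    Parameters
    ----------
    z : float
        Redshift; the approximation is only reasonable for ``z << 1``.
    c : float, optional
        Speed of light in km/s.

    Returns
    -------
    float
        Recession velocity ``c * z`` in km/s.

    Examples
    --------
    >>> redshift_to_velocity(0.0)
    0.0
    """
    return c * z


# The structured docstring is what help() and IDE tooltips will display.
print(redshift_to_velocity.__doc__)
```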

Inside a Jupyter notebook you can use the IPython commands `?` and `??` to view the docstring and the source code of a function, respectively. Because these are IPython commands rather than normal Python syntax, some linters may report them as syntax errors, but this is not a problem for Jupyter.

In [None]:
?unique

In [None]:
??unique

## Testing

It is a good idea to write tests that check whether or not your code produces the expected output. Tests can help you make sure that all the dependencies of your code are properly installed and working, that code changes have not had unexpected consequences, and that a recent addition works as it should. A good framework for performing such tests is [pytest](https://docs.pytest.org/en/latest/). Inside a Jupyter notebook we should use [ipytest](https://pypi.org/project/ipytest/), which is based on pytest.

In [None]:
import ipytest

ipytest.autoconfig()

In [None]:
# A faulty function
def int_square(a):
    return a

In [None]:
%%ipytest

# The test that reveals the problems
def test_square():
    assert isinstance(int_square(0.), int)
    assert int_square(3) == 9

Although pytest tries to help us as much as possible in figuring out what is causing the tests to fail, you should keep in mind that pytest simply runs the tests it is told to run. How useful pytest is for figuring out problems depends on how well these tests are written.
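
As an illustration of slightly more informative tests, here is a sketch of one possible corrected `int_square`, together with tests that each check a single behaviour. It assumes that the intended behaviour is to truncate the input to an integer before squaring, which the faulty version above does not specify. The tests are plain functions containing `assert` statements, so pytest can collect them, but they are also called directly at the end so that the block runs without pytest.

```python
def int_square(a):
    """Return the square of ``a`` truncated to an integer (assumed behaviour)."""
    return int(a) ** 2


def test_returns_int():
    assert isinstance(int_square(0.0), int)


def test_squares_positive():
    assert int_square(3) == 9


def test_squares_negative():
    assert int_square(-4) == 16


# Run the tests by hand; under pytest this loop is not needed.
for test in (test_returns_int, test_squares_positive, test_squares_negative):
    test()
print("All tests passed")
```

Splitting the checks into separate tests means that a failure report names the behaviour that broke, instead of only telling you that something in one large test went wrong.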

## Performance optimization & profiling

Sometimes you need to improve the performance of your code. Knowing in advance what will make code run faster is of course useful, but not always realistic. Here we will focus on identifying what is making your already existing code slow. We call this **profiling**.

In previous lectures we already used [timeit](https://docs.python.org/3/library/timeit.html) and the corresponding magic functions.
We will now discuss tools which profile an entire script on a line-by-line basis.

### Jupyter notebook profiling

First, we will do this in a Jupyter notebook using [line_profiler](https://github.com/pyutils/line_profiler).

In [None]:
%load_ext line_profiler
import numpy as np
import matplotlib.pyplot as plt

Say we had the following function for calculating a [moving average](https://en.wikipedia.org/wiki/Moving_average).

In [None]:
def movmean(xdata, ydata, window):
    ydata_new = np.zeros(len(ydata))
    xdata_new = np.zeros(len(xdata))
    k = int(window / 2)
    for i in range(len(ydata)):
        if i < window:
            ydata_new[i] = np.mean(ydata[: (i + k)])
            xdata_new[i] = np.mean(xdata[: (i + k)])
        elif i > len(ydata) - window:
            ydata_new[i] = np.mean(ydata[(i - k) :])
            xdata_new[i] = np.mean(xdata[(i - k) :])
        else:
            ydata_new[i] = np.mean(ydata[(i - k) : (i + k)])
            xdata_new[i] = np.mean(xdata[(i - k) : (i + k)])
    return (xdata_new, ydata_new)

We use it on some noisy data that we have.

In [None]:
x, y = np.loadtxt("xy.txt")
x_med, y_med = movmean(x, y, 100)

plt.plot(x, y, ".")
plt.plot(x_med, y_med, "r")
plt.xlabel("$x$")
plt.ylabel("$y$")

Now we run `line_profiler` on the function call to identify what our bottlenecks are using `%lprun`.

In [None]:
%lprun -f movmean movmean(x, y, 100)

This opens the results in the pager, which tells us that most of the time is spent in the `else` branch. That is where it should be, since this branch handles the majority of the data points.

### Spyder

For Spyder there is the [spyder-line-profiler](https://github.com/spyder-ide/spyder-line-profiler) plugin, which integrates `line_profiler` directly into Spyder.

Once installed you can use it by placing a `@profile` decorator in front of the functions that you want to be profiled. Then either press Shift + F10 or go to `Run > Profile line by line` to start the profiler.

A short demonstration of using this profiler in Spyder can be seen in the video below:

<video controls width="900" src="https://lund-observatory-teaching.github.io/lundpython/imgs/spyder_line_profiler.mov" />

### Normal Python
When profiling in basic Python we can again make use of `line_profiler`, or of the built-in [cProfile](https://docs.python.org/3/library/profile.html). In a script you could for example do:

In [None]:
import cProfile

cProfile.run("movmean(x, y, 100)")

This output might not be the easiest to interpret.
Also `cProfile` only times function calls and so could miss some slow `numpy` operations like `a[large_index_array] = some_other_large_array`.
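
One way to make the output easier to read is to sort it and limit it to the most expensive entries with the standard library `pstats` module. The sketch below uses a stand-in function `slow_sum` (hypothetical, not part of this lesson) so that it runs on its own. In practice you would profile `movmean(x, y, 100)` instead.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately slow stand-in for a function worth profiling."""
    total = 0
    for i in range(n):
        total += i * i
    return total


# Record only what happens between enable() and disable().
profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(200_000)
profiler.disable()

# Collect the statistics into a string, sorted by cumulative time,
# and show only the five most expensive entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.strip_dirs().sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```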

Instead we might want to use `line_profiler`, this time from the command line. Once again, make sure that your function has the `@profile` decorator. Then run the following in the same directory as your .py script:

<pre style="background-color:black;color:white"> 
<code style="background-color:black;color:white"> 
 $ kernprof -l -v spyderexample.py
 
 Wrote profile results to spyderexample.py.lprof
 Timer unit: 1e-06 s
 
 Total time: 0.033536 s
 File: spyderexample.py
 Function: movmean at line 11
 
 Line #      Hits         Time  Per Hit   % Time  Line Contents
 ==============================================================
     11                                           @profile
     12                                           def movmean(xdata, ydata, window):
     13         1         52.0     52.0      0.2      ydata_new = np.zeros(len(ydata))
     14         1          3.0      3.0      0.0      xdata_new = np.zeros(len(xdata))
     15         1          2.0      2.0      0.0      k = int(window/2)
     16      1001        499.0      0.5      1.5      for i in range(len(ydata)):
     17      1000        412.0      0.4      1.2          if i < window:
     18       100       1635.0     16.4      4.9              ydata_new[i] = np.mean(ydata[:(i+k)])
     19       100       1574.0     15.7      4.7              xdata_new[i] = np.mean(xdata[:(i+k)])
     20       900        513.0      0.6      1.5          elif i > len(ydata)-window:
     21        99       1714.0     17.3      5.1              ydata_new[i] = np.mean(ydata[(i-k):])
     22        99       1663.0     16.8      5.0              xdata_new[i] = np.mean(xdata[(i-k):])
     23                                                   else:
     24       801      12539.0     15.7     37.4              ydata_new[i] = np.mean(ydata[(i-k):(i+k)])
     25       801      12930.0     16.1     38.6              xdata_new[i] = np.mean(xdata[(i-k):(i+k)])
     26         1          0.0      0.0      0.0      return(xdata_new, ydata_new)
     
 $ |
 
</code> 
</pre>   

This will generate a file called `<yourscriptname>.py.lprof`. The `-v` option displays the results immediately. Otherwise you can view them later by calling

<pre style="background-color:black;color:white"> 
<code style="background-color:black;color:white"> 
 $ python -m line_profiler spyderexample.py.lprof
 
</code> 
</pre> 
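
Note that `@profile` is not defined in normal Python. `kernprof -l` makes it available while it runs your script, so the same script fails with a `NameError` when started with plain `python`. A common workaround is to define a do-nothing fallback, as in the sketch below. The function `moving_mean` is a hypothetical pure-Python stand-in and not the `movmean` from above.

```python
import builtins

# kernprof -l adds a `profile` decorator to builtins for the duration of
# the run. Without kernprof the name is missing, so fall back to a
# decorator that returns the function unchanged.
if not hasattr(builtins, "profile"):

    def profile(func):
        return func


@profile
def moving_mean(values, half_width):
    """Average each element with its neighbours within ``half_width``."""
    result = []
    for i in range(len(values)):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        result.append(sum(values[lo:hi]) / (hi - lo))
    return result


print(moving_mean([1.0, 2.0, 3.0, 4.0, 5.0], 1))
```

With this fallback the script can be run both through `kernprof` for profiling and directly with `python` for normal use.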

## Progress bars with [`tqdm`](https://pypi.org/project/tqdm/)

Simple progress bars can be created with `tqdm`.

In [None]:
from time import sleep

from tqdm import tqdm

for t in tqdm((0.5, 1, 0.5, 1)):
    sleep(t)

It is possible to use `tqdm()` with `range()`, but `tqdm` also provides `trange()` as a shortcut for this common case.

In [None]:
from tqdm import trange

for _ in trange(5):
    sleep(0.4)

It is possible to provide a message to go along with the progress bar.

In [None]:
for t in tqdm((0.5, 1, 0.5, 1), desc="Look at me"):
    sleep(t)

The message can be updated after creation.

In [None]:
pbar = tqdm(("a", "b", "c"))
for dataset in pbar:
    pbar.set_description("Processing dataset " + dataset)
    sleep(1)
pbar.close()

# The context manager closes the progress bar automatically.
with trange(5) as pbar:
    for i in pbar:
        pbar.set_description(f"Step #{i}")
        sleep(0.8)

If you are using `tqdm` in a Jupyter notebook then you might prefer using the versions of the functions defined in `tqdm.notebook`.

In [None]:
from tqdm.notebook import tqdm, trange

for dataset in tqdm(("a", "b", "c"), desc="Working hard"):
    for _ in trange(4, desc="Processing dataset " + dataset):
        sleep(0.25)

for dataset in tqdm(("x", "y", "z"), desc="Working some more"):
    for _ in trange(4, desc="Processing dataset " + dataset, leave=False):
        sleep(0.25)

## Version control

It is a good idea to have some version control software manage your code. Not only does this allow you to restore older versions of your code repository, it can also help you document how the code has evolved. If you are using [Git](https://git-scm.com) it is very simple to host your code (either privately or publicly) on [GitHub](https://github.com), [GitLab](https://gitlab.com) or [Bitbucket](https://bitbucket.org/) (this list is far from complete). Hosting your code there gives you a backup in the cloud and also allows you to easily share your code with your collaborators (or at the very least your supervisor), though Git can certainly be useful even if you never share your repository with anyone.

If you are interested in version control, you can read more about it in [Chapter 1 Section 1 of the Pro Git book](https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control). If you are interested in using Git, you can simply continue reading the book.

## Slideshows

Our presentations use a sort of "slideshow" version of Jupyter notebooks. There are a few ways this can be done, and we have used [RISE](https://rise.readthedocs.io/).

Once you have installed it, you will find the following button in your notebooks:

![](https://lund-observatory-teaching.github.io/lundpython/imgs/RISE1.png)

This button takes you into a presentation mode of your notebook. But before you do that, you need to specify which cells, both markdown and code, belong to a slide. For this, you will want to see slide types under `View > Cell Toolbar > Slideshow`, seen as <strong style="color:red">a</strong> in the following figure:

![](https://lund-observatory-teaching.github.io/lundpython/imgs/RISE2.png)

Now you can see the slide type where <strong style="color:red">b</strong> is in the above image. Every cell whose slide type is `Slide` starts a new slide. Try out the other options to quickly figure out what they do.

## Scientific Python

The [Scientific Python community](https://learn.scientific-python.org/development/) has tonnes of useful materials that can be greatly beneficial to advanced students. In particular, they have guidance on the [process](https://learn.scientific-python.org/development/principles/process/) (how coding projects should be conducted) and [design](https://learn.scientific-python.org/development/principles/design/) (how code should be written), as well as more in-depth guides on topics already covered like [style](https://learn.scientific-python.org/development/guides/style/), [testing](https://learn.scientific-python.org/development/guides/pytest/), and [much more](https://learn.scientific-python.org/development/guides/).