This page contains the complementary material for the "Getting up to speed with Python" workshop. The material is designed to be used in the hands-on part of the workshop and is not intended as a stand-alone source of instructions.
Create a new directory for the project and move into it (the name my_project is just an example):
mkdir my_project
cd my_project
Create a virtual environment: The following command creates an isolated Python environment in the "my_venv" directory.
python3 -m venv my_venv
❓ Why might you want different virtual environments for different projects?
❓ What happens if you do not use a virtual environment at all?
Activate the virtual environment:
source my_venv/bin/activate
(On Windows: my_venv\Scripts\activate)
❓ How can you tell if your virtual environment is activated?
❓ What changes in your terminal prompt or path once the environment is active?
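One way to answer these questions programmatically: the sketch below compares `sys.prefix` with `sys.base_prefix`, a common convention (not the only one) for detecting an active virtual environment.

```python
import sys

def in_venv() -> bool:
    # Inside an activated venv, sys.prefix points into the venv directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix

print(in_venv())
```

Run this in a terminal with and without the environment activated and compare the results.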
List the installed packages:
pip list
❓ Do you see any packages listed here? Why might it be empty or show fewer packages than your global environment?
pip install numpy
❓ Why do you install packages within the environment instead of globally?
❓ What advantages does this provide in managing dependencies?
pip list
❓ Do you see your newly installed package (e.g., numpy) in the list now?
Compare with the global Python environment:
Deactivate your virtual environment
deactivate
List packages in the global environment
pip list
❓ Do you notice the difference in packages between the virtual environment and the global environment?
Reactivate your virtual environment:
source my_venv/bin/activate
Upgrading pip inside the environment:
pip install --upgrade pip
Creating a requirements file:
pip freeze > requirements.txt
❓ Do you see all the packages you installed inside requirements.txt?
❓ Why is it useful to have a requirements.txt file for your project?
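To see what `pip freeze` actually records, here is a small sketch that parses its "name==version" line format. The package versions in the string are illustrative only, not necessarily what you will see.

```python
# Sketch: pip freeze writes one pinned "name==version" line per package.
# The versions below are made up for illustration.
freeze_output = "numpy==1.26.4\npandas==2.2.2\n"

pinned = dict(line.split("==", 1) for line in freeze_output.splitlines() if line)
print(pinned)  # {'numpy': '1.26.4', 'pandas': '2.2.2'}
```

Because every version is pinned, anyone who installs from this file gets exactly the same package set.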
Recreate your environment somewhere else (the name another_venv is just an example):
python3 -m venv another_venv
source another_venv/bin/activate
pip install -r requirements.txt
❓ What would happen if you tried to install from requirements.txt in the global environment?
❓ Why is this approach beneficial when sharing your project with others?
Deactivate your environment:
deactivate
Delete your environment (if needed):
rm -rf my_venv
❓ When might you want to remove a virtual environment entirely?
NOTE
To inherit packages from the global python environment, use --system-site-packages during environment creation.
Example: python3 -m venv myvenv --system-site-packages
These are the options for Conda installation (Anaconda is not required for the workshop): https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html
You may have to perform some further steps to activate Conda. Consult https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html
Remove the defaults channel, and add the conda-forge channel (unless already present):
conda config --remove channels defaults
conda config --add channels conda-forge
List current channels:
conda config --show channels
Create a Conda environment, specifying the Python version:
conda create --name my-own-test-env python=3.11
Activate the environment:
conda activate my-own-test-env
❓ Do you now see an indication that my-own-test-env is active?
The new environment will be located inside a folder in a default location (it is possible to override this default). To look up the location, you can use:
conda info
❓ Look under "active env location". Python should already be in there -- verify this with e.g. where python (Windows) or which python (Linux/macOS). Do you have the expected version of Python?
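You can also check from inside Python itself. This sketch prints the running interpreter's path and version; inside an activated Conda environment, `sys.executable` should live under the env folder reported by `conda info`.

```python
import sys

# sys.executable is the path of the interpreter currently running this code.
print(sys.executable)
# e.g. (3, 11) if the environment was created with python=3.11
print(sys.version_info[:2])
```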
If you haven't already, activate the environment first.
Install a package with Conda:
conda install pandas
❓ Do we have Pip inside the environment?
where pip
❓ Is the pip executable inside the Conda environment folder? If not, install it:
conda install pip
Install a package with pip:
pip install scikit-learn
❓ Both Pandas and Scikit-learn depend on Numpy. Did Pip reinstall Numpy, or did it find that Numpy was already installed when we installed Pandas?
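One way to inspect what is already installed, without importing the package, is the standard library's `importlib.metadata`. A small sketch (the helper name `installed_version` is ours, not a library function):

```python
from importlib import metadata

def installed_version(name: str):
    # Reads the installed distribution's recorded version from its metadata;
    # returns None if the distribution is not present in this environment.
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None

print(installed_version("numpy"))  # a version string, or None if absent
```

Comparing the reported version before and after `pip install scikit-learn` shows whether pip reused the existing Numpy.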
Note that Scikit-learn is also available from Conda, but we want to try out Pip here.
List the primary dependencies:
conda env export --from-history
❓ Was scikit-learn listed?
Also try these:
conda env export
conda list
conda list --explicit
These document the environment at various levels of detail.
List all dependencies, including version number, and write the output to a yaml file:
conda env export > environment.yml
(This level of detail may be too high for sharing across systems)
Deactivate the environment:
conda deactivate
Recreate the environment based on the yaml file:
conda env create --name test-my-yml --file environment.yml
Activate the new environment, and verify that packages were installed.
Poetry requires Python 3.9 or higher; refer to this earlier section. For the final (optional) demonstration in this part of the workshop, Python 3.10 or higher is required.
Once you have the right Python version, it is recommended to install Poetry using pipx.
See here on how to install pipx.
Once pipx is installed, see here on how to use it to install Poetry.
Create a directory my_poetry_project (ideally within this repo), and cd into it.
poetry init
This will set up an isolated Poetry project in the current (working) directory.
This will ask you a series of questions to set up the project. You can skip them by using the -n flag:
poetry init -n
You can in principle add packages here, but due to a current "bug" (a change in PyPI search results), this does not work as expected.
Note: alternatively, you can create a new directory with Poetry directly, which will skip the interactive setup and set some defaults in the configuration files:
poetry new my_poetry_proj
This will create a pyproject.toml file, which is the configuration file for poetry.
By default, Poetry will assume you are creating a package. For clarity, and to avoid potential errors (unless your codebase is set up as a valid Python package),
add the following to the pyproject.toml:
package-mode = false
Download the pd_data.py script by right clicking this link and then Save link as
poetry shell
❓ Do you notice any changes in the terminal?
From here, you can run a python script:
python path/to/pd_data.py
Alternatively, you can do the above in one step:
poetry run python path/to/pd_data.py
Note: the above two lines will not work, because we are missing the packages
Note: requires a poetry environment to be activated
poetry add {package name}
Specifically, we will add numpy and pandas:
poetry add numpy
poetry add pandas
Or in one line:
poetry add numpy pandas
This will update the pyproject.toml and poetry.lock (or create the latter if it does not exist)
❓ What exact differences do you see in the pyproject.toml and poetry.lock files?
Now, we will be able to successfully run the python script:
python path/to/pd_data.py
To check which packages are installed, you can look at the pyproject.toml or poetry.lock files.
Alternatively, you can run the following command:
poetry show
And for even more detail, add the --tree flag:
poetry show --tree
❓ What differences do you see without and with the --tree flag?
You can update all dependencies by running:
poetry update
Or, for a specific package:
poetry update {package name 1} {package name 2} ...
To recreate your environment, share your pyproject.toml and poetry.lock files with someone else. They can then run the following commands:
Note: the user needs to install the correct python and poetry versions (which can be seen in the pyproject.toml file)
Then, the user can run
poetry shell
to activate the poetry environment ("project"). The packages can be installed with
poetry install
We will see how this works by installing dependencies and running a microservice from another repo (to be done together)
Note: in the example, we will most likely have to add package-mode = false to the pyproject.toml file, as the microservice is not a valid python package.
You can either run
exit
or
deactivate
Within the directory that is defined as a poetry project, first, start poetry:
poetry shell
Then, list the available environments
poetry env list
❓ Which environment(s) do you see?
Then, delete chosen environment with
poetry env remove {environment name}
It is possible to just use poetry for package management, but set up the virtual environment with something else (e.g. conda), or not use a virtual environment at all. In this case, you would run:
poetry config virtualenvs.create false
All the above commands are relevant, with the exception of poetry shell.
VS Code's "Python: Select Interpreter" will automatically detect Poetry environments in the working directory.
If the environment is elsewhere (e.g. in a subfolder), you need to specify the path manually:
Run poetry env info -p in the directory containing the Poetry configs.
Paste the path from above into "Enter interpreter path".
In this workshop, we will explore several extensions that enhance the functionality of VS Code for Python development. Click on each link for installation instructions:
Note: You can technically run python code without any of these extensions, but they have useful features.
Note: The "Black formatter" and "Pylint" depend on the "Python" extension. "Jupyter" will allow you to run code in an "interactive window". "Data Wrangler" will allow you to inspect pandas dataframes in a spreadsheet view.
How to install the required libraries for this setup (ideally in a virtual environment):
pip install jupyter
pip install ipykernel
Additionally, to convert a Python file (.py) to a Jupyter notebook (.ipynb):
pip install nbconvert
In order to run Python code interactively, you need a VS Code version from 2024.
See here for full details
You also need the jupyter extension (see above)
In general, the root "folder" (i.e. the one you open with "Open folder") is your "workspace".
You can think of it as an RStudio project (it becomes your base working directory).
Python files
demo-script.py
Run the Python script using the global environment:
python demo-script.py
Run the Python script using the virtual environment:
python demo-script.py
❓ This time Python runs the code in the virtual environment. Do you see any errors? Are numpy and matplotlib installed in that environment?
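To illustrate what the error looks like when a package is missing from the active environment, here is a sketch; the module name is deliberately fake so the import always fails.

```python
# Importing a package that is absent from the active environment raises
# ModuleNotFoundError, naming the missing module.
try:
    import not_a_real_package_xyz  # stand-in for a package missing from the venv
except ModuleNotFoundError as exc:
    print(f"missing package: {exc.name}")  # missing package: not_a_real_package_xyz
```

The fix is to install the package into the environment the script runs in (e.g. pip install numpy matplotlib with the venv activated).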
Select python interpreter/environment for an interactive run in VS Code
Manage (cog wheel) -> Command Palette -> "Python: Select Interpreter" -> {choose desired environment/interpreter}
Or: Ctrl + shift + P -> "Python: Select Interpreter" -> {choose desired environment/interpreter}
Run python script interactively
Formatting Python in VS Code
There are lots of formatters for Python that are available in VS Code. We will be using Black in this workshop.
We will see how this changes the format of the content in fizz_buzz.py.
Download the fizz_buzz.py script by right clicking this link and then Save link as
Right click (anywhere in open file) -> Format Document with -> Black Formatter
If you want the formatter to be "activated" when saving a file at the workspace level, select:
Command Palette -> Preferences: Open Workspace Settings (JSON).
This will open / create .vscode/settings.json. This should be edited to:
{
    "[python]": {
        "editor.formatOnSave": true,
        "editor.defaultFormatter": "ms-python.black-formatter"
    }
}
Linting python in VS Code
Checks code for semantic and stylistic problems.
Can check any script, for instance, demo-script.py from earlier
To open a tab with the list of "problems", press Ctrl + shift + M.
Note: Unlike a formatter, linting is activated by default for all Python files in VS Code. You need to turn it off manually if it is not desired.
Debugging in VS Code
Download the example code for debugging by right clicking this link and then Save link as...
The basic options are:
Run the debugger on a python file (e.g. scripts/code_debug.py) and it will continue until there is an error.
Can set a "breakpoint" where you want the code to stop, to inspect objects (called "variables" in the debugger: "locals" and "globals").
Initiate a Git repo
Source control tab and then Initialize Repository
Add/commit
Source control tab, add and commit
Push a local repo to GitHub
publish Branch
Clone from a GitHub repo
Ctrl + shift + p -> type Git: Clone -> select Clone from GitHub -> paste the HTTPS of the repo
Git Graph
.ipynb files
Running "cells" in interactive mode
In a .py file, you can create cells by adding # %% at the beginning of a line.
e.g.
# %%
x = 5
y = 3
print(x + y)
Markdown cells can be created by adding # %% [markdown] at the beginning of a line.
e.g.
# %% [markdown]
"""
Add any desired text (in quotation marks) that you want displayed in the rendered jupyter file
"""
❓ What happens when you try to run the above (markdown) cell?
jupyter_to_python.py
Some useful shortcuts:
ctrl + enter = run current cell
shift + enter = run current cell and jump to next cell
[ctrl + shift + ,] A = insert cell above
[ctrl + shift + ,] B = insert cell below
[ctrl + shift + ,] S = insert cell below current position
[ctrl + shift + ,] X = delete selected cell(s)
[ctrl + shift + ,] M = change code cell to markdown
[ctrl + shift + ,] C = change (markdown) cell to code
[ctrl + shift + ,] U = move selected cell(s) up
[ctrl + shift + ,] D = move selected cell(s) down
Convert python to jupyter file (minor)
In a python file that has "cells":
Right click (anywhere in the file view) -> Export current python file as jupyter notebook
Then, to render the jupyter notebook as an HTML file, do:
jupyter nbconvert --to html --execute <name of jupyter file>.ipynb
Alternatively, click ... on top of the notebook and select Export -> HTML
❓ What does this newly created file look like when you open it in VS Code?
Variable view
To replicate the "global environment" window in RStudio, you can use the "Data Wrangler" extension.
This lets you inspect a pandas dataframe in a spreadsheet view that can be opened as a separate window.
Assuming you have a pandas dataframe in an "interactive window", you will see a new button.
This requires the ipykernel package. If a package is missing, you will see a ModuleNotFoundError; install missing packages using the VS Code terminal.
We will demonstrate a simple Python job submitted with Slurm on the Fox HPC cluster. If you have access to Fox and want to follow along, you can download the scripts needed:
wget https://raw.githubusercontent.com/ArashAh/python_workshop/refs/heads/main/scripts/pandas_plots.py
wget https://raw.githubusercontent.com/ArashAh/python_workshop/refs/heads/main/scripts/run_pandas_plots.slurm
wget https://raw.githubusercontent.com/ArashAh/python_workshop/refs/heads/main/scripts/weather_data.csv
For running on other HPC systems, you will need to edit the module names and the account name in the run_pandas_plots.slurm script to match the names for your HPC.