Contributing Guidelines
This document describes the existing developer tooling we have in place (and what to expect of it), as well as our design and development philosophy.
Naming Conventions
Naming conventions are not automatically enforced, so please read the naming conventions section of PEP 8, which describes what each of the different styles means. A short summary of the most important parts:
- Modules (and hence files) should have short, all-lowercase names.
- Class (and exception) names should normally use the CapWords convention (also known as CamelCase).
- Function and variable names should be lowercase, with words separated by underscores as necessary to improve readability (also known as snake_case).
- To avoid collisions with Python keywords and built-in names, a single trailing underscore can be appended, such as id_.
- Always use self for the first argument to instance methods.
- Always use cls for the first argument to class methods.
- One leading underscore, like _data, is for non-public methods and instance variables that subclasses may still use. If an attribute should not be used by subclasses, use two leading underscores, like __data, which triggers Python's name mangling.
- If there is a pair of get_x and set_x methods, they should instead be a proper property, which is easy to do with the built-in @property decorator.
- Constants should be CAPITALIZED_SNAKE_CASE.
- When importing a function, try to avoid renaming it with import as, because it introduces the cognitive overhead of tracking yet another name.
When in doubt, adhere to existing conventions, or check the style guide.
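The conventions above can be seen together in one small, hypothetical module. NodeConfig, host_name, parse_id, and MAX_RETRIES are illustrative names only, not part of LISA; the sketch just shows where each naming rule applies.

```python
# Hypothetical module, e.g. node_config.py: short, all-lowercase file name.
from typing import Any, Dict

MAX_RETRIES = 3  # Constant: CAPITALIZED_SNAKE_CASE.


class NodeConfig:  # Class: CapWords.
    def __init__(self, host_name: str) -> None:  # Instance methods take self.
        self._host_name = host_name  # One underscore: non-public, open to subclasses.
        self.__cache: Dict[str, Any] = {}  # Two underscores: name-mangled to _NodeConfig__cache.

    @property
    def host_name(self) -> str:  # Replaces a get_host_name() method.
        return self._host_name

    @host_name.setter
    def host_name(self, value: str) -> None:  # Replaces a set_host_name() method.
        self._host_name = value

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "NodeConfig":  # Class methods take cls.
        return cls(data["host_name"])


def parse_id(text: str) -> int:  # Function and variable names: snake_case.
    id_ = int(text.strip())  # Trailing underscore avoids shadowing the built-in id().
    return id_
```

Callers then write node.host_name = "vm-2" instead of node.set_host_name("vm-2").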
Automated Tooling
If you have run LISAv3 already, then you have installed and used the poetry tool. Poetry is a PEP 518 compliant, cross-platform build system which handles our Python dependencies and environment.
This project’s dependencies are found in the pyproject.toml file, which is similar to, but more powerful than, the familiar requirements.txt.
The format of pyproject.toml is standardized by PEP 518 (build system requirements) and PEP 621 (project metadata).
Metadata
The first section, tool.poetry, defines the project’s metadata (name, version, description, authors, and license), which will be embedded in the final built package.
The chosen version follows Semantic Versioning, with the Python-specific (PEP 440) development release suffix ‘.dev1’. Since this is “LISAv3”, it seemed appropriate to set our version to ‘3.0.0.dev1’, that is, “the first development release of LISAv3.”
Package Dependencies
The next section, tool.poetry.dependencies, is where poetry add <package_name> records our required packages.
Poetry automatically creates and manages isolated environments.
From the documentation:
Poetry will first check if it’s currently running inside a virtual environment. If it is, it will use it directly without creating a new one. But if it’s not, it will use one that it has already created or create a brand new one for you.
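The check described in that quote can be approximated in a few lines. This is a sketch of one common technique, not Poetry's actual implementation: a venv (PEP 405) sets sys.prefix to the environment while sys.base_prefix still points at the base interpreter, and older virtualenv releases set sys.real_prefix instead.

```python
import sys


def in_virtual_environment() -> bool:
    """Best-effort guess at whether this interpreter runs inside a virtual env."""
    # venv: sys.prefix differs from sys.base_prefix.
    # Legacy virtualenv: sys.real_prefix exists.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix) or hasattr(
        sys, "real_prefix"
    )
```

Running python -c "import sys; print(sys.prefix)" inside and outside poetry shell is a quick way to see the difference by hand.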
On Linux, your initial run of poetry install will cause Poetry to automatically set up a new virtualenv using pyenv. If you are developing on Windows, you will want to set up your own, perhaps using Conda.
- python: We pinned Python to version 3.8 so everyone uses the same version.
- psutil: TODO @squirrelsc will document
- pyyaml: TODO @squirrelsc will document
- retry: TODO @squirrelsc will document
- paramiko: TODO @squirrelsc will document
- spurplus: TODO @squirrelsc will document
- dataclasses-json: TODO @squirrelsc will document (brings in ujson, which requires gcc and libpython)
- portalocker: TODO @squirrelsc will document
- azure-*: TODO @squirrelsc will document
Developer Dependencies
Similar to the previous section, tool.poetry.dev-dependencies is where poetry add --dev <package_name> records our developer packages. These are not necessary for LISAv3 to execute, but are used by developers to automatically adhere to our coding standards.
- Black, the opinionated code formatter which settles all debates as to how our Python files should be formatted. It follows PEP 8, the official Python style guide, and where PEP 8 is ambiguous, makes the decision for us.
- Flake8 (and integrations), the linter, used to coordinate most of the other tools.
- isort, the import sorter, which automatically splits imports into the expected, alphabetized sections.
- mypy, the static type checker, which, coupled with type annotations, allows us to avoid the pitfalls of Python being a dynamically typed language.
- python-language-server (and integrations), the de facto LSP server. While Microsoft is developing their own LSP servers, they do not integrate with the existing ecosystem of tools, and their latest tool, Pyright, simply does not support pyproject.toml. Since pyls is used far more widely and supports every editor, we use it.
- rope, to provide completions and renaming support to pyls.
With these packages installed and a correctly set up editor (see the readme and feel free to reach out to us), your code should automatically follow all the standards which we could automate.
The final sections, tool.black, tool.isort, and build-system, along with the .flake8 file (Flake8 does not yet support pyproject.toml), configure the tools per their recommendations.
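As an illustration of what these tools expect, here is a small, hypothetical module laid out roughly the way isort arranges imports with its default settings and formatted as Black would leave it. The python_files helper is an invented example, not a project utility; the comments mark where the third-party and first-party import sections would go.

```python
# Standard library section: plain "import x" lines, then "from x import y",
# each alphabetized.
import os
from pathlib import Path
from typing import List

# A third-party section (for example "import yaml") would follow after a blank
# line, and a first-party section (for example "from lisa import ...") last.


def python_files(root: str = os.curdir) -> List[Path]:
    """Return the .py files under root, e.g. to pass to Black, isort, or mypy."""
    return sorted(Path(root).rglob("*.py"))
```

Every import here is used, so Flake8 has nothing to report about unused names.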
Type Annotations
We are using mypy to enforce static type checking of our Python code. This may surprise you, as Python is not a statically typed language. While dynamic typing can be useful, for a complex tool such as LISA it is more likely to introduce bugs that are found only at runtime (which the user experiences as a crash). For more information on why we (and others) do this, see Dropbox’s journey to type checking 4 million lines of Python. PEP 484 and PEP 526 (among others) introduced and defined type hints for the Python language. You can probably figure out the syntax from the surrounding code, but you can also see this Intro to Using Python Type Hints and mypy’s cheat sheet.
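A short, hypothetical example of both kinds of hints: PEP 484 function annotations and PEP 526 variable annotations. TestResult and count_results are illustrative names, not LISA APIs. Because the project is pinned to Python 3.8, it uses List and Dict from typing; the built-in list[int] style generics (PEP 585) need Python 3.9.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TestResult:
    name: str
    passed: bool
    message: Optional[str] = None  # None is allowed only because of Optional.


def count_results(results: List[TestResult]) -> Dict[str, int]:  # PEP 484
    counts: Dict[str, int] = {"passed": 0, "failed": 0}  # PEP 526
    for result in results:
        counts["passed" if result.passed else "failed"] += 1
    return counts


# mypy flags a call such as count_results(["oops"]) before the code ever runs,
# instead of the user hitting an AttributeError at runtime.
```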
Runbook schema
Some plugins, such as Platform, need to follow this section to extend the runbook schema. The runbook is the configuration of a LISA run, and every LISA run needs a runbook.
The runbook uses dataclasses to define the schema, dataclasses-json to deserialize it, and marshmallow to validate it.
If you need to extend the runbook schema, see schema.py for more examples.
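The define, deserialize, validate flow can be sketched with the standard library alone. This deliberately swaps out the real tools: a plain dataclass stands in for the schema, a small loader stands in for dataclasses-json, and __post_init__ checks stand in for marshmallow. PlatformSchema and its fields are hypothetical, and JSON is used only to stay in the standard library (runbooks themselves are assumed to be YAML); consult schema.py for the actual definitions.

```python
import json
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class PlatformSchema:
    """Hypothetical platform section; LISA's real schemas live in schema.py."""

    platform_type: str
    keep_environment: bool = False

    def __post_init__(self) -> None:
        # Stand-in for marshmallow validation: check values after construction.
        if not self.platform_type:
            raise ValueError("platform_type must not be empty")
        if not isinstance(self.keep_environment, bool):
            raise ValueError("keep_environment must be a boolean")


def load_platform(raw: str) -> PlatformSchema:
    # Stand-in for dataclasses-json: map a parsed document onto the dataclass.
    data: Dict[str, Any] = json.loads(raw)
    unknown = set(data) - {f.name for f in fields(PlatformSchema)}
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return PlatformSchema(**data)
```

Keeping defaults on the dataclass means a runbook only needs to spell out the values it changes.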
Committing Guidelines
A best practice when using Git is to create a series of independent and well-documented commits. Each commit should “do one thing” and do it correctly. If a mistake is made (you need to fix a bug or adjust formatting), you should amend the commit (or use an interactive rebase to edit it). If you’re using Emacs, the Magit package makes all of this easy. Polished commits aid immensely in future debugging: they let us use tools like git bisect to automatically find bugs, and they help us understand why prior code was written. Although parts of it have gone out of date, see this otherwise great essay on Git best practices. For how Git works, read Git from the Bottom Up.
For writing your commit messages, see this modification of Tim Pope’s example:
Capitalized, short (72 chars or less) summary

More detailed explanatory text, if necessary. Wrap it to about 72 characters or so. In some contexts, the first line is treated as the subject of an email and the rest of the text as the body. The blank line separating the summary from the body is critical (unless you omit the body entirely); tools like rebase can get confused if you run the two together.

Write your commit message in the imperative: “Fix bug” and not “Fixed bug” or “Fixes bug.” This convention matches up with commit messages generated by commands like git merge and git revert.

Further paragraphs come after blank lines.

- Bullet points are okay, too

- Typically a hyphen or asterisk is used for the bullet, followed by a
  single space, with blank lines in between, but conventions vary here

- Use a hanging indent
You should also feel free to use Markdown in the commit messages, as our project is hosted on GitHub which renders it (and Markdown is human readable).
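Some of these rules are mechanical enough to check automatically. The function below is a hypothetical helper, not a tool this project ships; it checks only the capitalized summary, the 72-character limits, and the blank second line. Judging the imperative mood is left to the human reviewer.

```python
from typing import List

MAX_LINE = 72  # Limit for the summary and, roughly, for body lines.


def check_commit_message(message: str) -> List[str]:
    """Return a list of style problems; an empty list means none were found."""
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["missing summary line"]
    problems: List[str] = []
    summary = lines[0]
    if len(summary) > MAX_LINE:
        problems.append(f"summary is {len(summary)} chars (max {MAX_LINE})")
    if not summary[0].isupper():
        problems.append("summary should be capitalized")
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line must be blank")
    for number, line in enumerate(lines[2:], start=3):
        if len(line) > MAX_LINE:
            problems.append(f"line {number} exceeds {MAX_LINE} chars")
    return problems
```

A function like this could be wired into a local commit-msg hook, though nothing in this project currently does so.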
Design Patterns
The most important goal we are attempting to accomplish with LISAv3 is for it to be “simple, clean, and with a low maintenance cost.”
We should use caution when using Object Oriented Design, because when it is used without critical analysis, it creates unmaintainable code. A great talk on this subject is Stop Writing Classes, by Jack Diederich. As he says, “classes are great but they are also overused.”
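One heuristic from that talk, paraphrased: a class with only two methods, one of which is __init__, is usually just a function. The Greeter example below is hypothetical; it shows the class, the equivalent function, and functools.partial covering the case where a pre-configured callable is genuinely needed.

```python
from functools import partial


# A class whose only job is to hold an argument for its single method...
class Greeter:
    def __init__(self, greeting: str) -> None:
        self.greeting = greeting

    def greet(self, name: str) -> str:
        return f"{self.greeting}, {name}!"


# ...does the same work as a plain function.
def greet(greeting: str, name: str) -> str:
    return f"{greeting}, {name}!"


# When a "configured" callable is needed, partial replaces the constructor.
say_hello = partial(greet, "Hello")
```

Greeter("Hello").greet("LISA") and say_hello("LISA") produce the same string, with less code to maintain in the second form.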
The Python Design Patterns guide is a fantastic collection of material for writing maintainable Python code. It details many of the common “Object Oriented” patterns from the Gang of Four book (many of which were geared toward languages like C++ and do not carry over directly to modern languages like Python), what lessons can be learned from them, and how to apply them (or their modern alternatives) today. It also serves as an easy-to-read guide to the Gang of Four book itself, whose principles still serve us well.
Every time a developer chooses to use a design pattern, that person needs to reason through and document why it was chosen, and what alternatives were considered. We will recreate the problems with LISAv2 unless we take our time to carefully create a well-designed and maintainable framework.
Several popular patterns that actually do not work well in Python are:
Conversely, patterns that are a natural fit to Python include:
- The Composite Pattern
- The Iterator Pattern (caution: it is actually better to implement these with yield!)
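Both patterns fit in a few lines of Python. In this hypothetical sketch, Node is a Composite (a suite and a single test share one interface), and walk is a generator, so yield and yield from replace a hand-written iterator class with __iter__ and __next__.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)

    def walk(self) -> Iterator[str]:
        # Composite: leaves and branches expose the same walk() method, so
        # callers never check which kind of node they hold.
        yield self.name
        for child in self.children:
            # Delegate to the child's generator (pre-order traversal).
            yield from child.walk()
```

For example, list(Node("suite", [Node("smoke", [Node("boot")]), Node("network")]).walk()) visits suite, smoke, boot, and network in that order.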
Finally, a high-level guide to all things Python is The Hitchhiker’s Guide to Python. It covers just about everything in the Python world. If you make it through even some of these guides, you will be well on your way to being a “Pythonista” (a Python developer) writing “Pythonic” (canonically correct Python) code left and right.
Async IO
The Async IO pattern found in languages such as C# is available in Python through the async and await keywords (added in Python 3.5), along with the asyncio module (added in Python 3.4). Please read Async IO in Python: A Complete Walkthrough to understand at a high level how asynchronous programming works. One major “gotcha” is that asyncio.run(...), available since Python 3.7, should be called exactly once, in main, because it starts the event loop. Everything else should be a coroutine or task which the event loop schedules.
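A minimal sketch of that structure: a single asyncio.run call at the entry point, with all other work written as coroutines that the event loop schedules. fake_io is a hypothetical stand-in for real I/O such as a remote command; asyncio.sleep simply yields control to the loop.

```python
import asyncio
from typing import List


async def fake_io(name: str, delay: float) -> str:
    # Hypothetical I/O: a non-blocking sleep lets other coroutines run.
    await asyncio.sleep(delay)
    return f"{name} done"


async def main() -> List[str]:
    # Both coroutines run concurrently on the one event loop; gather returns
    # their results in the order the awaitables were passed in.
    return list(await asyncio.gather(fake_io("a", 0.01), fake_io("b", 0.01)))


if __name__ == "__main__":
    # The only asyncio.run call: it creates the loop, runs main(), and closes it.
    print(asyncio.run(main()))
```

Calling asyncio.run again from inside a coroutine would fail, because a loop is already running; nested work should be awaited or scheduled as a task instead.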
Future Sections
Just a collection of reminders for the author to expand on later.
- unittest
- doctest
- subprocess
- GitHub Actions
- ShellCheck
- Governance
- Maintenance Cost
- Parallelism and multi-plexing
- Versioned inputs and outputs