Coding Guidelines

Here are some coding guidelines that we have adopted in this repo.

Test Driven Development
Do not Repeat Yourself
Single Responsibility
Python and Docstrings Style
The Zen of Python
R and Docstrings Style
Evidence-Based Software Design
You are not going to need it
Minimum Viable Product
Publish Often Publish Early
User feedback before making a release

Test Driven Development

We use Test Driven Development (TDD). All contributions to the repository should have unit tests: we use pytest for Python files, testthat for R files, and papermill for notebooks.

Apart from unit tests, we will also have nightly builds with smoke and integration tests. For more information about the differences, see a quick introduction to unit, smoke and integration tests for Python.

You can find a guide on how to manually execute all the tests in TESTS.md.

Examples:

Basic asserts with fixtures comparing structures like lists, dictionaries, numpy arrays and pandas dataframes.
Basic use of common fixtures defined in a conftest file.
Python unit tests for our data downloading.
Notebook unit tests for our Python notebooks.
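
As a minimal sketch of the test-first workflow, assume a hypothetical helper `fill_missing` (it is not part of this repo). The tests would be written first and the function implemented until they pass. When this code lives in a file named like `test_*.py`, pytest collects the functions whose names start with `test_` and reports any failing `assert`.

```python
def fill_missing(values, default=0.0):
    """Replace None entries with a default value (hypothetical helper)."""
    return [default if v is None else v for v in values]


def test_fill_missing_replaces_none():
    # A missing observation is replaced by the default.
    assert fill_missing([1.0, None, 3.0]) == [1.0, 0.0, 3.0]


def test_fill_missing_keeps_complete_input():
    # Input without missing values comes back unchanged.
    data = [1.0, 2.0]
    assert fill_missing(data) == data


def test_fill_missing_does_not_mutate_input():
    # The helper returns a new list and leaves its argument alone.
    data = [None, 2.0]
    fill_missing(data, default=-1.0)
    assert data == [None, 2.0]
```

Each test checks one behavior, so a failure points directly at what broke.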

Do not Repeat Yourself

Follow Don't Repeat Yourself (DRY): refactor code that is used in more than one place into a shared function or module.

Examples:

See how we are using DRY when testing our notebooks.
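
A generic illustration of the same idea, not taken from this repo: two hypothetical metric functions, `mae` and `rmse`, would otherwise each repeat the input validation and the error computation. Moving that logic into one assumed helper, `_errors`, keeps it in a single place.

```python
import math


def _errors(actual, predicted):
    """Validate the inputs once and return element-wise errors (hypothetical helper)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("inputs must not be empty")
    return [a - p for a, p in zip(actual, predicted)]


def mae(actual, predicted):
    """Mean absolute error."""
    errors = _errors(actual, predicted)
    return sum(abs(e) for e in errors) / len(errors)


def rmse(actual, predicted):
    """Root mean squared error."""
    errors = _errors(actual, predicted)
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

A change to the validation rules now needs to be made only in `_errors`.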

Single Responsibility

Single responsibility is one of the SOLID principles. It states that each module or function should be responsible for a single part of the functionality.

Examples:

Without single responsibility:

def train_and_test(train_set, test_set):
    # code for training on train_set
    # code for testing on test_set
    ...

With single responsibility:

def train(train_set):
    # code for training on train_set
    ...

def test(test_set):
    # code for testing on test_set
    ...
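
To show how the split functions compose, here is a small runnable sketch. It assumes a deliberately trivial "model" (the mean of the training values) and names the second function `evaluate`, taking the trained model as an argument. Both choices are illustrative assumptions, not this repo's API.

```python
def train(train_set):
    """Fit a trivial model: the mean of the training values."""
    return sum(train_set) / len(train_set)


def evaluate(model, test_set):
    """Return the mean absolute error of predicting `model` for every point."""
    return sum(abs(y - model) for y in test_set) / len(test_set)


# Each step can be run, tested, and replaced independently.
model = train([1.0, 2.0, 3.0])
error = evaluate(model, [2.0, 4.0])
```

Because training and evaluation are separate, the same trained model can be scored against several test sets without retraining.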

Python and Docstrings Style

We use the automatic style formatter Black and code checker flake8. See the installation guide for VSCode and PyCharm.

We use Google style for formatting the docstrings.
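
For illustration, a Google-style docstring on a hypothetical function (`moving_average` is an assumed name, not a function from this repo) could look like the sketch below, laid out the way Black would format it.

```python
def moving_average(values, window):
    """Compute the simple moving average of a sequence.

    Args:
        values (list[float]): Observations in time order.
        window (int): Number of consecutive observations per average.

    Returns:
        list[float]: One average per full window, in order.

    Raises:
        ValueError: If `window` is not between 1 and `len(values)`.
    """
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i : i + window]) / window
        for i in range(len(values) - window + 1)
    ]
```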

The Zen of Python

We follow the Zen of Python when developing general Python code.

Beautiful is better than ugly.
Explicit is better than implicit. 
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Examples:

Implementation of explicit is better than implicit with a read function:

# Implicit
def read(filename):
    # code for reading a csv or json
    # depending on the file extension
    ...

# Explicit
def read_csv(filename):
    # code for reading a csv
    ...

def read_json(filename):
    # code for reading a json
    ...
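
A possible runnable version of the explicit variant, using only the standard library. The return types (a list of dicts for CSV, the parsed object for JSON) are assumptions made for this sketch.

```python
import csv
import json


def read_csv(filename):
    """Read a CSV file with a header row into a list of dicts."""
    with open(filename, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def read_json(filename):
    """Read a JSON file into the corresponding Python object."""
    with open(filename, encoding="utf-8") as f:
        return json.load(f)
```

The caller states the format in the function name, so nothing is inferred from the file extension.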

R and Docstrings Style

We use the automatic style formatter Styler and code checker Lintr.

We use Google style for formatting the docstrings and Tidyverse style for coding syntax.

Examples:

Styler formatting on R files.
Styler formatting on notebooks.
Docstring with Google style.

Evidence-Based Software Design

When using Evidence-Based Design (EBD), software is developed based on customer inputs, standard libraries in the industry or credible research. For a detailed explanation, see this post about EBD.

Examples:

When designing the feature engineering utility in Python, we decided to use classes instead of functions, following industry standards such as scikit-learn. See our implementation of the Python feature engineering utility.
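
A rough sketch of what following the scikit-learn convention can look like: a class exposing `fit`, `transform`, and `fit_transform`. The class `LagFeaturizer` and its `lags` parameter are hypothetical and written in plain Python for illustration. This is not the repo's actual utility.

```python
class LagFeaturizer:
    """Hypothetical transformer following the scikit-learn fit/transform convention."""

    def __init__(self, lags=(1, 2)):
        self.lags = lags

    def fit(self, X, y=None):
        # Lag features have nothing to learn; return self so calls can be chained.
        return self

    def transform(self, X):
        # For each position, collect the value `lag` steps earlier (None if unavailable).
        return [
            [X[i - lag] if i - lag >= 0 else None for lag in self.lags]
            for i in range(len(X))
        ]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


features = LagFeaturizer(lags=(1, 2)).fit_transform([10, 20, 30])
```

Users who already know scikit-learn can guess how to call such a class without reading its documentation, which is the point of leaning on an industry standard.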

You are not going to need it

The You Aren't Going to Need It (YAGNI) principle states that we should implement functionality only when we need it, not when we foresee that we might need it.

Examples:

Question: should we start developing financial forecasting capabilities for the Forecasting project now?
Answer: No, we will wait until we see demand for these capabilities.

Minimum Viable Product

We work through Minimum Viable Products (MVP), which are our milestones. An MVP is that version of a new product which allows a team to collect the maximum amount of validated learning about customers with the least effort. More information about MVPs can be found in the Lean Startup methodology.

Examples:

Initial MVP of our repo with basic functionality.
Second MVP to give early access to selected users and customers.

Publish Often Publish Early

Even before we have an MVP, get the code base working and doing something, even if it is something trivial that everyone can "run" easily.

Examples:

Between MVPs, we make sure that all code merged into the staging or master branches passes the tests.

User feedback before making a release

A product cycle is not finished until we have received feedback from a user, made changes based on that feedback, and confirmed that all the tests pass.

Examples:

See our branch merging strategy.