Python imports are not that hard once you understand how they work internally. I recently needed to revisit the topic and, since Python is no longer my daily programming language, I thought it would be interesting to write a short summary for my future self (and potential visitors).
The most common import scenarios are:
import os.path as path
from os import path
from os.path import (abspath, dirname)
As the first example shows, using ... as ... you can alias imports.
Importing from a file follows the same syntax:
Given the example:
/a.py
/folder/b.py
/c.py

From c.py you can do the following:

import folder.b
import a
Clear and simple, no problems so far.
Given the structure:
/src/config_folder/config.py
/src/a.py
/src/run.py
You can reference your current package/module via . (as in from . import a), and add additional dots to traverse up to parent folders, e.g. from ..config_folder import config. However, the reference point can vary, you need to have a parent package, and things can get complicated as codebases grow and you move code around.
You can also reference your modules via absolute imports, by referencing a package path: from src.config_folder import config. But we will see that this can also be a bit complex at times (hint: you probably don't want that src. prefix in the import statement).
The path to module imports is resolved with the following logic: the launched script's directory comes first, then the entries of the PYTHONPATH environment variable (+ info), and finally the installation-dependent defaults; the combined result is what ends up in sys.path.
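A quick way to see the resolved search order is to inspect sys.path directly; this minimal snippet just prints it:

```python
import sys

# sys.path holds the effective module search order: the launched
# script's directory (or '' for an interactive session) comes first,
# followed by any PYTHONPATH entries and the installation defaults.
for entry in sys.path:
    print(entry or "<current directory>")
```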
We shouldn't mess with sys.path, so that leaves us two choices:
Set PYTHONPATH, and use absolute imports: This is what you will need to do in certain scenarios, like running a cronjob, but I also like to enforce it for non-trivial projects, like those with multiple configurations.
The single most critical point is that the import resolution root is calculated by default from the launched script location. If you run python3 /a/b/c.py, the root folder to search for import modules is going to be /a/b.
Using the previous section example: if you run python3 run.py from inside the src folder, you can do from config_folder import config, omitting the src package, because we're already inside it.
If we want to namespace each subproject (common practice for example in Django projects), we'd need to arrange our code to have an additional package level, for example like:
/src/myapp/config_folder/config.py
/src/myapp/a.py
/src/myapp/run.py
And we should run python3 myapp/run.py from the src folder... But if we try, it will still give us a ModuleNotFoundError: No module named 'myapp' error. Why? It errors because, if you remember, it will switch to myapp as the root folder to execute run.py. And so, this is why always using PYTHONPATH is a good approach. The following will work if run from the src folder:

$ PYTHONPATH=. python3 myapp/run.py
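Both the failure and the fix can be reproduced end to end. The following sketch builds a hypothetical src/myapp layout in a temporary folder (module names and file contents are made up for the demo) and runs it with and without PYTHONPATH:

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build the layout: src/myapp/run.py and src/myapp/config_folder/config.py
    src = os.path.join(tmp, "src")
    cfg = os.path.join(src, "myapp", "config_folder")
    os.makedirs(cfg)
    with open(os.path.join(cfg, "config.py"), "w") as f:
        f.write("NAME = 'demo'\n")
    with open(os.path.join(src, "myapp", "run.py"), "w") as f:
        f.write("from myapp.config_folder import config\nprint(config.NAME)\n")

    script = os.path.join("myapp", "run.py")

    # Without PYTHONPATH the import root is src/myapp, so "myapp" is unknown.
    failing = subprocess.run([sys.executable, script], cwd=src,
                             capture_output=True, text=True)
    print("ModuleNotFoundError" in failing.stderr)  # True

    # With PYTHONPATH=. (resolved relative to the cwd, src), "myapp" is found.
    env = dict(os.environ, PYTHONPATH=".")
    working = subprocess.run([sys.executable, script], cwd=src, env=env,
                             capture_output=True, text=True)
    print(working.stdout.strip())  # demo
```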
I've created examples of the three most common scenarios for absolute imports and uploaded them to my GitHub's Python miscellaneous repository:
The third one is often the source of headaches.
Note that I didn't create relative import examples, because a) I find absolute imports clearer, and b) I'm used to almost always running things specifying PYTHONPATH, and very often from a container (where the entry points are also very clearly defined).
In the early 2000s, with Extreme Programming's strong focus on testing as a critical aspect of software development, many of us were introduced to, or became used to applying, specific testing patterns that today are considered anti-patterns. Back then, some were not seen as bad, but often the reason was that we really had no other choice, as you mainly dealt with closed-source libraries and frameworks. Outside of Stack Overflow, I find it hard nowadays to find articles mentioning the topic, so here goes my contribution.
You shouldn't do it. Your public methods represent your class surface/API/interface, and private methods are implementation details; so, when testing private methods, you're coupling the test to internal implementation details, which should be free to change with as little friction as possible.
Instead, do one of the following:
With some languages having either poor or no encapsulation, it becomes a very appealing and easy way to "speed up writing tests", but you should remember that you're breaking object-oriented encapsulation: if the method was private, it was meant not to be used directly from the outside, not even from a unit test.
In the past, we relied either on Reflection to access private methods, or on inheritance and polymorphism (when the language had good enough support), creating a child class that exposed public methods to ease testing and/or mocking. But today, I advise against this and instead go for wrapping the external class and testing its public surface only. Most, if not all, scenarios can be covered by composition.
Sometimes mentioned as "System Under Test", both represent the same wrong concept: you should never mock any method of the main class you're testing. If you need to do it, or think that doing so would simplify the tests, that's a clear signal of a refactor waiting to materialize: refactoring to another method or a different class.
There's really not much to it: if B is a private method only called from A, either you test everything when testing A (maybe ignoring the fact that you know there's a B method), or you extract B to a separate module/class, where it is ok to test it in isolation, and then mock B when testing A.
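A minimal sketch of the extraction approach (class and method names here are hypothetical, just for illustration): the private logic moves into its own collaborator, which gets a public, testable surface, and can be mocked when testing the original class.

```python
from unittest import mock

# Before the refactor, Report would hide row formatting in a private
# method. After extracting it to RowFormatter, the logic has its own
# public surface and can be tested in isolation.
class RowFormatter:
    def format(self, row):
        return " | ".join(str(field) for field in row)

class Report:
    def __init__(self, formatter=None):
        self.formatter = formatter or RowFormatter()

    def render(self, rows):
        return "\n".join(self.formatter.format(row) for row in rows)

# Test RowFormatter in isolation...
assert RowFormatter().format([1, "a"]) == "1 | a"

# ...and mock it when testing Report's own behavior.
fake = mock.Mock()
fake.format.return_value = "ROW"
assert Report(formatter=fake).render([[1], [2]]) == "ROW\nROW"
```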
I've heard at times some pushback, with comments like "but I shouldn't rewrite my code to conform to tests". While that point can theoretically be correct, what happens in practice is that testing often surfaces problems in your existing code. It is not the cause of why you need to change your code; it is helping you identify the changes that need to be made.
If your code were simple, then it would be easy to test.
In my opinion, tests should aim to reproduce production conditions. We are already mocking, stubbing, and faking so many things (at times maybe too many); thus, we shouldn't take yet more shortcuts.
Sample reference: Unit Testing Principles, Practices, and Patterns book
With our current LLM wave, which is fascinating, I've begun reading about their basics. I have a draft or two of posts with small experiments I'm doing to replicate tiny pieces of their systems (text processing is a topic that, I'm not sure why, sparks my curiosity), but I remembered that not long ago I had written a simple Markov model: a Markov chain that generates variants of the sentences found in a text.
It reads a .txt file line by line, assuming each line is a sentence, and fills a Python dictionary with the "chain" of words that forms each sentence (plus the start and end of sentence delimiters).
Using a few sentences from my sample:

Be the change you want to see in this world
Be the person your dog thinks you are
Everything a person can imagine, others will do
When it finishes reading, it knows it can begin a sentence with "Be" or "Everything"; if it (randomly) chooses "Be", then it has only seen the word "the" after it, so it must follow; but the third word can either be "change" or "person"; if it chooses "person", then the next word could either be "your" or "can"; and so it goes until it either picks an end of sentence delimiter, or we reach the maximum number of words per sentence we've set up.
It could generate the sentence "Be the person can imagine, others will do .". It is incorrect, but the bigger the input text you feed it, the greater the variety and the better the chance of generating something that makes sense.
As an example with the quotes file, running it a few times, sometimes produces funny philosopher quotes:
Experience is easy . Don’t teach them like a professional is right not improving . Be the worst . He who seek the things will never have to control complexity not a shorter letter . Spend your own happiness and go home . Boy Scout Rule Always leave the happiness and go home . Learn the best and distribute the rules like a priority . Work hard and practice something you’ve never have written a marvellous thing that you don't feel like doing them to ... He who thinks you want to think . Try to create it wrong because nobody sees it again .
I've also included a transcript of a TED talk, which again mostly generates gibberish, but at times almost looks correct.
Not precisely groundbreaking, but fun, and illustrative of a basic "brute-forcing" method of creating new text.
You can find the Python code on my GitHub.
Browser automation has advanced a lot, not only regarding the frameworks and tools but also in the most fundamental piece: the browser itself. Google Chrome is now very mature, has the biggest market share (as of mid-2023), and complies with all web standards, so it is an excellent starting point for automation projects.
In this post, I'll mention the most relevant pieces you need to set it up.
Using Google's Chromium instead of the main Chrome has two main advantages:
But otherwise, it is the same browser.
There is a handy latest build link to download Chromium: https://download-chromium.appspot.com
Be aware that those builds, under Linux, come without the Widevine (DRM) compilation flag, so even if you follow the steps below, it won't work with protected content.
The ungoogled-chromium-binaries GitHub project provides Linux binaries compiled with the DRM flag. From the releases page it is easy to pick either the latest version or a specific one:
An alternative site that hosts binaries for all platforms compiled with the DRM flag is: https://chromium.woolyss.com/
ChromeDriver is another critical piece, alongside an automation framework like WebDriverIO. It is easy to automate fetching a certain version via their download URLs:
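For example, a sketch of building such a download URL for a given version (the version number is just an illustration; note that this URL pattern applies to ChromeDriver releases up to v114, after which distribution moved to the "Chrome for Testing" endpoints):

```shell
#!/bin/sh
# Hypothetical example version; match it to your installed Chromium.
VERSION="114.0.5735.90"
URL="https://chromedriver.storage.googleapis.com/${VERSION}/chromedriver_linux64.zip"
echo "${URL}"
# Then fetch and unpack it, e.g.:
#   curl -sSL -o chromedriver.zip "${URL}" && unzip -o chromedriver.zip
```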
As mentioned before, Chromium might come without the DRM library, Widevine. You can fetch specific versions via URLs like the following:
Follow the instructions provided at the chromium-widevine GitHub project to set it up, which consists of extracting the files into a certain subfolder structure inside Chromium's main folder.
Another URL you'll use a lot when setting up Chromium automation is https://peter.sh/experiments/chromium-command-line-switches/, because it contains a complete list of the hundreds of command-line arguments/flags/switches. There is no official documentation, so this is really valuable.
For debugging errors thrown by the browser, you probably want to use the flags
Suppose you plan to run automated browsers in a Linux environment without a display (like a Docker container, or a CI instance without the X Server installed). In that case, you will probably want to use XVFB (for example via the xvfb-run wrapper).
Finally, if you are really brave, and have some spare time, you can manually download and compile Chromium from the source code, but it is time-consuming.
UPDATE #1: Added friend suggestion of another page containing binaries for all platforms (including Linux with DRM flag).
I've mentioned at least once my opinion that I would have preferred Mercurial to have won the distributed version control systems race, because its commands were way more consistent and easy. But as of today, git has come a long way and is also a very powerful tool. And it won the battle. So, I fully embraced git and have been trying to level up lately.
I've been writing a git cheatsheet for quite some time, and while it still does not cover everything (and won't), I've added a bunch of new content after reading the book.
Title: Pro Git
Author(s): Scott Chacon, Ben Straub
If I had to summarize this book quickly, I'd say: If you use git, you must read it.
I've read dozens of articles and tutorials with varying difficulty levels (the hardest at times being git's own documentation). From the first chapter, I found the explanations excellent. Everything is nicely explained, accompanied by examples, and any time the topic at hand might be non-trivial to understand, the authors also include helpful diagrams showing branches, commits, or whatever is needed.
Need to learn about the different states a file can be in (untracked, staged, committed, ...)? Check. Need to learn complex strategies to bring commits from some branches to others when all of them had changes? Check. Want to know how git stores commit references, and even learn how to do low-level operations and other hardcore stuff? Check. To provide some context, the book is heavily focused on git itself, and GitHub is barely mentioned here and there, so you will learn to do things in a generic but proper way and then lean on services such as GitHub or GitLab to maintain your remote repositories, user accounts, and the like. But if you want, the book also teaches you how to set up your own git servers (and even how the different available communication protocols work).
Over its ~520 pages there's so much content, sometimes in so much detail, that I skipped most of the server management chapters. But now I know that it is explained there too, and if I need to, I can go back and check how to manage user credentials and push/pull repository permissions. I recommend reading, at minimum, all the general chapters (which make up around 50% of the book).
I wish I had read the book earlier, because now I know how git works internally, which helps me better understand any merge issue, any colleague asking "how do I xxxxx?", and how best to work with the tool.
Minor update: I just remembered to mention another remarkable feature of the book: it is freely available for download. So there's no excuse not to give it a try.