I’ve been working on a group project this term, and we’ve been having some trouble with imports. The project structure is shown below (with some details removed for simplicity).
project
│   .gitignore
│   README.md
│   requirements.txt
│   setup.py
│
└───project_name
    │   __init__.py
    │
    ├───data
    │       data_loader.py
    │       feature_extractor.py
    │       feature_recorder.py
    │       gtzan_utils.py
    │       __init__.py
    │
    ├───ml
    │   │   __init__.py
    │   │
    │   └───c
    │           ml_algo_c.py
    │           training_script.py
    │           write_features_script.py
    │           __init__.py
    │
    ├───resources
    │       .gitignore
    │
    └───tests
            testing.py
            __init__.py
Please forgive the placeholder project_name. I swear we’ll replace it with something brilliant any day now.
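The problem is, I need to access the classes in the data package from training_script.py. I spent some time trying to get relative imports working the way I wanted. Python did not like this at all (a file run directly as a script has no parent package, so relative imports inside it simply fail), and I wound up with this terrible code.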
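To see the relative-import failure in isolation, here is a minimal sketch, assuming Python 3.7+. It rebuilds a throwaway copy of the data and ml/c folders in a temporary directory (the VALUE constant and file contents are hypothetical placeholders) and runs the inner script by path, the way you would from the command line. Because a file run this way becomes __main__ with no parent package, the import should fail with "ImportError: attempted relative import with no known parent package".

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_relative_import_script() -> subprocess.CompletedProcess:
    """Build a throwaway copy of the package layout and run the inner script directly."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "project_name"
        c_dir = root / "ml" / "c"
        c_dir.mkdir(parents=True)
        (root / "data").mkdir()
        # Give every folder an __init__.py, as in the real project tree
        for folder in (root, root / "data", root / "ml", c_dir):
            (folder / "__init__.py").write_text("")
        # Hypothetical placeholder module standing in for the real data_loader.py
        (root / "data" / "data_loader.py").write_text("VALUE = 1\n")
        script = c_dir / "training_script.py"
        # Three dots climb from c to ml to project_name, then go down into data
        script.write_text("from ...data import data_loader\nprint(data_loader.VALUE)\n")
        # Running the file by path makes it __main__, with no parent package
        return subprocess.run(
            [sys.executable, str(script)], capture_output=True, text=True
        )


result = run_relative_import_script()
print("exit code:", result.returncode)
print(result.stderr.strip().splitlines()[-1])
```

The last line of stderr is the ImportError itself; the package structure is fine, the problem is purely how the file was launched.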
import sys
import os
import time
import tensorflow as tf
# This import works fine since it's in the same folder
# as training_script.py
from ml_algo_c import MlAlgoC
# Doing work to crawl up the top-level directory of the project
# and append that directory to sys.path so that Python can
# find our project files
script_dir = os.path.dirname(__file__)
mymodule_dir = os.path.join(script_dir, '..', '..')
sys.path.append(mymodule_dir)
# The preceding code was necessary to make these imports
# of local project files work.
# This is terrible!
import data.data_loader as dl # noqa: E402
import data.feature_recorder as fr # noqa: E402
# Note the inline comments needed to stop the linter
# from complaining about imports not being at the
# top of the file. Ugh!
In fact, the whole group was having this issue. In many languages, you would just do something like this.
# Not in Python, but many languages would allow an import
# like this
import '../../data/data_loader.py'
import '../../data/feature_recorder.py'
But no dice, as Python does not allow any such import syntax. What’s the problem?
When looking for imports, Python searches sys.path, which is a list of directories where Python expects to find modules and packages, checked in order. By default, this holds the paths Python needs to find standard library files and installed packages. It also contains the folder of the script you ran (or the current directory, if you start Python interactively).
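You can inspect this list yourself with a couple of lines. Saved as a file (say, a hypothetical show_path.py) and run with python show_path.py, entry 0 should be the folder containing that file; in an interactive session it is an empty string, which stands for the current directory. The rest of the list depends on your installation.

```python
import sys

# Print every directory Python will search for imports, in order.
# An empty string stands for the current working directory.
for position, entry in enumerate(sys.path):
    print(position, entry or "<current working directory>")
```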
When I run my script files in the project_name/ml/c folder, either directly from the command line or through my IDE (VS Code), the folder Python adds to sys.path is the c folder. Python has no idea how to find the data folder from there: it can only find modules and packages inside the c folder itself or below it in the tree.
One note here is that PyCharm might automatically add your project folder to sys.path, so those using PyCharm might not have this issue.
Another note: you could argue that the script files should be located together in a bin folder somewhere, but that doesn’t really help with this particular problem.
I looked into various solutions for this, and the cleanest one I could find is to simply install the project like you would any other Python package. Then, you can access it using the same absolute import syntax you would use for any built-in Python library or installed package.
This sounds problematic, though, because do we really want a duplicate of our project files installed somewhere else on our system? Worse, are we going to have to update/reinstall the package every time we change a single line of code? Finally, what if we just want a small project and don’t want to do extra work to make it installable as a robust, reusable package?
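Luckily, pip has a method to handle this. With an editable install, pip doesn’t copy your files; it does a very lightweight install that essentially links Python’s installed-package folder to your project folder, wherever it is located (in practice, through a small link or .pth file placed alongside your other installed packages). The command is simple.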
pip install -e path/to/SomeProject
# Or, just use this while navigated to the directory
# where your setup.py file is located
pip install -e .
Then, Python can access your project folder just like it would any installed package. Since this links to your actual project folder, any changes you make will be reflected and available in your imports immediately. You are just linking from the Python installed package folder directly to your real project directory.
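To confirm that Python is resolving your package from the project folder rather than from a copy, you can ask the import system where it would load it from. A minimal sketch using importlib.util.find_spec from the standard library; locate_module is a hypothetical helper name, and project_name is the placeholder from the tree above. The helper returns None when a top-level name isn’t importable at all.

```python
import importlib.util
from typing import Optional


def locate_module(name: str) -> Optional[str]:
    """Return the file Python would import for a top-level module, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# After `pip install -e .`, this should point inside your project folder,
# not into a copied package under site-packages
print(locate_module("project_name"))
# For comparison, a standard-library package resolves to the Python installation
print(locate_module("json"))
```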
Of course, you will need a setup.py file to make this work. Fortunately, you don’t need all the fields filled in that a true installable package would require. If you just want to ease your intra-project imports, a setup.py as simple as this is sufficient.
from setuptools import setup
setup(name='project_name')
The following pip commands may be helpful as well.
# Display info about your installed project
pip show project_name
# Uninstall your project if you want to clean up later
# Don't worry, this just removes the link and
# doesn't delete any of your project files in its
# original home.
pip uninstall project_name
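If you would rather check from Python than from the command line, the standard library’s importlib.metadata (Python 3.8+) reads the same installed-package metadata that pip show reports. A minimal sketch; installed_version is a hypothetical helper name, and project_name is the placeholder name passed to setup().

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it isn't installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# None before `pip install -e .` (or after `pip uninstall`), a version string otherwise
print(installed_version("project_name"))
```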
With this setup, that abysmal code from earlier becomes much cleaner and more Pythonic.
import sys
import time
import tensorflow as tf
from project_name.ml.c.ml_algo_c import MlAlgoC
import project_name.data.data_loader as dl
import project_name.data.feature_recorder as fr
This will work wherever you’re doing these imports, as Python finds your project files in the same place as other installed packages!
We’re no longer trying to awkwardly force Python to do imports in a way it doesn’t want to. If you can’t beat them…
(join them… learn their secrets… then plot your revenge)
Song of the week: Cat Clyde – All the Black (acoustic)