A colleague of mine recently stumbled upon an error in a Python script that uses asynchronous I/O:
|
|
This only showed up with the latest version of Python 3.9, namely 3.9.13, but not with the latest version of 3.10, namely 3.10.6. Curious as I am, I didn’t want to just respond with “Well, just use a newer version of Python,” but rather find out exactly which version contains the fix.
While I was able to just install Python 3.9 on my Fedora-based system using DNF, installing more specific patch versions of Python didn’t seem feasible. Not with a simple dnf install at least. I considered just pulling down the relevant container images, which very readily support even alpha releases of Python, but I really didn’t want to faff about with potential issues that would only reveal themselves in containers and not on my own system.
I was reminded of Anthony Sottile’s video about how to make a virtual environment from CPython’s source and got to work manually testing each patch version, starting from 3.10.4[1]. These are the steps I used for each version down to 3.10.0:
- clone CPython repository:
git clone git@github.com:python/cpython.git
- this will take a while the first time around
- navigate into it:
cd cpython
- check out relevant version (e.g. 3.10.4):
git checkout tags/v3.10.4
- create directory for upcoming build of Python:
mkdir prefix
- run configuration script targeting that directory:
./configure --prefix "${PWD}/prefix"
- build Python according to the number of available processors:
make -s -j8
- this will take a while depending on the host system
- find out number of processors with:
grep "processor" /proc/cpuinfo | wc -l
- install Python into previously created directory:
make install
- missing dependencies may need to be installed beforehand
- if dependencies were installed, then the prefix directory needs to be deleted and ./configure and make need to be run again
- confirm Python version:
$ ./prefix/bin/python3.10 --version
Python 3.10.4
- create virtual environment using that version of Python:
./prefix/bin/python3.10 -m venv venv3.10.4
- activate virtual environment:
source venv3.10.4/bin/activate
- navigate to project’s directory with failing script
- install requirements:
pip install -r requirements.txt
- run script:
python3 script.py
Seeing the same traceback as shown above indicated that the version in question still failed to execute the script. I used these exact same steps to divine that 3.10.0a2 was buggy but 3.10.0a3 wasn’t.
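Repeating those steps by hand for every version gets tedious, so here is a minimal sketch of how the command list could be generated for a given tag. The function name build_commands is hypothetical, os.cpu_count() stands in for counting processors via /proc/cpuinfo, and always deleting the prefix directory before a rebuild is an assumption on my part rather than something the steps above require. The commands are only printed, not executed, so they can be reviewed first.

```python
import os


def build_commands(version, jobs=None):
    """Return the shell commands for building one CPython tag into ./prefix.

    `version` is a tag name without the leading "v", e.g. "3.10.4".
    """
    # Assumption: os.cpu_count() replaces the /proc/cpuinfo lookup.
    jobs = jobs or os.cpu_count() or 1
    # "3.10.4" -> "3.10"; pre-release tags such as "3.10.0a2" also map to "3.10".
    minor = ".".join(version.split(".")[:2])
    return [
        f"git checkout tags/v{version}",
        # Assumption: always start from an empty prefix directory.
        "rm -rf prefix",
        "mkdir prefix",
        './configure --prefix "${PWD}/prefix"',
        f"make -s -j{jobs}",
        "make install",
        f"./prefix/bin/python{minor} --version",
        f"./prefix/bin/python{minor} -m venv venv{version}",
    ]


if __name__ == "__main__":
    # Print the commands instead of running them.
    for command in build_commands("3.10.4"):
        print(command)
```

Activating the virtual environment, installing the requirements and running the script are left out on purpose because they happen in the project’s directory rather than in the cpython checkout.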
There are 285 commits between those two versions and I really didn’t want to find the culprit using the methodology above as it would have just been too time-consuming. Luckily, since I’ve watched pretty much all of his explainer videos, I remembered that Anthony Sottile had also made a video about finding regressions with git bisect. This, paired with a blog post by Shiva Rajagopal, resulted in me creating a nifty helper script that git bisect could use automatically:
|
|
With this at the ready I just had to start the bisection using the following commands[2]:
git bisect start --term-new=fixed --term-old=unfixed
git bisect fixed v3.10.0a3
git bisect unfixed v3.10.0a2
git bisect run bash -c "! ./bisect.sh"
I spent a lot of time trying a pretty much identical script with git bisect run ./bisect.sh instead, which resulted in the entire run not finding anything, even though manually marking individual commits as either “fixed” or “unfixed” worked as expected. It turns out that git bisect run reads an exit code of 0 as the old term and an exit code between 1 and 127 (other than 125, which skips the commit) as the new term. So when trying to find a good commit (i.e. where something was fixed) as opposed to a bad one, the result of the helper script (i.e. its exit code) needs to be negated[3]. This isn’t entirely intuitive to me, and if I hadn’t found that I might still be fiddling with the damn thing.
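To make the exit code business a bit more tangible, here is a minimal sketch of the mapping that git bisect run expects when the new term is “fixed” and the old term is “unfixed”. This is not my helper script; the names classify and run_and_classify are hypothetical, and treating a failed build as a commit to skip is an assumption.

```python
import subprocess
import sys

OLD = 0     # exit code 0 is read as the old term ("unfixed" here)
NEW = 1     # codes 1 to 127, except 125, are read as the new term ("fixed" here)
SKIP = 125  # tells git bisect run that this commit cannot be tested


def classify(build_ok, script_returncode):
    """Map one commit's result to an exit code for git bisect run."""
    if not build_ok:
        # Assumption: a commit that does not build is skipped, not judged.
        return SKIP
    # The script succeeding means the bug is fixed, which is the NEW term,
    # so success (0) has to be turned into a non-zero exit code.
    return NEW if script_returncode == 0 else OLD


def run_and_classify(command):
    """Run `command` (a list of arguments) and classify the commit by its exit status."""
    completed = subprocess.run(command)
    return classify(True, completed.returncode)


if __name__ == "__main__":
    # Stand-in for running the failing script with the freshly built interpreter.
    print(run_and_classify([sys.executable, "-c", "raise SystemExit(1)"]))
```

A helper that ends with sys.exit(classify(...)) would already carry the inversion inside it, so under these assumptions it could presumably be handed to git bisect run without the bash -c "! ..." wrapper.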
Either way, that negation makes the bisection work like a charm and led me to the commit that fixed the issue fairly quickly (around 15-20 minutes of continuous builds, installs, etc.):
|
|
But what was required to fix the failing script? It was just a matter of moving the initialization of asyncio.Semaphore into the function that asyncio.run() calls and then passing that semaphore to each underlying task that gets created for the event loop[4]. This is because prior to 3.10.0a3 asyncio.Semaphore and other primitives (e.g. asyncio.Lock and asyncio.Queue) initialized a separate event loop when they were created, one that was outside of the loop asyncio.run() created, and asyncio.run() didn’t attempt to synchronize them against its own event loop whenever it created one. Moving the initialization fixed it because now both refer to the same event loop.
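As a minimal sketch of that pattern (not the actual script; worker, main and results are hypothetical names, and asyncio.sleep() is a stand-in for the real asynchronous I/O), the semaphore is created inside the coroutine that asyncio.run() executes and is handed to every task explicitly:

```python
import asyncio


async def worker(semaphore, number, results):
    # The semaphore is passed in rather than read from a module-level global.
    async with semaphore:
        await asyncio.sleep(0.01)  # stand-in for real asynchronous I/O
        results.append(number)


async def main():
    # Created while the loop started by asyncio.run() is already running,
    # so the semaphore and the tasks share the same event loop.
    semaphore = asyncio.Semaphore(2)
    results = []
    await asyncio.gather(*(worker(semaphore, n, results) for n in range(5)))
    return sorted(results)


if __name__ == "__main__":
    print(asyncio.run(main()))  # prints [0, 1, 2, 3, 4]
```

The failing variant would have the asyncio.Semaphore(2) line at module level instead, which is the arrangement that trips up versions before 3.10.0a3.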
Addendum: it figures that a little while after going through this rigamarole Real Python has a post up about installing pre-release versions of Python, and it seems like pyenv is the way to go about doing that. That is definitely something I am going to keep in mind for the next time I need to do something like this.
[1] That version specifically because I noticed asyncio-related changes in the change log.
[2] These were run from the root of the cpython repository.
[3] I discovered this through an answer on Stack Overflow. As one does…
[4] More greatness from Stack Overflow.