Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pythonPackages.apache-airflow: 1.10.5 -> 1.10.14 #106696

Closed
wants to merge 5 commits into from

Conversation

drewrisinger
Copy link
Contributor

@drewrisinger drewrisinger commented Dec 12, 2020

Motivation for this change

Update apache-airflow to the latest version, to fix CVE security issues (See #106671).
Closes #106671.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

@drewrisinger
Copy link
Contributor Author

Ok, got a clean build of apache-airflow. This is ready for review.

Copy link
Member

@mweinelt mweinelt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, not a user, just noticed these CVEs popping up. I'll defer to @bhipple et al.

@ingenieroariel
Copy link
Member

This looks to me better than the original as it removes the need for custom patching. LGTM

pkgs/development/python-modules/hdfs/default.nix Outdated Show resolved Hide resolved
pkgs/development/python-modules/python-nvd3/default.nix Outdated Show resolved Hide resolved
@@ -1,15 +1,18 @@
{ lib
, stdenv
, buildPythonPackage
, fetchPypi
, fetchFromGitHub
, fetchpatch
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
, fetchpatch

@SuperSandro2000
Copy link
Member

This is a semi-automatic executed nixpkgs-review which is checked by a human on a best effort basis and does not build all packages (e.g. lumo, tensorflow or pytorch).
If you have any questions or problems please reach out to SuperSandro2000 on IRC.

Result of nixpkgs-review pr 106696 run on x86_64-linux 1

6 packages failed to build and are new build failures:
9 packages built:
  • python37Packages.cattrs
  • python37Packages.hdfs
  • python37Packages.python-nvd3
  • python38Packages.cattrs
  • python38Packages.hdfs
  • python38Packages.python-nvd3
  • python39Packages.cattrs
  • python39Packages.hdfs
  • python39Packages.python-nvd3

@mweinelt
Copy link
Member

mweinelt commented Jan 19, 2021

@bhipple @costrouc @ingenieroariel Can anyone of you review this? It's a big change and it worries me that it seems to bitrot already, while there a security issue with the whole thing.

@SuperSandro2000
Copy link
Member

This is a semi-automatic executed nixpkgs-review which is checked by a human on a best effort basis and does not build all packages (e.g. lumo, tensorflow or pytorch).
If you have any questions or problems please reach out to SuperSandro2000 on IRC.

Result of nixpkgs-review pr 106696 run on x86_64-darwin 1

4 packages failed to build and are new build failures:
11 packages built:
  • python37Packages.cattrs
  • python37Packages.hdfs
  • python37Packages.python-nvd3
  • python37Packages.sqlalchemy-jsonfield
  • python38Packages.hdfs
  • python38Packages.python-nvd3
  • python38Packages.sqlalchemy-jsonfield
  • python39Packages.cattrs
  • python39Packages.hdfs
  • python39Packages.python-nvd3
  • python39Packages.sqlalchemy-jsonfield

The following issues got detected with the above build packages.
Please fix at least the ones listed with your changed packages:

apache-airflow:

Please consider this feature to be alpha.

A substituteInPlace with an unused --replace got detected:

substituteStream(): WARNING: pattern 'jinja2>=2.10.1, <2.11.0' doesn't match anything in file 'setup.py'

Please check the offending substituteInPlace for typos or changes in source.

@SuperSandro2000
Copy link
Member

This is a semi-automatic executed nixpkgs-review which is checked by a human on a best effort basis and does not build all packages (e.g. lumo, tensorflow or pytorch).
If you have any questions or problems please reach out to SuperSandro2000 on IRC.

Result of nixpkgs-review pr 106696 run on x86_64-linux 1

3 packages failed to build and are new build failures:
12 packages built:
  • python37Packages.cattrs
  • python37Packages.hdfs
  • python37Packages.python-nvd3
  • python37Packages.sqlalchemy-jsonfield
  • python38Packages.cattrs
  • python38Packages.hdfs
  • python38Packages.python-nvd3
  • python38Packages.sqlalchemy-jsonfield
  • python39Packages.cattrs
  • python39Packages.hdfs
  • python39Packages.python-nvd3
  • python39Packages.sqlalchemy-jsonfield

The following issues got detected with the above build packages.
Please fix at least the ones listed with your changed packages:

apache-airflow:

Please consider this feature to be alpha.

A substituteInPlace with an unused --replace got detected:

substituteStream(): WARNING: pattern 'jinja2>=2.10.1, <2.11.0' doesn't match anything in file 'setup.py'

Please check the offending substituteInPlace for typos or changes in source.

python38Packages.apache-airflow:

Please consider this feature to be alpha.

A substituteInPlace with an unused --replace got detected:

substituteStream(): WARNING: pattern 'jinja2>=2.10.1, <2.11.0' doesn't match anything in file 'setup.py'

Please check the offending substituteInPlace for typos or changes in source.

@ingenieroariel
Copy link
Member

@bhipple @costrouc @ingenieroariel Can anyone of you review this? It's a big change and it worries me that it seems to bitrot already, while there a security issue with the whole thing.

@mweinelt I can run the latest version this weekend to verify that the binary and libraries work as expected. Airflow has a lot of dependencies and I am afraid this kind of work will be needed before every NixOS release. :/

Comment on lines +189 to +195
# fix issues w/ upgrade to pendulum >= 2.0
substituteInPlace airflow/settings.py \
--replace "from pendulum import Pendulum" "from pendulum import DateTime as Pendulum"
substituteInPlace tests/core/test_core.py \
--replace "from pendulum import utcnow" "import pendulum; import functools; utcnow = functools.partial(pendulum.now, 'UTC')"

substituteInPlace tests/core.py \
--replace "/bin/bash" "${stdenv.shell}"
# --replace "from pendulum import utcnow" "from datetime import datetime as _dt; utcnow = _dt.utcnow"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to provide a second, separate, pendulum_1 derivation on pythonPackages, and put this patch behind an overridable option... there are lots of bits of Airflow that import pendulum so using pendulum 2 might be the wrong approach here, see https://github.com/apache/airflow/search?q=pendulum

Here's a working override that can point to what's needed for a pendulum_1 derivation:

        pendulum = pythonSuper.pendulum.overridePythonAttrs (oldAttrs: rec {
          inherit (oldAttrs) pname;
          version = "1.4.4";
          name = "${pname}-${version}";
          src = fetchPypi {
            inherit pname version;
            sha256 = "0p5c7klnfjw8f63x258rzbhnl1p0hn83lqdnhhblps950k5m47k0";
          };

          propagatedBuildInputs = oldAttrs.propagatedBuildInputs
            ++ (with pythonSelf; [ tzlocal ]);
        });

Comment on lines +217 to +219
disabledTests = [
# Broken with KeyError: visibility_timeout in "section_dict"
"test_broker_transport_options"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had problems with a few more tests on nixos that look like they come from nixos:

Suggested change
disabledTests = [
# Broken with KeyError: visibility_timeout in "section_dict"
"test_broker_transport_options"
disabledTests = [
# unable to actually find /bin/echo on nixos
"test_deprecated_options_cmd"
# has trouble finding the tempfile created on nixos
"test_get_section_should_respect_cmd_env_variable"
# unable to actually find /bin/bash on nixos
"test_bash_operator_kill"
# Broken with KeyError: visibility_timeout in "section_dict"
"test_broker_transport_options"

@mweinelt
Copy link
Member

Airflow has a lot of dependencies and I am afraid this kind of work will be needed before every NixOS release. :/

This is not about a pre-release bump, this is about at least three CVEs that need to be addressed during the release cycle. Please see #106671.

@ingenieroariel
Copy link
Member

@mweinelt thanks for clearing up my confusion.

@drewrisinger I had to add this commit for it to build in OSX: piensa@fa9fc16

I used this shell.nix to try things out:

let
  pkgs = import (fetchTarball "https://github.com/piensa/nixpkgs/archive/dr-pr-update-airflow.tar.gz") {};
in pkgs.stdenv.mkDerivation rec {
  name = "composer";
  src = ./.;
  buildInputs = with pkgs; [
    apache-airflow
  ];
}

Once it built, I created a sample dag, by creating a dags subfolder within it and this file in there simple.py:

from __future__ import print_function

import datetime

from airflow import models
from airflow.operators import bash_operator
from airflow.operators import python_operator


default_dag_args = {
    # The start_date describes when a DAG is valid / can be run. Set this to a
    # fixed point in time rather than dynamically, since it is evaluated every
    # time a DAG is parsed. See:
    # https://airflow.apache.org/faq.html#what-s-the-deal-with-start-date
    'start_date': datetime.datetime(2021, 1, 18),
}

# Define a DAG (directed acyclic graph) of tasks.
# Any task you create within the context manager is automatically added to the
# DAG object.
with models.DAG(
        'daily_hello_world',
        schedule_interval=datetime.timedelta(days=1),
        default_args=default_dag_args) as dag:
    def greeting():
        import logging
        logging.info('Hello World!')

    # An instance of an operator is called a task. In this case, the
    # hello_python task calls the "greeting" Python function.
    hello_python = python_operator.PythonOperator(
        task_id='hola',
        python_callable=greeting)

    # Likewise, the goodbye_bash task calls a Bash script.
    goodbye_bash = bash_operator.BashOperator(
        task_id='chao',
        bash_command='echo Goodbye.')

    # Define the order in which the tasks complete by using the >> and <<
    # operators. In this example, hello_python executes before goodbye_bash.
    hello_python >> goodbye_bash

With that, I set the airflow home to the folder where I had the checkout (not a great practice, but works for this test):
export AIRFLOW_HOME="$(pwd)"
and created the sqlite database needed before starting airflow:

airflow db init

This creates a few files, but one of them has a value that does not work with the Werkzeug version in this PR, see: apache/airflow#11873

so we need to manually change the following in the generated airflow.cfg:

# Set samesite policy on session cookie
cookie_samesite = Lax

and then run the webserver and navigated to localhost:8080 to find this:
image

In order for the sample task to actually run, we need to run airflow scheduler on a new terminal and using the same folder / env vars.

Then we can trigger the task manually on the UI:
image

and wait for evidence on the scheduler output that the task is running:
image

Functionality wise, and with some small modifications this PR looks good to me.

@bhipple
Copy link
Contributor

bhipple commented May 8, 2021

Taking another look at this for #122042, @drewrisinger are you still available to rebase? It looks like the only outstanding merge concern a few months ago was missing maintainers on a few python dependencies. You can add me to those.

In the meantime there's been an Airflow 2.0.X release, which seems to change around some dependencies:
http://airflow.apache.org/docs/apache-airflow/stable/changelog.html

I'd be fine with rebasing/polishing this PR as-is and merging, since it seems strictly better than what we have currently; but going to 2.X would be nice at some point too. It's a shame the dependency closure of Airflow is such a mess.

@ingenieroariel
Copy link
Member

ingenieroariel commented May 8, 2021 via email

@siraben siraben closed this Jul 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update python3Packages.apache-airflow >= 1.10.13
7 participants