Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

databricks-connect: init at 7.1.0 #96638

Merged
merged 3 commits into from Aug 31, 2020

Conversation

kfollesdal
Copy link
Contributor

Python package databricks-connect is missing in nixpkgs. Used this as an opportunity to try make my first nix expression and nixpkgs.

Very happy for reviews and questions.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • Ubuntu
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of relevant binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

lib/licenses.nix Outdated Show resolved Hide resolved
@jonringer
Copy link
Contributor

is this meant to be used an application or python package? (do you just do $ databricks-connect or do you need to do import databricks-connect in python?)

@jonringer
Copy link
Contributor

according to https://docs.databricks.com/dev-tools/databricks-connect.html it looks like it's meant to be used as a cli tool

@jonringer
Copy link
Contributor

jonringer commented Aug 29, 2020

if it's only meant to be used from the cli, please use https://github.com/NixOS/nixpkgs/blob/master/doc/languages-frameworks/python.section.md#buildpythonapplication-function as this allows for separation of python environments

EDIT:
an example is ansible #95683 , where it's available both as a python package and as a cli utility

@kfollesdal
Copy link
Contributor Author

You use it as a python package. But have some applications for setup.

Normal user case is:

  1. $ databricks-connect configure configure connection to databricks cluster
  2. $ databricks-connect test test environment and connection to cluster
  3. Then you use it as a package in your python script. But with the name pyspark.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

@jonringer
Copy link
Contributor

Please follow CONTRIBUTING.md and manual#submitting-changes-making-patches and squash the fix-up commits.

in this case, you should have two commits:

maintainers: ...
pythonPackages.databricks-connect: ...

@kfollesdal
Copy link
Contributor Author

Have now clean up the commits, so now there are three commits:

licenses: add databricks
maintainers: add kfollesdal
pythonPackages.databricks-connect: init at 7.1.0

@kfollesdal
Copy link
Contributor Author

kfollesdal commented Aug 30, 2020

@jonringer thanks for feedback and help so fare. I have one problem/question left.

If I make a nix-shell with the following script:

with import ./nixpkgs { };

pkgs.mkShell rec {
  buildInputs = [
    python37Packages.databricks-connect
  ];
}

then using the python package pyspark and running the cli cmd pyspark works fine.

But if I install an environment with nix-env -if and the file:

with import ./nixpkgs {};

python37.withPackages (ps: with ps; [ databricks-connect  ])

Then I have to manually set the environment variable JAVA_HOME to get pyspark python package and pyspark cli command to work.

Do you have any clue what the problem can be? I think it can be something with the wrapping.

@kfollesdal
Copy link
Contributor Author

kfollesdal commented Aug 31, 2020

UPDATE
Have now figured out that you do not have to set JAVA_HOME manually after installing pythonPackages.databricks-connect. If you install withnix-env and the following file:

with import ./nixpkgs {};
buildEnv {
  name = "bricks";
  paths = [
    (python37.withPackages (ps: with ps; [ databricks-connect  ]))
    jdk
  ];
}

Then everything works fine.

But as you can see I have to add jdk in the environment. That is strange since it is already added as an dependency in the expression for pythonPackages.databricks-connect. Any advice to how I can avoid this?

@jonringer
Copy link
Contributor

sorry, I sorted python-packages, can you rebase? :)

@jonringer
Copy link
Contributor

jonringer commented Aug 31, 2020

oh, i forgot to respond

JAVA_HOME currently will only be exposed to the builder.

For it to appear in a nix-shell, you would need to do:

mkShell {
  inputsFrom = [ python3Packages.databricks-connect ];
}

however, inputsFrom has other implications, it's easier to just add jdk like you did above

Copy link
Contributor

@jonringer jonringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
shows usage

https://github.com/NixOS/nixpkgs/pull/96638
3 packages built:
python27Packages.databricks-connect python37Packages.databricks-connect python38Packages.databricks-connect

@jonringer
Copy link
Contributor

thanks @kfollesdal for opening your first nixpkgs PR :)

@jonringer jonringer merged commit bf74036 into NixOS:master Aug 31, 2020
@kfollesdal
Copy link
Contributor Author

Thanks @jonringer, I hope I can contribute more.

Regarding the nix-shell and nix-env, I think you misunderstood my question.

nix-shell works as expected with the following expression:

with import ./nixpkgs { };

pkgs.mkShell {
  buildInputs = [
    python37Packages.databricks-connect
  ];
}

But if I install databricks-connect with nix-env with the
following expression:

with import ./nixpkgs {};
buildEnv {
  name = "bricks";
  paths = [
    (python37.withPackages (ps: with ps; [ databricks-connect  ]))
  ];
}

Then it do not work as expected. But if I add jdk to paths, like so:

with import ./nixpkgs {};
buildEnv {
  name = "bricks";
  paths = [
    (python37.withPackages (ps: with ps; [ databricks-connect  ]))
    jdk
  ];
}

Then everything works fine as in the nix-shell case. I think this is strange since
jdk is already added as an dependency to databricks-connect. Is it anyway we can fix this
so it works as expected?

@jonringer
Copy link
Contributor

I think your issue is buildEnv which I believe just symlink joins the outputs, and will disregard stuff passed to mkDerivation

@kfollesdal
Copy link
Contributor Author

Ok, thanks. I will look more into it :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants