F. Ricci et al. Electronic Transport Properties available through load_dataset()? #606

Closed
janosh opened this issue Apr 1, 2021 · 11 comments · Fixed by #628
Comments

@janosh
Member

janosh commented Apr 1, 2021

Is the MPContrib Electronic Transport dataset available via matminer?

This

from matminer.datasets import get_available_datasets

get_available_datasets()

prints

['boltztrap_mp',
 'brgoch_superhard_training',
 'castelli_perovskites',
 'citrine_thermal_conductivity',
 'dielectric_constant',
 'double_perovskites_gap',
 'double_perovskites_gap_lumo',
 'elastic_tensor_2015',
 'expt_formation_enthalpy',
 'expt_gap',
 'flla',
 'glass_binary',
 'glass_binary_v2',
 'glass_ternary_hipt',
 'glass_ternary_landolt',
 'heusler_magnetic',
 'jarvis_dft_2d',
 'jarvis_dft_3d',
 'jarvis_ml_dft_training',
 'm2ax',
 'matbench_dielectric',
 'matbench_expt_gap',
 'matbench_expt_is_metal',
 'matbench_glass',
 'matbench_jdft2d',
 'matbench_log_gvrh',
 'matbench_log_kvrh',
 'matbench_mp_e_form',
 'matbench_mp_gap',
 'matbench_mp_is_metal',
 'matbench_perovskites',
 'matbench_phonons',
 'matbench_steels',
 'mp_all_20181018',
 'mp_nostruct_20181018',
 'phonon_dielectric_mp',
 'piezoelectric_tensor',
 'steel_strength',
 'wolverton_oxides']

So I'm guessing not? If so, curious to know why.

Also, I'd like to suggest adding a short code block to each MPContrib detail page showing how to download it. E.g.

Use matminer (pip install matminer) to download this dataset programmatically:

from matminer.datasets import load_dataset

df = load_dataset("matbench_phonons")
@ardunn
Contributor

ardunn commented Apr 1, 2021

Hey @janosh

Currently the full data is not available through matminer, though if @tschaume wants to make a matminer-loadable static .json.gz of it available, I'd be glad to add it to matminer.

An abbreviated version of it, boltztrap_mp, is available in matminer (see https://hackingmaterials.lbl.gov/matminer/dataset_summary.html). The following columns are available:

[image: list of the boltztrap_mp columns]

@ardunn ardunn closed this as completed Apr 1, 2021
@janosh
Member Author

janosh commented Apr 2, 2021

@ardunn Thanks for the quick reply! Do you have any information on how the 8,924 entries were selected from the 44,333 listed in the full dataset at https://contribs.materialsproject.org/projects/carrier_transport?

@ardunn
Contributor

ardunn commented Apr 3, 2021 via email

@tschaume
Collaborator

tschaume commented Apr 3, 2021

@janosh @ardunn I do have different versions of potential .json.gz files we could use to link the full dataset up to matminer. I'll make them available at a persistent link in MPContribs and report back here by Monday (hopefully).

@tschaume
Collaborator

@janosh @ardunn There's a JSON file for download now at https://contribs.materialsproject.org/projects/carrier_transport.json.gz (12.5MB). It reflects the format of the contributions as they go into the MPContribs API and does not include the temperature- and doping-level dependent tables. Happy to iterate if it isn't a suitable format to link up to matminer. FYI @fraricci
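
For anyone who wants to inspect a .json.gz like this one before it is wired into matminer, here is a minimal sketch using only the Python standard library. It assumes the file has already been downloaded; the record layout and the `identifier` key below are hypothetical stand-ins, not the actual MPContribs schema.

```python
import gzip
import json
import os
import tempfile


def load_json_gz(path):
    """Read a gzip-compressed JSON file and return the decoded object."""
    # "rt" opens the gzip stream in text mode so json.load can consume it.
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


# Hypothetical stand-in for a downloaded file; the real layout may differ.
sample = [{"identifier": "mp-0000", "data": {"example": "1.5"}}]
tmp_path = os.path.join(tempfile.mkdtemp(), "sample.json.gz")
with gzip.open(tmp_path, "wt", encoding="utf-8") as fh:
    json.dump(sample, fh)

records = load_json_gz(tmp_path)
print(len(records), records[0]["identifier"])
```

Pointing `load_json_gz` at the downloaded file instead of `tmp_path` should work the same way, as long as the file is plain gzip-compressed JSON.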

@janosh
Member Author

janosh commented Apr 15, 2021

Thanks a lot @tschaume! 👍

I'm guessing that for addition to matminer it should be in a format ready for data mining. So the target columns should probably be floats rather than dtype object (i.e. strings).
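
A rough sketch of the kind of cast meant here, in plain Python so it runs without pandas. The row layout and the `S_p` key are made up for illustration; values that do not parse as numbers are left untouched rather than dropped.

```python
def to_float(value):
    """Return value as a float if it parses as one, else return it unchanged."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return value


def cast_targets(rows, target_keys):
    """Return copies of the rows with the given keys cast to float where possible."""
    return [
        {k: to_float(v) if k in target_keys else v for k, v in row.items()}
        for row in rows
    ]


# Hypothetical rows: target values arrive as strings.
rows = [{"formula": "Si", "S_p": "250.5"}, {"formula": "Ge", "S_p": "n/a"}]
cleaned = cast_targets(rows, {"S_p"})
print(cleaned)
```

In a DataFrame the same idea would usually be a per-column numeric conversion; the point is only that the targets end up numeric before any model sees them.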

Here's a version of the dataset as we would use it with models like CGCNN: https://github.com/janosh/matbench/commit/df3831319599b9aa3768dd5f97fdac5ab94bdc37.

@janosh
Member Author

janosh commented Apr 15, 2021

What's the meaning of .v in these columns?

Sᵉ.p.v [µV/K]
Sᵉ.n.v [µV/K]
σᵉ.p.v [1/Ω/m/s]
σᵉ.n.v [1/Ω/m/s]
PFᵉ.p.v [µW/cm/K²/s]
PFᵉ.n.v [µW/cm/K²/s]
κₑᵉ.p.v [W/K/m/s]
κₑᵉ.n.v [W/K/m/s]

@janosh
Member Author

janosh commented Apr 15, 2021

Ah. From here:

Value (v), temperature (T), and doping level (c) at the maximum of the average eigenvalue of the Seebeck coefficient
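
With that, the column labels can be split mechanically. The sketch below assumes labels follow the pattern `<quantity>.<p|n>.<v|T|c> [<unit>]` seen in the list above; reading `p`/`n` as the doping type is my assumption, and `parse_label` is just an illustrative helper, not part of matminer or MPContribs.

```python
import re

# <quantity>.<doping type>.<field>, optionally followed by "[<unit>]"
LABEL_RE = re.compile(
    r"^(?P<quantity>[^.\s]+)\.(?P<doping>[pn])\.(?P<field>[vTc])"
    r"(?:\s*\[(?P<unit>[^\]]*)\])?$"
)

# Field meanings as quoted from the dataset description.
FIELD_NAMES = {"v": "value", "T": "temperature", "c": "doping level"}


def parse_label(label):
    """Split a column label into quantity, doping type, field, and unit."""
    match = LABEL_RE.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognised column label: {label!r}")
    parts = match.groupdict()
    parts["field"] = FIELD_NAMES[parts["field"]]
    return parts


print(parse_label("Sᵉ.p.v [µV/K]"))
```

The unit is `None` when a label carries no bracketed unit.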

@ardunn
Contributor

ardunn commented Apr 16, 2021

Thanks @janosh and @tschaume. I will add these to the metadata at the same time that I add Ryan Kingsbury's updated expt_gaps and _formation_enthalpy datasets. The columns will be cast to the correct dtypes before uploading as well.

@ardunn ardunn reopened this May 7, 2021
@ardunn
Contributor

ardunn commented May 29, 2021

@janosh @tschaume I wound up using the carrier_transport_with_strucs.json.gz that @janosh referenced earlier. Unfortunately the file currently hosted on mpcontribs has a pesky data column which is not super easy to use, so the raw json.gz has been uploaded to figshare (https://figshare.com/articles/dataset/ricci_boltztrap_mp_tabular/14701110) in the meantime.

Notes for @janosh

The *_strucs.json.gz needed some minor adjustments.

  • "type" column name was changed to "functional", as "type" is ambiguous
  • all carrier concentrations at optimal values of S, kappa, PF, and conductivity were mis-parsed (e.g., 1e20 --> 120.0). They were easily corrected. Thought it might be important for you to know if you were doing calculations with these values...that the carrier concentrations were 115 cm^-3 instead of say, 1x10^15 cm^-3
  • mpid label was added to index
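
For illustration only, here is one way such a repair could look. It assumes the failure mode was a dropped `e` with a single-digit mantissa (so `1e20` became `120.0`); that is a guess from the example above, and `repair_concentration` is a hypothetical helper, not the code actually used for the upload.

```python
def repair_concentration(value):
    """Undo a dropped 'e', e.g. 120.0 -> 1e20, assuming a one-digit mantissa."""
    as_int = int(value)
    # Only positive whole numbers with at least two digits fit the assumed pattern.
    if as_int != value or as_int < 10:
        raise ValueError(f"cannot repair {value!r} under the one-digit-mantissa assumption")
    digits = str(as_int)
    mantissa, exponent = digits[0], digits[1:]
    return float(f"{mantissa}e{exponent}")


print(repair_concentration(120.0))
print(repair_concentration(115))
```

Anything that does not look like a mis-parsed value raises instead of being silently changed.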

Notable additions to metadata beyond what was in MPContribs:

  • description in metadata was expanded to provide more details
  • description of each column was expanded to comprehensively explain each one, as otherwise it can be kind of confusing if a user doesn't know exactly what the data is

Notes for @tschaume

If there are any major problems with hosting this data temporarily on figshare lmk and it will be removed immediately. Obviously the best scenario is if the matminer-compatible .json.gz is hosted on MPContribs. If there is no major problem keeping this file on Figshare in the interim it will remain there until MPContribs has a serviceable link to the matminer-compatible .json.gz. Let me know if/when that is done and I will update the matminer link.

@janosh
Member Author

janosh commented May 29, 2021

all carrier concentrations at optimal values of S, kappa, PF, and conductivity were mis-parsed (e.g., 1e20 --> 120.0).

@ardunn Oops! I wasn't using those columns but very good thing you noticed. Thanks for making the data easily available through matminer! 😅
