@dhimmel
Last active October 21, 2016 17:33
Licensing Workshop for EPID 600. https://slides.com/dhimmel/epid600

EPID 600 Workshop

This page describes the activity for the EPID 600 lecture on Open Data Science (slides).

At the start of this class, every pupil was asked to list 3 databases / datasets / data resources that they have used in their research. For each of these three resources (time permitting), please report via the comments below the following information:

  1. Is the data subject to copyright? If no, end.
  2. Does the resource have a license?
  3. If no, contact the creators and ask whether there is a license that allows reuse.
  4. If yes, does the license allow:
  • unrestricted access
  • redistribution
  • modification
  • commercial reuse (does the license discriminate against any persons or groups)
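The questions above form a small decision procedure, which can be sketched in code. This is only an illustration: the `ResourceReport` structure, its field names, and the `summarize` function are hypothetical and not part of the workshop materials.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResourceReport:
    """Answers to the workshop questions for one resource (hypothetical structure)."""
    name: str
    copyrightable: bool
    has_license: Optional[bool] = None
    unrestricted_access: Optional[bool] = None
    redistribution: Optional[bool] = None
    modification: Optional[bool] = None
    commercial_reuse: Optional[bool] = None


def summarize(report: ResourceReport) -> str:
    # Question 1: data not subject to copyright needs no further review.
    if not report.copyrightable:
        return f"{report.name}: not subject to copyright; done."
    # Questions 2-3: without a license, the next action is to contact the creators.
    if not report.has_license:
        return f"{report.name}: no license found; contact the creators."
    # Question 4: collect the permissions the license does not grant.
    permissions = {
        "unrestricted access": report.unrestricted_access,
        "redistribution": report.redistribution,
        "modification": report.modification,
        "commercial reuse": report.commercial_reuse,
    }
    missing = [label for label, allowed in permissions.items() if not allowed]
    if missing:
        return f"{report.name}: licensed, but does not allow: {', '.join(missing)}."
    return f"{report.name}: licensed for all four uses."
```

An unanswered permission (left as `None`) is treated as not granted, which is a cautious assumption rather than a rule stated in the activity.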

If you do send an email to the creators, please link to this document (https://git.io/vPQjW) and CC daniel.himmelstein@gmail.com.

Best of luck!

@Diwadkar

My use of publicly available data has been limited to the Gene Expression Omnibus (GEO) database. GEO provides up-to-date gene expression and hybridization array data. There is no restriction on the use or distribution of these data, apart from a few contributors who may claim patent, copyright, or IP rights to all or a portion of the data. I used the GSE22356 microarray dataset, which is not subject to copyright.

@alhanlon

alhanlon commented Oct 20, 2016

Census data:

Copyright protection is not available for any work of the United States Government (Title 17 U.S.C., Section 105). Thus you are free to reproduce census materials as you see fit. We would ask, however, that you cite the Census Bureau as the source.

CDC Data (for example, NHANES, BRFSS, YRBS):

Data Use Agreement--
Warning! Data Use Restrictions Read Carefully Before Using

The Public Health Service Act (Section 308 (d)) provides that the data collected by the National Center for Health Statistics (NCHS), Centers for Disease Control and Prevention (CDC), may be used only for the purpose of health statistical reporting and analysis.

Any effort to determine the identity of any reported case is prohibited by this law.

NCHS does all it can to assure that the identity of data subjects cannot be disclosed. All direct identifiers, as well as any characteristics that might lead to identification, are omitted from the dataset. Any intentional identification or disclosure of a person or establishment violates the assurances of confidentiality given to the providers of the information. Therefore, users will:

  • Use the data in this dataset for statistical reporting and analysis only.
  • Make no use of the identity of any person or establishment discovered inadvertently and advise the Director, NCHS, of any such discovery.
  • Not link this dataset with individually identifiable data from other NCHS or non-NCHS datasets.

By using these data you signify your agreement to comply with the above-stated statutorily based requirements.

@somehuang

somehuang commented Oct 20, 2016

1. For databases on the WTO website

Copyright:
Permission to make digital or hard copies of any information contained in these Web pages is granted for personal or classroom use, without fee and without formal request.
Full citation and copyright notice must appear on the first page.
Copies may not be made or distributed for profit or commercial advantage. To republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee.

2. For databases on the World Bank website

Licenses

You are encouraged to use the Datasets to benefit yourself and others in creative ways. You may extract, download, and make copies of the information contained in the Datasets, and you may share that information with third parties. You may also use our application programming interfaces (“APIs”) to facilitate access to the Datasets, whether through a separate Web site or through another type of software application. However, we maintain a list of some specific data within the Datasets that you may not redistribute or reuse without first contacting the original content provider(s), as well as information regarding how to contact the original content provider(s). Before incorporating any data in other products, please check the list which is available here: Restricted Data.

Attribution

By using the Datasets, you agree to provide attribution to The World Bank and its content providers in the following format: The World Bank: Dataset name: Data source (if known). When sharing or facilitating access to the Datasets, you agree to include the same acknowledgment requirement in any sub-licenses of the data that you grant, and a requirement that any sub-licensees do the same. You may meet this requirement by providing the uniform resource locator (URL) of these terms of use.
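As a rough illustration of the attribution format quoted above ("The World Bank: Dataset name: Data source (if known)"), a small helper might build the string as follows. The function name and the choice to omit the trailing segment when the source is unknown are assumptions, not something the terms prescribe.

```python
from typing import Optional


def world_bank_attribution(dataset_name: str, data_source: Optional[str] = None) -> str:
    """Build an attribution string in the quoted format (hypothetical helper)."""
    parts = ["The World Bank", dataset_name]
    # Assumption: when the data source is not known, the last segment is left out.
    if data_source:
        parts.append(data_source)
    return ": ".join(parts)
```

For example, `world_bank_attribution("Example Dataset", "Example Source")` gives `The World Bank: Example Dataset: Example Source`; both names here are placeholders rather than real datasets.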

No Endorsement

You may not publicly represent or imply that The World Bank is participating in, or has sponsored, approved or endorsed the manner or purpose of your use or reproduction of the Datasets. The World Bank will prosecute, to the fullest extent of the law, any use of World Bank Materials in a manner that falsifies, misrepresents, disparages or fraudulently uses the Materials.

No Association

You may not use any trade-mark, official mark, official emblem or logo of The World Bank, or any of its other means of promotion or publicity, without The World Bank's prior written consent nor in any event to represent or imply an association or affiliation with The World Bank.

No Warranties

The World Bank reserves the right at any time and from time to time to modify or discontinue, temporarily or permanently, the Datasets, any means of accessing or utilizing the Datasets, or the API at our sole discretion with or without prior notice to you.

The World Bank may at our sole discretion, under any circumstances, for any or no reason whatsoever and with or without prior notice to you, terminate your access to the Datasets, any means of accessing or utilizing the Datasets or the API.

THE WORLD BANK DISCLAIMS ALL WARRANTIES OF ANY KIND RELATED TO THE PROVISION OF THE DATASETS AND THE APIS. Please review the section of The World Bank Terms and Conditions under the heading Disclaimers, Releases and Limitations on Liability for a more complete statement regarding those subjects.

Exclusion of Liability

THE WORLD BANK SHALL NOT BE RESPONSIBLE OR LIABLE TO YOU FOR ANY LOSS OR DAMAGE OF ANY SORT INCURRED BY YOU IN CONNECTION WITH YOUR USE OF THE DATASETS. The World Bank also shall not be responsible or liable for the accuracy, usefulness or availability of any data in the Datasets. Please review the section of The World Bank Terms and Conditions under the heading Disclaimers, Releases and Limitations on Liability for a more complete statement regarding those subjects.

You acknowledge that these Dataset Terms constitute a non-exclusive agreement. The World Bank may develop products or services that compete with products or services that you offer without incurring any liability.

Other parties may have ownership interests in some of the Materials contained on the Site. The World Bank in no way represents or warrants that it owns or controls all rights in all Materials, and the World Bank will not be liable to you for any claims brought against you by third parties in connection with your use of any Materials.

Nothing herein shall constitute or be considered to be a limitation upon or waiver of the privileges and immunities of The World Bank, all of which are specifically reserved.

Other

Please review The World Bank Terms and Conditions prior to using the Datasets. The World Bank Terms and Conditions incorporate these Dataset Terms by reference. By using the Datasets or any presentations of data derived from them, or by using our APIs in connection with the Datasets, you consent to be bound by The World Bank Terms and Conditions, including these Dataset Terms.

These Dataset Terms may be amended by The World Bank from time to time at our sole discretion. Upon amendment, we will place a notice on data.worldbank.org. Please periodically review the controlling version of these Dataset Terms. By continuing to use the Datasets subsequent to The World Bank making available an amended version of these Dataset Terms, you acknowledge, agree and consent to such amendment.

The World Bank offers other websites, services and databases that are governed by different terms of service, as stated in The World Bank terms and conditions. Please review the applicable terms of service before using any database provided or made available by The World Bank.

No agency, partnership, joint venture, employee-employer or franchiser-franchisee relationship is intended or entered by these Dataset Terms.

Please review the section of The World Bank Terms and Conditions under the heading Governing Law for a more complete statement regarding that subject.

Capitalized terms used herein shall be given the meaning assigned to them in The World Bank Terms and Conditions.

@annieichen

annieichen commented Oct 20, 2016

NCBI GenBank (and other databases of molecular data): NCBI places no restrictions on use or distribution, but some submitters may claim intellectual property rights (https://www.ncbi.nlm.nih.gov/home/about/policies.shtml).

EcoCyc: Copyright SRI International 1999-2016, Marine Biological Laboratory 1998-2001, DoubleTwist Inc 1998-1999. All Rights Reserved. (http://www.ecocyc.org/ECOLI/NEW-IMAGE?type=ORGANISM&object=ECOLI&orgids=AAEO224324)

EcoGene: Copyright 2011 University of Miami.

@mmmangoes

mmmangoes commented Oct 20, 2016

Data Resource: National Ambulatory Medical Care Survey (NAMCS)

Language found on website:

Users of NCHS public-use data files must comply with data use restrictions to ensure that the information will be used solely for statistical analysis or reporting purposes.

email sent to: EPowell-Griner@cdc.gov:

From: MARGARET MANGAALI
Sent: Thursday, October 20, 2016 2:39 PM
To: Powell-Griner, Eve (CDC/OPHSS/NCHS); daniel.himmelstein@gmail.com
Subject: NAMCS public Data use - copyright question

Dr. Powell-Griner,

I am writing to inquire about the data use details of the NAMCS dataset. I cannot seem to find data use details on the website for the data. Is the data subject to copyright, and does the data have a license? I would like the information to include as part of a public data use project for an epidemiology course - outlined here: https://git.io/vPQjW

thanks so much for your time,

June Mangaali

Powell-Griner, Eve (CDC/OPHSS/NCHS) eep1@cdc.gov
3:55 PM (2 hours ago)
to me

There is no copyright but we do ask that you acknowledge NCHS as the data source. Also the data may be used only for statistical purposes and may not be linked to any other file. We frequently have requests to use it for classes and there is no problem in your doing so.

Sent from my BlackBerry 10 smartphone.

Additional information found online after email exchange:

Warning! Data Use Restrictions Read Carefully Before Using

The Public Health Service Act (Section 308 (d)) provides that the data collected by the National Center for Health Statistics (NCHS), Centers for Disease Control and Prevention (CDC), may be used only for the purpose of health statistical reporting and analysis.

Any effort to determine the identity of any reported case is prohibited by this law.

NCHS does all it can to assure that the identity of data subjects cannot be disclosed. All direct identifiers, as well as any characteristics that might lead to identification, are omitted from the dataset. Any intentional identification or disclosure of a person or establishment violates the assurances of confidentiality given to the providers of the information. Therefore, users will:

  • Use the data in this dataset for statistical reporting and analysis only.
  • Make no use of the identity of any person or establishment discovered inadvertently and advise the Director, NCHS, of any such discovery.
  • Not link this dataset with individually identifiable data from other NCHS or non-NCHS datasets.

By using these data you signify your agreement to comply with the above-stated statutorily based requirements.

@dhimmel

dhimmel commented Oct 21, 2016

@mmmangoes nice research. I dug a little further and found the actual law behind these restrictions -- 42 U.S. Code § 242m (d) -- which reads:

(d) Information; publication restrictions

No information, if an establishment or person supplying the information or described in it is identifiable, obtained in the course of activities undertaken or supported under section 242b, 242k, or 242l of this title may be used for any purpose other than the purpose for which it was supplied unless such establishment or person has consented (as determined under regulations of the Secretary) to its use for such other purpose; and in the case of information obtained in the course of health statistical or epidemiological activities under section 242b or 242k of this title, such information may not be published or released in other form if the particular establishment or person supplying the information or described in it is identifiable unless such establishment or person has consented (as determined under regulations of the Secretary) to its publication or release in other form.

Like most of American law, the writing is borderline unintelligible, which raises the question of how one can be expected to comply. Nonetheless, it appears that the primary concern is preventing patient data from being deanonymized or used in ways the patient didn't consent to.

@dhimmel

dhimmel commented Oct 21, 2016

@anniechen234 it looks like BioCyc, a parent project of http://ecocyc.org/, does have a license. On the download page, EcoCyc states:

Free to academics for research purposes; fee for commercial use

As the website states:

The development of EcoCyc is funded by NIH grant GM077678 from the NIH National Institute of General Medical Sciences.

It frustrates me when the NIH funds resources that then attempt to charge for reuse and discriminate against commercial users.

@dhimmel

dhimmel commented Oct 21, 2016

Great work @Diwadkar, @alhanlon, @somehuang, @anniechen234, @mmmangoes. Hope you found this activity valuable!

Lots of you have used government resources, which, as you may have noticed, are in the public domain. The NAMCS case highlighted by @mmmangoes was interesting because there was a law adding further reuse restrictions.
