iana-etc: 2.30 -> 20170321 #23621

Closed

Contributor

@c0bw3b c0bw3b commented Mar 7, 2017

Motivation for this change
  • Move away from an outdated package and fetch content directly from IANA
  • Align the attribute name with the package name for clarity

Some time ago I pushed a quick PR to fix the source of the old iana-etc package: see #23167
When looking at it I realised NixOS relied on an outdated package that does not seem to be maintained anymore. The last version is from 2008-MAR-05.
The present /etc/services and /etc/protocols content is just as old as this package and probably incomplete/outdated.

This PR proposes to fetch the information directly from IANA.
Truth be told, this is heavily inspired by what Arch does.

As an added bonus, this PR renames the attribute to iana-etc instead of iana_etc. For clarity it seems proper to align the attribute and package names.

Things done
  • Tested using sandboxing
    (nix.useSandbox on NixOS,
    or option build-use-sandbox in nix.conf
    on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • Linux
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Fits CONTRIBUTING.md.

@mention-bot

@c0bw3b, thanks for your PR! By analyzing the history of the files in this pull request, we identified @edolstra, @avnik and @LnL7 to be potential reviewers.

Member

@LnL7 LnL7 left a comment

I think it might be best to add an entry to aliases.nix for this.

@c0bw3b
Contributor Author

c0bw3b commented Mar 7, 2017

Feedback and opinions are welcome before merging.

The files produced by this are also much less chatty.

Excerpt from protocols :

# See also protocols(5) and IANA official page :
# https://www.iana.org/assignments/protocol-numbers 
#
# Last updated: 2016-06-22

hopopt             0 HOPOPT
icmp               1 ICMP
igmp               2 IGMP
ggp                3 GGP
ipv4               4 IPv4
st                 5 ST
tcp                6 TCP
cbt                7 CBT
egp                8 EGP
igp                9 IGP

Excerpt from services :

# See also services(5) and IANA official page :
# https://www.iana.org/assignments/service-names-port-numbers 
#
# Last updated: 2017-03-03

tcpmux               1/tcp
tcpmux               1/udp
compressnet          2/tcp
compressnet          2/udp
compressnet          3/tcp
compressnet          3/udp
rje                  5/tcp
rje                  5/udp
echo                 7/tcp
echo                 7/udp
discard              9/tcp
discard              9/udp
discard              9/sctp
discard              9/dccp
systat              11/tcp
systat              11/udp
daytime             13/tcp
daytime             13/udp
qotd                17/tcp

The "last updated" dates are the ones from the source XML from IANA.

@c0bw3b
Contributor Author

c0bw3b commented Mar 7, 2017

For comparison here is the beginning of the present protocols :

# See also: protocols(5), http://www.sethwklein.net/projects/iana-etc/
#
# (last updated 2008-02-27)
# 
# Registries included below:
# - Assigned Internet Protocol Numbers
# 
# Registry Name: Assigned Internet Protocol Numbers
# Reference: [RFC5237]
# Registration Procedures: IESG Approval or Standards Action
# Note: In the Internet Protocol version 4 (IPv4) [RFC791] there is a field
# called "Protocol" to identify the next level protocol.  This is an 8
# bit field.  In Internet Protocol version 6 (IPv6) [RFC1883], this field
# is called the "Next Header" field.
# 
# Registry:
# Decimal    Keyword     Protocol                                 References
# ---------  ----------  ---------------------------------------  ----------
# protocol   num aliases      # comments
hopopt         0 HOPOPT       # IPv6 Hop-by-Hop Option                   [RFC1883]
icmp           1 ICMP         # Internet Control Message                 [RFC792]
igmp           2 IGMP         # Internet Group Management                [RFC1112]
ggp            3 GGP          # Gateway-to-Gateway                       [RFC823]
ip             4 IP           # IP in IP (encapsulation)                 [RFC2003]
st             5 ST           # Stream                                   [RFC1190][RFC1819]
tcp            6 TCP          # Transmission Control                     [RFC793]
cbt            7 CBT          # CBT                                      [Ballardie]
egp            8 EGP          # Exterior Gateway Protocol                [RFC888][DLM1]
igp            9 IGP          # any private interior gateway             [IANA]
#                        (used by Cisco for their IGRP)  
bbn-rcc-mon   10 BBN-RCC-MON  # BBN RCC Monitoring                       [SGC]
nvp-ii        11 NVP-II       # Network Voice Protocol                   [RFC741,SC3]
pup           12 PUP          # PUP                                      [PUP,XEROX]
argus         13 ARGUS        # ARGUS                                    [RWS4]
emcon         14 EMCON        # EMCON                                    [BN7]

And the present services :

# See also: services(5), http://www.sethwklein.net/projects/iana-etc/
#
# (last updated 2008-02-29) 
# 
# The port numbers are divided into three ranges: the Well Known Ports,
# the Registered Ports, and the Dynamic and/or Private Ports.
# 
# The Well Known Ports are those from 0 through 1023.
# 
# DCCP Well Known ports SHOULD NOT be used without IANA registration.
# The registration procedure is defined in [RFC4340], Section 19.9.
# 
# The Registered Ports are those from 1024 through 49151
# 
# DCCP Registered ports SHOULD NOT be used without IANA registration.
# The registration procedure is defined in [RFC4340], Section 19.9.
# 
# The Dynamic and/or Private Ports are those from 49152 through 65535
# 
# A value of 0 in the port numbers registry below indicates that no port 
# has been allocated.
# 
# ************************************************************************
# * PLEASE NOTE THE FOLLOWING:                                           *
# *                                                                      *
# * IESG STATEMENT TO THE IANA                                           *
# *    THE IESG BELIEVES THAT IANA MAY ALLOCATE AN ADDITIONAL PORT IN    *
# *    THE 'USER PORT' RANGE TO PROTOCOLS WHOSE CURRENT PORT ALLOCATION  *
# *    REQUIRES ACCESS TO A PRIVILEGED PORT. THIS ALLOCATION SHOULD NOT  *
# *    BE AUTOMATIC, BUT MAY OCCUR UPON APPLICATION BY AN INTERESTED     *
# *    PARTY WHOSE APPLICATION WOULD OTHERWISE FIT IANA'S POLICIES.      *
# *                                                                      *
# * 1. UNASSIGNED PORT NUMBERS SHOULD NOT BE USED.  THE IANA WILL ASSIGN *
# * THE NUMBER FOR THE PORT AFTER YOUR APPLICATION HAS BEEN APPROVED.    *
# *                                                                      *
# * 2. ASSIGNMENT OF A PORT NUMBER DOES NOT IN ANY WAY IMPLY AN          *
# * ENDORSEMENT OF AN APPLICATION OR PRODUCT, AND THE FACT THAT NETWORK  *
# * TRAFFIC IS FLOWING TO OR FROM A REGISTERED PORT DOES NOT MEAN THAT   *
# * IT IS "GOOD" TRAFFIC. FIREWALL AND SYSTEM ADMINISTRATORS SHOULD      *
# * CHOOSE HOW TO CONFIGURE THEIR SYSTEMS BASED ON THEIR KNOWLEDGE OF    *
# * THE TRAFFIC IN QUESTION, NOT WHETHER THERE IS A PORT NUMBER          *
# * REGISTERED OR NOT.                                                   *
# ************************************************************************
# 
# 
# WELL KNOWN PORT NUMBERS
# 
# The Well Known Ports are assigned by the IANA and on most systems can
# only be used by system (or root) processes or by programs executed by
# privileged users.
# 
# Ports are used in the TCP [RFC793] to name the ends of logical
# connections which carry long term conversations.  For the purpose of
# providing services to unknown callers, a service contact port is
# defined.  This list specifies the port used by the server process as
# its contact port.  The contact port is sometimes called the
# "well-known port".
# 
# To the extent possible, these same port assignments are used with the
# UDP [RFC768].
# 
# The range for assigned ports managed by the IANA is 0-1023.
# 
# Port Assignments:
# 
# Keyword         Decimal    Description                     References
# -------         -------    -----------                     ----------
#                 0/tcp    Reserved
#                 0/udp    Reserved
#                          Jon Postel <postel&isi.edu>
# spr-itunes        0/tcp    Shirt Pocket netTunes
# spl-itunes        0/tcp    Shirt Pocket launchTunes
#                          David Nanian <dnanian&shirt-pocket.com> 28 September 2007
tcpmux            1/tcp    # TCP Port Service Multiplexer
tcpmux            1/udp    # TCP Port Service Multiplexer
#                          Mark Lottor <MKL&nisc.sri.com>
compressnet       2/tcp    # Management Utility
compressnet       2/udp    # Management Utility
compressnet       3/tcp    # Compression Process
compressnet       3/udp    # Compression Process
#                          Bernie Volz <volz&cisco.com>
#                 4/tcp    Unassigned
#                 4/udp    Unassigned
rje               5/tcp    # Remote Job Entry
rje               5/udp    # Remote Job Entry
#                          Jon Postel <postel&isi.edu>
#                 6/tcp    Unassigned
#                 6/udp    Unassigned
echo              7/tcp    # Echo
echo              7/udp    # Echo
#                          Jon Postel <postel&isi.edu>
#                 8/tcp    Unassigned
#                 8/udp    Unassigned
discard           9/tcp    # Discard 
discard           9/udp    # Discard 
#                          Jon Postel <postel&isi.edu> 
discard           9/sctp   # Discard  
#                          IETF TSVWG
#                          Randall Stewart <rrs&cisco.com>
#                          [RFC4960]
discard           9/dccp   # Discard SC:DISC
#                          IETF dccp WG, Eddie Kohler <kohler&cs.ucla.edu>, [RFC4340]
#                10/tcp    Unassigned
#                10/udp    Unassigned
systat           11/tcp    # Active Users
systat           11/udp    # Active Users
#                          Jon Postel <postel&isi.edu>
#                12/tcp    Unassigned
#                12/udp    Unassigned
daytime          13/tcp    # Daytime (RFC 867)
daytime          13/udp    # Daytime (RFC 867)
#                          Jon Postel <postel&isi.edu>
#                14/tcp    Unassigned
#                14/udp    Unassigned
#                15/tcp    Unassigned [was netstat]
#                15/udp    Unassigned
#                16/tcp    Unassigned
#                16/udp    Unassigned
qotd             17/tcp    # Quote of the Day

@grahamc
Member

grahamc commented Mar 7, 2017

I agree with @LnL7 about the alias, but LGTM.

@c0bw3b
Contributor Author

c0bw3b commented Mar 7, 2017

$ wc -l /etc/protocols && wc -l /nix/store/23x5l7i5rfhgkkfkbylq0khnz5xmnlxs-iana-etc-20170307/etc/protocols
522 /etc/protocols
139 /nix/store/23x5l7i5rfhgkkfkbylq0khnz5xmnlxs-iana-etc-20170307/etc/protocols

and

$ wc -l /etc/services && wc -l /nix/store/23x5l7i5rfhgkkfkbylq0khnz5xmnlxs-iana-etc-20170307/etc/services 
15551 /etc/services
11358 /nix/store/23x5l7i5rfhgkkfkbylq0khnz5xmnlxs-iana-etc-20170307/etc/services
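
Note that `wc -l` also counts comment and blank lines, so the raw totals mix actual entries with header text. A minimal sketch for counting only entry lines follows; `count_entries` is a hypothetical helper name and the paths in the usage comment are examples, not part of this PR.

```shell
# count_entries FILE: count lines that are neither blank nor pure comments.
# Hypothetical helper for illustration only.
count_entries() {
    # -v inverts the match, -c prints the number of remaining lines.
    grep -cv -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1"
}

# Usage (example paths):
#   count_entries /etc/protocols
#   count_entries ./result/etc/protocols
```

Comparing these counts for the old and new files should give a fairer picture of how many assignments each one actually carries.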

@c0bw3b
Contributor Author

c0bw3b commented Mar 7, 2017

@LnL7 @grahamc yes indeed I will add an alias for iana_etc

@c0bw3b
Contributor Author

c0bw3b commented Mar 8, 2017

Alias iana_etc now points to iana-etc



phases = [ "installPhase" ];
buildInputs = [ stdenv ];
Contributor

Is it really necessary to add stdenv to build inputs?

Contributor Author

Hrm, I believed I needed it to be able to use gawk.

Contributor

It shouldn't be; anything that's part of the stdenv is added to the build environment implicitly. Adding stdenv to inputs is like asking to extend the environment with itself, which at least conceptually is the identity operation (not sure if that's the case in practice).

Contributor Author

It makes total sense indeed.
Just tested without that input and it works the same. I will remove it.

@c0bw3b
Contributor Author

c0bw3b commented Mar 8, 2017

Removed the unnecessary buildInputs

urls = [
"https://www.iana.org/assignments/protocol-numbers/protocol-numbers.xml"
"ftp://ftp.iana.org/assignments/protocol-numbers/protocol-numbers.xml"
];
Member

This URL does not look stable to me. Maybe we can mirror the content somewhere else?

Contributor Author

@c0bw3b c0bw3b Mar 9, 2017

That was also my concern when writing this because IANA does not version those files (contrary to tzdata for example). So if/when the source is edited this definition will need an update to refresh the SHA hashes.

But for me it is acceptable since :

  1. protocols rarely change, services is modified more often but not every two days ;
  2. we should have the package substitute available in the cache so it won't break networking on NixOS machines.

For illustration : the source of the old iana-etc package was missing for years (wrong URL) and no one noticed until #23167

But it is totally possible to change that. The alternative approach I envisioned is simply to store a copy of the two XML files right here in our GH repo next to default.nix and just import them. We would then manually copy newer versions regularly, for example before a new release branch-off.
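
To make the hash concern concrete: because the files live at a fixed, unversioned URL, a pinned SHA-256 stops matching as soon as IANA edits the registry. Below is a minimal sketch of such a check, assuming `sha256sum` from GNU coreutils is available and that the pinned hash is hex-encoded. `source_changed` and `$pinned_hash` are hypothetical names, not part of this PR.

```shell
# source_changed FILE EXPECTED_SHA256
# Returns 0 (true) if FILE's SHA-256 differs from the pinned value,
# 1 if it still matches. Hypothetical helper for illustration only.
source_changed() {
    actual=$(sha256sum "$1" | cut -d' ' -f1)
    [ "$actual" != "$2" ]
}

# Usage, after downloading the XML by any means:
#   if source_changed protocol-numbers.xml "$pinned_hash"; then
#       echo "upstream changed; the pinned hash needs a refresh"
#   fi
```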

@c0bw3b
Contributor Author

c0bw3b commented Mar 13, 2017

Are there any more concerns about this?

@c0bw3b c0bw3b force-pushed the pkg/iana-etc branch 2 times, most recently from 1c3f1c1 to a1b6c4e on March 13, 2017 19:02
@c0bw3b c0bw3b changed the title from "iana-etc: 2.30 -> 20170703" to "iana-etc: 2.30 -> 20170313" on Mar 13, 2017
@Mic92
Member

Mic92 commented Mar 14, 2017

@c0bw3b I created a GitHub project which automatically checks for new IANA updates and creates new releases using Python: https://github.com/Mic92/iana-etc What do you think of it?

@c0bw3b c0bw3b changed the title from "iana-etc: 2.30 -> 20170313" to "[WIP] iana-etc: 2.30 -> 20170313" on Mar 18, 2017
@c0bw3b c0bw3b force-pushed the pkg/iana-etc branch 2 times, most recently from e00b417 to 3ec4dde on March 18, 2017 22:18
@c0bw3b
Contributor Author

c0bw3b commented Mar 18, 2017

So IANA has edited service-names-port-numbers.xml many times in the last few days, proving my initial assumption to be false.
I've looked at what @Mic92 proposed, and while it is a good way to keep track of IANA revisions, I feel it is somewhat overkill to rely on 200 lines of Python and a TravisCI job for this. I'm inclined to follow KISS principles. Plus the XML parsing is done in a jiffy with awk scripts.

Moreover, we don't really need to keep close track of what IANA changed. On many occasions, the update concerns people and organization info, not the actual service numbers. We only need XML sources that are recent enough.

I just pushed a refactor where I copy the two XML files here in the GH repo next to default.nix, and they are parsed directly with no fetching involved. This seems the most sensible way to handle this.
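
To illustrate the kind of line-oriented awk parsing discussed here, a minimal sketch follows. It is not the script from this PR. It assumes that each `<name>`, `<protocol>` and `<number>` element of a `<record>` sits on its own line, and it skips port ranges and records missing any of the three fields. `xml_to_services` is a hypothetical name.

```shell
# xml_to_services FILE: print "name  port/proto" lines from an IANA-style
# XML registry. Assumes one element per line (an assumption about the input).
xml_to_services() {
    awk -F'[<>]' '
        /<record/    { name = ""; port = ""; proto = "" }  # reset per record
        /<name>/     { name  = $3 }
        /<number>/   { port  = $3 }
        /<protocol>/ { proto = $3 }
        /<\/record>/ {
            # Keep only complete records with a single numeric port.
            if (name != "" && proto != "" && port ~ /^[0-9]+$/)
                printf "%-15s %5d/%s\n", name, port, proto
        }
    ' "$1"
}

# Usage (example file name):
#   xml_to_services service-names-port-numbers.xml > services
```

Splitting on `<` and `>` puts the element text in the third field, which keeps the script short but ties it to the one-element-per-line layout.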

@c0bw3b c0bw3b changed the title from "[WIP] iana-etc: 2.30 -> 20170313" to "iana-etc: 2.30 -> 20170318" on Mar 18, 2017
@Mic92
Member

Mic92 commented Mar 19, 2017

@c0bw3b if you download the tarballs, there is no Python included, just the processed files. I used Python because XML cannot be parsed reliably with awk, and it also seems that the awk script skipped some lines:

$ wc -l /nix/store/b52ycacfij4y5grcda5vqzz09g60g8m1-iana-etc-20170307/etc/services
11358 /nix/store/b52ycacfij4y5grcda5vqzz09g60g8m1-iana-etc-20170307/etc/services
$ wc -l services                                                                  
11546 services
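
Line counts alone do not show which entries differ. A minimal sketch for listing the name and port/protocol pairs present in the second file but absent from the first follows. `entries_only_in_second` is a hypothetical helper, and it assumes the first file contains at least one line.

```shell
# entries_only_in_second FIRST SECOND
# Print "name port/proto" pairs found in SECOND but not in FIRST.
# Comments and blank lines are ignored. Hypothetical helper.
entries_only_in_second() {
    awk '
        /^[ \t]*#/ || NF < 2 { next }                 # skip comments and blanks
        NR == FNR { seen[$1 " " $2] = 1; next }       # first file: remember entries
        !(($1 " " $2) in seen) { print $1, $2 }       # second file: report new ones
    ' "$1" "$2"
}

# Usage (example paths):
#   entries_only_in_second ./result/etc/services ./services
```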

@Mic92 Mic92 mentioned this pull request Mar 19, 2017
@Mic92
Member

Mic92 commented Mar 19, 2017

Mhm adding service-names-port-numbers-20170317.xml directly to this repo adds 3.5 MB of data to nixpkgs. Usually we try to avoid that.

@c0bw3b
Contributor Author

c0bw3b commented Mar 19, 2017

> if you download the tarballs, there is no python included just the processed files.

And 200 LOC of Python to redo what can be done in 2x15 lines of awk just makes things unnecessarily complicated. It also means relying on a TravisCI job, on Python, which tends to break over time (version changes, library changes, ...), and on a personal repo that can disappear or move.

> I used python because xml cannot be parsed reliably with awk

Yes it can. This is an empty claim. The source can be XML or plain text; that would just change the script.
Arch also disagrees with you.
Last but not least: the old package we were using is actually just another awk parser, but run on the IANA plain-text files.

> and it also seems that it skipped some lines:

A quick check reveals you have duplicated entries in your processed services file :

[...]
subntbcst-tftp   247/tcp    # SUBNTBCST_TFTP
subntbcst_tftp   247/tcp    # SUBNTBCST_TFTP
subntbcst-tftp   247/udp    # SUBNTBCST_TFTP
subntbcst_tftp   247/udp    # SUBNTBCST_TFTP
[...]
redstorm-join    2346/tcp   # Game Connection Port
redstorm_join    2346/tcp   # Game Connection Port
redstorm-join    2346/udp   # Game Connection Port
redstorm_join    2346/udp   # Game Connection Port
redstorm-find    2347/tcp   # Game Announcement and Location
redstorm_find    2347/tcp   # Game Announcement and Location
redstorm-find    2347/udp   # Game Announcement and Location
redstorm_find    2347/udp   # Game Announcement and Location
redstorm-info    2348/tcp   # Information to query for game status
redstorm_info    2348/tcp   # Information to query for game status
redstorm-info    2348/udp   # Information to query for game status
redstorm_info    2348/udp   # Information to query for game status
redstorm-diag    2349/tcp   # Diagnostics Port
redstorm_diag    2349/tcp   # Diagnostics Port
redstorm-diag    2349/udp   # Diagnostics Port
redstorm_diag    2349/udp   # Diagnostics Port
[...]
mapper-nodemgr   3984/tcp   # MAPPER network node manager
mapper-nodemgr   3984/udp   # MAPPER network node manager
mapper-mapethd   3985/tcp   # MAPPER TCP/IP server
mapper-mapethd   3985/udp   # MAPPER TCP/IP server
mapper-ws-ethd   3986/tcp   # MAPPER workstation server
mapper-ws_ethd   3986/tcp   # MAPPER workstation server
mapper-ws-ethd   3986/udp   # MAPPER workstation server
mapper-ws_ethd   3986/udp   # MAPPER workstation server

This also proves the point that XML parsing done in Python is not more reliable just because it's Python.
We could also do a proper XSLT transformation to build the text files, but it would be utterly pointless.
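
A quick check like the one above can be sketched as follows. It flags port/protocol pairs that appear under two names differing only by `_` versus `-`. `find_alias_dups` is a hypothetical helper written for illustration, not the command actually used for the listing.

```shell
# find_alias_dups FILE: report port/proto pairs listed under two names
# that differ only by "_" versus "-". Hypothetical helper.
find_alias_dups() {
    awk '
        /^[ \t]*#/ || NF < 2 { next }        # skip comments and blank lines
        {
            key = $1
            gsub(/_/, "-", key)              # normalise underscores to hyphens
            key = key " " $2
            if ((key in seen) && seen[key] != $1)
                print $2 ": " seen[key] " / " $1
            else
                seen[key] = $1
        }
    ' "$1"
}

# Usage (example path):
#   find_alias_dups ./services
```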

@c0bw3b c0bw3b force-pushed the pkg/iana-etc branch 2 times, most recently from e3655c0 to de92806 on March 20, 2017 23:48
Move away from an outdated package and fetch content directly from IANA
Hence we do not rely on a third party for an up-to-date system file.

iana_etc -> iana-etc :
- align the attribute name with the package name for clarity
- iana_etc is now an alias for iana-etc
@c0bw3b c0bw3b changed the title from "iana-etc: 2.30 -> 20170318" to "iana-etc: 2.30 -> 20170321" on Mar 20, 2017
@c0bw3b
Contributor Author

c0bw3b commented Mar 21, 2017

This has been reworked one last time to :

  1. fetch latest content directly from IANA
  2. use curl instead of fetchurl, because fetchurl is not the right tool for the job here: in this situation we want to fetch content from a trusted third party over a secure channel rather than check it against a fixed hash

With this Nix(OS) won't rely on a third-party for the generation of system files (like the old iana-etc package).
Also : this needs virtually no maintenance. Once it runs on Hydra, it will continuously produce etc files from the most up-to-date information available at build time.


phases = [ "buildPhase" ];
buildInputs = [ cacert curl ];
builder = ./builder.sh;
Member

You can only access the network with a fixed output drv when sandboxing is enabled, I'm pretty sure this will fail on hydra.

@dtzWill
Member

dtzWill commented Mar 24, 2017

Could this infrastructure be used to generate the /etc/protocols and /etc/services files which we just track directly? I know that's not the usual practice but

@Mic92 Mic92 closed this in 36fca93 Mar 28, 2017
Krofek pushed a commit to Krofek/nixpkgs that referenced this pull request Mar 30, 2017
@c0bw3b
Contributor Author

c0bw3b commented Mar 31, 2017

@dtzWill yes, that could also have been a solution.
FWIW Arch is asking the same questions in FS#53221

Debian's netbase does not track IANA additions. They handpick and add only the ports/services they deem important, so it contains far fewer entries.

@dtzWill
Member

dtzWill commented Mar 31, 2017

Okay SGTM, thanks for the response (here and on the other PR)!

@c0bw3b c0bw3b deleted the pkg/iana-etc branch October 8, 2017 11:09

7 participants