Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhancement to Regex::MatchData methods #3653

Conversation

dennisjbell
Copy link

In order to work more easily with matches, the following methods were
added, influenced by similar methods from Ruby:

Regex::MatchData#to_a: converts matches to Array object, allowing for
full array operations (slices, reversal, index).

Regex::MatchData#to_h: converts named matches to a hash with the name as
the key.

Regex::MatchData#group_names: lists the names of the groups in order
they appear in the RE.

Regex::MatchData#matched_group_names: lists only the named groups that
matched against the source string.

@asterite
Copy link
Member

asterite commented Dec 8, 2016

Nice, thank you!

I'll do a few corrections and suggestions later, for example there's no to_h in Ruby's MatchData (they chose named_captures, I think it was a good idea). Then some methods seem to allocate a lot of memory, like @regex.name_table.keys.sort.map {|i| @regex.name_table[i]} (3 intermediate arrays.) But more on that later. For now you'll need to run crystal tool format to make sure the code is properly formatted.

@dennisjbell
Copy link
Author

dennisjbell commented Dec 8, 2016

@asterite Odd, I'm using the vim-crystal plugin that always runs format prior to buffer save. I will check into why it didn't work and push up a fixed version of my PR.

Regarding the groups_names method, my intent was to ensure the names are in the order they appear in the regex as there is currently no way to grab them in order (short of what that function does) and I'm trying to prevent callers from having to reaching in to the @regex.name_table. I have another less straight-forward method I will replace it with in my next change.

Also, I'm interested in hearing better names for the methods, though I am partial to the consistancy of to_a and to_h. Due to the libpcre that we use, we don't have the same concerns as Ruby because we don't support reusing the same subpattern name:

icr(0.20.1) > r=/(?<letters>.)(?<letters>.)(?<letters>.)(?<letters>.)(?<letters>.)/
invalid regex: two named subpatterns have the same name at 23

I don't like the method names 'group_names' and matched_group_names'; please let me know if you have an alternate recommendation.

Finally, the #to_s method is the same as the #inspect method, though in ruby, #to_s returns the main match (i.e.: self[0]) As this is a breaking change, I didn't want to alter it to the ruby implementation, but was curious why the rational of #to_s being identical to #inspect.

In order to work more easily with matches, the following methods were
added, influenced by similar methods from Ruby:

Regex::MatchData#to_a: converts matches to Array object, allowing for
full array operations (slices, reversal, index).

Regex::MatchData#to_h: converts named matches to a hash with the name as
the key.

Regex::MatchData#group_names: lists the names of the groups in order
they appear in the RE.

Regex::MatchData#matched_group_names: lists only the named groups that
matched against the source string.
@dennisjbell
Copy link
Author

@asterite Please let me know what the next steps are on getting this accepted.

Thanks,
-Dennis

@asterite
Copy link
Member

@dennisjbell We basically need to discuss this with the team, so it might take a while before we merge this. Any additions to the language or API take time :-)

@makenowjust
Copy link
Contributor

@dennisjbell Sorry, I don't know your pull request. I made pull request #3783 which looks like yours.

@asterite
Copy link
Member

Closed in favor of #3783

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants