Java Regular Expressions in Java

Philip Enchin
Full Stack JavaScript Techdegree Student 24,726 Points

What's the story with matcher.group(0)?

At about 25:00, Craig says, "If you do group zero, it will return the whole match, everything that is there. But if you do group one, it will find the first set of parentheses."

Initially I forgot the outer set of parentheses, so my program would crash when I called matcher.group(2), but I noticed that matcher.group(0) returned the word in question (such as "Procrastination").

After I corrected my RegEx inside Pattern.compile() to Craig's code, I was able to get matcher.group(1) and matcher.group(2), and everything was peachy.

However, I got curious and inserted a line into my while loop as follows:

    while (matcher.find()) {
      System.out.printf("\"%s\" is a shushy word because of \"%s\".%n", matcher.group(1), matcher.group(2));
      System.out.println(matcher.group(0).equals(matcher.group(1)));
    }

Upon execution, the console reads:

"Procrastination" is a shushy word because of "tion".
true
"surely" is a shushy word because of "su".
true
"destination" is a shushy word because of "tion".
true
"should" is a shushy word because of "sh".
true
"shiny" is a shushy word because of "sh".
true

So my question is this: Since matcher.group(0) and matcher.group(1) return identical strings, what is Craig talking about here?

1 Answer

andren
28,558 Points

As Craig says, group(1) returns the content that corresponds to the first pair of parentheses. But if you look at the pattern, what does the first pair of parentheses actually wrap? It wraps the entire regex pattern, so it captures the entire match. If the first pair of parentheses had not wrapped the entire pattern (which is usually the case), then group(1) would differ from group(0), which always returns the entire match.
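
To make the difference concrete, here is a minimal sketch that runs the same word through the pattern with and without the outer parentheses. The class name GroupZeroDemo and the helper methods are made up for this illustration, and the wrapped pattern is only my assumption of what the course pattern looks like.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupZeroDemo {
    // Returns "group(0)|group(1)" for the first match, or null if nothing matches.
    static String firstMatchGroups(String regex, String text) {
        Matcher matcher = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        if (!matcher.find()) {
            return null;
        }
        return matcher.group(0) + "|" + matcher.group(1);
    }

    // Number of capture groups in the pattern; group(0) is not counted.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        String text = "should";
        String wrapped = "(\\w*(sh|ti|su)\\w*)";  // assumed: outer parentheses wrap everything
        String unwrapped = "\\w*(sh|ti|su)\\w*";  // only the alternation is captured

        System.out.println(firstMatchGroups(wrapped, text));   // should|should
        System.out.println(firstMatchGroups(unwrapped, text)); // should|sh
        System.out.println(groupCount(wrapped));               // 2
        System.out.println(groupCount(unwrapped));             // 1
    }
}
```

With the unwrapped pattern there is no group 2 at all, so calling matcher.group(2) after a successful find() throws an IndexOutOfBoundsException, which would explain the crash described in the question.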

Honestly, I'm a bit confused as to why he used this particular pattern to demonstrate grouping, as it would have worked just as well with just one group, like this:

String script = "Procrastination is surely not the destination, should we talk about shiny things?";
// Only the alternation is wrapped in parentheses, so there is a single capture group.
Pattern pattern = Pattern.compile("\\w*(sh|ti|su)\\w*", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(script);
while (matcher.find()) {
    // group(0) is the whole word; group(1) is the part matched by (sh|ti|su).
    System.out.printf("%s is a shushy word because of %s%n", matcher.group(0), matcher.group(1));
}

A better example might be something like a regex pattern designed for recognizing dates and separating each part of the date into its own capture group. Take something like this for example:

String dateStr = "1/10/2018";
Pattern datePatt = Pattern.compile("([0-9]{1,2})/([0-9]{1,2})/([0-9]{2,4})");
Matcher matcher = datePatt.matcher(dateStr);
while (matcher.find()) {
    System.out.println(matcher.group(0)); // entire match: 1/10/2018
    System.out.println(matcher.group(1)); // first capture group: 1
    System.out.println(matcher.group(2)); // second capture group: 10
    System.out.println(matcher.group(3)); // third capture group: 2018
}

That pattern matches dates formatted like dd/dd/dddd (it also accepts months and days written with 1 digit and years written with 2 to 4 digits). group(0) returns the entire match (1/10/2018 in the example above), while the other groups return the text captured by each pair of parentheses: group(1) is 1, group(2) is 10, and group(3) is 2018.
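
As a rough check of the behaviour described above, the sketch below runs the same date pattern over a few inputs and collects every group into a list. DateGroupsDemo and splitDate are names I made up for this example, and the extra input strings are just illustrations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateGroupsDemo {
    private static final Pattern DATE =
            Pattern.compile("([0-9]{1,2})/([0-9]{1,2})/([0-9]{2,4})");

    // Returns [group(0), group(1), group(2), group(3)] for the first date found,
    // or an empty list if the text contains no match.
    static List<String> splitDate(String text) {
        Matcher matcher = DATE.matcher(text);
        List<String> groups = new ArrayList<>();
        if (matcher.find()) {
            // groupCount() excludes group 0, so loop up to and including it.
            for (int i = 0; i <= matcher.groupCount(); i++) {
                groups.add(matcher.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(splitDate("1/10/2018"));    // [1/10/2018, 1, 10, 2018]
        System.out.println(splitDate("Due 12/3/18"));  // [12/3/18, 12, 3, 18]
        System.out.println(splitDate("no date here")); // []
    }
}
```

Looping up to groupCount() avoids hard-coding the group numbers; index 0 is always the whole match, and the numbered groups follow in the order their opening parentheses appear in the pattern.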