
Aditya Puri
1,080 Points

Problem about [\\w#@'] +

I understand what the split method does, but I can't understand what [\w#@'] + means. Can anyone please explain it to me character by character?

Also what is the + doing there?

2 Answers

Alexander Nikiforov
Java Web Development Techdegree Graduate 22,175 Points

Quote from teacher's notes:

[^\w#@']+ (Matches one or more character that is not in word based characters, #, @ and apostrophe)

Please also check resources on the web and play around; it comes with experience. I will try my best, but you will have to dig on your own and try to apply it everywhere:

OK. Here is my best explanation

I. Brackets [] define a character class: a set of characters, exactly one of which must match at that position. See the examples at https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

When you write [a], only the single character 'a' matches; more characters or other characters do not. Try it in the online regex tester that Craig suggests.

When you write [ab], it matches 'a' or 'b', but still only one character.
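
To try this without an online tester, here is a minimal sketch. The class name CharClassDemo and the helper matches are made up for illustration; Pattern.matches is the standard library call and is true only when the whole input matches the regex.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // Hypothetical helper: true only if the WHOLE input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[a]", "a"));   // true
        System.out.println(matches("[a]", "aa"));  // false: a class matches one character only
        System.out.println(matches("[ab]", "b"));  // true
        System.out.println(matches("[ab]", "ab")); // false: two characters
        System.out.println(matches("[ab]", "c"));  // false: 'c' is not in the class
    }
}
```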

Coming back to the regex you asked about: first look at the [] part without the plus sign. The brackets define a single character.

Let's look inside the brackets:

A ^ as the first character inside the brackets means "anything except". Examples:

[^a] means any character except 'a', so 'b', 'c', 'd' and every other character that is not 'a' will match.

Let's get back to [^\w]

\w is the same as [a-zA-Z_0-9], which means any character from 'a' to 'z', from 'A' to 'Z', from 0 to 9, or the underscore.

So [^\w] means not 'a-z', not 'A-Z', not 0-9 and not '_'. It could be '+', '-', '=', a space, and others ...

When we write:

[^\w#@] we list the characters that we want to exclude, which means not 'a-z', not 'A-Z', not 0-9, not '_', not '#' and not '@'
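
A small sketch to check this, assuming nothing beyond the standard library. NegatedClassDemo and isExcluded are hypothetical names. Note that the pattern here is [^\w#@] without the apostrophe, exactly as written above.

```java
import java.util.regex.Pattern;

class NegatedClassDemo {
    // Hypothetical helper: true if the one-character input is matched by [^\w#@],
    // that is, if it is NOT a word character, '#' or '@'.
    static boolean isExcluded(String oneChar) {
        // In Java source the backslash is doubled: "\\w" is the regex \w.
        return Pattern.matches("[^\\w#@]", oneChar);
    }

    public static void main(String[] args) {
        System.out.println(isExcluded("+")); // true
        System.out.println(isExcluded(" ")); // true: a space is not a word character
        System.out.println(isExcluded("a")); // false: word character
        System.out.println(isExcluded("_")); // false: underscore is part of \w
        System.out.println(isExcluded("#")); // false: listed in the class
        System.out.println(isExcluded("@")); // false: listed in the class
    }
}
```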

Now we come to the plus at the end. The plus means nothing else but "one or more times", so it matches a whole run of consecutive characters that we don't want as a single match.

Craig wants to split on punctuation signs, such as '?', '=', ',' and others, so that they do not end up in the words. With the plus, several of them in a row, like a comma followed by a space or many spaces, count as one separator.
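
Putting the pieces together, the split can be sketched like this. The sample sentence, the class name SplitWordsDemo and the method splitWords are assumptions made for illustration; only the pattern comes from the teacher's notes.

```java
import java.util.Arrays;

class SplitWordsDemo {
    // Hypothetical helper: splits on runs of characters that are not
    // word characters, '#', '@' or an apostrophe.
    static String[] splitWords(String text) {
        return text.split("[^\\w#@']+");
    }

    public static void main(String[] args) {
        String[] words = splitWords("I'm #happy, really!! @you");
        // prints [I'm, #happy, really, @you]
        System.out.println(Arrays.toString(words));
    }
}
```

The apostrophe, '#' and '@' stay inside the words because they are listed in the negated class; the spaces, the comma and the exclamation marks are the separators.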

I still strongly suggest watching the workshop:

https://teamtreehouse.com/library/regular-expressions-in-java

And reading the documentation:

https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

It comes with experience. Try playing with an online regex tester.

Aditya Puri
1,080 Points

I still don't understand the '+' thing. I did try to read the docs and watch the workshop, but again, I couldn't understand :(

Alexander Nikiforov
Java Web Development Techdegree Graduate 22,175 Points

[a] means only one character: 'a' will pass

[a]+ means one or more 'a' characters, so 'a', 'aa', 'aaa', 'aaaa' and any number of 'a' letters will pass

[\w] means one word character and is the same as [a-zA-Z_0-9], so 'a', 'b', '9', 'Z', '_' etc. will pass

[\w]+ means a run of one or more word characters will pass, so 'a', 'ab', 'aZ', 'aZ1', 'cbdceAQWE123' will also pass
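
Here is a rough sketch of the difference the plus makes, both for matching and for splitting. PlusDemo and its two helpers are hypothetical names; the inputs are made up.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PlusDemo {
    // Hypothetical helper: true only if the WHOLE input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Hypothetical helper: splits the text and shows the pieces as one string.
    static String splitToString(String text, String regex) {
        return Arrays.toString(text.split(regex));
    }

    public static void main(String[] args) {
        System.out.println(matches("[a]", "aaa"));     // false: exactly one 'a' allowed
        System.out.println(matches("[a]+", "aaa"));    // true: one or more 'a'
        System.out.println(matches("[\\w]+", "aZ1"));  // true
        System.out.println(matches("[\\w]+", "a b"));  // false: the space is not a word character

        // Without the plus, ',' and ' ' are two separators with an empty piece between them.
        System.out.println(splitToString("a, b", "[^\\w]"));  // prints [a, , b]
        // With the plus, ", " is one separator.
        System.out.println(splitToString("a, b", "[^\\w]+")); // prints [a, b]
    }
}
```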

Well, I don't know how to explain more ...

Aditya Puri
1,080 Points

oh.. but why does he put two backslashes (\\) in his code instead of one (\)?

Alexander Nikiforov
Java Web Development Techdegree Graduate 22,175 Points

It is just the way to write a correct regex inside Java source code.

The actual regex has one backslash: [\w]+

But when you want to put a backslash in a Java string literal, you have to escape it, and the way to escape it is with another backslash. That is why, when you write split("[\\w]+"), you actually pass the correct regex [\w]+, because \\ is interpreted by the Java compiler as a single backslash ...

Try to read here, for example:

http://www.tutorialspoint.com/java/java_characters.htm

If you write split("[\w]+"), the code will not even compile, because \w is not a valid escape sequence in a Java string literal.

It looks strange, but those are just Java's rules.

In Java string literals the backslash is a special character. When the compiler finds one, it reads it together with the next character instead of taking the backslash literally ... If the character after the backslash is 'n', the pair becomes a newline, so \n is transformed into a newline.

Hope that makes sense .. Escape characters are used in many languages, and the backslash is the common way to write characters such as newline, backspace, tab, etc.
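
One way to see the escaping for yourself is to print the string and its length. This is only a sketch; EscapeDemo and regex() are hypothetical names.

```java
class EscapeDemo {
    // Hypothetical helper: the regex as it has to be written in Java source.
    static String regex() {
        return "[\\w]+";
    }

    public static void main(String[] args) {
        String regex = regex();
        System.out.println(regex);                 // prints [\w]+  (one backslash)
        System.out.println(regex.length());        // prints 5: [ \ w ] +
        System.out.println("\\n".length());        // prints 2: a backslash and the letter n
        System.out.println("\n".length());         // prints 1: a single newline character
        System.out.println("ab_1".matches(regex)); // prints true
    }
}
```

So the two backslashes exist only in the source file; the regex engine receives one.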

Aditya Puri
1,080 Points

I still don't understand. Please explain character by character. Dennis has given a really vague explanation.