Christopher Wall
3,604 Points
Regular Expression for Credit Cards
Hello,
I'm doing an independent project in which I'm trying to extract telephone numbers and credit card numbers from multiple text files and compile them into another file. To this end, I found some regex cookbooks with patterns for the most commonly found credit card numbers, along with a regex for capturing phone numbers.
I was able to extract the phone numbers successfully, but I'm having difficulty getting the credit card numbers. Can someone explain what the issue is with my re.findall() call?
For clarity's sake, whenever I run re.findall(), the script returns empty brackets. I tested it against three text files containing these numbers:
testfile1.txt = {215-763-6263
5544 5533 5533 6633
321-456-9824
4234 9832 8932 8921}
testfile2.txt = {456-674-2312,
5678 9854 9454 4543 ,
2356 6435 0984,
421-567-3232 }
testfile3.txt = {378282246310005
371449635398431
378734493671000
30569309025904
38520000023237
6011111111111117
6011000990139424
555555555554444
5105105105105100
4111111111111111
4012888888881881
4222222222222 }
import os
import re

path = "./test"
our_library = os.listdir(path)
ccardlist = open("credit card numbers.txt", 'w')
telephonelist = open("telephone numbers.txt", 'w')
for item in our_library:
    file = os.path.join(path, item)
    txt = open(file, 'r')
    data = txt.read()
    telephone = re.findall(r'\(?\d{3}\)?-?\s?\d{3}-\d{4}', data)
    ccard = re.findall(r'''4\d{12}(\d{3})?
                       (5[1-5]\d{4}|677189)\d{10}
                       3(0[0-5]|[68]\d)\d{11}
                       3[47]\d{13}
                       (6011|65\d{2})\d{12}
                       ''', data, re.X)
    ccardlist.write("The list of credit card numbers in {} is : \n".format(file) + str(ccard) + "\n\n")
    telephonelist.write("The list of telephone numbers in {} is : \n".format(file) + str(telephone) + "\n\n")
    txt.close()
3 Answers

Chris Freeman
Treehouse Moderator 68,468 Points
A regex must match all or nothing; there is no partial match. The findall pattern contains many card-number patterns following each other with nothing separating them, so they are read as one long sequence. The only way to match is if all of the targeted numbers appeared in that exact sequence. For example, I first flattened out your patterns a bit for clarity:
ccard = re.findall(r'''4\d{12}\d{3}?
5[1-5]\d{4}\d{10}
677189\d{10}
30[0-5]\d{11}
3[68]\d\d{11}
3[47]\d{13}
6011\d{12}
65\d{2}\d{12}
''', data, re.X)
Since the re.X flag simply allows spreading the regex across multiple lines (whitespace and newlines inside the pattern are ignored), it is actually the same as:
ccard2 = re.findall(r'''4\d{12}\d{3}?5[1-5]\d{4}\d{10}677189\d{10}30[0-5]\d{11}3[68]\d\d{11}3[47]\d{13}6011\d{12}65\d{2}\d{12}''', data, re.X)
Which would only match something like this:
400000000000011151000011111111116771890000000000305000000000003601111111111134000000000000060110000000000006500111111111111
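A minimal sketch of this behavior, using two shortened sub-patterns and made-up sample values (the names `verbose`, `flat`, `visa`, and `discover` are illustrative, not from the original script). It shows that a multi-line pattern under re.X behaves like the single concatenated pattern:

```python
import re

# Hypothetical sample values, built here so the digit counts are explicit.
visa = "4" + "1" * 15          # 16 digits
discover = "6011" + "1" * 12   # 16 digits

# Two sub-patterns on separate lines. With re.X the newline and the
# indentation are discarded, so this is one pattern: 4\d{15}6011\d{12}
verbose = re.compile(r'''4\d{15}
                         6011\d{12}
                      ''', re.X)
flat = re.compile(r'4\d{15}6011\d{12}')

separate_lines = visa + "\n" + discover   # one number per line
back_to_back = visa + discover            # numbers run together

print(verbose.findall(separate_lines))    # []
print(flat.findall(separate_lines))       # []
print(verbose.findall(back_to_back) == [back_to_back])  # True
```

With one number per line, neither form finds anything, which is the empty list the original script reported.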
You can break it up into an OR'd list of possible patterns by adding the or symbol "|" at the end of each line except the last:
ccard = re.findall(r'''4\d{12}\d{3}?|
5[1-5]\d{4}\d{10}|
677189\d{10}|
30[0-5]\d{11}|
3[68]\d\d{11}|
3[47]\d{13}|
6011\d{12}|
65\d{2}\d{12}
''', data, re.X)
This will produce the output file:
The list of credit card numbers in ./test/testfile1.txt is :
[]
The list of credit card numbers in ./test/testfile2.txt is :
[]
The list of credit card numbers in ./test/testfile3.txt is :
['378282246310005', '371449635398431', '378734493671000', '30569309025904', '38520000023237', '6011111111111117', '6011000990139424', '5105105105105100', '4111111111111111', '4012888888881881']
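Note that the 13-digit number 4222222222222 from testfile3.txt is missing from this output. In the flattened pattern, `\d{3}?` is not an optional group: a `?` placed directly after `{3}` only makes that quantifier lazy, so three more digits are still required. A minimal sketch of the difference, assuming a non-capturing optional group `(?:\d{3})?` is what was intended (the sample values are built in the code so the digit counts are explicit):

```python
import re

thirteen = "4" + "2" * 12   # 13-digit number, like the last entry in testfile3.txt
sixteen = "4" + "1" * 15    # 16-digit number
data = thirteen + "\n" + sixteen + "\n"

# \d{3}? still demands exactly three digits, so only 16-digit numbers match.
lazy = re.findall(r'4\d{12}\d{3}?', data)

# (?:\d{3})? makes the last three digits optional. The group is
# non-capturing, so findall still returns the whole match.
optional = re.findall(r'4\d{12}(?:\d{3})?', data)

print(lazy == [sixteen])                # True
print(optional == [thirteen, sixteen])  # True
```

A capturing group such as `(\d{3})?` would make findall return the group contents instead of the full number, which is why the non-capturing form is used in this sketch.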
If you wanted to also get the ccard numbers from testfile1.txt, change the patterns to include the optional spaces between groups of four. This pattern will match the card numbers in testfile1.txt:
ccard = re.findall(r'''4\d{3}\s?\d{4}\s?\d{4}\s?\d{1}\d{3}?|
5[1-5]\d{2}\s?\d{4}\s?\d{4}\s?\d{4}|
677189\d{10}|
30[0-5]\d{11}|
3[68]\d\d{11}|
3[47]\d{13}|
6011\d{12}|
65\d{2}\d{12}
''', data, re.X)
This produces:
The list of credit card numbers in ./test/testfile1.txt is :
['5544 5533 5533 6633', '4234 9832 8932 8921']
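Adding `\s?` to every pattern by hand gets repetitive. One alternative, sketched here as a different technique from the one above rather than a drop-in replacement, is to find candidate digit runs first, strip the separators, and then check the bare digits against the card patterns. The names `CARD`, `CANDIDATE`, and `find_cards` are hypothetical, and accepting hyphens as separators is an assumption:

```python
import re

# Card prefixes and lengths taken from the patterns in this thread,
# with the Visa suffix written as an optional non-capturing group.
CARD = re.compile(r'''4\d{12}(?:\d{3})?|
                      5[1-5]\d{14}|
                      677189\d{10}|
                      30[0-5]\d{11}|
                      3[68]\d{12}|
                      3[47]\d{13}|
                      6011\d{12}|
                      65\d{14}''', re.X)

# Candidate: 13 to 16 digits, with at most one space or hyphen
# between neighbouring digits (assumed separators).
CANDIDATE = re.compile(r'\b\d(?:[ -]?\d){12,15}\b')

def find_cards(text):
    """Return card-like numbers found in text, with separators removed."""
    found = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r'[ -]', '', match.group())
        if CARD.fullmatch(digits):
            found.append(digits)
    return found

print(find_cards("215-763-6263\n5544 5533 5533 6633\n"))
```

Because a newline is not an accepted separator, a phone number on one line is never merged with digits on the next line.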
Additionally, it is uncommon to mix string format() with concatenation. The two can be combined:
ccardlist.write("The list of credit card numbers in {} is : \n{}\n\n"
"".format(file, ccard)) # <-- a convenient way to wrap a format line
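The wrapping works because Python joins adjacent string literals before the method call is applied, so format() sees one template. A small sketch, assuming a hypothetical helper named `report` that builds the same text for either output file:

```python
def report(kind, filename, numbers):
    # The two adjacent literals are joined into one string first,
    # then .format() fills all three placeholders.
    return ("The list of {} numbers in {} is : \n"
            "{}\n\n".format(kind, filename, numbers))

line = report("credit card", "./test/testfile1.txt",
              ['5544 5533 5533 6633', '4234 9832 8932 8921'])
print(line)
```

The same helper could be called with "telephone" for the second output file, which avoids repeating the template.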

Christopher Wall
3,604 Points
Chris,
Thanks for your response. I'm just trying to write a simple script where I can look through a litany of emails loaded with telephone numbers and credit card numbers, and write those numbers into different .txt files. The emails are stored as .txt in a folder I've named test.
I just generated those numbers at random and found others to test against.
-Christopher

Christopher Wall
3,604 Points
Chris,
Thanks so much for your help! This makes so much sense.
-Christopher

Chris Freeman
Treehouse Moderator 68,468 Points
The structure of your data is unclear when you use the set assignment syntax. Are the numbers simply plain text on different lines?