Help me build a Reg Ex to do something really useful!

Question

Hey again everyone! I want to build a script to scan a bunch of XML documents and delete all the ones that do not have the tag

<profilename></profilename>

My Razer mouse keeps updating with corrupt profiles from the cloud. These profiles do not have this important tag, and I've manually deleted them all, but whenever I let it reconnect to the cloud service it repopulates all these bad profiles.

I figured this would be a good exercise for me and perhaps for anyone that wants to help me?

Here is what I got so far:

import re
import os

NAGAPATH = "C:\\ProgramData\\Razer\\Synapse\\Accounts\\AM_5364380\\Devices\\Naga Epic Chroma\\Profiles"

naga_profs = os.listdir(NAGAPATH)

pattern = re.compile(r'''
(?P<profilename>[\<profilename\>])
''')

for profile in naga_profs:
    current = open(profile)
    data = current.read()
    current.close()

os.listdir() will store each file name in the directory at NAGAPATH as a list element in naga_profs.

Angle brackets are used to name groups in regex syntax, but in XML documents they also delimit tag names, so I've escaped them in my pattern. Is that the correct way to get it to recognize them? Or should I use \W? Or would the angle bracket be considered a Unicode character?

I need to loop over each file, read it into the data variable and search that text for my pattern. If the data has it, leave the file alone; if it does not, delete the file.

Answer 1 · 2015-09-15T01:46:06Z

Okay so a few things you'd want to change:

First, your regex pattern is a multiline string because you're using triple quotes, so (without the re.VERBOSE flag) you're searching for a newline character, then the group containing the tag, then another newline character. Your XML may not match that exact format, so I would suggest changing it to a single line with just one single quote on either end.
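To see the triple-quote issue concretely, here is a small sketch. The sample XML string is made up for illustration. Without re.VERBOSE the newlines inside the triple quotes are part of the pattern; with it, unescaped whitespace in the pattern is ignored:

```python
import re

# Made-up one-line XML snippet, purely for illustration.
text = "<Profile><profilename>Default</profilename></Profile>"

# The pattern here is literally: newline, <profilename>, newline.
literal_newlines = re.compile(r'''
<profilename>
''')

# With re.VERBOSE, the newlines around the tag are ignored.
verbose = re.compile(r'''
<profilename>
''', re.VERBOSE)

found_literal = literal_newlines.search(text) is not None
found_verbose = verbose.search(text) is not None
print(found_literal, found_verbose)
```

The single-line pattern suggested in this answer avoids the question entirely, which is probably the simpler choice for a pattern this short.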

Also, you're using square brackets around the string for the profilename tag, which makes it a character class: you're searching for any single one of the characters within those brackets. Since that includes the opening and closing angle brackets, you're likely to match every XML file with that.

So try this for your pattern:

pattern = re.compile(r'(?P<profilename>\<profilename\>)')

And in fact, you don't really need a named group if you're just checking that it exists or not:

pattern = re.compile(r'\<profilename\>')
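A quick sketch of the difference between the character class and the literal tag, using two made-up XML snippets. It also shows that the backslashes before the angle brackets are optional, since < and > have no special meaning on their own in Python's re syntax:

```python
import re

char_class = re.compile(r'[\<profilename\>]')   # any ONE of these characters
literal = re.compile(r'<profilename>')          # the whole tag, unescaped
escaped = re.compile(r'\<profilename\>')        # the whole tag, escaped

# Hypothetical file contents for illustration.
no_tag = "<Profile><Name>x</Name></Profile>"
with_tag = "<Profile><profilename>x</profilename></Profile>"

class_hits_no_tag = char_class.search(no_tag) is not None      # matches the first '<'
literal_hits_no_tag = literal.search(no_tag) is not None
literal_hits_with_tag = literal.search(with_tag) is not None
escaped_hits_with_tag = escaped.search(with_tag) is not None
print(class_hits_no_tag, literal_hits_no_tag,
      literal_hits_with_tag, escaped_hits_with_tag)
```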

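In your loop, you would just add a conditional that checks that data does not contain your pattern, using the pattern's search method. Note that a compiled pattern's search takes a start position as its second argument, not flags, so any flags you need (case insensitive, maybe?) belong in the re.compile call. You shouldn't need the multiline flag here, since it only changes how ^ and $ match:

for profile in naga_profs:
    # Build the full path so this works from any working directory.
    with open(os.path.join(NAGAPATH, profile)) as current:
        data = current.read()
    if not pattern.search(data):
        pass  # do stuff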
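A note on where flags go, as a minimal sketch with a made-up sample string. On a compiled pattern, the second positional argument of search() is the index to start searching from, so passing re.M there silently means "start at index 8" rather than "use multiline mode":

```python
import re

data = "<a><profilename>x</profilename></a>"  # made-up sample; tag starts at index 3
pattern = re.compile(r'<profilename>')

start_position = int(re.M)                           # re.M is the integer flag 8
match_plain = pattern.search(data)                   # searches the whole string
match_with_flag_as_pos = pattern.search(data, re.M)  # searches data[8:] only

# Flags such as re.IGNORECASE are passed to re.compile() instead.
ci_pattern = re.compile(r'<profilename>', re.IGNORECASE)
match_ci = ci_pattern.search("<Profile><ProfileName>x</ProfileName></Profile>")
```

Here the second search misses the tag because it begins before index 8, which is why the check can wrongly report a good file as missing the tag.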

I don't have any experience with deleting files, but it looks like the os.remove method might be the one you're looking for.

It needs a file path though, so you'd probably want to add to a new list containing the filenames/paths of the files you want to delete, then once you have that, loop through it and call os.remove and pass in the path (might need the full path, if you're not running the Python script from that directory):

corrupt_profiles = []
for profile in naga_profs:
    # Full path, so the script can run from any directory.
    path = os.path.join(NAGAPATH, profile)
    with open(path) as current:
        data = current.read()
    # No flags here: search()'s second argument is a start position.
    if not pattern.search(data):
        corrupt_profiles.append(path)

for path in corrupt_profiles:
    os.remove(path)
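The same collect-then-delete idea can be tried safely on a throwaway folder first. This is only a sketch: the function name find_missing_tag, the file names, the UTF-8 encoding and the case-insensitive match are all assumptions for illustration, not something taken from the real Synapse profiles:

```python
import os
import re
import tempfile

# Assumption: the tag may appear in any letter case.
TAG_PATTERN = re.compile(r'<profilename>', re.IGNORECASE)

def find_missing_tag(directory):
    """Return full paths of files in directory whose text lacks the tag."""
    missing = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip sub-folders
        # Assumption: files are text; undecodable bytes are replaced.
        with open(path, encoding="utf-8", errors="replace") as handle:
            if not TAG_PATTERN.search(handle.read()):
                missing.append(path)
    return missing

# Demo on a temporary folder with one good and one bad made-up profile.
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "good.xml"), "w", encoding="utf-8") as f:
        f.write("<Profile><ProfileName>Default</ProfileName></Profile>")
    with open(os.path.join(folder, "bad.xml"), "w", encoding="utf-8") as f:
        f.write("<Profile></Profile>")
    to_delete = find_missing_tag(folder)
    flagged = [os.path.basename(p) for p in to_delete]
    for path in to_delete:
        os.remove(path)
    remaining = os.listdir(folder)
print(flagged, remaining)
```

Collecting the paths first and deleting afterwards makes it easy to print the list and check it before anything is actually removed.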

Let me know how you go with that!

Answer 2 · 2015-09-15T01:47:34Z

I did it, I actually did it! Turns out I didn't have to use Reg Exes at all. I spent so much time trying to install this module called parsel with pip to handle XML documents; turns out I didn't need that either, but I finally did it! Tagging Kenneth Love to check it out. This is very specific to a problem I'm having, but now when my mouse updates with bad profiles I can get rid of them quickly! Thanks for your awesome lessons!

import os

NAGAPATH = "C:\\Users\\brand\\Desktop\\TestProfs"

naga_profs = os.listdir(NAGAPATH)

del naga_profs[-1]

for profile in naga_profs:
    current = open(profile)
    data = current.read()
    current.close()
    if "<ProfileName>" in data:
        print("<ProfileName> Tag IN: ", profile)
    elif "<ProfileName>" not in data:
        answer = input("Deleting Naga Profile: {} due to missing tag Y/N?".format(profile))
        if answer == 'Y':
            os.remove(profile)
        elif answer == 'N':
            continue

I had to place the script in the profiles directory so I didn't have to use full file paths and could just use my profile variable. I used the del statement to delete the last item of the list, since that is the script itself, and I wouldn't want it to delete itself now, would I?

Answer 3 · 2015-09-15T05:15:13Z

os.remove won't delete a file that is currently in use anyway; on Windows it raises an exception when it tries.

Since os.listdir() returns the list in an arbitrary order, I wouldn't rely on the script being the last item. Instead, use the following (assuming your script was named script.py):

naga_profs.remove('script.py')
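One thing to keep in mind is that list.remove raises ValueError when the name is not in the list, for example after the script is renamed. A filtering approach avoids that. This is a sketch only; the helper name list_profiles and the file names are hypothetical:

```python
import os
import tempfile

def list_profiles(directory, skip=("script.py",)):
    """List regular files in directory, leaving out any names in skip."""
    return sorted(
        name for name in os.listdir(directory)
        if name not in skip and os.path.isfile(os.path.join(directory, name))
    )

# Demo on a temporary folder with made-up file names.
with tempfile.TemporaryDirectory() as folder:
    for name in ("a.xml", "b.xml", "script.py"):
        with open(os.path.join(folder, name), "w") as handle:
            handle.write("")
    profiles = list_profiles(folder)
    # A skip name that is absent is simply ignored, no exception.
    nothing_skipped = list_profiles(folder, skip=("not_there.py",))
print(profiles, nothing_skipped)
```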

Unless you're 100% sure the tag names will always be in that format (TitleCase), you might want to make use of str.lower():

if "<profilename>" in data.lower():

And make it easier to answer yes or no by also converting to lower or uppercase:

if answer.upper() == 'Y':

Finally, you don't need an elif ... continue, since that is the default behaviour at the end of a block of code in a loop. You don't need to do anything unless the answer is Y.
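Pulled together, the two normalisation suggestions might look like this. The helper names has_profile_tag and is_yes are made up for the sketch, and stripping whitespace from the answer is an extra assumption on top of the upper-casing:

```python
def has_profile_tag(data):
    """True if the opening tag appears in data, in any letter case."""
    return "<profilename>" in data.lower()

def is_yes(answer):
    """True for 'Y' or 'y', ignoring surrounding whitespace."""
    return answer.strip().upper() == "Y"

title_case_ok = has_profile_tag("<Profile><ProfileName>x</ProfileName></Profile>")
missing_tag = has_profile_tag("<Profile><Name>x</Name></Profile>")
lower_y = is_yes("y")
answered_no = is_yes("N")
```

In the loop, only the is_yes(...) branch needs any code, since anything else just falls through to the next file.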

Answer 4 · 2015-09-15T07:17:31Z

Here is the RE version; the non-RE version is similar. I had added a bunch of extra code to concatenate path names together and have a user enter their device name, but I realized that the name of the actual account folder may vary from user to user and is likely not static, so I dropped all that complicated but slightly impressive-looking code for something much simpler. With no arguments, os.listdir() lists the current working directory, which is the script's folder when the script is run from there. So anyone facing a similar issue with their Razer Synapse compatible device need only drop the script into the folder with their profiles and run it with Python from that folder.

As you can see, I call remove() twice because two scripts now exist together: one is the Reg Ex version and the other is the vanilla version :D

import os
import re


naga_profs = os.listdir()

naga_profs.remove('prof_name_check.py')
naga_profs.remove('prof_check_re.py')

pattern = re.compile(r'\<ProfileName\>')


for profile in naga_profs:
    with open(profile) as current:
        data = current.read()
    # search()'s second argument is a start position, so no flag is passed here.
    if pattern.search(data):
        print("<ProfileName> Tag IN: ", profile)
    else:
        answer = input("Deleting Razer Profile: {} due to missing tag Y/N?".format(profile))
        if answer.upper() == 'Y':
            os.remove(profile)
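On the point about os.listdir() with no arguments: it lists the current working directory, which is not always the folder the script lives in. A sketch of one way to pin the listing to the script's own folder follows; the helper name script_dir and the example path are hypothetical:

```python
import os

def script_dir(script_path):
    """Return the absolute folder that contains script_path."""
    return os.path.dirname(os.path.abspath(script_path))

# os.listdir() with no argument is the same as listing the working directory.
same_listing = sorted(os.listdir()) == sorted(os.listdir(os.getcwd()))

# Made-up relative path, resolved against the working directory.
example = script_dir(os.path.join("profiles", "prof_check_re.py"))
print(same_listing, example)
```

Inside a real script, os.listdir(script_dir(__file__)) would list the script's folder regardless of where Python was started from; open() and os.remove() would then need os.path.join with that folder as well.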

Non RE:

import os

naga_profs = os.listdir()

naga_profs.remove('prof_name_check.py')
naga_profs.remove('prof_check_re.py')

for profile in naga_profs:
    with open(profile) as current:
        data = current.read()
    if "<ProfileName>" in data:
        print("<ProfileName> Tag IN: ", profile)
    else:
        answer = input("Deleting Razer Profile: {} due to missing tag Y/N?".format(profile))
        if answer.upper() == 'Y':
            os.remove(profile)
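For comparison, a parser-based check is also possible with only the standard library, no extra module to install. This is a sketch under assumptions: the helper name has_element is made up, the element is assumed to be spelled exactly ProfileName (ElementTree is case-sensitive), and a file that fails to parse is treated as missing the tag:

```python
import xml.etree.ElementTree as ET

def has_element(xml_text, tag="ProfileName"):
    """True if xml_text parses and contains an element named tag."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False  # assumption: unparseable counts as missing
    # iter() walks the root and every descendant with that tag name.
    return next(root.iter(tag), None) is not None

good = has_element("<Profile><ProfileName>Default</ProfileName></Profile>")
bad = has_element("<Profile><Name>Default</Name></Profile>")
broken = has_element("not xml <")
print(good, bad, broken)
```

The plain substring test above is shorter and does the job here; the parser version mainly helps if the tag text could also show up inside a comment or an attribute value.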

Thank you for all of your help, feel free to leave more feedback as you see fit. I definitely feel powerful, I've managed to create something useful, I've managed to tell my computer what to do. I read something online about GUI vs Command Line interfaces that said "When we're little we use pictures and point at things, but when we grow up we learn to read and write." Tech is my passion and I'm definitely feeling like I'm learning to read and write instead of point and click, this is awesome.

Brandon Wall

Iain Simmons

Kenneth Love
