Python

How do I change this script to recursively consume all files of a certain type instead of using command line arguments?

The following code has three problems:

1. It needs to recursively process all *.gz files in a given directory and its subdirectories, writing newly named output files without changing the existing files.

2. It needs to be silent. The counters for processed and redacted lines are currently echoed to standard output.

3. The logging needs to be less verbose. Too many log lines are currently written to audit.log.
#!/usr/local/bin/python3

import gzip
import logging
import os
import re
import sys

logging.basicConfig(filename='audit.log', format='%(asctime)s %(message)s',
                    level=logging.INFO)


def main(infilename, outfilename):
    """ Redact social security number from a gzip file

    Args:
            infilename (str): path of the input file
            outfilename (str): path of the output file

    """

    # check if input file exists; report on stderr and exit with a non-zero status
    if not os.path.exists(infilename):
        print("Error: Input file does not exist!", file=sys.stderr)
        sys.exit(1)

    # get text from input file
    with gzip.open(infilename, 'rt') as infile:
        logging.info('infilename: processed file {}'.format(infilename))
        lines = infile.readlines()

    pattern = re.compile(r'\d{3}[^\w]\d{2}[^\w]\d{4}')
    count_redacted = 0
    count_processed = 0

    with open(outfilename, 'w') as outfile:
        for line in lines:
            match = re.search(pattern, line)
            if match:
                newline = re.sub(pattern, r'###-##-####', line)
                count_redacted = count_redacted + 1
                outfile.write(newline)
                print(count_redacted)
                logging.info('count_redacted: Lines redacted: {}'
                             .format(count_redacted))
            else:
                count_processed = count_processed + 1
                outfile.write(line)
                print(count_processed)
                logging.info('count_processed: Lines not modified: {}'
                             .format(count_processed))


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: {} input_file output_file".format(sys.argv[0]))
        # stop here; otherwise the call below raises IndexError
        sys.exit(1)
    main(sys.argv[1], sys.argv[2])
Any help is greatly appreciated.
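
One possible direction for the three points above, as a minimal sketch rather than a definitive answer. It assumes the output should stay gzip-compressed and sit next to each input under a new name (`data.gz` becomes `data.redacted.gz`). The names `redact_file`, `redact_tree`, and the `.redacted.gz` suffix are hypothetical and not part of the original script. It walks the tree with `pathlib.Path.rglob`, prints nothing, and logs one summary line per file instead of one line per input line.

```python
import gzip
import logging
import re
from pathlib import Path

# Same pattern as the original script:
# 3 digits, a non-word separator, 2 digits, a non-word separator, 4 digits.
SSN_PATTERN = re.compile(r'\d{3}[^\w]\d{2}[^\w]\d{4}')


def redact_file(infile_path, outfile_path):
    """Redact one gzip file into a new gzip file.

    Returns a tuple (redacted_lines, unmodified_lines).
    """
    redacted = 0
    unmodified = 0
    with gzip.open(infile_path, 'rt') as infile, \
            gzip.open(outfile_path, 'wt') as outfile:
        for line in infile:
            # subn returns the new string and the number of substitutions made
            newline, hits = SSN_PATTERN.subn('###-##-####', line)
            if hits:
                redacted += 1
            else:
                unmodified += 1
            outfile.write(newline)
    return redacted, unmodified


def redact_tree(root, suffix='.redacted.gz'):
    """Process every *.gz file under root, leaving the originals untouched.

    The suffix is an assumed naming scheme; files already carrying it are
    skipped so that a second run does not redact its own output.
    """
    results = {}
    # sorted() materializes the listing before any output file is written,
    # so files created during this run are not picked up by the walk.
    for path in sorted(Path(root).rglob('*.gz')):
        if path.name.endswith(suffix):
            continue
        out_path = path.with_name(path.name[:-len('.gz')] + suffix)
        redacted, unmodified = redact_file(path, out_path)
        # One summary line per file keeps audit.log short.
        logging.info('%s -> %s: %d lines redacted, %d lines not modified',
                     path, out_path, redacted, unmodified)
        results[path] = (redacted, unmodified)
    return results
```

To use it, keep the `logging.basicConfig(...)` call from the original script and then call `redact_tree` with the top-level directory in place of `main(sys.argv[1], sys.argv[2])`. Reading the input line by line, rather than with `readlines()`, also avoids holding a whole file in memory.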