Python

olegovich7

Checking items in lists

word_list_dirty is a list of words, some of which have non-letter characters such as commas at the end. The script should strip those characters from each word, if there are any, and return just the words. The problem is that when a word contains more than one of them, like 'bye).', the script deletes only one non-letter. word_list_dirty also contains 500,000 words, so I need to be very efficient, and it seems to me my code could be improved radically. How would you solve the task? Thanks a lot in advance!

ch_list = ["–", '–', '(', ')', ',', '.', '»', '«']
word_list = []
for word in word_list_dirty:
    word = list(word.lower())
    for letter in word:
        if letter in ch_list:
            word.remove(letter)
            print("".join(word))
    if word:
        word = "".join(word)
        word_list.append(word)

The print() call is there only for debugging; ideally, every item it prints should contain only letters.

1 Answer

olegovich7,

I am a bit unclear on a few parts of your question, but I'll try to help. I would use the built-in string method .isalpha(), which is described in the Python docs.
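As for why the original loop removes only one character from 'bye).': it calls word.remove() on the same list the for loop is walking. After a removal, the remaining items shift left while the loop's position still moves forward, so the '.' that follows the ')' is never inspected. Here is a minimal sketch of that behavior next to a version that builds a new string instead. The function names and the bad set are hypothetical, made up for this illustration.

```python
def remove_while_iterating(word, bad_chars):
    """Mimic the original loop: mutate the list while iterating over it."""
    chars = list(word)
    for ch in chars:
        if ch in bad_chars:
            # Removing shifts later items left, so the next one gets skipped.
            chars.remove(ch)
    return "".join(chars)


def filter_into_new_string(word, bad_chars):
    """Build a new string instead of mutating during iteration."""
    return "".join(ch for ch in word if ch not in bad_chars)


# Hypothetical subset of the characters from the question's ch_list.
bad = {"(", ")", ",", "."}

print(remove_while_iterating("bye).", bad))  # bye.
print(filter_into_new_string("bye).", bad))  # bye
```

Filtering with .isalpha() avoids the same trap, because it also builds a new string rather than editing the list in place.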

I chose to wrap the .isalpha() check in a function.

def delete_nonalpha(word_list):
    alpha_only_list = []

    for word in word_list:
        alpha_only = "".join(ch for ch in word if ch.isalpha())  # .isalpha() checks for alpha. Use .isalnum() to check for alpha numeric
        alpha_only_list.append(alpha_only)  # if you want all lower: alpha_only.lower()
    return alpha_only_list

First, I create an empty list to store the cleaned words. (I am not sure whether you want to get the list back minus the non-alpha characters.) Next, I loop through each item in word_list.

alpha_only_list = []  # if your end result is to not return a list then this is not needed

For each word, I iterate over its characters and keep only the ones for which .isalpha() is true. I join those into a temporary string (rebuilt on each pass through word_list) and then append it to the list created in step one.

for word in word_list:
    alpha_only = "".join(ch for ch in word if ch.isalpha())  # Pythonic way of writing this
    alpha_only_list.append(alpha_only)

Finally, I return the list once every word in word_list has been processed.

return alpha_only_list
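To try it out, here is a small sketch that condenses the same logic into a list comprehension and runs it on a few inputs, including the 'bye).' example from the question. The sample words are made up for illustration only.

```python
def delete_nonalpha(word_list):
    """Return a new list with every non-alphabetic character removed."""
    return ["".join(ch for ch in word if ch.isalpha()) for word in word_list]


# Made-up sample input; "()" has no letters at all.
dirty = ["bye).", "hello,", "«word»", "()"]

cleaned = delete_nonalpha(dirty)
print(cleaned)  # ['bye', 'hello', 'word', '']

# Drop the empty strings left behind by punctuation-only items.
print([word for word in cleaned if word])  # ['bye', 'hello', 'word']
```

Note that an item made up entirely of punctuation comes back as an empty string. If you don't want those in the result, filter them out afterwards, which is what the `if word:` check in the original code did.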

Let me know if this raises more questions.

EDIT: Keep in mind that this method removes whitespace as well, so the input should be a list of single words.
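To illustrate the whitespace caveat, here is a short sketch. It assumes that, if an item can contain several words, splitting it on whitespace first is acceptable. The helper name letters_only and the sample phrase are hypothetical.

```python
def letters_only(text):
    """Keep only alphabetic characters; spaces and digits are dropped too."""
    return "".join(ch for ch in text if ch.isalpha())


# Made-up phrase with spaces and punctuation.
phrase = "all lower, (two words)."

# Filtering the whole phrase glues the words together.
print(letters_only(phrase))  # alllowertwowords

# Splitting on whitespace first keeps the words separate.
words = [letters_only(token) for token in phrase.split()]
words = [word for word in words if word]  # drop tokens with no letters
print(words)  # ['all', 'lower', 'two', 'words']
```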

Another way might be to check first whether the entire word is alphabetic. If it is, just append the word as it is; if it isn't, remove the non-alpha characters and then append it. This might save time and resources.

my_list = ['Dustin', 'jAmES', 'SLiaoijLKjhuh:@*&^#*&^@*759868@#$*&^', 'all lower']

def delete_non_alpha(word_list):
    alpha_only_list = []

    for word in word_list:
        if not word.isalpha():
            # Only filter character by character when the word needs it.
            word = "".join(ch for ch in word if ch.isalpha())
        alpha_only_list.append(word.lower())  # appends only lowercase of word
    return alpha_only_list

print(delete_non_alpha(my_list))  # ['dustin', 'james', 'sliaoijlkjhuh', 'alllower']
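Whether the extra .isalpha() check actually saves time depends on the data, so it is worth measuring on your own list. Below is a rough timing sketch using the standard timeit module. The function names and the sample list are made up for this comparison, and the numbers will vary from machine to machine.

```python
import timeit


def filter_always(words):
    """Filter every word character by character."""
    return ["".join(ch for ch in w if ch.isalpha()).lower() for w in words]


def filter_if_needed(words):
    """Skip the character filter when the word is already alphabetic."""
    return [
        (w if w.isalpha() else "".join(ch for ch in w if ch.isalpha())).lower()
        for w in words
    ]


# Made-up sample: 100,000 words, mostly clean, some with punctuation.
sample = ["hello", "world", "bye).", "(maybe", "fine,"] * 20_000

# Both approaches should produce the same result.
assert filter_always(sample) == filter_if_needed(sample)

for func in (filter_always, filter_if_needed):
    seconds = timeit.timeit(lambda: func(sample), number=5)
    print(f"{func.__name__}: {seconds:.3f} s for 5 runs")
```

If most of your 500,000 words are already clean, the second version has a better chance of paying off, but only a measurement on the real data can confirm that.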