Welcome to the Treehouse Community

The Treehouse Community is a meeting place for developers, designers, and programmers of all backgrounds and skill levels to get support. Collaborate here on code errors or bugs that you need feedback on, or asking for an extra set of eyes on your latest project. Join thousands of Treehouse students and alumni in the community today. (Note: Only Treehouse students can comment or ask questions, but non-students are welcome to browse our conversations.)

Looking to learn something new?

Treehouse offers a seven day free trial for new students. Get access to thousands of hours of content and a supportive community. Start your free trial today.

Python

Asher Orr
seal-mask
.a{fill-rule:evenodd;}techdegree
Asher Orr
Python Development Techdegree Student 5,945 Points

While importing csv entries to a database: How can I update the db IF a duplicate csv entry is found?

Hi everyone! My project has a function called add_csv(). It adds entries in a CSV file to a database.

def add_csv():
    with open('inventory.csv') as csvfile:
        data = csv.reader(csvfile)
        next(data)
        for row in data:
            product_in_db = session.query(Product).filter(
                Product.product_name == row[0]).one_or_none()
            if product_in_db == None:
                product_name = row[0]
                product_price = clean_price(row[1])
                product_quantity = clean_quantity(row[2])
                date_updated = clean_date_updated(row[3])
                new_product = Product(product_name=product_name, product_price=product_price,
                                      product_quantity=product_quantity, date_updated=date_updated)
                session.add(new_product)
#Note: This is where the code begins dealing with a duplicate value.
            elif product_in_db == True:
                if product_in_db.date_updated < new_product.date_updated:
                    session.add(new_product)
                    session.delete(product_in_db)
                elif product_in_db.date_updated > new_product.date_updated:
                    pass
        session.commit()

As you can see, I want to check if a duplicate product_name value exists in the database.

If there IS a duplicate value, the function should check which date_updated value is the most recent. Only that entry should exist in the database.

For example, let's say I have this entry on line 10 of the csv file:

Wine - Chateau Bonnet,$7.50,66,3/10/2018

And then on line 20, this duplicate exists (same product_name value.)

Wine - Chateau Bonnet,$7.41,54,3/10/2020

Since the line 20 product is the most recently updated entry, I would like for my database entry to reflect that entry's information- not the information from 2018.

I'm having trouble getting this to work, though. If there are 2 duplicates, only the one which comes first in the CSV gets saved to the database. It doesn't matter what the date is!

For example, let's say this is line 19 and 20 in the CSV:

Wine - Chateau Bonnet,$7.50,66,3/10/2018
Wine - Chateau Bonnet,$7.41,54,3/10/2020

When I check my database, I see the Wine entry from 2018 (but not the 2020 one.)

If I switch the order in the CSV, though, then re-create my database, I only see the wine entry from 2020 (not the 2018 one - that one isn't saved to the database.)

Can anyone help me understand why this is happening, and what I can do to remedy this problem?

Thank you for reading!

Note: If it helps, here is my database model:

from sqlalchemy import (create_engine, Column,
                        Integer, String, DateTime)
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker


engine = create_engine('sqlite:///inventory.db', echo=False)
Session = sessionmaker(bind=engine)
session = Session()
Base = declarative_base()


class Product(Base):
    __tablename__ = "products"
    product_id = Column(Integer, primary_key=True)
    product_name = Column("Product Name", String)
    product_price = Column("Product Price", Integer)
    product_quantity = Column("Product Quantity", Integer)
    date_updated = Column("Date Updated", DateTime)

    def __repr__(self):
        return f'Name: {self.product_name}, Quantity: {self.product_quantity}, Price: {self.product_price}, Date Updated: {self.date_updated}'


if __name__ == "__main__":
    Base.metadata.create_all(engine)

1 Answer

Steven Parker
Steven Parker
217,518 Points

Instead of elif product_in_db == True:, try using just a plain else:. I'm guessing that when an item is found, the value is going to be the actual item and not be equal to True.

Asher Orr
seal-mask
.a{fill-rule:evenodd;}techdegree
Asher Orr
Python Development Techdegree Student 5,945 Points

Hey Steven, thanks for commenting! That helped me solve the problem. I needed to use an else, but there was one other issue:

I noticed that my new_product variable, in the first iteration of my code, was unbound. I updated my code to actually assign it a value:

def add_csv():
    with open('inventory.csv') as csvfile:
        data = csv.reader(csvfile)
        next(data)
        for row in data:
            product_in_db = session.query(Product).filter(
                Product.product_name == row[0]).one_or_none()
            if product_in_db == None:
                product_name = row[0]
                product_price = clean_price(row[1])
                product_quantity = clean_quantity(row[2])
                date_updated = clean_date_updated(row[3])
                new_product = Product(product_name=product_name, product_price=product_price,
                                      product_quantity=product_quantity, date_updated=date_updated)
                session.add(new_product)
            else:
                product_name = row[0]
                product_price = clean_price(row[1])
                product_quantity = clean_quantity(row[2])
                date_updated = clean_date_updated(row[3])
                new_product = Product(product_name=product_name, product_price=product_price,
                                      product_quantity=product_quantity, date_updated=date_updated)
                if product_in_db.date_updated < new_product.date_updated:
                    session.add(new_product)
                    session.delete(product_in_db)
                elif product_in_db.date_updated > new_product.date_updated:
                    pass
        session.commit()

That and changing the elif product_in_db == True: solved the problem. Appreciate you as always! Thanks again for responding.