Welcome to the Treehouse Community

Want to collaborate on code errors? Have bugs you need feedback on? Looking for an extra set of eyes on your latest project? Get support with fellow developers, designers, and programmers of all backgrounds and skill levels here with the Treehouse Community!

Looking to learn something new?

Treehouse offers a seven day free trial for new students. Get access to thousands of hours of content and join thousands of Treehouse students and alumni in the community today.

Start your free trial


need help figuring out why a sequelize findOrCreate(...) works sometimes but not others.

I am converting json file items to SQL table entries. I have 5 SQL entities that might be written. Courses (online learning experiences), CourseDetails, Learners, Outcomes and OutcomeDetails. The cardinal goal is to NOT create any redundancies in the SQL while using auto-generated Ids in all the tables. The Table definitions are roughly as follows:

Learners - 1 row for each unique individual who has ever taken any course or courses.
Courses  - 1 row for each unique Course (no mater how many Learners have taken the Course)
Course Details - 1 row for each question asked in the Course, describing the correct results
Outcomes - 1 row that captures the intersection of 1 Learner, taking 1 Course, at 1 Completion Date/Time
Outcomes Details - 1 row for each question asked in the Course, describing the Learner responses

Each json file is an instance of an Outcome with its Details. Therefore, each json file represents 1 Learner, taking 1 Course, completing it at 1 Time. However, any json file may have either a Learner or a Course (or both, if repeating a course) that has already been saved to the SQL tables, either in a previous batch OR in a previous file in this batch.

The requirement is to save any and all Outcomes/OutcomeDetails, not already saved and any and all Learners and Courses, not already saved. Remember that Outcomes/OutcomeDetails have the extra dimension of time for uniqueness, whereas Learners and Courses are static, in the sense that, once they are saved, their record persists and they never need be saved again (as a separate entity), even should the same Learner take the same Course at a later time (because that would be a new Outcome). In other words Learners and Courses are unique as static Entities, but may be included in multiple Outcomes.

To achieve this I am using .findOrCreate(...) on each of the entities. If the row is found it will not be created, otherwise it will. The problem I am running into is the asynchronous nature of javascript. Even though I am using async / await, the await function is not preventing (only in some cases) a second Course of the same Number from creating a row in the Courses table, whereas I have run the app many times and the Learners table never gets updated with redundant rows.

Here is a link to the gist with the code in it. Help is appreciated.


2 Answers

Brendan Whiting
.a{fill-rule:evenodd;}techdegree seal-36
Brendan Whiting
Front End Web Development Techdegree Graduate 84,731 Points

I'm not sure if this is the root of the issue, but I can point to one place that I think could be problematic:

router.get('/', (req, res) => {
    //  iterate on outcomes
    let failedOutcomes = [];
    outcomeIds.forEach((outcomeId) => {
        let jsonData;
        let uri = new URL(`${url}${outcomeId}`);
        jsonData = JSON.parse(fs.readFileSync(uri));
        processOutcome(jsonData, outcomeId, failedOutcomes);
    /*if ( failedOutcomes.length ) {
        let uri = `${url}${idsID}`;
        let dataIn  = fs.readFileSync(uri);
        let idsList = JSON.parse(dataIn).ids;
        idsList = idsList.concat(failedOutcomes);
        idsList = JSON.stringify({"ids": idsList});
        fs.writeFileSync(uri, idsList);
    res.render("process");  //  install a progress bar of some sort
}); // end of router GET

It seems like this is what we're doing here... when a client hits the "/" endpoint, we loop through each of the ids, for each of ideas we process all the json data by checking the database to see if it exists, and creates new records if it doesn't exist.

I think what you need to do is load your seed data into the database when the server boots up. It shouldn't wait until a client hits an endpoint to seed data. You're also going through the same process of seeding the data for each outcomeId. You only need to do it once. It's also generally a bad idea to fire off anything asynchronous inside a loop.

I suggest turning on logging in your sequelize config, that way it will print out to the console what SQL queries are actually being run. What I'm expecting you'll see is a lot of extra redundant queries are being run, and since they get fired off in a loop they maybe getting mixed up with each other somehow.


What is probably not clear from the gist is that the "require" (const outcomeIds = require('../services/idsList');) brings in the reconciled ids list (an array of json file ids). That is what I am iterating on in the "outcomeIds.forEach(...)". I am not sure what else I can do to "seed" the data base, unless it would be to build a service for the 5 SQL Table entities to reconcile an array for each table entity before hitting the endpoint and then "require" those services, as well, and simply iterate on each of the reconciled arrays to actually perform the SQL writes.

My outcomes Ids list looks like this:


5sM5YLnnNMN_1525523468000.json (Learner = daffy@duck.com)
5sM5YLnnNMN_1527517868000.json  (Learner = yogi@bear.com)
6tN6ZMooONO_1530730049000.json (Learner = daffy@duck.com)
6tN6ZMooONO_1541011649000.json  (Learner = foghorn@leghorn.com)

The structure is (to the left of the _) = course number, (to the right of the _) = completion date/time. They represent the 4 outcomes I expect to get (and in fact do get). I also have 3 learners, spread across the 4 outcomes (1 learner took 2 courses). The Courses table and the Learners table serve as reference tables. By that I mean there should only be one Learner in the Learners table for each Learner. There should only be one Course in the Courses table for each Course. Each Course will have n rows in the CourseDetails table that correspond to the n questions to be answered in the course.

Since I posted this I have been working and debugging. I found a few typos that resulted in some of my problems. I am actually in pretty good shape at this point. The whole thing works almost perfectly. The only problem I am having now is that I have redundant rows in my Courses table, both 5sM5YLnnNMN and 6tN6ZMooONO appear twice.

My current results are:

3 Learners -- which is as expected
4 Courses -- which is not as expected (each Course appears twice)
20 CourseDetails -- which is as expected (2 Courses X 10 questions each, no redundancies)
4 Outcomes -- which is as expected
40 OutcomeDetails -- which is as expected (4 Outcomes X 10 questions each, no redundancies)

I am sure that it is a problem with asynchronous processing. What I am not sure of is why the Course fail to operate as expected when the Learners do not fail. There is a potential for redundancy in the Learners as well, that dos not happen (which is a good thing).

Oh by the way, the apparent default for SQL logging must be console.log, because everything is displaying to the console. That is why I know my problem is an async problem I can see the Courses selecting twice for each course before the first first commit happens for the course, even though I am using async / await.

Your "seeding" idea does give me an idea though. I think I will try individual services that will build the individual arrays for each of the SQL tables. In the services I will remove any redundancies that appear. Then I will require the services in process.js. That will pass in a reconciled array for each table with only the rows that need to be written and I will iterate on those arrays to write the SQL entries. I am not sure that is best practice but maybe it will work.

Brendan Whiting
.a{fill-rule:evenodd;}techdegree seal-36
Brendan Whiting
Front End Web Development Techdegree Graduate 84,731 Points

If you're finding yourself with bugs related to typos, maybe you need to set yourself up with a better text editor/IDE and plugins for it. I use WebStorm and I get nice squiggly lines if I try misspell a variable and lots of other helpful things.

There's a tutorial in the sequelize docs about using it together with express. There are also some sequelize/express starters on github. Could be useful to get a feel for the right architecture.

But mainly I think you should have something like sequelize.sync().then(() => Course.bulkCreate( /* your json */)) to seed your data, and should be outside of any express route. This should be work that happens when the server boots up, not every time someone hits a route.

Brendan et al.

Problem solved. As you suspected Brendan I was mixing synchronous coding with asynchronous coding. Here is the solution and the note I wrote to my self to avoid this problem in the fulture.

//  NOTE: to self. This block of code solves the problem of mixing
    //  the synchronous "array.forEach()" with embedded asynchronous tasks
    //  by replacing it with an asynchronous "for ( item of array )" 
    //  with the same embedded asynchronous tasks and "awaiting" them.

    //  define the function
    const processOutcomeIds = async (outcomeIds) => {
        for ( const outcomeId of outcomeIds ) {
            let jsonData;
            let uri = new URL(`${url}${outcomeId}`);
            jsonData = await JSON.parse(fs.readFileSync(uri));
            await processLearner(jsonData);
            await processCourse(jsonData);
            await processOutcome(jsonData, uri, outcomeId, failedOutcomes);
    //  call the function
    // ... more code ...

This replaces the previous:

 outcomeIds.forEach((outcomeId) => {
        let jsonData;
        let uri = new URL(`${url}${outcomeId}`);
        jsonData = JSON.parse(fs.readFileSync(uri));
        processOutcome(jsonData, outcomeId, failedOutcomes);