Welcome to the Treehouse Community

Data Analysis

John Ferney Dominguez Rendon
3,314 Points

Business Intelligence and data analytics architecture

I am starting a new position as a Data Engineer. One of the current projects is to build a data analytics (BI) solution from scratch. The primary sources are structured data from about 100 servers. These servers share a similar architecture because the projects are all built on Drupal (CMS).
In a team meeting we discussed the best way to carry out the project, and these are the two candidate architectures:

  1. Data warehouse: SQL Server; ETL: SSIS; performance: SSAS; reporting: Power BI
  2. Data warehouse: PostgreSQL; ETL: Python; performance: Python; reporting: Python (what I proposed)

Could you suggest other alternatives, or explain the advantages and disadvantages of each one? Thank you in advance for your help.

Brendan Whiting
Front End Web Development Techdegree Graduate 84,735 Points

This isn't my area of expertise at all, but I'm curious about it. Can you use Google Analytics? What kind of logging do you have set up, or can you set up, in your front end or Drupal backend? We're sending all our logging to Splunk and then setting up a Splunk dashboard that queries certain metrics.

2 Answers

John Ferney Dominguez Rendon
3,314 Points

Thank you, Brendan Whiting, for your reply. This is what our team has been thinking of using:

  • PostgreSQL: this is where we plan to store all the information. It would be our data warehouse (DW), which uses a denormalized design (whereas a typical transactional relational database is normalized).
  • ETL (Extract, Transform, Load): this is the process that moves information into the DW. Several tools can do it; popular options include Microsoft's SSIS, the Java-based PDI (Pentaho Data Integration), or Python.
  • Once the information is in the DW (cleaned and query-optimized), the idea is to use it to build Machine Learning, AI, and Deep Learning projects and to show the data on a web page built in Python with Flask or Django.

To reply to your questions:

  • Google Analytics might be useful to some extent, but the idea is to build our own customized indicators in Python.
  • Splunk, as you said, would be awesome, but one of our goals is to use free software. If paid software outperforms free software, though, our team may reconsider building everything in Python.
  • Using analytics directly on our Drupal sites is not possible, because they are hosted on different servers, so the idea is to integrate all of them into one repository.

I hope these additional details explain our current approach better. I would like to know your thoughts about it (advantages, disadvantages, better suggestions).
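
To make the Extract, Transform, Load idea above concrete, here is a minimal sketch in plain Python. It is only an illustration under assumptions, not the team's actual pipeline: in-memory SQLite databases stand in for both the per-site source databases and the PostgreSQL warehouse so the example runs anywhere, and the table and column names (`users`, `orders`, `fact_orders`) as well as the helper functions are hypothetical.

```python
import sqlite3


def make_source(users, orders):
    """Create a stand-in source database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", users)
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    return conn


def extract(source_conn):
    """Extract: read orders joined with their user from one source."""
    return source_conn.execute(
        "SELECT o.id, u.name, u.country, o.amount "
        "FROM orders o JOIN users u ON u.id = o.user_id"
    ).fetchall()


def transform(rows, site):
    """Transform: drop rows without an amount, normalize text, tag the site."""
    cleaned = []
    for order_id, name, country, amount in rows:
        if amount is None:
            continue  # skip incomplete transactions
        cleaned.append((
            site,
            order_id,
            (name or "").strip().title(),
            (country or "").upper(),
            round(float(amount), 2),
        ))
    return cleaned


def load(dw_conn, rows):
    """Load: write into one wide (denormalized) table in the warehouse."""
    dw_conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders ("
        "site TEXT, order_id INTEGER, customer TEXT, country TEXT, amount REAL, "
        "PRIMARY KEY (site, order_id))"
    )
    # INSERT OR REPLACE is SQLite syntax; PostgreSQL would use
    # INSERT ... ON CONFLICT (site, order_id) DO UPDATE instead.
    dw_conn.executemany("INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?, ?, ?)", rows)
    dw_conn.commit()


def run_etl(sources, dw_conn):
    """Run extract -> transform -> load for every source site."""
    for site, conn in sources.items():
        load(dw_conn, transform(extract(conn), site))


if __name__ == "__main__":
    sources = {
        "site_a": make_source([(1, " ana lopez ", "co")], [(10, 1, 19.999), (11, 1, None)]),
        "site_b": make_source([(1, "bob", "us")], [(10, 1, 5)]),
    }
    dw = sqlite3.connect(":memory:")
    run_etl(sources, dw)
    for row in dw.execute("SELECT * FROM fact_orders ORDER BY site, order_id"):
        print(row)
```

Keying the warehouse table on `(site, order_id)` keeps rows from different servers apart even when their local ids collide, and it makes a rerun of the same load overwrite rows instead of duplicating them.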
John Ferney Dominguez Rendon
3,314 Points

What data: all the data is structured and stored in MySQL databases. It includes personal, geographical, and transactional information.
Hosting: the DW will be hosted on an AWS or Google Cloud server.
Thank you for your reply, Brendan.

Brendan Whiting
Front End Web Development Techdegree Graduate 84,735 Points

Have you weighed the pros and cons of using a NoSQL database? If you're pulling in data from different places and consolidating it in one place, it may be useful not to need a fixed schema. Postgres also has a JSON type where you can store unstructured data.
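
As a rough illustration of the schema-less idea, the sketch below keeps one fixed column for the origin and one JSON payload column for everything else, so records from different sites can have different fields. It is a sketch under assumptions: SQLite with a TEXT column stands in for PostgreSQL so the code runs without a server, and the table name `raw_records` and the helper functions are hypothetical. In PostgreSQL the payload column could be declared as `jsonb` and filtered in SQL with the `->>` operator instead of in Python.

```python
import json
import sqlite3


def make_store():
    """Create a table with a fixed 'source' column and a free-form JSON payload."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_records ("
        "id INTEGER PRIMARY KEY, source TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    return conn


def store_record(conn, source, record):
    """Serialize a dict to JSON and store it with its source name."""
    conn.execute(
        "INSERT INTO raw_records (source, payload) VALUES (?, ?)",
        (source, json.dumps(record, sort_keys=True)),
    )


def field_values(conn, field):
    """Return (source, value) for every record that contains the given field."""
    found = []
    for source, payload in conn.execute(
        "SELECT source, payload FROM raw_records ORDER BY id"
    ):
        record = json.loads(payload)
        if field in record:
            found.append((source, record[field]))
    return found


if __name__ == "__main__":
    store = make_store()
    store_record(store, "site_a", {"email": "a@example.org", "city": "Bogota"})
    store_record(store, "site_b", {"email": "b@example.org", "plan": "pro"})
    print(field_values(store, "email"))
```

The trade-off is that fields hidden inside the payload are not validated or typed by the database, so reporting queries usually still benefit from promoting the commonly used fields to regular columns.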