In the data-driven world, seamless integration between various tools is crucial for effective data analysis and visualization. This article explores a practical workflow that involves extracting data from an Excel in a csv file, loading it into Snowflake, and ultimately visualizing it in Power BI. By the end, you’ll understand how to create a robust data pipeline using these tools, enabling you to make data-driven decisions efficiently.
Step 1: Preparing the Snowflake Environment
Before we start, we need to set up the Snowflake environment. Snowflake is a powerful cloud-based data warehouse that enables you to store and analyze large volumes of data.
-
Creating a Database and Schema: We begin by creating a database named
db_data_masters
and a schema calledchallenges
to organize our data. - Creating a Table: Next, we create a table named
casa
to store our house data. This table includes various attributes such asprice
,bedrooms
,bathrooms
, and more.
With our environment ready, the next step is to load the data from Excel into Snowflake. This process involves creating an internal stage to temporarily hold the data before loading it into the casa
table.
-
Creating a Stage: We create an internal stage in Snowflake to hold the data file.
- Loading Data: The data is loaded from a CSV file (
kc_house_data.csv.gz
) into thecasa
table. We define a custom file format to ensure the data is correctly interpreted, handling date and timestamp formats appropriately.
We also handle potential errors, such as mismatched column counts, by instructing Snowflake to skip problematic files:
COPY INTO CASA
FROM @my_stage/kc_house_data.csv.gz
FILE_FORMAT = (TYPE = CSV
FIELD_DELIMITER = ‘,’
SKIP_HEADER = 1
DATE_FORMAT = ‘YYYYMMDDTHH24MISS’
TIMESTAMP_FORMAT = ‘YYYYMMDDTHH24MISS’
ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE)
ON_ERROR = ‘SKIP_FILE’;
Step 3: Connecting Power BI to Snowflake
After successfully loading the data into Snowflake, the final step is to visualize it using Power BI.
-
Establishing the Connection: Open Power BI and navigate to the « Get Data » section. Select Snowflake as the data source and provide the necessary credentials to connect to your Snowflake instance.
-
Loading Data into Power BI: Once connected, select the
db_data_masters
database and thecasa
table within thechallenges
schema. Load the data into Power BI for further analysis and visualization. -
Creating Visualizations: With the data loaded, you can now create various visualizations to analyze the housing data, such as price trends, distribution of bedrooms and bathrooms, and more. Power BI’s rich set of visualization tools allows you to transform raw data into actionable insights.
You will be asked for your server and your warehouse in Snowflake.
In table visualization, you will have an overview of your table and can make any changes you want.
Afterward, we can use the report to gain insights and visualize more information related to what we’re looking for.
This project was carried out as a demonstration of a hands-on practice with my professor and friend Rodrigo Schammass – (https://www.linkedin.com/in/rodrigo-schammass/ ) for our training in Power BI and Snowflake.
Laisser un commentaire