Power BI dashboard with AWS Glue, S3 and Athena

This article walks through a project I completed in July 2024.

Data preparation is a critical step in any data pipeline, especially when dealing with large datasets that need to be cleaned, transformed, and made ready for analysis. AWS offers powerful tools like Amazon S3 and AWS Glue to help streamline this process. Whether you’re new to AWS services or an experienced user, understanding how to effectively leverage these tools can significantly improve your data workflows. This article will guide you through the entire process, from setting up your environment to executing data transformations and querying your data.

 
 

Setting Up Your Environment:

To begin, you’ll need to set up an Amazon S3 bucket, which will act as your primary data storage location. In our example, we create a bucket named laptop-donnees-s3. Within this bucket, it’s advisable to organize your data by creating two separate folders: one for raw data and another for transformed data. The raw-data folder will store unprocessed files such as CSVs, while the transformed-data folder will hold the data after it has undergone processing through AWS Glue.
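The folder layout described above can be sketched in Python. This is a minimal illustration, not the project's actual setup script: the helper names are hypothetical, and `create_layout` only assumes it is given a client object with a `put_object(Bucket=..., Key=...)` method, which is what a boto3 S3 client provides. S3 has no real folders, so each "folder" is represented here by a key ending in a slash.

```python
"""Sketch: lay out the raw-data/ and transformed-data/ prefixes in an S3 bucket."""

BUCKET = "laptop-donnees-s3"  # bucket name used in this article
FOLDERS = ("raw-data", "transformed-data")


def folder_keys(folders=FOLDERS):
    """Return the zero-byte "folder marker" keys, e.g. 'raw-data/'."""
    return [f"{name.strip('/')}/" for name in folders]


def object_key(folder, filename):
    """Build the S3 key for a file stored inside one of the folders."""
    return f"{folder.strip('/')}/{filename}"


def create_layout(s3_client, bucket=BUCKET):
    """Create one marker object per folder.

    s3_client is assumed to expose put_object(Bucket=..., Key=...),
    as a boto3 S3 client does. The bucket must already exist.
    """
    for key in folder_keys():
        s3_client.put_object(Bucket=bucket, Key=key)
```

With boto3 installed and AWS credentials configured, calling `create_layout(boto3.client("s3"))` should create both markers. Passing the client in as an argument keeps the helper easy to try out with a stand-in object before touching a real bucket.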

Data Crawling and Cataloging with AWS Glue:

AWS Glue is an essential service when it comes to managing and transforming large datasets. One of its core features is the Crawler, a tool that automatically connects to your data source, scans the data, and determines its schema. The Crawler can handle various data sources, including S3 buckets, Oracle databases, and MySQL databases, among others.

Once the Crawler is set up and connected to your data source, it scans the available data. During this process, built-in classifiers in AWS Glue analyze the data and create a schema that describes the structure and format of your dataset. This schema information is then stored in AWS Glue’s Data Catalog as tables. These tables play a crucial role in subsequent ETL (Extract, Transform, Load) processes, as they provide a structured representation of your data.
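As a rough sketch of the crawler setup, the snippet below builds the arguments for boto3's Glue `create_crawler` call and then starts the crawler. The crawler name, IAM role, and database name passed in are placeholders you would supply yourself; none of them come from the original project. The client is passed in as a parameter and is assumed to behave like `boto3.client("glue")`.

```python
"""Sketch: define and start a Glue crawler that scans an S3 prefix."""


def crawler_config(name, role_arn, database, s3_path):
    """Build keyword arguments for glue.create_crawler.

    name, role_arn and database are placeholders chosen by the caller;
    s3_path is the prefix to scan, e.g. 's3://laptop-donnees-s3/raw-data/'.
    """
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def run_crawler(glue_client, config):
    """Create the crawler, then start it.

    glue_client is assumed to expose create_crawler(**kwargs) and
    start_crawler(Name=...), as a boto3 Glue client does.
    """
    glue_client.create_crawler(**config)
    glue_client.start_crawler(Name=config["Name"])
```

Once the crawler run finishes, the detected tables should appear in the Data Catalog database named in the configuration.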

Transforming Data with AWS Glue:

With the data cataloged, the next step is to transform it to meet your specific needs. AWS Glue allows you to define jobs that can automate these transformations. For example, if you need to clean up your data by modifying certain columns, such as converting a bigint type to an int, this can be easily done within an AWS Glue job.

AWS Glue supports a range of transformations, from simple data type conversions to more complex operations. It also allows for custom Spark scripts written in Python (PySpark) or Scala, enabling you to handle more sophisticated data processing tasks that aren’t covered by Glue’s out-of-the-box capabilities.
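One thing worth checking before converting a bigint column to int, as in the example above, is whether every value actually fits. A bigint is a 64-bit signed integer and an int is a 32-bit signed integer, so values outside the 32-bit range cannot survive the cast. The plain-Python sketch below is not Glue code; it only illustrates the range check, and the function names are hypothetical.

```python
"""Sketch: check that 64-bit values fit in a 32-bit signed int before casting."""

INT32_MIN = -2**31      # -2147483648
INT32_MAX = 2**31 - 1   #  2147483647


def fits_int32(value):
    """Return True if value can be stored in a 32-bit signed integer."""
    return INT32_MIN <= value <= INT32_MAX


def out_of_range(values):
    """Return the non-null values that would not survive a bigint -> int cast."""
    return [v for v in values if v is not None and not fits_int32(v)]
```

If `out_of_range` returns an empty list for a sample of the column, the narrower type is likely safe; otherwise it is better to keep the column as bigint.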

Tables: Organizing Your Data

Tables are structured formats used to organize and store data in a way that makes it easy to query and analyze. In the context of AWS Glue, when raw data is scanned by a Crawler, it is organized into tables based on the schema detected in the data. These tables are then stored in the AWS Glue Data Catalog.
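To see what schema a crawler recorded, you can read the table definition back from the Data Catalog. The sketch below assumes the response shape returned by boto3's Glue `get_table` call, where columns are listed under `Table.StorageDescriptor.Columns`. The sample response and its column names are invented for illustration and are not taken from the project's dataset.

```python
"""Sketch: pull column names and types out of a Glue get_table response."""


def table_columns(get_table_response):
    """Return [(column_name, column_type), ...] for a catalog table."""
    columns = get_table_response["Table"]["StorageDescriptor"]["Columns"]
    return [(col["Name"], col["Type"]) for col in columns]


# Hypothetical, trimmed-down response used only to demonstrate the helper.
SAMPLE_RESPONSE = {
    "Table": {
        "Name": "raw_data",
        "StorageDescriptor": {
            "Columns": [
                {"Name": "model", "Type": "string"},
                {"Name": "price", "Type": "bigint"},
            ]
        },
    }
}
```

Against a real catalog, the same helper would be fed the result of `glue_client.get_table(DatabaseName=..., Name=...)`.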

Practical Example: Currency Conversion

Let’s consider a practical example where your dataset contains pricing information in Indian Rupees (INR), but you need to convert these prices to Canadian Dollars (CAD) for analysis. By applying SQL transformations within AWS Glue, you can use the current exchange rate to convert these amounts. The transformed data, now in CAD, can be saved back into the transformed-data folder in S3, making it easier to perform further analysis and comparisons.
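The arithmetic behind the conversion is a single multiplication followed by rounding to cents. The sketch below shows it in plain Python with `Decimal` to avoid floating-point rounding surprises. The function name is hypothetical, and the exchange rate is deliberately left as a parameter: any rate you pass in is your own assumption, not a figure from this project.

```python
"""Sketch: convert an INR amount to CAD at a caller-supplied exchange rate."""

from decimal import Decimal, ROUND_HALF_UP


def inr_to_cad(amount_inr, cad_per_inr):
    """Multiply by the rate and round to two decimal places.

    cad_per_inr is an assumption supplied by the caller (CAD per 1 INR).
    Inputs are converted through str() so floats do not leak binary
    rounding errors into the Decimal arithmetic.
    """
    amount = Decimal(str(amount_inr)) * Decimal(str(cad_per_inr))
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

For example, with a placeholder rate of 0.016 CAD per INR, `inr_to_cad(50000, "0.016")` returns `Decimal("800.00")`. In a Glue SQL transformation, the equivalent would be a rounded product of the price column and the chosen rate.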

Querying Transformed Data with Amazon Athena:

After your data has been transformed, the next logical step is to analyze it. This is where Amazon Athena comes in. Athena is a serverless query service that allows you to run SQL queries directly on your data stored in S3. Since Athena can read from the tables defined in AWS Glue’s Data Catalog, it seamlessly integrates with your ETL pipeline.

For instance, if you’re working with a dataset that includes sales data, you can use Athena to extract detailed insights such as the types of products sold, quantities, and other key metrics. By querying the transformed data in S3, you gain the ability to perform complex analyses without the need for additional infrastructure.
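A query like the one described above can be submitted from Python through boto3's Athena `start_query_execution` call. The sketch below is illustrative only: the `model` column, the table name, the database, and the S3 output location are all placeholders, and the query is not one of the queries actually used in this project. The table name is interpolated directly into the SQL string, so it should only ever come from trusted input.

```python
"""Sketch: build a simple aggregation query and submit it to Athena."""


def top_models_query(table, limit=10):
    """Return SQL counting rows per model (hypothetical column name)."""
    return (
        f"SELECT model, COUNT(*) AS units_sold "
        f"FROM {table} "
        f"GROUP BY model "
        f"ORDER BY units_sold DESC "
        f"LIMIT {int(limit)}"
    )


def start_query(athena_client, sql, database, output_location):
    """Submit the query and return its execution ID.

    athena_client is assumed to behave like boto3.client("athena");
    output_location is an S3 URI where Athena writes the result files.
    """
    response = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

Athena runs queries asynchronously, so the returned execution ID is what you would later use to check the query's status and fetch its results.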


Conclusion:

After processing the data with AWS Glue, querying it in Amazon Athena, and analyzing it in Power BI, we obtained key insights into computer sales. We identified the most in-demand models, which enables more efficient stock management and strategic price adjustments to maximize sales.

Customer feedback and reviews were integrated to identify areas for product improvement, while monitoring return rates helped prevent potential issues with certain models or configurations. These efforts led to a better understanding of the market and the optimization of business operations, supporting more informed decisions aligned with consumer needs.

This project also allowed us to develop advanced skills in the use of AWS Glue, S3, and Athena, including managing ETL jobs, configuring crawlers, and optimizing data pipeline performance. Additionally, we strengthened our knowledge of data security on AWS and the use of Power BI for data visualization.

Here are some examples of queries used in Amazon Athena:
