Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Warehousing | Edureka

Brief Summary

This session covers data warehousing and business intelligence (BI). It explores why companies need BI, the role of data warehousing, key terminology such as OLTP and OLAP, ETL processes, data marts, and metadata. It closes with the complete data warehousing architecture and a demo of creating a data warehouse using Talend.

  • Business intelligence is important for company growth.
  • Data warehousing is a subset of business intelligence.
  • ETL moves data from operational databases into the data warehouse.

Need for Business Intelligence

Business intelligence (BI) is crucial for any company's growth. Successful multinational corporations (MNCs) plan, gather data, analyze it, and execute concrete plans. BI transforms raw data into useful information for business analysis, using data warehouse technology. This technology extracts, cleans, integrates, and loads data into data warehouses, which in turn lets data analysts, data scientists, and managers visualize and analyze the data to gain business insights.

Need for Data Warehousing

Data warehouses are needed because data from various sources and databases cannot be directly visualized. Data must be integrated and processed first. A data warehouse integrates data from multiple databases, processes it, and presents it in a form that is easy to visualize. It acts as a central location for consolidated data, maintained separately from operational databases to prevent data corruption. The process involves extraction, transformation, and loading (ETL), followed by online analytical processing (OLAP) for business users to perform analysis and visualization.

What is Data Warehouse

A data warehouse is a central repository for data consolidated from multiple locations. It is maintained separately from operational databases, so analysis does not run against the live systems. Operational data is extracted, transformed, and loaded into the data warehouse, where online analytical processing (OLAP) lets business users analyze and visualize it. Because the warehouse holds data as a series of snapshots, end users can access historical data at any time. The data warehouse is not reloaded every time new data is added to the operational database.

Advantages of Data Warehouse

Data warehouses allow strategic questions to be answered by studying trends, making data retrieval faster and more accurate compared to databases. Data warehouses integrate data from multiple sources, interlinking tables using schemas, which enables users to pull data across various databases with a single query. Data warehouses make data more readable, transforming it into information that is easier to understand and use. Data warehouses are not products but strategies designed based on a company's requirements. They standardize data, remove inconsistencies, and store it in an easy format for analysis and access.

Properties of Data Warehouse

Bill Inmon, the father of data warehousing, defined it as subject-oriented, integrated, time-variant, and non-volatile. Subject-oriented means data is categorized by business subject. Integrated means data from disparate sources is stored in a single place. Time-variant means data is stored as a series of snapshots over time. Non-volatile means data is not updated or deleted, ensuring data integrity for analysis.
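The time-variant and non-volatile properties can be illustrated with an append-only snapshot table. This is a minimal sketch using Python's built-in sqlite3 module; the table name `customer_snapshot`, its columns, and the sample rows are assumptions made up for illustration, not something shown in the video.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_snapshot "
    "(snapshot_date TEXT, customer_id INTEGER, city TEXT)"
)

def load_snapshot(conn, snapshot_date, rows):
    """Append a dated snapshot; existing rows are never updated or deleted."""
    conn.executemany(
        "INSERT INTO customer_snapshot VALUES (?, ?, ?)",
        [(snapshot_date, customer_id, city) for customer_id, city in rows],
    )

# The customer moves between loads; both states are kept (time-variant).
load_snapshot(conn, "2024-01-01", [(1, "Pune")])
load_snapshot(conn, "2024-02-01", [(1, "Mumbai")])

history = conn.execute(
    "SELECT snapshot_date, city FROM customer_snapshot "
    "WHERE customer_id = 1 ORDER BY snapshot_date"
).fetchall()
print(history)
```

An operational database would usually overwrite the city in place; here the earlier value remains available for analysis.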

Key Terminologies: OLTP vs OLAP

OLTP (Online Transaction Processing) represents databases, containing current and past data, useful for running the business. OLAP (Online Analytical Processing) represents data warehouses, containing historical data, useful for analyzing the business. OLTP uses the entity-relationship model, while OLAP uses star, snowflake, or fact constellation schemas. OLTP provides primitive and detailed data, while OLAP provides summarized and consolidated data. OLTP is used for writing data, while OLAP is used for reading data. Databases range from 100 MB to 1 GB, while data warehouses range from 100 GB to 1 TB. Databases are fast and provide high performance, while data warehouses are highly flexible, offering different views through OLAP cubes.
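To make the star schema and the "summarized and consolidated" view more concrete, here is a small sketch with one fact table and two dimension tables, again using sqlite3. All table names, column names, and values (`fact_sales`, `dim_product`, `dim_date`) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the "who/what/when" of each fact.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
-- The fact table sits at the center of the star and holds the measures.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    amount REAL
);
INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Games');
INSERT INTO dim_date VALUES (10, 2022), (11, 2023);
INSERT INTO fact_sales VALUES
    (1, 10, 20.0), (1, 11, 30.0), (2, 11, 50.0), (2, 11, 25.0);
""")

# An OLAP-style read: detailed rows are consolidated per category and year.
summary = conn.execute("""
    SELECT p.category, d.year, SUM(f.amount) AS total
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
""").fetchall()
print(summary)
```

An OLTP workload would instead insert or update individual rows in tables like these one transaction at a time.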

Key Terminologies: ETL

ETL (Extract, Transform, Load) is the process of extracting data from various sources, transforming it to meet requirements, and loading it into a target data warehouse. Popular ETL tools include Talend and Informatica.
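The three ETL steps can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular ETL tool works internally; the CSV content, the `dim_customer` table, and the cleaning rules are all assumptions.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source with inconsistent spacing and casing.
RAW_CSV = """customer_id,name,country
1, alice ,us
2,Bob,US
3,carol,de
"""

def extract(text):
    """Extract: read rows from the flat-file source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim whitespace and standardize casing."""
    return [
        (int(r["customer_id"]), r["name"].strip().title(), r["country"].strip().upper())
        for r in rows
    ]

def load(conn, rows):
    """Load: write the cleaned rows into the target warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer "
        "(customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(RAW_CSV)))
loaded = warehouse.execute(
    "SELECT * FROM dim_customer ORDER BY customer_id"
).fetchall()
print(loaded)
```

Keeping the steps as separate functions mirrors how ETL jobs are usually described: each stage can be changed or tested without touching the others.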

Key Terminologies: Data Mart

A data mart is a smaller version of a data warehouse, dealing with a single subject. Data marts are focused on one area, drawing data from limited sources, and take less time to build compared to data warehouses. Data marts provide more security and integrity by giving specific user bases access to certain data. Data warehouses store enterprise-wide data, while data marts store department-wise data. Data warehouses have multiple subject areas, while data marts have a single subject area. Data warehouses occupy large memory, while data marts occupy limited memory.

Types of Data Marts

There are three types of data marts: dependent, independent, and hybrid. Dependent data marts extract data from OLTP systems, populate it in a central data warehouse, and then transfer it to the data mart. Independent data marts receive data directly from the source system, suitable for small organizations. Hybrid data marts combine data from both OLTP systems and data warehouses.
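A dependent data mart can be pictured as a subject-specific slice derived from the central warehouse. The sketch below assumes a hypothetical `warehouse_sales` table and builds a marketing-only mart from it; the names and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Central warehouse table holding enterprise-wide data.
CREATE TABLE warehouse_sales (department TEXT, region TEXT, amount REAL);
INSERT INTO warehouse_sales VALUES
    ('marketing', 'EU', 100.0),
    ('finance', 'EU', 250.0),
    ('marketing', 'US', 75.0);
-- Dependent data mart: populated from the warehouse, one subject only.
CREATE TABLE mart_marketing AS
    SELECT region, amount FROM warehouse_sales
    WHERE department = 'marketing';
""")

mart_rows = conn.execute(
    "SELECT region, amount FROM mart_marketing ORDER BY region"
).fetchall()
print(mart_rows)
```

An independent data mart would skip `warehouse_sales` and load `mart_marketing` straight from the source system, while a hybrid one would draw on both.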

Key Terminologies: Metadata

Metadata is data about data: it records where the actual data is stored, its size, its source, and its creation date. In a data warehouse, metadata defines the source data, such as flat files and relational databases. Metadata saves time by defining the source and target, which lets the process of updating the data warehouse be automated. It specifies which table is the source, which is the target, and which transformations (the business logic) are applied to produce the output.
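One way to picture metadata driving a load is a mapping record that names the source, the target, and the transformation for each column. The structure below is a hypothetical format made up for this sketch, not the metadata format of any real tool.

```python
# Hypothetical metadata record: source, target, and per-column rules.
MAPPING = {
    "source": "customers.csv",
    "target": "dim_customer",
    "columns": {
        "cust_nm": {"target": "name", "transform": str.title},
        "cntry": {"target": "country", "transform": str.upper},
    },
}

def apply_mapping(row, mapping):
    """Build a target row by applying each column's transformation rule."""
    out = {}
    for source_column, rule in mapping["columns"].items():
        out[rule["target"]] = rule["transform"](row[source_column])
    return out

result = apply_mapping({"cust_nm": "alice smith", "cntry": "in"}, MAPPING)
print(result)
```

Because the rules live in data rather than in code, the same `apply_mapping` function could serve a different table just by swapping the mapping.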

Data Warehousing Architecture

Data comes from various sources, such as databases and flat files, and undergoes ETL to reach the staging area. The staging area is a temporary database that holds the data before it moves to the data warehouse, and transformation continues as the data moves from staging into the warehouse. The data warehouse stores raw data, metadata, and aggregate data, and powers online analytical processing (OLAP). The data warehouse can be divided into data marts for different teams, which enhances data security.
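The flow from staging area to warehouse to aggregate data might look roughly like this. It is a simplified sketch under assumed table names (`staging_orders`, `dw_orders`, `dw_orders_by_team`) and an assumed validation rule; a real architecture would have many more stages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staging_orders (order_id TEXT, team TEXT, amount TEXT);
CREATE TABLE dw_orders (order_id INTEGER PRIMARY KEY, team TEXT, amount REAL);
CREATE TABLE dw_orders_by_team (team TEXT PRIMARY KEY, total REAL);
""")

# 1. Raw rows land in the temporary staging area as untyped text.
conn.executemany(
    "INSERT INTO staging_orders VALUES (?, ?, ?)",
    [("1", "sales", "10.5"), ("2", "sales", "4.5"), ("3", "support", "bad")],
)

def is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False

# 2. Valid rows are typed and moved into the warehouse; staging is cleared.
staged = conn.execute("SELECT order_id, team, amount FROM staging_orders").fetchall()
clean = [(int(o), t, float(a)) for o, t, a in staged if is_number(a)]
conn.executemany("INSERT INTO dw_orders VALUES (?, ?, ?)", clean)
conn.execute("DELETE FROM staging_orders")

# 3. Aggregate data is stored alongside the detailed rows.
conn.execute(
    "INSERT INTO dw_orders_by_team "
    "SELECT team, SUM(amount) FROM dw_orders GROUP BY team"
)
aggregates = conn.execute(
    "SELECT team, total FROM dw_orders_by_team ORDER BY team"
).fetchall()
print(aggregates)
```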

Demonstration: Populating a Data Warehouse Using Talend

The demonstration uses Talend to import data from an Oracle database into a data warehouse. The data set includes a 10,000-row customer table and a 50,000-row transactions table. The goal is to find the customers with the lowest number of purchases. Talend Open Studio, which offers a drag-and-drop interface, is used for data integration. The process includes configuring input connections, defining schemas, joining tables, filtering data, and setting the output to either an Excel file or a new database table.
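The logic of the demo (join customers to transactions, count purchases, keep the lowest) can be approximated in plain SQL. The sketch below uses an in-memory SQLite database with a handful of made-up rows rather than the Oracle tables from the video; the table and column names are assumptions, and it does not reproduce the drag-and-drop job itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE transactions (txn_id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
INSERT INTO transactions VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 3), (6, 3);
""")

# Join the two tables, count purchases per customer, and keep the two lowest.
# LEFT JOIN keeps customers who have no transactions at all (count 0).
lowest = conn.execute("""
    SELECT c.customer_id, c.name, COUNT(t.txn_id) AS purchases
    FROM customers AS c
    LEFT JOIN transactions AS t ON t.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
    ORDER BY purchases ASC, c.customer_id ASC
    LIMIT 2
""").fetchall()
print(lowest)
```

The result set could then be written to a new table or exported to a spreadsheet, matching the two output options mentioned above.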


© 2024 BriefRead