12 January 2021

#ETL


Assume we have employees table (around 100) record. write a query to find the which employess as joined first in the organisation?
Can there be sub-steps for each of the ETL steps?
Compose a question to refreshed employess somewhere in the range of 150 and 180 compensation dependent on office
Difference - ETL Testing vs Database Testing?
Difference - ETL tools vs OLAP tools?
Difference - ETL vs ELT.
Difference - lpad vs rpad
Difference - Manual Testing vs ETL Testing?
Difference - OLAP tools vs ETL tools ?
Difference - Power Mart vs Power Center?
Difference - PowerCenter vs PowerMart?
Difference - The data warehouse vs data mining?
Difference among rank and thick position?
Does PowerMart provide connections to ERP sources?
Explain the concept of Extraction, Transformation, and Loading?
Explain the use of Lookup Transformation?
Explain these terms Session, Worklet, Mapplet and Workflow ?
How are the tables analyzed in ETL?
How can the mapping be fine-tuned in ETL?
How does the operational data store work?
How is ETL used in Data Migration Project? Explain it.
How many steps are there in an ETL process?
How to check whether table has been loaded with data are not?
How to chk is there any duplicate records is exists in the table?
How to chk the Primary Key?
How to make another table from effectively existing table with any information make table tablename as
How to make another table from effectively existing table with no information
How we use ETL in Data Warehousing?
How we use ETL in third party management?
How you can extract SAP data using Informatica?
How you will display all the columns in the table?
If there are ten thousand records in a source system, how do we ensure that all ten thousand are loaded to the target without any function
In case you have non-OLEDB (Object Linking and Embedding Database) source for the lookup what would you do?
In what case do you use dynamic cache and static cache in connected and unconnected transformations?
list few ETL tool available in the market
Mention what are the types of data warehouse applications and what is the difference between data mining and data warehousing?
Mention what is the advantage of using DataReader Destination Adapter?
Select all the employees where third charcter of first_name is ‘a'
Using SSIS ( SQL Server Integration Service) what are the possible ways to update table?
What are Cubes and OLAP Cubes?
What are ETL tester responsibilities?
What are initial load and full load?
What are joiner and lookup?
What are Schema Objects?
What are the characteristics of Data Warehouse?
What are the characteristics of snapshots?
What are the differences between connected and unconnected lookups in ETL?
What are the differences between data warehousing and data mining?
What are the differences between ETL and BI tools?
What are the differences between Unconnected and Connected lookup?
What are the ETL bugs?
What are the ETL testing operations includes?
What are the ETL tools available in the market?
What are the ETL Tools?
What are the joins accessible in sql
What are the names of the layers in ETL?
What are the smoke test can be performed on the ETL Testing?
What are the steps followed in ETL testing process?
What are the steps followed to choose the ETL process?
What are the steps involved in an ETL process?
What are the three tiers in ETL?
What are the types of Data Warehouse systems?
What are the types of partitions in ETL?
What are the various tools used in ETL?
What are various set operators available in SQL?
What are views?
What does the ODS in ETL generate?
What factless fact schema is and what is Measures?
What is a 3-tier system in ETL?
What is a Data Mart?
What is a dimension table and how it is different from the fact table?
What is a mapping, Session, Worklet, and Mapplet?
What is a materialized view?
What is a session in ETL?
What is an ETL process?
What is BI?
What is Bus Schema?
What is data purging?
What is data source view?
What is ETL mapping sheet? Define its significance.
What is ETL Pipeline?
What is ETL Testing and how its not the same as database Testing?
What is fact? What are the types of facts?
What is full load and incremental or refresh load?
What is Grain of Fact?
What is mapping in ETL?
What is a materialized view log?
What is incremental load?
What is operational data store?
What is snapshots?
What is Worklet in ETL?
What is Operation Data Source?
What is partitioning, hash partitioning and round robin partitioning?
What is the benefit of increasing the number of partitions in ETL?
What is the contrast among subquery and correclated subquery
What is the data extraction phase in ETL?
What is the Data Pipeline?
What is the need for ETL Testing?
What is the responsibility of ETL tester?
What is the staging place in the ETL Testing?
What is the three-layer architecture of an ETL cycle?
What is the transformation in ETL Testing?
What is the use of dynamic cache and static cache in transformation?
What is the use of Mapplet in ETL?
What is tracing level and what are the types?
What is transformation?
What is Workflow in ETL?
What staging area is and what is the purpose of a staging area?
When are the tables in ETL analyzed?
When we need the staging area in the ETL process?
Where the user use ETL concepts? Explain it.
Which partition is used to improve the performances of ETL transactions?
With which apps can PowerCenter be connected?
Write a query to calcualted how many employees on each department?
Write a query to display all the employees salary in 10 digit
Write a query to find all the employees working in deptartment 10,50,60
Write a query to find all the jobs for which employees are joined in the current year.
Write a query to find number of employees under each manager
Write a query to match abc=ABC
Write a query to updated all the employess salary base don department
Write a question to discover every one of the representatives who have participated in the long stretch of Jan
Write a question to discover how long representatives has functioned from is hire_date
Write a question to discover the which supervisor has overseeing in excess of 5 representatives.
Write a question to number of month representatives has working in organization
Write a question to show every one of the workers where initial 3 letter of first_name is bab
Write an inquiry to discover total of salries of departement_id is 50000
Write an inquiry to discover what number of workers has joined following 15 of the present month
Write the query to find second max salary in each department

ETL - Extract Transform and Load - FAQ

  • ETL provides a method of moving the data from various sources into a data warehouse.
  • Extract: Extract is the process of fetching (reading) the information from the database. At this stage, data is collected from multiple or different types of sources.
  • Transform: Transform is the process of converting the extracted data from its previous form into the required form. Data can be placed into another database. Transformation can occur by using rules or lookup tables or by combining the data with other data.
  • Load: Load is the process of writing the data into the target database.
  • In the first step extraction, data is extracted from the source system into the staging area.
  • In the transformation step, the data extracted from source is cleansed and transformed .
  • Loading data into the target datawarehouse is the last step of the ETL process.
  • It helps companies to analyze their business data for taking critical business decisions.
  • Transactional databases cannot answer complex business questions that can be answered by ETL example.
  • A Data Warehouse provides a common data repository
  • As data sources change, the Data Warehouse will automatically update.
  • Well-designed and documented ETL system is almost essential to the success of a Data Warehouse project.
  • Allow verification of data transformation, aggregation and calculations rules.
  • ETL process allows sample data comparison between the source and the target system.
  • ETL process can perform complex transformations and requires the extra area to store the data.
  • ETL helps to Migrate data into a Data Warehouse. Convert to the various formats and types to adhere to one consistent system.
  • ETL is a predefined process for accessing and manipulating source data into the target database.
  • ETL in data warehouse offers deep historical context for the business.
  • It helps to improve productivity because it codifies and reuses without a need for technical skills.

ETL testing

  • It is performed for data extraction, transformation and loading for BI reporting.
  • It used for information, forecasting, and analytical reporting. 
  • It contains historic data that cannot be used in business flow environment. 
  • It using multidimensional method.
  • It is applied to OLAP systems. 
  • It uses the de-normalized data with fewer joins, more indexes, and aggregations. 
  • It verifies if table relations join and keys are preservers during the transformation.
  • It identifies data sources and requirements.
  • It validate of data movement from source to the target system.
  • It does data count verification in the source and target system.
  • Tools - QuerySurge, Informatica, etc. tools are used. 

Database Testing

  • It is performed to validate and integrate the data.
  • It focuses on verification of the column in a table that has valid data values.
  • It verifies if the data is missing in the column. Here, we check that are there any null values in columns which should have a valid value?
  • To verify whether the primary or foreign key is maintained, database testing is used.
  • We verify the accuracy of data in columns.
  • This testing is used to integrate the data from multiple applications and server impact.
  • It contains the transactional system where the flow of business occurs.
  • It is using ER method.
  • It is used in OLTP system.
  • The database uses normalized data with joins.
  • Tools - QTP, Selenium tools are used in database testing.

ETL Process in Data Warehouses

Extraction - Data is extracted from the source system into the staging area

  • Three Data Extraction methods:
  •   Full Extraction
    •   Partial Extraction- without update notification.
    •   Partial Extraction- with update notification
    • Some validations are done during Extraction:
  •   Reconcile records with the source data
  •   Make sure that no spam/unwanted data loaded
  •   Data type check
  •   Remove all types of duplicate/fragmented data
  •   Check whether all the keys are in place or not


Transformation - Data extracted from source server is raw and not usable in its original form. Therefore it needs to be cleansed, mapped and transformed

Data Integrity problem

  •  Different spelling of the same person like Jon, John, etc.
  •  There are multiple ways to denote company name like Google, Google Inc.
  •  Use of different names like Cleaveland, Cleveland.
  •  Different account numbers are generated by various applications for the same customer.
  •  In some data required files remains blank
  •  Invalid product collected at POS as manual entry can lead to mistakes.

Validations

  •  Filtering – Select only certain columns to load
  •  Using rules and lookup tables for Data standardization
  •  Character Set Conversion and encoding handling
  •  Conversion of Units of Measurements like Date Time Conversion, currency conversions, numerical conversions, etc.
  •  Data threshold validation check. For example, age cannot be more than two digits.
  •  Data flow validation from the staging area to the intermediate tables.
  •  Required fields should not be left blank.
  •  Cleaning ( for example, mapping NULL to 0 or Gender Male to "M" and Female to "F" etc.)
  •  Split a column into multiples and merging multiple columns into a single column.
  •  Transposing rows and columns,
  •  Use lookups to merge data
  •  Using any complex data validation (e.g., if the first two columns in a row are empty then it automatically reject the row from processing)

Loading

  • Loading data into the target datawarehouse database is the last step of the ETL process
  • Types of Loading:
    •  Initial Load — populating all the Data Warehouse tables
    •  Incremental Load — applying ongoing changes as when needed periodically.
    •  Full Refresh —erasing the contents of one or more tables and reloading with fresh data.
  • Load verification
  • Ensure that the key field data is neither missing nor null.
  • Test modeling views based on the target tables.
  • Check that combined values and calculated measures.
  • Data checks in dimension table as well as history table.
  • Check the BI reports on the loaded fact and dimension table.

ETL Tools

  •  Open Text Integration Center
  •  Relational Junction ETL Manager (Sesame Software)
  •  CloverETL
  •  PowerCenter Informatica
  •  Talend Studio for Data Integration
  •  Oracle Warehouse Builder (OWB)
  •  Oracle Data Integrator (ODI)
  •  Data Migrator (IBI)
  •  Cognos Data Manager
  •  IBM Infosphere Warehouse Edition
  •  SQL Server Integration Services (SSIS)
  •  IBM Infosphere Information Server
  •  Pervasive Data Integrator
  •  Pentaho Data Integration
  •  19Adeptia Integration Server
  •  4SAS Data Management
  •  16Centerprise Data Integrator
  •  20Syncsort DMX
  •  10Sagent Data Flow
  •  21QlikView Expressor
  •  2SAP Data Services
  •  6Elixir Repertoire for Data ETL

No comments:

Post a Comment

Most views on this month