Friday 6 March 2015

ETL vs ELT

etl-vs-elt
ETL vs ELT
Traditional Extract Transform Load (ETL) tools operate by first Extracting the data from various sources, Transforming the data in a proprietary, middle-tier ETL engine that is used as the staging area, and then loading the transformed data into the target data warehouse, integration server, or Hadoop cluster. Hence the term ETL represents both the names and the order of the operations performed.

Drawbacks of ETL approach

  • The data transformation step of the ETL process is by far the most compute-intensive, and is performed entirely by the proprietary ETL engine on a dedicated server. The ETL engine performs data transformations and data quality checks on a row-by-row basis, and hence, can easily become the bottleneck in the overall process. 
  • Data must be moved over the network twice – once between the sources and the ETL server, and again between the ETL server and the target data warehouse or Hadoop cluster. 
  • To ensure referential integrity by comparing data flow references against values from the target data warehouse, the referenced data must be downloaded from the target to the engine, thus further increasing network traffic, download time, and leading to additional performance issues. 

Extract Load Transform (ELT) approach by Oracle Data Integrator (ODI)


In response to the issues raised by ETL architectures, a new architecture has emerged, which in many ways incorporates the best aspects of manual coding and automated code-generation approaches. Known as E-LT, this new approach changes where and how data transformation takes place, and leverages existing developer skills, RDBMS and Big Data engines, and server hardware to the greatest extent possible. In essence, E-LT moves the data transformation step to the target RDBMS, changing the order of operations to: Extract the data from the source tables, Load the tables into the destination server, and then transform the data on the target RDBMS using native SQL operators. Note, with E-LT there is no need for a middle-tier engine or server.

No comments:

Post a Comment

Do not spam here