Extract, transform, load is the general procedure of copying data from one or more sources into a destination system which represents the data differently from the source(s) or in a different context than the source(s).
- Extract, extracting the data from the source system(s).
- Transform, a series of rules or functions are applied to the extracted data in order to prepare it for loading into the end target.
- Load, loads the data into the end target, which can be any data store including a simple delimited flat file or a data warehouse.
The typical real-life ETL cycle consists of the following execution steps:
- Cycle initiation
- Build reference data
- Extract (from sources)
- Validate
- Transform (clean, apply business rules, check for data integrity, create aggregates or disaggregates)
- Stage (load into staging tables, if used)
- Audit reports (for example, on compliance with business rules. Also, in case of failure, helps to diagnose/repair)
- Publish (to target tables)
- Archive
Tools
Top 12 Free and Open Source ETL Tools for Data Integration
- Apache Airflow, Airflow is a platform created by the community to programmatically author, schedule and monitor workflows..
- Apache Kafka, is a distributed streaming platform.
- Apache NiFi, supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic.
- GeoKettle, .
- Apatar, .
- CloverETL, .
- HPCC Systems, .
- Jaspersoft ETL, .
- Jedox, .
- KETL, .
- Pentaho Kettle, .
- Scriptella, .
- Talend Open Studio, .