This week let’s talk about what is ETL System testing, and what exactly do we test when we perform this type of test?
But as usual let’s start from the very beginning and understand what ETL type of system is.
What Is ETL System?
ETL – Extract Transfer Load.
This type of a system has a special structure that is usually used in case when we want to either load or extract data in big amounts that cannot be contained in a regular API. In other words we are talking about a components that their main purpose is to move around big amounts of data and manipulate it along the way based on specific logic. ETL system implementations are different; involving different data sources on different platforms; each of those systems can be broken down to a set of interfaces that work together to move the data, but despite the differences we can commonly group them based on the direction the data flows at from data source perspective into two groups:
- Inbound interfaces
- Outbound interfaces
It is important to understand that ETL system can include very complex logic and calculation that are happening during the data movement process, which is both brings the value and the complexity to such systems and makes the testing effort especially challenging.
What Do We Test in ETL System?
When we talk about ETL testing we talk about testing a process that moves data around, either extracting data from a data source or loading data into data source.
While the specifics of what is being tested is changing based on the system and implementation we can still group all the test in 3 main areas:
- Verifying the source data structure and content (file, database etc) before load/extract
- Verifying the target data structure and content (file, database etc) after load/extract
- Verifying the data manipulation logic during load/extract
What tools we use?
This type of systems are lacking in proper automation tools that can be used to automate the testing process beyond a dedicated Unit testing code created for specifics of each implementation and hence a lot of testers find them self using half automated half manuals methods to verify. The most common is the use of SQL tools for data verification (I.e SQL developer, SQL management studio) and powerful file manipulation tools (i.e notepad++, beyond compare etc)
Comments
Post a Comment