Skip to main content

What is ETL system testing?


This week let’s talk about what is ETL System testing, and what exactly do we test when we perform this type of test?
But as usual let’s start from the very beginning and understand what ETL type of system is.

What Is ETL System?

ETL – Extract Transfer Load.
This type of a system has a special structure that is usually used in case when we want to either load or extract data in big amounts that cannot be contained in a regular API. In other words we are talking about a components that their main purpose is to move around big amounts of data and manipulate it along the way based on specific logic. ETL system implementations are different; involving different data sources on different platforms; each of those systems can be broken down to a set of interfaces that work together to move the data, but despite the differences we can commonly group them based on the direction the data flows at from data source perspective into two groups:
  •         Inbound interfaces
  •        Outbound interfaces

It is important to understand that ETL system can include very complex logic and calculation that are happening during the data movement process, which is both brings the value and the complexity to such systems and makes the testing effort especially challenging.

What Do We Test in ETL System?

When we talk about ETL testing we talk about testing a process that moves data around, either extracting data from a data source or loading data into data source.
While the specifics of what is being tested is changing based on the system and implementation we can still group all the test in 3 main areas:
  •         Verifying the source data structure and content (file, database etc) before load/extract
  •         Verifying the target data structure and content (file, database etc) after load/extract
  •         Verifying the data manipulation logic during load/extract


 What tools we use?

This type of systems are lacking in proper automation tools that can be used to automate the testing process beyond a dedicated Unit testing code created for specifics of each implementation and hence a lot of testers find them self using half automated half manuals methods to verify. The most common is the use of SQL tools for data verification (I.e SQL developer,  SQL management studio) and powerful file manipulation tools (i.e notepad++, beyond compare etc) 

Comments

Popular posts from this blog

Proper QA estimation in Agile project

  Today, I want to chat about a common issue in Agile development – story point estimation. You see, it's easy to get caught up in estimating how long a feature will take to develop and forget about the crucial Quality Assurance (QA) effort.  It happens to often: your team is in the middle of sprint planning. New user stories are being discussed, and estimations are given, but there's a catch. You've estimated the development time perfectly, but you've barely even glanced at how much effort QA will take. Why? Agile talks about Development Team and you mistakenly thought that it is all about DEVELOPMENT. Sound familiar? Trust me; it's a more common scenario than you might think. So, what's the big deal? Well, when you leave QA effort out of the equation, several not-so-great things can happen: Vilocity can suffer: You might think you can squeeze more into a sprint than is humanly possible, setting your team up for disappointment and overcommitment. Quality can de

Performance testing vs Load testing vs Stress testing

Today I want to discuss a popular topic regarding a difference between Load and Performance testing. Those two types of testing are commonly used together but there are several key differences between the two. To get a better understanding of the topic lets have a real life example from one of my clients and use it to explain the difference. I have worked for a client that was building an in-house web application that provides its customers with an option to select and order different products and services. The request was, before the up-coming release, to test the performance of their product. We started the task by trying and understand what they expect from performance point of view and then went through an exercise of defining what is captured by what test. The client said that they are looking to have about 900-1300 active users on the site at a given moment, and the expected response time (for a page to load) should be less than 5 seconds. With those details what are the t

What is the velocity of an Agile scrum methodology?

Let's discuss some of the important measurements in Agile, and that is the Velocity of the Scrum team work. Based on Wikipedia definition Velocity is " ...the rate of change of its position with respect to a frame of reference and is a function of time...", which when transferred to the scrum world can be summarized as: The amount of work that the scrum team completed in a single measure of time - in a sprint. How we Calculate Velocity? Velocity is actually a very simple to calculate, it is done but totaling the number of story points of fully completed user stories from the sprint backlog. So if a current sprint included 4 user stories: 2 with 8 story points each, one with 3 story points and one with 32 story points. and by the end of the sprint the 32 one was not fully done the velocity calculation will be: 8+8+3=19 Note: the 32 story points are not part of the velocity calculation as this user story was not completed. What Velocity is used for? The v