Pentaho Data Integration (PDI) tutorial: online premium courses


Did you know that more than 60% of a Business Intelligence project is about the data warehouse and ETL (data integration)?

The Introduction course is FREE
click to get it

Pentaho ETL tutorial

This is not a regular, heavy course full of boring technical details about the functionality of Pentaho Kettle (PDI). (Well, it does that too, but not just that.)

My Pentaho data integration tutorial includes hands-on work and my personal tips on how to handle customer requests, and on when to think twice before taking on tasks that will be hard to accomplish. Of course, that assumes the customer understands the task (meaning: the money he is going to spend) and its complexity, and that you both agree this is the right solution. Then the course will help you deliver.

You can find more material on Pentaho data integration on my blog.

Tip #1:
There is always more than one way to solve a data integration problem with Pentaho,
and you will learn that in this Pentaho data integration tutorial.
In fact, your solution doesn’t need to be the best way; it just needs to work in sync with the customer’s needs.

Also, I put in a lot of case studies, examples, real-life scenarios and my personal opinion on the matter. (You don’t have to agree.) I tried to add some humor and sarcasm about the way customers ask for data integration requirements and how we as developers solve stuff.

“What I need is very simple. Do you see that man? Do you see the moon? Just put the man on the moon.”

Yeah, right… By the way, the budget is $1.99.

Learn ETL with Pentaho kettle PDI – dev course:

Which Pentaho Kettle provider would you like to be?


I separated the Pentaho data integration tutorials into four sections.
I built them in such a way that you can jump to a specific step you want to learn (if you are already an expert on Pentaho Kettle PDI) and see the example, or, if you’re a newbie (sorry to call you that, but you are, and everybody once was), you can go step by step while gaining knowledge and expertise.


Concepts of Pentaho data integration

This section deals with questions like:

  • what is data integration
  • when do we use data integration
  • data warehouse structure considerations for business intelligence (in the end, the idea is to extract the data, not just to load it)
  • what other data integration tools are there and why we decided to work with Pentaho kettle

If you’re already a developer, you can skip the concepts and go to the next section.


Software to install, including the tools I recommend using for a better data integration flow.
I go over some 10 installations that you’ll need, including the example database that makes the Pentaho data integration tutorial practical.


Hands-on real scenarios with a Pentaho Kettle PDI walkthrough

The MySQL database has several examples of real-life scenarios. I have chosen two of them in order to take an origin (square) database and develop a data integration solution with all the steps needed, then load it into the target (round).

In this section (most of the course) I take you step by step from easy and understandable features to more complex scenarios. The beauty of it is that you can go over the whole flow from beginning to end, or use it as a dictionary and look up the example for a specific Pentaho Kettle step.

Some of the subjects:

  • Understanding Pentaho kettle environment
  • Connection
  • Jobs
  • Transformations
  • input steps (tables, files)
  • handling text/string
  • sorting / merging / lookup
  • mapping
  • calculations
  • output steps (table output, insert update, update, dimension lookup…)
  • Scripting Steps
  • handling datatypes

And many more… I also added five case studies that use steps outside our scenario,
because I thought they would help you understand the tool and improve your abilities.
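
To give a feel for what an output step of the "insert update" kind does, here is a minimal sketch in plain Python with the standard sqlite3 module. It is not Kettle code and not how Kettle implements the step; the table `dim_customer`, its columns and the function names are hypothetical, chosen only for illustration. The idea: look up each incoming row by its key, insert it when the key is missing, and update it when something changed.

```python
import sqlite3


def insert_update(conn, rows):
    """Look up each row by key; insert when missing, update when changed."""
    stats = {"inserted": 0, "updated": 0, "unchanged": 0}
    for customer_id, name, city in rows:
        found = conn.execute(
            "SELECT name, city FROM dim_customer WHERE customer_id = ?",
            (customer_id,),
        ).fetchone()
        if found is None:
            # Key not in the target yet: insert a new row.
            conn.execute(
                "INSERT INTO dim_customer (customer_id, name, city) VALUES (?, ?, ?)",
                (customer_id, name, city),
            )
            stats["inserted"] += 1
        elif found != (name, city):
            # Key exists but the values differ: update in place.
            conn.execute(
                "UPDATE dim_customer SET name = ?, city = ? WHERE customer_id = ?",
                (name, city, customer_id),
            )
            stats["updated"] += 1
        else:
            stats["unchanged"] += 1
    conn.commit()
    return stats


def demo_insert_update():
    """Run two loads against an in-memory target (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    first = insert_update(conn, [(1, "Ann", "Paris"), (2, "Bob", "Rome")])
    second = insert_update(conn, [(1, "Ann", "Lyon"), (2, "Bob", "Rome"), (3, "Cy", "Oslo")])
    conn.close()
    return first, second
```

Calling `demo_insert_update()` returns the counters for both loads, which is a handy way to check that a second run only touches the rows that really changed.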

Master ETL with Pentaho kettle PDI

Going to production: this is the advanced course of the Pentaho data integration tutorial. There are steps you need to develop in order to make your data integration stable and reliable. Some of them are:

  • Validating data
  • Working with variables
  • Repository
  • slowly changing dimensions
  • error handling
  • logging
  • scheduling
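
As an illustration of one item from the list above, here is a rough sketch of a slowly changing dimension that keeps history (often called Type 2), written in plain Python. It is only an assumption about how one might model the idea, not Kettle's own implementation; the field names (`customer_id`, `city`, `version`, `valid_from`, `valid_to`) and the function names are hypothetical. When a tracked value changes, the current row is closed and a new version is appended, so old facts can still be joined to the value that was valid at the time.

```python
from datetime import date


def apply_scd2(dimension, incoming, today):
    """Close the current version of each changed row and append a new one."""
    # Index the open (current) versions by business key.
    current = {row["customer_id"]: row for row in dimension if row["valid_to"] is None}
    for customer_id, city in incoming:
        row = current.get(customer_id)
        if row is not None and row["city"] == city:
            continue  # nothing changed, keep the current version
        if row is not None:
            row["valid_to"] = today  # close the old version
        new_row = {
            "customer_id": customer_id,
            "city": city,
            "version": row["version"] + 1 if row is not None else 1,
            "valid_from": today,
            "valid_to": None,  # None marks the current version
        }
        dimension.append(new_row)
        current[customer_id] = new_row
    return dimension


def demo_scd2():
    """Load a customer, then move it to another city (hypothetical data)."""
    dim = []
    apply_scd2(dim, [(1, "Paris")], date(2020, 1, 1))
    apply_scd2(dim, [(1, "Lyon")], date(2021, 6, 1))
    return dim
```

After `demo_scd2()` the dimension holds two rows for the same customer: the closed "Paris" version and the open "Lyon" version.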

Click here for more information about Pentaho Kettle on Wikipedia

You can find more materials about Pentaho data integration