Top 10 tips for data integration with Pentaho Kettle: tips #4-#6

Tip #4: There is no right or wrong

Pentaho Kettle is a wonderful tool with a lot of functionality and options.
In fact, you can do almost anything in more than one way.
So you probably face a dilemma about how to solve a specific task, and you go to the Internet to find the one perfect way to solve it.
You need to look at Pentaho as a toolbox and use your imagination to solve the task.
It is just like asking: what is the right material to build a house?

  • Wood
  • Aluminum
  • Steel
  • Concrete
  • Mongolian tent (yurt)
  • Igloo
  • Green construction
  • Others?

You can use any of them; you need to decide.
It is just a matter of complexity (how much work it takes to make it functional), optimization, and maintenance.
In my course I demonstrate how you can use several different steps to do the same thing.
For example, I compare Table output, Insert/Update, Update, and bulk load.

They all load data into the database; the question is when to use which.
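
To make the difference between these loading strategies concrete, here is a minimal sketch in Python with an in-memory SQLite database. It is only an analogy and not Kettle code: the `customer` table and the function names are made up for illustration, and a real bulk loader is not shown because it depends on the target database.

```python
import sqlite3

def table_output(conn, rows):
    """Plain insert: simple and fast, but fails on a duplicate key."""
    conn.executemany("INSERT INTO customer (id, name) VALUES (?, ?)", rows)

def insert_update(conn, rows):
    """Look up each key; update the row if it exists, otherwise insert it."""
    for row_id, name in rows:
        found = conn.execute(
            "SELECT 1 FROM customer WHERE id = ?", (row_id,)
        ).fetchone()
        if found:
            conn.execute(
                "UPDATE customer SET name = ? WHERE id = ?", (name, row_id)
            )
        else:
            conn.execute(
                "INSERT INTO customer (id, name) VALUES (?, ?)", (row_id, name)
            )

def update_only(conn, rows):
    """Update existing rows only; unknown keys are left out of the table."""
    conn.executemany(
        "UPDATE customer SET name = ? WHERE id = ?",
        [(name, row_id) for row_id, name in rows],
    )

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    table_output(conn, [(1, "Ann"), (2, "Bob")])      # initial load
    insert_update(conn, [(2, "Robert"), (3, "Cid")])  # changes one, adds one
    update_only(conn, [(1, "Anna"), (9, "Nobody")])   # id 9 is never created
    return conn.execute("SELECT id, name FROM customer ORDER BY id").fetchall()

if __name__ == "__main__":
    print(demo())
```

The trade-off is visible in the code: the plain insert does the least work per row, the insert-or-update variant pays for a lookup on every row, and the update-only variant never creates new rows.
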
You can buy my course; there are a lot more tips and case studies.
By the way, I solve most of my complex jobs while walking. To do that, you need to know most of the Pentaho Kettle steps by heart.
I’m sure you’ll get there; you just need persistence and customers with high demands (and money, of course).

Tip #5: Use variables

Don’t you hate it when a customer calls you two years after the project finished and blames you because the solution you developed doesn’t work anymore?
We all know they changed something. Transformations and jobs don’t break by themselves.

It can be one of two options:
a. A manager tried to save some bucks and did it on his own, damaging the project in the process.
b. The customer changed the environment, meaning servers, network, computer names, folders and more.
Unfortunately, there is no cure for dumb customers, but there is one for a changing environment: use variables.

I know it’s easier and faster to just hardcode everything inside the transformations and jobs, but it is not sustainable.
As soon as the customer changes something, instead of changing a parameter in one place, such as a table, you will need to find every place where you hardcoded IPs, names, folders, file names and such.
Think even of the simple scenario where you want a QA environment: the parameters differ between production and QA.
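
The idea can be sketched in a few lines of Python. This is an illustration of the principle, not how Kettle resolves variables internally; it only borrows the `${NAME}` notation that Kettle uses for variables. The host names, paths, and the names `ENVIRONMENTS`, `SETTINGS`, and `resolve` are all hypothetical.

```python
from string import Template

# Hypothetical per-environment values. In a real project these would live
# outside the transformation, in a properties file or a table.
ENVIRONMENTS = {
    "prod": {"DB_HOST": "db.prod.example.com", "INPUT_DIR": "/data/prod/in"},
    "qa": {"DB_HOST": "db.qa.example.com", "INPUT_DIR": "/data/qa/in"},
}

# The "transformation" mentions variable names only, never concrete
# servers or folders.
SETTINGS = {
    "jdbc_url": "jdbc:postgresql://${DB_HOST}:5432/sales",
    "input_file": "${INPUT_DIR}/orders.csv",
}

def resolve(settings, env):
    """Fill in every ${NAME} placeholder with the values of one environment.

    Template.substitute raises KeyError if a variable is missing, so a
    forgotten setting fails loudly instead of producing a wrong path.
    """
    variables = ENVIRONMENTS[env]
    return {
        key: Template(value).substitute(variables)
        for key, value in settings.items()
    }

if __name__ == "__main__":
    for env in ("prod", "qa"):
        print(env, resolve(SETTINGS, env))
```

When the customer moves the database to another server, only the value of `DB_HOST` changes; nothing inside the settings that use it has to be touched.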

You can skip a lot of steps along the way, like logging, e-mail, and error handling, but don’t skip variables. Skipping the others will just make you work a little harder. Lack of variables will make you suffer.

Tip #6: Add a note to every transformation

Ever tried to read code that somebody else wrote when your job was to alter it?
Well, that’s a pain, right?
So don’t do to others what you don’t want others to do to you.
Hell, don’t do it to yourself.

You never know when you will next need to examine a transformation and change something in it. Always add notes.
Imagine reading 3000 lines of C# code without comments, because the developer thought he was writing it for himself and that no one else would ever read it.

Even he will find out three years later that he doesn’t remember anything about that code. So think of transformations as code: little widgets that are part of a machine that does great things.

