Hi all, after several posts about the infrastructure and the initial setups performed so far on ODI environment along with types of installations that we as consultants often perform, I will start a series of posts focused on the development tasks, tips and best practices that are common on a typical day for a DW admin/developer, these activities are the ones that really add value and support the business to achieve the best results possible using the BI Oracle tools.
In this post, I will demonstrate how to load multiple flat files dynamically using ODI. I will break this tutorial into two parts since it can be quite lengthy for some since my approach will really make sure you can follow it and make things work,
There won’t be any missing step or gotchas.
Loading flat files is a common task when you work with ODI, the files can be generated or come from a variety of data sources, either internal system can generate it or you can receive it via an FTP process that may dump a files to a directory either in a file server or any other shared server, flat files are a common way for companies to exchange data.
Often times when a requirement for loading a file comes to me, what I often see is that the requestor usually don’t bother to provide details that are really important for the ODI development, some or all of those details are usually taken for granted by them with a high expectation that we will handle, of course we can handle that but we need to understand the requirements details in order to implement a good, simple and elegant solution.
I always ask the following questions:
Data Volume is always important to know, any ETL, DW developer who appreciates delivering quality work should always ask this question to be able to take actions.
At this stage, I only need a sample of the files to define the structure of tables and ODI metadata.
1-: In which, location/directory the files will arrive?
2-How many files will arrive in the source directory in a given time?
- If multiple files will arrive at the same time, is the layout and type of file the same? This question is important since it may be necessary to create extra steps and more logic when developing the ODI package. This part of the topic provides good background information which informs us if the filename is dynamic or the same.
- How often these files will arrive? Will that be daily, weekly, monthly?
These questions help you to define the schedule you will create under the ODI agent. Those 4 main definitions should give you enough guidance in terms of developing the ODI objects to accomplish your goal.
Below I demonstrate how to implement the file integration step by step with some screenshots and examples, this demo was performed on a windows server but the steps are also applicable to a Linux box as well, if the first 2 parts are basic steps that you already have already one, you can skip them and go directly to part 3 which is the loop part, otherwise follow the below steps:
- Basically, we need to create a connection if it doesn’t already exist
(Physical and logical topology) to the target and to the source metadata (data stores) and one table in your database to receive the incoming data.
Let’s get started! We will point it to the directory where the files will be loaded, as a sample I used two files which you can download and use it as well:
Create a folder like this and copy the path:
We will create a physical schema. To do that, in topology module, go to “technologies”, expand the “File technology” and right click on the data server that is already there, if it is not created right click on File technology and create a new data server:
You should be able to see the below screen;
In the fields “directory (schema and work schema)” place the directory path like this:
Save it (observe that the path provided is where the files will arrive )
Double click on the “Data Server” again and test the connection:
Next step is to create a “Logical Schema”, the logical schema works as a bridge, it is how you link your physical schema (physical database/connection or directory) to the logical data store (metadata) via context.
To create the LS go to topology and expand the logical architecture,
Right click on “File” technology and click “New Logical Schema”:
Assign a name to the LS and in the definition pane set the “physical schema” that has been created in previous steps, note that you also need to link a context, so if you haven’t created one it is time to create:
In this case the context is “Global”
Save it, that is it, in the next step we will create the model and data stores
I will explain the importance of the logical schema:
Go to the Designer, and create a new project or you can use an existing one, in this case
I create a new one:
Under “Projects ” go to “ Models ” and create a new model folder if you don’t have one yet:
Right click in the “Model” folder and create a new “Model”:
You will need to define some information in the model pane such as “model name”, “technology type” and ”Logical schema” just like the screenshot below, go ahead and save it:
Right click in the newly created model and create a “New data store”:
In the “definition tab” Define the data store’s name:
Also in the resource name go to the right corner and click on the menu to point to your file:
In the figure below you can see the files we will use. Select one of the files click ok and save it and go to the “files” tab:
Now we need to define how ODI will read the file, in other words, we tell ODI the type of file we want to retrieve e.g. file format (delimited or fixed), heading (here you tell ODI if your file has a header or not), which type of operating system generated the file, this is an important step.
In this case, it is a delimited file, it has a row header, the field separator is a comma and is coming from a windows OS:
Now go to “Columns” tab and click “reverse engineer”, by doing that you will reverse the columns from the file, basically the file structure:
Leave the data types as it is:
Now you can test the data store can read the file, go to the data store and right click on it,
Click on “view data” or “data”:
You should be able to see the following result:
Ok, that was the first part of the configuration where we set up topology
(logical and physical architecture ) and make sure the metadata is set up correctly.
This is an important step, I have seen a few people really struggling to get that part correct, mapping topology and context wrongly is a problem that will bite you going forward and depending on the size of your ODI environment it may be a nightmare to detect which part of it is not setup correctly, especially if you are in the middle of an urgent development, this may take your attention away from your actual delivery which is not ideal.