Hi all, after several posts about the infrastructure, and the initial setups performed so far in ODI environment, along with different types of installations that we as consultants often perform, I will start a series of posts focused on the development tasks, tips and best practices that are common on a typical day for a DW admin/developer, these activities are the ones that really add value and support the business to achieve the best results possible using the BI Oracle tools.
In this post, I will demonstrate how to load multiple flat files dynamically using ODI. I will break this tutorial into two parts since it can be quite lengthy for some since my approach will really make sure you can follow it and make things work,
There won’t be any missing step or gotchas.
Loading flat files is a common task when you work with ODI, the files can be generated or come from a variety of data sources, either internal system can generate it or you can receive it via an FTP process that may dump a files to a directory either in a file server or any other shared server, flat files are a common way for companies to exchange data.
Often times when a requirement for loading a file comes to me, what I often see is that the requestor usually don’t bother to provide details that are really important for the ODI development, some or all of those details are usually taken for granted by these requests, with a high expectation that we will handle, of course we can handle that but we need to understand their requirement, more functional details about the datasets and the files arriving so that we can design and implement the file ingestion, integration, orchestration etc…
I always ask the following questions:
What is the data volume, how big, wide are the files?
Data Volume is always important to know, any ETL, DW developer who appreciates delivering quality work should always ask this question to be able to take actions.
At this stage, I only need a sample of the files to define the structure of tables and ODI metadata.
1-: In which location/directory the files will arrive? internal, or external location, ftp, sftp?
2-How many files will arrive in the source directory in a given time?
- Will multiple files arrive at the same time, is the layout and type of file the same? This question is important since it may be necessary to create extra steps and more logic when developing the ODI package. This part of the topic provides good background information which informs us if the filename is dynamic or is the same all the time.
- How often these files will arrive? Will that be daily, weekly, monthly?
These questions help you to define the frequency the process will run, defines the schedule to created. Those 4 main definitions should give you enough guidance in terms of developing the ODI objects to accomplish your goal.
Below I demonstrate how to implement the file integration step by step with some screenshots and examples, this demo was performed on a windows server but the steps are also applicable to a Linux box as well, if the first 2 parts are basic steps that you have already performed you can skip them and go directly to part 3 which is the loop part, otherwise follow the below steps:
- Basically, we need to create a connection if it doesn’t already exist
Define Physical topology-> which are basically physical connections to the target and source DBs servers .
Define logical topology-> it is basically the source and target metadata (data stores).
Create a table as per your file layout, this is the staging table.
Let’s get started!
1- create the files or have them prepared in a directory like the screenshot below
2- login to the ODI repository and go to topology module.
3-Under physical architecture, create a new data server.
4-Under the data server, right click and create a new physical schema.
Point it to the directory where the files will be loaded, as a sample I used two files which you can download and use it as well:
Create a folder like this and copy the path:
We will create a physical schema. To do that, in topology module, go to “technologies”, expand the “File technology” and right click on the data server that is already there, if it is not created right click on File technology and create a new data server:
You should be able to see the below screen;
In the fields “directory (schema and work schema)” place the directory path like this:
Save it (observe that the path provided is where the files will arrive )
Double click on the “Data Server” again and test the connection:
Next step is to create a “Logical Schema”, the logical schema works as a bridge, it is how you link your physical schema (physical database/connection or directory) to the logical data store (metadata) via context.
To create the LS go to topology and expand the logical architecture,
Right click on “File” technology and click “New Logical Schema”:
Assign a name to the LS and in the definition pane set the “physical schema” that has been created in previous steps, note that you also need to link a context, so if you haven’t created one it is time to create:
In this case the context is “Global”
Save it, that is it, in the next step we will create the model and data stores
I will explain the importance of the logical schema:
Go to the Designer, and create a new project or you can use an existing one, in this case
I create a new one:
Under “Projects ” go to “ Models ” and create a new model folder if you don’t have one yet:
Right click in the “Model” folder and create a new “Model”:
You will need to define some information in the model pane such as “model name”, “technology type” and ”Logical schema” just like the screenshot below, go ahead and save it:
Right click in the newly created model and create a “New data store”:
In the “definition tab” Define the data store’s name:
Also in the resource name go to the right corner and click on the menu to point to your file:
In the figure below you can see the files we will use. Select one of the files click ok and save it and go to the “files” tab:
Now we need to define how ODI will read the file, in other words, we tell ODI the type of file we want to retrieve e.g. file format (delimited or fixed), heading (here you tell ODI if your file has a header or not), which type of operating system generated the file, this is an important step.
In this case, it is a delimited file, it has a row header, the field separator is comma and is coming from a windows OS:
Now go to “Columns” tab and click “reverse engineer”, by doing that you will reverse the columns from the file, basically the file structure:
Leave the data types as it is:
Now you can test the data store can read the file, go to the data store and right click on it,
Click on “view data” or “data”:
You should be able to see the following result:
Ok, that was the first part of the configuration where we set up topology
(logical and physical architecture ) and make sure the metadata is set up correctly.
This is an important step, the second part of it, is to make ODI to read multiple files dynamically, to do that we need an ODI package and a few variables, this will be demonstrated in another post .