Loading fact table with SCD type 2 dimension


Tag: pentaho,data-warehouse

I have a dimension table with 1 million records that is SCD type 2. I am using Pentaho's Dimension lookup/update step to populate this dimension table, which gives me a version number, start date, and end date. Now I want to populate the fact table based on the SCD type 2 dimension. What is the best approach for this?


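Use the 'Dimension lookup/update' step to look up the surrogate key ('Technical key field') based on the natural key(s) ('Keys') and the timestamp ('Stream Datefield'). The step returns the key of the dimension version whose validity range (start date to end date) contains that timestamp, so each fact row is linked to the version that was current when the fact occurred. Uncheck 'Update the dimension' if you only want to do a lookup.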
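The lookup logic can be sketched in plain Python. This is only an illustration of the idea, not PDI's implementation: the sample dimension rows, the column order, the function names, and the use of 0 as an "unknown" surrogate key are all assumptions made for the example. It also assumes each version is valid from its start date (inclusive) to its end date (exclusive).

```python
from datetime import date

# Hypothetical SCD type 2 dimension rows:
# (surrogate_key, natural_key, version, date_from, date_to)
DIM_CUSTOMER = [
    (1, "C100", 1, date(1900, 1, 1), date(2015, 3, 1)),
    (2, "C100", 2, date(2015, 3, 1), date(2199, 12, 31)),
    (3, "C200", 1, date(1900, 1, 1), date(2199, 12, 31)),
]


def lookup_surrogate_key(dim_rows, natural_key, event_date, unknown_key=0):
    """Return the surrogate key of the version valid on event_date.

    A version matches when date_from <= event_date < date_to.
    If no version matches, the assumed 'unknown' key is returned.
    """
    for surrogate_key, nk, _version, date_from, date_to in dim_rows:
        if nk == natural_key and date_from <= event_date < date_to:
            return surrogate_key
    return unknown_key


def load_facts(source_rows, dim_rows):
    """Replace the natural key in each source row with the surrogate key."""
    return [
        (lookup_surrogate_key(dim_rows, natural_key, event_date), amount)
        for natural_key, event_date, amount in source_rows
    ]


# Hypothetical source rows: (natural_key, event_date, amount)
source = [
    ("C100", date(2015, 2, 10), 50.0),  # falls in version 1 of C100
    ("C100", date(2015, 3, 1), 75.0),   # falls in version 2 of C100
    ("C999", date(2015, 1, 1), 10.0),   # no such customer in the dimension
]
facts = load_facts(source, DIM_CUSTOMER)
```

With 1 million dimension rows, a linear scan like this would be slow; in practice the lookup runs against an indexed table or a cache, which is what the step's own settings are for.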


Natural Key and Fact tables

I'm new to dimensional modelling, and I believe you can help me with the following questions. In the production system I have a transaction table, a sales table for example. The unique identifier is a primary key called SaleId. Example: My question is, when modelling the fact table, should the SaleID...

Compare Data between 2 DW Tables

I'm a little confused here. I'm testing some data quality issues in a DW, and I need to know if the LOAN_SID in one table matches the other table. I was using this query, but I'm not sure if I'm correct: if it matches there is an issue, if it doesn't...

Errors in the OLAP storage engine: The attribute key cannot be found when processing

I know this is mainly a design problem. I've read that there is a workaround for this issue by customising error handling at processing time, but I would rather not ignore errors. Also, the cube processing is scheduled, so ignoring errors is not a choice at least...

How to change second y-axis text font size in pentaho ccc charts

How do I change the second y-axis text font size in a Pentaho chart? I put some text (i.e. Monthly Cost ($000)) in orthoAxisTitle and it shows fine. How do I put some text on the second y-axis also...

Execute .jar file in Spoon (Pentaho Kettle)

I need to execute a Java jar file from Spoon. The program has only one class, and all I want is to run it with or without parameters. The class is named "Limpieza" and is inside a package named com.overflow.csv.clean. I have deployed the jar to C:\Program Files (x86)\Kettle\data-integration\lib And...

Generate a random value in the set {0,1}

The Generate random value input lets me generate a random int, but not in the set I want. How can I generate a random value in the set {0,1} with Pentaho Data Integration?...

Pentaho Report Designer, checkbox parameter checked by default

We have set a checkbox parameter in Pentaho Report Designer. When we launch the report, the checkbox is unchecked. What we want is to set the default value of that checkbox to checked, so when we launch the report we don't have to...

Hide a subreport containing a chart in Pentaho

Hi, we are using Pentaho Report Designer and we want to hide a subreport if there is no data. We have tried to use this formula: not(isemptydata()) in the visible expression, but it does not seem to work. So how to hide a subreport if no data...

Kettle - Read CSV with comma as decimal mark

I have a transformation in Pentaho Data Integration (aka Kettle) where the first thing I do is use the "CSV Input" step to map my flat file. I've never had a problem with this step on Windows, but now I'm changing the server where Spoon is going to run to...

How to Reload CDA and Mondrian cache in Pentaho CE 4.8?

I'm currently stuck on a performance issue with my dashboard. I've created a dashboard in Pentaho Community Edition 4.8. My charts use SQL and MDX (Mondrian) queries. My problem is that when I first open my dashboards after clearing the CDA and Mondrian caches, it takes 50 secs...

Saiku File not showing in Ivy Dashboard Designer Pentaho BI Server CE

Hi, I am using Pentaho BI Server Community Edition. I created a Saiku Analytics file (say demo.saiku) and saved it in the /home/admin folder. After that I created a new Ivy Dashboard and dragged and dropped an Analytics Menu into a dashboard window. I set the title and layout properties. Now when I...

Checkpoints in Pentaho Spoon

The Pentaho documentation specifies that, as of version 5.0, you can define "checkpoints" and "checkpoint logs" to let you restart ETL jobs from the most recently failed point so you don't have to go back and re-run a bunch of steps that already completed successfully. I'm running Pentaho Data...

Comparing filenames in PDI

I am trying to import a certain .CSV file into my database using PDI (Kettle). Normally this would be rather easy, as you could just link up a CSV file input step with a Table output step and be good to go. However, the problem is that I don't know...

Loop over file names in sub job (Kettle job)

The task is to get file names from a folder and then loop the same task (job) over all the files one by one. I created a simple job with a transformation (get file names) and then a job with the flag "Execute for each row" (for now it is just logging the name of...

How to return no matched row in Pentaho Data Integration (Kettle)?

I am looking for a way to perform an SSIS-style lookup in Pentaho Data Integration. I'll try to explain with an example: I have two tables A and B. Here is the data in table A: 1 2 3 4 5 Here is the data in table B: 3 4 5 6...

Pentaho Kettle Error in Archive Files - org.apache.commons.vfs.FileSystemException: File closed

I have a job which is set to archive files in a directory. It looks like it is running into the error org.apache.commons.vfs.FileSystemException: File closed when it attempts to create the zip file. However, the zip file does get created, and the files are added to it. I've sent the...

'Too many connections' created in postgres when creating a dashboard in Pentaho

I was creating a dashboard in Pentaho PUC which uses a Postgres connection as the data source. Most of the time this causes Postgres to say "Too many clients already". In Postgres, the SHOW max_connections; query shows a maximum of 200 connections. I used this query: select * from pg_stat_activity;. From...

Pentaho convert string to integer with decimal point

I am importing text values into a transformation using a Fixed Width input step. Everything is coming in as a string. I want to convert some of the string values to integers with a decimal point at a specified spot. Here are some examples of the before (left hand side)...

pentaho report designer display field over field2 if 'field2' is not present

Let's say I want to make 2 text fields next to each other. On the screenshot below you can see 2 fields. The field on the left will always show. But if the field on the right isn't filled in, I want to make it disappear but the text from the...

Concurrent statistics gathering on Oracle 11g partitioned table

I am developing a DWH on Oracle 11g. We have some big tables (250+ million rows), partitioned by value. Each partition is assigned to a different feeding source, and every partition is independent from the others, so they can be loaded and processed concurrently. Data distribution is very uneven, we...

INSERT INTO statement in MySQL

I'm trying to work with YEAR function on one column in the DB and then add the results to a different table in the DWH. What am I doing wrong? INSERT INTO example_dwh1.dim_time (date_year) SELECT YEAR(time_taken) FROM; When removing the INSERT INTO line, I get the results I want,...

Pentaho to convert tree structure data

I have a stream of data from a CSV. It is a flat structured database. E.g.: a,b,c,d a,b,c,e a,b,f This essentially transforms into: Node id,Nodename,parent id,level 100, a , 0 , 1 200, b , 100 , 2 300, c , 200 , 3 400, d , 300 , 4...

Handling change of grain for a snapshot fact table in a star-schema

The question: How do you handle a change in grain (from weekly measurement to daily measurement) for a snapshot fact table? Background info: For a star-schema design I want to incorporate the results of a survey as a fact (e.g. in week 2 of 2015 80% of the respondents have...

Difference between SQL query aggregation and querying an OLAP cube

I have a question about the advantages of building an OLAP cube versus aggregating data in a database table for querying, for data of say 6 months, and then archiving the SQL table later for analytics purposes. Which one is better, table or OLAP cube? And why, since I can...

Aggregate Transformation vs Sort (remove Duplicate) in SSIS

I'm trying to populate dimension tables on a regular basis and I've thought of two ways of getting distinct values for my dimension: (1) using an Aggregate transformation and then using the "Group by" operation; (2) using a Sort transformation while removing duplicates. I'm not sure which one is better (more efficient),...

Star Schema Design for User Utilization Reports

Scenario: There are 3 kinds of utilization metrics that I have to derive for the users. In my application, user activity is tracked using login history, the number of customer calls made by the user, and the number of status changes performed by the user. All this information is maintained in 3 different tables...

Generating a dynamic date based on a row number using pentaho pdi

I want to generate a date dynamically based on row numbers using pentaho pdi. for example: row 1 =====>Date=2015-06-08 **01**:56:30 row 2 =====>Date=2015-06-08 **02**:56:30 row 3 =====>Date=2015-06-08 **03**:56:30 row 4 =====>Date=2015-06-08 **04**:56:30 All my data come from an excel spreadsheet with row number and date fields and I want the...

Pentaho Kettle: how to pass variable from transformation to another transformation inside job

I have two transformations in the job. In the first transformation I get details about the file. Now I would like to pass this information to the second transformation. I have set a variable in the parameter settings of transformation #2 and use Get Variables inside it, but the...

Pentaho Dimension lookup/update

I have seen the Dimension Lookup/Update documentation here and a few other blogs, but I cannot seem to get a clear idea. I have a table with the following structure: Key Name Code Status IN Out Active The key, name, code, status, and active fields come from a CSV file. I need...

Pentaho CDE nested sql query

We have set up a nested SQL query in Pentaho CDE. Query: select dataissue.value,count(value) as nbreticket,substring(issue.entry,1,3) from DataIssue,issue where field = 'version(s)_corrigée(s)' and dataissue.issue = and issue in ( select issue from dataissue,issue where dataissue.issue = and value = 'récit' and substring(issue.entry,1,3) = 'ema' ) and issue...

Omniture Data Warehouse Segments Issue

Currently, I'm trying to create a segment filter called "Only Search Page" which filters out one particular server from a list of several thousand. Currently, I'm a little stuck and it might be easier to explain with screenshots. In the Segment Manager I set up a segment to check for...

Pentaho - CSV Input not understanding special character [Windows to Linux]

I have a transformation in Pentaho Data Integration where the first thing I do is use the "CSV Input" step to map my flat file. I've never had a problem with it on Windows, but now I'm changing the server that Spoon is going to run on to a Linux server...

How to create an embedded document from a table using pentaho

I have two tables, student and record, and the relationship is that a student has many records (one to many). How can I build a transformation in Pentaho so that I can insert every line in the record table as an embedded document in the student document? All this is for migrate...

How to set field in previous step as JSON Output File Name in Pentaho?

I want to use a concatenated field as the JSON output filename in my Pentaho Data Integration transformation, but since I don't see any "Accept field as filename" option, I don't know how to make this happen. Could someone help me to sort it out? Thanks in advance!...

Split a text file based on date in Pentaho

I have one text file and I want to split this file into multiple output files on the basis of dates. Dates Keywords 201506-17 iphone 5 201506-16 iphone 4 201506-15 iphone 3 201506-14 iphone 2 201506-13 iphone 1 ...

Pentaho User Console(PUC) parameter error

When I publish the prpt file in Pentaho User Console (PUC), I get an error in the parameter field when I select parameter values. After I select a value from the drop-down, the value disappears. (I have attached an image.) I have selected those values in the prpt (check the image) but its working fine...

Should the “count” measure be stored in the fact table?

I have a fact table that includes "wait times in hours" for certain services. I have a lot of dimensions that could describe the wait-times based on different slices; however, I am also interested in knowing how many people (counts) came for services through the filters of the same dimensions....

Import .prpt file in Pentaho Server using Command Line

I want to upload .prpt (Pentaho Report File) in Pentaho BI Server. I am using the following command: ./ --import --url=https://server/pentaho/ --username=user --password=pass --source=file-system --type=files --charset=UTF-8 --path=/public--file-path=/home/kishan/folder/Clients/abc/Daily_Reports/Prpt/xyz.prpt --logfile=/home/user/upload.log --permission=true --overwrite=true --retainOwnership=true So, I want to pick up the file located at the file-path value above and upload it to the...

Merge fields in pentaho

I have two columns, "ID_TT" in select values 1 and "ID_ARC" in select values 2. ID_TT has the values below: [blank] 121 [blank] ID_ARC has the values below: 146 [blank] 171 I need to merge these two. I used the Calculator step but it does not work. How can we solve this? output...

Anchor Modeling - are data types part of the Model?

A question about data types in the Anchor Model database design. The question assumes separation of the anchor model implementation from the anchor model itself. In the Anchor Model xml we have the following kind of information related to data types: dataRange="varchar(42)" identity="int" timeRange="datetime" They are stored in Anchor Model entities (anchor/attribute) xml...

Calculated columns in pentaho Cde

I am new to Pentaho and would like to know how the "Calculated Columns" option in the "sql query" object works. I need to calculate the average value.

Pentaho: Insert a set of dynamic records into a database

Using Pentaho, I would like to SELECT a number of records from a database and INSERT them into another one. I have no problem with the first part and using Input Table step, I have selected my desired records. But I have no idea about how to develop a step...

Table output name from command line in pentaho kettle

There is a case in my ETL where I am trying to take the "table output" name from the command line. The table name does not correspond to any streaming field's name. Is there any way to get this done in Pentaho Kettle?

Select organizations that their income represent around 60% of the total income SQL Server2008

I apologize in advance for my imperfect English. In this data warehouse, we have an organization which is composed of multiple organizations. I have a [FactFinance] table which has information about the income of each organization. I have the following query in the data warehouse which selects the (Organization Name) from the [organization dimension table]...

SQL 2008 Change tracking and detecting Updated data

I plan to implement this in an SSIS project. Since I don't have the Enterprise edition of SQL Server 2008, I have to make use of other methods. Another way is to use triggers, but I am trying to avoid too many triggers. With change tracking I'm having difficulties detecting the...

Simple MYSQL count, group by, not working using Pentaho Report Designer CE

I need to write a query which will pull from two different tables, count the results and return to me in one row, the total results. I've come across a few problems. When I run the query without a count expression, I am returned 645 rows. 645 is the correct...

How to change display value in parameter selection pentaho reports

I need to display "All" when my data in parameter list is -1. Just to display in the parameter selection. Help me with this Thanks, Keerthi KS...

Using the (A*B) function in Calculator - Pentaho Spoon

I have been trying since yesterday to use the function (A * B), a very simple operation, but it does not work. Any help? Thank you.