pentaho, data-warehouse: Loading fact table with SCD type 2 dimension


Loading fact table with SCD type 2 dimension

Question:

Tags: pentaho, data-warehouse

I have a dimension table with 1 million records which is SCD type 2. I am using the Pentaho Dimension lookup step to populate this dimension table, and I am getting a version number, start date and end date. Now I want to populate the fact table based on the SCD type 2 dimension. What is the best approach for this?


Answer:

Use the 'Dimension lookup/update' step to look up the surrogate key ('Technical key field') based on the natural key(s) ('Keys') and the transaction timestamp ('Stream Datefield'). Uncheck 'Update the dimension' if you only want to do lookups.
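Conceptually, the lookup that step performs is equivalent to a date-range join on the natural key: pick the dimension row whose validity interval contains the fact row's transaction date. A minimal sketch using sqlite3 (table and column names such as dim_customer and customer_sk are illustrative, not from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Illustrative SCD type 2 dimension: two versions of the same natural key.
cur.execute("""CREATE TABLE dim_customer (
    customer_sk INTEGER,  -- surrogate (technical) key
    customer_id TEXT,     -- natural key
    version     INTEGER,
    start_date  TEXT,
    end_date    TEXT)""")
cur.executemany("INSERT INTO dim_customer VALUES (?,?,?,?,?)", [
    (1, "C001", 1, "1900-01-01", "2015-03-01"),
    (2, "C001", 2, "2015-03-01", "9999-12-31"),
])

# A source fact row carries the natural key plus the transaction date
# (the 'Stream Datefield' in the Kettle step).
def lookup_sk(natural_key, txn_date):
    cur.execute("""SELECT customer_sk FROM dim_customer
                   WHERE customer_id = ?
                     AND start_date <= ? AND ? < end_date""",
                (natural_key, txn_date, txn_date))
    row = cur.fetchone()
    return row[0] if row else None

print(lookup_sk("C001", "2015-06-15"))  # version 2 -> surrogate key 2
print(lookup_sk("C001", "2014-01-01"))  # version 1 -> surrogate key 1
```

The half-open interval convention (start inclusive, end exclusive) is what keeps consecutive versions from overlapping.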


Related:


Split a text file based on date in Pentaho


pentaho
I have one text file and I want to split this file into multiple output files on the basis of dates. Dates / Keywords: 201506-17 iphone 5, 201506-16 iphone 4, 201506-15 iphone 3, 201506-14 iphone 2, 201506-13 iphone 1 ...

Omniture Data Warehouse Segments Issue


bigdata,data-warehouse,adobe-analytics
I'm trying to create a segment filter called "Only Search Page" which filters out one particular server from a list of several thousand. I'm a little stuck, and it might be easier to explain with screenshots. In the Segment Manager I set up a segment to check for...

Pentaho to convert tree structure data


pentaho,etl,kettle
I have a stream of data from a CSV; it is a flat structured database. E.g.: a,b,c,d a,b,c,e a,b,f This essentially transforms into: Node id, Node name, Parent id, Level: 100, a, 0, 1; 200, b, 100, 2; 300, c, 200, 3; 400, d, 300, 4...
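The transformation amounts to assigning an id to each distinct path prefix, with the parent being the id of the prefix one level up. A sketch in Python (the id step of 100 mirrors the sample output; the real requirements may differ):

```python
# Assign node ids to a path-per-row flat file (paths like "a,b,c,d"),
# emitting (node_id, name, parent_id, level) once per distinct node.
rows = ["a,b,c,d", "a,b,c,e", "a,b,f"]

nodes = {}   # path tuple -> node id
out = []     # (node_id, name, parent_id, level)
next_id = 100
for line in rows:
    parts = line.split(",")
    for level in range(1, len(parts) + 1):
        path = tuple(parts[:level])
        if path not in nodes:
            # Parent is the id of the one-shorter prefix; the root's parent is 0.
            parent = nodes[path[:-1]] if len(path) > 1 else 0
            nodes[path] = next_id
            out.append((next_id, path[-1], parent, level))
            next_id += 100

for row in out:
    print(row)  # first rows: (100, 'a', 0, 1), (200, 'b', 100, 2), ...
```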

Pentaho Report Designer: checkbox parameter checked by default


pentaho
We have set a checkbox parameter in Pentaho Report Designer. When we launch the report, the checkbox is unchecked. What we want is to set the default value of that checkbox to be checked, so when we launch the report we don't have to...

Anchor Modeling - are data types part of the Model?


database-design,data-warehouse,temporal-database,6nf,anchor-modeling
A question about data types in the Anchor Model database design. The question assumes separation of the anchor model implementation from the anchor model itself. In the Anchor Model XML we have the following kind of information related to data types: dataRange="varchar(42)" identity="int" timeRange="datetime" They are stored in Anchor Model entities (anchor/attribute) xml...

Natural Key and Fact tables


data-warehouse,business-intelligence,dimensional-modeling,fact-table,natural-key
I'm new to dimensional modelling and I believe you can help me with the following doubts. In the production system I have a transaction table, a sales table for example. The unique identifier is a primary key called SaleId. Example: My doubt is, when modelling the fact table, should the SaleID...

Pentaho CDE nested sql query


pentaho
We have set a nested SQL query on pentaho CDE . Query : select dataissue.value,count(value) as nbreticket,substring(issue.entry,1,3) from DataIssue,issue where field = 'version(s)_corrigée(s)' and dataissue.issue = issue.id and issue in ( select issue from dataissue,issue where dataissue.issue = issue.id and value = 'récit' and substring(issue.entry,1,3) = 'ema' ) and issue...

How to set a field from a previous step as the JSON output file name in Pentaho?


pentaho,pdi
I want to use a concatenated field as the JSON output filename in my Pentaho Data Integration transformation, but since I don't see any "Accept field as filename" option, I don't know how to make this happen. Could someone help me sort it out? Thanks in advance!...

Aggregate Transformation vs Sort (remove Duplicate) in SSIS


ssis,data-warehouse,business-intelligence,bids
I'm trying to populate dimension tables on a regular basis and I've thought of two ways of getting distinct values for my dimension: using an Aggregate transformation with the "Group by" operation, or using a Sort transformation while removing duplicates. I'm not sure which one is better (more efficient),...
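Whichever transformation is faster, both compute the same relational result: a distinct projection of the dimension column(s). A sketch of that equivalence with sqlite3 (illustrative data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE src (city TEXT)")
cur.executemany("INSERT INTO src VALUES (?)",
                [("Oslo",), ("Bergen",), ("Oslo",), ("Bergen",), ("Oslo",)])

# The Aggregate/Group-by approach and the Sort/remove-duplicates approach
# both reduce to the same set of distinct values:
group_by = cur.execute("SELECT city FROM src GROUP BY city").fetchall()
distinct = sorted(cur.execute("SELECT DISTINCT city FROM src").fetchall())
print(sorted(group_by), distinct)
```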

Pentaho Kettle: how to pass variable from transformation to another transformation inside job


pentaho,kettle,spoon
I have two transformations in the job. In the first transformation I get details about the file. Now I would like to pass this information to the second transformation. I have set a variable in the settings parameters of transformation #2 and use Get Variables inside, but the...

Generate a random value in the set {0,1}


pentaho
The 'Generate random value' step allows me to generate a random integer, but not in the set I want. How can I generate a random value in the set {0,1} with Pentaho Data Integration?...
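A common workaround in PDI is to generate a random integer and reduce it to {0,1} in a following Calculator or JavaScript step, e.g. by taking it modulo 2; treat that recipe as an assumption about the available steps. The logic, sketched in Python:

```python
import random

# Derive a value in {0, 1} from a random non-negative integer, as a
# downstream step could do after 'Generate random value':
def random_bit():
    return random.randint(0, 2**31 - 1) % 2

samples = {random_bit() for _ in range(1000)}
print(samples)  # virtually always prints {0, 1}
```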

Merge fields in pentaho


pentaho
I have two columns: "ID_TT" in Select values 1 and "ID_ARC" in Select values 2. ID_TT has the values [blank], 121, [blank]. ID_ARC has the values 146, [blank], 171. I need to merge these two. I used the Calculator step but it does not work. How can we solve this? output...
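This is a coalesce: per row, take whichever column is non-blank. (In PDI the Calculator step offers an NVL(A, B) function for exactly this; treat that step choice as an assumption.) The logic in Python:

```python
# Merge two columns where one is blank per row (NVL/COALESCE logic),
# using the sample values from the question:
id_tt = ["", "121", ""]
id_arc = ["146", "", "171"]

merged = [a if a != "" else b for a, b in zip(id_tt, id_arc)]
print(merged)  # ['146', '121', '171']
```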

Comparing filenames in PDI


pentaho,etl,kettle,data-integration,pdi
I am trying to import a certain .CSV file into my database using PDI (Kettle). Normally this would be rather easy, as you could just link up a CSV file input step with a Table output step and be good to go. However, the problem is that I don't know...

Pentaho Kettle Error in Archive Files - org.apache.commons.vfs.FileSystemException: File closed


java,pentaho,kettle
I have a job which is set to archive files in a directory. It looks like it is running into the error org.apache.commons.vfs.FileSystemException: File closed when it attempts to create the zip file. However, the zip file does get created, and the files are added to it. I've sent the...

Pentaho - CSV Input not understanding special character [Windows to Linux]


linux,pentaho,transformation,business-intelligence,pdi
I have a transformation in Pentaho Data Integration where the first thing I do is use the "CSV Input" step to map my flat file. I've never had a problem with it on Windows, but now I'm changing the server that Spoon is going to run on to a Linux server...

'Too many connections' created in postgres when creating a dashboard in Pentaho


postgresql,database-connection,pentaho
I was creating a dashboard in Pentaho PUC which uses a Postgres connection as the data source. Most of the time this causes Postgres to say "too many clients already". Postgres' SHOW max_connections; query shows a maximum of 200 connections. I used the query select * from pg_stat_activity;. From...

Calculated columns in pentaho Cde


pentaho
I am new to Pentaho; please let me know how the "Calculated Columns" option works inside the "sql query" object. I need to calculate an average value.

Compare Data between 2 DW Tables


sql,database,oracle,compare,data-warehouse
I'm a little confused here. I'm testing some data quality issues in a DW, and I need to know whether the LOAN_SID values in one table match the other table. I was using this query but I'm not sure if I'm correct; if it matches there is an issue, if it doesn't...
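A standard way to check key agreement in both directions is a set difference on the key column. A sketch using sqlite3's EXCEPT (the question is tagged Oracle, where the operator is MINUS; the table names here are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE stage (LOAN_SID INTEGER)")
cur.execute("CREATE TABLE dw (LOAN_SID INTEGER)")
cur.executemany("INSERT INTO stage VALUES (?)", [(1,), (2,), (3,)])
cur.executemany("INSERT INTO dw VALUES (?)", [(2,), (3,), (4,)])

# Keys present in one table but not the other, in both directions:
missing_in_dw = cur.execute(
    "SELECT LOAN_SID FROM stage EXCEPT SELECT LOAN_SID FROM dw").fetchall()
extra_in_dw = cur.execute(
    "SELECT LOAN_SID FROM dw EXCEPT SELECT LOAN_SID FROM stage").fetchall()
print(missing_in_dw, extra_in_dw)  # [(1,)] [(4,)]
```

Both result sets being empty means the two tables agree on LOAN_SID.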

How to return non-matched rows in Pentaho Data Integration (Kettle)?


java,pentaho,lookup,etl,kettle
I am looking for a way to perform an SSIS-style lookup in Pentaho Data Integration. I'll try to explain with an example: I have two tables, A and B. Here is the data in table A: 1 2 3 4 5. Here is the data in table B: 3 4 5 6...
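The wanted result is an anti-join: rows of A with no match in B. (In PDI this is typically a lookup step followed by a 'Filter rows' step on a null lookup result; treat that step choice as an assumption.) The set logic, using the sample data:

```python
# SSIS 'Lookup no-match output' equivalent: keep rows from A with no match in B.
a = [1, 2, 3, 4, 5]
b = {3, 4, 5, 6}

no_match = [x for x in a if x not in b]
print(no_match)  # [1, 2]
```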

How to create an embedded document from a table using pentaho


mongodb,pentaho
I have two tables, student and record; the relationship is that a student has many records (one to many). How can I represent a transformation in Pentaho so that I can insert every line in the record table as an embedded document in the student document? All this is to migrate...

Should the “count” measure be stored in the fact table?


data-warehouse,dimensions,fact-table,datamart
I have a fact table that includes "wait times in hours" for certain services. I have a lot of dimensions that could describe the wait-times based on different slices; however, I am also interested in knowing how many people (counts) came for services through the filters of the same dimensions....

How to change second y-axis text font size in pentaho ccc charts


pentaho,pentaho-cde
How to change the second y-axis text font size in a Pentaho chart? I put some text (i.e. Monthly Cost ($000)) in orthoAxisTitle and it shows fine. How do I put some text on the second y-axis as well?...

Execute .jar file in Spoon (Pentaho Kettle)


java,javascript,pentaho,kettle
I need to execute a Java jar file from Spoon. The program has only one class, and all I want is to run it with or without parameters. The class is named "Limpieza" and is inside a package named com.overflow.csv.clean. I have deployed the jar to: C:\Program Files (x86)\Kettle\data-integration\lib And...

Checkpoints in Pentaho Spoon


pentaho
The pentaho documentation (http://wiki.pentaho.com/display/EAI/Job+checkpoints+and+restartability) specifies that, as of version 5.0, you can define "checkpoints" and "checkpoint logs" to let you restart ETL jobs from the most recently failed point so you don't have to go back and re-run a bunch of steps that already completed successfully. I'm running Pentaho Data...

Pentaho convert string to integer with decimal point


string,format,pentaho,mask,kettle
I am importing text values into a transformation using a Fixed Width input step. Everything comes in as a string. I want to convert some of the string values to numbers with a decimal point at a specified spot. Here are some examples of the before (left-hand side)...
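Fixed-width feeds like this usually carry implied decimals, e.g. "0001234" with two implied decimal places meaning 12.34. A sketch of that conversion (the sample values are made up, since the question's actual before/after pairs are not shown):

```python
from decimal import Decimal

# Convert a digits-only fixed-width field into a number with an implied
# decimal point a given number of places from the right.
def implied_decimal(s, places=2):
    return Decimal(s) / (10 ** places)

print(implied_decimal("0001234"))  # 12.34
print(implied_decimal("0000005"))  # 0.05
```

Using Decimal rather than float avoids binary rounding surprises in monetary data.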

Errors in the OLAP storage engine: The attribute key cannot be found when processing


ssas,foreign-key-relationship,data-warehouse,olap-cube,dimensional-modeling
I know this is mainly a design problem. I've read that there is a workaround for this issue by customising errors at processing time, but I am not happy having to ignore errors; also, the cube processing is scheduled, so ignoring errors is not a choice, at least...

SQL 2008 Change tracking and detecting Updated data


sql,versioning,data-warehouse
I plan to implement this in an SSIS project. Since I don't have the Enterprise edition of SQL Server 2008, I have to make use of other methods. Another way is to use triggers, but I am trying to avoid too many triggers. With change tracking I'm having difficulties detecting the...

Saiku File not showing in Ivy Dashboard Designer Pentaho BI Server CE


pentaho,saiku,dashboard-designer
Hi, I am using Pentaho BI Server Community Edition. I created a Saiku Analytics file (say demo.saiku) and saved it in the /home/admin folder. After that I created a new Ivy Dashboard, dragged and dropped an Analytics Menu into a dashboard window, and set the title and layout properties. Now when I...

Generating a dynamic date based on a row number using pentaho pdi


excel,pentaho,kettle
I want to generate a date dynamically based on row numbers using Pentaho PDI. For example: row 1 =====> Date=2015-06-08 **01**:56:30, row 2 =====> Date=2015-06-08 **02**:56:30, row 3 =====> Date=2015-06-08 **03**:56:30, row 4 =====> Date=2015-06-08 **04**:56:30. All my data comes from an Excel spreadsheet with row number and date fields, and I want the...
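One way to read the pattern in the example: the timestamp is a fixed base time plus the row number as hours. A sketch under that assumption:

```python
from datetime import datetime, timedelta

# Derive a timestamp from a row number: base date plus (row number) hours,
# matching the pattern row 1 -> 01:56:30, row 2 -> 02:56:30, ...
base = datetime(2015, 6, 8, 0, 56, 30)

def row_date(rownum):
    return base + timedelta(hours=rownum)

for n in range(1, 5):
    print(n, row_date(n))  # 1 2015-06-08 01:56:30, 2 ...02:56:30, etc.
```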

Simple MYSQL count, group by, not working using Pentaho Report Designer CE


mysql,sql,count,pentaho
I need to write a query which will pull from two different tables, count the results, and return the total to me in one row. I've come across a few problems. When I run the query without a count expression, I am returned 645 rows. 645 is the correct...

Difference between SQL query aggregation and querying an OLAP cube


analytics,data-warehouse,olap,olap-cube,star-schema
I have a question about the advantages of building an OLAP cube vs aggregating data in a database table for querying, for say 6 months of data, and then archiving the SQL table later for analytics purposes. Which one is better, a table or an OLAP cube? And why, since I can...

hide a sub report containing a chart pentaho


pentaho
Hi, we are using Pentaho Report Designer and we want to hide a subreport if there is no data. We have tried to use the formula not(isemptydata()) in the visible expression, but it does not seem to work. So how do we hide a subreport if no data...

Handling change of grain for a snapshot fact table in a star-schema


data-warehouse,star-schema
The question How do you handle a change in grain (from weekly measurement to daily measurement) for a snapshot fact table. Background info For a star-schema design I want to incorporate the results of a survey as a fact (e.g. in week 2 of 2015 80% of the respondents have...

Using the (A*B) function in calculator- Pentaho spoon-


pentaho,data-integration,spoon
I have been trying since yesterday to use the (A * B) function, a very simple operation, but it does not work. Any help? Thank you. https://drive.google.com/folderview?id=0B3XPAOxNJYxMfno0V0I3N21wblBhR1lyekhpNWlzb21XN2pHckJYRkdpSDNMX1NGT1hzQVk&usp=sharing...

Concurrent statistics gathering on an Oracle 11g partitioned table


oracle,oracle11g,etl,data-warehouse,table-statistics
I am developing a DWH on Oracle 11g. We have some big tables (250+ million rows), partitioned by value. Each partition is assigned to a different feeding source, and every partition is independent from the others, so they can be loaded and processed concurrently. Data distribution is very uneven; we...

How to change display value in parameter selection pentaho reports


pentaho
I need to display "All" when my data in the parameter list is -1, just for display in the parameter selection. Help me with this. Thanks, Keerthi KS...

How to Reload CDA and Mondrian cache in Pentaho CE 4.8?


caching,mdx,pentaho,mondrian,cda
I'm currently stuck on a performance issue with my dashboard. I created the dashboard in Pentaho Community Edition 4.8, using SQL and MDX (Mondrian) queries for my charts. My problem is that when I first open my dashboards after clearing the CDA and Mondrian caches, it takes 50 secs...

Select organizations whose income represents around 60% of the total income (SQL Server 2008)


sql-server,sum,data-warehouse,percentage
In advance, I apologize for imperfect English. In this data warehouse we have an organization which is composed of multiple organizations, and I have a [FactFinance] table which has information about the income of each organization. I have the following query in the data warehouse which selects the (Organization Name) from the [organization dimension table]...
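One common reading of "represent around 60% of the total": take organizations in descending income order until the running total first reaches 60% of the grand total. A sketch of that cumulative logic (the figures are made up):

```python
# Pick the top organizations whose cumulative income first reaches ~60%
# of the total income (a Python sketch of the windowed-SUM approach):
incomes = {"OrgA": 500, "OrgB": 300, "OrgC": 150, "OrgD": 50}
total = sum(incomes.values())

picked, running = [], 0
for org, inc in sorted(incomes.items(), key=lambda kv: -kv[1]):
    picked.append(org)
    running += inc
    if running / total >= 0.6:
        break

print(picked)  # ['OrgA', 'OrgB'] -> 800 / 1000 = 80% of the total
```

In SQL Server 2008, which lacks SUM() OVER with a frame, the running total is usually computed with a self-join or a correlated subquery.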

Pentaho Dimension lookup/update


csv,pentaho,etl,kettle
I have seen the Dimension lookup/update documentation here and in a few other blogs, but I cannot seem to get a clear idea. I have a table with the following structure: Key, Name, Code, Status, In, Out, Active. The Key, Name, Code, Status and Active fields come from a CSV file. I need...

Star Schema Design for User Utilization Reports


data-warehouse,star-schema,microstrategy,fact-table,snowflake-schema
Scenario: there are 3 kinds of utilization metrics that I have to derive for the users. In my application, user activity is tracked using login history, the number of customer calls made by the user, and the number of status changes performed by the user. All this information is maintained in 3 different tables...

INSERT INTO statement in MySQL


mysql,sql,database,data-warehouse
I'm trying to apply the YEAR function to one column in the DB and then add the results to a different table in the DWH. What am I doing wrong? INSERT INTO example_dwh1.dim_time (date_year) SELECT YEAR(time_taken) FROM exampledb.photos; When removing the INSERT INTO line, I get the results I want,...
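The INSERT ... SELECT pattern itself is valid; a frequent cause of failure is that the target table has other NOT NULL columns without defaults, or a permissions difference between the two schemas (both are assumptions, since the error message is not shown). A runnable sketch of the same pattern, with sqlite3's strftime('%Y', ...) standing in for MySQL's YEAR():

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE photos (time_taken TEXT)")
cur.executemany("INSERT INTO photos VALUES (?)",
                [("2014-05-01",), ("2015-07-12",)])
cur.execute("CREATE TABLE dim_time (date_year INTEGER)")

# INSERT ... SELECT with a year extracted from each source row:
cur.execute("""INSERT INTO dim_time (date_year)
               SELECT CAST(strftime('%Y', time_taken) AS INTEGER)
               FROM photos""")
print(cur.execute(
    "SELECT date_year FROM dim_time ORDER BY date_year").fetchall())
# [(2014,), (2015,)]
```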

Import .prpt file in Pentaho Server using Command Line


pentaho
I want to upload a .prpt (Pentaho report file) to the Pentaho BI Server. I am using the following command: ./import-export.sh --import --url=https://server/pentaho/ --username=user --password=pass --source=file-system --type=files --charset=UTF-8 --path=/public --file-path=/home/kishan/folder/Clients/abc/Daily_Reports/Prpt/xyz.prpt --logfile=/home/user/upload.log --permission=true --overwrite=true --retainOwnership=true So, I want to pick up the file located at the file-path value above and upload it to the...

Pentaho: Insert a set of dynamic records into a database


insert,pentaho,kettle,pdi
Using Pentaho, I would like to SELECT a number of records from one database and INSERT them into another one. I have no problem with the first part: using a Table input step, I have selected my desired records. But I have no idea how to develop a step...

Table output name from command line in pentaho kettle


pentaho,kettle
There is a case in my ETL where I am trying to take the "table output" name from the command line. The table name does not correspond to any streaming field's name. Is there any way to get this done in Pentaho Kettle?

Pentaho User Console(PUC) parameter error


pentaho
When I publish the prpt file to the Pentaho User Console (PUC), I get an error in the parameter field when I select parameter values: after selecting a value from the drop-down, the value disappears (I have attached an image). I have selected those values in the prpt (check the image) and there it works fine...

Kettle - Read CSV with comma as decimal mark


linux,pentaho,business-intelligence,kettle,pdi
I have a transformation in Pentaho Data Integration (aka Kettle) where the first thing I do is use the "CSV Input" step to map my flat file. I've never had a problem with this step on Windows, but now I'm changing the server where Spoon is going to run to...

pentaho report designer display field over field2 if 'field2' is not present


pentaho,report-designer
Let's say I want to place 2 text fields next to each other. In the screenshot below you can see 2 fields. The field on the left will always show, but if the field on the right isn't filled in, I want to make it disappear and have the text from the...

Loop over file names in sub job (Kettle job)


pentaho,kettle,spoon
The task is to get file names from a folder and then loop the same task (job) over all the files one by one. I created a simple job with a transformation (get file names) and then a job with the flag "Execute for each row" (for now it just logs the name of...