Disaggregate one row of data to multiple rows


Tag: r,excel,statistics,dataset,google-adwords


I am having some trouble with my dataset. I am using a Google AdWords export for data analysis, and I want to fit a logit regression model to the data to determine whether an experiment I conducted affects conversion.

The problem is that the data is aggregated, and to perform logit regression the dependent variable needs to be binary. So instead of having one data point with (e.g.) 10 impressions, 5 clicks and 2 conversions, I want 10 data points, of which 5 were clicked and, of those, 2 converted.

So I want to go from a data frame that looks like this (very simplified):

| Keyword      | Impressions | Clicks     | Conversions |
| SampleName   |      10     |      5     |     2       |

to this:

| Keyword      | Clicked     | Converted   |
| SampleName   |      1      |      1      |
| SampleName   |      1      |      1      |
| SampleName   |      1      |      0      |
| SampleName   |      1      |      0      |
| SampleName   |      1      |      0      |
| SampleName   |      0      |      0      |
| SampleName   |      0      |      0      |
| SampleName   |      0      |      0      |
| SampleName   |      0      |      0      |
| SampleName   |      0      |      0      |

How would I be able to do this for a very large dataset? I have looked everywhere, but I can't seem to find the solution. I would prefer to use R to do this, but I also have Excel and Stata installed.

Thanks in advance!

Edit: Here is some code (extended with an extra row and column) for the data frame. I am quite new to R and this platform, so this probably isn't the cleanest way to write it, but here it goes:

Key <- c("Sample1", "Sample2")
Imp <- c(10, 6)
Cli <- c(5, 3)
Con <- c(2, 1)
CPC <- c(0.26, 0.15)
df1 <- data.frame(Key, Imp, Cli, Con, CPC)
colnames(df1) <- c("Keyword", "Impressions", "Clicks", "Conversions", "CostPerClick")

Also, I am now running into the problem that values such as the average cost per click need to be repeated on every clicked row (a price is paid for each click) and set to 0 on rows that were not clicked. So in the end, I need a data frame that looks like this:

| Keyword   | Clicked     | Converted   |     CPC     |
| Sample1   |      1      |      1      |     0.26    |
| Sample1   |      1      |      1      |     0.26    |
| Sample1   |      1      |      0      |     0.26    |
| Sample1   |      1      |      0      |     0.26    |
| Sample1   |      1      |      0      |     0.26    |
| Sample1   |      0      |      0      |     0.00    |
| Sample1   |      0      |      0      |     0.00    |
| Sample1   |      0      |      0      |     0.00    |
| Sample1   |      0      |      0      |     0.00    |
| Sample1   |      0      |      0      |     0.00    |
| Sample2   |      1      |      1      |     0.15    |
| Sample2   |      1      |      0      |     0.15    |
| Sample2   |      1      |      0      |     0.15    |
| Sample2   |      0      |      0      |     0.00    |
| Sample2   |      0      |      0      |     0.00    |
| Sample2   |      0      |      0      |     0.00    |

Edit 2 (SOLVED)

akrun's solution seems to be the right one when tested on the sample dataset, but when I try it on my actual dataset, it gives the following error:

> result <- setDT(df1)[, list(Clicked=rep(c(1,0), c(Clicks, Impressions-Clicks)), 
+  Converted=rep(c(1,0), c(Conversions, Impressions-Conversions)), 
+  CPC=rep(c(CostPerClick, 0), c(Clicks,Impressions-Clicks))), Keyword]
Error in rep(c(1, 0), c(Clicks, Impressions - Clicks)) : 
  invalid 'times' argument

The keywords don't contain any duplicates and the data does not have any NAs:

> length(unique(df1$Keyword))
[1] 186145
> nrow(df1)
[1] 186145
> nrow(df1[complete.cases(df1),]) == nrow(df1)
[1] TRUE

A summary of the data:

> summary(df1)
   Keyword           Impressions          Clicks        Conversions       CostPerClick  
 Length:186145      Min.   :   1.00   Min.   : 1.000   Min.   :0.00000   Min.   :0.010  
 Class :character   1st Qu.:   7.00   1st Qu.: 1.000   1st Qu.:0.00000   1st Qu.:0.130  
 Mode  :character   Median :  16.00   Median : 1.000   Median :0.00000   Median :0.210  
                    Mean   :  32.93   Mean   : 2.167   Mean   :0.03368   Mean   :0.246  
                    3rd Qu.:  39.00   3rd Qu.: 2.000   3rd Qu.:0.00000   3rd Qu.:0.320  
                    Max.   :1521.00   Max.   :91.000   Max.   :4.00000   Max.   :3.680 
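
One possible cause of the `invalid 'times' argument` error, offered as a guess rather than a confirmed diagnosis: `rep()` rejects negative counts, so any row where Clicks or Conversions exceeds Impressions makes `Impressions - Clicks` (or `Impressions - Conversions`) negative. A minimal sketch of a check for such rows follows; the three-row `df1` here is made-up illustration data with the same column names, and `Sample3` is a hypothetical offending row.

```r
# Made-up illustration data using the same column names as the real export.
df1 <- data.frame(
  Keyword      = c("Sample1", "Sample2", "Sample3"),
  Impressions  = c(10, 6, 2),
  Clicks       = c(5, 3, 4),      # Sample3 (hypothetical): more clicks than impressions
  Conversions  = c(2, 1, 0),
  CostPerClick = c(0.26, 0.15, 0.30)
)

# rep(c(1, 0), c(Clicks, Impressions - Clicks)) needs both counts to be >= 0,
# so flag every row where a count exceeds the number of impressions.
negative_times <- df1$Clicks > df1$Impressions |
  df1$Conversions > df1$Impressions

bad_rows <- df1[negative_times, ]
print(bad_rows)
```

If `bad_rows` is non-empty on the real data, those rows would need to be corrected or dropped before expanding.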



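library(data.table)

# Per keyword: Clicks ones followed by (Impressions - Clicks) zeros, and likewise for conversions
setDT(df1)[, list(Clicked=rep(c(1,0), c(Clicks, Impressions-Clicks)),
 Converted=rep(c(1,0), c(Conversions, Impressions-Conversions))) , Keyword]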
#       Keyword Clicked Converted
# 1: SampleName       1         1
# 2: SampleName       1         1
# 3: SampleName       1         0
# 4: SampleName       1         0
# 5: SampleName       1         0
# 6: SampleName       0         0
# 7: SampleName       0         0
# 8: SampleName       0         0
# 9: SampleName       0         0
#10: SampleName       0         0


Using the updated dataset from the OP's post:

setDT(df1)[, list(Clicked=rep(c(1,0), c(Clicks, Impressions-Clicks)), 
 Converted=rep(c(1,0), c(Conversions, Impressions-Conversions)), 
 CPC=rep(c(CostPerClick, 0), c(Clicks,Impressions-Clicks))), Keyword]
#    Keyword Clicked Converted  CPC
# 1: Sample1       1         1 0.26
# 2: Sample1       1         1 0.26
# 3: Sample1       1         0 0.26
# 4: Sample1       1         0 0.26
# 5: Sample1       1         0 0.26
# 6: Sample1       0         0 0.00
# 7: Sample1       0         0 0.00
# 8: Sample1       0         0 0.00
# 9: Sample1       0         0 0.00
#10: Sample1       0         0 0.00
#11: Sample2       1         1 0.15
#12: Sample2       1         0 0.15
#13: Sample2       1         0 0.15
#14: Sample2       0         0 0.00
#15: Sample2       0         0 0.00
#16: Sample2       0         0 0.00
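
For comparison, the same expansion can be sketched in base R without data.table. This is an alternative under stated assumptions, not the method from the answer above: it assumes the column names from the OP's example, and the names `idx`, `pos` and `expanded` are introduced here purely for illustration.

```r
# Sample data from the question's edit.
df1 <- data.frame(
  Keyword      = c("Sample1", "Sample2"),
  Impressions  = c(10, 6),
  Clicks       = c(5, 3),
  Conversions  = c(2, 1),
  CostPerClick = c(0.26, 0.15)
)

# One output row per impression: repeat each row index Impressions times.
idx <- rep(seq_len(nrow(df1)), times = df1$Impressions)

# Position of each impression within its keyword (1, 2, ..., Impressions).
pos <- sequence(df1$Impressions)

# The first Clicks positions count as clicked, the first Conversions as converted.
expanded <- data.frame(
  Keyword   = df1$Keyword[idx],
  Clicked   = as.integer(pos <= df1$Clicks[idx]),
  Converted = as.integer(pos <= df1$Conversions[idx])
)

# Cost per click applies only to clicked rows; unclicked rows get 0.
expanded$CPC <- ifelse(expanded$Clicked == 1, df1$CostPerClick[idx], 0)

print(expanded)
```

Because this version compares positions instead of passing differences to `rep()`, a row with more clicks than impressions would not raise an error; it would silently mark every impression as clicked, so such rows are still worth checking beforehand.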


The one-row sample data used in the first example:

df1 <- structure(list(Keyword = "SampleName", Impressions = 10L,
 Clicks = 5L,
 Conversions = 2L), .Names = c("Keyword", "Impressions", "Clicks",
 "Conversions"), class = "data.frame", row.names = c(NA, -1L))

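A side note on the stated goal of fitting a logit model, not a replacement for the expansion shown in this answer: `glm()` with `family = binomial` also accepts aggregated counts as a two-column response of successes and failures, which may make the row expansion unnecessary for the regression itself. The sketch below is only an illustration; the data and the `Treatment` column (an experiment indicator) are hypothetical, since the question does not show how the experiment is encoded.

```r
# Hypothetical aggregated data; Treatment is an assumed experiment indicator.
ads <- data.frame(
  Keyword     = c("Sample1", "Sample2", "Sample3", "Sample4"),
  Treatment   = c(0, 1, 0, 1),
  Clicks      = c(5, 3, 8, 6),
  Conversions = c(2, 1, 1, 3)
)

# Response is cbind(successes, failures): converted clicks vs. unconverted clicks.
fit <- glm(cbind(Conversions, Clicks - Conversions) ~ Treatment,
           family = binomial, data = ads)

print(coef(fit))
```

Fitting the same formula on one-row-per-click data gives the same coefficient estimates, so the aggregated form is mainly a way to avoid building a very large data frame.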
