Plotting (ggplot) numeric values from mixed long format column of class character

Question:

Tag: r,ggplot2,dplyr,tidyr

Following the tidy data standard, I have my data in long format with a key column and a value column. The values for some keys are numeric and for others they are character strings, so R stores the entire column as class character.

When I use filter() to pipe only the numeric data to ggplot (data with key 'a' is numeric), and then use as.numeric() on the value definition, it does not convert correctly - I seem to just get sequence numbers instead of values. What am I doing wrong?

filter(data, measure == "a") %>% 
  ggplot(aes(y = as.numeric(value), 
             x = as.factor(subject_instance), 
             color = as.factor(subject_instance))) + 
  geom_boxplot()

What is the best way to handle mixed classes in long format (i.e., is there a better way than the one I'm using)?

OR

How do I get ggplot to convert to numeric correctly?

Workable example (sample of 40 rows from larger set):

    mydata <- structure(list(ResponseID = c("R_40LUIW7O8Lnj7Cd", "R_aXo4IXJ2eRTyThr", 
"R_9sHFiKGtn4ZhNiJ", "R_0BMN3JynUPiB0dn", "R_9mqmDcAKzae6ko5", 
"R_4T7qN9appsgbnxj", "R_5BeXW1ygKISxISV", "R_3JJY4UGvbzzDYTX", 
"R_0AN81Cdgz7ncPDD", "R_aXo4IXJ2eRTyThr", "R_40LUIW7O8Lnj7Cd", 
"R_8BOtUltxr8O6AeN", "R_40LUIW7O8Lnj7Cd", "R_1KUj25KpGbKOaGh", 
"R_5BeXW1ygKISxISV", "R_0AN81Cdgz7ncPDD", "R_aXo4IXJ2eRTyThr", 
"R_aXo4IXJ2eRTyThr", "R_0N8LUMfEP12P4Wh", "R_0wuddsG9KJhHkRn", 
"R_1R3kGCm3vPWi4dL", "R_50W5K8wp8m1yOZ7", "R_0wuddsG9KJhHkRn", 
"R_ctKujSc0Zr5fldz", "R_4SDzTFmolPaB8wt", "R_0Ng4gEnCnkCTuoB", 
"R_0Ng4gEnCnkCTuoB", "R_eb5LkAh0nBVqc9n", "R_0vqNorszrDGN6MB", 
"R_40LUIW7O8Lnj7Cd", "R_6s1Q2hFaqRLMKKF", "R_8BOtUltxr8O6AeN", 
"R_4SDzTFmolPaB8wt", "R_3JJY4UGvbzzDYTX", "R_3JJY4UGvbzzDYTX", 
"R_77mJUnh0OPtvCEl", "R_bxtLgQnlf4iaCWx", "R_6s1Q2hFaqRLMKKF", 
"R_7X8L8LwKo6UdWgR", "R_9mqmDcAKzae6ko5"), ID = c("R_3I0G7xzqlA4lUmm", 
"R_12m5J3hXrv8ObMa", "R_3PmEIrRgCUr0X3L", "R_YQuCAn43cgRMHy9", 
"R_51GdFWDnxQ7zvpv", "R_x9g4FVQzeqAJG8h", "R_QmDHvIxNJUypJip", 
"R_2cuyzZ8C4khOGs8", "R_3fUUNvffCN7GUrn", "R_12m5J3hXrv8ObMa", 
"R_3I0G7xzqlA4lUmm", "R_xgbhYoALaqQ9TDX", "R_3I0G7xzqlA4lUmm", 
"R_28I21bSyxgRcyGo", "R_QmDHvIxNJUypJip", "R_3fUUNvffCN7GUrn", 
"R_12m5J3hXrv8ObMa", "R_12m5J3hXrv8ObMa", "R_9L8RxssmQOGrPAR", 
"R_3iExjba1az5mpLw", "R_2wodtnGyQkaGTbX", "R_dnln2Bzdjahd3ax", 
"R_3iExjba1az5mpLw", "R_29gE0fK7dB6HENJ", "R_2E0mBlZmT618zQp", 
"R_3EVZt1ncuzTbVRr", "R_3EVZt1ncuzTbVRr", "R_2anUpVhXXReyZAX", 
"R_1dz55WFaXZ3Lm3Y", "R_3I0G7xzqlA4lUmm", "R_vUJsBPPRxV9J6CJ", 
"R_xgbhYoALaqQ9TDX", "R_2E0mBlZmT618zQp", "R_2cuyzZ8C4khOGs8", 
"R_2cuyzZ8C4khOGs8", "R_3LYcR4i5YB2k0N0", "R_yL9qi0TMXHfuJK9", 
"R_vUJsBPPRxV9J6CJ", "R_1DqckuFAYHkKjDg", "R_51GdFWDnxQ7zvpv"
), icon = c(".rprt", ".mddm", ".cnsl", "ord.cnsl", "sgn.alrt", 
"ent.advr", "flg.lab2", "ord.lab2", ".mddm", ".mds2", "rmv.prb2", 
"sch.imgn", "edt.not2", "edt.prb4", "ord.lab", "grp.lab", "src.note", 
"sgn.alrt", "sgn.imgn", "sch.lab", "sch.lab", ".note", "viw.imgn", 
"flg.lab2", ".mddm", "ent.prbl", "ent.vtls", "ord.med", ".hstr", 
"rnw.alrt", "ent.vtls", "viw.vtls", "sch.lab2", "edt.note", "rnw.med", 
"ord.prcd", "rmv.prbl", "crt.grph", "edt.prb3", "ent.prb2"), 
    measure = c("firstclick", "lastclick", "subject", "clickcount", 
    "pagesubmit", "firstclick", "subject", "lastclick", "subject", 
    "pagesubmit", "firstclick", "subject", "clickcount", "lastclick", 
    "clickcount", "firstclick", "pagesubmit", "pagesubmit", "pagesubmit", 
    "lastclick", "action", "clickcount", "firstclick", "clickcount", 
    "subject", "clickcount", "firstclick", "lastclick", "subject", 
    "pagesubmit", "action", "lastclick", "lastclick", "pagesubmit", 
    "clickcount", "firstclick", "firstclick", "action", "pagesubmit", 
    "subject"), value = c("2.602", "4.849", "Consult(s)", "6", 
    "180", "1.456", "Lab / Imaging / Diagnostic", "70.335", "Medication(s)", 
    "180", "1.133", "Lab / Imaging / Diagnostic", "4", "3.938", 
    "4", "3.003", "180", "180", "180", "20.519", "Schedule", 
    "4", "4.758", "4", "Medication(s)", "4", "1.706", "8.582", 
    "Patient history", "11.599", "Enter", "9.098", "11.897", 
    "180", "4", "1.728", "2.423", "Search", "180", "Problem(s)"
    ), file = structure(c(60L, 37L, 4L, 41L, 67L, 17L, 25L, 44L, 
    37L, 39L, 57L, 63L, 11L, 15L, 43L, 29L, 66L, 67L, 68L, 64L, 
    64L, 40L, 78L, 25L, 37L, 20L, 22L, 45L, 33L, 58L, 22L, 87L, 
    65L, 10L, 59L, 47L, 56L, 7L, 14L, 21L), .Label = c("alert.png", 
    "allergies.png", "check-order.png", "consult.png", "copy-graph.png", 
    "create-encounter.png", "create-graph.png", "create-note.png", 
    "create-report.png", "edit-note.png", "edit-note2.png", "edit-problem.png", 
    "edit-problem2.png", "edit-problem3.png", "edit-problem4.png", 
    "encounter.png", "enter-adverse.png", "enter-med.png", "enter-medadmin.png", 
    "enter-problem.png", "enter-problem2.png", "enter-vitals.png", 
    "flag-imaging.png", "flag-lab.png", "flag-lab2.png", "flag-order.png", 
    "followup.png", "forward-alert.png", "graph-lab.png", "graph-lab2.png", 
    "graph-vitals.png", "graph.png", "history.png", "imaging.png", 
    "lab1.png", "lab2.png", "medadmin.png", "meds1.png", "meds2.png", 
    "note.png", "order-consult.png", "order-imaging.png", "order-lab.png", 
    "order-lab2.png", "order-med.png", "order-med2.png", "order-procedure.png", 
    "order-procedure2.png", "order.png", "problem1.png", "problem2.png", 
    "procedure1.png", "procedure2.png", "refill-med.png", "refill-med2.png", 
    "remove-problem.png", "remove-problem2.png", "renew-alert.png", 
    "renew-med.png", "report.png", "schedule-consult.png", "schedule-followup.png", 
    "schedule-imaging.png", "schedule-lab.png", "schedule-lab2.png", 
    "search-note.png", "sign-alert.png", "sign-imaging.png", 
    "sign-lab.png", "sign-lab2.png", "sign-note.png", "sign-order.png", 
    "sign-report.png", "sort-alert.png", "sort-vitals.png", "view-adverse.png", 
    "view-history.png", "view-imaging.png", "view-lab.png", "view-lab2.png", 
    "view-med.png", "view-note.png", "view-order.png", "view-problem.png", 
    "view-problem2.png", "view-report.png", "view-vitals.png", 
    "vitals.png"), class = "factor"), icon_action = c("", "", 
    "", "order", "sign", "enter", "flag", "order", "", "", "remove", 
    "schedule", "edit", "edit", "order", "graph", "search", "sign", 
    "sign", "schedule", "schedule", "", "view", "flag", "", "enter", 
    "enter", "order", "", "renew", "enter", "view", "schedule", 
    "edit", "renew", "order", "remove", "create", "edit", "enter"
    ), icon_subject = c("report", "medadmin", "consult", "consult", 
    "alert", "adverse", "lab2", "lab2", "medadmin", "meds2", 
    "problem2", "imaging", "note2", "problem4", "lab", "lab", 
    "note", "alert", "imaging", "lab", "lab", "note", "imaging", 
    "lab2", "medadmin", "problem", "vitals", "med", "history", 
    "alert", "vitals", "vitals", "lab2", "note", "med", "procedure", 
    "problem", "graph", "problem3", "problem2"), instance = structure(c(2L, 
    8L, 7L, 26L, 80L, 49L, 78L, 24L, 11L, 7L, 83L, 24L, 77L, 
    43L, 26L, 67L, 73L, 38L, 31L, 74L, 27L, 12L, 26L, 87L, 15L, 
    31L, 53L, 42L, 2L, 53L, 88L, 57L, 47L, 62L, 54L, 37L, 40L, 
    78L, 32L, 33L), .Label = c("1", "2", "3", "4", "5", "6", 
    "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", 
    "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", 
    "27", "28", "29", "30", "31", "32", "33", "34", "35", "36", 
    "37", "38", "39", "40", "41", "42", "43", "44", "45", "46", 
    "47", "48", "49", "50", "51", "52", "53", "54", "55", "56", 
    "57", "58", "59", "60", "61", "62", "63", "64", "65", "66", 
    "67", "68", "69", "70", "71", "72", "73", "74", "75", "76", 
    "77", "78", "79", "80", "81", "82", "83", "84", "85", "86", 
    "87", "88"), class = "factor"), action_instance = c(NA, NA, 
    NA, 2L, 7L, 3L, 4L, 1L, NA, NA, 2L, 1L, 5L, 2L, 2L, 2L, 1L, 
    6L, 2L, 4L, 1L, NA, 2L, 4L, NA, 2L, 3L, 4L, NA, 1L, 6L, 9L, 
    1L, 5L, 2L, 2L, 1L, 3L, 2L, 1L), subject_instance = c(1L, 
    1L, 1L, 2L, 4L, 2L, 7L, 2L, 1L, 1L, 3L, 2L, 1L, 1L, 1L, 6L, 
    5L, 3L, 2L, 3L, 2L, 1L, 2L, 7L, 1L, 2L, 4L, 2L, 1L, 5L, 5L, 
    3L, 4L, 4L, 3L, 1L, 3L, 3L, 1L, 2L)), .Names = c("ResponseID", 
"ID", "icon", "measure", "value", "file", "icon_action", "icon_subject", 
"instance", "action_instance", "subject_instance"), class = c("tbl_df", 
"data.frame"), row.names = c(NA, -40L))

filter(mydata, measure=="pagesubmit") %>% ggplot(aes(y=as.numeric(value), x=as.factor(subject_instance), color=as.factor(subject_instance))) + geom_boxplot()

Also, on a semi-related note, why doesn't this work?

filter(icon, measure=="pagesubmit") %>% mean(value)

Answer:

As mentioned by @Spacedman, your example is not reproducible/consistent.

Part 1

This works:

filter(mydata, measure == "pagesubmit") %>% 
  ggplot(aes(
    y = as.numeric(value), 
    x = as.factor(subject_instance), 
    color = as.factor(subject_instance))) + 
  geom_boxplot()
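
The "sequence numbers" symptom described in the question is what as.numeric() returns when it is applied to a factor: the underlying integer level codes rather than the labels. Whether that happened in the asker's full data set is only an assumption; in the sample above, value is a character column, which is why the conversion works. A minimal sketch with a made-up vector (not taken from mydata):

```r
# Hypothetical values for illustration only
chr <- c("70.335", "180", "4.849")
fct <- factor(chr)  # levels are sorted as strings: "180", "4.849", "70.335"

# Character input: the strings are parsed as numbers
num_from_chr <- as.numeric(chr)

# Factor input: the integer level codes come back, not the values
codes <- as.numeric(fct)

# Converting the labels first recovers the actual values
num_from_fct <- as.numeric(as.character(fct))
```

If the real column turns out to be a factor, wrapping it in as.character() before as.numeric() inside aes() should give the expected values.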

Part 2

If you want the mean of value over all rows where measure equals pagesubmit, you can do:

filter(mydata, measure == "pagesubmit") %>% 
  summarise(mean = mean(as.numeric(value)))

Your initial attempt does not work because you are trying to take the mean() of the whole data frame. In fact, when you do ... %>% mean(value) you get the following warning:

# Warning message:
# In mean.default(., value) : argument is not numeric or logical: returning NA

Here . is the left-hand side (mydata filtered to keep only the rows where measure equals pagesubmit) that %>% pipes forward as the first argument of mean(). Following the same logic, you should instead do:

mydata %>% 
  filter(measure == "pagesubmit") %>%
  .$value %>% ## extract a character vector of values
  as.numeric() %>% ## convert it to numeric
  mean() ## calculate the mean
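
To make the difference concrete, here is a small sketch on a hypothetical toy data frame (toy is not part of the original post). It assumes dplyr is installed and uses dplyr's pull() as an alternative to .$value for extracting a single column as a vector:

```r
library(dplyr)

# Hypothetical stand-in for mydata, with a character value column
toy <- data.frame(
  measure = c("pagesubmit", "pagesubmit", "subject"),
  value   = c("180", "120", "Consult(s)"),
  stringsAsFactors = FALSE
)

# mean() receives the whole data frame as its first argument,
# so it returns NA (the warning is suppressed here to keep the output clean)
whole_df <- suppressWarnings(
  toy %>% filter(measure == "pagesubmit") %>% mean()
)

# Extract the column first, then convert it and take the mean
col_mean <- toy %>%
  filter(measure == "pagesubmit") %>%
  pull(value) %>%   # character vector of values
  as.numeric() %>%  # convert to numeric
  mean()
```

In this sketch whole_df is NA, while col_mean is the average of the two pagesubmit rows.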
