(using: R 3.1.0)

Hi - I feel like this should be simpler than I'm finding it. I have a set of sequences and I'd like to visualise them as a directed network. A pure graph probably isn't right because each sequence can have multiple instances of nodes and the repetition order is important in the sequence. So, for example I might have:

```
Seq Count
AB 8000
AC 5500
CB 4900
CBA 4300
ACD 4000
ACACA 3740
CA 2800
... ...
```

Where the sequence ends up is interesting, so for each final node I'd like to show the paths to it and their weights. So in my (very small) example above:

end point B:

**A->B** has weight 8000 and **C->B** has weight 4900.

    8000 A-+
           |-->B
    4900 C-+

end point A:

**C->B->A** has weight 4300, **A->C->A->C->A** has weight 3740, **C->A** has weight 2800.

    4300        C--->B-+
                       |
    3740 A-->C-->A-->C-+--->A
                       |
    2800             C-+

It's important to note that route CA is not part of ACACA, but a separate route.

The raw data is actually a list of events in time grouped by a sequence number, so it may be easier to start from that point (rather than the aggregated view above). Like this:

```
seqNo. Node Time
1 A 0.0
1 B 2.1
2 A 0.0
2 C 3.2
3 C 0.0
3 B 8.1
4 C 0.0
4 B 1.2
4 A 2.3
... ... ...
```

I'd like to know what package (if any) is best to use to work with sequences like this, and how to reduce the data to a directed network view. The igraph package looks like it could help, but I think there might be some concepts I'm missing, particularly in this case where an adjacency matrix isn't really valid (due to multiple adjacencies in the graph for each pair of nodes).

UPDATE - this is an idea of the type of output I'm looking for:

Cheers and thanks for any help,

Andy.

Answer:

You seem to be saying that only start and end nodes are of interest as nodes, so you could use these as vertices and display the intermediate nodes as edge labels, as shown in the following code and plot. Assume `df` contains your aggregate data.
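For the sample in the question, `df` could be set up roughly as follows. This is only a sketch: the rows hidden behind the `...` in the question are left out, and `end_node` and `by_end` are names made up here for a quick sanity check of which paths finish at which node.

```r
# Aggregate data copied from the question (rows hidden behind "..." are omitted)
df <- data.frame(
  Seq   = c("AB", "AC", "CB", "CBA", "ACD", "ACACA", "CA"),
  Count = c(8000, 5500, 4900, 4300, 4000, 3740, 2800),
  stringsAsFactors = FALSE
)

# Quick check: group the sequences by their final node
end_node <- substring(df$Seq, nchar(df$Seq))
by_end <- split(df, end_node)
by_end$A   # the paths that finish at A
```

Keeping `Seq` as character (rather than a factor) avoids surprises with `nchar()` later on.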

```
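library(igraph)
# position of the last character in each sequence
last_char <- nchar(as.character(df$Seq))
# v1 = start node, v2 = end node; the remaining columns become edge attributes
df_g <- cbind(v1=substr(df$Seq, 1,1),
              v2=substr(df$Seq, last_char, last_char), df)
g <- graph.data.frame(df_g)
# label each edge with its full sequence and its count
plot(g, edge.label=paste(E(g)$Seq, "\n", E(g)$Count))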
```

The visual presentation of the plot could be improved but this shows a way in which the aggregate data can produce a directed network view. One could imagine some alternative ways of representing the interior nodes between start and end nodes but these would seem to lead to more complicated plots.
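The question mentions that the raw data is really an event list. A minimal sketch of reducing that to the aggregate `df` in base R is below. It assumes the column names from the question (with `seqNo.` shortened to `seqNo`), that events within a sequence are ordered by `Time`, and that each node name is a single character. The `events` data frame only holds the rows shown in the question.

```r
# Raw events as shown in the question (elided rows omitted)
events <- data.frame(
  seqNo = c(1, 1, 2, 2, 3, 3, 4, 4, 4),
  Node  = c("A", "B", "A", "C", "C", "B", "C", "B", "A"),
  Time  = c(0.0, 2.1, 0.0, 3.2, 0.0, 8.1, 0.0, 1.2, 2.3),
  stringsAsFactors = FALSE
)

# Order by sequence and time, then collapse each sequence into one string
events <- events[order(events$seqNo, events$Time), ]
seqs <- tapply(events$Node, events$seqNo, paste, collapse = "")

# Count how often each distinct sequence occurs
counts <- table(seqs)
df <- data.frame(Seq = names(counts), Count = as.integer(counts),
                 stringsAsFactors = FALSE)
df <- df[order(-df$Count), ]
```

With only the four sequences shown, every count is 1. On the full data the same steps should give a table shaped like the aggregated view in the question.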

UPDATE 2

Your comment made things clearer. Most of the work in getting your diagram is generating the edges and vertices for a graph from your sequence data. Once those are defined, you can format them and send them to a plotting package to display. The code below constructs a data frame `df_g` containing the edge connectivity and end locations, uses `df_g` to generate a data frame `df_v` containing vertex data, and then passes both to `igraph` for plotting. You can get an idea of what the code is doing by examining `df_g` and `df_v`.

```
library(igraph)
# as.character() guards against Seq being stored as a factor
last_char <- nchar(as.character(df$Seq))
df <- df[order(substr(df$Seq, last_char, last_char), df$Seq),]
edges <- as.character(df$Seq)
df_g <- data.frame(v1=NA_character_, v2=NA_character_, Seq=NA_character_,
                   Count=NA_character_, label=NA_character_, arrow.mode = NA_character_, end = NA_character_,
                   x1 = NA_integer_, x2 = NA_integer_, y1=NA_integer_, y2=NA_integer_, type=NA_character_,
                   stringsAsFactors=FALSE)
for( i in 1:nrow(df)){
  # Make sequence edges
  edge <- edges[i]
  num_vert <- nchar(edge)
  j <- 1:(num_vert-1)
  df_g_j <- data.frame( v1=paste(edge, j,sep="_"), v2=paste(edge, j+1,sep="_"),
                        Seq=edge, Count=df$Count[i], label=sapply(j, function(x) substr(edge, x, x)),
                        arrow.mode = ">", end=substr(edge,num_vert,num_vert),
                        x1=j-num_vert, x2=j+1-num_vert, y1=i, y2=i, type="seq", stringsAsFactors=FALSE)
  df_g_j[num_vert-1, "arrow.mode"] <- "-"
  # make connector vertex
  df_g_con <- transform(df_g_j[num_vert-1,], v1=v2, v2=paste(end, "connector", sep="_"), x1=0, label=NA, type="connector")
  df_g <- rbind(df_g, df_g_j, df_g_con)
}
df_g <- df_g[-1,]   # drop the all-NA placeholder row
df_g[df_g$type=="connector",] <- within(df_g[df_g$type=="connector",], y2 <- tapply(y2, v2, mean)[v2])
cn_vert <- aggregate(v2 ~ end, data=df_g[df_g$type=="connector", ], length)
colnames(cn_vert) <- c("end","num")
for( end in cn_vert$end){
  cn_vert_row <- which(df_g$end == end & df_g$type == "connector")[1]
  if( cn_vert$num[cn_vert$end==end] > 1 ) {
    df_g <- rbind(df_g,with(df_g[cn_vert_row,],
      data.frame(v1=v2, v2=end, Seq=NA_character_, Count=NA_character_, label=NA,
                 arrow.mode = ">", end=end, x1=x2, x2= 1, y1 = y2, y2=y2, type = "common_end",
                 stringsAsFactors=FALSE)) )
  } else {
    df_g[cn_vert_row,] <- transform(df_g[cn_vert_row,], v2=end, label=NA, arrow.mode=">", x2=1,type="common_end")
  }
}
# make vertices
df_v <- with(df_g, data.frame(v=v1, label = label, x=x1, y=y1, color = "black", size = 15, stringsAsFactors=FALSE))
df_v <- rbind(df_v, with(df_g[df_g$type == "common_end",],
                         data.frame(v=end, label = v2, x=x2, y=y2, color="black", size=15, stringsAsFactors=FALSE)))
df_v[is.na(df_v$label),] <- transform(df_v[is.na(df_v$label),], color = NA, size = 0)
#
# make graph from edges and vertices
g <- graph.data.frame(df_g, vertices=df_v)
# assign Counts as labels to sequence start vertices
E(g)$label <- NA
e_start <- grep("_1",get.edgelist(g)[,1])
E(g)[e_start]$label <- E(g)[e_start]$Count
# adjust and scale edge label positions
h_jst <- 0 # values between 0 and .2
edge_label_x <- 1 - 2*(1.5 + h_jst - E(g)$x1)/diff(range(V(g)$x))
# assign colors to Count labels; num_color is number of colors in palette
num_color <- 12
counts <- as.integer(E(g)$Count)
edge_label_color <- rainbow(num_color, start=0, end=.75)[num_color-
  floor((num_color-1)*(counts-min(counts,na.rm=TRUE))/diff(range(counts,na.rm=TRUE)))]
plot(g, vertex.label.color="white", vertex.frame.color=V(g)$color,
     edge.color="blue", edge.arrow.size=.6, edge.label.x= edge_label_x,
     edge.label.color=edge_label_color, edge.label.font=2, edge.label.cex=1.1)
```

For your sample data, this gives the diagram shown below. The Count labels have greater separation from the vertices when the plot is enlarged, but you can adjust this further with the variable `h_jst` in the code.
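As a rough illustration of what `h_jst` does, the snippet below repeats the `edge_label_x` formula on its own. `label_x` is a hypothetical helper, not part of igraph. The `x_range` and `x1` values are my assumption of what the code produces for the sample data: the longest sequence, ACACA, starts at x = -4 and the end vertices sit at x = 1.

```r
# Hypothetical helper that repeats the edge_label_x formula from the code above
label_x <- function(x1, x_range, h_jst = 0) {
  1 - 2 * (1.5 + h_jst - x1) / diff(x_range)
}

# Assumed layout for the sample data: vertex x coordinates span -4..1
x_range <- c(-4, 1)
x1 <- c(AB = -1, ACACA = -4)   # x position of each sequence's first vertex

label_x(x1, x_range, h_jst = 0)
label_x(x1, x_range, h_jst = 0.2)
```

If I read the formula correctly, a larger `h_jst` gives a smaller x value, so the Count labels move further left, away from the first vertex of each sequence.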
