histogram in gnuplot vs histogram in unix utilities


Tag: csv,gnuplot,histogram

I have a CSV file and I want to create a histogram from column 6. Using Linux utilities this is simple:

$ cut -f6 -d, nupic_out.csv | grep -vi '[a-z]' | grep -v '^$' | sort | uniq -c | sort -k2,2n
    563 0.0
     72 0.025
     35 0.05
     22 0.075
     14 0.1
     21 0.125
     14 0.15
     10 0.175
      5 0.2
      3 0.225
      7 0.25
      3 0.275
      6 0.3
      5 0.325
      3 0.35
      1 0.375
      3 0.4
      1 0.425
      3 0.45
      3 0.475
      5 0.5
      7 0.525
     11 0.55
      3 0.575
      4 0.6
      3 0.625
     11 0.65
      5 0.675
      9 0.7
      5 0.725
      7 0.75
      8 0.775
      5 0.8
      3 0.825
      3 0.85
      4 0.875
      2 0.9
      1 0.925
      1 0.975
    109 1.0
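
For comparison, the same count can be done in a single awk call. The sketch below uses a hypothetical helper, count_column, that prints "value count" pairs (value first, sorted numerically), which is the column order a gnuplot "using 1:2 with boxes" plot expects. It assumes plain comma-separated input without quoted fields, and it replaces the two grep filters with a check that the field looks like a decimal number.

```shell
# count_column COL: read comma-separated lines on stdin and print
# "value count" for every distinct value in column COL, sorted by value.
# Hypothetical helper; assumes no quoted fields that contain commas.
count_column() {
    awk -F, -v c="$1" '
        # keep only plain decimal numbers; this drops the header and empty fields
        $c ~ /^-?[0-9]+(\.[0-9]+)?$/ { n[$c]++ }
        END { for (v in n) print v, n[v] }
    ' | LC_ALL=C sort -k1,1n   # C locale so "." is always the decimal point
}
```

For example, count_column 6 < nupic_out.csv > counts.dat writes the table, and gnuplot can draw it with plot 'counts.dat' using 1:2 with boxes. Like uniq -c, the values are grouped as strings, so "0" and "0.0" would be counted separately.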

But I would like to plot it with gnuplot. My attempt was to modify the following script that I found; this is my modified version:

#!/usr/bin/gnuplot -p
# http://psy.swansea.ac.uk/staff/carter/gnuplot/gnuplot_frequency.htm


set datafile separator ",";
# set term dumb

set key off
set border 3

# Add a vertical dotted line at x=0 to show centre (mean) of distribution.
set yzeroaxis

# Each bar is half the (visual) width of its x-range.
set boxwidth 0.05 absolute
set style fill solid 1.0 noborder

bin_width = 0.1;
bin_number(x) = floor(x/bin_width)
rounded(x) = bin_width * ( bin_number(x) + 0.5 )

# plot dataset_path using (rounded($6)):(6) smooth frequency with boxes

plot "nupic_out.csv" using 6:6 smooth frequency with boxes

The result looks completely different from what the Unix tools show. In gnuplot I've seen various types of histograms: some follow a normal-distribution pattern, others are ordered by frequency (as if I replaced the last sort -k2,2n with sort -n), and others are ordered by the values from which the histogram was created (my case). It would be nice if I could choose.


smooth frequency makes the data monotonic in x (x is the value given in the first using column, in your case the numerical value from column 6) and then sums up all y-values (the values given in the second using column) that share the same x.

In your script you also give the sixth column as the y-value, so each bar is the sum of the values themselves rather than the number of times they occur. To count the occurrences of each distinct value in the sixth column, use using 6:(1), i.e. the constant 1 as the y-value:

set style fill solid noborder
set boxwidth 0.8 relative
set datafile separator ','
plot 'nupic_out.csv' using 6:(1) smooth frequency with boxes notitle
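
To see why using 6:6 gives such a different picture, here is a rough awk emulation of the summing step described above. It is a sketch with a hypothetical helper, smooth_frequency, that reproduces the arithmetic only, not gnuplot's actual code: it reads whitespace-separated "x y" pairs, adds up the y-values of all points with the same numeric x, and prints one line per x in ascending order.

```shell
# smooth_frequency: read "x y" pairs on stdin; for every distinct numeric x,
# print x and the sum of its y-values, sorted by x.
# Hypothetical helper that mimics the arithmetic of gnuplot's "smooth frequency".
smooth_frequency() {
    awk '
        # $1 + 0 makes "0.0" and "0" the same key, i.e. grouping by numeric x
        NF >= 2 { s[$1 + 0] += $2 }
        END { for (x in s) print x, s[x] }
    ' | LC_ALL=C sort -k1,1n
}
```

With y = x (your using 6:6) every bar has height x times its count: the 563 zeros add up to a bar of height 0, while the 109 ones give a bar of height 109. With y = 1 (using 6:(1)) the heights are the plain counts that uniq -c prints.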

To apply a logscale to the smoothed data, you must first save them to a temporary file with set table ...; plot and then plot this temporary file.

set datafile separator ','
set table 'tmp.dat'
plot 'nupic_out.csv' using 6:(1) smooth frequency with lines
unset table

Pay attention here: a bug in gnuplot adds a wrong last line to the output file, which you must skip. You can either skip it with a filter in the using statement, e.g.

plot 'tmp.dat' using (strcol(3) eq "i" ? $1 : 1/0):2 with boxes

which works fine here, or you can use head to cut off the last two lines:

plot '< head -n-2 tmp.dat' using 1:2 with boxes
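
Negative line counts such as head -n-2 are a GNU coreutils extension, so the head variant may not work with BSD or macOS head. A portable sketch of the same idea as the strcol(3) filter keeps only the rows whose third column is i; it assumes the set table output writes each point as x, y and a type flag, and that the spurious last line does not carry the i flag, which is what the strcol(3) filter above relies on. The helper name table_rows is hypothetical.

```shell
# table_rows FILE...: print only the rows of a "set table" output file whose
# third column is "i" (in range). Comment lines, blank lines and rows with any
# other flag are dropped. Reads stdin when no file is given.
# Hypothetical helper; assumes the "x y type" column layout.
table_rows() {
    awk '$3 == "i"' "$@"
}
```

For example, run table_rows tmp.dat > tmp_in.dat in the shell after the set table step and plot tmp_in.dat using 1:2 with boxes.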

Note also that gnuplot always uses whitespace as the separator when it writes out data files, so you must change the datafile separator back to whitespace before plotting tmp.dat.

A full working script could be

set style fill solid noborder
set boxwidth 0.8 relative
set datafile separator ','

set table 'tmp.dat'
plot 'nupic_out.csv' using 6:(1) smooth frequency with lines notitle
unset table

set datafile separator whitespace
set logscale y
set yrange [0.8:*]
set autoscale xfix
plot '< head -n-2 tmp.dat' using 1:2 with boxes notitle

To use a binning function for the values in the sixth column, replace the 6 in using 6:(1) with a function that operates on the value in the sixth column. The expression must be enclosed in parentheses, and inside it you refer to the current value of the sixth column as $6, like

plot 'nupic_out.csv' using (bin($6)):(1) smooth frequency with lines

Again, a full working script, using ChrisW's binning function, could be:

set style fill solid noborder
set datafile separator ','

set boxwidth 0.09 absolute
Min = -0.05
Max = 1.05
n = 11.0
width = (Max-Min)/n
bin(x) = width*(floor((x-Min)/width)+0.5) + Min

set table 'tmp.dat'
plot 'nupic_out.csv' using (bin($6)):(1) smooth frequency with lines notitle
unset table

set datafile separator whitespace
set logscale y
set xrange [-0.05:1.05]
set tics nomirror out
plot '< head -n-2 tmp.dat' using 1:2 with boxes notitle
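
To cross-check the binned bars against the Unix tools, the same bin() formula can be evaluated in awk. This is a sketch with a hypothetical helper, bin_counts; Min and width are hard-coded to the values from the script above (Min = -0.05, width = 0.1), and because awk has no floor(), a small one is built from int(). Note that with this data many values sit exactly on a bin edge (0.05, 0.15, ...), and such values may end up in the lower or the upper bin depending on floating-point rounding, in awk as well as in gnuplot.

```shell
# bin_counts: read one number per line on stdin, map it to its bin centre with
#   bin(x) = width*(floor((x-Min)/width)+0.5) + Min
# and print "centre count", sorted by centre.
# Hypothetical helper; Min and width are copied from the gnuplot script.
bin_counts() {
    awk -v min=-0.05 -v width=0.1 '
        # int() truncates toward zero, so adjust it for negative non-integers
        function floor(v) { return (v < 0 && v != int(v)) ? int(v) - 1 : int(v) }
        NF { n[sprintf("%.3f", width * (floor(($1 - min) / width) + 0.5) + min)]++ }
        END { for (b in n) print b, n[b] }
    ' | LC_ALL=C sort -k1,1n
}
```

For example, cut -f6 -d, nupic_out.csv | grep -vi '[a-z]' | grep -v '^$' | bin_counts prints one line per bin centre, which can be compared with the bar heights.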
