bash,csv,awk , How to remove duplicate comma separate strings using awk

How to remove duplicate comma separate strings using awk


Tag: bash,csv,awk

I have a csv file like this: (named test2.csv)

lastname,firstname,83494989,1997-05-20,2015-05-07 15:30:43,Sentence Skills 104,Sentence Skills 104,Elementary Algebra 38,Elementary Algebra 38,Sentence Skills 104,Sentence Skills 104,Elementary Algebra 38,Elementary Algebra 38,

I want to remove the duplicate entries

The closest I have got is the following awk command

awk '{a[$0]++} END {for (i in a) print RS i}' RS="," test2.csv

it works but causes new problems, it take the values out of order and puts them in rows like this:

,Elementary Algebra 38
,2015-05-07 15:30:43
,Sentence Skills 104

I need to keep the order they are in and keep them in one line ( I can fix the row issue, but don't know how to fix the order issue)

Update with Solution:

The answer from anubhava worked great, I added a question about removing the time from the date and Ed Morton helped out with that, here is the full query

awk 'BEGIN{RS=ORS=","} {sub(/ ..:..:..$/,"")} !seen[$0]++' test2.csv


You can just use this awk:

awk 'BEGIN{RS=ORS=","} !seen[$0]++' test2.csv
lastname,firstname,83494989,1997-05-20,2015-05-07 15:30:43,Sentence Skills 104,Elementary Algebra 38,


How to extract first letters of dashed separated words in a bash variable?

I would like to extract the first letter of dashed separated words value of my bash variable, like this: MY_TEXT=this-is-my-custom-text I would like to create a second variable like this: MY_INITIALS=timct...

Bash script that removes C source comments

I need to write a bash script which copies all .c files from a directory $2 to another directory $1, but without comments. I only have to remove comments that begin with //, might have tabs/spaces before the comment, but not letters. Also, I need to do it with only...

Convert strings of data to “Data” objects in R [duplicate]

This question already has an answer here: as.Date with dates in format m/d/y in R 2 answers My problem is that the as.Date function does not convert the values in a "date" column of a data frame into Date objects. I have a data.frame nmmaps. Here is a short...

Run 3 variables at once in a python for loop.

For loop with multiple variables in python 2.7. Hello, I am not certain how to go about this, I have a function that goes to a site and downloads a .csv file. It saves the .csv file in a particular format: name_uniqueID_dataType.csv. here is the code import requests name =...

How to pivot array into another array in Ruby

I have a multidimensional array like this one : myArray = [["Alaska","Rain","3"],["Alaska","Snow","4"],["Alabama","Snow","2"],["Alabama","Hail","1"]] I would like to end up with CSV output like this. State,Snow,Rain,Hail Alaska,4,3,nil Alabama,2,nil,1 I know that to get this outputted to CSV the way I want it I have to have output array like this: outputArray =[["State","Snow","Rain","Hail"],["Alaska",4,3,nil],["Alabama",2,nil,1]]...

using sed to replace a line with back slashes in a shell script

I am trying to replace the bottom one of these 2 lines with sed in a file. <rule>out_prefix=orderid ^1\\d\+ updatemtnotif/</rule>\n\ <rule>out_prefix=orderid ^2\\d\+ updatemtnotif/</rule>\n\ And the following command seems to do that when executed as a command at the bash prompt sed -i '[email protected]_prefix=orderid ^2\\\\d\\+ updatemtnotif/@out_prefix=orderid ^2\\\\d\\+ updatemtnotif_fr/@g' /opt/temp/rules.txt however, when...

How to append entry the end of a multi-line entry using any of stream editors like sed or awk

I am trying to use a script to append the host name at the end of a multi-line entry of a specific Host_Alias field in sudoers file. The current sudoers file has an entry similar to : Host_Alias srv_linuxestate= \ host10,host12,host13,host1,host50,\ host16,host1,host2,host11,host15,host21,\ host3,host14 My required output would be something like...

Entity framework to CSV float values

I'm trying to write an object to csv, the thing is that my objects have float valure for exemple (14,9) i want to change them to (14.9) so it won't cause any problem with the csv format string csv = ""; using (var ctx = new NBAEntities2()) { var studentList...

type conversion performance optimizable?

The following snippet converts xml data to csv data in a data processing application. element is a XElement. I'm currently trying to optimize the performance of the application and was wondering if I could somehow combine the two operations going on below: Ultimately I still want access to the string...

How to rearrange CSV / JSON keys columns? (Javascript)

I am converting a JSON object array to CSV using Papa Parse JavaScript Library. Is there a way to have the CSV columns arranged in a certain way. For e.g; I get the column as: OrderStatus, canOp, OpDesc, ID, OrderNumber, FinishTime, UOM, StartTime but would like to be arranged as:...

Assign and use of a variable in the same subshell

I was doing something very simple like: v=5 echo "$v" and expected it to print 5. However, it does not. The value that was just set is not available for the next command. I recently learnt that "In most shells, each command of a pipeline is executed in a separate...

Panda's Write CSV - Append vs. Write

I would like to use pd.write_csv to write "filename" (with headers) if "filename" doesn't exist, otherwise to append to "filename" if it exists. If I simply use command: df.to_csv('filename.csv',mode = 'a',header ='column_names') The write or append succeeds, but it seems like the header is written every time an append takes...

How to change svn:externals from bash file non-interactive

I have multi externals need to be set within a file externals.txt and I attempt to change the svn:externals from a bash: svn pe svn:externals svn://hostname/branchname -F extenals.txt But the command throws out an error: svn: E205007: None of the environment variables SVN_EDITOR, VISUAL or EDITOR are set, and no...

How do I silence the HEAD of a curl request while using the silent flag?

When I run the curl command and direct the data to a file, I get back the content of the site as expected. $ curl "" > file.txt $ head file.txt Top of site ... However, this command shows a progress bar, which I do not want: % Total %...

Bash script using sed acts differently when passing variable

I have a script that I am writing that checks a value and then based on the value modifies it. I am trying to understand why it works this one way and not the other. Based on the google and stackoverflow searches I did, nothing really fits what I am...

Aggregate failure codes in bash

I've got a script which has multiple stages, and at each stage it's possible to fail, but the script can carry on running. Concretely, I generate some json, and check if the diff is correct. The diff could be wrong, but it doesn't stop the next stage of json being...

How can I use a variable to get an Input$ in Shiny?

I am new to R and I am creating a shiny application to read a csv and filter data. I am reading the csv file, then creating dropdowns with a loop using the column names and the unique values: output$dropdowns <- renderUI({ if (is.null(x())) { return(NULL) } lapply(1:ncol(x()), function(i) {...

CSV File header part is coming parsing by LINQ

the below way i am parsing csv file by LINQ but i found header part is coming when i inspect user class data. what is wrong there in code. var csvlines = File.ReadAllLines(filename); // IEnumerable<string> var csvLinesData = csvlines.Select(l => l.Split(',').Skip(1).ToArray()); // IEnumerable<string[]> int flag = 0; var users =...

Identifying when a file is changed- Bash

So in my bash shell script, I have it running through a for loop. Inside the for loop, I use find "$myarray[i]" >> $tmp to look for a certain directory each time through the loop. Sometimes, it finds the variable in myarray[i] and sometimes it doesn't. When it does find...

While loop in bash using variable from txt file

I am new to bash and writing a script to read variables that is stored on each line of a text file (there are thousands of these variables). So I tried to write a script that would read the lines and automatically output the solution to the screen and save...

Why does `sort file > file` result in an empty file? [duplicate]

This question already has an answer here: bash redirect input from file back into same file 7 answers When you try to sort a file in-place with sort afile > afile you silently end up with afile being an empty file. Why is that? I'd expect either an error...

AWK write to new column base on if else of other column

I have a CSV file with columns A,B,C,D. Column D contains values on a scale of 0 to 1. I want to use AWK to write to a new column E base in values in column D. For example: if value in column D <0.7, value in column E =...

how to modify an array value with given index?

I want to modify an array cell, which I can do when I know the cell as a number. However here my cell position is given by $i. pomme[`${i}`]="" I tried without the `` and it doesn't work either? How am I suppose to do it?...

Python: isolating results

So I have this code (probably super inefficient, but that's another story) that is pulling urls from html code of a blog. I have the html in a .csv, which I am putting into python, then running the regex to get the urls. Here is the code: import csv, re...

shell script for counting replacements

Ten files located in a directory. Each file content has "apple" word. write a bash shell script to replace "apple" with "banana" in all ten files and Print the number of replacements for each file. Have tried in this way but dont know how to get number of replacement. can...

linux - running a process and tailing a file simultaneously

I want a run a long task on a remote machine (with python fabric using ssh). It logs to a file on the remote machine. What I want to do is to run that script and tail (actively display) the log file content until the script execution ends. The problem...

Matching string inside file and returning result

I've got a few peculiar issues with trying to search for a string inside of a .db file. The way I tried was by using grep, which does apparently find the string(s), although this is the output: $ grep "ext" *.db Binary file enormous.db matches There are a couple problems...

print filenames into scripts in bash

How do you print each name of a file from a directory into a string and make make new scripts? to print each file name for i in `ls new_manifest*`; do echo $i; done but when I try and print the rest of the string with $i like this is...

bash interactive script pass input

I'm running the following interactive Jar. java -jar script.jar argument-line-here \n Now I'm creating a bash script which runs the jar file. How do I pass the argument line and the conformation "\n" to this interactive script? these are input lines for the script. This question has some answers, expect...

Read CSV and plot colored line graph

I am trying to plot a graph with colored markers before and after threshold value. If I am using for loop for reading the parsing the input file with time H:M I can plot and color only two points. But for all the points I cannot plot. Input akdj 12:00...

Why can I view some Unix executable files in Mac OS X and not others?

I am on a Macbook Pro on Mac OS X 10.10 (Yosemite). When I go to /usr/bin, git is there as a unix executable file. When I open it up in Sublime Text, all I get is unreadable machine code. However, when I open up a different Unix executable file—in...

Capitalize all files in a directory using Bash

I have a directory called "Theme_1" and I want to capitalize all the filenames starting with "theme". The first letter of every file name is lowercase and I want it to be upcase. I've tried this but all the letters in the filename are upcase, which is not what I...

Saying there are 0 arguments when I have 2? Trying to open a CSV file to write into

I'm trying to read from a CSV file and codify people into groups using an equation. I append the name of their group they fall into to the end of the array that their row creates. Then I write it to a new file so I don't overwrite the original...

Resampling and merging data frame with python

Hi I have created a dictionary of dataFrame with this code import os import pandas import glob path="G:\my_dir\*" dataList={} for files in glob.glob(path): dataList[files]=(read_csv(files,sep=";",index_col='Date')) The different dataframe present in the dictory have different time sample. An example of dataFrame(A) is Date Volume Value 2014-01-04 06:00:02 6062 108000.0 2014-01-04 06:06:05 6062...

AWK|BASH, use double FS and ternary operator

Is it possible? I was wondering how to do: Count fields differentiated by comma. Only the obtained first field of the previous step, count words differentiated by space. If there is more than 2 words, print NF, otherwise $0. Input cellular biol immunogenet, rosario escuela estadist, medellin medellin Expected output...

Bash alias function with predefined argument

I have an alias gl which is an wrapper for git log. Basically like this function gitLog(){ if [ $# -eq 0 ] then git log else git log -n $1 } alias gl=gitLog I want to add an alias which just calls gitLog with an argument like this alias...

Parse text from a .txt file using csv module

I have an email that comes in everyday and the format of the email is always the same except some of the data is different. I wrote a VBA Macro that exports the email to a text file. Now that it is a text file I want to parse the...

Replace [a-z],[a-z] with [a-z], [a-z] and keep the letters

How can I replace [a-z],[a-z] with [a-z], [a-z] and keeping the letters? Input suny stony brook, stony brook,usa. Output suny stony brook, stony brook, usa. What I have tried sed 's/[a-z],[a-z]/[a-z], [a-z]/g' <<< "suny stony brook, stony brook,usa." sed 's/[a-z],[a-z]/, /g' <<< "suny stony brook, stony brook,usa." ...

Perl: Using Text::CSV to print AoH

I have an array of hashes (AoH) which looks like this: $VAR1 = [ { 'Unit' => 'M', 'Size' => '321', 'User' => 'test' } { 'Unit' => 'M' 'Size' => '0.24' 'User' => 'test1' } ... ]; How do I write my AoH to a CSV file with separators,...

Python CSV reader/writer handling quotes: How can I wrap row fields in quotes? (Getting triple quotes as output)

I have a problem with the csv reader and writer in python. Whenever I try to take one CSV file and par down the number of columns from roughly 37 to 6, this is the kind of output I am getting. Example of one row: 0,"JOHNSON, JOHN J.",JOHN J. JOHNSON,TECH879,INSPECTION...

Create an external Hive table from an existing external table

I have a set of CSV files in a HDFS path and I created an external Hive table, let's say table_A, from these files. Since some of the entries are redundant, I tried creating another Hive table based on table_A, say table_B, which has distinct records. I was able to...

shell script cut from variables

The file is like this aaa&123 bbb&234 ccc&345 aaa&456 aaa$567 bbb&678 I want to output:(contain "aaa" and text after &) 123 456 I want to do in in shell script, Follow code be consider #!/bin/bash raw=$(grep 'aaa' 1.txt) var=$(cut -f2 -d"&" "$raw") echo $var It give me a error like...

Extra backslash when storing grep in a value

In a bash script I have: Check="grep -e '"'\(-S mount\)'"' /etc/audit/audit.rules" set -x When you run it it shows it as: CHECK='grep -e '\''\(-S mount\)'\'' /etc/audit/audit.rules' Now it works exactly what I want but I want to understand it. Why is there 2 extra \'s?...

Group instances based on NA values in r

I am reading a csv file and unfortunately my dataframe has many missing values. A small snip is as following: df <- data.frame(Size= c(800, 850, 1100, 1200, 1000), Value= c(900, NA, 1300, 1100, NA), Location= c(NA, 'midcity', 'uptown', NA, 'Lakeview'), Num1 = c(2, NA, 3, 2, NA), Num2 = c(2,3,3,1,2),...

Bash modify CSV to change a field

I have a very big CSV file (aprox. 10.000 rows and 400 columns) and I need to modify certain columns (like 15, 156, 220) to change format from 20140321132233 to 2014-03-21 13:22:33. All fields that I need to modify are datetime. I saw some examples using awk but for math...

Python: can't access newly defined environment variables

I can't access my env var: import subprocess, os print os.environ.get('PATH') # Works well print os.environ.get('BONSAI') # doesn't work But the env var is well added in my /home/me/.bashrc: BONSAI=/home/me/Utils/bonsai_v3.2 export BONSAI And I can access this env var from a new terminal....

export mysql query array using fputcsv

I am trying to export the results of a sql query to CSV using fputcsv however keep getting the error "fputcsv() expects parameter 2 to be array, string given". I followed the answer given here - Query mysql and export data as CSV in PHP - but it throws the...

Replace improper commas in CSV file

This may have been asked before, but I couldn't find it. I have a list of CSV files (439 or so) where, in a few of the files, someone also used commas in editorial comments. The result is that I can't put the files into a data frame, since the...

Macports switch PHP CLI version

I'm trying to switch my Terminal PHP version to 5.4 because I ran into some issues with Drush while updating my Drupal core. The reason for these issues is my Terminal PHP version is different then my localhost. php -v in Terminal returns PHP 5.5.13 (cli) but my localhost...

How do I check whether a file or file directory exist in bash?

I currently have this bash script (which is located in my home directory, i.e., /home/fusion809/ and I am running it as root as it's necessary for the icon copying lines): cd /home/fusion809/Pictures/Icon* declare -a A={Arch,Debian,Fedora,Mageia,Manjaro,OpenSUSE} declare -a B={Adwaita,Faenza,gnome,Humanity} for i in $A; do for j in $B; do if test...