bash,csv,awk , How to remove duplicate comma separate strings using awk


How to remove duplicate comma separate strings using awk

Question:

Tag: bash,csv,awk

I have a csv file like this: (named test2.csv)

lastname,firstname,83494989,1997-05-20,2015-05-07 15:30:43,Sentence Skills 104,Sentence Skills 104,Elementary Algebra 38,Elementary Algebra 38,Sentence Skills 104,Sentence Skills 104,Elementary Algebra 38,Elementary Algebra 38,

I want to remove the duplicate entries

The closest I have got is the following awk command

awk '{a[$0]++} END {for (i in a) print RS i}' RS="," test2.csv

it works but causes new problems, it take the values out of order and puts them in rows like this:

,Elementary Algebra 38
,2015-05-07 15:30:43
,Sentence Skills 104
,FirstName
,LastName
,1997-05-20
,83494989

I need to keep the order they are in and keep them in one line ( I can fix the row issue, but don't know how to fix the order issue)

Update with Solution:

The answer from anubhava worked great, I added a question about removing the time from the date and Ed Morton helped out with that, here is the full query

awk 'BEGIN{RS=ORS=","} {sub(/ ..:..:..$/,"")} !seen[$0]++' test2.csv

Answer:

You can just use this awk:

awk 'BEGIN{RS=ORS=","} !seen[$0]++' test2.csv
lastname,firstname,83494989,1997-05-20,2015-05-07 15:30:43,Sentence Skills 104,Elementary Algebra 38,

Related:


Python: can't access newly defined environment variables


python,bash,environment-variables
I can't access my env var: import subprocess, os print os.environ.get('PATH') # Works well print os.environ.get('BONSAI') # doesn't work But the env var is well added in my /home/me/.bashrc: BONSAI=/home/me/Utils/bonsai_v3.2 export BONSAI And I can access this env var from a new terminal....

Bash modify CSV to change a field


linux,bash,awk
I have a very big CSV file (aprox. 10.000 rows and 400 columns) and I need to modify certain columns (like 15, 156, 220) to change format from 20140321132233 to 2014-03-21 13:22:33. All fields that I need to modify are datetime. I saw some examples using awk but for math...

How to rearrange CSV / JSON keys columns? (Javascript)


javascript,json,csv,papaparse
I am converting a JSON object array to CSV using Papa Parse JavaScript Library. Is there a way to have the CSV columns arranged in a certain way. For e.g; I get the column as: OrderStatus, canOp, OpDesc, ID, OrderNumber, FinishTime, UOM, StartTime but would like to be arranged as:...

While loop in bash using variable from txt file


linux,bash,rhel
I am new to bash and writing a script to read variables that is stored on each line of a text file (there are thousands of these variables). So I tried to write a script that would read the lines and automatically output the solution to the screen and save...

Entity framework to CSV float values


c#,entity-framework,csv
I'm trying to write an object to csv, the thing is that my objects have float valure for exemple (14,9) i want to change them to (14.9) so it won't cause any problem with the csv format string csv = ""; using (var ctx = new NBAEntities2()) { var studentList...

bash interactive script pass input


bash,command-line-arguments
I'm running the following interactive Jar. java -jar script.jar argument-line-here \n Now I'm creating a bash script which runs the jar file. How do I pass the argument line and the conformation "\n" to this interactive script? these are input lines for the script. This question has some answers, expect...

Run 3 variables at once in a python for loop.


python,loops,variables,csv,for-loop
For loop with multiple variables in python 2.7. Hello, I am not certain how to go about this, I have a function that goes to a site and downloads a .csv file. It saves the .csv file in a particular format: name_uniqueID_dataType.csv. here is the code import requests name =...

Create an external Hive table from an existing external table


csv,hadoop,hive
I have a set of CSV files in a HDFS path and I created an external Hive table, let's say table_A, from these files. Since some of the entries are redundant, I tried creating another Hive table based on table_A, say table_B, which has distinct records. I was able to...

Python: isolating re.search results


python,regex,csv
So I have this code (probably super inefficient, but that's another story) that is pulling urls from html code of a blog. I have the html in a .csv, which I am putting into python, then running the regex to get the urls. Here is the code: import csv, re...

Replace improper commas in CSV file


regex,r,csv
This may have been asked before, but I couldn't find it. I have a list of CSV files (439 or so) where, in a few of the files, someone also used commas in editorial comments. The result is that I can't put the files into a data frame, since the...

How do I check whether a file or file directory exist in bash?


bash,if-statement
I currently have this bash script (which is located in my home directory, i.e., /home/fusion809/ and I am running it as root as it's necessary for the icon copying lines): cd /home/fusion809/Pictures/Icon* declare -a A={Arch,Debian,Fedora,Mageia,Manjaro,OpenSUSE} declare -a B={Adwaita,Faenza,gnome,Humanity} for i in $A; do for j in $B; do if test...

how to modify an array value with given index?


arrays,linux,bash
I want to modify an array cell, which I can do when I know the cell as a number. However here my cell position is given by $i. pomme[`${i}`]="" I tried without the `` and it doesn't work either? How am I suppose to do it?...

print filenames into scripts in bash


bash,printf,echo
How do you print each name of a file from a directory into a string and make make new scripts? to print each file name for i in `ls new_manifest*`; do echo $i; done but when I try and print the rest of the string with $i like this is...

Parse text from a .txt file using csv module


python,python-2.7,parsing,csv
I have an email that comes in everyday and the format of the email is always the same except some of the data is different. I wrote a VBA Macro that exports the email to a text file. Now that it is a text file I want to parse the...

linux - running a process and tailing a file simultaneously


bash,shell,tail
I want a run a long task on a remote machine (with python fabric using ssh). It logs to a file on the remote machine. What I want to do is to run that script and tail (actively display) the log file content until the script execution ends. The problem...

Read CSV and plot colored line graph


python,csv,matplotlib,graph,plot
I am trying to plot a graph with colored markers before and after threshold value. If I am using for loop for reading the parsing the input file with time H:M I can plot and color only two points. But for all the points I cannot plot. Input akdj 12:00...

Aggregate failure codes in bash


bash
I've got a script which has multiple stages, and at each stage it's possible to fail, but the script can carry on running. Concretely, I generate some json, and check if the diff is correct. The diff could be wrong, but it doesn't stop the next stage of json being...

Identifying when a file is changed- Bash


bash,shell,unix
So in my bash shell script, I have it running through a for loop. Inside the for loop, I use find "$myarray[i]" >> $tmp to look for a certain directory each time through the loop. Sometimes, it finds the variable in myarray[i] and sometimes it doesn't. When it does find...

CSV File header part is coming parsing by LINQ


c#,linq,csv
the below way i am parsing csv file by LINQ but i found header part is coming when i inspect user class data. what is wrong there in code. var csvlines = File.ReadAllLines(filename); // IEnumerable<string> var csvLinesData = csvlines.Select(l => l.Split(',').Skip(1).ToArray()); // IEnumerable<string[]> int flag = 0; var users =...

How to pivot array into another array in Ruby


arrays,ruby,csv
I have a multidimensional array like this one : myArray = [["Alaska","Rain","3"],["Alaska","Snow","4"],["Alabama","Snow","2"],["Alabama","Hail","1"]] I would like to end up with CSV output like this. State,Snow,Rain,Hail Alaska,4,3,nil Alabama,2,nil,1 I know that to get this outputted to CSV the way I want it I have to have output array like this: outputArray =[["State","Snow","Rain","Hail"],["Alaska",4,3,nil],["Alabama",2,nil,1]]...

Bash script that removes C source comments


bash
I need to write a bash script which copies all .c files from a directory $2 to another directory $1, but without comments. I only have to remove comments that begin with //, might have tabs/spaces before the comment, but not letters. Also, I need to do it with only...

Capitalize all files in a directory using Bash


osx,bash,rename
I have a directory called "Theme_1" and I want to capitalize all the filenames starting with "theme". The first letter of every file name is lowercase and I want it to be upcase. I've tried this but all the letters in the filename are upcase, which is not what I...

Panda's Write CSV - Append vs. Write


python,csv,pandas
I would like to use pd.write_csv to write "filename" (with headers) if "filename" doesn't exist, otherwise to append to "filename" if it exists. If I simply use command: df.to_csv('filename.csv',mode = 'a',header ='column_names') The write or append succeeds, but it seems like the header is written every time an append takes...

Group instances based on NA values in r


r,file,csv,instance,na
I am reading a csv file and unfortunately my dataframe has many missing values. A small snip is as following: df <- data.frame(Size= c(800, 850, 1100, 1200, 1000), Value= c(900, NA, 1300, 1100, NA), Location= c(NA, 'midcity', 'uptown', NA, 'Lakeview'), Num1 = c(2, NA, 3, 2, NA), Num2 = c(2,3,3,1,2),...

AWK|BASH, use double FS and ternary operator


bash,awk
Is it possible? I was wondering how to do: Count fields differentiated by comma. Only the obtained first field of the previous step, count words differentiated by space. If there is more than 2 words, print NF, otherwise $0. Input cellular biol immunogenet, rosario escuela estadist, medellin medellin Expected output...

How to append entry the end of a multi-line entry using any of stream editors like sed or awk


linux,bash,awk,sed,sh
I am trying to use a script to append the host name at the end of a multi-line entry of a specific Host_Alias field in sudoers file. The current sudoers file has an entry similar to : Host_Alias srv_linuxestate= \ host10,host12,host13,host1,host50,\ host16,host1,host2,host11,host15,host21,\ host3,host14 My required output would be something like...

shell script for counting replacements


bash,replace,count
Ten files located in a directory. Each file content has "apple" word. write a bash shell script to replace "apple" with "banana" in all ten files and Print the number of replacements for each file. Have tried in this way but dont know how to get number of replacement. can...

shell script cut from variables


bash,shell,shellcode
The file is like this aaa&123 bbb&234 ccc&345 aaa&456 aaa$567 bbb&678 I want to output:(contain "aaa" and text after &) 123 456 I want to do in in shell script, Follow code be consider #!/bin/bash raw=$(grep 'aaa' 1.txt) var=$(cut -f2 -d"&" "$raw") echo $var It give me a error like...

Extra backslash when storing grep in a value


linux,bash
In a bash script I have: Check="grep -e '"'\(-S mount\)'"' /etc/audit/audit.rules" set -x When you run it it shows it as: CHECK='grep -e '\''\(-S mount\)'\'' /etc/audit/audit.rules' Now it works exactly what I want but I want to understand it. Why is there 2 extra \'s?...

How do I silence the HEAD of a curl request while using the silent flag?


bash,shell,curl,command-line,pipe
When I run the curl command and direct the data to a file, I get back the content of the site as expected. $ curl "www.site.com" > file.txt $ head file.txt Top of site ... However, this command shows a progress bar, which I do not want: % Total %...

Why can I view some Unix executable files in Mac OS X and not others?


git,bash,shell,unix,binary
I am on a Macbook Pro on Mac OS X 10.10 (Yosemite). When I go to /usr/bin, git is there as a unix executable file. When I open it up in Sublime Text, all I get is unreadable machine code. However, when I open up a different Unix executable file—in...

type conversion performance optimizable?


c#,xml,csv,optimization,type-conversion
The following snippet converts xml data to csv data in a data processing application. element is a XElement. I'm currently trying to optimize the performance of the application and was wondering if I could somehow combine the two operations going on below: Ultimately I still want access to the string...

Bash script using sed acts differently when passing variable


bash,sed
I have a script that I am writing that checks a value and then based on the value modifies it. I am trying to understand why it works this one way and not the other. Based on the google and stackoverflow searches I did, nothing really fits what I am...

Matching string inside file and returning result


regex,string,bash,shell,grep
I've got a few peculiar issues with trying to search for a string inside of a .db file. The way I tried was by using grep, which does apparently find the string(s), although this is the output: $ grep "ext" *.db Binary file enormous.db matches There are a couple problems...

Bash alias function with predefined argument


bash
I have an alias gl which is an wrapper for git log. Basically like this function gitLog(){ if [ $# -eq 0 ] then git log else git log -n $1 } alias gl=gitLog I want to add an alias which just calls gitLog with an argument like this alias...

Why does `sort file > file` result in an empty file? [duplicate]


bash
This question already has an answer here: bash redirect input from file back into same file 7 answers When you try to sort a file in-place with sort afile > afile you silently end up with afile being an empty file. Why is that? I'd expect either an error...

Resampling and merging data frame with python


python,csv,pandas,resampling,merging-data
Hi I have created a dictionary of dataFrame with this code import os import pandas import glob path="G:\my_dir\*" dataList={} for files in glob.glob(path): dataList[files]=(read_csv(files,sep=";",index_col='Date')) The different dataframe present in the dictory have different time sample. An example of dataFrame(A) is Date Volume Value 2014-01-04 06:00:02 6062 108000.0 2014-01-04 06:06:05 6062...

using sed to replace a line with back slashes in a shell script


regex,bash,shell,ssh,sed
I am trying to replace the bottom one of these 2 lines with sed in a file. <rule>out_prefix=orderid ^1\\d\+ updatemtnotif/</rule>\n\ <rule>out_prefix=orderid ^2\\d\+ updatemtnotif/</rule>\n\ And the following command seems to do that when executed as a command at the bash prompt sed -i '[email protected]_prefix=orderid ^2\\\\d\\+ updatemtnotif/@out_prefix=orderid ^2\\\\d\\+ updatemtnotif_fr/@g' /opt/temp/rules.txt however, when...

Replace [a-z],[a-z] with [a-z], [a-z] and keep the letters


bash,awk,sed
How can I replace [a-z],[a-z] with [a-z], [a-z] and keeping the letters? Input suny stony brook, stony brook,usa. Output suny stony brook, stony brook, usa. What I have tried sed 's/[a-z],[a-z]/[a-z], [a-z]/g' <<< "suny stony brook, stony brook,usa." sed 's/[a-z],[a-z]/, /g' <<< "suny stony brook, stony brook,usa." ...

AWK write to new column base on if else of other column


linux,bash,shell,awk,sed
I have a CSV file with columns A,B,C,D. Column D contains values on a scale of 0 to 1. I want to use AWK to write to a new column E base in values in column D. For example: if value in column D <0.7, value in column E =...

Macports switch PHP CLI version


php,bash,drupal,terminal,macports
I'm trying to switch my Terminal PHP version to 5.4 because I ran into some issues with Drush while updating my Drupal core. http://drupal.stackexchange.com/questions/112090/drush-command-errors The reason for these issues is my Terminal PHP version is different then my localhost. php -v in Terminal returns PHP 5.5.13 (cli) but my localhost...

export mysql query array using fputcsv


php,mysql,arrays,csv,fputcsv
I am trying to export the results of a sql query to CSV using fputcsv however keep getting the error "fputcsv() expects parameter 2 to be array, string given". I followed the answer given here - Query mysql and export data as CSV in PHP - but it throws the...

Perl: Using Text::CSV to print AoH


arrays,perl,csv
I have an array of hashes (AoH) which looks like this: $VAR1 = [ { 'Unit' => 'M', 'Size' => '321', 'User' => 'test' } { 'Unit' => 'M' 'Size' => '0.24' 'User' => 'test1' } ... ]; How do I write my AoH to a CSV file with separators,...

Assign and use of a variable in the same subshell


bash,scope,subshell
I was doing something very simple like: v=5 echo "$v" and expected it to print 5. However, it does not. The value that was just set is not available for the next command. I recently learnt that "In most shells, each command of a pipeline is executed in a separate...

Convert strings of data to “Data” objects in R [duplicate]


r,date,csv
This question already has an answer here: as.Date with dates in format m/d/y in R 2 answers My problem is that the as.Date function does not convert the values in a "date" column of a data frame into Date objects. I have a data.frame nmmaps. Here is a short...

How to change svn:externals from bash file non-interactive


bash,svn,svn-externals
I have multi externals need to be set within a file externals.txt and I attempt to change the svn:externals from a bash: svn pe svn:externals svn://hostname/branchname -F extenals.txt But the command throws out an error: svn: E205007: None of the environment variables SVN_EDITOR, VISUAL or EDITOR are set, and no...

Saying there are 0 arguments when I have 2? Trying to open a CSV file to write into


ruby,file,csv,dir
I'm trying to read from a CSV file and codify people into groups using an equation. I append the name of their group they fall into to the end of the array that their row creates. Then I write it to a new file so I don't overwrite the original...

Python CSV reader/writer handling quotes: How can I wrap row fields in quotes? (Getting triple quotes as output)


python,csv
I have a problem with the csv reader and writer in python. Whenever I try to take one CSV file and par down the number of columns from roughly 37 to 6, this is the kind of output I am getting. Example of one row: 0,"JOHNSON, JOHN J.",JOHN J. JOHNSON,TECH879,INSPECTION...

How can I use a variable to get an Input$ in Shiny?


r,variables,csv,shiny
I am new to R and I am creating a shiny application to read a csv and filter data. I am reading the csv file, then creating dropdowns with a loop using the column names and the unique values: output$dropdowns <- renderUI({ if (is.null(x())) { return(NULL) } lapply(1:ncol(x()), function(i) {...

How to extract first letters of dashed separated words in a bash variable?


linux,string,bash,shell,variables
I would like to extract the first letter of dashed separated words value of my bash variable, like this: MY_TEXT=this-is-my-custom-text I would like to create a second variable like this: MY_INITIALS=timct...