

Using blank-line delimited records and colon-separated fields in awk

Question:

Tag: awk,etl

I'd like to be able to work with a file in awk where records are separated by a blank line, and each field consists of a name, a colon, some optional whitespace (to be ignored/discarded), and a value. E.g.

Name: Smith, John
Age: 42

Name: Jones, Mary
Age: 38

Name: Mills, Pat
Age: 62

I understand that I can use RS="" to have awk treat the blank lines as record separators and FS="\n" to split the fields properly. However, I'd then like to create an array of name/value pairs that I can use for further processing, of the form

if a["Age"] > 40 {print a["Name"]}

The order is usually consistent, but since the data ends up in an associative array, the incoming order shouldn't matter or be assumed to be consistent.
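
For example (just a quick sketch to illustrate the splitting I mean), the paragraph-mode settings do break the sample file up as expected:

$ awk -v RS= -F'\n' '{printf "record %d: %d fields, first field = %s\n", NR, NF, $1}' file
record 1: 2 fields, first field = Name: Smith, John
record 2: 2 fields, first field = Name: Jones, Mary
record 3: 2 fields, first field = Name: Mills, Pat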

How can I transform the data into an awk associative array with the least fuss?


Answer:

Method 1

We use split to break each field into two parts, the key and the value, splitting at the colon and any spaces that follow it. From these, we build the associative array a:

$ awk -F'\n' -v RS=  '{for (i=1;i<=NF;i++) {split($i,arr,/: */); a[arr[1]]=arr[2];} if (a["Age"]+0>40) print a["Name"];}' file
Smith, John
Mills, Pat
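
For readability, the same logic can be written as a standalone awk program (a sketch of the one-liner above; save it as, say, ages.awk — the name is just an example — and run it with awk -f ages.awk file):

BEGIN { RS = ""; FS = "\n" }        # blank-line separated records, one field per line
{
    for (i = 1; i <= NF; i++) {
        split($i, arr, /: */)       # split at the colon plus any following spaces: arr[1]=key, arr[2]=value
        a[arr[1]] = arr[2]
    }
    if (a["Age"] + 0 > 40)          # +0 forces a numeric comparison
        print a["Name"]
}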

Method 2

Here, we split fields at either a colon or a newline, so the odd-numbered fields are the keys and the even-numbered ones are the values. Note that each value keeps the whitespace that followed the colon, which is why the output below has a leading space:

$ awk -F':|\n' -v RS=  '{for (i=1;i<=NF;i+=2) {a[$i]=$(i+1);} if (a["Age"]+0>40) print a["Name"];}' file
 Smith, John
 Mills, Pat
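
If the leading space is unwanted, one option (a sketch, assuming a POSIX-style awk where a multi-character FS is treated as a regular expression) is to fold the optional whitespace into the field separator itself:

$ awk -F':[ \t]*|\n' -v RS=  '{for (i=1;i<=NF;i+=2) {a[$i]=$(i+1);} if (a["Age"]+0>40) print a["Name"];}' file
Smith, John
Mills, Pat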

Improvement

Is there a chance that any record will be missing a value? If so, we should clear the array a between records; otherwise a value left over from the previous record would be carried forward. In GNU awk this is easy: we just add a delete statement for the whole array:

awk -F':|\n' -v RS=  '{delete a; for (i=1;i<=NF;i+=2) {a[$i]=$(i+1);} if (a["Age"]+0>40) print a["Name"];}' file

With other awks, you may need to delete the array one element at a time:

for (k in a) delete a[k];
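
Putting it together, a portable variant of Method 2 with the per-record reset might look like this (a sketch; on the sample data it prints the same output as Method 2 above):

awk -F':|\n' -v RS=  '{for (k in a) delete a[k]; for (i=1;i<=NF;i+=2) {a[$i]=$(i+1);} if (a["Age"]+0>40) print a["Name"];}' file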
