awk: how to append lines in a file until finding a file separator symbol?

windows,awk,gawk
Am running GnuWin32 under Windows 7. Have file with this structure: |<text_0> <text_1> <text_2> until <text_16> |<text_0> <text_1> <text_2> until <text_12> |<text_0> <text_1> <text_2> until <text_31> < more of the same > There is a variable number of lines between lines that begin with the pipe (the separator symbol). Desired...

calculating sum and average for selected data set only

awk,gawk
I have a dataset as below:
col-1 col-2 col-3 col-4 col-5 col-6 col-7 col-8
0 17 215 55.7059 947 BMR_42 O22-BMR_1 O23-H23
1 1 1 1.0000 1 BMR_42 O23-BMR_1 O23-H23
2 31 3 1.0968 34 BMR_31 O22-BMR_1 O26-H26
3 11 2 1.0909 12 BMR_31 O13-BMR_1 O26-H26
4 20 5 1.8500...

Multiple awk programs in one go

awk,sed,gawk
Say for data file sampleTest1.dat Type,^*p,[email protected]_!d,C^*[email protected]^*de,[email protected]_Pr^*f!t_Center,USF_BrC^*de,D!v_Nbr,[email protected]^*rt,[email protected]^*per,@ddress1,@ddress2,_C!ty,Reg!^*n,P^*[email protected]^*de,C^*untry,Ph^*ne,@C_Type,[email protected][email protected][email protected] @C,1,1220,1410,US0001,,,[email protected]^*N,"_ [email protected]^*N, LLC",44 [email protected] STREET,,[email protected]^*N,MS,320683,[email protected],60135411,B,Y @C,3,1225,1400,US0003,,,[email protected]^*NV!LLE,"_ [email protected]^*NV!LLE, !NC.",15 LEW!S [email protected] R^*@D,,[email protected]^*NV!LLE,FL,32540000,U[email protected],,B,Y @C,4,1095,1400,US0004,,,[email protected]@#L [email protected]!F^*[email protected],"_ [email protected] [email protected]!F^*[email protected],...

Using Gawk and Printf in a Bash script

bash,scripting,printf,file-handling,gawk
I am trying to separate a file into smaller files with gawk and rename the smaller files in order from the original file. for i in *.txt do gawk -v RS="START_of_LINE_to_SEPARATE" 'NF{ print RS$0 > "new_file_"++n".txt"}' $i done The output gives me: new_file_1.txt new_file_2.txt etc... I would like the output...

initialising field separators on condition in awk

awk,gawk
I know that initialising FS in BEGIN is the correct practice, but what if I need different field separators for different lines (lines containing a particular pattern)? E.g. my awk script is {if($0 ~ /.*youtube.*/){FS="=";print $2}} This code is not processing the first line. How to fix this?...
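
One hedged way around this: an assignment to FS generally only affects records read after it, so the line that triggered the change has already been split with the old separator. A minimal sketch, assuming the goal is the second "="-separated piece of each matching line, splits the line explicitly instead of touching FS. The function name youtube_field and the sample input are hypothetical.

```shell
# Hypothetical helper: for lines containing "youtube", print the second
# "="-separated piece without changing FS.
youtube_field() {
    awk '/youtube/ { n = split($0, part, "="); if (n >= 2) print part[2] }'
}

# Example: prints "abc"; the second line does not match and is skipped.
printf 'http://youtube.com/watch?v=abc\nother=1\n' | youtube_field
```

Because split() works on the current record, the first matching line is handled like every other one.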

ioStat and Awk Output and Format with Megabyte and end of each field

awk,format,gawk,iostat,megabyte
Would it be possible to add a "Mb" to the end of just MB_r/s and MB_wn/s. Awk is getting 3 fields and reporting them during the test line to line like below: example: Format output below: ^ ^ 90, 11 Kb, 12 Kb, 101253, 1.45, 1890.77, 427911.58 74, 11 Kb,...

comparison of two files & mark changed fields.

awk,gawk
This is an extended question from awk-to-compare-two-file-by-identifier-output-in-a-specific-format. As specified, let's say two files have some differences: file 1 a||d||f||a 1||2||||4 file 2 a||d||f||a 1||1||3||4 1||2||r||f where the desired output will be 1||1#2||3#||4 1||2||r||f where you can see I only want to compare the files & print accordingly, such as if...

normalize column data with maximum value of that column

awk,gawk
I have a data file with two columns. I want to find the maximum value in the second column and divide each entry of the second column by that maximum. (So all the entries in the second column will be <= 1.00.) I tried with this command below:...
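
The normalization described above can be sketched as a two-pass awk call over the same file: the first pass finds the maximum of column 2, the second divides by it. This is only an illustration under assumptions: whitespace-separated columns, a regular file that can be read twice, and two-decimal output. The function name normalize_col2 is hypothetical.

```shell
# Hypothetical helper: divide column 2 of the file named in $1 by the
# column maximum. The file is passed to awk twice: NR == FNR is true
# only while the first copy is being read.
normalize_col2() {
    awk 'NR == FNR { v = $2 + 0; if (NR == 1 || v > max) max = v; next }
         { printf "%s %.2f\n", $1, (max != 0 ? $2 / max : 0) }' "$1" "$1"
}

# Usage (data.txt is a placeholder name):
#   normalize_col2 data.txt
```

Reading the file twice keeps memory use flat; storing column 2 in an array and dividing in an END block would be the single-pass alternative.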

awk to process the first two lines then the next two and so on

bash,unix,awk,gawk
Suppose I have a file which I created from two files, one the old file & the other the updated file, by using cat & sort on the primary key. File1 102310863||7097881||6845193||271640||06007709532577|||| 102310863||7097881||6845123||271640||06007709532577|||| 102310875||7092992||6840808||023740||10034500635650|||| 102310875||7092992||6840818||023740||10034500635650|||| So the pattern of this file is line 1 = old value & line 2 =...

awk if statement with simple math

unix,awk,gawk
I'm just trying to do some basic calculations on a CSV file. Data: 31590,Foo,70 28327,Bar,291 25155,Baz,583 24179,Food,694 28670,Spaz,67 22190,bawk,4431 29584,alfred,142 27698,brian,379 24372,peter,22 25064,weinberger,8 Here's my simple awk script: #!/usr/local/bin/gawk -f BEGIN { FPAT="([^,]*)|(\"[^\"]+\")"; OFS=","; OFMT="%.2f"; } NR > 1 END { if ($3>1336) $4=$3*0.03; if ($3<1336) $4=$3*0.05;}1 Wrong output: 31590,Foo,70...
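
A possible rewrite of the rule, as a sketch only: END runs once after all input has been read, so the per-line arithmetic belongs in an ordinary rule. The sketch assumes plain comma-separated fields with no quoted commas (so a simple FS stands in for the original FPAT), no header line, and a value of exactly 1336 treated like the lower tier. The function name add_rate is hypothetical.

```shell
# Hypothetical helper: append a 4th field computed from field 3.
# Values above 1336 get 3%, everything else 5% (assumption for == 1336).
add_rate() {
    awk 'BEGIN { FS = OFS = "," }
         { $4 = ($3 > 1336) ? $3 * 0.03 : $3 * 0.05 }
         1'
}

# Example with two rows from the question's data:
printf '31590,Foo,70\n22190,bawk,4431\n' | add_rate
```

Assigning to $4 makes awk rebuild the record with OFS, and the trailing 1 prints every line.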

Trim first N bases in multi fasta file with awk and print with max width format

awk,gawk,fasta
Background The multi fasta format contains several records of sequences; each record begins with a single-line description, followed by several lines of sequence (RNA, DNA, protein). The description line has a greater-than symbol at the beginning; following the ">" is the identifier of the sequence, and the rest of the line contains...

Add filename to appended text file in shell script above pasted text

osx,bash,shell,terminal,gawk
Working on a script (Mac OSX Yosemite, working in Terminal), where I analyse a lot of files and where I want to add the output of the script in one text file. gawk -f ./user/filter.dat ./user/outputpre1.out >> ./user/outputpost.out gawk -f ./user/filter.dat ./user/outputpre2.out >> ./user/outputpost.out For interpretation finding the source of...

use awk to group and calculate data

csv,awk,gawk,bsd
I'd like to use gawk to group some data and do calculations on a csv file. Raw sample data: 2600,AEIOU-2600,stack,2,04/01/2015,C C S,S,10.65 2600,AEIOU-2600,stack,3,04/20/2015,C C R,S,100 2600,AEIOU-2600,stack,1,04/28/2015,C C R,S,1.07 2600,AEIOU-2600,stack,4,04/29/2015,C C R,S,200 2601,"over, L.P. - 00001",stack,0,04/01/2015,C C S,s,50 2601,"over, L.P. - 00001",stack,1,04/01/2015,C C S,s,16.43 2601,"over, L.P. - 00001",stack,2,04/10/2015,D C S,s,17.16 2602,UEIA,stack,2,04/19/2015,C...

Using awk on multiple input files

bash,awk,gawk
There's a bash script I've been working on and within this script at some point, I have been trying to figure out how to process two CSV files at once using awk, which will be used to produce several output files. Shortly, there's a main file which keeps the content...

Repeating the format specifiers in awk

awk,printf,gawk
I am trying to format the output of the AWK's printf() function. More precisely, I am trying to print a matrix with very long rows and I would like to wrap them and continue on the next line. What I am trying to do is best illustrated using Fortran. Consider...

Calculating ratio value within a line which contain binary numbers “0” & “1”

awk,gawk
I have a data file which contains more than 2000 lines and 45001 columns. The first column is actually a "string" which explains the data type. Starting from column #2, up to column #45001, the data is represented as "1" or "0". For example, the pattern of data in a...

awk to compare two file by identifier & output in a specific format

awk,pattern-matching,gawk
I have 2 large files I need to compare, all pipe delimited: file 1 a||d||f||a 1||2||3||4 file 2 a||d||f||a 1||1||3||4 1||2||r||f Now I want to compare the files & print accordingly, such as if any update found in file 2 will be printed as updated_value#oldvalue & any new line added...

awk: how to include file names when concatenating files?

regex,awk,gawk
Am running GNUwin32 under windows 7. Have many files in a single directory with file names that look like this: chem.001.txt chem.002.b4.txt chem.003.md6.txt (more files.txt) ... In their current form, none of the files includes the file name. Need to clean these files for further use. Want to concatenate all...

awk calculations based on field value

csv,awk,gawk,bsd
BIG EDIT I had a misunderstanding on the calculations so I'll need to re-do this. Raw sample data: 2600,AEIOU-2600,stack,01,04/28/2015,C C R,S,1.07 2600,AEIOU-2600,stack,02,04/01/2015,C C S,S,10.65 2601,"over, L.P. - 00001",stack,01,04/01/2015,C C S,s,16.43 2601,"over, L.P. - 000001",stack,02,04/01/2015,D C S,s,17.16 2602,UEIA,stack,01,04/28/2015,C C R,s,10 2602,UEIA,stack,02,04/28/2015,C C R,s,20 Field $1:numeric Field $2: name Field $3: account...

awk to match two files and output the difference with column name & value

awk,gawk
I want to compare two files with awk and output the unmatched columns. What I have tried till now: awk -F',' 'FNR==NR{ for(i=0;i<=NF; i++){ a[i]; }next; } for (i=1;i<=NF;i++){ if($i in a){ {printf("Match: %s , col-> %d" $i,i-1)} else {printf("UN- Match: %s , col-> %d" $i,i-1) } } }' but...

awk error: each rule must have a pattern or an action part

linux,gawk
From here I got the following command: awk '/em1/ {i++; rx[i]=$2; tx[i]=$10}; END{print rx[2]-rx[1] " " tx[2]-tx[1]}' \ <(cat /proc/net/dev; sleep 1; cat /proc/net/dev) which is fully working as intended. However, I would like the output in Mbps, so I created 2 commands, one for upload, one for download (both working):...

AWK: go through the file twice, doing different tasks

awk,gawk
I am processing a fairly big collection of Tweets and I'd like to obtain, for each tweet, its mentions (other user's names, prefixed with an @), if the mentioned user is also in the file: users = new Dictionary() for each line in file: username = get_username(line) userid = get_userid(line)...

how to use a regular expression in awk or sed to find all homopolymers in a DNA sequence?

regex,awk,sed,bioinformatics,gawk
Background Homopolymers are sub-sequences of DNA with consecutive identical bases, like AAAAAAA. Example in python to extract them: import re DNA = "ACCCGGGTTTAACCGGACCCAA" homopolymers = re.findall('A+|T+|C+|G+', DNA) print homopolymers ['A', 'CCC', 'GGG', 'TTT', 'AA', 'CC', 'GG', 'A', 'CCC', 'AA'] my effort I made a gawk script that solves the...

How to use “awk” command to retrieve particular field, but the field number is given in runtime?

bash,awk,gawk
I'm struggling to solve this. ./analyze.sh 9 Inside Script: lField=$1 (Command line argument) cat "$(pwd)/results1/DSFTPTCPstats.6" | awk '{ print $lField }' I want the values of the 9th field but it doesn't work. Tried different combinations in the 'print' section, can't solve... Would be really helpful. Thanks...
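
A common approach, sketched here under assumptions: the shell does not expand $lField inside single quotes, so awk sees its own unset variable lField and $lField becomes $0, the whole line. Passing the number in with awk's -v option avoids that. The function name print_field and the awk variable n are hypothetical.

```shell
# Hypothetical helper: print field number $1 of each input line.
# -v copies the shell value into the awk variable n before the program runs.
print_field() {
    awk -v n="$1" '{ print $n }'
}

# Example: prints "c"
printf 'a b c d\n' | print_field 3
```

In the script from the question, the same idea would read awk -v n="$lField" '{ print $n }' applied to the stats file.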

using awk for subtraction

awk,gawk
Similar post here: awk if statement with simple math Below is great but now I need to subtract 20 from field $4 if it's less than 20 and if it's greater than 20, field five can be set to 0. 31590,Foo,70,3.5 28327,Bar,291,14.55 25155,Baz,583,29.15 24179,Food,694,34.7 28670,Spaz,67,3.35 22190,bawk,4431,132.93 29584,alfred,142,7.1 27698,brian,379,18.95 24372,peter,22,1.1 25064,weinberger,8,.04...

How to match and change strings in a column of a semicolon separated file?

regex,awk,sed,gawk
I have a semicolon separated csv-file which looks like this: column1;column2;;123564;128;;IJL;value;;;;;3705;;;;;;;; column1;column2;;26789786413423;;CCE;value value;;;;;;3705;;;;;;;; column1;column2;;4564564;128;;SSE;value;;;;;;;;;;;;; column1;column2;;4645646;128;;JJY;someting X;;;;;;;;;;;;; column1;column2;;123132;128;;ASA;X value;;;;;;;;;;;;; column1;column2;;45643123;128;;TT;9 someting;;;;;;;;;;;;; column1;column2;;456464;128;;KK;VALUE 9 VALUE;;;;;;;;;;;;; column1;column2;;4646;128;;ST;value 6;;;;;;;;;;;;;...

Stable sorting two files into one with the duplicates

bash,sorting,unix,awk,gawk
I've been trying to sort two files and get the output. say for file 1: 102310863||7097881||6845123||271640||06007709532577|||| 102310875||7092992||6840818||023740||10034500635650|||| and file 2: 102310863||7097881||6845193||271640||06007709532577|||| 102310875||7092992||6840808||023740||10034500635650|||| The desired output is: 102310863||7097881||6845123||271640||06007709532577|||| 102310863||7097881||6845193||271640||06007709532577|||| 102310875||7092992||6840818||023740||10034500635650||||...

Awk, little endian order and 4 hex digits

awk,hex,gawk,little-endian,dec
I suppose that I have a decimal number, e.g., 97254 ---> 00017BE6 (hex value) using: echo "" | awk '{printf("%08X", 97254)}' Now, if I want to convert hex number (00017BE6, in this case) into 4 numbers of 2 digits (max 8 numbers in input) in little endian order and CSV...

awk to ignore double quote and compare two files

awk,gawk
I have two input files: FILE 1 123 125 123 129 and file 2 "a"|"123"|"anc" "b"|"124"|"ind" "c"|"123"|"su" "d"|"122"|"aus" OUTPUT: "b"|"124"|"ind" "d"|"122"|"aus" Now how can I compare and print the difference of $1 from file1 and $2 from file2? I'm having trouble because of the double quote ("). So how can I...
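
One way this comparison could be done, as a sketch under assumptions: file 1 holds one key per line, file 2 is pipe-delimited with the key in its second field, and stripping the double quotes from a copy of that field is enough to make the keys comparable. The function name print_unmatched is hypothetical.

```shell
# Hypothetical helper: print lines of the second file whose second
# pipe-separated field, with double quotes removed, is not a key in the first file.
print_unmatched() {
    awk 'NR == FNR { seen[$1]; next }        # first file: remember each key
         { key = $2; gsub(/"/, "", key) }    # second file: unquote a copy of field 2
         !(key in seen)' "$1" FS='|' "$2"
}

# Usage (file1 and file2 are placeholder names):
#   print_unmatched file1 file2
```

The FS='|' operand between the two file names switches the separator only for the second file, and working on a copy of $2 leaves the printed line untouched. Note that the NR == FNR test misfires when the first file is empty.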

deleting header lines with no following content lines using awk

awk,sed,gawk
I think I've done this a couple of times but I can't do it this morning. I have a file like this for example. (this is the result of comparison of many files using foreach and diff, with file names enclosed with ### pattern) << file gg >> ### ./translations/qt_fr.ts...

Confusing backslash before awk, what does it mean?

bash,hadoop,awk,gawk
#!/usr/bin/env bash for year in all/* do echo -ne `basename $year .gz`"\t" gunzip -c $year | \ awk '{ temp = substr($0, 88, 5) + 0; q = substr($0, 93, 1); if (temp !=9999 && q ~ /[01459]/ && temp > max) max = temp } END { print max...

how to swap lines with awk with only a single pass and limited memory use?

awk,swap,gawk
In a previous post, this answer was shown: answer user2138595. Though beautiful, the problem is that it has to read the input file twice. I wish to make a GNU awk script that reads the input only once. cat swap_line.awk you get BEGIN { if(init > end){ exit 1; } flag...

AWK - Search for a pattern-add it as a variable-search for next line that isn't a variable & print it + variable

regex,linux,awk,sed,gawk
I have a given file: application_1.pp application_2.pp #application_2_version => '1.0.0.1-r1', application_2_version => '1.0.0.2-r3', application_3.pp #application_3_version => '2.0.0.1-r4', application_3_version => '2.0.0.2-r7', application_4.pp application_5.pp #application_5_version => '3.0.0.1-r8', application_5_version => '3.0.0.2-r9', I would like to be able to read this file and search for the string ".pp" When that string is found, it...

awk unmatched with blank file

bash,awk,gawk
Recently I've encountered a strange behavioral problem with awk. Say I have two files, one blank & the other populated with data, and I apply a simple unmatched code: awk -v var=0 'NR==FNR{a[$var]++;next} !($var in a)' file1 file2 say file1 & file 2 a b v...

How to select range for two operations?

awk,gawk
I want to calculate the average and standard deviation for a data set of 60000 records (so NR=60000). My data set file has two columns and my concern is column #2. I want to use "awk" to do the job. The script goes as below: awk ' { sum+=$2; array[NR]=$2 } END...