linux,shell,command-line,awk,sed , AWK count number of times a term appear with respect to other columns


AWK count number of times a term appear with respect to other columns

Question:

Tag: linux,shell,command-line,awk,sed

Given a CSV file:

id, fruit, binary
1, apple, 1
2, orange, 0
3, pear, 1
4, apple, 0
5, peach, 0
6, apple, 1

How can i calculate for each unique values in fruit,

the number of times the binary value =1 / number of occurences of that fruit appearing in the fruit column ?

Another way to do it is to sum the value of the binary column for for each unique fruit.

For example:

For the fruit apple, it appeared with binary = 1 two times and had a frequency of 3. Hence i will get 2/3.

How can i write this in an efficient AWK code?

I know that i can do this to get unique values from the second column:

cut -d , -f2 file.csv | sort | uniq | 

or

awk '{ a[$2]++ } END { for (b in a) { print b } }' file.csv

So my non-working code looks like this:

 cat file.csv | awk '{ a[$2]++ } END { for (b in a) if ($3==1) {sum+=$3} END {print $0 sum}' 

and

awk '{ a[$2]++ } END { for (b in a) { sum +=1 } }' file.csv

need help in correcting my syntax and merging the 2 awk codes


Answer:

Almost same as the other answer, but printing 0 instead of blank.

AMD$ awk -F, 'NR>1{a[$2]+=$3;b[$2]++} END{for(i in a)print i, a[i], b[i]}' File
pear 1 1
apple 2 3
orange 0 1
peach 0 1

Taking , as field seperator. For all lines except the first, update array a. i.e $2(fruit name) is taken as index and adding up the number of times binary is 1 for this fruit. Also increase b[$2] by one, this will be the number of times the fruit is seen. At the end, print the fruit, binary count and num of times fruit seen. Hope it is clear.


Related:


AWK count number of times a term appear with respect to other columns


linux,shell,command-line,awk,sed
Given a CSV file: id, fruit, binary 1, apple, 1 2, orange, 0 3, pear, 1 4, apple, 0 5, peach, 0 6, apple, 1 How can i calculate for each unique values in fruit, the number of times the binary value =1 / number of occurences of that fruit...

Java read bytes from Socket on Linux


linux,windows,sockets,network-programming,raspberry-pi
I'm trying to send a file from my Windows machine to my Raspberry-Pi 2, and I have a client and a server. The client should be able to send a zip file over the network to my server on my linux machine. I know my client and server work on...

What does it indicate if /proc/PID/maps shows zero for all addresses?


linux,linux-kernel
I'm debugging a problem with a Linux DNS server. Curiously, when I look at /proc/PID/maps for the DNS server process, this is what I get: 00000000-00000000 r-xp 00000000 00:0e 2344 /usr/sbin/unbound 00000000-00000000 rw-p 00000000 00:0e 2344 /usr/sbin/unbound 00000000-00000000 ---p 00000000 00:00 0 00000000-00000000 rw-p 00000000 00:00 0 [heap] 00000000-00000000 rw-p...

linux running command as root from c code that run as normal user


c++,linux
I have a c++ code and I need to running from it a command to adjust the system time. so I thought using system("su root -c date hh:mm") command from my c++ code. The problem is that when I write 'su root -c date hh:mm' in the terminal its requires...

What are correct permissions for Linux Apache2 PHP 5.3 log file?


php,linux,apache,logging,permissions
I discovered the reason why I was not getting entries into my php_errors.log file related to permissions. Right now, I have set it to 666 (rw-rw-rw-) but surely this is a security weakness? Thus, my question. php.ini file: error_log /var/log/httpd/php_errors.log log_errors On # ls -ld /var/log /var/log/httpd /var/log/httpd/php_errors.log drwxr-xr-x 6...

Ignore first few lines and last few lines in a file Linux


linux,awk
I have a file like this and would like to print $0 except the first two and last three lines in linux. Tried awk command but no luck, is there any options I am using the following command - I suppose I am doing something wrong, but not able to...

How to append entry the end of a multi-line entry using any of stream editors like sed or awk


linux,bash,awk,sed,sh
I am trying to use a script to append the host name at the end of a multi-line entry of a specific Host_Alias field in sudoers file. The current sudoers file has an entry similar to : Host_Alias srv_linuxestate= \ host10,host12,host13,host1,host50,\ host16,host1,host2,host11,host15,host21,\ host3,host14 My required output would be something like...

Use Unix Executable File to Run Shell Script and MPKG File


osx,shell,unix
I have 2 shell scripts and 2 mpkg installer, I am trying to use an unix excitable file to run them all. here is the script I have, but it always has error message "No such file or directory" ? #!/bin/sh # Find the absolute script current path path=$( cd...

how to deletes line from a text file that are taken from another file [duplicate]


shell,awk,sed,grep,sh
This question already has an answer here: Remove duplicates from text file based on second text file 4 answers I have a data.txt file with a lot of lines in it and a lines.txt that contains some lines. I want to delete all lines from data.txt that match any...

ret_from_syscall source code and when it is called


linux,linux-kernel,kernel,linux-device-driver,system-calls
In the below call trace we see function called ret_from_syscall. Which function is this ? When it will called during system call ? Where is the corresponding code for this ? May 7 16:40:34.322086 warn TCU-0 kernel: [cf83ddc0] [00009751] 0x9751 (unreliable) May 7 16:40:34.322086 warn TCU-0 kernel: [cf83ddd0] [c00469ac] do_syslog+0x198/0x424...

How to extract first letters of dashed separated words in a bash variable?


linux,string,bash,shell,variables
I would like to extract the first letter of dashed separated words value of my bash variable, like this: MY_TEXT=this-is-my-custom-text I would like to create a second variable like this: MY_INITIALS=timct...

Force linux to use php as php55


php,linux,fedora
Helllo, I have a little problem here. I have PHP 5.3 installed and it's accessible via php command, I also have php 5.5 accessible as php55 command. Now I need to force linux to use php55 when I write php command. Simple way: How I switch the versions of php?...

Using an ad-hoc libc with a tool which is an argument of another tool


linux,shared-libraries
I need to use a particular libc to run a tool (cp). The problem is that this tool has to be used as argument of another tool (for example timeout) and I don't want to use the modified libc with this one. I tried to do: timeout 10 LD_LIBRARY_PATH=/path/to/mod/libc/ cp...

sed and PHP tags


regex,linux,sed
I have a problem using SED. I have a php file whit this structure in the very first line: <?php echo 'first' ?><?php echo 'second' ?><?php echo 'third';?> I'm trying to remove the first two statements and have as a result: <?php echo 'third';?> I've tried this code: sed -i...

AWK write to new column base on if else of other column


linux,bash,shell,awk,sed
I have a CSV file with columns A,B,C,D. Column D contains values on a scale of 0 to 1. I want to use AWK to write to a new column E base in values in column D. For example: if value in column D <0.7, value in column E =...

Missing one condition to display a string


shell
I am working on a small script that simply takes a parameter and read the output of ls -l and displays the user and name of the file that start with that parameter. For exemple : $> ls -l | ./script.sh "ok" John ok_file Mark ok_test Here is what the...

make error during building webkitgtk


linux,makefile,cmake,make
I use UBuntu 14.04 LTS. I need to build webkitgtk 2.8.3 Here is an example instruction which I have used: linuxfromscratch When I run sudo make -j8 I get following log: Scanning dependencies of target JavaScriptCore-4-gir Scanning dependencies of target fake-generated-webkitdom-headers [ 0%] Scanning dependencies of target WebKit2-fake-api-headers Scanning dependencies...

How do I silence the HEAD of a curl request while using the silent flag?


bash,shell,curl,command-line,pipe
When I run the curl command and direct the data to a file, I get back the content of the site as expected. $ curl "www.site.com" > file.txt $ head file.txt Top of site ... However, this command shows a progress bar, which I do not want: % Total %...

How does the kernel separate threads from processes


linux,multithreading,linux-kernel
Suppose I have a browser process like Firefox, that has pid = 123. Firefox has 5 opened tabs each running in a separate thread, so in total it has 5 threads. So I want to know in depth, how the kernel will separate the process into the thread to execute...

While loop in bash using variable from txt file


linux,bash,rhel
I am new to bash and writing a script to read variables that is stored on each line of a text file (there are thousands of these variables). So I tried to write a script that would read the lines and automatically output the solution to the screen and save...

using sed to replace a line with back slashes in a shell script


regex,bash,shell,ssh,sed
I am trying to replace the bottom one of these 2 lines with sed in a file. <rule>out_prefix=orderid ^1\\d\+ updatemtnotif/</rule>\n\ <rule>out_prefix=orderid ^2\\d\+ updatemtnotif/</rule>\n\ And the following command seems to do that when executed as a command at the bash prompt sed -i '[email protected]_prefix=orderid ^2\\\\d\\+ updatemtnotif/@out_prefix=orderid ^2\\\\d\\+ updatemtnotif_fr/@g' /opt/temp/rules.txt however, when...

fread(), solaris to unix portability and use of uninitialised values


c,linux,memory,stack,portability
Valgrind found the following error and I, after reading the documentation, the code and other questions in here couldn't figure it out why. Valgrind: first warning ~$ valgrind --vgdb=yes --vgdb-error=0 --read-var-info=yes --leak-check=yes --track-origins=yes debitadmin* debitadmin ==20720== Conditional jump or move depends on uninitialised value(s) ==20720== at 0x4013BC6: initialise (dbg.c:199) ==20720==...

How to test if a command is a shell reserved word?


bash,shell
I am writing a bash script and I would like to verify if a string is a shell reserved word (like if, for, alias, etc...). How can I can do this?...

How do I align my output?


shell
I'm writing a script that takes a parameter and reads the output of ls -l. It then displays the user and the name of the file STARTING with that parameter. For example : $> ls -l | ./script.sh "o" John ok_test RandomUser o_file My script works just fine, but I...

Using the shell provided with NodeJS


javascript,node.js,shell,require,node-modules
Once you've installed NodeJS, you'll have a executable in your computer named NodeJS which is a shell. I was wondering what can I do with that... here you're able to run JS code as, for example, in the browser's console, great. Now, is it possible to require modules in that...

Get system startup time (without reading /proc/uptime)


php,linux
I cannot open /proc/uptime due to open_basedir restriction. The command uptime is too old and doesn't have the -s flag support. How can I - in PHP - get time when the server started? My current code is this, but it does not work on the production server (for the...

Matching string inside file and returning result


regex,string,bash,shell,grep
I've got a few peculiar issues with trying to search for a string inside of a .db file. The way I tried was by using grep, which does apparently find the string(s), although this is the output: $ grep "ext" *.db Binary file enormous.db matches There are a couple problems...

linux - running a process and tailing a file simultaneously


bash,shell,tail
I want a run a long task on a remote machine (with python fabric using ssh). It logs to a file on the remote machine. What I want to do is to run that script and tail (actively display) the log file content until the script execution ends. The problem...

Linux-wget command


linux,shell,wget
I need a quick help on customizing my wget command in a shell script: The wget command looks something like this: wget http://infamvn:8081/nexus/content/groups/LDM_REPO_LIN64/com/infa/com.infa. products.ldm.ingestion.server.scala/10.0.0.135.527-SNAPSHOT/com.infa.products.ldm.ingestion.server.scala-10.0.0.135.527-20150622.210643-1-sources.jar Here I'd like to add the 10.0.0.135.527 in a variable, so I created a script something like this: n = 10.0.0.135.527 wget...

shell script cut from variables


bash,shell,shellcode
The file is like this aaa&123 bbb&234 ccc&345 aaa&456 aaa$567 bbb&678 I want to output:(contain "aaa" and text after &) 123 456 I want to do in in shell script, Follow code be consider #!/bin/bash raw=$(grep 'aaa' 1.txt) var=$(cut -f2 -d"&" "$raw") echo $var It give me a error like...

Why can I view some Unix executable files in Mac OS X and not others?


git,bash,shell,unix,binary
I am on a Macbook Pro on Mac OS X 10.10 (Yosemite). When I go to /usr/bin, git is there as a unix executable file. When I open it up in Sublime Text, all I get is unreadable machine code. However, when I open up a different Unix executable file—in...

Identifying when a file is changed- Bash


bash,shell,unix
So in my bash shell script, I have it running through a for loop. Inside the for loop, I use find "$myarray[i]" >> $tmp to look for a certain directory each time through the loop. Sometimes, it finds the variable in myarray[i] and sometimes it doesn't. When it does find...

sed string with special character New


linux,sed,special-characters
when i use this script to replace : mrm.fr.mycompany.com by 10.70.89.40:8081/artifactory sed -i -e "s/mrm.fr.mycompany.com/10.70.89.40:8081/artifactory/g" config.xml i have the error : sed: -e expression n°1, caractère 41: option inconnue pour `s' can anyone help me thanks in advance regard, Youssef...

How can I resolve the “Could not fix timestamps in …” “…Error: The requested feature is not implemented.”


linux,build,f#
I have been trying to build a project in F# on Linux that I have located here on github. It's a basic kata project that I am working on as a demo. However on Linux (specifically Ubuntu 14.04 LTS Desktop) I haven't been able to get it to build yet...

Delete some lines from text using Linux command


linux,shell,sed,grep,pattern-matching
I know how to match text using regex patterns but not how to manipulate them. I have used grep to match and extract lines from a text file, but I want to remove those lines from the text. How can I achieve this without having to write a python or...

Bash modify CSV to change a field


linux,bash,awk
I have a very big CSV file (aprox. 10.000 rows and 400 columns) and I need to modify certain columns (like 15, 156, 220) to change format from 20140321132233 to 2014-03-21 13:22:33. All fields that I need to modify are datetime. I saw some examples using awk but for math...

NASM: copying a pointer from a register to a buffer in .data


linux,assembly,nasm,x86-64
I am new to asm. I am trying to copy a pointer from a register to a .data variable using NASM, on linux 64-bit. Concider this program: section .data ptr: dq 0 section .text global _start _start: mov [ptr], rsp mov rax, 60 mov rdi, 0 syscall Here I try...

Extra backslash when storing grep in a value


linux,bash
In a bash script I have: Check="grep -e '"'\(-S mount\)'"' /etc/audit/audit.rules" set -x When you run it it shows it as: CHECK='grep -e '\''\(-S mount\)'\'' /etc/audit/audit.rules' Now it works exactly what I want but I want to understand it. Why is there 2 extra \'s?...

Syncing Vagrant VMs across different physical servers


linux,vagrant,backup,virtual-machine,sync
I'm using Vagrant to deploy my VMs and my current setup looks like this: server1 = VM1, VM2, VM3 ( main production server ) server2 = VM1, VM2, VM3 ( backup server ) My questions is, can I somehow sync the VMs across the different physical servers in case one...

Linux - sh script - download multiple files from FTP


linux,ftp,sh
I need script that can download files from ftp via my sh code I have use expect with ftp, but if I do for loop inside code, I got wrong # args: should be "for start test next command" while executing "for v in "a b c"" My code /usr/bin/expect...

how to modify an array value with given index?


arrays,linux,bash
I want to modify an array cell, which I can do when I know the cell as a number. However here my cell position is given by $i. pomme[`${i}`]="" I tried without the `` and it doesn't work either? How am I suppose to do it?...

How to extract single-/multiline regex-matching items from an unpredictably formatted file and put each one in a single line into output file?


linux,shell,unix,replace,grep
I have a very huge file which looks like this: <a>text</a>text blah <b>data1</b>abc<b>data2</b> <b>data3</b>blahblah <c>text</c> <d>text</d> <x>blahblah<b>data4 data5 data6</b> <b>data7 </x> That is, its formatting is unpredictable. I need to extract each <b>...</b> item (it might contain multiline text!) and put every one of them in a single separate line....

Django MySQLClient pip compile failure on Linux


python,linux,django,gcc,pip
Django documentation as of v1.8 recommends using mysqlclient connector for the framework. I'm attempting to pip install the package on Ubuntu 14.04 with Python 3.4 and running into a GCC error that I'm unable to find reference to. I'm not an expert on compiling software, so was hoping somebody can...

Git post-receive hook is not executed


linux,git,githooks,git-post-receive
The following post-receive hook: #!/bin/bash echo "-> Post-receive test" is not executed when pushing to my remote repository. The remote is ssh://[email protected]:2222/home/git/repo.git (it's a VM) and works, as when I manually checkout it I see the modifications I've made. Some additional informations ... $ ls -al /home/git drwxr-xr-x 7 git...