
awk select columns from input list


Tags: parsing, csv, awk

I'd like an awk script to select the columns from a file based on a list of columns in another file. For example:

$cat cols
3 2 6 4

$cat text
a b c d e f g
h i j k l m n

$awk_script cols text
c b f d
j i m k

So the 3rd, 2nd, 6th and 4th columns are selected, in that order.



You can use this:

awk 'NR==FNR{n=split($0,c);next}{for(i=1;i<=n;i++){printf "%s%s", $c[i], OFS};print ""}' cols text

We pass two input files to awk: first cols, then text. awk keeps the total number of input records read so far in the built-in variable NR, while FNR holds the record number within the current file and restarts at 1 for each new file. The condition NR==FNR is therefore true only while awk is reading the first file, so the first block runs for the single line of cols (where NR and FNR are both 1) and never for the lines of text.
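To see how the two counters behave, here is a small sketch that prints FILENAME, NR and FNR for every input line. The function name nr_fnr_demo and the throwaway directory are made up for illustration; the file contents are the samples from the question.

```shell
# Hypothetical helper: recreates the sample files in a temporary directory
# and prints the file name, NR and FNR for each line awk reads.
nr_fnr_demo() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    printf '3 2 6 4\n' > cols
    printf 'a b c d e f g\nh i j k l m n\n' > text
    awk '{ print FILENAME, NR, FNR }' cols text
    cd / && rm -rf "$dir"
)

nr_fnr_demo
# cols 1 1
# text 2 1
# text 3 2
```

NR and FNR agree only on the line that comes from cols, which is why NR==FNR works as a "first file only" test here.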

{n=split($0,c);next} splits the whole line, which is stored in $0, into the array c using the field separator FS (whitespace by default) and saves the number of columns to print in n. We use n later in the for loop. next tells awk to stop processing the current line and move on to the next line of input, so the second block is skipped for cols.
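As a quick check of what split() returns, this sketch feeds the sample column list through awk and prints the element count together with the first and last array entries. The function name split_demo is only illustrative.

```shell
# Hypothetical helper: split() fills the array c (indices start at 1)
# and returns the number of elements it stored.
split_demo() {
    printf '3 2 6 4\n' | awk '{ n = split($0, c); print n, c[1], c[n] }'
}

split_demo
# 4 3 4
```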

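The block {for(i=1;i<=n;i++){printf "%s%s", $c[i], OFS};print ""} is executed on all other lines, i.e. every line of text, since it has no condition in front of it. The for loop iterates over the column numbers stored in c and prints the corresponding fields, each followed by the output field separator OFS (a single space by default), so every output line ends with a trailing separator. Finally, print "" emits the newline.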


Output:

c b f d
j i m k
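If you want something you can call like the awk_script in the question, one option is to wrap the program in a shell function. This is only a sketch: the name select_cols is made up, and it deviates slightly from the one-liner by printing ORS after the last column instead of OFS, so the output lines carry no trailing separator. It also assumes cols contains exactly one non-empty line.

```shell
# Hypothetical wrapper: select_cols COLS_FILE TEXT_FILE
# Prints the fields of TEXT_FILE listed in COLS_FILE, in that order.
select_cols() {
    awk '
        NR == FNR { n = split($0, c); next }   # first file: remember the column list
        {
            for (i = 1; i <= n; i++)           # other lines: print the chosen fields
                printf "%s%s", $c[i], (i < n ? OFS : ORS)
        }
    ' "$1" "$2"
}

# Usage with the sample data from the question, in a throwaway directory.
dir=$(mktemp -d)
printf '3 2 6 4\n' > "$dir/cols"
printf 'a b c d e f g\nh i j k l m n\n' > "$dir/text"
select_cols "$dir/cols" "$dir/text"
rm -rf "$dir"
# c b f d
# j i m k
```

To read or write another delimiter, set it for awk inside the function, for example by adding -F';' -v OFS=';' to the awk call.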


How to set up XPath query for HTML parsing?

Here is some HTML code from in Google Chrome that I want to parse the website for some project. <div id="names"> <h2>Names and Synonyms</h2> <div class="ds"><button class="toggle1Col"title="Toggle display between 1 column of wider results and multiple columns.">&#8596;</button> <h3 id="yui_3_18_1_3_1434394159641_407">Name of Substance</h3> <ul> <li id="ds2"> `` <div>Acetaldehyde</div> </li> </ul>...

Create an external Hive table from an existing external table

I have a set of CSV files in a HDFS path and I created an external Hive table, let's say table_A, from these files. Since some of the entries are redundant, I tried creating another Hive table based on table_A, say table_B, which has distinct records. I was able to...

Resampling and merging data frame with python

Hi I have created a dictionary of dataFrame with this code import os import pandas import glob path="G:\my_dir\*" dataList={} for files in glob.glob(path): dataList[files]=(read_csv(files,sep=";",index_col='Date')) The different dataframe present in the dictory have different time sample. An example of dataFrame(A) is Date Volume Value 2014-01-04 06:00:02 6062 108000.0 2014-01-04 06:06:05 6062...

Specific rows from CSV as dictionary and logic when keys are the same - Python

abc11 bvc ex 123 456 somestuffhere abc11 bvc ex 456 476 somestuffhere abc12 bvc ex 173 426 somestuffhere abc12 bvc ex 426 496 somestuffhere abc13 bvc ex 143 796 somestuffhere abc13 bvc ex 743 896 somestuffhere I am trying to put the above CSV file as a dictionary, {'abc11':['123','476'],'abc12':['173','496'],'abc13':['143','896']}. I...

How to pivot array into another array in Ruby

I have a multidimensional array like this one : myArray = [["Alaska","Rain","3"],["Alaska","Snow","4"],["Alabama","Snow","2"],["Alabama","Hail","1"]] I would like to end up with CSV output like this. State,Snow,Rain,Hail Alaska,4,3,nil Alabama,2,nil,1 I know that to get this outputted to CSV the way I want it I have to have output array like this: outputArray =[["State","Snow","Rain","Hail"],["Alaska",4,3,nil],["Alabama",2,nil,1]]...

jquery get elements by class name

I'm using Jquery to get a list of elements having a class "x". html: <p class="x">Some content</p> <p class="x">Some content#2</p> If we use Jquery to get both these html elements and do something with it- we use something like: $(".x").text("changed text"); This will change the text of both the paragraphs....

CSV File header part is coming parsing by LINQ

the below way i am parsing csv file by LINQ but i found header part is coming when i inspect user class data. what is wrong there in code. var csvlines = File.ReadAllLines(filename); // IEnumerable<string> var csvLinesData = csvlines.Select(l => l.Split(',').Skip(1).ToArray()); // IEnumerable<string[]> int flag = 0; var users =...

How to get xml attribute and values using JAXB

I am new in Jaxb i have one xml file which contain many attribute so i want the attribute with value My XMl <message_mapping> <message Rtype="DIAGNOSTIC" direction="2" name="Diagnostic" mode=""> <field tag="USERNAME" source="I" tranData="username" required="false" dataType="string" defaultValue="" /> <field tag="PASSWORD" source="I" tranData="password" required="true" dataType="string" defaultValue="" /> <field tag="LOCALDATETIME" source="E" tranData="trxDateTime" required="true"...

export mysql query array using fputcsv

I am trying to export the results of a sql query to CSV using fputcsv however keep getting the error "fputcsv() expects parameter 2 to be array, string given". I followed the answer given here - Query mysql and export data as CSV in PHP - but it throws the...

How to get a sub parameter of JSON

Sorry for the ambiguous title but I don't know how to explain better. Anyway, I made a code for parse a Json in c#, this structure: { "_links": { "self": { "href": "" }, "teams": { "href": "" }, "fixtures": { "href": "" }, "leagueTable": { "href": "" } },...

Getting error in JSON parsing with array in iOS

I am newly to iPhone, basically in my application I am parsing below JSON model and need to save it in my custom Entity class. JSON model : { "Products": [ { "Pcs": [ { "product_id": 2, "product_name": "MyProduct", "category": { "Clamshells": [ { "product_category_id": 11, "product_category_name": "MyProductCategory", "Sub_category": [...

How can I use a variable to get an Input$ in Shiny?

I am new to R and I am creating a shiny application to read a csv and filter data. I am reading the csv file, then creating dropdowns with a loop using the column names and the unique values: output$dropdowns <- renderUI({ if (is.null(x())) { return(NULL) } lapply(1:ncol(x()), function(i) {...

Compare 2 csv files and output different rows to a 3rd CSV file using Python 2.7

I am trying to compare two csv files and find the rows that are different using python 2.7. The rows are considered different when all columns are not the same. The files will be the same format with all the same columns and will be in this format. oldfile.csv ID...

Python: isolating results

So I have this code (probably super inefficient, but that's another story) that is pulling urls from html code of a blog. I have the html in a .csv, which I am putting into python, then running the regex to get the urls. Here is the code: import csv, re...

How Do I Transform This CSV / Tabular Data Into A Different Shape?

I have a sparse n-column spreadsheet where the first two columns describe a person and the rest of the (n-2) columns track RSVP and attendance data for various events (each of which take up one column). It looks like this: PersonID, Name, Event29108294, Event01838401, Event10128384 12345, John Smith, Registered -...

Saying there are 0 arguments when I have 2? Trying to open a CSV file to write into

I'm trying to read from a CSV file and codify people into groups using an equation. I append the name of their group they fall into to the end of the array that their row creates. Then I write it to a new file so I don't overwrite the original...

BASH - conditional sum of columns and rows in csv file

i have CSV file with some database benchmark results here is the example: Date;dbms;type;description;W;D;S;results;time;id Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;570;265;50 Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;420;215;50 Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;500;365;50 Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;530;255;50 Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;870;265;99 Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;620;215;99...

type conversion performance optimizable?

The following snippet converts xml data to csv data in a data processing application. element is a XElement. I'm currently trying to optimize the performance of the application and was wondering if I could somehow combine the two operations going on below: Ultimately I still want access to the string...

How to translate parts of source program to library calls without writing a full parser?

To give an example: Say I have a very simple library that allows C code to be called from another language L. In order to use your C code from L you need to change certain constructs in your C code such as changing function types to void, replacing function...

Compare 2 seperate csv files and write difference to a new csv file - Python 2.7

I am trying to compare two csv files in python and save the difference to a third csv file in python 2.7. import csv f1 = open ("olddata/file1.csv") oldFile1 = csv.reader(f1) oldList1 = [] for row in oldFile1: oldList1.append(row) f2 = open ("newdata/file2.csv") oldFile2 = csv.reader(f2) oldList2 = [] for...

Convert delimited string to array and group using LINQ in C#

I have a string that has a delimited format like this: orgname: firstname lastname, firstname lastname; (this can repeat with orgnames and variable number of names for each org) Example: **XXX University**: Martha Zander, Rick Anderson; **Albert School**: Nancy Vanderburg, Eric Towson, George Branson; **Hallowed Halls**: Jane Goodall, Ann Crabtree,...

Create XSD based on root element

I have a XSD as shown below , i need to extract all the root Elements in the XSD and create a separate XSD for each root element pragmatically in java, is there some framework of java library that can aid me in achieving this. <?xml version='1.0' encoding='windows-1252'?> <xsd:schema xmlns:xsd=""...

String parsing with batch scripting

I have a file called pictures.xml and it contains some pictures information like: <ResourcePicture Name="a.jpg"> <GeneratedPicture Name="b.jpg"/> <GeneratedPicture Name="c.jpg"/> </ResourcePicture> <ResourcePicture Name="z1.jpg"> <GeneratedPicture Name="z2.jpg"/> <GeneratedPicture Name="z3.jpg"/> <GeneratedPicture Name="z4.jpg"/> </ResourcePicture> What I want do do is to get each line in for loop and print the names of the pictures. Sample...

Find element by class name

I'm trying to find one tag using we.find_element_by_css_selector('p.p1.transfer').text The problem is that there is sometimes a 'strong' tag before the tag I'm searching for which is very similar but it's class is: class="ng-binding ng-hide" instead of class="ng-binding". But when I try to find it it finds the first tag....

Replace improper commas in CSV file

This may have been asked before, but I couldn't find it. I have a list of CSV files (439 or so) where, in a few of the files, someone also used commas in editorial comments. The result is that I can't put the files into a data frame, since the...

How to define a Regex in StandardTokenParsers to identify path?

I am writing a parser in which I want to parse arithmetic expressions like: /hdfs://xxx.xx.xx.x:xxxx/path1/file1.jpg+1 I want to parse it change the infix to postfix and do the calculation. I used helps from a part of code in another discussion as well. class InfixToPostfix extends StandardTokenParsers { import lexical._ def...

Exporting Data from Cassandra to CSV file

Table Name : Product uid | productcount | term | timestamp 304ad5ac-4b6d-4025-b4ea-8b7991a3fe72 | 26 | dress | 1433110980000 6097e226-35b5-4f71-b158-a1fe39a430c1 | 0 | #751104 | 1433861040000 Command : COPY product (uid, productcount, term, timestamp) TO 'temp.csv'; Error: Improper COPY command. Am I missing something? ...

create multidimensional associative array from CSV in PHP

I'm trying to create a multidimensional array in PHP where the inner arrays are associative for the following example CSV string $csv: # Results from 2015-06-16 to 2015-06-16. date,time,label,artist,composer,album,title,duration 2015-06-16,12:00 AM,Island,U2,"Clayton- Adam,The Edge,Bono,Mullen- Larry- Jr",Songs Of Innocence,SONG FOR SOMEONE,03:46 2015-06-16,12:04 AM,Lowden Proud,"Fearing & White, Andy White, Stephen Fearing","White- Andy,Fearing- Stephen",Tea...

Extracting strings from HTML with Python wont work with regex or BeautifulSoup

Im using Python 2.7, BeautifulSoup4, regex, and requests on windows 7. I've scraped some code from a website and I am having problems parsing and extracting the bits I want and storing them in a dictionary. What I'm after is text that is presented as follows in the code: @CAD_DTA\">I...

Run 3 variables at once in a python for loop.

For loop with multiple variables in python 2.7. Hello, I am not certain how to go about this, I have a function that goes to a site and downloads a .csv file. It saves the .csv file in a particular format: name_uniqueID_dataType.csv. here is the code import requests name =...

Read CSV and plot colored line graph

I am trying to plot a graph with colored markers before and after threshold value. If I am using for loop for reading the parsing the input file with time H:M I can plot and color only two points. But for all the points I cannot plot. Input akdj 12:00...

Python CSV reader/writer handling quotes: How can I wrap row fields in quotes? (Getting triple quotes as output)

I have a problem with the csv reader and writer in python. Whenever I try to take one CSV file and par down the number of columns from roughly 37 to 6, this is the kind of output I am getting. Example of one row: 0,"JOHNSON, JOHN J.",JOHN J. JOHNSON,TECH879,INSPECTION...

Entity framework to CSV float values

I'm trying to write an object to csv, the thing is that my objects have float valure for exemple (14,9) i want to change them to (14.9) so it won't cause any problem with the csv format string csv = ""; using (var ctx = new NBAEntities2()) { var studentList...

Group instances based on NA values in r

I am reading a csv file and unfortunately my dataframe has many missing values. A small snip is as following: df <- data.frame(Size= c(800, 850, 1100, 1200, 1000), Value= c(900, NA, 1300, 1100, NA), Location= c(NA, 'midcity', 'uptown', NA, 'Lakeview'), Num1 = c(2, NA, 3, 2, NA), Num2 = c(2,3,3,1,2),...

How do we get text from a file (word-by-word) into a 2D array in PHP?

I have a text file with some stuff that I would like to put into a 2D array. That text file comprises of sentences of equal length. How do I put each word into an array? Example of text file is - This is stackoverflow I am user This file...

Parse text from a .txt file using csv module

I have an email that comes in everyday and the format of the email is always the same except some of the data is different. I wrote a VBA Macro that exports the email to a text file. Now that it is a text file I want to parse the...

Reading JSON file from dropbox iOS

I have a scenario where app reads file from server (dropbox) and checks version. If new version is available then download the app. I'm trying to read file from link but getting null after JSON parsing. NSError *error; NSString *strFileContent = [NSString stringWithContentsOfURL:[NSURL URLWithString:@""] encoding:NSUTF8StringEncoding error:&error]; if(!error) { //Handle error...

pandas parse dates from csv

I am trying to read a csv file which includes dates. The csv looks like this: h1,h2,h3,h4,h5 A,B,C,D,E,20150420 A,B,C,D,E,20150420 A,B,C,D,E,20150420 For reading the csv I use this code: df = pd.read_csv(filen, index_col=None, header=0, parse_dates=[5], date_parser=lambda t:parse(t)) The parse function looks like this: def parse(t): string_ = str(t) try: return[:4]),...

Adding time/duration from CSV file

I am trying to add time/duration values from a CSV file that I have but I have failed so far. Here's the sample csv that I'm trying to add up. Is getting this output possible? Output: I have been trying to add up the datetime but I always fail: finput...

How to stop foreach loop from printing duplicate data?

I am trying to generate unique CSV files from the csv data that I have using the following loop. $k =1; foreach ($csv_tbl as $_csv) { $filename = "Agent_" . $k . ".csv"; $file_path = "agents/$filename"; file_put_contents($file_path, $_csv); if (file_exists($_csv)) { header('Content-Description: File Transfer'); header('Content-type: text/csv'); header('Content-Disposition: attachment; filename=' ....

Node.js - Browserify: Error on parsing tar file

I'm trying to download a tar file (non-compressed) over HTTP and piping it's response to the tar-stream parser for further processing. This works perfect when executed on the terminal without any errors. For the same thing to be utilized on browser, a bundle.js file is generated using browserify and is...

Convert strings of data to “Data” objects in R [duplicate]

This question already has an answer here: as.Date with dates in format m/d/y in R 2 answers My problem is that the as.Date function does not convert the values in a "date" column of a data frame into Date objects. I have a data.frame nmmaps. Here is a short...

How to instantiate lexical.Scanner in a JavaTokenParsers class?

I am writing a parser which inherits from JavaTokenParsers in that I have a function as follow: import scala.util.parsing.combinator.lexical._ import scala.util.parsing._ import scala.util.parsing.combinator.RegexParsers; import scala.util.parsing.combinator.syntactical.StdTokenParsers import scala.util.parsing.combinator.token.StdTokens import scala.util.parsing.combinator.lexical.StdLexical import scala.util.parsing.combinator.lexical.Scanners import scala.util.parsing.combinator.lexical.Lexical import...

Panda's Write CSV - Append vs. Write

I would like to use pd.write_csv to write "filename" (with headers) if "filename" doesn't exist, otherwise to append to "filename" if it exists. If I simply use command: df.to_csv('filename.csv',mode = 'a',header ='column_names') The write or append succeeds, but it seems like the header is written every time an append takes...

Perl: Using Text::CSV to print AoH

I have an array of hashes (AoH) which looks like this: $VAR1 = [ { 'Unit' => 'M', 'Size' => '321', 'User' => 'test' } { 'Unit' => 'M' 'Size' => '0.24' 'User' => 'test1' } ... ]; How do I write my AoH to a CSV file with separators,...

Parse JSON on PHP and extract the particular value(s)

Objective: To parse following json string and get mentioned values separately, later those separated values are going to be inserted to mysql database. I have checked my json string on JsonLint user_name, selected_date, selected_project, tasks , 4.1.task_name, 4.2 work_hours { "user_name": "USER", "selected_date": "2015-06-08", "selected_project": "Project1", "tasks": [ { "task_name":...

Is it possible to output to a csv file with multiple sheets?

I need to output data to a CSV file from Java, but in that csv file I hope to create multiple sheets so that data can be organized in a better way. After some googling, it seems this is not possible. A CSV file can only receive one-sheet data. Is...

How to parse output of external command in Julia?

Let us say that I have an external command called "Busca01.x" which returns three integers separated by tabs, like this: [email protected]: Busca01.x 192 891 9029 So, I can call this from julia and store the result as a string using either readall or readchomp. I need the data as an...

How to rearrange CSV / JSON keys columns? (Javascript)

I am converting a JSON object array to CSV using Papa Parse JavaScript Library. Is there a way to have the CSV columns arranged in a certain way. For e.g; I get the column as: OrderStatus, canOp, OpDesc, ID, OrderNumber, FinishTime, UOM, StartTime but would like to be arranged as:...