python,scala,csv,apache-spark , Populate csv with Scala


Populate csv with Scala

Question:

Tag: python,scala,csv,apache-spark

I have a csv file which is more or less "semi-structured)

    rowNumber;ColumnA;ColumnB;ColumnC;
    1;START; b; c;
    2;;;;
    4;;;;
    6;END;;;
    7;START;q;x;
    10;;;;
    11;END;;;

Now I would like to get data of this row --> 1;START; b; c; populated until it finds a 'END' in columnA. Then it should take this row --> 7;START;q;x; and fill the cells below with the values until the next 'END' (here:11;END;;;)

I am a complete beginner and it is pretty tough for me, how I should start:

    import au.com.bytecode.opencsv.CSVReader
    import java.io.FileReader
    import scala.collection.JavaConversions._
    import scala.collection.mutable.ListBuffer


    val masterList = new CSVReader(new FileReader("/root/csvfile.csv"),';')
    var startList = new ListBuffer[String]()
    var derivedList = new ListBuffer[String]()



    for (row <- masterList.readAll) {
        row(1) = row(1).trim
        if (row(1) == "START")
          startList += row(0)      
    }
    var i = 0 
    for (i <- i to startList.length ) {
      for(row <- masterList.readAll)
      if (row(0) > startList(i) && row(0) < startList(i+1)) {
        derivedList += row
      }
    }

I started to read the file with CSVReader and create a masterList. I created a loop and iterate und put all the START values into it (so I know the range from START to the next START). I created a second loop where I wanted to put in the datasets into a new ListBuffer. But this does not work

The next step would be to merge masterList + derived List.

I need some good ideas, or a push in the right direction, how I could proceed or how I could do this a bit easier? Help very much appreciated!!

I don't know, if there is a big difference in the first place, but I want to create a Apache Spark application. There is also the option to do this in Python (if it is easier)

Output should look like this: It should look like

    1;START; b; c;
    2;;b;c;
    4;;b;c;
    6;END;;;
    7;START;q;x;
    10;;q;x;
    11;END;;;

You never touch the line with END. Just fill up the lines below START with ColumnB and ColumnC


Answer:

Kind of little functional using the scanleft[1]. drop(1) from the readAll because it reads the header too. Also drop(1) at the end because scanleft starts with start result.

ScanLeft kind of lets you work with previous value of the computation, It takes a function with two params, first the result of the previous computation and the second parameter the current value of the list (iterator..). It needs a sentinel kind of value for the first element of the list, and I am providing empty list in this case.

Now in the function which I am passing to scanleft ( prev,curr) => .... If the curr which is current row of the csv, starts with "START" or "END", don't need to do anything that is expected value. Other wise (case _) we need take the first two values of the current row and append previous row except first two columns ( i.e. drop 2 ). you can also append prev(2) and prev(3) too.

And finally there is a drop(1), because scan left returns the firstrow with sentinel i.e. the start value, which we don't need.

val reader = new CSVReader(new FileReader("test.in"),';')
val start = List("","","","")
val res = reader.readAll.drop(1).scanLeft(start)((prev,curr) => {
  curr(1) match { 
    case "START" => curr.toList
    case "END" => curr.toList
    case _ => curr(0) ::  curr(1) :: (prev drop 2)
  }
}).drop(1)

To view result

for(r <- res) {
  println(r)
}

This will output

List(1, START,  b,  c, )
List(2, ,  b,  c, )
List(4, ,  b,  c, )
List(6, END,  ,  , )
List(7, START, q, x, )
List(10, , q, x, )
List(11, END, , , )

[1]. Reduce, fold or scan (Left/Right)?


Related:


Twilio Client Python not Working in IOS Browser


javascript,python,ios,flask,twilio
I have created a simple twilio client application to make phone calls from Web Browser to phones. I used a sample Flask app to generate a secure Capability Token and used twilio.min.js library to handle calls from my HTML. The functionality works fine in Computer Browsers ans Android Phone Browsers,...

Spring-integration scripting with Python


python,spring-integration,jython
I'm trying to use Python with spring-integration and jython-standalone-2.7.0: Here is my application context: <int:inbound-channel-adapter id="in" channel="exampleChannel" > <int:poller fixed-rate="1000" /> <int-script:script lang="python" location="script/message.py" /> </int:inbound-channel-adapter> <int:channel id="exampleChannel" /> <int-ip:udp-outbound-channel-adapter id="udpOut" channel="exampleChannel" host="192.168.0.1" port="11111" /> Here is my script in Python: print "Python"...

Retrieving TriangleCount


scala,apache-spark,spark-graphx
I'm trying to retrieve the amount of triangles from a graph using graphX. As I'm new to both Scala and graphX, I'm currently quite stuck. I'm creating a graph from an edgefile: 1 2 1 3 2 3 This should be 1 triangle. Next I'm using the build in function...

Python - Opening and changing large text files


python,replace,out-of-memory,large-files
I have a ~600MB Roblox type .mesh file, which reads like a text file in any text editor. I have the following code below: mesh = open("file.mesh", "r").read() mesh = mesh.replace("[", "{").replace("]", "}").replace("}{", "},{") mesh = "{"+mesh+"}" f = open("p2t.txt", "w") f.write(mesh) It returns: Traceback (most recent call last): File...

Python Popen - wait vs communicate vs CalledProcessError


python,python-2.7,error-handling,popen
Continuing from my previous question I see that to get the error code of a process I spawned via Popen in python I have to call either wait() or communicate() (which can be used to access the Popen stdout and stderr attributes): app7z = '/path/to/7z.exe' command = [app7z, 'a', dstFile.temp,...

Future yielding with flatMap


scala
Given Futures fa, fb, fc, I can use f: Function1[(A,B,C), Future[D]], to return a Future[D] either by: (for { a <- fa b <- fb c <- fc } yield (a,b,c)).flatMap(f) which has the unenviable property of declaring the variables a,b,c twice. or a.zip(b).zip(c).flatMap{ case (a, (b, c)) => f(a,...

How to generalize the round methods


scala
I have the following four methods, using BigDecimal to round a number: private def round(input: Byte, scale: Int): Byte = { BigDecimal(input).setScale(scale, RoundingMode.HALF_UP).byteValue() } private def round(input: Short, scale: Int): Short = { BigDecimal(input).setScale(scale, RoundingMode.HALF_UP).shortValue() } private def round(input: Int, scale: Int): Int = { BigDecimal(input).setScale(scale, RoundingMode.HALF_UP).intValue() } private def...

odoo v8 - Field(s) `arch` failed against a constraint: Invalid view definition


python,xml,view,odoo,add-on
I want to create a new view with a DB-view. When I try to install my app, DB-view was created then I get error: 2015-06-22 12:59:10,574 11988 ERROR odoo openerp.addons.base.ir.ir_ui_view: Das Feld `datum` existiert nicht Fehler Kontext: Ansicht `overview.tree.view` [view_id: 1532, xml_id: k. A., model: net.time.overview, parent_id: k. A.] 2015-06-22...

Convert RDD[Map[String,Double]] to RDD[(String,Double)]


scala,apache-spark,rdd
I did some calculation and returned my values in a RDD containing scala map and now I want to remove this map and want to collect all keys values in a RDD. Any help will be appreciated....

Identify that a string could be a datetime object


python,regex,algorithm,python-2.7,datetime
If I knew the format in which a string represents date-time information, then I can easily use datetime.datetime.strptime(s, fmt). However, without knowing the format of the string beforehand, would it be possible to determine whether a given string contains something that could be parsed as a datetime object with the...

group indices of list in list of lists


python,list
I am looking for an elegant solution for the following problem. I have a list of ints and I want to create a list of lists where the indices with the same value are grouped together in the order of the occurrences of said list. [2, 0, 1, 1, 3,...

Count function counting only last line of my list


python,python-2.7
Count function counting only last line of my list N = int(raw_input()) cnt = [] for i in range(N): string = raw_input() for j in range(1,len(string)): if string[j] =='K': cnt.append('R') elif string[j] =='R': cnt.append('R') if string[0] == 'k': cnt.append('k') elif string[0] == 'R': cnt.append('R') print cnt.count('R') if I am giving...

Displaying a 32-bit image with NaN values (ImageJ)


python,image-processing,imagej
I wrote a multilanguage 3-D image denoising ImageJ plugin that does some operations on an image and returns the denoised image as a 1-D array. The 1-D array contains NaN values (around the edges). The 1-D array is converted back into an image stack and displayed. It is simply black....

Using counter on array for one value while keeping index of other values


python,collections
After reading the answers on this question How to count the frequency of the elements in a list? I was wondering how to count the frequency of something, and at the same time retreive some extra information, through something like an index. For example a = ['fruit','Item#001'] b = ['fruit','Item#002']...

SyntaxError: invalid syntax?


python,syntax
Good afternoon, I am developing a script in python and while I am trying to compile it from the terminator/terminal i always get this error, but I cannot understand where is the syntax error? File "_case1.py", line 128 print ('########################') ^ SyntaxError: invalid syntax Then I just change the position...

Parse text from a .txt file using csv module


python,python-2.7,parsing,csv
I have an email that comes in everyday and the format of the email is always the same except some of the data is different. I wrote a VBA Macro that exports the email to a text file. Now that it is a text file I want to parse the...

trying to understand LSH through the sample python code


python,similarity,locality-sensitive-hash
the concise python code i study for is here Question A @ line 8 i do not really understand the syntax meaning for "res = res << 1" for the purpose of "get_signature" Question B @ line 49 (SOLVED BY myself through another Q&A) "xor = r1^r2" does not really...

How to put an image on another image in python, using ImageTk?


python,user-interface,tkinter
I want to put an image in front of another one, then use this combined image as a button's background image in Tkinter. How can I do it? I am free to import Tkimage, Image. Clarify: I want to stick this on the center of this so that something like...

represent an index inside a list as x,y in python


python,list,numpy,multidimensional-array
I have a list which contains 1000 integers. The 1000 integers represent 20X50 elements of dimensional array which I read from a file into the list. I need to walk through the list with an indicator in order to find close elements to each other. I want that my indicator...

How do I read this list and parse it?


python,list
I'm using requests and the output I get from the sites API is a list, I've been stuck trying to parse it to get the data from it. I use r = requests.get(urlas, params=params) r.json() to get the data I want. Here is a snippet of the list [{'relation_type': None,...

How to remove structure with python from this case?
python,python-2.7
How to remove "table" from HTML using python? I had case like this: paragraph = ''' <p>Lorem ipsum dolor sit amet, consectetur adipisicing elit. Quidem molestiae consequuntur officiis corporis sint.<br /><br /> <table> <tr> <td> text title </td> <td> text title 2 </td> </tr> </table> <p> lorem ipsum</p> ''' how...

Inserting a variable in MongoDB specifying _id field


python,mongodb,pymongo
I want to insert a variable, say, a = {1:2,3:4} into my database with a particular id "56". It is very clear from the docs that I can do the following: db.testcol.insert({"_id": "56", 1:2, 3:4}) However, I cannot figure out any way to insert "a" itself, specifying an id. In...

Django: html without CSS and the right text


python,html,css,django,url
First of all, this website that I'm trying to build is my first, so take it easy. Thanks. Anyway, I have my home page, home.html, that extends from base.html, and joke.html, that also extends base.html. The home page works just fine, but not the joke page. Here are some parts...

Pandas - Dropping multiple empty columns


python,pandas
I have some tables where the first 11 columns are populated with data, but all columns after this are blank. I tried: df=df.dropna(axis=1,how='all') which didn't work. I then used: df = df.drop(df.columns[range(11,36)], axis=1) Which worked on the first few tables, but then some of the tables were longer or shorter...

sys.argv in a windows environment


python,windows,python-3.x
I'm attempting to learn python using the book 'a byte of python'. The code: import sys print('the command line arguments are:') for i in sys.argv: print(i) print('\n\nThe PYTHONPATH is', sys.path, '\n') outputs: the command line arguments are: C:/Users/user/PycharmProjects/helloWorld/module_using_sys.py The PYTHONPATH is ['C:\\Users\\user\\PycharmProjects\\helloWorld', 'C:\\Users\\user\\PycharmProjects\\helloWorld', 'C:\\Python34\\python34.zip', 'C:\\Python34\\DLLs', 'C:\\Python34\\lib', 'C:\\Python34', 'C:\\Python34\\lib\\site-packages']...

Sum of two variables in RobotFramework


python,automated-tests,robotframework
I have two variables: ${calculatedTotalPrice} = 42,42 ${productPrice1} = 43,15 I executed ${calculatedTotalPrice} Evaluate ${calculatedTotalPrice}+${productPrice1} I got 42,85,15 How can I resolve it?...

Matplotlib: Plot the result of an SQL query


python,sql,matplotlib,plot
from sqlalchemy import create_engine import _mssql from matplotlib import pyplot as plt engine = create_engine('mssql+pymssql://**:[email protected]:1433/AffectV_Test') connection = engine.connect() result = connection.execute('SELECT Campaign_id, SUM(Count) AS Total_Count FROM Impressions GROUP BY Campaign_id') for row in result: print row connection.close() The above code generates an array: (54ca686d0189607081dbda85', 4174469) (551c21150189601fb08b6b64', 182) (552391ee0189601fb08b6b73', 237304) (5469f3ec0189606b1b25bcc0',...

Sort when values are None or empty strings python


python,list,sorting,null
I have a list with dictionaries in which I sort them on different values. I'm doing it with these lines of code: def orderBy(self, col, dir, objlist): if dir == 'asc': sorted_objects = sorted(objlist, key=lambda k: k[col]) else: sorted_objects = sorted(objlist, key=lambda k: k[col], reverse=True) return sorted_objects Now the problem...

Scala first program issue


scala,recursion,case,frequency
I have just started to learn Scala after some experience with functional programming in other languages. def freq(c:Char, y:String, list:List[(Char,Int)]): List[(Char,Int)] = list match{ case _ => freq(c, y.filter(_ == c), list :: List((count(c,y),c))) case nil => list } In the above code I am getting an error when trying...

Python: can't access newly defined environment variables


python,bash,environment-variables
I can't access my env var: import subprocess, os print os.environ.get('PATH') # Works well print os.environ.get('BONSAI') # doesn't work But the env var is well added in my /home/me/.bashrc: BONSAI=/home/me/Utils/bonsai_v3.2 export BONSAI And I can access this env var from a new terminal....

Calling function and passing arguments multiple times


python,function,loops
I want to call the function multiple time and use it's returned argument everytime when it's called. For example: def myfunction(first, second, third): return (first+1,second+1,third+1) 1st call: myfunction(1,2,3) 2nd call is going to be pass returned variables: myfunction(2,3,4) and loop it until defined times. How can I do such loop?...

how to enable a entry by clicking a button in Tkinter?


python,tkinter
I need to activate many entries when button is clicked please do not write class based code, modify this code only because i need to change the whole code for the project as i did my whole project without classes from Tkinter import * import ttk x='disabled' def rakhi(): global...

Replace nodejs for python?


python,node.js,webserver
i'm working in a HTML5 multiplayer game, and i need a server to sync player's movement, chat, battles, etc. So I'm looking for ways to use python instead nodejs, because i have I have more familiarity with python. The server is simple: var express = require('express'); var app = express();...

Strange Behavior: Floating Point Error after Appending to List


python,python-2.7,behavior
I am writing a simple function to step through a range with floating step size. To keep the output neat, I wrote a function, correct, that corrects the floating point error that is common after an arithmetic operation. That is to say: correct(0.3999999999) outputs 0.4, correct(0.1000000001) outputs 0.1, etc. Here's...

How to effectively get indices of 1s for given binary string using Scala?


scala,functional-programming,higher-order-functions
Suppose we have a binary string such as 10010010. All I want is a function returning indices of 1s for that string: indicesOfOnes("10010010") -> List(0, 3, 6) indicesOfOnes("0") -> List() And what I implemented is: def indicesOfOnes(bs: String): List[Int] = { val lb = ListBuffer[Int]() bs.zipWithIndex.foreach { case (v, i)...

Python: histogram/ binning data from 2 arrays.


python,histogram,large-files
I have two arrays of data: one is a radius values and the other is a corresponding intensity reading at that intensity: e.g. a small section of the data. First column is radius and the second is the intensities. 29.77036614 0.04464427 29.70281027 0.07771409 29.63523525 0.09424901 29.3639355 1.322793 29.29596385 2.321502 29.22783249...

How to use template within Django template?


python,html,django,templates,django-1.4
I have the django template like below: <a href="https://example.com/url{{ mylist.0.id }}" target="_blank"><h1 class="title">{{ mylist.0.title }}</h1></a> <p> {{ mylist.0.text|truncatewords:50 }}<br> ... (the actual template is quite big) It should be used 10 times on the same page, but 'external' html elements are different: <div class="row"> <div class="col-md-12 col-lg-12 block block-color-1"> *django...

Pandas Dataframe Complex Calculation


python,python-2.7,pandas,dataframes
I have the following dataframe,df: Year totalPubs ActualCitations 0 1994 71 191.002034 1 1995 77 2763.911781 2 1996 69 2022.374474 3 1997 78 3393.094951 I want to write code that would do the following: Citations of currentyear / Sum of totalPubs of the two previous years I want something to...

Peewee: reducing where conditionals break after a certain length


python,peewee
This is what I have: SomeTable.select.where(reduce(operator.or_, (SomeTable.stuff == entry for entry in big_list))) The problem arises when I have a relatively large list of elements in big_list and I get this: RuntimeError: maximum recursion depth exceeded Is there another way to approach this that doesn't involve splitting up the list...

Python recursive function not recursing


python,recursion
I'm trying to solve a puzzle, which is to reverse engineer this code, to get a list of possible passwords, and from those there should be one that 'stands out', and should work function checkPass(password) { var total = 0; var charlist = "abcdefghijklmnopqrstuvwxyz"; for (var i = 0; i...

Find the tf-idf score of specific words in documents using sklearn


python,scikit-learn,tf-idf
I have code that runs basic TF-IDF vectorizer on a collection of documents, returning a sparse matrix of D X F where D is the number of documents and F is the number of terms. No problem. But how do I find the TF-IDF score of a specific term in...

Scala running issue on eclipse


eclipse,scala
I configured everthing within eclipse for scala. I create a snippet to show you the issue, i can't see in run options run as scala application, i also tried to find my main class under build configuration option but i can't find it. How i can solve it?...

How do variables inside python modules work?


python,module,python-module
I am coming from a Java background with Static variables, and I am trying to create a list of commonly used strings in my python application. I understand there are no static variables in python so I have written a module as follows: import os APP_NAME = 'Window Logger' APP_DATA_FOLDER_PATH...

How does the class_weight parameter in scikit-learn work?


python,scikit-learn
I am having a lot of trouble understanding how the class_weight parameter in scikit-learn's Logistic Regression operates. The Situation I want to use logistic regression to do binary classification on a very unbalanced data set. The classes are labelled 0 (negative) and 1 (positive) and the observed data is in...

How to check for multiple attributes in a list


python,python-2.7
I am making a TBRPG game using Python 2.7, and i'm currently making a quest system. I wanted to make a function that checks all of the quests in a list, in this case (quests), and tells you if any of of the quests in the list have the same...

The event loop is already running


python,python-3.x,pyqt,pyqt4
I have the following 5 files: gui.py # -*- coding: utf-8 -*- from PyQt4 import QtCore, QtGui try: _fromUtf8 = QtCore.QString.fromUtf8 except AttributeError: def _fromUtf8(s): return s try: _encoding = QtGui.QApplication.UnicodeUTF8 def _translate(context, text, disambig): return QtGui.QApplication.translate(context, text, disambig, _encoding) except AttributeError: def _translate(context, text, disambig): return QtGui.QApplication.translate(context, text, disambig)...

How to change the IP address of Amazon EC2 instance using boto library


python,amazon-web-services,boto
How can I assign a new IP address (or Elastic IP) to an already existing AWS EC2 instance using boto library.

Create an exe with Python 3.4 using cx_Freeze


python,python-3.4,cx-freeze
I have found two other articles about this problem on Stack Exchange but none of them has a clear answer: is it possible to create a .exe of a Python 3.4 script? The only solution I found was to use cx_Freeze. I used it, and it indeed created an executable...

SQLAlchemy. 2 different relationships for 1 column


python,sqlalchemy
I have a simple many-to-many relationship with associated table: with following data: matches: users: users_mathces: ONE user can play MANY matches and ONE match can involve up to TWO users I want to realize proper relationships in both "Match" and "User" classes users_matches_table = Table('users_matches', Base.metadata, Column('match_id', Integer, ForeignKey('matches.id', onupdate="CASCADE",...

In sklearn, does a fitted pipeline reapply every transform?


python,scikit-learn,pipeline,feature-selection
Apologies if this is obvious but I couldn't find a clear answer to this: Say I've used a pretty typical pipeline: feat_sel = RandomizedLogisticRegression() clf = RandomForestClassifier() pl = Pipeline([ ('preprocessing', preprocessing.StandardScaler()), ('feature_selection', feat_sel), ('classification', clf)]) pl.fit(X,y) Now when I apply pl on a new set, pl.predict(X_classify); is RandomizedLogisticRegression going...