

Accessing a large number of unsorted array elements in Python

python,r,bigdata,sparse-matrix,large-data
I'm not very skilled in Python, but I'm pretty handy with R. Still, I have to use Python, since it has an up-to-date interface with CPLEX. I'm also trying to avoid all the extra coding I would have to do in C/C++. That being said, I have issues with...
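The excerpt is cut off before the actual access pattern, but if the task is pulling many scattered elements out of a large array at once, NumPy's fancy indexing is the usual vectorized route. A minimal sketch with illustrative data (the real array and index list would come from the CPLEX model):

```python
import numpy as np

# Illustrative stand-ins: a large 1-D array and an unsorted set of positions to fetch.
values = np.random.rand(10_000_000)
wanted = np.array([9_999_999, 3, 512_000, 42])  # unsorted, possibly very long

# One vectorized call gathers every requested element,
# avoiding a slow per-element Python loop.
subset = values[wanted]
print(subset)
```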

VBA code either overloads memory or won't compile

excel,vba,excel-vba,large-data
Trying to write a macro to insert a hyphen at specific points in a text string, depending on how long the string is, or delete all text after said point. E.g.: if 6 characters, insert a hyphen between characters 4 and 5, or delete all text after character 4; if...
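Only the 6-character rule survives the truncation, but that fragment is easy to express. A Python sketch of just the visible logic, treating the hyphen insertion and the delete-after-character-4 branch as the two alternative actions (the rules for other lengths are cut off and not guessed at here):

```python
def insert_hyphen(s: str) -> str:
    # The visible 6-character rule: hyphen between characters 4 and 5.
    return s[:4] + "-" + s[4:]

def truncate_after(s: str, n: int = 4) -> str:
    # The alternative action: delete all text after character n.
    return s[:n]

print(insert_hyphen("ABC123"))   # ABC1-23
print(truncate_after("ABC123"))  # ABC1
```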

Searching for info in large arrays

powershell,large-data
I've got a script which indexes information from our equipment. To store and analyze the information I've created a class: Add-Type @' public class CPObject { public int id; public string name; public string displayname; public string classname; public string ip; public string netmask; public string ManagementServer; } '@ Then...

Long Vector Linear Programming in R?

python,r,large-data,linear-programming
Hello, and thanks in advance. On the heels of this question, I acquired some more RAM and now have enough memory to fit all the matrices I need to run a linear programming solver. Now the problem is that none of the linear programming packages in R seem to support...
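Since the question is tagged python as well, one hedged alternative is SciPy's linprog, which (with the default HiGHS backend in recent versions) also accepts sparse constraint matrices and avoids R's long-vector limits. A toy sketch; the real c, A_ub, and b_ub would come from the question's matrices:

```python
import numpy as np
from scipy.optimize import linprog

# Toy problem: minimize c @ x subject to A_ub @ x <= b_ub, x >= 0.
c = np.array([1.0, 2.0])
A_ub = np.array([[-1.0, -1.0]])  # encodes x0 + x1 >= 1
b_ub = np.array([-1.0])

result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(result.x, result.fun)  # expected: [1. 0.] 1.0
```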

Best practices for creating a huge SQL table

mysql,database,database-design,coding-style,large-data
I want to create a table about "users" for each of the 50 states. Each state has about 2GB worth of data. Which option sounds better? Create one table called "users" that will be 100GB large, OR create 50 separate tables called "users_{state}", each of which will be 2GB large. I'm...

Mode filter for large matrices

matlab,filtering,large-data,mode
I am trying to filter some 4672 by 3001 matrices with values of 0 and 1 by finding the most common value in a given window size, i.e. finding the mode in a window around each pixel. A solution is to use colfilt(A,[3 3],'sliding',@mode) The link below shows a solution...
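The question is MATLAB, but for strictly 0/1 data the window mode reduces to a majority vote, which any convolution can compute. A Python/SciPy sketch of that trick, assuming the matrix really contains only 0 and 1:

```python
import numpy as np
from scipy.signal import convolve2d

A = (np.random.rand(4672, 3001) > 0.5).astype(np.uint8)  # stand-in binary matrix
w = 3  # window size, matching colfilt(A,[3 3],...)

# Count the ones in each w-by-w window; the mode is 1 exactly where
# more than half of the window's w*w cells are 1.
counts = convolve2d(A, np.ones((w, w)), mode="same", boundary="symm")
mode_filtered = (counts > (w * w) // 2).astype(np.uint8)
```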

Simple path queries on large graphs

data-mining,networkx,large-data,jung,spark-graphx
I have a question about large graph data. Suppose we have a large graph with nearly 100 million edges and around 5 million nodes; in this case, what is the best graph-mining platform you know of that can give all simple paths of length <= k (for k = 3, 4, 5)...
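For reference, NetworkX (one of the question's tags) can already enumerate bounded-length simple paths in-process via all_simple_paths with a cutoff; at 100 million edges this only stays tractable for small k and a limited set of source/target pairs, which is why distributed platforms such as GraphX come up. A small sketch on a stand-in graph:

```python
import networkx as nx

G = nx.fast_gnp_random_graph(1000, 0.01, seed=1)  # stand-in for the real graph

# Enumerate all simple paths of length <= k between one pair of nodes.
k = 3
for path in nx.all_simple_paths(G, source=0, target=5, cutoff=k):
    print(path)
```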

MongoDB: fastest way to compare two large sets

mongodb,large-data,bloom-filter
I have a MongoDB collection with more than 20 million documents (and growing fast). Some documents have a 'user_id' (others don't). I regularly need to check whether some user_ids exist in the collection. But 'some' means a lot: 10K to 100K. How would you do that? The first...
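One common pattern for this, sketched below in Python with PyMongo under the assumption that user_id is indexed (the database and collection names are placeholders), is to batch the candidate ids into $in queries and collect the distinct matches:

```python
from pymongo import MongoClient

client = MongoClient()            # connection details assumed
coll = client.mydb.mycoll         # placeholder database/collection names

def existing_user_ids(candidate_ids, batch_size=10_000):
    """Return the subset of candidate_ids that occur in the collection."""
    found = set()
    for i in range(0, len(candidate_ids), batch_size):
        batch = candidate_ids[i:i + batch_size]
        found.update(coll.distinct("user_id", {"user_id": {"$in": batch}}))
    return found
```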

PowerShell: random shuffle/split of a large text file

powershell,large-files,large-data
Is there a fast implementation in PowerShell to randomly shuffle and split a text file with 15 million rows using a 15%-85% split? Many sources mention how to do it using Get-Content, but Get-Content and Get-Random are slow for large files: Get-Content "largeFile.txt" | Sort-Object {Get-Random} | Out-File "shuffled.txt" I was looking...
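If stepping outside PowerShell is acceptable, the split half of the task needs no shuffle at all: a single streaming pass can route each line at random, so the 15 million rows never sit in memory at once. A Python sketch (output filenames are placeholders; a true shuffle of the whole file would still need an external-sort style approach):

```python
import random

# One streaming pass over the file: each line independently lands
# in the 15% or the 85% output at random.
with open("largeFile.txt") as src, \
     open("split15.txt", "w") as small, \
     open("split85.txt", "w") as big:
    for line in src:
        (small if random.random() < 0.15 else big).write(line)
```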

Efficiently find objects satisfying relationship

python,large-data
Let's say I have some objects, like in this example (JSON code): { "people" : { "Alice" : { "position" : "Manager", "company" : "Company1" }, "Bob" : { "position" : "CEO", "company" : "Company1" }, "Charlie" : { "position" : "CEO", "company" : "Company2" } }, "companies" : [...
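The JSON is truncated, but the usual cure for repeated relationship queries over data shaped like this is to build an inverted index once, then answer each lookup in O(1) instead of scanning every object. A sketch using the visible fields:

```python
from collections import defaultdict

people = {
    "Alice":   {"position": "Manager", "company": "Company1"},
    "Bob":     {"position": "CEO",     "company": "Company1"},
    "Charlie": {"position": "CEO",     "company": "Company2"},
}

# Index people by the attribute being queried.
by_position = defaultdict(list)
for name, info in people.items():
    by_position[info["position"]].append(name)

print(by_position["CEO"])  # ['Bob', 'Charlie']
```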

R - Why does adding 1 column to a data.table nearly double peak memory used?

r,memory,data.table,large-data
After getting help from two kind gentlemen, I managed to switch over to data.table from data.frame + plyr. As I worked, I noticed that peak memory usage nearly doubled from 3.5GB to 6.8GB (according to Windows Task Manager) when I added 1 new column...

Which of vector and map uses less memory (large data set, unknown size)?

c++,stdvector,large-data,stdmap
I wonder which container uses less memory for a large data set: std::map or std::vector. Loads of posts talk about efficiency, but my priority is not efficiency but memory consumption. So, if we don't know the number of data items (in my case it can be over 12,000,000 entries,...

PROC SQL Update Efficiency for Large Datasets

sql,performance,sas,large-data
I have a SAS master dataset with 10 million rows and 1800 columns. I need to update 10 columns using a transaction dataset with 15 million records, but only for records with a matching key. I tried running a PROC SQL UPDATE statement using the following code. proc sql; UPDATE lib1.master1 a...

Subsetting very large data frames in R efficiently

r,data.frame,bigdata,large-data
So I have a data frame of 16 columns and ~17 million rows. I would first like to do some ddply on the data frame and then look at the correlations between the different columns. What’s the best and most efficient way to achieve this? My current approach takes too...

Fast conversion of a text file to arrays in C++

c++,arrays,string,file,large-data
I've been working on a project that involves large heightmaps (3000x3000, ~60MB). What I need to do is to split the data into several 200x200 arrays (15x15 of them), then save them separately (but this time in a format which is as fast as possible to load again). I've...
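The question is C++, but the tiling arithmetic is language-neutral and compact to demonstrate. A NumPy sketch of the 15x15 split plus a binary dump format that reloads far faster than re-parsing text (filenames and dtype are illustrative):

```python
import numpy as np

heightmap = np.random.rand(3000, 3000).astype(np.float32)  # stand-in data

# Carve the 3000x3000 grid into a 15x15 grid of 200x200 tiles:
# reshape to (15, 200, 15, 200), then swap the middle axes so that
# tiles[i, j] is the 200x200 block at tile row i, tile column j.
tiles = heightmap.reshape(15, 200, 15, 200).swapaxes(1, 2)

# Raw binary dumps load back almost instantly compared with text parsing.
for i in range(15):
    for j in range(15):
        np.save(f"tile_{i}_{j}.npy", tiles[i, j])
```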

Neo4j & Spring Data Neo4j 4.0.0: Importing large datasets

logging,import,neo4j,large-data,spring-data-neo4j-4
I want to insert real-time logging data into Neo4j 2.2.1 through Spring Data Neo4j 4.0.0. The logging data is very large and may reach hundreds of thousands of records. What is the best way to implement this kind of functionality? Is it safe to just use the .save(Iterable) method at the...

Excel VBA method fails to run and returns no error

vba,excel-vba,large-data
I intended my code to search through an Excel spreadsheet filled with data and return entire rows whose p-values are below 0.05. But I do not receive any syntax errors, and the code looks about right. I am working with a large data set, ~780,000 row entries, so I don't...

How to quickly read a large txt data file (5GB) into R (RStudio) (Centrino 2 P8600, 4GB RAM)

r,large-data
I have a large data set; one of the files is 5GB. Can someone suggest how to quickly read it into R (RStudio)? Thanks

How to pull large SQL Server tables into C# for analysis

c#,sql,sql-server,large-data
I'm in need of a bit of advice on how to best approach this problem. I inherited a project to build a reporting utility from an existing SQL Server database. The database contains a "Raw Data" table where every production data point is dumped. The report needs to provide the average...

Selecting specific rows from a large dataset using column values

r,unique,large-data
I have a large data set (about 2000 rows and 38 columns) that looks like this (there is missing data in some columns): species crab cmass gill gmass treatment months avglw avgils 222 Cm 65 34.273 p 0.198 Newtons Cove 0 68.108 93.181 223 Cm 57 33.506 p 0.166 Newtons...

Move a large amount of data from an old database to a new database using Entity Framework 5

c#,database,entity-framework,large-data
I'm creating an application to move data from an old database to a new one (different schema). I'm using Visual Studio 2013, C#, Entity Framework 5, and Microsoft SQL Server 2012. The Customer table has more than 40 thousand records. private void TransferCustomer() { int counter = 0; // Load all old customers var...

How to read large (~20 GB) xml file in R?

r,large-data
I want to read data from a large XML file (20 GB) and manipulate it. I tried to use xmlParse() but it gave me a memory issue before loading. Is there an efficient way to do this? My data dump looks like this: <tags> <row Id="106929" TagName="moto-360" Count="1"/> <row Id="106930" TagName="n1ql" Count="1"/>...
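The question asks about R, where the streaming answer is xmlEventParse from the same XML package, but the underlying fix is language-neutral: process the file as a stream instead of building a 20GB DOM. An equivalent Python sketch for the <row> dump shown (the filename is assumed):

```python
import xml.etree.ElementTree as ET

# Stream the document: handle each <row> as it is closed, then free it,
# so memory stays flat no matter how large the file is.
for event, elem in ET.iterparse("tags.xml", events=("end",)):
    if elem.tag == "row":
        tag_id = elem.get("Id")
        name = elem.get("TagName")
        count = elem.get("Count")
        # ... process one record here ...
        elem.clear()  # release the element's memory
```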

Reading and parsing a large .dat file

parsing,large-data
I am trying to parse a huge .dat file (4GB). I have tried with R, but it just takes too long. Is there a way to parse a .dat file in segments, for example every 30,000 lines? Any other solutions would also be welcomed. This is what it looks like:...
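Since other solutions are welcomed: reading a file in fixed-size segments is straightforward in Python with itertools.islice, and it never holds more than one segment in memory. A sketch with an assumed filename:

```python
from itertools import islice

def chunks(path, n=30_000):
    """Yield successive lists of up to n lines from a large file."""
    with open(path) as f:
        while True:
            block = list(islice(f, n))
            if not block:
                break
            yield block

for block in chunks("huge.dat"):
    pass  # parse one 30,000-line segment at a time
```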

Java Swing Displaying Large Amounts of Data from ArrayLists

java,swing,arraylist,large-data
I cannot provide code because this is an abstract problem that I am currently facing. I am working on a program which allows a user to track players from games. The program essentially stores a player's profile information in an ArrayList. One feature of my program that I would like...