Confusion about kinds in FORTRAN

fortran,precision,hdf5,double-precision
I have been in the process of writing a FORTRAN code for numerical simulations of an applied physics problem for more than two years and I've tried to follow the conventions described in Fortran Best Practices. More specifically, I defined a parameter as integer, parameter:: dp=kind(0.d0) and then used it...

HDF5 : create 1 dimension attribute

c,hdf5
I am able to create a very complex HDF5 file with attributes. I am using the low-level HDF5 API to manage my datasets and the HDF5 Lite API to manage attributes. The problem is that HDF5 Lite seems to create an array for everything. It seems to be the same for low...

Why are CSV files smaller than HDF5 files when writing with Pandas?

python,csv,pandas,hdf5,hdf
import numpy as np import pandas as pd df = pd.DataFrame(data=np.zeros((1000000,1))) df.to_csv('test.csv') df.to_hdf('test.h5', 'df') ls -sh test* 11M test.csv 16M test.h5 If I use an even larger dataset then the effect is even bigger. Using an HDFStore like below changes nothing. store = pd.HDFStore('test.h5', table=True) store['df'] = np.zeros((1000000,1)) store.close() Edit:...
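
One way to reason about the sizes in this question is a back-of-envelope byte count. The sketch below uses only the standard library and rests on assumptions that are not stated in the excerpt: the CSV has an integer index column and one column of `0.0`, and the binary file stores an int64 index plus float64 values uncompressed, with HDF5 metadata ignored. The helper names `csv_bytes` and `binary_bytes` are hypothetical.

```python
# Back-of-envelope size estimate; pure standard library, no pandas needed.
# Assumption: the binary layout is an int64 index plus float64 values,
# stored uncompressed, and all HDF5 metadata overhead is ignored.

def csv_bytes(n_rows: int) -> int:
    """Bytes of a CSV with an integer index column and one column of 0.0."""
    total = len(",0\n")  # header line: empty index label, column name "0"
    for i in range(n_rows):
        total += len(f"{i},0.0\n")  # short text: index digits + ",0.0" + newline
    return total


def binary_bytes(n_rows: int, n_cols: int = 1) -> int:
    """Bytes needed for an 8-byte index and 8-byte values per row."""
    return n_rows * 8 * (1 + n_cols)


n = 1_000_000
print(f"CSV estimate:    {csv_bytes(n) / 2**20:.1f} MiB")
print(f"Binary estimate: {binary_bytes(n) / 2**20:.1f} MiB")
```

Under these assumptions the estimates come out at roughly 10.4 MiB and 15.3 MiB, which is consistent with the 11M and 16M reported by `ls -sh`: a zero written as text takes fewer bytes than a fixed 8-byte float, so this particular dataset favours CSV.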

Pandas/Pytable memory overhead when writing to hdf

pandas,hdf5,pytables
I use pandas and hdf5 files in order to handle big amounts of data (e.g. 10GB and more). I would like to use the table format in order to be able to query the data efficiently when reading it. However, when I want to write my data to an hdf...

Is there a way to return C++ datatypes as a variable?

c++,hdf5
I'm wondering whether I could do something like: return int So I don't want to return a variable of type int, but the datatype int itself. In case you're wondering whether this could be useful: I like to convert HDF5-datatypes to normal C++ datatypes. So I like to have method...

Why can't I read a C constant from Golang properly?

c,go,mingw,hdf5
I am using go-hdf5 to read an hdf5 file into golang. I am on windows7 using a pretty recent copy of mingw and hdf5 1.8.14_x86 and it seems like trying to use any of the predefined types doesn't work, let's focus for example on T_NATIVE_UINT64. I have reduced the issue...

How convert this type of data to something more readable in the python?

python,python-2.7,hdf5,h5py
I have quite a big dataset. All the information is stored in an HDF5 file. I found the h5py library for Python. Everything works properly except for the [<HDF5 object reference>] entries; I have no idea how to convert them into something more readable. Can I do it at all? Because documentation in...

Pandas multiindex and pytables… separate indexes or one concatenated index?

python,pandas,hdf5,pytables
What is the structure of a pandas multiindex on HDF5 when the data frame is saved to HDF5 through pytables? Are each of the parts a separate index or is there one concatenated index?

When reading huge HDF5 file with “pandas.read_hdf() ”, why do I still get MemoryError even though I read in chunks by specifying chunksize?

python,pandas,hdf5
Problem description: I use Python pandas to read a few large CSV files and store them in an HDF5 file; the resulting HDF5 file is about 10GB. The problem happens when reading it back. Even though I tried to read it back in chunks, I still get MemoryError. Here is How...

pandas read_hdf with 'where' condition limitation?

python,pandas,hdf5,pytables
I need to query an HDF5 file with a where clause containing 3 conditions, one of which is a list with a length of 30: myList = list(xrange(30)) h5DF = pd.read_hdf(h5Filename, 'df', where='index=myList & date=dateString & time=timeString') The query above gives me ValueError: too many inputs and the error is...

Memory-efficient Benjamini-Hochberg FDR correction using numpy/h5py

python,numpy,statistics,hdf5,h5py
I am trying to calculate a set of FDR-corrected p-values using Benjamini & Hochberg's method. However, the vector I am trying to run this on contains over 10 billion values. Given the amount of data the normal method from statsmodel's multicomp module quickly runs out of memory. Looking at the...
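
For reference, the Benjamini-Hochberg adjustment itself can be sketched in a few lines of plain Python. This is only the textbook in-memory procedure on a small list, not the memory-efficient numpy/h5py variant the question is asking for; the function name `bh_adjust` and the sample p-values are hypothetical.

```python
# Plain in-memory Benjamini-Hochberg adjustment (illustrative sketch only;
# it holds everything in RAM and is not suitable for billions of values).

def bh_adjust(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


sample = [0.01, 0.04, 0.03, 0.005]  # made-up p-values for illustration
print(bh_adjust(sample))
```

Each sorted p-value is scaled by n / rank, and the running minimum taken from the largest rank downward keeps the adjusted values non-decreasing in the sorted order.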

'/' in names in HDF5 files confusion

python,pandas,hdf5,pytables,h5py
I am experiencing some really weird interactions between h5py, PyTables (via Pandas), and C++ generated HDF5 files. It seems that h5check and h5py cope with type names containing '/' but pandas/PyTables cannot. Clearly, there is a gap in my understanding, so: What have I not understood here? The...

Read HDF5 based file as a numpy array in Python

python,arrays,numpy,hdf5
How can I load a .hws file as a numpy array? The description at http://kingler.net/2007/05/22/90 says it is an HDF5-based format, so I thought https://confluence.slac.stanford.edu/display/PSDM/How+to+access+HDF5+data+from+Python might be useful. However, by following the instructions described on the page: hdf5_file_name = '/reg/d/psdm/XPP/xppcom10/hdf5/xppcom10-r0546.h5' dataset_name = '/Configure:0000/Run:0000/CalibCycle:0000/Camera::FrameV1/XppSb4Pim.1:Tm6740.1/image' event_number =...

Concatenate two big pandas.HDFStore HDF5 files

python,pandas,hdf5,pytables
This question is somewhat related to "Concatenate a large number of HDF5 files". I have several huge HDF5 files (~20GB compressed), which do not fit in RAM. Each of them stores several pandas.DataFrames of identical format and with indexes that do not overlap. I'd like to concatenate them to have...

How can I store an array or list of Strings in a PyTable?

python,database,hdf5,pytables
For example, I have the following table description. In SpectrumL, I would like to store a spectrogram whose exact size I do not know yet. Similarly, I would like to store some tags (which will be strings), and their size will vary by record. However, when I try to execute...

Strings vs binary for storing variables inside the file format

c++,file,hdf5,dataformat
We aim at using HDF5 for our data format. HDF5 has been selected because it is a hierarchical filesystem-like cross-platform data format and it supports large amounts of data. The file will contain arrays and some parameters. The question is about how to store the parameters (which are not made...

Pandas and h5py load the same data (ndarray) differently

python,pandas,scipy,hdf5
I have a file in HDF5 format. It was created using the HDF5's C++ API using these: struct SignalDefH5 { char id [128]; char name [ 64]; char units[ 16]; float min; float max; hvl_t tags; /* This right there does not work in Pandas... */ }; struct TagDefH5 {...

How to install PyTables 2.3.1 with Anaconda, missing HDF5 library

python,pip,hdf5,pytables,conda
I need to run an older version of PyTables, namely 2.3.1, in an Anaconda environment on Linux, but I cannot install it. conda install -n myenv pytables=2.3.1 fails to find the appropriate version. conda install -n myenv pytables=2 installs PyTables 2.4.0 successfully. But I need 2.3.1. Also activating the environment...

pandas to_hdf function get Illegal instruction

python,osx,pandas,docker,hdf5
This seems to be an error specific to OS X 10.10 and Docker. When I try import pandas as pd df = pd.DataFrame([[1,2,3], [2,3,4]], columns=['a', 'b', 'c']) df.to_hdf( 'test.h5', 'sites', data_columns=True, format='t', complevel=5, complib='blosc' ) I get the message Illegal instruction. However, I tried the same code in OS X directly and the same...

Adding a new variable to a .mat file using the Python package hdf5storage

python,matlab,hdf5,hdf5storage
Is it possible to add a new variable to a .mat file (v7.3) using the Python package hdf5storage? Example: I wrote in Matlab: test = {'Hello', 'world!'; 'Good', 'morning'; 'See', 'you!'}; save('data.mat', 'test', '-v7.3') % v7.3 so that it is readable by h5py In Python I would like to add...

Pandas get specific rows from HDF5 by index

python,pandas,hdf5
I have a pandas DataFrame that I have written to an HDF5 file. The data is indexed by Timestamps and looks like this: In [5]: df Out[5]: Codes Price Size Time 2015-04-27 01:31:08-04:00 T 111.75 23 2015-04-27 01:31:39-04:00 T 111.80 23 2015-04-27 01:31:39-04:00 T 113.00 35 2015-04-27 01:34:14-04:00 T 113.00...

Library compiling errors with alternate build of gcc

gcc,ld,hdf5
I have some fortran programs that would not compile in old versions of gfortran. I have to run multiple instances of this program and am using another system (a cluster system) which has centos5_x64 with gcc-4.1 !! Therefore I had to build new version of gcc; I built both gcc-4.8.3...

HDF5 rename header field

hdf5
I have some HDF5 files whose header fields are less than explicit, so I would like to edit them and replace them with something else. For example, using h5dump --header file.h5 I get this: GROUP "/" { DATASET "log" { DATATYPE H5T_COMPOUND { H5T_COMPOUND { H5T_STD_U32LE "sec"; H5T_STD_U32LE "usec"; }...

Why if I put multiple empty Pandas series into hdf5 the size of hdf5 is so huge?

python,pandas,hdf5
If I create an HDF5 file with pandas using the following code: import pandas as pd store = pd.HDFStore("store.h5") for x in range(1000): store["name"+str(x)] = pd.Series() all the series are empty, so why does the "store.h5" file take 1.1GB of space on the hard drive? ...

Difference between str() and astype(str)?

string,python-3.x,pandas,hdf5
I want to save the dataframe df to the .h5 file MainDataFile.h5 : df.to_hdf ("c:/Temp/MainDataFile.h5", "MainData", mode = "w", format = "table", data_columns=['_FirstDayOfPeriod','Category','ChannelId']) and get the following error : *** Exception: cannot find the correct atom type -> > [dtype->object,items->Index(['Libellé_Article', 'Libellé_segment'], dtype='object')] If I modify the column 'Libellé_Article' in this...

Resize HDF5 dataset in Julia

julia-lang,hdf5
Is there a way to resize a chunked dataset in HDF5 using Julia's HDF5.jl? I didn't see anything in the documentation. Looking through the source, all I found was set_dims!(), but that cannot extend a dataset (only shrink it). Does HDF5.jl have the ability to enlarge an existing (chunked) dataset?...

Get last row in pandas HDF5 query

python,pandas,hdf5
I am trying to get the index of the last row of a pandas dataframe stored in HDF5 without having to pull the whole dataset or index into memory. I am looking for something like this: from pandas import HDFStore store = HDFStore('file.h5') last_index = store.select('dataset', where='index == -1').index Except...

data exchange format ocaml to python numpy or pandas

python,numpy,ocaml,export-to-csv,hdf5
I'm generating time series data in OCaml, which are basically long lists of floats, from a few kB to hundreds of MB. I would like to read, analyze and plot them using the Python numpy and pandas libraries. Right now, I'm thinking of writing them to CSV files. A binary...

Rename a column within a HDF5 file

python,hdf5,pytables,h5py,hdf5storage
I am looking to rename a column within one of my HDF5 files to something else but I cannot fathom how to do it. >>> h5 = h5py.File(hdf5file, 'r') >>> h5['/ook'].dtype dtype([('fubar', '<f4'), ... )] I want to rename 'fubar' to something else. Clearly, I want to rename all the...

Close an open h5py data file

python,ipython,hdf5,h5py
In our lab we store our data in HDF5 files through the Python package h5py. At the beginning of an experiment we create an HDF5 file and store array after array of data in the file (among other things). When an experiment fails or is interrupted the file...

ld not finding existing library

linux,ld,gfortran,hdf5
I am compiling a fortran code that requires hdf5 libraries which are installed in a local directory. This is my Makefile: FC = gfortran FCFLAGS = -g -fcheck=all -Wall -fdefault-real-8 INCLUDES = -I/home/bharat/hdf5/include LFLAGS = -L/home/bharat/hdf5/lib LIBS= -lhdf5_fortran main: main.o param.o dmotifs.o ssa.o $(FC) $(LFLAGS) $(LIBS) -o main $^ param.o:...

HDF gzip compression vs. ASCII gzip compression

c,gzip,hdf5
I have a 2D matrix with 1100x1600 data points. Initially, I stored it in an ascii-file which I tar-zipped using the command tar -cvzf ascii_file.tar.gz ascii_file Now, I wanted to switch to hdf5 files, but they are too large, at least in the way I am using them... First, I...

Adding a row to a Pandas DataFrame that would duplicate index

python,pandas,hdf5
I have a DataFrame whose index consists of datetime objects. I am ultimately going to write this DataFrame to an HDF5 file using HDFStore.append. I am adding a lot of rows that need to be written to this HDF5 file. If I use HDFStore.append for every row, this takes...

rhdf5 package and arrays in R - storage mode list vs double

arrays,r,matrix,hdf5
I am using the package rhdf5 to build a large h5 with climate data for a specific geographic domain. Domain has a dimension of 48x47 (lonxlat) points in space. Climate variables (precipitation, temperature...) are organized in a matrix of 2256 rows (48*47=2256) and 248 columns (8 observation/day for a 31...

How to read a hdf5 file without knowing the database name in Matlab

matlab,hdf5
I have an HDF5 database but almost no experience with that kind of database. I need to open / load it in Matlab. However, the Matlab function h5read requires two arguments: data = h5read(filename,dataset) I know my filename (obviously :) ) but I don't know the dataset name (because I...

Reading HDF5 files in Apache Spark

scala,apache-spark,hdf5
Is there a way to read HDF5 files using the Scala version of Spark? It looks like it can be done in Python (via Pyspark), but I can't find anything for Scala.