Getting started¶
Quick start¶
First, import the DataSet class from the dataset module using the import command at the top of your script:
>>> from geophpy.dataset import DataSet
You can a so import the complete dataset module content using (not recommended for script readability):
>>> from geophpy.dataset import *
Reading data from files
Read your data from an ASCII delimiter-separated values files using the
from_file()method from theDataSetclass:>>> # Opening data from an ascii file >>> dataset = DataSet.from_file(["test.dat"], fileformat='ascii', x_colnum=1, y_colnum=2, z_colnum=5)If not provided, the file format will be estimated from the file extension and the first three columns will be considered as the data (X, Y and Z respectively).
You can optionally give the file delimiter, the number of header lines to skip and the header lines that contains the different field names.
Displaying the dataset
Display your raw data as a scatter plot or interpolate it to display 2-D surface plot:
>>> # Raw data scatter plot >>> fig, cmap = dataset.plot(plottype='2D-SCATTER', cmmin=-20, cmmax=20) >>> fig.show()>>> # Raw data surface plot >>> dataset.interpolate(interpolation='linear') # or 'none' >>> fig, cmap = dataset.plot(plottype='2D-SURFACE', cmmin=-20, cmmax=20) >>> fig.show()
![]()
Fig. 1 Quick Start - Raw dataset scatter plot.¶
![]()
Fig. 2 Quick Start - Raw dataset surface plot.¶
You can also display the position of the measurement points in a plot or directly onto the 2-D surface plot:
>>> # Data point position (postmap plot) >>> fig, cmap = dataset.plot(plottype='2D-POSTMAP') >>> fig.show()>>> # Data point position onto the 2-D surface >>> fig, cmap = dataset.plot(plottype='2D-SURFACE', pointsdisplay=True) >>> fig.show()
![]()
Fig. 3 Quick Start - Raw dataset postmap plot.¶
![]()
Fig. 4 Quick Start - Raw dataset surface plot with data points.¶
See Plotting functions for the complete list of available plot possibilities.
General processing
Use the available processing (despike, destaggering, destriping etc.) to filter your
DataSet:>>> # Dataset destaggering >>> dataset.festoonfilt(corrmin=0.5, setmin=-20, setmax=20)>>> # Dataset destriping >>> dataset.destripecon(Nprof='all', setmin=-20, setmax=20)
![]()
Fig. 5 Quick Start - Dataset destaggering.¶
![]()
Fig. 6 Quick Start - Dataset destriping.¶
See General Processing for the complete list of available processing.
Method-specific processing
Use the method-specific available processing (for magnetic survey for instance):
>>> # Dataset reduction to the pole >>> dataset.polereduction()See Magnetic Processing for the complete list of available magnetic-survey-specific processing.
Saving data in a file
Save your data to a file using the
to_file()method from theDataSetclass:>>> # Saving to a NetCDF file >>> dataset.to_file("save.nc", description='My Processed Data')>>> # Saving to a Surfer Grid file >>> dataset.to_file("save.grd")>>> # Saving to an ascii file >>> dataset.to_file("save.dat", fileformat='ascii', delimiter=',')
Georeferencing grid nodes
Georeference your data using Ground Control Points (survey nodes known in both local and geographic coordinate systems). Use the
setgeoref()from theDataSetclass>>> # Reading GCPs file >>> import geophpy.geoposset as pset >>> gpos = pset.GeoPosSet.from_file("GCPs.csv") >>> gpos.plot(long_label=True)>>> # Georeferencing ungridded values >>> dataset.setgeoref('UTM', gpos.points_list, 'T', 32)>>> # Georeferencing gridded values >>> dataset.interpolate(x_step=0.5, y_step=0.25) >>> dataset.setgeoref('UTM', gpos.points_list, 'T', 32)
![]()
![]()
![]()
DataSet overview¶
All imported data are stored into a DataSet class that contains the different plotting and processing methods.
The DataSet class is composed of 3 objects:
DataSet.info
The
Infoclass that contains the informations about the dataset:
x_min= minimal x coordinate of the data set.
x_max= maximal x coordinate of the data set.
y_min= minimal y coordinate of the data set.
y_max= maximal y coordinate of the data set.
z_min= minimal z value of the data set.
z_max= maximal z value of the data set.
x_gridding_delta= delta between 2 x values in the interpolated image grid.
y_gridding_delta= delta between 2 y values in the interpolated image grid.
gridding_interpolation= interpolation name used for the building of the image grid.
DataSet.data
The
Dataclass that contains:
fields= the names of the fields (columns): [‘X’, ‘Y’, ‘Z’].
values= 2D array of raw values before interpolation (array with (x, y, z) values).
z_image= 2D array of the current gridded data values.
easting_image= None # easting grid (to use with z_image)
northing_image= None # northing grid(to use with z_image)
easting= None # easting array (to use with values)
northing= None # northing array (to use with values)
Warning
The z_image object is NOT AUTOMATICALLY BUILT after opening a file but by explicitly using the gridding interpolation method interpolate().
See Dataset operation for details.
DataSet.georef
The
GeoRefSystemclass object contains:
active= A flag for the georeferencing status.
refsystem= The georeferencing system (‘UTM’, ‘WGS84’, …).
utm_zoneletter= The optional UTM letter.
utm_zonenumber= The optional UTM zone number.
points_list= the list of the Ground Control Points coordinates in both the local and georeferenced system.
Opening files¶
All the reading possibilities are available through the from_file() method of the DataSet class.
GeophPy manages different types of files. You can obtain the list of accepted file formats with the command:
>>> from geophpy.dataset import fileformat_getlist
>>> fileformat_getlist()
['ascii', 'netcdf', 'surfer']
The file format is automatically recognized from the file extension using an internal dictionary:
>>> from geophpy.dataset import format_chooser
>>> format_chooser
{'.cdf': 'netcdf', '.nc': 'netcdf',
'.grd': 'surfer',
'.xyz': 'ascii', '.csv': 'ascii', '.txt': 'ascii', '.dat': 'ascii'}
If the file format is not in the dictionary or is not properly recognized from the extension, it can be forced to a specific format using the fileformat keyword of the from_file() method.
ASCII files
You can open comma-separated values (CSV) files, or any other delimiter-separated values files, by indicating the number and type of the columns of interest for the dataset to be processed:
>>> # Opening a ".dat" file from Geometrics Magnetometer G-858 >>> ## (format 'ascii' with delimiter ' ') >>> dataset = DataSet.from_file(["test.dat"], fileformat='ascii', delimiter=' ', z_colnum=5)Geometrics Magnetometer G-858 .dat file example:
>>> X Y TOP_RDG BOTTOM_RDG VRT_GRAD >>> 50.000 0.059 46406.028 46390.698 -23.585 >>> 50.000 0.178 46407.275 46394.028 -20.380 >>> 50.000 0.296 46409.165 46397.987 -17.197 >>> ... ... ... ... ...>>> # Opening an *.xyz* file >>> ## (column titles on the first line, >>> ## X, Y and data values on the others lines, >>> ## separated by a delimiter) >>> dataset = DataSet.from_file(["test.xyz"], fileformat='ascii', delimiter='\t', fields_row=1).xyz file examples:
>>> X Y Z >>> 0 0 0.34 >>> 0 1 -0.21 >>> 0 2 2.45 >>> ... ... ...>>> X,Y,Z >>> 0,0,0.34 >>> 0,1,-0.21 >>> 0,2,2.45 >>> ...,...,...You can also specify the number of header lines to skip or the specific columns for x, y and values:
>>> # Number of header lines to skip >>> dataset = DataSet.from_file(["test.txt"], fileformat='ascii', delimiter=',', skip_rows=4)>>> # Specific x, y and value column numbers >>> dataset = DataSet.from_file(["test.txt"], fileformat='ascii', delimiter=',', x_colnum=xcol, y_colnum=ycol, z_colnum=zcol)Note
If unspecified, the
delimiteris estimated directly from the file content and thefileformatis determined from the file extension.
Surfer Grid files
GeophPy manages Golden Software Surfer Grid files (Surfer 7 binary grids, Surfer 6 binary grids and Surfer 6 ASCII grids). The grid type is automatically determined from the .grd file. To open a Surfer Grid simply use:
>>> # Opening a Surfer Grid file >>> dataset = DataSet.from_file(["test.grd"])
NetCDF files
Previously processed dataset are by default save in NetCDF format (.nc). To open previously processed datasets, simply use:
>>> # Opening previously processed dataset (.nc) >>> dataset = DataSet.from_file(["dataset1.nc"])
Concatenating Multiple files
It is possible to build a dataset from a concatenation of severals ASCII files of the same format:
>>> # Opening several selected files >>> dataset = DataSet.from_file(["file1.dat","file2.dat"], format='ascii', delimiter=' ', z_colnum = 5)>>> # Opening all files beginning by "file" >>> dataset = DataSet.from_file(["file*.dat"], format='ascii', delimiter=' ', z_colnum = 5)Note
When reading multiple files directly using the
from_file()method, no edge-matching method are used so the original limits of the datasets in the mosaic maybe highly visible.
Checking files compatibility
Opening several files to build a data set needs to make sure that all files selected are in the same format.
It’s possible to check it by reading the headers of each files:
>>> compatibility = True >>> columns_nb = None >>> for file in fileslist : >>> col_nb, rows = getlinesfrom_file(file) >>> if ((columns_nb != None) and (col_nb != columns_nb)) : >>> compatibility = False >>> break >>> else : >>> columns_nb = col_nb
Dataset operation¶
Besides actual DataSet processing, basic DataSet operations (geometrical transformation, math operations, interpolation etc.) are available through simple commands.
Duplicating dataset
Duplicate a
DataSetbefore processing it to save the raw data:>>> rawdataset = dataset.copy()
Dataset interpolation
Interpolate the data value with severals gridding interpolation methods (‘none’, ‘nearest’, ‘linear’, ‘cubic’) to build the dataset
z_imageobject:>>> dataset.interpolate(interpolation="none")
![]()
Fig. 7 Quick Start - Dataset interpolation (‘none’)¶
See High level API for calling details.
Retrieving grid coordinates
The
DataSetgrid (z_image) coordinate vectors and matrices can be retrieved with the following commands:>>> # Grid coordinate matrices >>> dataset.get_xgrid() # x-coordinate matrix >>> dataset.get_ygrid() # x-coordinate matrix >>> dataset.get_xygrid() # both x and y-coordinate matrices>>> # Grid coordinate vectors >>> dataset.get_xvect() # x-coordinate vector >>> dataset.get_yvect() # y-coordinate vector >>> dataset.get_xyvect() # both x and y-coordinate vectors >>> dataset.get_gridextent() # xmin, xmax, ymin, ymax >>> dataset.get_gridcorners() # corners x and y-coordinatesThe
DataSetungridded values (values) can be retrieved with the following commands:>>> # Data sample coordinates >>> dataset.get_xvalues() # x-coordinates >>> dataset.get_yvalues() # y-coordinates >>> dataset.get_yvalues() # x, y-coordinates >>> dataset.get_values() # data values >>> dataset.get_xyzvalues() # both x, y-coordinates and data values>>> # Bounding box >>> dataset.get_boundingbox() # corners coordinates (equivalent get_gridcorners() for a gridded dataset)See High level API for calling details.
Basic math operations
You can add or multiply the
DataSetvalues by a constant with the following commands:>>> # Dataset addition/subtraction >>> dataset.add(val=14, valfilt=True, zimfilt=True) >>> dataset.add(val=-14, valfilt=True, zimfilt=True)>>> # Dataset multiplication/division >>> dataset.times(val=30, valfilt=True, zimfilt=True) >>> dataset.add(val=1/30, valfilt=True, zimfilt=True)See High level API for calling details.
Basic geometrical operations
You can translate or rotate the
DataSetwith the following commands:>>> # Dataset translation >>> dataset.translate(shiftx=20, shifty=-19)>>> # Dataset rotation >>> dataset.rotate(angle=90, center='BL')See High level API for calling details.
Saving DataSet¶
You can save the DataSet in different a file formats using the to_file() method of the DataSet class.
For the time being, only three formats are available. The list of the available file formats can be obtained with the command:
>>> from geophpy.dataset import fileformat_getlist
>>> fileformat_getlist()
['ascii', 'netcdf', 'surfer']
When saving a file, the file format is automatically recognized from the file extension using an internal dictionary:
>>> from geophpy.dataset import format_chooser
>>> format_chooser
{'.cdf': 'netcdf', '.nc': 'netcdf',
'.grd': 'surfer',
'.xyz': 'ascii', '.csv': 'ascii', '.txt': 'ascii', '.dat': 'ascii'}
If the file format is not in the dictionary or is not properly recognized from the extension, it can be forced to a specific format using the fileformat keyword of the from_file() method.
NetCDF files
Saving your data in NetCDF file format allow the conservation of the gridded dataset, the dataset values and the georeferencing system. An optional description can be added using the
descriptionkeyword.>>> dataset.to_file('save.nc') >>> dataset.to_file('save.nc', description='My dataset example') >>> dataset.to_file('save.extension', fileformat='netcdf')
Surfer Grid files
Saving your data in Surfer Grid format only conserves the gridded dataset. The different available grid types can be obtained using the command:
>>> gridtype_getlist() ['surfer7bin', 'surfer6bin', 'surfer6ascii']By default, the Surfer 7 binary grid type is used but you can use the
gridtypekeyword of thefrom_file()method to choose another grid type:>>> dataset.to_file('save.grd') >>> dataset.to_file('save.extension', fileformat='surfer') >>> dataset.to_file('save.grd', gridtype='surfer6ascii')
Ascii files
Saving your data in Surfer Grid format only conserves the dataset (ungridded) values.
>>> dataset.to_file('save.csv') >>> dataset.to_file('save.csv', delimiter='\t') >>> dataset.to_file('save.extension', delimiter='\t', fileformat='ascii')