Chapter 1: Introduction
1.4. Open DM3 Images, Spectra, Spectrum-Images and Image-Stacks with pyNSID#
part of
MSE672: Introduction to Transmission Electron Microscopy
Spring 2024
Gerd Duscher | Khalid Hattar
Microscopy Facilities | Tennessee Ion Beam Materials Laboratory
Materials Science & Engineering | Nuclear Engineering
Institute of Advanced Materials & Manufacturing
Background and methods for the analysis and quantification of data acquired with transmission electron microscopes.
Reading a dm3 file and translating the data into a pyNSID-style h5py (HDF5) file to be compatible with the pycroscopy package.
Because many other packages and programs for TEM data manipulation are based on the HDF5
file format, it is relatively easy to convert back and forth between them.
1.4.1. Import packages for figures and data analysis#
1.4.1.1. Check Installed Packages#
import sys
import importlib.metadata
def test_package(package_name):
"""Test if package exists and returns version or -1"""
try:
version = importlib.metadata.version(package_name)
except importlib.metadata.PackageNotFoundError:
version = '-1'
return version
# pyTEMlib setup ------------------
if test_package('pyTEMlib') < '0.2024.1.0':
print('installing pyTEMlib')
    !{sys.executable} -m pip install --upgrade git+https://github.com/pycroscopy/pyTEMlib.git@main -q
if 'google.colab' in sys.modules:
!{sys.executable} -m pip install numpy==1.24.4
# ------------------------------
print('done')
installing pyTEMlib
/bin/bash: line 1: {sys.executable}: command not found
done
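Note that the comparison `test_package('pyTEMlib') < '0.2024.1.0'` above compares version strings lexicographically, which can misorder multi-digit components. A minimal sketch of a more robust comparison (`version_tuple` is a hypothetical helper for illustration, not part of pyTEMlib):

```python
def version_tuple(version):
    """Convert a dotted version string into a tuple of ints for reliable comparison."""
    parts = []
    for part in version.split('.'):
        digits = ''.join(ch for ch in part if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

# Lexicographic string comparison misorders multi-digit components:
print('0.10.0' < '0.9.0')                                # True, but wrong as versions
# Tuple comparison orders them correctly:
print(version_tuple('0.10.0') > version_tuple('0.9.0'))  # True
```

For installed packages, the same idea is available off the shelf via `packaging.version.Version`.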
1.4.1.2. Load the plotting and figure packages#
%matplotlib widget
import matplotlib.pylab as plt
import numpy as np
import sys
import pyTEMlib
import pyTEMlib.file_tools as ft # File input/ output library
import sidpy
import pyNSID
import h5py
if 'google.colab' in sys.modules:
from google.colab import output
output.enable_custom_widget_manager()
from google.colab import drive
drive.mount("/content/drive")
# For archiving reasons it is a good idea to print the version numbers out at this point
print('pyTEM version: ',pyTEMlib.__version__)
__notebook__='CH1_04-Reading_File'
__notebook_version__='2024_01_09'
You don't have igor2 installed. If you wish to open igor files, you will need to install it (pip install igor2) before attempting.
You don't have gwyfile installed. If you wish to open .gwy files, you will need to install it (pip install gwyfile) before attempting.
Symmetry functions of spglib enabled
pyTEM version: 0.2023.9.1
1.4.2. Open a file#
This function opens an hdf5 file in the pyNSID style, which enables you to keep track of your data analysis.
Please see the Installation notebook for installation.
We want to consolidate files that belong together into one dataset. For example, a spectrum-image dataset consists of:
Survey image,
EELS spectra,
Z-contrast image acquired simultaneously with the spectra.
So load the top dataset first; in the example above, that is the survey image.
Please note that the plotting routine of matplotlib
was introduced in Matplotlib and Numpy for Micrographs notebook.
Use the file p1-3hr.dm3 from the TEM_data directory for a practice run.
# ------ Input ------- #
load_example = True
# -------------------- #
# Open file widget and select file which will be opened in code cell below
if not load_example:
drive_directory = ft.get_last_path()
file_widget = ft.FileWidget(drive_directory)
if load_example:
file_name = '../example_data/p1-3-hr3.dm3'
datasets = ft.open_file(file_name)
main_dataset = datasets[list(datasets.keys())[0]]
else:
main_dataset = file_widget.selected_dataset
datasets = file_widget.datasets
view = main_dataset.plot()
Please use new SciFiReaders Package for full functionality
1.4.3. Data Structure#
The data themselves reside in a sidpy dataset, which here we name main_dataset.
This dataset has additional information stored as attributes, which can be accessed by name.
print(main_dataset)
main_dataset
sidpy.Dataset of type IMAGE with:
dask.array<array, shape=(2048, 2048), dtype=int32, chunksize=(2048, 2048), chunktype=numpy.ndarray>
data contains: intensity (counts)
and Dimensions:
x: distance (nm) of size (2048,)
y: distance (nm) of size (2048,)
with metadata: ['experiment', 'filename']
print(f'size of current dataset is {main_dataset.shape}')
size of current dataset is (2048, 2048)
Within those attributes there are two dictionaries:
metadata
original_metadata
which contain additional information about the data.
print('title: ', main_dataset.title)
print('data type: ', main_dataset.data_type)
for key in datasets:
print(key)
print(datasets[key].original_metadata.keys())
main_dataset.metadata
title: p1-3-hr3
data type: DataType.IMAGE
Channel_000
dict_keys(['ImageData', 'ImageTags', 'Name', 'UniqueID', 'DM', 'original_filename', 'ApplicationBounds', 'DocumentObjectList', 'DocumentTags', 'HasWindowPosition', 'Image Behavior', 'ImageSourceList', 'InImageMode', 'MinVersionList', 'NextDocumentObjectID', 'Page Behavior', 'SentinelList', 'Thumbnails', 'WindowPosition', 'original_title'])
Copied_of_Channel_000
dict_keys([])
Log_000
dict_keys([])
Log_001
dict_keys([])
{'experiment': {'exposure_time': 1.0,
'microscope': 'Libra 200 MC',
'acceleration_voltage': 199990.28125},
'filename': '../example_data/p1-3-hr3.dm3'}
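Since metadata is a plain Python dictionary, individual entries are reached with chained key lookups. A short sketch using values like those printed above (the dictionary here is typed in by hand for illustration):

```python
# Metadata dictionary mirroring the structure printed above (hand-typed for illustration)
metadata = {'experiment': {'exposure_time': 1.0,
                           'microscope': 'Libra 200 MC',
                           'acceleration_voltage': 199990.28125},
            'filename': '../example_data/p1-3-hr3.dm3'}

# Chained key lookups reach individual values
voltage_kV = metadata['experiment']['acceleration_voltage'] / 1000.0
print(f"Acceleration voltage: {voltage_kV:.2f} kV")  # Acceleration voltage: 199.99 kV
```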
1.4.4. Datasets Dictionary#
The datasets variable is a dictionary (like a directory in a file system) that contains the datasets.
Below I show how to access one of those datasets with a pull-down menu.
chooser = ft.ChooseDataset(datasets)
current_dataset = chooser.dataset
view = current_dataset.plot()
An important attribute of current_dataset is the original_metadata group, where all the original metadata of your file reside in the attributes. This is usually a long list for dm3 files.
current_dataset.original_metadata.keys()
dict_keys(['ImageData', 'ImageTags', 'Name', 'UniqueID', 'DM', 'original_filename', 'ApplicationBounds', 'DocumentObjectList', 'DocumentTags', 'HasWindowPosition', 'Image Behavior', 'ImageSourceList', 'InImageMode', 'MinVersionList', 'NextDocumentObjectID', 'Page Behavior', 'SentinelList', 'Thumbnails', 'WindowPosition', 'original_title'])
The original_metadata attribute retains all information stored in the original file;
no information is lost.
for key,value in current_dataset.original_metadata.items():
print(key, value)
print(current_dataset.h5_dataset)
ImageData {'Calibrations': {'Brightness': {'Origin': 0.0, 'Scale': 1.0, 'Units': ''}, 'Dimension': {'0': {'Origin': 0.0, 'Scale': 0.03894666209816933, 'Units': 'nm'}, '1': {'Origin': 0.0, 'Scale': 0.03894666209816933, 'Units': 'nm'}}, 'DisplayCalibratedUnits': 1}, 'Data': 'read', 'DataType': 7, 'Dimensions': {'0': 2048, '1': 2048}, 'PixelDepth': 4}
ImageTags {'Acquisition': {'Device': {'Active Size (pixels)': [2048, 2048], 'Camera Number': 0, 'CCD': {'Pixel Size (um)': [14.0, 14.0]}, 'Configuration': {'Transpose': {'Diagonal Flip': 0, 'Horizontal Flip': 1, 'Vertical Flip': 0}}, 'Name': 'US1000XP 1', 'Source': 'US1000XP 1'}, 'Frame': {'Area': {'Transform': {'Class Name': 'cm_acquisitiontransform_list', 'Transform List': {'0': {'Binning': [1, 1], 'Class Name': 'cm_acquisitiontransform', 'Sub Area Adjust': [0, 0, 0, 0], 'Transpose': {'Diagonal Flip': 0, 'Horizontal Flip': 1, 'Vertical Flip': 0}}}}}, 'CCD': {'Pixel Size (um)': [14.0, 14.0]}, 'Intensity': {'Range': {'Bias (counts)': 250.0, 'Dark Current (counts/s)': 0.0, 'Dark Level (counts)': 250.0, 'Maximum Value (counts)': 65535.0, 'Minimum Value (counts)': 0.0, 'Saturation Level (counts)': 65785.0}, 'Transform': {'Class Name': 'cm_valuetransform_list', 'Transform List': {'0': {'Class Name': 'cm_valuetransform_affine', 'Offset': 250.0, 'Scale': 1.0}, '1': {'ADC Max': 65535.0, 'ADC Min': 0.0, 'Class Name': 'cm_valuetransform_adc'}}}}, 'Reference Images': {'Dark': {'Mean (counts)': 251.31182837486267, 'Standard Deviation (counts)': 3.28830754005297}}, 'Sequence': {'Acquisition Start Time (epoch)': 1528991109318.0, 'Exposure Start (ns)': 130828979.37529667, 'Exposure Time (ns)': 1000000000.0, 'Frame Index': 0, 'Frame Start (ns)': 130828979.37529667, 'Frame Time (ns)': 3482273464.1519103, 'Readout Time (ns)': 3482273464.1519103}}, 'Parameters': {'Acquisition Write Flags': 4294967295, 'Base Detector': {'Class Name': 'cm_namedcameradetectorparameterset', 'Name': 'default'}, 'Detector': {'continuous': 0, 'exposure (s)': 1.0, 'hbin': 1, 'height': 2048, 'left': 0, 'top': 0, 'vbin': 1, 'width': 2048}, 'Environment': {'Mode Name': 'Imaging'}, 'High Level': {'Acquisition Buffer Size': 0, 'Antiblooming': 0, 'Binning': [1, 1], 'CCD Read Area': [0, 0, 2048, 2048], 'CCD Read Ports': 1, 'Choose Number Of Frame Shutters Automatically': 1, 'Class Name': 
'cm_camera_highlevelparameters', 'Continuous Readout': 0, 'Corrections': 817, 'Corrections Mask': 817, 'Exposure (s)': 1.0, 'Number Of Frame Shutters': 1, 'Processing': 'Gain Normalized', 'Quality Level': 1, 'Read Frame Style': 0, 'Read Mode': 0, 'Secondary Shutter Post Exposure Compensation (s)': 0.0, 'Secondary Shutter Pre Exposure Compensation (s)': 0.0, 'Shutter': {'Primary Shutter States': 0, 'Primary Shutter States Mask': 0, 'Secondary Shutter States': 0, 'Secondary Shutter States Mask': 0, 'Shutter Exposure': 0, 'Shutter Index': 0}, 'Shutter Post Exposure Compensation (s)': 0.0, 'Shutter Pre Exposure Compensation (s)': 0.0, 'Transform': {'Diagonal Flip': 0, 'Horizontal Flip': 0, 'Vertical Flip': 0}}, 'Objects': {'0': {'Class Name': 'cm_autoexpose_acquireimage', 'Do Auto Expose': 0, 'Maximum Exposure (s)': 5.0, 'Minimum Exposure (s)': 0.1, 'Target Intensity (%)': 50.0, 'Test Exposure (s)': 0.1}, '1': {'Class Name': 'cm_imgproc_finalcombine', 'Frame Combine Style': 'Copy', 'Parameter 1': 1.0}, '2': {'Class Name': 'cm_stdviewerimagedisplayer', 'Do Auto Zoom': 0, 'Screen Relative Position': [1.0, 0.0], 'Version': 33947648, 'View Name': 'Frame', 'Viewer Class': 'acquire', 'Window Relative Position': [1.0, 0.0], 'Zoom': 1.0}, '3': {'Class Name': 'cm_imgproc_histogram'}}, 'Parameter Set Name': 'Record', 'Parameter Set Tag Path': 'Imaging:Acquire:Record', 'Version': 33947648}}, 'DataBar': {'Acquisition Date': '6/14/2018', 'Acquisition Time': '11:45:13 AM', 'Acquisition Time (OS)': 1.3173464713128774e+17, 'Binning': 1, 'Custom elements': {}, 'Device Name': 'US1000XP 1', 'Exposure Number': 13083663, 'Exposure Time (s)': 1.0}, 'Microscope Info': {'Actual Magnification': 359465.9784890286, 'Cs(mm)': 2.2, 'Emission Current (A)': 230.0, 'Formatted Indicated Mag': '315kx', 'Formatted Voltage': '200.0kV', 'HT Extrapolated': 0, 'Illumination Mode': 'TEM', 'Imaging Mode': 'Image Mag', 'Indicated Magnification': 315000.0, 'Items': {'0': {'Data Type': 20, 'Label': 'Specimen', 
'Tag path': 'Microscope Info:Specimen', 'Value': 'Fe-9Cr(0.3Y)-3E10(17)-475C'}, '1': {'Data Type': 20, 'Label': 'Operator', 'Tag path': 'Microscope Info:Operator', 'Value': 'Tengfei Yang'}, '2': {'Data Type': 20, 'Label': 'Microscope', 'Tag path': 'Microscope Info:Microscope', 'Value': 'Libra 200 MC'}}, 'Magnification Interpolated': 0, 'Microscope': 'Libra 200 MC', 'Name': 'Libra COM', 'Operation Mode': 'IMAGING', 'Operator': 'Tengfei Yang', 'Probe Current (nA)': 0.0, 'Probe Size (nm)': 0.0, 'Specimen': 'Fe-9Cr(0.3Y)-3E10(17)-475C', 'STEM Camera Length': 94.49999779462814, 'Voltage': 199990.28125}}
Name p1-3-hr3
UniqueID {'0': 3677084, '1': 1068567676, '2': 1927893732, '3': 2094808766}
DM {'dm_version': 3, 'file_size': 17382688, 'full_file_name': '../example_data/p1-3-hr3.dm3'}
original_filename ../example_data/p1-3-hr3.dm3
ApplicationBounds [0, 0, 1465, 2236]
DocumentObjectList {'0': {'AnnotationGroupList': {'0': {'AnnotationType': 31, 'BackgroundColor': [0, 0, 0], 'BackgroundMode': 1, 'FillMode': 1, 'Font': {'Attributes': 7, 'FamilyName': 'Arial Narrow', 'Size': 85}, 'ForegroundColor': [-1, -1, -1], 'HasBackground': 1, 'IsMoveable': 1, 'IsResizable': 1, 'IsSelectable': 1, 'IsTranslatable': 1, 'IsVisible': 1, 'ObjectTags': {}, 'Rectangle': [1768.0, 128.0, 1920.0, 1088.0], 'TextOffsetH': 1.0, 'TextOffsetV': 1.0, 'TextWidth': 195.18429565429688, 'UniqueID': 9}}, 'AnnotationType': 20, 'BackgroundColor': [-1, -1, -1], 'BackgroundMode': 2, 'FillMode': 1, 'ForegroundColor': [-1, 0, -32640], 'HasBackground': 0, 'ImageDisplayInfo': {'BrightColor': [-1, -1, -1], 'Brightness': 0.5, 'CaptionOn': 0, 'CaptionSize': 10, 'CLUT': [[0, 0, 0], [257, 257, 257], [514, 514, 514], [771, 771, 771], [1028, 1028, 1028], [1285, 1285, 1285], [1542, 1542, 1542], [1799, 1799, 1799], [2056, 2056, 2056], [2313, 2313, 2313], [2570, 2570, 2570], [2827, 2827, 2827], [3084, 3084, 3084], [3341, 3341, 3341], [3598, 3598, 3598], [3855, 3855, 3855], [4112, 4112, 4112], [4369, 4369, 4369], [4626, 4626, 4626], [4883, 4883, 4883], [5140, 5140, 5140], [5397, 5397, 5397], [5654, 5654, 5654], [5911, 5911, 5911], [6168, 6168, 6168], [6425, 6425, 6425], [6682, 6682, 6682], [6939, 6939, 6939], [7196, 7196, 7196], [7453, 7453, 7453], [7710, 7710, 7710], [7967, 7967, 7967], [8224, 8224, 8224], [8481, 8481, 8481], [8738, 8738, 8738], [8995, 8995, 8995], [9252, 9252, 9252], [9509, 9509, 9509], [9766, 9766, 9766], [10023, 10023, 10023], [10280, 10280, 10280], [10537, 10537, 10537], [10794, 10794, 10794], [11051, 11051, 11051], [11308, 11308, 11308], [11565, 11565, 11565], [11822, 11822, 11822], [12079, 12079, 12079], [12336, 12336, 12336], [12593, 12593, 12593], [12850, 12850, 12850], [13107, 13107, 13107], [13364, 13364, 13364], [13621, 13621, 13621], [13878, 13878, 13878], [14135, 14135, 14135], [14392, 14392, 14392], [14649, 14649, 14649], [14906, 14906, 14906], 
[15163, 15163, 15163], [15420, 15420, 15420], [15677, 15677, 15677], [15934, 15934, 15934], [16191, 16191, 16191], [16448, 16448, 16448], [16705, 16705, 16705], [16962, 16962, 16962], [17219, 17219, 17219], [17476, 17476, 17476], [17733, 17733, 17733], [17990, 17990, 17990], [18247, 18247, 18247], [18504, 18504, 18504], [18761, 18761, 18761], [19018, 19018, 19018], [19275, 19275, 19275], [19532, 19532, 19532], [19789, 19789, 19789], [20046, 20046, 20046], [20303, 20303, 20303], [20560, 20560, 20560], [20817, 20817, 20817], [21074, 21074, 21074], [21331, 21331, 21331], [21588, 21588, 21588], [21845, 21845, 21845], [22102, 22102, 22102], [22359, 22359, 22359], [22616, 22616, 22616], [22873, 22873, 22873], [23130, 23130, 23130], [23387, 23387, 23387], [23644, 23644, 23644], [23901, 23901, 23901], [24158, 24158, 24158], [24415, 24415, 24415], [24672, 24672, 24672], [24929, 24929, 24929], [25186, 25186, 25186], [25443, 25443, 25443], [25700, 25700, 25700], [25957, 25957, 25957], [26214, 26214, 26214], [26471, 26471, 26471], [26728, 26728, 26728], [26985, 26985, 26985], [27242, 27242, 27242], [27499, 27499, 27499], [27756, 27756, 27756], [28013, 28013, 28013], [28270, 28270, 28270], [28527, 28527, 28527], [28784, 28784, 28784], [29041, 29041, 29041], [29298, 29298, 29298], [29555, 29555, 29555], [29812, 29812, 29812], [30069, 30069, 30069], [30326, 30326, 30326], [30583, 30583, 30583], [30840, 30840, 30840], [31097, 31097, 31097], [31354, 31354, 31354], [31611, 31611, 31611], [31868, 31868, 31868], [32125, 32125, 32125], [32382, 32382, 32382], [32639, 32639, 32639], [-32640, -32640, -32640], [-32383, -32383, -32383], [-32126, -32126, -32126], [-31869, -31869, -31869], [-31612, -31612, -31612], [-31355, -31355, -31355], [-31098, -31098, -31098], [-30841, -30841, -30841], [-30584, -30584, -30584], [-30327, -30327, -30327], [-30070, -30070, -30070], [-29813, -29813, -29813], [-29556, -29556, -29556], [-29299, -29299, -29299], [-29042, -29042, -29042], [-28785, -28785, 
-28785], [-28528, -28528, -28528], [-28271, -28271, -28271], [-28014, -28014, -28014], [-27757, -27757, -27757], [-27500, -27500, -27500], [-27243, -27243, -27243], [-26986, -26986, -26986], [-26729, -26729, -26729], [-26472, -26472, -26472], [-26215, -26215, -26215], [-25958, -25958, -25958], [-25701, -25701, -25701], [-25444, -25444, -25444], [-25187, -25187, -25187], [-24930, -24930, -24930], [-24673, -24673, -24673], [-24416, -24416, -24416], [-24159, -24159, -24159], [-23902, -23902, -23902], [-23645, -23645, -23645], [-23388, -23388, -23388], [-23131, -23131, -23131], [-22874, -22874, -22874], [-22617, -22617, -22617], [-22360, -22360, -22360], [-22103, -22103, -22103], [-21846, -21846, -21846], [-21589, -21589, -21589], [-21332, -21332, -21332], [-21075, -21075, -21075], [-20818, -20818, -20818], [-20561, -20561, -20561], [-20304, -20304, -20304], [-20047, -20047, -20047], [-19790, -19790, -19790], [-19533, -19533, -19533], [-19276, -19276, -19276], [-19019, -19019, -19019], [-18762, -18762, -18762], [-18505, -18505, -18505], [-18248, -18248, -18248], [-17991, -17991, -17991], [-17734, -17734, -17734], [-17477, -17477, -17477], [-17220, -17220, -17220], [-16963, -16963, -16963], [-16706, -16706, -16706], [-16449, -16449, -16449], [-16192, -16192, -16192], [-15935, -15935, -15935], [-15678, -15678, -15678], [-15421, -15421, -15421], [-15164, -15164, -15164], [-14907, -14907, -14907], [-14650, -14650, -14650], [-14393, -14393, -14393], [-14136, -14136, -14136], [-13879, -13879, -13879], [-13622, -13622, -13622], [-13365, -13365, -13365], [-13108, -13108, -13108], [-12851, -12851, -12851], [-12594, -12594, -12594], [-12337, -12337, -12337], [-12080, -12080, -12080], [-11823, -11823, -11823], [-11566, -11566, -11566], [-11309, -11309, -11309], [-11052, -11052, -11052], [-10795, -10795, -10795], [-10538, -10538, -10538], [-10281, -10281, -10281], [-10024, -10024, -10024], [-9767, -9767, -9767], [-9510, -9510, -9510], [-9253, -9253, -9253], [-8996, -8996, -8996], 
[-8739, -8739, -8739], [-8482, -8482, -8482], [-8225, -8225, -8225], [-7968, -7968, -7968], [-7711, -7711, -7711], [-7454, -7454, -7454], [-7197, -7197, -7197], [-6940, -6940, -6940], [-6683, -6683, -6683], [-6426, -6426, -6426], [-6169, -6169, -6169], [-5912, -5912, -5912], [-5655, -5655, -5655], [-5398, -5398, -5398], [-5141, -5141, -5141], [-4884, -4884, -4884], [-4627, -4627, -4627], [-4370, -4370, -4370], [-4113, -4113, -4113], [-3856, -3856, -3856], [-3599, -3599, -3599], [-3342, -3342, -3342], [-3085, -3085, -3085], [-2828, -2828, -2828], [-2571, -2571, -2571], [-2314, -2314, -2314], [-2057, -2057, -2057], [-1800, -1800, -1800], [-1543, -1543, -1543], [-1286, -1286, -1286], [-1029, -1029, -1029], [-772, -772, -772], [-515, -515, -515], [-258, -258, -258], [-1, -1, -1]], 'CLUTName': 'Greyscale', 'ComplexMode': 4, 'ComplexRange': 1000.0, 'Contrast': 0.5, 'ContrastMode': 1, 'DimensionLabels': {'0': ''}, 'DoAutoSurvey': 1, 'EstimatedMax': 656.0, 'EstimatedMaxTrimPercentage': 0.0010000000474974513, 'EstimatedMin': 2354.0, 'EstimatedMinTrimPercentage': 0.0010000000474974513, 'Gamma': 0.5, 'HighLimit': 2107.03125, 'HiLimitContrastDeltaTriggerPercentage': 0.0, 'IsInverted': 0, 'LowLimit': 718.9080810546875, 'LowLimitContrastDeltaTriggerPercentage': 0.0, 'MainSliceId': {'0': 0}, 'MinimumContrast': 0.0, 'RangeAdjust': 1.0, 'SparseSurvey_GridSize': 16, 'SparseSurvey_NumberPixels': 32, 'SparseSurvey_UseNumberPixels': 1, 'SurveyTechique': 2}, 'ImageDisplayType': 1, 'ImageSource': 0, 'IsMoveable': 1, 'IsResizable': 1, 'IsSelectable': 1, 'IsTranslatable': 1, 'IsVisible': 1, 'ObjectTags': {}, 'Rectangle': [0.0, 0.0, 1427.0, 1427.0], 'UniqueID': 8}}
DocumentTags {}
HasWindowPosition 1
Image Behavior {'DoIntegralZoom': 0, 'ImageDisplayBounds': [0.0, 0.0, 1427.0, 1427.0], 'IsZoomedToWindow': 1, 'UnscaledTransform': {'Offset': [0.0, 0.0], 'Scale': [1.0, 1.0]}, 'ViewDisplayID': 8, 'WindowRect': [0.0, 0.0, 1427.0, 1427.0], 'ZoomAndMoveTransform': {'Offset': [0.0, 0.0], 'Scale': [1.0, 1.0]}}
ImageSourceList {'0': {'ClassName': 'ImageSource:Simple', 'Id': {'0': 0}, 'ImageRef': 1}}
InImageMode 1
MinVersionList {'0': {'RequiredVersion': 50659328}}
NextDocumentObjectID 10
Page Behavior {'DoIntegralZoom': 0, 'DrawMargins': 1, 'DrawPaper': 1, 'IsFixedInPageMode': 0, 'IsZoomedToWindow': 1, 'LayedOut': 0, 'PageTransform': {'Offset': [0.0, 0.0], 'Scale': [1.0, 1.0]}, 'RestoreImageDisplayBounds': [0.0, 0.0, 2048.0, 2048.0], 'RestoreImageDisplayID': 8, 'TargetDisplayID': 4294967295}
SentinelList {}
Thumbnails {'0': {'ImageIndex': 0, 'SourceSize_Pixels': [1427, 1427]}}
WindowPosition [30, 801, 1457, 2228]
original_title p1_3_hr3
None
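Because the original metadata is deeply nested, a small recursive helper can make the tag tree easier to scan. This is only a sketch (sidpy also provides view_original_metadata for the same purpose), applied here to a hand-typed excerpt of a dm3 tag tree:

```python
def format_nested(tag_dict, indent=0):
    """Return indented 'key: value' lines for a nested metadata dictionary."""
    lines = []
    for key, value in tag_dict.items():
        if isinstance(value, dict):
            lines.append('  ' * indent + f'{key}:')
            lines.extend(format_nested(value, indent + 1))  # recurse into sub-dictionary
        else:
            lines.append('  ' * indent + f'{key}: {value}')
    return lines

# Hand-typed excerpt of a dm3 tag tree, for illustration only
tags = {'ImageData': {'Dimensions': {'0': 2048, '1': 2048}, 'PixelDepth': 4}}
print('\n'.join(format_nested(tags)))
```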
Any Python object provides documentation through the built-in help function.
help(current_dataset)
Help on Dataset in module sidpy.sid.dataset object:
class Dataset(dask.array.core.Array)
| Dataset(*args, **kwargs)
|
| ..autoclass::Dataset
|
| To instantiate from an existing array-like object,
| use :func:`Dataset.from_array` - requires numpy array, list or tuple
|
| This dask array is extended to have the following attributes:
| -data_type: DataTypes ('image', 'image_stack', spectral_image', ...
| -units: str
| -quantity: str what kind of data ('intensity', 'height', ..)
| -title: title of the data set
| -modality: character of data such as 'STM, 'AFM', 'TEM', 'SEM', 'DFT', 'simulation', ..)
| -source: origin of data such as acquisition instrument ('Nion US100', 'VASP', ..)
| -_axes: dictionary of Dimensions one for each data dimension
| (the axes are dimension datasets with name, label, units,
| and 'dimension_type' attributes).
|
| -metadata: dictionary of additional metadata
| -original_metadata: dictionary of original metadata of file,
|
| -labels: returns labels of all dimensions.
| -data_descriptor: returns a label for the colorbar in matplotlib and such
|
| functions:
| -from_array(data, title): constructs the dataset form an array like object (numpy array, dask array, ...)
| -like_data(data,title): constructs the dataset form an array like object and copies attributes and
| metadata from parent dataset
| -copy()
| -plot(): plots dataset dependent on data_type and dimension_types.
| -get_extent(): extent to be used with imshow function of matplotlib
| -set_dimension(axis, dimensions): set a Dimension to a specific axis
| -rename_dimension(dimension, name): renames attribute of dimension
| -view_metadata: pretty plot of metadata dictionary
| -view_original_metadata: pretty plot of original_metadata dictionary
|
| Method resolution order:
| Dataset
| dask.array.core.Array
| dask.base.DaskMethodsMixin
| builtins.object
|
| Methods defined here:
|
| __abs__(self)
|
| __add__(self, other)
|
| __and__(self, other)
|
| __array_ufunc__(self, numpy_ufunc, method, *inputs, **kwargs)
|
| __div__(self, other)
|
| __eq__(self, other)
| Return self==value.
|
| __floordiv__(self, other)
|
| __ge__(self, other)
| Return self>=value.
|
| __getitem__(self, idx)
|
| __gt__(self, other)
| Return self>value.
|
| __init__(self, *args, **kwargs)
| Initializes Dataset object which is essentially a Dask array
| underneath
|
| Attributes
| ----------
| self.quantity : str
| Physical quantity. E.g. - current
| self.units : str
| Physical units. E.g. - amperes
| self.data_type : enum
| Type of data such as Image, Spectrum, Spectral Image etc.
| self.title : str
| Title for Dataset
| self._structures : dict
| dictionary of ase.Atoms objects to represent structures, can be given a name
| self.view : Visualizer
| Instance of class appropriate for visualizing this object
| self.data_descriptor : str
| Description of this dataset
| self.modality : str
| character of data such as 'STM', 'TEM', 'DFT'
| self.source : str
| Source of this dataset. Such as instrument, analysis, etc.?
| self.h5_dataset : h5py.Dataset
| Reference to HDF5 Dataset object from which this Dataset was
| created
| self._axes : dict
| Dictionary of Dimension objects per dimension of the Dataset
| self.meta_data : dict
| Metadata to store relevant additional information for the dataset.
| self.original_metadata : dict
| Metadata from the original source of the dataset. This dictionary
| often contains the vendor-specific metadata or internal attributes
| of the analysis algorithm
|
| __invert__(self)
|
| __le__(self, other)
| Return self<=value.
|
| __lshift__(self, other)
|
| __lt__(self, other)
| Return self<value.
|
| __matmul__(self, other)
|
| __mod__(self, other)
|
| __mul__(self, other)
|
| __ne__(self, other)
| Return self!=value.
|
| __neg__(self)
|
| __or__(self, other)
| Return self|value.
|
| __pos__(self)
|
| __pow__(self, other)
|
| __radd__(self, other)
|
| __rand__(self, other)
|
| __rdiv__(self, other)
|
| __repr__(self)
| >>> import dask.array as da
| >>> da.ones((10, 10), chunks=(5, 5), dtype='i4')
| dask.array<..., shape=(10, 10), dtype=int32, chunksize=(5, 5), chunktype=numpy.ndarray>
|
| __rfloordiv__(self, other)
|
| __rlshift__(self, other)
|
| __rmatmul__(self, other)
|
| __rmod__(self, other)
|
| __rmul__(self, other)
|
| __ror__(self, other)
| Return value|self.
|
| __rpow__(self, other)
|
| __rrshift__(self, other)
|
| __rshift__(self, other)
|
| __rsub__(self, other)
|
| __rtruediv__(self, other)
|
| __rxor__(self, other)
|
| __setattr__(self, key, value)
| Implement setattr(self, name, value).
|
| __sub__(self, other)
|
| __truediv__(self, other)
|
| __xor__(self, other)
|
| abs(self)
|
| add_structure(self, atoms, title=None)
|
| adjust_axis(self, result, axis, title='', keepdims=False)
|
| all(self, axis=None, keepdims=False, split_every=None, out=None)
| Returns True if all elements evaluate to True.
|
| Refer to :func:`dask.array.all` for full documentation.
|
| See Also
| --------
| dask.array.all : equivalent function
|
| angle(self, deg=False)
|
| any(self, axis=None, keepdims=False, split_every=None, out=None)
| Returns True if any of the elements evaluate to True.
|
| Refer to :func:`dask.array.any` for full documentation.
|
| See Also
| --------
| dask.array.any : equivalent function
|
| argmax(self, axis=None, split_every=None, out=None)
| Return indices of the maximum values along the given axis.
|
| Refer to :func:`dask.array.argmax` for full documentation.
|
| See Also
| --------
| dask.array.argmax : equivalent function
|
| argmin(self, axis=None, split_every=None, out=None)
| Return indices of the minimum values along the given axis.
|
| Refer to :func:`dask.array.argmin` for full documentation.
|
| See Also
| --------
| dask.array.argmin : equivalent function
|
| astype(self, dtype, **kwargs)
| Copy of the array, cast to a specified type.
|
| Parameters
| ----------
| dtype : str or dtype
| Typecode or data-type to which the array is cast.
| casting : {'no', 'equiv', 'safe', 'same_kind', 'unsafe'}, optional
| Controls what kind of data casting may occur. Defaults to 'unsafe'
| for backwards compatibility.
|
| * 'no' means the data types should not be cast at all.
| * 'equiv' means only byte-order changes are allowed.
| * 'safe' means only casts which can preserve values are allowed.
| * 'same_kind' means only safe casts or casts within a kind,
| like float64 to float32, are allowed.
| * 'unsafe' means any data conversions may be done.
| copy : bool, optional
| By default, astype always returns a newly allocated array. If this
| is set to False and the `dtype` requirement is satisfied, the input
| array is returned instead of a copy.
|
| .. note::
|
| Dask does not respect the contiguous memory layout of the array,
| and will ignore the ``order`` keyword argument.
| The default order is 'C' contiguous.
|
| choose(self, choices)
| Use an index array to construct a new array from a set of choices.
|
| Refer to :func:`dask.array.choose` for full documentation.
|
| See Also
| --------
| dask.array.choose : equivalent function
|
| clip(self, min=None, max=None)
| Return an array whose values are limited to ``[min, max]``.
| One of max or min must be given.
|
| Refer to :func:`dask.array.clip` for full documentation.
|
| See Also
| --------
| dask.array.clip : equivalent function
|
| compute_chunk_sizes(self)
| Compute the chunk sizes for a Dask array. This is especially useful
| when the chunk sizes are unknown (e.g., when indexing one Dask array
| with another).
|
| Notes
| -----
| This function modifies the Dask array in-place.
|
| Examples
| --------
| >>> import dask.array as da
| >>> import numpy as np
| >>> x = da.from_array([-2, -1, 0, 1, 2], chunks=2)
| >>> x.chunks
| ((2, 2, 1),)
| >>> y = x[x <= 0]
| >>> y.chunks
| ((nan, nan, nan),)
| >>> y.compute_chunk_sizes() # in-place computation
| dask.array<getitem, shape=(3,), dtype=int64, chunksize=(2,), chunktype=numpy.ndarray>
| >>> y.chunks
| ((2, 1, 0),)
|
| conj(self)
| Complex-conjugate all elements.
|
| Refer to :func:`dask.array.conj` for full documentation.
|
| See Also
| --------
| dask.array.conj : equivalent function
|
| copy(self)
| Returns a deep copy of this dataset.
|
| Returns
| -------
| sidpy dataset
|
| cumprod(self, axis, dtype=None, out=None, method='sequential')
| Return the cumulative product of the elements along the given axis.
|
| Refer to :func:`dask.array.cumprod` for full documentation.
|
| See Also
| --------
| dask.array.cumprod : equivalent function
|
| cumsum(self, axis, dtype=None, out=None, method='sequential')
| Return the cumulative sum of the elements along the given axis.
|
| Refer to :func:`dask.array.cumsum` for full documentation.
|
| See Also
| --------
| dask.array.cumsum : equivalent function
|
| del_dimension(self, ind=None)
| Deletes the dimension attached to axis 'ind'.
|
| dot(self, other)
| Dot product of self and other.
|
| Refer to :func:`dask.array.tensordot` for full documentation.
|
| See Also
| --------
| dask.array.dot : equivalent function
|
| fft(self, dimension_type=None)
| Gets the FFT of a sidpy.Dataset of any size
|
| The data_type of the sidpy.Dataset determines the dimension_type over which the
| fourier transform is performed over, if the dimension_type is not set explicitly.
|
| The fourier transformed dataset is automatically shifted to center of dataset.
|
| Parameters
| ----------
| dimension_type: None, str, or sidpy.DimensionType - optional
| dimension_type over which fourier transform is performed, if None an educated guess will determine
| that from dimensions of sidpy.Dataset
|
| Returns
| -------
| fft_dset: 2D or 3D complex sidpy.Dataset (not tested for higher dimensions)
| 2 or 3 dimensional matrix arranged in the same way as input
|
| Example
| -------
| >> fft_dataset = sidpy_dataset.fft()
| >> fft_dataset.plot()
|
| flatten(self)
| Return a flattened array.
|
| Refer to :func:`dask.array.ravel` for full documentation.
|
| See Also
| --------
| dask.array.ravel : equivalent function
|
| flatten_complex(self)
| This function returns a dataset with real and imaginary components that have been flattened
| This is necessary for scenarios such as fitting of complex functions
| Must be a 2D or 1D dataset to begin with
| Output:
| - ouput_arr: sidpy.Dataset object
|
| fold(self, dim_order=None, method=None)
| This method collapses the dimensions of the sidpy dataset
|
| get_dimension_by_number(self, dims_in)
|
| get_dimension_slope(self, dim)
|
| get_dimensions_by_type(self, dims_in, return_axis=False)
| get dimension by dimension_type name
|
| Parameter
| ---------
| dims_in: dimension_type/str or list of dimension_types/string
|
| Returns
| -------
| dims_out: list of [index]
| the kind of dimensions specified in input in numerical order of the dataset, not the input!
|
| get_dimensions_types(self)
|
| get_extent(self, dimensions)
| get image extents as needed i.e. in matplotlib's imshow function.
| This function works for equi- or non-equi spaced axes and is suitable
| for subpixel accuracy of positions
|
| Parameters
| ----------
| dimensions: list of dimensions
|
| Returns
| -------
| list of floats
|
| get_image_dims(self, return_axis=False)
| Get all spatial dimensions
|
| get_spectral_dims(self, return_axis=False)
| Get all spectral dimensions
|
| hdf_close(self)
|
| like_data(self, data, title=None, chunks='auto', lock=False, coordinates=None, variance=None, **kwargs)
| Returns a sidpy.Dataset of the new values but with the metadata of this dataset
| - if a dimension of the new dataset differs from this dataset and its scale is linear,
| then this scale will be applied to the new dataset (naming and units will stay the same),
| otherwise the dimension will be generic.
| - Additional functionality to override numeric functions
| Parameters
| ----------
| data: array like
| values of new sidpy dataset
| title: optional string
| title of new sidpy dataset
| chunks: optional list of integers
| size of chunks for dask array
| lock: optional boolean
| for dask array
| coordinates: array like
| coordinates for point cloud
| variance: numpy array, optional
| variance of dataset
|
| Returns
| -------
| sidpy dataset
|
| max(self, axis=None, keepdims=False, split_every=None, out=None)
| Return the maximum along a given axis.
|
| Refer to :func:`dask.array.max` for full documentation.
|
| See Also
| --------
| dask.array.max : equivalent function
|
| mean(self, axis=None, dtype=None, keepdims=False, split_every=None, out=None)
| Returns the average of the array elements along given axis.
|
| Refer to :func:`dask.array.mean` for full documentation.
|
| See Also
| --------
| dask.array.mean : equivalent function
|
| min(self, axis=None, keepdims=False, split_every=None, out=None)
| Return the minimum along a given axis.
|
| Refer to :func:`dask.array.min` for full documentation.
|
| See Also
| --------
| dask.array.min : equivalent function
|
| moment(self, order, axis=None, dtype=None, keepdims=False, ddof=0, split_every=None, out=None)
| Calculate the nth centralized moment.
|
| Refer to :func:`dask.array.moment` for the full documentation.
|
| See Also
| --------
| dask.array.moment : equivalent function
|
| persist(self, **kwargs)
| Persist this dask collection into memory
|
| This turns a lazy Dask collection into a Dask collection with the same
| metadata, but now with the results fully computed or actively computing
| in the background.
|
| The action of this function differs significantly depending on the active
| task scheduler. If the task scheduler supports asynchronous computing,
| such as is the case of the dask.distributed scheduler, then persist
| will return *immediately* and the return value's task graph will
| contain Dask Future objects. However if the task scheduler only
| supports blocking computation then the call to persist will *block*
| and the return value's task graph will contain concrete Python results.
|
| This function is particularly useful when using distributed systems,
| because the results will be kept in distributed memory, rather than
| returned to the local process as with compute.
|
| Parameters
| ----------
| scheduler : string, optional
| Which scheduler to use like "threads", "synchronous" or "processes".
| If not provided, the default is to check the global settings first,
| and then fall back to the collection defaults.
| optimize_graph : bool, optional
| If True [default], the graph is optimized before computation.
| Otherwise the graph is run as is. This can be useful for debugging.
| **kwargs
| Extra keywords to forward to the scheduler function.
|
| Returns
| -------
| New dask collections backed by in-memory data
|
| See Also
| --------
| dask.persist
|
| plot(self, verbose=False, figure=None, **kwargs)
| Plots the dataset according to the
| - shape of the sidpy Dataset,
| - data_type of the sidpy Dataset and
| - dimension_type of dimensions of sidpy Dataset
| the dimension_type 'spatial' or 'spectral' determines how a dataset is plotted.
|
| Recognized data_types are:
| 1D: any keyword, but 'spectrum' or 'line_plot' are encouraged
| 2D: 'image' or one of ['spectrum_family', 'line_family', 'line_plot_family', 'spectra']
| 3D: 'image', 'image_map', 'image_stack', 'spectrum_image'
| 4D: not implemented yet, but will be similar to spectrum_image.
|
| Parameters
| ----------
| verbose: boolean
| kwargs: dictionary for additional plotting parameters
| additional keywords (besides the matplotlib ones) for plotting are:
| - scale_bar: for images to replace axis with a scale bar inside the image
| figure: matplotlib figure object
| define the figure to which this dataset will be plotted
| Returns
| -------
| self.view.fig: matplotlib figure reference
|
| prod(self, axis=None, dtype=None, keepdims=False, split_every=None, out=None)
| Return the product of the array elements over the given axis
|
| Refer to :func:`dask.array.prod` for full documentation.
|
| See Also
| --------
| dask.array.prod : equivalent function
|
| ravel(self)
| Return a flattened array.
|
| Refer to :func:`dask.array.ravel` for full documentation.
|
| See Also
| --------
| dask.array.ravel : equivalent function
|
| rechunk(self, chunks='auto', threshold=None, block_size_limit=None, balance=False)
| Convert blocks in dask array x for new chunks.
|
| Refer to :func:`dask.array.rechunk` for full documentation.
|
| See Also
| --------
| dask.array.rechunk : equivalent function
|
| reduce_dims(original_method)
| # This is a wrapper method for the methods that reduce dimensions
|
| rename_dimension(self, ind, name)
| Renames Dimension at the specified index
|
| Parameters
| ----------
| ind : int
| Index of the dimension
| name : str
| New name for Dimension
|
| repeat(self, repeats, axis=None)
| Repeat elements of an array.
|
| Refer to :func:`dask.array.repeat` for full documentation.
|
| See Also
| --------
| dask.array.repeat : equivalent function
|
| reshape(self, shape, merge_chunks=True, limit=None)
| Reshape array to new shape
|
| Refer to :func:`dask.array.reshape` for full documentation.
|
| See Also
| --------
| dask.array.reshape : equivalent function
|
| round(self, decimals=0)
| Return array with each element rounded to the given number of decimals.
|
| Refer to :func:`dask.array.round` for full documentation.
|
| See Also
| --------
| dask.array.round : equivalent function
|
| set_dimension(self, ind, dimension)
| Sets the dimension of the dataset, including the new name, and updates the axes dictionary
|
| Parameters
| ----------
| ind: int
| Index of dimension
| dimension: sidpy.Dimension
| Dimension object describing this dimension of the Dataset
|
| Returns
| -------
|
| set_thumbnail(self, figure=None, thumbnail_size=128)
| Creates a thumbnail which is stored in thumbnail attribute of sidpy Dataset
| Thumbnail data is saved to Thumbnail group of associated h5_file if it exists
|
| Parameters
| ----------
| thumbnail_size: int
| size of icon in pixels (length of square)
|
| Returns
| -------
| thumbnail: numpy.ndarray
|
| squeeze(self, axis=None)
| Remove axes of length one from array.
|
| Refer to :func:`dask.array.squeeze` for full documentation.
|
| See Also
| --------
| dask.array.squeeze : equivalent function
|
| std(self, axis=None, dtype=None, keepdims=False, ddof=0, split_every=None, out=None)
| Returns the standard deviation of the array elements along given axis.
|
| Refer to :func:`dask.array.std` for full documentation.
|
| See Also
| --------
| dask.array.std : equivalent function
|
| sum(self, axis=None, dtype=None, keepdims=False, split_every=None, out=None)
| Return the sum of the array elements over the given axis.
|
| Refer to :func:`dask.array.sum` for full documentation.
|
| See Also
| --------
| dask.array.sum : equivalent function
|
| swapaxes(self, axis1, axis2)
| Return a view of the array with ``axis1`` and ``axis2`` interchanged.
|
| Refer to :func:`dask.array.swapaxes` for full documentation.
|
| See Also
| --------
| dask.array.swapaxes : equivalent function
|
| trace(self, offset=0, axis1=0, axis2=1, dtype=None)
| Return the sum along diagonals of the array.
|
| Refer to :func:`dask.array.trace` for full documentation.
|
| See Also
| --------
| dask.array.trace : equivalent function
|
| transpose(self, *axes)
| Reverse or permute the axes of an array. Return the modified array.
|
| Refer to :func:`dask.array.transpose` for full documentation.
|
| See Also
| --------
| dask.array.transpose : equivalent function
|
| unfold(self)
|
| var(self, axis=None, dtype=None, keepdims=False, ddof=0, split_every=None, out=None)
| Returns the variance of the array elements, along given axis.
|
| Refer to :func:`dask.array.var` for full documentation.
|
| See Also
| --------
| dask.array.var : equivalent function
|
| view_metadata(self)
| Prints the metadata to stdout
|
| Returns
| -------
| None
|
| view_original_metadata(self)
| Prints the original_metadata dictionary to stdout
|
| Returns
| -------
| None
|
| ----------------------------------------------------------------------
| Class methods defined here:
|
| from_array(x, title='generic', chunks='auto', lock=False, datatype='UNKNOWN', units='generic', quantity='generic', modality='generic', source='generic', coordinates=None, variance=None, **kwargs) from builtins.type
| Initializes a sidpy dataset from an array-like object (i.e. numpy array)
| All metadata will be set to generic values.
|
| Parameters
| ----------
| x: array-like object
| the values which will populate this dataset
| chunks: optional integer or list of integers
| the shape of the chunks to be loaded
| title: optional string
| the title of this dataset
| lock: boolean
| datatype: str or sidpy.DataType
| data type of set, e.g. 'image', 'spectrum', ...
| units: str
| units of dataset i.e. counts, A
| quantity: str
| quantity of dataset like intensity
| modality: str
| modality of dataset
| source: str
| source of dataset like what kind of microscope or function
| coordinates: numpy array, optional
| coordinates for point cloud
| point_cloud: dict or None
| dict with coordinates and base_image for point_cloud data_type
| variance: array-like object
| the variance values of the x array
| Returns
| -------
| sidpy dataset
|
| ----------------------------------------------------------------------
| Readonly properties defined here:
|
| T
|
| data_descriptor
|
| imag
|
| labels
|
| real
|
| structures
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __weakref__
| list of weak references to the object
|
| data_type
|
| h5_dataset
|
| metadata
|
| modality
|
| original_metadata
|
| quantity
|
| source
|
| title
|
| units
|
| variance
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| __hash__ = None
|
| ----------------------------------------------------------------------
| Methods inherited from dask.array.core.Array:
|
| __array__(self, dtype=None, **kwargs)
|
| __array_function__(self, func, types, args, kwargs)
|
| __bool__(self)
|
| __complex__(self)
|
| __dask_graph__(self) -> 'Graph'
|
| __dask_keys__(self) -> 'NestedKeys'
|
| __dask_layers__(self) -> 'Sequence[str]'
|
| __dask_optimize__ = optimize(dsk, keys, fuse_keys=None, fast_functions=None, inline_functions_fast_functions=(<function getter_inline at 0x000001805CBFA840>,), rename_fused_keys=True, **kwargs)
| Optimize dask for array computation
|
| 1. Cull tasks not necessary to evaluate keys
| 2. Remove full slicing, e.g. x[:]
| 3. Inline fast functions like getitem and np.transpose
|
| __dask_postcompute__(self)
|
| __dask_postpersist__(self)
|
| __dask_tokenize__(self)
|
| __deepcopy__(self, memo)
|
| __divmod__(self, other)
|
| __float__(self)
|
| __index__(self)
|
| __int__(self)
|
| __iter__(self)
|
| __len__(self)
|
| __long__ = __int__(self)
|
| __nonzero__ = __bool__(self)
|
| __rdivmod__(self, other)
|
| __reduce__(self)
| Helper for pickle.
|
| __setitem__(self, key, value)
|
| argtopk(self, k, axis=-1, split_every=None)
| The indices of the top k elements of an array.
|
| Refer to :func:`dask.array.argtopk` for full documentation.
|
| See Also
| --------
| dask.array.argtopk : equivalent function
|
| map_blocks(func, *args, name=None, token=None, dtype=None, chunks=None, drop_axis=None, new_axis=None, enforce_ndim=False, meta=None, **kwargs)
| Map a function across all blocks of a dask array.
|
| Note that ``map_blocks`` will attempt to automatically determine the output
| array type by calling ``func`` on 0-d versions of the inputs. Please refer to
| the ``meta`` keyword argument below if you expect that the function will not
| succeed when operating on 0-d arrays.
|
| Parameters
| ----------
| func : callable
| Function to apply to every block in the array.
| If ``func`` accepts ``block_info=`` or ``block_id=``
| as keyword arguments, these will be passed dictionaries
| containing information about input and output chunks/arrays
| during computation. See examples for details.
| args : dask arrays or other objects
| dtype : np.dtype, optional
| The ``dtype`` of the output array. It is recommended to provide this.
| If not provided, will be inferred by applying the function to a small
| set of fake data.
| chunks : tuple, optional
| Chunk shape of resulting blocks if the function does not preserve
| shape. If not provided, the resulting array is assumed to have the same
| block structure as the first input array.
| drop_axis : number or iterable, optional
| Dimensions lost by the function.
| new_axis : number or iterable, optional
| New dimensions created by the function. Note that these are applied
| after ``drop_axis`` (if present).
| enforce_ndim : bool, default False
| Whether to enforce at runtime that the dimensionality of the array
| produced by ``func`` actually matches that of the array returned by
| ``map_blocks``.
| If True, this will raise an error when there is a mismatch.
| token : string, optional
| The key prefix to use for the output array. If not provided, will be
| determined from the function name.
| name : string, optional
| The key name to use for the output array. Note that this fully
| specifies the output key name, and must be unique. If not provided,
| will be determined by a hash of the arguments.
| meta : array-like, optional
| The ``meta`` of the output array, when specified is expected to be an
| array of the same type and dtype of that returned when calling ``.compute()``
| on the array returned by this function. When not provided, ``meta`` will be
| inferred by applying the function to a small set of fake data, usually a
| 0-d array. It's important to ensure that ``func`` can successfully complete
| computation without raising exceptions when 0-d is passed to it, providing
| ``meta`` will be required otherwise. If the output type is known beforehand
| (e.g., ``np.ndarray``, ``cupy.ndarray``), an empty array of such type dtype
| can be passed, for example: ``meta=np.array((), dtype=np.int32)``.
| **kwargs :
| Other keyword arguments to pass to function. Values must be constants
| (not dask.arrays)
|
| See Also
| --------
| dask.array.map_overlap : Generalized operation with overlap between neighbors.
| dask.array.blockwise : Generalized operation with control over block alignment.
|
| Examples
| --------
| >>> import dask.array as da
| >>> x = da.arange(6, chunks=3)
|
| >>> x.map_blocks(lambda x: x * 2).compute()
| array([ 0, 2, 4, 6, 8, 10])
|
| The ``da.map_blocks`` function can also accept multiple arrays.
|
| >>> d = da.arange(5, chunks=2)
| >>> e = da.arange(5, chunks=2)
|
| >>> f = da.map_blocks(lambda a, b: a + b**2, d, e)
| >>> f.compute()
| array([ 0, 2, 6, 12, 20])
|
| If the function changes shape of the blocks then you must provide chunks
| explicitly.
|
| >>> y = x.map_blocks(lambda x: x[::2], chunks=((2, 2),))
|
| You have a bit of freedom in specifying chunks. If all of the output chunk
| sizes are the same, you can provide just that chunk size as a single tuple.
|
| >>> a = da.arange(18, chunks=(6,))
| >>> b = a.map_blocks(lambda x: x[:3], chunks=(3,))
|
| If the function changes the dimension of the blocks you must specify the
| created or destroyed dimensions.
|
| >>> b = a.map_blocks(lambda x: x[None, :, None], chunks=(1, 6, 1),
| ... new_axis=[0, 2])
|
| If ``chunks`` is specified but ``new_axis`` is not, then it is inferred to
| add the necessary number of axes on the left.
|
| Note that ``map_blocks()`` will concatenate chunks along axes specified by
| the keyword parameter ``drop_axis`` prior to applying the function.
| This is illustrated in the figure below:
|
| .. image:: /images/map_blocks_drop_axis.png
|
| Due to memory-size-constraints, it is often not advisable to use ``drop_axis``
| on an axis that is chunked. In that case, it is better not to use
| ``map_blocks`` but rather
| ``dask.array.reduction(..., axis=dropped_axes, concatenate=False)`` which
| maintains a leaner memory footprint while it drops any axis.
|
| Map_blocks aligns blocks by block positions without regard to shape. In the
| following example we have two arrays with the same number of blocks but
| with different shape and chunk sizes.
|
| >>> x = da.arange(1000, chunks=(100,))
| >>> y = da.arange(100, chunks=(10,))
|
| The relevant attribute to match is numblocks.
|
| >>> x.numblocks
| (10,)
| >>> y.numblocks
| (10,)
|
| If these match (up to broadcasting rules) then we can map arbitrary
| functions across blocks
|
| >>> def func(a, b):
| ... return np.array([a.max(), b.max()])
|
| >>> da.map_blocks(func, x, y, chunks=(2,), dtype='i8')
| dask.array<func, shape=(20,), dtype=int64, chunksize=(2,), chunktype=numpy.ndarray>
|
| >>> _.compute()
| array([ 99, 9, 199, 19, 299, 29, 399, 39, 499, 49, 599, 59, 699,
| 69, 799, 79, 899, 89, 999, 99])
|
| Your block function can get information about where it is in the array by
| accepting a special ``block_info`` or ``block_id`` keyword argument.
| During computation, they will contain information about each of the input
| and output chunks (and dask arrays) relevant to each call of ``func``.
|
| >>> def func(block_info=None):
| ... pass
|
| This will receive the following information:
|
| >>> block_info # doctest: +SKIP
| {0: {'shape': (1000,),
| 'num-chunks': (10,),
| 'chunk-location': (4,),
| 'array-location': [(400, 500)]},
| None: {'shape': (1000,),
| 'num-chunks': (10,),
| 'chunk-location': (4,),
| 'array-location': [(400, 500)],
| 'chunk-shape': (100,),
| 'dtype': dtype('float64')}}
|
| The keys to the ``block_info`` dictionary indicate which is the input and
| output Dask array:
|
| - **Input Dask array(s):** ``block_info[0]`` refers to the first input Dask array.
| The dictionary key is ``0`` because that is the argument index corresponding
| to the first input Dask array.
| In cases where multiple Dask arrays have been passed as input to the function,
| you can access them with the number corresponding to the input argument,
| eg: ``block_info[1]``, ``block_info[2]``, etc.
| (Note that if you pass multiple Dask arrays as input to map_blocks,
| the arrays must match each other by having matching numbers of chunks,
| along corresponding dimensions up to broadcasting rules.)
| - **Output Dask array:** ``block_info[None]`` refers to the output Dask array,
| and contains information about the output chunks.
| The output chunk shape and dtype may be different from the input chunks.
|
| For each dask array, ``block_info`` describes:
|
| - ``shape``: the shape of the full Dask array,
| - ``num-chunks``: the number of chunks of the full array in each dimension,
| - ``chunk-location``: the chunk location (for example the fourth chunk over
| in the first dimension), and
| - ``array-location``: the array location within the full Dask array
| (for example the slice corresponding to ``40:50``).
|
| In addition to these, there are two extra parameters described by
| ``block_info`` for the output array (in ``block_info[None]``):
|
| - ``chunk-shape``: the output chunk shape, and
| - ``dtype``: the output dtype.
|
| These features can be combined to synthesize an array from scratch, for
| example:
|
| >>> def func(block_info=None):
| ... loc = block_info[None]['array-location'][0]
| ... return np.arange(loc[0], loc[1])
|
| >>> da.map_blocks(func, chunks=((4, 4),), dtype=np.float64)
| dask.array<func, shape=(8,), dtype=float64, chunksize=(4,), chunktype=numpy.ndarray>
|
| >>> _.compute()
| array([0, 1, 2, 3, 4, 5, 6, 7])
|
| ``block_id`` is similar to ``block_info`` but contains only the ``chunk_location``:
|
| >>> def func(block_id=None):
| ... pass
|
| This will receive the following information:
|
| >>> block_id # doctest: +SKIP
| (4, 3)
|
| You may specify the key name prefix of the resulting task in the graph with
| the optional ``token`` keyword argument.
|
| >>> x.map_blocks(lambda x: x + 1, name='increment')
| dask.array<increment, shape=(1000,), dtype=int64, chunksize=(100,), chunktype=numpy.ndarray>
|
| For functions that may not handle 0-d arrays, it's also possible to specify
| ``meta`` with an empty array matching the type of the expected result. In
| the example below, ``func`` will result in an ``IndexError`` when computing
| ``meta``:
|
| >>> rng = da.random.default_rng()
| >>> da.map_blocks(lambda x: x[2], rng.random(5), meta=np.array(()))
| dask.array<lambda, shape=(5,), dtype=float64, chunksize=(5,), chunktype=numpy.ndarray>
|
| Similarly, it's possible to specify a non-NumPy array to ``meta``, and provide
| a ``dtype``:
|
| >>> import cupy # doctest: +SKIP
| >>> rng = da.random.default_rng(cupy.random.default_rng()) # doctest: +SKIP
| >>> dt = np.float32
| >>> da.map_blocks(lambda x: x[2], rng.random(5, dtype=dt), meta=cupy.array((), dtype=dt)) # doctest: +SKIP
| dask.array<lambda, shape=(5,), dtype=float32, chunksize=(5,), chunktype=cupy.ndarray>
|
| map_overlap(self, func, depth, boundary=None, trim=True, **kwargs)
| Map a function over blocks of the array with some overlap
|
| Refer to :func:`dask.array.map_overlap` for full documentation.
|
| See Also
| --------
| dask.array.map_overlap : equivalent function
|
| nonzero(self)
| Return the indices of the elements that are non-zero.
|
| Refer to :func:`dask.array.nonzero` for full documentation.
|
| See Also
| --------
| dask.array.nonzero : equivalent function
|
| store(sources: 'Array | Collection[Array]', targets: 'ArrayLike | Delayed | Collection[ArrayLike | Delayed]', lock: 'bool | Lock' = True, regions: 'tuple[slice, ...] | Collection[tuple[slice, ...]] | None' = None, compute: 'bool' = True, return_stored: 'bool' = False, **kwargs)
| Store dask arrays in array-like objects, overwrite data in target
|
| This stores dask arrays into object that supports numpy-style setitem
| indexing. It stores values chunk by chunk so that it does not have to
| fill up memory. For best performance you can align the block size of
| the storage target with the block size of your array.
|
| If your data fits in memory then you may prefer calling
| ``np.array(myarray)`` instead.
|
| Parameters
| ----------
|
| sources: Array or collection of Arrays
| targets: array-like or Delayed or collection of array-likes and/or Delayeds
| These should support setitem syntax ``target[10:20] = ...``.
| If sources is a single item, targets must be a single item; if sources is a
| collection of arrays, targets must be a matching collection.
| lock: boolean or threading.Lock, optional
| Whether or not to lock the data stores while storing.
| Pass True (lock each file individually), False (don't lock) or a
| particular :class:`threading.Lock` object to be shared among all writes.
| regions: tuple of slices or collection of tuples of slices, optional
| Each ``region`` tuple in ``regions`` should be such that
| ``target[region].shape = source.shape``
| for the corresponding source and target in sources and targets,
| respectively. If this is a tuple, the contents will be assumed to be
| slices, so do not provide a tuple of tuples.
| compute: boolean, optional
| If true compute immediately; return :class:`dask.delayed.Delayed` otherwise.
| return_stored: boolean, optional
| Optionally return the stored result (default False).
| kwargs:
| Parameters passed to compute/persist (only used if compute=True)
|
| Returns
| -------
|
| If return_stored=True
| tuple of Arrays
| If return_stored=False and compute=True
| None
| If return_stored=False and compute=False
| Delayed
|
| Examples
| --------
|
| >>> import h5py # doctest: +SKIP
| >>> f = h5py.File('myfile.hdf5', mode='a') # doctest: +SKIP
| >>> dset = f.create_dataset('/data', shape=x.shape,
| ... chunks=x.chunks,
| ... dtype='f8') # doctest: +SKIP
|
| >>> store(x, dset) # doctest: +SKIP
|
| Alternatively store many arrays at the same time
|
| >>> store([x, y, z], [dset1, dset2, dset3]) # doctest: +SKIP
|
| to_backend(self, backend: 'str | None' = None, **kwargs)
| Move to a new Array backend
|
| Parameters
| ----------
| backend : str, Optional
| The name of the new backend to move to. The default
| is the current "array.backend" configuration.
|
| Returns
| -------
| Array
|
| to_dask_dataframe(self, columns=None, index=None, meta=None)
| Convert dask Array to dask Dataframe
|
| Parameters
| ----------
| columns: list or string
| list of column names if DataFrame, single string if Series
| index : dask.dataframe.Index, optional
| An optional *dask* Index to use for the output Series or DataFrame.
|
| The default output index depends on whether the array has any unknown
| chunks. If there are any unknown chunks, the output has ``None``
| for all the divisions (one per chunk). If all the chunks are known,
| a default index with known divisions is created.
|
| Specifying ``index`` can be useful if you're conforming a Dask Array
| to an existing dask Series or DataFrame, and you would like the
| indices to match.
| meta : object, optional
| An optional `meta` parameter can be passed for dask
| to specify the concrete dataframe type to use for partitions of
| the Dask dataframe. By default, pandas DataFrame is used.
|
| See Also
| --------
| dask.dataframe.from_dask_array
|
| to_delayed(self, optimize_graph=True)
| Convert into an array of :class:`dask.delayed.Delayed` objects, one per chunk.
|
| Parameters
| ----------
| optimize_graph : bool, optional
| If True [default], the graph is optimized before converting into
| :class:`dask.delayed.Delayed` objects.
|
| See Also
| --------
| dask.array.from_delayed
|
| to_hdf5(self, filename, datapath, **kwargs)
| Store array in HDF5 file
|
| >>> x.to_hdf5('myfile.hdf5', '/x') # doctest: +SKIP
|
| Optionally provide arguments as though to ``h5py.File.create_dataset``
|
| >>> x.to_hdf5('myfile.hdf5', '/x', compression='lzf', shuffle=True) # doctest: +SKIP
|
| See Also
| --------
| dask.array.store
| h5py.File.create_dataset
|
| to_svg(self, size=500)
| Convert chunks from Dask Array into an SVG Image
|
| Parameters
| ----------
| chunks: tuple
| size: int
| Rough size of the image
|
| Examples
| --------
| >>> x.to_svg(size=500) # doctest: +SKIP
|
| Returns
| -------
| text: An svg string depicting the array as a grid of chunks
|
| to_tiledb(self, uri, *args, **kwargs)
| Save array to the TileDB storage manager
|
| See https://docs.tiledb.io for details about the format and engine.
|
| See function :func:`dask.array.to_tiledb` for argument documentation.
|
| See also
| --------
| dask.array.to_tiledb : equivalent function
|
| to_zarr(self, *args, **kwargs)
| Save array to the zarr storage format
|
| See https://zarr.readthedocs.io for details about the format.
|
| Refer to :func:`dask.array.to_zarr` for full documentation.
|
| See also
| --------
| dask.array.to_zarr : equivalent function
|
| topk(self, k, axis=-1, split_every=None)
| The top k elements of an array.
|
| Refer to :func:`dask.array.topk` for full documentation.
|
| See Also
| --------
| dask.array.topk : equivalent function
|
| view(self, dtype=None, order='C')
| Get a view of the array as a new data type
|
| Parameters
| ----------
| dtype:
| The dtype by which to view the array.
| The default, None, results in the view having the same data-type
| as the original array.
| order: string
| 'C' or 'F' (Fortran) ordering
|
| This reinterprets the bytes of the array under a new dtype. If that
| dtype does not have the same size as the original array then the shape
| will change.
|
| Beware that both numpy and dask.array can behave oddly when taking
| shape-changing views of arrays under Fortran ordering. Under some
| versions of NumPy this function will fail when taking shape-changing
| views of Fortran ordered arrays if the first dimension has chunks of
| size one.
|
| ----------------------------------------------------------------------
| Static methods inherited from dask.array.core.Array:
|
| __dask_scheduler__ = get(dsk: 'Mapping', keys: 'Sequence[Key] | Key', cache=None, num_workers=None, pool=None, **kwargs)
| Threaded cached implementation of dask.get
|
| Parameters
| ----------
|
| dsk: dict
| A dask dictionary specifying a workflow
| keys: key or list of keys
| Keys corresponding to desired data
| num_workers: integer, optional
| The number of threads to use in the ThreadPool that will actually execute tasks
| cache: dict-like (optional)
| Temporary storage of results
|
| Examples
| --------
| >>> inc = lambda x: x + 1
| >>> add = lambda x, y: x + y
| >>> dsk = {'x': 1, 'y': 2, 'z': (inc, 'x'), 'w': (add, 'z', 'y')}
| >>> get(dsk, 'w')
| 4
| >>> get(dsk, ['w', 'y'])
| (4, 2)
|
| __new__(cls, dask, name, chunks, dtype=None, meta=None, shape=None)
| Create and return a new object. See help(type) for accurate signature.
|
| ----------------------------------------------------------------------
| Readonly properties inherited from dask.array.core.Array:
|
| A
|
| blocks
| An array-like interface to the blocks of an array.
|
| This returns a ``Blockview`` object that provides an array-like interface
| to the blocks of a dask array. Numpy-style indexing of a ``Blockview`` object
| returns a selection of blocks as a new dask array.
|
| You can index ``array.blocks`` like a numpy array of shape
| equal to the number of blocks in each dimension, (available as
| array.blocks.size). The dimensionality of the output array matches
| the dimension of this array, even if integer indices are passed.
| Slicing with ``np.newaxis`` or multiple lists is not supported.
|
| Examples
| --------
| >>> import dask.array as da
| >>> x = da.arange(8, chunks=2)
| >>> x.blocks.shape # aliases x.numblocks
| (4,)
| >>> x.blocks[0].compute()
| array([0, 1])
| >>> x.blocks[:3].compute()
| array([0, 1, 2, 3, 4, 5])
| >>> x.blocks[::2].compute()
| array([0, 1, 4, 5])
| >>> x.blocks[[-1, 0]].compute()
| array([6, 7, 0, 1])
 |
 | [... the remainder of the help output is omitted here for brevity; it lists
 |  the attributes and methods inherited from dask.array.core.Array and
 |  dask.base.DaskMethodsMixin (blocks, partitions, vindex, chunks, compute,
 |  visualize, ...); run help(current_dataset) to see the full listing ...]
All attributes of a Python object can be viewed with the *dir* command.
As above, this is too much information for normal use, but it is there if needed.
dir(current_dataset)
['A',
 'T',
 '_Array__chunks',
 '_Array__name',
 ...
 'abs',
 'add_structure',
 'adjust_axis',
 ...
 'view_metadata',
 'view_original_metadata',
 'vindex',
 'visualize',
 'x',
 'y']
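A quick way to tame that output is to filter ``dir`` down to the public names. This is a generic Python sketch with a stand-in class, not a pyTEMlib function; the same filter works on a sidpy Dataset:

```python
# Filter dir() down to the public attributes (names without a leading underscore).
class Toy:
    """Stand-in object; replace with current_dataset in the notebook."""
    def plot(self):
        pass
    def compute(self):
        pass

public = sorted(name for name in dir(Toy) if not name.startswith('_'))
print(public)  # ['compute', 'plot']
```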
1.4.5. Adding Data#
To add another dataset that belongs to this measurement, we use h5_add_channel from file_tools in the pyTEMlib package; here is how we add a channel.
We can also add a new measurement group (add_measurement in pyTEMlib) for similar datasets.
This is equivalent to making a new directory in the file structure on your computer.
datasets['Copied_of_Channel_000'] = current_dataset.copy()
We use the functions above to add the content of another (arbitrary) data file to the current file.
This is important if, for example, you want to add a Z-contrast or survey image to a spectrum image.
These functions therefore enable you to collect data from different files that belong together.
datasets.keys()
dict_keys(['Channel_000', 'Copied_of_Channel_000'])
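To keep the Channel_000/Log_000 naming convention consistent when adding datasets, the next free key can be computed automatically. The next_key helper below is a hypothetical sketch, not part of pyTEMlib:

```python
def next_key(datasets, prefix='Channel_'):
    # Hypothetical helper: find the next free 'Channel_XXX' (or 'Log_XXX')
    # key in the datasets dictionary.
    index = 0
    while f'{prefix}{index:03d}' in datasets:
        index += 1
    return f'{prefix}{index:03d}'

datasets = {'Channel_000': None, 'Copied_of_Channel_000': None}
print(next_key(datasets))          # Channel_001
print(next_key(datasets, 'Log_'))  # Log_000
```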
1.4.6. Adding additional information#
Similarly, we can add a whole new measurement group or a structure group.
This functionality is contained in the kinematic_scattering (KinsCat) module of pyTEMlib.
If you loaded the example image, graphite and ZnO are both viewed in the [1,1,1] zone axis.
import pyTEMlib.kinematic_scattering as ks # kinematic scattering Library
# with Atomic form factors from Kirkland's book
import ase
graphite = ks.structure_by_name('Graphite')
print(graphite)
Using kinematic_scattering library version {_version_ } by G.Duscher
Atoms(symbols='C4', pbc=False, cell=[[2.46772414, 0.0, 0.0], [-1.2338620699999996, 2.1371117947721068, 0.0], [0.0, 0.0, 6.711]])
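As a sanity check on the printed structure, the unit-cell volume follows from the determinant of the cell matrix. The numbers below are copied from the printed output above; this is a numpy-only sketch, independent of ase:

```python
import numpy as np

# Lattice vectors (rows, in Angstrom) as printed for graphite above.
cell = np.array([[2.46772414, 0.0, 0.0],
                 [-1.23386207, 2.13711179, 0.0],
                 [0.0, 0.0, 6.711]])
volume = abs(np.linalg.det(cell))  # volume of the parallelepiped spanned by the rows
print(f'unit-cell volume: {volume:.2f} A^3')  # about 35.4 A^3
```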
current_dataset.structures['Crystal_000'] = graphite
zinc_oxide = ks.structure_by_name('ZnO')
current_dataset.structures['ZnO'] = zinc_oxide
1.4.7. Keeping Track of Analysis and Results#
A notebook is notorious for getting confusing, especially if one uses different notebooks for different tasks but stores the results in the same file.
If you like a result of your calculation, log it.
Use the datasets dictionary to add an analysed and/or modified dataset. Make sure the metadata contain all the necessary information, so that you will know later what you did.
The convention in this class will be to call the dataset Log_000.
new_dataset = current_dataset.T
new_dataset.metadata = {'analysis': 'Nothing', 'name': 'Nothing'}
datasets['Log_000'] = new_dataset
1.4.8. An example for a log#
We log the Fourier transform of the image we loaded.
First we perform the calculation.
fft_image = current_dataset.fft().abs()
fft_image = np.log(60 + fft_image)
view = fft_image.plot()
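The log(60 + ...) step is a display trick: FFT magnitudes span many orders of magnitude, and the constant offset keeps the logarithm well defined where the magnitude is zero. A standalone numpy sketch of the same scaling on a toy image:

```python
import numpy as np

image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0                    # toy image with a bright square
magnitude = np.abs(np.fft.fft2(image))   # magnitudes span a wide dynamic range
scaled = np.log(60 + magnitude)          # compress the range for display
print(scaled.min() >= np.log(60))        # True: the offset keeps log well-defined
```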
Now that we are happy with the result, we log it.
Please note that just saving the Fourier transform itself would not be enough, as we also need the scale and other metadata.
fft_image.title = 'FFT Gamma corrected'
fft_image.metadata = {'analysis': 'fft'}
datasets['Log_001'] = fft_image
view = fft_image.plot()
We added quite a few datasets to our dictionary.
Let’s have a look
chooser = ft.ChooseDataset(datasets)
view = chooser.dataset.plot()
1.4.9. Save Datasets to hf5_file#
Write all datasets to one h5_file, which we then close immediately.
h5_group = ft.save_dataset(datasets, filename='./nix.hf5')
Cannot overwrite file. Using: nix-1.hf5
C:\Users\gduscher\AppData\Local\anaconda3\envs\pyTEMlib\Lib\site-packages\pyNSID\io\hdf_io.py:111: UserWarning: main_data_name should not contain the "-" character. Reformatted name from:p1-3-hr3 to p1_3_hr3
warn('main_data_name should not contain the "-" character. Reformatted'
C:\Users\gduscher\AppData\Local\anaconda3\envs\pyTEMlib\Lib\site-packages\pyNSID\io\hdf_utils.py:376: FutureWarning: validate_h5_dimension may be removed in a future version
warn('validate_h5_dimension may be removed in a future version',
C:\Users\gduscher\AppData\Local\anaconda3\envs\pyTEMlib\Lib\site-packages\pyNSID\io\hdf_io.py:111: UserWarning: main_data_name should not contain the "-" character. Reformatted name from:p1-3-hr3 to p1_3_hr3
warn('main_data_name should not contain the "-" character. Reformatted'
C:\Users\gduscher\AppData\Local\anaconda3\envs\pyTEMlib\Lib\site-packages\pyNSID\io\hdf_utils.py:376: FutureWarning: validate_h5_dimension may be removed in a future version
warn('validate_h5_dimension may be removed in a future version',
C:\Users\gduscher\AppData\Local\anaconda3\envs\pyTEMlib\Lib\site-packages\pyNSID\io\hdf_io.py:111: UserWarning: main_data_name should not contain the "-" character. Reformatted name from:Transposed_p1-3-hr3 to Transposed_p1_3_hr3
warn('main_data_name should not contain the "-" character. Reformatted'
C:\Users\gduscher\AppData\Local\anaconda3\envs\pyTEMlib\Lib\site-packages\pyNSID\io\hdf_utils.py:376: FutureWarning: validate_h5_dimension may be removed in a future version
warn('validate_h5_dimension may be removed in a future version',
C:\Users\gduscher\AppData\Local\anaconda3\envs\pyTEMlib\Lib\site-packages\pyNSID\io\hdf_utils.py:376: FutureWarning: validate_h5_dimension may be removed in a future version
warn('validate_h5_dimension may be removed in a future version',
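The UserWarnings above show that pyNSID replaces the "-" character in dataset names. A one-line sketch of that renaming rule (the actual implementation lives in pyNSID.io.hdf_io):

```python
def sanitize_main_data_name(name):
    # Sketch of pyNSID's renaming: '-' is not allowed in main_data_name.
    return name.replace('-', '_')

print(sanitize_main_data_name('p1-3-hr3'))  # p1_3_hr3
```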
Close the file
h5_group.file.close()
1.4.10. Open h5_file#
Open the h5_file that we just created
datasets2 = ft.open_file(filename='./nix.hf5')
chooser = ft.ChooseDataset(datasets2)
1.4.10.1. Short check if we got the data right#
We print the tree and plot the data.
view = chooser.dataset.plot()