An important component of scientific work is being able to take your data with you as you move from one position to another, and being able to work with the data files on the computer systems at your new institute. Similarly, it's vital to be able to exchange data files with colleagues or just read your own files in multiple different packages.
It is therefore important to have standards-based data formats that are openly and thoroughly documented, so that anyone can implement a reader and writer for them. Please use this page to list:
- the data formats you use
- the Debian packages needed for working with the format
- software used with that format that's not in Debian
hdf5
Hierarchical Data Format (HDF5) is an extremely flexible format, possibly too flexible for its own good.
- Open Spec: YES
- Packages
hdf5-tools (and many other related packages)
yorick-hdf5 (plug-in for the Yorick interpreted language)
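Because HDF5 is so flexible, a cheap first check is simply whether a file is HDF5 at all. The sketch below assumes only the 8-byte format signature and the superblock offsets (0, then 512, 1024, 2048, ...) described in the HDF5 file format specification; `find_hdf5_superblock` and its `max_offset` cap are hypothetical, and in practice the HDF5 library itself does this when opening a file.

```python
import io
from typing import BinaryIO, Optional

# The 8-byte signature that starts every HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def find_hdf5_superblock(f: BinaryIO, max_offset: int = 1 << 30) -> Optional[int]:
    """Return the byte offset of the HDF5 signature in f, or None.

    The signature is only looked for at offset 0 and then at 512, 1024,
    2048, ... (doubling each time). max_offset is an arbitrary cap chosen
    for this sketch.
    """
    offset = 0
    while offset <= max_offset:
        f.seek(offset)
        chunk = f.read(len(HDF5_SIGNATURE))
        if chunk == HDF5_SIGNATURE:
            return offset
        if len(chunk) < len(HDF5_SIGNATURE):
            return None  # we have run past the end of the file
        offset = 512 if offset == 0 else offset * 2
    return None


if __name__ == "__main__":
    # An in-memory "file" with a 512-byte user block before the HDF5 data.
    demo = io.BytesIO(b"\0" * 512 + HDF5_SIGNATURE + b"\0" * 8)
    print(find_hdf5_superblock(demo))  # 512
```

The non-zero offsets exist so that a file can carry a "user block" of arbitrary data in front of the HDF5 content.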
netCDF
The Network Common Data Form (netCDF) is a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data. The netCDF-4 API also supports an HDF5-based storage format, which will probably take over from the classic format.
- Open Spec: YES
- Packages
netcdf libraries, etc.
python-netcdf from the python-scientific source package
netcdf-perl Perl interface.
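The classic netCDF header is compact and fully specified, so its beginning can be decoded in a few lines. The sketch below reads only the magic bytes, the record count and the dimension list of a classic (CDF-1) or 64-bit offset (CDF-2) file, all stored as big-endian 32-bit integers with names padded to 4-byte boundaries. It stops before the attribute and variable lists and does not handle netCDF-4 files, which start with the HDF5 signature instead. `read_classic_dims` is a hypothetical helper, not part of any netCDF library.

```python
import struct
from typing import List, Tuple

NC_DIMENSION = 0x0A  # tag introducing a non-empty dimension list


def _uint32(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read a big-endian 32-bit unsigned integer; return (value, next_pos)."""
    (value,) = struct.unpack_from(">I", buf, pos)
    return value, pos + 4


def read_classic_dims(buf: bytes) -> Tuple[int, int, List[Tuple[str, int]]]:
    """Return (version, numrecs, [(name, length), ...]) from a classic header.

    numrecs is 0xFFFFFFFF ("streaming") if the record count was never
    written; a dimension length of 0 marks the unlimited (record) dimension.
    """
    if len(buf) < 4 or buf[:3] != b"CDF" or buf[3] not in (1, 2):
        raise ValueError("not a CDF-1 or CDF-2 classic netCDF header")
    version = buf[3]
    numrecs, pos = _uint32(buf, 4)
    tag, pos = _uint32(buf, pos)
    nelems, pos = _uint32(buf, pos)
    if tag == 0 and nelems == 0:
        return version, numrecs, []  # ABSENT: no dimensions defined
    if tag != NC_DIMENSION:
        raise ValueError(f"unexpected dimension-list tag {tag:#x}")
    dims = []
    for _ in range(nelems):
        name_len, pos = _uint32(buf, pos)
        name = buf[pos:pos + name_len].decode("utf-8")
        pos += (name_len + 3) // 4 * 4  # names are padded to 4-byte boundaries
        length, pos = _uint32(buf, pos)
        dims.append((name, length))
    return version, numrecs, dims
```

In real work the netcdf libraries or the python-netcdf bindings are the right tool; a hand-written reader like this is mainly useful for checking that a header looks as expected.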
FITS
The Flexible Image Transport System (FITS) was developed for astronomy, but could be used in many disciplines. One notable feature is good support for World Coordinates, i.e. translation between pixel coordinates and physical coordinates such as longitude and latitude, frequency, or Stokes parameters (polarisation). Arbitrary numbers of dimensions are also supported, though not as flexibly as in HDF5.
- Open Spec: YES
- Packages
libcfitsio2 (plus perl wrappers)
pdl display & analysis
saods9 image viewer
yorick numerical computations and data display
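To make the World Coordinates idea concrete: a FITS header is a sequence of 80-character ASCII "cards" stored in 2880-byte blocks, and for a simple linear axis i the physical coordinate of 1-based pixel p is CRVALi + CDELTi * (p - CRPIXi). The sketch below assumes exactly that linear case; it ignores the PCi_j/CDi_j matrices and the celestial projections (e.g. TAN) that real images often use, and its parser skips many header features. `parse_header` and `pixel_to_world` are hypothetical names; for production work libcfitsio or a dedicated WCS library would be used instead.

```python
from typing import Dict, Union

CARD = 80  # each header card is 80 ASCII characters

Value = Union[str, int, float, bool]


def parse_header(raw: bytes) -> Dict[str, Value]:
    """Parse 'KEYWORD = value / comment' cards up to the END card.

    Minimal sketch: skips COMMENT/HISTORY/blank cards and undefined values,
    ignores CONTINUE, HIERARCH and doubled quotes ('') inside strings, and
    keeps values it cannot parse (e.g. complex numbers) as raw text.
    """
    header: Dict[str, Value] = {}
    for i in range(0, len(raw), CARD):
        card = raw[i:i + CARD].decode("ascii")
        key = card[:8].strip()
        if key == "END":
            break
        if card[8:10] != "= ":
            continue  # commentary or blank card
        text = card[10:].strip()
        if text.startswith("'"):
            end = text.find("'", 1)
            header[key] = (text[1:end] if end != -1 else text[1:]).rstrip()
            continue
        text = text.split("/", 1)[0].strip()  # drop the trailing comment
        if not text:
            continue  # undefined value
        if text in ("T", "F"):
            header[key] = text == "T"
            continue
        try:
            header[key] = int(text)
        except ValueError:
            try:
                # FITS also allows Fortran-style 'D' exponents, e.g. 1.5D3
                header[key] = float(text.replace("D", "E"))
            except ValueError:
                header[key] = text
    return header


def pixel_to_world(header: Dict[str, Value], axis: int, pixel: float) -> float:
    """Linear axis only: CRVALi + CDELTi * (pixel - CRPIXi), pixels 1-based.

    Missing keywords fall back to the WCS defaults (CRVAL 0, CDELT 1, CRPIX 0).
    """
    crval = float(header.get(f"CRVAL{axis}", 0.0))
    cdelt = float(header.get(f"CDELT{axis}", 1.0))
    crpix = float(header.get(f"CRPIX{axis}", 0.0))
    return crval + cdelt * (pixel - crpix)
```

For example, with CRPIX1 = 1, CRVAL1 = 1.42E+09 (Hz) and CDELT1 = 1.0E+04 on a frequency axis, pixel 11 maps to 1.4201E+09 Hz.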
Meteorological formats
- The WMO FM-94 BUFR (Binary Universal Form for the Representation of Meteorological data) is a binary code designed to represent any meteorological data as a continuous binary stream. It was designed for efficient exchange and storage of meteorological and oceanographic data. It is a self-defining, table-driven and very flexible data representation system, especially suited to huge volumes of data.
- Open Spec: YES
- Packages:
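Because BUFR is table driven, decoding its data sections needs the WMO tables, but the outer framing is easy to check. A sketch, assuming edition 2 or later, where Section 0 is the four characters "BUFR", a 3-byte big-endian total message length and a 1-byte edition number, and the message ends with "7777"; `read_bufr_indicator` is a hypothetical helper.

```python
from typing import NamedTuple


class BufrIndicator(NamedTuple):
    total_length: int  # whole message in bytes, "BUFR" through "7777"
    edition: int       # BUFR edition number, e.g. 3 or 4


def read_bufr_indicator(msg: bytes) -> BufrIndicator:
    """Decode Section 0 of a BUFR message and check its '7777' end marker.

    Sketch assuming edition >= 2 and that msg starts exactly at 'BUFR'
    (real bulletins may have a header in front of it).
    """
    if len(msg) < 8 or msg[:4] != b"BUFR":
        raise ValueError("missing 'BUFR' start marker")
    total_length = int.from_bytes(msg[4:7], "big")  # octets 5-7
    edition = msg[7]                                 # octet 8
    if msg[total_length - 4:total_length] != b"7777":
        raise ValueError("no '7777' end marker at the declared message length")
    return BufrIndicator(total_length, edition)
```

For a bulletin with a text header, slicing from `msg.find(b"BUFR")` before calling the function is one simple approach.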
CREX
The FM 95-XII CREX is the standard WMO Character form for the Representation and EXchange of meteorological and other data. It is a self-defining, table-driven and very flexible data representation system. It is especially useful where binary representation of data is not possible because of limited computer handling capabilities.
- Open Spec: YES
- Packages:
GRIB "Gridded Binary" format for binary data, used by many forecast applications.
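GRIB messages are framed in a similar way, which makes it easy to walk through a file holding several concatenated messages. In the sketch below (the function name is hypothetical), the edition number is octet 8 of the indicator section; GRIB1 stores the total length in octets 5-7, while GRIB2 stores it as an 8-byte integer in octets 9-16. Bytes between messages, such as bulletin headers, are skipped. This only locates messages; decoding the gridded data needs a GRIB library.

```python
from typing import Iterator, Tuple


def iter_grib_messages(data: bytes) -> Iterator[Tuple[int, int, int]]:
    """Yield (offset, edition, total_length) for each GRIB message in data.

    Bytes between messages are skipped. Very large GRIB1 messages can use
    a non-standard length encoding that is not handled here.
    """
    pos = data.find(b"GRIB")
    while pos != -1:
        if pos + 16 > len(data):
            raise ValueError(f"truncated indicator section at offset {pos}")
        edition = data[pos + 7]  # octet 8 in both editions
        if edition == 1:
            length = int.from_bytes(data[pos + 4:pos + 7], "big")  # octets 5-7
        elif edition == 2:
            length = int.from_bytes(data[pos + 8:pos + 16], "big")  # octets 9-16
        else:
            raise ValueError(f"unsupported GRIB edition {edition} at offset {pos}")
        if length < 8 or data[pos + length - 4:pos + length] != b"7777":
            raise ValueError(f"no '7777' end marker for message at offset {pos}")
        yield pos, edition, length
        pos = data.find(b"GRIB", pos + length)
```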
ODB "Observation database" format, used by Meteo France and others for the ALADIN forecasting system.
The .odb file extension is also used by OpenOffice and Abaqus, so the extension alone does not identify the format.
FA / LFI files
- Used by Meteo France for the ALADIN forecasting system. ALADIN includes a tool, "gl", which can translate these files to GRIB format. FA files do not use a ".fa" extension; that extension is used for FASTA sequence files.
SERVICE
Data format supported by cdo (Climate Data Operators).
EXTRA
Data format supported by cdo
IEG
Data format supported by cdo
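A rough sketch of writing and reading SERVICE data, under loudly stated assumptions: each field is taken to be a Fortran unformatted record of 8 integers (code, level, date, time, nlon, nlat and two spare values) followed by a record holding the nlon * nlat values, with 4-byte record markers, little-endian byte order and 32-bit values. Real files may differ in byte order and precision, so check against the cdo documentation before relying on it; all function names here are hypothetical.

```python
import struct
from typing import Dict, List, Sequence, Tuple

# Assumptions for this sketch: little-endian data, 32-bit integers and floats,
# and 4-byte Fortran record-length markers around every record.
ENDIAN = "<"
HEADER_FIELDS = ("code", "level", "date", "time", "nlon", "nlat", "disp1", "disp2")


def _record(payload: bytes) -> bytes:
    """Wrap payload in Fortran sequential-unformatted length markers."""
    marker = struct.pack(ENDIAN + "i", len(payload))
    return marker + payload + marker


def write_service_field(header: Sequence[int], values: Sequence[float]) -> bytes:
    """Encode one field: an 8-integer header record, then a data record."""
    if len(header) != 8:
        raise ValueError("a SERVICE header has 8 integers")
    return (_record(struct.pack(ENDIAN + "8i", *header))
            + _record(struct.pack(ENDIAN + f"{len(values)}f", *values)))


def _read_record(data: bytes, pos: int) -> Tuple[bytes, int]:
    """Return (payload, position after the record), checking both markers."""
    (n,) = struct.unpack_from(ENDIAN + "i", data, pos)
    (n_end,) = struct.unpack_from(ENDIAN + "i", data, pos + 4 + n)
    if n != n_end:
        raise ValueError(f"mismatched record markers at offset {pos}")
    return data[pos + 4:pos + 4 + n], pos + 8 + n


def read_service_fields(data: bytes) -> List[Tuple[Dict[str, int], Tuple[float, ...]]]:
    """Decode every (header, values) pair in a SERVICE byte string."""
    fields = []
    pos = 0
    while pos < len(data):
        raw_header, pos = _read_record(data, pos)
        header = dict(zip(HEADER_FIELDS, struct.unpack(ENDIAN + "8i", raw_header)))
        raw_values, pos = _read_record(data, pos)
        values = struct.unpack(ENDIAN + f"{len(raw_values) // 4}f", raw_values)
        fields.append((header, values))
    return fields
```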
XML variants
dicom
Digital Imaging and Communications in Medicine (DICOM) is the long-established standard format for medical imaging.
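A DICOM Part 10 file starts with a 128-byte preamble, the four characters "DICM", and a File Meta Information group (group 0002) that is always encoded as Explicit VR Little Endian. The sketch below reads that group only, e.g. to find the Transfer Syntax UID (0002,0010) that says how the rest of the file is encoded; `read_file_meta` is a hypothetical helper and does not parse the dataset itself.

```python
import struct
from typing import Dict, Tuple

# VRs whose explicit-VR encoding has 2 reserved bytes and a 4-byte length.
LONG_VRS = {b"OB", b"OD", b"OF", b"OL", b"OV", b"OW", b"SQ", b"SV",
            b"UC", b"UN", b"UR", b"UT", b"UV"}


def read_file_meta(data: bytes) -> Tuple[Dict[Tuple[int, int], bytes], int]:
    """Return ({(group, element): raw value}, offset of the first dataset element).

    Only group 0002 is read. Values are raw bytes; UI values may carry a
    trailing NUL pad byte.
    """
    if data[128:132] != b"DICM":
        raise ValueError("no 'DICM' marker after the 128-byte preamble")
    meta: Dict[Tuple[int, int], bytes] = {}
    pos = 132
    while pos + 8 <= len(data):
        group, element = struct.unpack_from("<HH", data, pos)
        if group != 0x0002:
            break  # start of the dataset proper
        vr = data[pos + 4:pos + 6]
        if vr in LONG_VRS:
            (length,) = struct.unpack_from("<I", data, pos + 8)
            pos += 12
        else:
            (length,) = struct.unpack_from("<H", data, pos + 6)
            pos += 8
        meta[(group, element)] = data[pos:pos + length]
        pos += length
    return meta, pos
```

A Transfer Syntax UID of 1.2.840.10008.1.2.1, for instance, means the dataset is also Explicit VR Little Endian.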
Chemical MIME/file types
Chemical MIME types can be introduced to the Linux desktop with chemical-mime-data. Most information about these MIME types and the project can be found in the source of the package.
All chemical applications that can handle the freedesktop.org MIME specs (e.g. xdrawchem or openbabel) benefit from this package. Older specs, e.g. for GNOME <= 2.4 or KDE <= 3.x, are a bit harder to support, because their magic databases are not extensible.
These MIME types are not part of the official shared-mime-info package/project, because they have never been registered with IANA (see also http://lists.freedesktop.org/archives/xdg/2005-May/006858.html).
BioDAS
BioDAS is a Distributed Annotation System for genome work - more a protocol than a data format. It uses XML for the sequence data.
General
Microformats may be a useful avenue to explore.
Support for raw digital camera formats is essential for scientific imaging work. Debian has the ufraw package.
IPTC metadata looks interesting and has fairly open licence terms.
AAF is an interesting example of an advanced interchange format (for multimedia), with facilities for adding metadata and tracking change history. There is an SDK at SourceForge.
THREDDS is a data publication service for environmental science data. See also LDM and the Internet Data Distribution system.