The filex Command
The filex command manages chemical structure and reaction file I/O modules. In many cases, explicitly loading I/O modules is not required because of the built-in auto-load mechanism. If the toolkit encounters a file of unknown type, it attempts to load a suitable module by constructing the name of the module from the file suffix. However, that mechanism fails if the file has no suffix or a non-standard suffix, or if the data source is not a file but some other stream, such as a network connection, a pipe, or a standard I/O channel. In these cases, explicit management of I/O modules is required.
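The suffix-based lookup described above can be pictured with a small model. This is only an illustrative sketch, not toolkit code: the function name `guess_format_from_suffix` is hypothetical, and the assumption that the candidate module name is simply the suffix without its dot is ours, since the exact naming rule is not spelled out here.

```python
import os

def guess_format_from_suffix(source):
    """Return a candidate module name derived from the file suffix, or None.

    Hypothetical helper: assumes the candidate name is the suffix without
    the leading dot. The real toolkit may apply additional rules.
    """
    # Pipes, sockets and standard channels have no file name to inspect.
    if not isinstance(source, str):
        return None
    suffix = os.path.splitext(os.path.basename(source))[1]
    # No suffix (or a bare trailing dot): nothing to construct a name from.
    if suffix in ("", "."):
        return None
    return suffix[1:]

print(guess_format_from_suffix("aspirin.mol"))  # suffix available
print(guess_format_from_suffix("README"))       # no suffix, lookup fails
```

A non-standard suffix would still yield a candidate name in this model, but no module of that name could be found, which is the second failure case mentioned above.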
The filex command has the following subcommands:
filex defined format
A check to determine whether the specified format is supported by an I/O module. If the appropriate handler is not yet loaded, an attempt at auto-loading is made. For the equivalent command without auto-loading, see filex exists. The result value is a boolean status code.
filex exists format
A check to determine whether an I/O module for the specified format is currently loaded. This command variant does not attempt auto-loading. The format name may be either the primary name of a loaded module, or any of the format name aliases the module recognizes. For the equivalent command with auto-loading, see filex defined. The result value is a boolean status code.
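The difference between the two checks can be summarized in a toy model. The class `ModuleTable` and its data layout are hypothetical and exist only for illustration. As a simplification, auto-loading in this sketch only recognizes primary names.

```python
class ModuleTable:
    """Toy model of a module table (hypothetical, not the toolkit API)."""

    def __init__(self, on_disk):
        # on_disk: primary name -> alias list, for modules that could be auto-loaded
        self.on_disk = dict(on_disk)
        self.loaded = {}  # primary name -> alias list of modules in memory

    def exists(self, fmt):
        # Looks only at loaded modules, by primary name or alias. Never loads.
        return any(fmt == name or fmt in aliases
                   for name, aliases in self.loaded.items())

    def defined(self, fmt):
        # Same check, but falls back to an auto-load attempt.
        if self.exists(fmt):
            return True
        if fmt in self.on_disk:
            self.loaded[fmt] = self.on_disk[fmt]
            return True
        return False

table = ModuleTable({"mol": ["mdl"]})
print(table.exists("mol"))   # nothing loaded yet
print(table.defined("mol"))  # triggers the modeled auto-load
print(table.exists("mdl"))   # alias of the now-loaded module
```

In this model, a successful defined check has the side effect of loading the module, so a later exists check for the same format or one of its aliases succeeds.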
filex get format attribute
Query the value of an attribute of the I/O module. The list of attributes is detailed in the paragraph on the filex set command.
In case the format argument cannot be resolved by an active module, an attempt to auto-load a suitable module is made.
filex list ?pattern?
List the names of all currently loaded I/O modules. A string match pattern may be used to filter the result list. The variant filex modules is an alias to this command.
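As a rough illustration of pattern filtering, the following sketch applies a glob-style pattern to a list of module names. It uses Python's `fnmatchcase`, which is only an approximation of string match semantics. The function `list_modules` and the sample names are hypothetical.

```python
from fnmatch import fnmatchcase

def list_modules(loaded, pattern="*"):
    """Return the loaded module names that match a glob-style pattern."""
    # Case-sensitive matching, preserving the order of the input list.
    return [name for name in loaded if fnmatchcase(name, pattern)]

names = ["mol", "mol2", "smiles"]  # made-up module names
print(list_modules(names))          # no pattern: everything is listed
print(list_modules(names, "mol*"))  # only names starting with "mol"
```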
filex load format ?objectfile?
filex load all
Explicitly load an I/O module. If the module is already loaded, the current version is unloaded first. If no specific object file (a shared library on Unix/Linux, a DLL on Windows, a bundle file for OSX) is specified, the standard name of the module file is automatically constructed from the format name, and the file is then searched for in the directories of the I/O module path. The module path can be customized in the control variable
, the return value of the command is the slot in the module table the module has been loaded into. This corresponds to the value of the
attribute which can be queried via
, the return value is a module reference.
The second form of the command scans the currently set I/O module extension search path and loads all accessible modules which are not yet in memory. Modules which are already active in the running application are not unloaded, and only a single instance of each I/O module, even if present under various alias names in the module directories, is loaded. This form of the command does not return a value.
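The bulk-loading behavior of the second form can be sketched as follows. This is a simplified in-memory model under stated assumptions: each directory of the module path is represented as a dictionary from a made-up file name to the primary name of the module it contains, and the function `load_all` is hypothetical. Unlike the real command, which returns no value, the model returns the list of newly loaded modules so the effect can be inspected.

```python
def load_all(loaded, module_path):
    """Load every module on the path that is not yet in memory.

    loaded:      set of primary names of modules already in memory (updated in place)
    module_path: list of directories, each a dict {file name: primary name}
    """
    newly_loaded = []
    for directory in module_path:
        for file_name, primary in directory.items():
            # Already active (or already picked up under another alias name): skip.
            if primary in loaded:
                continue
            loaded.add(primary)
            newly_loaded.append(primary)
    return newly_loaded

loaded = {"mol"}
path = [
    {"mol.so": "mol", "sdf.so": "mol", "xyz.so": "xyz"},  # sdf.so is an alias copy of mol
    {"xyz.so": "xyz", "cml.so": "cml"},
]
print(load_all(loaded, path))
print(sorted(loaded))
```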
filex modules ?pattern?
This is an alias for filex list.
-only method to get a reference of the module, which allows terser attribute retrieval commands and other operations.
filex set format ?attribute value?...
filex set format dict
fx.attribute = value
fx[attribute] = value
Set attributes of the I/O module. Compared to other classes of modules, there are rather few attributes in a module which can be set in a meaningful manner. Some of the listed attributes are read-only. They are included in this section because it is cross-referenced from the filex get command. These are the supported attributes:
The city part of the author contact address.
The country part of the author contact address, following the ISO3166 standard.
The state part of the author contact address. Empty if not applicable.
The street address part of the author contact address. Includes floor, house number, etc.
code or other applicable postal code of the author contact address.
The institution the author of the module works for.
registration ID of the affiliated institution. This is primarily useful for US government projects.
of the affiliated institution.
A list of alternative names for the formats the module supports.
The author of the module.
An authorization string, for example a service login
. This is for example used in the
meta I/O module. In that case, it is a Web
generated by the module from the compiled-in application secret. Using that
, the user must log into a
account and approve access to the files of that account by the application. Only after this has been performed, opening
files with a
with information on the author, or an empty string if unset.
A read-only boolean flag indicating whether the module is built-in.
A list of features and behaviors the I/O module supports. Only a few of the flags which can be found here can be changed in a productive fashion. These include:
Temporarily disable this module, without unloading it
Never attempt to memory-map files of this format
A category string to be used if the module is stored in a repository.
The base class
of this module.
A free-form string comment on the module.
The date the module source was last modified.
A digital object identifier for the module, if defined.
The email address of the author of an I/O module.
The name of a property which is used to store structure information in the file. This is only used for file formats where storing structure data is a minor objective, not for standard chemical structure exchange formats.
This attribute is a read-only list of the classes of available functions in the function table of the module. Developers can use this information to determine whether a module is input-only or output-only, or supports acceleration methods for scanning structure files.
The internal format ID of the module in the current program run. This is usually identical to the slot in the extension table the module was loaded or compiled into.
with information on the module, or an empty string if unset.
A list of keywords associated with the module.
The license class associated with this module. Setting the license to a standard type updates the associated
with a standard location.
with details about the module license.
A free-form literature reference.
type associated with the file format, for example
. This information is used for constructing
headers for data transfer in Web environments and similar tasks.
The primary name of the format the I/O module handles.
The style of nitro groups and similar groups in the file, i.e. whether these are preferably encoded with pentavalent nitrogen or a charge pair. Possible values are
(does not matter, or unknown),
. If this value is not
, structures written to the file are automatically adapted. This is performed on duplicates of the output structures, so the objects used in a
or similar command do not change. On the other hand, the requirement to duplicate the object, manipulate the duplicate, and destroy it after use can be time-consuming.
The full path name of the loaded object file or dynamic library. This attribute is read-only.
code of the author (see www.orcid.org).
A dictionary of format-specific keyword/value pairs which are not represented as a general
object attribute. When a file of a specific format is opened, the data from the corresponding I/O module is copied to the parameters attribute of the
object, where it may be further customized by
commands before an input or output operation. Changing this attribute in the I/O module modifies the initial content of the parameters attribute of all
objects associated with this format created in the future. Explicitly changing the format of a
object refreshes the parameter set.
The repository path for displaying hierarchical repository trees. This attribute is independent of any file system paths.
A contact phone number of the author.
The name of a property which is used to store reaction information in the file. This is only used for file formats where storing reaction data is a minor objective, not for standard chemical structure exchange formats.
A list of flags to adjust input behavior. Not all flags are supported for all I/O modules. Unsupported flags are silently ignored. The flag set is copied as default to any
object which uses the I/O handler module. The flag set is the same as for the
attribute, but only a subset of these flags make sense as presets. The flags can be modified on the I/O module level if desired:
The same as an empty list; no flags are set.
If set, resolve bonds marked in the file as
into a Kekulé system. This includes resolution of bonds which are explicitly marked as query bonds (i.e. bond type 4 in
). This is very useful to fix frequently seen MDL Molfiles which encode structures, not queries, but nevertheless use an aromatic bond type in violation of the file format specification. Aromatic system resolution works much more robustly for structures with a complete set of hydrogens. It is advisable to combine this flag with automatic hydrogen addition.
If set, the file is automatically rewound if the end of the file is reached, and the start record of the operation has not yet been encountered again. This behavior only applies to the
command, not to normal record input. Wrapping is not possible on data sources which cannot be rewound.
Only read basic connectivity information, but not additional properties. Supported only on formats which use the native
structure data storage system (
cbin, cbs, bdb
If set, perform a charge balancing step after reading, in an attempt to obtain a neutral structure.
If set, perform a charge combination step after reading, in an attempt to obtain a neutral structure.
If set, try to resolve a purely VB-based structure representation into a representation which utilizes
bonds for bonds between ligands and metal centers which cannot be described well with electron-counted VB bonds.
If set, this flag instructs I/O modules with support for this feature to read structure files which contain one spurious empty line after each data line, which unfortunately appears to happen sometimes when
-encoded files are transferred to Apple systems. This is not the same as reading
-only platforms, or vice versa, which is always possible and fully automatic. This flag addresses the problem that, due to mishandling by obscure transfer software, duplicated
-markers are introduced in the file (two identical
pairs after each data line).
If set, remove spurious atom and bond stereo descriptors assigned to non-stereogenic centers.
If set, invert wedge bonds encoded with the base at the stereo center to the IUPAC-conforming style with the tips at the stereo center.
If set, records which do not contain any atoms are silently ignored and the next record with atoms is returned instead.
If set, ignore records which raise errors on file input. Instead, silently attempt to re-synchronize the read pointer and proceed with the next record, until the end of the file has been reached or an undamaged record has been read successfully.
Ignore any object visibility information in the file and read all data as visible objects.
Allow an isolated carriage return (
13) character without following
10) character as data content instead of examining it as potential line break symbol. This flag is necessarily ignored on Mac-style input files which only use
If set, ignore the
flag for double bonds when reading
. The default is to translate it into the
bit of the
If set, always keep atomic 2D layout coordinates, even if they are, for example in reactions, overlapping on the reagent and product sides. By default the coordinates of molecules are adjusted if necessary to be non-overlapping. This is done by moving molecules only, not by scaling the coordinates, and never by recomputing any coordinates.
If set, hydrogen modification (addition, deletion) is performed after standardization operations (see various
attributes). By default, hydrogen addition is performed before these routines are called.
If this flag is set, multi-line input from SD file data lines into a simple string property is merged into a single string value, with tab characters indicating the newlines in the file. By default, in such cases every line of a multi-line data item is stored as a new property instance. This is equivalent to the property attribute
If set, attempt to intelligently resolve any atoms with excessive multiple bonds consuming bond electrons in excess of the available number by recoding such bonds as charge pairs.
If set, no attempt is made to add missing coordinates, for example for automatically added hydrogen atoms, to the 2D and 3D coordinate sets, if such coordinates were present in the original record.
Do not add implicit hydrogen. This flag only applies to file formats which exactly define a default number of hydrogens (for example,
) as an implicit part of the structure. It has no effect in file formats which just tend to omit hydrogen (for example,
Assert to the file input routine that the input does not contain any metal atoms. In that case ambiguous atom symbols, for example
are interpreted as carbon (in alpha and delta position), and not as calcium or cadmium.
Assert that none of the metal atoms in the structure has any missing hydrogen ligands. If set, hydrogen addition, if selected, skips the processing of these atoms.
By default, every ensemble or reaction read from a file is augmented with a property
, indicating the origin of the record by recording the file name, record number and other information in the automatically attached property. If this information is not of interest, this wasteful step can be suppressed by setting the flag.
Assert that the file does not contain any radicals. This can for example be helpful in the resolution of aromatic systems (see
Strictly adhere to the format specification and flag any deviation as an error. This feature is only well implemented for
. It is intended to be used for strict format checking.
Edit radicals which are typically formed by reading a file without formal atomic charge information by adding standard formal charges, for example replacing NR
and OR with O
R. This only works reasonably well if the file contains a complete hydrogen set.
Force interpretation of the atomic coordinates in the record as 2D display coordinates (property
), even if syntax or data items in the file indicate the presence of 3D coordinates. This is useful for simple reading of records where 3D coordinate fields were abused for storing display coordinates.
, read the parity information. By default, as recommended by
, this information is not read and parity is instead computed from wedges if needed.
Assume that any radical encountered is a singlet, and not anything more complex such as a triplet, regardless of what the file encodes.
Perform a tautomer standardization on the read structure. This operation invalidates numerous atom and bond properties, such as coordinates, but in this special case all ensemble properties which were attached to the processed structure are retained, regardless of their sensitivity toward atom and bond changes. Tautomer resolution requires a complete hydrogen set, so either these must be present in the input file, or a suitable hydrogen addition mode must have been set on the file handle. The processing behind this input option is comparatively expensive. For normal input, when speedy input and maximum fidelity of the data to the original file is desired, this flag should not be set.
A numerical registration ID assigned to registered modules.
Cross references of the module. This is a nested list of class
s and reference type tags.
The slot the module was loaded into. This attribute is read-only.
The name of the source file of the module. This attribute is read-only.
A list of the file suffixes this module recognizes as typical for the implemented format. If a file with a suffix is opened for writing without specifying an explicit format, the last loaded module which has the suffix in its list determines the automatically assigned format. Suffixes are ignored as format identifier for file input and updates. In these cases, the file contents are analyzed to determine the format. This attribute is read-only.
The version of the module. This is a string in a 1.2.3 (or shortened) style.
associated with this module version.
In case the format argument cannot be resolved by an active module, an attempt to auto-load a suitable module is made.
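The rule stated above for the file suffix list attribute, where the last loaded module claiming a suffix determines the output format, can be illustrated with a short sketch. The function `format_for_output`, the module names, and the representation of suffix lists without a leading dot are assumptions made for this example only.

```python
import os

def format_for_output(file_name, modules):
    """Pick the output format for a file name from its suffix.

    modules: list of (format name, suffix list) tuples in load order.
    Returns the format name, or None if no loaded module claims the suffix.
    """
    suffix = os.path.splitext(file_name)[1].lstrip(".")
    chosen = None
    for name, suffixes in modules:
        if suffix and suffix in suffixes:
            chosen = name  # a later-loaded module overrides an earlier one
    return chosen

# Made-up modules: both claim the "mol" suffix, "other" was loaded last.
modules = [("mdl", ["mol", "sdf"]), ("other", ["mol"])]
print(format_for_output("out.mol", modules))
print(format_for_output("out.sdf", modules))
```

This applies to opening files for writing only. For input and updates, the text above states that suffixes are ignored and the file contents are analyzed instead.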
List all supported subcommands of the filex command in an installation.
filex unload ?format?...
Unload zero or more I/O modules. It is an error to specify the name of a module which is not loaded.
Built-in I/O modules cannot be unloaded. If the use of one of these needs to be switched off, it is possible to set the
flag of the
module attribute via the
The command returns the number of unloaded I/O modules.