The dataset Command

The dataset command is the generic command used to manipulate datasets. The syntax of this command follows the standard schema of command/subcommand/majorhandle. Datasets are major objects and thus do not need any minor object labels for identification.

Example:

dataset get $dhandle D_SIZE

As explained in the introductory section on datasets, the normal persistent dataset handle in the third argument position of a dataset command may be replaced by an arbitrary list of dataset, ensemble, reaction, table and network handles. Substitution is only allowed in that argument position. It is not allowed where a dataset handle is part of the arguments of another object command, nor in a different argument position of a dataset command. Such an object list is transformed into a transient dataset for the duration of the command execution. After the command has completed, the elements of the transient dataset are restored to their original state with respect to dataset membership and position, except in a few documented exceptional circumstances.
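The following minimal sketch illustrates the substitution rule. It assumes a Cactvs toolkit interpreter and uses only commands described in this section; the SMILES strings and variable names are arbitrary.

```tcl
# Requires the Cactvs toolkit interpreter; sample structures only.
set eh1 [ens create CO]
set eh2 [ens create CN]

# The handle list takes the place of the dataset handle in the third
# argument position. It forms a transient dataset for this one command.
set n [dataset count [list $eh1 $eh2]]
puts "objects in transient dataset: $n"
```

After the command returns, both ensembles should again be outside any dataset, as they were before the call.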

As a means to access an embedded dataset object, its handle may be replaced by the handle of the parent object where this is unambiguous, e.g.

ens move $eh $thandle

moves the ensemble into the embedded dataset of the table, while

dataset count $thandle

treats the table argument as part of a transient dataset as described above.

This is the list of currently officially supported subcommands:

dataset add

dataset add dhandle objhandle ?position?

Add an object to the dataset, relocating it from a current dataset if it exists. If no position is specified, the object is appended to the rear of the dataset object list. The position can either be a numerical zero-based index, or any string beginning with ‘e’ to indicate the end position.

If the object handle identifies a (local) dataset, and the target dataset does not accept datasets as members, all objects in the source dataset are instead moved to the new dataset, and then the source dataset is destroyed. If ensembles, reactions, tables or networks are moved, they are unlinked from any current datasets, but these original datasets themselves persist.

This dataset command is equivalent to issuing a move command from the object.

Example:

dataset add $dh $eh end

ens move $eh $dh end

These two commands are equivalent.

dataset addthread

dataset addthread dhandle ?body?

dataset addthread dhandle count body

dataset addthread dhandle count substitutiondict body

Add one or more script threads to the dataset. By default, a single thread is added, but by setting the count parameter to a higher number multiple threads with the same script body can be added simultaneously, up to a maximum of 32 threads per dataset. It is possible to use this command to add additional threads to a dataset which already has attached threads. These older threads remain active.

The optional substitution dictionary contains a set of percent-prefixed keys and replacement values, following the Tk event procedure model. All such replacements are made before the script is passed to the thread interpreters. A single default substitution replacing the character sequence %D with the handle of the current dataset is always predefined and cannot be redefined. Replacement token keys (but not necessarily their values) are single case-sensitive characters, ignoring an optional percent prefix character. Within the script, percent signs which should be preserved as such must be doubled, just like in Tk event substitution commands.
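The replacement rules can be pictured with a few lines of plain Tcl. This is only an illustration of the documented behavior, not the toolkit's implementation: the helper name expand_body and the handle strings dataset0 and table0 are hypothetical.

```tcl
# Hypothetical helper in plain Tcl: emulates the documented replacement rules.
proc expand_body {dhandle substdict body} {
    # A doubled percent sign collapses to a literal one; %D is always predefined.
    set map [list %% % %D $dhandle]
    dict for {key value} $substdict {
        # Keys are single characters; a leading percent sign is optional.
        set char [string trimleft $key %]
        if {$char eq "D"} continue   ;# the predefined token cannot be redefined
        lappend map %$char $value
    }
    return [string map $map $body]
}

puts [expand_body dataset0 {%T table0} {table addens %T [dataset pop %D] ;# 100%% done}]
```

Placing the doubled percent sign first in the map ensures that an escaped percent sign followed by a token letter is not mistaken for a replacement token.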

The dataset threads are compatible with those of the standard Tcl threads package. Dataset-associated threads are automatically created in preserved state, and a thread::wait command is automatically appended at the end of the script, so they can be sent additional tasks via the thread::send facility. If no script body is specified, the initial script consists only of the wait command. Threads can be canceled or joined only if they are stopped at the wait statement.

When a dataset is deleted, all threads associated with this dataset need first to be joined, and this can only happen if they have finished processing the main body script and are all in their idle state in the thread::wait command. Object deletion is postponed until this condition is met. A global join on all currently executing dataset threads is automatically performed when the program exits, before any object clean-up tasks are run. An application where dataset threads are stuck and do not reach their thread::wait cancellation points cannot be cleanly exited.

Duplicating datasets does not duplicate any associated threads.

The presence of threads on a dataset has consequences for the behavior of the dataset wait and dataset pop commands, as well as object insertion commands associated with other major object classes (e.g. ens move, or molfile read). Please refer to the respective paragraphs for details. The size control mechanism of datasets in the auto mode is also dependent on the presence or absence of linked dataset threads.

Example:

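dataset addthread $dh 1 [dict create %T $th] {
	# Process ensembles until the dataset signals that no more data will arrive
	while {1} {
		set eh [dataset pop %D]
		if {$eh==""} break
		# Skip ensembles for which the canonic tautomer cannot be computed
		if {[catch {ens get $eh E_CANONIC_TAUTOMER} eh_canonic]} {
			ens delete $eh
			continue
		}
		# Skip ensembles for which the descriptors cannot be computed
		if {[catch {ens get $eh_canonic E_DESCRIPTORS}]} {
			ens delete $eh
			continue
		}
		table addens %T $eh_canonic
		ens delete $eh
	}
}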

This code creates a processing thread on the dataset which computes properties on newly arriving ensembles, stores the data in a table (note the table handle substitution via the replacement dictionary) and then deletes the ensemble. The dataset pop command returns an empty string when it is known no more data will arrive, and otherwise blocks until an object for popping is available. This is managed by setting the eod dataset attribute from feeder threads.

The return value of the command is a list of the Tcl thread IDs of the newly created threads. These are suitable for use in the dataset jointhreads command or any standard Tcl thread package command.
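As a hedged sketch of how the returned thread IDs might be used: the example below assumes a Cactvs toolkit interpreter with the standard Tcl Thread package available, and assumes that an empty body is accepted in the count/body form of the command. The task sent to the thread is an arbitrary placeholder.

```tcl
# Requires the Cactvs toolkit interpreter; Thread is the standard Tcl package.
package require Thread

set dh [dataset create]

# Two threads with an empty body: each only runs the implicit thread::wait.
set tids [dataset addthread $dh 2 {}]
puts "created [llength $tids] threads: $tids"

# Idle threads accept additional work through the standard thread::send call.
thread::send [lindex $tids 0] {expr {6 * 7}} result
puts "thread result: $result"

# Both threads are idle again, so the join should return without blocking.
set njoined [dataset jointhreads $dh]
puts "joined $njoined threads"
```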

dataset append

dataset append dhandle property value ?property value?

Standard data manipulation command for appending property data. It is explained in more detail in the section about setting property data.

Example:

dataset append $dhandle D_NAME "_new"

dataset append $dhandle eod 1

dataset assign

dataset assign dhandle srcprop dstprop

Copy data from one property to another. Both properties must be associated with the same object class. The source property (but currently not the destination property) may be specified as an indexed property subfield. There must be a conversion path between the data types of the two properties or property subfields involved for the operation to succeed. For example, assigning a string property to a numeric property succeeds only if the string data items contain suitable numbers.

The original property data remains valid. The command variant dataset rename directly exchanges the property name without any data duplication or conversion, if that is possible. In any case, the original property data is no longer present after the execution of this command variant.

If the properties are not associated with datasets (prefix D_ ), the operation is performed on all dataset member objects.

Example:

dataset assign $dhandle A_XY A_XY%

This code snippet creates backup atomic 2D layout coordinates on all dataset ensembles or reactions.

dataset cancelthreads

dataset cancelthreads ?all?

dataset cancelthreads dhandle ?all?

dataset cancelthreads dhandle threadid..

Cancel (or more precisely, wait for and join) one or more threads associated with the dataset. Dataset threads can only be canceled when they are idle, executing the implicitly added thread::wait command at the end of their script. Therefore, this command is not just used for clean-up, but also useful for ascertaining that the threads have finished their tasks. The IDs of the threads associated with a dataset can be retrieved as the threads dataset attribute, or saved from the return value of the original dataset addthread command. The special all thread ID value can be used to cancel all threads of the dataset. This can also be achieved by setting an empty thread ID parameter, or omitting it altogether. If a dataset does not possess threads, this command does nothing. If a thread marked for cancellation has not yet finished, the cancellation command is suspended until it has.

This command can also be invoked without specifying an explicit or transient dataset argument, or passing it as all. In that case, the thread join cleanup is run on all threads of all currently defined datasets. This function is also implicitly run when a script exits, before performing other application cleanup operations.

Thread cancellation for all dataset threads is implicitly invoked when a dataset is deleted, so an explicit clean-up is not required. However, this also means that a dataset deletion blocks if there are still active threads. It is not possible to forcefully cancel a thread which has entered an infinite loop, so careful programming is required.

The command returns the number of canceled threads.

dataset jointhreads is an alias to this command.

Example:

dataset jointhreads $dh

dataset cancelthreads $dh [lindex [dataset get $dh threads] 0]

dataset jointhreads

The first example waits for all threads on the specified dataset to finish. The second command waits for the completion of one specific thread, and the last command waits for all threads on all currently defined datasets.

dataset cast

dataset cast datasethandle dataset/ens/reaction/table ?propertylist?

Transform the dataset into a different object. Depending on the target object class, the result is as follows:

dataset
Only supplied for the sake of completeness. This mode does nothing.
ens
The first ensemble contained in the dataset, or a newly created empty ensemble if no such object exists. The dataset and all its other contents are destroyed in the process.
reaction
The first reaction contained in the dataset, or a newly created empty reaction if no such object exists. The dataset and all its other contents are destroyed in the process.
table
A new table with automatically set up columns which are the union of all valid ensemble-class (E_*) and reaction-class (X_*) properties of the ensembles and reactions in the dataset, and rows with the data of these objects. In addition, these objects are moved into the internal table dataset. The input dataset, and its remaining contents which were not moved to the table, are destroyed.

If the optional property list is specified, an attempt is made to compute the listed properties before the cast operation, so that they may become a part of the new object. No error is raised if a computation fails.

The command returns the handle of the new object, or the input object in case of mode dataset.
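A short sketch of the table cast with a property list, assuming a Cactvs toolkit interpreter. The structures are arbitrary, and E_FORMULA and E_WEIGHT are simply the ensemble properties used elsewhere in this section.

```tcl
# Requires the Cactvs toolkit interpreter; sample structures only.
set dh [dataset create [ens create CO] [ens create CN]]

# Request E_FORMULA and E_WEIGHT so that they are computed before the cast
# and can become table columns. A failed computation raises no error.
set th [dataset cast $dh table [list E_FORMULA E_WEIGHT]]

# The input dataset was destroyed by the cast. Only the table handle in $th
# should be used from here on.
puts "new table handle: $th"
```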

dataset clear

dataset clear dhandle

Delete all objects in the dataset, but keep the dataset object. The return value is the number of deleted objects.

dataset count

dataset count dhandle|remotehandle ?filterlist?

Get the number of objects in the dataset. If the filter parameter is specified, only those objects which pass the filter are counted.

Example:

dataset count $dhandle pstereoatom

counts the number of ensembles or reactions in the dataset with one or more potential atom stereo centers.

dataset size is an alias to this command.

This command can be used with remote datasets.

In case a simple count on a local dataset is required, without any filters, the dataset size can also be queried as attribute, as in

set n [dataset get $dhandle size]

dataset create

dataset create ?objecthandlelist?...

This command creates a new dataset and returns the handle of the new dataset. If the optional object handle lists are provided as arguments, the specified objects (in case of ensemble, reaction, network or table handles), or elements of the object (for a dataset handle, with default accept flags) are moved to the new dataset. In case the accept flags of the target dataset are configured to allow datasets as primary dataset objects, the source dataset argument is not implicitly replaced by its content objects but added as a single object, retaining its objects as content. Otherwise, the source dataset is emptied but remains a valid object.

Besides handles of ensembles, reactions, networks, tables and datasets, which are identified with priority, any string which can be decoded in an ens create statement is also allowed as member initialization identifier.

If the create statement references objects which are not usually accepted by the default settings of the accept table attribute, that attribute is automatically adjusted to allow for these objects.

The command always returns the handle of the new dataset, never the handles of any objects which may have been placed into the dataset.

Examples:

dataset create [list $eh1 $eh2] $dh1

creates a new dataset and moves the two specified ensembles $eh1 and $eh2, as well as everything contained in the dataset $dh1, into the new dataset.

dataset create VXPBDCBTMSKCKZ

The above command matches a partial InChI key. It puts all structures from the NCI resolver whose full InChI key matches in its non-stereo/isotope-specific part into the new dataset.

set ::cactvs(lookupmode) "name_pattern"

dataset create [list "+morphine +methyl"]

This command performs a name pattern lookup and puts all structures from the NCI resolver which contain both name fragments in one of their known names into the dataset. The name pattern string needs to be explicitly packed into a list, because otherwise it would be split into two independent list elements.

dataset dataset

dataset dataset dhandle ?filterlist?

Get the handle of the container dataset the dataset is a member of. If the dataset is not itself a dataset member, or does not pass all of the optional filters, an empty string is returned.

This command is not equivalent to dataset datasets!

dataset datasets

dataset datasets dhandle ?filterset? ?filtermode? ?recursive?

Return a list of all the datasets that are members in the dataset identified by the command argument handle. Other objects (ensembles, reactions, tables, networks) are ignored. The object list may optionally be filtered by the filter list, and the output further modified by a standard filter mode.

If the recursive flag is set, and the dataset contains other datasets as objects, datasets in these nested datasets are also listed.

This command is not equivalent to the dataset dataset command!

Example:

set dlist [dataset datasets $dhandle]

dataset defined

dataset defined dhandle property

This command checks whether a property is defined for the dataset. This is explained in more detail in the section about property validity checking. Note that this is not a check for the presence of property data! The dataset valid command is used for this purpose.

dataset delete

dataset delete ?datasethandlelist/all?...

This command destroys datasets and everything contained therein. The special handle value all may be used to delete all datasets in the application at once.

The command returns the number of datasets which were successfully deleted.

Transient datasets cannot be used with this command. Neither can datasets which are a component of another object, e.g. the internal datasets of tables or factories. These are deleted automatically, and only, when their parent object is destroyed. Datasets which are a property value are also undeletable by this command.

It is a common programming error to delete a dataset, or its parent object if one exists, without protecting its current member ensembles or reactions. If they are still needed in later processing they need to be explicitly transferred into another dataset or outside it.

Examples:

dataset delete all

dataset move $dhandle {}; dataset delete $dhandle

The first example destroys all datasets defined in the current script and everything contained in them. The second example shows how to delete a dataset and preserve its contents by moving all dataset elements out prior to deletion.

dataset dget

dataset dget dhandle propertylist ?filterset? ?parameterlist?

Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.

For examples, see the dataset get command. The difference between dataset get and dataset dget is that the latter does not attempt computation of property data, but rather initializes the property values to the default and returns that default if the data is not yet available. For data already present, dataset get and dataset dget are equivalent.

dataset dup

dataset dup dhandle ?targethandle? ?cleartarget?

If the optional arguments are not supplied, the dataset is duplicated together with all data attached to it and all objects contained in it. The command returns a new dataset handle. All duplicated objects in the new dataset are also assigned handles, which can be obtained by commands such as dataset list $dhandle.

It is possible to specify a target dataset as an optional argument. In that case, no new dataset is created, and dataset-level property data on the source dataset is not copied. All objects in the source dataset are duplicated and appended to the end of the target dataset. In case the boolean target clearance flag is set, which is also the default if the parameter is omitted, the target dataset is cleared before the new objects from the source dataset are added. In this command variant, the return value of the command is the target dataset handle.

Examples:

dataset dup $dhandle

dataset dup [list $eh1 $eh2] $dtarget 0

dataset ens

dataset ens dhandle ?filterset? ?filtermode? ?recursive?

Return a list of all the ensembles in the dataset. Other objects (reactions, tables, datasets, networks) are ignored. The object list may optionally be filtered by the filter list, and the output further modified by a standard filter mode.

If the optional boolean recursive argument is set, ensembles which are a component of a reaction in the dataset are also listed. Furthermore, if the dataset contains datasets as elements, these are recursively traversed, and ensembles in these, as well as ensembles in reactions in these datasets, are listed. If the output mode of the command is a handle list, items found by recursion are appended to the result list in a straight fashion, without the creation of nested lists. By default the recursion flag is off. Regardless of the flag value, ensembles which are associated with rows of a table in the dataset, but are not themselves dataset members, are not output.

Example:

set elist [dataset ens $dhandle astereogenic]

lists those ensembles in the dataset which have one or more atoms which are potential atom stereo centers.

set cnt [dataset ens $dhandle {} count 1]

returns a count of all ensembles which are either direct members of the dataset, or indirect members as component objects of reactions in the dataset, or which are contained in datasets which are themselves members of the primary dataset.

dataset exists

dataset exists dhandle

Check whether this dataset exists. The command returns a boolean value. This command cannot be used with transient datasets.

Example:

dataset exists $dhandle

dataset expr

dataset expr dhandle expression

Compute a standard SQL-style property expression for the dataset. This is explained in detail in the chapter on property expressions.

dataset extract

dataset extract dhandle propertylist ?filterset? ?filterprocs?

This command is rather complex and closely related to the dataset xlabel command. It was designed for the efficient extraction of major or minor object data for filtered subsets of the dataset.

The property list parameter selects the property data which is extracted. Multiple properties may be specified, but they can only be associated with major objects and one arbitrary minor object class. So it is possible to simultaneously extract an ensemble and an atom property, but not an atom and a bond property.

The return value is a nested list of data items for every object which is encountered while traversing the dataset on the level of the minor object associated with the extraction property, or just ensembles or other major objects if no such property is selected. Every list element is itself a list which contains the extracted property values in the order they are named in the property list parameter.

The objects for which data is returned can further be filtered by a standard filter set, and additionally by a list of filter procedures. These Tcl script procedures are called with the respective object handles and object labels as arguments. For example, a callback function used in an atom retrieval context would be called for each atom with its ensemble handle and the atom label as arguments. If major objects without a label are checked, such as complete ensembles, 1 is passed as the label. The callback procedures are expected to return a boolean value. If it is false or 0, the object is not added to the returned list, and the other check procedures are no longer called.

The command currently only works on ensembles in the dataset, ignoring any reactions, tables, datasets or networks which may be present.

Because this command is primarily intended for numerical data display, the returned values are formatted as with the nget command, i.e. instead of enumerated values the underlying numerical values are returned.

Example:

set dhandle [dataset create [ens create CO] [ens create CN]]

dataset extract $dhandle [list E_NAME A_SYMBOL] !hydrogen

This example first creates a dataset with methanol and methylamine. The second line performs the actual extraction and returns

{CH4O C} {CH4O O} {CH5N C} {CH5N N}

This kind of extracted data is useful for the display of filtered atomic (and other minor object’s) property values.
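The filter procedure mechanism described above can be sketched as follows. The procedure name first_atom_only is hypothetical, and the example assumes a Cactvs toolkit interpreter in which atom labels are numeric and start at 1.

```tcl
# Hypothetical callback. For an atom-level extraction it is called once per
# atom with the ensemble handle and the atom label as arguments.
proc first_atom_only {eh label} {
    # Return a boolean: true keeps the atom, false drops it.
    return [expr {$label == 1}]
}

set dhandle [dataset create [ens create CO] [ens create CN]]

# The filter set is applied first, then the list of filter procedures.
set data [dataset extract $dhandle [list E_NAME A_SYMBOL] !hydrogen first_atom_only]
puts $data
```

Because the procedures are called per object, inexpensive tests are best expressed in the filter set, leaving the callback for conditions that a standard filter cannot express.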

dataset forget

dataset forget dhandle ?objectclass?

This command is essentially the same as the ens forget (or reaction forget, etc.) command. It is applied to all objects in the dataset.

If the object class is dataset, all dataset-level property data is deleted.

dataset get

dataset get dhandle propertylist ?filterset? ?parameterlist?

dataset get dhandle attribute

Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.

In addition to retrieving property data, it can also be used to query dataset attributes. The set of supported attributes is detailed in the paragraph on the dataset set command.

Examples:

dataset get $dhandle {D_NAME D_SIZE}

yields the name and size of the dataset as a list. If the information is not yet available, an attempt is made to compute it. If the computation fails, an error results.

dataset get $dhandle [list E_FORMULA E_WEIGHT]

gives the formula and molecular weight of all dataset ensembles. The result is delivered as a nested list. The first list contains the formulas, the second list contains the weights.

Currently, it is not possible to use filters with this command (and the other retrieval command variants) which are not operating directly on the dataset object, but on objects lower in the hierarchy such as ensembles or atoms.

For the use of the optional property parameter list argument, refer to the documentation of the ens get command.

Variants of the dataset get command are dataset new, dataset dget, dataset nget, dataset show, dataset sqldget, dataset sqlget, dataset sqlnew and dataset sqlshow.

dataset getparam

dataset getparam dhandle property ?key? ?default?

Retrieve a named computation parameter from valid property data. If the key is not present in the parameter list, an empty string is returned. If a default value is set, that value is returned in case the key is not found.

If the key parameter is omitted, a complete set of the parameters used for computation of the property value is returned in key/value format.

This command does not attempt to compute property data. If the specified property is not present, an error results.

Example:

dataset getparam $dhandle E_GIF format

returns the actual format of the image, which could be GIF, PNG, or various bitmap formats.

dataset hadd

dataset hadd dhandle ?filterset? ?flags? ?changeset?

Add a standard set of hydrogens to all ensembles and reactions in the dataset. If the filterset parameter is specified, only those atoms which pass the filter set are processed.

Additional operation flags may be activated by setting the flags parameter to a list of flag names, or a numerical value representing the bit-ored values of the selected flags. By default, the flag set is empty, corresponding to the use of an empty string or none as parameter value. These flags are currently supported:

nospecial
Do not perform hydrogen addition to atoms which participate in non-standard bonds (all bonds whose B_TYPE is not normal).
no2dcoords
Do not assign 2D coordinates to the added hydrogens, even if the rest of the atoms in the ensemble have valid 2D coordinates. In any case, 2D coordinates are never added when the ensemble does not already possess valid 2D coordinates.
no3dcoords
Do not assign 3D coordinates to the added hydrogens, even if the rest of the atoms in the ensemble have valid 3D coordinates. In any case, 3D coordinates are never added when the ensemble does not already possess valid 3D coordinates.
nometals
Do not attempt to add hydrogen to atoms which are metals (as defined in the system element table).
noelements
Do not add hydrogen if the ensemble consists purely of isolated metal atoms, which probably represent the material in elementary form, or as an alloy.
nomemory
Do not remember the added hydrogen atoms as automatically added. Normally, a flag is retained as part of the atom information which distinguishes atoms which were added by automatic processing, such as hydrogen addition, from those which were originally input.
resetmemory
Reset the origin flag described above for all atoms in the ensemble. All current atoms appear to be part of the original atom set.
nocations
Do not add hydrogen to atoms with a positive formal charge.
noanions
Do not add hydrogen to atoms with a negative formal charge.
nohighvalences
Do not add hydrogen to atoms which already exceed their lowest standard valence minus any formal charge. This option only applies to elements which have a defined lowest standard valence (this is configurable via the element table).
noatoms
Do not add hydrogen to atoms without any bonds.

Adding hydrogens with this command is less destructive to the property data set of the ensembles or reactions than adding them with individual atom create/bond create commands, because many properties are defined to be indifferent to explicit hydrogen status changes, but are invalidated if the structure is changed in other ways.

If the effects of the hydrogen addition step to the validity of the property data set should not be handled with this standard procedure, it is possible to explicitly generate additional property invalidation events by specifying a list as the optional last parameter, for example a list of atom and bond to trigger both the atom change and bond change events.

The command returns the total number of hydrogens added to all ensembles and reactions in the dataset.

Example:

dataset hadd $dhandle
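A further sketch, assuming a Cactvs toolkit interpreter, shows the flags parameter in use. The structure is an arbitrary sample, and the flag names are taken from the list above.

```tcl
# Requires the Cactvs toolkit interpreter; sample structure only.
set dhandle [dataset create [ens create CO]]

# The empty filter set processes all atoms. The flag list excludes metals
# and charged atoms from hydrogen addition.
set nadded [dataset hadd $dhandle {} [list nometals nocations noanions]]
puts "added $nadded hydrogens"
```

The filter set must be passed, even if empty, whenever flags are specified, because the arguments are positional.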

dataset hread

dataset hread dhandle ?datasethandle|enshandle? ?#recs|batch|all?

This command provides the same functionality as dataset read, but additionally adds a standard set of hydrogen atoms to the read duplicate objects.

The command arguments are explained in the section on dataset read.

dataset hstrip

dataset hstrip dhandle ?flags? ?changeset?

This command removes hydrogens from the dataset ensembles and reactions. By default, all hydrogen atoms in the dataset ensembles or reactions are removed.

The flags parameter can be used to make the operation more selective. It may be a list of the following flags:

keepspecial
If this flag is set, hydrogens which are usually displayed, such as on aldehydes, wedge bonds, carbon triple bonds or hetero atoms are retained.
keeporiginal
Hydrogen atoms which were not automatically added via a hadd command are retained. Note that hydrogen addition commands can be run in a mode which does not leave information about automatic addition - hydrogens added this way will also survive.
keepwedge
Keep hydrogens which are at the end of a wedge bond, indicating stereochemistry.
wedgetransfer
If a hydrogen atom is removed which is at the end of a wedge, the wedge information is saved by transferring the wedge (changing its up/down status if necessary) to an adjacent, surviving bond. This flag has no effects if the keepspecial or keepwedge flags are set. This flag is set by default.
keepprotons
Keep any molecules which consist only of hydrogen atoms (such as protons, hydride anions, and molecular hydrogen).
keepalphawedge
Keep hydrogen atoms which are bonded to an atom which is at the tip of a wedge bond. This flag excludes the case where the bond to the hydrogen atom is the wedge bond - use the keepwedge flag to cover this case.
normalize
Normalize the wedge pattern for standard cases, removing wedges from hydrogens if the result is still stereochemically defined. Hydrogens which lose their wedge in this process are no longer protected by the keepwedge flag.
keepisotopes
Keep hydrogen atoms which are isotope labels (including enriched/depleted 1H).

If the flags parameter is an empty string, or none , it is ignored. The default flag value is wedgetransfer - but the default value is overridden if any flags are set!

If the changeset parameter is given, all property change events listed in the parameter are triggered.

Hydrogen stripping is not as disruptive to the ensemble or reaction data content as normal atom deletion. The system assumes that this operation is done as part of some file output or visualization preparation. However, if any new data is computed after stripping, the computation functions see the stripped structure, and proceed to work on that reduced structure without knowledge that there are implicit hydrogens.

Example:

dataset hstrip $dhandle [list keeporiginal wedgetransfer]

dataset index

dataset index dhandle

dataset index dhandle position

This command comes in two variants. The three-word version is the generic command to check dataset memberships, which is the same for all objects which can be dataset members, while the four-word version retrieves object references from this dataset.

This first version gets the position of the dataset in the object list of its parent dataset. If the dataset is not part of a parent dataset, -1 is returned. This is the generic dataset membership test command variant.

This second command variant obtains the object handle of the object at the specified position in this dataset. Position counting begins with zero. If the index is outside the object position range, an empty string is returned. The special value end may be used to address the last object. The indexed object remains in the dataset.

Note that this index command is not equivalent to the standard index command on minor objects which is used to obtain the position of the minor object in the minor object list of the controlling major object. This kind of functionality is not needed for major objects, because they are not contained in any minor object list.

Example:

dataset index $dhandle end
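Both variants can be combined in a short sketch, assuming a Cactvs toolkit interpreter. It walks the dataset by position and relies on the documented empty-string result for an out-of-range index.

```tcl
# Requires the Cactvs toolkit interpreter; sample structures only.
set dhandle [dataset create [ens create CO] [ens create CN]]

# Four-word form: fetch the member handle at each position until the
# position is past the last object and an empty string is returned.
set pos 0
while {[set obj [dataset index $dhandle $pos]] ne ""} {
    puts "position $pos: $obj"
    incr pos
}

# Three-word form: position of the dataset itself in a parent dataset.
# A result of -1 means that it is not a member of any dataset.
puts [dataset index $dhandle]
```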

dataset jointhreads

dataset jointhreads ?all?

dataset jointhreads dhandle ?all?

dataset jointhreads dhandle threadid..

This is an alias for the dataset cancelthreads command. Please refer to its documentation.

dataset list

dataset list ?dhandle?

Without a handle argument, the command returns a list of the handles of all existing datasets.

If a dataset handle or transient dataset is passed as third argument, the command returns a list of all major objects in the dataset. This function is different from the behavior of the list subcommand for other major object classes, where the optional argument is a filter list.

Examples:

dataset list

dataset list $dhandle

dataset lock

dataset lock dhandle propertylist/dataset/all ?compute?

Lock property data of the dataset, meaning that it is no longer subject to the standard data consistency manager control. The data consistency manager deletes specific property data if anything is done to the dataset which would invalidate the information. Property data remains locked until it is explicitly unlocked.

The property data to lock can be selected by providing a list of the following identifiers:

Property names
Valid property instances on the dataset are locked. If the boolean compute flag is set, an attempt is made to compute the property if it is not yet present. Otherwise, a request to lock non-existent data is silently ignored. It is not possible to lock individual property fields.
all
All valid dataset properties are locked. The compute flag is ignored.
dataset
This is an object class identifier. All property data which is controlled by the dataset major object and attached to the specified object class is locked. Since datasets do not incorporate minor objects, this identifier is equivalent to all .

A lock can be released by a dataset unlock command.

This command does not recurse into the objects contained in the dataset.

The return value is the dataset handle or, if the dataset was transient, an empty string.
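A possible usage pattern is sketched below. It assumes that $dhandle is a persistent dataset handle, uses D_GIF merely as a stand-in for any dataset property, and assumes that dataset unlock takes its arguments in the same order as dataset lock, which is not spelled out in this section.

```tcl
# Assumption: $dhandle is a dataset handle; D_GIF stands in for any
# dataset property.

# Lock D_GIF, computing it first if it is not yet present (compute flag 1).
dataset lock $dhandle D_GIF 1

# ... operations that would normally invalidate D_GIF go here ...

# Release the lock again. The argument order is assumed to mirror
# that of dataset lock.
dataset unlock $dhandle D_GIF
```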

dataset loop

dataset loop dhandle objvar ?maxrec? ?offset? body

Loop over the elements in a dataset. This command is similar to molfile loop. On each iteration, the variable is set to the handle of the current member object, and then the body code is executed. The variable refers to the original dataset element, not a duplicate. This is different from dataset read.

All operations on the current loop item are allowed, including deletion. However, the next object after the current item must not be deleted or moved, because it is needed for the iteration process.

If a maximum record count is set, the loop terminates after the specified number of iterations. If the maximum record argument is set to an empty string, a negative value, or all , the loop covers all dataset elements. This is also the default.

Within the loop, the standard Tcl break and continue commands work as expected. If the body script generates an error, the loop is exited.

If no offset is specified, the loop starts at the first element. Within the loop body, the dataset attribute record is continuously updated to indicate the current loop position. Its value starts with one, like file records in the molfile loop command.

Example:

dataset loop $dh eh {

	puts "[ens get $eh E_NAME] at position [ens index $eh]"

}
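The record limit and the loop control commands can be combined as in the following sketch. It assumes that $dh is a dataset containing only ensembles, and that the record attribute can be read with dataset get in the same way as other dataset attributes. The limit of 100 is arbitrary.

```tcl
# Assumption: $dh is a dataset whose members are all ensembles.
# Visit at most 100 members, starting at the first element.
dataset loop $dh eh 100 {
    set name [ens get $eh E_NAME]
    if {$name eq ""} {
        # Skip unnamed ensembles and go on with the next member.
        continue
    }
    if {$name eq "stop"} {
        # Leave the loop early; "stop" is an arbitrary marker name.
        break
    }
    # The record attribute is one-based and tracks the loop position.
    puts "record [dataset get $dh record]: $name"
}
```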

dataset max

dataset max dhandle propertylist ?filterset?

Get the maximum value of one or more properties from the elements in the dataset. The property argument may be any property attached to dataset members, or minor objects thereof. If the filterset argument is specified, the maximum value is searched only for objects which pass the filter set.

Examples:

dataset max $dhandle E_WEIGHT

dataset max [list $ehandle1 $ehandle2] A_SIGMA_CHARGE carbon

The first example finds the highest molecular weight in the dataset. The second example finds the largest (most positive) Gasteiger partial charge on any carbon atom in the two argument ensembles, which form a transient dataset.

dataset metadata

dataset metadata dhandle property field ?value?

Obtain property metadata information, or set it. The handling of property metadata is explained in more detail in its own introductory section. The related commands dataset setparam and dataset getparam can be used for convenient manipulation of specific keys in the computation parameter field. Metadata can only be read from or set on valid property data.

Examples:

array set gifparams [dataset metadata $dhandle D_GIF parameters]

dataset metadata $dhandle D_QUALITY comment "This value looks suspicious to me"

The first line retrieves the computation parameters of the property D_GIF as keyword/value pairs. These are read into the array variable gifparams , and may subsequently be accessed as $gifparams(format) , $gifparams(height) , etc. The second example shows how to attach a comment to a property value.

dataset min

dataset min dhandle propertylist ?filterset?

Get the minimum value of one or more properties from the elements in the dataset. The property argument may be any property attached to dataset sub-elements, or minor objects thereof. If the filterset argument is specified, the minimum value is searched only for objects which pass the filter set.

Examples:

dataset min $dhandle E_WEIGHT

dataset min [list $ehandle1 $ehandle2] A_SIGMA_CHARGE carbon

The first example finds the smallest molecular weight in the dataset. The second example finds the smallest (most negative, or smallest positive) Gasteiger partial charge on any carbon atom in the two argument ensembles, which form a transient dataset.
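The min and max subcommands are often used as a pair. The sketch below assumes that $dhandle is a dataset of ensembles and that a request for a single property returns a single value.

```tcl
# Assumption: $dhandle contains ensembles, and a single-property request
# yields a single value rather than a list.
set lo [dataset min $dhandle E_WEIGHT]
set hi [dataset max $dhandle E_WEIGHT]
puts "molecular weights range from $lo to $hi"
```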

dataset molfile

dataset molfile dhandle ?filterset?

Return the handle of the molfile object associated with the dataset as backing page file. If no such file object exists, an empty string is returned.

Example:

set fh [dataset molfile $dh]

set fh [dataset get $dh pagefile]

The two commands are equivalent.

dataset move

dataset move dhandle datasethandle|remotehandle ?position?

Move, depending on the acceptance flags of the destination dataset, either the objects in the dataset or transient dataset into another local or remote dataset, or move the dataset itself. If the destination dataset handle is an empty string, the dataset objects are removed from the original dataset, but not moved into any other dataset. If the destination dataset accepts datasets as members, which is not the default (see the accept attribute in the section on dataset set ) the dataset is directly moved as object. Otherwise, its contained objects are moved, under preservation of the object order from the source dataset, and the source dataset is emptied, but not deleted.

Optionally, a position in the new dataset for the first moved object may be specified. This parameter is either an index (beginning with 0), or end , which is the default. If the contents of a dataset are spliced into another at a specific position, objects after the first element of the source dataset follow as a block.

Another special position value is random. This value moves the dataset, or the dataset contents, to a random position in the target dataset. Use of this mode with remote datasets is currently not supported.

In case of a transient command dataset the original dataset memberships of the dataset objects are not restored when the command completes.

The return value of the command is the original parent dataset of the command dataset, as it existed before the move. Usually, it is an empty string.

A dataset cannot be moved into itself.

Examples:

dataset move $dhandle $dhandle2 0

dataset move $dhandle {}

dataset move [ens list] [dataset create]

The first line moves all objects in the source dataset into the first (and following) positions in the destination dataset. The second example removes all elements from the dataset. This is often useful in order to avoid dataset member destruction with the dataset delete command. The final example shows how to move a set of ensembles (here: all ensembles currently defined in the application) into a newly created dataset via an intermediate, transient dataset.

dataset move $dhandle vioxx@server55:10001

This command moves all objects in the first dataset to the remote dataset on host server55 , which listens on port 10001 and requires the pass phrase vioxx for access.

dataset mutex

dataset mutex dhandle mode

Manipulate the object mutex. During the execution of a script command, the mutex of the major object(s) associated with the command are automatically locked and unlocked, so that the operation of the command is thread-safe. This applies to builds that support multi-threading, either by allowing multiple parallel script interpreters in separate threads or by supporting helper threads for the acceleration of command execution or background information processing. This command locks major objects for a period of time that exceeds a single command. A lock on the object can only be released from the same interpreter thread that set the lock. Any other threaded interpreters, or auxiliary threads, block until a mutex release command has been executed when accessing a locked command object. This command supports the following modes:

lock
Increase the recursive mutex lock count on the object. The command returns the current lock count after the command, excluding the transient single-command lock.
reset
Release all persistent locks on the object, if they exist.
test
Return the current persistent lock count on the object. This excludes the transient per-command lock.
unlock
Decrease the recursive lock count on the object. The command returns the current lock count after the command, excluding the transient single-command lock. Unlocking an object which has not been persistently locked results in an error.

There is no trylock command variant because the command already needs to be able to acquire a transient object mutex lock for its execution.
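Because a persistent lock must be released by the same interpreter thread that set it, it is prudent to pair every lock with an unlock even when the protected code fails. The following sketch wraps this pattern in a helper procedure. The procedure name with_dataset_lock is hypothetical and not part of the toolkit, and the code assumes Tcl 8.5 or later for the catch options dictionary.

```tcl
# Hypothetical helper, not a toolkit command: run a script while holding
# a persistent mutex lock on the dataset, and always release the lock.
proc with_dataset_lock {dhandle body} {
    dataset mutex $dhandle lock
    # Evaluate the body in the caller's scope and capture any outcome,
    # including errors, break, continue and return.
    set code [catch {uplevel 1 $body} result opts]
    dataset mutex $dhandle unlock
    # Re-deliver the original outcome to the caller.
    dict incr opts -level
    return -options $opts $result
}

# Usage: both commands run under one lock, so no other thread can
# modify the dataset between them.
with_dataset_lock $dhandle {
    set n [dataset count $dhandle]
    set last [dataset index $dhandle end]
}
```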

dataset need

dataset need dhandle propertylist ?mode?

Standard command for the computation of property data, without immediate retrieval of results. This command is explained in more detail in the section about retrieving property data.

If the dataset is not transient, the return value is the dataset handle.

Example:

dataset need $dhandle D_GIF recalc

dataset networks

dataset networks dhandle ?filterset? ?filtermode? ?recursive?

Return a list of all the networks in the dataset. Other objects (ensembles, reactions, datasets, tables) are ignored. The object list may optionally be filtered by the filter list, and the result further modified by a standard filter mode argument.

If the recursive flag is set, and the dataset contains other datasets as objects, networks in these nested datasets are also listed.

Example:

set n [dataset networks $dhandle {} count]

dataset new

dataset new dhandle propertylist ?filterset? ?parameterlist?

Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.

For examples, see the dataset get command. The difference between dataset get and dataset new is that the latter forces the re-computation of the property data, regardless of whether it is present and valid or not.

dataset nget

dataset nget dhandle propertylist ?filterset? ?parameterlist?

Standard data manipulation command for reading object data and attributes. It is explained in more detail in the section about retrieving property data.

For examples, see the dataset get command. The difference between dataset get and dataset nget is that the latter always returns numeric data, even if symbolic names for the values are available.

dataset nitrostyle

dataset nitrostyle dhandle style

Change the internal encoding of nitro groups and similar functional groups in the ensembles and reactions in the dataset. Possible values for the style parameter are:

asis No change
ionic Change the encoding to a positive charge on the center atom, and a negative charge on one of the oxygens
xionic As above, but also change the encoding of azides, etc.
neutral Change the encoding to the neutral form with extended valence. pentavalent is an alias.
xneutral As above, but also change the encoding of azides, etc.

The command returns the dataset handle.
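A typical call might look as follows. This is a sketch that assumes $dhandle is a dataset containing ensembles or reactions with nitro groups.

```tcl
# Assumption: $dhandle is a dataset of ensembles and/or reactions.
# Switch nitro groups to the charge-separated encoding.
dataset nitrostyle $dhandle ionic

# Switch back to the neutral encoding with extended valence, this time
# also covering azides and similar groups.
dataset nitrostyle $dhandle xneutral
```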

dataset objects

dataset objects dhandle ?pattern?

This is a non-standard cross-referencing command. The result is a list of all the objects in the dataset, where each result list element is a list consisting of the object type (ens, reaction, table, network, dataset), and the object handle. Optionally, the listed objects may be filtered by the pattern argument.

Example:

dataset objects $dhandle ens*

is roughly equivalent to

dataset ens $dhandle

except that the latter only lists the ensemble handles, not pairs of object class name and handle.

dataset pack

dataset pack dhandle ?maxsize? ?requestlist? ?suppresslist?

Pack the dataset and all objects it contains into a base-64 encoded, compressed string as a serialized object. The string does not contain any non-printing characters, quotation marks or other problematic characters and is thus well suited for storage in database tables and similar applications. These packed strings are portable and platform-independent.

By default, all property data on the dataset and its member objects are stored. By providing a request list of properties which are computed if they are not yet present, and/or a list of properties not to store, the data content may be customized.

The maxsize parameter can be used to limit the maximum length of the packed string by setting a maximum length in bytes. The default value is 128K bytes. If the string would be longer, an error is generated.

The return value of this command is the packed string.

Example:

dataset pack $dhandle
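Since an oversized result raises an error, scripts that pack datasets of unknown size may want to trap it. The sketch below assumes that an empty request list means that no additional properties are computed, and uses D_GIF only as an example of a property to suppress. The size limit is arbitrary.

```tcl
# Assumptions: $dhandle is a dataset handle; {} as request list requests
# nothing extra; D_GIF is excluded from the packed data.
if {[catch {dataset pack $dhandle 1048576 {} {D_GIF}} packed]} {
    # On failure, the variable holds the error message.
    puts "packing failed: $packed"
} else {
    puts "packed string has [string length $packed] characters"
}
```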

dataset pop

dataset pop dhandle|remotehandle ?position? ?timeout?

Remove an object from a dataset. The handle of the selected object is returned, and the object is no longer a member of the dataset when the command completes. If a timeout is specified, it is transferred to the dataset attribute of the same name before the command is executed, as with a dataset set command.

By default the first object in the dataset, at index zero, is returned. A different object can be selected by means of the optional position argument. It can be a numerical index, or end for the last object. If the object index is larger than the maximum index of any object, it is silently rewritten to end.

This command works with remote datasets. In that case, the object is transferred via an intermediate serialized object representation over the network. It is unpacked on the local interpreter, and deleted on the remote interpreter.

If the desired dataset object cannot be found, and a timeout is set, including a negative value for an unlimited wait time, the command suspends execution until the object appears in the dataset, for example from a different script thread or as the result of a remote object insertion. If a wait would be executed, but the eod/targeteod parameter pair of the dataset indicates that no further data can be expected, the command returns an empty string instead of the object handle, but does not trigger an error. Otherwise, if the object cannot be delivered immediately or after the timeout, an error results.

Example:

set eh [dataset pop $dhandle end]
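The timeout and end-of-data behavior described above lends itself to a consumer loop. The following sketch assumes that $dhandle is a dataset which another script thread or a remote client fills with ensembles. The timeout value of 10 is arbitrary, and its unit is not specified in this section.

```tcl
# Assumptions: $dhandle is filled with ensembles by another thread or a
# remote client; the timeout value 10 is an arbitrary choice.
while {1} {
    # Fetch the first object, waiting for it if necessary.
    if {[catch {dataset pop $dhandle 0 10} eh]} {
        # The object could not be delivered within the timeout.
        puts "no object delivered: $eh"
        break
    }
    if {$eh eq ""} {
        # The eod/targeteod attributes signal that no more data will come.
        break
    }
    # The popped object is no longer a dataset member; the script now
    # owns it and is responsible for storing or disposing of it.
    puts "received [ens get $eh E_NAME]"
}
```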

dataset properties

dataset properties dhandle ?pattern? ?intersect/union?

Get a list of valid properties of the dataset proper and the dataset objects. By default, both dataset properties (prefix D_) as well as the properties of the objects in the dataset (prefix E_ for ensembles, X_ for reactions, T_ for tables, N_ for networks, D_ for datasets as members) and the properties of their minor objects (atoms, bonds, etc.) are listed. Property subsets may be selected by specifying a string filter pattern. In case of dataset element properties which are not present in all dataset members, the default mode is union, meaning that all properties are reported for which at least a single instance in any member exists. The alternative mode intersect only lists those dataset element properties which are present in all dataset members.

This command may also be invoked as dataset props .

Example:

dataset properties $dhandle D_*

dataset props $dhandle E_* intersect

The first example returns a list of the currently valid dataset-level properties. The second example lists ensemble properties which are present in all dataset objects.

dataset purge

dataset purge dhandle propertylist ?emptyonly?

Delete property data from the dataset. The properties may be both dataset properties (prefix D_ ) or properties of the dataset members, such as ensemble or atom properties. If a property marked for deletion is not present on an object, it is silently ignored.

Besides normal property names, a few convenient alias names for common property deletion tasks of ensembles in a dataset, or the reaction ensembles of reactions in the dataset, are defined and can be used as a replacement for the property list. These include:

atomstereochemistry
Delete all atom stereo descriptors, but keep those for bonds.
bondstereochemistry
Delete all bond stereo descriptors, but keep those for atoms.
isotopes
Delete isotope information in A_ISOTOPE and other isotope properties which may be defined in future software versions.
radicals
Delete atomic radical information in A_RADICAL and other radical-related properties which may be defined in future software versions.
stereochemistry
Delete all stereochemistry descriptors, including 2D wedges, but not 3D coordinates. The implicit property list includes A_LABEL_STEREO, B_LABEL_STEREO, A_CIP_STEREO, B_CIP_STEREO, A_DL_STEREO, B_CISTRANS_STEREO, A_HASH_STEREO, B_HASH_STEREO, A_MAP_STEREO, B_MAP_STEREO, A_STEREOINFO, B_STEREOINFO, A_STEREO_GROUP, M_STEREO_COUNT, E_STEREO_COUNT and B_FLAGS (only selected bits, the property remains valid if present).
wedges
Delete wedge bond flags in property B_FLAGS . If B_FLAGS is not present, the command is ignored and no computation attempt is made.

The optional boolean flag emptyonly restricts the deletion to those properties where all the values for a property associated with a major object (such as on all atoms in an ensemble for atom properties, or just the single ensemble property value for ensemble properties) are set to the default property value.

Examples:

dataset purge $dhandle D_GIF

dataset purge [ens list] E_IDENT 1

dataset purge $dhandle stereochemistry

The first example deletes the property data D_GIF for the selected dataset if it is present. The second example deletes property E_IDENT from all ensembles in the current application if their property value is equal to the default value of E_IDENT. The third example removes stereochemistry from all dataset ensembles.

dataset reactions

dataset reactions dhandle ?filterset? ?filtermode? ?recursive?

Return a list of all the reactions in the dataset. Other objects (ensembles, tables, datasets, networks) are ignored. The object list may optionally be filtered by the filter list, and the output further modified by a standard filter mode.

If the optional boolean recursive argument is set, reactions of which ensembles in the dataset are a component are also listed. Furthermore, if the dataset contains datasets as elements, these are recursively traversed, and reactions in these, as well as reactions as components of ensembles in these datasets, are listed. If the output mode of the command is a handle list, items found by recursion are appended in a straight fashion, without the creation of nested lists. By default the recursion flag is off. Regardless of the flag value, reactions which are associated with rows of a table in the dataset, but are not themselves dataset members, are not output.

Example:

set xlist [dataset reactions $dhandle]

Return a list of the handles of the reactions in the dataset.

set cnt [dataset reactions $dhandle {} count 1]

returns a count of all reactions which are either directly members of the dataset, or indirectly because ensembles in the dataset are part of a reaction, or which are contained in datasets which are themselves members of the primary dataset.

dataset read

dataset read dhandle ?datasethandle/enshandle? ?#recs|batch|all?

This command returns duplicates of one or more objects from the current dataset iterator position ( record attribute). Its arguments mimic those of the molfile read command. The iterator record attribute is automatically incremented. When the end of the dataset is reached, an empty result is returned, but no error is raised.

The return value is usually the handle of the object duplicated from the dataset member at the current read position. If an optional target dataset has been specified, the object is appended to that dataset, and the return value is the target dataset handle. It is also possible to use the magic dataset handles new or #auto, which create a new receptor dataset.

If instead of a target dataset an existing target ensemble is specified, the recipient ensemble is cleared, and the read dataset object is placed into its hull without changing its handle. This requires that the read object is an ensemble, and not a reaction, table, dataset or network, and that only a single item is read. It is also possible to use an empty argument to skip these options.

By default, a single object is duplicated and the iterator record attribute of the dataset incremented by one. With the optional third argument, a different number of objects can be selected for reading as a block. The special value all reads all remaining objects, and batch copies a number of objects corresponding to the batchsize dataset attribute. If there are insufficient objects in the dataset to read all requested records, only the available set is returned, and no error results.

The dataset contents are not changed by this command. All extracted items are object duplicates. In order to fetch original objects from the dataset, use the dataset pop command, or the various object move commands.

The command variant dataset hread provides the same functionality as this command, but additionally adds a standard set of hydrogen atoms to the duplicates.
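The iterator-based reading described above can be sketched as follows. The example assumes that $dhandle is a dataset containing only ensembles, and that ens delete is available to dispose of the duplicates, which are not members of the source dataset.

```tcl
# Assumption: $dhandle is a dataset whose members are all ensembles.

# Start at the first record and read one duplicate at a time until the
# end of the dataset is signalled by an empty result.
dataset rewind $dhandle
while {[set eh [dataset read $dhandle]] ne ""} {
    puts [ens get $eh E_NAME]
    # The duplicate belongs to the script, not to the dataset.
    # ens delete is assumed to be the matching cleanup command.
    ens delete $eh
}

# Read a block of up to 10 duplicates. The empty second argument skips
# the target dataset/ensemble option. Fewer handles are returned if the
# dataset holds fewer than 10 objects.
dataset rewind $dhandle
set handles [dataset read $dhandle {} 10]
puts "read [llength $handles] duplicates"
```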

dataset rename

dataset rename dhandle srcproperty dstproperty

This is a variant of the dataset assign command. Please refer to the command description in that paragraph.

dataset request

dataset request dhandle propertylist ?reload? ?modelist?

Request property data for a dataset when the dataset is not maintained locally, but is a partial shadow copy of a remotely managed dataset. It is assumed to have been only partially transferred via RPC to a slave from a master controller application, for example for display purposes, but without the full data content, which resides on the master.

If the requested property data is already present on the slave, and the reload flag is not set, this command is equivalent to a dataset need command and does not invoke communication with the master. Otherwise, the master is asked to provide the information, which may be calculated on the master only after receiving the request, or even delegated by the master to another remote server for computation.

Once the requested data has been received by the slave, it is added to the property data set of the local dataset copy. The optional modelist parameter is the same as in the dataset need command. This command is used to guarantee that critical or non-computable property data is obtained from the master. Local, unsynchronized data may still be computed by the slave using standard property data access commands. It is currently not possible to send data back to the master.

This command is only available on toolkit versions which have been compiled with RPC support.

In the absence of errors, the command returns a boolean status code. If it is zero, the request failed in a non-critical way. This for example happens in case the dataset is not under control of a remote application.

Example:

if {![dataset request $dhandle A_XY]} {

	dataset need $dhandle A_XY

}

is a bulletproof method of guaranteeing that correct atomic 2D display coordinates are present for the dataset structures even if the script is run in a master/slave context.

dataset rewind

dataset rewind dhandle

Reset the dataset iterator record. This is equivalent to setting the record attribute to one.

dataset scan

dataset scan dhandle expression ?mode? ?parameters?

Perform a query on the dataset or transient dataset. The syntax of the query expression is the same as that of the molfile scan command and explained in more detail in its section on query expressions. Essentially, this command behaves like an in-memory data file version of the molfile scan command. However, currently queries work on ensembles and reactions as dataset members only. Any table, network or other object which is a member of a scanned dataset is skipped. Skipped items still count as records for positioning and query result output. In the absence of a specified scan record list (order parameter), dataset scans begin at the current position of the iterator record attribute that is shared with the dataset read/hread commands.

The optional parameter dictionary is the same as for molfile scan , but not all parameters are actually used. At this time, only the matchcallback, maxhits, maxscan, order, progresscallback, progresscallbackfrequency, sscheckcallback, startposition and target parameters have an effect. If result ensembles or reactions are transferred to a remote dataset via the target parameter, they are not deleted from the local dataset but duplicates are created instead. This is because the original objects are members of the dataset which, just like a structure file would, should remain unchanged as result of a scan. In contrast, in file scans, the transferred ensembles and reactions were read from file and created as new objects during the scan, and sending these does not change the underlying file. In case a progress callback function is used, the dataset handle is passed as argument in place of the molfile handle in molfile scan .

The return value depends on the mode. The default mode is enslist . The following modes are supported for dataset queries:

array
The mode parameter is a list consisting of the mode selector array and a nested list of properties and pseudo-properties. Each property item can be a list of one to three elements. The first element is a property or pseudo-property, the second element a name, and the third element again a property or pseudo-property. If the second property item list element is omitted, the name is the same as the first element. If the third element is missing, it is assumed to be the pseudo-property record.

In this mode, the scan command returns a list of the names of the created arrays. For each name, a global Tcl array variable is created, and for each scan match, a Tcl array element with an element name equal to the value of the first item specification index and an element value equal to the value of the third item specification is created. For example, the specification

{array {E_NAME name2rec} {record rec2name E_NAME}}

results in the creation of two global Tcl arrays in the current interpreter, called name2rec and rec2name . The first has elements where the element name is the name of the matching structure (property E_NAME ), and the value the file record (the default, because the optional third specification parameter was omitted). The second array has elements where the record number is the array element name, and the corresponding value the structure name. The return value of the Tcl statement is the list “name2rec rec2name” , the names of the two variables created.

If array elements for a specific key already exist, the new value is appended as a list object. The result registration procedure does not overwrite the existing content. So, for example in the above case, if there are multiple records with the same structure name, the array element indexed by name would contain a list of records, not just a single record item. Since global arrays are persistent, data is also appended over multiple scan statements. If this is not desired, a statement like unset -nocomplain $arrayname should be executed before the scan is started. It is legal to use the same array name for the registration of multiple properties. In this case, each match appends a new list element for every reported property, though these lists will not be nested.

bitvector
Return a string-encoded bit vector (series of 0s and 1s) indicating the match status for every visited record.
count
Count the number of hits. The result value is an integer.
delete
Delete hits from the dataset. This is the only scan command which actually changes the dataset.
ens
Return the handle of the first matching ensemble. The query is stopped at that point. If no hits are found, an empty string is returned. If a local target dataset is specified, a found ensemble is removed from the scanned dataset.
enslist
Return the handles of all matching ensembles. If no hits are found, an empty list is the result. If a local target dataset is specified, the found ensembles are removed from the scanned dataset.
exists
A boolean check for the existence of a match. The same as count, except that the scan stops after the first match.
index
Return the positional index of the first matching dataset object. This is the same as the record mode value minus one.
indexlist
Return the positional indices of the matching dataset objects. This is the same as the recordlist mode values minus one.
molfile
The mode parameter is a list consisting of the mode selector molfile and a structure file handle, which must have been opened for writing, appending, or updating. The first matching structure is written to the file, and the command stops at that point. The output file handle attributes determine format, selection of data written, structure encoding conventions such as hydrogen status, etc. If no matching structure is found, nothing is written. In this mode, the return value of the command is the matching record number of the input file, just as in the record mode.
molfilelist
The mode parameter is a list consisting of the mode selector molfilelist and a structure file handle, which must have been opened for writing, appending, or updating. Matching structures are written to that file. The output file handle attributes determine format, selection of data written, structure encoding conventions such as hydrogen status, etc. If no matching structures are found, nothing is written. This mode is also implicitly selected if a structure file handle is directly provided as mode argument. In this mode, the return value of the command is a list of the matching record numbers of the input file, just as in the recordlist mode.
property
The mode parameter is a list consisting of the mode selector property and a sequence of properties and pseudo-properties. The selected properties for the first match are returned as a list, and the command stops at that point. If there are no hits, an empty string is returned.
propertylist
The mode parameter is a list consisting of the mode selector propertylist and a sequence of properties and pseudo-properties. The selected properties for all matches are returned as a nested list. If there are no hits, an empty string is returned. This mode is also selected if the mode argument is simply a list of property and pseudo property names without an identifiable mode keyword as first list element.
reaction
Return the handle of the first matching reaction. The query is stopped at that point. If no hits are found, an empty string is returned. If a local target dataset is specified, a found reaction is removed from the scanned dataset.
reactionlist
Return the handles of all matching reactions. If no hits are found, an empty list is the result. If a local target dataset is specified, the found reactions are removed from the scanned dataset.
record
Return the object sequence number of the first hit. Sequence numbers begin, for the sake of comparability with structure file scan record numbers, with one.
recordlist
Return object sequence numbers of all hits, or an empty list. Sequence numbers begin, for the sake of comparability with structure file scan record numbers, with one.
table
The mode parameter is a list consisting of the mode selector table and a sequence of properties and pseudo-properties. This scan mode returns a table handle. The table is automatically configured with properly typed columns corresponding to the requested properties. For each hit, a row is added. If there are no hits, a table handle is still returned, but the table does not have any rows. This retrieval mode is only available if the toolkit has been compiled with table support. The individual properties may also be specified each as a list consisting of the property name, and an arbitrary string. In that case, the string is used as the column name. By default, the column names are the same as the name of the property they store. Example:

{table {E_NAME name} {E_CAS casno} record}

sets up a table with three columns called name , casno and record . The first two columns contain property data from the matching file records, the last one the record in the file which matched.

Instead of the keyword table , an existing table handle may also be used. In that case, any existing matching table columns are automatically re-used to store result data. Additionally specified properties are added as new columns to the right of the previously existing columns. New table rows generated by matches are appended to the bottom of the table.
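
When the mapping between properties and column names is kept in a script variable, the mode list can be assembled programmatically. The following is a minimal plain-Tcl sketch. The helper name table_mode is made up for this illustration and is not part of the toolkit; it only builds the mode argument in the form shown above and does not call dataset scan itself.

```tcl
# Hypothetical helper: build a table mode list from property/column pairs.
# Additional arguments are appended as plain (pseudo) property names.
proc table_mode {renames args} {
    set mode [list table]
    foreach {prop column} $renames {
        # Each renamed property becomes a two-element sublist
        lappend mode [list $prop $column]
    }
    return [concat $mode $args]
}

set mode [table_mode {E_NAME name E_CAS casno} record]
puts $mode
```

The resulting list can then be passed as the final argument of a dataset scan command.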

tablecollection
Since all objects are already in memory, this mode is identical to the table scan mode for dataset scans. No table reference object duplicates are created. The result table always refers to the dataset objects directly.
vrecord
For dataset scans, this is the same as record .
vrecordlist
For dataset scans, this is the same as recordlist .

If requested property data is not present on the matched dataset objects, an attempt is made to compute it. If this fails, the table object in retrieval mode table contains NULL cells, and property retrieval as list data produces empty list elements, but no errors. For minor object properties, the property list retrieval modes produce lists of all object property values instead of a single value. In table mode, only the data for the first object is retrieved, which makes this mode less suitable for direct minor object property retrieval.

The following pseudo properties can be retrieved in addition to normal properties:

avgscore
The average value of all computed scores, such as Tanimoto, Cosine or Tversky similarity scores, in the matching query for this result.
conformerindex
The index of the matching conformer in case of 3D queries with multiple conformations, -1 if no matching conformer index was determined.
conformer
A list of the atomic coordinates of the matching conformer, if a 3D query was performed. If this is not the case, an empty vector is the result. The data type of this vector is coorvec (x,y,z-triples as vector elements).
filename
This property is only provided for compatibility with molfile scan . It is always an empty string in this command.
index
The object sequence index of the matching object. For datasets, this is the same as the record value minus one.
image
A structure GIF image (property E_GIF ) with highlighted matching substructure atoms and bonds. A normal E_GIF retrieval property would just show the structure, but without highlighting. The data type of this property is the same as that of E_GIF (depending on the configuration, a diskfile reference or an in-memory blob).
matchatoms
An integer vector holding the labels of all atoms matching the substructures used in evaluating the query expression. If no substructure was used for the match, this vector is empty. highlightatoms is an alias for this pseudo property.
matchbondatoms
The same as matchbonds , except that each element is a pair of the labels of the matching atoms in the bonds, not the bond label as a single number.
matchbonds
An integer vector holding the labels of all bonds matching the substructures used in evaluating the query expression. If no substructure was used for the match, this vector is empty. highlightbonds is an alias for this pseudo property.
matchcount
The first element of the matchcounts array, as described below. If the query does not contain any substructure match nodes, the result is empty.
matchcounts
An integer vector holding the number of distinct substructure matches for substructure query nodes in the query tree. For normal substructure expressions, this value can only be zero or one because the standard substructure match mode only checks for the presence of any match (match mode first ). Additionally, this value can be minus one if the node was never evaluated, for example because it is part of an or expression. Only if the count modifier is used together with the substructure query operator, or the substructure operator is the range operator, the possibility of multiple matches is evaluated and larger values can be obtained. For these operations the match mode is currently always distinctinneratoms (see match ss command).
maxscore
The maximum value of all computed scores, such as Cosine, Tanimoto or Tversky similarity scores, in the matching query for this result.
merit
For queries which use a merit/demerit rating scheme (for example, Bruns/Watson queries) this retrieves the accumulated merit/demerit sum of the top-level query node. The query needs to match for this retrieval to work, so in case none of the demerit rules match, you get an empty result, not a default zero merit/demerit value. Internally, there is no distinction between merit and demerit scores. The keyword demerit is an alias for this pseudo-property.
minscore
The minimum value of all computed scores, such as Cosine, Tanimoto or Tversky similarity scores, in the matching query for this result.
parent
The parent structure of the matching structure as a packed, base64-encoded serialized object string. If the dataset ensemble does not already contain it, it is computed from the structure as property E_PARENT_STRUCTURE .
productmatchatoms
The same as the matchatoms pseudo property, but for the ensemble on the right side of a matching reaction, not a simple structure. If no reaction was matched, this is an empty list.
productmatchbondatoms
The same as the matchbondatoms pseudo property, but for the ensemble on the right side of a matching reaction, not a simple structure. If no reaction was matched, this is an empty list.
productmatchbonds
The same as the matchbonds pseudo property, but for the ensemble on the right side of a matching reaction, not a simple structure. If no reaction was matched, this is an empty list.
queryid
The ID of the search tree query item which was responsible for the principal match. Every tree element of a query expression possesses an ID, starting with 1, and then assigned in incremental sequence from left to right in depth-first manner. For simple property or structure match expressions, the query ID is the ID of the matching branch, i.e. one for single-node expressions. For logical expressions with an or , orcontinue or not node, the overall reported query ID is that of the first matching leaf node. For expressions where all leaves need to be checked, the query ID is the ID of the and or eor node where all leaves matched, not the ID of any individual leaf node.
reagentmatchatoms
The same as the matchatoms pseudo property, but for the ensemble on the left side of a matching reaction, not a simple structure. If no reaction was matched, this is an empty list.
reagentmatchbondatoms
The same as the matchbondatoms pseudo property, but for the ensemble on the left side of a matching reaction, not a simple structure. If no reaction was matched, this is an empty list.
reagentmatchbonds
The same as the matchbonds pseudo property, but for the ensemble on the left side of a matching reaction, not a simple structure. If no reaction was matched, this is an empty list.
record
The record number. In the context of in-memory datasets, this is the dataset object list index of the matching object plus one. rc is an alias for this pseudo property. Use the index pseudo property to directly obtain the dataset index.
rgatoms(rg)
A list of the atom labels in a matching structure which were mapped to an expanded R-group atom in the query. The property index is the name of the R-group of interest defined in the substructure, usually something like R1. If there was no expanded R-group of that name, the result list is empty.
rgattachments(rg)
A nested list of the atom label pairs of the bonds in a matching structure which connect between the structure framework and the atoms expanded as the named R-group rg. If there was no expanded R-group of that name, the result list is empty.
score
The first element of the scores array, as described below. If the query does not contain any scoring expressions, the result is empty.
scores
An integer vector of the results of all query expression branches, in depth-first left-to-right order, which computed a score, such as structure similarity queries with Cosine, Tanimoto or Tversky bitvector comparisons. In case a branch was not evaluated when the match was determined, zero is returned.
structure
The dataset structure as a packed, base64-encoded serialized object string.
vrecord
For dataset scans, this is always the same as record.

These pseudo properties are identical to those available for structure file queries. However, structure file queries support a couple of additional pseudo properties which are not available for dataset queries.

Examples:

dataset scan $dhandle {E_WEIGHT < 200} recordlist

dataset scan $dhandle "structure >= c1ccccc1" {table E_NAME E_LOGP record}

dataset scan $dhandle "structure >~ $sshnd 90" {cmpvalue E_REACTION_ROLE X_IDENT}

The first example returns the record numbers (dataset member indices plus one) of all structures in the dataset which have a molecular weight of less than 200.

The second example generates a table with columns for name, logP and record number. The table is filled with data from all structures which contain a phenyl ring as substructure.

The final example returns a nested list of the properties of all dataset structures which have a Tanimoto similarity of 90% or more to the structure which is represented by its handle stored in the variable $sshnd . In this example, the ensembles are expected to be also part of a reaction, which is possible since reaction and dataset membership are completely unrelated. Each result list element contains the actual similarity value (which is the only comparison result value with a threshold evaluated in the query, so there is no ambiguity which comparison result cmpvalue refers to), the role of the ensemble in the reaction ( reagent , product , catalyst , etc.) from property E_REACTION_ROLE , and the reaction ID in X_IDENT . The scan mode is here automatically set to propertylist , because the mode list consists exclusively of names of properties and pseudo properties.

Another example:

set is_chno [dataset scan $ehandle {formula = C0-H0-N0-O0-} count]

This command checks whether the ensemble (which is, for the duration of the command, embedded into a transient dataset) contains only elements C, H, N and O.
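
Results of the propertylist mode are plain nested Tcl lists and can be unpacked with standard list commands. The sketch below assumes a valid dataset handle is passed in; the procedure name light_compounds is hypothetical. It relies only on the scan syntax and on the record and index pseudo properties described above.

```tcl
# Hypothetical helper: report record number, dataset index and name of all
# dataset members with a molecular weight below the given limit.
proc light_compounds {dhandle limit} {
    set report {}
    # Each hit is a sublist in the order of the requested (pseudo) properties
    set hits [dataset scan $dhandle [list E_WEIGHT < $limit] \
        {propertylist E_NAME record index}]
    foreach hit $hits {
        lassign $hit name record index
        lappend report "record $record (index $index): $name"
    }
    return $report
}
```

A call such as light_compounds $dhandle 200 returns one formatted line per hit, or an empty list if nothing matched.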

dataset set

dataset set dhandle property value ?property value?...

Standard data manipulation command. It is explained in more detail in the section about setting property data.

In addition to property data, the dataset object possesses a few attributes, which can be retrieved with the get command (but not its related sister subcommands like dget , sqlget , etc.). Many of them are also modifiable via dataset set. These attributes are:

accept
A bit set indicating the object classes the dataset accepts as members. Currently, this can be any combination of ens , reaction , table , network and dataset . The default acceptance mask is the union of ens , reaction and table , which excludes datasets and networks as dataset members. If an attempt is made to add an unacceptable object to a dataset, the command (such as ens move , dataset add , etc.) throws an error. If the object added to a dataset is a dataset, but the dataset does not accept datasets as members, the objects contained in the source dataset are added instead.
affiliation
The institution the author works for.
author
The author of the dataset, as free-form string data.
authorurl
A URL with information on the author of the dataset, or an empty string if unset.
batchsize
The number of objects in the dataset which form a batch. This can for example be used in the dataset read command. The default batch size is 10.
category
A category string to be used if the dataset is stored in a repository.
classuuid
The base class UUID of this dataset object, as associated with its authorship attributes.
coords
If the toolkit was compiled with factory support, these are the coordinates of the object icon on its workbench, encoded as integer pair. This attribute can be changed.
counter
An integer counter which is automatically incremented every time an object is moved into the dataset, but not when the object only changes its position within the dataset. It can also be reset to an arbitrary value, and later dataset additions increment the counter from that user-specified value. It is not decremented when objects leave the dataset, so this attribute is not necessarily the same as the dataset size. The initial counter value at dataset object creation time is zero. Depending on its mode, this attribute may interact with the insertcontrol attribute.
datasetcount
A read-only attribute reporting the number of dataset objects currently contained in the dataset.
date
The date the dataset was defined.
deletable
A boolean flag indicating whether the dataset can be deleted at this time or not. This is a read-only attribute. Under certain circumstances, such as a pending dataset wait command, or the use of the dataset object as argument to a scripted computation function expecting to be able to set function result data as property values, the dataset is marked as undeletable and any destruction command will silently fail.
deselection
The inverse of the selection attribute, i.e. get all unselected object indices, or set the selection by providing a list of object indices which are not selected.
doi
A digital object identifier for the dataset object content, if defined.
email
A contact email of the author of the dataset.
enscount
A read-only attribute reporting the number of ensemble objects currently contained in the dataset.
eod
The value of the end-of-data marker. This attribute is typically used in multi-threaded applications to indicate that feeder threads have exhausted their data supplies and that no further dataset objects are expected to arrive in the dataset. This attribute is internally used by the dataset pop and dataset wait commands to determine whether they should continue to wait or exit with an empty result. The initial value of this attribute is zero.
eodcheck
Perform a check whether at least one object is in the dataset, or is expected to arrive later. If objects are currently in the dataset, or the eod attribute value is less than the targeteod attribute value, the command returns zero, otherwise one. This check is not reliable for remote datasets.
failures
A list of properties for which computation failed on this dataset object. This is a read-only attribute. Depending on configuration settings, this information may be used to block pointless attempts at re-computation of incomputable data.
footer
If the toolkit was compiled with factory support, this is the footer of the object icon on a workbench. This attribute can be changed.
gflags
If the toolkit was compiled with factory support, this is the currently set object icon rendering flag collection.
header
If the toolkit was compiled with factory support, this is the header of the object icon on a workbench. This attribute can be changed.
hidden
Flag indicating whether the dataset is hidden. This is not the same as the invisible state. This attribute is intended to be used for rendering selections. This attribute can be changed.
highwatermark
An integer specifying a high water mark object count on the dataset. Some commands use this attribute for automatic start or cancellation of operations until the object count has decreased to the low water mark, or for automatic start of processing services until the low watermark has been reached again. The default high watermark value is one. The dataset wait command uses this threshold as default command parameter.
invisible
Flag indicating whether the dataset is invisible. This is not the same as the hidden state. An invisible object is no longer accessible via its handle. This is usually the case for objects which are scheduled for deletion, but still have lingering referring pointers. This attribute is read-only.
insertcontrol
This parameter controls what happens when an attempt is made to add another object into a dataset. The default mode is add , which means that the object is inserted in the dataset if there is no size control active, or if room can be made by waiting for already inserted objects to be removed (see sizecontrol parameter). Otherwise, an error results in that mode.

Additional insertion control modes are disabled (all insertions into the dataset are blocked), discardfirst (if the maximum size has been reached, delete first object in dataset to make room), discardlast (if the maximum size has been reached, delete last object in dataset to make room), discardobject (if the maximum size has been reached, delete the object to be inserted), discardalways (never attempt an actual insertion, always delete the insertion object), ignore (if insertion cannot be performed, leave the insertion object where it currently is, with preservation of current dataset membership) and unlink (silently remove the insertion object from its old dataset, if it is a member of one, but do not insert it into the target dataset if that would exceed its maximum size).

If the object cannot be inserted and is deleted (but not if it is just unlinked or ignored, and thus continuing to exist) the dataset counter is still incremented.

The final mode is discardrandom . In this mode, if the maximum size of the dataset has not yet been reached, the object is simply added. Otherwise, a random number between one and the counter attribute of the dataset is computed. If the number is larger than the maximum dataset size, the object to be inserted is deleted, as in the discardobject mode. If the random number is between one and the dataset size, the object in the dataset at the random position is deleted. After that, the new object is inserted at its designated position, which is not necessarily the position of the removed object. This mode is intended to support convenient sampling of object subsets. The random procedure yields the same mathematical results as directly picking random objects from the total object pool passing through the dataset, but may be interrupted at any time, yielding a random subset of the objects processed so far.
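
The replacement rule of the discardrandom mode resembles classic reservoir sampling. The following plain-Tcl model uses an ordinary list in place of a dataset. It is a sketch of the described policy, not the toolkit implementation: the procedure name is made up, the new object is simply appended instead of being inserted at a designated position, and it is assumed that the counter already includes the object being inserted when the random number is drawn.

```tcl
# Model of the discardrandom policy on a plain Tcl list.
proc discardrandom_insert {dsVar counterVar maxsize obj} {
    upvar 1 $dsVar ds $counterVar counter
    incr counter                       ;# assumption: counter includes the new object
    if {[llength $ds] < $maxsize} {
        lappend ds $obj                ;# room left: plain insertion
        return added
    }
    # Random integer in the range 1..counter
    set r [expr {int(rand() * $counter) + 1}]
    if {$r > $maxsize} {
        return discarded               ;# the new object is dropped
    }
    # Delete the member at the random position, then add the new object
    set pos [expr {$r - 1}]
    set ds [lreplace $ds $pos $pos]
    lappend ds $obj
    return replaced
}

expr {srand(42)}
set sample {}
set counter 0
for {set i 1} {$i <= 1000} {incr i} {
    discardrandom_insert sample counter 10 obj$i
}
puts "kept [llength $sample] of $counter objects: $sample"
```

The loop can be stopped after any number of objects, and the list then holds a random subset of the objects seen so far.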

instanceuuid
The instance UUID of this dataset, as associated with its authorship attributes.
infourl
A URL with information on the dataset object content, or an empty string if unset.
keywords
A list of keywords associated with the dataset object.
license
The license class associated with this dataset object. Setting the license to a standard type updates the associated URL with a standard location.
licenseurl
A URL with details about the dataset object license.
literature
A free-form literature reference for the dataset.
lowwatermark
An integer specifying a low water mark object count on the dataset. Some commands use this attribute for automatic scheduling or termination of actions. The default low watermark is zero.
maxsize
The maximum number of objects the dataset will accept. If it is set to a negative value, which is the default, the maximum number of objects is unlimited. The effects of an attempt to overload the dataset depend on the settings of the sizecontrol attribute of the dataset.
modcount
The modification count on the dataset object. This is a read-only attribute.
mutexcount
The mutex lock count as a read-only value. This is mostly of interest to developers.
name
A free-form dataset name as string.
networkcount
A read-only attribute reporting the number of network objects currently contained in the dataset.
orcid
The ORCID code of the author (see www.orcid.org).
pagefile
The handle of a molfile object. If this is set, the current contents of the dataset are deleted, the pageoffset attribute set to the current input position of the file, and a number of records up to the current value of the pagesize attribute are read into the dataset. If this attribute is set to an empty string, the connection between the dataset and the structure file is abolished.
pageoffset
The file record offset of the first object in the dataset, if the dataset is linked to a file. If this value is changed, and a link is active, dataset objects with file records outside the offset/pagesize window are deleted from the end or beginning, and new objects are added from the backing file as required.
pagesize
The number of records to keep in the dataset in case it is linked to a file. If this value is changed, and a link is active, dataset members are deleted from the dataset, or added from the backing file as necessary.
parent
Get the handle of the parent object, if the dataset is an embedded object, e.g. an integral component of a table, factory or station object. If the dataset is a standalone object, an empty string is returned. The parent attribute is not the same as dataset membership (see dataset dataset command), which can be changed (see dataset move command and the accept dataset attribute). This attribute is read-only. An embedded dataset object cannot be dissociated from its owner.
passphrase
A string which needs to be presented by remote interpreters if they connect to the listener port of the dataset object. An empty string is equivalent to no pass phrase.
path
The repository path for displaying hierarchical repository trees. This attribute is independent of any file system paths.
port
An integer port number at which a listener thread waits for connections from remote interpreters for the addition or removal of objects. If this attribute is set to an empty string, an existing listener thread is terminated and remote connections are no longer accepted.
pyobject
If the toolkit was compiled with Python support, this attribute reports the memory address of the Python wrapper class instance, if it exists. This attribute is read-only.
pyrefcount
If the toolkit was compiled with Python support, this attribute reports the reference count of the Python wrapper class instance, if it exists. This attribute is read-only.
reactioncount
A read-only attribute reporting the number of reaction objects currently contained in the dataset.
record
The current iterator record position. The first object in the dataset corresponds to record 1.
refcount
If the Tcl interpreter is using native Cactvs objects instead of string-based major object handles and integer-based minor object labels to identify toolkit objects, this returns the number of Tcl object references active for this dataset. This attribute is read-only.
references
Cross references of the dataset. This is a nested list of class UUIDs and reference type tags.
regid
For registered datasets, the registration ID. Zero if this is a private dataset.
room
A read-only integer attribute which indicates whether the dataset has room for the insertion of another object. Datasets without size control always return 1, as do datasets which still have room for more objects. Return value 0 indicates that the maximum size has been reached, and no alternative action has been configured. Other possible special return values are -1 (insertion succeeds, but delete the inserted object), -2 (insertion will silently fail, the object remains in its old dataset membership), -3 (the object will be unlinked from any existing dataset, but silently not inserted into the new dataset) and -4 (the object will not be inserted in the target dataset, instead an application-specific alternative action will be taken). This attribute only checks the capacity of the dataset, not whether it will reject the object because it is of an unsuitable class (see accept attribute). In multi-threaded applications, the status value may become outdated before an insert command on the target dataset can be executed.
selected
Flag indicating whether the dataset object is selected. This attribute can be changed. This attribute works on the dataset object proper, not its content - see the selection attribute below.
selection
Upon retrieval, this attribute is a list of the position indices of all objects in the dataset which have the selected status flag. The index begins with zero, and the result is an empty list if there are no selected objects.

On setting, dataset set first clears all dataset object selections. The command dataset append retains them. The argument is then parsed as a list of integer object indices, and the selection flag is set for all those indices where objects can be found in the dataset. Indices outside the range between zero and the dataset size minus one, as well as duplicate index specifications, are silently ignored.

To check or set the selection status of the dataset object proper, use the selected attribute.
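
The index handling when setting the selection, and its relation to the deselection attribute, can be modeled in plain Tcl. This is only an illustrative sketch: the procedure names are made up, the dataset is reduced to its size, and it is assumed that indices are reported in ascending order.

```tcl
# Model of "dataset set $dhandle selection $indices": start from an empty
# selection and return the indices which end up selected.
proc model_set_selection {size indices} {
    set selected {}
    foreach i $indices {
        # Out-of-range and duplicate indices are silently ignored
        if {$i < 0 || $i >= $size} continue
        if {$i in $selected} continue
        lappend selected $i
    }
    return [lsort -integer $selected]
}

# Model of the deselection attribute: all indices which are not selected.
proc model_deselection {size selected} {
    set result {}
    for {set i 0} {$i < $size} {incr i} {
        if {$i ni $selected} { lappend result $i }
    }
    return $result
}

set sel [model_set_selection 5 {3 1 7 1 -2}]
set unsel [model_deselection 5 $sel]
puts "selected: $sel, unselected: $unsel"
```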

size
Get the number of objects in the dataset. This is a read-only attribute. It is equivalent to the dataset count command without any filters.
sizecontrol
This attribute operates in tandem with the maxsize attribute. It can be set to auto , none , error or block . pause and wait are aliases for block . The default setting is auto . In error mode, any attempt to add an object to a dataset which has already reached its maximum size raises an error. In block mode, the interpreter halts until the object count has decreased below the maximum size and then continues to move the object into the dataset. This mode is useful when the script is multi-threaded or the dataset operates a listener port for remote commands, because the number of objects in the dataset can change by these methods without involving the paused interpreter. The none mode disables the maximum size monitoring. Finally, the auto mode behaves like the error mode if there is only a single interpreter thread and the dataset does not listen for remote commands, and like the block mode if the script is multi-threaded or the dataset listens for remote commands.
swapthreshold
The maximum size of a dataset before ensemble and reaction objects in it are automatically swapped to disk, as they are by the explicit commands ens swapout or reaction swapout . The size check is performed at the moment new objects are added, and these new objects are the first to be swapped. The default value for this attribute can be set in the control array element ::cactvs(dataset_swap_threshold). Its initial value is 10000. The default value for the embedded datasets in tables is controlled separately by ::cactvs(table_swap_threshold), which is also initially set to 10000.

If this value is set to a negative value, all dataset elements which are currently swapped out are loaded back in. If it is set to a positive value, and the number of not currently swapped out objects of the dataset is more than the new limit, excess objects are swapped beginning from the end of the dataset queue until the in-memory object count of the dataset satisfies the new constraint. If the limit is increased, but not set to a negative unlimited value, the object swap status is not modified.

tablecount
A read-only attribute reporting the number of table objects currently contained in the dataset.
targeteod
The target value of the eod attribute. Once it matches or exceeds this value, the dataset is not expected to receive any more items. The initial value of this attribute is one.
threadcount
A read-only attribute returning the number of Tcl interpreter threads associated with the dataset. Normal datasets have no associated threads and return zero. This value is equal to the length of the list returned by the threads attribute, and the threads included in this count are the same.
threads
A read-only attribute returning the Tcl interpreter thread handles of the threads associated with the dataset (see dataset addthread command). Datasets without threads return an empty list. The handles are compatible with the standard Tcl thread package. Remote communication listener threads (see port attribute) are independent of Tcl support, do not have a Tcl handle, and are not listed by this command.
timeout
A timeout in seconds to use with the dataset wait command. A negative value means an infinite wait period, and zero no wait period. The default setting is minus one.
tooltip
If the toolkit was compiled with factory support, this is the tooltip of the object icon on a workbench. This attribute can be changed.
uuid
The automatically generated object instance UUID. This ID is independent of the UUID triple (class/instance/version) associated with the authorship attributes and intended for public dissemination. This attribute is read-only, unique for every dataset object (even duplicates), and independent of its contents or pedigree.
version
A free-form version number of the dataset.
versionuuid
The version UUID associated with this dataset object as per its authorship attributes.
x
If the toolkit was compiled with factory support, this is the x coordinate of the object icon on its workbench. This attribute can be changed.
y
If the toolkit was compiled with factory support, this is the y coordinate of the object icon on its workbench. This attribute can be changed.
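
Among the attributes listed above, eodcheck combines the current object count with the eod and targeteod values. Its documented rule can be written down as a small plain-Tcl model. The procedure name is made up for this sketch, and the three arguments stand in for the dataset size and the two attribute values.

```tcl
# Model of the eodcheck rule: 0 means objects are present or may still
# arrive, 1 means the dataset is empty and no more data is expected.
proc model_eodcheck {size eod targeteod} {
    if {$size > 0 || $eod < $targeteod} {
        return 0
    }
    return 1
}

# Freshly created, empty dataset: eod is 0 and targeteod is 1
puts [model_eodcheck 0 0 1]
# Empty dataset after the feeder has raised eod to the target value
puts [model_eodcheck 0 1 1]
```

With the initial attribute values, a single feeder thread therefore only needs to raise eod to one to signal that its data supply is exhausted.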

Examples:

dataset set $dhandle D_NAME "New lead structures"

dataset set $dhandle E_NAME "Lead (metal)"

The first line is a simple set operation for a dataset property. The second line shows how to set properties of multiple ensembles in one step. The same property value is assigned to all ensembles.

dataset set $dhandle port 10001 passphrase blockbuster

Set up a listener thread on port 10001 which accepts connections from remote interpreters which need to present the pass phrase as credential. Remote interpreters can add (ens move , reaction move , table move ) or remove (dataset pop ) objects to or from this dataset, as well as query the dataset object count (dataset count ). Objects are transferred over the network connection as serialized objects to and from the remote interpreters.
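
Opening a listener port also affects the auto setting of the sizecontrol attribute, because a listening dataset can be emptied by remote interpreters while the local interpreter waits. The decision rule documented for sizecontrol can be modeled in plain Tcl as follows. The procedure name and its boolean arguments are illustrative only; the toolkit determines these conditions internally.

```tcl
# Model: which behavior applies when a full dataset receives another object?
proc effective_sizecontrol {mode multithreaded listening} {
    switch -- $mode {
        pause - wait - block { return block }
        none - error { return $mode }
        auto {
            # auto blocks only if somebody else could remove objects
            if {$multithreaded || $listening} { return block }
            return error
        }
        default { error "unknown sizecontrol mode \"$mode\"" }
    }
}

puts [effective_sizecontrol auto 0 0]
puts [effective_sizecontrol auto 0 1]
```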

dataset setparam

dataset setparam dhandle property key value ?key value?...

Set or update a property computation parameter in the parameter list of a valid property. This command is described in the section about retrieving property data.

Example:

dataset setparam $dhandle D_GIF comment "Top Secret"

dataset show

dataset show dhandle propertylist ?filterset? ?parameterlist?

Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.

For examples, see the dataset get command. The difference between dataset get and dataset show is that the latter does not attempt computation of property data, but raises an error if the data is not present and valid. For data already present, dataset get and dataset show are equivalent.
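
Because dataset show raises an error for missing data, scripts which only want to peek at existing values usually wrap the call in catch. The following is a minimal sketch; the procedure name show_or_default is hypothetical and not part of the toolkit.

```tcl
# Hypothetical helper: return the stored property value without triggering
# a computation, or a fallback value if the data is not present and valid.
proc show_or_default {dhandle property default} {
    if {[catch {dataset show $dhandle $property} value]} {
        return $default
    }
    return $value
}
```

A call such as show_or_default $dhandle D_NAME "(unnamed)" then yields either the stored name or the fallback string.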

dataset sort

dataset sort dhandle {property ?direction? ?cmpflags?}...

Sort a dataset according to property values of the objects in the dataset. If no sort property set is specified, the default sort properties are E_NATOMS (number of atoms) and, for breaking ties, E_WEIGHT (molecular weight) and finally E_HASHISY (stereo isotope hash code).

Every sort item is interpreted as a nested list and can have from one to three elements. The first, mandatory element is the sort property, or one of the magic names record or random . The next optional element is the sort direction, specified as up (or ascending ) or down ( descending ). The default sorting order is ascending. The final optional comparison flags parameter can be set to a combination of any of the values allowed with the prop compare command. The default is an empty flag set. Properties in the sort list have precedence in the order they are specified in. Object property values of comparison list entries to the right in this list are only considered if the comparison of all data values of list elements to the left results in a tie.

The magic property name record sorts by the object index in the dataset. Sorting upwards on this property does not change the object sequence in the dataset, and sorting downwards reverses it. This pseudo property is always added as a final implicit criterion, so that the sequence order of objects tied in all explicit comparisons is preserved. The other magic property name random assigns a random value to all dataset objects and sorts on this value, yielding a random object sequence.

The command returns a list of the handles of the objects controlled by the dataset in the newly sorted order. Simultaneously, the objects are physically moved within the dataset, so the sort has a persistent effect. The same result list may later be obtained by a dataset objects command.

It is possible to sort transient datasets, but this only makes sense if the object list returned as the command result is captured and used later. Since no permanent dataset object exists, the sort has no persistent effect.

Examples:

dataset sort $dhandle {E_NAME up {ignorecase lazy}}

The example sorts the dataset according to the compound name (property E_NAME, data type string) in alphabetic order, using a lazy (ignoring whitespace and punctuation) and case-insensitive comparison mode.

dataset sort $dhandle {E_NATOMS down} {E_NRINGS up}

Sort the dataset in such a way that the ensembles with the largest number of atoms, and among these those with the smallest number of rings, come first.

dataset sort $dhandle random

This command randomizes the object order in the dataset.

dataset sort $dhandle {*}$sortlist

This is the recommended construct when using a sort property list stored in a Tcl variable as command argument. Older versions of the dataset sort command used a single sort argument parameter instead of a variable-size argument set.
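The precedence rule for multiple sort properties can be illustrated in plain Tcl, without the toolkit. The following is only a sketch of the ordering semantics described above, not the implementation used by dataset sort. The record lists and the procedure name model_sort are hypothetical stand-ins for ensembles and their E_NATOMS and E_NRINGS values.

```tcl
# Hypothetical records: {name natoms nrings}; plain lists stand in for ensembles.
set records {
    {a 12 1}
    {b 12 0}
    {c 6 0}
    {d 12 1}
}

# Model of: dataset sort $dhandle {E_NATOMS down} {E_NRINGS up}
# lsort is stable, so sorting on the lowest-priority key first and on the
# highest-priority key last gives the leftmost sort item precedence.
proc model_sort {records} {
    set step [lsort -integer -increasing -index 2 $records]
    return [lsort -integer -decreasing -index 1 $step]
}

set names {}
foreach rec [model_sort $records] {
    lappend names [lindex $rec 0]
}
puts $names   ;# prints: b a d c
```

Records a and d tie on both keys and keep their original relative order, which mirrors the implicit final record criterion.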

dataset sqldget

dataset sqldget dhandle propertylist ?filterset? ?parameterlist?

Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.

For examples, see the dataset get command. The differences between dataset get and dataset sqldget are that the latter does not attempt computation of property data, but initializes the property value to the default and returns that default if the data is not present and valid, and that the SQL command variant formats the data as SQL values rather than for Tcl script processing.

dataset sqlget

dataset sqlget dhandle propertylist ?filterset? ?parameterlist?

Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.

For examples, see the dataset get command. The difference between dataset get and dataset sqlget is that the SQL command variant formats the data as SQL values rather than for Tcl script processing.

dataset sqlnew

dataset sqlnew dhandle propertylist ?filterset? ?parameterlist?

Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.

For examples, see the dataset get command. The differences between dataset get and dataset sqlnew are that the latter forces re-computation of the property data, and that the SQL command variant formats the data as SQL values rather than for Tcl script processing.

dataset sqlshow

dataset sqlshow dhandle propertylist ?filterset? ?parameterlist?

Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.

For examples, see the dataset get command. The differences between dataset get and dataset sqlshow are that the latter does not attempt computation of property data, but raises an error if the data is not present and valid, and that the SQL command variant formats the data as SQL values rather than for Tcl script processing.

dataset statistics

dataset statistics dhandle property

Get basic statistics on the property values of the objects in the dataset. The property can be a basic property or a property subfield, but its element data type needs to be cast-able to a simple numeric type. In addition, it must be directly attached to any of the objects which can be members of a dataset, e.g. an ensemble property, but not an atom property.

If the property data is not present on any of the objects, an attempt is made to compute it. In case that fails, or a dataset member object is not of a matching type, these objects are silently skipped.

The return value is a list containing, in this order, the number of objects in the dataset which were used for the statistics, the property value sum, the property value average and the property data standard deviation. The latter three values are floating point, regardless of the property data type. If any of these values is not computable, for example because there was an insufficient number of objects, the reported value is zero.

The command verb can be abbreviated as stats.

Example:

lassign [dataset statistics $dh E_WEIGHT] n sum avg stddev
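Because objects without usable data are silently skipped, the first element of the result can be smaller than the dataset size, or even zero. A minimal sketch of a helper that takes this into account is shown below. The procedure name describe_stats and the sample numbers are hypothetical. The helper works on any four-element list in the documented order, so it runs in a plain Tcl shell.

```tcl
# stats: list {n sum avg stddev} in the order documented for dataset statistics
# total: total number of objects in the dataset
proc describe_stats {stats total} {
    lassign $stats n sum avg stddev
    if {$n == 0} {
        # All values are reported as zero in this case, so do not format them.
        return "no objects contributed to the statistics"
    }
    set skipped [expr {$total - $n}]
    return [format "n=%d (skipped %d) sum=%.2f avg=%.2f stddev=%.2f" \
        $n $skipped $sum $avg $stddev]
}

# Made-up sample values, for illustration only.
puts [describe_stats {3 54.0 18.0 2.0} 4]
```

Within the toolkit, the helper could be called as describe_stats [dataset statistics $dh E_WEIGHT] [dataset count $dh].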

dataset subcommands

dataset subcommands

Lists all subcommands of the dataset command. Note that this command does not require a dataset handle.

dataset tables

dataset tables dhandle ?filterset? ?filtermode? ?recursive?

Return a list of all the tables in the dataset. Other objects in the dataset (ensembles, reactions, datasets, networks) are ignored. The object list may optionally be filtered by the filter list, and the result further modified by a standard filter mode argument.

If the recursive flag is set, and the dataset contains other datasets as objects, tables in these nested datasets are also listed.

Example:

set n [dataset tables $dhandle {} count]

dataset taint

dataset taint dhandle propertylist/changeset ?purge?

Trigger a property data tainting event which acts on the dataset data, and all objects and their data contained in the dataset.

The parameters of this command are the same as for ens taint and explained there.

Example:

dataset taint $dhandle A_XYZ

All property data on the dataset and the dataset members is invalidated if it directly or indirectly depends on the 3D atomic coordinates.

dataset transform

dataset transform dhandle SMIRKSlist ?direction? ?reactionmode?

	?selectionmode? ?flags? ?overlapmode? ?{?exclusionmode? excludesslist}? ?maxstructures? ?timeout? ?maxtransforms? ?niterations?

This command is complex, but very similar to the ens transform command. Please refer to that command for a full description of the command arguments. The major difference is that the start structure set is not a single ensemble, but rather the set of all ensembles in the dataset. Any dataset items which are not ensembles are ignored. The return value is, just as with the ens transform command, a list of result ensembles. These do not become part of the input dataset.

Example:

dataset transform [ens get $ehandle E_KEKULESET] $trafolist bidirectional \
	multistep all {preservecharges checkaro setname}

This command first expands an ensemble object into a set of Kekulé structures. The property data type of the E_KEKULESET property is a dataset, so its handle is returned, and this dataset is then submitted for further transformation, which in this case involves manipulations of bonds in aromatic systems and thus is dependent on the Kekulé structures of the input ensembles.

The dataset variant of the transform command does not allow the use of marked or unmarked atom or bond specifications in the exclusion substructure list. Normal substructures are supported, and are applied to all start structures.

dataset unique

dataset unique dhandle {property ?direction? ?cmpflags?}...

This command removes duplicate objects from the dataset and destroys them. Object identity is determined by pair-wise comparison of one or more properties. If all these properties are identical for any two objects, one of them is deleted. If no properties are specified, the default is the single property E_HASHISY, the standard isotope- and stereo-aware ensemble hash code.

The command returns the ordered list of objects remaining in the dataset after deletion. The command is closely related to the dataset sort command, and the same restrictions on usable sort properties apply. Internally, the command performs a sort first, in order to avoid a quadratic growth of pair-wise comparisons. This has the side effect that the object order in the dataset is not preserved. Instead, the surviving objects are listed in ascending (by default) or descending (if the corresponding optional sort direction argument is set accordingly) values of the sort properties. The interpretation of the optional comparison flags and sort direction arguments, as well as the priority of the properties, and the special considerations when working on transient datasets, are the same as for the command dataset sort.

Example:

molfile read $fh $dh all

dataset unique $dh

This command sequence first reads a complete file into a dataset, and then discards duplicates, using the default isotope- and stereo-aware structure hash code.
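The sort-then-compare strategy, and its effect on the object order, can be sketched in plain Tcl. This is a simplified model and not the toolkit implementation. The handle and hash code pairs and the procedure name unique_by_key are hypothetical. Keeping the first object of each group of duplicates is an assumption, since the description above does not specify which duplicate survives.

```tcl
# Hypothetical records: {handle hashcode}
set objects {
    {ens0 c3}
    {ens1 a1}
    {ens2 c3}
    {ens3 b2}
    {ens4 a1}
}

proc unique_by_key {objects} {
    # After sorting, duplicates are adjacent, so a single pass comparing each
    # record with its predecessor replaces the quadratic pair-wise comparison.
    set sorted [lsort -index 1 $objects]
    set result {}
    set have_prev 0
    foreach obj $sorted {
        set key [lindex $obj 1]
        if {!$have_prev || $key ne $prev} {
            lappend result [lindex $obj 0]
        }
        set prev $key
        set have_prev 1
    }
    return $result
}

puts [unique_by_key $objects]   ;# prints: ens1 ens3 ens0
```

The survivors appear in ascending order of the sort key rather than in their original sequence, which corresponds to the side effect noted above.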

dataset unlock

dataset unlock dhandle propertylist/dataset/all

Unlock property data for the dataset object, meaning that they are again under the control of the standard data consistency manager.

The property data to unlock can be selected by providing a list of the following identifiers:

Property names
Valid property instances on the dataset object are unlocked. Non-existent data is silently ignored. It is not possible to unlock individual property fields.
all
All valid dataset object properties are unlocked.
dataset
This is an object class identifier. All property data which is controlled by the dataset major object and attached to the specified object class is unlocked. Since datasets do not incorporate minor objects, this identifier is equivalent to all.

Property data locks are obtained by the dataset lock command.

This command does not recurse into the objects contained in the dataset.

The return value is the dataset handle, or, if the argument was a transient dataset, an empty string.

dataset unpack

dataset unpack string

Generate a dataset complete with all elements it contains from a packed, base64-encoded serialized object string, as it is generated by the complementary dataset pack command.

The return value is the handle of the new dataset. All objects in this dataset are also assigned standard handles, which can be retrieved with the usual commands such as dataset ens and dataset reactions.

Note that this command does not take a dataset handle as argument, but a pack string.

Example:

dataset unpack [dataset pack $dhandle]

This example is effectively the same as a dataset dup operation, but of course less efficient, because the objects have to be serialized, compressed, and base64-encoded and the same sequence of operations run backward again.

dataset valid

dataset valid dhandle propertylist

Returns a list of boolean values indicating whether values for the named properties are currently set for the dataset. No attempt at computation is made.

Example:

dataset valid $dhandle D_NAME

reports whether the dataset is named (has a valid D_NAME property) or not.

dataset wait

dataset wait dhandle ?size|query? ?script?

Suspend the interpreter until the number of objects in the dataset has reached a threshold, or an object which satisfies a query expression can be found. The syntax of query expressions is the same as in the dataset scan command. If neither an explicit size nor a query expression is specified, or an empty string is passed as this parameter, the command uses the value of the highwatermark dataset attribute as default value for an implicit size threshold condition.

Another dataset attribute which has an influence on the execution of the command is the timeout attribute. If the dataset has not grown to the required size, or no object which satisfies the query expression was added to the dataset, after waiting for the timeout number of seconds, an error is raised. By default, the maximum wait period is indefinite, which corresponds to a negative timeout value. If the timeout value is set to zero, the wait condition must be met immediately, or an error results. However, no error is raised if the eod/targeteod dataset parameter pair indicates that no more data can be expected to be added to the dataset. In that case, the result is an empty string.

If no script body parameter is used, the return value of the command is the number of objects the dataset holds in case of an explicit or implicit size condition, or the handle of the first matching object in case of a query expression.

If the object count already exceeds the threshold, or a matching object can be found at the moment the command is executed, the command returns immediately.

In the presence of a script parameter, the script body is executed whenever the wait condition is met. If the script is ended with a continue statement, or simply reaches the end of the code block, the wait loop is automatically restarted. If the script reports an error, or is left via a break or return statement, the loop is terminated.

This command is mostly useful when running multi-threaded scripts, or when the dataset is operating a remote command listener on a port. Under these circumstances, new objects may arrive in the dataset without participation of the local, stopped interpreter.

While a dataset wait command is pending, the dataset cannot be deleted. Since it is possible that other threads or remote action port monitors further update the dataset between the time the wait condition is met and script processing commences, action scripts should be prepared to see more or fewer items in the dataset than immediately after the trigger event.

Example:

loop n 1 $nrecs {

	set eh [dataset wait $dh "E_FILE(startrec) = $n"]

	molfile write $fh $eh

	ens delete $eh

}

This is part of a simple write thread which writes back processed ensembles in the same order as they were read from an input file. If there are multiple processing threads, the computation on an ensemble read from a later input file record may well finish before that on an ensemble with a smaller record number, so the ensembles arrive in the output queue out of sequence. Waiting for the ensembles in input record sequence preserves the original order. More robust versions of such a script should handle the case of ensembles from a specific input record never appearing in the dataset, and similar sources of disruption.

dataset weed

dataset weed dhandle keywords

This command performs standard clean-up operations on all ensembles and reactions in the dataset. The supported operations are described in more detail in the section on the equivalent ens weed command.

The return value of this command is the dataset handle.

dataset xlabel

dataset xlabel dhandle propertylist ?filterset? ?filterprocs?

This command is rather complex and closely related to the dataset extract command. Its purpose is to extract handle and label information for selected subsets of the dataset. The return value is a nested list. The sublists consist of the object handle, the object label (if the object does not have a label, 1 is substituted), and the dataset object index. The dataset object index starts with zero.

The selection of the class of objects which are extracted is performed indirectly via the property list. For practical purposes, this list should be a single property. Its object association type determines the class of objects selected. For example, A_LABEL or A_SYMBOL returns atom labels, while B_ORDER returns bond labels and E_NAME selects complete ensembles (with 1 as pseudo ensemble label).

The returned objects can further be filtered by a standard filter set, and additionally by a list of callback procedures. These Tcl script procedures are called with the respective object handles and object labels as arguments. For example, a callback procedure used in an atom retrieval context would be called for each atom with its ensemble handle and the atom label as arguments. If objects without a label are checked, such as complete ensembles, 1 is passed as the label. The callback procedures are expected to return a boolean value. If it is false or 0, the object is not added to the returned list, and the other check procedures are no longer called.

The command currently only works on ensembles in the dataset, ignoring any reactions, tables, datasets or networks which may be present.

This command is primarily useful for the display of filtered minor object data from datasets, such as atom property values for specific types of atoms.

Example:

set dhandle [dataset create [ens create O] [ens create C=C]]

dataset xlabel $dhandle A_LABEL !hydrogen

dataset xlabel $dhandle B_ORDER doublebond

First, a dataset with two ensembles (water and ethene) is created. This dataset is then queried. The first query is for all atoms in it which are not hydrogen. The returned list is

{ens0 1 0} {ens1 1 1} {ens1 2 1}

In object ens0, which is the first object in the dataset, atom 1 passes the filter. In object ens1, which is the second object in the dataset, atoms with label 1 and 2 pass. The second query asks for the labels of double bonds in the dataset. The use of property B_ORDER is arbitrary - any other bond property would do as well. The return value of this command is

{ens1 1 1}

which indicates that only the bond with label 1 in object ens1, which is the second object in the dataset, fulfills this condition.
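The nested result list can be post-processed with standard Tcl list and dict commands. The sketch below groups the labels by object handle, using the result of the first query above as literal input so that it runs without the toolkit. The procedure name labels_by_handle is hypothetical.

```tcl
# Result list copied from the A_LABEL query in the example above.
# Each sublist holds: object handle, object label, dataset object index.
set result {{ens0 1 0} {ens1 1 1} {ens1 2 1}}

proc labels_by_handle {result} {
    set map [dict create]
    foreach item $result {
        lassign $item handle label index
        # Collect all labels reported for the same object handle.
        dict lappend map $handle $label
    }
    return $map
}

set map [labels_by_handle $result]
dict for {handle labels} $map {
    puts "$handle: $labels"
}
```

For the sample input this groups label 1 under ens0 and labels 1 and 2 under ens1.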