The dataset command is the generic command used to manipulate datasets. The syntax of this command follows the standard schema of command/subcommand/majorhandle. Datasets are major objects and thus do not need any minor object labels for identification.
dataset get $dhandle D_SIZE
As explained in the introductory section on datasets, a normal persistent dataset handle may be replaced as the third argument of the dataset command by an arbitrary list of dataset, ensemble, reaction, table and network handles. Substitution is only allowed in that argument position. It is not allowed where a dataset handle is part of the command arguments of another object command, nor in a different argument position of a dataset command. Such an object list is transformed into a transient dataset for the duration of the command execution. After the command has completed, the elements of the transient dataset are in most cases restored to their original state with respect to dataset membership and position, except in a few documented exceptional circumstances.
As a means to access an embedded dataset object, its handle may be replaced by the handle of the parent object where this is unambiguous, e.g.
ens move $eh $thandle
moves the ensemble into the embedded dataset of the table, while
dataset count $thandle
treats the table argument as part of a transient dataset as described above.
This is the list of currently officially supported subcommands:
dataset add dhandle objhandle ?position?
Add an object to the dataset, relocating it from a current dataset if it exists. If no position is specified, the object is appended to the rear of the dataset object list. The position can either be a numerical zero-based index, or any string beginning with ‘e’ to indicate the end position.
If the object handle identifies a (local) dataset, and the target dataset does not accept datasets as members, all objects in the source dataset are instead moved to the new dataset, and then the source dataset is destroyed. If ensembles, reactions, tables or networks are moved, they are unlinked from any current datasets, but these original datasets themselves persist.
This dataset command is equivalent to issuing a move command from the object.
dataset add $dh $eh end
ens move $eh $dh end
dataset addthread dhandle ?body?
dataset addthread dhandle count body
dataset addthread dhandle count substitutiondict body
Add one or more script threads to the dataset. By default, a single thread is added, but by setting the count parameter to a higher number multiple threads with the same script body can be added simultaneously, up to a maximum of 32 threads per dataset. It is possible to use this command to add additional threads to a dataset which already has attached threads. These older threads remain active.
The optional substitution dictionary contains a set of percent-prefixed keys and replacement values, following the Tk event procedure model. All such replacements are made before the script is passed to the thread interpreters. A single default substitution replacing the character sequence %D with the handle of the current dataset is always predefined and cannot be redefined. Replacement token keys (but not necessarily their values) are single case-dependent characters, ignoring an optional percent prefix character. Within the script, percent signs which should be preserved as such must be doubled, just like in Tk event substitution commands.
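The substitution rules can be pictured with a small pure-Tcl model. This is only a sketch of the behavior described above, not the toolkit's implementation: the procedure name expand_tokens and the handle strings dataset0 and table0 are made up for illustration.

```tcl
# Illustrative model only: expand percent tokens in a thread script body.
# expand_tokens, dataset0 and table0 are hypothetical names.
proc expand_tokens {script dhandle {substitutions {}}} {
    # A doubled percent sign collapses to a single literal percent sign.
    # It is listed first so that string map prefers it over any token.
    set map [list %% %]
    dict for {key value} $substitutions {
        # The percent prefix on a key is optional; the token is one character.
        set token [string trimleft $key %]
        if {[string length $token] != 1} {
            error "token key \"$key\" must be a single character"
        }
        if {$token eq "D"} {
            error "%D is predefined and cannot be redefined"
        }
        lappend map %$token $value
    }
    # %D always stands for the handle of the current dataset.
    lappend map %D $dhandle
    return [string map $map $script]
}

puts [expand_tokens {table addens %T [dataset pop %D] ;# 100%% done} dataset0 {%T table0}]
```

In this model the doubled percent sign survives as a single literal percent sign, while %T and %D are replaced by the supplied handle strings.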
The dataset threads are compatible with those of the standard Tcl threads package. Dataset-associated threads are automatically created in preserved state, and a thread::wait command is automatically appended at the end of the script, so they can be sent additional tasks via the thread::send facility. If no script body is specified, the initial script consists only of the wait command. Threads can be canceled or joined only if they are stopped at the wait statement.
When a dataset is deleted, all threads associated with this dataset first need to be joined, and this can only happen if they have finished processing the main body script and are all in their idle state in the thread::wait command. Object deletion is postponed until this condition is met. A global join on all currently executing dataset threads is automatically performed when the program exits, before any object clean-up tasks are run. An application where dataset threads are stuck and do not reach their thread::wait cancellation points cannot be cleanly exited.
Duplicating datasets does not duplicate any associated threads.
The presence of threads on a dataset has consequences for the behavior of the dataset wait and dataset pop commands, as well as object insertion commands associated with other major object classes (e.g. ens move, or molfile read). Please refer to the respective paragraphs for details. The size control mechanism of datasets in the auto mode is also dependent on the presence or absence of linked dataset threads.
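dataset addthread $dh 1 [dict create %T $th] {
    while {1} {
        # Blocks until an ensemble is available. An empty result means end of data.
        set eh [dataset pop %D]
        if {$eh==""} break
        # Skip and discard ensembles for which a computation fails.
        if {[catch {ens get $eh E_CANONIC_TAUTOMER} eh_canonic]} {
            ens delete $eh
            continue
        }
        if {[catch {ens get $eh_canonic E_DESCRIPTORS}]} {
            ens delete $eh
            continue
        }
        # Store the result in the table, then release the input ensemble.
        table addens %T $eh_canonic
        ens delete $eh
    }
}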
This code creates a processing thread on the dataset which computes properties on newly arriving ensembles, stores the data in a table (note the table handle substitution via the replacement dictionary) and then deletes the ensemble. The dataset pop command returns an empty string when it is known that no more data will arrive, and otherwise blocks until an object for popping is available. This is managed by setting the eod dataset attribute from feeder threads.
The return value of the command is a list of the Tcl thread IDs of the newly created threads. These are suitable for use in the dataset jointhreads command or any standard Tcl thread package command.
dataset append dhandle property value ?property value?
Standard data manipulation command for appending property data. It is explained in more detail in the section about setting property data.
dataset append $dhandle D_NAME "_new"
dataset append $dhandle eod 1
dataset assign dhandle srcprop dstprop
Copy data from one property to another. Both properties must be associated with the same object class. The source property (but currently not the destination property) may be specified as an indexed property subfield. There must be a conversion path between the data types of the two properties or property subfields involved for the operation to succeed. For example, assigning a string property to a numeric property succeeds only if the string data items contain suitable numbers.
The original property data remains valid. The command variant dataset rename directly exchanges the property name without any data duplication or conversion, if that is possible. In any case, the original property data is no longer present after the execution of this command variant.
If the properties are not associated with datasets (prefix D_), the operation is performed on all dataset member objects.
dataset assign $dhandle A_XY A_XY%
This code snippet creates backup atomic 2D layout coordinates on all dataset ensembles or reactions.
dataset cancelthreads ?all?
dataset cancelthreads dhandle ?all?
dataset cancelthreads dhandle threadid..
Cancel (or more precisely, wait for and join) one or more threads associated with the dataset. Dataset threads can only be canceled when they are idle, executing the implicitly added thread::wait command at the end of their script. Therefore, this command is not just used for clean-up, but is also useful for ascertaining that the threads have finished their tasks. The IDs of the threads associated with a dataset can be retrieved as the threads dataset attribute, or saved from the return value of the original dataset addthread command. The special all thread ID value can be used to cancel all threads of the dataset. This can also be achieved by setting an empty thread ID parameter, or omitting it altogether. If a dataset does not possess threads, this command does nothing. If a thread marked for cancellation has not yet finished, the cancellation command is suspended until it has.
This command can also be invoked without specifying an explicit or transient dataset argument, or passing it as all. In that case, the thread join cleanup is run on all threads of all currently defined datasets. This function is also implicitly run when a script exits, before performing other application cleanup operations.
Thread cancellation for all dataset threads is implicitly invoked when a dataset is deleted, so an explicit clean-up is not required. However, this also means that a dataset deletion blocks if there are still active threads. It is not possible to forcefully cancel a thread which has entered an infinite loop, so careful programming is required.
The command returns the number of canceled threads.
dataset jointhreads is an alias for this command.
dataset jointhreads $dh
dataset cancelthreads $dh [lindex [dataset get $dh threads] 0]
dataset jointhreads
The first example waits for all threads on the specified dataset to finish. The second command waits for the completion of one specific thread, and the last command waits for all threads on all currently defined datasets.
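The full life cycle of dataset threads, from creation to the final join, might look like the following sketch. It needs the toolkit interpreter to run and only combines commands described in this section. The exact form of the dataset set call for the eod attribute is an assumption, and the three SMILES strings are arbitrary sample input.

```tcl
# Sketch only: requires the toolkit interpreter. The exact form of
# "dataset set $dh eod 1" is an assumption based on the eod attribute.
set dh [dataset create]

# Two workers that consume ensembles until the end-of-data flag is seen.
set tids [dataset addthread $dh 2 {
    while {1} {
        set eh [dataset pop %D]
        if {$eh eq ""} break
        ens delete $eh
    }
}]

# Feed the dataset, then signal that no more data will arrive.
foreach smiles {CO CN CCO} {
    ens move [ens create $smiles] $dh
}
dataset set $dh eod 1

# Blocks until both workers idle in thread::wait, then joins them.
set njoined [dataset jointhreads $dh]
puts "joined $njoined of [llength $tids] threads"
```

Because the join blocks until every worker has left its loop, forgetting to set the end-of-data flag in a script like this would leave the final command waiting indefinitely.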
dataset cast datasethandle dataset/ens/reaction/table ?propertylist?
Transform the dataset into a different object. Depending on the target object class, the result is as follows:
If the optional property list is specified, an attempt is made to compute the listed properties before the cast operation, so that they may become a part of the new object. No error is raised if a computation fails.
The command returns the handle of the new object, or the input object in case of mode dataset.
dataset clear dhandle
Delete all objects in the dataset, but keep the dataset object. The return value is the number of deleted objects.
dataset count dhandle|remotehandle ?filterlist?
Get the number of objects in the dataset. If the filter parameter is specified, only those objects which pass the filter are counted.
dataset count $dhandle pstereoatom
counts the number of ensembles or reactions in the dataset with one or more potential atom stereo centers.
dataset size is an alias for this command.
This command can be used with remote datasets.
In case a simple count on a local dataset is required, without any filters, the dataset size can also be queried as attribute, as in
set n [dataset get $dhandle size]
dataset create ?objecthandlelist?...
This command creates a new dataset and returns the handle of the new dataset. If the optional object handle lists are provided as arguments, the specified objects (in case of ensemble, reaction, network or table handles), or elements of the object (for a dataset handle, with default accept flags) are moved to the new dataset. In case the accept flags of the target dataset are configured to allow datasets as primary dataset objects, the source dataset argument is not implicitly replaced by its content objects but added as a single object, retaining its objects as content. Otherwise, the source dataset is emptied but remains a valid object.
Besides handles of ensembles, reactions, networks, tables and datasets, which are identified with priority, any string which can be decoded in an ens create statement is also allowed as member initialization identifier.
If the create statement references objects which are not usually accepted by the default settings of the accept table attribute, that attribute is automatically adjusted to allow for these objects.
The command always returns the handle of the new dataset, never the handles of any objects which may have been placed into the dataset.
dataset create [list $eh1 $eh2] $dh1
creates a new dataset and moves the two specified ensembles $eh1 and $eh2, as well as everything contained in the dataset $dh1, into the new dataset.
dataset create VXPBDCBTMSKCKZ
The above command matches a partial InChI key, and puts all structures from the NCI resolver which match the non-stereo/isotope-specific part of their full InChI key into the new dataset.
set ::cactvs(lookupmode) "name_pattern"
dataset create [list "+morphine +methyl"]
This command performs a name pattern lookup and puts all structures from the NCI resolver which contain both name fragments in one of their known names into the dataset. The name pattern string needs to be explicitly packed into a list, because otherwise it would be split into two independent list elements.
dataset dataset dhandle ?filterlist?
Get the handle of the container dataset the dataset is a member of. If the dataset is not itself a dataset member, or does not pass all of the optional filters, an empty string is returned.
dataset datasets dhandle ?filterset? ?filtermode? ?recursive?
Return a list of all the datasets that are members in the dataset identified by the command argument handle. Other objects (ensembles, reactions, tables, networks) are ignored. The object list may optionally be filtered by the filter list, and the output further modified by a standard filter mode.
If the recursive flag is set, and the dataset contains other datasets as objects, datasets in these nested datasets are also listed.
This command is not equivalent to the dataset dataset command!
set dlist [dataset datasets $dhandle]
dataset defined dhandle property
This command checks whether a property is defined for the dataset. This is explained in more detail in the section about property validity checking. Note that this is not a check for the presence of property data! The dataset valid command is used for this purpose.
dataset delete ?datasethandlelist/all?...
This command destroys datasets and everything contained therein. The special handle value all may be used to delete all datasets in the application at once.
The command returns the number of datasets which were successfully deleted.
Transient datasets cannot be used with this command. Neither can datasets which are a component of another object, e.g. the internal datasets of tables or factories. These are deleted only, and automatically, when their parent object is destroyed. Datasets which are a property value are also undeletable by this command.
It is a common programming error to delete a dataset, or its parent object if one exists, without protecting its current member ensembles or reactions. If they are still needed in later processing they need to be explicitly transferred into another dataset or outside it.
dataset delete all
dataset move $dhandle {}; dataset delete $dhandle
The first example destroys all datasets defined in the current script and everything contained in them. The second example shows how to delete a dataset and preserve its contents by moving all dataset elements out prior to deletion.
dataset dget dhandle propertylist ?filterset? ?parameterlist?
Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.
For examples, see the dataset get command. The difference between dataset get and dataset dget is that the latter does not attempt computation of property data, but rather initializes the property values to the default and returns that default if the data is not yet available. For data already present, dataset get and dataset dget are equivalent.
dataset dup dhandle ?targethandle? ?cleartarget?
If the optional arguments are not supplied, the dataset is duplicated together with all data attached to it and all objects contained in it. The command returns a new dataset handle. All duplicated objects in the new dataset are also assigned handles, which can be obtained by commands such as dataset list $dhandle.
It is possible to specify a target dataset as an optional argument. In that case, no new dataset is created, and dataset-level property data on the source dataset is not copied. All objects in the source dataset are duplicated and appended to the end of the target dataset. In case the boolean target clearance flag is set, which is also the default if the parameter is omitted, the target dataset is cleared before the new objects from the source dataset are added. In this command variant, the return value of the command is the target dataset handle.
dataset dup $dhandle
dataset dup [list $eh1 $eh2] $dtarget 0
dataset ens dhandle ?filterset? ?filtermode? ?recursive?
Return a list of all the ensembles in the dataset. Other objects (reactions, tables, datasets, networks) are ignored. The object list may optionally be filtered by the filter list, and the output further modified by a standard filter mode.
If the optional boolean recursive argument is set, ensembles which are a component of a reaction in the dataset are also listed. Furthermore, if the dataset contains datasets as elements, these are recursively traversed, and ensembles in these, as well as ensembles in reactions in these datasets, are listed. If the output mode of the command is a handle list, items found by recursion are appended to the result list in a straight fashion, without the creation of nested lists. By default the recursion flag is off. Regardless of the flag value, ensembles which are associated with rows of a table in the dataset, but are not themselves dataset members, are not output.
set elist [dataset ens $dhandle astereogenic]
lists those ensembles in the dataset which have one or more atoms which are potential atom stereo centers.
set cnt [dataset ens $dhandle {} count 1]
returns a count of all ensembles which are either direct members of the dataset, or indirect members as component objects of reactions in the dataset, or which are contained in datasets which are themselves members of the primary dataset.
dataset exists dhandle
Check whether this dataset exists. The command returns a boolean value. This command cannot be used with transient datasets.
dataset exists $dhandle
dataset expr dhandle expression
Compute a standard SQL-style property expression for the dataset. This is explained in detail in the chapter on property expressions.
dataset extract dhandle propertylist ?filterset? ?filterprocs?
This command is rather complex and closely related to the dataset xlabel command. It was designed for the efficient extraction of major or minor object data for filtered subsets of the dataset.
The property list parameter selects the property data which is extracted. Multiple properties may be specified, but they can only be associated with major objects and one arbitrary minor object class. So it is possible to simultaneously extract an ensemble and an atom property, but not an atom and a bond property.
The return value is a nested list of data items for every object which is encountered while traversing the dataset on the level of the minor object associated with the extraction property, or just ensembles or other major objects if no such property is selected. Every list element is itself a list which contains the extracted property values in the order they are named in the property list parameter.
The objects for which data is returned can further be filtered by a standard filter set, and additionally by a list of filter procedures. These Tcl script procedures are called with the respective object handles and object labels as arguments. For example, a callback function used in an atom retrieval context would be called for each atom with its ensemble handle and the atom label as arguments. If major objects without a label are checked, such as complete ensembles, 1 is passed as the label. The callback procedures are expected to return a boolean value. If it is false or 0, the object is not added to the returned list, and the other check procedures are no longer called.
The command currently only works on ensembles in the dataset, ignoring any reactions, tables, datasets or networks which may be present.
Because this command is primarily intended for numerical data display, the returned values are formatted as with the nget command, i.e. instead of enumerated values the underlying numerical values are returned.
set dhandle [dataset create [ens create CO] [ens create CN]]
dataset extract $dhandle [list E_NAME A_SYMBOL] !hydrogen
This example first creates a dataset with methanol and methylamine. The second line performs the actual extraction and returns
{CH4O C} {CH4O O} {CH5N C} {CH5N N}
This kind of extracted data is useful for the display of filtered atomic (and other minor objects') property values.
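A filter procedure for the optional last argument could be sketched as follows. The callback receives the ensemble handle and the atom label, as described above, and returns a boolean. The procedure name keep_heteroatom is hypothetical, and the use of an atom get command with A_SYMBOL inside the callback is an assumption that is not covered by this section.

```tcl
# Sketch only: requires the toolkit interpreter.
# keep_heteroatom is a hypothetical callback name; the use of
# "atom get" with A_SYMBOL inside it is an assumption.
proc keep_heteroatom {ehandle label} {
    # Return a boolean: true keeps the atom in the result list.
    return [expr {[atom get $ehandle $label A_SYMBOL] ne "C"}]
}

set dhandle [dataset create [ens create CO] [ens create CN]]
dataset extract $dhandle [list E_NAME A_SYMBOL] !hydrogen keep_heteroatom
```

With the two ensembles from the example above, such a callback should leave only the rows for the oxygen and nitrogen atoms in the result.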
dataset forget dhandle ?objectclass?
This command is essentially the same as the ens forget (or reaction forget, etc.) command. It is applied to all objects in the dataset.
If the object class is dataset, all dataset-level property data is deleted.
dataset get dhandle propertylist ?filterset? ?parameterlist?
dataset get dhandle attribute
Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.
In addition to retrieving property data, it can also be used to query dataset attributes. The set of supported attributes is detailed in the paragraph on the dataset set command.
dataset get $dhandle {D_NAME D_SIZE}
yields the name and size of the dataset as a list. If the information is not yet available, an attempt is made to compute it. If the computation fails, an error results.
dataset get $dhandle [list E_FORMULA E_WEIGHT]
gives the formula and molecular weight of all dataset ensembles. The result is delivered as a nested list. The first list contains the formulas, the second list contains the weights.
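Because the result holds one list per property rather than one list per ensemble, scripts often need to regroup it. The following pure-Tcl sketch shows one way to do this. The helper name rows_from_columns is hypothetical, and the literal result is hand-written sample data standing in for the return value of the command above, not actual toolkit output.

```tcl
# Pair up the per-property lists returned for several properties.
# rows_from_columns is a hypothetical helper name.
proc rows_from_columns {columns} {
    set rows {}
    set n [llength [lindex $columns 0]]
    for {set i 0} {$i < $n} {incr i} {
        set row {}
        foreach column $columns {
            lappend row [lindex $column $i]
        }
        lappend rows $row
    }
    return $rows
}

# Sample data standing in for: dataset get $dhandle [list E_FORMULA E_WEIGHT]
set result {{CH4O CH5N} {32.042 31.057}}
foreach row [rows_from_columns $result] {
    lassign $row formula weight
    puts "$formula $weight"
}
```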
Currently, it is not possible to use filters with this command (and the other retrieval command variants) which are not operating directly on the dataset object, but on objects lower in the hierarchy such as ensembles or atoms.
For the use of the optional property parameter list argument, refer to the documentation of the ens get command.
Variants of the dataset get command are dataset new, dataset dget, dataset nget, dataset show, dataset sqldget, dataset sqlget, dataset sqlnew and dataset sqlshow.
dataset getparam dhandle property ?key? ?default?
Retrieve a named computation parameter from valid property data. If the key is not present in the parameter list, an empty string is returned. If a default value is set, that value is returned in case the key is not found.
If the key parameter is omitted, a complete set of the parameters used for computation of the property value is returned in key/value format.
This command does not attempt to compute property data. If the specified property is not present, an error results.
dataset getparam $dhandle E_GIF format
returns the actual format of the image, which could be GIF, PNG, or various bitmap formats.
dataset hadd dhandle ?filterset? ?flags? ?changeset?
Add a standard set of hydrogens to all ensembles and reactions in the dataset. If the filterset parameter is specified, only those atoms which pass the filter set are processed.
Additional operation flags may be activated by setting the flags parameter to a list of flag names, or a numerical value representing the bit-ored values of the selected flags. By default, the flag set is empty, corresponding to the use of an empty string or none as parameter value. These flags are currently supported:
Adding hydrogens with this command is less destructive to the property data set of the ensembles or reactions than adding them with individual atom create/bond create commands, because many properties are defined to be indifferent to explicit hydrogen status changes, but are invalidated if the structure is changed in other ways.
If the effects of the hydrogen addition step to the validity of the property data set should not be handled with this standard procedure, it is possible to explicitly generate additional property invalidation events by specifying a list as the optional last parameter, for example a list of atom and bond to trigger both the atom change and bond change events.
The command returns the total number of hydrogens added to all ensembles and reactions in the dataset.
dataset hadd $dhandle
dataset hread dhandle ?datasethandle|enshandle? ?#recs|batch|all?
This command provides the same functionality as dataset read, but additionally adds a standard set of hydrogen atoms to the read duplicate objects.
The command arguments are explained in the section on dataset read.
dataset hstrip dhandle ?flags? ?changeset?
This command removes hydrogens from the dataset ensembles and reactions. By default, all hydrogen atoms in the dataset ensembles or reactions are removed.
The flags parameter can be used to make the operation more selective. It may be a list of the following flags:
If the flags parameter is an empty string, or none, it is ignored. The default flag value is wedgetransfer, but this default is overridden if any flags are set!
If the changeset parameter is given, all property change events listed in the parameter are triggered.
Hydrogen stripping is not as disruptive to the ensemble or reaction data content as normal atom deletion. The system assumes that this operation is done as part of some file output or visualization preparation. However, if any new data is computed after stripping, the computation functions see the stripped structure, and proceed to work on that reduced structure without knowledge that there are implicit hydrogens.
dataset hstrip $dhandle [list keeporiginal wedgetransfer]
dataset index dhandle
dataset index dhandle position
This command comes in two variants. The three-word version is the generic command to check dataset membership, which is the same for all objects which can be dataset members, while the second version retrieves object references from this dataset.
The first version gets the position of the dataset in the object list of its parent dataset. If the dataset is not part of a parent dataset, -1 is returned. This is the generic dataset membership test command variant.
The second command variant obtains the object handle of the object at the specified position in this dataset. Position counting begins with zero. If the index is outside the object position range, an empty string is returned. The special value end may be used to address the last object. The indexed object remains in the dataset.
Note that this index command is not equivalent to the standard index command on minor objects, which is used to obtain the position of the minor object in the minor object list of the controlling major object. This kind of functionality is not needed for major objects, because they are not contained in any minor object list.
dataset index $dhandle end
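Both variants can be combined in a short sketch. It needs the toolkit interpreter and an existing dataset handle in $dhandle, and relies only on the return values described above: an empty string for an out-of-range position, and -1 for a dataset without a parent.

```tcl
# Sketch only: requires the toolkit interpreter and a dataset handle in $dhandle.
# Walk the dataset by position until the index runs past the last object.
for {set i 0} {[set obj [dataset index $dhandle $i]] ne ""} {incr i} {
    puts "position $i: $obj"
}

# Membership test for the dataset itself: -1 means no parent dataset.
if {[dataset index $dhandle] == -1} {
    puts "$dhandle is not a member of another dataset"
}
```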
dataset jointhreads ?all?
dataset jointhreads dhandle ?all?
dataset jointhreads dhandle threadid..
This is an alias for the dataset cancelthreads command. Please refer to its documentation.
dataset list ?dhandle?
Without a handle argument, the command returns a list of the handles of all existing datasets.
If a dataset handle or transient dataset is passed as third argument, the command returns a list of all major objects in the dataset. This function is different from the behavior of the list subcommand for other major object classes, where the optional argument is a filter list.
dataset list
dataset list $dhandle
dataset lock dhandle propertylist/dataset/all ?compute?
Lock property data of the dataset handle, meaning that it is no longer subject to the standard data consistency manager control. The data consistency manager deletes specific property data if anything is done to the dataset handle which would invalidate the information. Property data remains locked until it is explicitly unlocked.
The property data to lock can be selected by providing a list of the following identifiers:
A lock can be released by a dataset unlock command.
This command does not recurse into the objects contained in the dataset.
The return value is the dataset handle or, if the dataset was transient, an empty string.
dataset loop dhandle objvar ?maxrec? ?offset? body
Loop over the elements in a dataset. This command is similar to molfile loop. On each iteration, the variable is set to the handle of the current member object, and then the body code is executed. The variable refers to the original dataset element, not a duplicate. This is different from dataset read.
All operations on the current loop item are allowed, including deletion. However, the next object after the current item must not be deleted or moved, because it is needed for the iteration process.
If a maximum record count is set, the loop terminates after the specified number of iterations. If the maximum record argument is set to an empty string, a negative value, or all , the loop covers all dataset elements. This is also the default.
Within the loop, the standard Tcl break and continue commands work as expected. If the body script generates an error, the loop is exited.
If no offset is specified, the loop starts at the first element. Within the loop body, the dataset attribute record is continuously updated to indicate the current loop position. Its value starts with one, like file records in the molfile loop command.
dataset loop $dh eh {
    puts "[ens get $eh E_NAME] at position [ens index $eh]"
}
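A loop which limits the record count and deletes items along the way might be sketched as follows. It needs the toolkit interpreter and a dataset handle in $dh whose members are ensembles. The record limit of 100 and the weight threshold of 500 are arbitrary illustration values.

```tcl
# Sketch only: requires the toolkit interpreter and a dataset handle in $dh.
# The weight threshold of 500 is an arbitrary illustration value.
dataset loop $dh eh 100 {
    # Deleting the current item is allowed; the next item must stay in place.
    if {[ens get $eh E_WEIGHT] > 500} {
        ens delete $eh
        continue
    }
    puts "record [dataset get $dh record]: [ens get $eh E_NAME]"
}
```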
dataset max dhandle propertylist ?filterset?
Get the maximum value of one or more properties from the elements in the dataset. The property argument may be any property attached to dataset members, or minor objects thereof. If the filterset argument is specified, the maximum value is searched only for objects which pass the filter set.
dataset max $dhandle E_WEIGHT
dataset max [list $ehandle1 $ehandle2] A_SIGMA_CHARGE carbon
The first example finds the highest molecular weight in the dataset. The second example finds the largest (most positive) Gasteiger partial charge on any carbon atom in the two argument ensembles, which form a transient dataset.
dataset metadata dhandle property field ?value?
Obtain property metadata information, or set it. The handling of property metadata is explained in more detail in its own introductory section. The related commands dataset setparam and dataset getparam can be used for convenient manipulation of specific keys in the computation parameter field. Metadata can only be read from or set on valid property data.
array set gifparams [dataset metadata $dhandle D_GIF parameters]
dataset metadata $dhandle D_QUALITY comment "This value looks suspicious to me"
The first line retrieves the computation parameters of the property D_GIF as keyword/value pairs. These are read into the array variable gifparams, and may subsequently be accessed as $gifparams(format), $gifparams(height), etc. The second example shows how to attach a comment to a property value.
dataset min dhandle propertylist ?filterset?
Get the minimum value of one or more properties from the elements in the dataset. The property argument may be any property attached to dataset sub-elements, or minor objects thereof. If the filterset argument is specified, the minimum value is searched only for objects which pass the filter set.
dataset min $dhandle E_WEIGHT
dataset min [list $ehandle1 $ehandle2] A_SIGMA_CHARGE carbon
The first example finds the smallest molecular weight in the dataset. The second example finds the smallest (most negative, or smallest positive) Gasteiger partial charge on any carbon atom in the two argument ensembles, which form a transient dataset.
dataset molfile dhandle ?filterset?
Return the handle of the molfile object associated with the dataset as backing page file. If no such file object exists, an empty string is returned.
set fh [dataset molfile $dh]
set fh [dataset get $dh pagefile]
dataset move dhandle datasethandle|remotehandle ?position?
Move, depending on the acceptance flags of the destination dataset, either the objects in the dataset or transient dataset into another local or remote dataset, or move the dataset itself. If the destination dataset handle is an empty string, the dataset objects are removed from the original dataset, but not moved into any other dataset. If the destination dataset accepts datasets as members, which is not the default (see the
accept
attribute in the section on
dataset set
) the dataset is directly moved as object. Otherwise, its contained objects are moved, under preservation of the object order from the source dataset, and the source dataset is emptied, but not deleted.
Optionally, a position in the new dataset for the first moved object may be specified. This parameter is either an index (beginning with 0), or end , which is the default. If the contents of a dataset are spliced into another at a specific position, objects after the first element of the source dataset follow as a block.
Another special position value is random . This value moves the dataset, or the dataset contents, to a random position in the target dataset. Use of this mode with remote datasets is currently not supported.
In case of a transient command dataset the original dataset memberships of the dataset objects are not restored when the command completes.
The return value of the command is the original parent dataset of the command dataset, as it existed before the move. Usually, it is an empty string.
A dataset cannot be moved into itself.
dataset move $dhandle $dhandle2 0
dataset move $dhandle {}
dataset move [ens list] [dataset create]
The first line moves all objects in the source dataset into the first (and following) positions in the destination dataset. The second example removes all elements from the dataset. This is often useful in order to avoid dataset member destruction with the
dataset delete
command. The final example shows how to move a set of ensembles (here: all ensembles currently defined in the application) into a newly created dataset via an intermediate, transient dataset.
dataset move $dhandle vioxx@server55:10001
This command moves all objects in the first dataset to the remote dataset on host server55 , which listens on port 10001 and requires the pass phrase vioxx for access.
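The position semantics described for this command (contents spliced in as a block, with end as the default) can be modeled on plain Tcl lists. This is a sketch for illustration only: spliceObjects is a hypothetical helper, the handles are made-up strings, and nothing here reflects the toolkit's internal implementation.

```tcl
# Model of the documented position semantics of "dataset move" on plain
# Tcl lists. Illustration only, not the toolkit implementation.
proc spliceObjects {dest src {position end}} {
    if {$position eq "end"} {
        return [concat $dest $src]
    }
    # linsert places all source elements as one block starting at the index,
    # preserving their order from the source list.
    return [linsert $dest $position {*}$src]
}

set dest {d0 d1 d2}
set src  {s0 s1}
puts [spliceObjects $dest $src 0]    ;# s0 s1 d0 d1 d2
puts [spliceObjects $dest $src 1]    ;# d0 s0 s1 d1 d2
puts [spliceObjects $dest $src]      ;# d0 d1 d2 s0 s1
```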
dataset mutex dhandle mode
Manipulate the object mutex. During the execution of a script command, the mutex of the major object(s) associated with the command are automatically locked and unlocked, so that the operation of the command is thread-safe. This applies to builds that support multi-threading, either by allowing multiple parallel script interpreters in separate threads or by supporting helper threads for the acceleration of command execution or background information processing. This command locks major objects for a period of time that exceeds a single command. A lock on the object can only be released from the same interpreter thread that set the lock. Any other threaded interpreters, or auxiliary threads, block until a mutex release command has been executed when accessing a locked command object. This command supports the following modes:
There is no trylock command variant because the command already needs to be able to acquire a transient object mutex lock for its execution.
dataset need dhandle propertylist ?mode?
Standard command for the computation of property data, without immediate retrieval of results. This command is explained in more detail in the section about retrieving property data.
If the dataset is not transient, the return value is the dataset handle.
dataset need $dhandle D_GIF recalc
dataset networks dhandle ?filterset? ?filtermode? ?recursive?
Return a list of all the networks in the dataset. Other objects (ensembles, reactions, datasets, tables) are ignored. The object list may optionally be filtered by the filter list, and the result further modified by a standard filter mode argument.
If the recursive flag is set, and the dataset contains other datasets as objects, networks in these nested datasets are also listed.
set n [dataset networks $dhandle {} count]
dataset new dhandle propertylist ?filterset? ?parameterlist?
Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.
For examples, see the
dataset get
command. The difference between
dataset get
and
dataset new
is that the latter forces the re-computation of the property data, regardless of whether it is already present and valid.
dataset nget dhandle propertylist ?filterset? ?parameterlist?
Standard data manipulation command for reading object data and attributes. It is explained in more detail in the section about retrieving property data.
For examples, see the
dataset get
command. The difference between
dataset get
and
dataset nget
is that the latter always returns numeric data, even if symbolic names for the values are available.
dataset nitrostyle dhandle style
Change the internal encoding of nitro groups and similar functional groups in the ensembles and reactions in the dataset. Possible values for the style parameter are:
dataset objects dhandle ?pattern?
This is a non-standard cross-referencing command. The result is a list of all the objects in the dataset, where each result list element is a list consisting of the object type (ens, reaction, table, network, dataset), and the object handle. Optionally, the list objects may be filtered by the filters in the filterset argument.
dataset objects $dhandle ens*
This command is similar to
dataset ens $dhandle
except that the latter only lists the ensemble handles, not pairs of object class name and handle.
dataset pack dhandle ?maxsize? ?requestlist? ?suppresslist?
Pack the dataset and all objects it contains into a base-64 encoded, compressed string as a serialized object. The string does not contain any non-printing characters, quotation marks or other problematic characters and is thus well suited for storage in database tables and similar applications. These packed strings are portable and platform-independent.
By default, all property data on the dataset and its member objects are stored. By providing a request list of properties which are computed if they are not yet present, and/or a list of properties not to store, the data content may be customized.
The maxsize parameter limits the length of the packed string to a maximum number of bytes. The default value is 128K bytes. If the string would be longer, an error is generated.
The return value of this command is the packed string.
dataset pack $dhandle
dataset pop dhandle|remotehandle ?position? ?timeout?
Remove an object from a dataset. The handle of the selected object is returned, and the object is no longer a member of the dataset when the command completes. If a timeout is specified, it is transferred to the dataset attribute of the same name before the command is executed, as with a
dataset set
command.
By default the first object in the dataset, at index zero, is returned. A different object can be selected by means of the optional position argument. It can be a numerical index, or end for the last object. If the object index is larger than the maximum index of any object, it is silently rewritten to end .
This command works with remote datasets. In that case, the object is transferred via an intermediate serialized object representation over the network. It is unpacked on the local interpreter, and deleted on the remote interpreter.
If the desired dataset object cannot be found, and a timeout is set, including a negative value for an unlimited wait time, the command suspends execution until the object appears in the dataset, for example from a different script thread or as result of a remote object insertion. If a wait would be executed, but the eod/targeteod parameter pair of the dataset indicates that no further data can be expected, the command returns an empty string instead of the object handle, but does not trigger an error. Otherwise, if the object cannot be delivered immediately or after the timeout, an error results.
set eh [dataset pop $dhandle end]
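The position handling described above, including the silent rewrite of an oversized index to end, can be modeled on a plain Tcl list. The sketch below is an illustration under stated assumptions: popObject is a hypothetical helper, timeouts and remote datasets are ignored, and an empty list simply yields an empty string.

```tcl
# Model of the documented position handling of "dataset pop" on a Tcl list
# held in a variable. Hypothetical helper, not the toolkit implementation.
proc popObject {listVar {position 0}} {
    upvar 1 $listVar objects
    if {[llength $objects] == 0} {
        return ""
    }
    # An index past the last object is rewritten to end.
    if {$position ne "end" && $position >= [llength $objects]} {
        set position end
    }
    set obj [lindex $objects $position]
    # The popped object is no longer a member afterwards.
    set objects [lreplace $objects $position $position]
    return $obj
}

set members {ens0 ens1 ens2}
puts [popObject members]        ;# ens0
puts [popObject members 99]     ;# ens2
puts $members                   ;# ens1
```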
dataset properties dhandle ?pattern? ?intersect/union?
Get a list of valid properties of the dataset proper and the dataset objects. By default, both dataset properties (prefix D_ ) as well as the properties of the objects in the dataset (prefix E_ for ensembles, X_ for reactions, T_ for tables, N_ for networks, D_ for datasets as members) and the properties of their minor objects (atoms, bonds, etc.) are listed. Property subsets may be selected by specifying a string filter pattern. In case of dataset element properties which are not present on all dataset members, the default mode is union, meaning that every property is reported for which at least a single instance exists on any member. The alternative mode intersect only lists those dataset element properties which are present on all dataset members.
This command may also be invoked as
dataset props
.
dataset properties $dhandle D_*
dataset props $dhandle E_* intersect
The first example returns a list of the currently valid dataset-level properties. The second example lists ensemble properties which are present in all dataset objects.
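The difference between the union and intersect modes can be illustrated on plain lists of property names. This is a model only: combineProperties is a hypothetical helper, and each inner list is a made-up stand-in for the properties present on one dataset member.

```tcl
# Model of the union and intersect modes on plain lists of property names.
# Each inner list stands for the properties present on one dataset member.
proc combineProperties {memberProps mode} {
    set count [dict create]
    foreach props $memberProps {
        # Count each property at most once per member.
        foreach p [lsort -unique $props] {
            dict incr count $p
        }
    }
    set nmembers [llength $memberProps]
    set result {}
    dict for {p n} $count {
        # union: present on any member; intersect: present on all members.
        if {$mode eq "union" || $n == $nmembers} {
            lappend result $p
        }
    }
    return [lsort $result]
}

set memberProps {
    {E_NAME E_WEIGHT E_IDENT}
    {E_NAME E_WEIGHT}
    {E_NAME E_CAS}
}
puts [combineProperties $memberProps union]      ;# E_CAS E_IDENT E_NAME E_WEIGHT
puts [combineProperties $memberProps intersect]  ;# E_NAME
```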
dataset purge dhandle propertylist ?emptyonly?
Delete property data from the dataset. The properties may be both dataset properties (prefix D_ ) or properties of the dataset members, such as ensemble or atom properties. If a property marked for deletion is not present on an object, it is silently ignored.
Besides normal property names, a few convenient alias names for common property deletion tasks of ensembles in a dataset, or the reaction ensembles of reactions in the dataset, are defined and can be used as a replacement for the property list. These include:
The optional boolean flag emptyonly restricts the deletion to those properties where all the values for a property associated with a major object (such as on all atoms in an ensemble for atom properties, or just the single ensemble property value for ensemble properties) are set to the default property value.
dataset purge $dhandle D_GIF
dataset purge [ens list] E_IDENT 1
dataset purge $dhandle stereochemistry
The first example deletes the property data D_GIF for the selected dataset if it is present. The second example deletes property E_IDENT from all ensembles in the current application if their property value is equal to the default value of E_IDENT . The third example removes stereochemistry from all dataset ensembles.
dataset reactions dhandle ?filterset? ?filtermode? ?recursive?
Return a list of all the reactions in the dataset. Other objects (ensembles, tables, datasets, networks) are ignored. The object list may optionally be filtered by the filter list, and the output further modified by a standard filter mode.
If the optional boolean recursive argument is set, reactions of which ensembles in the dataset are a component are also listed. Furthermore, if the dataset contains datasets as elements, these are recursively traversed, and reactions in these, as well as reactions as components of ensembles in these datasets, are listed. If the output mode of the command is a handle list, items found by recursion are appended in a straight fashion, without the creation of nested lists. By default the recursion flag is off. Regardless of the flag value, reactions which are associated with rows of a table in the dataset, but are not themselves dataset members, are not output.
set xlist [dataset reactions $dhandle]
Return a list of the handles of the reactions in the dataset.
set cnt [dataset reactions $dhandle {} count 1]
returns a count of all reactions which are either direct members of the dataset, or indirect members because ensembles in the dataset are part of a reaction, or which are contained in datasets which are themselves members of the primary dataset.
dataset read dhandle ?datasethandle/enshandle? ?#recs|batch|all?
This command returns duplicates of one or more objects from the current dataset iterator position (
record
attribute). Its arguments mimic those of the
molfile read
command. The iterator record attribute is automatically incremented. When the end of the dataset is reached, an empty result is returned, but no error is raised.
The return value is usually the handle of the object duplicated from the dataset member at the current read position. If an optional target dataset has been specified, the object is appended to that dataset, and the return value is the target dataset handle. It is also possible to use the magic dataset handles new or #auto , which create a new receptor dataset.
If instead of a target dataset an existing target ensemble is specified, the recipient ensemble is cleared, and the read dataset object is placed into its hull without changing its handle. This requires that the read object is an ensemble, and not a reaction, table, dataset or network, and that only a single item is read. It is also possible to use an empty argument to skip these options.
By default, a single object is duplicated and the iterator record attribute of the dataset incremented by one. With the optional third argument, a different number of objects can be selected for reading as a block. The special value all reads all remaining objects, and batch copies a number of objects corresponding to the batchsize dataset attribute. If there are insufficient objects in the dataset to read all requested records, only the available set is returned, and no error results.
The dataset contents are not changed by this command. All extracted items are object duplicates. In order to fetch original objects from the dataset, use the
dataset pop
command, or the various object
move
commands.
The command variant
dataset hread
provides the same functionality as this command, but additionally adds a standard set of hydrogen atoms to the duplicates.
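The iterator behavior described for this command (block reads, a short final block, and an empty result at the end) can be modeled on a plain Tcl list. This is a sketch under stated assumptions: readRecords is a hypothetical helper, the record counter is taken to be one-based, and the model returns the list elements themselves where the real command returns duplicates.

```tcl
# Model of the read iterator on a plain Tcl list. The record counter is
# assumed to be one-based. Hypothetical helper, not toolkit code.
proc readRecords {objects recordVar {count 1}} {
    upvar 1 $recordVar record
    set first [expr {$record - 1}]
    if {$count eq "all"} {
        set last end
    } else {
        set last [expr {$first + $count - 1}]
    }
    # lrange truncates at the list end, so a short final block yields only
    # the available objects and an exhausted list yields an empty result.
    set block [lrange $objects $first $last]
    incr record [llength $block]
    return $block
}

set objects {ens0 ens1 ens2 ens3 ens4}
set record 1
while {[llength [set block [readRecords $objects record 2]]] > 0} {
    puts "read: $block"
}
```

The loop terminates on the empty result instead of trapping an error, which mirrors the end-of-dataset behavior described above.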
dataset rename dhandle srcproperty dstproperty
This is a variant of the
dataset assign
command. Please refer to the command description in that paragraph.
dataset request dhandle propertylist ?reload? ?modelist?
Request property data for a dataset which is not maintained locally, but is a partial shadow copy of a remotely managed dataset. It is assumed to have been only partially transferred via RPC to a slave from a master controller application, for example for display purposes, but without the full data content, which resides on the master.
If the requested property data is already present on the slave, and the
reload
flag is not set, this command is equivalent to a
dataset need
command and does not invoke communication with the master. Otherwise, the master is asked to provide the information, which may be calculated on the master only after receiving the request, or even delegated by the master to another remote server for computation.
Once the requested data has been received by the slave, it is added to the property data set of the local dataset copy. The optional
modelist
parameter is the same as in the
dataset need
command. This command is used to guarantee that critical or non-computable property data is obtained from the master. Local, unsynchronized data may still be computed by the slave using standard property data access commands. It is currently not possible to send data back to the master.
This command is only available on toolkit versions which have been compiled with RPC support.
In the absence of errors, the command returns a boolean status code. If it is zero, the request failed in a non-critical way. This for example happens in case the dataset is not under control of a remote application.
if {![dataset request $dhandle A_XY]} {
    dataset need $dhandle A_XY
}
is a bulletproof method of guaranteeing that correct atomic 2D display coordinates are present for the dataset structures even if the script is run in a master/slave context.
dataset rewind dhandle
Reset the dataset iterator record. This is equivalent to setting the record attribute to one.
dataset scan dhandle expression ?mode? ?parameters?
Perform a query on the dataset or transient dataset. The syntax of the query expression is the same as that of the
molfile scan
command and explained in more detail in its section on query expressions. Essentially, this command behaves like an in-memory data file version of the
molfile scan
command. However, currently queries work on ensembles and reactions as dataset members only. Any table, network or other object which is a member of a scanned dataset is skipped. Skipped items still count as records for positioning and query result output. In the absence of a specified scan record list (order parameter), dataset scans begin at the current position of the iterator record attribute that is shared with the
dataset read/hread
commands.
The optional parameter dictionary is the same as for
molfile scan
, but not all parameters are actually used. At this time, only the
matchcallback, maxhits, maxscan, order, progresscallback, progresscallbackfrequency, sscheckcallback, startposition
and
target
parameters have an effect. If result ensembles or reactions are transferred to a remote dataset via the
target
parameter, they are not deleted from the local dataset but duplicates are created instead. This is because the original objects are members of the dataset which, just like a structure file would, should remain unchanged as result of a scan. In contrast, in file scans, the transferred ensembles and reactions were read from file and created as new objects during the scan, and sending these does not change the underlying file. In case a progress callback function is used, the dataset handle is passed as argument in place of the
molfile
handle in
molfile scan
.
The return value depends on the mode. The default mode is enslist . The following modes are supported for dataset queries:
In this mode, the scan command returns a list of the names of the created arrays. For each name, a global Tcl array variable is created, and for each scan match, a Tcl array element with an element name equal to the value of the first item specification index and an element value equal to the value of the third item specification is created. For example, the specification
{array {E_NAME name2rec} {record rec2name E_NAME}}
results in the creation of two global Tcl arrays in the current interpreter, called name2rec and rec2name . The first has elements where the element name is the name of the matching structure (property E_NAME ), and the value the file record (the default, because the optional third specification parameter was omitted). The second array has elements where the record number is the array element name, and the corresponding value the structure name. The return value of the Tcl statement is the list “name2rec rec2name” , the names of the two variables created.
If array elements for a specific key already exist, the new value is appended as a list object. The result registration procedure does not overwrite the existing content. So, for example in the above case, if there are multiple records with the same structure name, the array element indexed by name would contain a list of records, not just a single record item. Since global arrays are persistent, data is also appended over multiple scan statements. If this is not desired, a statement like
unset -nocomplain $arrayname
should be executed before the scan is started. It is legal to use the same array name for the registration of multiple properties. In this case, each match appends a new list element for every reported property, though these lists will not be nested.
{table {E_NAME name} {E_CAS casno} record}
sets up a table with three columns called name , casno and record . The first two columns contain property data from the matching file records, the last one the record in the file which matched.
Instead of the keyword table , an existing table handle may also be used. In that case, any existing matching table columns are automatically re-used to store result data. Additionally specified properties are added as new columns to the right of the previously existing columns. New table rows generated by matches are appended to the bottom of the table.
If requested property data is not present on the matched dataset objects, an attempt is made to compute it. If this fails, the table object in retrieval mode table contains
NULL
cells, and property retrieval as list data produces empty list elements, but no errors. For minor object properties, the property list retrieval modes produce lists of all object property values instead of a single value. In
table
mode, only the data for the first object is retrieved, which makes this mode less suitable for direct minor object property retrieval.
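The append behavior of the array result mode, and the reason for clearing the array before a fresh scan, can be reproduced in pure Tcl. The sketch below is a model only: registerMatch is a hypothetical helper and the name/record pairs are made-up match data.

```tcl
# Model of how array-mode results accumulate. The match data is made up.
proc registerMatch {arrayName key value} {
    # Global arrays persist across calls, so values for an existing key
    # are appended as list elements rather than overwritten.
    upvar #0 $arrayName arr
    lappend arr($key) $value
}

# Start from a clean slate before a fresh scan.
unset -nocomplain ::name2rec

foreach {name rec} {benzene 1 toluene 2 benzene 7} {
    registerMatch name2rec $name $rec
}

puts $::name2rec(benzene)    ;# 1 7
puts $::name2rec(toluene)    ;# 2
```

Without the unset, a second pass over the same data would leave benzene with four list elements instead of two.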
The following pseudo properties can be retrieved in addition to normal properties:
molfile scan
. It is always an empty string in this command.
match ss
command). These pseudo properties are identical to those available for structure file queries. However, structure file queries support a couple of additional pseudo properties which are not available for dataset queries.
dataset scan $dhandle {E_WEIGHT < 200} recordlist
dataset scan $dhandle "structure >= c1ccccc1" {table E_NAME E_LOPG record}
dataset scan $dhandle "structure >~ $sshnd 90" {cmpvalue E_REACTION_ROLE X_IDENT}
The first example returns the record numbers (dataset member indices plus one) of all structures in the dataset which have a molecular weight of less than 200.
The second example generates a table with columns for name, logP and record number. The table is filled with data from all structures which contain a phenyl ring as substructure.
The final example returns a nested list of the properties of all dataset structures which have a Tanimoto similarity of 90% or more to the structure which is represented by its handle stored in the variable
$sshnd
. In this example, the ensembles are expected to be also part of a reaction, which is possible since reaction and dataset membership are completely unrelated. Each result list element contains the actual similarity value (which is the only comparison result value with a threshold evaluated in the query, so there is no ambiguity which comparison result
cmpvalue
refers to), the role of the ensemble in the reaction (
reagent
,
product
,
catalyst
, etc.) from property
E_REACTION_ROLE
, and the reaction ID in
X_IDENT
. The scan mode is here automatically set to
propertylist
, because the mode list consists exclusively of names of properties and pseudo properties.
set is_chno [dataset scan $ehandle {formula = C0-H0-N0-O0-} count]
This command checks whether the ensemble (which is, for the duration of the command, embedded into a transient dataset) contains only elements C, H, N and O.
dataset set dhandle property value ?property value?...
Standard data manipulation command. It is explained in more detail in the section about setting property data.
In addition to property data, the dataset object possesses a few attributes, which can be retrieved with the
get
command (but not its related sister subcommands like
dget
,
sqlget
, etc.). Many of them are also modifiable via
dataset set.
These attributes are:
ens move
,
dataset add
, etc.) throws an error. If the object added to a dataset is a dataset, but the dataset does not accept datasets as members, the objects contained in the source dataset are added instead.
dataset read
command. The default batch size is 10.
dataset wait
command, or the use of the dataset object as argument to a scripted computation function expecting to be able to set function result data as property values, the dataset is marked as undeletable and any destruction command will silently fail.
dataset pop
and
dataset wait
commands to determine whether they should continue to wait or exit with an empty result. The initial value of this attribute is zero.
dataset wait
command uses this threshold as default command parameter.
Additional insertion control modes are disabled (all insertions into the dataset are blocked), discardfirst (if the maximum size has been reached, delete first object in dataset to make room), discardlast (if the maximum size has been reached, delete last object in dataset to make room), discardobject (if the maximum size has been reached, delete the object to be inserted), discardalways (never attempt an actual insertion, always delete the insertion object), ignore (if insertion cannot be performed, leave the insertion object where it currently is, with preservation of current dataset membership) and unlink (silently remove the insertion object from its old dataset, if it is a member of one, but do not insert it into the target dataset if that would exceed its maximum size).
If the object cannot be inserted and is deleted (but not if it is just unlinked or ignored, and thus continuing to exist) the dataset counter is still incremented.
The final mode is discardrandom . In this mode, if the maximum size of the dataset has not yet been reached, the object is simply added. Otherwise, a random number between one and the counter attribute of the dataset is computed. If the number is larger than the maximum dataset size, the object to be inserted is deleted, as in the discardnew mode. If the random number is between one and the dataset size, the object in the dataset at the random position is deleted. After that, the new object is inserted at its designated position, which is not necessarily the position of the removed object. This mode is intended to support convenient sampling of object subsets. The random procedure yields the same mathematical results as directly picking random objects from the total object pool passing through the dataset, but may be interrupted at any time, yielding a random subset of the objects processed so far.
dataset dataset
command), which can be changed (see
dataset move
command and the
accept
dataset attribute). This attribute is read-only. An embedded dataset object cannot be dissociated from its owner.
On setting,
dataset set
first clears all dataset object selections. The command
dataset append
retains it. The argument is then parsed as a list of integer object indices, and the selection flag is set for all those indices where objects can be found in the dataset. Indices outside the range between zero and the dataset size minus one or duplicate index specifications are silently ignored.
To check or set the selection status of the dataset object proper, use the selected attribute.
dataset count
command without any filters.
ens swapout
or
reaction swapout
. The size check is performed at the moment new objects are added, and these new objects are the first to be swapped. The default value for this attribute can be set in the control array element
::cactvs(dataset_swap_threshold)
. Its initial value is 10000. The default value for the embedded datasets in tables is controlled separately by
::cactvs(table_swap_threshold)
, which is also initially set to 10000. If this value is set to a negative value, all dataset elements which are currently swapped out are loaded back in. If it is set to a positive value, and the number of not currently swapped out objects of the dataset is more than the new limit, excess objects are swapped beginning from the end of the dataset queue until the in-memory object count of the dataset satisfies the new constraint. If the limit is increased, but not set to a negative unlimited value, the object swap status is not modified.
dataset addthread
command). Datasets without threads return an empty list. The handles are compatible with the standard
Tcl
thread package. Remote communication listener threads (see port attribute) are independent of
Tcl
support, do not have a
Tcl
handle, and are not listed by this command.
dataset wait
command. A negative value means an infinite wait period, and zero no wait period. The default setting is minus one.
dataset set $dhandle D_NAME "New lead structures"
dataset set $dhandle E_NAME "Lead (metal)"
The first line is a simple set operation for a dataset property. The second line shows how to set properties of multiple ensembles in one step. The same property value is assigned to all ensembles.
dataset set $dhandle port 10001 passphrase blockbuster
Set up a listener thread on port 10001 which accepts connections from remote interpreters which need to present the pass phrase as credential. Remote interpreters can add (
ens move
,
reaction move
,
table move
) or remove (
dataset pop
) objects to or from this dataset, as well as query the dataset object count (
dataset count
). Objects are transferred over the network connection as serialized objects to and from the remote interpreters.
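The discardrandom insertion mode described among the attributes above corresponds to the classic reservoir sampling scheme. The following pure-Tcl model illustrates it on a plain list. It is a sketch under stated assumptions: insertDiscardRandom is a hypothetical helper, the counter is assumed to be incremented before the random number is drawn, and the new object is simply appended rather than placed at a designated position.

```tcl
# Model of the discardrandom insertion mode on a Tcl list (reservoir
# sampling). Hypothetical helper, not the toolkit implementation.
proc insertDiscardRandom {poolVar counterVar maxsize obj} {
    upvar 1 $poolVar pool $counterVar counter
    # Assumption: the counter includes the object being offered.
    incr counter
    if {[llength $pool] < $maxsize} {
        lappend pool $obj
        return
    }
    # Random integer between one and the number of objects seen so far.
    set r [expr {int(rand() * $counter) + 1}]
    if {$r > $maxsize} {
        return          ;# the new object is discarded
    }
    # Drop the member at the random position, then add the new object.
    set pool [lreplace $pool [expr {$r - 1}] [expr {$r - 1}]]
    lappend pool $obj
}

set pool {}
set counter 0
for {set i 0} {$i < 1000} {incr i} {
    insertDiscardRandom pool counter 10 "obj$i"
}
puts "kept [llength $pool] of $counter objects: $pool"
```

Stopping the loop at any earlier point leaves a uniform random sample of the objects offered so far, which is the property the mode description highlights.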
dataset setparam dhandle property key value ?key value?...
Set or update a property computation parameter in the parameter list of a valid property. This command is described in the section about retrieving property data.
dataset setparam $dhandle D_GIF comment "Top Secret"
dataset show dhandle propertylist ?filterset? ?parameterlist?
Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.
For examples, see the
dataset get
command. The difference between
dataset get
and
dataset show
is that the latter does not attempt computation of property data, but raises an error if the data is not present and valid. For data already present,
dataset get
and
dataset show
are equivalent.
dataset sort dhandle {property ?direction? ?cmpflags?}...
Sort a dataset according to property values of the objects in the dataset. If no sort property set is specified, the default sort properties are E_NATOMS (number of atoms) and, for breaking ties, E_WEIGHT (molecular weight) and finally E_HASHISY (stereo isotope hash code).
Every sort item is interpreted as a nested list and can have from one to three elements. The first, mandatory element is the sort property, or one of the magic names
record
or
random
. The next optional element is the sort direction, specified as
up
(or
ascending
) or
down
(
descending
). The default sorting order is ascending. The final optional comparison flags parameter can be set to a combination of any of the values allowed with the
prop compare
command. The default is an empty flag set. Properties in the sort list have precedence in the order they are specified in. Object property values of comparison list entries to the right in this list are only considered if the comparison of all data values of list elements to the left results in a tie.
The magic property name record sorts by the object index in the dataset. Sorting upwards on this property does not change the object sequence in the dataset, and sorting downwards reverses it. This pseudo property is always added as a final implicit criterion, so that the sequence order of objects tied in all explicit comparisons is preserved. The other magic property name random assigns a random value to all dataset objects and sorts on this value, yielding a random object sequence.
The command returns a list of the handles of the objects controlled by the dataset in the newly sorted order. Simultaneously, the objects are physically moved within the dataset, so the sort has a persistent effect. The same result list may later be obtained by a
dataset objects
command.
It is possible to sort transient datasets, but this makes sense only if the object list sequence returned as command result is captured and used later, because the sort effect is not persistent since there exists no permanent dataset object.
dataset sort $dhandle {E_NAME up {ignorecase lazy}}
The example sorts the dataset according to the compound name (property E_NAME , data type string) in alphabetic order, using a lazy (ignoring whitespace and punctuation) and case-insensitive comparison mode.
dataset sort $dhandle {E_NATOMS down} {E_NRINGS up}
Sort the dataset in such a way that the ensembles with the largest number of atoms, and among these those with the smallest number of rings, come first.
dataset sort $dhandle random
This command randomizes the object order in the dataset.
dataset sort $dhandle {*}$sortlist
This is the recommended construct when using a sort property list stored in a
Tcl
variable as command argument. Older versions of the
dataset sort
command used a single sort argument parameter instead of a variable-size argument set.
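The precedence rule for multiple sort items (later items only break ties among earlier ones, with the original sequence as the final tie-breaker) can be modeled with stable lsort passes in pure Tcl. This is an illustration only. The records and their values are made up, and each record is assumed to be a list of handle, atom count and ring count.

```tcl
# Model of multi-key sorting with precedence, using stable lsort passes.
# Each record is {handle natoms nrings}; the data is made up.
set records {
    {ens0 10 2}
    {ens1 12 3}
    {ens2 12 1}
    {ens3 10 1}
}

# lsort is stable, so sorting by the least significant key first and the
# most significant key last gives the left-to-right precedence.
set byRings [lsort -integer -increasing -index 2 $records]
set sorted  [lsort -integer -decreasing -index 1 $byRings]

set handles {}
foreach rec $sorted {
    lappend handles [lindex $rec 0]
}
puts $handles    ;# ens2 ens1 ens3 ens0
```

The model corresponds to the sort items {E_NATOMS down} {E_NRINGS up}: the largest atom counts come first, and among equal atom counts the smallest ring count wins.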
dataset sqldget dhandle propertylist ?filterset? ?parameterlist?
Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.
For examples, see the
dataset get
command. The differences between
dataset get
and
dataset sqldget
are that the latter does not attempt computation of property data, but initializes the property value to the default and returns that default, if the data is not present and valid; and that the
SQL
command variant formats the data as
SQL
values rather than for
Tcl
script processing.
dataset sqlget dhandle propertylist ?filterset? ?parameterlist?
Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.
For examples, see the
dataset get
command. The difference between
dataset get
and
dataset sqlget
is that the
SQL
command variant formats the data as
SQL
values rather than for
Tcl
script processing.
dataset sqlnew dhandle propertylist ?filterset? ?parameterlist?
Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.
For examples, see the
dataset get
command. The differences between
dataset get
and
dataset sqlnew
are that the latter forces re-computation of the property data, and that the
SQL
command variant formats the data as
SQL
values rather than for
Tcl
script processing.
dataset sqlshow dhandle propertylist ?filterset? ?parameterlist?
Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.
For examples, see the
dataset get
command. The differences between
dataset get
and
dataset sqlshow
are that the latter does not attempt computation of property data, but raises an error if the data is not present and valid, and that the
SQL
command variant formats the data as
SQL
values rather than for
Tcl
script processing.
dataset statistics dhandle property
Get basic statistics on the property values of the objects in the dataset. The property can be a basic property or a property subfield, but its element data type needs to be cast-able to a simple numeric type. In addition, it must be directly attached to any of the objects which can be members of a dataset, e.g. an ensemble property, but not an atom property.
If the property data is not present on any of the objects, an attempt is made to compute it. In case that fails, or a dataset member object is not of a matching type, these objects are silently skipped.
The return value is a list containing, in this order, the number of objects in the dataset which were used for the statistics, the property value sum, the property value average and the property data standard deviation. The latter three values are floating point, regardless of the property data type. In case any of these values is not computable, for example because there was an insufficient number of objects, the reported value is zero.
The command verb can be abbreviated as stats .
lassign [dataset statistics $dh E_WEIGHT] n sum avg stddev
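Since values which cannot be computed are reported as zero, a script may want to inspect the object count before relying on the other fields. The following plain Tcl sketch shows one way to do this. The helper name describeStats is hypothetical, and the literal lists stand in for the result of a dataset statistics call so that the snippet runs without the toolkit. The numbers are made up for illustration.

```tcl
# Hypothetical helper: turn the four-element result list of
# "dataset statistics" into a readable summary.
proc describeStats {statlist} {
    lassign $statlist n sum avg stddev
    if {$n == 0} {
        # No object contributed, so the zeros carry no information.
        return "no usable objects"
    }
    return [format "n=%d sum=%.2f avg=%.2f stddev=%.2f" $n $sum $avg $stddev]
}

# With the toolkit, the argument would be [dataset statistics $dh E_WEIGHT].
puts [describeStats {3 90.0 30.0 5.0}]
puts [describeStats {0 0.0 0.0 0.0}]
```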
dataset subcommands
Lists all subcommands of the dataset command. Note that this command does not require a dataset handle.
dataset tables dhandle ?filterset? ?filtermode? ?recursive?
Return a list of all the tables in the dataset. Other objects in the dataset (ensembles, reactions, datasets, networks) are ignored. The object list may optionally be filtered by the filter list, and the result further modified by a standard filter mode argument.
If the recursive flag is set, and the dataset contains other datasets as objects, tables in these nested datasets are also listed.
set n [dataset tables $dhandle {} count]
dataset taint dhandle propertylist/changeset ?purge?
Trigger a property data tainting event which acts on the dataset data, and all objects and their data contained in the dataset.
The parameters of this command are the same as for
ens taint
and explained there.
dataset taint $dhandle A_XYZ
All property data on the dataset and the dataset members is invalidated if it directly or indirectly depends on the 3D atomic coordinates.
dataset transform dhandle SMIRKSlist ?direction? ?reactionmode?
?selectionmode? ?flags? ?overlapmode? ?{?exclusionmode? excludesslist}? ?maxstructures? ?timeout? ?maxtransforms? ?niterations?
This command is complex, but very similar to the
ens transform
command. Please refer to that command for a full description of the command arguments. The major difference is that the start structure set is not a single ensemble, but rather the set of all ensembles in the dataset. Any dataset items which are not ensembles are ignored. The return value is, just as with the
ens transform
command, a list of result ensembles. These do not become part of the input dataset.
dataset transform [ens get $ehandle E_KEKULESET] $trafolist bidirectional \
multistep all {preservecharges checkaro setname}
This command first expands an ensemble object into a set of Kekulé structures. The property data type of the E_KEKULESET property is a dataset, so its handle is returned, and this dataset is then submitted for further transformation, which in this case involves manipulations of bonds in aromatic systems and thus is dependent on the Kekulé structures of the input ensembles.
The dataset variant of the transform command does not allow the use of marked or unmarked atom or bond specifications in the exclusion substructure list. Normal substructures are supported, and are applied to all start structures.
dataset unique dhandle {property ?direction? ?cmpflags?}..
This command removes duplicate objects from the dataset and destroys them. Object identity is determined by pair-wise comparison of one or more properties. If all these properties are identical for any two objects, one of them is deleted. If no properties are specified, the default is the single property E_HASHISY , the standard isotope- and stereo-aware ensemble hash code.
The command returns the ordered list of objects remaining in the dataset after deletion. The command is closely related to the
dataset sort
command, and the same restrictions on usable sort properties apply. Internally, the command performs a sort first, in order to avoid a quadratic growth of pair-wise comparisons. This has the side effect that the object order in the dataset is not preserved. Instead, the surviving objects are listed in ascending (by default) or descending (if the corresponding optional sort direction argument is set accordingly) values of the sort properties. The interpretation of the optional comparison flags and sort direction arguments, as well as the priority of the properties, and the special considerations when working on transient datasets, are the same as for the command
dataset sort
.
molfile read $fh $dh all
dataset unique $dh
This example first reads a complete file into a dataset, and then discards duplicates, using the default isotope- and stereo-aware structure hash code.
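The sort-then-compare strategy, and the reason why it changes the object order, can be illustrated in plain Tcl. This is a simplified model and not the toolkit's code: the records are hypothetical {handle hashcode} pairs, and the choice to keep the first object of each group of duplicates is an assumption made for this sketch.

```tcl
# Hypothetical stand-ins for dataset members: {handle hashcode}.
set records {
    {ens0 c3}
    {ens1 a1}
    {ens2 c3}
    {ens3 b2}
    {ens4 a1}
}

# Step 1: sort on the comparison key so that duplicates become neighbours.
set sorted [lsort -index 1 $records]

# Step 2: a single linear pass. A record is kept only if its key differs
# from the key of the previously kept record (assumption: the first
# member of each duplicate group survives).
set survivors {}
set havePrev 0
set lastkey {}
foreach rec $sorted {
    lassign $rec handle key
    if {!$havePrev || $key ne $lastkey} {
        lappend survivors $handle
        set lastkey $key
        set havePrev 1
    }
}
puts $survivors
```

The surviving handles come out in ascending key order (ens1, ens3, ens0) rather than in their original sequence, which corresponds to the side effect described above. Only neighbouring records are compared after the sort, instead of every possible pair.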
dataset unlock dhandle propertylist/dataset/all
Unlock property data for the dataset object, meaning that they are again under the control of the standard data consistency manager.
The property data to unlock can be selected by providing a list of the following identifiers:
Property data locks are obtained by the
dataset lock
command.
This command does not recurse into the objects contained in the dataset.
The return value is the dataset handle, or, if the argument was a transient dataset, an empty string.
dataset unpack string
Generate a dataset complete with all elements it contains from a packed, base64-encoded serialized object string, as it is generated by the complementary dataset pack command.
The return value is the handle of the new dataset. All objects in this dataset are also assigned standard handles, which can be retrieved with the usual commands such as
dataset ens
and
dataset reactions
.
Note that this command does not take a dataset handle as argument, but a pack string.
dataset unpack [dataset pack $dhandle]
This example is effectively the same as a
dataset dup
operation, but of course less efficient, because the objects have to be serialized, compressed, and base64-encoded and the same sequence of operations run backward again.
dataset valid dhandle propertylist
Returns a list of boolean values indicating whether values for the named properties are currently set for the dataset. No attempt at computation is made.
dataset valid $dhandle D_NAME
reports whether the dataset is named (has a valid D_NAME property) or not.
dataset wait dhandle ?size|query? ?script?
Suspend the interpreter until the number of objects in the dataset has reached a threshold, or an object which satisfies a query expression can be found. The syntax of query expressions is the same as in the
dataset scan
command. If neither an explicit size nor a query expression is specified, or an empty string is passed as this parameter, the command uses the value of the
highwatermark
dataset attribute as default value for an implicit size threshold condition.
Another dataset attribute which has an influence on the execution of the command is the timeout attribute. If the dataset size has not grown to the required size, or no object which satisfies the query expression was added to the dataset after waiting for the timeout number of seconds, an error is raised. By default, the maximum wait period is indefinite, which corresponds to a negative timeout value. If the timeout value is set to zero, the wait condition must be met immediately, or an error results. However, no error is raised if the eod/targeteod dataset parameter pair indicates that no more data can be expected to be added to the dataset. In that case, the result is an empty string.
If no script body parameter is used, the return value of the command is the number of objects the dataset holds in case of an explicit or implicit size condition, or the handle of the first matching object in case of a query expression.
If the object count already exceeds the threshold, or a matching object can be found at the moment the command is executed, the command returns immediately.
In the presence of a script parameter, the script body is executed whenever the wait condition is met. If the script is ended with a continue statement, or simply reaches the end of the code block, the wait loop is automatically restarted. If the script reports an error, or is left via a break or return statement, the loop is terminated.
This command is mostly useful when running multi-threaded scripts, or when the dataset is operating a remote command listener on a port. Under these circumstances, new objects may arrive in the dataset without participation of the local, stopped interpreter.
While a
dataset wait
command is pending, the dataset cannot be deleted. Since it is possible that other threads or remote action port monitors further update the dataset between the time the wait condition is met and script processing commences, action scripts should be prepared to see more or fewer items in the dataset than immediately after the trigger event.
loop n 1 $nrecs {
    set eh [dataset wait $dh "E_FILE(startrec) = $n"]
    molfile write $fh $eh
    ens delete $eh
}
This is part of a simple write thread which writes back processed ensembles in the same order as they were read from an input file. If there are multiple processing threads, the computation on an ensemble read from a later input file record may well finish before that on an ensemble with a smaller record number, so the sequence in which the ensembles are delivered to the output queue can get out of sync. By waiting for ensembles in the input record sequence, the original order is preserved. More robust versions of such a script should handle the case of ensembles from a specific input record never appearing in the dataset, and similar sources of disruption.
dataset weed dhandle keywords
This command performs standard clean-up operations on all ensembles and reactions in the dataset. The supported operations are described in more detail in the section on the equivalent
ens weed
command.
dataset xlabel dhandle propertylist ?filterset? ?filterprocs?
This command is rather complex and closely related to the
dataset extract
command. Its purpose is to extract handle and label information for selected subsets of the dataset. The return value is a nested list. The sublists consist of the object handle, the object label (if the object does not have a label, 1 is substituted), and the dataset object index. The dataset object index starts with zero.
The selection of the class of objects which are extracted is performed indirectly via the property list. For practical purposes, this list should be a single property. Its object association type determines the class of objects selected. For example, A_LABEL or A_SYMBOL returns atom labels, while B_ORDER returns bond labels and E_NAME selects complete ensembles (with 1 as pseudo ensemble label).
The returned objects can further be filtered by a standard filter set, and additionally by a list of callback procedures. These Tcl script procedures are called with the respective object handles and object labels as arguments. For example, a callback procedure used in an atom retrieval context would be called for each atom with its ensemble handle and the atom label as arguments. If objects without a label are checked, such as complete ensembles, 1 is passed as the label. The callback procedures are expected to return a boolean value. If it is false or 0, the object is not added to the returned list, and the other check procedures are no longer called.
The command currently only works on ensembles in the dataset, ignoring any reactions, tables, datasets or networks which may be present.
This command is primarily useful for the display of filtered minor object data from datasets, such as atom property values for specific types of atoms.
set dhandle [dataset create [ens create O] [ens create C=C]]
dataset xlabel $dhandle A_LABEL !hydrogen
dataset xlabel $dhandle B_ORDER doublebond
First, a dataset with two ensembles (water and ethene) is created. This dataset is then queried. The first query is for all atoms in it which are not hydrogen. The returned list is
{ens0 1 0} {ens1 1 1} {ens1 2 1}
In object ens0 , which is the first object in the dataset, atom 1 passes the filter. In object ens1 , which is the second object in the dataset, atoms with label 1 and 2 pass. The second query asks for the labels of double bonds in the dataset. The use of property B_ORDER is arbitrary - any other bond property would do as well. The return value of this command is
{ens1 1 1}
which indicates that only the bond with label 1 in object ens1 , which is the second object in the dataset, fulfills this condition.
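The nested result list can be processed with ordinary Tcl list commands. As a minimal sketch, the snippet below groups the labels by object handle. The literal list is copied from the first query of the example so that the code runs without the toolkit, and the variable names are chosen only for this illustration.

```tcl
# Result list copied from the first query above. With the toolkit it would
# come from: set result [dataset xlabel $dhandle A_LABEL !hydrogen]
set result {{ens0 1 0} {ens1 1 1} {ens1 2 1}}

# Each sublist holds the object handle, the object label and the
# zero-based dataset object index. Group the labels by handle.
set byhandle [dict create]
foreach item $result {
    lassign $item handle label index
    dict lappend byhandle $handle $label
}

dict for {handle labels} $byhandle {
    puts "$handle: $labels"
}
```

For the example data this collects label 1 under ens0 and labels 1 and 2 under ens1.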