Substructure Match Commands

Substructure and reaction substructure matching is a complex functionality. While a limited number of object commands are supplied (ens match, mol match, etc.), comprehensive match functionality is accessible via special commands. The match command implements the various structure matching operations.

The match command

The match command matches substructures, maximum common substructures, and reactions. Its syntax scheme is

match ss ?-option value?... ss_spec st_spec ?atommatchvar? ?bondmatchvar? ?molmatchvar?
match mcss ?-option value?... ss_spec st_spec ?atommatchvar? ?bondmatchvar? ?molmatchvar?
match rss ?-option value?... rxnss_spec rxn_spec ?reagent_atommatchvar? ?reagent_bondmatchvar? ?reagent_molmatchvar? ?product_atommatchvar? ?product_bondmatchvar? ?product_molmatchvar?
match(class='ss'/'mcss'/'rss', substructure=, structure=, ?align=?, ?allowmissingstereo=?, ?anchor=?, ?atomenvproperty=?, ?atomhighlight=?, ?atomlistcomparison=?, ?atommapproperty=?, ?atommatchcount=?, ?bondenvproperty=?, ?bondhighlight=?, ?bondmapproperty=?, ?burnmode=?, ?chain=?, ?charge=?, ?clearatomhighlight=?, ?clearbondhighlight=?, ?cmpflags=?, ?cmpflags2=?, ?command=?, ?creategroup=?, ?daylightaromaticity=?, ?excludeenvironment=?, ?excludeflags_ss=?, ?excludeflags_st=?, ?excludeh=?, ?excludelabels_ss=?, ?excludelabels_st=?, ?excludelabels_st_root=?, ?excludestructures=?, ?exclude_ss_h=?, ?exclude_st_h=?, ?fixedframework=?, ?forceringmatch=?, ?formula=?, ?fuzz=?, ?generalize=?, ?heavyatoms=?, ?hinstances_ss=?, ?implicit_is_singlearo=?, ?includeflags_ss=?, ?includeflags_st=?, ?includelabels_ss=?, ?includelabels_st=?, ?isotope=?, ?kekule=?, ?limit=?, ?mapanchor=?, ?maxopenlinks=?, ?mode=?, ?multihighlight=?, ?noaliphaticonaro=?, ?noarofg=?, ?nochainonaro=?, ?nochainonring=?, ?nodoubleonaro=?, ?noheterofg=?, ?nomultibondfg=?, ?nosingleonaro=?, ?nosuperatommultilink=?, ?nosuperatomonh=?, ?omitrecursion=?, ?openhcount=?, ?overlap=?, ?pionaro=?, ?queryatomexpansion=?, ?remembercomplexmatches=?, ?restrictsubstitution=?, ?rotateterminals=?, ?stereo=?, ?strictexclusion=?, ?strictsmarts=?, ?superatomexpansion=?, ?superatommapproperty=?, ?tautomers=?, ?terminal=?, ?timeout=?, ?transferstereo=?, ?useatomtree=?, ?useatomtype=?, ?usebondorder=?, ?usebondtree=?, ?varbondglobal=?, ?varbondlocal=?, ?wedge=?, ?atommatchvariable=?, ?bondmatchvariable=?, ?molmatchvariable=?, ?product_atommatchvariable=?, ?product_bondmatchvariable=?, ?product_molmatchvariable=?)

The first word defines the match class and also the type of the expected structure or reaction arguments. Match class ss performs a substructure match operation, class mcss a simple maximum common substructure match operation, and class rss a reaction substructure match. In the latter case, the substructure and structure arguments are expected to be reaction substructures (for example, SMIRKS) and reaction handles or specifications (for example, reaction SMILES) instead of simple structures.

Structure and substructure may both be independently specified in several different formats.

As an ensemble handle or reference:

match ss $ss_handle $st_handle

For reaction substructure matching, the argument should be a reaction handle or reference.

As a list of an ensemble handle and a molecule label, which restricts the match to that molecule of the ensemble:

match ss $ss_handle [list $st_handle 1]

This mode is not applicable to reaction substructure matching.

As a SMILES or SMARTS string:

match ss c1ccccc1 $st_handle

For reaction substructure matching, the argument is expected to be a reaction SMILES or SMIRKS string instead.

The return value of the command is the number of successful matches. For simple match modes, which return only a match/nomatch result, this is 0 or 1, but modes which can produce multiple matches may return higher counts.

The final three optional parameters are names of variables which receive atom, bond and molecule mapping information. If these parameters are not supplied, or a variable name is given as an empty string (or None in Python), no variable is created or modified. If a variable is specified but no match is found, the map variables are set to empty strings. For reaction matching, there are separate variable sets for the reagent and product sides. In Python reaction matching, the standard three variables apply to the reagent side, while the product side is addressed with an explicit argument name prefix, i.e. atommatchvariable= vs. product_atommatchvariable=. The variable named in parameter atommatchvariable is used both for normal substructure and reagent-side reaction match information.

For match modes which can only return a single match, these map variables are simple nested lists. Each list element contains a substructure and a structure object label or reference, in this order. The number of elements in the result list corresponds to the number of substructure objects. Example:

match ss CN CCN amap bmap mmap

The variable amap is set to “{1 2} {2 3}”, which is the first match of the C-N substructure fragment on the ethylamine structure. The numbers are atom labels; in case of SMILES strings, atom labels are assigned in the order the atoms appear in the string. The bmap variable is set to “{1 2}”, since only a single bond is involved: the first bond of the substructure matches the second bond of the structure. Finally, the mmap variable is set to “{1 1}”, because both substructure and structure contain only a single molecule, which was assigned the default label 1. The bond and molecule map results are still nested lists, even if in this simple example they look like plain lists.

There is no guarantee that the lowest possible labels are used for a simple match: the match algorithm uses internal optimizations to choose good start atoms. Matches should not be expected to start with the atom with the lowest label. Match result variables are filled in the order of the objects in the internal object lists, which is also not necessarily an ascending label sequence.

These nested result lists can easily be transformed to a Tcl array with a statement like

array set array_amap [join $amap]

The array variable array_amap now contains elements which are named with the substructure labels and have values which correspond to the structure labels. The unzip command is also useful to isolate the substructure or structure label sets.
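In Python the same reshaping can be done with a dict and zip(). The sketch below assumes that a single-match atom map is available as a list of (substructure label, structure label) pairs, mirroring the Tcl nested list; the actual return type of the Python interface is not specified here, so treat this layout as an assumption.

```python
# Assumed layout: single-match atom map as (substructure label, structure label)
# pairs, e.g. the result of "match ss CN CCN amap" described above.
amap = [(1, 2), (2, 3)]

# Counterpart of "array set array_amap [join $amap]":
# keys are substructure labels, values are structure labels.
array_amap = dict(amap)

# Counterpart of unzipping the pairs into the two label sets.
ss_labels, st_labels = zip(*amap)

print(array_amap)            # {1: 2, 2: 3}
print(ss_labels, st_labels)  # (1, 2) (2, 3)
```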

In case a match mode is invoked which can return more than one match, the map variables are constructed with an additional nesting level. They are a list, where each element describes one match. Each of these per-match elements is formatted as in the above description of simple match results. Note that the actual number of reported matches does not influence the scheme: if there is a theoretical possibility that more than one match can be found, the maximum nesting level is 3, not 2, even if only a single match is finally found.

Example:

set nmatches [match ss -mode distinct CC CCC amap]

Here, the match count is 2 (the distinct mode reports matches which differ by at least one structure atom from any previous match; the all mode reports 4 matches, which include reversals of the CC fragment), and the amap variable is set to “{{1 1} {2 2}} {{1 2} {2 3}}”. The first match maps substructure atom 1 on structure atom 1, and 2 on 2; the second match maps substructure atom 1 on structure atom 2, and substructure atom 2 on structure atom 3.
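To make the difference between the all and distinct modes, and the extra nesting level, concrete, here is a brute-force plain-Python sketch on toy chain molecules. It is not the toolkit's matcher (which optimizes start atoms and need not report ascending labels), the helper names are hypothetical, and reading the distinct rule as "the structure atom set differs from that of every previously kept match" is an assumption that reproduces the counts quoted above.

```python
from itertools import permutations


def chain(n, element="C"):
    """Toy linear molecule: atom label -> element, plus a set of bonds."""
    atoms = {i: element for i in range(1, n + 1)}
    bonds = {frozenset((i, i + 1)) for i in range(1, n)}
    return atoms, bonds


def all_matches(ss, st):
    """Brute-force every injective atom mapping of ss onto st (like mode all)."""
    ss_atoms, ss_bonds = ss
    st_atoms, st_bonds = st
    ss_labels = sorted(ss_atoms)
    found = []
    for images in permutations(sorted(st_atoms), len(ss_labels)):
        m = dict(zip(ss_labels, images))
        if any(ss_atoms[a] != st_atoms[m[a]] for a in ss_labels):
            continue  # element mismatch
        if all(frozenset(m[a] for a in bond) in st_bonds for bond in ss_bonds):
            # One match = list of (substructure label, structure label) pairs.
            found.append([(a, m[a]) for a in ss_labels])
    return found


def distinct_matches(matches):
    """Keep a match only if its structure atom set differs from every kept one."""
    kept, seen = [], []
    for match in matches:
        atoms = {st for _, st in match}
        if atoms not in seen:
            kept.append(match)
            seen.append(atoms)
    return kept


every = all_matches(chain(2), chain(3))
amap = distinct_matches(every)  # one more nesting level than a single match
print(len(every), len(amap))  # 4 2
print(amap)  # [[(1, 1), (2, 2)], [(1, 2), (2, 3)]]
```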

The match ss command has a large number of options, which can be used to fine-tune the matching process. In the Tcl version, any number of options, in any order, may be inserted before the substructure specification. In the Python version, they are specified as named arguments. Every argument after the structure must use the named argument style.

This is the list of options:

-align

-align none/rotate/redraw/xaxis/yaxis/diagonal/combined

If a match was successful, change the layout of the structure by modifying the A_XY atomic 2D coordinates property.

Mode none, which is the default, does not perform any 2D coordinate changes.

In modes xaxis, yaxis and diagonal, the coordinates of the matched structure atoms are extracted and their principal axis (the eigenvector with the largest eigenvalue) is computed. The structure is then rotated so that this axis is aligned with the x-axis, the y-axis, or the diagonal (lower left to upper right). The coordinates of the substructure atoms are not used.
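The eigenvector step can be sketched in plain Python for 2D coordinates, using the closed-form orientation of the major axis of a 2x2 covariance matrix. The function names and the choice to rotate about the origin are assumptions of this sketch; the toolkit may use a different rotation center, which does not change the resulting orientation.

```python
import math


def principal_axis_angle(points):
    """Orientation (radians) of the largest-eigenvalue eigenvector of the 2D covariance."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Closed form for the major axis of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    return 0.5 * math.atan2(2 * sxy, sxx - syy)


def align(all_xy, matched_xy, target="xaxis"):
    """Rotate all coordinates (about the origin) so the matched atoms' axis hits the target."""
    target_angle = {"xaxis": 0.0, "yaxis": math.pi / 2, "diagonal": math.pi / 4}[target]
    a = target_angle - principal_axis_angle(matched_xy)
    c, s = math.cos(a), math.sin(a)
    return [(c * x - s * y, s * x + c * y) for x, y in all_xy]


# Matched atoms on a 45-degree line; after xaxis alignment their y values vanish.
pts = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
print(align(pts, pts, "xaxis"))
```

Only the matched atoms determine the angle, but every atom of the structure is rotated, which matches the description that substructure coordinates play no role in these modes.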

In rotate mode, the structure is rotated in steps of 15 degrees, with and without a flip. The orientation which is in best alignment with the coordinates of the matched substructure atoms is retained.
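A plain-Python sketch of this search follows. The text does not define "best alignment"; the sum of squared distances between paired atoms after centering both point sets is an assumed scoring function, the flip is modeled as a mirror across the y-axis applied before the rotation, and the function name is hypothetical.

```python
import math


def _centered(points):
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return [(x - cx, y - cy) for x, y in points]


def best_rotation(st_xy, ss_xy):
    """Try 24 rotations (15-degree steps), each with and without a mirror flip.

    st_xy[i] and ss_xy[i] are the coordinates of the i-th matched atom pair.
    Returns (angle_in_degrees, flipped) with the smallest squared deviation.
    """
    st_c, ss_c = _centered(st_xy), _centered(ss_xy)
    best = None
    for flip in (False, True):
        pts = [(-x, y) if flip else (x, y) for x, y in st_c]
        for step in range(24):
            a = math.radians(15 * step)
            c, s = math.cos(a), math.sin(a)
            score = sum((c * x - s * y - u) ** 2 + (s * x + c * y - v) ** 2
                        for (x, y), (u, v) in zip(pts, ss_c))
            if best is None or score < best[0] - 1e-12:
                best = (score, 15 * step, flip)
    return best[1], best[2]


ss = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
st = [(-y, x) for x, y in ss]  # the same atoms, rotated by +90 degrees
print(best_rotation(st, ss))  # (270, False)
```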

In redraw mode, the structure is completely redrawn, using the coordinates of the matching substructure atoms as starting point. The rest of the structure is drawn around it. The matched structure atoms possess the same coordinates as the matching substructure atoms.

There are some limitations in this mode, which are automatically enforced by setting the corresponding match control flags. First, it is not possible to match partial ring systems. A substructure ring atom must match the same class of ring system, i.e. a substructure 6-membered ring fragment only matches a structure benzene or cyclohexane ring, but not naphthalene, adamantane, etc. This limitation is deeply rooted in the 2D layout generator, which treats ring systems differently from acyclic connections. Second, acyclic substructure atoms can only match acyclic structure atoms, with the exception that a terminal acyclic substructure atom may still match a structure ring atom.

The final mode besteffort combines the redraw and rotate modes: if a match in mode redraw fails, the match attempt is automatically repeated in mode rotate, which has relaxed match conditions with respect to ring system checks.

-allowmissingstereo

-allowmissingstereo none/atoms/bonds/both

If not set to none (the default), stereogenic atom or bond centers on the structure side which do not possess a non-zero stereo descriptor in A_LABEL_STEREO or B_LABEL_STEREO may be matched by corresponding substructure centers with defined stereochemistry. If there is a structure-side stereo descriptor on the matched center, the normal stereo match process applies (i.e. absolute or relative stereo matching). This is a global option which applies to the complete substructure pattern. There are also atom- and bond-specific bits in A_QUERY and B_QUERY to control this feature on a local level in the pattern.

-anchor

-anchor nested_anchor_object_list

This option defines restrictions on which substructure and structure atoms or bonds must match. The argument is a nested list in which each outer list element is a list of two or three elements.

The first two elements of each inner list are either an object label, an empty string, or the word any. The latter two options are equivalent. The first sublist item identifies a substructure object, the second a structure object. If no third sublist element is supplied, the object labels are interpreted as atom labels. With the optional third element, an explicit minor chemistry object class such as atom or bond can be set, though currently only anchor atoms and bonds are supported.

If any or an empty identifier is used as a sublist element, it is read as a wildcard. If two object labels are paired, the two objects must map onto each other in all reported matches; if these objects are incompatible, no matches are found. A wildcard means that the partner object must be part of the match, without being required to match any specific counterpart. If fuzzy matching is used, a wildcard in the structure position can also make sense, because it forces the given substructure object into the match. The use of a pair of wildcards is not illegal, but has no filtering effect.

Example:

match ss -anchor {{1 2} {any 3}} $sshandle $ehandle

This sample line only returns a match where (in addition to meeting the query conditions of the substructure ensemble object) substructure atom 1 maps onto structure atom 2, and structure atom 3 is part of the match without requesting it to be mapped to any specific substructure atom.

match ss -anchor {{1 2 bond}} $sshandle $ehandle

This command only reports matches where substructure bond 1 is mapped to structure bond 2.
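The filtering rules for atom anchors can be restated as a small plain-Python predicate over one reported match. This is a sketch of the documented semantics under an assumed data layout (matches as lists of label pairs, as in the map variables) with a hypothetical function name, not the toolkit's implementation; bond anchors would be checked the same way against the bond map.

```python
WILDCARDS = {"any", ""}  # both spellings mean "no specific partner"


def passes_atom_anchors(atom_map, anchors):
    """Check one match (list of (ss_label, st_label) pairs) against atom anchors."""
    pairs = set(atom_map)
    ss_used = {ss for ss, _ in atom_map}
    st_used = {st for _, st in atom_map}
    for ss, st in anchors:
        ss_wild, st_wild = ss in WILDCARDS, st in WILDCARDS
        if not ss_wild and not st_wild and (ss, st) not in pairs:
            return False  # explicit pair must map onto each other
        if not ss_wild and st_wild and ss not in ss_used:
            return False  # substructure object must take part in the match
        if ss_wild and not st_wild and st not in st_used:
            return False  # structure object must take part in the match
    return True  # a pair of wildcards never filters anything


# Counterpart of "-anchor {{1 2} {any 3}}" applied to two candidate matches.
anchors = [(1, 2), ("any", 3)]
candidates = [[(1, 2), (2, 3)], [(1, 1), (2, 2)]]
print([passes_atom_anchors(m, anchors) for m in candidates])  # [True, False]
```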

-atomenvproperty

-atomenvproperty 0/1

If set, fill property A_SSMATCH_ENVIRONMENT on match. This is an experimental feature.

-atomhighlight

-atomhighlight none/structure/substructure/both

If this flag is set, all matched atoms in the structure (modes structure or both, or numeric encodings 1 or 3) or substructure (modes substructure or both, or the equivalent numeric encodings 2 or 3) have the highlight flag set in property A_FLAGS. In case multiple matches are generated, the result depends on the -multihighlight option setting. By default, only the first match is highlighted, but highlighting the union of all found matches is also possible. This option does not reset existing atom highlight flags; see the -clearatomhighlight option for this functionality. By default this function is disabled (equivalent to mode none or 0).

-atomlistcomparison

-atomlistcomparison intersection/identity

Specify how the matching of atom lists should be handled if they are encountered on both the structure and substructure sides. By default any common element in these lists is sufficient for a match (intersection mode), but in identity mode they must contain the same set of elements.

-atommapproperty

-atommapproperty none/structure/substructure/both

If this flag is set, for each match a new instance of property A_SSMATCH is attached to the structure (in modes structure or both, or numeric encodings 1 or 3) ensemble, or a new instance of property A_STMATCH to the substructure (in modes substructure or both, or the equivalent numeric encodings 2 or 3) ensemble. The first match is recorded in A_SSMATCH or A_STMATCH, the second in A_SSMATCH/2 or A_STMATCH/2, and so on. If instances of this property are already set on the structure or substructure ensembles, the new instances start with the highest existing instance number plus one. Structure or substructure atoms which are not used in a match have their respective A_SSMATCH or A_STMATCH data set to 0. Matched structure atoms are marked with the atom label of the matching substructure atom, and matched substructure atoms with the atom label of the matching structure atom. By default, this flag is not active (equivalent to mode none or 0).

-atommatchcount

-atommatchcount none/structure/substructure/both

Specify whether the number of times an atom was matched in successful matches should be recorded in properties A_SSMATCH_COUNT (for the structure side) or A_STMATCH_COUNT (for the substructure side).
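What the tally amounts to can be sketched in plain Python. The result is shown as a dict keyed by atom label rather than as the A_SSMATCH_COUNT or A_STMATCH_COUNT properties, the function name is hypothetical, and reporting 0 for never-matched atoms is an assumption of this sketch.

```python
def match_counts(labels, matches, side="structure"):
    """Count how often each atom occurs in the given matches.

    matches: list of matches, each a list of (ss_label, st_label) pairs.
    labels: all atom labels of the chosen side, so unmatched atoms get 0.
    """
    idx = 1 if side == "structure" else 0
    counts = {label: 0 for label in labels}
    for match in matches:
        for pair in match:
            counts[pair[idx]] += 1
    return counts


# The two distinct matches of CC on CCC from the mode distinct example.
matches = [[(1, 1), (2, 2)], [(1, 2), (2, 3)]]
print(match_counts([1, 2, 3], matches))               # {1: 1, 2: 2, 3: 1}
print(match_counts([1, 2], matches, "substructure"))  # {1: 2, 2: 2}
```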

-bondenvproperty

-bondenvproperty 0/1

If set, fill property AB_SSMATCH_ENVIRONMENT on match. This is an experimental feature.

-bondhighlight

-bondhighlight none/structure/substructure/both

If this flag is set, all matched bonds in the structure (modes structure or both, or numeric encodings 1 or 3) or substructure (modes substructure or both, or the equivalent numeric encodings 2 or 3) have the highlight flag set in property B_FLAGS. In case multiple matches are generated, the result depends on the -multihighlight option setting. By default, only the first match is highlighted, but highlighting the union of all found matches is also possible. This option does not reset existing bond highlight flags; see the -clearbondhighlight option for this functionality. By default this function is disabled (equivalent to mode none or 0).

-bondmapproperty

-bondmapproperty none/structure/substructure/both

If this flag is set, for each match a new instance of property B_SSMATCH is attached to the structure (in modes structure or both, or numeric encodings 1 or 3) ensemble, or a new instance of property B_STMATCH to the substructure (in modes substructure or both, or the equivalent numeric encodings 2 or 3) ensemble. The first match is recorded in B_SSMATCH or B_STMATCH, the second in B_SSMATCH/2 or B_STMATCH/2, and so on. If instances of this property are already set on the structure or substructure ensembles, the new instances start with the highest existing instance number plus one. Structure or substructure bonds which are not used in a match have their respective B_SSMATCH or B_STMATCH data set to 0. Matched structure bonds are marked with the bond label of the matching substructure bond, and matched substructure bonds with the bond label of the matching structure bond. By default, this flag is not active (equivalent to mode none or 0).

-burnmode

-burnmode modeflags

This flag has no effect in normal stand-alone substructure matches. Its settings are relevant when prior substructure matches, which exclude certain areas of the structure from matching, are performed before the actual substructure match. This mode defines how an exclusion match affects the area remaining for the final match. It is a bit set, and multiple flags can be combined. Possible values are atoms (mark all matched atoms as excluded), bonds (mark matched bonds), carbon (mark matched carbon atoms, but not the rest of the matched structure part), terminals (mark matched terminal atoms), ringsystems (mark all complete ring systems where at least one atom was matched) and aroringsystems (mark the connected aromatic part of ring systems where at least one ring atom was matched).

-chain

-chain 0/1

If set, this flag allows additional matches after the first match only if they are chained to a previous match, i.e. they do not overlap with any previous match, but a normal or complex bond exists between at least one structure atom of the new match and a structure atom of a previous match. In more complex cases, the results of this command variant can depend on the atom order. For example, for a structure which contains a left part A and a right part AA linked by some construct, matching with substructure fragment A returns a single hit if the left part is matched first, but two fragment matches if the right part is matched first. However, within a single chain of building blocks in the structure it does not matter where the first match occurs: the chain is recursively extended in all directions and ultimately covers all linked blocks. The chain does not need to be linear; rings or star topologies can be matched, too. Obviously, this option has no effect in match mode first, because chaining only matters when more than a single match is sought.
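The order dependence described above can be reproduced with a greedy plain-Python sketch. Candidate matches are given as structure atom sets in the order a matcher might find them; treating only ordinary bonds as links (complex bonds are not modeled) and the function name are assumptions.

```python
def chained_matches(candidates, bonds):
    """Greedy sketch of -chain 1.

    candidates: structure atom sets of possible matches, in discovery order.
    bonds: set of frozenset({a, b}) structure bonds.
    """
    if not candidates:
        return []
    accepted = [candidates[0]]
    covered = set(candidates[0])
    remaining = list(candidates[1:])
    grown = True
    while grown:  # keep extending until no further match can be attached
        grown = False
        for cand in list(remaining):
            if cand & covered:
                remaining.remove(cand)  # overlaps an accepted match: never usable
            elif any(frozenset((a, b)) in bonds for a in cand for b in covered):
                accepted.append(cand)  # bonded to the chain: accept it
                covered |= cand
                remaining.remove(cand)
                grown = True
    return accepted


# Left part A (atom 1), linker (atom 2), right part AA (atoms 3 and 4).
bonds = {frozenset(p) for p in [(1, 2), (2, 3), (3, 4)]}
print(len(chained_matches([{1}, {3}, {4}], bonds)))  # 1: left part found first
print(len(chained_matches([{3}, {4}, {1}], bonds)))  # 2: right part found first
```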

-charge

-charge 0/1

This flag determines whether formal charges on the substructure and structure atoms are used to decide whether an atom match is possible. By default, formal charges are ignored. This option only affects the standard match attributes. Atom query expressions which explicitly refer to property A_FORMAL_CHARGE always use their comparison result to determine matches.

-clearatomhighlight

-clearatomhighlight 0/1

If this flag is set, all highlight bits in property A_FLAGS are reset on the structure (and possibly the substructure) ensemble before the first match is processed. By default, this flag is not set and any existing A_FLAGS highlight bit pattern remains unchanged. Because the reset is performed in the routine where the highlight bits are set, this option is effective only in combination with the -atomhighlight option. The decision whether to reset the flags on the structure or substructure side, or both sides, follows the setting of the -atomhighlight mode.

-clearbondhighlight

-clearbondhighlight 0/1

If this flag is set, all highlight bits in property B_FLAGS are reset on the structure (and possibly the substructure) ensemble before the first match is processed. By default, this flag is not set and any existing B_FLAGS highlight bit pattern remains unchanged. Because the reset is performed in the routine where the highlight bits are set, this option is effective only in combination with the -bondhighlight option. The decision whether to reset the flags on the structure or substructure side, or both sides, follows the setting of the -bondhighlight mode.

-cmpflags

-cmpflags flags

This option provides direct access to the full set of flags which modify the substructure match process. The more common flags can be set or unset with specific options of this command for convenience. The default flag set is bondorder|useatomtree|usebondtree.

The flag set can either override the default flags (if specified as a simple attribute list), be added to them (if prefixed with '+'), be removed from them (if prefixed with '-'), or be toggled (if prefixed with '^').
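The four forms of a flag specification behave like ordinary set operations on the current flag set. The plain-Python sketch below models flags as a set of names and only uses the default flag names listed above; accepting both whitespace and '|' as separators is an assumption rather than the toolkit's documented grammar, and the function name is hypothetical.

```python
DEFAULT_FLAGS = {"bondorder", "useatomtree", "usebondtree"}


def apply_flag_spec(current, spec):
    """Sketch of the override, '+', '-' and '^' forms of a flag specification."""
    ops = {"+": set.union, "-": set.difference, "^": set.symmetric_difference}
    prefix = spec[:1]
    if prefix in ops:
        names = set(spec[1:].replace("|", " ").split())
        return ops[prefix](set(current), names)
    return set(spec.replace("|", " ").split())  # plain list overrides the defaults


print(sorted(apply_flag_spec(DEFAULT_FLAGS, "-bondorder")))    # ['useatomtree', 'usebondtree']
print(sorted(apply_flag_spec(DEFAULT_FLAGS, "^usebondtree")))  # ['bondorder', 'useatomtree']
```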

These are generally useful flags recognized:

-cmpflags2

Another set of comparison flags. The default is an empty flag set. Recognized flags are:

-command

-command tcl_command/python_function

Define a Tcl or Python callback function which is called when a new match is found and all property-based constraints have been checked. This function is called with four parameters. The first two parameters are the handles or references of the substructure and structure ensembles. The third parameter is a nested list of label/reference pairs (substructure atom label/structure atom label or references) for all substructure atoms which are currently matched to a structure atom. The fourth parameter is a nested list of label/reference pairs (substructure bond label/structure bond label or references) for all substructure bonds which are matched to a structure bond. The format of these arguments is the same as that of the match variables of the match command for single-match modes. Within the callback functions, the match can be further evaluated in ways not possible by the standard match options.

If the function returns 0 or any non-numeric value, or throws an error, the post-processing of completed matches, such as atom or bond highlighting, is not executed and the match is discarded.

While the callback routine is free to perform any additional match analysis, it must not delete the structure or substructure, change their connectivity (remove or add atoms and bonds), or discard or invalidate any property data used in the matching process. Computing or setting additional property data on the substructure or structure ensembles is allowed.

By default, or in case an empty string is passed as callback procedure name, no callback is executed.

Example:

proc my_match_check {ens_ss ens_st amap bmap} {
    puts $amap
    return 1
}
match ss -command my_match_check CC CC

This example outputs “{1 1} {2 2}”, which is the atom mapping of the match found.
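The rule that decides whether a callback result keeps a match can be restated in plain Python. This is a sketch of the documented behavior with a hypothetical wrapper name, not the toolkit's dispatcher; treating any non-zero number (including a numeric string such as "1") as acceptance is an assumption, since the text only lists the rejecting cases.

```python
def callback_accepts(callback, *args):
    """Return True if the callback result would keep the match."""
    try:
        result = callback(*args)
    except Exception:
        return False  # an error discards the match
    try:
        return float(result) != 0  # 0 discards; other numbers keep the match
    except (TypeError, ValueError):
        return False  # non-numeric results (including None) discard the match


def my_match_check(ens_ss, ens_st, amap, bmap):
    # Python counterpart of the Tcl procedure above; the handles are unused here.
    print(amap)
    return 1


print(callback_accepts(my_match_check, None, None, [(1, 1), (2, 2)], [(1, 1)]))  # True
```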

-creategroup

-creategroup 0/1

If this flag is set, every match creates a new group minor object on the structure ensemble. The atoms in the group are all those structure atoms which were matched by the substructure. The group name (property G_NAME) is set to the name of the substructure (property E_NAME). By default, no groups are generated as side effects of a match.

-daylightaromaticity

-daylightaromaticity 0/1

If the flag is set, Daylight aromaticity is enforced in the matching on both the structure and substructure side, regardless of the global aromaticity system setting. For the substructure, this applies to implicitly defined aromaticity, for example the presence of a complete aromatic ring with all defined bond orders and elements, not to explicit query attributes. The option can be shortened to -daylightaro.

-excludeenvironment

-excludeenvironment 0/1

If this flag is set and a recursive SMARTS expression is processed, all parts of the structure which are already matched are excluded from the recursive match check. By default, a new recursion level does not have any knowledge about previous matches and may match all atoms in the structure.

Example:

match ss -excludeenvironment 0 {C[$(OC)]} CO
match ss -excludeenvironment 1 {C[$(OC)]} CO

The first example does match, because the carbon of the recursive fragment may match on the same structure carbon as the first carbon atom in the substructure. In the second case, the structure carbon is marked as already matched, and there is no place to map the recursive fragment carbon, so no match is found.

-excludeflags_ss

-excludeflags_ss flag_value

This option allows the exclusion of substructure atoms from the match procedure which have at least one of potentially several bits set in the A_FLAGS property. The decoded flag values are used as a bit mask, and all substructure atoms which have one or more bits of the mask set are hidden from further processing.

Example:

match ss -excludeflags_ss [list starred boxed] $ss_handle $st_handle

This example ignores all substructure atoms which have been marked with the starred or boxed attribute.

All substructure atom exclusion options can be combined, but not repeated, and are cumulative.

-excludeflags_st

-excludeflags_st flag_value

This option allows the exclusion of structure atoms from the match procedure which have at least one of potentially several bits set in the A_FLAGS property. The decoded flag values are used as a bit mask, and all structure atoms which have one or more bits of the mask set are hidden from further processing.

Example:

match ss -excludeflags_st [list starred boxed] $ss_handle $st_handle

This example ignores all structure atoms which have been marked with the starred or boxed attributes.

All structure atom exclusion options can be combined, but not repeated, and are cumulative.
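The mask test behind -excludeflags_ss and -excludeflags_st can be sketched with integer bit operations. The bit values assigned to starred, boxed and highlight below are hypothetical placeholders; only the flag names come from the examples above and the A_FLAGS highlight description, and the function name is hypothetical too.

```python
# Hypothetical bit values; the real A_FLAGS encoding is not given in this text.
FLAG_BITS = {"starred": 0x01, "boxed": 0x02, "highlight": 0x04}


def visible_atoms(atom_flags, excluded_names):
    """Return labels of atoms with none of the excluded bits set in their flags."""
    mask = 0
    for name in excluded_names:
        mask |= FLAG_BITS[name]  # decoded flag names form one bit mask
    return [label for label, flags in atom_flags.items() if not flags & mask]


# Atom label -> A_FLAGS value (hypothetical data).
atom_flags = {1: 0x00, 2: 0x01, 3: 0x02, 4: 0x04, 5: 0x03}
print(visible_atoms(atom_flags, ["starred", "boxed"]))  # [1, 4]
```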

-excludelabels_ss

-exclude_ss label_list
-excludelabels_ss label_list/aref_sequence

This option allows the exclusion of a set of substructure atoms from the match process. All atoms which are listed here are completely ignored by the match algorithm. By default, or when an empty list is passed, all substructure atoms of the ensemble or molecule (if the handle/molecule label specification was used) are used for matching.

Example:
match ss -exclude_ss [ens atoms $sshandle hydrogen] $sshandle $sthandle

This example does not use any hydrogen atoms of the substructure for matching. This is more efficient than stripping and possibly re-attaching the hydrogen atoms of the substructure.

All substructure atom exclusion options can be combined, but not repeated, and are cumulative.

-excludelabels_st

-exclude_st label_list
-excludelabels_st label_list/aref_sequence

This option allows the exclusion of a set of structure atoms from the match process. All atoms which are listed here are completely ignored by the match algorithm. By default, or when an empty list is passed, all structure atoms of the ensemble or molecule (if the handle/molecule label specification was used) are available for matching.

All structure atom exclusion options can be combined, but not repeated, and are cumulative.

-excludelabels_st_root

-exclude_st_root label_list
-excludelabels_st_root label_list/aref_sequence

This set of structure atoms to be excluded is similar to the one specified with -exclude_st. The difference is that this exclusion only applies to the first level of matching. In deeper match levels, for example recursive SMARTS expressions, these atoms are no longer blocked.

All structure atom exclusion options can be combined, but not repeated, and are cumulative.

-excludestructures

-excludestructures ens_mol_list

Specifies a set of exclusion fragments. These structure fragments are exhaustively matched as substructures on the structure, and all structure atoms and bonds they match are excluded from the actual match procedure invoked by this command. The exclusion fragment substructure match is always performed with the default mode settings; options like -bondorder or -charge are only applied to the final match. The exclusion fragments may be specified in the same styles as the main substructure and structure, i.e. as an ensemble handle, a list of an ensemble handle and a molecule label, or as a SMILES/SMARTS string.

Example:

match ss {[OH]} CC(=O)O
match ss -excludestructures {C(=O)[OH]} {[OH]} CC(=O)O

The first example matches the hydroxyl group of the structure, which is acetic acid. In order to prevent a match of hydroxyl groups which are part of carboxylic acid groups, carboxylic acid groups can be hidden on the structure with a statement like the second example. Of course, this example could easily be made more generic, such as hiding all hydroxyl groups attached to any non-carbon atom, or to a carbon with another hetero atom neighbor, as in

match ss -excludestructures {[!C,C&x{2-}][OH]} {[OH]} $sthandle

All structure atom or fragment exclusion options can be combined, but not repeated, and are cumulative.
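The effect of the exclusion fragments in the acetic acid example can be sketched as a set operation on structure atom labels, numbered in SMILES order (C1 C2 O3 O4 for CC(=O)O). The sketch filters finished candidate matches instead of hiding atoms before matching as described above; for plain matches this yields the same atom sets, because a match that needs a hidden atom cannot be found. The function name is hypothetical.

```python
def filter_by_exclusions(candidate_matches, exclusion_matches):
    """Drop candidate matches that use any atom claimed by an exclusion fragment.

    Both arguments are lists of structure atom label sets.
    """
    blocked = set().union(*exclusion_matches)
    return [m for m in candidate_matches if not (m & blocked)]


# Acetic acid CC(=O)O: atoms 1 C, 2 C, 3 O (=O), 4 O (hydroxyl).
hydroxyl_hits = [{4}]     # [OH] matches only atom 4
acid_hits = [{2, 3, 4}]   # C(=O)[OH] matches atoms 2, 3 and 4
print(filter_by_exclusions(hydroxyl_hits, []))         # [{4}]
print(filter_by_exclusions(hydroxyl_hits, acid_hits))  # []
```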

-exclude_ss_h

-exclude_ss_h 0/1

If this flag is set, all substructure hydrogen atoms are ignored in the match process. By default, all atoms in the substructure are used.

All substructure atom exclusion options can be combined, but not repeated, and are cumulative.

-exclude_st_h

-exclude_st_h 0/1

If this flag is set, all structure hydrogen atoms are ignored in the match process. By default, all atoms in the structure are used.

All structure atom exclusion options can be combined, but not repeated, and are cumulative.

-fixedframework

-fixedframework 0/1

If this flag is set, matched carbon atoms in the structure are prevented from possessing any unmatched hetero atom or carbon neighbors. Matched structure hetero atoms may be bonded to unmatched hetero atoms or carbon atoms. By default, the flag is not set. The acceptability of extra unmatched hydrogen, carbon, or hetero atom neighbors may additionally be controlled on the atomic level by setting the appropriate flags in property A_QUERY on the substructure.

Example:

match ss -fixedframework 1 CC CCO
match ss -fixedframework 1 CCO CCOC

The first example does not match, because in all possible match orientations there is a matched carbon bonded to an unmatched hetero atom (the oxygen atom). The second example does match: the matched hetero atom may possess bonds to unmatched non-hydrogen atoms, the methyl group in this case.
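The framework rule can be expressed as a check over the bonds leaving a match. The plain-Python sketch below covers only the global flag, ignores the per-atom A_QUERY refinements, and uses a toy atom and bond layout and function name of its own; with SMILES-order labels it reproduces the two examples above.

```python
def fixed_framework_ok(match_atoms, elements, bonds):
    """Sketch of -fixedframework 1 for one match.

    A matched carbon may not be bonded to an unmatched non-hydrogen atom;
    matched hetero atoms may have any unmatched neighbors.
    """
    for a, b in bonds:
        for inside, outside in ((a, b), (b, a)):
            if (inside in match_atoms and outside not in match_atoms
                    and elements[inside] == "C" and elements[outside] != "H"):
                return False
    return True


# CC on CCO: the only match uses atoms 1 and 2; carbon 2 touches unmatched O3.
print(fixed_framework_ok({1, 2}, {1: "C", 2: "C", 3: "O"}, [(1, 2), (2, 3)]))  # False
# CCO on CCOC: only the matched oxygen touches the unmatched methyl carbon 4.
print(fixed_framework_ok({1, 2, 3}, {1: "C", 2: "C", 3: "O", 4: "C"},
                         [(1, 2), (2, 3), (3, 4)]))  # True
```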

This match option is useful for locating starting materials for synthesis in vendor catalogs.

-forceringmatch

-forceringmatch no/strict/relaxed/ringsystem

This option controls the matching of the substructure into structure ring systems. If the option is not specified, or set to no (or 0), matching is only controlled by explicitly set atom and bond query attributes, such as the number of ring bonds or membership in rings of a specific size.

The option value strict allows the matching of substructure atoms or bonds which are members of rings only onto structure parts in ring systems of the same class, i.e. the same set of rings of a given size and arrangement, but without consideration of atoms, bond orders, aromaticity, etc. With this option, a phenyl substructure fragment no longer matches a naphthalene structure, and acyclic substructure atoms or bonds can only match acyclic structure parts. All other query attributes, such as bond order, element type, aromaticity, etc. are applied in addition to this constraint.

The relaxed mode has basically the same constraints, with one small exception: a terminal substructure atom (an atom which has only a single bond, and thus cannot be a ring member) may match structure atoms in ring systems, if the normal query attributes allow this. The relaxed mode is automatically enforced if the -align option with value redraw is specified.

The ringsystem mode requires that any structure ringsystem is either completely matched, or not part of the match at all.
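The ringsystem condition reduces to a subset test per ring system. The plain-Python sketch below assumes the ring systems are already known as atom label sets (ring perception is outside its scope), and the function name is hypothetical.

```python
def ringsystem_ok(match_atoms, ring_systems):
    """Every ring system must lie completely inside the match or completely outside."""
    return all(system <= match_atoms or not (system & match_atoms)
               for system in ring_systems)


# Toluene as c1ccccc1C: ring atoms 1-6 form one ring system, atom 7 is the methyl carbon.
ring_systems = [set(range(1, 7))]
print(ringsystem_ok(set(range(1, 8)), ring_systems))  # True: whole ring matched
print(ringsystem_ok({5, 6, 7}, ring_systems))          # False: partial ring
print(ringsystem_ok({7}, ring_systems))                # True: ring untouched
```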

-formula

-formula formulaexpression

Require that the set of structure atoms matched by the substructure also matches a formula expression (see molfile scan). The matched atom set is tested after full expansion. This means that multiple structure atoms which are matched by a single query atom (for example, an [ALK] alkyl superatom group) all count. However, structure atoms which are matched multiple times (for example, by overlapping fragments with a suitable overlap mode) are only tallied once. The default other-element mode for the formula expression is exact, but that can be changed by the operator prefix (see again molfile scan).

Example:

match ss -formula C1-2N1-2 {[C,N][C,N][C,N]} $ehstructure

This example only allows substructure matches in which the matched atoms comprise one to two carbon atoms and one to two nitrogen atoms, in any order along the three-atom chain. It disallows a C3 or N3 match, which the SMARTS string alone would allow.
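To make the tallying rules concrete, here is a minimal plain-Python model of the -formula test (not a toolkit call; `formula_matches` and the range-table encoding of C1-2N1-2 are hypothetical). It counts each matched structure atom once even if it was hit several times, and it assumes that the exact other element mode means elements absent from the formula are not allowed.

```python
from collections import Counter

def formula_matches(matched_atoms, element_of, ranges, exact=True):
    """Hypothetical model of the -formula test.

    matched_atoms: structure atom labels hit by the match (may contain repeats)
    element_of:    dict atom label -> element symbol
    ranges:        dict element symbol -> (min, max) count
    exact:         assumed meaning: elements absent from the formula are not allowed
    """
    unique_atoms = set(matched_atoms)  # atoms matched several times are tallied once
    counts = Counter(element_of[a] for a in unique_atoms)
    for symbol, (lo, hi) in ranges.items():
        if not lo <= counts.get(symbol, 0) <= hi:
            return False
    if exact and any(symbol not in ranges for symbol in counts):
        return False
    return True

# The C1-2N1-2 expression from the example above, written as a range table
C1_2_N1_2 = {"C": (1, 2), "N": (1, 2)}
ELEMENTS = {1: "C", 2: "N", 3: "C", 4: "C", 5: "N", 6: "O"}

print(formula_matches([1, 2, 3], ELEMENTS, C1_2_N1_2))  # True  (C2N1)
print(formula_matches([1, 3, 4], ELEMENTS, C1_2_N1_2))  # False (C3)
```

The real operator prefixes of molfile scan expressions are not modelled here.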

-fuzz

-fuzz n

If this option is used with a value n larger than zero, fuzzy substructure matching is activated. In this mode, it is no longer required that all substructure atoms are mapped to structure atoms. Up to n atoms may fail. Within the A_query property, fields are provided which allow a more detailed specification of whether a substructure atom may be in the fail set, and how much fuzz is allowed in its immediate neighborhood. The -anchor option is also useful to force the use of some critical substructure atoms in the found matches.

This match variant is computationally significantly more expensive than the standard match procedure, and can generate a large set of matches if a match mode which can generate more than one match is used.

Example:

match ss -fuzz 1 ClCCCl CCCl
match ss -fuzz 1 ClCCCl CCl
match ss -fuzz 1 ClCCCl ClCCl

The first example matches, since there is only a single unmatched substructure atom (one of the chlorine atoms) in the best mapping, but the second and third do not. The third example demonstrates that fail atoms are simply ignored, but their unmatched neighbors are not allowed to start new implicit fragments. The second chlorine atom in the substructure cannot match because it remains tethered to the main fragment, even if the excess carbon atom in the substructure is designated as the one allowed failure atom. Both examples two and three do, however, match with a fuzz of 2.
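The following brute-force sketch in plain Python (not the toolkit's algorithm) reproduces the outcome of the three -fuzz examples. It assumes a single-fragment substructure, compares atoms by element only, and encodes our reading of the rule that failed atoms must not split the remaining substructure into new fragments: the surviving substructure atoms must stay connected. All function names are hypothetical, and the exhaustive enumeration also illustrates why fuzzy matching is expensive.

```python
from itertools import combinations, permutations

def is_connected(atoms, bonds):
    """True if the given atoms form one connected piece using only bonds among them."""
    atoms = set(atoms)
    start = next(iter(atoms))
    seen, stack = {start}, [start]
    while stack:
        a = stack.pop()
        for x, y in bonds:
            for u, v in ((x, y), (y, x)):
                if u == a and v in atoms and v not in seen:
                    seen.add(v)
                    stack.append(v)
    return seen == atoms

def embeds(ss_el, ss_bonds, kept, st_el, st_bonds):
    """Try every injective placement of the kept substructure atoms."""
    st_set = {frozenset(b) for b in st_bonds}
    for image in permutations(range(len(st_el)), len(kept)):
        m = dict(zip(kept, image))
        if any(ss_el[a] != st_el[m[a]] for a in kept):
            continue
        if all(frozenset((m[a], m[b])) in st_set
               for a, b in ss_bonds if a in m and b in m):
            return True
    return False

def fuzzy_match(ss_el, ss_bonds, st_el, st_bonds, fuzz):
    n = len(ss_el)
    for k in range(min(fuzz, n - 1) + 1):           # number of failed atoms
        for failed in combinations(range(n), k):
            kept = [a for a in range(n) if a not in failed]
            if is_connected(kept, ss_bonds) and embeds(ss_el, ss_bonds, kept, st_el, st_bonds):
                return True
    return False

# Molecules as (element list, bond list), hydrogens omitted
CLCCCL = (["Cl", "C", "C", "Cl"], [(0, 1), (1, 2), (2, 3)])
CCCL = (["C", "C", "Cl"], [(0, 1), (1, 2)])
CCL = (["C", "Cl"], [(0, 1)])
CLCCL = (["Cl", "C", "Cl"], [(0, 1), (1, 2)])

for target in (CCCL, CCL, CLCCL):
    print(fuzzy_match(*CLCCCL, *target, 1), fuzzy_match(*CLCCCL, *target, 2))
```

Query attributes from A_query and the -anchor option are not modelled.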

-generalize

-generalize none/heteroatoms

In mode heteroatoms, all atoms except carbon and hydrogen are treated as a generic hetero atom type and match each other.

-heavyatoms

-heavyatoms any/all

If mode all is selected, all heavy atoms on the structure side must have been matched by substructure atoms for an overall hit.

-implicit_is_singlearo

-implicit_is_singlearo 0/1

If the flag is set, bonds which were not specified with an exact bond order (for example, in SMARTS) are handled as "single or aromatic" query bonds, and not with their effective bond order or aromaticity status after decoding.

-includeflags_ss

-includeflags_ss flag_value

This option allows the selection of substructure atoms for the match procedure which have one or more of potentially several bits set in the A_FLAGS property. The decoded flag values are used as a bit mask, and only those substructure atoms which have one or more bits of the mask set are selected for matching. By default, all substructure atoms are used for matching. If both an inclusion flag set and an exclusion flag set (option -excludeflags_ss) are specified, the inclusion list is processed first. From the remaining atoms, those which match the exclusion filter are removed.

-includeflags_st

-includeflags_st flag_value

This option allows the selection of structure atoms for the match procedure which have one or more of potentially several bits set in the A_FLAGS property. The decoded flag values are used as a bit mask, and only those structure atoms which have one or more bits of the mask set are selected for matching. By default, all structure atoms are used for matching. If both an inclusion flag set and an exclusion flag set (option -excludeflags_st) are specified, the inclusion list is processed first. From the remaining atoms, those which match the exclusion filter are removed.
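A minimal sketch of the inclusion-then-exclusion order described for the two flag options, assuming that an atom passes a mask when any of its A_FLAGS bits overlaps the mask, and that the exclusion filter uses the same any-bit rule. Flags are a plain list of integers indexed by position; `select_atoms` is a hypothetical helper, not a toolkit function.

```python
def select_atoms(atom_flags, include_mask=None, exclude_mask=None):
    """Return positions of atoms used for matching (hypothetical model)."""
    # inclusion first: keep atoms sharing at least one bit with the mask
    selected = [i for i, f in enumerate(atom_flags)
                if include_mask is None or (f & include_mask)]
    # then exclusion: drop atoms sharing at least one bit with the exclusion mask
    if exclude_mask is not None:
        selected = [i for i in selected if not (atom_flags[i] & exclude_mask)]
    return selected

flags = [0b001, 0b010, 0b011, 0b100]
print(select_atoms(flags, include_mask=0b011))                      # [0, 1, 2]
print(select_atoms(flags, include_mask=0b011, exclude_mask=0b010))  # [0]
```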

-includelabels_ss

-include_ss labellist
-includelabels_ss label_list/aref_sequence

Select substructure atoms for use in matching. By default, all substructure atoms are used. If both an inclusion list and an exclusion list (option -exclude_ss) are specified, the inclusion list is processed first. From the remaining atoms, those which are also listed in the exclusion list are removed.

-includelabels_st

-include_st label_list
-includelabels_st label_list/aref_sequence

Select structure atoms for use in matching. By default, all structure atoms are used. If both an inclusion list and an exclusion list (option -exclude_st) are specified, the inclusion list is processed first. From the remaining atoms, those which are also listed in the exclusion list are removed.

-isotope

-isotope 0/1

This flag determines whether isotopic labeling is used for matching. By default, isotope label matching is not performed. If this flag is set, substructure atoms with an isotope label must map onto a structure atom with the same isotope label. Even if this option is not set, explicit references to property A_ISOTOPE in atom query expressions are always evaluated and used to determine the match.

-kekule

-kekule none/odd/even/all

By default (value none or 0), the Kekulé bond order of aromatic bonds is not used for matching. A substructure aromatic bond matches a structure aromatic bond, regardless of whether their Kekulé bond orders are the same or not. If this option is set to all (or 3), aromatic bonds are compared with the drawn bond order. This can be useful, for example, in order to find a sequence of atoms to perform a reaction transformation which allows a simple change of bond orders in the path without a complete rearrangement of the full π system. The modes odd and even are useful for controlled matching of certain heteroaromatic systems. In mode odd (or 1), the Kekulé bond order is used for all bonds which are only members of aromatic rings with an odd number of atoms, while the order of bonds in even aromatic systems (including those which are simultaneously members of an odd aromatic system) is disregarded. Mode even (or 2) is the complementary counterpart.

-limit

-limit n

Set the maximum number of reported substructure matches to n. Any additional matches which might be present are ignored. -maxmatch is an alias.

-mapanchor

-mapanchor 0/1

If this flag is set, an anchor set (see option -anchor) is automatically constructed from the values of the A_MAPPING properties on the substructure and structure. A_MAPPING is the default property to encode reaction mapping information. Both substructure and structure must possess valid A_MAPPING data, otherwise this option is ignored. If this condition is fulfilled, any substructure atom which has a non-negative mapping number is anchored to its counterpart on the structure side with the same mapping number. If the structure has no atom with that mapping number, the command immediately returns zero matches and empty atom/bond/mol mapping variables, if these were specified. This option can be combined with a normal -anchor option. The anchor tables are cumulative in this case.
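The anchor construction can be sketched as follows. This is a plain-Python model, not the toolkit's code: `mapanchor` is hypothetical, a whole mapping table of None stands for missing A_MAPPING data, negative numbers stand for unmapped atoms, and mapping numbers are assumed to be unique on the structure side.

```python
def mapanchor(ss_mapping, st_mapping, explicit_anchors=()):
    """Return a list of (substructure atom, structure atom) anchor pairs,
    or None when the match must report zero hits (hypothetical model).

    ss_mapping / st_mapping: dict atom label -> mapping number, or None if the
    ensemble has no valid A_MAPPING data.
    """
    anchors = list(explicit_anchors)        # anchor tables are cumulative
    if ss_mapping is None or st_mapping is None:
        return anchors                      # option is ignored
    st_by_number = {num: atom for atom, num in st_mapping.items() if num >= 0}
    for ss_atom, num in ss_mapping.items():
        if num < 0:
            continue                        # unmapped substructure atom
        if num not in st_by_number:
            return None                     # no counterpart: zero matches
        anchors.append((ss_atom, st_by_number[num]))
    return anchors

print(mapanchor({1: 1, 2: 2, 3: -1}, {1: 2, 2: 1, 3: -1}))  # [(1, 2), (2, 1)]
```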

-maxopenlinks

-maxopenlinks n

Limit the number of open links of the substructure embedded in a match. Any continuation of the structure from the matched substructure into the unmatched parts except by hydrogen atoms is considered an open link. Example:

match ss -mode distinct CC CCCC
match ss -mode distinct -maxopenlinks 1 CC CCCC

The first example reports three matches, the second only two.

In the latter case, the substructure matches only at either end, because a match in the middle of the C4 carbon chain would have two continuation links. The -terminal option is equivalent to using this mode with an open link count of one.

For substructures which are larger than a single atom, the open link count is the number of matched structure atoms which have one or more open links. For single-atom substructures, the open link count is directly the number of open links of the single matched structure atom.
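The two counting rules can be modelled on a hydrogen-suppressed adjacency list. This Python sketch (hypothetical helper, not a toolkit call) follows the rules as stated and reproduces the C2-in-butane result of the -maxopenlinks example: three distinct matches, two of which survive a limit of one. For simplicity it treats a one-atom match set as a single-atom substructure.

```python
def open_link_count(matched, neighbors):
    """matched: structure atoms covered by the match;
    neighbors: heavy-atom adjacency dict (hydrogens already omitted)."""
    matched = set(matched)
    if len(matched) == 1:
        # single-atom substructure: count the open links of that atom directly
        (atom,) = matched
        return sum(1 for n in neighbors[atom] if n not in matched)
    # larger substructures: count matched atoms with at least one open link
    return sum(1 for a in matched if any(n not in matched for n in neighbors[a]))

BUTANE = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}   # C4 chain, hydrogens omitted
distinct_matches = [{1, 2}, {2, 3}, {3, 4}]        # distinct atom sets for CC
limited = [m for m in distinct_matches if open_link_count(m, BUTANE) <= 1]
print(len(distinct_matches), len(limited))  # 3 2
```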

This attribute is only checked for the root level of hierarchical substructure matches (e.g. when using Recursive SMARTS or a similar match mechanism, the match of the recursive fragments is not tested). Example:

match ss -maxopenlinks 1 {[C;$(*OC=O)]} $eh

This command checks that the matched carbon has only a single non-hydrogen continuation (necessarily into the ester group, i.e. it can only match a methyl ester, not an ethyl or higher ester), but this test does not apply to the ester fragment, which is checked at a deeper nesting level.

-mode

-mode first/all/canonic/distinctatoms/distinctbonds/distinctfirstatom/distinctheavyatoms/distinctinneratoms/distinctmols/distinctssatoms/nocommon/unique/bilateraldistinct/bestscore/bestscores/distinctscores/bilateralunique/distinctfgatoms/nocommonfgatoms/nocommonheavyatoms/conditionalnocommonfgatoms

This important option determines the substructure match mode. The default mode is first. In mode first, only the first match (if any) is returned, and any list variables used to capture the atom, bond or molecule maps use only a single level of nesting.

Mode all reports all possible matches (subject to a potentially set maximum number of results, see the -limit option) which differ in at least one atom mapping relationship from any other reported match.

Example:

set nmatch [match ss -mode all CC CCC]

returns 4, because the C2 fragment can be embedded in forward and backward direction, and matched on either the first two or last two carbon atoms of the propane structure.

Mode distinctatoms only reports matches which map onto a different set of structure atoms. Example:

set nmatch [match ss -mode distinctatoms CC CCC]

returns 2 for the mapping of the substructure onto the first two, and the last two carbon atoms. The backward matches of the C2 fragment are not reported, because they do not cover a new set of atoms.

Mode distinctheavyatoms is similar to the distinctatoms mode, but only uses non-hydrogen substructure atoms for determining whether a match should be considered new and included.

Example:

set nmatch [match ss -mode distinctatoms {CC[#1]} CCC]
set nmatch [match ss -mode distinctheavyatoms {CC[#1]} CCC]

The first example reports an astonishing 10 matches: for each of the two carbon pairs, the C2 fragment can be embedded in two directions, and depending on the direction the hydrogen atom maps either to one of the three hydrogens of the terminal carbon or to one of the two central hydrogens (2 × (3 + 2) = 10). Mode distinctheavyatoms reduces the number of reported hits to 2, because only the atom mappings of the two carbons in the substructure are considered. In many cases, hydrogens can be considered equivalent, and in these cases this mode comes in handy.
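The match counts quoted for modes all, distinctatoms and distinctheavyatoms can be checked with a small plain-Python model: enumerate every injective, element- and bond-preserving mapping, then deduplicate on a mode-specific key. `embeddings` and `filter_matches` are hypothetical helpers, atoms are compared by element only, and propane is written with explicit hydrogens, as the toolkit sees it after hydrogen addition.

```python
def embeddings(ss_el, ss_bonds, st_el, st_bonds):
    """Yield every mapping as a tuple: position i holds the structure atom
    matched by substructure atom i."""
    ss_set = {frozenset(b) for b in ss_bonds}
    st_set = {frozenset(b) for b in st_bonds}

    def extend(partial):
        i = len(partial)
        if i == len(ss_el):
            yield tuple(partial)
            return
        for t in range(len(st_el)):
            if t in partial or st_el[t] != ss_el[i]:
                continue
            # every bond from atom i back to an already mapped atom must exist
            if all(frozenset((partial[j], t)) in st_set
                   for j in range(i) if frozenset((j, i)) in ss_set):
                yield from extend(partial + [t])

    yield from extend([])

def filter_matches(matches, mode, ss_el):
    """Keep the first match for each mode-specific key."""
    seen, kept = set(), []
    for m in matches:
        if mode == "all":
            key = m
        elif mode == "distinctatoms":
            key = frozenset(m)
        elif mode == "distinctheavyatoms":
            key = frozenset(t for i, t in enumerate(m) if ss_el[i] != "H")
        else:
            raise ValueError(mode)
        if key not in seen:
            seen.add(key)
            kept.append(m)
    return kept

# Propane: carbons 0-2, hydrogens 3-10
PROPANE_EL = ["C", "C", "C"] + ["H"] * 8
PROPANE_BONDS = [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5),
                 (1, 6), (1, 7), (2, 8), (2, 9), (2, 10)]

def count(ss_el, ss_bonds, mode):
    return len(filter_matches(embeddings(ss_el, ss_bonds, PROPANE_EL, PROPANE_BONDS),
                              mode, ss_el))

print(count(["C", "C"], [(0, 1)], "all"))                              # 4
print(count(["C", "C"], [(0, 1)], "distinctatoms"))                    # 2
print(count(["C", "C", "H"], [(0, 1), (1, 2)], "distinctatoms"))       # 10
print(count(["C", "C", "H"], [(0, 1), (1, 2)], "distinctheavyatoms"))  # 2
```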

Mode distinctinneratoms is similar to distinctheavyatoms, but instead of ignoring all hydrogen atoms on the substructure when determining the novelty of a match, all terminal atoms (those with fewer than two bonds) are ignored in filtering new matches.

Mode distinctfirstatom is another mode with a modified view of what constitutes a distinct match. This mode only looks at the structure atom matched by the first substructure atom.

Mode distinctfgatoms is another variant of distinctheavyatoms where, in addition to hydrogen, all carbon atoms which are not part of a functional group (having four bonded ligands, or being aromatic) are omitted in the match distinctiveness test.

Mode distinctmols requires that the substructure matches a different molecule in the structure ensemble in each accepted match.

Mode distinctbonds uses the set of matched structure bonds to determine whether a match is novel. For cage structures, there may be multiple matches of the same structure atoms, but matching different bond paths.

Mode unique is a stricter version of mode distinctatoms. Here, the matched atoms must additionally be topologically different, as determined by hash code property A_HASH (when matching without stereochemistry) or A_STEREO_HASH (in stereo match mode), or the isotope-aware variants thereof if isotope labels are checked.

Mode bilateralunique is a variant where pairs of matched substructure/structure atom labels, not just the set of structure atom labels, are used to determine whether a match is unique.

Mode nocommon only reports matches which do not share any common atoms. Example:

set nmatch [match ss -mode nocommon CC CCC]

returns only a single match, because the middle carbon atom in the structure is already matched by the first match. Unfortunately, the results of this match mode may depend on the numbering of atoms. If, by chance, a C2 substructure fragment is first matched in the middle of a C4 chain, only a single match is found, but if it matches first at one of the ends, two matches are found, because the middle match, if found next, is discarded and then the other terminal match is accepted. The described effect is not a problem in all cases, depending on the nature of the substructure, but using this mode requires careful analysis.
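The order dependence of nocommon follows from greedy acceptance: each candidate is kept only if it shares no atom with the matches accepted so far. In this plain-Python sketch the candidate lists are hand-written to mimic two different atom numberings of the same C4 chain; `nocommon` is a hypothetical helper, not the toolkit's implementation.

```python
def nocommon(candidate_atom_sets):
    """Accept candidates in the order found, rejecting any that share an atom."""
    used, accepted = set(), []
    for atoms in candidate_atom_sets:
        if used.isdisjoint(atoms):
            used.update(atoms)
            accepted.append(set(atoms))
    return accepted

# C4 chain with atoms 1-2-3-4; the same three C2 placements in two search orders
end_first = [{1, 2}, {2, 3}, {3, 4}]
middle_first = [{2, 3}, {1, 2}, {3, 4}]
print(len(nocommon(end_first)), len(nocommon(middle_first)))  # 2 1
```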

Variants of this mode are nocommonfgatoms and nocommonheavyatoms. The first disallows matches which share a functional heavy atom (hetero atom or non-aromatic carbon with a multiple bond) with a previous match, while the second allows overlap of matched hydrogens, but not of other atoms. Like the plain nocommon mode, the results can be dependent on atom numbering. The mode conditionalnocommonfgatoms works like nocommonfgatoms if there are functionalized atoms in the match. If there are not, the mode is automatically and temporarily reset to nocommonfgatoms for checking this match for overlap with an existing match.

Mode canonic returns only a single match if one can be found, but uses atom hash codes to return a canonic match within the structure, regardless of atom order. The exact hash code properties used to determine the canonic match mirror whether the match checks for stereochemistry and/or isotopes, as in mode unique.

Modes distinctssatoms and bilateraldistinct are only useful in contexts where only part of the substructure may be matched, for example when using the -fuzz option. Mode distinctssatoms is essentially the same as mode distinctatoms, except that the matched atoms on the substructure side are checked, not those on the structure side. Mode bilateraldistinct uses substructure atom/structure atom pairs instead of simple atom identities as criterion of distinctiveness.

Modes bestscore, bestscores and distinctscores can only be used if a match scoring mechanism has been configured. If that is the case, bestscore returns one of the matches with the best score, bestscores returns all matches which share the best score, and distinctscores returns a set of matches which all have different scores, omitting matches with duplicate scores.
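The three score-based selections can be sketched on a list of already scored matches. The scoring mechanism itself is not described in this section, so this plain-Python sketch assumes that a higher score is better and that distinctscores keeps the first match seen for each score; the match identifiers and function names are hypothetical.

```python
def best_score(matches):
    """One of the matches with the best score (here: the first one found)."""
    return max(matches, key=lambda m: m[1])

def best_scores(matches):
    top = max(score for _, score in matches)
    return [m for m in matches if m[1] == top]

def distinct_scores(matches):
    seen, kept = set(), []
    for m in matches:
        if m[1] not in seen:        # omit matches with a duplicate score
            seen.add(m[1])
            kept.append(m)
    return kept

scored = [("a", 2.0), ("b", 5.0), ("c", 5.0), ("d", 1.0), ("e", 2.0)]
print(best_score(scored))       # ('b', 5.0)
print(best_scores(scored))      # [('b', 5.0), ('c', 5.0)]
print(distinct_scores(scored))  # [('a', 2.0), ('b', 5.0), ('d', 1.0)]
```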

-multihighlight

-multihighlight 0/1

If this option is set, and the options -atomhighlight and/or -bondhighlight are used, and more than one match is generated, the highlight atom and/or bond attributes are also set for the second and further matches, resulting in a highlight set which is the union of all matches. By default, only the first match is highlighted, even if more than one match is generated and reported.

-noaliphaticonaro

-noaliphaticonaro 0/1

If this flag is set, aliphatic bonds do not map onto aromatic bonds. By default, and in the absence of other criteria determining the match of a bond, both single and double (but not triple or higher) aliphatic substructure bonds match aromatic structure bonds, and vice versa. If the flag is set, substructure bonds which are not marked aromatic, either by explicit attribute setting or indirectly by aromaticity analysis of the substructure fragment, do not match aromatic structure bonds. By default, this flag is not set. This option does not influence the processing of bond query expressions which explicitly reference properties such as B_ORDER or B_ISAROMATIC. These are evaluated in any case.

-noarobondfg

-noarobondfg 0/1

If this flag is set, aromatic bonds are not considered functional groups. This flag influences the interpretation of the insulator and separator pseudo-atoms, which are constructs used to separate functional groups in the match process. By default, aromatic bonds are considered part of a functional group.

-nochainonaro

-nochainonaro 0/1

If this flag is set, substructure chain bonds (acyclic bonds) do not match aromatic structure bonds. By default, and if no options prohibiting this, such as -nosingleonaro or -nodoubleonaro, are set, single and double chain bonds can match aromatic structure bonds.

-nochainonring

-nochainonring 0/1

If set, substructure chain atoms cannot match ring structure atoms.

-nodoubleonaro

-nodoubleonaro 0/1

If this flag is set, double bonds do not map onto aromatic bonds. By default, and in the absence of other criteria determining the match of a bond, both single and double (but not triple or higher) aliphatic substructure bonds match aromatic structure bonds, and vice versa. If the flag is set, substructure double bonds which are not marked aromatic, either by explicit attribute setting or indirectly by aromaticity analysis of the substructure fragment, do not match aromatic structure bonds. By default, this flag is not set. This option does not influence the processing of bond query expressions which explicitly reference properties such as B_ORDER or B_ISAROMATIC. These are evaluated in any case.

-noheterofg

-noheterofg 0/1

If this flag is set, bonds to hetero atoms are not considered part of functional groups. This flag influences the interpretation of the insulator and separator pseudo-atoms, which are constructs used to separate functional groups in the match process. By default, bonds involving a hetero atom are considered part of a functional group.

-nomultibondfg

-nomultibondfg 0/1

If this flag is set, non-aromatic multiple bonds are not considered part of functional groups. This flag influences the interpretation of the insulator and separator pseudo-atoms, which are constructs used to separate functional groups in the match process. By default, non-aromatic multiple bonds are considered part of a functional group.

-nosingleonaro

-nosingleonaro 0/1

If this flag is set, single bonds do not map onto aromatic bonds. By default, and in the absence of other criteria determining the match of a bond, both single and double (but not triple or higher) aliphatic substructure bonds match aromatic structure bonds, and vice versa. If the flag is set, substructure single bonds which are not marked aromatic, either by explicit attribute setting or indirectly by aromaticity analysis of the substructure fragment, do not match aromatic structure bonds. By default, this flag is not set. This option does not influence the processing of bond query expressions which explicitly reference properties such as B_ORDER or B_ISAROMATIC. These are evaluated in any case.
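The default bond comparison and the effect of the aliphatic-onto-aromatic flags can be summarized in a small decision function. This plain-Python sketch is a simplified model, not the toolkit's code: `bond_matches` is hypothetical, bond orders are plain integers (1 single, 2 double, 3 triple), the structure-side order of an aromatic bond is ignored, and explicit bond query expressions are not modelled. Following the text, the flags only restrict non-aromatic substructure bonds landing on aromatic structure bonds.

```python
def bond_matches(ss_order, ss_aro, st_order, st_aro, ss_chain=False,
                 noaliphaticonaro=False, nosingleonaro=False,
                 nodoubleonaro=False, nochainonaro=False):
    """Simplified default bond test (hypothetical model)."""
    if ss_aro == st_aro:
        # aromatic onto aromatic always matches; aliphatic bonds compare orders
        return ss_aro or ss_order == st_order
    if not ss_aro:
        # aliphatic substructure bond onto an aromatic structure bond
        if ss_order not in (1, 2) or noaliphaticonaro:
            return False
        if (nosingleonaro and ss_order == 1) or (nodoubleonaro and ss_order == 2):
            return False
        return not (nochainonaro and ss_chain)
    # aromatic substructure bond onto a single or double structure bond
    return st_order in (1, 2)

print(bond_matches(1, False, 0, True))                      # True
print(bond_matches(1, False, 0, True, nosingleonaro=True))  # False
print(bond_matches(3, False, 0, True))                      # False
```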

-nosuperatommultilink

-nosuperatommultilink 0/1

If set, substructure superatoms can only be matched in such a fashion that they have a single link to other matched parts of the structure.

-nosuperatomonh

-nosuperatomonh 0/1

If set, substructure superatoms cannot match structure hydrogen, even if their usual specification would allow it.

-omitrecursion

-omitrecursion 0/1

This option influences the way matches of recursive SMARTS fragments are reported. Internally, the first atom of a recursive fragment is represented by an any atom on the basic substructure. This placeholder atom and its mapped structure counterpart are reported in atom maps, and the bonds leading to the placeholder in bond maps. If this flag is set, the placeholder atom and its bonds are omitted from the maps.

Example:

match ss -omitrecursion 0 {C[$(OC)]} COC amap
match ss -omitrecursion 1 {C[$(OC)]} COC amap

In the first example, the atom map contains the pairs “{1 1} {2 2}”, while in the second example only “{1 1}” is returned as atom map.

In any case, detailed mapping information about all the atoms and bonds of the recursive fragment is currently not directly available on the script level.

-openhcount

-openhcount 0/1

If this flag is set, all hydrogen counts are considered minimum values. If a matched structure atom possesses more hydrogens, the match still succeeds, even if the original comparison operator uses equality as criterion, provided that the compared property value is A_HCOUNT, the standard hydrogen count property, which is the default used by the various query syntax decoders of the toolkit. This option is unusual because it is also applied to comparisons in atom or bond query expressions. By default, this flag is not set.

Example:

match ss -openhcount 0 {[C;H2]} CC
match ss -openhcount 1 {[C;H2]} CC

The first example does not match, because both carbon atoms in the structure possess three hydrogen atoms, not two, while the second attempt succeeds. Note that the simple specification

match ss -openhcount x {[CH2]} CC

succeeds regardless of the setting of this flag. This is a side effect of the implicit expansion of SMARTS hydrogen atoms when they appear directly behind the atom symbol in the default SMARTS decoder mode, which is described in detail in the section about the handling of SMILES strings.

Alternatively, it is of course possible to either use standard SMARTS or-connected hydrogen count alternative values, or use the toolkit-specific range extensions, as in

match ss {[C;H2,H3]} CC
match ss {[CH{2-}]} CC

but in many cases this makes the query more complicated than necessary.
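The effect of -openhcount on a single comparison can be sketched as an operator rewrite: an equality test on A_HCOUNT is relaxed to "at least". This plain-Python model is an assumption-laden illustration, not the toolkit's query evaluator: `compare_property` is hypothetical and only the =, >= and <= operators are modelled.

```python
import operator

COMPARE = {"=": operator.eq, ">=": operator.ge, "<=": operator.le}

def compare_property(prop, op, query_value, atom_value, openhcount=False):
    """Compare an atom property against a query value (hypothetical model)."""
    if openhcount and prop == "A_HCOUNT" and op == "=":
        op = ">="  # hydrogen counts become minimum values
    return COMPARE[op](atom_value, query_value)

# [C;H2] tested against a methyl carbon (three hydrogens), as in the example above
print(compare_property("A_HCOUNT", "=", 2, 3, openhcount=False))  # False
print(compare_property("A_HCOUNT", "=", 2, 3, openhcount=True))   # True
```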

-overlap

-overlap none/any/nobonds/noembedding/distinctatoms/distinctmols

This option controls how potential overlap of multiple substructure fragments on the target structure is handled. If the substructure contains only a single fragment, this option has no effect.

The default mode is none. In this mode, no overlap of substructure fragments on the target structure may occur. All fragments must be matched side by side, matching different structure parts.

Mode distinctmols is even more restrictive than mode none. In this mode, only one substructure fragment may be matched onto each structure fragment (i.e. molecule).

In mode any, every substructure fragment is treated independently of any other substructure fragment. No information about any match by other fragments is used. Arbitrary overlap of the fragments on the target structure is allowed.

Mode nobonds allows the overlap of atoms, but not of bonds. In effect, multiple fragments may overlap at the edges, but not share any larger structure parts.

In mode noembedding, atoms and bonds may overlap, but no substructure fragment may be completely embedded into the matched structure part covered by another fragment, meaning that at least one of any pair of matching substructure fragments must match an atom which is not matched by the other fragment.

Mode distinctatoms is similar to mode noembedding, but in this mode, for any pair of matching substructure fragments, each fragment must match at least one structure atom which is not matched by the other.

Because bitsets are used internally to track the mapping of substructure fragments, the maximum number of fragments which may be used in any mode other than none or distinctmols is 64. The none and distinctmols modes do not have a maximum fragment count.

-pionaro

-pionaro 0/1

If this flag is set, any bond between atoms which are part of a π system can match an aromatic bond. This option is intended to allow the reproduction of the behavior of the Daylight toolkit, which has a much broader idea about which ring systems are aromatic than the Cactvs toolkit in its default aromaticity mode. The Daylight toolkit recognizes rings with exocyclic keto groups, such as purines and pyrimidines, as aromatic, while this toolkit does not. If the option flag is set, aromatic fragments match on such systems. By default, the flag is not set. This option is outdated. It is recommended to use Daylight aromaticity for matching instead.

-queryatomexpansion

-queryatomexpansion 0/1

If this flag is set, query atoms which potentially cover multiple matched structure atoms are expanded to cover all matched atoms. Query atoms of this class include the MDL Beilstein types and the Eli Lilly Spinach types, but not superatoms (see -superatomexpansion). With a set flag, the atom map potentially contains multiple entries for a single query atom. By default, such query atoms are only registered as matching the first structure atom.

-remembercomplexmatches

-remembercomplexmatches 0/1

If set, once a substructure atom has been found to match a structure atom with all its atom match conditions, including complex match expressions, this atom pair is assumed to match again in a different overall substructure mapping without re-checking. This flag can accelerate matches, but must only be used if the match of an atom does not depend on the match status of other substructure atoms.
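The remembering behavior amounts to caching positive atom-pair results. In this plain-Python sketch (hypothetical names, not the toolkit's internals), only successful checks are remembered, which is why the shortcut is only safe when the per-atom test does not depend on the rest of the current mapping.

```python
def make_atom_matcher(check, remember=False):
    """Wrap an expensive atom test check(ss_atom, st_atom) -> bool."""
    remembered = set()

    def matcher(ss_atom, st_atom):
        if remember and (ss_atom, st_atom) in remembered:
            return True  # assumed to match again, no re-check
        ok = check(ss_atom, st_atom)
        if ok and remember:
            remembered.add((ss_atom, st_atom))
        return ok

    return matcher

calls = []

def expensive_check(ss_atom, st_atom):
    calls.append((ss_atom, st_atom))   # record how often the full test runs
    return ss_atom == st_atom

matcher = make_atom_matcher(expensive_check, remember=True)
for _ in range(3):
    matcher(1, 1)
print(len(calls))  # 1
```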

-restrictsubstitution

If set, unmatched non-hydrogen portions of the structure can only be linked to substructure any atoms, or open valences. If open valences are used, the number of open valences linked to a substructure atom limits the number of unmatched heavy atom continuation atoms on the structure. Example: A substructure phenyl-any would match benzene (on a hydrogen substituent), toluene or other mono-substituted benzenes, but not higher-substituted compounds. A substructure open-C-open would not match tertiary or quaternary structure carbons.

-rotateterminals

-rotateterminals 0/1

If set, the 2D bond direction of matched structure-side terminal atoms (i.e. atoms with only a single bond) is adjusted in property A_XY to match the direction of the matched substructure-side bond. This option is useful, for example, to force the same orientation of hydrogens as in a template. For useful results, the general orientation of the matched structure part must be the same as that of the substructure pattern. This is usually ensured by combining this option with the -align option in the rotate, redraw or besteffort modes.
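Geometrically, the adjustment places the terminal atom along the direction of the template bond, starting from the atom it is bonded to. This plain-Python sketch assumes the structure bond length is preserved, which the text does not state; `rotate_terminal` is a hypothetical helper, not a toolkit function.

```python
import math

def rotate_terminal(anchor_xy, terminal_xy, ss_bond_direction):
    """Return a new 2D position for a terminal structure atom.

    anchor_xy:         position of the structure atom the terminal atom is bonded to
    terminal_xy:       current position of the terminal atom
    ss_bond_direction: vector along the matched substructure bond, pointing from
                       the anchor-side atom to the terminal-side atom
    """
    length = math.dist(anchor_xy, terminal_xy)   # assumption: keep bond length
    dx, dy = ss_bond_direction
    norm = math.hypot(dx, dy)
    return (anchor_xy[0] + length * dx / norm,
            anchor_xy[1] + length * dy / norm)

# A hydrogen drawn to the right of its carbon, template bond pointing straight up
print(rotate_terminal((0.0, 0.0), (1.0, 0.0), (0.0, 2.0)))  # (0.0, 1.0)
```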

-spinachformula

-spinachformula formulaexpression

Define a formula match expression for spinach -class superatoms. The atom set matched by such an atom must match the formula expression, otherwise the match is rejected. This can for example be used to restrict the size and type of substituents. If no explicit spinach formula is set, it is taken from ::cactvs(default_spinach_formula) , where it can be globally changed. Example:

match ss -spinachformula >=C1-2 {smarts:c1ccccc1[[CSPINACH]]} c1ccccc1C
match ss -spinachformula >=C1-2 {smarts:c1ccccc1[[CSPINACH]]} c1ccccc1CC
match ss -spinachformula >=C1-2 {smarts:c1ccccc1[[CSPINACH]]} c1ccccc1CCC

The first two expressions match. The carbon-spinach superatom gobbles up the entirety of the ring substituent atoms, and it has one or two carbons for the first two examples. The third example fails, because the substituent has too many carbons.

-stereo

-stereo none/absolute/relative

This option controls the global use of stereochemistry information of the substructure in the match process. By default, stereochemistry is ignored. If mode absolute or relative is set, stereochemistry present in the substructure is checked against the stereochemical features in the structure. Stereo checks are performed on a pseudo-3D model of the compound and do not use simple descriptor values such as R and S.

If a stereo center in the substructure is unspecified, any stereochemistry, including unspecified stereochemistry, is allowed on the structure side in the matching atoms or bonds. If stereochemistry on an atom or bond of the substructure is specified, it must match the features found in the structure. Unspecified stereochemistry for the matched bond or center on the structure normally leads to a mismatch, unless a nostereook flag has been set in A_query(flags) or B_query(flags) for the substructure atoms or bonds. Currently, the substructure match system handles stereochemistry of tetrahedral centers (including those which involve free electron pairs), cis/trans double bonds, allenes (both odd and even) and square planar geometries. Other geometries such as pentagonal bipyramids or octahedra are not yet supported.

With stereo match mode absolute, the pseudo-3D configuration of substructure and structure must match at all stereo centers and diastereomeric bonds specified in the substructure. The alternative mode relative allows the opposite configuration at stereo centers (but not bonds), provided that all matched stereo centers possess the opposite configuration. For example, an S,S-substructure would match both an S,S- and an R,R-structure, but not the S,R- or R,S-isomer. In effect, the matched part may have the specified configuration or its exact inverse, but not a diastereomeric one. The relative mode is obviously useful only when more than one stereo center needs to be matched.
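The difference between the absolute and relative modes for stereo centers can be reduced to comparing parities. This plain-Python sketch replaces the toolkit's pseudo-3D test with a hypothetical +1/-1 parity per matched center, aligned by the atom map, with None for unspecified; `stereo_match` is hypothetical, and the names S and R are used only as labels for two opposite parities. Stereo bonds and stereo groups are not modelled.

```python
def stereo_match(ss_parities, st_parities, mode="absolute"):
    """Compare matched stereo centers under a stereo mode (hypothetical model)."""
    if mode == "none":
        return True
    pairs = [(s, t) for s, t in zip(ss_parities, st_parities) if s is not None]
    if any(t is None for _, t in pairs):
        return False  # specified in the substructure, unspecified in the structure
    same = all(s == t for s, t in pairs)
    if mode == "absolute":
        return same
    if mode == "relative":
        # all centers identical, or all centers inverted together
        return same or all(s == -t for s, t in pairs)
    raise ValueError(f"unknown stereo mode: {mode}")

S, R = 1, -1  # labels for two opposite parities
print(stereo_match([S, S], [R, R], "absolute"), stereo_match([S, S], [R, R], "relative"))
```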

Explicit atom stereo groups, such as the MDL stereo groups, override the global absolute or relative settings for the atoms involved.

Examples:

match ss -stereo none {[Cl,Br,I][C@H](CC)C} {C[C@H](CCC)Cl}
match ss -stereo absolute {[Cl,Br,I][C@H](CC)C} {C[C@H](CCC)Cl}
match ss -stereo absolute {[Cl,Br,I][C@H](CC)C} {C[C@@H](CCC)Cl}
match ss -stereo absolute {[Cl,Br,I][C@H](CC)C} {C[CH](CCC)Cl}

In this example set, the first line matches, because stereochemistry is ignored. The second line does not match, because the target structure represents the opposite stereo isomer. The third line does match, and the last line fails again because the substructure requested matching stereochemistry at a center for which no stereochemical information was available on the structure.

-strictexclusion

-strictexclusion 0/1

This is an expert option which controls how substructure fragments consisting exclusively of atoms marked as not to be matched are handled. By default, an attempt to match these fragments is performed after all other substructure fragments have been matched and their matched structure parts are blocked. If at this point any such fragment matches, the overall match fails. However, at this stage, structure parts which could match the exclusion fragment are potentially covered by other substructure fragments and thus protected, if the overlap mode disallows overlaps. If the flag is set, these fragments are checked before the normal substructure fragments are processed. If a match occurs, the match process is immediately aborted.

-strictsmarts

-strictsmarts 0/1

If set, substructure argument specifications are decoded as strict SMARTS definitions. This means for example that the non-aromaticity of upper-case elements in SMARTS is enforced. Atoms for which aromaticity is not relevant need to be encoded with # notation, or as or-ed uppercase and lowercase element symbol pair. This flag only has an effect if the substructure is decoded within the match command. If the handle or reference of an existing ensemble is used as substructure specification, its internal representation and match behavior is not changed and was already defined by whatever decoder options were used when it was created.

-superatomexpansion

-superatomexpansion 0/1

If set, the match of a superatom is expanded to cover all structure atoms its definition encompasses. In the atom map, all these structure atoms are mapped to the same substructure superatom. By default, the match register only maps the first substructure/structure atom pair, and the other structure atoms are not explicitly registered as matched. Superatoms are not the same as expandable query atoms (see -queryatomexpansion).

-superatommapproperty

-superatommapproperty 0/1

If set, superatom matches are registered in property A_SUPERATOM_SSMATCH . In case there are multiple matches, each match generates its own property instance. The value for the matched structure atoms is the matching substructure superatom string copied from property A_SUPERATOMSTRING .

-tautomers

-tautomers none/basic/advanced

By default, bond orders and location of hydrogen atoms in the structure are fixed. A tautomer of a compound is considered a different chemical entity and does not match another tautomer. If the tautomer match mode is explicitly set to none, the match procedure continues to work in this style.

The alternative tautomer match modes basic and advanced introduce flexibility - at the cost of longer processing times, and a risk of obtaining matches which are surprising at first glance.

Examples:

match ss -tauto none {C=CO[H]} CC(=O)C
match ss -tauto none {CC=O} C=C(O)C
match ss -tauto basic {C=CO[H]} CC(=O)C
match ss -tauto basic {CC=O} C=C(O)C

The first two sample lines, which match an enol substructure against acetone and a keto substructure against the enol form of acetone, do not find a match. The second pair of lines finds a match in both cases.

Atom and bond maps can be used with tautomeric matches, but the results can be surprising. The bond of a wandering hydrogen atom in the substructure is matched to the bond with the hydrogen in the original structure. However, since the substructure hydrogen atom may actually have been matched against a different virtual structure than the one passed to the match routine, the partner atoms of the bonds to the hydrogens in the substructure and structure may not have been mapped onto each other!

The difference between the basic and advanced modes is that the basic mode does not disturb aromatic systems, while the advanced mode considers forms which involve the conversion of aromatic systems into quinoids and vice versa, at the cost of extra processing time and less precisely defined matches.

The option can be shortened to -tauto.

-terminal

-terminal 1/0

This is another expert flag, and equivalent to the -maxopenlinks option with a link count of one. If it is set, a maximum of one bond, with the exclusion of bonds to hydrogen, may lead from the matched part of the structure to any non-hydrogen unmatched atoms. Essentially, the substructure is mapped into peripheral regions of the structure.

Example:

set nmatch [match ss -mode all CO C(O)C(O)C]
set nmatch [match ss -mode all -terminal 1 CO C(O)C(O)C]

In this example, the first line returns two matches, since the CO fragment can be matched onto both CO groups in the structure. The second line finds only a single match. The substructure cannot be matched onto the second CO group, because in that match the structure carbon atom has two unmatched non-hydrogen neighbors, one leading to the first CO group and the other to the methyl group.
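The open-link count that -terminal limits to one can be checked by hand with this Python sketch of the example above. The helpers `open_links` and `keep_peripheral` are hypothetical. The two candidate CO matches are written down rather than found by a substructure search.

```python
# Heavy atoms of C(O)C(O)C: 0 C, 1 O, 2 C, 3 O, 4 C (hydrogens left implicit).
ELEMENTS = {0: "C", 1: "O", 2: "C", 3: "O", 4: "C"}
BONDS = [(0, 1), (0, 2), (2, 3), (2, 4)]


def open_links(matched, bonds, elements):
    """Count bonds leading from matched atoms to unmatched non-hydrogen atoms."""
    count = 0
    for a, b in bonds:
        for inside, outside in ((a, b), (b, a)):
            if inside in matched and outside not in matched and elements[outside] != "H":
                count += 1
    return count


def keep_peripheral(candidates, max_links, bonds=BONDS, elements=ELEMENTS):
    """Drop candidate matches with more than max_links open links (hypothetical filter)."""
    return [m for m in candidates if open_links(m, bonds, elements) <= max_links]


# The two CO matches from the example, as sets of structure atom indices.
candidates = [{0, 1}, {2, 3}]
print([open_links(m, BONDS, ELEMENTS) for m in candidates])  # [1, 2]
print(keep_peripheral(candidates, 1))                       # [{0, 1}]
```

With a limit of one link, only the CO group at the chain end survives, which is the single match reported by the second command.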

-timeout

-timeout nsecs

Set a time-out for the match operation. By default, or when a value of zero is given, the routine does not time out. If a time-out occurs, the match procedure is stopped. If any matches have been found so far, these are reported as results, without raising an error.
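The reporting behavior resembles a deadline-bounded search loop. This Python sketch is a generic illustration, not the toolkit's matcher. `collect_matches` is hypothetical, and reading the time-out value as seconds follows the nsecs parameter name.

```python
import time


def collect_matches(candidates, is_match, timeout_s=0):
    """Collect matching candidates, stopping quietly once timeout_s seconds elapse.

    Hypothetical helper. timeout_s <= 0 means no limit. On timeout, the matches
    found so far are returned instead of raising an exception.
    """
    deadline = time.monotonic() + timeout_s if timeout_s > 0 else None
    found = []
    for cand in candidates:
        if deadline is not None and time.monotonic() >= deadline:
            break  # time is up: report partial results, no error
        if is_match(cand):
            found.append(cand)
    return found


def slow_even(n):
    time.sleep(0.01)  # stand-in for an expensive match attempt
    return n % 2 == 0


print(collect_matches(range(10), slow_even))  # [0, 2, 4, 6, 8]
partial = collect_matches(range(1000), slow_even, timeout_s=0.05)
print(len(partial) < 500)                     # True: stopped early, partial list returned
```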

-transferstereo

-transferstereo none/atoms/bonds/both

If set to anything but none, which is the default, stereogenic atoms and/or bonds in the structure that are matched by substructure atoms or bonds with defined stereochemistry, and do not already possess their own stereochemistry descriptors, inherit stereochemistry from the substructure. This is done by setting properties A_LABEL_STEREO or B_LABEL_STEREO in such a fashion that the absolute configuration is the same as in the substructure. Depending on the atom and bond labeling of the structure vs. substructure, this is not necessarily the same descriptor value. In order for such a match to succeed, missing atom or bond stereochemistry on the structure side needs to be allowed (see -allowmissingstereo option).

-useatomtree

-useatomtree 0/1

This flag is set by default, but may be reset with this option. If the flag is set, atom query expression trees present in property A_QUERY( query ) are evaluated and used to determine match possibilities. If this flag is not set, query trees are ignored and only the flat atom match attribute set is used.

-useatomtype

-useatomtype 0/1

This flag is set by default. If it is reset, atom type information is not checked in matching. -atomtype is an alias.

-usebondorder

-usebondorder 0/1/2

This flag determines whether the bond orders of substructure and structure bonds outside aromatic systems are used for determining a match. By default, this flag is set to 1, but it may be disabled with this option. This option affects only the basic bond match. Bond match query expressions which explicitly or implicitly refer to property B_ORDER always use their comparison results to determine matches. In the rarely used mode 2, bond order matching is only applied to terminal structure bonds (i.e. bonds containing an atom that participates in only one bond). -bondorder is an alias.
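The three modes can be summarized as a small decision function. This is a toy Python model, not the toolkit code. `bond_order_ok` and `is_terminal` are hypothetical. Bonds inside aromatic systems are assumed to be handled by separate aromaticity checks and simply pass here, and query expressions on B_ORDER are not modeled.

```python
def bond_order_ok(ss_order, st_order, mode, st_bond_terminal, in_aromatic_system=False):
    """Basic bond-order comparison for -usebondorder modes 0, 1 and 2 (toy model)."""
    if mode == 0 or in_aromatic_system:
        return True  # order ignored (mode 0), or aromatic bond handled elsewhere (assumption)
    if mode == 2 and not st_bond_terminal:
        return True  # mode 2 compares orders on terminal structure bonds only
    return ss_order == st_order


def is_terminal(bond, degree):
    """A bond is terminal if one of its atoms takes part in only this bond.

    degree maps atom -> number of bonds; whether bonds to hydrogen count is
    left to the caller.
    """
    a, b = bond
    return degree[a] == 1 or degree[b] == 1


# A single query bond against a double structure bond inside a chain:
print(bond_order_ok(1, 2, mode=1, st_bond_terminal=False))  # False
print(bond_order_ok(1, 2, mode=2, st_bond_terminal=False))  # True
```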

-usebondtree

-usebondtree 0/1

This flag is set by default, but may be reset with this option. If the flag is set, bond query expression trees present in property B_QUERY( query ) are evaluated and used to determine match possibilities. If this flag is not set, query trees are ignored and only the flat bond match attribute set is used.

-varbondglobal

-varbondglobal maxdelta

If this option is used, the global use of approximated fractional bond orders for coordinate compound hypergraph matching is enabled for bonds with explicit approximated bond order request values stored in property B_QUERY( varbo ) . The maxdelta parameter is the maximum allowed average deviation of the matched structure bonds (with fractional order in B_ORDER_ESTIMATE ) vs. the substructure bonds that have a specified value in B_QUERY( varbo ) .

-varbondlocal

-varbondlocal maxdelta

If this option is used, the use of approximated fractional bond orders for coordinate compound hypergraph matching is enabled for bonds with explicit approximated bond order request values stored in property B_QUERY( varbo ) . The maxdelta parameter is the maximum allowed individual deviation of the fractional query bond orders in B_QUERY( varbo ) from the structure-side fractional bond order values of matched bonds stored in property B_ORDER_ESTIMATE .
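The global and local variants differ only in how the per-bond deviations are aggregated, as this Python sketch shows. `varbond_ok` is hypothetical. It assumes "average deviation" means the mean absolute difference between the B_QUERY( varbo ) value and the structure's B_ORDER_ESTIMATE value over the matched bonds.

```python
def varbond_ok(pairs, maxdelta, scope):
    """Check fractional bond-order deviations (hypothetical helper).

    pairs -- (query varbo value, structure B_ORDER_ESTIMATE value) per matched bond
    scope -- "global": mean absolute deviation <= maxdelta (like -varbondglobal)
             "local":  every absolute deviation <= maxdelta (like -varbondlocal)
    """
    deviations = [abs(q - s) for q, s in pairs]
    if not deviations:
        return True  # no matched bond carries a varbo request, nothing to check
    if scope == "global":
        return sum(deviations) / len(deviations) <= maxdelta
    if scope == "local":
        return max(deviations) <= maxdelta
    raise ValueError(f"unknown scope: {scope!r}")


pairs = [(2.0, 1.75), (1.0, 1.5)]        # deviations 0.25 and 0.5
print(varbond_ok(pairs, 0.4, "global"))  # True  (mean 0.375)
print(varbond_ok(pairs, 0.4, "local"))   # False (0.5 exceeds 0.4)
```

The same matched bonds can thus pass the global test while failing the local one, because a single large deviation is averaged out in the global case.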

-wedge

-wedge 0/1

If this flag is set, matching bonds on the substructure and structure sides must possess identical wedge attributes (both wedge tip location and up or down direction). This option should only be used under very specific circumstances. It is not a replacement for stereo center matching, since wedges can be placed onto different bonds around a stereo center, and still represent the same stereo isomer.

Maximum Common Substructure Matching

Maximum common substructures can be found with the mcss mode. Algorithmically, this works by gradually increasing the -fuzz parameter until a match is found.

Example:

set eh1 [ens create CCC(=O)C]
set eh2 [ens create ClCCC]
match mcss -excludeh all $eh1 $eh2 amap
echo $amap
match mcss -mode all -excludeh all $eh1 $eh2 amap
echo $amap

The first command finds one common substructure (the first three carbon atoms of the first structure, mapped onto atoms 2-4 of the second structure). The second command also reports alternative solutions, all with three carbon atoms. Excluding hydrogen atoms from the match is not required, though it accelerates the process.
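The "increase the fuzz until something matches" strategy can be reproduced on the heavy atoms of the example with a brute-force Python toy. It assumes that -fuzz counts how many substructure atoms may remain unmatched. It is a swapped-in, exponential-time illustration, not the toolkit's algorithm, and all function names are hypothetical.

```python
from itertools import combinations, permutations


def bond_order(bonds, a, b):
    """Order of bond a-b, or None if the atoms are not bonded."""
    return bonds.get((a, b), bonds.get((b, a)))


def connected(atoms, bonds):
    """True if the atoms form one connected fragment (depth-first search)."""
    atoms = set(atoms)
    start = next(iter(atoms))
    seen, stack = {start}, [start]
    while stack:
        a = stack.pop()
        for x, y in bonds:
            for u, v in ((x, y), (y, x)):
                if u == a and v in atoms and v not in seen:
                    seen.add(v)
                    stack.append(v)
    return seen == atoms


def embed(kept, q_atoms, q_bonds, s_atoms, s_bonds):
    """Brute-force search for an element- and bond-order-preserving mapping."""
    for image in permutations(s_atoms, len(kept)):
        m = dict(zip(kept, image))
        if any(q_atoms[a] != s_atoms[m[a]] for a in kept):
            continue
        if all(bond_order(s_bonds, m[a], m[b]) == order
               for (a, b), order in q_bonds.items() if a in m and b in m):
            return m
    return None


def mcss(q_atoms, q_bonds, s_atoms, s_bonds):
    """Raise the number of unmatched query atoms (the 'fuzz') until a match appears."""
    n = len(q_atoms)
    for fuzz in range(n):
        for kept in combinations(q_atoms, n - fuzz):
            if connected(kept, q_bonds):
                m = embed(kept, q_atoms, q_bonds, s_atoms, s_bonds)
                if m is not None:
                    return fuzz, m
    return n, {}


# Heavy atoms only, as with -excludeh all: CCC(=O)C and ClCCC (0-based indices).
q_atoms = {0: "C", 1: "C", 2: "C", 3: "O", 4: "C"}
q_bonds = {(0, 1): 1, (1, 2): 1, (2, 3): 2, (2, 4): 1}
s_atoms = {0: "Cl", 1: "C", 2: "C", 3: "C"}
s_bonds = {(0, 1): 1, (1, 2): 1, (2, 3): 1}
print(mcss(q_atoms, q_bonds, s_atoms, s_bonds))  # (2, {0: 1, 1: 2, 2: 3})
```

Two query atoms stay unmatched, and the first three carbons map onto structure atoms 1-3 (0-based), i.e. atoms 2-4 in the 1-based numbering of the text.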

All match options can be used for this command variant - including, for example, the -align variants, which result in the structures being aligned by a common core.

Reaction Substructure Matching

Reaction substructure matching is very similar to simple substructure matching. The substructure and reaction arguments are required to represent reactions, not ensembles. All standard configuration options can be used. Configured match options are implicitly applied both to the reagent and product side substructure matches.

In the normal case, both the reaction pattern and the full reaction should be mapped, i.e. both sides of each reaction should possess property A_MAPPING, which contains the label of the atom on the other side corresponding to the atom bearing the property value, or 0 for atoms which are only present on one side. This property is automatically set on the reagent and product ensembles when reading reaction files with mapping information, or when decoding a Reaction SMILES with map IDs. The reaction match algorithm automatically uses the mapping to construct an anchor table for the second-stage match. This second-stage anchor table is independent of any anchor table specified as a command argument; a custom anchor table, if set, is used only in the first-stage match.

Matching happens in two levels of recursion. First, a normal substructure match is attempted between the substructure ensemble with the reagent E_REACTION_ROLE and the corresponding reagent ensemble of the full reaction. If that succeeds, the query ensemble with the product E_REACTION_ROLE is matched in a recursive call against the corresponding product ensemble of the full reaction. If both the query and the full reaction have A_MAPPING, an additional constraint is enforced: a product query atom with a non-zero A_MAPPING value can only match a full-reaction product atom whose A_MAPPING equals that of the full-reaction reagent atom which was matched by the reagent query atom carrying the same A_MAPPING value as the product query atom.

This mechanism ties the two query sides together and makes a full reaction query different from a query which only looks for the independent presence of the query patterns in both sides of the reaction. It is possible to run a reaction query without the implicit A_MAPPING tie by omitting the mapping, either on the query or the full reaction. In that case, the effect is an independent match of the patterns on both sides.
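The mapping tie can be traced step by step in a Python sketch. `allowed_product_atoms` is a hypothetical helper, not a toolkit function. Atoms are 0-based indices in SMILES order, and the first-stage reagent match is supplied by hand. The data correspond to the xss4 pattern and the mapped reaction xh2 from the examples below.

```python
def allowed_product_atoms(pq_atom, q_reag_map, q_prod_map, reagent_match,
                          full_reag_map, full_prod_map):
    """Full-reaction product atoms that a product query atom may match.

    Hypothetical helper. The *_map dicts give A_MAPPING per atom; reagent_match
    maps reagent query atoms to the full-reaction reagent atoms found in the
    first stage.
    """
    m = q_prod_map.get(pq_atom, 0)
    if m <= 0:
        return set(full_prod_map)  # unmapped query atom: no tie to the reagent side
    rq_atom = next((a for a, v in q_reag_map.items() if v == m), None)
    if rq_atom is None:
        return set(full_prod_map)  # assumption: a one-sided mapping adds no constraint
    target = full_reag_map[reagent_match[rq_atom]]
    return {a for a, v in full_prod_map.items() if v == target}


# Full reaction xh2: atoms 0-8 carry map numbers 1-9 on both sides
# (0-5 ring carbons, 6 carbonyl C, 7 O, 8 methyl C).
full_map = {i: i + 1 for i in range(9)}
# Query xss4: atoms 0-2 carry map numbers 1-3 on both sides.
q_map = {0: 1, 1: 2, 2: 3}
# First-stage reagent match: c -> ring atom 5, C -> atom 6, O -> atom 7.
reagent_match = {0: 5, 1: 6, 2: 7}
print(allowed_product_atoms(2, q_map, q_map, reagent_match, full_map, full_map))  # {7}
```

The product query atom [c:3] may only match product atom 7, which is the oxygen, so the aromatic-carbon query atom cannot match there and the mapped reaction query fails.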

A_MAPPING can be computed on the full reaction or on the reaction query pattern, but the current implementation is comparatively unoptimized and is likely to report spurious results for more complex query patterns. In general, reaction mapping is a computationally hard (NP-hard) problem whose cost grows steeply with molecule size. The toolkit implementation only works reliably and with usable speed for small molecules and simple query patterns. The reaction match command does not compute the property on either the pattern or the full reaction side if it is not present.

If match variables are configured to capture detailed information on the match result, reaction substructure matches utilize two independent sets - one for the reagent-side match and one for the product-side match. The format of results stored therein is the same as for normal substructure matches. It is possible to configure any subset of these variables and to ignore other information.

Examples:

set xh1 [reaction create c1ccccc1C(=O)C>>c1ccccc1C(O)C]
set xh2 [reaction create {[c:1]1[c:2][c:3][c:4][c:5][c:6]1[C:7](=[O:8])[C:9]>>[c:1]1[c:2][c:3][c:4][c:5][c:6]1[C:7]([O:8])[C:9]}]

Two identical reactions (reduction of acetophenone), without and with atom mapping. The reactions may of course be created from other formats, such as MDL RXN files, KEGG reaction IDs, or ChemDraw files.

set xss1 [reaction create {[c:1]-[C:2]=[O:3]>>[c:1]-[C:2]-[O:3]}]
echo [match rss $xss1 $xh1] ;# 1
echo [match rss $xss1 $xh2] ;# 1

Both match. The first reaction matches because it carries no reaction mapping, so this is a simple double substructure check. In the second case, the atoms match as indicated by the mapping, and the bonds change as required.

set xss2 [reaction create {[c]-[C]=[O]>>[c]-[C]-[O]}]
echo [match rss $xss2 $xh1] ;# 1
echo [match rss $xss2 $xh2] ;# 1

Both match because there is no mapping on the query, and this again is a simple double substructure match.

set xss3 [reaction create {[c]-[C]=[O]>>[c]-[C]-[c]}]
echo [match rss $xss3 $xh1] ;# 1
echo [match rss $xss3 $xh2] ;# 1

Once more, there is no mapping, so this is a simple double substructure match: matching the right-side pattern at a location unrelated to the left-side match is not an error.

set xss4 [reaction create {[c:1]-[C:2]=[O:3]>>[c:1](-[C:2])-[c:3]}]
echo [match rss $xss4 $xh1] ;# 1
echo [match rss $xss4 $xh2] ;# 0

This no longer works when mapping is present on both the pattern and the full reaction. The third query atom (the oxygen on the reagent side) would have to become an aromatic carbon on the product side, which does not happen in the reaction. Having mapping on both the full reaction and the query is the standard case for reaction substructure matching.

set xss5 [reaction create {[c:1]-[C:2]-[O:3]>>[c:1]-[C:2]=[O:3]}]
echo [match rss $xss5 $xh1] ;# 0
echo [match rss $xss5 $xh2] ;# 0

Here the pattern is for the reverse reaction, not the forward direction. Since the direction of the reaction is encoded in the reaction object, this does not work. See the reaction reverse command for a possibility to reverse the encoding of a reaction.


1. A zero or negative atom mapping value indicates an unmapped atom.