XMLWalker.jl API
XMLWalker.ChainMatcher
— Type
ChainMatcher is an iterable wrapper type for a vector of matchers.
XMLWalker.ContainsAllPats
— Type
matcher = ContainsAllPats(pat, field_name::Union{Symbol,Nothing}=nothing)
matcher(input) returns true if input.field_name contains all elements of pat (pat can be an iterable collection). If field_name is nothing, input itself is matched.
Other matchers: Union{XMLWalker.AnyPat, XMLWalker.ContainsAllPats, XMLWalker.ContainsAnyPat, XMLWalker.ContainsPat, XMLWalker.HasAllKeysPat, XMLWalker.HasAnyKeyPat, XMLWalker.MatchersSet, XMLWalker.MatchesPat, XMLWalker.PatContains}
XMLWalker.ContainsAnyPat
— Type
matcher = ContainsAnyPat(pat, field_name::Union{Symbol,Nothing}=nothing)
matcher(input) returns true if input.field_name contains any element of pat (pat can be an iterable collection). If field_name is nothing, input itself is matched.
Other matchers: Union{XMLWalker.AnyPat, XMLWalker.ContainsAllPats, XMLWalker.ContainsAnyPat, XMLWalker.ContainsPat, XMLWalker.HasAllKeysPat, XMLWalker.HasAnyKeyPat, XMLWalker.MatchersSet, XMLWalker.MatchesPat, XMLWalker.PatContains}
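The difference between the "contains all" and "contains any" checks, and the fallback to the input itself when field_name is nothing, can be sketched in plain Julia. The names TagListNode, target, contains_all and contains_any are hypothetical stand-ins chosen for this illustration; they are not part of XMLWalker and do not reproduce its implementation.

```julia
# Hypothetical stand-ins for illustration; not XMLWalker's implementation.
struct TagListNode
    tag::Vector{String}
end

# Pick the object to test: the named field, or the input itself
# when field_name is nothing.
target(input, field_name::Union{Symbol,Nothing}) =
    field_name === nothing ? input : getproperty(input, field_name)

# true if every element of pat is found in the tested object
contains_all(pat, input, field_name=nothing) =
    all(p -> p in target(input, field_name), pat)

# true if at least one element of pat is found in the tested object
contains_any(pat, input, field_name=nothing) =
    any(p -> p in target(input, field_name), pat)

node = TagListNode(["a", "b", "c"])
contains_all(["a", "b"], node, :tag)   # true
contains_any(["x", "a"], node, :tag)   # true
contains_all(["a", "x"], ["a", "b"])   # false; the vector itself is tested
```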
XMLWalker.ContainsPat
— Type
matcher = ContainsPat(pat, field_name::Union{Symbol,Nothing}=nothing)
matcher(input) returns true if input.field_name contains the pattern pat. If field_name is nothing, input itself is matched.
Other matchers: Union{XMLWalker.AnyPat, XMLWalker.ContainsAllPats, XMLWalker.ContainsAnyPat, XMLWalker.ContainsPat, XMLWalker.HasAllKeysPat, XMLWalker.HasAnyKeyPat, XMLWalker.MatchersSet, XMLWalker.MatchesPat, XMLWalker.PatContains}
XMLWalker.HasAllKeysPat
— Type
matcher = HasAllKeysPat(pat, field_name::Union{Symbol,Nothing}=nothing)
matcher(input) returns true if input.field_name has all elements of pat as keys. If field_name is nothing, input itself is matched.
Other matchers: Union{XMLWalker.AnyPat, XMLWalker.ContainsAllPats, XMLWalker.ContainsAnyPat, XMLWalker.ContainsPat, XMLWalker.HasAllKeysPat, XMLWalker.HasAnyKeyPat, XMLWalker.MatchersSet, XMLWalker.MatchesPat, XMLWalker.PatContains}
XMLWalker.HasAnyKeyPat
— Type
matcher = HasAnyKeyPat(pat, field_name::Union{Symbol,Nothing}=nothing)
matcher(input) returns true if input.field_name has at least one element of pat as a key. If field_name is nothing, input itself is matched.
Other matchers: Union{XMLWalker.AnyPat, XMLWalker.ContainsAllPats, XMLWalker.ContainsAnyPat, XMLWalker.ContainsPat, XMLWalker.HasAllKeysPat, XMLWalker.HasAnyKeyPat, XMLWalker.MatchersSet, XMLWalker.MatchesPat, XMLWalker.PatContains}
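The two key checks described above come down to haskey tests on a dictionary. The following is a minimal sketch under that assumption; has_all_keys and has_any_key are hypothetical helper names, not XMLWalker functions.

```julia
# Hypothetical helpers for illustration; not XMLWalker's implementation.
has_all_keys(pat, dict::AbstractDict) = all(k -> haskey(dict, k), pat)
has_any_key(pat, dict::AbstractDict) = any(k -> haskey(dict, k), pat)

attributes = Dict("id" => "n1", "unit" => "mm")
has_all_keys(["id", "unit"], attributes)   # true
has_all_keys(["id", "name"], attributes)   # false, "name" is missing
has_any_key(["id", "name"], attributes)    # true, "id" is present
```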
XMLWalker.MatchersSet
— Type
matchers_set = MatchersSet(matchers_collection::P, type::Symbol)
matchers_set_empty = MatchersSet(type::Symbol)
Combines several matchers from matchers_collection; type can be :all or :any. matchers_set is a callable object: it applies the matchers from the collection and returns true if all of them (type = :all) or any of them (type = :any) return true. Because MatchersSet is a subtype of AbstractMatcher, it can be passed to the find_nodes(starting_node, ::AbstractMatcher) function to find nodes matching the matchers set. Calling MatchersSet with only the type argument creates an empty set, which can be filled with matchers using the Base.push! function.
MatchersSet can be used to construct more complicated matchers than those provided by parsing strings. For instance, it can be used to make matchers for multiple properties:
matchers_intersection = MatchersSet((MatchesPat("A", :tag), MatchesPat("B", :variable), MatchesPat("C", :attributes)), :all)
find_nodes(starting_node, matchers_intersection) # finds nodes whose tag, variable and attributes fields match "A", "B" and "C" simultaneously
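The idea of a callable set that combines predicates in :all or :any mode, and that can start empty and be filled with push!, can be illustrated without the package. PredicateSet below is a hypothetical type written for this sketch; it mirrors the behaviour described above but is not XMLWalker's MatchersSet.

```julia
# Hypothetical sketch of an :all / :any combinator; not XMLWalker's code.
struct PredicateSet
    predicates::Vector{Function}
    mode::Symbol   # :all or :any
    function PredicateSet(predicates, mode::Symbol)
        mode in (:all, :any) || throw(ArgumentError("mode must be :all or :any"))
        return new(collect(Function, predicates), mode)
    end
end

# Empty set, to be filled later with push!
PredicateSet(mode::Symbol) = PredicateSet(Function[], mode)

Base.push!(s::PredicateSet, f) = (push!(s.predicates, f); s)

# Make the set callable, like a single predicate.
function (s::PredicateSet)(input)
    if s.mode === :all
        return all(f -> f(input), s.predicates)
    else
        return any(f -> f(input), s.predicates)
    end
end

is_positive_even = PredicateSet((x -> x > 0, iseven), :all)
is_positive_even(4)   # true
is_positive_even(3)   # false

zero_or_odd = PredicateSet(:any)
push!(zero_or_odd, iszero)
push!(zero_or_odd, isodd)
zero_or_odd(0)        # true
zero_or_odd(2)        # false
```

In this sketch the vector field stays mutable even though the struct is immutable, which is what lets push! extend an existing set.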
XMLWalker.MatchesPat
— Type
matcher = MatchesPat(pat, field_name::Union{Symbol,Nothing}=nothing)
matcher(input) returns true if pat matches the field_name field of the input object. If field_name is nothing, input itself is matched.
Other matchers: Union{XMLWalker.AnyPat, XMLWalker.ContainsAllPats, XMLWalker.ContainsAnyPat, XMLWalker.ContainsPat, XMLWalker.HasAllKeysPat, XMLWalker.HasAnyKeyPat, XMLWalker.MatchersSet, XMLWalker.MatchesPat, XMLWalker.PatContains}
XMLWalker.PatContains
— Type
matcher = PatContains(pat, field_name::Union{Symbol,Nothing}=nothing)
matcher(input) returns true if pat contains input.field_name. If field_name is nothing, input itself is matched.
Other matchers: Union{XMLWalker.AnyPat, XMLWalker.ContainsAllPats, XMLWalker.ContainsAnyPat, XMLWalker.ContainsPat, XMLWalker.HasAllKeysPat, XMLWalker.HasAnyKeyPat, XMLWalker.MatchersSet, XMLWalker.MatchesPat, XMLWalker.PatContains}
XMLWalker.chain_string_specification
— Type
Chain strings specification
Simple chain by tags
A single string input (without the token separator symbol "/") to the XMLWalker.find_nodes function, such as "node1_tag", represents a single token. By default, this token is matched against the tag field of the node, so XMLWalker.find_nodes(starting_node, "node1_tag") finds all nodes (if any) matching the pattern specified by that string. Additionally, the field name can be provided as an optional third input argument.
XMLWalker.find_nodes(starting_node, "ABB", :attributes) # returns nodes whose `attributes` field has the value "ABB"
The symbol "/" is used as a token separator, indicating that all substrings separated by this symbol should be interpreted as a sequence of matchings. Each token in the string corresponds to a node in the XML tree. The function XMLWalker.find_nodes uses these chain-strings to create matchers that are checked sequentially, searching for nodes that fit the entire sequence. For the chain-string "A/B/C", XMLWalker.find_nodes returns nodes reachable by following the path with tags "A" → "B" → "C".
find_nodes(starting_node, "A/B/C") # returns a vector of nodes that can be reached by following the "A"->"B"->"C" chain
# (starting_node must have a field :tag = "A")
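The sequential matching described above can be sketched as a small recursive walk. This is an illustration only: ChainNode, its children field and follow_chain are hypothetical names, the sketch compares tags by plain equality, and how XMLWalker actually reaches child nodes is not stated in this document.

```julia
# Hypothetical tree type and walker; not XMLWalker's implementation.
struct ChainNode
    tag::String
    children::Vector{ChainNode}
end
ChainNode(tag) = ChainNode(tag, ChainNode[])

# Collect the nodes reached by matching tags[state], tags[state+1], ...
# level by level, starting at `node`.
function follow_chain(node::ChainNode, tags::AbstractVector{<:AbstractString}, state::Int=1)
    found = ChainNode[]
    node.tag == tags[state] || return found   # this branch does not fit the chain
    if state == length(tags)
        push!(found, node)                    # the whole chain has been matched
    else
        for child in node.children
            append!(found, follow_chain(child, tags, state + 1))
        end
    end
    return found
end

tree = ChainNode("A", [
    ChainNode("B", [ChainNode("C"), ChainNode("D")]),
    ChainNode("X", [ChainNode("C")]),
])

result = follow_chain(tree, split("A/B/C", "/"))
length(result)   # 1, only the "C" under "B" is reachable via A -> B -> C
```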
Single token specification
The following sections describe the syntax of a single token. All of it also applies to chain strings, because a chain string is a sequence of tokens.
Additional fields values check
When a tag name is followed by a dot, as in "tag_name.field_name = field_value", the field_name is interpreted as the name of the node's field, and the value after the = (i.e., field_value) is treated as the value of that field. For example, "A.value = ABC" represents a node with the tag field equal to "A" and the value field equal to "ABC". By default, the value is parsed as a string, but if it can be interpreted as a number, such as in "tag_name.field_name = 25.4", it means the field_name of the node tagged tag_name has the value 25.4 (Float64). Additionally, the annotation ::text can be added to force the value to be parsed as a string, so "tag_name.field_name = 25.4::text" searches for the field_name with the string value "25.4" (String).
find_nodes(starting_node, "B.value = 356::text") # searches for nodes with tag "B" whose value field equals the string "356"
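The number-versus-string rule above can be illustrated with Base.tryparse. The function parse_token_value is a hypothetical helper written for this sketch; XMLWalker's own parsing may treat edge cases differently (for example, tryparse also accepts strings such as "Inf").

```julia
# Hypothetical helper illustrating the value rule; not XMLWalker's parser.
function parse_token_value(raw::AbstractString)
    s = strip(raw)
    suffix = "::text"
    if endswith(s, suffix)
        # Forced string: drop the annotation and keep the text as is.
        return String(strip(first(s, length(s) - length(suffix))))
    end
    number = tryparse(Float64, s)
    return number === nothing ? String(s) : number
end

parse_token_value("25.4")         # 25.4 (Float64)
parse_token_value("25.4::text")   # "25.4" (String)
parse_token_value("ABC")          # "ABC" (String)
```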
Nodes with dictionary fields
Node fields can also be of the AbstractDict type, allowing for searches of specific keys or key-value pairs. The syntax for this is "tag_name.field_name([key1,key2,key3])", where the field name is followed by "(....)". This pattern matches nodes where the field_name dictionary of a tag_name node contains any of the keys "key1", "key2", or "key3". All keys must be enclosed in either "[]" or "{}": "[]" represents any of the keys, and "{}" means all keys must be present. For example, "A.attributes({p1,p2})" refers to a node with the tag "A" having the attributes field that contains both "p1" and "p2" keys.
It is also possible to search for nodes with specific key-value pairs within the field-dictionary. The syntax is similar to the key search pattern, but each key is followed by an equals sign (=) and a value. For instance, "tag_name.field_name({key1=value1,key2=value2})" matches a node with tag "tag_name" whose field_name dictionary contains both "key1"=>"value1" and "key2"=>"value2".
find_nodes(starting_node, "B.value([p1,p2])") # searches for nodes with tag "B" whose value field has any of the keys p1 or p2
find_nodes(starting_node, "B.value([p1=2,p2=20])") # searches for nodes with tag "B" whose value field has any of the key-value pairs "p1" => 2.0 or "p2" => 20.0
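The "any" ("[]") and "all" ("{}") semantics for key-value pairs can be sketched with ordinary dictionary lookups. The helpers has_pair, has_any_pair and has_all_pairs are hypothetical and only illustrate the rule; they are not XMLWalker functions.

```julia
# Hypothetical helpers; not XMLWalker's implementation.
has_pair(dict::AbstractDict, pair::Pair) =
    haskey(dict, pair.first) && dict[pair.first] == pair.second

has_any_pair(dict, pairs) = any(p -> has_pair(dict, p), pairs)   # like [k1=v1,k2=v2]
has_all_pairs(dict, pairs) = all(p -> has_pair(dict, p), pairs)  # like {k1=v1,k2=v2}

value = Dict("p1" => 2.0, "p2" => 5.0)
has_any_pair(value, ["p1" => 2.0, "p2" => 20.0])    # true, "p1" => 2.0 is present
has_all_pairs(value, ["p1" => 2.0, "p2" => 20.0])   # false, "p2" maps to 5.0
```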
Special Symbols
All special symbols in this section apply to tags, field values, and field dictionary keys.
"[...]" (any) and "{...}" (all) Patterns
To find nodes within a specific set of tags, for example "tg1", "tg2", or "tg3", these tag names must be enclosed in square brackets "[]". This pattern returns true if any of the enclosed tags match. For example, "[tg1,tg2,tg3]" searches for nodes with any of the tags "tg1", "tg2", or "tg3".
Similarly, this pattern can be used to match field values, such as "A.field_name = [a,b]", or to match field dictionary keys like "A.field_name([k1,k2])".
find_nodes(starting_node, "[A,B].field1 = [ab,bc]") # finds nodes with tag "A" or "B" and a field1 value of "ab" or "bc"
If all patterns need to be matched simultaneously, the {} block can be used. This block is especially useful for specifying field values. For example, "[A,B].field_name({a = 1, b = 2})" will match nodes tagged as "A" or "B" that contain both key-value pairs "a" => 1.0 and "b" => 2.0 in their field_name field.
* (always match), *... (contains) and ...::regex (regular expression) patterns
To skip a pattern node in a search tree, the * symbol, which represents an always-match wildcard, can be used. For example, "*.field_name = c" matches nodes with any tag and a field_name value of "c". Hence, the "*.tag = A" token is equivalent to just "A".
When the * symbol appears anywhere within a string, it indicates a partial match (i.e., the pattern is contained within the string). For instance, "*Prop" matches nodes with tags containing "Prop", such as "BulkProp" or "PropertyOne". This rule also applies to field values and dictionary keys. However, for key-value pairs matching, partial match patterns using * are not supported. If any key in the key-value pairs contains *, all values in those pairs are ignored. For example, "A.field_name([*abb=1,b=2])" behaves the same as "A.field_name([*abb,b])". Neither field names nor values in key-value pairs can contain the * symbol; thus, "node_tag.*partial_name = c" and "A.field_name([a=*b , b=2])" are not supported, but "node_tag.field_name = *ca" is allowed.
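The partial-match rule can be illustrated with a substring test. wildcard_match below is a hypothetical helper that assumes the * is simply removed and the remainder is searched for in the string; XMLWalker's own conversion (which, according to this reference, goes through regular expressions) may differ in detail.

```julia
# Hypothetical helper; XMLWalker's own conversion may differ.
function wildcard_match(pattern::AbstractString, s::AbstractString)
    if occursin("*", pattern)
        # Partial match: the pattern without "*" must be contained in s.
        return occursin(replace(pattern, "*" => ""), s)
    end
    return pattern == s   # no wildcard: exact match
end

wildcard_match("*Prop", "BulkProp")      # true
wildcard_match("*Prop", "PropertyOne")   # true
wildcard_match("Prop", "BulkProp")       # false, exact match required
wildcard_match("*", "anything")          # true, a bare * always matches
```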
If a field value or key is marked with "::regex", it is interpreted as a regular expression. The main rule is that a pattern marked this way matches if Julia's match(::Regex, str) returns something other than nothing. To use a regular expression for a tag search, it must be enclosed in {} or []. For instance, the matcher string token for nodes that contain digits in their tag field and have the key id in their attributes field dictionary is "[ [\d]::regex ].attributes([id])".
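The matching rule for regular expressions stated above corresponds directly to a check on the result of Base.match. regex_matches is a hypothetical name used only in this sketch.

```julia
# The rule stated above: a pattern matches when match(...) is not nothing.
# regex_matches is a hypothetical helper, not an XMLWalker function.
regex_matches(pattern::Regex, s::AbstractString) = match(pattern, s) !== nothing

has_digit = r"[\d]"
regex_matches(has_digit, "node12")   # true
regex_matches(has_digit, "node")     # false
```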
XMLWalker.chain_string_to_matchers
— Function
chain_string_to_matchers(s::AbstractString, root_field_name::SymbolOrNothing=:tag)
Converts a chain string with multiple tokens to a vector of matchers according to chain_string_specification.
XMLWalker.chain_string_token_to_matcher
— Function
chain_string_token_to_matcher(s::String, field_name::SymbolOrNothing=DEFAULT_ROOT_TAG)
Converts a single token to a matcher object. For the string specification see chain_string_specification.
XMLWalker.convert_args_vector_to_regex_vector
— Method
convert_args_vector_to_regex_vector(input::AbstractVector)
Converts the vector of args to regular expressions, removing * and ::regex if present.
XMLWalker.convert_braced_arguments_to_matcher
— Method
convert_braced_arguments_to_matcher(args::AbstractString, field_name::SymbolOrNothing; as_keys::Bool=false)
Input variables:
args - string to be converted, e.g. "{a,b,c}"
field_name - field name of the matcher
as_keys - if true, all args are interpreted as keys of key-value pairs of the field_name dictionary
Braced arguments like {a,b,c} or [c,d,f] are converted to a matcher with the field name field_name. The as_keys flag means that the args are interpreted as elements contained in the field_name dictionary; otherwise they are interpreted as values that the field_name field itself should match.
struct A; tag; end
matcher = convert_braced_arguments_to_matcher("[a,b,c]", :tag, as_keys=false)
matcher(A("a")) # true
matcher(A(["a","b"])) # true
matcher = convert_braced_arguments_to_matcher("[a,b,c]", :tag, as_keys=true)
matcher(A(Dict("a"=>1))) # true
matcher = convert_braced_arguments_to_matcher("[a=1,a=3]", :tag, as_keys=true)
matcher(A(Dict("a"=>2))) # false because the key-value pair matches neither `a=>1.0` nor `a=>3.0`
If any of the enclosed patterns contains ::regex or the * symbol, all of them are converted to regular expressions; if they are key-value pairs, the values are ignored.
matcher = convert_braced_arguments_to_matcher("[*a=1]", :tag, as_keys=true)
The returned matcher can be used to find nodes; see find_nodes.
XMLWalker.extract_embraced_args_square_or_curl
— Method
extract_embraced_args_square_or_curl(s; convert_to_regex::Bool=false)
Determines the bracket type ("{...}" or "[...]") from the string itself and returns the vector of parsed values.
XMLWalker.extract_field
— Method
extract_field(s)
Returns the right part of the string split at the non-digit dot (.).
XMLWalker.field_string_to_matcher
— Method
field_string_to_matcher(s::AbstractString)
Converts a field string to a single <:AbstractMatcher object. The field string can be "field_name = args1" or "field_name(args2)".
In the first case, args1 can be a single value (a number or a string) or a braced list such as [a,b,c] or {a,b,c}, where the {} or [] braces mean that all or any of the values "a", "b" and "c" should be in the field_name field. All args1 elements must be separated by commas.
In the second case, "field_name(args2)", the content of (...) is interpreted as arguments that are members of the AbstractDict stored in field_name; args2 must be enclosed in {} or []. E.g. "field_name([a,b,c])" means that any of the keys "a", "b" and "c" must be among the keys of the field_name dictionary. When args2 also contains =, e.g. args2 = [a=1,b=2,c=3], it specifies key-value pairs: field_name({a=1,b=2,c=3}) means that the field_name dictionary must contain the following key-value pairs: "a"=>1, "b"=>2, "c"=>3.
XMLWalker.find_nodes
— Function
find_nodes(starting_node, search_string::AbstractString, field_name::SymbolOrNothing = DEFAULT_ROOT_TAG)
First parses search_string, then finds nodes. For the string specification see chain_string_specification. field_name is the root field name; by default it is the tag field. The root field name is the field in which the first part of a string token is searched, e.g.
find_nodes(starting_node, "A.values = 5") # "A" will be recursively searched in `starting_node.tag` and in all children nodes of `starting_node` etc.
find_nodes(starting_node, "A.values = 5", :Name) # "A" will be recursively searched in `starting_node.Name` and in all children nodes of `starting_node` etc.
XMLWalker.find_nodes!
— Method
find_nodes!(node_vector::Vector{T}, node::T, matcher::AbstractMatcher) where T
Finds matching nodes and pushes them to node_vector.
XMLWalker.find_nodes
— Method
find_nodes(starting_node::T, matcher::AbstractMatcher) where T
Finds nodes that match the matcher object.
XMLWalker.find_nodes_chain!
— Method
find_nodes_chain!(node_vector::Vector{T}, node::T, tag_chain::ChainMatcher, state::Int=1, first_node::Bool=true) where T
XMLWalker.find_nodes_chain
— Method
find_nodes_chain(starting_node::T, xml_chain_string::String, field_name::SymbolOrNothing = DEFAULT_ROOT_TAG) where T
Finds the nodes that match a chain. For details of xml_chain_string see chain_string_specification.
XMLWalker.has_digit_dot
— Method
has_digit_dot(s::AbstractString)
Checks whether the string contains a dot (.) that is a decimal separator between digits.
XMLWalker.has_nondigit_dot
— Method
has_nondigit_dot(s::AbstractString)
Checks whether the string contains a dot (.) that is not a decimal separator between digits.
XMLWalker.single_string_or_number_to_matcher
— Function
single_string_or_number_to_matcher(value::AbstractString, field_name::SymbolOrNothing=:tag)
Converts a single string to a matcher; this is a root function.
XMLWalker.split_equality
— Method
split_equality(s::AbstractString)
Splits the string at the first equality symbol (=).
XMLWalker.split_tag_and_field_name
— Method
split_tag_and_field_name(s)
Splits the string into two parts at the non-digit dot (.).
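The distinction between a decimal dot and a non-digit dot, used by the helpers above, can be sketched with regular expressions. The sketch assumes that a dot counts as a decimal separator only when it has a digit on both sides; the exact rule used by XMLWalker is not stated in this reference, and the names digit_dot_present and split_on_nondigit_dot are hypothetical.

```julia
# Hypothetical sketch; assumes a "digit dot" has a digit on both sides.
const DIGIT_DOT = r"\d\.\d"
const NONDIGIT_DOT = r"(?<!\d)\.|\.(?!\d)"

digit_dot_present(s::AbstractString) = occursin(DIGIT_DOT, s)

# Split at the first dot that is not a decimal separator.
function split_on_nondigit_dot(s::AbstractString)
    m = match(NONDIGIT_DOT, s)
    m === nothing && return (String(s), "")
    i = m.offset
    return (String(s[1:prevind(s, i)]), String(s[nextind(s, i):end]))
end

split_on_nondigit_dot("A.value = 25.4")   # ("A", "value = 25.4")
split_on_nondigit_dot("25.4")             # ("25.4", "")
digit_dot_present("A.value = 25.4")       # true
digit_dot_present("A.value")              # false
```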