
Commit 5300ccf

korydraughn authored and alanking committed
[#182] Add support for GenQuery2 to genquery.py module.
1 parent 208ecb2 commit 5300ccf

2 files changed

Lines changed: 175 additions & 25 deletions


README.md

Lines changed: 50 additions & 0 deletions
@@ -421,6 +421,17 @@ The following arguments are strictly optional and may be used either in `Query`'
 * *limit*: `None` by default (ie "no limit"), this option can be an integer >= 1 if specified, and limits the returned row enumeration to the requested number of results. Often used with *offset*, as defined above.
 * *case-sensitive* is normally `True`. If it is set to `False`, the condition string will be uppercased, and the query will be executed without regard to case. This allows for more permissive matching on the names of resources, collections, data objects, etc.
 * *options* is a bitmask of extra options, and defaults to a value of 0. `genquery.Option.NO_DISTINCT` is one such extra option, as is RETURN_TOTAL_ROW_COUNT (although in the latter case, using the `Query` object's `row_count` method should be preferred.)
+* *parser* defines which GenQuery parser to use for querying the catalog. The following values are supported:
+    - `Parser.GENQUERY1` configures the `Query` object to use GenQuery1 (i.e. the traditional parser). This is the default.
+    - `Parser.GENQUERY2` configures the `Query` object to use GenQuery2. GenQuery2 is an experimental parser with several enhancements over GenQuery1.
+    - When using the GenQuery2 parser, the following applies:
+        - The `output` constructor parameter is ignored.
+        - The `case_sensitive` constructor parameter is ignored.
+        - The `options` constructor parameter is ignored.
+        - The `total_rows` member function of the `Query` class always returns `None`.
+* *order_by* is a string holding the sorting instructions for a GenQuery2 query. The string must be a comma-delimited list of GenQuery2 columns (or expressions). For example, `order_by='COLL_NAME, DATA_NAME'`. For more examples, see [How do I sort data using the Query class and GenQuery2?](#how-do-i-sort-data-using-the-query-class-and-genquery2). Defaults to an empty string. Only recognized by the GenQuery2 parser.
+
+When the processing of a GenQuery2 resultset is complete, it is best practice to call the `close()` member function. Doing this will instruct the server to immediately free any resources allocated to the `Query` object. This is extremely important when multiple `Query` objects are executed within a single rule.

 ## Questions and Answers

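Because `total_rows()` always returns `None` under GenQuery2, a caller paging with *offset* and *limit* cannot compute the number of pages up front. A minimal sketch of one workaround, assuming a hypothetical `fetch_page(offset, limit)` callable that stands in for iterating `Query(callback, ..., offset=offset, limit=limit, parser=Parser.GENQUERY2)` and collecting its rows: keep requesting pages until one comes back shorter than the page size.

```python
from typing import Callable, Iterator, List

Row = List[str]


def paginate(fetch_page: Callable[[int, int], List[Row]], page_size: int) -> Iterator[Row]:
    """Yield every row by requesting fixed-size pages until a short page arrives.

    `fetch_page(offset, limit)` is a hypothetical stand-in for running a
    GenQuery2-backed Query with the given offset and limit.
    """
    if page_size < 1:
        raise ValueError('page_size must be >= 1')
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            # A short (or empty) page means the resultset is exhausted.
            return
        offset += page_size


# Usage with an in-memory list standing in for the catalog.
catalog = [[f'/tempZone/home/rods/file{i}'] for i in range(7)]
rows = list(paginate(lambda off, lim: catalog[off:off + lim], page_size=3))
print(len(rows))  # 7
```

SQL does not guarantee a stable row order without an ORDER BY clause, so pairing this pattern with *order_by* helps keep consecutive pages from overlapping or skipping rows.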
@@ -502,3 +513,42 @@ Upon invoking `nrep_rule`, a message very similar to the one that follows will a
     "server_zone": "tempZone"
 }
 ```
+
+## How do I escape embedded single quotes using the Query class and GenQuery2?
+
+GenQuery2 provides two ways to escape embedded single quotes. They are as follows:
+- Use a single quote to escape a single quote as defined by the SQL standard
+- Or, use the hexadecimal encoding
+
+Below, we demonstrate each escape mechanism on the data object name, `'_quotes_around_me_'`.
+```python
+from genquery import Query, Parser
+row = Query(callback, 'DATA_NAME', "DATA_NAME = '''_quotes_around_me_\\x27'", parser=Parser.GENQUERY2).first()
+callback.writeLine('serverLog', f'row = {row}')
+```
+
+Notice the use of `\\x` rather than `\x` for the second single quote. This is required to keep Python from interpreting the hexadecimal value.
+
+## How do I sort data using the Query class and GenQuery2?
+
+Use the `order_by` parameter. Two examples are provided below.
+
+Assume we have five collections and we want to sort them by the length of their logical path, but in descending order.
+
+```python
+from genquery import Query, Parser
+for row in Query(callback, 'COLL_NAME, length(COLL_NAME)', order_by='length(COLL_NAME) desc', parser=Parser.GENQUERY2):
+    callback.writeLine('serverLog', f'row = {row}')
+```
+
+And here's a second example demonstrating how to sort using multiple columns.
+
+```python
+from genquery import Query, Parser
+for row in Query(callback, 'COLL_NAME, DATA_NAME', order_by='COLL_NAME desc, DATA_NAME', parser=Parser.GENQUERY2):
+    callback.writeLine('serverLog', f'row = {row}')
+```
+
+The string passed to the `order_by` parameter instructs the database to do the following:
+- First, sort by collection name in descending order
+- Second, sort the data names in each collection in ascending order
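The SQL-standard mechanism described above boils down to doubling every embedded single quote and wrapping the result in single quotes. A minimal sketch, using a hypothetical helper named `gq2_string_literal` (it is not part of `genquery.py`), that builds a condition equivalent to the one in the example while using only the doubling form:

```python
def gq2_string_literal(value: str) -> str:
    """Return `value` as a single-quoted literal, doubling embedded single quotes (SQL standard)."""
    return "'" + value.replace("'", "''") + "'"


name = "'_quotes_around_me_'"
condition = f'DATA_NAME = {gq2_string_literal(name)}'
print(condition)  # DATA_NAME = '''_quotes_around_me_'''
```

The resulting string could then be passed as the *conditions* argument, e.g. `Query(callback, 'DATA_NAME', condition, parser=Parser.GENQUERY2)`.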

genquery.py

Lines changed: 125 additions & 25 deletions
@@ -1,6 +1,8 @@
 import itertools
 import re
 from collections import OrderedDict
+from enum import Enum
+from irods_errors import END_OF_RESULTSET

 def AUTO_CLOSE_QUERIES(): return True

@@ -15,6 +17,9 @@ def AUTO_CLOSE_QUERIES(): return True

 MAX_SQL_ROWS = 256

+# An optimization for the GenQuery2 implementation.
+_END_OF_RESULTSET_ERROR_STRING_PART = f':{END_OF_RESULTSET}]'
+
 class Option(object):
     """iRODS QueryInp option flags - used internally.

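The precomputed `_END_OF_RESULTSET_ERROR_STRING_PART` lets the GenQuery2 iterator tell the normal end of a resultset apart from real failures with a substring check on the `RuntimeError` raised by `msi_genquery2_next_row`. The sketch below isolates that check. It assumes `END_OF_RESULTSET` is -408000 (real code imports it from `irods_errors`) and uses made-up error messages whose only relevant property is that they contain `:<code>]`; the server's actual message format may differ.

```python
# Assumed value of irods_errors.END_OF_RESULTSET; real code imports it instead.
END_OF_RESULTSET = -408000

# Precomputed once, as in genquery.py. The surrounding ':' and ']' make the match
# stricter than searching for the bare number.
_END_OF_RESULTSET_ERROR_STRING_PART = f':{END_OF_RESULTSET}]'


def is_end_of_resultset(error: Exception) -> bool:
    """Return True if `error` signals the end of a GenQuery2 resultset."""
    return _END_OF_RESULTSET_ERROR_STRING_PART in str(error)


# Hypothetical messages; only the ':<code>]' fragment matters to the check.
done = RuntimeError('msi_genquery2_next_row failed: [error_code:-408000]')
failure = RuntimeError('msi_genquery2_next_row failed: [error_code:-808000]')
print(is_end_of_resultset(done), is_end_of_resultset(failure))  # True False
```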
@@ -36,6 +41,11 @@ class AS_DICT (row_return_type): pass
 class AS_LIST (row_return_type): pass
 class AS_TUPLE (row_return_type): pass

+class Parser(Enum):
+    """Available GenQuery parsers."""
+    GENQUERY1 = 1
+    GENQUERY2 = 2 # Experimental.
+
 class GenQuery_Options_Spec_Error(RuntimeError): pass
 class GenQuery_Columns_Type_Error(RuntimeError): pass
 class GenQuery_Row_Return_Type_Error(RuntimeError): pass
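The constructor validates `parser` with a membership test against the two `Parser` members. Because `Parser` derives from plain `Enum` rather than `IntEnum`, its members do not compare equal to their integer values, so passing `2` is rejected even though `Parser.GENQUERY2.value == 2`. The sketch reproduces the enum and the check in isolation; `validate_parser` and `coerce_parser` are hypothetical helper names, and the latter shows how an integer could be converted explicitly with `Parser(value)`.

```python
from enum import Enum


class Parser(Enum):
    """Available GenQuery parsers (copied from the diff)."""
    GENQUERY1 = 1
    GENQUERY2 = 2  # Experimental.


def validate_parser(parser) -> None:
    # Same check as Query.__init__: only Parser members are accepted.
    if parser not in [Parser.GENQUERY1, Parser.GENQUERY2]:
        raise ValueError('Invalid value for [parser]. Expected Parser.GENQUERY1 or Parser.GENQUERY2.')


def coerce_parser(value) -> Parser:
    # Hypothetical helper: Parser(2) looks a member up by value,
    # and Parser(Parser.GENQUERY2) simply returns the member.
    return Parser(value)


validate_parser(Parser.GENQUERY2)   # Accepted.
try:
    validate_parser(2)              # Rejected: plain Enum members never equal ints.
except ValueError as e:
    print(e)
validate_parser(coerce_parser(2))   # Accepted after explicit conversion.
```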
@@ -52,6 +62,19 @@ class Query(object):
     :param limit: (optional) maximum amount of results, can be used for pagination
     :param case_sensitive: (optional) set this to False to make the entire where-clause case insensitive
     :param options: (optional) other OR-ed options to pass to the query (see the Option type above)
+    :param parser: (optional) the GenQuery engine to use. Defaults to Parser.GENQUERY1
+    :param order_by: (optional) order-by clause, as a string. Defaults to an empty string. Only recognized by the GenQuery2 parser
+
+    GenQuery2 parser:
+
+        This is an experimental parser and may change in the future.
+
+        When used, the following applies:
+        - The "output", "case_sensitive", and "options" constructor parameters are ignored
+        - The "total_rows()" member function always returns None
+
+        Some features of GenQuery2 cannot be expressed via this interface (e.g. GROUP BY). If
+        those features are needed, use the GenQuery2 microservices directly.

     Getting the total row count:

@@ -91,7 +114,7 @@ class Query(object):
             print('name: {}/{} - owned by {}'.format(*x))
     """

-    __parameter_names = tuple('columns,conditions,output,offset,limit,case_sensitive,options'.split(','))
+    __parameter_names = tuple('columns,conditions,output,offset,limit,case_sensitive,options,parser,order_by'.split(','))
     __non_whitespace = re.compile('\S+')

     def __init__(self,
@@ -102,7 +125,9 @@ def __init__(self,
                  offset=0,
                  limit=None,
                  case_sensitive=True,
-                 options=0):
+                 options=0,
+                 parser=Parser.GENQUERY1,
+                 order_by=''):

         self.callback = callback

@@ -116,6 +141,9 @@ def __init__(self,
         if not isinstance (columns, list):
             raise GenQuery_Columns_Type_Error("'columns' could not be coerced to list type")

+        if parser not in [Parser.GENQUERY1, Parser.GENQUERY2]:
+            raise ValueError('Invalid value for [parser]. Expected Parser.GENQUERY1 or Parser.GENQUERY2.')
+
         # Options as specified
         self.columns = columns # - via 2nd argument to ctor; or copy() 'columns' keyword option
         self.conditions = conditions
@@ -124,6 +152,9 @@ def __init__(self,
         self.limit = limit
         self.case_sensitive = case_sensitive
         self.options = options
+        self.parser = parser
+        self.gq2_handle = None
+        self.order_by = order_by

         # The conditions string used in query (possibly uppercased). Appears in SQL-ish str(self) but not repr(self)
         self.conditions_for_exec = conditions
@@ -163,6 +194,24 @@ def copy(self,**options):

     def exec_if_not_yet_execed(self):
         """Query execution is delayed until the first result or total row count is requested."""
+        if self.parser == Parser.GENQUERY2:
+            # The presence of a GenQuery2 handle indicates the query has already been executed.
+            # Therefore, the results are already available for processing. If the query must be
+            # executed again, a new Query object must be used.
+            if self.gq2_handle is not None:
+                return
+
+            query_string = f'select {", ".join(self.columns)}'
+
+            if self.conditions: query_string += f' where {self.conditions}'
+            if self.order_by: query_string += f' order by {self.order_by}'
+            if self.offset > 0: query_string += f' offset {self.offset}'
+            if self.limit is not None: query_string += f' limit {self.limit}'
+
+            ret = self.callback.msi_genquery2_execute('', query_string)
+            self.gq2_handle = ret['arguments'][0]
+            return
+
         if self.gqi is not None:
             return
         if self.options & Option.UPPER_CASE_WHERE:
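For GenQuery2, `exec_if_not_yet_execed` concatenates the constructor arguments into a single query string before calling `msi_genquery2_execute`. The sketch lifts that logic into a standalone function, named `build_genquery2_string` for illustration only, so the clause order (`where`, `order by`, `offset`, `limit`) and the omission rules (no `offset` clause when it is 0, no `limit` clause when it is `None`) can be checked without a server.

```python
from typing import List, Optional


def build_genquery2_string(columns: List[str], conditions: str = '', order_by: str = '',
                           offset: int = 0, limit: Optional[int] = None) -> str:
    """Assemble a GenQuery2 string the same way exec_if_not_yet_execed does."""
    query_string = f'select {", ".join(columns)}'
    if conditions:
        query_string += f' where {conditions}'
    if order_by:
        query_string += f' order by {order_by}'
    if offset > 0:
        query_string += f' offset {offset}'
    if limit is not None:
        query_string += f' limit {limit}'
    return query_string


print(build_genquery2_string(['COLL_NAME', 'DATA_NAME'],
                             conditions="COLL_NAME like '/tempZone/home/%'",
                             order_by='COLL_NAME desc, DATA_NAME',
                             offset=20, limit=10))
# select COLL_NAME, DATA_NAME where COLL_NAME like '/tempZone/home/%' order by COLL_NAME desc, DATA_NAME offset 20 limit 10
```

Note that the updated `__str__` emits `limit` before `offset` and appends a `[parser=...]` tag, so its text is a description of the query rather than the exact string sent to the server.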
@@ -197,30 +246,59 @@ def total_rows(self):
         """Returns the total amount of rows matching the query.

         This includes rows that are omitted from the result due to limit/offset parameters.
+
+        GenQuery2 does not automatically count the total number of rows in a resultset. Users
+        are expected to execute another query to determine that. For that reason, this function
+        always returns None when GenQuery2 is used.
         """
         if self._total is None:
-            if self.offset == 0 and self.options & Option.RETURN_TOTAL_ROW_COUNT:
-                # Easy mode: Extract row count from gqo.
-                self.exec_if_not_yet_execed()
-                self._total = self.gqo.totalRowCount
-            else:
-                # Hard mode: for some reason, using PostgreSQL, you cannot get
-                # the total row count when an offset is supplied.
-                # When RETURN_TOTAL_ROW_COUNT is set in combination with a
-                # non-zero offset, iRODS solves this by executing the query
-                # twice[1], one time with no offset to get the row count.
-                # Apparently this does not work (we get the correct row count, but no rows).
-                # So instead, we run the query twice manually. This should
-                # perform only slightly worse.
-                # [1]: https://github.com/irods/irods/blob/4.2.6/plugins/database/src/general_query.cpp#L2393
-                self._total = Query(self.callback, self.columns, self.conditions, limit=0,
-                                    options=self.options|Option.RETURN_TOTAL_ROW_COUNT).total_rows()
+            if self.parser == Parser.GENQUERY1:
+                if self.offset == 0 and self.options & Option.RETURN_TOTAL_ROW_COUNT:
+                    # Easy mode: Extract row count from gqo.
+                    self.exec_if_not_yet_execed()
+                    self._total = self.gqo.totalRowCount
+                else:
+                    # Hard mode: for some reason, using PostgreSQL, you cannot get
+                    # the total row count when an offset is supplied.
+                    # When RETURN_TOTAL_ROW_COUNT is set in combination with a
+                    # non-zero offset, iRODS solves this by executing the query
+                    # twice[1], one time with no offset to get the row count.
+                    # Apparently this does not work (we get the correct row count, but no rows).
+                    # So instead, we run the query twice manually. This should
+                    # perform only slightly worse.
+                    # [1]: https://github.com/irods/irods/blob/4.2.6/plugins/database/src/general_query.cpp#L2393
+                    self._total = Query(self.callback, self.columns, self.conditions, limit=0,
+                                        options=self.options|Option.RETURN_TOTAL_ROW_COUNT).total_rows()

         return self._total

     def __iter__(self):
         self.exec_if_not_yet_execed()

+        if self.parser == Parser.GENQUERY2:
+            column_count = len(self.columns)
+
+            while True:
+                try:
+                    self.callback.msi_genquery2_next_row(self.gq2_handle)
+                except RuntimeError as e:
+                    if _END_OF_RESULTSET_ERROR_STRING_PART in str(e):
+                        break
+                    raise
+
+                row = []
+                for c in range(column_count):
+                    ret = self.callback.msi_genquery2_column(self.gq2_handle, str(c), '')
+                    row.append(ret['arguments'][2])
+
+                try:
+                    yield row
+                except GeneratorExit:
+                    self._close()
+                    return
+
+            return
+
         row_i = 0

         # Iterate until all rows are fetched / the query is aborted.
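The GenQuery2 branch of `__iter__` advances the resultset with `msi_genquery2_next_row`, reads each column by its stringified index with `msi_genquery2_column` (taking the value from `ret['arguments'][2]`), and treats an error containing the end-of-resultset code as normal termination. To show that protocol without an iRODS server, the sketch pairs the same loop with `FakeCallback`, a hypothetical in-memory stand-in whose return shapes only imitate what the diff reads. The -408000 code and the error message format are assumptions.

```python
# Assumed value of irods_errors.END_OF_RESULTSET.
END_OF_RESULTSET = -408000
_END_OF_RESULTSET_ERROR_STRING_PART = f':{END_OF_RESULTSET}]'


class FakeCallback:
    """Hypothetical in-memory stand-in for the rule engine callback (one resultset)."""

    def __init__(self, rows):
        self.rows = rows
        self.cursor = -1
        self.freed = []

    def msi_genquery2_execute(self, handle, query_string):
        # The diff reads the new handle from ret['arguments'][0].
        return {'arguments': ['0', query_string]}

    def msi_genquery2_next_row(self, handle):
        self.cursor += 1
        if self.cursor >= len(self.rows):
            # Made-up message; only the ':<code>]' fragment matters.
            raise RuntimeError(f'next_row failed [error_code{_END_OF_RESULTSET_ERROR_STRING_PART}')
        return {'arguments': [handle]}

    def msi_genquery2_column(self, handle, index, value):
        # The diff reads the column value from ret['arguments'][2].
        return {'arguments': [handle, index, self.rows[self.cursor][int(index)]]}

    def msi_genquery2_free(self, handle):
        self.freed.append(handle)


def iterate_gq2(callback, handle, column_count):
    """Same control flow as the GenQuery2 branch of Query.__iter__."""
    while True:
        try:
            callback.msi_genquery2_next_row(handle)
        except RuntimeError as e:
            if _END_OF_RESULTSET_ERROR_STRING_PART in str(e):
                break  # Normal end of the resultset.
            raise      # Any other failure propagates.

        row = [callback.msi_genquery2_column(handle, str(c), '')['arguments'][2]
               for c in range(column_count)]
        try:
            yield row
        except GeneratorExit:
            # The generator was closed early (explicit close() or garbage collection);
            # the diff calls self._close() at this point.
            callback.msi_genquery2_free(handle)
            return
    # As in the diff, a fully consumed resultset is not freed here;
    # that is left to _close() (called explicitly or via __del__).


cb = FakeCallback([['/tempZone/home/rods', 'a.txt'], ['/tempZone/home/rods', 'b.txt']])
handle = cb.msi_genquery2_execute('', 'select COLL_NAME, DATA_NAME')['arguments'][0]
print(list(iterate_gq2(cb, handle, 2)))
# [['/tempZone/home/rods', 'a.txt'], ['/tempZone/home/rods', 'b.txt']]
```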
@@ -253,13 +331,33 @@ def __iter__(self):
             self._fetch()

     def _fetch(self):
-        """Fetch the next batch of results"""
+        """Fetch the next batch of results.
+
+        Not supported by GenQuery2.
+        """
+        if self.parser == Parser.GENQUERY2:
+            # GenQuery2 always returns the full resultset. For that reason, this
+            # function is a no-op. Care must be taken to not exhaust the server's
+            # available memory.
+            return
+
         ret = self.callback.msiGetMoreRows(self.gqi, self.gqo, 0)
         self.gqo = ret['arguments'][1]
         self.cti = ret['arguments'][2]

     def _close(self):
-        """Close the query (prevents filling the statement table)."""
+        """Close the query.
+
+        When GenQuery1 is used, prevents filling the statement table.
+        When GenQuery2 is used, closes the resultset.
+        """
+        if self.parser == Parser.GENQUERY2:
+            if self.gq2_handle is not None:
+                self.callback.msi_genquery2_free(self.gq2_handle)
+                self.gq2_handle = None
+                self.order_by = ''
+            return
+
         if not self.cti:
             return

@@ -290,13 +388,15 @@ def first(self):
         return result

     def __str__(self):
-        return 'select {}{}{}{}'.format(', '.join(self.columns),
-                                        ' where '+self.conditions_for_exec if self.conditions_for_exec else '',
-                                        ' limit '+str(self.limit) if self.limit is not None else '',
-                                        ' offset '+str(self.offset) if self.offset else '')
+        projections = ', '.join(self.columns)
+        where_clause = ' where ' + self.conditions_for_exec if self.conditions_for_exec else ''
+        order_by_clause = ' order by ' + self.order_by if self.order_by else ''
+        limit_clause = ' limit ' + str(self.limit) if self.limit is not None else ''
+        offset_clause = ' offset '+ str(self.offset) if self.offset else ''
+        return f'select {projections}{where_clause}{order_by_clause}{limit_clause}{offset_clause} [parser={self.parser.name}]'

     def __del__(self):
-        """Auto-close query on when Query goes out of scope."""
+        """Auto-close query when Query goes out of scope."""
         self._close()


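`Query.__del__` calls `_close()` unconditionally, so the GenQuery2 branch of `_close()` has to tolerate being called again after an explicit close. It does so by freeing the handle only when `gq2_handle` is not `None` and then clearing it. The sketch reproduces just that guard; `RecordingCallback` and `Gq2Resultset` are hypothetical stand-ins, not classes from `genquery.py`.

```python
class RecordingCallback:
    """Hypothetical stand-in that records which GenQuery2 handles were freed."""

    def __init__(self):
        self.freed = []

    def msi_genquery2_free(self, handle):
        self.freed.append(handle)


class Gq2Resultset:
    """Hypothetical holder reproducing only the GenQuery2 branch of Query._close()."""

    def __init__(self, callback, handle):
        self.callback = callback
        self.gq2_handle = handle

    def _close(self):
        # Free at most once: the handle is cleared after the first free.
        if self.gq2_handle is not None:
            self.callback.msi_genquery2_free(self.gq2_handle)
            self.gq2_handle = None

    def __del__(self):
        # Like Query.__del__, runs _close() again when the object is collected.
        self._close()


cb = RecordingCallback()
rs = Gq2Resultset(cb, '0')
rs._close()      # Explicit close frees the handle.
del rs           # In CPython this triggers __del__ immediately; _close() is now a no-op.
print(cb.freed)  # ['0']
```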