
Handle incoming special meta parsing (`^`), and clean up reader macro (`#`) handling as well (#34)

This commit is contained in:
adampauls 2021-06-09 12:18:10 -07:00 committed by GitHub
Parent 571396db25
Commit 3eae57b031
No known key found for this signature
GPG key ID: 4AEE18F83AFDEB23
23 changed files: 721 additions and 381 deletions

README-LISPRESS-1.0.md Normal file
View file

@ -0,0 +1,175 @@
This is an outdated description of Lispress ("Lispress 1.0"), left here to document the SMCalFlow
1.x datasets. For the more current description of Lispress,
see [this README](README-LISPRESS.md).
# Lispress
*Lispress* is a lisp-like serialization format for programs.
It is intended to be human-readable, easy to work with in Python, and easy to
tokenize and predict with a standard seq2seq model.
Here is an example program in Lispress (a response to the utterance
`"what is my appointment with janice kang"`):
```clojure
(yield
  (:id
    (singleton
      (:results
        (FindEventWrapperWithDefaults
          :constraint (StructConstraint[Event]
            :attendees (AttendeeListHasRecipientConstraint
              :recipientConstraint (RecipientWithNameLike
                :constraint (StructConstraint[Recipient])
                :name #(PersonName "janice kang")))))))))
```
## Syntax
A Lispress program is an s-expression: either
a bare symbol, or
a whitespace-separated list of s-expressions, surrounded by parentheses.
### Values
Value literals are represented with a hash character followed by an
s-expression containing the name of the schema (i.e. type) of the data, followed by a
json-encoded string literal of the data surrounded by double-quotes.
For example: `#(PersonName "janice kang")`.
A `Number` may omit the double-quotes, e.g. `#(Number 4)`.
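As a rough illustration of the value-literal layout described above, the following Python sketch splits a literal into its schema name and its JSON-decoded payload. The helper `parse_value_literal` and its regular expression are hypothetical and are not part of the `dataflow` package; the sketch assumes the literal is given as a standalone string.

```python
import json
import re
from typing import Any, Tuple

# Hypothetical pattern: `#(`, a schema name, whitespace, a JSON payload, `)`.
_VALUE_LITERAL = re.compile(r"^#\(\s*(\S+)\s+(.*)\)$", re.DOTALL)


def parse_value_literal(text: str) -> Tuple[str, Any]:
    """Splits `#(Schema payload)` into the schema name and the decoded payload."""
    match = _VALUE_LITERAL.match(text.strip())
    if match is None:
        raise ValueError(f"not a value literal: {text!r}")
    schema, payload = match.groups()
    # The payload is JSON, so a quoted string decodes to `str` and a bare
    # number such as `4` decodes to `int`.
    return schema, json.loads(payload)


name_schema, name_value = parse_value_literal('#(PersonName "janice kang")')
number_schema, number_value = parse_value_literal("#(Number 4)")
```

Because the payload is decoded with `json.loads`, the same helper covers both the quoted form and the unquoted `Number` form.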
### Function application
The most common form in Lispress is a function applied to zero or more
arguments.
Function application expressions are lists,
with the first element of the list denoting the function,
and the remainder of the elements denoting its arguments.
There are two kinds of function application:
#### Named arguments
If the name of a function begins with a capitalized letter (`[A-Z]`),
then it accepts named arguments (and only named arguments).
The name of each named argument is prefixed with a colon character,
and named arguments are written after the function as alternating
`:name value` pairs.
Named arguments can be given in any order (when rendering, we alphabetize named arguments).
For example, in
```clojure
(DateAtTimeWithDefaults
  :date (Tomorrow)
  :time (NumberAM :number #(Number 10)))
```
the `DateAtTimeWithDefaults` function is applied to two named arguments.
`(Tomorrow)` is passed to the function as the `date` argument, and
`(NumberAM :number #(Number 10))` is passed in as the `time` argument.
`(Tomorrow)` is an example of a function applied to zero named arguments.
Some functions accepting named arguments may not require all arguments to be present.
You will often see the `StructConstraint[Event]` function being called without
a `:subject` or an `:end`, for example.
#### Positional arguments
If the name of a function does not begin with a capitalized letter
(i.e. it is lowercase or symbolic), then it accepts positional
arguments (and only positional arguments).
For example,
```clojure
(?= #(String "soccer game"))
```
represents the function `?=` being
applied to the single argument `#(String "soccer game")`.
And `(toDays #(Number 10))` is the function `toDays` applied to the single
argument `#(Number 10)`.
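The capitalization rule that separates the two kinds of application can be sketched in a few lines of Python. The helper names `takes_named_args` and `split_named_args` are hypothetical, and the sketch assumes a program has already been parsed into nested lists with `str` leaves.

```python
import re
from typing import Dict, List, Union

Sexp = Union[str, List["Sexp"]]


def takes_named_args(function_name: str) -> bool:
    """True when the name starts with a capital letter in `[A-Z]`."""
    return re.match(r"[A-Z]", function_name[0]) is not None


def split_named_args(args: List[Sexp]) -> Dict[str, Sexp]:
    """Pairs up alternating `:name value` items, dropping the leading colon."""
    if len(args) % 2 != 0:
        raise ValueError("named arguments must come in `:name value` pairs")
    named: Dict[str, Sexp] = {}
    for name, value in zip(args[0::2], args[1::2]):
        if not (isinstance(name, str) and name.startswith(":")):
            raise ValueError(f"expected a `:name`, got {name!r}")
        named[name[1:]] = value
    return named


call = ["DateAtTimeWithDefaults", ":date", ["Tomorrow"], ":time", ["NumberAM"]]
fn, *rest = call
arguments = split_named_args(rest) if takes_named_args(fn) else rest
```

For a lowercase or symbolic head such as `toDays` or `?=`, the remaining elements are simply kept as a positional list.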
### Sugared `get`
There is a common construct in our programs where the `get` function
retrieves a field (specified by a `Path`) from a structured object.
For example,
```clojure
(get
  (refer (StructConstraint[Event]))
  #(Path "attendees"))
```
returns the `attendees` field of the salient `Event`.
When the path is a valid identifier (i.e. contains no whitespace or special
characters), the following sugared version is equivalent and preferred:
```clojure
(:attendees
  (refer (StructConstraint[Event])))
```
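The relationship between the two forms can be sketched as a small recursive rewrite over nested lists. This is a simplified, hypothetical `desugar_gets`, not the package's implementation; in particular, representing the `#(Path ...)` literal as `["#", ["Path", ...]]` is an assumption made for the example.

```python
from typing import List, Union

Sexp = Union[str, List["Sexp"]]


def desugar_gets(sexp: Sexp) -> Sexp:
    """Rewrites every `(:field obj)` form into `(get obj #(Path "field"))`."""
    if isinstance(sexp, str) or len(sexp) == 0:
        return sexp
    head = sexp[0]
    # A two-element list whose head starts with a colon is a sugared `get`.
    # Named arguments never match here because they are not in head position.
    if len(sexp) == 2 and isinstance(head, str) and head.startswith(":"):
        field = head[1:]
        return ["get", desugar_gets(sexp[1]), ["#", ["Path", f'"{field}"']]]
    return [desugar_gets(item) for item in sexp]


sugared = [":attendees", ["refer", ["StructConstraint[Event]"]]]
desugared = desugar_gets(sugared)
```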
### Variable binding with `let`
To use a value more than once, it can be given a variable name using a `let`
binding.
A `let` binding is a list with three elements,
- the keyword `let`,
- a "binding" list containing alternating `variableName variableValue` pairs, and
- a program body, in which variable names bound in the previous form can be
referenced.
For example, in the following response to `"Can you find some past events on my calendar?"`,
```clojure
(let
  (x0 (Now))
  (yield
    (FindEventWrapperWithDefaults
      :constraint (EventOnDateBeforeTime
        :date (:date x0)
        :event (StructConstraint[Event])
        :time (:time x0)))))
```
the variable `x0` is assigned the value `(Now)` and then used twice in the body.
Note that `(Now)` is only evaluated once.
`let` bindings are an important mechanism to reuse the result of a
side-effecting computation.
For example, depending on the implementation of `Now`, the
following program may be referencing different values in the `:date` and `:time` fields:
```clojure
(FindEventWrapperWithDefaults
  :constraint (EventOnDateBeforeTime
    :date (:date (Now))
    :event (StructConstraint[Event])
    :time (:time (Now))))
```
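The difference between binding `(Now)` once and calling it twice can be mimicked in plain Python. The `make_clock` helper below is a hypothetical stand-in for a `Now` whose result changes on every call; it only illustrates the evaluation-order point and says nothing about how `Now` is actually implemented.

```python
import itertools
from typing import Callable, Dict


def make_clock() -> Callable[[], Dict[str, int]]:
    """Returns a stand-in for `Now` whose result changes on every call."""
    ticks = itertools.count()

    def now() -> Dict[str, int]:
        tick = next(ticks)
        return {"date": tick, "time": tick}

    return now


# Without a binding, the clock is read twice, so the two fields can disagree.
now = make_clock()
unbound = (now()["date"], now()["time"])

# With a binding (the analogue of `(let (x0 (Now)) ...)`), it is read once.
now = make_clock()
x0 = now()
bound = (x0["date"], x0["time"])
```

Here `unbound` holds values from two different readings of the clock, while both fields of `bound` come from the same reading.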
### Performing multiple actions in a turn with `do`
Two or more statements can be sequenced using the `do` keyword.
Each statement in a `do` form is fully interpreted and executed before any following
statements are.
In
```clojure
(do
  (ConfirmAndReturnAction)
  (yield
    (:start
      (FindNumNextEvent
        :constraint (StructConstraint[Event])
        :number #(Number 1)))))
```
for example, `ConfirmAndReturnAction` is guaranteed to execute before `FindNumNextEvent`.
## Code
Code for parsing and rendering Lispress is in the `dataflow.core.lispress`
package.
`parse_lispress` converts a string into a `Lispress` object, which is a nested
list-of-lists with `str`s as leaves.
`render_compact` renders `Lispress` on a single line (used in our `jsonl` data
files), and `render_pretty` renders with indentation, which is easier to read.
`lispress_to_program` and `program_to_lispress` convert to and from a `Program` object,
which is closer to a computation DAG (rather than an abstract syntax tree), and
is sometimes more convenient to work with.
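To give a feel for the nested list-of-lists shape mentioned above, here is a minimal, self-contained s-expression reader. It is a sketch under simplifying assumptions and not the code in `dataflow.core.lispress`: the names `tokenize` and `parse_nested` are hypothetical, and only parentheses, whitespace, and double-quoted strings are handled.

```python
from typing import List, Union

Sexp = Union[str, List["Sexp"]]


def tokenize(text: str) -> List[str]:
    """Splits text into parens, quoted strings (quotes kept), and bare symbols."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch in "()":
            tokens.append(ch)
            i += 1
        elif ch == '"':
            j = i + 1
            while j < len(text) and text[j] != '"':
                # Skip the character after a backslash so `\"` stays inside.
                j += 2 if text[j] == "\\" else 1
            tokens.append(text[i : j + 1])
            i = j + 1
        else:
            j = i
            while j < len(text) and not text[j].isspace() and text[j] not in '()"':
                j += 1
            tokens.append(text[i:j])
            i = j
    return tokens


def parse_nested(text: str) -> Sexp:
    """Builds nested lists with `str` leaves from a single s-expression."""
    stack: List[List[Sexp]] = [[]]
    for token in tokenize(text):
        if token == "(":
            stack.append([])
        elif token == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(token)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    top = stack[0]
    return top[0] if len(top) == 1 else top


tree = parse_nested("(toDays #(Number 10))")
```

In this sketch the `#` marker becomes its own leaf, followed by the list it prefixes.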

View file

@ -2,22 +2,24 @@
*Lispress* is a lisp-like serialization format for programs.
It is intended to be human-readable, easy to work with in Python, and easy to
tokenize and predict with a standard seq2seq model.
tokenize and predict with a standard seq2seq model. An older version, Lispress 1.0,
is described in [this README](README-LISPRESS-1.0.md). The current code is backwards
compatible with Lispress 1.0 programs.
Here is an example program in Lispress (a response to the utterance
`"what is my appointment with janice kang"`):
```clojure
(yield
(:id
(singleton
(:results
(FindEventWrapperWithDefaults
:constraint (StructConstraint[Event]
:attendees (AttendeeListHasRecipientConstraint
:recipientConstraint (RecipientWithNameLike
:constraint (StructConstraint[Recipient])
:name #(PersonName "janice kang")))))))))
(Yield
(Event.id
(singleton
(QueryEventResponse.results
(FindEventWrapperWithDefaults
(Event.attendees?
(AttendeeListHasRecipientConstraint
(RecipientWithNameLike
(^(Recipient) EmptyStructConstraint)
(PersonName.apply "janice kang")))))))))
```
@ -25,15 +27,43 @@ Here is an example program in Lispress (a response to the utterance
A Lispress program is an s-expression: either
a bare symbol, or
a whitespace-separated list of s-expressions, surrounded by parentheses.
### Values
Value literals are represented with a hash character followed by an
s-expression containing the name of the schema (i.e. type) of the data, followed by a
json-encoded string literal of the data surrounded by double-quotes.
For example: `#(PersonName "janice kang")`.
A `Number` may omit the double-quotes, e.g. `#(Number 4)`.
a whitespace-separated list of s-expressions, surrounded by parentheses. There is a little
bit of special syntax:
* Strings surrounded by double-quotes (`"`) are parsed as a single symbol
(including the quotes), with standard JSON escaping for strings. For example,
```clojure
(MyFunc "this is a (quoted) string with a \" in it")
```
will pass the symbol `"this is a (quoted) string with a \" in it"` to `MyFunc`.
Note that when converting to a Program, we trim the whitespace from either side of a
string, so `(MyFunc " a ")` and `(MyFunc "a")` are the same program.
* The meta character (`^`)
([borrowed from Clojure](https://clojure.org/reference/metadata))
can be used for type ascriptions and type arguments. For example,
```clojure
^Number 1
```
would be written as `1: Number` in Scala. A list marked by the meta character
in the first argument of an s-expression is interpreted as a list of type arguments.
For example,
```clojure
(^(Number) MyFunc 1)
```
would be written as `MyFunc[Number](1)` in Scala or `MyFunc<Number>(1)` in Swift and Rust.
* (Deprecated) The reader macro character (`#`),
[borrowed from Common Lisp](https://gist.github.com/chaitanyagupta/9324402)
marks literal values.
For example, `#(PersonName "John")` marks a value of type `PersonName` with
content `"John"`. Reader macros are no longer part of Lispress 2.0. Instead,
standard literals like booleans, longs, numbers, and strings, can be written directly,
while wrapper types (like `PersonName`) feature an explicit call to a constructor
like `PersonName.apply`. The current code will interpret Lispress 1.0
`Number`s and `String`s as their bare equivalents, so `#(String "foo")` and `"foo"`
will be interpreted as the same program. Similarly, `#(Number 1)` and `1` will
be interpreted as the same program, and `#(Boolean true)` and `true` are the same
program.
* Literals of type Long are written as an integer literal followed by an `L` (e.g. `12L`)
as in Java/Scala.
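The literal rules above can be summarized with a short Python sketch that maps a bare symbol to a schema name and a decoded value. The function `classify_bare_literal` is hypothetical; it is loosely modeled on the `Long` pattern and the JSON-based handling of bare values in this change, and returns `None` for anything that is not a literal.

```python
import json
import re
from typing import Any, Optional, Tuple

# Long literals are digits followed by `L`, e.g. `12L`.
_LONG = re.compile(r"^([0-9]+)L$")
_SCHEMAS = {str: "String", float: "Number", int: "Number", bool: "Boolean"}


def classify_bare_literal(symbol: str) -> Optional[Tuple[str, Any]]:
    """Returns (schema, value) for a bare literal, or None for other symbols."""
    match = _LONG.match(symbol)
    if match is not None:
        return "Long", int(match.group(1))
    try:
        value = json.loads(symbol)
    except json.JSONDecodeError:
        return None
    # Exact type lookup, so `true` maps to Boolean rather than Number.
    schema = _SCHEMAS.get(type(value))
    return None if schema is None else (schema, value)


examples = [classify_bare_literal(s) for s in ["12L", "1", '"foo"', "true", "MyFunc"]]
```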
### Function application
@ -42,47 +72,14 @@ arguments.
Function application expressions are lists,
with the first element of the list denoting the function,
and the remainder of the elements denoting its arguments.
There are two kinds of function application:
We follow Common Lisp and Clojure in using `:` to denote named arguments. For example,
`(MyFunc :foo 1)` would be `MyFunc(foo = 1)` in Scala or Python. At present, functions
must either be entirely positional or entirely named, and only functions with an
uppercase letter for the first character may take named arguments.
#### Named arguments
If the name of a function begins with a capitalized letter (`[A-Z]`),
then it accepts named arguments (and only named arguments).
The name of each named argument is prefixed with a colon character,
and named arguments are written after the function as alternating
`:name value` pairs.
Named arguments can be given in any order (when rendering, we alphabetize named arguments).
### (Deprecated) Sugared `get`
For example, in
```clojure
(DateAtTimeWithDefaults
:date (Tomorrow)
:time (NumberAM :number #(Number 10))
```
the `DateAtTimeWithDefaults` function is a applied to two named arguments.
`(Tomorrow)` is passed to the function as the `date` argument, and
`(NumberAM :number #(Number 10)` is passed in as the `time` argument.
`(Tomorrow)` is an example of a function applied to zero named arguments.
Some functions accepting named arguments may not require all arguments to be present.
You will often see the `StructConstraint[Event]` function being called without
a `:subject` or an `:end`, for example.
#### Positional arguments
If the name of a function does not begin with a capitalized letter
(i.e. it is lowercase or symbolic), then it accepts positional
arguments (and only positional arguments).
For example,
```clojure
(?= #(String "soccer game"))
```
represents the function `?=` being
applied to the single argument `#(String "soccer game")`.
And `(toDays #(Number 10))` is the function `toDays` applied to the single
argument `#(Number 10)`.
### Sugared `get`
There is a common construct in our programs where the `get` function
There is a common construct in the SMCalFlow 1.x dataset where the `get` function
retrieves a field (specified by a `Path`) from a structured object.
For example,
```clojure
@ -91,11 +88,15 @@ For example,
#(Path "attendees"))
```
returns the `attendees` field of the salient `Event`.
When the path is a valid identifier (i.e. contains no whitespace or special
characters), the following sugared version is equivalent and preferred:
For backwards compatibility with Lispress 1.0, the parser will accept
the following equivalent form.
```clojure
(:attendees
(refer (StructConstraint[Event])))
(:attendees (refer (StructConstraint[Event])))
```
In updated Lispress, accessor functions contain the name of the type they access:
```clojure
(Event.attendees (refer (^(Event) StructConstraint)))
```
@ -113,27 +114,27 @@ referenced.
For example, in the following response to `"Can you find some past events on my calendar?"`,
```clojure
(let
(x0 (Now))
(yield
(FindEventWrapperWithDefaults
:constraint (EventOnDateBeforeTime
:date (:date x0)
:event (StructConstraint[Event])
:time (:time x0)))))
(let
(x0 (Now))
(Yield
(FindEventWrapperWithDefaults
(EventOnDateBeforeTime
(DateTime.date x0)
(^(Event) EmptyStructConstraint)
(DateTime.time x0)))))
```
the variable `x0` is assigned the value `(Now)` and then used twice in the body.
Note that `(Now)` is only evaluated once.
`let` bindings are an important mechanism to reuse the result of a
side-effecting computation.
For example, depending on the implementation of `Now`, the
following program may be referencing different values in the `:date` and `:time` fields:
following program may produce different values in the `:date` and `:time` fields:
```clojure
(FindEventWrapperWithDefaults
:constraint (EventOnDateBeforeTime
:date (:date (Now))
:event (StructConstraint[Event])
:time (:time (Now)))))
(FindEventWrapperWithDefaults
(EventOnDateBeforeTime
(DateTime.date (Now))
(^(Event) EmptyStructConstraint)
(DateTime.time (Now))))
```
### Performing multiple actions in a turn with `do`
@ -146,10 +147,10 @@ In
(do
(ConfirmAndReturnAction)
(yield
(:start
(Event.start
(FindNumNextEvent
:constraint (StructConstraint[Event])
:number #(Number 1)))))
(^(Event) StructConstraint)
1L))))
```
for example, `ConfirmAndReturnAction` is guaranteed to execute before `FindNumNextEvent`.

View file

@ -3,7 +3,6 @@ python_version = 3.7
incremental = False
strict_optional = False
mypy_path=./src/:./tests/
plugins = pydantic.mypy
[mypy-pytest.*,_pytest.*,jsons.*,more_itertools.*,tqdm.*,glob2.*,sexpdata.*]
ignore_missing_imports = True
@ -16,9 +15,3 @@ ignore_missing_imports = True
[mypy-onmt.*,torch.*]
ignore_missing_imports = True
[pydantic-mypy]
init_forbid_extra = True
init_typed = True
warn_required_dynamic_aliases = True
warn_untyped_fields = True

View file

@ -15,7 +15,6 @@ setup(
zip_safe=False,
install_requires=[
"jsons==0.10.1",
"pydantic==1.4",
"more-itertools==8.2.0",
"sexpdata==0.0.3",
"pandas==1.0.0",
@ -25,5 +24,5 @@ setup(
extra_requires={
"OpenNMT-py": ["OpenNMT-py==1.0.0", "pytorch>=1.2.0,<=1.4.0"]
},
python_requires=">=3.6",
python_requires=">=3.7",
)

View file

@ -11,11 +11,11 @@ import dataclasses
import json
import os
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple
import numpy as np
import pandas as pd
from pydantic.dataclasses import dataclass
from dataflow.core.dialogue import Dialogue, Turn, TurnId
from dataflow.core.io import load_jsonl_file, save_jsonl_file

View file

@ -1,9 +1,8 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from dataclasses import dataclass
from typing import List, Optional
from pydantic.dataclasses import dataclass
from dataflow.core.linearize import lispress_to_seq
from dataflow.core.lispress import lispress_to_program, parse_lispress
from dataflow.core.program import Program

View file

@ -23,7 +23,7 @@ def load_jsonl_file(
desc = f"Reading {cls} from {data_jsonl}"
else:
desc = None
with open(data_jsonl) as fp:
with open(data_jsonl, encoding="utf-8") as fp:
for line in tqdm(
fp, desc=desc, unit=unit, dynamic_ncols=True, disable=not verbose
):

View file

@ -1,9 +1,11 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import json
import re
from collections import Counter
from dataclasses import replace
from json import JSONDecodeError, loads
from typing import Dict, List, Set, Tuple
from typing import Dict, List, Optional, Set, Tuple
from more_itertools import chunked
@ -13,6 +15,7 @@ from dataflow.core.program import (
Expression,
Op,
Program,
TypeName,
ValueOp,
)
from dataflow.core.program_utils import DataflowFn, Idx, OpType, get_named_args
@ -21,6 +24,7 @@ from dataflow.core.program_utils import (
is_struct_op_schema,
mk_call_op,
mk_struct_op,
mk_type_name,
mk_value_op,
unwrap_idx_str,
)
@ -41,6 +45,7 @@ VAR_PREFIX = "x"
NAMED_ARG_PREFIX = ":"
# values are rendered as `#(MySchema "json_dump_of_my_value")`
VALUE_CHAR = "#"
META_CHAR = "^"
# Lispress has lisp syntax, and we represent it as an s-expression
Lispress = Sexp
@ -54,40 +59,44 @@ def try_round_trip(lispress_str: str) -> str:
If it is not valid, returns the original string unmodified.
"""
try:
# round-trip to canonicalize
lispress = parse_lispress(lispress_str)
program, _ = lispress_to_program(lispress, 0)
round_tripped = program_to_lispress(program)
def normalize_numbers(exp: Lispress) -> "Lispress":
if isinstance(exp, str):
try:
num = float(exp)
return f"{num:.1f}"
except ValueError:
return exp
else:
return [normalize_numbers(e) for e in exp]
def strip_copy_strings(exp: Lispress) -> "Lispress":
if isinstance(exp, str):
if len(exp) > 2 and exp[0] == '"' and exp[-1] == '"':
return '"' + exp[1:-1].strip() + '"'
else:
return exp
else:
return [strip_copy_strings(e) for e in exp]
return render_compact(strip_copy_strings(normalize_numbers(round_tripped)))
return _try_round_trip(lispress_str)
except Exception: # pylint: disable=W0703
return lispress_str
def _try_round_trip(lispress_str: str) -> str:
# round-trip to canonicalize
lispress = parse_lispress(lispress_str)
program, _ = lispress_to_program(lispress, 0)
round_tripped = program_to_lispress(program)
def normalize_numbers(exp: Lispress) -> "Lispress":
if isinstance(exp, str):
try:
num = float(exp)
return f"{num:.1f}"
except ValueError:
return exp
else:
return [normalize_numbers(e) for e in exp]
def strip_copy_strings(exp: Lispress) -> "Lispress":
if isinstance(exp, str):
if len(exp) > 2 and exp[0] == '"' and exp[-1] == '"':
return '"' + exp[1:-1].strip() + '"'
else:
return exp
else:
return [strip_copy_strings(e) for e in exp]
return render_compact(strip_copy_strings(normalize_numbers(round_tripped)))
def program_to_lispress(program: Program) -> Lispress:
""" Converts a Program to Lispress. """
unsugared = _program_to_unsugared_lispress(program)
sugared_gets = _sugar_gets(unsugared)
return _strip_extra_parens_around_values(sugared_gets)
return sugared_gets
def lispress_to_program(lispress: Lispress, idx: Idx) -> Tuple[Program, Idx]:
@ -96,8 +105,7 @@ def lispress_to_program(lispress: Lispress, idx: Idx) -> Tuple[Program, Idx]:
Returns the last id used along with the Program.
"""
desugared_gets = _desugar_gets(lispress)
with_parens_around_values = _add_extra_parens_around_values(desugared_gets)
return _unsugared_lispress_to_program(with_parens_around_values, idx)
return _unsugared_lispress_to_program(desugared_gets, idx)
def render_pretty(lispress: Lispress, max_width: int = 60) -> str:
@ -118,7 +126,6 @@ def render_pretty(lispress: Lispress, max_width: int = 60) -> str:
(Constraint[Recipient])
#(PersonName "Elaine")))))))
"""
lispress = _render_value_expressions(lispress)
result = "\n".join(_render_lines(sexp=lispress, max_width=max_width))
return result
@ -131,7 +138,7 @@ def render_compact(lispress: Lispress) -> str:
>>> print(render_compact(lispress))
(describe (:start (findNextEvent (Constraint[Event] :attendees (attendeeListHasRecipientConstraint (recipientWithNameLike (Constraint[Recipient]) #(PersonName "Elaine")))))))
"""
return sexp_to_str(_render_value_expressions(lispress))
return sexp_to_str(lispress)
def parse_lispress(s: str) -> Lispress:
@ -151,7 +158,7 @@ def parse_lispress(s: str) -> Lispress:
>>> parse_lispress(s)
['describe', [':start', ['findNextEvent', ['Constraint[Event]', ':attendees', ['attendeeListHasRecipientConstraint', ['recipientWithNameLike', ['Constraint[Recipient]'], '#', ['PersonName', '"Elaine"']]]]]]]
"""
return parse_sexp(s, clean_singletons=False)[0]
return parse_sexp(s)
def _group_named_args(lines: List[str]) -> List[str]:
@ -172,41 +179,6 @@ def _group_named_args(lines: List[str]) -> List[str]:
return result
def _render_value_expressions(sexp: Sexp) -> Sexp:
"""
Finds Value sub-expressions within `sexp` and replaces them in place
with their rendered str.
This ensures that values are always atomically rendered on the same line,
and also allows us to render "#(" without a space between them.
"""
if isinstance(sexp, str):
return sexp
else:
result: List[Lispress] = []
i = 0
while i < len(sexp):
s = sexp[i]
if s == VALUE_CHAR and i + 1 < len(sexp):
# merge "#" and the following (rendered) subexpression
result.append(VALUE_CHAR + render_compact(sexp[i + 1]))
i += 2
else:
result.append(_render_value_expressions(s))
i += 1
# special-case top-level values because we can't strip out the last level
# of parens in the Sexp:
if (
isinstance(result, list)
# value has been turned into a single str already here by _render_value_expressions
and len(result) == 1
and isinstance(result[0], str)
and result[0].startswith(VALUE_CHAR)
):
return result[0]
return result
def _render_lines(sexp: Lispress, max_width: int) -> List[str]:
"""Helper function for `render_pretty`."""
compact = render_compact(sexp)
@ -214,14 +186,31 @@ def _render_lines(sexp: Lispress, max_width: int) -> List[str]:
return [compact]
else:
fn, *args = sexp
prefix = " " * NUM_INDENTATION_SPACES
fn_line = LEFT_PAREN + render_compact(fn)
arg_lines = _group_named_args(
[line for arg in args for line in _render_lines(arg, max_width=max_width)]
)
lines = [fn_line] + [prefix + line for line in arg_lines]
lines[-1] = lines[-1] + RIGHT_PAREN
return lines
if fn == VALUE_CHAR:
assert len(args) == 1, "# Value expressions must have one argument"
lines = _render_lines(args[0], max_width=max_width)
lines[0] = VALUE_CHAR + lines[0]
return lines
elif fn == META_CHAR:
assert len(args) == 2, "^ Meta expressions must have two arguments"
lines = _render_lines(args[0], max_width=max_width)
lines.extend(_render_lines(args[1], max_width=max_width))
lines[0] = META_CHAR + lines[0]
return lines
else:
prefix = " " * NUM_INDENTATION_SPACES
fn_lines = _render_lines(fn, max_width=max_width)
arg_lines = _group_named_args(
[
line
for arg in args
for line in _render_lines(arg, max_width=max_width)
]
)
lines = fn_lines + [prefix + line for line in arg_lines]
lines[0] = LEFT_PAREN + lines[0]
lines[-1] = lines[-1] + RIGHT_PAREN
return lines
def _idx_to_var_str(idx: int) -> str:
@ -270,15 +259,44 @@ def op_to_lispress(op: Op) -> Lispress:
value = json.loads(op.value)
schema = value.get("schema")
underlying = value.get("underlying")
# this json formatter makes it easier (than other json formatters) to tokenize the string
underlying_json_str = " ".join(
json.dumps(underlying, separators=(" ,", " : "), indent=0).split("\n")
)
return [OpType.Value.value, [schema, underlying_json_str]]
# Long literals look like 2L
if schema == "Long":
return str(underlying) + "L"
else:
# this json formatter makes it easier (than other json formatters) to tokenize the string
underlying_json_str = " ".join(
json.dumps(underlying, separators=(" ,", " : "), indent=0).split("\n")
)
if schema in ("Number", "String", "Boolean"):
# Numbers and strings were typed in Calflow 1.0 (e.g. #(Number 1),
# #(String "foo")). In Calflow 2.0, any bare number parseable as a float
# or int is interpreted as a Number and any quoted string is interpreted
# as a String. This means that we drop the explicit String and Number
# annotations from Calflow 1.0 when round-tripping.
return underlying_json_str
else:
return [OpType.Value.value, [schema, underlying_json_str]]
else:
raise Exception(f"Op with unknown type: {op}")
def type_args_to_lispress(type_args: List[TypeName]) -> Optional[Lispress]:
"""Converts the provided list of type args into a Lispress expression."""
if len(type_args) == 0:
return None
return [type_name_to_lispress(targ) for targ in type_args]
def type_name_to_lispress(type_name: TypeName) -> Lispress:
"""Converts the provided type name into a Lispress expression."""
if len(type_name.type_args) == 0:
return type_name.base
else:
base: List[Sexp] = [type_name.base]
type_args = [type_name_to_lispress(targ) for targ in type_name.type_args]
return base + type_args
def _sugar_gets(sexp: Lispress) -> Lispress:
"""A sugaring that converts `(get X #(Path "y"))` to `(:y X)`. (inverse of `unsugar_gets`)"""
if isinstance(sexp, str) or len(sexp) == 0:
@ -322,51 +340,11 @@ def _desugar_gets(sexp: Lispress) -> Lispress:
return [
DataflowFn.Get.value,
_desugar_gets(obj),
OpType.Value.value,
["Path", f'"{key}"'],
[OpType.Value.value, ["Path", f'"{key}"']],
]
return [_desugar_gets(s) for s in sexp]
def _strip_extra_parens_around_values(sexp: Lispress) -> Lispress:
"""Removes one level of parens around value sexps"""
if isinstance(sexp, list) and len(sexp) >= 1 and sexp[0] == OpType.Value.value:
# top-level value, can't remove any parens
return sexp
def helper(s: Sexp) -> List[Sexp]:
if isinstance(s, str) or len(s) == 0:
return [s]
else:
unnested_one_level = [y for x in s for y in helper(x)]
if s[0] == OpType.Value.value:
# unnest one level
return unnested_one_level
else:
return [unnested_one_level]
return [y for x in helper(sexp) for y in x]
def _add_extra_parens_around_values(sexp: Lispress) -> Lispress:
"""Adds an extra level of parens around value sexps"""
if isinstance(sexp, str) or len(sexp) == 0:
return sexp
else:
result: List[Sexp] = []
i = 0
while i < len(sexp):
curr = sexp[i]
if curr == OpType.Value.value and i + 1 < len(sexp):
# Add an extra level of parens
result.append([curr, sexp[i + 1]])
i += 2
else:
result.append(_add_extra_parens_around_values(curr))
i += 1
return result
def _roots_and_reentrancies(program: Program) -> Tuple[Set[str], Set[str]]:
ids = {e.id for e in program.expressions}
arg_counts = Counter(a for e in program.expressions for a in e.arg_ids)
@ -399,12 +377,25 @@ def _program_to_unsugared_lispress(program: Program) -> Lispress:
# create a sexp for expression
idx = expression.id
op_lispress = op_to_lispress(expression.op)
# if there are type args, we create a META expression
if expression.type_args is not None:
op_type_args_lispress = type_args_to_lispress(expression.type_args)
op_lispress = [META_CHAR, op_type_args_lispress, op_lispress]
curr: Sexp
if isinstance(expression.op, (BuildStructOp, CallLikeOp)):
curr = [op_lispress]
named_args = sorted(get_named_args(expression)) # sort alphabetically
named_args = get_named_args(expression)
# if all args are named (i.e., not positional), sort them alphabetically
# TODO in principle, we could get mixed positional and named arguments,
# but for now that doesn't happen in SMCalFlow 2.0 so this code is good
# enough. This code also only works for functions with named arguments
# that have upper case names, which again happens to work for SMCalFlow
# 2.0.
has_positional = any(k is None for k, _ in named_args)
if not has_positional:
named_args = sorted(get_named_args(expression)) # sort alphabetically
for arg_name, arg_id in named_args:
if not arg_name.startswith("arg"):
if arg_name is not None and not arg_name.startswith("arg"):
# name of named argument
curr += [_key_to_named_arg(arg_name)]
if arg_id in reentrant_ids:
@ -418,6 +409,8 @@ def _program_to_unsugared_lispress(program: Program) -> Lispress:
curr += [[EXTERNAL_LABEL, arg_id]]
else:
curr = op_lispress # value
if expression.type:
curr = [META_CHAR, type_name_to_lispress(expression.type), curr]
# add it to results
if idx in reentrancies:
# give reentrancies fresh ids as they are encountered
@ -442,6 +435,9 @@ def _program_to_unsugared_lispress(program: Program) -> Lispress:
return [LET, let_bindings, result] if len(let_bindings) > 0 else result
_long_number_regex = re.compile("^([0-9]+)L$")
def unnest_line(
s: Lispress, idx: Idx, var_id_bindings: Tuple[Tuple[str, int], ...],
) -> Tuple[List[Expression], Idx, Idx, Tuple[Tuple[str, int], ...]]:
@ -460,15 +456,24 @@ def unnest_line(
"""
if not isinstance(s, list):
try:
# bare value
value = loads(s)
known_value_types = {
str: "String",
int: "Number",
}
schema = known_value_types[type(value)]
expr, idx = mk_value_op(value=value, schema=schema, idx=idx)
return [expr], idx, idx, var_id_bindings
m = _long_number_regex.match(s)
if m is not None:
n = m.group(1)
expr, idx = mk_value_op(value=int(n), schema="Long", idx=idx)
return [expr], idx, idx, var_id_bindings
else:
# bare value
value = loads(s)
known_value_types = {
str: "String",
float: "Number",
int: "Number",
bool: "Boolean",
}
schema = known_value_types[type(value)]
expr, idx = mk_value_op(value=value, schema=schema, idx=idx)
return [expr], idx, idx, var_id_bindings
except (JSONDecodeError, KeyError):
return unnest_line([s], idx=idx, var_id_bindings=var_id_bindings)
elif len(s) == 0:
@ -478,9 +483,22 @@ def unnest_line(
s = [x for x in s if x != EXTERNAL_LABEL]
hd, *tl = s
if not isinstance(hd, str):
# we don't know how to handle this case, so we just pack the whole thing into a generic value
expr, idx = mk_value_op(value=s, schema="Object", idx=idx)
return [expr], idx, idx, var_id_bindings
if len(hd) == 3 and hd[0] == META_CHAR:
# type args
_meta_char, type_args, function = hd
without_type_args = [function] + tl
exprs, arg_idx, idx, var_id_bindings = unnest_line(
without_type_args, idx=idx, var_id_bindings=var_id_bindings
)
exprs[-1] = replace(
exprs[-1], type_args=[mk_type_name(targ) for targ in type_args]
)
return exprs, arg_idx, idx, var_id_bindings
else:
# we don't know how to handle this case, so we just pack the whole thing
# into a generic value
expr, idx = mk_value_op(value=s, schema="Object", idx=idx)
return [expr], idx, idx, var_id_bindings
elif _is_idx_str(hd):
# argId pointer
var_id_dict = dict(var_id_bindings)
@ -520,6 +538,22 @@ def unnest_line(
)
result_exprs.extend(exprs)
return result_exprs, arg_idx, idx, var_id_bindings
elif hd == META_CHAR:
# type ascriptions look like (^ T Expr), e.g. (^ Number (+ 1 2))
# would be (1 + 2): Number in Scala.
# Note that there is sugar in sexp.py to parse/render `(^ T Expr)`
# as just `^T Expr`.
assert (
len(tl) == 2
), f"Type ascriptions with ^ must have two arguments, but got {str(tl)}"
(type_declaration, sexpr) = tl
# Recurse on the underlying expression
exprs, arg_idx, idx, var_id_bindings = unnest_line(
sexpr, idx=idx, var_id_bindings=var_id_bindings
)
# Update its type declaration.
exprs[-1] = replace(exprs[-1], type=mk_type_name(type_declaration))
return exprs, arg_idx, idx, var_id_bindings
elif hd == OpType.Value.value:
assert (
len(tl) >= 1 and len(tl[0]) >= 1
@ -533,29 +567,35 @@ def unnest_line(
expr, idx = mk_value_op(value=value, schema=schema, idx=idx)
return [expr], idx, idx, var_id_bindings
elif is_struct_op_schema(hd):
name = hd
result = []
kvs = []
for key, val in chunked(tl, 2):
val_exprs, val_idx, idx, var_id_bindings = unnest_line(
val, idx, var_id_bindings
)
result.extend(val_exprs)
kvs.append((_named_arg_to_key(key), val_idx))
struct_op, idx = mk_struct_op(name, dict(kvs), idx)
# not all args are named and so positional arg names are set to `None`
pending_key = None
for arg in tl:
if isinstance(arg, str) and _is_named_arg(arg):
pending_key = arg
else:
val_exprs, val_idx, idx, var_id_bindings = unnest_line(
arg, idx, var_id_bindings
)
result.extend(val_exprs)
key = None
if pending_key is not None:
key = _named_arg_to_key(pending_key)
kvs.append((key, val_idx))
struct_op, idx = mk_struct_op(hd, kvs, idx)
return result + [struct_op], idx, idx, var_id_bindings
else:
# CallOp
name = hd
result = []
args = []
for a in tl:
for arg in tl:
arg_exprs, arg_idx, idx, var_id_bindings = unnest_line(
a, idx, var_id_bindings
arg, idx, var_id_bindings
)
result.extend(arg_exprs)
args.append(arg_idx)
call_op, idx = mk_call_op(name, args=args, idx=idx)
call_op, idx = mk_call_op(hd, args=args, idx=idx)
return result + [call_op], idx, idx, var_id_bindings
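
The struct branch above pairs each `:name` token with the value that follows it and keys positional values by `None`. A minimal, self-contained sketch of that pairing step follows. The real `_is_named_arg` and `_named_arg_to_key` helpers are not shown in this diff, so `is_named_arg` and `pair_args` are hypothetical stand-ins that assume named arguments are symbols beginning with `:`. The sketch also clears the pending key once it has been consumed, which is an assumption about the intended behavior rather than something the diff shows.

```python
from typing import List, Optional, Tuple, Union

Sexp = Union[str, List["Sexp"]]


def is_named_arg(token: str) -> bool:
    # Assumption: a named argument is a symbol that starts with ":".
    return token.startswith(":")


def pair_args(tl: List[Sexp]) -> List[Tuple[Optional[str], Sexp]]:
    """Pairs each value with its preceding `:name`, or with None if positional."""
    pairs: List[Tuple[Optional[str], Sexp]] = []
    pending_key: Optional[str] = None
    for arg in tl:
        if isinstance(arg, str) and is_named_arg(arg):
            # Remember the name; the next non-name element is its value.
            pending_key = arg
        else:
            # Strip the leading ":" (assumed behavior of the real key helper).
            key = pending_key[1:] if pending_key is not None else None
            pairs.append((key, arg))
            pending_key = None  # assumption: a name applies to one value only
    return pairs


mixed = pair_args(["1.0", "2.0", ":bar", "3"])
nested = pair_args([":constraint", ["StructConstraint[Recipient]"]])
```

Because the result is a list of pairs rather than a `dict`, argument order is preserved and several positional arguments can share the key `None`.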


@ -1,9 +1,7 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from dataclasses import field
from typing import List, Union
from pydantic.dataclasses import dataclass
from dataclasses import dataclass, field
from typing import List, Optional, Union
@dataclass(frozen=True)
@ -19,7 +17,7 @@ class CallLikeOp:
@dataclass(frozen=True)
class BuildStructOp:
op_schema: str
op_fields: List[str]
op_fields: List[Optional[str]]
empty_base: bool
push_go: bool
@ -32,10 +30,18 @@ class BuildStructOp:
Op = Union[ValueOp, CallLikeOp, BuildStructOp]
@dataclass(frozen=True)
class TypeName:
base: str
type_args: List["TypeName"]
@dataclass(frozen=True)
class Expression:
id: str
op: Op
type_args: Optional[List[TypeName]] = None
type: Optional[TypeName] = None
arg_ids: List[str] = field(default_factory=list)
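
Since `Expression` is a frozen dataclass, a type ascription cannot be assigned in place; the `unnest_line` change in this commit uses `replace` to build an updated copy. A small sketch of that pattern, with `op` simplified to a plain string and the ids chosen purely for illustration (both are assumptions, not the real schema):

```python
from dataclasses import dataclass, field, replace
from typing import List, Optional


@dataclass(frozen=True)
class TypeName:
    base: str
    type_args: List["TypeName"]


@dataclass(frozen=True)
class Expression:
    # Simplified: in the real module `op` is a ValueOp, CallLikeOp or BuildStructOp.
    id: str
    op: str
    type: Optional[TypeName] = None
    arg_ids: List[str] = field(default_factory=list)


# Hypothetical ids; the real ones come from the index helpers in program_utils.
untyped = Expression(id="e3", op="+", arg_ids=["e1", "e2"])

# `replace` returns a new instance with only the named field changed.
typed = replace(untyped, type=TypeName("Number", []))
```

The original instance is left untouched, so any other holder of `untyped` still sees `type=None`.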


@ -3,9 +3,16 @@
import re
from enum import Enum
from json import dumps
from typing import Any, Dict, Iterable, List, Tuple
from typing import Any, List, Optional, Tuple
from dataflow.core.program import BuildStructOp, CallLikeOp, Expression, ValueOp
from dataflow.core.program import (
BuildStructOp,
CallLikeOp,
Expression,
TypeName,
ValueOp,
)
from dataflow.core.sexp import Sexp
# revise args
ROOT_LOCATION = "rootLocation"
@ -56,7 +63,7 @@ def is_struct_op_schema(name: str) -> bool:
return re.match(r"[A-Z]", name[0]) is not None
def get_named_args(e: Expression) -> List[Tuple[str, str]]:
def get_named_args(e: Expression) -> List[Tuple[str, Optional[str]]]:
"""
Gets a list of (arg_name, arg_id) pairs.
If `e` is a BuildStructOp, then `arg_names` are its `fields`, otherwise
@ -73,11 +80,9 @@ def get_named_args(e: Expression) -> List[Tuple[str, str]]:
def mk_constraint(
tpe: str, args: Iterable[Tuple[str, int]], idx: Idx,
tpe: str, args: List[Tuple[Optional[str], int]], idx: Idx,
) -> Tuple[Expression, Idx]:
return mk_struct_op(
schema=f"Constraint[{tpe.capitalize()}]", args=dict(args), idx=idx
)
return mk_struct_op(schema=f"Constraint[{tpe.capitalize()}]", args=args, idx=idx)
def mk_equality_constraint(val: int, idx: Idx) -> Tuple[Expression, Idx]:
@ -85,11 +90,11 @@ def mk_equality_constraint(val: int, idx: Idx) -> Tuple[Expression, Idx]:
def mk_unset_constraint(idx: Idx) -> Tuple[Expression, Idx]:
return mk_struct_op(schema="EmptyConstraint", args={}, idx=idx)
return mk_struct_op(schema="EmptyConstraint", args=[], idx=idx)
def mk_salience(tpe: str, idx: Idx) -> Tuple[List[Expression], Idx]:
constraint_expr, constraint_idx = mk_constraint(tpe=tpe, args={}, idx=idx)
constraint_expr, constraint_idx = mk_constraint(tpe=tpe, args=[], idx=idx)
salience_expr, idx = mk_call_op(
name=DataflowFn.Refer.value, args=[constraint_idx], idx=constraint_idx
)
@ -121,11 +126,11 @@ def mk_revise(
"""
return mk_struct_op(
schema=DataflowFn.Revise.value,
args={
ROOT_LOCATION: root_location_idx,
OLD_LOCATION: old_location_idx,
NEW: new_idx,
},
args=[
(ROOT_LOCATION, root_location_idx),
(OLD_LOCATION, old_location_idx),
(NEW, new_idx),
],
idx=idx,
)
@ -148,7 +153,7 @@ def mk_revise_the_main_constraint(
salient_action_exprs, salient_action_idx = mk_salient_action(new_idx)
old_loc_expr, old_loc_idx = mk_struct_op(
schema=f"Constraint[Constraint[{tpe.capitalize()}]]",
args={},
args=[],
idx=salient_action_idx,
)
revise_expr, revise_idx = mk_revise(
@ -161,16 +166,15 @@ def mk_revise_the_main_constraint(
def mk_struct_op(
schema: str, args: Dict[str, Idx], idx: Idx,
schema: str, args: List[Tuple[Optional[str], Idx]], idx: Idx,
) -> Tuple[Expression, Idx]:
new_idx = idx + 1
args = dict(args) # defensive copy
base = args.pop(NON_EMPTY_BASE, None)
# args = dict(args) # defensive copy
base = next((v for k, v in args if k == NON_EMPTY_BASE), None)
is_empty_base = base is None
pairs = sorted(args.items()) # sorts keys alphabetically
arg_names = [k for k, v in pairs]
arg_names = [k for k, v in args]
# nonEmptyBase always comes first
arg_vals = ([] if is_empty_base else [base]) + [v for k, v in pairs]
arg_vals = ([] if is_empty_base else [base]) + [v for k, v in args]
flat_exp = Expression(
id=idx_str(new_idx),
op=BuildStructOp(
@ -194,6 +198,13 @@ def mk_call_op(name: str, args: List[Idx], idx: Idx = 0) -> Tuple[Expression, Id
return flat_exp, new_idx
def mk_type_name(sexp: Sexp) -> TypeName:
if isinstance(sexp, str):
return TypeName(sexp, [])
hd, *tl = sexp
return TypeName(hd, [mk_type_name(e) for e in tl])
def mk_value_op(value: Any, schema: str, idx: Idx) -> Tuple[Expression, Idx]:
my_idx = idx + 1
dumped = dumps({"schema": schema, "underlying": value})
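
The `mk_type_name` helper added above turns the type part of a `^` ascription into a `TypeName` tree. The following self-contained sketch copies its logic next to a local `TypeName` definition so it can be run on its own; the type names `Map` and `List` in the usage lines are illustrative only and are not taken from the dataset.

```python
from dataclasses import dataclass
from typing import List, Union

Sexp = Union[str, List["Sexp"]]


@dataclass(frozen=True)
class TypeName:
    base: str
    type_args: List["TypeName"]


def mk_type_name(sexp: Sexp) -> TypeName:
    # A bare symbol is a type with no arguments.
    if isinstance(sexp, str):
        return TypeName(sexp, [])
    # Otherwise the head is the base type and the tail holds its type arguments.
    hd, *tl = sexp
    return TypeName(hd, [mk_type_name(e) for e in tl])


bare = mk_type_name("Long")        # as in `^Long`
wrapped = mk_type_name(["Long"])   # as in `^(Long)`
nested = mk_type_name(["Map", "String", ["List", "Number"]])
```

Note that `^Long` and `^(Long)` yield the same `TypeName`, so the two spellings are indistinguishable once parsed into a program.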


@ -7,6 +7,8 @@ LEFT_PAREN = "("
RIGHT_PAREN = ")"
ESCAPE = "\\"
DOUBLE_QUOTE = '"'
META = "^"
READER = "#"
# we unwrap Symbols into strings for convenience
Sexp = Union[str, List["Sexp"]] # type: ignore # Recursive type
@ -42,86 +44,127 @@ def _split_respecting_quotes(s: str) -> List[str]:
return result
def parse_sexp(sexp_string: str, clean_singletons=False) -> Sexp:
""" Parses an S-expression from a string """
sexp_string = sexp_string.strip()
# handle some special cases
if sexp_string == "":
return []
if sexp_string[-1] == ";":
sexp_string = sexp_string[:-1]
# find and group top-level parentheses (that are not inside quoted strings)
num_open_brackets = 0
open_bracket_idxs = []
close_bracket_idxs = []
result: List[Sexp] = []
state = QuoteState.Outside
for index, ch in enumerate(sexp_string):
if ch == DOUBLE_QUOTE and (index < 1 or sexp_string[index - 1] != ESCAPE):
state = state.flipped()
if ch == LEFT_PAREN and state == QuoteState.Outside:
num_open_brackets += 1
if num_open_brackets == 1:
open_bracket_idxs.append(index)
elif ch == RIGHT_PAREN and state == QuoteState.Outside:
num_open_brackets -= 1
if num_open_brackets == 0:
close_bracket_idxs.append(index)
def parse_sexp(s: str) -> Sexp:
offset = 0
assert len(open_bracket_idxs) == len(
close_bracket_idxs
), f"Mismatched parentheses: {sexp_string}"
assert state == QuoteState.Outside, f"Mismatched double quotes: {sexp_string}"
# eoi = end of input
def is_eoi():
nonlocal offset
return offset == len(s)
start = 0
for index, (open_bracket_idx, close_bracket_idx) in enumerate(
zip(open_bracket_idxs, close_bracket_idxs)
):
if start < open_bracket_idx:
preparen = sexp_string[start:open_bracket_idx].strip()
if preparen != "":
tokens = _split_respecting_quotes(preparen)
result.extend(tokens)
result.append(
parse_sexp(
sexp_string[open_bracket_idx + 1 : close_bracket_idx],
clean_singletons=clean_singletons,
)
)
start = close_bracket_idx + 1
def peek():
nonlocal offset
return s[offset]
if start < len(sexp_string):
# tokens after the last ')'
postparen = sexp_string[start:].strip()
if postparen != "":
tokens = _split_respecting_quotes(postparen)
result.extend(tokens)
def next_char():
# pylint: disable=used-before-assignment
nonlocal offset
cn = s[offset]
offset += 1
return cn
if len(result) == 2 and result[-1] == ";":
sexp_tmp = result[0]
else:
sexp_tmp = result
def skip_whitespace():
while (not is_eoi()) and peek().isspace():
next_char()
# special-case top-level values because they need an extra level
# of parens in the Sexp:
if isinstance(result, list) and len(result) >= 1 and result[0] == "#":
return [result]
if clean_singletons and len(sexp_tmp) == 1:
return sexp_tmp[0]
return sexp_tmp
def skip_then_peek():
skip_whitespace()
return peek()
def read() -> Sexp:
skip_whitespace()
c = next_char()
if c == LEFT_PAREN:
return read_list()
elif c == DOUBLE_QUOTE:
return read_string()
elif c == META:
meta = read()
expr = read()
return [META, meta, expr]
elif c == READER:
return [READER, read()]
else:
out_inner = ""
if c != "\\":
out_inner += c
# TODO: is there a better loop idiom here?
if not is_eoi():
next_c = peek()
escaped = c == "\\"
while (not is_eoi()) and (
escaped or not _is_beginning_control_char(next_c)
):
if (not escaped) and next_c == "\\":
next_char()
escaped = True
else:
out_inner += next_char()
escaped = False
if not is_eoi():
next_c = peek()
return out_inner
def read_list():
out_list = []
while skip_then_peek() != RIGHT_PAREN:
out_list.append(read())
next_char()
return out_list
def read_string():
out_str = ""
while peek() != '"':
c_string = next_char()
out_str += c_string
if c_string == "\\":
out_str += next_char()
next_char()
return f'"{out_str}"'
out = read()
skip_whitespace()
assert offset == len(
s
), f"Failed to exhaustively parse {s}, maybe you are missing a close paren?"
return out
def _is_beginning_control_char(nextC):
return (
nextC.isspace()
or nextC == LEFT_PAREN
or nextC == RIGHT_PAREN
or nextC == DOUBLE_QUOTE
or nextC == READER
or nextC == META
)
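
The rewritten `parse_sexp` is a recursive-descent reader in which `^` consumes the next two forms and `#` consumes the next one. The sketch below is a reduced version of that idea, not the function from this commit: it omits backslash escapes in bare symbols, and it raises `IndexError` on truncated input instead of reporting a friendly error. The name `parse` is hypothetical.

```python
from typing import List, Union

Sexp = Union[str, List["Sexp"]]
META, READER = "^", "#"
CONTROL = {"(", ")", '"', META, READER}


def parse(s: str) -> Sexp:
    """Parses one s-expression, desugaring `^T e` and `#e` prefixes."""
    pos = 0

    def skip_ws() -> None:
        nonlocal pos
        while pos < len(s) and s[pos].isspace():
            pos += 1

    def read() -> Sexp:
        nonlocal pos
        skip_ws()
        c = s[pos]
        pos += 1
        if c == "(":
            items: List[Sexp] = []
            skip_ws()
            while s[pos] != ")":
                items.append(read())
                skip_ws()
            pos += 1  # consume ")"
            return items
        if c == '"':
            start = pos - 1
            while s[pos] != '"':
                # Skip an escaped character together with its backslash.
                pos += 2 if s[pos] == "\\" else 1
            pos += 1  # consume the closing quote
            return s[start:pos]  # quotes are kept, as in the commit
        if c == META:
            type_expr = read()
            return [META, type_expr, read()]
        if c == READER:
            return [READER, read()]
        # Bare symbol: read until whitespace or a control character.
        start = pos - 1
        while pos < len(s) and not s[pos].isspace() and s[pos] not in CONTROL:
            pos += 1
        return s[start:pos]

    out = read()
    skip_ws()
    assert pos == len(s), f"trailing input in {s!r}"
    return out


ascription = parse("^Number (+ 1 2)")
value = parse('#(PersonName "janice kang")')
```

Treating `^` and `#` as control characters is what lets `^Long` and `#(` be written without separating whitespace.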
def sexp_to_str(sexp: Sexp) -> str:
""" Generates string representation from S-expression """
# Note that some of this logic is repeated in lispress.render_pretty
if isinstance(sexp, list):
return "(" + " ".join(sexp_to_str(f) for f in sexp) + ")"
if len(sexp) == 3 and sexp[0] == META:
(_meta, type_expr, underlying_expr) = sexp
return META + sexp_to_str(type_expr) + " " + sexp_to_str(underlying_expr)
elif len(sexp) == 2 and sexp[0] == READER:
(_reader, expr) = sexp
return READER + sexp_to_str(expr)
else:
return "(" + " ".join(sexp_to_str(f) for f in sexp) + ")"
else:
return sexp
if sexp.startswith('"') and sexp.endswith('"'):
return sexp
else:
return _escape_symbol(sexp)
def flatten(form: Sexp) -> List[str]:
return (
[form]
if not isinstance(form, list)
else [s for subexp in form for s in flatten(subexp)]
)
def _escape_symbol(symbol: str) -> str:
out = []
for c in symbol:
if _is_beginning_control_char(c):
out.append("\\")
out.append(c)
return "".join(out)
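
Rendering reverses the two kinds of sugar and escapes control characters in bare symbols so that the output can be parsed again. A self-contained sketch modelled on `sexp_to_str` and `_escape_symbol` above; the names `render`, `escape_symbol` and `is_control_char` are local to this example.

```python
from typing import List, Union

Sexp = Union[str, List["Sexp"]]
META, READER = "^", "#"


def is_control_char(c: str) -> bool:
    return c.isspace() or c in {"(", ")", '"', READER, META}


def escape_symbol(symbol: str) -> str:
    # Prefix every control character with a backslash.
    return "".join("\\" + c if is_control_char(c) else c for c in symbol)


def render(sexp: Sexp) -> str:
    if isinstance(sexp, list):
        if len(sexp) == 3 and sexp[0] == META:
            # [^, T, e] is written back as `^T e`.
            return META + render(sexp[1]) + " " + render(sexp[2])
        if len(sexp) == 2 and sexp[0] == READER:
            # [#, e] is written back as `#e`.
            return READER + render(sexp[1])
        return "(" + " ".join(render(e) for e in sexp) + ")"
    # Quoted strings are emitted verbatim; bare symbols are escaped.
    if sexp.startswith('"') and sexp.endswith('"'):
        return sexp
    return escape_symbol(sexp)


typed = render(["^", "Number", ["+", "1", "2"]])
value = render(["#", ["PersonName", '"janice kang"']])
spaced = render(["a b"])
```

The last call shows why escaping matters: without the backslash, the single symbol `a b` would be read back as two symbols.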


@ -1,9 +1,8 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from dataclasses import dataclass
from typing import Dict, List, Optional
from pydantic.dataclasses import dataclass
@dataclass(frozen=True)
class Slot:


@ -8,10 +8,9 @@ TRADE predictions (trade) or dataflow execution results (dataflow).
"""
import argparse
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Union
from pydantic.dataclasses import dataclass
from dataflow.core.io import load_jsonl_file_and_build_lookup, save_jsonl_file
from dataflow.core.prediction_report import (
PredictionReportDatum,


@ -299,7 +299,7 @@ def generate_express_for_topic(
pointer_count_for_slot[slot_name] = pointer_count
expression, pointer_count = mk_constraint(
tpe=topic, args=pointer_count_for_slot.items(), idx=pointer_count
tpe=topic, args=list(pointer_count_for_slot.items()), idx=pointer_count
)
expressions.append(expression)


@ -7,11 +7,10 @@ Evaluates belief state tracking predictions.
"""
import argparse
from dataclasses import field
from dataclasses import dataclass, field
from typing import Dict, cast
import jsons
from pydantic.dataclasses import dataclass
from dataflow.core.io import load_jsonl_file_and_build_lookup
from dataflow.multiwoz.create_belief_state_prediction_report import (


@ -8,11 +8,11 @@ Executes programs to produce TRADE belief states.
import argparse
import copy
import json
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple
import jsons
import numpy as np
from pydantic.dataclasses import dataclass
from dataflow.core.dialogue import Dialogue, Turn
from dataflow.core.io import load_jsonl_file, load_jsonl_file_and_build_lookup


@ -1,10 +1,9 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple
from pydantic.dataclasses import dataclass
@dataclass(frozen=True)
class PartialExecutionResult:


@ -7,11 +7,11 @@ Creates the prediction report from onmt_translate output.
"""
import argparse
import dataclasses
from dataclasses import dataclass
from typing import Dict, Iterator, List, Union
import jsons
from more_itertools import chunked
from pydantic.dataclasses import dataclass
from dataflow.core.dialogue import (
AgentUtterance,


@ -7,10 +7,10 @@ Creates text data (source-target pairs) to be used for training OpenNMT models."
import argparse
import dataclasses
import re
from dataclasses import dataclass
from typing import Dict, Iterator, List, TextIO
import jsons
from pydantic.dataclasses import dataclass
from tqdm import tqdm
from dataflow.core.constants import SpecialStrings


@ -10,11 +10,11 @@ Computes both turn-level and dialogue-level accuracy.
import argparse
import csv
from dataclasses import dataclass
from typing import List, Optional, Tuple
import jsons
import pandas as pd
from pydantic.dataclasses import dataclass
from dataflow.core.dialogue import TurnId
from dataflow.core.io import load_jsonl_file


@ -1,6 +1,7 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import json
from typing import List
from dataflow.core.linearize import (
program_to_seq,
@ -12,7 +13,7 @@ from dataflow.core.linearize import (
)
from dataflow.core.lispress import unnest_line
from dataflow.core.program import Expression, Program, ValueOp
from dataflow.core.sexp import flatten
from dataflow.core.sexp import Sexp
def test_plan_to_seq_strict():
@ -50,6 +51,13 @@ def test_plan_to_seq_strict():
") )"
)
def flatten(form: Sexp) -> List[str]:
return (
[form]
if not isinstance(form, list)
else [s for subexp in form for s in flatten(subexp)]
)
linearized_plan_tokens = program_to_seq(program=original_program)
assert all(" " not in tok for tok in flatten(linearized_plan_tokens))
assert " ".join(linearized_plan_tokens) == expected_linearized_plan


@ -1,9 +1,9 @@
from dataflow.core.lispress import (
_try_round_trip,
lispress_to_program,
parse_lispress,
program_to_lispress,
render_pretty,
try_round_trip,
)
from dataflow.core.program import Program
from dataflow.core.program_utils import mk_value_op
@ -18,7 +18,7 @@ surface_strings = [
(dateAtTimeWithDefaults
(nextDOW #(DayOfWeek "THURSDAY"))
(numberPM #(Int 5))))
:subject (?= #(String "makeup artist"))))))""",
:subject (?= "makeup artist")))))""",
# Contains a sugared `get` (`(:start ...)`)
"""
(Yield
@ -37,8 +37,7 @@ surface_strings = [
:output (createCommitEventWrapper
(createPreflightEventWrapper
(eventAllDayOnDate
(Constraint[Event]
:subject (?= #(String "sales conference")))
(Constraint[Event] :subject (?= "sales conference"))
(nextDayOfMonth (today) #(Int 29)))))))""",
"""
(Yield
@ -66,7 +65,7 @@ surface_strings = [
(:results
(findEventWrapperWithDefaults
(eventOnDateAfterTime
(Constraint[Event] :subject (?~= #(String "lunch")))
(Constraint[Event] :subject (?~= "lunch"))
(:date x1)
(:time x1)))))))))))""",
# Includes a `get` that should not be desugared,
@ -84,12 +83,14 @@ surface_strings = [
:start (?=
(adjustByPeriodDuration
(:end (get x0 #(Path "item two")))
(PeriodDuration :duration (toHours #(Number 4)))))
:subject (?= #(String "dinner at foo"))))))))""",
(PeriodDuration :duration (toHours 4))))
:subject (?= "dinner at foo")))))))""",
# tests that whitespace is preserved inside a quoted string,
# as opposed to tokenized and then joined with a single space.
'#(String "multi\\tword quoted\\nstring")',
'#(String "i got quotes\\"")',
'"multi\\tword quoted\\nstring"',
'"i got quotes\\""',
'#(PersonName "multi\\tword quoted\\nstring")',
'#(PersonName "i got quotes\\"")',
# tests that empty plans are handled correctly
"()",
# regression test that no whitespace is inserted between "#" and "(".
@ -107,7 +108,27 @@ surface_strings = [
(RecipientWithNameLike
:constraint (Constraint[Recipient])
:name #(PersonName "Tom"))))))))
:number #(Number 1))))""",
:number 1)))""",
# META_CHAR expression
"""
(Yield
(^(Long) >
^Long
(size
(QueryEventResponse.results
(FindEventWrapperWithDefaults
(EventDuringRange
(^(Event) EmptyStructConstraint)
(ThisWeekend)))))
0L))""",
# Long VALUE_CHAR expression
"""
(Yield
(==
#(PersonName
"veeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeerylong")
#(PersonName "short")))
""",
]
@ -143,16 +164,64 @@ def test_program_to_lispress_with_quotes_inside_string():
v, _ = mk_value_op(value='i got quotes"', schema="String", idx=0)
program = Program(expressions=[v])
rendered_lispress = render_pretty(program_to_lispress(program))
assert rendered_lispress == '#(String "i got quotes\\"")'
assert rendered_lispress == '"i got quotes\\""'
sexp = parse_lispress(rendered_lispress)
round_tripped, _ = lispress_to_program(sexp, 0)
assert round_tripped == program
def test_bare_values():
assert try_round_trip("0") == "#(Number 0.0)"
assert try_round_trip("#(Number 0)") == "#(Number 0.0)"
assert _try_round_trip("0L") == "0L"
assert _try_round_trip("0") == "0.0"
assert _try_round_trip("0.0") == "0.0"
assert _try_round_trip("#(Number 0)") == "0.0"
assert _try_round_trip("#(Number 0.0)") == "0.0"
def test_typenames():
roundtrip = _try_round_trip("^Number (^(String) foo (bar) ^Bar (bar))")
assert roundtrip == "^Number (^(String) foo (bar) ^Bar (bar))"
def test_typename_with_args():
roundtrip = _try_round_trip("^(Number Foo) (^(String) foo (bar) ^Bar (bar))")
assert roundtrip == "^(Number Foo) (^(String) foo (bar) ^Bar (bar))"
def test_sorts_named_args():
# TODO: scary: named
roundtrip = _try_round_trip("(Foo :foo 1.0 :bar 3.0)")
assert roundtrip == "(Foo :bar 3.0 :foo 1.0)"
def test_mixed_named_and_positional_args():
# TODO: scary: named
roundtrip = _try_round_trip("(Foo 1.0 2.0 :bar 3)")
assert roundtrip == "(Foo 1.0 2.0 :bar 3.0)"
def test_number_float():
lispress = "(Yield (> (a) 0.0))"
assert _try_round_trip(lispress) == lispress
assert _try_round_trip("(Yield (> (a) 0))") == lispress
assert _try_round_trip("(toHours 4)") == "(toHours 4.0)"
def test_bool():
assert _try_round_trip("(toHours true)") == "(toHours true)"
def test_string():
assert _try_round_trip('(+ (a) #(String "b"))') == '(+ (a) "b")'
assert _try_round_trip('(+ (a) #(PersonName "b"))') == '(+ (a) #(PersonName "b"))'
def test_escaped_name():
string = "(a\\ b)"
assert parse_lispress(string) == ["a b"]
assert _try_round_trip(string) == string
def test_strip_copy_strings():
assert try_round_trip('#(String " Tom ")') == '#(String "Tom")'
assert _try_round_trip('#(String " Tom ")') == '"Tom"'
assert _try_round_trip('" Tom "') == '"Tom"'


@ -119,15 +119,15 @@ def test_create_programs_with_revise(trade_dialogue_1: Dict[str, Any]):
salience_model = VanillaSalienceModel()
expected_plans: List[str] = [
# turn 1
"""(find (Constraint[Hotel] :name (?= #(String "none")) :type (?= #(String "none"))))""",
"""(find (Constraint[Hotel] :name (?= "none") :type (?= "none")))""",
# turn 2
"""(ReviseConstraint :new (Constraint[Hotel] :name (?= #(String "hilton")) :pricerange (?= #(String "cheap")) :type (?= #(String "guest house"))) :oldLocation (Constraint[Constraint[Hotel]]) :rootLocation (roleConstraint #(Path "output")))""",
"""(ReviseConstraint :new (Constraint[Hotel] :name (?= "hilton") :pricerange (?= "cheap") :type (?= "guest house")) :oldLocation (Constraint[Constraint[Hotel]]) :rootLocation (roleConstraint #(Path "output")))""",
# turn 3
"""(ReviseConstraint :new (Constraint[Hotel] :name (?= #(String "none"))) :oldLocation (Constraint[Constraint[Hotel]]) :rootLocation (roleConstraint #(Path "output")))""",
"""(ReviseConstraint :new (Constraint[Hotel] :name (?= "none")) :oldLocation (Constraint[Constraint[Hotel]]) :rootLocation (roleConstraint #(Path "output")))""",
# turn 4
"""(abandon (Constraint[Hotel]))""",
# turn 5
"""(find (Constraint[Hotel] :area (?= #(String "west"))))""",
"""(find (Constraint[Hotel] :area (?= "west")))""",
# turn 6
"""(find (Constraint[Restaurant] :area (refer (Constraint[Area]))))""",
# turn 7
@ -135,7 +135,7 @@ def test_create_programs_with_revise(trade_dialogue_1: Dict[str, Any]):
# turn 8
"()",
# turn 9
"""(find (Constraint[Taxi] :departure (?= #(String "none"))))""",
"""(find (Constraint[Taxi] :departure (?= "none")))""",
# turn 10
"()",
]
@ -160,23 +160,23 @@ def test_create_programs_with_revise_with_fill_none(trade_dialogue_1: Dict[str,
expected_plans: List[str] = [
# turn 1
"""(find (Constraint[Hotel] :area (?= #(String "none")) :book-day (?= #(String "none")) :book-people (?= #(String "none")) :book-stay (?= #(String "none")) :internet (?= #(String "none")) :name (?= #(String "none")) :parking (?= #(String "none")) :pricerange (?= #(String "none")) :stars (?= #(String "none")) :type (?= #(String "none"))))""",
"""(find (Constraint[Hotel] :area (?= "none") :book-day (?= "none") :book-people (?= "none") :book-stay (?= "none") :internet (?= "none") :name (?= "none") :parking (?= "none") :pricerange (?= "none") :stars (?= "none") :type (?= "none")))""",
# turn 2
"""(ReviseConstraint :new (Constraint[Hotel] :name (?= #(String "hilton")) :pricerange (?= #(String "cheap")) :type (?= #(String "guest house"))) :oldLocation (Constraint[Constraint[Hotel]]) :rootLocation (roleConstraint #(Path "output")))""",
"""(ReviseConstraint :new (Constraint[Hotel] :name (?= "hilton") :pricerange (?= "cheap") :type (?= "guest house")) :oldLocation (Constraint[Constraint[Hotel]]) :rootLocation (roleConstraint #(Path "output")))""",
# turn 3
"""(ReviseConstraint :new (Constraint[Hotel] :name (?= #(String "none"))) :oldLocation (Constraint[Constraint[Hotel]]) :rootLocation (roleConstraint #(Path "output")))""",
"""(ReviseConstraint :new (Constraint[Hotel] :name (?= "none")) :oldLocation (Constraint[Constraint[Hotel]]) :rootLocation (roleConstraint #(Path "output")))""",
# turn 4
"""(abandon (Constraint[Hotel]))""",
# turn 5
"""(find (Constraint[Hotel] :area (?= #(String "west")) :book-day (?= #(String "none")) :book-people (?= #(String "none")) :book-stay (?= #(String "none")) :internet (?= #(String "none")) :name (?= #(String "none")) :parking (?= #(String "none")) :pricerange (?= #(String "none")) :stars (?= #(String "none")) :type (?= #(String "none"))))""",
"""(find (Constraint[Hotel] :area (?= "west") :book-day (?= "none") :book-people (?= "none") :book-stay (?= "none") :internet (?= "none") :name (?= "none") :parking (?= "none") :pricerange (?= "none") :stars (?= "none") :type (?= "none")))""",
# turn 6
"""(find (Constraint[Restaurant] :area (refer (Constraint[Area])) :book-day (?= #(String "none")) :book-people (?= #(String "none")) :book-time (?= #(String "none")) :food (?= #(String "none")) :name (?= #(String "none")) :pricerange (?= #(String "none"))))""",
"""(find (Constraint[Restaurant] :area (refer (Constraint[Area])) :book-day (?= "none") :book-people (?= "none") :book-time (?= "none") :food (?= "none") :name (?= "none") :pricerange (?= "none")))""",
# turn 7
"""(ReviseConstraint :new (Constraint[Restaurant] :pricerange (refer (Constraint[Pricerange]))) :oldLocation (Constraint[Constraint[Restaurant]]) :rootLocation (roleConstraint #(Path "output")))""",
# turn 8
"()",
# turn 9
"""(find (Constraint[Taxi] :arriveby (?= #(String "none")) :departure (?= #(String "none")) :destination (?= #(String "none")) :leaveat (?= #(String "none"))))""",
"""(find (Constraint[Taxi] :arriveby (?= "none") :departure (?= "none") :destination (?= "none") :leaveat (?= "none")))""",
# turn 10
"()",
]
@ -204,15 +204,15 @@ def test_create_programs_with_revise_with_avoid_empty_plan(
salience_model = VanillaSalienceModel()
expected_plans: List[str] = [
# turn 1
"""(find (Constraint[Hotel] :name (?= #(String "none")) :type (?= #(String "none"))))""",
"""(find (Constraint[Hotel] :name (?= "none") :type (?= "none")))""",
# turn 2
"""(ReviseConstraint :new (Constraint[Hotel] :name (?= #(String "hilton")) :pricerange (?= #(String "cheap")) :type (?= #(String "guest house"))) :oldLocation (Constraint[Constraint[Hotel]]) :rootLocation (roleConstraint #(Path "output")))""",
"""(ReviseConstraint :new (Constraint[Hotel] :name (?= "hilton") :pricerange (?= "cheap") :type (?= "guest house")) :oldLocation (Constraint[Constraint[Hotel]]) :rootLocation (roleConstraint #(Path "output")))""",
# turn 3
"""(ReviseConstraint :new (Constraint[Hotel] :name (?= #(String "none"))) :oldLocation (Constraint[Constraint[Hotel]]) :rootLocation (roleConstraint #(Path "output")))""",
"""(ReviseConstraint :new (Constraint[Hotel] :name (?= "none")) :oldLocation (Constraint[Constraint[Hotel]]) :rootLocation (roleConstraint #(Path "output")))""",
# turn 4
"""(abandon (Constraint[Hotel]))""",
# turn 5
"""(find (Constraint[Hotel] :area (?= #(String "west"))))""",
"""(find (Constraint[Hotel] :area (?= "west")))""",
# turn 6
"""(find (Constraint[Restaurant] :area (refer (Constraint[Area]))))""",
# turn 7
@ -220,9 +220,9 @@ def test_create_programs_with_revise_with_avoid_empty_plan(
# turn 8
"""(ReviseConstraint :new (Constraint[Restaurant] :pricerange (refer (Constraint[Pricerange]))) :oldLocation (Constraint[Constraint[Restaurant]]) :rootLocation (roleConstraint #(Path "output")))""",
# turn 9
"""(find (Constraint[Taxi] :departure (?= #(String "none"))))""",
"""(find (Constraint[Taxi] :departure (?= "none")))""",
# turn 10
"""(ReviseConstraint :new (Constraint[Taxi] :departure (?= #(String "none"))) :oldLocation (Constraint[Constraint[Taxi]]) :rootLocation (roleConstraint #(Path "output")))""",
"""(ReviseConstraint :new (Constraint[Taxi] :departure (?= "none")) :oldLocation (Constraint[Constraint[Taxi]]) :rootLocation (roleConstraint #(Path "output")))""",
]
dataflow_dialogue, _, _ = create_programs_for_trade_dialogue(
trade_dialogue=trade_dialogue_1,