Correctly calculate schema size for json columns: 1

This uses the fact that we tag `object` metrics with `format=json` in
the schema, which is ... sort of correct.
Technically `format` only applies to `type: string` fields in
jsonschema, but nothing blows up if you just set it.
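
To illustrate the idea, here is a minimal standalone sketch, not the code from this repository. The function name `schema_size` is hypothetical, and the handling of `object` types is a simplifying assumption; only the `format == "json"` check mirrors this change.

```python
def schema_size(schema, key=()):
    """Count the columns a (simplified) JSON schema would produce."""
    # A list of schemas contributes the sum of its members.
    if isinstance(schema, list):
        return sum(schema_size(s, key) for s in schema)

    if "type" not in schema:
        # An untyped node tagged with format=json is a single JSON column.
        if schema.get("format") == "json":
            return 1
        raise ValueError(
            "Missing type for schema element at key " + "/".join(key)
        )

    # Assumption: objects with properties are flattened into one column
    # per leaf; everything else counts as a single column.
    if schema["type"] == "object" and "properties" in schema:
        return sum(
            schema_size(sub, key + (name,))
            for name, sub in schema["properties"].items()
        )
    return 1


json_metric = {"format": "json"}
nested = {
    "type": "object",
    "properties": {"a": {"type": "string"}, "b": {"format": "json"}},
}
print(schema_size(json_metric))  # 1
print(schema_size(nested))       # 2
```

An untyped node without the `format` tag still raises, so the tag is the only thing that distinguishes a JSON column from a malformed schema element.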

We need _some_ signal for json columns.
The jsonschema-transpiler actually relies on schema configuration to
know when to output a JSON column.

Yes, that's now two different ways of doing it.
Maybe this SHOULD use what's configured in `mozPipelineMetadata`
instead?
Jan-Erik Rediger 2023-09-29 14:28:45 +02:00
Parent 8f0367e034
Commit baa64fa599
2 changed files with 11 additions and 0 deletions


@@ -116,6 +116,10 @@ class Schema(object):
             return sum(Schema._get_schema_size(s) for s in schema)
         if "type" not in schema:
+            # A JSON column is just that: one column
+            if schema.get("format") == "json":
+                return 1
             raise Exception("Missing type for schema element at key " + "/".join(key))
         if isinstance(schema["type"], list):


@@ -87,3 +87,10 @@ class TestSchema(object):
         )
         print_and_test(schema.schema, res_schema.schema)
+    def test_schema_with_json(self):
+        json_obj = {"format": "json"}
+        assert Schema._get_schema_size(json_obj) == 1
+        defined_obj = {"type": "object", "properties": {"str": {"type": "json"}}}
+        assert Schema._get_schema_size(defined_obj) == 1