cognitive-services/RssGenerator
Dan Grecoe 403a501c11
Remove name from header
2019-01-24 07:38:20 -05:00
CosmosDBHelper Move to Azure repo 2019-01-24 07:16:27 -05:00
Properties Move to Azure repo 2019-01-24 07:16:27 -05:00
RSS Move to Azure repo 2019-01-24 07:16:27 -05:00
RecordFormats Move to Azure repo 2019-01-24 07:16:27 -05:00
StorageHelper Move to Azure repo 2019-01-24 07:16:27 -05:00
App.config Move to Azure repo 2019-01-24 07:16:27 -05:00
Configuration.cs Move to Azure repo 2019-01-24 07:16:27 -05:00
Configuration.json Move to Azure repo 2019-01-24 07:16:27 -05:00
Program.cs Move to Azure repo 2019-01-24 07:16:27 -05:00
Program_Query.cs Move to Azure repo 2019-01-24 07:16:27 -05:00
Program_Seed.cs Move to Azure repo 2019-01-24 07:16:27 -05:00
Program_UploadRss.cs Move to Azure repo 2019-01-24 07:16:27 -05:00
README.md Remove name from header 2019-01-24 07:38:20 -05:00
RssGenerator.csproj Move to Azure repo 2019-01-24 07:16:27 -05:00
RssGenerator.csproj.user Move to Azure repo 2019-01-24 07:16:27 -05:00
RssGenerator.sln Move to Azure repo 2019-01-24 07:16:27 -05:00
packages.config Move to Azure repo 2019-01-24 07:16:27 -05:00

README.md

RssGenerator

Use Case: Mass Ingestion of Electronic Documents

This directory contains the source code used to build the generator application that feeds the pipeline for this demo.

Data Formats

Ingestion format

This format contains the original content of the article. Each article is broken down into an article entry (which contains the text) and image entries in the Ingest collection in Cosmos DB.

{
	"id" : "GUID",
	"asset_hash" : "hash of the item",
	"artifact_type" : "article|image",
	"properties" :
		{
			Dependent on artifact_type
		}
}

Property Bag Properties

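| Property | Type | Required | Article | Image |
|---|---|---|---|---|
| original_uri | String | Y | X | X |
| retrieval_datetime | DateTime | Y | X | X |
| post_date | DateTime | N | X | |
| body | String | N | X | |
| title | String | N | X | |
| author | String | N | X | |
| hero_image | String | N | X | |
| child_images | Array(object) | N | X | |
| internal_uri | String | N | X | X |

Media Object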

The media object is used for entries in child_images. The field mediaId is the Document ID of the media document in the Articles table.

{
    "mediaId": "9d30724f5b8043e49552f4b8eb02f010",
    "origUri": "https://dummy/thirdgrade.jpg",
    "internalUri": "https://dangtestrepo.blob.core.windows.net/scraped/thirdgrade.jpg"
}
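
As a rough illustration of how the pieces above fit together, the Python sketch below assembles an article entry with one child image. The helper names (`make_media_object`, `make_article_record`), the use of SHA-256 over the body for `asset_hash`, the dash-free GUID, and the ISO 8601 timestamp are all assumptions made for this example. The generator in this directory is written in C# and may compute these values differently.

```python
import hashlib
import uuid
from datetime import datetime, timezone


def make_media_object(media_id, orig_uri, internal_uri):
    """Build a child_images entry using the field names from the Media Object example."""
    return {"mediaId": media_id, "origUri": orig_uri, "internalUri": internal_uri}


def make_article_record(original_uri, body, title=None, child_images=None):
    """Assemble an ingestion record for an article (hypothetical helper)."""
    properties = {
        # Required for both articles and images.
        "original_uri": original_uri,
        "retrieval_datetime": datetime.now(timezone.utc).isoformat(),
        # Optional, article-only property.
        "body": body,
    }
    if title is not None:
        properties["title"] = title
    if child_images:
        properties["child_images"] = list(child_images)
    return {
        # Assumption: a GUID without dashes, matching the style of mediaId above.
        "id": uuid.uuid4().hex,
        # Assumption: SHA-256 of the body; the README does not name the hash.
        "asset_hash": hashlib.sha256(body.encode("utf-8")).hexdigest(),
        "artifact_type": "article",
        "properties": properties,
    }
```

Optional properties that are not supplied are left out of the property bag rather than written as nulls; whether the real pipeline does the same is not stated in this README.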

Processed Format

This format contains the results of analyzing one part of the ingested article. There is one record for the main article and one for each image. These records are kept in the Processed collection in Cosmos DB.

{
	"id" : "GUID",
	"artifact_type" : "article|image",
	"parent" : "parent id",
	"properties" : {
			.... dependent on artifact type ......
	},
	"tags" : ["interesting", "need alerting", "dealer's choice!"]
}
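
Because each image is processed into its own record, a consumer usually has to put an article and its images back together. The Python sketch below shows one way to do that, assuming that an image record's `parent` holds the `id` of its article record; the README does not spell this out, and the function name `group_by_article` is hypothetical.

```python
def group_by_article(records):
    """Group processed records into {article_id: {"article": ..., "images": [...]}}.

    Assumption: an image record's "parent" is the "id" of its article record.
    Image records whose parent is missing or unknown are returned separately.
    """
    grouped = {
        r["id"]: {"article": r, "images": []}
        for r in records
        if r["artifact_type"] == "article"
    }
    orphans = []
    for r in records:
        if r["artifact_type"] != "image":
            continue
        entry = grouped.get(r.get("parent"))
        if entry is None:
            orphans.append(r)
        else:
            entry["images"].append(r)
    return grouped, orphans
```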

Property Bag Properties

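| Property | Type | Required | Article | Image |
|---|---|---|---|---|
| processed_datetime | DateTime | Y | X | X |
| processed_time* | Int | Y | X | X |
| title** | object | N | X | |
| body** | object | N | X | |
| vision*** | object | N | | X |
| face**** | object | N | | X |
| tags | Array(string) | N | X | X |

* Total processing time (ms)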

** Text Field Analytics objects

*** Vision Analytics object

**** Face Analytics object
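
The property rules above can be checked mechanically. The following Python sketch encodes them as a lookup table and reports mismatches. It assumes that `title` and `body` apply only to article records and that `vision` and `face` apply only to image records; the names `PROCESSED_PROPERTIES` and `check_processed_properties` are made up for this example and are not part of the project.

```python
# name: (required, artifact types the property applies to)
PROCESSED_PROPERTIES = {
    "processed_datetime": (True, {"article", "image"}),
    "processed_time": (True, {"article", "image"}),  # total processing time in ms
    "title": (False, {"article"}),
    "body": (False, {"article"}),
    "vision": (False, {"image"}),
    "face": (False, {"image"}),
    "tags": (False, {"article", "image"}),
}


def check_processed_properties(artifact_type, properties):
    """Return a list of problems; an empty list means the bag matches the rules."""
    problems = []
    for name, (required, types) in PROCESSED_PROPERTIES.items():
        if required and artifact_type in types and name not in properties:
            problems.append(f"missing required property: {name}")
    for name in properties:
        spec = PROCESSED_PROPERTIES.get(name)
        if spec is None:
            problems.append(f"unknown property: {name}")
        elif artifact_type not in spec[1]:
            problems.append(f"{name} is not used for {artifact_type} records")
    return problems
```

Returning a list of problems instead of raising on the first one makes it easier to log every issue with a record in a single pass.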

Text Field Analytics Object
"body|title": {
    "type": "Body|Title",
    "orig_lang_code": "language detected",
    "lang_code": "requested language",
    "value": "Translated text content",
    "key_phrases": [
        "Array of strings, key phrases found"
    ],
    "sentiment": 0.5,
    "entities": [
        {
            "OriginalText": "(array of items found) British premier",
            "Name": "Prime Minister of the United Kingdom",
            "BingId": "2570ebea-8c42-048a-3350-57c9e4169167",
            "WikipediaUrl": "https://en.wikipedia.org/wiki/Prime_Minister_of_the_United...."
        }
		....
    ]
}
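
To show how a consumer might read this object, here is a small Python sketch that pulls out the fields most likely to be displayed or indexed. The function name `summarize_text_analytics` is hypothetical, and treating a difference between `orig_lang_code` and `lang_code` as "the text was translated" is an assumption based on the field descriptions above.

```python
def summarize_text_analytics(field):
    """Condense a body/title analytics object into a flat summary dict."""
    return {
        "type": field.get("type"),
        # Assumption: differing language codes mean "value" holds a translation.
        "translated": field.get("orig_lang_code") != field.get("lang_code"),
        "sentiment": field.get("sentiment"),
        "key_phrases": list(field.get("key_phrases", [])),
        # Keep only the resolved entity names, skipping entries without one.
        "entity_names": [e["Name"] for e in field.get("entities", []) if "Name" in e],
    }
```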
Vision Analytics Object
"vision": {
     "object_categories": ["array of strings of object categories found"],
     "objects": ["array of strings of objects"],
     "text": ["array of strings of text found in images"]
 }
Face Analytics Object

The face object contains a list of people, each with a gender and an age.

"face": {
    "people": [
        {
            "gender": "gender of person found",
            "age": "age of person found"
        }
    ]
}