Kevin Lewis
b01af008cd
Merge pull request #17 from kelewis/master
...
Minor changes and bugfixes
2018-05-27 14:49:33 -07:00
kelewis
499911212b
Removing some unnecessary column data related to name.
2018-05-27 14:44:12 -07:00
Kevin Lewis
2f2a98f260
Fixing bug in CommitFile parsing. The incoming changes were not being properly deduped, creating really large rows of nonsense and killing the job.
2018-05-27 14:44:12 -07:00
Kevin Lewis
17e3deac8d
Improving untarring of the mongo backup file
...
Moving back to compressing the intermediate json output but now making the output files in chunks. This improves reading of the files from blob into datalake.
2018-05-27 14:44:12 -07:00
Kevin Lewis
f2e73e1307
Going back to gzip, but changing the processing to produce files no bigger than 1gig.
...
Also trying a fix where CommitFile processing was producing intermediate results that had row sizes larger than 4MB.
2018-05-27 14:44:12 -07:00
Kevin Lewis
4f959977aa
A new restriction in Azure Data Lake prevents reading very large gzip files. Changing the processing to no longer gzip.
2018-05-27 14:44:12 -07:00
Kevin Lewis
35c906116b
Adding "silent" flag to extractor to allow skipping rows that cannot be parsed.
2018-05-27 14:44:12 -07:00
Kevin Lewis
d2de524f90
Merge pull request #15 from kelewis/master
...
Minor updates and bugfixes
2016-12-09 10:35:24 -08:00
Kevin Lewis
8ac145c968
Updating packages
2016-12-09 10:31:52 -08:00
Kevin Lewis
474378a29d
Adding new GetGuid utility function
2016-12-09 10:29:24 -08:00
Kevin Lewis
b4e4f505c6
Bug fixes related to the Watch table. Resolves an issue where watchers were getting reduced to the wrong primary key.
2016-12-09 10:28:38 -08:00
Kevin Lewis
b19bc04b1d
Adjusted script to match the new account used for sharing.
2016-12-09 10:27:13 -08:00
Kevin Lewis
55353fe1b9
Merge pull request #14 from riv/master
...
Consolidated Activities + PullRequestCommit
2016-08-16 14:42:11 -07:00
t-shche
f52799c41c
PR Followup Fix
...
-Fixed Comment Issue
-Changed number of PullRequestCommit buckets to 20
2016-08-16 14:37:23 -07:00
t-shche
4bbd9abd55
Renamed activity name from RunDaily1 to Rundaily
2016-08-16 14:15:12 -07:00
t-shche
a756aeb238
-Deleted unused .usql.cs file
2016-08-16 14:13:33 -07:00
t-shche
27d85bb933
Merge branch 'master' of https://github.com/riv/ghinsights.git
2016-08-16 14:09:21 -07:00
t-shche
019d4b230c
-Compressed previously running daily activities to one activity: RunDaily
...
-Changed Pipeline to account for changes
-Added PullRequestCommit table for previously non-existing relationship
2016-08-16 14:08:52 -07:00
Kevin Lewis
5120996a6c
Merge pull request #13 from riv/master
...
Changed GetString back to GetInteger for EventId
2016-08-02 16:18:10 -07:00
Shuo Cheng
e24e18daa5
Fixed the commit
2016-08-02 16:16:39 -07:00
t-shche
2c31e3eb8a
Changed GetString back to GetInteger for EventId
2016-08-02 11:40:46 -07:00
Kevin Lewis
23ab97538f
Merge pull request #12 from kelewis/master
...
Fixed bug where daily processing was not truncating the Event table, …
2016-07-20 13:13:34 -07:00
Kevin Lewis
432a8b2427
Fixed bug where daily processing was not truncating the Event table, resulting in the entire set being duplicated each day. Thanks Shuo for finding this!
...
Added Url to Commit table as it provides context to which repo the Commits were found.
2016-07-20 11:09:09 -07:00
Kevin Lewis
dc776470e8
Merge pull request #11 from riv/master
...
Merging GHInsightsms incremental
2016-07-19 13:53:42 -07:00
t-shche
932208caa4
Fix Merged (Again)
...
-OutputStreamPath changed to C:\Data
2016-07-19 13:49:17 -07:00
t-shche
e562e85f43
Fix Merge
...
-Changed to 'ghinsightspublic' in StageData
-Removed local path in USQL.usqlproj
-Changed encodings to utf-8 for files that were showing up as binary in GitHub
2016-07-19 13:44:04 -07:00
t-shche
27fa39ff16
Pull Request Fix + Pipeline Addition
...
-Fixed Typo and other small errors
-Removed references to extraneous scripts
-Removed usql.cs file
-Added two MaxAnalytics scripts as Daily Activities (DailyRepoActivity and RepoAttributes)
2016-07-19 13:26:05 -07:00
t-shche
c18731e9f2
GHInsights Incremental Pipeline Completed
...
-Completed Incremental changes for ghinsightsms
-For the EventId field, changed the GetInteger method to GetString
2016-07-08 17:36:35 -07:00
t-shche
ba10634ad6
Complete Merge with GHInsights Incremental
2016-06-28 11:11:53 -07:00
t-shche
e2ec6107f3
Commit To Merge Incremental
2016-06-28 11:08:07 -07:00
t-shche
4e476f60a0
Updated Job Retry Interval to 3 hours from 1
2016-06-28 11:00:25 -07:00
Kevin Lewis
ad5ec3942f
Merge pull request #10 from kelewis/master
...
Adding ADF & USQL changes to support daily updates.
2016-06-28 09:44:37 -07:00
Kevin Lewis
ad130ee3ed
Changes to table partitioning to support faster daily processing.
...
A few other updates related to missing columns or schema inconsistencies.
2016-06-27 11:07:23 -07:00
Kevin Lewis
ddeff92ba8
Adding USQL activities to main public pipeline.
...
Changed StageData.usql to be a callable stored procedure with data slice parameters to be passed in by Data Factory.
New ProcessDaily.usql procedure that maintains all the final tables with daily data updates. Called by data factory.
2016-06-27 10:47:33 -07:00
Kevin Lewis
61a7505038
Adding a .gitignore to not include environment specific configurations.
...
Removing earlier entry from the root .gitignore.
2016-06-27 10:36:09 -07:00
Kevin Lewis
c0d420526e
Merge of pipeline additions to DataFactory project.
2016-06-27 10:35:32 -07:00
Kevin Lewis
de5a4b109c
Merge pull request #9 from riv/master
...
ADDITION:
2016-06-24 13:16:31 -07:00
t-shche
b485618d77
ADDITION:
...
-Add to commit: DataFactory.dfproj
2016-06-24 11:31:57 -07:00
Kevin Lewis
8c275a4a1f
Merge pull request #8 from riv/master
...
[NEW FEATURE] MSGHT Non-Incremental Pipeline Setup
2016-06-24 11:20:46 -07:00
t-shche
fd60427441
DELETION:
...
-Deleted files related to EventKeys (script generating all possible paths)
2016-06-24 11:13:09 -07:00
t-shche
37833287e4
DELETION:
...
-Deleted all files related to ProcessMsEvent
2016-06-24 11:06:42 -07:00
t-shche
1af0c9f34b
[FEATURE] MSGHT Non-Incremental Pipeline Setup
...
-Created new MSGHT Pipeline which pulls from MSGHTorrentAzureStorage and outputs to MSGHTEventDetailSotrage
-Created Webhook event processing script; it is unneeded now, but may come in handy later when processing web traffic data.
-Copied Kevin's usql script EventKeys for finding the set of all JSON paths exhibited in Events JSON objects.
2016-06-23 17:09:55 -07:00
Jeff McAffer
6fa96fe4e0
add usql pointer
2016-05-19 11:56:38 -05:00
Jeff McAffer
7f883a69b2
update commit file info
2016-05-19 11:38:44 -05:00
Kevin Lewis
3008e50b57
Removing this readme, content now merged into main file
2016-05-19 11:14:33 -05:00
Jeff McAffer
2fab3c75ef
fix cut/paste error
2016-05-19 11:03:38 -05:00
Jeff McAffer
699b757703
update readme
2016-05-19 11:02:52 -05:00
Kevin Lewis
d02c2f6f92
Commenting out CommitFile restore
...
so it won't be restored by default. It should only be restored by folks willing to pay for the additional restore time required.
2016-05-18 22:08:14 -07:00
Kevin Lewis
4ceef73aa5
Updates to data sharing readme
2016-05-18 22:00:20 -07:00
Kevin Lewis
522b71bd2b
Merge pull request #7 from kelewis/master
...
Data Import/Export Updates
2016-05-18 23:20:47 -05:00