
89 Commits

Author SHA1 Message Date
Kevin Lewis b01af008cd
Merge pull request #17 from kelewis/master
Minor changes and bugfixes
2018-05-27 14:49:33 -07:00
kelewis 499911212b Removing some unnecessary column data related to name. 2018-05-27 14:44:12 -07:00
Kevin Lewis 2f2a98f260 Fixing bug in CommitFile parsing. The incoming changes were not being properly deduped, creating very large rows of nonsense that killed the job. 2018-05-27 14:44:12 -07:00
Kevin Lewis 17e3deac8d Improving untarring of the Mongo backup file
Moving back to compressing the intermediate JSON output, but now writing the output files in chunks. This improves reading of the files from blob storage into Data Lake.
2018-05-27 14:44:12 -07:00
Kevin Lewis f2e73e1307 Going back to gzip, but changing the processing to produce files no larger than 1 GB.
Also trying a fix where CommitFile processing was producing intermediate results that had row sizes larger than 4MB.
2018-05-27 14:44:12 -07:00
Kevin Lewis 4f959977aa New restriction: Azure Data Lake won't read very large gzip files. Changing the processing to no longer gzip. 2018-05-27 14:44:12 -07:00
Kevin Lewis 35c906116b Adding "silent" flag to extractor to allow skipping rows that cannot be parsed. 2018-05-27 14:44:12 -07:00
Kevin Lewis d2de524f90 Merge pull request #15 from kelewis/master
Minor updates and bugfixes
2016-12-09 10:35:24 -08:00
Kevin Lewis 8ac145c968 Updating packages 2016-12-09 10:31:52 -08:00
Kevin Lewis 474378a29d Adding new GetGuid utility function 2016-12-09 10:29:24 -08:00
Kevin Lewis b4e4f505c6 Bug fixes related to the Watch table. Resolves an issue where watchers were getting reduced to the wrong primary key. 2016-12-09 10:28:38 -08:00
Kevin Lewis b19bc04b1d Adjusted script to match the new account used for sharing. 2016-12-09 10:27:13 -08:00
Kevin Lewis 55353fe1b9 Merge pull request #14 from riv/master
Consolidated Activities + PullRequestCommit
2016-08-16 14:42:11 -07:00
t-shche f52799c41c PR Followup Fix
-Fixed Comment Issue
-Changed number of PullRequestCommit buckets to 20
2016-08-16 14:37:23 -07:00
t-shche 4bbd9abd55 Renamed activity name from RunDaily1 to Rundaily 2016-08-16 14:15:12 -07:00
t-shche a756aeb238 -Deleted unused .usql.cs file 2016-08-16 14:13:33 -07:00
t-shche 27d85bb933 Merge branch 'master' of https://github.com/riv/ghinsights.git 2016-08-16 14:09:21 -07:00
t-shche 019d4b230c -Compressed previously running daily activities to one activity: RunDaily
-Changed Pipeline to account for changes
-Added PullRequestCommit table for previously non-existing relationship
2016-08-16 14:08:52 -07:00
Kevin Lewis 5120996a6c Merge pull request #13 from riv/master
Changed GetString back to GetInteger for EventId
2016-08-02 16:18:10 -07:00
Shuo Cheng e24e18daa5 Fixed the commit 2016-08-02 16:16:39 -07:00
t-shche 2c31e3eb8a Changed GetString back to GetInteger for EventId 2016-08-02 11:40:46 -07:00
Kevin Lewis 23ab97538f Merge pull request #12 from kelewis/master
Fixed bug where daily processing was not truncating the Event table, …
2016-07-20 13:13:34 -07:00
Kevin Lewis 432a8b2427 Fixed bug where daily processing was not truncating the Event table, resulting in the entire set being duplicated each day. Thanks Shuo for finding this!
Added Url to Commit table as it provides context to which repo the Commits were found.
2016-07-20 11:09:09 -07:00
Kevin Lewis dc776470e8 Merge pull request #11 from riv/master
Merging GHInsightsms incremental
2016-07-19 13:53:42 -07:00
t-shche 932208caa4 Fix Merged (Again)
-OutputStreamPath changed to C:\Data
2016-07-19 13:49:17 -07:00
t-shche e562e85f43 Fix Merge
-Changed to 'ghinsightspublic' in StageData
-Removed local path in USQL.usqlproj
-Changed encodings to UTF-8 for files that were showing up as binary in GitHub
2016-07-19 13:44:04 -07:00
t-shche 27fa39ff16 Pull Request Fix + Pipeline Addition
-Fixed Typo and other small errors
-Removed references to extraneous scripts
-Removed usql.cs file
-Added two MaxAnalytics scripts as Daily Activities (DailyRepoActivity and RepoAttributes)
2016-07-19 13:26:05 -07:00
t-shche c18731e9f2 GHInsights Incremental Pipeline Completed
-Completed Incremental changes for ghinsightsms
-For the EventId field, changed the GetInteger method to GetString
2016-07-08 17:36:35 -07:00
t-shche ba10634ad6 Complete Merge with GHInsights Incremental 2016-06-28 11:11:53 -07:00
t-shche e2ec6107f3 Commit To Merge Incremental 2016-06-28 11:08:07 -07:00
t-shche 4e476f60a0 Updated Job Retry Interval to 3 hours from 1 2016-06-28 11:00:25 -07:00
Kevin Lewis ad5ec3942f Merge pull request #10 from kelewis/master
Adding ADF & USQL changes to support daily updates.
2016-06-28 09:44:37 -07:00
Kevin Lewis ad130ee3ed Changes to table partitioning to support faster daily processing.
A few other updates related to missing columns or schema inconsistencies.
2016-06-27 11:07:23 -07:00
Kevin Lewis ddeff92ba8 Adding USQL activities to main public pipeline.
Changed StageData.usql to be a callable stored procedure with data slice parameters to be passed in by Data Factory.

New ProcessDaily.usql procedure that maintains all the final tables with daily data updates.  Called by data factory.
2016-06-27 10:47:33 -07:00
Kevin Lewis 61a7505038 Adding a .gitignore to not include environment specific configurations.
Removing earlier entry from the root .gitignore.
2016-06-27 10:36:09 -07:00
Kevin Lewis c0d420526e Merge of pipeline additions to DataFactory project. 2016-06-27 10:35:32 -07:00
Kevin Lewis de5a4b109c Merge pull request #9 from riv/master
ADDITION:
2016-06-24 13:16:31 -07:00
t-shche b485618d77 ADDITION:
-Add to commit: DataFactory.dfproj
2016-06-24 11:31:57 -07:00
Kevin Lewis 8c275a4a1f Merge pull request #8 from riv/master
[NEW FEATURE] MSGHT Non-Incremental Pipeline Setup
2016-06-24 11:20:46 -07:00
t-shche fd60427441 DELETION:
-Deleted files related to EventKeys (script generating all possible paths)
2016-06-24 11:13:09 -07:00
t-shche 37833287e4 DELETION:
-Deleted all files related to ProcessMsEvent
2016-06-24 11:06:42 -07:00
t-shche 1af0c9f34b [FEATURE] MSGHT Non-Incremental Pipeline Setup
-Created new MSGHT Pipeline which pulls from MSGHTorrentAzureStorage and outputs to MSGHTEventDetailSotrage
-Created a webhook event processing script; it is unneeded now, but may come in handy later when processing web traffic data.
-Copied Kevin's usql script EventKeys for finding the set of all JSON paths exhibited in Events JSON objects.
2016-06-23 17:09:55 -07:00
Jeff McAffer 6fa96fe4e0 add usql pointer 2016-05-19 11:56:38 -05:00
Jeff McAffer 7f883a69b2 update commit file info 2016-05-19 11:38:44 -05:00
Kevin Lewis 3008e50b57 Removing this readme, content now merged into main file 2016-05-19 11:14:33 -05:00
Jeff McAffer 2fab3c75ef fix cut/paste error 2016-05-19 11:03:38 -05:00
Jeff McAffer 699b757703 update readme 2016-05-19 11:02:52 -05:00
Kevin Lewis d02c2f6f92 Commenting out CommitFile restore
so it won't be restored by default.  It should only be restored by folks willing to pay for the additional restore time required.
2016-05-18 22:08:14 -07:00
Kevin Lewis 4ceef73aa5 Updates to data sharing readme 2016-05-18 22:00:20 -07:00
Kevin Lewis 522b71bd2b Merge pull request #7 from kelewis/master
Data Import/Export Updates
2016-05-18 23:20:47 -05:00