Merge pull request #17 from kelewis/master
master
Kevin Lewis
2018-05-27 14:49:33 -0700
499911212b
Removing some unnecessary column data related to name.
kelewis
2018-05-27 14:22:54 -0700
2f2a98f260
Fixing bug in CommitFile parsing. The incoming changes were not being properly deduped, creating really large rows of nonsense and killing the job.
Kevin Lewis
2017-05-08 14:50:44 -0700
17e3deac8d
Improving untarring of the mongo backup file. Moving back to compressing the intermediate json output, but now writing the output files in chunks. This improves reading of the files from blob into datalake.
Kevin Lewis
2017-05-08 14:49:13 -0700
f2e73e1307
Going back to gzip, but changing the processing to produce files no bigger than 1 GB.
Kevin Lewis
2017-04-20 14:45:16 -0700
4f959977aa
New restriction in Azure Data Lake: it won't read very large gzip files. Changing the processing to no longer gzip.
Kevin Lewis
2017-04-20 13:28:09 -0700
35c906116b
Adding "silent" flag to extractor to allow skipping of rows that cannot be parsed.
Kevin Lewis
2016-12-21 08:59:52 -0800
d2de524f90
Merge pull request #15 from kelewis/master
Kevin Lewis
2016-12-09 10:35:24 -0800
8ac145c968
Updating packages
Kevin Lewis
2016-12-09 10:31:52 -0800
474378a29d
Adding new GetGuid utility function
Kevin Lewis
2016-12-09 10:29:24 -0800
b4e4f505c6
Bug fixes related to the Watch table. Resolves an issue where watchers were getting reduced to the wrong primary key.
Kevin Lewis
2016-12-09 10:28:38 -0800
b19bc04b1d
Adjusted script to match the new account used for sharing.
Kevin Lewis
2016-12-09 10:27:13 -0800
55353fe1b9
Merge pull request #14 from riv/master
Kevin Lewis
2016-08-16 14:42:11 -0700
f52799c41c
PR Followup Fix
- Fixed Comment Issue
- Changed number of PullRequestCommit buckets to 20
t-shche
2016-08-16 14:37:23 -0700
4bbd9abd55
Renamed activity name from RunDaily1 to Rundaily
t-shche
2016-08-16 14:15:12 -0700
019d4b230c
- Consolidated the previously separate daily activities into one activity: RunDaily
- Changed Pipeline to account for changes
- Added PullRequestCommit table for previously non-existing relationship
t-shche
2016-08-16 14:08:52 -0700
5120996a6c
Merge pull request #13 from riv/master
Kevin Lewis
2016-08-02 16:18:10 -0700
e24e18daa5
Fixed the commit
Shuo Cheng
2016-08-02 16:16:39 -0700
2c31e3eb8a
Changed GetString back to GetInteger for EventId
t-shche
2016-08-02 11:40:46 -0700
23ab97538f
Merge pull request #12 from kelewis/master
Kevin Lewis
2016-07-20 13:13:34 -0700
432a8b2427
Fixed bug where daily processing was not truncating the Event table, resulting in the entire set being duplicated each day. Thanks Shuo for finding this!
Kevin Lewis
2016-07-20 11:09:09 -0700
dc776470e8
Merge pull request #11 from riv/master
Kevin Lewis
2016-07-19 13:53:42 -0700
e562e85f43
Fix Merge
- Changed to 'ghinsightspublic' in StageData
- Removed local path in USQL.usqlproj
- Changed encodings to utf-8 for files that were showing up as binary in GitHub
t-shche
2016-07-19 13:44:04 -0700
27fa39ff16
Pull Request Fix + Pipeline Addition
- Fixed Typo and other small errors
- Removed references to extraneous scripts
- Removed usql.cs file
- Added two MaxAnalytics scripts as Daily Activities (DailyRepoActivity and RepoAttributes)
t-shche
2016-07-19 13:26:05 -0700
c18731e9f2
GHInsights Incremental Pipeline Completed
- Completed Incremental changes for ghinsightsms
- For the EventId field, changed the GetInteger method to GetString
t-shche
2016-07-08 17:36:35 -0700
ba10634ad6
Complete Merge with GHInsights Incremental
t-shche
2016-06-28 11:11:53 -0700
e2ec6107f3
Commit To Merge Incremental
t-shche
2016-06-28 11:08:07 -0700
4e476f60a0
Updated Job Retry Interval to 3 hours from 1
t-shche
2016-06-28 11:00:25 -0700
ad5ec3942f
Merge pull request #10 from kelewis/master
Kevin Lewis
2016-06-28 09:44:37 -0700
ad130ee3ed
Changes to table partitioning to support faster daily processing.
Kevin Lewis
2016-06-27 11:07:23 -0700
ddeff92ba8
Adding USQL activities to main public pipeline.
Kevin Lewis
2016-06-27 10:47:33 -0700
61a7505038
Adding a .gitignore to exclude environment-specific configurations.
Kevin Lewis
2016-06-27 10:36:09 -0700
c0d420526e
Merge of pipeline additions to DataFactory project.
Kevin Lewis
2016-06-27 10:35:32 -0700
de5a4b109c
Merge pull request #9 from riv/master
Kevin Lewis
2016-06-24 13:16:31 -0700
b485618d77
ADDITION:
- Add to commit: DataFactory.dfproj
t-shche
2016-06-24 11:31:57 -0700
8c275a4a1f
Merge pull request #8 from riv/master
Kevin Lewis
2016-06-24 11:20:46 -0700
fd60427441
DELETION:
- Deleted files related to EventKeys (script generating all possible paths)
t-shche
2016-06-24 11:13:09 -0700
37833287e4
DELETION:
- Deleted all files related to ProcessMsEvent
t-shche
2016-06-24 11:06:42 -0700
1af0c9f34b
[FEATURE] MSGHT Non-Incremental Pipeline Setup
- Created new MSGHT Pipeline which pulls from MSGHTorrentAzureStorage and outputs to MSGHTEventDetailSotrage
- Created Webhook event processing script; it is unneeded now, but may come in handy later when processing web traffic data.
- Copied Kevin's usql script EventKeys for finding the set of all JSON paths exhibited in Events JSON objects.
t-shche
2016-06-23 17:09:55 -0700
6fa96fe4e0
add usql pointer
Jeff McAffer
2016-05-19 11:56:38 -0500
7f883a69b2
update commit file info
Jeff McAffer
2016-05-19 11:38:44 -0500
3008e50b57
Removing this readme, content now merged into main file
Kevin Lewis
2016-05-19 11:14:33 -0500
2fab3c75ef
fix cut/paste error
Jeff McAffer
2016-05-19 11:03:38 -0500
699b757703
update readme
Jeff McAffer
2016-05-19 11:02:52 -0500
d02c2f6f92
Commenting out CommitFile restore
Kevin Lewis
2016-05-18 22:08:14 -0700
4ceef73aa5
Updates to data sharing readme
Kevin Lewis
2016-05-18 22:00:20 -0700
522b71bd2b
Merge pull request #7 from kelewis/master
Kevin Lewis
2016-05-18 23:20:47 -0500
6ee4e05b84
Updating the data export scripts.
Kevin Lewis
2016-05-18 21:15:40 -0700
ce01b418c3
Removing strings as byte[] from all tables. They just don't work in enough cases to be worth the extra length. So now they're all truncated to the 128k limit. They can be brought back when some of the related ADLA bugs are fixed or limitations are changed.
Kevin Lewis
2016-05-18 20:57:02 -0700
bd9ce358f7
Altering the import/export script generator to handle boolean and long types. Running the script now generates the usql files directly. Also made a change so the staging files are written to a different adls account.
Kevin Lewis
2016-05-15 13:40:57 -0700
696d87518c
Adding data sharing scripts provided by @saveenr and the Data Lake team.
Kevin Lewis
2016-05-15 12:46:47 -0700
dae54918c0
Merge pull request #6 from kelewis/master
Kevin Lewis
2016-05-15 09:53:06 -0700
6502ed3095
Changed the path where the local Data Factory Activity executor gets the settings file.
Kevin Lewis
2016-05-15 09:50:10 -0700
510da1beeb
Merge pull request #5 from kelewis/master
Kevin Lewis
2016-05-14 23:34:49 -0700
e89ca44991
Removing old files that were removed from the solution but never actually deleted.
Kevin Lewis
2016-05-14 23:22:53 -0700
21598f2cc3
Merge pull request #4 from kelewis/master
Kevin Lewis
2016-05-14 23:17:59 -0700
246773d91a
Adding license and link in readme
Kevin Lewis
2016-05-14 23:16:44 -0700
a1521dbfaf
Merge pull request #3 from kelewis/master
Kevin Lewis
2016-05-14 22:54:05 -0700
6883b9978b
Renaming project name in readme file
Kevin Lewis
2016-05-14 22:44:38 -0700
913b02de3a
Missed a rename in environment template file
Kevin Lewis
2016-05-14 22:38:00 -0700
7a45f53ae2
Rename of files and changes to file content
Kevin Lewis
2016-05-14 22:36:53 -0700
d8c924aa61
Directory name changes
Kevin Lewis
2016-05-14 22:08:49 -0700
aae6a1e841
Rename top directory
Kevin Lewis
2016-05-14 22:01:26 -0700
d82cdddf96
Last commit removed a bunch of columns from the Event table in order to make it fit under the row size requirements. This adds the removed columns back in a separate table.
Kevin Lewis
2016-05-14 21:55:28 -0700
1f6c1eebe7
Lots of changes on the USql side.
Kevin Lewis
2016-05-14 20:04:20 -0700
9ecf4060a3
TarStream - fix to allow skipping large files in the archive
DataFactory.Tests - simple test to try out the TarStream - far from complete
Extractors.Json - added a way to estimate the size of the produced SqlMap so it can throw an error before going over the limit and crashing the job. Looking at perhaps compressing the data when this happens to enable extraction at a later time.
Utility - lots of changes to USql utility functions to help with catching errors when converting the parsed json into real types. A lot of work was done here to catch and convert "usql" strings, as USql has a very short string limit of 128k. Also added a function to hash emails as suggested by GitHub, as well as functions to compress and decompress large byte arrays.
UtilityTests - a quick test to try out the email hashing function.
Kevin Lewis
2016-05-14 15:17:45 -0700
7ab8ee9e5a
Found an issue where the commit email info wasn't making it into the destination table. It was being kept out until the email hashing was implemented, but it never got added back in correctly until now.
Kevin Lewis
2016-03-15 00:27:19 -0700
ad02d8cc79
Nearing completion of the processing scripts.
Kevin Lewis
2016-03-14 13:54:34 -0700
e82ba0b7e8
Updating usql project file to not include missing files.
Kevin Lewis
2016-03-13 17:13:56 -0700
a1890039cf
Interim commit.
Kevin Lewis
2016-03-13 17:12:59 -0700
1a823f06d1
Interim checkin
Kevin Lewis
2016-03-12 13:06:26 -0800
635a250e5d
Adding missing app.config that was getting excluded.
Kevin Lewis
2016-02-05 12:44:32 -0800