Commit graph

  • b01af008cd Merge pull request #17 from kelewis/master (master) Kevin Lewis 2018-05-27 14:49:33 -0700
  • 499911212b Removing some unnessisary column data related to name. kelewis 2018-05-27 14:22:54 -0700
  • 2f2a98f260 Fixing but in CommitFile parsing. The incomming changes were not being properly deduped creating really large rows of nonsense killing the job. Kevin Lewis 2017-05-08 14:50:44 -0700
  • 17e3deac8d Improving untaring of the mongo backup file Moving back to compressing the intermediate json output but now making the output files in chunks. This improves reading of the files from blob into datalake. Kevin Lewis 2017-05-08 14:49:13 -0700
  • f2e73e1307 Going back to gzip, but changing the processing to produce files no bigger than 1gig. Kevin Lewis 2017-04-20 14:45:16 -0700
  • 4f959977aa New restriction in Azure Data Lake won't read very large gzip files. Changing the processing to no longer gzip. Kevin Lewis 2017-04-20 13:28:09 -0700
  • 35c906116b Adding "silent" flag to extractor to allowing the skipping of rows that cannot be parsed. Kevin Lewis 2016-12-21 08:59:52 -0800
  • 7b29f52c54 daily traffic ghcrawler Maggie Pint 2017-03-10 10:49:22 -0800
  • cad8e4178d paths and referrers Maggie Pint 2017-03-10 09:58:44 -0800
  • d70a714172 clones and views Maggie Pint 2017-03-09 22:13:03 -0800
  • 58819643d5 use GHInisghts, not dev Maggie Pint 2017-02-27 14:47:24 -0800
  • ba3cf61aa9 completed daily job Maggie Pint 2017-02-24 14:35:39 -0800
  • edad243f8b process daily take 1, needs dedupe updates Maggie Pint 2017-02-20 16:36:34 -0800
  • 90092d2557 pull request commit and commit map Maggie Pint 2017-02-08 10:24:56 -0800
  • cb6384ca05 add schema version Maggie Pint 2017-02-06 17:34:19 -0800
  • 18b7db6d1e add repo log Maggie Pint 2017-02-06 14:26:45 -0800
  • 1df2655b65 add stars Maggie Pint 2017-02-06 09:57:24 -0800
  • b8a1143b87 commit comment Maggie Pint 2017-01-28 19:03:51 -0800
  • cd7f59644a add commit file Maggie Pint 2017-01-23 11:14:44 -0800
  • 4d791132e3 add user urn to org Maggie Pint 2017-01-23 11:14:35 -0800
  • c853d65cac add small new pieces of data Maggie Pint 2017-01-19 17:39:47 -0800
  • 656d61532c add pull request data Maggie Pint 2017-01-19 17:39:33 -0800
  • e5fbd046c9 retrieve all data for single repo Maggie Pint 2017-01-05 09:56:40 -0800
  • 3d752d2bd0 Events ETL Maggie Pint 2016-12-21 17:46:04 -0800
  • e3dfe60cfc process collections Maggie Pint 2016-12-19 09:36:59 -0800
  • 9c05968968 add teams Maggie Pint 2016-12-16 16:26:41 -0800
  • 64a7f784a2 move orgs Maggie Pint 2016-12-16 16:04:01 -0800
  • 40e37d2f97 comment out data we dont' have yet Maggie Pint 2016-12-16 15:21:30 -0800
  • 538a8e5f9f normalize ordering with processedAt Maggie Pint 2016-12-16 11:26:20 -0800
  • de2790b5ab fix issue, add commit parent Maggie Pint 2016-12-15 16:42:42 -0800
  • a2398db4da start of ghcrawler conversion Maggie Pint 2016-12-15 16:16:38 -0800
  • 53eaba09c2 Merge 804f7cd16e into d2de524f90 Kevin Lewis 2016-12-09 18:42:10 +0000
  • d2de524f90 Merge pull request #15 from kelewis/master Kevin Lewis 2016-12-09 10:35:24 -0800
  • 8ac145c968 Updating packages Kevin Lewis 2016-12-09 10:31:52 -0800
  • 474378a29d Adding new GetGuid utility function Kevin Lewis 2016-12-09 10:29:24 -0800
  • b4e4f505c6 Bug fixes related to the Watch table. Resolves an issue where watchers we getting reduced to the wrong primary key. Kevin Lewis 2016-12-09 10:28:38 -0800
  • b19bc04b1d Adjusted script to match the new account used for sharing. Kevin Lewis 2016-12-09 10:27:13 -0800
  • 55353fe1b9 Merge pull request #14 from riv/master Kevin Lewis 2016-08-16 14:42:11 -0700
  • f52799c41c PR Followup Fix -Fixed Comment Issue -Changed number of PullRequestCommit buckets to 20 t-shche 2016-08-16 14:37:23 -0700
  • 4bbd9abd55 Renamed activity name from RunDaily1 to Rundaily t-shche 2016-08-16 14:15:12 -0700
  • a756aeb238 -Deleted unused .usql.cs file t-shche 2016-08-16 14:13:33 -0700
  • 27d85bb933 Merge branch 'master' of https://github.com/riv/ghinsights.git t-shche 2016-08-16 14:09:21 -0700
  • 019d4b230c -Compressed previously running daily activities to one activity: RunDaily -Changed Pipeline to meet account for changes -Added PullRequestCommit table for previously non-existing relationship t-shche 2016-08-16 14:08:52 -0700
  • 5120996a6c Merge pull request #13 from riv/master Kevin Lewis 2016-08-02 16:18:10 -0700
  • e24e18daa5 Fixed the commit Shuo Cheng 2016-08-02 16:16:39 -0700
  • 2c31e3eb8a Changed GetString back to GetInteger for EventId t-shche 2016-08-02 11:40:46 -0700
  • 23ab97538f Merge pull request #12 from kelewis/master Kevin Lewis 2016-07-20 13:13:34 -0700
  • 432a8b2427 Fixed bug where daily processing was not truncating the Event table, resulting in the entire set being duplicated each day. Thanks Shuo for finding this! Kevin Lewis 2016-07-20 11:09:09 -0700
  • dc776470e8 Merge pull request #11 from riv/master Kevin Lewis 2016-07-19 13:53:42 -0700
  • 932208caa4 Fix Merged (Again) -OutputStreamPath changed to C:\Data t-shche 2016-07-19 13:49:17 -0700
  • e562e85f43 Fix Merge -Changed to 'ghinsightspublic' in StageData -Removed local path in USQL.usqlproj -Changed encodings to utf-8 for file that were showing up as binary in github t-shche 2016-07-19 13:44:04 -0700
  • 27fa39ff16 Pull Request Fix + Pipeline Addition -Fixed Typo and other small errors -Removed references to extraneous scripts -Removed usql.cs file -Added two MaxAnalytics scripts as Daily Activities (DailyRepoActivity and RepoAttributes) t-shche 2016-07-19 13:26:05 -0700
  • c18731e9f2 GHInsights Incremental Pipeline Completed -Completed Incremental changes for ghinsightsms -For the EventId field, changed the GetInteger method to GetString t-shche 2016-07-08 17:36:35 -0700
  • ba10634ad6 Complete Merge with GHInsights Incremental t-shche 2016-06-28 11:11:53 -0700
  • e2ec6107f3 Commit To Merge Incremental t-shche 2016-06-28 11:08:07 -0700
  • 4e476f60a0 Updated Job Retry Interval to 3 hours from 1 t-shche 2016-06-28 11:00:25 -0700
  • ad5ec3942f Merge pull request #10 from kelewis/master Kevin Lewis 2016-06-28 09:44:37 -0700
  • ad130ee3ed Changes to table partitioning to support faster daily processing. Kevin Lewis 2016-06-27 11:07:23 -0700
  • ddeff92ba8 Adding USQL activities to main public pipeline. Kevin Lewis 2016-06-27 10:47:33 -0700
  • 61a7505038 Adding a .gitignore to not include environment specific configurations. Kevin Lewis 2016-06-27 10:36:09 -0700
  • c0d420526e Merge of pipeline additions to DataFactory project. Kevin Lewis 2016-06-27 10:35:32 -0700
  • de5a4b109c Merge pull request #9 from riv/master Kevin Lewis 2016-06-24 13:16:31 -0700
  • b485618d77 ADDITION: -Add to commit: DataFactory.dfproj t-shche 2016-06-24 11:31:57 -0700
  • 8c275a4a1f Merge pull request #8 from riv/master Kevin Lewis 2016-06-24 11:20:46 -0700
  • fd60427441 DELETION: -Deleted Filese Related EventKeys (script generating all possible paths) t-shche 2016-06-24 11:13:09 -0700
  • 37833287e4 DELETION: -Deleted all files related to ProcessMsEvent t-shche 2016-06-24 11:06:42 -0700
  • 1af0c9f34b [FEATURE] MSGHT Non-Incremental Pipeline Setup -Created new MSGHT Pipeline which pulls from MSGHTorrentAzureStorage and outputs to MSGHTEventDetailSotrage -Created Webhook event processeing script is uneeded now, but may come in handy later when processing webtraffic data. -Copied Kevin's usql script EventKeys for finding the set of all JSON paths exhibited in Events JSON objects. t-shche 2016-06-23 17:09:55 -0700
  • 6fa96fe4e0 add usql pointer Jeff McAffer 2016-05-19 11:56:38 -0500
  • 7f883a69b2 update commit file info Jeff McAffer 2016-05-19 11:38:44 -0500
  • 3008e50b57 Removing this readme, content now merged into main file Kevin Lewis 2016-05-19 11:14:33 -0500
  • 2fab3c75ef fix cut/paste error Jeff McAffer 2016-05-19 11:03:38 -0500
  • 699b757703 update readme Jeff McAffer 2016-05-19 11:02:52 -0500
  • d02c2f6f92 Commenting out CommitFile restore Kevin Lewis 2016-05-18 22:08:14 -0700
  • 4ceef73aa5 Updates to data sharing readme Kevin Lewis 2016-05-18 22:00:20 -0700
  • 522b71bd2b Merge pull request #7 from kelewis/master Kevin Lewis 2016-05-18 23:20:47 -0500
  • 6ee4e05b84 Updating the data export scripts. Kevin Lewis 2016-05-18 21:15:40 -0700
  • ce01b418c3 Removing strings as byte[] from all tables. They just dosn't work in enough cases for them to not be worth the extra length. So now they're all truncated to the 128k limit. They can be brought back when some of the related ADLA bugs are fixed or limitations are changed. Kevin Lewis 2016-05-18 20:57:02 -0700
  • bd9ce358f7 Altering the import/export script generator to handle boolean and long types. Running the script now generates the usql files directly. Also made a change so the staging files are written to a different adls account. Kevin Lewis 2016-05-15 13:40:57 -0700
  • 696d87518c Adding data sharing scripts provided by @saveenr and the Data Lake team. Kevin Lewis 2016-05-15 12:46:47 -0700
  • dae54918c0 Merge pull request #6 from kelewis/master Kevin Lewis 2016-05-15 09:53:06 -0700
  • 6502ed3095 Changed the path where the local Data Factory Activity executor gets the settings file. Kevin Lewis 2016-05-15 09:50:10 -0700
  • 510da1beeb Merge pull request #5 from kelewis/master Kevin Lewis 2016-05-14 23:34:49 -0700
  • e89ca44991 Removing old files that were removed from the solution but never actually deleted. Kevin Lewis 2016-05-14 23:22:53 -0700
  • 21598f2cc3 Merge pull request #4 from kelewis/master Kevin Lewis 2016-05-14 23:17:59 -0700
  • 246773d91a Adding license and link in readme Kevin Lewis 2016-05-14 23:16:44 -0700
  • a1521dbfaf Merge pull request #3 from kelewis/master Kevin Lewis 2016-05-14 22:54:05 -0700
  • 6883b9978b Renaming project name in readme file Kevin Lewis 2016-05-14 22:44:38 -0700
  • 913b02de3a Missed a rename in environment template file Kevin Lewis 2016-05-14 22:38:00 -0700
  • 7a45f53ae2 Rename of files and changes to file content Kevin Lewis 2016-05-14 22:36:53 -0700
  • d8c924aa61 Directory name changes Kevin Lewis 2016-05-14 22:08:49 -0700
  • aae6a1e841 Rename top directory Kevin Lewis 2016-05-14 22:01:26 -0700
  • d82cdddf96 Last commit removed a bunch of files from teh Event table in order to make it fit under the rowsize requirements. This adds the removed columns back in a seperate table. Kevin Lewis 2016-05-14 21:55:28 -0700
  • 1f6c1eebe7 Lots of changes on the USql side. Kevin Lewis 2016-05-14 20:04:20 -0700
  • 9ecf4060a3 TarStream - fix to allow skipping large files in the archive DataFactory.Tests - simple test to test out the TarStream - far from complete Extractors.Json - added a way to estimate the size of the produced SqlMap so it can throw an error before going over the limit and crashing the job. Looking at perhaps compressing the data when this happens to enable extraction at a later time. Utility - lots of changes with USql utility functions to help with catching errors converting the parsed json into real types. There is a lot of work here done to catch and convert "usql" strings as USql has a very short string limit of 128k. Also added a function to hash emails as suggested by GitHub, as well as adding functions to compress and decompress large byte arrays. UtilityTests - a quick test to test out the email hashing function. Kevin Lewis 2016-05-14 15:17:45 -0700
  • 7ab8ee9e5a Found an issue where the commit email info wasn't making it into the destination table. Was keeping it out before until the email hashing was implemented, but they never got added correctly back in until now. Kevin Lewis 2016-03-15 00:27:19 -0700
  • ad02d8cc79 Nearing completion of the processing scripts. Kevin Lewis 2016-03-14 13:54:34 -0700
  • e82ba0b7e8 Updating usql project file to not include missing files. Kevin Lewis 2016-03-13 17:13:56 -0700
  • a1890039cf Interim commit. Kevin Lewis 2016-03-13 17:12:59 -0700
  • 1a823f06d1 Interim checkin Kevin Lewis 2016-03-12 13:06:26 -0800
  • 635a250e5d Adding missing app.config that was getting excluded. Kevin Lewis 2016-02-05 12:44:32 -0800