Hamel Husain 2019-09-19 07:36:28 -07:00
Parent e792e1caea
Commit f444bf354e
5 changed files with 2369275 additions and 0 deletions

77
resources/README.md Normal file

@@ -0,0 +1,77 @@
# This is an empty directory where you will download the training data, using the [/script/setup](/script/setup) script.
After downloading the data, the directory structure will look like this:
```
├── data
│   │
│   ├── {javascript, java, python, ruby, php, go}_licenses.pkl
│   ├── {javascript, java, python, ruby, php, go}_dedupe_definitions_v2.pkl
│   │
│   ├── javascript
│   │   └── final
│   │       └── jsonl
│   │           ├── test
│   │           ├── train
│   │           └── valid
│   ├── java
│   │   └── final
│   │       └── jsonl
│   │           ├── test
│   │           ├── train
│   │           └── valid
│   ├── python
│   │   └── final
│   │       └── jsonl
│   │           ├── test
│   │           ├── train
│   │           └── valid
│   ├── ruby
│   │   └── final
│   │       └── jsonl
│   │           ├── test
│   │           ├── train
│   │           └── valid
│   ├── php
│   │   └── final
│   │       └── jsonl
│   │           ├── test
│   │           ├── train
│   │           └── valid
│   └── go
│       └── final
│           └── jsonl
│               ├── test
│               ├── train
│               └── valid
│
└── saved_models
```
## Directory structure
- `{javascript, java, python, ruby, php, go}/final/jsonl/{test,train,valid}`: these directories will contain multi-part [jsonl](http://jsonlines.org/) files with the data partitioned into train, valid, and test sets. The baseline training code, which uses TensorFlow, expects the data in this layout and concatenates and shuffles the part files as needed.
- `{javascript, java, python, ruby, php, go}_dedupe_definitions_v2.pkl`: these pickle files contain Python dictionaries holding a superset of all functions, including those that do not have comments. They are used for model evaluation.
- `{javascript, java, python, ruby, php, go}_licenses.pkl`: these pickle files contain Python dictionaries holding the licenses found in the source code used as the dataset for CodeSearchNet. Each key is the repository's `owner/name`, and each value is a tuple of `(path, license content)`. For example:
```
In [6]: data['pandas-dev/pandas']
Out[6]:
('pandas-dev/pandas/LICENSE',
'BSD 3-Clause License\n\nCopyright (c) 2008-2012, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development
Team\nAll rights reserved.\n\nRedistribution and use in source and binary forms, with or without\nmodification, are
permitted provided that the following conditions are met:\n\n* Redistributions of source code must retain the above
copyright notice, this\n list of conditions and the following disclaimer.\n\n* Redistributions in binary form must
reproduce the above copyright notice,\n this list of conditions and the following disclaimer in the documentation\n
and/or other materials provided with the distribution....')
```
- `saved_models`: the default directory where your models are saved if you do not specify a destination.
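
To make the layout above concrete, here is a minimal sketch of reading the multi-part jsonl files for one split and looking up a repository in a licenses pickle. The helper names `iter_jsonl_records` and `load_license` are hypothetical and not part of the baseline code, and the file extensions handled (`.jsonl` and `.jsonl.gz`) are an assumption about how the downloaded part files are named.

```python
import gzip
import json
import pickle
from pathlib import Path
from typing import Any, Dict, Iterator, Tuple


def iter_jsonl_records(split_dir: Path) -> Iterator[Dict[str, Any]]:
    """Yield one dict per non-empty line from every jsonl part file in split_dir."""
    # Assumption: part files end in .jsonl or .jsonl.gz; adjust if yours differ.
    part_files = sorted(
        p for p in Path(split_dir).iterdir()
        if p.name.endswith(".jsonl") or p.name.endswith(".jsonl.gz")
    )
    for part in part_files:
        # Pick a text-mode opener based on whether the part is gzip-compressed.
        opener = gzip.open if part.name.endswith(".gz") else open
        with opener(part, "rt", encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:  # skip blank lines
                    yield json.loads(line)


def load_license(pkl_path: Path, repo: str) -> Tuple[str, str]:
    """Return the (path, license content) tuple stored under an owner/name key."""
    with open(pkl_path, "rb") as handle:
        # Only unpickle files obtained from a source you trust.
        licenses = pickle.load(handle)
    return licenses[repo]
```

Assuming the data has been downloaded into this directory, a call such as `iter_jsonl_records(Path("resources/data/python/final/jsonl/train"))` would stream the training records one at a time, and `load_license(pkl_path, "pandas-dev/pandas")` would return a tuple like the one shown in the example above, provided that repository is present in the chosen licenses file.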
## Data Format
See [DATA_FORMAT.md](docs/DATA_FORMAT.md) for documentation and an example of how the data is stored.

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

100
resources/queries.csv Normal file

@@ -0,0 +1,100 @@
query
convert int to string
priority queue
string to date
sort string list
save list to file
postgresql connection
confusion matrix
set working directory
group by count
binomial distribution
aes encryption
linear regression
socket recv timeout
write csv
convert decimal to hex
export to excel
scatter plot
convert json to csv
pretty print json
replace in file
k means clustering
connect to sql
html encode string
finding time elapsed using a timer
parse binary file to custom class
get current ip address
convert int to bool
read text file line by line
get executable path
httpclient post json
get inner html
convert string to number
format date
readonly array
filter array
map to json
parse json file
get current observable value
get name of enumerated value
encode url
create cookie
how to empty array
how to get current date
how to make the checkbox checked
initializing array
how to reverse a string
read properties file
copy to clipboard
convert html to pdf
json to xml conversion
how to randomly pick a number
normal distribution
nelder mead optimize
hash set for counting distinct elements
how to get database table name
deserialize json
find int in string
get current process id
regex case insensitive
custom http error response
how to determine a string is a valid word
html entities replace
set file attrib hidden
sorting multiple arrays based on another arrays sorted order
string similarity levenshtein
how to get html of website
buffered file reader read text
encrypt aes ctr mode
matrix multiply
print model summary
unique elements
extract data from html content
heatmap from 3d coordinates
get all parents of xml node
how to extract zip file recursively
underline text in label widget
unzipping large files
copying a file to a path
get the description of a http status code
randomly extract x items from a list
convert a date string into yyyymmdd
convert a utc time to epoch
all permutations of a list
extract latitude and longitude from given input
how to check if a checkbox is checked
converting uint8 array to image
memoize to disk - persistent memoization
parse command line argument
how to read the contents of a .gz compressed file?
sending binary data over a serial connection
extracting data from a text file
positions of substrings in string
reading element from html - <td>
deducting the median from each column
concatenate several file remove header lines
parse query string in url
fuzzy match ranking
output to html file
how to read .csv file in an efficient way?

2
resources/saved_models/.gitignore Vendored Normal file

@@ -0,0 +1,2 @@
*
!.gitignore