added missing folders
Parent: e792e1caea
Commit: f444bf354e
@ -0,0 +1,77 @@
# This is an empty directory into which you will download the training data using the [/script/setup](/script/setup) script.

After downloading the data, the directory structure will look like this:

```
├── data
│   │
│   ├── {javascript, java, python, ruby, php, go}_licenses.pkl
│   ├── {javascript, java, python, ruby, php, go}_dedupe_definitions_v2.pkl
│   │
│   ├── javascript
│   │   └── final
│   │       └── jsonl
│   │           ├── test
│   │           ├── train
│   │           └── valid
│   ├── java
│   │   └── final
│   │       └── jsonl
│   │           ├── test
│   │           ├── train
│   │           └── valid
│   ├── python
│   │   └── final
│   │       └── jsonl
│   │           ├── test
│   │           ├── train
│   │           └── valid
│   ├── ruby
│   │   └── final
│   │       └── jsonl
│   │           ├── test
│   │           ├── train
│   │           └── valid
│   ├── php
│   │   └── final
│   │       └── jsonl
│   │           ├── test
│   │           ├── train
│   │           └── valid
│   └── go
│       └── final
│           └── jsonl
│               ├── test
│               ├── train
│               └── valid
│
└── saved_models
```
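Given the layout above, the multi-part jsonl files of one split can be concatenated and shuffled in a few lines of Python. This is a minimal sketch, not the baseline's own loader: the function name `load_split`, the fixed shuffle seed, and the assumption that part files end in `.jsonl` or `.jsonl.gz` are illustrative choices.

```python
import glob
import gzip
import json
import os
import random
from typing import Any, Dict, List


def load_split(data_dir: str, language: str, split: str, seed: int = 0) -> List[Dict[str, Any]]:
    """Concatenate all jsonl part files of one split, then shuffle the records.

    Assumptions (not taken from the baseline code): the layout is
    <data_dir>/<language>/final/jsonl/<split>/, and part files end in
    .jsonl or .jsonl.gz.
    """
    split_dir = os.path.join(data_dir, language, "final", "jsonl", split)
    paths = sorted(
        glob.glob(os.path.join(split_dir, "*.jsonl"))
        + glob.glob(os.path.join(split_dir, "*.jsonl.gz"))
    )
    records: List[Dict[str, Any]] = []
    for path in paths:
        # gzip.open and open both accept text mode "rt" with an encoding.
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt", encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:  # ignore blank lines between records
                    records.append(json.loads(line))
    # Seeded shuffle so that repeated runs produce the same order.
    random.Random(seed).shuffle(records)
    return records
```

For example, `load_split("data", "python", "train")` would return the Python training records as a shuffled list of dictionaries. Loading everything into memory keeps the sketch simple; for large splits a streaming reader may be preferable.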

## Directory structure

- `{javascript, java, python, ruby, php, go}/final/jsonl/{test,train,valid}`: these directories will contain multi-part [jsonl](http://jsonlines.org/) files with the data partitioned into train, valid, and test sets. The baseline training code, which uses TensorFlow, expects the data in this format and will concatenate and shuffle these files appropriately.
- `{javascript, java, python, ruby, php, go}_dedupe_definitions_v2.pkl`: these files are Python dictionaries that contain a superset of all functions, including those that do not have comments. They are used for model evaluation.
- `{javascript, java, python, ruby, php, go}_licenses.pkl`: these files are Python dictionaries that contain the licenses found in the source code that makes up the CodeSearchNet dataset. Each key is a repository's `owner/name`, and each value is a `(path, license content)` tuple. For example:
```
In [6]: data['pandas-dev/pandas']
Out[6]:
('pandas-dev/pandas/LICENSE',
 'BSD 3-Clause License\n\nCopyright (c) 2008-2012, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team\nAll rights reserved.\n\nRedistribution and use in source and binary forms, with or without\nmodification, are permitted provided that the following conditions are met:\n\n* Redistributions of source code must retain the above copyright notice, this\n list of conditions and the following disclaimer.\n\n* Redistributions in binary form must reproduce the above copyright notice,\n this list of conditions and the following disclaimer in the documentation\n and/or other materials provided with the distribution....')
```
- `saved_models`: the default directory where your models are saved if you do not supply a destination.
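
The `.pkl` files described above can be opened with Python's standard `pickle` module, assuming they were written with it. The sketch below is an illustration rather than part of the CodeSearchNet code: `load_pickle` and `license_title` are hypothetical helper names, and treating the first non-empty line of the license text as its name is a heuristic, not something the data format guarantees. Only unpickle files from a source you trust, because loading a pickle can execute arbitrary code.

```python
import pickle
from typing import Any, Dict, Optional, Tuple


def load_pickle(path: str) -> Any:
    """Load a *_licenses.pkl or *_dedupe_definitions_v2.pkl file.

    Assumption: the file was written with the standard pickle module.
    Only unpickle trusted files; pickle can run arbitrary code on load.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


def license_title(licenses: Dict[str, Tuple[str, str]], repo: str) -> Optional[str]:
    """Return the first non-empty line of a repository's license text, or None.

    `repo` is an 'owner/name' key such as 'pandas-dev/pandas'. Using the
    first line as the license name is a heuristic for illustration only.
    """
    entry = licenses.get(repo)
    if entry is None:
        return None
    _path, text = entry  # value is a (path, license content) tuple
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return None
```

With the pandas entry from the example above, `license_title(licenses, "pandas-dev/pandas")` would return `'BSD 3-Clause License'`.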

## Data Format

See [DATA_FORMAT.md](docs/DATA_FORMAT.md) for documentation and an example of how the data is stored.
File diff hidden because one or more lines are too long.
File diff hidden because one or more lines are too long.
@ -0,0 +1,100 @@
query
convert int to string
priority queue
string to date
sort string list
save list to file
postgresql connection
confusion matrix
set working directory
group by count
binomial distribution
aes encryption
linear regression
socket recv timeout
write csv
convert decimal to hex
export to excel
scatter plot
convert json to csv
pretty print json
replace in file
k means clustering
connect to sql
html encode string
finding time elapsed using a timer
parse binary file to custom class
get current ip address
convert int to bool
read text file line by line
get executable path
httpclient post json
get inner html
convert string to number
format date
readonly array
filter array
map to json
parse json file
get current observable value
get name of enumerated value
encode url
create cookie
how to empty array
how to get current date
how to make the checkbox checked
initializing array
how to reverse a string
read properties file
copy to clipboard
convert html to pdf
json to xml conversion
how to randomly pick a number
normal distribution
nelder mead optimize
hash set for counting distinct elements
how to get database table name
deserialize json
find int in string
get current process id
regex case insensitive
custom http error response
how to determine a string is a valid word
html entities replace
set file attrib hidden
sorting multiple arrays based on another arrays sorted order
string similarity levenshtein
how to get html of website
buffered file reader read text
encrypt aes ctr mode
matrix multiply
print model summary
unique elements
extract data from html content
heatmap from 3d coordinates
get all parents of xml node
how to extract zip file recursively
underline text in label widget
unzipping large files
copying a file to a path
get the description of a http status code
randomly extract x items from a list
convert a date string into yyyymmdd
convert a utc time to epoch
all permutations of a list
extract latitude and longitude from given input
how to check if a checkbox is checked
converting uint8 array to image
memoize to disk - persistent memoization
parse command line argument
how to read the contents of a .gz compressed file?
sending binary data over a serial connection
extracting data from a text file
positions of substrings in string
reading element from html - <td>
deducting the median from each column
concatenate several file remove header lines
parse query string in url
fuzzy match ranking
output to html file
how to read .csv file in an efficient way?
@ -0,0 +1,2 @@
*
!.gitignore