Make gather an optional component (#9755)

* Update .npmrc to get latest python

* Update to latest analysis and fix gather icon

* Use sliceLatestExecution and fix
cell executionId for gather.

* Fix import of python analysis

* Update to python-analysis 0.4.12

* Get gathered text from cell.textSlice

* Updates to fix gather logging

* All updates to use the new notebook-owned gather.

* Working cell execution count codelenses.

* Rename gather* to match existing classes
and interfaces

* Update to latest python analysis

* Lock python analysis version

* Fix MockJupyterNotebook

* Fix casing of getgatherProvider

* Private Gather support #1
Dynamically look for python-program-analysis. If it's not there,
gather is unsupported (see the sketch after this list).

* Ensure cellHashProvider can be found

* Revert back to uri for NBExecutionActivated

* Update new jupyterServerWrapper

* Workaround fix for broken cell hashes
Being fixed by Ian in a separate branch.

* Gracefully fail if python-program-analysis is absent

* Resolve obvious unit test problems.

* Fixes to make sure that gather.ts is loadable.

* Optional gather build, editing, running works.

* Remove unnecessary getAll of IGatherProviders.

* Code cleanup

* Fix gather notebook header
- Also remove reference to python-program-analysis in package.json

* Gather icon not only on hover

* Enable gather only in insiders

* Init gather only if enabled.

* Add link to survey

* Fix spacing in initialization.yml

* Make linter and prettier happy

* Make webpack happy with regard to gather

* Don't define ENABLE_GATHER env var

* Fix a couple of minor issues found in tests

* Fix a few tests

* Everything but dataScienceIocContainer.ts

* More fixes

* Whoops, fix IW functional test back to original

* Temporarily include datascienceioccontainer in 120
character line length.

* Make linter finally happy.

* Fix Gather functional tests

* A bit more cleanup

* React to PR feedback

* Couple more PR review changes.

* Tweak

* More minor cleanup

* Fix provider finders.

* Actually fix the provider get problem.

* Fix smoke test

* Fix unit test
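In outline, the change makes gather degrade gracefully when its analysis package is missing. Below is a minimal sketch of that optional-dependency pattern, matching the dynamic require in the gather.ts diff further down; the gatherSupported name is illustrative, not from the PR:

// Probe for the optional package at runtime instead of importing it statically.
// tslint:disable-next-line: no-require-imports
let ppa: unknown;
try {
    ppa = require('@msrvida/python-program-analysis');
} catch {
    // Public builds ship without the package, so gather simply stays disabled.
    ppa = undefined;
}
const gatherSupported = ppa !== undefined;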
Jim Griesmer 2020-03-08 10:42:29 -07:00, committed by GitHub
Parent 499d4a7e28
Commit d1b22042b3
No key found matching this signature
GPG key ID: 4AEE18F83AFDEB23
77 changed files: 20464 additions and 320 deletions

.npmrc

@@ -1 +1 @@
@types:registry=https://registry.npmjs.org
@types:registry=https://registry.npmjs.org


@@ -8,6 +8,12 @@ module.exports = {
options: {
tabWidth: 2
}
},
{
files: ['**/datascience/serviceRegistry.ts'],
options: {
printWidth: 240
}
}
]
};


@@ -18,6 +18,7 @@ parameters:
compile: 'true'
sqlite: 'false'
installVSCEorNPX: 'true'
enableGather: 'false'
steps:
- bash: |
@@ -92,6 +93,15 @@ steps:
verbose: true
customCommand: ci
- task: Npm@1
displayName: "Install optional python-program-analysis"
condition: and(succeeded(), eq('${{ parameters.enableGather }}', 'true'))
inputs:
workingDir: ${{ parameters.workingDirectory }}
command: custom
verbose: true
customCommand: "install @msrvida/python-program-analysis"
# On Mac, the command `node` doesn't always point to the current node version.
# Debugger tests use the variable process.env.NODE_PATH
- script: |
@@ -101,7 +111,7 @@
displayName: "Setup NODE_PATH for extension (Debugger Tests)"
condition: and(succeeded(), eq(variables['agent.os'], 'Darwin'))
# Install vsce
# Install vsce
- bash: |
npm install -g vsce
displayName: "Install vsce"


@@ -13,6 +13,9 @@ const configFileName = path.join(constants.ExtensionRootDir, 'tsconfig.extension
const existingModulesInOutDir = common.getListOfExistingModulesInOutDir();
// tslint:disable-next-line:no-var-requires no-require-imports
const FileManagerPlugin = require('filemanager-webpack-plugin');
// If the ENABLE_GATHER variable is defined, don't exclude the python-program-analysis package.
// See externals, below.
const ppaPackageList = process.env.ENABLE_GATHER ? [] : ['@msrvida/python-program-analysis'];
const config = {
mode: 'production',
target: 'node',
@@ -56,7 +59,10 @@ const config = {
{ enforce: 'post', test: /linebreak[\/\\]src[\/\\]linebreaker.js/, loader: 'transform-loader?brfs' }
]
},
externals: ['vscode', 'commonjs', ...existingModulesInOutDir],
// Packages listed in externals keep webpack from trying to bundle them.
// ppaPackageList is non-empty only when the build pipeline has not been
// authenticated to install @msrvida/python-program-analysis, so the missing
// package is treated as external rather than breaking the bundle.
externals: ['vscode', 'commonjs', ...ppaPackageList, ...existingModulesInOutDir],
plugins: [
...common.getDefaultPlugins('extension'),
// Copy pdfkit bits after extension builds. webpack can't handle pdfkit.

package-lock.json (generated)

@@ -1453,11 +1453,6 @@
"tinyqueue": "^1.1.0"
}
},
"@msrvida/python-program-analysis": {
"version": "0.4.1",
"resolved": "https://registry.npmjs.org/@msrvida/python-program-analysis/-/python-program-analysis-0.4.1.tgz",
"integrity": "sha512-8jCtPTTxXyvN3udvz71inScvFvQil7Wh61newdrq79CjuV0W634GCDGuyGpI5G+kgX9PbBZPQJTc1+BMNMB5sQ=="
},
"@nteract/markdown": {
"version": "3.0.1",
"resolved": "https://registry.npmjs.org/@nteract/markdown/-/markdown-3.0.1.tgz",


@@ -1833,13 +1833,13 @@
"python.dataScience.enableGather": {
"type": "boolean",
"default": false,
"description": "Enable code gather for executed cells. For a gathered cell, that cell and only the code it depends on will be exported to a new notebook.",
"description": "Enable experimental code gathering for executed cells. For a gathered cell, that cell and only the code it depends on will be exported to a new notebook.",
"scope": "resource"
},
"python.dataScience.gatherToScript": {
"type": "boolean",
"default": true,
"description": "Gather code to a python script rather than a notebook.",
"description": "If experimental code gather is enabled, gather code to a python script rather than a notebook.",
"scope": "resource"
},
"python.dataScience.codeLenses": {
@@ -2868,7 +2868,6 @@
"@jupyterlab/services": "^4.2.0",
"@koa/cors": "^3.0.0",
"@loadable/component": "^5.12.0",
"@msrvida/python-program-analysis": "^0.4.1",
"ansi-regex": "^4.1.0",
"arch": "^2.1.0",
"azure-storage": "^2.10.3",


@@ -418,7 +418,7 @@
"DataScience.findJupyterCommandProgressCheckInterpreter": "Checking {0}.",
"DataScience.findJupyterCommandProgressSearchCurrentPath": "Searching current path.",
"DataScience.gatheredScriptDescription": "# This file contains only the code required to produce the results of the gathered cell.\n",
"DataScience.gatheredNotebookDescriptionInMarkdown": "# Gathered Notebook\nGenerated from ```{0}```\n\nThis notebook contains only the code and cells required to produce the same results as the gathered cell.\n\nPlease note that the python analysis is quite conservative, so if it is unsure whether a line of code is necessary for execution, it will err on the side of including it.",
"DataScience.gatheredNotebookDescriptionInMarkdown": "## Gathered Notebook\nGenerated from ```{0}```\n\nThis notebook contains only the code and cells required to produce the same results as the gathered cell.\n\nPlease note that the python analysis is quite conservative, so if it is unsure whether a line of code is necessary for execution, it will err on the side of including it.\n\nAs this is an experimental feature, please let us know how well Gather works for you at [https://aka.ms/gathersurvey](https://aka.ms/gathersurvey)",
"DataScience.savePngTitle": "Save Image",
"DataScience.jupyterSelectURIQuickPickTitle": "Pick how to connect to Jupyter",
"DataScience.jupyterSelectURIQuickPickPlaceholder": "Choose an option",


@@ -726,7 +726,7 @@ export namespace DataScience {
);
export const gatheredNotebookDescriptionInMarkdown = localize(
'DataScience.gatheredNotebookDescriptionInMarkdown',
'# Gathered Notebook\nGenerated from ```{0}```\n\nThis notebook contains only the code and cells required to produce the same results as the gathered cell.\n\nPlease note that the python analysis is quite conservative, so if it is unsure whether a line of code is necessary for execution, it will err on the side of including it.'
'## Gathered Notebook\nGenerated from ```{0}```\n\nThis notebook contains only the code and cells required to produce the same results as the gathered cell.\n\nPlease note that the python analysis is quite conservative, so if it is unsure whether a line of code is necessary for execution, it will err on the side of including it.\n\nAs this is an experimental feature, please let us know how well Gather works for you at [https://aka.ms/gathersurvey](https://aka.ms/gathersurvey)'
);
export const savePngTitle = localize('DataScience.savePngTitle', 'Save Image');
export const fallbackToUseActiveInterpeterAsKernel = localize(


@@ -0,0 +1,48 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.
'use strict';
import { inject, injectable } from 'inversify';
import { traceError } from '../../common/logger';
import { noop } from '../../common/utils/misc';
import { Identifiers } from '../constants';
import { ICell, ICellHashLogger, ICellHashProvider } from '../types';
import { CellHashProvider } from './cellhashprovider';
// This class listens for cell executions and feeds them to the cell hash provider so that
// hashes are available for debugging jupyter cells.
@injectable()
export class CellHashLogger implements ICellHashLogger {
constructor(@inject(ICellHashProvider) private provider: ICellHashProvider) {}
public async preExecute(cell: ICell, silent: boolean): Promise<void> {
const providerObj: CellHashProvider = this.provider as CellHashProvider;
try {
if (!silent) {
// Don't log empty cells
const stripped = providerObj.extractExecutableLines(cell);
if (stripped.length > 0 && stripped.find(s => s.trim().length > 0)) {
// When the user adds new code, we know the execution count is increasing
providerObj.incExecutionCount();
// Skip hash on unknown file though
if (cell.file !== Identifiers.EmptyFileName) {
await providerObj.addCellHash(cell, providerObj.getExecutionCount());
}
}
}
} catch (exc) {
// Don't let exceptions in a preExecute mess up normal operation
traceError(exc);
}
}
public async postExecute(_cell: ICell, _silent: boolean): Promise<void> {
noop();
}
public getCellHashProvider(): ICellHashProvider {
return this.provider;
}
}


@@ -21,8 +21,7 @@ import {
ICellHashListener,
ICellHashProvider,
IFileHashes,
IInteractiveWindowListener,
INotebookExecutionLogger
IInteractiveWindowListener
} from '../types';
interface IRangedCellHash extends ICellHash {
@@ -36,7 +35,7 @@ interface IRangedCellHash extends ICellHash {
// This class provides hashes for debugging jupyter cells. Call getHashes just before starting debugging to compute all of the
// hashes for cells.
@injectable()
export class CellHashProvider implements ICellHashProvider, IInteractiveWindowListener, INotebookExecutionLogger {
export class CellHashProvider implements ICellHashProvider, IInteractiveWindowListener {
// tslint:disable-next-line: no-any
private postEmitter: EventEmitter<{ message: string; payload: any }> = new EventEmitter<{
message: string;
@@ -44,8 +43,8 @@ export class CellHashProvider implements ICellHashProvider, IInteractiveWindowLi
payload: any;
}>();
// Map of file to Map of start line to actual hash
private hashes: Map<string, IRangedCellHash[]> = new Map<string, IRangedCellHash[]>();
private executionCount: number = 0;
private hashes: Map<string, IRangedCellHash[]> = new Map<string, IRangedCellHash[]>();
private updateEventEmitter: EventEmitter<void> = new EventEmitter<void>();
constructor(
@@ -127,19 +126,7 @@ export class CellHashProvider implements ICellHashProvider, IInteractiveWindowLi
noop();
}
private onChangedDocument(e: TextDocumentChangeEvent) {
// See if the document is in our list of docs to watch
const perFile = this.hashes.get(e.document.fileName);
if (perFile) {
// Apply the content changes to the file's cells.
const docText = e.document.getText();
e.contentChanges.forEach(c => {
this.handleContentChange(docText, c, perFile);
});
}
}
private extractExecutableLines(cell: ICell): string[] {
public extractExecutableLines(cell: ICell): string[] {
const cellMatcher = new CellMatcher(this.configService.getSettings(getCellResource(cell)).datascience);
const lines = splitMultilineString(cell.data.source);
// Only strip this off the first line. Otherwise we want the markers in the code.
@@ -149,44 +136,7 @@ export class CellHashProvider implements ICellHashProvider, IInteractiveWindowLi
return lines;
}
private handleContentChange(docText: string, c: TextDocumentContentChangeEvent, hashes: IRangedCellHash[]) {
// First compute the number of lines that changed
const lineDiff = c.range.start.line - c.range.end.line + c.text.split('\n').length - 1;
const offsetDiff = c.text.length - c.rangeLength;
// Compute the inclusive offset that is changed by the cell.
const endChangedOffset = c.rangeLength <= 0 ? c.rangeOffset : c.rangeOffset + c.rangeLength - 1;
hashes.forEach(h => {
// See how this existing cell compares to the change
if (h.endOffset < c.rangeOffset) {
// No change. This cell is entirely before the change
} else if (h.startOffset > endChangedOffset) {
// This cell is after the text that got replaced. Adjust its start/end lines
h.line += lineDiff;
h.endLine += lineDiff;
h.startOffset += offsetDiff;
h.endOffset += offsetDiff;
} else if (h.startOffset === endChangedOffset) {
// Cell intersects but exactly, might be a replacement or an insertion
if (h.deleted || c.rangeLength > 0 || lineDiff === 0) {
// Replacement
h.deleted = docText.substr(h.startOffset, h.endOffset - h.startOffset) !== h.realCode;
} else {
// Insertion
h.line += lineDiff;
h.endLine += lineDiff;
h.startOffset += offsetDiff;
h.endOffset += offsetDiff;
}
} else {
// Intersection, delete if necessary
h.deleted = docText.substr(h.startOffset, h.endOffset - h.startOffset) !== h.realCode;
}
});
}
private async addCellHash(cell: ICell, expectedCount: number): Promise<void> {
public async addCellHash(cell: ICell, expectedCount: number): Promise<void> {
// Find the text document that matches. We need more information than
// the add code gives us
const doc = this.documentManager.textDocuments.find(d => this.fileSystem.arePathsSame(d.fileName, cell.file));
@@ -284,6 +234,63 @@ export class CellHashProvider implements ICellHashProvider, IInteractiveWindowLi
}
}
public getExecutionCount(): number {
return this.executionCount;
}
public incExecutionCount(): void {
this.executionCount += 1;
}
private onChangedDocument(e: TextDocumentChangeEvent) {
// See if the document is in our list of docs to watch
const perFile = this.hashes.get(e.document.fileName);
if (perFile) {
// Apply the content changes to the file's cells.
const docText = e.document.getText();
e.contentChanges.forEach(c => {
this.handleContentChange(docText, c, perFile);
});
}
}
private handleContentChange(docText: string, c: TextDocumentContentChangeEvent, hashes: IRangedCellHash[]) {
// First compute the number of lines that changed
const lineDiff = c.range.start.line - c.range.end.line + c.text.split('\n').length - 1;
const offsetDiff = c.text.length - c.rangeLength;
// Compute the inclusive offset that is changed by the cell.
const endChangedOffset = c.rangeLength <= 0 ? c.rangeOffset : c.rangeOffset + c.rangeLength - 1;
hashes.forEach(h => {
// See how this existing cell compares to the change
if (h.endOffset < c.rangeOffset) {
// No change. This cell is entirely before the change
} else if (h.startOffset > endChangedOffset) {
// This cell is after the text that got replaced. Adjust its start/end lines
h.line += lineDiff;
h.endLine += lineDiff;
h.startOffset += offsetDiff;
h.endOffset += offsetDiff;
} else if (h.startOffset === endChangedOffset) {
// Cell intersects but exactly, might be a replacement or an insertion
if (h.deleted || c.rangeLength > 0 || lineDiff === 0) {
// Replacement
h.deleted = docText.substr(h.startOffset, h.endOffset - h.startOffset) !== h.realCode;
} else {
// Insertion
h.line += lineDiff;
h.endLine += lineDiff;
h.startOffset += offsetDiff;
h.endOffset += offsetDiff;
}
} else {
// Intersection, delete if necessary
h.deleted = docText.substr(h.startOffset, h.endOffset - h.startOffset) !== h.realCode;
}
});
}
private adjustRuntimeForDebugging(
cell: ICell,
source: string[],


@@ -2,7 +2,7 @@
// Licensed under the MIT License.
'use strict';
import { inject, injectable } from 'inversify';
import { CodeLens, Command, Event, EventEmitter, Range, TextDocument } from 'vscode';
import { CodeLens, Command, Event, EventEmitter, Range, TextDocument, Uri } from 'vscode';
import { traceWarning } from '../../common/logger';
import { IFileSystem } from '../../common/platform/types';
@@ -12,7 +12,18 @@ import { noop } from '../../common/utils/misc';
import { generateCellRangesFromDocument } from '../cellFactory';
import { CodeLensCommands, Commands } from '../constants';
import { InteractiveWindowMessages } from '../interactive-common/interactiveWindowTypes';
import { ICell, ICellHashProvider, ICodeLensFactory, IFileHashes, IInteractiveWindowListener } from '../types';
import {
ICell,
ICellHashLogger,
ICellHashProvider,
ICodeLensFactory,
IFileHashes,
IInteractiveWindowListener,
IInteractiveWindowProvider,
IJupyterExecution,
INotebook,
INotebookExecutionLogger
} from '../types';
@injectable()
export class CodeLensFactory implements ICodeLensFactory, IInteractiveWindowListener {
@@ -24,14 +35,14 @@ export class CodeLensFactory implements ICodeLensFactory, IInteractiveWindowList
payload: any;
}>();
private cellExecutionCounts: Map<string, string> = new Map<string, string>();
private hashProvider: ICellHashProvider | undefined;
constructor(
@inject(IConfigurationService) private configService: IConfigurationService,
@inject(ICellHashProvider) private hashProvider: ICellHashProvider,
@inject(IInteractiveWindowProvider) private interactiveWindowProvider: IInteractiveWindowProvider,
@inject(IJupyterExecution) private jupyterExecution: IJupyterExecution,
@inject(IFileSystem) private fileSystem: IFileSystem
) {
hashProvider.updated(this.hashesUpdated.bind(this));
}
) {}
public dispose(): void {
noop();
@@ -45,6 +56,10 @@ export class CodeLensFactory implements ICodeLensFactory, IInteractiveWindowList
// tslint:disable-next-line: no-any
public onMessage(message: string, payload?: any) {
switch (message) {
case InteractiveWindowMessages.NotebookExecutionActivated:
this.initCellHashProvider(<string>payload).ignoreErrors();
break;
case InteractiveWindowMessages.FinishCell:
const cell = payload as ICell;
if (cell && cell.data && cell.data.execution_count) {
@@ -73,7 +88,9 @@ export class CodeLensFactory implements ICodeLensFactory, IInteractiveWindowList
);
const commands = this.enumerateCommands(document.uri);
const hashes = this.configService.getSettings(document.uri).datascience.addGotoCodeLenses
? this.hashProvider.getHashes()
? this.hashProvider
? this.hashProvider.getHashes()
: []
: [];
const codeLenses: CodeLens[] = [];
let firstCell = true;
@@ -92,6 +109,41 @@ export class CodeLensFactory implements ICodeLensFactory, IInteractiveWindowList
return codeLenses;
}
private async initCellHashProvider(notebookUri: string) {
const nbUri: Uri = Uri.parse(notebookUri);
if (!nbUri) {
return;
}
// First get the active server
const activeServer = await this.jupyterExecution.getServer(
await this.interactiveWindowProvider.getNotebookOptions(nbUri)
);
let nb: INotebook | undefined;
// If that works, see if there's a matching notebook running
if (activeServer) {
nb = await activeServer.getNotebook(nbUri);
// If we have an executing notebook, get its cell hash provider service.
if (nb) {
this.hashProvider = this.getCellHashProvider(nb);
if (this.hashProvider) {
this.hashProvider.updated(this.hashesUpdated.bind(this));
}
}
}
}
private getCellHashProvider(nb: INotebook): ICellHashProvider | undefined {
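// Duck-type the lookup: the cell hash logger is the only notebook execution logger exposing getCellHashProvider.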
const cellHashLogger = <ICellHashLogger>(
nb.getLoggers().find((logger: INotebookExecutionLogger) => (<ICellHashLogger>logger).getCellHashProvider)
);
if (cellHashLogger) {
return cellHashLogger.getCellHashProvider();
}
}
private enumerateCommands(resource: Resource): string[] {
let fullCommandList: string[];
// Add our non-debug commands


@@ -1,7 +1,6 @@
import { CellSlice, DataflowAnalyzer, ExecutionLogSlicer } from '@msrvida/python-program-analysis';
import { Cell as IGatherCell } from '@msrvida/python-program-analysis/dist/es5/cell';
import * as ppatypes from '@msrvida-python-program-analysis';
import { inject, injectable } from 'inversify';
import * as uuid from 'uuid/v4';
import { IApplicationShell, ICommandManager } from '../../common/application/types';
import { traceInfo } from '../../common/logger';
import { IConfigurationService, IDisposableRegistry } from '../../common/types';
@@ -9,15 +8,15 @@ import * as localize from '../../common/utils/localize';
// tslint:disable-next-line: no-duplicate-imports
import { Common } from '../../common/utils/localize';
import { Identifiers } from '../constants';
import { CellState, ICell as IVscCell, IGatherExecution } from '../types';
import { CellState, ICell as IVscCell, IGatherProvider } from '../types';
/**
* An adapter class to wrap the code gathering functionality from [microsoft/python-program-analysis](https://www.npmjs.com/package/@msrvida/python-program-analysis).
*/
@injectable()
export class GatherExecution implements IGatherExecution {
private _executionSlicer: ExecutionLogSlicer<IGatherCell>;
private dataflowAnalyzer: DataflowAnalyzer;
export class GatherProvider implements IGatherProvider {
private _executionSlicer: ppatypes.ExecutionLogSlicer<ppatypes.Cell> | undefined;
private dataflowAnalyzer: ppatypes.DataflowAnalyzer | undefined;
private _enabled: boolean;
constructor(
@@ -26,35 +25,55 @@ export class GatherExecution implements IGatherExecution {
@inject(IDisposableRegistry) private disposables: IDisposableRegistry,
@inject(ICommandManager) private commandManager: ICommandManager
) {
this._enabled = this.configService.getSettings().datascience.enableGather ? true : false;
this.dataflowAnalyzer = new DataflowAnalyzer();
this._executionSlicer = new ExecutionLogSlicer(this.dataflowAnalyzer);
this._enabled =
this.configService.getSettings().datascience.enableGather &&
this.configService.getSettings().insidersChannel !== 'off'
? true
: false;
if (this._enabled) {
this.disposables.push(
this.configService.getSettings(undefined).onDidChange(e => this.updateEnableGather(e))
);
}
try {
// tslint:disable-next-line: no-require-imports
const ppa = require('@msrvida/python-program-analysis') as typeof import('@msrvida-python-program-analysis');
traceInfo('Gathering tools have been activated');
if (ppa) {
this.dataflowAnalyzer = new ppa.DataflowAnalyzer();
this._executionSlicer = new ppa.ExecutionLogSlicer(this.dataflowAnalyzer);
this.disposables.push(
this.configService.getSettings(undefined).onDidChange(e => this.updateEnableGather(e))
);
}
} catch (ex) {
traceInfo('Gathering tools could not be activated. Indicates the VSIX was not built with gather support.');
}
}
}
public logExecution(vscCell: IVscCell): void {
const gatherCell = convertVscToGatherCell(vscCell);
if (gatherCell) {
this._executionSlicer.logExecution(gatherCell);
if (this._executionSlicer) {
this._executionSlicer.logExecution(gatherCell);
}
}
}
public async resetLog(): Promise<void> {
this._executionSlicer.reset();
if (this._executionSlicer) {
this._executionSlicer.reset();
}
}
/**
* For a given code cell, returns a string representing a program containing all the code it depends on.
*/
public gatherCode(vscCell: IVscCell): string {
if (!this._executionSlicer) {
return '# %% [markdown]\n## Gather not available';
}
const gatherCell = convertVscToGatherCell(vscCell);
if (!gatherCell) {
return '';
@@ -65,9 +84,8 @@ export class GatherExecution implements IGatherExecution {
this.configService.getSettings().datascience.defaultCellMarker || Identifiers.DefaultCodeCellMarker;
// Call internal slice method
const slices = this._executionSlicer.sliceAllExecutions(gatherCell.persistentId);
const program =
slices.length > 0 ? slices[0].cellSlices.reduce(concat, '').replace(/#%%/g, defaultCellMarker) : '';
const slice = this._executionSlicer.sliceLatestExecution(gatherCell.persistentId);
const program = slice.cellSlices.reduce(concat, '').replace(/#%%/g, defaultCellMarker);
// Add a comment at the top of the file explaining what gather does
const descriptor = localize.DataScience.gatheredScriptDescription();
@@ -106,27 +124,25 @@
/**
* Accumulator to concatenate cell slices for a sliced program, preserving cell structures.
*/
function concat(existingText: string, newText: CellSlice): string {
function concat(existingText: string, newText: ppatypes.CellSlice): string {
// Include our cell marker so that cell slices are preserved
return `${existingText}#%%\n${newText.textSliceLines}\n\n`;
return `${existingText}#%%\n${newText.textSliceLines}\n`;
}
/**
* This is called to convert VS Code ICells to Gather ICells for logging.
* @param cell A cell object conforming to the VS Code cell interface
*/
function convertVscToGatherCell(cell: IVscCell): IGatherCell | undefined {
function convertVscToGatherCell(cell: IVscCell): ppatypes.Cell | undefined {
// This should always be true since we only want to log code cells. Putting this here so types match for outputs property
if (cell.data.cell_type === 'code') {
const result: IGatherCell = {
const result: ppatypes.Cell = {
// tslint:disable-next-line no-unnecessary-local-variable
text: cell.data.source,
// This may need to change for native notebook support since in the original Gather code this refers to the number of times that this same cell was executed
executionCount: cell.data.execution_count,
executionEventId: cell.id, // This is unique for now, so feed it in
executionEventId: uuid(),
// This may need to change for native notebook support, since this is intended to persist in the metadata for a notebook that is saved and then re-loaded
persistentId: cell.id,
hasError: cell.state === CellState.error
// tslint:disable-next-line: no-any


@@ -13,15 +13,16 @@ import { Identifiers } from '../constants';
import { IInteractiveWindowMapping, InteractiveWindowMessages } from '../interactive-common/interactiveWindowTypes';
import {
ICell,
IGatherExecution,
IGatherLogger,
IGatherProvider,
IInteractiveWindowListener,
IInteractiveWindowProvider,
IJupyterExecution,
INotebook,
INotebookEditorProvider,
INotebookExecutionLogger,
INotebookExporter
} from '../types';
import { GatherLogger } from './gatherLogger';
@injectable()
export class GatherListener implements IInteractiveWindowListener {
@@ -31,11 +32,10 @@ export class GatherListener implements IInteractiveWindowListener {
// tslint:disable-next-line: no-any
payload: any;
}>();
private gatherLogger: GatherLogger;
private notebookUri: Uri | undefined;
private gatherProvider: IGatherProvider | undefined;
constructor(
@inject(IGatherExecution) private gather: IGatherExecution,
@inject(IApplicationShell) private applicationShell: IApplicationShell,
@inject(INotebookExporter) private jupyterExporter: INotebookExporter,
@inject(INotebookEditorProvider) private ipynbProvider: INotebookEditorProvider,
@@ -44,9 +44,7 @@ export class GatherListener implements IInteractiveWindowListener {
@inject(IConfigurationService) private configService: IConfigurationService,
@inject(IDocumentManager) private documentManager: IDocumentManager,
@inject(IFileSystem) private fileSystem: IFileSystem
) {
this.gatherLogger = new GatherLogger(this.gather, this.configService);
}
) {}
public dispose() {
noop();
@@ -61,7 +59,7 @@
public onMessage(message: string, payload?: any): void {
switch (message) {
case InteractiveWindowMessages.NotebookExecutionActivated:
this.handleMessage(message, payload, this.doSetLogger);
this.handleMessage(message, payload, this.doInitGather);
break;
case InteractiveWindowMessages.GatherCodeRequest:
@@ -69,7 +67,9 @@
break;
case InteractiveWindowMessages.RestartKernel:
this.gather.resetLog();
if (this.gatherProvider) {
this.gatherProvider.resetLog();
}
break;
default:
@@ -87,11 +87,17 @@
handler.bind(this)(args);
}
private doSetLogger(payload: string): void {
this.setLogger(payload).ignoreErrors();
private doGather(payload: ICell): void {
this.gatherCodeInternal(payload).catch(err => {
this.applicationShell.showErrorMessage(err);
});
}
private async setLogger(notebookUri: string) {
private doInitGather(payload: string): void {
this.initGather(payload).ignoreErrors();
}
private async initGather(notebookUri: string) {
this.notebookUri = Uri.parse(notebookUri);
// First get the active server
@@ -104,21 +110,25 @@
if (activeServer) {
nb = await activeServer.getNotebook(this.notebookUri);
// If we have an executing notebook, add the gather logger.
// If we have an executing notebook, get its gather execution service.
if (nb) {
nb.addLogger(this.gatherLogger);
this.gatherProvider = this.getGatherProvider(nb);
}
}
}
private doGather(payload: ICell): void {
this.gatherCodeInternal(payload).catch(err => {
this.applicationShell.showErrorMessage(err);
});
private getGatherProvider(nb: INotebook): IGatherProvider | undefined {
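// As with the cell hash logger, duck-type on getGatherProvider to find the gather logger among the notebook's execution loggers.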
const gatherLogger = <IGatherLogger>(
nb.getLoggers().find((logger: INotebookExecutionLogger) => (<IGatherLogger>logger).getGatherProvider)
);
if (gatherLogger) {
return gatherLogger.getGatherProvider();
}
}
private gatherCodeInternal = async (cell: ICell) => {
const slicedProgram = this.gather.gatherCode(cell);
const slicedProgram = this.gatherProvider ? this.gatherProvider.gatherCode(cell) : 'Gather internal error';
if (this.configService.getSettings().datascience.gatherToScript) {
await this.showFile(slicedProgram, cell.file);


@@ -5,13 +5,12 @@ import { concatMultilineStringInput } from '../../../datascience-ui/common';
import { IConfigurationService } from '../../common/types';
import { noop } from '../../common/utils/misc';
import { CellMatcher } from '../cellMatcher';
import { ICell as IVscCell, IGatherExecution, INotebookExecutionLogger } from '../types';
import { GatherExecution } from './gather';
import { ICell as IVscCell, IGatherLogger, IGatherProvider } from '../types';
@injectable()
export class GatherLogger implements INotebookExecutionLogger {
export class GatherLogger implements IGatherLogger {
constructor(
@inject(GatherExecution) private gather: IGatherExecution,
@inject(IGatherProvider) private gather: IGatherProvider,
@inject(IConfigurationService) private configService: IConfigurationService
) {}
@@ -36,4 +35,8 @@ export class GatherLogger implements INotebookExecutionLogger {
}
}
}
public getGatherProvider() {
return this.gather;
}
}


@@ -581,6 +581,7 @@ export abstract class InteractiveBase extends WebViewHost<IInteractiveWindowMapp
}
});
}
const owningResource = await this.getOwningResource();
const observable = this._notebook.executeObservable(code, file, line, id, false);
// Indicate we executed some code
@@ -588,7 +589,7 @@ export abstract class InteractiveBase extends WebViewHost<IInteractiveWindowMapp
// Sign up for cell changes
observable.subscribe(
async (cells: ICell[]) => {
(cells: ICell[]) => {
// Combine the cell data with the possible input data (so we don't lose anything that might have already been in the cells)
const combined = cells.map(this.combineData.bind(undefined, data));
@@ -596,7 +597,7 @@ export abstract class InteractiveBase extends WebViewHost<IInteractiveWindowMapp
this.sendCellsToWebView(combined);
// Any errors will move our result to false (if allowed)
if (this.configuration.getSettings(await this.getOwningResource()).datascience.stopOnError) {
if (this.configuration.getSettings(owningResource).datascience.stopOnError) {
result = result && cells.find(c => c.state === CellState.error) === undefined;
}
},


@@ -359,10 +359,6 @@ export class JupyterNotebookBase implements INotebook {
return this.updateWorkingDirectoryAndPath(file);
}
public addLogger(logger: INotebookExecutionLogger) {
this._loggers.push(logger);
}
public executeObservable(
code: string,
file: string,
@@ -620,6 +616,10 @@
this.kernelChanged.fire(spec);
}
public getLoggers(): INotebookExecutionLogger[] {
return this._loggers;
}
private async initializeMatplotlib(cancelToken?: CancellationToken): Promise<void> {
const settings = this.configService.getSettings(this.resource).datascience;
if (settings && settings.themeMatplotlibPlots) {

Просмотреть файл

@@ -140,10 +140,6 @@ export class GuestJupyterNotebook
return Promise.resolve();
}
public addLogger(_logger: INotebookExecutionLogger): void {
noop();
}
public async setMatplotLibStyle(_useDark: boolean): Promise<void> {
// Guest can't change the style. Maybe output a warning here?
}
@@ -241,6 +237,9 @@
public setKernelSpec(_spec: IJupyterKernelSpec | LiveKernelModel, _timeout: number): Promise<void> {
return Promise.resolve();
}
public getLoggers(): INotebookExecutionLogger[] {
return [];
}
private onServerResponse = (args: Object) => {
const er = args as IExecuteObservableResponse;


@@ -18,14 +18,15 @@ import { DataViewerProvider } from './data-viewing/dataViewerProvider';
import { DataScience } from './datascience';
import { DataScienceSurveyBannerLogger } from './dataScienceSurveyBanner';
import { DebugLocationTrackerFactory } from './debugLocationTrackerFactory';
import { CellHashLogger } from './editor-integration/cellhashLogger';
import { CellHashProvider } from './editor-integration/cellhashprovider';
import { CodeLensFactory } from './editor-integration/codeLensFactory';
import { DataScienceCodeLensProvider } from './editor-integration/codelensprovider';
import { CodeWatcher } from './editor-integration/codewatcher';
import { Decorator } from './editor-integration/decorator';
import { DataScienceErrorHandler } from './errorHandler/errorHandler';
import { GatherExecution } from './gather/gather';
import { GatherListener } from './gather/gatherListener';
import { GatherLogger } from './gather/gatherLogger';
import { DebugListener } from './interactive-common/debugListener';
import { IntellisenseProvider } from './interactive-common/intellisense/intellisenseProvider';
import { LinkProvider } from './interactive-common/linkProvider';
@@ -76,6 +77,7 @@ import { StatusProvider } from './statusProvider';
import { ThemeFinder } from './themeFinder';
import {
ICellHashListener,
ICellHashLogger,
ICellHashProvider,
ICodeCssGenerator,
ICodeLensFactory,
@@ -87,7 +89,8 @@ import {
IDataViewer,
IDataViewerProvider,
IDebugLocationTracker,
IGatherExecution,
IGatherLogger,
IGatherProvider,
IInteractiveWindow,
IInteractiveWindowListener,
IInteractiveWindowProvider,
@@ -112,137 +115,97 @@
IThemeFinder
} from './types';
// README: Did you make sure "dataScienceIocContainer.ts" has also been updated appropriately?
// tslint:disable-next-line: max-func-body-length
export function registerTypes(serviceManager: IServiceManager) {
const useCustomEditorApi = serviceManager.get<IApplicationEnvironment>(IApplicationEnvironment).packageJson
.enableProposedApi;
const useCustomEditorApi = serviceManager.get<IApplicationEnvironment>(IApplicationEnvironment).packageJson.enableProposedApi;
serviceManager.addSingletonInstance<boolean>(UseCustomEditorApi, useCustomEditorApi);
serviceManager.addSingleton<IDataScienceCodeLensProvider>(
IDataScienceCodeLensProvider,
DataScienceCodeLensProvider
);
serviceManager.addSingleton<IDataScience>(IDataScience, DataScience);
serviceManager.addSingleton<IJupyterExecution>(IJupyterExecution, JupyterExecutionFactory);
serviceManager.addSingleton<IDataScienceCommandListener>(
IDataScienceCommandListener,
InteractiveWindowCommandListener
);
serviceManager.addSingleton<IInteractiveWindowProvider>(IInteractiveWindowProvider, InteractiveWindowProvider);
serviceManager.add<IInteractiveWindow>(IInteractiveWindow, InteractiveWindow);
serviceManager.add<INotebookExporter>(INotebookExporter, JupyterExporter);
serviceManager.add<INotebookImporter>(INotebookImporter, JupyterImporter);
serviceManager.add<INotebookServer>(INotebookServer, JupyterServerWrapper);
serviceManager.addSingleton<ICodeCssGenerator>(ICodeCssGenerator, CodeCssGenerator);
serviceManager.addSingleton<IJupyterPasswordConnect>(IJupyterPasswordConnect, JupyterPasswordConnect);
serviceManager.addSingleton<IStatusProvider>(IStatusProvider, StatusProvider);
serviceManager.addSingleton<IJupyterSessionManagerFactory>(
IJupyterSessionManagerFactory,
JupyterSessionManagerFactory
);
serviceManager.addSingleton<IJupyterVariables>(IJupyterVariables, JupyterVariables);
serviceManager.add<ICellHashLogger>(ICellHashLogger, CellHashLogger, undefined, [INotebookExecutionLogger]);
serviceManager.add<ICellHashProvider>(ICellHashProvider, CellHashProvider);
serviceManager.add<ICodeWatcher>(ICodeWatcher, CodeWatcher);
serviceManager.add<IJupyterCommandFactory>(IJupyterCommandFactory, JupyterCommandFactory);
serviceManager.addSingleton<IThemeFinder>(IThemeFinder, ThemeFinder);
serviceManager.addSingleton<IDataViewerProvider>(IDataViewerProvider, DataViewerProvider);
serviceManager.add<IDataScienceErrorHandler>(IDataScienceErrorHandler, DataScienceErrorHandler);
serviceManager.add<IDataViewer>(IDataViewer, DataViewer);
serviceManager.addSingleton<IExtensionSingleActivationService>(IExtensionSingleActivationService, Decorator);
serviceManager.add<IInteractiveWindowListener>(IInteractiveWindowListener, IntellisenseProvider);
serviceManager.add<IInteractiveWindowListener>(IInteractiveWindowListener, LinkProvider);
serviceManager.add<IInteractiveWindowListener>(IInteractiveWindowListener, ShowPlotListener);
serviceManager.add<IInteractiveWindow>(IInteractiveWindow, InteractiveWindow);
serviceManager.add<IInteractiveWindowListener>(IInteractiveWindowListener, AutoSaveService);
serviceManager.add<IInteractiveWindowListener>(IInteractiveWindowListener, DebugListener);
serviceManager.add<IInteractiveWindowListener>(IInteractiveWindowListener, GatherListener);
serviceManager.addSingleton<IPlotViewerProvider>(IPlotViewerProvider, PlotViewerProvider);
serviceManager.add<IPlotViewer>(IPlotViewer, PlotViewer);
serviceManager.addSingleton<IJupyterDebugger>(IJupyterDebugger, JupyterDebugger);
serviceManager.add<IDataScienceErrorHandler>(IDataScienceErrorHandler, DataScienceErrorHandler);
serviceManager.addSingleton<ICodeLensFactory>(ICodeLensFactory, CodeLensFactory);
serviceManager.addSingleton<ICellHashProvider>(ICellHashProvider, CellHashProvider);
serviceManager.add<IGatherExecution>(IGatherExecution, GatherExecution);
serviceManager.addBinding(ICellHashProvider, IInteractiveWindowListener);
serviceManager.addBinding(ICellHashProvider, INotebookExecutionLogger);
serviceManager.addSingleton<IInteractiveWindowListener>(IInteractiveWindowListener, DataScienceSurveyBannerLogger);
serviceManager.addBinding(IJupyterDebugger, ICellHashListener);
serviceManager.addSingleton<INotebookEditorProvider>(
INotebookEditorProvider,
useCustomEditorApi ? NativeEditorProvider : NativeEditorProviderOld
);
serviceManager.add<INotebookStorage>(INotebookStorage, NativeEditorStorage);
serviceManager.add<IInteractiveWindowListener>(IInteractiveWindowListener, IntellisenseProvider);
serviceManager.add<IInteractiveWindowListener>(IInteractiveWindowListener, LinkProvider);
serviceManager.add<IInteractiveWindowListener>(IInteractiveWindowListener, ShowPlotListener);
serviceManager.add<IJupyterCommandFactory>(IJupyterCommandFactory, JupyterCommandFactory);
serviceManager.add<INotebookEditor>(INotebookEditor, useCustomEditorApi ? NativeEditor : NativeEditorOldWebView);
serviceManager.addSingleton<IDataScienceCommandListener>(IDataScienceCommandListener, NativeEditorCommandListener);
serviceManager.addBinding(ICodeLensFactory, IInteractiveWindowListener);
serviceManager.addSingleton<IDebugLocationTracker>(IDebugLocationTracker, DebugLocationTrackerFactory);
serviceManager.addSingleton<JupyterCommandFinder>(JupyterCommandFinder, JupyterCommandFinder);
serviceManager.addSingleton<IExtensionSingleActivationService>(IExtensionSingleActivationService, Activation);
serviceManager.addSingleton<KernelService>(KernelService, KernelService);
serviceManager.addSingleton<NotebookStarter>(NotebookStarter, NotebookStarter);
serviceManager.addSingleton<KernelSelector>(KernelSelector, KernelSelector);
serviceManager.addSingleton<KernelSelectionProvider>(KernelSelectionProvider, KernelSelectionProvider);
serviceManager.addSingleton<CommandRegistry>(CommandRegistry, CommandRegistry);
serviceManager.addSingleton<JupyterServerSelectorCommand>(
JupyterServerSelectorCommand,
JupyterServerSelectorCommand
);
serviceManager.addSingleton<KernelSwitcherCommand>(KernelSwitcherCommand, KernelSwitcherCommand);
serviceManager.addSingleton<KernelSwitcher>(KernelSwitcher, KernelSwitcher);
serviceManager.addSingleton<JupyterServerSelector>(JupyterServerSelector, JupyterServerSelector);
serviceManager.addSingleton<JupyterCommandLineSelectorCommand>(
JupyterCommandLineSelectorCommand,
JupyterCommandLineSelectorCommand
);
serviceManager.addSingleton<JupyterCommandLineSelector>(JupyterCommandLineSelector, JupyterCommandLineSelector);
serviceManager.addSingleton<JupyterInterpreterStateStore>(
JupyterInterpreterStateStore,
JupyterInterpreterStateStore
);
serviceManager.addSingleton<IExtensionSingleActivationService>(
IExtensionSingleActivationService,
JupyterInterpreterSelectionCommand
);
serviceManager.addSingleton<IExtensionSingleActivationService>(
IExtensionSingleActivationService,
PreWarmActivatedJupyterEnvironmentVariables
);
serviceManager.addSingleton<JupyterInterpreterSelector>(JupyterInterpreterSelector, JupyterInterpreterSelector);
serviceManager.addSingleton<JupyterInterpreterDependencyService>(
JupyterInterpreterDependencyService,
JupyterInterpreterDependencyService
);
serviceManager.addSingleton<JupyterInterpreterService>(JupyterInterpreterService, JupyterInterpreterService);
serviceManager.addSingleton<JupyterInterpreterOldCacheStateStore>(
JupyterInterpreterOldCacheStateStore,
JupyterInterpreterOldCacheStateStore
);
serviceManager.add<INotebookExporter>(INotebookExporter, JupyterExporter);
serviceManager.add<INotebookImporter>(INotebookImporter, JupyterImporter);
serviceManager.add<INotebookServer>(INotebookServer, JupyterServerWrapper);
serviceManager.add<INotebookStorage>(INotebookStorage, NativeEditorStorage);
serviceManager.add<IPlotViewer>(IPlotViewer, PlotViewer);
serviceManager.addSingleton<ActiveEditorContextService>(ActiveEditorContextService, ActiveEditorContextService);
serviceManager.addSingleton<ProgressReporter>(ProgressReporter, ProgressReporter);
serviceManager.addSingleton<IExtensionSingleActivationService>(IExtensionSingleActivationService, ServerPreload);
serviceManager.addSingleton<CellOutputMimeTypeTracker>(CellOutputMimeTypeTracker, CellOutputMimeTypeTracker, undefined, [IExtensionSingleActivationService, INotebookExecutionLogger]);
serviceManager.addSingleton<CommandRegistry>(CommandRegistry, CommandRegistry);
serviceManager.addSingleton<DataViewerDependencyService>(DataViewerDependencyService, DataViewerDependencyService);
serviceManager.addSingleton<CellOutputMimeTypeTracker>(CellOutputMimeTypeTracker, CellOutputMimeTypeTracker);
serviceManager.addBinding(CellOutputMimeTypeTracker, IExtensionSingleActivationService);
serviceManager.addBinding(CellOutputMimeTypeTracker, INotebookExecutionLogger);
serviceManager.addSingleton<ICodeCssGenerator>(ICodeCssGenerator, CodeCssGenerator);
serviceManager.addSingleton<ICodeLensFactory>(ICodeLensFactory, CodeLensFactory, undefined, [IInteractiveWindowListener]);
serviceManager.addSingleton<IDataScience>(IDataScience, DataScience);
serviceManager.addSingleton<IDataScienceCodeLensProvider>(IDataScienceCodeLensProvider, DataScienceCodeLensProvider);
serviceManager.addSingleton<IDataScienceCommandListener>(IDataScienceCommandListener, InteractiveWindowCommandListener);
serviceManager.addSingleton<IDataScienceCommandListener>(IDataScienceCommandListener, NativeEditorCommandListener);
serviceManager.addSingleton<IDataViewerProvider>(IDataViewerProvider, DataViewerProvider);
serviceManager.addSingleton<IDebugLocationTracker>(IDebugLocationTracker, DebugLocationTrackerFactory);
serviceManager.addSingleton<IExtensionSingleActivationService>(IExtensionSingleActivationService, Activation);
serviceManager.addSingleton<IExtensionSingleActivationService>(IExtensionSingleActivationService, Decorator);
serviceManager.addSingleton<IExtensionSingleActivationService>(IExtensionSingleActivationService, JupyterInterpreterSelectionCommand);
serviceManager.addSingleton<IExtensionSingleActivationService>(IExtensionSingleActivationService, PreWarmActivatedJupyterEnvironmentVariables);
serviceManager.addSingleton<IExtensionSingleActivationService>(IExtensionSingleActivationService, ServerPreload);
serviceManager.addSingleton<IInteractiveWindowListener>(IInteractiveWindowListener, DataScienceSurveyBannerLogger);
serviceManager.addSingleton<IInteractiveWindowProvider>(IInteractiveWindowProvider, InteractiveWindowProvider);
serviceManager.addSingleton<IJupyterDebugger>(IJupyterDebugger, JupyterDebugger, undefined, [ICellHashListener]);
serviceManager.addSingleton<IJupyterExecution>(IJupyterExecution, JupyterExecutionFactory);
serviceManager.addSingleton<IJupyterPasswordConnect>(IJupyterPasswordConnect, JupyterPasswordConnect);
serviceManager.addSingleton<IJupyterSessionManagerFactory>(IJupyterSessionManagerFactory, JupyterSessionManagerFactory);
serviceManager.addSingleton<IJupyterVariables>(IJupyterVariables, JupyterVariables);
serviceManager.addSingleton<INotebookEditorProvider>(INotebookEditorProvider, useCustomEditorApi ? NativeEditorProvider : NativeEditorProviderOld);
serviceManager.addSingleton<IPlotViewerProvider>(IPlotViewerProvider, PlotViewerProvider);
serviceManager.addSingleton<IStatusProvider>(IStatusProvider, StatusProvider);
serviceManager.addSingleton<IThemeFinder>(IThemeFinder, ThemeFinder);
serviceManager.addSingleton<JupyterCommandFinder>(JupyterCommandFinder, JupyterCommandFinder);
serviceManager.addSingleton<JupyterCommandLineSelector>(JupyterCommandLineSelector, JupyterCommandLineSelector);
serviceManager.addSingleton<JupyterCommandLineSelectorCommand>(JupyterCommandLineSelectorCommand, JupyterCommandLineSelectorCommand);
serviceManager.addSingleton<JupyterInterpreterDependencyService>(JupyterInterpreterDependencyService, JupyterInterpreterDependencyService);
serviceManager.addSingleton<JupyterInterpreterOldCacheStateStore>(JupyterInterpreterOldCacheStateStore, JupyterInterpreterOldCacheStateStore);
serviceManager.addSingleton<JupyterInterpreterSelector>(JupyterInterpreterSelector, JupyterInterpreterSelector);
serviceManager.addSingleton<JupyterInterpreterService>(JupyterInterpreterService, JupyterInterpreterService);
serviceManager.addSingleton<JupyterInterpreterStateStore>(JupyterInterpreterStateStore, JupyterInterpreterStateStore);
serviceManager.addSingleton<JupyterServerSelector>(JupyterServerSelector, JupyterServerSelector);
serviceManager.addSingleton<JupyterServerSelectorCommand>(JupyterServerSelectorCommand, JupyterServerSelectorCommand);
serviceManager.addSingleton<KernelSelectionProvider>(KernelSelectionProvider, KernelSelectionProvider);
serviceManager.addSingleton<KernelSelector>(KernelSelector, KernelSelector);
serviceManager.addSingleton<KernelService>(KernelService, KernelService);
serviceManager.addSingleton<KernelSwitcher>(KernelSwitcher, KernelSwitcher);
serviceManager.addSingleton<KernelSwitcherCommand>(KernelSwitcherCommand, KernelSwitcherCommand);
serviceManager.addSingleton<NotebookStarter>(NotebookStarter, NotebookStarter);
serviceManager.addSingleton<ProgressReporter>(ProgressReporter, ProgressReporter);
// Temporary code, to allow users to revert to the old behavior.
const cfg = serviceManager
.get<IWorkspaceService>(IWorkspaceService)
.getConfiguration('python.dataScience', undefined);
const cfg = serviceManager.get<IWorkspaceService>(IWorkspaceService).getConfiguration('python.dataScience', undefined);
if (cfg.get<boolean>('useOldJupyterServer', false)) {
serviceManager.addSingleton<IJupyterSubCommandExecutionService>(
IJupyterSubCommandExecutionService,
JupyterCommandFinderInterpreterExecutionService
);
serviceManager.addSingleton<IJupyterInterpreterDependencyManager>(
IJupyterInterpreterDependencyManager,
JupyterCommandInterpreterDependencyService
);
serviceManager.addSingleton<IJupyterInterpreterDependencyManager>(IJupyterInterpreterDependencyManager, JupyterCommandInterpreterDependencyService);
serviceManager.addSingleton<IJupyterSubCommandExecutionService>(IJupyterSubCommandExecutionService, JupyterCommandFinderInterpreterExecutionService);
} else {
serviceManager.addSingleton<IJupyterSubCommandExecutionService>(
IJupyterSubCommandExecutionService,
JupyterInterpreterSubCommandExecutionService
);
serviceManager.addSingleton<IJupyterInterpreterDependencyManager>(
IJupyterInterpreterDependencyManager,
JupyterInterpreterSubCommandExecutionService
);
serviceManager.addSingleton<IJupyterInterpreterDependencyManager>(IJupyterInterpreterDependencyManager, JupyterInterpreterSubCommandExecutionService);
serviceManager.addSingleton<IJupyterSubCommandExecutionService>(IJupyterSubCommandExecutionService, JupyterInterpreterSubCommandExecutionService);
}
registerGatherTypes(serviceManager);
}
export function registerGatherTypes(serviceManager: IServiceManager) {
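// Load gather lazily via require rather than a static import so that registration
// still succeeds when the optional @msrvida/python-program-analysis package is absent.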
// tslint:disable-next-line: no-require-imports
const gather = require('./gather/gather');
serviceManager.add<IGatherProvider>(IGatherProvider, gather.GatherProvider);
serviceManager.add<IGatherLogger>(IGatherLogger, GatherLogger, undefined, [INotebookExecutionLogger]);
}


@@ -136,11 +136,11 @@ export interface INotebook extends IAsyncDisposable {
setLaunchingFile(file: string): Promise<void>;
getSysInfo(): Promise<ICell | undefined>;
setMatplotLibStyle(useDark: boolean): Promise<void>;
addLogger(logger: INotebookExecutionLogger): void;
getMatchingInterpreter(): PythonInterpreter | undefined;
getKernelSpec(): IJupyterKernelSpec | LiveKernelModel | undefined;
setKernelSpec(spec: IJupyterKernelSpec | LiveKernelModel, timeoutMS: number): Promise<void>;
setInterpreter(interpeter: PythonInterpreter): void;
getLoggers(): INotebookExecutionLogger[];
}
export interface INotebookServerOptions {
@@ -160,14 +160,19 @@ export interface INotebookExecutionLogger {
postExecute(cell: ICell, silent: boolean): Promise<void>;
}
export const IGatherExecution = Symbol('IGatherExecution');
export interface IGatherExecution {
export const IGatherProvider = Symbol('IGatherProvider');
export interface IGatherProvider {
enabled: boolean;
logExecution(vscCell: ICell): void;
gatherCode(vscCell: ICell): string;
resetLog(): void;
}
export const IGatherLogger = Symbol('IGatherLogger');
export interface IGatherLogger extends INotebookExecutionLogger {
getGatherProvider(): IGatherProvider;
}
export const IJupyterExecution = Symbol('IJupyterExecution');
export interface IJupyterExecution extends IAsyncDisposable {
sessionChanged: Event<void>;
@@ -691,6 +696,13 @@ export const ICellHashProvider = Symbol('ICellHashProvider');
export interface ICellHashProvider {
updated: Event<void>;
getHashes(): IFileHashes[];
getExecutionCount(): number;
incExecutionCount(): void;
}
export const ICellHashLogger = Symbol('ICellHashLogger');
export interface ICellHashLogger extends INotebookExecutionLogger {
getCellHashProvider(): ICellHashProvider;
}
export interface IDebugLocation {


@@ -13,7 +13,8 @@ export class ServiceManager implements IServiceManager {
serviceIdentifier: identifier<T>,
// tslint:disable-next-line:no-any
constructor: new (...args: any[]) => T,
name?: string | number | symbol | undefined
name?: string | number | symbol | undefined,
bindings?: symbol[]
): void {
if (name) {
this.container
@@ -23,6 +24,12 @@ export class ServiceManager implements IServiceManager {
} else {
this.container.bind<T>(serviceIdentifier).to(constructor);
}
if (bindings) {
bindings.forEach(binding => {
this.addBinding(serviceIdentifier, binding);
});
}
}
public addFactory<T>(
factoryIdentifier: interfaces.ServiceIdentifier<interfaces.Factory<T>>,
@@ -39,7 +46,8 @@
serviceIdentifier: identifier<T>,
// tslint:disable-next-line:no-any
constructor: new (...args: any[]) => T,
name?: string | number | symbol | undefined
name?: string | number | symbol | undefined,
bindings?: symbol[]
): void {
if (name) {
this.container
@@ -53,7 +61,14 @@
.to(constructor)
.inSingletonScope();
}
if (bindings) {
bindings.forEach(binding => {
this.addBinding(serviceIdentifier, binding);
});
}
}
public addSingletonInstance<T>(
serviceIdentifier: identifier<T>,
instance: T,


@@ -29,12 +29,14 @@ export interface IServiceManager {
add<T>(
serviceIdentifier: interfaces.ServiceIdentifier<T>,
constructor: ClassType<T>,
name?: string | number | symbol
name?: string | number | symbol | undefined,
bindings?: symbol[]
): void;
addSingleton<T>(
serviceIdentifier: interfaces.ServiceIdentifier<T>,
constructor: ClassType<T>,
name?: string | number | symbol
name?: string | number | symbol,
bindings?: symbol[]
): void;
addSingletonInstance<T>(
serviceIdentifier: interfaces.ServiceIdentifier<T>,
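The new optional bindings parameter is shorthand for a registration followed by addBinding calls. A usage sketch, reusing identifiers from the registrations above:

// One registration that also binds the logger to INotebookExecutionLogger...
serviceManager.add<IGatherLogger>(IGatherLogger, GatherLogger, undefined, [INotebookExecutionLogger]);
// ...is equivalent to the older two-step form:
serviceManager.add<IGatherLogger>(IGatherLogger, GatherLogger);
serviceManager.addBinding(IGatherLogger, INotebookExecutionLogger);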


@@ -157,7 +157,7 @@ export function appendLineFeed(arr: string[], modifier?: (s: string) => string)
export function generateMarkdownFromCodeLines(lines: string[]) {
// Generate markdown by stripping out the comments and markdown header
return appendLineFeed(extractComments(lines.slice(1)));
return appendLineFeed(extractComments(lines.slice(lines.length > 1 ? 1 : 0)));
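// (With a one-line cell there is no separate header line to skip, so slice from 0 to keep the text.)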
}
// tslint:disable-next-line: cyclomatic-complexity


@@ -145,7 +145,6 @@
// This function generates test state when running under a browser instead of inside of
export function generateTestState(filePath: string = '', editable: boolean = false): IMainState {
const defaultSettings = getDefaultSettings();
defaultSettings.enableGather = true;
return {
cellVMs: generateTestVMs(filePath, editable),


@@ -85,6 +85,7 @@ export namespace Helpers {
// and the user has updated the cell text since then.
const newVM = {
...newVMs[index],
hasBeenRun: true,
cell: {
...newVMs[index].cell,
state: arg.payload.data.state,


@@ -575,7 +575,6 @@ export class NativeCell extends React.Component<INativeCellProps> {
'Gather the code required to generate this cell into a new notebook'
)}
hidden={gatherDisabled}
className="hover-cell-button"
>
<Image
baseTheme={this.props.baseTheme}


@@ -166,13 +166,15 @@ import { DataViewer } from '../../client/datascience/data-viewing/dataViewer';
import { DataViewerDependencyService } from '../../client/datascience/data-viewing/dataViewerDependencyService';
import { DataViewerProvider } from '../../client/datascience/data-viewing/dataViewerProvider';
import { DebugLocationTrackerFactory } from '../../client/datascience/debugLocationTrackerFactory';
import { CellHashLogger } from '../../client/datascience/editor-integration/cellhashLogger';
import { CellHashProvider } from '../../client/datascience/editor-integration/cellhashprovider';
import { CodeLensFactory } from '../../client/datascience/editor-integration/codeLensFactory';
import { DataScienceCodeLensProvider } from '../../client/datascience/editor-integration/codelensprovider';
import { CodeWatcher } from '../../client/datascience/editor-integration/codewatcher';
import { DataScienceErrorHandler } from '../../client/datascience/errorHandler/errorHandler';
import { GatherExecution } from '../../client/datascience/gather/gather';
import { GatherProvider } from '../../client/datascience/gather/gather';
import { GatherListener } from '../../client/datascience/gather/gatherListener';
import { GatherLogger } from '../../client/datascience/gather/gatherLogger';
import { IntellisenseProvider } from '../../client/datascience/interactive-common/intellisense/intellisenseProvider';
import { AutoSaveService } from '../../client/datascience/interactive-ipynb/autoSaveService';
import { NativeEditor } from '../../client/datascience/interactive-ipynb/nativeEditor';
@ -213,6 +215,7 @@ import { StatusProvider } from '../../client/datascience/statusProvider';
import { ThemeFinder } from '../../client/datascience/themeFinder';
import {
ICellHashListener,
ICellHashLogger,
ICellHashProvider,
ICodeCssGenerator,
ICodeLensFactory,
@ -224,7 +227,8 @@ import {
IDataViewer,
IDataViewerProvider,
IDebugLocationTracker,
IGatherExecution,
IGatherLogger,
IGatherProvider,
IInteractiveWindow,
IInteractiveWindowListener,
IInteractiveWindowProvider,
@ -513,7 +517,9 @@ export class DataScienceIocContainer extends UnitTestIocContainer {
this.serviceManager.add<IDataScienceErrorHandler>(IDataScienceErrorHandler, DataScienceErrorHandler);
this.serviceManager.add<IInstallationChannelManager>(IInstallationChannelManager, InstallationChannelManager);
this.serviceManager.addSingleton<IJupyterVariables>(IJupyterVariables, JupyterVariables);
this.serviceManager.addSingleton<IJupyterDebugger>(IJupyterDebugger, JupyterDebugger);
this.serviceManager.addSingleton<IJupyterDebugger>(IJupyterDebugger, JupyterDebugger, undefined, [
ICellHashListener
]);
this.serviceManager.addSingleton<IDebugLocationTracker>(IDebugLocationTracker, DebugLocationTrackerFactory);
this.serviceManager.addSingleton<INotebookEditorProvider>(INotebookEditorProvider, TestNativeEditorProvider);
this.serviceManager.addSingleton<DataViewerDependencyService>(
@ -615,15 +621,18 @@ export class DataScienceIocContainer extends UnitTestIocContainer {
this.serviceManager.add<IInteractiveWindowListener>(IInteractiveWindowListener, IntellisenseProvider);
this.serviceManager.add<IInteractiveWindowListener>(IInteractiveWindowListener, AutoSaveService);
this.serviceManager.add<IInteractiveWindowListener>(IInteractiveWindowListener, GatherListener);
this.serviceManager.add<IProtocolParser>(IProtocolParser, ProtocolParser);
this.serviceManager.addSingleton<IDebugService>(IDebugService, MockDebuggerService);
this.serviceManager.addSingleton<ICellHashProvider>(ICellHashProvider, CellHashProvider);
this.serviceManager.addBinding(ICellHashProvider, IInteractiveWindowListener);
this.serviceManager.add<IInteractiveWindowListener>(IInteractiveWindowListener, GatherListener);
this.serviceManager.addBinding(ICellHashProvider, INotebookExecutionLogger);
this.serviceManager.addBinding(IJupyterDebugger, ICellHashListener);
this.serviceManager.add<IGatherExecution>(IGatherExecution, GatherExecution);
this.serviceManager.addSingleton<ICodeLensFactory>(ICodeLensFactory, CodeLensFactory);
this.serviceManager.add<ICellHashProvider>(ICellHashProvider, CellHashProvider);
this.serviceManager.add<ICellHashLogger>(ICellHashLogger, CellHashLogger, undefined, [
INotebookExecutionLogger
]);
this.serviceManager.add<IGatherProvider>(IGatherProvider, GatherProvider);
this.serviceManager.add<IGatherLogger>(IGatherLogger, GatherLogger, undefined, [INotebookExecutionLogger]);
this.serviceManager.addSingleton<ICodeLensFactory>(ICodeLensFactory, CodeLensFactory, undefined, [
IInteractiveWindowListener
]);
this.serviceManager.addSingleton<IShellDetector>(IShellDetector, TerminalNameShellDetector);
this.serviceManager.addSingleton<InterpeterHashProviderFactory>(
InterpeterHashProviderFactory,


@ -8,6 +8,7 @@ import { Position, Range, Uri } from 'vscode';
import { IDebugService } from '../../../client/common/application/types';
import { IFileSystem } from '../../../client/common/platform/types';
import { IConfigurationService, IDataScienceSettings, IPythonSettings } from '../../../client/common/types';
import { CellHashLogger } from '../../../client/datascience/editor-integration/cellhashLogger';
import { CellHashProvider } from '../../../client/datascience/editor-integration/cellhashprovider';
import {
InteractiveWindowMessages,
@ -27,6 +28,7 @@ class HashListener implements ICellHashListener {
// tslint:disable-next-line: max-func-body-length
suite('CellHashProvider Unit Tests', () => {
let hashProvider: CellHashProvider;
let hashLogger: CellHashLogger;
let documentManager: MockDocumentManager;
let configurationService: TypeMoq.IMock<IConfigurationService>;
let dataScienceSettings: TypeMoq.IMock<IDataScienceSettings>;
@ -53,6 +55,7 @@ suite('CellHashProvider Unit Tests', () => {
fileSystem.object,
[hashListener]
);
hashLogger = new CellHashLogger(hashProvider);
});
function addSingleChange(file: string, range: Range, newText: string) {
@ -73,7 +76,7 @@ suite('CellHashProvider Unit Tests', () => {
id: '1',
state: CellState.init
};
return hashProvider.preExecute(cell, false);
return hashLogger.preExecute(cell, false);
}
test('Add a cell and edit it', async () => {


@ -16,12 +16,12 @@ import { CodeLensFactory } from '../../../client/datascience/editor-integration/
import { DataScienceCodeLensProvider } from '../../../client/datascience/editor-integration/codelensprovider';
import { CodeWatcher } from '../../../client/datascience/editor-integration/codewatcher';
import {
ICellHashProvider,
ICodeWatcher,
IDataScienceErrorHandler,
IDebugLocationTracker,
IInteractiveWindow,
IInteractiveWindowProvider
IInteractiveWindowProvider,
IJupyterExecution
} from '../../../client/datascience/types';
import { IServiceContainer } from '../../../client/ioc/types';
import { ICodeExecutionHelper } from '../../../client/terminals/types';
@ -33,6 +33,7 @@ import { createDocument } from './helpers';
suite('DataScience Code Watcher Unit Tests', () => {
let codeWatcher: CodeWatcher;
let interactiveWindowProvider: TypeMoq.IMock<IInteractiveWindowProvider>;
let jupyterExecution: TypeMoq.IMock<IJupyterExecution>;
let activeInteractiveWindow: TypeMoq.IMock<IInteractiveWindow>;
let documentManager: TypeMoq.IMock<IDocumentManager>;
let commandManager: TypeMoq.IMock<ICommandManager>;
@ -45,7 +46,6 @@ suite('DataScience Code Watcher Unit Tests', () => {
let tokenSource: CancellationTokenSource;
let debugService: TypeMoq.IMock<IDebugService>;
let debugLocationTracker: TypeMoq.IMock<IDebugLocationTracker>;
let cellHashProvider: TypeMoq.IMock<ICellHashProvider>;
const contexts: Map<string, boolean> = new Map<string, boolean>();
const pythonSettings = new (class extends PythonSettings {
public fireChangeEvent() {
@ -57,6 +57,7 @@ suite('DataScience Code Watcher Unit Tests', () => {
setup(() => {
tokenSource = new CancellationTokenSource();
interactiveWindowProvider = TypeMoq.Mock.ofType<IInteractiveWindowProvider>();
jupyterExecution = TypeMoq.Mock.ofType<IJupyterExecution>();
activeInteractiveWindow = createTypeMoq<IInteractiveWindow>('history');
documentManager = TypeMoq.Mock.ofType<IDocumentManager>();
textEditor = TypeMoq.Mock.ofType<TextEditor>();
@ -66,7 +67,6 @@ suite('DataScience Code Watcher Unit Tests', () => {
helper = TypeMoq.Mock.ofType<ICodeExecutionHelper>();
commandManager = TypeMoq.Mock.ofType<ICommandManager>();
debugService = TypeMoq.Mock.ofType<IDebugService>();
cellHashProvider = TypeMoq.Mock.ofType<ICellHashProvider>();
// Setup default settings
pythonSettings.datascience = {
@ -104,7 +104,12 @@ suite('DataScience Code Watcher Unit Tests', () => {
// Setup the file system
fileSystem.setup(f => f.arePathsSame(TypeMoq.It.isAnyString(), TypeMoq.It.isAnyString())).returns(() => true);
const codeLensFactory = new CodeLensFactory(configService.object, cellHashProvider.object, fileSystem.object);
const codeLensFactory = new CodeLensFactory(
configService.object,
interactiveWindowProvider.object,
jupyterExecution.object,
fileSystem.object
);
serviceContainer
.setup(c => c.get(TypeMoq.It.isValue(ICodeWatcher)))
.returns(
@ -143,7 +148,12 @@ suite('DataScience Code Watcher Unit Tests', () => {
return Promise.resolve();
});
const codeLens = new CodeLensFactory(configService.object, cellHashProvider.object, fileSystem.object);
const codeLens = new CodeLensFactory(
configService.object,
interactiveWindowProvider.object,
jupyterExecution.object,
fileSystem.object
);
codeWatcher = new CodeWatcher(
interactiveWindowProvider.object,


@ -82,19 +82,34 @@ suite('DataScience gotocell tests', () => {
): Promise<INotebook | undefined> {
// Catch exceptions. Throw a specific assertion if the promise fails
try {
const testDir = path.join(EXTENSION_ROOT_DIR, 'src', 'test', 'datascience');
//const testDir = path.join(EXTENSION_ROOT_DIR, 'src', 'test', 'datascience');
// tslint:disable-next-line: no-invalid-template-strings
const testDir = '${fileDirname}';
const server = await jupyterExecution.connectToNotebookServer({
usingDarkTheme,
useDefaultConfig,
workingDir: testDir,
purpose: purpose ? purpose : '1'
purpose: purpose ? purpose : Identifiers.HistoryPurpose,
enableDebugging: true
});
if (expectFailure) {
assert.ok(false, `Expected server to not be created`);
}
return server
? await server.createNotebook(undefined, Uri.parse(Identifiers.InteractiveWindowIdentity))
: undefined;
if (!server) {
return undefined;
} else {
const nb: INotebook = await server.createNotebook(
undefined,
Uri.parse(Identifiers.InteractiveWindowIdentity)
);
const listener = (codeLensFactory as any) as IInteractiveWindowListener;
listener.onMessage(
InteractiveWindowMessages.NotebookExecutionActivated,
Identifiers.InteractiveWindowIdentity
);
return nb;
}
} catch (exc) {
if (!expectFailure) {
assert.ok(false, `Expected server to be created, but got ${exc}`);


@ -11,8 +11,7 @@ import {
IDisposableRegistry,
IPythonSettings
} from '../../../client/common/types';
import { GatherExecution } from '../../../client/datascience/gather/gather';
import { GatherLogger } from '../../../client/datascience/gather/gatherLogger';
import { GatherProvider } from '../../../client/datascience/gather/gather';
import { ICell as IVscCell } from '../../../client/datascience/types';
// tslint:disable-next-line: max-func-body-length
@ -141,28 +140,30 @@ suite('DataScience code gathering unit tests', () => {
appShell
.setup(a => a.showInformationMessage(TypeMoq.It.isAny(), TypeMoq.It.isAny()))
.returns(() => Promise.resolve(''));
const gatherExecution = new GatherExecution(
const gatherProvider = new GatherProvider(
configurationService.object,
appShell.object,
disposableRegistry.object,
commandManager.object
);
const gatherLogger = new GatherLogger(gatherExecution, configurationService.object);
test('Logs a cell execution', async () => {
let count = 0;
for (const c of codeCells) {
await gatherLogger.postExecute(c, false);
count += 1;
assert.equal(gatherExecution.executionSlicer.executionLog.length, count);
}
});
if (gatherProvider) {
// Disabling this test: by default gather cannot operate without python-program-analysis installed.
// test('Logs a cell execution', async () => {
// let count = 0;
// for (const c of codeCells) {
// await gatherLogger.postExecute(c, false);
// count += 1;
// const logLength = gatherProvider.executionSlicer?.executionLog.length;
// assert.equal(logLength, count);
// }
// });
test('Gathers program slices for a cell', async () => {
const defaultCellMarker = '# %%';
const cell: IVscCell = codeCells[codeCells.length - 1];
const program = gatherExecution.gatherCode(cell);
const expectedProgram = `# This file contains only the code required to produce the results of the gathered cell.\n${defaultCellMarker}\nfrom bokeh.plotting import show, figure, output_notebook\n\n${defaultCellMarker}\nx = [1,2,3,4,5]\ny = [21,9,15,17,4]\n\n${defaultCellMarker}\np=figure(title='demo',x_axis_label='x',y_axis_label='y')\n\n${defaultCellMarker}\np.line(x,y,line_width=2)\n\n${defaultCellMarker}\nshow(p)\n`;
assert.equal(program.trim(), expectedProgram.trim());
});
test('Gathers program slices for a cell', async () => {
const cell: IVscCell = codeCells[codeCells.length - 1];
const program = gatherProvider.gatherCode(cell);
const expectedProgram = '# %% [markdown]\n## Gather not available';
assert.equal(program.trim(), expectedProgram.trim());
});
}
});


@ -919,7 +919,7 @@ Type: builtin_function_or_method`,
// Ignore white space.
assert.equal(
docManager.activeTextEditor.document.getText().trim(),
`# This file contains only the code required to produce the results of the gathered cell.\n${defaultCellMarker}\na=1\na`
`${defaultCellMarker} [markdown]\n## Gather not available`
);
}
},
@ -952,7 +952,7 @@ Type: builtin_function_or_method`,
// Ignore whitespace
assert.equal(
docManager.activeTextEditor.document.getText().trim(),
`# This file contains only the code required to produce the results of the gathered cell.\n${defaultCellMarker}\na=1\na`
`${defaultCellMarker} [markdown]\n## Gather not available`
);
}
},


@ -9,6 +9,8 @@ import { Identifiers } from '../../client/datascience/constants';
import { LiveKernelModel } from '../../client/datascience/jupyter/kernels/types';
import {
ICell,
ICellHashProvider,
IGatherProvider,
IJupyterKernelSpec,
INotebook,
INotebookCompletion,
@ -37,6 +39,23 @@ export class MockJupyterNotebook implements INotebook {
return Uri.parse(Identifiers.InteractiveWindowIdentity);
}
public get onSessionStatusChanged(): Event<ServerStatus> {
if (!this.onStatusChangedEvent) {
this.onStatusChangedEvent = new EventEmitter<ServerStatus>();
}
return this.onStatusChangedEvent.event;
}
public get status(): ServerStatus {
return ServerStatus.Idle;
}
public getGatherProvider(): IGatherProvider | undefined {
throw new Error('Method not implemented.');
}
public getCellHashProvider(): ICellHashProvider | undefined {
throw new Error('Method not implemented.');
}
public get resource(): Resource {
return Uri.file('foo.py');
}
@ -114,14 +133,7 @@ export class MockJupyterNotebook implements INotebook {
return Promise.resolve();
}
public get onSessionStatusChanged(): Event<ServerStatus> {
if (!this.onStatusChangedEvent) {
this.onStatusChangedEvent = new EventEmitter<ServerStatus>();
}
return this.onStatusChangedEvent.event;
}
public get status(): ServerStatus {
return ServerStatus.Idle;
public getLoggers(): INotebookExecutionLogger[] {
return [];
}
}


@ -1230,12 +1230,12 @@ df.head()`;
let imageButtons = cell.find(ImageButton);
assert.equal(imageButtons.length, 6, 'Cell buttons not found');
const deleteButton = imageButtons.at(5);
await getNativeCellResults(ioc, wrapper, async () => {
const afterDelete = await getNativeCellResults(ioc, wrapper, async () => {
deleteButton.simulate('click');
return Promise.resolve();
});
// Should have 3 cells
assert.equal(wrapper.find('NativeCell').length, 3, 'Cell not deleted');
assert.equal(afterDelete.length, 3, 'Cell not deleted');
// Undo the delete
await undo();
@ -1254,12 +1254,12 @@ df.head()`;
imageButtons = cell.find(ImageButton);
assert.equal(imageButtons.length, 6, 'Cell buttons not found');
const moveUpButton = imageButtons.at(0);
await getNativeCellResults(ioc, wrapper, async () => {
const afterMove = await getNativeCellResults(ioc, wrapper, async () => {
moveUpButton.simulate('click');
return Promise.resolve();
});
let foundCell = getOutputCell(wrapper, 'NativeCell', 2)?.instance() as NativeCell;
let foundCell = getOutputCell(afterMove, 'NativeCell', 2)?.instance() as NativeCell;
assert.equal(foundCell.props.cellVM.cell.id, 'NotebookImport#1', 'Cell did not move');
await undo();
foundCell = getOutputCell(wrapper, 'NativeCell', 2)?.instance() as NativeCell;

types/@msrvida-python-program-analysis/cell.d.ts (vendored new file, +30 lines)

@ -0,0 +1,30 @@
/**
* Generic interface for accessing data about a code cell.
*/
export interface Cell {
/**
* The cell's current text.
*/
text: string;
executionCount: number;
/**
* A unique ID generated each time a cell is executed. This lets us disambiguate between two
* runs of a cell that have the same ID *and* execution count, if the kernel was restarted.
* This ID should also be programmed to be *persistent*, so that even after a notebook is
* reloaded, the cell in the same position will still have this ID.
*/
readonly executionEventId: string;
/**
* A persistent ID for a cell in a notebook. This ID will stay the same even as the cell is
* executed, and even when the cell is reloaded from the file.
*/
readonly persistentId: string;
/**
* Whether analysis or execution of this cell has yielded an error.
*/
hasError: boolean;
/**
* Create a deep copy of the cell.
*/
deepCopy: () => Cell;
}

types/@msrvida-python-program-analysis/cellslice.d.ts (vendored new file, +27 lines)

@ -0,0 +1,27 @@
import { Cell } from './cell';
import { LocationSet } from './slice';
export declare class CellSlice {
/**
* Construct an instance of a cell slice.
*/
constructor(cell: Cell, slice: LocationSet, executionTime?: Date);
/**
* Get the text in the slice of a cell.
*/
readonly textSlice: string;
/**
* Get the text of all lines in a slice (no deletions from lines).
*/
readonly textSliceLines: string;
private getTextSlice;
/**
* Get the slice.
*/
/**
* Set the slice.
*/
slice: LocationSet;
readonly cell: Cell;
readonly executionTime: Date;
private _slice;
}

types/@msrvida-python-program-analysis/control-flow.d.ts (vendored new file, +45 lines)

@ -0,0 +1,45 @@
import * as ast from './python-parser';
export declare class Block {
id: number;
readonly hint: string;
statements: ast.SyntaxNode[];
loopVariables: ast.SyntaxNode[];
constructor(id: number, hint: string, statements: ast.SyntaxNode[], loopVariables?: ast.SyntaxNode[]);
toString(): string;
}
export declare class ControlFlowGraph {
private _blocks;
private globalId;
private entry;
private exit;
private successors;
private loopVariables;
constructor(node: ast.SyntaxNode);
private makeBlock;
readonly blocks: Block[];
getSuccessors(block: Block): Block[];
getPredecessors(block: Block): Block[];
print(): void;
private link;
private handleIf;
private handleWhile;
private handleFor;
private handleWith;
private handleTry;
private makeCFG;
/**
* Based on the algorithm in "Engineering a Compiler", 2nd ed., Cooper and Torczon:
* - p479: computing dominance
* - p498-500: dominator trees and frontiers
* - p544: postdominance and reverse dominance frontier
*/
visitControlDependencies(visit: (controlStmt: ast.SyntaxNode, stmt: ast.SyntaxNode) => void): void;
private postdominators;
private immediatePostdominators;
private reverseDominanceFrontiers;
private postdominatorExists;
private getImmediatePostdominator;
private findPostdominators;
private getImmediatePostdominators;
private buildReverseDominanceFrontiers;
}

types/@msrvida-python-program-analysis/data-flow.d.ts (vendored new file, +77 lines)

@ -0,0 +1,77 @@
import * as ast from './python-parser';
import { ControlFlowGraph } from './control-flow';
import { Set } from './set';
import { JsonSpecs, PythonType } from './specs';
declare class DefUse {
DEFINITION: RefSet;
UPDATE: RefSet;
USE: RefSet;
constructor(DEFINITION?: RefSet, UPDATE?: RefSet, USE?: RefSet);
readonly defs: Set<Ref>;
readonly uses: Set<Ref>;
union(that: DefUse): DefUse;
update(newRefs: DefUse): void;
equals(that: DefUse): boolean;
createFlowsFrom(fromSet: DefUse): [Set<Dataflow>, Set<Ref>];
}
/**
* Use a shared dataflow analyzer object for all dataflow analysis / querying for defs and uses.
* It caches defs and uses for each statement, which can save time.
* For caching to work, statements must be annotated with a cell's ID and execution count.
*/
export declare class DataflowAnalyzer {
constructor(moduleMap?: JsonSpecs);
getDefUseForStatement(statement: ast.SyntaxNode, defsForMethodResolution: RefSet): DefUse;
analyze(cfg: ControlFlowGraph, refSet?: RefSet): DataflowAnalysisResult;
getDefs(statement: ast.SyntaxNode, defsForMethodResolution: RefSet): RefSet;
private getClassDefs;
private getFuncDefs;
private getAssignDefs;
private getDelDefs;
private getFromImportDefs;
private getImportDefs;
getUses(statement: ast.SyntaxNode): RefSet;
private getNameUses;
private getClassDeclUses;
private getFuncDeclUses;
private getAssignUses;
private _symbolTable;
private _defUsesCache;
}
export interface Dataflow {
fromNode: ast.SyntaxNode;
toNode: ast.SyntaxNode;
fromRef?: Ref;
toRef?: Ref;
}
export declare enum ReferenceType {
DEFINITION = "DEFINITION",
UPDATE = "UPDATE",
USE = "USE"
}
export declare enum SymbolType {
VARIABLE = 0,
CLASS = 1,
FUNCTION = 2,
IMPORT = 3,
MUTATION = 4,
MAGIC = 5
}
export interface Ref {
type: SymbolType;
level: ReferenceType;
name: string;
inferredType?: PythonType;
location: ast.Location;
node: ast.SyntaxNode;
}
export declare class RefSet extends Set<Ref> {
constructor(...items: Ref[]);
}
export declare function sameLocation(loc1: ast.Location, loc2: ast.Location): boolean;
export declare const GlobalSyntheticVariable = "$global";
export declare type DataflowAnalysisResult = {
dataflows: Set<Dataflow>;
undefinedRefs: RefSet;
};
export {};
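Putting the parser, control-flow, and dataflow declarations together, a minimal sketch of running the analysis (program string hypothetical):

    import { ControlFlowGraph, DataflowAnalyzer, parse } from '@msrvida/python-program-analysis';

    // Build a control-flow graph for a tiny program and analyze its dataflows.
    const tree = parse('x = 1\ny = x + 1\n');
    const cfg = new ControlFlowGraph(tree);
    const analyzer = new DataflowAnalyzer();
    const { dataflows, undefinedRefs } = analyzer.analyze(cfg);

    // Each dataflow links a defining statement to a using one, e.g. the
    // assignment to x on line 1 flows into its use on line 2.
    console.log(dataflows.size, undefinedRefs.items.map(r => r.name));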

types/@msrvida-python-program-analysis/genspec/main.d.ts (vendored new file, +1 line)

@ -0,0 +1 @@
export {};


@ -0,0 +1,15 @@
import * as py from '../python-parser';
import { ModuleSpec, FunctionDescription } from "..";
export declare class ModuleSpecWalker {
spec: ModuleSpec<FunctionDescription>;
constructor();
private static lookForSideEffects;
onEnterNode(node: py.SyntaxNode, ancestors: py.SyntaxNode[]): void;
}
export declare class HeuristicTransitiveClosure {
private moduleSpec;
constructor(moduleSpec: ModuleSpec<FunctionDescription>);
private transferSideEffectsAcrossCalls;
private recordSideEffects;
onEnterNode(node: py.SyntaxNode, ancestors: py.SyntaxNode[]): void;
}

types/@msrvida-python-program-analysis/graph.d.ts (vendored new file, +10 lines)

@ -0,0 +1,10 @@
export declare class Graph<T> {
private getIdentifier;
private outgoing;
private incoming;
private _nodes;
constructor(getIdentifier: (item: T) => string);
addEdge(fromNode: T, toNode: T): void;
readonly nodes: T[];
topoSort(): T[];
}

types/@msrvida-python-program-analysis/index.d.ts (vendored new file, +12 lines)

@ -0,0 +1,12 @@
export * from './set';
export * from './python-parser';
export * from './control-flow';
export * from './data-flow';
export * from './printNode';
export * from './specs';
export * from './cell';
export * from './slice';
export * from './cellslice';
export * from './log-slicer';
export * from './program-builder';
export * from './specs/index';

types/@msrvida-python-program-analysis/log-slicer.d.ts (vendored new file, +81 lines)

@ -0,0 +1,81 @@
import { Cell } from './cell';
import { CellSlice } from './cellslice';
import { DataflowAnalyzer } from './data-flow';
import { CellProgram, ProgramBuilder } from './program-builder';
import { LocationSet } from './slice';
/**
* A record of when a cell was executed.
*/
export declare class CellExecution<TCell extends Cell> {
readonly cell: TCell;
readonly executionTime: Date;
constructor(cell: TCell, executionTime: Date);
/**
* Update this method if at some point we only want to save some data about a CellExecution when
* serializing it and saving history.
*/
toJSON(): any;
}
/**
* A slice over a version of executed code.
*/
export declare class SlicedExecution {
executionTime: Date;
cellSlices: CellSlice[];
constructor(executionTime: Date, cellSlices: CellSlice[]);
merge(...slicedExecutions: SlicedExecution[]): SlicedExecution;
}
export declare type CellExecutionCallback<TCell extends Cell> = (exec: CellExecution<TCell>) => void;
/**
* Makes slice on a log of executed cells.
*/
export declare class ExecutionLogSlicer<TCell extends Cell> {
private dataflowAnalyzer;
executionLog: CellExecution<TCell>[];
readonly programBuilder: ProgramBuilder;
/**
* Signal emitted when a cell's execution has been completely processed.
*/
readonly executionLogged: CellExecutionCallback<TCell>[];
/**
* Construct a new execution log slicer.
*/
constructor(dataflowAnalyzer: DataflowAnalyzer);
/**
* Log that a cell has just been executed. The execution time for this cell will be stored
* as the moment at which this method is called.
*/
logExecution(cell: TCell): void;
/**
* Use logExecution instead if a cell has just been run to annotate it with the current time
* as the execution time. This function is intended to be used only to initialize history
* when a notebook is reloaded. However, any method that eventually calls this method will
* notify all observers that this cell has been executed.
*/
addExecutionToLog(cellExecution: CellExecution<TCell>): void;
/**
* Reset the log, removing log records.
*/
reset(): void;
/**
* Get slice for the latest execution of a cell.
*/
sliceLatestExecution(cellId: string, seedLocations?: LocationSet): SlicedExecution;
/**
* Get slices of the necessary code for all executions of a cell.
* Relevant line numbers are relative to the cell's start line (starting at first line = 0).
*/
sliceAllExecutions(cellId: string, seedLocations?: LocationSet): SlicedExecution[];
readonly cellExecutions: ReadonlyArray<CellExecution<TCell>>;
/**
* Get the cell program (tree, defs, uses) for a cell.
*/
getCellProgram(executionEventId: string): CellProgram;
/**
* Returns the cells that directly or indirectly use variables
* that are defined in the given cell. Result is in
* topological order.
* @param executionEventId a cell in the log
*/
getDependentCells(executionEventId: string): Cell[];
}
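A minimal sketch of driving the slicer; the SimpleCell class is hypothetical, and it assumes the cellId passed to sliceLatestExecution is the cell's persistentId:

    import { Cell, DataflowAnalyzer, ExecutionLogSlicer } from '@msrvida/python-program-analysis';

    // Hypothetical minimal Cell implementation, just enough to feed the slicer.
    class SimpleCell implements Cell {
        public hasError = false;
        constructor(
            public text: string,
            public executionCount: number,
            public readonly executionEventId: string,
            public readonly persistentId: string
        ) {}
        public deepCopy = (): Cell =>
            new SimpleCell(this.text, this.executionCount, this.executionEventId, this.persistentId);
    }

    const slicer = new ExecutionLogSlicer<SimpleCell>(new DataflowAnalyzer());
    slicer.logExecution(new SimpleCell('x = 1', 1, 'event-1', 'cell-a'));
    slicer.logExecution(new SimpleCell('y = x + 1', 2, 'event-2', 'cell-b'));

    // Slicing the latest run of cell-b should pull in the definition of x.
    const sliced = slicer.sliceLatestExecution('cell-b');
    console.log(sliced.cellSlices.map(cs => cs.textSlice).join(''));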

types/@msrvida-python-program-analysis/logutil.d.ts (vendored new file, +2 lines)

@ -0,0 +1,2 @@
export declare function startLogging(): void;
export declare function log(message: string, ...args: any[]): void;

types/@msrvida-python-program-analysis/printNode.d.ts (vendored new file, +2 lines)

@ -0,0 +1,2 @@
import { SyntaxNode } from './python-parser';
export declare function printNode(node: SyntaxNode): string;

types/@msrvida-python-program-analysis/program-builder.d.ts (vendored new file, +69 lines)

@ -0,0 +1,69 @@
import { Cell } from './cell';
import * as ast from './python-parser';
import { DataflowAnalyzer, Ref } from './data-flow';
import { NumberSet } from './set';
/**
* Maps to find out what line numbers over a program correspond to what cells.
*/
export declare type CellToLineMap = {
[cellExecutionEventId: string]: NumberSet;
};
export declare type LineToCellMap = {
[line: number]: Cell;
};
/**
* A program built from cells.
*/
export declare class Program {
/**
* Construct a program.
*/
constructor(cellPrograms: CellProgram[]);
readonly text: string;
readonly tree: ast.Module;
readonly cellToLineMap: CellToLineMap;
readonly lineToCellMap: LineToCellMap;
}
/**
* Program fragment for a cell. Used to cache parsing results.
*/
export declare class CellProgram {
/**
* Construct a cell program
*/
constructor(cell: Cell, statements: ast.SyntaxNode[], defs: Ref[], uses: Ref[], hasError: boolean);
readonly cell: Cell;
readonly statements: ast.SyntaxNode[];
readonly defs: Ref[];
readonly uses: Ref[];
readonly hasError: boolean;
usesSomethingFrom(that: CellProgram): boolean;
}
/**
* Builds programs from a list of executed cells.
*/
export declare class ProgramBuilder {
/**
* Construct a program builder.
*/
constructor(dataflowAnalyzer?: DataflowAnalyzer);
/**
* Add cells to the program builder.
*/
add(...cells: Cell[]): void;
/**
* Reset (removing all cells).
*/
reset(): void;
/**
* Build a program from the list of cells. Program will include the cells' contents in
* the order they were added to the log. It will omit cells that raised errors (syntax or
* runtime, except for the last cell).
*/
buildTo(cellExecutionEventId: string): Program;
buildFrom(executionEventId: string): Program;
getCellProgram(executionEventId: string): CellProgram;
getCellProgramsWithSameId(executionEventId: string): CellProgram[];
private _cellPrograms;
private _dataflowAnalyzer;
}

types/@msrvida-python-program-analysis/python-parser.d.ts (vendored new file, +326 lines)

@ -0,0 +1,326 @@
/**
* This is the main interface for parsing code.
* Call this instead of the `parse` method in python3.js.
* If the `parse` method gets an error, all later calls will throw an error.
* This method resets the state of the `parse` method so that doesn't happen.
*/
export declare function parse(program: string): Module;
export declare type SyntaxNode = Module | Import | From | Decorator | Decorate | Def | Parameter | Assignment | Delete | Assert | Pass | Return | Yield | Raise | Continue | Break | Global | Nonlocal | If | Else | While | For | Try | With | Call | Index | Slice | Dot | IfExpr | CompFor | CompIf | Lambda | UnaryOperator | BinaryOperator | Starred | Tuple | ListExpr | SetExpr | DictExpr | Name | Literal | Class | Argument;
interface JisonLocation {
first_line: number;
first_column: number;
last_line: number;
last_column: number;
}
export interface Location extends JisonLocation {
path?: string;
}
export declare function locationString(loc: Location): string;
export declare function locationContains(loc1: Location, loc2: Location): boolean;
export interface Locatable {
location: Location;
cellId?: string;
executionCount?: number;
}
export declare const MODULE = "module";
export interface Module extends Locatable {
type: typeof MODULE;
code: SyntaxNode[];
}
export declare const IMPORT = "import";
export interface Import extends Locatable {
type: typeof IMPORT;
names: {
path: string;
alias?: string;
location: Location;
}[];
}
export declare const FROM = "from";
export interface From extends Locatable {
type: typeof FROM;
base: string;
imports: {
path: string;
alias: string;
location: Location;
}[];
}
export declare const DECORATOR = "decorator";
export interface Decorator extends Locatable {
type: typeof DECORATOR;
decorator: string;
args: SyntaxNode[];
}
export declare const DECORATE = "decorate";
export interface Decorate extends Locatable {
type: typeof DECORATE;
decorators: Decorator[];
def: SyntaxNode;
}
export declare const DEF = "def";
export interface Def extends Locatable {
type: typeof DEF;
name: string;
params: Parameter[];
code: SyntaxNode[];
}
export declare const PARAMETER = "parameter";
export interface Parameter extends Locatable {
type: typeof PARAMETER;
name: string;
anno: SyntaxNode;
default_value: SyntaxNode;
star: boolean;
starstar: boolean;
}
export declare const ASSIGN = "assign";
export interface Assignment extends Locatable {
type: typeof ASSIGN;
op: string | undefined;
targets: SyntaxNode[];
sources: SyntaxNode[];
}
export declare const DEL = "del";
export interface Delete extends Locatable {
type: typeof DEL;
targets: SyntaxNode[];
}
export declare const ASSERT = "assert";
export interface Assert extends Locatable {
type: typeof ASSERT;
cond: SyntaxNode;
err: SyntaxNode;
}
export declare const PASS = "pass";
export interface Pass extends Locatable {
type: typeof PASS;
}
export declare const RETURN = "return";
export interface Return extends Locatable {
type: typeof RETURN;
values: SyntaxNode[];
}
export declare const YIELD = "yield";
export interface Yield extends Locatable {
type: typeof YIELD;
value: SyntaxNode[];
from?: SyntaxNode;
}
export declare const RAISE = "raise";
export interface Raise extends Locatable {
type: typeof RAISE;
err: SyntaxNode;
}
export declare const BREAK = "break";
export interface Break extends Locatable {
type: typeof BREAK;
}
export declare const CONTINUE = "continue";
export interface Continue extends Locatable {
type: typeof CONTINUE;
}
export declare const GLOBAL = "global";
export interface Global extends Locatable {
type: typeof GLOBAL;
names: string[];
}
export declare const NONLOCAL = "nonlocal";
export interface Nonlocal extends Locatable {
type: typeof NONLOCAL;
names: string[];
}
export declare const IF = "if";
export interface If extends Locatable {
type: typeof IF;
cond: SyntaxNode;
code: SyntaxNode[];
elif: {
cond: SyntaxNode;
code: SyntaxNode[];
}[];
else: Else;
}
export declare const WHILE = "while";
export interface While extends Locatable {
type: typeof WHILE;
cond: SyntaxNode;
code: SyntaxNode[];
else: SyntaxNode[];
}
export declare const ELSE = "else";
export interface Else extends Locatable {
type: typeof ELSE;
code: SyntaxNode[];
}
export declare const FOR = "for";
export interface For extends Locatable {
type: typeof FOR;
target: SyntaxNode[];
iter: SyntaxNode[];
code: SyntaxNode[];
else?: SyntaxNode[];
decl_location: Location;
}
export declare const COMPFOR = "comp_for";
export interface CompFor extends Locatable {
type: typeof COMPFOR;
for: SyntaxNode[];
in: SyntaxNode;
}
export declare const COMPIF = "comp_if";
export interface CompIf extends Locatable {
type: typeof COMPIF;
test: SyntaxNode;
}
export declare const TRY = "try";
export interface Try extends Locatable {
type: typeof TRY;
code: SyntaxNode[];
excepts: {
cond: SyntaxNode;
name: string;
code: SyntaxNode[];
}[];
else: SyntaxNode[];
finally: SyntaxNode[];
}
export declare const WITH = "with";
export interface With extends Locatable {
type: typeof WITH;
items: {
with: SyntaxNode;
as: SyntaxNode;
}[];
code: SyntaxNode[];
}
export declare const CALL = "call";
export interface Call extends Locatable {
type: typeof CALL;
func: SyntaxNode;
args: Argument[];
}
export declare const ARG = "arg";
export interface Argument extends Locatable {
type: typeof ARG;
actual: SyntaxNode;
keyword?: SyntaxNode;
loop?: CompFor;
varargs?: boolean;
kwargs?: boolean;
}
export declare const INDEX = "index";
export interface Index extends Locatable {
type: typeof INDEX;
value: SyntaxNode;
args: SyntaxNode[];
}
export declare const SLICE = "slice";
export interface Slice extends Locatable {
type: typeof SLICE;
start?: SyntaxNode;
stop?: SyntaxNode;
step?: SyntaxNode;
}
export declare const DOT = "dot";
export interface Dot extends Locatable {
type: typeof DOT;
value: SyntaxNode;
name: string;
}
export declare const IFEXPR = "ifexpr";
export interface IfExpr extends Locatable {
type: typeof IFEXPR;
test: SyntaxNode;
then: SyntaxNode;
else: SyntaxNode;
}
export declare const LAMBDA = "lambda";
export interface Lambda extends Locatable {
type: typeof LAMBDA;
args: Parameter[];
code: SyntaxNode;
}
export declare const UNOP = "unop";
export interface UnaryOperator extends Locatable {
type: typeof UNOP;
op: string;
operand: SyntaxNode;
}
export declare const BINOP = "binop";
export interface BinaryOperator extends Locatable {
type: typeof BINOP;
op: string;
left: SyntaxNode;
right: SyntaxNode;
}
export declare const STARRED = "starred";
export interface Starred extends Locatable {
type: typeof STARRED;
value: SyntaxNode;
}
export declare const TUPLE = "tuple";
export interface Tuple extends Locatable {
type: typeof TUPLE;
items: SyntaxNode[];
}
export declare const LIST = "list";
export interface ListExpr extends Locatable {
type: typeof LIST;
items: SyntaxNode[];
}
export declare const SET = "set";
export interface SetExpr extends Locatable {
type: typeof SET;
entries: SyntaxNode[];
comp_for?: SyntaxNode[];
}
export declare const DICT = "dict";
export interface DictExpr extends Locatable {
type: typeof DICT;
entries: {
k: SyntaxNode;
v: SyntaxNode;
}[];
comp_for?: SyntaxNode[];
}
export declare const NAME = "name";
export interface Name extends Locatable {
type: typeof NAME;
id: string;
}
export declare const LITERAL = "literal";
export interface Literal extends Locatable {
type: typeof LITERAL;
value: any;
}
export declare const CLASS = "class";
export interface Class extends Locatable {
type: typeof CLASS;
name: string;
extends: SyntaxNode[];
code: SyntaxNode[];
}
/**
* returns whether two syntax nodes are semantically equivalent
*/
export declare function isEquivalent(node1: SyntaxNode, node2: SyntaxNode): boolean;
export declare function flatten<T>(arrayArrays: T[][]): T[];
/**
* Listener for pre-order traversal of the parse tree.
*/
export interface WalkListener {
/**
* Called whenever a node is entered.
*/
onEnterNode?(node: SyntaxNode, ancestors: SyntaxNode[]): void;
/**
* Called whenever a node is exited.
*/
onExitNode?(node: SyntaxNode, ancestors: SyntaxNode[]): void;
}
/**
* Preorder tree traversal with optional listener.
*/
export declare function walk(node: SyntaxNode, walkListener?: WalkListener): SyntaxNode[];
export {};
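A minimal sketch of the parse/walk API declared above (sample program hypothetical):

    import { NAME, Name, parse, walk } from '@msrvida/python-program-analysis';

    // Parse a small program and collect every variable name in a pre-order walk.
    const tree = parse('x = 1\ny = x + 1\n');
    const names: string[] = [];
    walk(tree, {
        onEnterNode: node => {
            if (node.type === NAME) {
                names.push((node as Name).id);
            }
        }
    });
    console.log(names); // expect something like ['x', 'y', 'x']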

types/@msrvida-python-program-analysis/rewrite-magics.d.ts (vendored new file, +82 lines)

@ -0,0 +1,82 @@
/**
* Result of rewriting a magic line.
*/
export declare type Rewrite = {
text?: string;
annotations?: MagicAnnotation[];
};
/**
* An annotation to hold metadata about what a magic is doing.
*/
export declare type MagicAnnotation = {
key: string;
value: string;
};
/**
* Position of a text match for magics.
*/
export declare type MatchPosition = [{
line: number;
col: number;
}, {
line: number;
col: number;
}];
/**
* Interface for command-specific magic rewrites.
*/
export interface LineMagicRewriter {
/**
* Name of the magic command this will apply to.
*/
commandName: string;
/**
* Rewrite the line magic.
* @param matchedText the original matched text from the program
* @param magicStmt the line magic text with newlines and continuations removed
* @param position ((start_line, start_col),(end_line, end_col)) of `matchedText` within the cell
* @return rewrite operation. Leave text empty if you want to use default rewrites.
*/
rewrite(matchedText: string, magicStmt: string, position: MatchPosition): Rewrite;
}
/**
* Utility to rewrite IPython code to remove magics.
* Should be applied to cells, not the entire program, to properly handle cell magics.
* One of the most important aspects of the rewriter is that it shouldn't change the line number
* of any of the statements in the program. If it does, this will make it impossible to
* map back from the results of code analysis to the relevant code in the editor.
*/
export declare class MagicsRewriter {
/**
* Construct a magics rewriter.
*/
constructor(lineMagicRewriters?: LineMagicRewriter[]);
/**
* Rewrite code so that it doesn't contain magics.
*/
rewrite(text: string, lineMagicRewriters?: LineMagicRewriter[]): string;
/**
* Default rewrite rule for cell magics.
*/
rewriteCellMagic(text: string): string;
/**
* Default rewrite rule for line magics.
*/
rewriteLineMagic(text: string, lineMagicRewriters?: LineMagicRewriter[]): string;
private _lineMagicRewriters;
private _defaultLineMagicRewriters;
}
/**
* Line magic rewriter for the "time" magic.
*/
export declare class TimeLineMagicRewriter implements LineMagicRewriter {
commandName: string;
rewrite(matchedText: string, magicStmt: string, position: MatchPosition): Rewrite;
}
/**
* Line magic rewriter for the "pylab" magic.
*/
export declare class PylabLineMagicRewriter implements LineMagicRewriter {
commandName: string;
rewrite(matchedText: string, magicStmt: string, position: MatchPosition): Rewrite;
}
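A minimal sketch of the rewriter declared above (sample input hypothetical):

    import { MagicsRewriter } from '@msrvida/python-program-analysis';

    // Rewrite IPython magics out of a cell without changing line numbers,
    // so analysis locations still map back to the editor.
    const rewriter = new MagicsRewriter();
    const cleaned = rewriter.rewrite('%matplotlib inline\nx = 1\n');
    console.log(cleaned); // line 1 is rewritten in place; 'x = 1' stays on line 2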

types/@msrvida-python-program-analysis/set.d.ts (vendored new file, +29 lines)

@ -0,0 +1,29 @@
export declare class Set<T> {
private getIdentifier;
private _items;
constructor(getIdentifier: (item: T) => string, ...items: T[]);
readonly size: number;
add(...items: T[]): void;
remove(item: T): void;
pop(): T;
has(item: T): boolean;
readonly items: T[];
equals(that: Set<T>): boolean;
readonly empty: boolean;
union(...those: Set<T>[]): Set<T>;
intersect(that: Set<T>): Set<T>;
filter(predicate: (item: T) => boolean): Set<T>;
map<U>(getIdentifier: (item: U) => string, transform: (item: T) => U): Set<U>;
mapSame(transform: (item: T) => T): Set<T>;
some(predicate: (item: T) => boolean): boolean;
minus(that: Set<T>): Set<T>;
take(): T;
product(that: Set<T>): Set<[T, T]>;
}
export declare class StringSet extends Set<string> {
constructor(...items: string[]);
}
export declare class NumberSet extends Set<number> {
constructor(...items: number[]);
}
export declare function range(min: number, max: number): Set<number>;
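A minimal sketch of the identity-based set operations declared above:

    import { StringSet } from '@msrvida/python-program-analysis';

    const a = new StringSet('x', 'y');
    const b = new StringSet('y', 'z');
    console.log(a.union(b).items);     // ['x', 'y', 'z'] (order not guaranteed)
    console.log(a.intersect(b).items); // ['y']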

types/@msrvida-python-program-analysis/slice.d.ts (vendored new file, +21 lines)

@ -0,0 +1,21 @@
import { Location, Module } from './python-parser';
import { DataflowAnalyzer } from './data-flow';
import { NumberSet, Set } from './set';
export declare class LocationSet extends Set<Location> {
constructor(...items: Location[]);
}
export declare enum SliceDirection {
Forward = 0,
Backward = 1
}
/**
* More general slice: given locations of important syntax nodes, find locations of all relevant
* definitions. Locations can be mapped to lines later.
* seedLocations are symbol locations.
*/
export declare function slice(ast: Module, seedLocations?: LocationSet, dataflowAnalyzer?: DataflowAnalyzer, direction?: SliceDirection): LocationSet;
/**
* Slice: given a set of lines in a program, return lines it depends on.
* OUT OF DATE: use slice() instead of sliceLines().
*/
export declare function sliceLines(code: string, relevantLineNumbers: NumberSet): NumberSet;
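A minimal sketch of seeding a backward slice (program text and location values hypothetical):

    import { LocationSet, parse, slice } from '@msrvida/python-program-analysis';

    // Slice backwards from `print(x)` on line 3; the result should include
    // line 1 (the definition of x) but not the unrelated line 2.
    const tree = parse('x = 1\ny = 2\nprint(x)\n');
    const seed = new LocationSet({ first_line: 3, first_column: 0, last_line: 3, last_column: 8 });
    console.log(slice(tree, seed).items);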


@ -0,0 +1,69 @@
{
"__builtins__": {
"functions": [
"credits",
{ "name": "delattr", "updates": [1] },
"dict",
"dir",
"divmod",
"enumerate",
"eval",
"exec",
"exit",
{ "name": "filter", "higherorder": 1 },
"float",
"format",
"frozenset",
"getattr",
"globals",
"hasattr",
"hash",
"help",
"hex",
"id",
"input",
"int",
"isinstance",
"issubclass",
"iter",
"len",
"license",
"list",
"locals",
{ "name": "map", "higherorder": 1 },
"max",
"memoryview",
"min",
"next",
"object",
"oct",
"open",
"ord",
"pow",
"print",
"property",
"quit",
"range",
"repr",
"reversed",
"round",
"set",
{ "name": "setattr", "updates": [1] },
"slice",
"sorted",
"staticmethod",
"str",
"sum",
"super",
"tuple",
"type",
"vars",
"zip"
],
"types": {
"BaseException": {
"methods": [{ "name": "with_traceback", "updates": [1] }]
}
}
}
}

types/@msrvida-python-program-analysis/specs/index.d.ts (vendored new file, +35 lines)

@ -0,0 +1,35 @@
export interface FunctionSpec {
name: string;
updates?: (string | number)[];
reads?: string[];
returns?: string;
returnsType?: PythonType;
higherorder?: number;
}
export declare type FunctionDescription = string | FunctionSpec;
export declare function getFunctionName(fd: FunctionDescription): string;
export declare function isFunctionSpec(fd: FunctionDescription): fd is FunctionSpec;
export declare type PythonType = ListType | ClassType;
export declare class ListType {
elementType: PythonType;
constructor(elementType: PythonType);
}
export declare class ClassType {
spec: TypeSpec<FunctionSpec>;
constructor(spec: TypeSpec<FunctionSpec>);
}
export interface TypeSpec<FD> {
methods?: FD[];
}
export interface ModuleSpec<FD> extends TypeSpec<FD> {
functions?: FD[];
modules?: ModuleMap<FD>;
types?: {
[typeName: string]: TypeSpec<FD>;
};
}
export interface ModuleMap<FD> {
[moduleName: string]: ModuleSpec<FD>;
}
export declare type JsonSpecs = ModuleMap<FunctionDescription>;
export declare const DefaultSpecs: JsonSpecs;


@ -0,0 +1,223 @@
{
"matplotlib": {
"functions": [
"checkdep_dvipng",
"checkdep_ghostscript",
"checkdep_inkscape",
"checkdep_pdftops",
"checkdep_ps_distiller",
"checkdep_usetex",
"compare_versions",
"cycler",
"dedent",
{ "name": "get_backend", "reads": ["rcParams"] },
"get_cachedir",
"get_configdir",
"get_data_path",
"get_home",
"get_label",
"get_py2exe_datafiles",
{ "name": "interactive", "updates": ["rcParams"] },
{ "name": "is_interactive", "reads": ["rcParams"] },
"is_url",
"matplotlib_fname",
"mplDeprecation",
{ "name": "rc", "updates": ["rcParams"] },
{ "name": "rc_context", "updates": ["rcParams"] },
{ "name": "rc_file", "updates": ["rcParams"] },
{ "name": "rc_file_defaults", "updates": ["rcParams"] },
"rc_params",
"rc_params_from_file",
{ "name": "rcdefaults", "updates": ["rcParams"] },
"sanitize_sequence",
"test",
{ "name": "tk_window_focus", "updates": ["rcParams"] },
"use",
"validate_backend"
],
"modules": {
"pyplot": {
"functions": [
"acorr",
"angle_spectrum",
"annotate",
"arrow",
"autoscale",
"autumn",
"axes",
"axhline",
"axhspan",
"axis",
"axvline",
"axvspan",
"bar",
"barbs",
"barh",
"bone",
"box",
"boxplot",
"broken_barh",
"cla",
"clabel",
"clf",
"clim",
"close",
"cm",
"cohere",
"colorbar",
"colormaps",
"connect",
"contour",
"contourf",
"cool",
"copper",
"csd",
"cycler",
"dedent",
"delaxes",
"deprecated",
"disconnect",
"docstring",
"draw",
"draw_all",
"draw_if_interactive",
"errorbar",
"eventplot",
"figaspect",
"figimage",
"figlegend",
"fignum_exists",
"figtext",
"fill",
"fill_between",
"fill_betweenx",
"findobj",
"flag",
"gca",
"gcf",
"gci",
"get",
"get_backend",
"get_cmap",
"get_current_fig_manager",
"get_figlabels",
"get_fignums",
"get_plot_commands",
"get_scale_docs",
"get_scale_names",
"getp",
"ginput",
"gray",
"grid",
"hexbin",
"hist",
"hist2d",
"hlines",
"hot",
"hsv",
"importlib",
"imread",
"imsave",
"imshow",
"inferno",
"inspect",
"install_repl_d",
"matshow",
"minorticks_off",
"minorticks_on",
"mlab",
{ "name": "figure", "reads": ["rcParams"] },
"new_figure_manager",
"nipy_spectral",
"np",
"pause",
"pcolor",
"pcolormesh",
"phase_spectrum",
"pie",
"pink",
"plasma",
"plot",
"plot_date",
"plotfile",
"plotting",
"polar",
"prism",
"psd",
"pylab_setup",
"quiver",
"quiverkey",
"rc",
"rcParams",
"rcParamsDefault",
"rcParamsOrig",
"rc_context",
"rcdefaults",
"rcsetup",
"re",
"register_cmap",
"rgrids",
"savefig",
"sca",
"scatter",
"sci",
"semilogx",
"semilogy",
"set_cmap",
"setp",
"show",
"silent_list",
"specgram",
"spring",
"spy",
"stackplot",
"stem",
"step",
"streamplot",
"style",
"subplot",
{ "name": "subplot_tool", "updates": ["rcParams"] },
"subplot2grid",
"subplots",
"subplots_adjust",
"summer",
"suptitle",
{ "name": "switch_backend", "updates": ["rcParams"] },
"sys",
"table",
"text",
"thetagrids",
"tick_params",
"ticklabel_format",
"tight_layout",
"time",
"title",
"tricontour",
"tricontourf",
"tripcolor",
"triplot",
"twinx",
"twiny",
"uninstall_repl_displayhook",
"violinplot",
"viridis",
"vlines",
"waitforbuttonpress",
"warn_deprecated",
"warnings",
"winter",
"xcorr",
{ "name": "xkcd", "reads": ["rcParams"] },
"xlabel",
"xlim",
"xscale",
"xticks",
"ylabel",
"ylim",
"yscale",
"yticks"
]
}
}
}
}


@ -0,0 +1,540 @@
{
"numpy": {
"modules": {
"random": {
"functions": [
{ "name": "seed", "updates": ["$global"] }
]
}
},
"functions": [
{ "name": "abs", "returns": "ndarray" },
{ "name": "absolute", "returns": "ndarray" },
{ "name": "add", "returns": "ndarray" },
{ "name": "add_docstring", "returns": "ndarray" },
{ "name": "add_newdoc", "returns": "ndarray" },
{ "name": "add_newdoc_ufunc", "returns": "ndarray" },
{ "name": "alen", "returns": "ndarray" },
{ "name": "all", "returns": "ndarray" },
{ "name": "allclose", "returns": "ndarray" },
{ "name": "alltrue", "returns": "ndarray" },
{ "name": "amax", "returns": "ndarray" },
{ "name": "amin", "returns": "ndarray" },
{ "name": "angle", "returns": "ndarray" },
{ "name": "any", "returns": "ndarray" },
{ "name": "append", "returns": "ndarray" },
{ "name": "apply_along_axis", "returns": "ndarray" },
{ "name": "apply_over_axes", "returns": "ndarray" },
{ "name": "arange", "returns": "ndarray" },
{ "name": "arccos", "returns": "ndarray" },
{ "name": "arccosh", "returns": "ndarray" },
{ "name": "arcsin", "returns": "ndarray" },
{ "name": "arcsinh", "returns": "ndarray" },
{ "name": "arctan", "returns": "ndarray" },
{ "name": "arctan2", "returns": "ndarray" },
{ "name": "arctanh", "returns": "ndarray" },
{ "name": "argmax", "returns": "ndarray" },
{ "name": "argmin", "returns": "ndarray" },
{ "name": "argpartition", "returns": "ndarray" },
{ "name": "argsort", "returns": "ndarray" },
{ "name": "argwhere", "returns": "ndarray" },
{ "name": "around", "returns": "ndarray" },
{ "name": "array", "returns": "ndarray" },
{ "name": "array2string", "returns": "ndarray" },
{ "name": "array_equal", "returns": "ndarray" },
{ "name": "array_equiv", "returns": "ndarray" },
{ "name": "array_repr", "returns": "ndarray" },
{ "name": "array_split", "returns": "ndarray" },
{ "name": "array_str", "returns": "ndarray" },
{ "name": "asanyarray", "returns": "ndarray" },
{ "name": "asarray", "returns": "ndarray" },
{ "name": "asarray_chkfinite", "returns": "ndarray" },
{ "name": "ascontiguousarray", "returns": "ndarray" },
{ "name": "asfarray", "returns": "ndarray" },
{ "name": "asfortranarray", "returns": "ndarray" },
{ "name": "asmatrix", "returns": "ndarray" },
{ "name": "asscalar", "returns": "ndarray" },
{ "name": "atleast_1d", "returns": "ndarray" },
{ "name": "atleast_2d", "returns": "ndarray" },
{ "name": "atleast_3d", "returns": "ndarray" },
{ "name": "average", "returns": "ndarray" },
{ "name": "bartlett", "returns": "ndarray" },
{ "name": "base_repr", "returns": "ndarray" },
{ "name": "binary_repr", "returns": "ndarray" },
{ "name": "bincount", "returns": "ndarray" },
{ "name": "bitwise_and", "returns": "ndarray" },
{ "name": "bitwise_not", "returns": "ndarray" },
{ "name": "bitwise_or", "returns": "ndarray" },
{ "name": "bitwise_xor", "returns": "ndarray" },
{ "name": "blackman", "returns": "ndarray" },
{ "name": "block", "returns": "ndarray" },
{ "name": "bmat", "returns": "ndarray" },
{ "name": "broadcast", "returns": "broadcast" },
{ "name": "broadcast_arrays", "returns": "broadcast" },
{ "name": "broadcast_to", "returns": "broadcast" },
{ "name": "busday_count", "returns": "ndarray" },
{ "name": "busday_offset", "returns": "ndarray" },
"byte_bounds",
{ "name": "can_cast", "returns": "ndarray" },
{ "name": "cbrt", "returns": "ndarray" },
{ "name": "cdouble", "returns": "ndarray" },
{ "name": "ceil", "returns": "ndarray" },
{ "name": "choose", "returns": "ndarray" },
{ "name": "clip", "returns": "ndarray" },
{ "name": "clongdouble", "returns": "ndarray" },
{ "name": "clongfloat", "returns": "ndarray" },
{ "name": "column_stack", "returns": "ndarray" },
{ "name": "common_type", "returns": "ndarray" },
{ "name": "compare_chararrays", "returns": "ndarray" },
{ "name": "compress", "returns": "ndarray" },
{ "name": "concatenate", "returns": "ndarray" },
{ "name": "conj", "returns": "ndarray" },
{ "name": "conjugate", "returns": "ndarray" },
{ "name": "convolve", "returns": "ndarray" },
{ "name": "copy", "returns": "ndarray" },
{ "name": "copysign", "returns": "ndarray" },
{ "name": "copyto", "returns": "ndarray" },
{ "name": "corrcoef", "returns": "ndarray" },
{ "name": "correlate", "returns": "ndarray" },
{ "name": "cos", "returns": "ndarray" },
{ "name": "cosh", "returns": "ndarray" },
{ "name": "count_nonzero", "returns": "ndarray" },
{ "name": "cov", "returns": "ndarray" },
{ "name": "cross", "returns": "ndarray" },
{ "name": "csingle", "returns": "ndarray" },
{ "name": "cumprod", "returns": "ndarray" },
{ "name": "cumproduct", "returns": "ndarray" },
{ "name": "cumsum", "returns": "ndarray" },
{ "name": "datetime64", "returns": "ndarray" },
{ "name": "datetime_as_string", "returns": "ndarray" },
{ "name": "datetime_data", "returns": "ndarray" },
{ "name": "deg2rad", "returns": "ndarray" },
{ "name": "degrees", "returns": "ndarray" },
{ "name": "delete", "returns": "ndarray" },
{ "name": "deprecate", "returns": "ndarray" },
{ "name": "deprecate_with_doc", "returns": "ndarray" },
{ "name": "diag", "returns": "ndarray" },
{ "name": "diag_indices", "returns": "ndarray" },
{ "name": "diag_indices_from", "returns": "ndarray" },
{ "name": "diagflat", "returns": "ndarray" },
{ "name": "diagonal", "returns": "ndarray" },
{ "name": "diff", "returns": "ndarray" },
{ "name": "digitize", "returns": "ndarray" },
{ "name": "disp", "returns": "ndarray" },
{ "name": "divide", "returns": "ndarray" },
{ "name": "divmod", "returns": "ndarray" },
{ "name": "dot", "returns": "ndarray" },
{ "name": "double", "returns": "ndarray" },
{ "name": "dsplit", "returns": "ndarray" },
{ "name": "dstack", "returns": "ndarray" },
{ "name": "dtype", "returns": "ndarray" },
{ "name": "ediff1d", "returns": "ndarray" },
{ "name": "einsum", "returns": "ndarray" },
{ "name": "einsum_path", "returns": "ndarray" },
{ "name": "empty", "returns": "ndarray" },
{ "name": "empty_like", "returns": "ndarray" },
{ "name": "equal", "returns": "ndarray" },
{ "name": "errstate", "returns": "ndarray" },
{ "name": "exp", "returns": "ndarray" },
{ "name": "exp2", "returns": "ndarray" },
{ "name": "expand_dims", "returns": "ndarray" },
{ "name": "expm1", "returns": "ndarray" },
{ "name": "extract", "returns": "ndarray" },
{ "name": "eye", "returns": "ndarray" },
{ "name": "fabs", "returns": "ndarray" },
{ "name": "fastCopyAndTranspose", "returns": "ndarray" },
{ "name": "fill_diagonal", "returns": "ndarray" },
{ "name": "find_common_type", "returns": "ndarray" },
{ "name": "finfo", "returns": "ndarray" },
{ "name": "fix", "returns": "ndarray" },
{ "name": "flatiter", "returns": "ndarray" },
{ "name": "flatnonzero", "returns": "ndarray" },
{ "name": "flexible", "returns": "ndarray" },
{ "name": "flip", "returns": "ndarray" },
{ "name": "fliplr", "returns": "ndarray" },
{ "name": "flipud", "returns": "ndarray" },
{ "name": "float", "returns": "ndarray" },
{ "name": "float16", "returns": "ndarray" },
{ "name": "float32", "returns": "ndarray" },
{ "name": "float64", "returns": "ndarray" },
{ "name": "float_", "returns": "ndarray" },
{ "name": "float_power", "returns": "ndarray" },
{ "name": "floating", "returns": "ndarray" },
{ "name": "floor", "returns": "ndarray" },
{ "name": "floor_divide", "returns": "ndarray" },
{ "name": "fmax", "returns": "ndarray" },
{ "name": "fmin", "returns": "ndarray" },
{ "name": "fmod", "returns": "ndarray" },
{ "name": "format_float_positional", "returns": "ndarray" },
{ "name": "format_float_scientific", "returns": "ndarray" },
{ "name": "format_parser", "returns": "ndarray" },
{ "name": "frexp", "returns": "ndarray" },
{ "name": "frombuffer", "returns": "ndarray" },
{ "name": "fromfile", "returns": "ndarray" },
{ "name": "fromfunction", "returns": "ndarray" },
{ "name": "fromiter", "returns": "ndarray" },
{ "name": "frompyfunc", "returns": "ndarray" },
{ "name": "fromregex", "returns": "ndarray" },
{ "name": "fromstring", "returns": "ndarray" },
{ "name": "full", "returns": "ndarray" },
{ "name": "full_like", "returns": "ndarray" },
{ "name": "fv", "returns": "ndarray" },
{ "name": "gcd", "returns": "ndarray" },
{ "name": "generic", "returns": "ndarray" },
{ "name": "genfromtxt", "returns": "ndarray" },
{ "name": "geomspace", "returns": "ndarray" },
{ "name": "get_array_wrap", "returns": "ndarray" },
{ "name": "get_include", "returns": "ndarray" },
{ "name": "get_printoptions", "returns": "ndarray" },
{ "name": "getbufsize", "returns": "ndarray" },
{ "name": "geterr", "returns": "ndarray" },
{ "name": "geterrcall", "returns": "ndarray" },
{ "name": "geterrobj", "returns": "ndarray" },
{ "name": "gradient", "returns": "ndarray" },
{ "name": "greater", "returns": "ndarray" },
{ "name": "greater_equal", "returns": "ndarray" },
{ "name": "half", "returns": "ndarray" },
{ "name": "hamming", "returns": "ndarray" },
{ "name": "hanning", "returns": "ndarray" },
{ "name": "heaviside", "returns": "ndarray" },
{ "name": "histogram", "returns": "ndarray" },
{ "name": "histogram2d", "returns": "ndarray" },
{ "name": "histogram_bin_edges", "returns": "ndarray" },
{ "name": "histogramdd", "returns": "ndarray" },
{ "name": "hsplit", "returns": "ndarray" },
{ "name": "hstack", "returns": "ndarray" },
{ "name": "hypot", "returns": "ndarray" },
{ "name": "i0", "returns": "ndarray" },
{ "name": "identity", "returns": "ndarray" },
{ "name": "iinfo", "returns": "ndarray" },
{ "name": "imag", "returns": "ndarray" },
{ "name": "in1d", "returns": "ndarray" },
{ "name": "indices", "returns": "ndarray" },
{ "name": "inexact", "returns": "ndarray" },
{ "name": "info", "returns": "ndarray" },
{ "name": "inner", "returns": "ndarray" },
{ "name": "insert", "returns": "ndarray" },
{ "name": "int_asbuffer", "returns": "ndarray" },
{ "name": "intc", "returns": "ndarray" },
{ "name": "integer", "returns": "ndarray" },
{ "name": "interp", "returns": "ndarray" },
{ "name": "intersect1d", "returns": "ndarray" },
{ "name": "intp", "returns": "ndarray" },
{ "name": "invert", "returns": "ndarray" },
{ "name": "ipmt", "returns": "ndarray" },
{ "name": "irr", "returns": "ndarray" },
{ "name": "is_busday", "returns": "ndarray" },
{ "name": "isclose", "returns": "ndarray" },
{ "name": "iscomplex", "returns": "ndarray" },
{ "name": "iscomplexobj", "returns": "ndarray" },
{ "name": "isfinite", "returns": "ndarray" },
{ "name": "isfortran", "returns": "ndarray" },
{ "name": "isin", "returns": "ndarray" },
{ "name": "isinf", "returns": "ndarray" },
{ "name": "isnan", "returns": "ndarray" },
{ "name": "isnat", "returns": "ndarray" },
{ "name": "isneginf", "returns": "ndarray" },
{ "name": "isposinf", "returns": "ndarray" },
{ "name": "isreal", "returns": "ndarray" },
{ "name": "isrealobj", "returns": "ndarray" },
{ "name": "isscalar", "returns": "ndarray" },
{ "name": "issctype", "returns": "ndarray" },
{ "name": "issubclass_", "returns": "ndarray" },
{ "name": "issubdtype", "returns": "ndarray" },
{ "name": "issubsctype", "returns": "ndarray" },
{ "name": "iterable", "returns": "ndarray" },
{ "name": "ix_", "returns": "ndarray" },
{ "name": "kaiser", "returns": "ndarray" },
{ "name": "kron", "returns": "ndarray" },
{ "name": "lcm", "returns": "ndarray" },
{ "name": "ldexp", "returns": "ndarray" },
{ "name": "left_shift", "returns": "ndarray" },
{ "name": "less", "returns": "ndarray" },
{ "name": "less_equal", "returns": "ndarray" },
{ "name": "lexsort", "returns": "ndarray" },
{ "name": "linspace", "returns": "ndarray" },
{ "name": "load", "returns": "ndarray" },
{ "name": "loads", "returns": "ndarray" },
{ "name": "loadtxt", "returns": "ndarray" },
{ "name": "log", "returns": "ndarray" },
{ "name": "log10", "returns": "ndarray" },
{ "name": "log1p", "returns": "ndarray" },
{ "name": "log2", "returns": "ndarray" },
{ "name": "logaddexp", "returns": "ndarray" },
{ "name": "logaddexp2", "returns": "ndarray" },
{ "name": "logical_and", "returns": "ndarray" },
{ "name": "logical_not", "returns": "ndarray" },
{ "name": "logical_or", "returns": "ndarray" },
{ "name": "logical_xor", "returns": "ndarray" },
{ "name": "logspace", "returns": "ndarray" },
{ "name": "long", "returns": "ndarray" },
{ "name": "longcomplex", "returns": "ndarray" },
{ "name": "longdouble", "returns": "ndarray" },
{ "name": "longfloat", "returns": "ndarray" },
{ "name": "longlong", "returns": "ndarray" },
{ "name": "lookfor", "returns": "ndarray" },
{ "name": "mafromtxt", "returns": "ndarray" },
{ "name": "mask_indices", "returns": "ndarray" },
{ "name": "mat", "returns": "ndarray" },
{ "name": "matmul", "returns": "ndarray" },
{ "name": "matrix", "returns": "ndarray" },
{ "name": "max", "returns": "ndarray" },
{ "name": "maximum", "returns": "ndarray" },
{ "name": "maximum_sctype", "returns": "ndarray" },
"may_share_memory",
{ "name": "mean", "returns": "ndarray" },
{ "name": "median", "returns": "ndarray" },
{ "name": "memmap", "returns": "ndarray" },
{ "name": "meshgrid", "returns": "ndarray" },
{ "name": "min", "returns": "ndarray" },
{ "name": "min_scalar_type", "returns": "ndarray" },
{ "name": "minimum", "returns": "ndarray" },
{ "name": "mintypecode", "returns": "ndarray" },
{ "name": "mirr", "returns": "ndarray" },
{ "name": "mod", "returns": "ndarray" },
{ "name": "modf", "returns": "ndarray" },
{ "name": "moveaxis", "returns": "ndarray" },
{ "name": "msort", "returns": "ndarray" },
{ "name": "multiply", "returns": "ndarray" },
{ "name": "nan_to_num", "returns": "ndarray" },
{ "name": "nanargmax", "returns": "ndarray" },
{ "name": "nanargmin", "returns": "ndarray" },
{ "name": "nancumprod", "returns": "ndarray" },
{ "name": "nancumsum", "returns": "ndarray" },
{ "name": "nanmax", "returns": "ndarray" },
{ "name": "nanmean", "returns": "ndarray" },
{ "name": "nanmedian", "returns": "ndarray" },
{ "name": "nanmin", "returns": "ndarray" },
{ "name": "nanpercentile", "returns": "ndarray" },
{ "name": "nanprod", "returns": "ndarray" },
{ "name": "nanquantile", "returns": "ndarray" },
{ "name": "nanstd", "returns": "ndarray" },
{ "name": "nansum", "returns": "ndarray" },
{ "name": "nanvar", "returns": "ndarray" },
{ "name": "ndarray", "returns": "ndarray" },
{ "name": "ndenumerate", "returns": "ndarray" },
{ "name": "ndfromtxt", "returns": "ndarray" },
{ "name": "ndim", "returns": "ndarray" },
{ "name": "negative", "returns": "ndarray" },
{ "name": "nested_iters", "returns": "ndarray" },
{ "name": "nextafter", "returns": "ndarray" },
{ "name": "nonzero", "returns": "ndarray" },
{ "name": "not_equal", "returns": "ndarray" },
{ "name": "nper", "returns": "ndarray" },
{ "name": "npv", "returns": "ndarray" },
{ "name": "obj2sctype", "returns": "ndarray" },
{ "name": "ones", "returns": "ndarray" },
{ "name": "ones_like", "returns": "ndarray" },
{ "name": "outer", "returns": "ndarray" },
{ "name": "packbits", "returns": "ndarray" },
{ "name": "pad", "returns": "ndarray" },
{ "name": "partition", "returns": "ndarray" },
{ "name": "percentile", "returns": "ndarray" },
{ "name": "piecewise", "returns": "ndarray" },
{ "name": "place", "returns": "ndarray" },
{ "name": "pmt", "returns": "ndarray" },
{ "name": "poly", "returns": "ndarray" },
{ "name": "polyadd", "returns": "ndarray" },
{ "name": "polyder", "returns": "ndarray" },
{ "name": "polydiv", "returns": "ndarray" },
{ "name": "polyfit", "returns": "ndarray" },
{ "name": "polyint", "returns": "ndarray" },
{ "name": "polymul", "returns": "ndarray" },
{ "name": "polysub", "returns": "ndarray" },
{ "name": "polyval", "returns": "ndarray" },
{ "name": "positive", "returns": "ndarray" },
{ "name": "power", "returns": "ndarray" },
{ "name": "ppmt", "returns": "ndarray" },
{ "name": "printoptions", "returns": "ndarray" },
{ "name": "prod", "returns": "ndarray" },
{ "name": "product", "returns": "ndarray" },
{ "name": "promote_types", "returns": "ndarray" },
{ "name": "ptp", "returns": "ndarray" },
{ "name": "put", "returns": "ndarray" },
{ "name": "put_along_axis", "returns": "ndarray" },
{ "name": "putmask", "returns": "ndarray" },
{ "name": "pv", "returns": "ndarray" },
{ "name": "quantile", "returns": "ndarray" },
{ "name": "rad2deg", "returns": "ndarray" },
{ "name": "radians", "returns": "ndarray" },
{ "name": "rank", "returns": "ndarray" },
{ "name": "rate", "returns": "ndarray" },
{ "name": "ravel", "returns": "ndarray" },
{ "name": "ravel_multi_index", "returns": "ndarray" },
{ "name": "real", "returns": "ndarray" },
{ "name": "real_if_close", "returns": "ndarray" },
{ "name": "recfromcsv", "returns": "ndarray" },
{ "name": "recfromtxt", "returns": "ndarray" },
{ "name": "reciprocal", "returns": "ndarray" },
{ "name": "remainder", "returns": "ndarray" },
{ "name": "repeat", "returns": "ndarray" },
{ "name": "require", "returns": "ndarray" },
{ "name": "reshape", "returns": "ndarray" },
{ "name": "resize", "returns": "ndarray" },
{ "name": "result_type", "returns": "ndarray" },
{ "name": "right_shift", "returns": "ndarray" },
{ "name": "rint", "returns": "ndarray" },
{ "name": "roll", "returns": "ndarray" },
{ "name": "rollaxis", "returns": "ndarray" },
{ "name": "roots", "returns": "ndarray" },
{ "name": "rot90", "returns": "ndarray" },
{ "name": "round", "returns": "ndarray" },
{ "name": "round_", "returns": "ndarray" },
{ "name": "row_stack", "returns": "ndarray" },
"safe_eval",
"save",
"savetxt",
"savez",
"savez_compressed",
{ "name": "sctype2char", "returns": "ndarray" },
{ "name": "searchsorted", "returns": "ndarray" },
{ "name": "select", "returns": "ndarray" },
{ "name": "set_numeric_ops", "returns": "ndarray" },
{ "name": "set_printoptions", "returns": "ndarray" },
{ "name": "set_string_function", "returns": "ndarray" },
{ "name": "setbufsize", "returns": "ndarray" },
{ "name": "setdiff1d", "returns": "ndarray" },
{ "name": "seterr", "returns": "ndarray" },
{ "name": "seterrcall", "returns": "ndarray" },
{ "name": "seterrobj", "returns": "ndarray" },
{ "name": "setxor1d", "returns": "ndarray" },
{ "name": "shape", "returns": "ndarray" },
{ "name": "shares_memory", "returns": "ndarray" },
{ "name": "short", "returns": "ndarray" },
{ "name": "show_config", "returns": "ndarray" },
{ "name": "sign", "returns": "ndarray" },
{ "name": "signbit", "returns": "ndarray" },
{ "name": "signedinteger", "returns": "ndarray" },
{ "name": "sin", "returns": "ndarray" },
{ "name": "sinc", "returns": "ndarray" },
{ "name": "single", "returns": "ndarray" },
{ "name": "singlecomplex", "returns": "ndarray" },
{ "name": "sinh", "returns": "ndarray" },
"size",
{ "name": "sometrue", "returns": "ndarray" },
{ "name": "sort", "returns": "ndarray" },
{ "name": "sort_complex", "returns": "ndarray" },
{ "name": "source", "returns": "ndarray" },
{ "name": "spacing", "returns": "ndarray" },
{ "name": "split", "returns": "ndarray" },
{ "name": "sqrt", "returns": "ndarray" },
{ "name": "square", "returns": "ndarray" },
{ "name": "squeeze", "returns": "ndarray" },
{ "name": "stack", "returns": "ndarray" },
{ "name": "std", "returns": "ndarray" },
{ "name": "subtract", "returns": "ndarray" },
{ "name": "sum", "returns": "ndarray" },
{ "name": "swapaxes", "returns": "ndarray" },
{ "name": "take", "returns": "ndarray" },
{ "name": "take_along_axis", "returns": "ndarray" },
{ "name": "tan", "returns": "ndarray" },
{ "name": "tanh", "returns": "ndarray" },
{ "name": "tensordot", "returns": "ndarray" },
{ "name": "tile", "returns": "ndarray" },
{ "name": "timedelta64", "returns": "ndarray" },
{ "name": "trace", "returns": "ndarray" },
{ "name": "transpose", "returns": "ndarray" },
{ "name": "trapz", "returns": "ndarray" },
{ "name": "tri", "returns": "ndarray" },
{ "name": "tril", "returns": "ndarray" },
{ "name": "tril_indices", "returns": "ndarray" },
{ "name": "tril_indices_from", "returns": "ndarray" },
{ "name": "trim_zeros", "returns": "ndarray" },
{ "name": "triu", "returns": "ndarray" },
{ "name": "triu_indices", "returns": "ndarray" },
{ "name": "triu_indices_from", "returns": "ndarray" },
{ "name": "true_divide", "returns": "ndarray" },
{ "name": "trunc", "returns": "ndarray" },
"typename",
{ "name": "union1d", "returns": "ndarray" },
{ "name": "unique", "returns": "ndarray" },
{ "name": "unpackbits", "returns": "ndarray" },
{ "name": "unravel_index", "returns": "ndarray" },
{ "name": "unwrap", "returns": "ndarray" },
{ "name": "vander", "returns": "ndarray" },
{ "name": "var", "returns": "ndarray" },
{ "name": "vdot", "returns": "ndarray" },
{ "name": "vectorize", "returns": "ndarray" },
{ "name": "vsplit", "returns": "ndarray" },
{ "name": "vstack", "returns": "ndarray" },
{ "name": "where", "returns": "ndarray" },
{ "name": "who", "returns": "ndarray" },
{ "name": "zeros", "returns": "ndarray" },
{ "name": "zeros_like", "returns": "ndarray" }
],
"types": {
"ndarray": {
"methods": [
{ "name": "all", "returns": "ndarray" },
{ "name": "any", "returns": "ndarray" },
{ "name": "argmax", "returns": "ndarray" },
{ "name": "argmin", "returns": "ndarray" },
{ "name": "argpartition", "returns": "ndarray" },
{ "name": "argsort", "returns": "ndarray" },
{ "name": "astype", "returns": "ndarray" },
{ "name": "base", "returns": "ndarray" },
{ "name": "byteswap", "returns": "ndarray", "updates": [0] },
{ "name": "choose", "returns": "ndarray" },
{ "name": "clip", "returns": "ndarray" },
{ "name": "compress", "returns": "ndarray" },
{ "name": "conj", "returns": "ndarray" },
{ "name": "conjugate", "returns": "ndarray" },
{ "name": "copy", "returns": "ndarray" },
{ "name": "cumprod", "returns": "ndarray" },
{ "name": "cumsum", "returns": "ndarray" },
{ "name": "data", "returns": "ndarray" },
{ "name": "diagonal", "returns": "ndarray" },
{ "name": "dot", "returns": "ndarray" },
{ "name": "dump", "returns": "ndarray" },
{ "name": "dumps", "returns": "ndarray" },
{ "name": "fill", "returns": "ndarray", "updates": [0] },
{ "name": "flags", "returns": "ndarray" },
{ "name": "flat", "returns": "ndarray" },
{ "name": "flatten", "returns": "ndarray" },
{ "name": "getfield", "returns": "ndarray" },
{ "name": "imag", "returns": "ndarray" },
{ "name": "item", "returns": "ndarray" },
{ "name": "itemset", "returns": "ndarray" },
{ "name": "itemsize", "returns": "ndarray" },
{ "name": "max", "returns": "ndarray" },
{ "name": "mean", "returns": "ndarray" },
{ "name": "min", "returns": "ndarray" },
{ "name": "nbytes", "returns": "ndarray" },
{ "name": "ndim", "returns": "ndarray" },
{ "name": "newbyteorder", "returns": "ndarray" },
{ "name": "nonzero", "returns": "ndarray" },
{ "name": "partition", "returns": "ndarray" },
{ "name": "prod", "returns": "ndarray" },
{ "name": "ptp", "returns": "ndarray" },
{ "name": "put", "returns": "ndarray" },
{ "name": "ravel", "returns": "ndarray" },
{ "name": "real", "returns": "ndarray" },
{ "name": "repeat", "returns": "ndarray" },
{ "name": "reshape", "returns": "ndarray" },
{ "name": "resize", "returns": "ndarray", "updates": [0] },
{ "name": "round", "returns": "ndarray" },
{ "name": "searchsorted", "returns": "ndarray" },
{ "name": "setfield", "returns": "ndarray" },
{ "name": "setflags", "returns": "ndarray" },
{ "name": "shape", "returns": "ndarray" },
{ "name": "size", "returns": "ndarray" },
{ "name": "sort", "updates": [0] },
{ "name": "squeeze", "returns": "ndarray", "updates": [0] },
{ "name": "std", "returns": "ndarray" },
{ "name": "strides", "returns": "ndarray" },
{ "name": "sum", "returns": "ndarray" },
{ "name": "swapaxes", "returns": "ndarray" },
{ "name": "take", "returns": "ndarray" },
{ "name": "tobytes", "returns": "ndarray" },
{ "name": "tofile", "returns": "ndarray" },
{ "name": "tolist", "returns": "ndarray" },
{ "name": "tostring", "returns": "ndarray" },
{ "name": "trace", "returns": "ndarray" },
{ "name": "transpose", "returns": "ndarray" },
{ "name": "var", "returns": "ndarray" },
{ "name": "view", "returns": "ndarray" }
]
}
}
}
}
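A note on the spec format above: each entry describes one numpy function for the slicer. A bare string (e.g. "may_share_memory") marks a call with no tracked side effects; an object adds a "returns" field naming the type a call produces (so method calls can be chained through the type specs) and, where present, an "updates" field listing which positional arguments the call mutates, with 0 denoting the receiver or first argument and "$global" (used in the random spec further down) denoting module-level state. A minimal sketch of that shape, inferred only from the JSON files in this diff — the authoritative FunctionSpec type lives in the package's specs module:

// Hypothetical TypeScript shape inferred from the JSON spec files above; illustrative only.
type FunctionDescription =
    | string // function with no tracked side effects
    | {
          name: string;
          returns?: string; // type produced by a call, e.g. "ndarray"
          updates?: (number | string)[]; // mutated positional args (0 = receiver) or "$global"
      };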

View file

@@ -0,0 +1,276 @@
{
"pandas": {
"functions": [
"array",
"bdate_range",
"concat",
"crosstab",
"cut",
"date_range",
"datetime",
"describe_option",
"eval",
"factorize",
"get_dummies",
"get_option",
"infer_freq",
"interval_range",
"isna",
"isnull",
"lreshape",
"melt",
"merge",
"merge_asof",
"merge_ordered",
"notna",
"notnull",
"option_context",
"period_range",
"pivot",
"pivot_table",
"qcut",
{ "name": "read_clipboard", "returns": "DataFrame" },
{ "name": "read_csv", "returns": "DataFrame" },
{ "name": "read_excel", "returns": "DataFrame" },
{ "name": "read_feather", "returns": "DataFrame" },
{ "name": "read_fwf", "returns": "DataFrame" },
{ "name": "read_gbq", "returns": "DataFrame" },
{ "name": "read_hdf", "returns": "DataFrame" },
{ "name": "read_html", "returns": "DataFrame" },
{ "name": "read_json", "returns": "DataFrame" },
{ "name": "read_msgpack", "returns": "DataFrame" },
{ "name": "read_parquet", "returns": "DataFrame" },
{ "name": "read_pickle", "returns": "DataFrame" },
{ "name": "read_sas", "returns": "DataFrame" },
{ "name": "read_sql", "returns": "DataFrame" },
{ "name": "read_sql_query", "returns": "DataFrame" },
{ "name": "read_sql_table", "returns": "DataFrame" },
{ "name": "read_stata", "returns": "DataFrame" },
{ "name": "read_table", "returns": "DataFrame" },
"reset_option",
"set_eng_float_format",
"set_option",
"show_versions",
"test",
"timedelta_range",
"to_datetime",
"to_msgpack",
"to_numeric",
"to_pickle",
"to_timedelta",
"unique",
"value_counts",
"wide_to_long"
],
"types": {
"DataFrame": {
"methods": [
"abs",
"add",
"add_prefix",
"add_suffix",
"agg",
"aggregate",
"align",
"all",
"any",
"append",
"apply",
"applymap",
"as_blocks",
"as_matrix",
"asfreq",
"asof",
"assign",
"astype",
"at_time",
"between_time",
"bfill",
"bool",
"boxplot",
"clip",
"clip_lower",
"clip_upper",
"combine",
"combine_first",
"compound",
"convert_objects",
"copy",
"corr",
"corrwith",
"count",
"cov",
"cummax",
"cummin",
"cumprod",
"cumsum",
"describe",
"diff",
"div",
"divide",
"dot",
"drop",
"drop_duplicates",
"droplevel",
"dropna",
"duplicated",
"eq",
"equals",
"eval",
"ewm",
"expanding",
{ "name": "ffill", "updates": [0] },
{ "name": "fillna", "updates": [0] },
"filter",
"first",
"first_valid_index",
"floordiv",
"from_csv",
"from_dict",
"from_items",
"from_records",
"ge",
"get",
"get_dtype_counts",
"get_ftype_counts",
"get_value",
"get_values",
"groupby",
"gt",
"head",
"hist",
"idxmax",
"idxmin",
"infer_objects",
"info",
"insert",
"interpolate",
"isin",
"isna",
"isnull",
"items",
"iteritems",
"iterrows",
"itertuples",
"join",
"keys",
"kurt",
"kurtosis",
"last",
"last_valid_index",
"le",
"lookup",
"lt",
"mad",
"mask",
"max",
"mean",
"median",
{ "name": "melt", "updates": [0] },
"memory_usage",
"merge",
"min",
"mod",
"mode",
"mul",
"multiply",
"ne",
"nlargest",
"notna",
"notnull",
"nsmallest",
"nunique",
"pct_change",
"pipe",
"pivot",
"pivot_table",
"plot",
{ "name": "pop", "updates": [0] },
"pow",
"prod",
"product",
"quantile",
"query",
"radd",
"rank",
"rdiv",
"reindex",
"reindex_axis",
"reindex_like",
"rename",
"rename_axis",
"reorder_levels",
"replace",
"resample",
"reset_index",
"rfloordiv",
"rmod",
"rmul",
"rolling",
"round",
"rpow",
"rsub",
"rtruediv",
"sample",
"select",
"select_dtypes",
"sem",
"set_axis",
"set_index",
"set_value",
"shift",
"skew",
"slice_shift",
"sort_index",
"sort_values",
"squeeze",
"stack",
"std",
"sub",
"subtract",
"sum",
"swapaxes",
"swaplevel",
"tail",
"take",
"to_clipboard",
"to_csv",
"to_dense",
"to_dict",
"to_excel",
"to_feather",
"to_gbq",
"to_hdf",
"to_html",
"to_json",
"to_latex",
"to_msgpack",
"to_numpy",
"to_panel",
"to_parquet",
"to_period",
"to_pickle",
"to_records",
"to_sparse",
"to_sql",
"to_stata",
"to_string",
"to_timestamp",
"to_xarray",
"transform",
"transpose",
"truediv",
"truncate",
"tshift",
"tz_convert",
"tz_localize",
"unstack",
{ "name": "update", "updates": [0] },
"var",
"where",
"xs"
]
}
}
}
}

View file

@@ -0,0 +1,28 @@
{
"random": {
"functions": [
"betavariate",
"choice",
"choices",
"expovariate",
"gammavariate",
"gauss",
"getrandbits",
"getstate",
"lognormvariate",
"normalvariate",
"paretovariate",
"randint",
"random",
"randrange",
"sample",
{ "name": "seed", "updates": ["$global"] },
{ "name": "setstate", "updates": ["$global"] },
"shuffle",
"triangular",
"uniform",
"vonmisesvariate",
"weibullvariate"
]
}
}

File diff suppressed because it is too large. Load Diff

26
types/@msrvida-python-program-analysis/symbol-table.d.ts vendored Normal file
View file

@@ -0,0 +1,26 @@
import { FunctionSpec, TypeSpec, ModuleSpec, ModuleMap, JsonSpecs } from "./specs";
import * as ast from './python-parser';
export declare class SymbolTable {
private jsonSpecs;
modules: ModuleMap<FunctionSpec>;
types: {
[name: string]: TypeSpec<FunctionSpec>;
};
functions: {
[name: string]: FunctionSpec;
};
constructor(jsonSpecs: JsonSpecs);
lookupFunction(name: string): FunctionSpec | undefined;
lookupNode(func: ast.SyntaxNode): FunctionSpec;
lookupModuleFunction(modName: string, funcName: string): FunctionSpec | undefined;
importModule(modulePath: string, alias: string): ModuleSpec<FunctionSpec>;
private resolveFunction;
private resolveType;
private makePythonType;
private resolveModule;
importModuleDefinitions(namePath: string, imports: {
path: string;
alias: string;
}[]): string[];
private lookupSpec;
}
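A minimal usage sketch for this declaration, assuming the package root re-exports SymbolTable and JsonSpecs and that the JSON spec files above are importable as modules — the import paths and wiring here are illustrative, not taken from this PR:

// Hypothetical wiring; only the names and signatures come from the declaration above.
import { SymbolTable, JsonSpecs } from '@msrvida/python-program-analysis';
import * as specs from './specs/numpy.json'; // hypothetical path to the numpy spec shown earlier

const table = new SymbolTable((specs as unknown) as JsonSpecs);
// Per the numpy spec above, the "returns" field of this lookup should read "ndarray".
const zeros = table.lookupModuleFunction('numpy', 'zeros');
console.log(zeros);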

View file

@@ -0,0 +1 @@
export {};

View file

@@ -0,0 +1 @@
export {};

1
types/@msrvida-python-program-analysis/test/cfg.test.d.ts vendored Normal file
View file

@@ -0,0 +1 @@
export {};

1
types/@msrvida-python-program-analysis/test/graph.test.d.ts vendored Normal file
View file

@@ -0,0 +1 @@
export {};

View file

@@ -0,0 +1 @@
export {};

View file

@@ -0,0 +1 @@
export {};

1
types/@msrvida-python-program-analysis/test/merge.test.d.ts vendored Normal file
View file

@@ -0,0 +1 @@
export {};

View file

@@ -0,0 +1,155 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"in_vscode=0\n",
"\n",
"df = pd.read_parquet(\"data/training.parquet\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"np.random.seed(0xc0ffeeee)\n",
"df_samp = df.sample(3)\n",
""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.feature_extraction.text import CountVectorizer\n",
"\n",
"vectorizer = CountVectorizer(token_pattern='(?u)\\\\b[A-Za-z]\\\\w+\\\\b', max_features = 20)\n",
"counts = vectorizer.fit_transform(df_samp[\"text\"])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.feature_extraction.text import TfidfTransformer\n",
"\n",
"tfidf_transformer = TfidfTransformer()\n",
"df_tfidf = tfidf_transformer.fit_transform(counts)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.feature_extraction.text import HashingVectorizer\n",
"BUCKETS=1024\n",
"\n",
"hv = HashingVectorizer(norm=None, token_pattern='(?u)\\\\b[A-Za-z]\\\\w+\\\\b', n_features=BUCKETS, alternate_sign = False)\n",
"hv\n",
""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"hvcounts = hv.fit_transform(df[\"text\"])\n",
"hvcounts"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tfidf_transformer = TfidfTransformer()\n",
"hvdf_tfidf = tfidf_transformer.fit_transform(hvcounts)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#PCA projection so that output can be visualised\n",
"\n",
"import sklearn.decomposition\n",
"\n",
"DIMENSIONS = 2\n",
"\n",
"pca2 = sklearn.decomposition.TruncatedSVD(DIMENSIONS)\n",
"\n",
"pca_a = pca2.fit_transform(hvdf_tfidf)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pca_df = pd.DataFrame(pca_a, columns=[\"x\", \"y\"])\n",
"pca_df.sample(10)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plot_data = pd.concat([df.reset_index(), pca_df], axis=1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import altair as alt\n",
"alt.renderers.enable('nteract')\n",
"\n",
"alt.Chart(plot_data.sample(1000)).encode(x=\"x\", y=\"y\", color=\"label\").mark_point().interactive()\n",
"\n",
""
]
}
],
"nbformat": 4,
"nbformat_minor": 2,
"metadata": {
"language_info": {
"name": "python",
"codemirror_mode": {
"name": "ipython",
"version": 3
}
},
"orig_nbformat": 2,
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"npconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": 3
}
}

View file

@@ -0,0 +1,404 @@
{
"cells": [
{
"cell_type": "markdown",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"In this notebook we will process the synthetic Austen/food reviews data and convert it into feature vectors. In later notebooks these feature vectors will be the inputs to models which we will train and eventually use to identify spam. \n",
"\n",
"This notebook uses [term frequency-inverse document frequency](https://en.wikipedia.org/wiki/Tf–idf), or tf-idf, to generate feature vectors. Tf-idf is commonly used to summarise text data, and it aims to capture how important different words are within a set of documents. Tf-idf combines a normalized word count (or term frequency) with the inverse document frequency (or a measure of how common a word is across all documents) in order to identify words, or terms, which are 'interesting' or important within the document. \n",
"\n",
"\n",
"We begin by loading in the data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"in_vscode=0\n",
"\n",
"df = pd.read_parquet(\"data/training.parquet\")"
]
},
{
"cell_type": "markdown",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"To illustrate the computation of tf-idf vectors we will first implement the method on a sample of three of the documents we just loaded. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"np.random.seed(0xc0ffeeee)\n",
"df_samp = df.sample(3)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pd.set_option('display.max_colwidth', -1) #ensures that all the text is visible\n",
"df_samp"
]
},
{
"cell_type": "markdown",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"We begin by computing the term frequency ('tf') of the words in the three texts above. We use the `token_pattern` parameter to specify that we only want to consider words (no numeric values). We limit the number of words (`max_features`) to 20, so that we can easily inspect the output. This means that only the 20 words which appear most frequently across the three texts will be represented. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.feature_extraction.text import CountVectorizer\n",
"\n",
"vectorizer = CountVectorizer(token_pattern='(?u)\\\\b[A-Za-z]\\\\w+\\\\b', max_features = 20)\n",
"counts = vectorizer.fit_transform(df_samp[\"text\"])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"vectorizer.get_feature_names() #shows all the words used as features for this vectorizer"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"counts"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(counts.toarray()) "
]
},
{
"cell_type": "markdown",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Each row of the array corresponds to one of the texts, whilst the columns relate to the words considered in this vectorizer. (You can confirm that 'all' appears once in the first two texts, and twice in the third text, and so on.)\n",
"\n",
"The next stage of the process is to use the results of the term frequency matrix to compute the tf-idf. \n",
"\n",
"The inverse document frequency (idf) for a particular word, or feature, is computed as (the log of) a ratio of the number of documents in a corpus to the number of documents which contain that feature (up to some constant factors). "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.feature_extraction.text import TfidfTransformer\n",
"\n",
"tfidf_transformer = TfidfTransformer()\n",
"df_tfidf = tfidf_transformer.fit_transform(counts)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(df_tfidf.toarray())"
]
},
{
"cell_type": "markdown",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Each row of the object above is the desired tf-idf vector for the relevant document. "
]
},
{
"cell_type": "markdown",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"A major disadvantage of using a vectorizer is that it will be dependent upon the dictionary of words it sees when it is 'fit' to the data. As such, if we are presented with a new passage of text and wish to compute a feature vector for for that text we are required to know which word maps to which space of the vector. Keeping track of a dictionary is impractical and will lead to inefficiency. \n",
"\n",
"Furthermore, there are only \"spaces\" in the vectorizer for words that have been seen in the fitting stage. If a new text sample contains a word which was not present when the vectorizer was first fit, there will be no place in the feature vectors to count that word. \n",
"\n",
"With that in mind, we consider using a [hashing vectorizer](https://en.wikipedia.org/wiki/Feature_hashing). Words can be hashed to buckets, and the bucket count incremented. This will give us a counts matrix, like we saw above, which we can then compute the tf-idf matrix for, without the need to keep track of which column in the matrix any given word maps to. \n",
"\n",
"One disadvantage of this approach is that collisions will occur - with a finite set of buckets multiple words will hash to the same bucket. As such we are no longer computing an exact tf-idf matrix.\n",
"\n",
"Furthermore we will not be able to recover the word (or words) associated with a bucket at a later time if we need them. (For our application this won't be needed.)\n",
"\n",
"We fix the number of buckets at 2<sup>10</sup> = 1024, but you can try using a different number of buckets and see how the spam detection models are effected. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.feature_extraction.text import HashingVectorizer\n",
"BUCKETS=1024\n",
"\n",
"hv = HashingVectorizer(norm=None, token_pattern='(?u)\\\\b[A-Za-z]\\\\w+\\\\b', n_features=BUCKETS, alternate_sign = False)\n",
"hv\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"hvcounts = hv.fit_transform(df[\"text\"])\n",
"hvcounts"
]
},
{
"cell_type": "markdown",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"We can then go on to compute the \"approximate\" tf-idf matrix for this, by applying the tf-idf transformer to the hashed counts matrix."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tfidf_transformer = TfidfTransformer()\n",
"hvdf_tfidf = tfidf_transformer.fit_transform(hvcounts)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"hvdf_tfidf"
]
},
{
"cell_type": "markdown",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"We apply PCA so that we can visualize the output"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#PCA projection so that output can be visualised\n",
"\n",
"import sklearn.decomposition\n",
"\n",
"DIMENSIONS = 2\n",
"\n",
"pca2 = sklearn.decomposition.TruncatedSVD(DIMENSIONS)\n",
"\n",
"pca_a = pca2.fit_transform(hvdf_tfidf)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pca_df = pd.DataFrame(pca_a, columns=[\"x\", \"y\"])\n",
"pca_df.sample(10)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plot_data = pd.concat([df.reset_index(), pca_df], axis=1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import altair as alt\n",
"alt.renderers.enable('nteract')\n",
"\n",
"alt.Chart(plot_data.sample(1000)).encode(x=\"x\", y=\"y\", color=\"label\").mark_point().interactive()"
]
},
{
"cell_type": "markdown",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"We want to be able to easily compute feature vectors using the hashing tf-idf workflow laid out above. The `Pipeline` facility in scikit-learn streamlines this workflow by making it easy to pass data through multiple transforms. In the next cell we set up our pipeline."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.feature_extraction.text import HashingVectorizer,TfidfTransformer\n",
"from sklearn.pipeline import Pipeline\n",
"import pickle, os\n",
"\n",
"vect = HashingVectorizer(norm=None, token_pattern='(?u)\\\\b[A-Za-z]\\\\w+\\\\b', n_features=BUCKETS, alternate_sign = False)\n",
"tfidf = TfidfTransformer()\n",
"\n",
"feat_pipeline = Pipeline([\n",
" ('vect',vect),\n",
" ('tfidf',tfidf)\n",
"])"
]
},
{
"cell_type": "markdown",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"We can then use the `fit_transform` method to apply the pipeline to our data frame. This produces a sparse matrix (only non zero entries are recorded). We convert this to a dense array using the `toarray()` function, then append the index and labels to aid readability. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"feature_vecs = feat_pipeline.fit_transform(df[\"text\"]).toarray()\n",
"labeled_vecs = pd.concat([df.reset_index()[[\"index\", \"label\"]],\n",
" pd.DataFrame(feature_vecs)], axis=1)\n",
"labeled_vecs.columns = labeled_vecs.columns.astype(str)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"labeled_vecs.sample(10)"
]
},
{
"cell_type": "markdown",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"We save the feature vectors to a parquet file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"labeled_vecs.to_parquet(\"data/features.parquet\")"
]
},
{
"cell_type": "markdown",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"We will then serialize our pipeline to a file on disk so that we can reuse the document frequencies we've observed on training data to weight term vectors."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from mlworkflows import util\n",
"util.serialize_to(feat_pipeline, \"feature_pipeline.sav\")"
]
},
{
"cell_type": "markdown",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Now that we have a feature engineering approach, next step is to train a model. Again, you have two choices for your next step: [click here](04-model-logistic-regression.ipynb) for a model based on *logistic regression*, or [click here](04-model-random-forest.ipynb) for a model based on *ensembles of decision trees.*"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.7.0 64-bit ('ml-workflows': conda)",
"language": "python",
"name": "python37064bitmlworkflowscondad95461eefba54754b428b1d0fa47e7eb"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0-final"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
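For reference, the idf formula the feature-engineering notebook above alludes to: with scikit-learn's default smooth_idf=True, TfidfTransformer computes (a sketch of the library's documented behaviour, not code from this PR)

\mathrm{idf}(t) = \ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1, \qquad \text{tf-idf}(t, d) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t)

where n is the number of documents and df(t) is the number of documents containing term t; each resulting row vector is then L2-normalized by default.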

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large. Load Diff

View file

@@ -0,0 +1,464 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"df = pd.read_csv('train.csv')"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>PassengerId</th>\n <th>Survived</th>\n <th>Pclass</th>\n <th>Name</th>\n <th>Sex</th>\n <th>Age</th>\n <th>SibSp</th>\n <th>Parch</th>\n <th>Ticket</th>\n <th>Fare</th>\n <th>Cabin</th>\n <th>Embarked</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>0</td>\n <td>3</td>\n <td>Braund, Mr. Owen Harris</td>\n <td>male</td>\n <td>22.0</td>\n <td>1</td>\n <td>0</td>\n <td>A/5 21171</td>\n <td>7.2500</td>\n <td>NaN</td>\n <td>S</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>1</td>\n <td>1</td>\n <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n <td>female</td>\n <td>38.0</td>\n <td>1</td>\n <td>0</td>\n <td>PC 17599</td>\n <td>71.2833</td>\n <td>C85</td>\n <td>C</td>\n </tr>\n <tr>\n <th>2</th>\n <td>3</td>\n <td>1</td>\n <td>3</td>\n <td>Heikkinen, Miss. Laina</td>\n <td>female</td>\n <td>26.0</td>\n <td>0</td>\n <td>0</td>\n <td>STON/O2. 3101282</td>\n <td>7.9250</td>\n <td>NaN</td>\n <td>S</td>\n </tr>\n <tr>\n <th>3</th>\n <td>4</td>\n <td>1</td>\n <td>1</td>\n <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n <td>female</td>\n <td>35.0</td>\n <td>1</td>\n <td>0</td>\n <td>113803</td>\n <td>53.1000</td>\n <td>C123</td>\n <td>S</td>\n </tr>\n <tr>\n <th>4</th>\n <td>5</td>\n <td>0</td>\n <td>3</td>\n <td>Allen, Mr. William Henry</td>\n <td>male</td>\n <td>35.0</td>\n <td>0</td>\n <td>0</td>\n <td>373450</td>\n <td>8.0500</td>\n <td>NaN</td>\n <td>S</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " PassengerId Survived Pclass \\\n0 1 0 3 \n1 2 1 1 \n2 3 1 3 \n3 4 1 1 \n4 5 0 3 \n\n Name Sex Age SibSp \\\n0 Braund, Mr. Owen Harris male 22.0 1 \n1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n2 Heikkinen, Miss. Laina female 26.0 0 \n3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n4 Allen, Mr. William Henry male 35.0 0 \n\n Parch Ticket Fare Cabin Embarked \n0 0 A/5 21171 7.2500 NaN S \n1 0 PC 17599 71.2833 C85 C \n2 0 STON/O2. 3101282 7.9250 NaN S \n3 0 113803 53.1000 C123 S \n4 0 373450 8.0500 NaN S "
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "<class 'pandas.core.frame.DataFrame'>\nRangeIndex: 891 entries, 0 to 890\nData columns (total 12 columns):\nPassengerId 891 non-null int64\nSurvived 891 non-null int64\nPclass 891 non-null int64\nName 891 non-null object\nSex 891 non-null object\nAge 714 non-null float64\nSibSp 891 non-null int64\nParch 891 non-null int64\nTicket 891 non-null object\nFare 891 non-null float64\nCabin 204 non-null object\nEmbarked 889 non-null object\ndtypes: float64(2), int64(5), object(5)\nmemory usage: 83.6+ KB\n"
}
],
"source": [
"df.info()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": "PassengerId 0\nSurvived 0\nPclass 0\nName 0\nSex 0\nAge 177\nSibSp 0\nParch 0\nTicket 0\nFare 0\nCabin 687\nEmbarked 2\ndtype: int64"
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.apply(lambda x:sum(x.isnull()))"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"df['Age'].fillna(0 , inplace=True)\n",
"df['Embarked'].fillna('S' , inplace = True)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"def unique(var):\n",
" unique_var=[]\n",
" for x in var:\n",
" if x not in unique_var:\n",
" unique_var.append(x)\n",
" for x in unique_var:\n",
" print(x)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "S\nC\nQ\n"
}
],
"source": [
"unique(df['Embarked'])"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "male\nfemale\n"
}
],
"source": [
"unique(df['Sex'])"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"df['Sex'] = df['Sex'].map({'female':0,'male':1}).astype(np.int)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"df['Embarked'] = df['Embarked'].map({'nan':0,'S':1,'C':2,'Q':3}).astype(np.int)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "<class 'pandas.core.frame.DataFrame'>\nRangeIndex: 891 entries, 0 to 890\nData columns (total 12 columns):\nPassengerId 891 non-null int64\nSurvived 891 non-null int64\nPclass 891 non-null int64\nName 891 non-null object\nSex 891 non-null int32\nAge 891 non-null float64\nSibSp 891 non-null int64\nParch 891 non-null int64\nTicket 891 non-null object\nFare 891 non-null float64\nCabin 204 non-null object\nEmbarked 891 non-null int32\ndtypes: float64(2), int32(2), int64(5), object(3)\nmemory usage: 76.6+ KB\n"
}
],
"source": [
"df.info()"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"del df['Name']"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"del df['Ticket']\n",
"#del df['Cabin']"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>PassengerId</th>\n <th>Survived</th>\n <th>Pclass</th>\n <th>Sex</th>\n <th>Age</th>\n <th>SibSp</th>\n <th>Parch</th>\n <th>Fare</th>\n <th>Cabin</th>\n <th>Embarked</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>0</td>\n <td>3</td>\n <td>1</td>\n <td>22.0</td>\n <td>1</td>\n <td>0</td>\n <td>7.2500</td>\n <td>NaN</td>\n <td>1</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>1</td>\n <td>1</td>\n <td>0</td>\n <td>38.0</td>\n <td>1</td>\n <td>0</td>\n <td>71.2833</td>\n <td>C85</td>\n <td>2</td>\n </tr>\n <tr>\n <th>2</th>\n <td>3</td>\n <td>1</td>\n <td>3</td>\n <td>0</td>\n <td>26.0</td>\n <td>0</td>\n <td>0</td>\n <td>7.9250</td>\n <td>NaN</td>\n <td>1</td>\n </tr>\n <tr>\n <th>3</th>\n <td>4</td>\n <td>1</td>\n <td>1</td>\n <td>0</td>\n <td>35.0</td>\n <td>1</td>\n <td>0</td>\n <td>53.1000</td>\n <td>C123</td>\n <td>1</td>\n </tr>\n <tr>\n <th>4</th>\n <td>5</td>\n <td>0</td>\n <td>3</td>\n <td>1</td>\n <td>35.0</td>\n <td>0</td>\n <td>0</td>\n <td>8.0500</td>\n <td>NaN</td>\n <td>1</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " PassengerId Survived Pclass Sex Age SibSp Parch Fare Cabin \\\n0 1 0 3 1 22.0 1 0 7.2500 NaN \n1 2 1 1 0 38.0 1 0 71.2833 C85 \n2 3 1 3 0 26.0 0 0 7.9250 NaN \n3 4 1 1 0 35.0 1 0 53.1000 C123 \n4 5 0 3 1 35.0 0 0 8.0500 NaN \n\n Embarked \n0 1 \n1 2 \n2 1 \n3 1 \n4 1 "
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"feature=['PassengerId','Pclass','Sex','Age','SibSp','Parch','Fare','Embarked']\n",
"x=df[feature]"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": "0 0\n1 1\n2 1\n3 1\n4 0\nName: Survived, dtype: int64"
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y=df['Survived']\n",
"y.head()"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": "0 549\n1 342\nName: Survived, dtype: int64"
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['Survived'].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"ename": "ModuleNotFoundError",
"evalue": "No module named 'sklearn.cross_validation'",
"output_type": "error",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m<ipython-input-18-bb6d9a5778f1>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[1;32mfrom\u001b[0m \u001b[0msklearn\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mcross_validation\u001b[0m \u001b[1;32mimport\u001b[0m \u001b[0mtrain_test_split\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 2\u001b[0m \u001b[0mx_train\u001b[0m\u001b[1;33m,\u001b[0m\u001b[0mx_test\u001b[0m\u001b[1;33m,\u001b[0m\u001b[0my_train\u001b[0m\u001b[1;33m,\u001b[0m\u001b[0my_test\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mtrain_test_split\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mx\u001b[0m\u001b[1;33m,\u001b[0m\u001b[0my\u001b[0m\u001b[1;33m,\u001b[0m\u001b[0mtest_size\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;36m0.25\u001b[0m\u001b[1;33m,\u001b[0m\u001b[0mrandom_state\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;36m6\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
"\u001b[1;31mModuleNotFoundError\u001b[0m: No module named 'sklearn.cross_validation'"
]
}
],
"source": [
"from sklearn.model_selection import train_test_split\n",
"x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.25,random_state=6)"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"print(x_train.shape) \n",
"print(x_test.shape)\n",
"print(y_train.shape)\n",
"print(y_test.shape)"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"#Linear Regression\n",
"from sklearn.linear_model import LinearRegression\n",
"linreg=LinearRegression()\n",
"linreg.fit(x_train,y_train)"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"y_pred = linreg.predict(x_test)\n",
"from sklearn import metrics\n",
"pred_accuracy = np.sqrt(metrics.mean_absolute_error(y_test,y_pred))\n",
"print(pred_accuracy)"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"#Logistic Regression\n",
"from sklearn.linear_model import LogisticRegression\n",
"logreg=LogisticRegression()\n",
"logreg.fit(x_train,y_train)\n",
"y_pred=logreg.predict(x_test)\n",
"print(metrics.accuracy_score(y_test,y_pred))"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.neighbors import KNeighborsClassifier\n",
"knn=KNeighborsClassifier(n_neighbors=2)\n",
"knn.fit(x_train,y_train)\n",
"pred=knn.predict(x_test)\n",
"print(metrics.accuracy_score(y_test,y_pred))"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"df_test=pd.read_csv('test.csv')"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"df_test.head()"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"df_test.info()"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"df_test['Age'].fillna(0 , inplace=True)"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"del df_test['Ticket']\n",
"del df_test['Cabin']\n",
"del df_test['Name']"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"df_test.info()"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"df_test['Sex'] = df_test['Sex'].map({'female':0,'male':1}).astype(np.int)\n",
"df_test['Embarked'] = df_test['Embarked'].map({'nan':0,'S':1,'C':2,'Q':3}).astype(np.int)\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"df_test['Fare'].fillna(0 , inplace=True)"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"test_feat = ['PassengerId','Pclass','Sex','Age','SibSp','Parch','Fare','Embarked']\n",
"X_test = df_test[test_feat]"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"Y_test_pred=logreg.predict(X_test)"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"df_test['Survived']=Y_test_pred\n",
"\n",
"df_result = df_test.drop(['Pclass','Sex','Age','SibSp','Parch','Fare','Embarked'], axis=1)\n",
"#df_result['Gender'] = df_test['Sex'].map({1:'Male',0:'Female'})\n",
"df_result['Survived'] = df_result['Survived']\n",
"df_result.to_csv('result.csv', index=False)\n",
"df_result.head(50)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View file

@@ -0,0 +1,165 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": "C:\\Users\\jimgries\\AppData\\Local\\Continuum\\anaconda3\\envs\\ml-workflows\\lib\\site-packages\\pyarrow\\pandas_compat.py:708: FutureWarning: .labels was deprecated in version 0.24.0. Use .codes instead.\n labels = getattr(columns, 'labels', None) or [\nC:\\Users\\jimgries\\AppData\\Local\\Continuum\\anaconda3\\envs\\ml-workflows\\lib\\site-packages\\pyarrow\\pandas_compat.py:735: FutureWarning: the 'labels' keyword is deprecated, use 'codes' instead\n return pd.MultiIndex(levels=new_levels, labels=labels, names=columns.names)\nC:\\Users\\jimgries\\AppData\\Local\\Continuum\\anaconda3\\envs\\ml-workflows\\lib\\site-packages\\pyarrow\\pandas_compat.py:752: FutureWarning: .labels was deprecated in version 0.24.0. Use .codes instead.\n labels, = index.labels\n"
}
],
"source": [
"import pandas as pd\n",
"data = pd.read_parquet(\"data/training.parquet\")\n",
""
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"def model_accuracy(osm, df):\n",
" correct = 0\n",
" incorrect = 0\n",
" for row in df.itertuples(): \n",
" if row.label == osm.predict(row.text):\n",
" correct += 1\n",
" else:\n",
" incorrect += 1\n",
" \n",
" if correct + incorrect == 0:\n",
" return 100\n",
" \n",
" return (float(correct) / float(correct + incorrect) * 100)\n",
""
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"legit_sample = data[data.label == 'legitimate'].sample(2000)\n",
"spam_sample = data[data.label == 'spam'].sample(18000)\n",
"unbalanced = pd.DataFrame.append(legit_sample, spam_sample)\n",
""
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split\n",
"unbalanced_train, unbalanced_test = train_test_split(unbalanced, test_size=0.3)\n",
""
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"from collections import defaultdict\n",
"import re\n",
" \n",
"class SensitiveSpamModel(object):\n",
" \n",
" def __init__(self):\n",
" self.legit = set()\n",
" self.spam = set()\n",
" \n",
" def fit(self, df):\n",
" \"\"\" Train a model based on the most frequent unique \n",
" words in each class of documents \"\"\"\n",
" legit_words = defaultdict(lambda: 0)\n",
" spam_words = defaultdict(lambda: 0)\n",
" \n",
" for tup in df.itertuples():\n",
" target = spam_words\n",
" if tup.label == \"legitimate\":\n",
" target = legit_words\n",
" for word in re.split(r\"\\W+\", tup.text):\n",
" if len(word) > 0:\n",
" target[word.lower()] += 1\n",
" \n",
" # remove words common to both classes\n",
" for word in set(legit_words.keys()).intersection(set(spam_words.keys())):\n",
" del legit_words[word]\n",
" del spam_words[word]\n",
" \n",
" top_legit_words = sorted(legit_words.items(), key=lambda kv: kv[1], reverse=True)\n",
" top_spam_words = sorted(spam_words.items(), key=lambda kv: kv[1], reverse=True)\n",
" \n",
" # store ten times as many words from the spam set\n",
" self.legit = set([t[0] for t in top_legit_words[:100]])\n",
" self.spam = set([t[0] for t in top_spam_words[:1000]])\n",
" \n",
" def predict(self, text):\n",
" legit_score = 0\n",
" spam_score = 0\n",
" \n",
" for word in re.split(r\"\\W+\", text):\n",
" w = word.lower()\n",
" if word in self.legit:\n",
" legit_score = legit_score + 1\n",
" elif word in self.spam:\n",
" spam_score = spam_score + 1\n",
" \n",
" # bias results towards spam in the event of ties\n",
" return (legit_score > spam_score) and \"legitimate\" or \"spam\"\n",
"\n",
"ssm = SensitiveSpamModel()\n",
"ssm.fit(unbalanced_train)\n",
""
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": "63.582499999999996"
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model_accuracy(ssm, data)\n",
"\n",
"\n",
""
]
}
],
"nbformat": 4,
"nbformat_minor": 2,
"metadata": {
"language_info": {
"name": "python",
"codemirror_mode": {
"name": "ipython",
"version": 3
}
},
"orig_nbformat": 2,
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"npconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": 3
}
}

View file

@@ -0,0 +1,174 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"source": [
"import pandas as pd\n",
"data = pd.read_parquet(\"data/training.parquet\")\n",
""
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"source": [
"def model_accuracy(osm, df):\n",
" correct = 0\n",
" incorrect = 0\n",
" for row in df.itertuples(): \n",
" if row.label == osm.predict(row.text):\n",
" correct += 1\n",
" else:\n",
" incorrect += 1\n",
" \n",
" if correct + incorrect == 0:\n",
" return 100\n",
" \n",
" return (float(correct) / float(correct + incorrect) * 100)\n",
""
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"source": [
"legit_sample = data[data.label == 'legitimate'].sample(2000)\n",
"spam_sample = data[data.label == 'spam'].sample(18000)\n",
"unbalanced = pd.DataFrame.append(legit_sample, spam_sample)\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" To avoid overfitting, we'll split the unbalanced data set into training and test sets, using functionality from scikit-learn."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"source": [
"from sklearn.model_selection import train_test_split\n",
"unbalanced_train, unbalanced_test = train_test_split(unbalanced, test_size=0.3)\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" We'll now create a simple model that should work pretty well for spam messages but not necessarily as well for legitimate ones."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"source": [
"from collections import defaultdict\n",
"import re\n",
" \n",
"class SensitiveSpamModel(object):\n",
" \n",
" def __init__(self):\n",
" self.legit = set()\n",
" self.spam = set()\n",
" \n",
" def fit(self, df):\n",
" \"\"\" Train a model based on the most frequent unique \n",
" words in each class of documents \"\"\"\n",
" legit_words = defaultdict(lambda: 0)\n",
" spam_words = defaultdict(lambda: 0)\n",
" \n",
" for tup in df.itertuples():\n",
" target = spam_words\n",
" if tup.label == \"legitimate\":\n",
" target = legit_words\n",
" for word in re.split(r\"\\W+\", tup.text):\n",
" if len(word) > 0:\n",
" target[word.lower()] += 1\n",
" \n",
" # remove words common to both classes\n",
" for word in set(legit_words.keys()).intersection(set(spam_words.keys())):\n",
" del legit_words[word]\n",
" del spam_words[word]\n",
" \n",
" top_legit_words = sorted(legit_words.items(), key=lambda kv: kv[1], reverse=True)\n",
" top_spam_words = sorted(spam_words.items(), key=lambda kv: kv[1], reverse=True)\n",
" \n",
" # store ten times as many words from the spam set\n",
" self.legit = set([t[0] for t in top_legit_words[:100]])\n",
" self.spam = set([t[0] for t in top_spam_words[:1000]])\n",
" \n",
" def predict(self, text):\n",
" legit_score = 0\n",
" spam_score = 0\n",
" \n",
" for word in re.split(r\"\\W+\", text):\n",
" w = word.lower()\n",
" if word in self.legit:\n",
" legit_score = legit_score + 1\n",
" elif word in self.spam:\n",
" spam_score = spam_score + 1\n",
" \n",
" # bias results towards spam in the event of ties\n",
" return (legit_score > spam_score) and \"legitimate\" or \"spam\"\n",
"\n",
"ssm = SensitiveSpamModel()\n",
"ssm.fit(unbalanced_train)\n",
""
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"source": [
"model_accuracy(ssm, unbalanced_train)\n",
""
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"source": [
"model_accuracy(ssm, unbalanced_test)\n",
""
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"source": [
"model_accuracy(ssm, data)\n",
""
]
}
],
"nbformat": 4,
"nbformat_minor": 2,
"metadata": {
"language_info": {
"name": "python",
"codemirror_mode": {
"name": "ipython",
"version": 3
}
},
"orig_nbformat": 2,
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"npconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": 3
}
}

View file

@@ -0,0 +1,17 @@
export interface Notebook {
cells: Cell[];
}
export interface Cell {
cell_type: 'code' | 'markdown';
execution_count: number;
source: string[];
}
export declare function cellCode(nb: Notebook): string[];
export declare const vvNotebook: Notebook;
export declare const titanicNotebook: Notebook;
export declare const titanicNotebook2: Notebook;
export declare const pimaNotebook: Notebook;
export declare const evalModelsNotebook: Notebook;
export declare const evalModelsExpectedNotebook: Notebook;
export declare const featureEngineeringNotebook: Notebook;
export declare const featureEngineeringExpectedNotebook: Notebook;
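A small sketch of how a test might consume these declarations — the module path is hypothetical; only the names and signatures come from the file above:

// Hypothetical import path; cellCode and the fixture notebooks are declared above.
import { cellCode, titanicNotebook } from './testNotebooks';

// Flatten the fixture notebook's code cells into executable source lines.
const lines: string[] = cellCode(titanicNotebook);
console.log(lines.length);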

View file

@@ -0,0 +1,306 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's take a look at a simple way to trying to identify some structure in our data. Getting some understanding of the data is an important first step before we even start to look at using machine learning techniques to train a model; in this notebook, we'll approach that problem from a couple of different angles.\n",
"\n",
"We'll start by loading our training data."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"C:\\Users\\jimgries\\AppData\\Local\\Continuum\\anaconda3\\lib\\site-packages\\pyarrow\\pandas_compat.py:708: FutureWarning: .labels was deprecated in version 0.24.0. Use .codes instead.\n",
" labels = getattr(columns, 'labels', None) or [\n",
"C:\\Users\\jimgries\\AppData\\Local\\Continuum\\anaconda3\\lib\\site-packages\\pyarrow\\pandas_compat.py:735: FutureWarning: the 'labels' keyword is deprecated, use 'codes' instead\n",
" return pd.MultiIndex(levels=new_levels, labels=labels, names=columns.names)\n",
"C:\\Users\\jimgries\\AppData\\Local\\Continuum\\anaconda3\\lib\\site-packages\\pyarrow\\pandas_compat.py:752: FutureWarning: .labels was deprecated in version 0.24.0. Use .codes instead.\n",
" labels, = index.labels\n"
]
}
],
"source": [
"import pandas as pd\n",
"data = pd.read_parquet(\"data/training.parquet\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Our training data (which we generated in [the previous notebook](00-generator.ipynb)) consists of labels (either `legitimate` or `spam`) and short documents of plausible English text. We can inspect these data:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"data.sample(50)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ultimately, machine learning algorithms operate on data that is structured differently than the data we might deal with in database tables or application programs. In order to identify and exploit structure in these data, we are going to convert our natural-language documents to points in space by converting them to vectors of floating-point numbers.\n",
"\n",
"This process is often tricky, since you want a way to map from arbitrary data to some points in (some) space that preserves the structure of the data. That is, documents that are similar should map to points that are similar (for some definition of similarity), and documents that are dissimilar should not map to similar points. The name for this process of turning real-world data into a form that a machine learning algorithm can take advantage of is *feature engineering*. \n",
"\n",
"You'll learn more about feature engineering in the next notebook; for now, we'll just take a very basic approach that will let us visualize our data. We'll first convert our documents to *k-shingles*, or sequences of *k* characters (for some small value of *k*). This means that a document like\n",
"\n",
"`the quick brown fox jumps over the lazy dog`\n",
"\n",
"would become this sequence of 4-shingles: \n",
"\n",
"`['the ', 'he q', 'e qu', ' qui', 'quic', 'uick', 'ick ', 'ck b', 'k br', ' bro', 'brow', 'rown', 'own ', 'wn f', 'n fo', ' fox', 'fox ', 'ox j', 'x ju', ' jum', 'jump', 'umps', 'mps ', 'ps o', 's ov', ' ove', 'over', 'ver ', 'er t', 'r th', ' the', 'the ', 'he l', 'e la', ' laz', 'lazy', 'azy ', 'zy d', 'y do', ' dog']`\n",
"\n",
"Shingling gets us a step closer to having vector representations of documents -- ultimately, our assumption is that spam documents will have some k-shingles that legitimate documents don't, and vice versa. Here's how we'd add a field of shingles to our data:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"def doc2shingles(k):\n",
" def kshingles(doc):\n",
" return [doc[i:i + k] for i in range(len(doc) - k + 1)]\n",
" return kshingles\n",
"\n",
"data[\"shingles\"] = data[\"text\"].apply(doc2shingles(4))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Remember, our goal is to be able to learn a function that can separate between documents that are likely to represent legitimate messages (i.e., prose in the style of Jane Austen) or spam messages (i.e., prose in the style of food-product reviews), so we'll still want to transform our lists of shingles into vectors.\n",
"\n",
"1. We'll collect shingle counts for each example, showing us how frequent each shingle is in a given document;\n",
"2. We'll then turn those raw counts into frequencies (i.e., for a given shingle what percentage of shingle in given document are that word?), giving us a mapping from shingles to frequencies for each document;\n",
"3. Finally, we'll encode these mappings as fixed-size vectors in a space-efficient way, by using a hash function to determine which vector element should get a given frequency. Hashing has a few advantages, but for our purposes the most important advantage is that we don't need to know all of the shingles we might see in advance. \n",
"\n",
"(That's what we'll _logically_ do -- we'll _actually_ do these steps a bit out of order because it will make our code simpler and more efficient without changing the results.)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def hashing_frequency(vecsize, h):\n",
" \"\"\" \n",
" returns a function that will collect shingle frequencies \n",
" into a vector with _vecsize_ elements and will use \n",
" the hash function _h_ to choose which vector element \n",
" to update for a given term\n",
" \"\"\"\n",
" \n",
" def hf(words):\n",
" if type(words) is type(\"\"):\n",
" # handle both lists of words and space-delimited strings\n",
" words = words.split(\" \")\n",
" \n",
" result = np.zeros(vecsize)\n",
" for term in words:\n",
" result[h(term) % vecsize] += 1.0\n",
" \n",
" total = sum(result)\n",
" for i in range(len(result)):\n",
" result[i] /= total\n",
"\n",
" return result\n",
" \n",
" return hf"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"a = np.array([hashing_frequency(1024, hash)(v) for v in data[\"shingles\"].values])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So now instead of having documents (which we had from the raw data) or lists of shingles, we have vectors representing shingle frequencies. Because we've hashed shingles into these vectors, we can't in general reconstruct a document or the shingles from a vector, but we *do* know that if the same shingle appears in two documents, their vectors will reflect it in corresponding buckets.\n",
"\n",
"However, we've generated a 1024-element vector. Recall that our ultimate goal is to place documents in space so that we can identify a way to separate legitimate documents from spam documents. Our 1024-element vector is a point in a space, but it's a point in a space that most of our geometric intuitions don't apply to (some of us have enough trouble navigating the three dimensions of the physical world). \n",
"\n",
"Let's use a very basic technique to project these vectors to a much smaller space that we can visualize. [Principal component analysis](https://en.wikipedia.org/wiki/Principal_component_analysis), or PCA, is a statistical technique that is over a century old; it takes observations in a high-dimensional space (like our 1024-element vectors) and maps them to a (potentially much) smaller number of dimensions. It's an elegant technique, and the most important things to know about it are that it tries to ensure that the dimensions that have the most variance contribute the most to the mapping, while the dimensions with the least variance are (more-or-less) disregarded. The other important thing to know about PCA is that there are very efficient ways to compute it, even on large datasets that don't fit in memory on a single machine. We'll see it in action now, using the [implementation from scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html#sklearn.decomposition.PCA)."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"import sklearn.decomposition\n",
"\n",
"DIMENSIONS = 2\n",
"\n",
"pca2 = sklearn.decomposition.PCA(DIMENSIONS)\n",
"\n",
"pca_a = pca2.fit_transform(a)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `.fit_transform()` method takes an array of high-dimensional observations and will both perform the principal component analysis (the \"fit\" part) and use that to map the high-dimensional values to low-dimensional ones (the \"transform\" part). We can see what the transformed vectors look like:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"source": [
"pca_df = pd.DataFrame(pca_a, columns=[\"x\", \"y\"])\n",
"pca_df.sample(50)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's plot these points to see if it looks like there is some structure in our data. We'll use the [Altair](https://altair-viz.github.io) library, which is a declarative visualization library, meaning that the presentation of our data will depend on the data itself -- for example, we'll say to use the two elements of the vectors for *x* and *y* coordinates but to use whether a document is legitimate or spam to determine how to color the point.\n",
"\n",
"We'll start by using the [`concat` function in the Pandas library](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html) to make a data frame consisting of the original data frame with the PCA vector for each row."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"plot_data = pd.concat([data.reset_index(), pca_df], axis=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Our next step will be to set up Altair, tell it how to encode our data frame in a plot, by using the `.encode(...)` method to tell it which values to use for x and y coordinates, as well as which value to use to decide how to color points. Altair will restrict us to plotting 5,000 points (so that the generated chart will not overwhelm our browser), so we'll also make sure to sample a subset of the data (in this case, 1,000 points)."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"source": [
"import altair as alt\n",
"alt.renderers.enable('nteract')\n",
"\n",
"alt.Chart(plot_data.sample(1000)).encode(x=\"x\", y=\"y\", color=\"label\").mark_point().interactive()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That plot in particular is interactive (note the call to `.interactive()` at the end of the command), which means that you can pan around by dragging with the mouse or zoom with the mouse wheel. Try it out!\n",
"\n",
"Notice that, for the most part, even our simple shingling approach has identified some structure in the data: there is a clear dividing line between legitimate and spam documents. (It's important to remember that we're only using the labels to color points after we've placed them -- the PCA transformation isn't taking labels into account when mapping the vectors to two dimensions.)\n",
"\n",
"The next approach we'll try is called t-distributed stochastic neighbor embedding, or t-SNE for short. t-SNE learns a mapping from high-dimensional points to low-dimensional points so that points that are similar in high-dimensional space are likely to be similar in low-dimensional space as well. t-SNE can sometimes identify structure that simpler techniques like PCA can't, but this power comes at a cost: it is much more expensive to compute than PCA and doesn't parallelize well. (t-SNE also works best for visualizing two-dimensional data when it is reducing from tens of dimensions rather than hundreds or thousands. So, in some cases, you'll want to use a fast technique like PCA to reduce your data to a few dozen dimensions before using t-SNE. We haven't done that in this notebook, though.)\n",
"\n",
"So we can finish this notebook quickly and get on to the rest of our material, we'll only use t-SNE to visualize a subset of our data. We've [declared a helper function called `sample_corresponding`](mlworkflows/util.py), which takes a sequence of arrays or data frames, generates a set of random indices, and returns collections with the elements corresponding to the selected indices from each array or data frame. So if we had the collections `[1, 2, 3, 4, 5]` and `[2, 4, 6, 8, 10]`, a call to `sample_corresponding` asking for two elements might return `[[1, 4], [2, 8]]`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sklearn.manifold\n",
"from mlworkflows import util as mlwutil\n",
"\n",
"np.random.seed(0xc0ffee)\n",
"sdf, sa = mlwutil.sample_corresponding(800, data, a)\n",
"\n",
"tsne = sklearn.manifold.TSNE()\n",
"tsne_a = tsne.fit_transform(sa)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tsne_plot_data = pd.concat([sdf.reset_index(), pd.DataFrame(tsne_a, columns=[\"x\", \"y\"])], axis=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Altair library, which we introduced while looking at our PCA results, is easy to use. However, to avoid cluttering our notebooks in a common case, we've [introduced a helper function called `plot_points`](mlworkflows/plot.py) that will just take a data frame and a data encoding before generating an interactive Altair scatterplot. (For more complicated cases, we'll still want to use Altair directly.)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from mlworkflows import plot\n",
"\n",
"plot.plot_points(tsne_plot_data, x=\"x\", y=\"y\", color=\"label\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this notebook, you've learned about two ways to visualize multidimensional data in two dimensions, which helps you to evaluate whether or not a given feature engineering approach is revealing structure in your data. In [our next notebook](02-evaluating-models.ipynb), you'll learn how to evaluate models based on the predictions that they make."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
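The `sample_corresponding` and `plot_points` helpers imported above live in mlworkflows/util.py and mlworkflows/plot.py, neither of which appears in this diff. As rough sketches only, consistent with how the notebook calls them but not necessarily matching the shipped implementations, they might look like:

# Hypothetical sketches; the real mlworkflows helpers may differ.
import numpy as np
import altair as alt

def sample_corresponding(n, *collections):
    # Choose one set of n random row indices and apply it to every collection,
    # so corresponding rows stay aligned across data frames and arrays.
    indices = np.random.choice(len(collections[0]), n, replace=False)
    sampled = []
    for c in collections:
        if hasattr(c, "iloc"):
            sampled.append(c.iloc[indices])          # pandas DataFrame or Series
        else:
            sampled.append(np.asarray(c)[indices])   # numpy array or plain list
    return sampled

def plot_points(df, x, y, color):
    # Interactive Altair scatterplot with the given encodings.
    return alt.Chart(df).encode(x=x, y=y, color=color).mark_point().interactive()

With these definitions, `sample_corresponding(800, data, a)` would return an 800-row sample of the data frame together with the matching 800 rows of the feature matrix, as the notebook expects.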

1
types/@msrvida-python-program-analysis/test/parser.test.d.ts vendored Normal file

@@ -0,0 +1 @@
export {};


@@ -0,0 +1 @@
export {};

1
types/@msrvida-python-program-analysis/test/slice.test.d.ts vendored Normal file

@@ -0,0 +1 @@
export {};

10
types/@msrvida-python-program-analysis/test/testcell.d.ts vendored Normal file

@@ -0,0 +1,10 @@
import { Cell } from "..";
export declare class TestCell implements Cell {
text: string;
executionCount: number;
hasError: boolean;
executionEventId: string;
persistentId: string;
constructor(text: string, executionCount: number, executionEventId?: string, persistentId?: string, hasError?: boolean);
deepCopy(): this;
}