An early-stage PHP parser designed for IDE usage scenarios.
Tyson Andre ce5acf3186 Use PHPDoc `@param string ...$classNames`
The `string` is the type of each argument.
The `...` indicates that it is an array of that type, like
`function (string ...$classNames)` also would indicate.

Update code to PHPStan Level 3

Added ReturnTypeWillChange

Fix return type for Expression

Add type hints for returned variables

Add generic parameter for delimited list

Fixing type hints

Ignore co-variance error

Remove TODO - it already must always be a Node

Ignore error

Add missing types

Update phpstan to level 3

Add phpstan to dev reqs

No need to install phpstan independently

Do not override unary expression operand

It seems to me that the operand can be any expression

Added token to operand types -- should this be MissingToken?

Include tokens in return types

ExpressionStatement => EchoStatement

It seems this is always an EchoStatement not an ExpressionStatement

unaryExpressionOrHigher can return a ThrowExpression

Remove overridden property

Add Token to union

Remove QualifiedName and rename local variable

Remove unused import

Add return types

Remove trailing whitespace

Add MissingToken type

Ignore "should not happen" statement

Bump to level 4

Remove !is_null conditional branch - it always returns Node

NamespaceUseDeclaration#useClauses is technically nullable

NamespaceUseGroupClause#functionOrConst can be NULL

Add TODO

Removed redundant condition (check)

Add return type

Add return type, remove inaccurate docblock

Fix bug with get string literal text, as it returned the quotes

Ignore assumed false-positive from PHPStan

If allowEmptyElements is `false`...

... then it MUST be true in the right hand side of ||

backslash can be NULL

BinaryExpressionOrHigher can return MissingToken

Ignoring error to be safe (as commented) and more type hints

Set level to 4

Add explanation
2023-01-05 23:50:26 +00:00
.github/workflows Add .sh extension to run_tests files 2022-08-18 18:57:24 -04:00
.vscode Add support folders to search.exclude 2017-06-12 15:28:02 -07:00
ci Add .sh extension to run_tests files 2022-08-18 18:57:24 -04:00
docs v0.1.0: Raise minimum php version, change some AST representations 2021-04-29 17:11:53 -04:00
experiments Optimize getDiagnostics for speed 2017-12-19 21:55:00 -08:00
php-langspec Merge commit 'a7d64bac1248a8ad2fba9bfd0e96efd6574f5d92' as 'php-langspec' 2016-10-26 19:42:47 -07:00
src Use PHPDoc `@param string ...$classNames` 2023-01-05 23:50:26 +00:00
syntax-visualizer Update example syntax-visualizer output to latest version 2022-08-25 21:09:24 -04:00
tests Use PHPDoc `@param string ...$classNames` 2023-01-05 23:50:26 +00:00
tools Refactoring modified type behaviors 2020-08-10 16:20:57 +02:00
validation Merge branch 'master' of https://github.com/Microsoft/tolerant-php-parser into lang-server 2017-05-05 09:53:35 -07:00
.dockerignore Set up github workflows for multiple php versions 2022-08-18 18:56:10 -04:00
.gitattributes Exclude phpstan.neon and .github directories from packages 2022-09-25 09:01:51 -04:00
.gitignore Set up github workflows for multiple php versions 2022-08-18 18:56:10 -04:00
.gitmodules add drupal to validation tests 2017-01-19 19:22:08 -08:00
.travis.yml Update code to PHPStan Level 3 2022-10-08 10:49:43 +02:00
Contributing.md typo 2017-04-24 10:50:33 -07:00
LICENSE.txt rename LICENSE -> LICENSE.txt 2016-12-18 14:58:59 -08:00
README.md Refactor to use getStartPosition 2021-04-29 17:19:15 -04:00
ThirdPartyNotices.txt third party notices 2017-01-04 11:28:39 -08:00
composer.json Use PHPDoc `@param string ...$classNames` 2023-01-05 23:50:26 +00:00
phpstan.neon Use PHPDoc `@param string ...$classNames` 2023-01-05 23:50:26 +00:00
phpunit.xml Don't emit "this test did not perform any assertions" 2018-02-11 18:28:04 -08:00

README.md

Tolerant PHP Parser


This is an early-stage PHP parser designed, from the beginning, for IDE usage scenarios (see Design Goals for more details). There is still a ton of work to be done, so at this point, this repo mostly serves as an experiment and the start of a conversation.


This is the v0.1 branch, which changes data structures to support syntax added after the initial 0.0.x release line.

Get Started

After you've configured your machine, you can use the parser to generate and work with the Abstract Syntax Tree (AST) via a friendly API.

```php
<?php
// Autoload required classes
require __DIR__ . "/vendor/autoload.php";

use Microsoft\PhpParser\{DiagnosticsProvider, Node, Parser, PositionUtilities};

// Instantiate new parser instance
$parser = new Parser();

// Return and print an AST from string contents
$astNode = $parser->parseSourceFile('<?php /* comment */ echo "hi!"');
var_dump($astNode);

// Gets and prints errors from AST Node. The parser handles errors gracefully,
// so it can be used in IDE usage scenarios (where code is often incomplete).
$errors = DiagnosticsProvider::getDiagnostics($astNode);
var_dump($errors);

// Traverse all Node descendants of $astNode
foreach ($astNode->getDescendantNodes() as $descendant) {
    if ($descendant instanceof Node\StringLiteral) {
        // Print the Node text (without whitespace or comments)
        var_dump($descendant->getText());

        // All Nodes link back to their parents, so it's easy to navigate the tree.
        $grandParent = $descendant->getParent()->getParent();
        var_dump($grandParent->getNodeKindName());

        // The AST is fully-representative, and round-trippable to the original source.
        // This enables consumers to build reliable formatting and refactoring tools.
        var_dump($grandParent->getLeadingCommentAndWhitespaceText());
    }

    // In addition to retrieving all children or descendants of a Node,
    // Nodes expose properties specific to the Node type.
    if ($descendant instanceof Node\Statement\EchoStatement) {
        $echoKeywordStartPosition = $descendant->echoKeyword->getStartPosition();
        // To cut down on memory consumption, positions are represented as a single integer
        // index into the document, but their line and character positions are easily retrieved.
        $lineCharacterPosition = PositionUtilities::getLineCharacterPositionFromPosition(
            $echoKeywordStartPosition,
            $descendant->getFileContents()
        );
        echo "line: $lineCharacterPosition->line, character: $lineCharacterPosition->character";
    }
}
```
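The comments in the example note that positions are stored as a single integer offset and converted to line/character on demand. As a rough illustration of that conversion (not the library's `PositionUtilities` implementation), here is a self-contained sketch. `offsetToLineCharacter` is a hypothetical name; it assumes 0-based lines and characters, byte offsets, and treats only `\n` as a line break.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the library): converts a 0-based byte offset
 * into 0-based line and character numbers by counting "\n" characters before it.
 *
 * @return array{line: int, character: int}
 */
function offsetToLineCharacter(string $contents, int $offset): array
{
    if ($offset < 0 || $offset > strlen($contents)) {
        throw new InvalidArgumentException("Offset $offset is out of range");
    }
    // Everything before the offset determines the position.
    $prefix = substr($contents, 0, $offset);
    // The line number is the count of line feeds before the offset.
    $line = substr_count($prefix, "\n");
    // The character is the distance from the start of that line.
    $lastNewline = strrpos($prefix, "\n");
    $lineStart = $lastNewline === false ? 0 : $lastNewline + 1;
    return ['line' => $line, 'character' => $offset - $lineStart];
}

$source = "<?php\n/* comment */\necho \"hi!\";\n";
$position = offsetToLineCharacter($source, strpos($source, 'echo'));
echo "line: {$position['line']}, character: {$position['character']}\n";
// prints "line: 2, character: 0"
```

Storing one integer per position and computing line/character lazily keeps every node small; the trade-off is an O(n) scan per conversion, which a real implementation could avoid by caching line-start offsets.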

Note: the API is not yet finalized, so please file issues to let us know what functionality you want exposed, and we'll see what we can do! Also, please file bugs for any unexpected behavior in the parse tree. We're still in our early stages, and any feedback you have is much appreciated 😃.

Design Goals

  • Error-tolerant design - in IDE scenarios, code is, by definition, incomplete. When invalid code is entered, the parser should still be able to recover and produce a valid and complete tree, as well as relevant diagnostics.
  • Fast and lightweight (should be able to parse several MB of source code per second, to leave room for other features).
    • Memory-efficient data structures
    • Allow for incremental parsing in the future
  • Adheres to PHP language spec, supports both PHP5 and PHP7 grammars
  • Generated AST provides properties (fully representative, etc.) necessary for semantic and transformational operations, which also need to be performant.
    • Fully representative and round-trippable back to the text it was parsed from (all whitespace and comment "trivia" are included in the parse tree)
    • Possible to easily traverse the tree through parent/child nodes
    • < 100 ms UI response time, so each language server operation should be < 50 ms to leave room for all the other stuff going on in parallel.
  • Simple and maintainable over time - parsers have a tendency to get really confusing, really fast, so readability and debuggability are high priorities.
  • Testable - the parser should produce provably valid parse trees. We achieve this by defining and continuously testing a set of invariants about the tree.
  • Friendly and descriptive API to make it easy for others to build on.
  • Written in PHP - make it as easy as possible for the PHP community to consume and contribute.
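To make the "memory-efficient" and "round-trippable" goals concrete, here is a toy sketch under a made-up token shape: each token stores only integer offsets (`fullStart`, `start`, `length`) into the original text, and leading whitespace and comments are kept as trivia instead of being discarded. `MiniToken` and `miniLex` are hypothetical, understand no real PHP grammar, and are only loosely inspired by the idea of full-fidelity tokens. Concatenating every token's full text reproduces the input exactly.

```php
<?php
declare(strict_types=1);

// Hypothetical token shape: a kind plus integer offsets into the source, no copied strings.
final class MiniToken
{
    public string $kind;
    public int $fullStart; // offset where this token's leading trivia begins
    public int $start;     // offset where the token text itself begins
    public int $length;    // full width: leading trivia plus token text

    public function __construct(string $kind, int $fullStart, int $start, int $length)
    {
        $this->kind = $kind;
        $this->fullStart = $fullStart;
        $this->start = $start;
        $this->length = $length;
    }

    public function getFullText(string $source): string
    {
        return substr($source, $this->fullStart, $this->length);
    }

    public function getText(string $source): string
    {
        return substr($source, $this->start, $this->length - ($this->start - $this->fullStart));
    }
}

/**
 * Toy lexer (not PHP-aware): whitespace and block comments become leading trivia,
 * runs of word characters become "Word" tokens, anything else is a one-byte "Other" token.
 *
 * @return MiniToken[]
 */
function miniLex(string $source): array
{
    $tokens = [];
    $pos = 0;
    $len = strlen($source);
    while (true) {
        $fullStart = $pos;
        // Attach whitespace and /* ... */ comments to the next token as trivia.
        while ($pos < $len) {
            $whitespace = strspn($source, " \t\r\n", $pos);
            if ($whitespace > 0) {
                $pos += $whitespace;
            } elseif (substr($source, $pos, 2) === '/*') {
                $end = strpos($source, '*/', $pos + 2);
                $pos = $end === false ? $len : $end + 2;
            } else {
                break;
            }
        }
        $start = $pos;
        if ($pos >= $len) {
            // Zero-width EOF token that owns any trailing trivia.
            $tokens[] = new MiniToken('EOF', $fullStart, $start, $pos - $fullStart);
            return $tokens;
        }
        if (preg_match('/\G\w+/', $source, $match, 0, $pos) === 1) {
            $pos += strlen($match[0]);
            $kind = 'Word';
        } else {
            $pos++;
            $kind = 'Other';
        }
        $tokens[] = new MiniToken($kind, $fullStart, $start, $pos - $fullStart);
    }
}

$source = "<?php /* comment */ echo \"hi!\";\n";
$tokens = miniLex($source);
$rebuilt = implode('', array_map(fn (MiniToken $t) => $t->getFullText($source), $tokens));
var_dump($rebuilt === $source); // bool(true)
```

Because trivia is owned by the following token rather than thrown away, a formatter or refactoring tool can rewrite one token and still emit every comment and space the user typed.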

Current Status and Approach

To ensure a sufficient level of correctness at every step of the way, the parser is being developed using the following incremental approach:

  • Phase 1: Write a lexer that does not support PHP grammar, but supports EOF and Unknown tokens. Write tests for all invariants.
  • Phase 2: Support PHP lexical grammar, lots of tests
  • Phase 3: Write a parser that does not support PHP grammar, but produces a tree of Error Nodes. Write tests for all invariants.
  • Phase 4: Support PHP syntactic grammar, lots of tests
  • Phase 5 (in progress 🏃): Real-world validation and optimization
    • Correctness: validate that there are no errors produced on sample codebases, benchmark against other parsers (investigate any instance of disagreement), fuzz-testing
    • Performance: profile, benchmark against large PHP applications
  • Phase 6: Finalize API to make it as easy as possible for people to consume.
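Phase 1 can be pictured with a sketch like the following. Both functions are hypothetical, and the three invariants checked (contiguous tokens, full coverage of the document, exactly one trailing EOF token) are our own illustrative choices rather than the project's actual list of tested invariants.

```php
<?php
declare(strict_types=1);

// Hypothetical Phase-1-style lexer: knows no PHP grammar, so every byte
// becomes an "Unknown" token, followed by a single zero-width "EOF" token.
/** @return list<array{kind: string, start: int, length: int}> */
function phase1Lex(string $source): array
{
    $tokens = [];
    for ($i = 0, $n = strlen($source); $i < $n; $i++) {
        $tokens[] = ['kind' => 'Unknown', 'start' => $i, 'length' => 1];
    }
    $tokens[] = ['kind' => 'EOF', 'start' => strlen($source), 'length' => 0];
    return $tokens;
}

// Invariants that any lexer output should satisfy, however much grammar it knows.
function checkLexerInvariants(string $source, array $tokens): void
{
    $expectedStart = 0;
    foreach ($tokens as $i => $token) {
        // Tokens are contiguous: each starts where the previous one ended.
        if ($token['start'] !== $expectedStart) {
            throw new LogicException("Token $i does not start at offset $expectedStart");
        }
        $expectedStart += $token['length'];
    }
    // Together the tokens cover the whole document ...
    if ($expectedStart !== strlen($source)) {
        throw new LogicException('Tokens do not cover the whole document');
    }
    // ... and exactly one EOF token appears, as the last token.
    $kinds = array_column($tokens, 'kind');
    if (end($kinds) !== 'EOF' || count(array_keys($kinds, 'EOF', true)) !== 1) {
        throw new LogicException('Expected exactly one trailing EOF token');
    }
}

foreach (['', '<?php echo 1;', "\xFF\xFE garbage"] as $input) {
    checkLexerInvariants($input, phase1Lex($input));
}
echo "all invariants hold\n";
```

The point of starting this small is that the same invariant checks keep running unchanged as later phases replace `Unknown` tokens with real PHP tokens.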

Additional notes

A few PHP grammatical constructs (namely yield expressions and template strings) are not yet supported, and there are other miscellaneous bugs. However, because the parser is error-tolerant, these errors are handled gracefully, and the resulting tree is otherwise complete. To get a more holistic sense of where we are, you can run the "validation" test suite (see Contributing Guidelines for more info on running tests), or simply take a look at the current validation test results.
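Graceful handling of this kind typically relies on recovery rules such as the following: when an expected token is absent, the parser records a diagnostic and substitutes a zero-width placeholder instead of failing. This project's commit log mentions a `MissingToken` type; the sketch below only imitates that idea, using hypothetical `Tok` and `MiniParser` classes and a `missing` flag in place of a dedicated class.

```php
<?php
declare(strict_types=1);

// Hypothetical token; `missing` marks a zero-width placeholder inserted during recovery.
final class Tok
{
    public string $kind;
    public int $start;
    public bool $missing;

    public function __construct(string $kind, int $start, bool $missing = false)
    {
        $this->kind = $kind;
        $this->start = $start;
        $this->missing = $missing;
    }
}

// Hypothetical parser; the token list is assumed to end with an EOF token.
final class MiniParser
{
    /** @var Tok[] */
    private array $tokens;
    private int $pos = 0;
    /** @var string[] */
    public array $diagnostics = [];

    /** @param Tok[] $tokens */
    public function __construct(array $tokens)
    {
        $this->tokens = $tokens;
    }

    // Consume the expected token if present; otherwise synthesize a zero-width
    // "missing" token at the current position, record a diagnostic, and keep going.
    public function eat(string $kind): Tok
    {
        $current = $this->tokens[$this->pos];
        if ($current->kind === $kind) {
            $this->pos++;
            return $current;
        }
        $this->diagnostics[] = "'$kind' expected at offset {$current->start}";
        return new Tok($kind, $current->start, true);
    }

    /** Parses `echo <Variable> ;` and always returns a complete node. */
    public function parseEchoStatement(): array
    {
        return [
            'echoKeyword' => $this->eat('echo'),
            'expression' => $this->eat('Variable'),
            'semicolon' => $this->eat(';'),
        ];
    }
}

// `echo $x` with the semicolon forgotten, as often happens mid-edit.
$parser = new MiniParser([new Tok('echo', 0), new Tok('Variable', 5), new Tok('EOF', 7)]);
$node = $parser->parseEchoStatement();
var_dump($node['semicolon']->missing); // bool(true)
echo $parser->diagnostics[0], "\n";    // ';' expected at offset 7
```

The node keeps its full shape even when input is missing, so tools walking the tree never have to special-case a half-built statement; the diagnostics list carries the error information separately.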

Even though we haven't yet begun the performance optimization stage, we have seen promising results so far, and have plenty more room for improvement. See How It Works for details on our current approach, and run the Performance Tests on your own machine to see for yourself.

Learn more

🎯 Design Goals - learn about the design goals of the project (features, performance metrics, and more).

📖 Documentation - learn how to reference the parser from your project, and how to perform operations on the AST to answer questions about your code.

👀 Syntax Visualizer Tool - get a more tangible feel for the AST. Get creative - see if you can break it!

📈 Current Status and Approach - how much of the grammar is supported? Performance? Memory? API stability?

🔧 How it works - learn about the architecture, design decisions, and tradeoffs.

💖 Contribute! - learn how to get involved, check out some pointers to educational commits that'll help you ramp up on the codebase (even if you've never worked on a parser before), and recommended workflows that make it easier to iterate.


This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.