Parsing SQL

Parsing is done by the \sad_spirit\pg_builder\Parser class, backed by \sad_spirit\pg_builder\Lexer. The latter splits the SQL string into tokens; the former goes over the tokens, building the Abstract Syntax Tree.

This section describes usage of these classes. An additional section describes implementation details that may be of interest to those trying to extend pg_builder in some way.

Parser API

Tip

It is generally not necessary to manually call methods that parse SQL statements or their parts: use StatementFactory::createFromString() to parse complete statements or one of its builder methods to create Statement instances from scratch. Statement instances created by StatementFactory will automatically accept strings for their properties and call relevant Parser methods.

It may be necessary to configure the Parser instance, however. The Parser constructor accepts an instance of Lexer (see below for its configuration options) and, optionally, an instance of a class implementing CacheItemPoolInterface from PSR-6.

parseSomething() methods

All public Parser methods that have the parse prefix and process (parts of) SQL statements are actually overloaded via the __call() magic method. That method contains the code for reading from and writing to the cache (if one is available), tokenizing strings with Lexer, and forwarding the call to a protected method that does the actual parsing work.
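The dispatch described above can be sketched roughly as follows. This is a toy illustration, not pg_builder's actual code: ToyParser, doParseList(), the whitespace "tokenizer", the in-memory array used instead of a PSR-6 pool, and the way the cache key is built are all assumptions made for the example.

```php
<?php

// Toy illustration only: ToyParser and its members are hypothetical names,
// not part of pg_builder.
final class ToyParser
{
    /** @var array<string, string> serialized results indexed by cache key */
    private array $cache = [];

    /** Number of times the actual parsing code has run */
    public int $parseCalls = 0;

    public function __call(string $name, array $arguments)
    {
        // Only "parse"-prefixed names with a matching protected worker are accepted
        $worker = 'do' . ucfirst($name);
        if (0 !== strpos($name, 'parse') || !method_exists($this, $worker)) {
            throw new BadMethodCallException("Undefined method {$name}()");
        }

        $input = (string)($arguments[0] ?? '');
        // Assumed key scheme: a fixed prefix plus a hash of method name and input
        $key = 'parsetree-' . md5($name . ':' . $input);

        // Cache hit: unserialize the stored result instead of parsing again
        if (isset($this->cache[$key])) {
            return unserialize($this->cache[$key]);
        }

        // Stand-in for Lexer::tokenize(): split the input on whitespace
        $tokens = preg_split('/\s+/', trim($input));
        // Forward to the protected method that does the real work
        $result = $this->$worker($tokens);

        $this->cache[$key] = serialize($result);
        return $result;
    }

    /** Stand-in for a protected method doing the actual parsing */
    protected function doParseList(array $tokens): array
    {
        $this->parseCalls++;
        return array_values(array_filter($tokens, fn ($t) => $t !== ','));
    }
}

$parser = new ToyParser();
$first  = $parser->parseList('a , b , c'); // parsed by doParseList()
$second = $parser->parseList('a , b , c'); // served from the cache
```

After both calls `$parser->parseCalls` is still 1, which is the point of routing every public parse method through a single entry point: caching and tokenizing are written once rather than repeated in each method.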

Several dozen such methods are defined, e.g.

parseStatement(string|TokenStream $input): Statement

Parses a complete SQL statement. Used internally by StatementFactory::createFromString().

parseTypeName(string|TokenStream $input): nodes\TypeName

Parses a type name. Used by converters\BuilderSupportDecorator so that it can handle any type name Postgres itself can.

Other parse*() methods are used by Node implementations that accept strings for their properties or array offsets.

Caching of ASTs

Parser can automatically cache ASTs generated by its parseSomething() methods. You only need to provide an instance of a class implementing CacheItemPoolInterface from PSR-6, either to the Parser constructor or to its setCache(CacheItemPoolInterface $cache): void method.

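use sad_spirit\pg_builder\{Lexer, Parser};

// the cache classes below are placeholders for any PSR-6 pool implementation
$parser = new Parser(new Lexer(), new CacheImplementation());

$parser->setCache(new AnotherCacheImplementation());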

ASTs will be stored in the cache under keys with the parsetree- prefix.

Tip

Unserializing an AST is at least 4-5 times faster than building it from SQL. Use a cache if possible.

Lexer API

The class has only one public method:

tokenize(string $sql): \sad_spirit\pg_builder\TokenStream

Tokenizes the input string. Usually you don’t need to call it yourself as it is automatically called by Parser when a string is passed to any of its parse*() methods.

You may need to set options via Lexer’s constructor, however:

'standard_conforming_strings'

Has the same meaning as the postgresql.conf parameter of the same name: when true (the default), backslashes in '...' strings are treated literally; when false, they are treated as escape characters. Backslashes in e'...' strings are always treated as escape characters.

use sad_spirit\pg_builder\Lexer;

$strings = <<<TEST
'foo\\\\bar' e'foo\\\\bar'
TEST;

$lexerStandard = new Lexer([
    'standard_conforming_strings' => true
]);

$lexerNonStandard = new Lexer([
    'standard_conforming_strings' => false
]);

echo $lexerStandard->tokenize($strings)
     . "\n\n"
     . $lexerNonStandard->tokenize($strings);

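will output the following. Note that the PHP heredoc itself collapses each pair of backslashes into a single one, so the Lexer actually receives 'foo\\bar' e'foo\\bar':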

string literal 'foo\\bar' at position 0
string literal 'foo\bar' at position 11
end of input

string literal 'foo\bar' at position 0
string literal 'foo\bar' at position 11
end of input
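The rule behind the differing first line can be illustrated with a small, self-contained sketch. decodeLiteralBody() is a hypothetical helper written for this example, not a pg_builder function, and it is deliberately simplified: it knows only the \\, \n and \t escapes and takes any other character after a backslash literally.

```php
<?php

/**
 * Toy decoder for the body of a string literal (the part between the quotes).
 * Hypothetical helper, not part of pg_builder; handles only a few escapes.
 */
function decodeLiteralBody(string $body, bool $backslashEscapes): string
{
    if (!$backslashEscapes) {
        // Backslashes are ordinary characters, nothing to decode
        return $body;
    }
    $map = ['n' => "\n", 't' => "\t"];
    // Replace a backslash plus the following character with the decoded character
    return preg_replace_callback(
        '/\\\\(.)/s',
        fn (array $m): string => $map[$m[1]] ?? $m[1],
        $body
    );
}

// Single-quoted PHP string: foo, two backslashes, bar
$body = 'foo\\\\bar';

// '...' literal with standard_conforming_strings = true: kept as is
echo decodeLiteralBody($body, false), "\n";
// e'...' literal, or '...' with standard_conforming_strings = false
echo decodeLiteralBody($body, true), "\n";
```

The first call leaves both backslashes in place, the second collapses them into one. This mirrors why only the first literal changes between the two Lexer configurations in the output above.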