.. _parsing:

===========
Parsing SQL
===========

Parsing is done by the ``\sad_spirit\pg_builder\Parser`` class backed by ``\sad_spirit\pg_builder\Lexer``.
The latter splits the SQL string into tokens, the former goes over the tokens and builds
the Abstract Syntax Tree.

This section describes usage of these classes. An :ref:`additional section ` describes
implementation details that may be of interest to those trying to extend **pg_builder** in some way.

``Parser`` API
==============

.. tip::
    It is generally not necessary to manually call methods that parse SQL statements or their parts:
    use :ref:`StatementFactory::createFromString() ` to parse complete statements or
    :ref:`one of its builder methods ` to create ``Statement`` instances from scratch.
    ``Statement`` instances created by ``StatementFactory`` will automatically accept strings
    for their properties and call relevant ``Parser`` methods.

It may be necessary to configure the ``Parser`` instance, however. The ``Parser`` constructor accepts
an instance of ``Lexer`` (:ref:`see below ` for its configuration options) and an optional instance
of a class implementing ``CacheItemPoolInterface`` from `PSR-6 `__.

``parseSomething()`` methods
----------------------------

All public ``Parser`` methods that have the ``parse`` prefix and process (parts of) SQL statements
are actually overloaded via the ``__call()`` magic method. It contains code for getting / setting cache
if available, tokenizing strings with ``Lexer``, and forwarding the call to a protected method
that does the actual parsing work.

Several dozen such methods are defined, e.g.

``parseStatement(string|TokenStream $input): Statement``
    Parses a complete SQL statement. Used internally by :ref:`StatementFactory::createFromString() `.

``parseTypeName(string|TokenStream $input): nodes\TypeName``
    Parses a type name. Used by :ref:`converters\\BuilderSupportDecorator ` so that it can handle
    any type name Postgres itself can.
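
The dispatch described above (check the cache, tokenize, forward to a protected method) can be
illustrated with a minimal self-contained sketch. This is not **pg_builder**'s actual code:
the ``SketchParser`` class, its toy tokenizer, the ``numberList()`` method, and the way the cache key
is built after the ``parsetree-`` prefix are all hypothetical, and a plain array stands in for
a PSR-6 cache pool.

.. code-block:: php

    // Hypothetical stand-in for the __call() dispatch pattern, not the real Parser
    class SketchParser
    {
        /** @var array<string, mixed> plain array standing in for a PSR-6 pool */
        private array $cache = [];

        /** Counts how many times actual parsing work was performed */
        public int $parsed = 0;

        /**
         * Handles calls such as parseNumberList(): looks in the cache first,
         * tokenizes the string and forwards to the protected numberList()
         */
        public function __call(string $name, array $arguments)
        {
            if (!preg_match('/^parse([A-Z]\w*)$/', $name, $m)) {
                throw new BadMethodCallException("Undefined method $name()");
            }
            $method = lcfirst($m[1]);
            if (!method_exists($this, $method)) {
                throw new BadMethodCallException("Undefined method $name()");
            }

            $input = (string)($arguments[0] ?? '');
            // Key layout after the prefix is an assumption made for this sketch
            $key = 'parsetree-' . md5($method . ':' . $input);
            if (array_key_exists($key, $this->cache)) {
                return $this->cache[$key];
            }

            return $this->cache[$key] = $this->$method($this->tokenize($input));
        }

        /** Toy tokenizer: splits the input on commas and whitespace */
        private function tokenize(string $input): array
        {
            return preg_split('/[\s,]+/', trim($input), -1, PREG_SPLIT_NO_EMPTY);
        }

        /** The method doing the work, reachable from outside only via __call() */
        protected function numberList(array $tokens): array
        {
            $this->parsed++;
            return array_map('intval', $tokens);
        }
    }

    $parser = new SketchParser();
    $first  = $parser->parseNumberList('1, 2, 3');
    $second = $parser->parseNumberList('1, 2, 3'); // served from the cache

In this sketch the second call never reaches ``numberList()``, which is the effect the cache
is meant to have for repeated parsing of the same input.
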

Other ``parse*()`` methods are used by ``Node`` implementations that accept strings
for their properties or array offsets.

.. _parsing-cache:

Caching of ASTs
---------------

``Parser`` can automatically cache ASTs generated by its ``parseSomething()`` methods. You only need to provide
an instance of a class implementing ``CacheItemPoolInterface`` from `PSR-6 `__ either to the ``Parser``
constructor or to its ``setCache(CacheItemPoolInterface $cache): void`` method.

.. code-block:: php

    // Pass the cache pool to the constructor...
    $parser = new Parser(new Lexer(), new CacheImplementation());

    // ...or set / replace it later
    $parser->setCache(new AnotherCacheImplementation());

ASTs will be stored in the cache under keys having the ``parsetree-`` prefix.

.. tip::
    Unserializing an AST is at least 4-5 times faster than creating it from SQL. Use cache if possible.

.. _parsing-lexer:

``Lexer`` API
=============

The class has only one public method:

``tokenize(string $sql): \sad_spirit\pg_builder\TokenStream``
    Tokenizes the input string. Usually you don't need to call it yourself as it is automatically called
    by ``Parser`` when a string is passed to any of its ``parse*()`` methods.

You may need to set options via ``Lexer``'s constructor, however:

``'standard_conforming_strings'``
    Has the same meaning as `postgresql.conf parameter `__ of the same name: when ``true`` (default),
    backslashes in ``'...'`` strings are treated literally; when ``false``, they are treated
    as escape characters. Backslashes in ``e'...'`` strings are always treated as escape characters, of course.

.. code-block:: php

    use sad_spirit\pg_builder\Lexer;

    // Nowdoc keeps the backslashes exactly as written:
    // one plain '...' literal followed by one e'...' literal
    $strings = <<<'SQL'
    'foo\\bar' e'foo\\bar'
    SQL;

    $lexerStandard = new Lexer([
        'standard_conforming_strings' => true
    ]);
    $lexerNonStandard = new Lexer([
        'standard_conforming_strings' => false
    ]);

    echo $lexerStandard->tokenize($strings)
         . "\n\n"
         . $lexerNonStandard->tokenize($strings);

will output

.. code-block:: output

    string literal 'foo\\bar' at position 0
    string literal 'foo\bar' at position 11
    end of input

    string literal 'foo\bar' at position 0
    string literal 'foo\bar' at position 11
    end of input
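
Since ``Parser`` receives its ``Lexer`` through the constructor, lexer options take effect for every
``parse*()`` call on that parser. The following is a minimal sketch of such a setup, assuming
the package is installed via Composer and its autoloader is already loaded; the type name
being parsed is only an illustration.

.. code-block:: php

    use sad_spirit\pg_builder\Lexer;
    use sad_spirit\pg_builder\Parser;

    // The cache argument is optional and omitted here
    $parser = new Parser(new Lexer([
        'standard_conforming_strings' => false
    ]));

    // The string is tokenized by the configured Lexer before parsing
    $typeName = $parser->parseTypeName('double precision');
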