Parse standard balanced group within verbatim context

by siracusa, last updated October 10, 2019 01:23 AM

Say I want to write a custom parser which reads characters in a verbatim context, where the parser argument should be split into several kinds of tokens:

  • Ordinary characters, i.e. all characters besides spaces and \ (and also the } that delimits the argument), which should be printed literally;
  • spaces, which are just ignored;
  • characters escaped by \, which are also printed literally; and
  • snippets of arbitrary TeX code, interspersed in the form \{...}.

All tokens are parsed in a context where every character has catcode 12. For the TeX snippets the original catcode table should be used temporarily. Also, the argument string for parsing should be read from the current input stream, because it may include unbalanced braces.

The code below pretty much implements the above requirements, with one limitation: the TeX snippets are only parsed correctly if preceded by an extra opening brace, i.e. \{{...}. This problem arises because the parser has to look ahead at the character that follows \ to decide whether it is an escaped character or the beginning of a TeX snippet. The { is therefore tokenized with the wrong catcode (12) when a balanced group follows.
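The catcode problem can be reproduced in isolation. The following is only a minimal sketch and not part of the implementation below; \peektoken is a hypothetical helper introduced for illustration. It grabs the next token while { has catcode 12, then restores the usual catcode and prints the meaning of what it grabbed:

```latex
\documentclass{article}
\begin{document}
% \peektoken is a hypothetical helper, used only for this illustration.
% It grabs one token, resets the catcode of { to 1 (begin-group), and then
% typesets the meaning of the token it grabbed.
\def\peektoken#1{\catcode`\{=1 \texttt{\meaning#1}}

\begingroup
\catcode`\{=12 % verbatim-like context: { is an "other" character here
\peektoken{% <- this { gets catcode 12 at the moment it is read
\endgroup
\end{document}
```

This should typeset "the character {" rather than "begin-group character {": resetting the catcode afterwards does not change a token that has already been read, which is why the looked-ahead { cannot serve as the opening brace of a group.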

Is there a way to put the missing { back into the token stream so that TeX's normal argument grabber can be used to collect the snippet code? I would really like to avoid an extra prefix to signal the beginning of a group, or having to rebuild a balanced-group parser.

Current implementation:

\documentclass{article}
\usepackage{l3cctab}
\usepackage{xcolor}

\ExplSyntaxOn

\cs_set:Npn \__foo_temp:w #1 #2 {
    \exp_last_unbraced:NNf \cs_set_eq:NN #1 { \char_generate:nn { `#2 } { 12 } }
}
\__foo_temp:w \c__foo_space_char     \ %
\__foo_temp:w \c__foo_lbrace_char    \{
\__foo_temp:w \c__foo_rbrace_char    \}
\__foo_temp:w \c__foo_backslash_char \\

\cctab_new:N \g__foo_orig_cctab

\cs_new_protected:Npn \foo_parse:w {
    \cctab_gset:Nn \g__foo_orig_cctab { }
    \cctab_begin:N \c_other_cctab
    \__foo_parse_aux:w
}

\cs_new_protected:Npn \__foo_parse_aux:w #1 {
    \token_if_eq_charcode:NNTF #1 \c__foo_lbrace_char
        { \__foo_parse_token:w }
        { \PackageError {foo} { Wrong~argument } { } } % third argument: (empty) help text
}

\cs_new_protected:Npn \__foo_parse_token:w #1 {
    \token_if_eq_charcode:NNTF #1 \c__foo_rbrace_char {
        \__foo_parse_finish:
    } {
        \token_if_eq_charcode:NNTF #1 \c__foo_backslash_char {
            \__foo_parse_esc_token:w
        } {
            % Print only if not a space
            \token_if_eq_charcode:NNF #1 \c__foo_space_char
                { \texttt{#1} }
            \__foo_parse_token:w
        }
    }
}

\cs_new_protected:Npn \__foo_parse_esc_token:w #1 {
    \token_if_eq_charcode:NNTF #1 \c__foo_lbrace_char {
        \cctab_begin:N \g__foo_orig_cctab
        % >>> How to insert the missing opening brace here? <<<
        \__foo_parse_balanced_group:n
    } {
        \texttt{#1}
        \__foo_parse_token:w
    }
}

\cs_new_protected:Npn \__foo_parse_balanced_group:n #1 {
    \fbox { #1 }
    \cctab_end:
    \__foo_parse_token:w
}

\cs_new_protected:Npn \__foo_parse_finish: {
    \cctab_end:
}

\ExplSyntaxOff

\begin{document}
\ExplSyntaxOn

\foo_parse:w{
    $foo_\b\a\r_
    \{{\textcolor{blue}{\LaTeX}} \}\}\}%{{{\\
} %   ^----- extra brace here

\ExplSyntaxOff
\end{document}


Answers 1


With Dirty Tricks, of course ;-)

To put it simply, you have:

\foo some text}

and you want \foo to grab everything up to the }, so the code of \foo has to put a { there. \foo has to be basically \fooaux{.

But TeX doesn't let you have an unbalanced { there, so you have to trick it into believing the braces are balanced. You can do it like this:

\def\foo{%
  \expandafter\fooaux\expandafter{\iffalse}\fi}
\def\fooaux#1{(#1)}

\foo some text}

and the output will be (some text).

TeX first replaces \foo by its definition. The first \expandafter then triggers the second one, which in turn expands the \iffalse. Being a false conditional, \iffalse skips everything up to and including the matching \fi, which here is just the }. In this process the two \expandafter and the \iffalse}\fi disappear, leaving you with \fooaux{, whose { is then matched by the } of the some text} left in the input stream.
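Written out as a trace, the steps described above look roughly like this. It is a sketch of the input stream after each step, using the \foo and \fooaux definitions given earlier. The space after \foo never becomes a token, because spaces after a control word are skipped:

```latex
% Input stream after each expansion step:
%
%   \foo some text}
%     -> \expandafter\fooaux\expandafter{\iffalse}\fi some text}
%          (\foo is replaced by its definition)
%     -> \fooaux{some text}
%          (the first \expandafter expands the second, which expands
%           \iffalse; the skipped } and the \fi both disappear)
%     -> (some text)
%          (\fooaux grabs the now-balanced group as #1)
```

The definition of \foo is accepted in the first place because, when \def scans the replacement text, the { and the } in {\iffalse}\fi count as an ordinary balanced pair. Only at expansion time is the } discarded by the false conditional.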

expl3-ifying that in your code:

\documentclass{article}
\usepackage{l3cctab}
\usepackage{xcolor}

\ExplSyntaxOn

\cs_set:Npn \__foo_temp:w #1 #2
  { \exp_last_unbraced:NNf \cs_new_eq:NN #1 { \char_generate:nn { `#2 } { 12 } } }
\__foo_temp:w \c__foo_space_char     \ %
\__foo_temp:w \c__foo_lbrace_char    \{
\__foo_temp:w \c__foo_rbrace_char    \}
\__foo_temp:w \c__foo_backslash_char \\

\cctab_new:N \g__foo_orig_cctab

\cs_new_protected:Npn \foo_parse:w
  {
    \cctab_gset:Nn \g__foo_orig_cctab { }
    \cctab_begin:N \c_other_cctab
    \__foo_parse_aux:w
  }
\cs_new_protected:Npn \__foo_parse_aux:w #1
  {
    \token_if_eq_charcode:NNTF #1 \c__foo_lbrace_char
      { \__foo_parse_token:w }
      { \PackageError {foo} { Wrong~argument } { } } % third argument: (empty) help text
  }
\cs_new_protected:Npn \__foo_parse_token:w #1
  {
    \token_if_eq_charcode:NNTF #1 \c__foo_rbrace_char
      { \__foo_parse_finish: }
      {
        \token_if_eq_charcode:NNTF #1 \c__foo_backslash_char
          { \__foo_parse_esc_token:w }
          {
            % Print only if not a space
            \token_if_eq_charcode:NNF #1 \c__foo_space_char
              { \texttt{#1} }
            \__foo_parse_token:w
          }
      }
  }
\cs_new_protected:Npn \__foo_parse_esc_token:w #1
  {
    \token_if_eq_charcode:NNTF #1 \c__foo_lbrace_char
      {
        \cctab_begin:N \g__foo_orig_cctab
        \exp_after:wN
        \__foo_parse_balanced_group:n
          % Brace hack: removes this V
          \exp_after:wN { \if_false: } \fi:
      }
      {
        \texttt{#1}
        \__foo_parse_token:w
      }
  }
\cs_new_protected:Npn \__foo_parse_balanced_group:n #1
  {
    \fbox {#1}
    \cctab_end:
    \__foo_parse_token:w
  }
\cs_new_protected:Npn \__foo_parse_finish:
  { \cctab_end: }

\ExplSyntaxOff

\begin{document}
\ExplSyntaxOn

\foo_parse:w{
    $foo_\b\a\r_
    \{\textcolor{blue}{\LaTeX}} \}\}\}%{{{\\
} %  ^----- NO extra brace here :-)

\ExplSyntaxOff
\end{document}

and the output:

[Image: output of the document above]

Phelype Oleinik
October 10, 2019 01:14 AM
