| [-+][0-9]+(\.[0-9]+)?([eE][-+][0-9]+)? | number | 
| [:alpha:][:alnum:]+ | word | 
| [:space:]+ | skipped | 
| anything else | single-character | 
Character classification is based on the C library's wide-character functions (iswalnum() and related). Recognised numbers are passed to Prolog read/1, so unbounded integers are supported.
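The rules in the table can be explored with a small sketch like the one below. It assumes SWI-Prolog, where tokenize_atom/2 is exported by library(porter_stem). The helpers show_tokens/1 and token_kind/2 are hypothetical names introduced only for this illustration; they are not part of the library.

```prolog
:- use_module(library(porter_stem)).

% show_tokens(+Text): hypothetical helper, not part of the library.
% Tokenizes Text and prints each token with a coarse type tag.
show_tokens(Text) :-
    tokenize_atom(Text, Tokens),
    forall(member(Token, Tokens),
           ( token_kind(Token, Kind),
             format("~q: ~w~n", [Token, Kind])
           )).

% token_kind(+Token, -Kind): hypothetical classifier.
% Numbers come back as Prolog numbers; words and single-character
% tokens are both treated here as atoms.
token_kind(Token, number) :- number(Token), !.
token_kind(_, atom).
```

A call such as `?- show_tokens('hello world 42').` should print one line per token. Whitespace does not appear in the output, because it is skipped rather than returned as a token.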
It is likely that future versions of this library will provide tokenize_atom/3 with additional options to modify space handling as well as the definition of words.