Changelog
spaghetti 0.3.0
Breaking changes
- Locale translation data is no longer bundled with the package. Previously, inst/extdata/excel_functions.rds shipped a pre-built function-name translation table extracted from the Microsoft Terminology Collection. Microsoft does not publish an explicit license for the Terminology Collection contents, so this package now ships only the parser. Users invoke setup_terminology() once per machine to download (~100 MB) and parse the data into a per-user cache directory (tools::R_user_dir("spaghetti", "data")).
- to_xml() / from_xml() / check_formula() calls with a non-NULL locale argument will now error if the terminology cache has not been built. The error message directs the user to setup_terminology(). Calls with locale = NULL (the default) work as before with no setup required.
- R version requirement bumped to R >= 4.0.0 (for tools::R_user_dir).
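The locale guard described above can be sketched roughly as follows. This is an illustration only: require_terminology() and its cache_loaded argument are hypothetical names, not part of spaghetti's API, and the error text is made up. Only tools::R_user_dir("spaghetti", "data") comes from the entry itself.

```r
# Per-user data directory; tools::R_user_dir() needs R >= 4.0.0.
cache_dir <- tools::R_user_dir("spaghetti", "data")

# Hypothetical guard (assumption, not spaghetti's real internals):
# locale = NULL needs no setup, any other locale needs a built cache.
require_terminology <- function(locale, cache_loaded) {
  if (is.null(locale)) {
    return(invisible(TRUE))
  }
  if (!isTRUE(cache_loaded)) {
    stop("No terminology cache found; run setup_terminology() first.",
         call. = FALSE)
  }
  invisible(TRUE)
}
```

With this shape, callers that never pass a locale never touch the cache, which matches the "work as before with no setup required" behaviour.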
New features
- setup_terminology(expected_sha256, force, workers, quiet): downloads the Microsoft Terminology Collection zip from Microsoft’s public download URL, verifies it against a known-good SHA-256 digest (overridable; mismatches warn but do not abort, since Microsoft can republish the file at any time), unzips, parses, and writes the cached RDS. Pass expected_sha256 = NULL to skip verification. Thin R wrapper around the parser script in inst/extdata/parse_locales.R.
- terminology_info(): returns the provenance attributes attached to the cached RDS at download time: source URL, observed SHA-256, timestamp, spaghetti version, cache path, function count, locale count. Useful for verifying which version of the Microsoft data is currently loaded.
- has_terminology(): returns TRUE if a terminology cache is loaded.
- clear_terminology(): removes the cached RDS.
- inst/extdata/parse_locales.R gains a download_and_parse_tbx() function that the R wrapper invokes. The script can also be sourced manually (source(system.file("extdata", "parse_locales.R", package = "spaghetti"))) for users who want to drive the parser themselves.
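The "warn but do not abort" digest policy could look something like the sketch below. check_digest() is a hypothetical helper written for this note, and it compares digest strings that the caller has already computed; how setup_terminology() actually computes and compares the digest is not shown in this changelog.

```r
# Hypothetical helper (assumption): compare an observed SHA-256 string
# against an expected one.
#   expected = NULL  -> verification skipped, returns NA
#   match            -> returns TRUE
#   mismatch         -> warns and returns FALSE, never stops
check_digest <- function(observed, expected = NULL) {
  if (is.null(expected)) {
    return(NA)
  }
  ok <- identical(tolower(observed), tolower(expected))
  if (!ok) {
    warning("SHA-256 mismatch: the upstream file may have been republished.",
            call. = FALSE)
  }
  ok
}
```

Returning a value instead of stopping leaves the decision to the caller, which fits a source file that can legitimately change upstream.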
Internal
- openxlsx2 and digest are declared in Suggests; setup_terminology() checks for them at runtime.
- .onLoad() looks for a cached RDS in the user data directory; if found it loads it silently, otherwise leaves the locale tables empty.
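Both internal behaviours follow common R patterns, sketched here under assumptions. missing_suggested() and load_cache_if_present() are hypothetical names; the real checks inside setup_terminology() and .onLoad() may be structured differently.

```r
# Hypothetical: report which suggested packages are not installed.
# requireNamespace() is the usual runtime check for Suggests dependencies.
missing_suggested <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# Hypothetical: load a cached RDS if it exists, otherwise return NULL so
# the caller can leave its locale tables empty.
load_cache_if_present <- function(path) {
  if (!file.exists(path)) {
    return(NULL)
  }
  readRDS(path)
}
```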
spaghetti 0.2.0
Breaking changes
- round_trip() now returns a list with elements xml and formula. The previous element name was excel (and earlier locale, which was a bug). Update any code reading rt$excel or rt$locale to rt$formula.
- The LEGACY function registry has been split into LEGACY_WORKSHEET (modern worksheet functions) and LEGACY_XLM (Excel 4 macro language). External code accessing spaghetti:::.spaghetti_env$LEGACY must use one of the new names. function_prefix() continues to return "legacy" for both.
- The R/Zzz.R file was renamed to R/zzz.R to match standard convention and fix incorrect documentation about ASCII source order.
- IDENT tokens (LAMBDA / LET parameters, named ranges) are no longer passed through the locale lookup in from_xml(). The previous behaviour would mis-translate a LAMBDA parameter named e.g. sum to the local function name for SUM. If you relied on locale-translation of bare identifiers, this is a breaking change.
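Code that has to run against both old and new versions during migration could use a small shim like the one below. rt_formula() is a hypothetical helper, not something spaghetti provides, and the list contents in the usage lines are placeholders rather than real round_trip() output.

```r
# Hypothetical compatibility shim (assumption): read the formula element
# from a round_trip() result, preferring the new name and falling back
# to the two older element names.
rt_formula <- function(rt) {
  for (nm in c("formula", "excel", "locale")) {
    if (!is.null(rt[[nm]])) {
      return(rt[[nm]])
    }
  }
  stop("round_trip() result has no formula element", call. = FALSE)
}

# Placeholder inputs standing in for new-style and old-style results.
new_style <- list(xml = "<placeholder>", formula = "=SUM(A1)")
old_style <- list(xml = "<placeholder>", excel = "=SUM(A1)")
rt_formula(new_style)
rt_formula(old_style)
```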
New features
- Lexer now distinguishes a dedicated REF token type from IDENT. Cell references (including sheet-qualified, quoted-sheet, 3D, external-workbook, and structured table refs) are emitted as a single REF token:
  - Sheet1!A1, 'My Sheet'!A1
  - Sheet1:Sheet5!A1 (3D range across sheets)
  - [Book1]Sheet1!A1, [1]Sheet1!A1, '[Book1.xlsx]Sheet1'!A1
  - Table1[Col], Table1[#Headers], Table1[@Col], Table1[[#All],[Col]]
- @FUNC(...) now wraps as _xlfn.SINGLE(FUNC(...)) instead of dropping the function call.
- Excel error literals (#REF!, #N/A, #DIV/0!, #VALUE!, #NAME?, #NUM!, #NULL!, #GETTING_DATA, #SPILL!, #CALC!, #BLOCKED!, #FIELD!, #PYTHON!, etc.) lex as opaque values rather than being fragmented as # (anchor) + identifier.
- Array literals {...} are lexed as a single opaque token. Internal column (,) and row (;) separators are no longer normalised by the outer locale-separator pass, so =SUM({1,2;3,4};A1) round-trips correctly under German locale.
- User-typed whitespace inside ranges (A1 : B10) is normalised to the compact form (A1:B10) in to_xml() output. The range-intersection operator (a space between two refs, A1:B10 C5:D15) is preserved.
- Locale codes can now be multi-segment (de-DE, pt-BR, zh-Hans, zh-Hans-CN). Lookup tries the full string first then falls back segment-by-segment (zh-Hans-CN → zh-Hans → zh).
- xlex() now labels three new token kinds explicitly: error for error literals, array for {...} literals, and intersection for a space between two refs.
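The segment-by-segment locale fallback can be illustrated with a short base-R sketch. locale_candidates() and resolve_locale() are hypothetical names chosen for this example; the package's internal lookup may be implemented differently.

```r
# Hypothetical: list lookup candidates from most to least specific,
# e.g. "zh-Hans-CN" gives "zh-Hans-CN", "zh-Hans", "zh".
locale_candidates <- function(code) {
  parts <- strsplit(code, "-", fixed = TRUE)[[1]]
  vapply(
    rev(seq_along(parts)),
    function(n) paste(parts[seq_len(n)], collapse = "-"),
    character(1)
  )
}

# Hypothetical: return the first candidate present in `available`,
# or NA when no segment prefix matches.
resolve_locale <- function(code, available) {
  candidates <- locale_candidates(code)
  hit <- candidates[candidates %in% available]
  if (length(hit) > 0) hit[[1]] else NA_character_
}

resolve_locale("zh-Hans-CN", available = c("de", "zh"))
```

Trying the full string first means a table that carries both pt and pt-BR still resolves pt-BR to its own entry.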
Bug fixes
- LAMBDA / LET parameter detection no longer prefixes named ranges that appear in the body. The rule is: at depth==1 of the innermost LAMBDA/LET, an IDENT is a bound name iff its next token is a comma (,).
- Number lexer no longer consumes trailing +/- as part of a number. 1+2 and similar arithmetic now tokenise correctly. Scientific notation (1e-3, 1.5E+2) still works.
- is_ooxml() regex now detects the _xlws. prefix in addition to the _xlfn. and _xlpm. prefixes.
- .spell_suggest() orders results by edit distance instead of returning matches in registry order.
- TEXTJOIN is no longer duplicated across the LEGACY and XLFN tiers.
- function_table() no longer leaks the internal term_id column.
- from_xml() recursively transforms the inner tokens of _xlfn.ANCHORARRAY(...) and _xlfn.SINGLE(...), so nested prefixed function calls inside the wrapper get their prefixes stripped and their names localised correctly.
- parse_locales.R no longer auto-executes its bottom-of-file call when the script is source()d. The call is now wrapped in if (sys.nframe() == 0L && ...).
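The number-lexer fix can be illustrated with a regular expression that accepts a sign only directly after an exponent marker. This is a sketch of the idea, not the package's lexer: number_pattern and lex_number() are hypothetical, and the real implementation works on substrings rather than a single regex.

```r
# Hypothetical pattern (assumption): digits with an optional fraction,
# then an optional exponent. A sign is only allowed right after e/E,
# so the "+" in "1+2" is left for the operator lexer.
number_pattern <- "^(?:[0-9]+\\.?[0-9]*|\\.[0-9]+)(?:[eE][+-]?[0-9]+)?"

# Return the numeric prefix of `s`, or NA if `s` does not start with one.
lex_number <- function(s) {
  m <- regexpr(number_pattern, s, perl = TRUE)
  if (m == -1L) {
    return(NA_character_)
  }
  regmatches(s, m)
}

lex_number("1+2")       # numeric prefix stops before the operator
lex_number("1.5E+2+3")  # exponent sign is kept, trailing "+3" is not
```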
Performance
- O(n²) list-growth eliminated from .tokenise(), .transform_to_xml(), and .transform_from_xml() by preallocating output lists and using amortised-growth indexing.
- Lexer switched from per-character paste0(s, c) accumulation to substring extraction, reducing character-level allocation in long formulas.
- Locale lookups in .locale_to_english() / .english_to_locale() are now O(1) hashed environment lookups instead of linear column scans with per-call toupper().
- .spell_suggest() and check_formula() use a cached ALL_KNOWN vector built once at .onLoad().
- .get_sep() in vapply hot loops is now lifted once outside the loop body in to_xml(), from_xml(), and check_formula().
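The hashed-environment lookup pattern mentioned above can be sketched in base R as follows. build_lookup() and lookup_name() are hypothetical names for illustration; the package's .locale_to_english() and .english_to_locale() internals are not reproduced here.

```r
# Hypothetical: build a hashed environment once, upper-casing the keys at
# build time so the table itself is never re-scanned or re-cased per call.
build_lookup <- function(keys, values) {
  env <- new.env(hash = TRUE, parent = emptyenv())
  for (i in seq_along(keys)) {
    assign(toupper(keys[i]), values[i], envir = env)
  }
  env
}

# Hypothetical: constant-time lookup; only the single query string is
# upper-cased. Returns NA when the name is unknown.
lookup_name <- function(env, name) {
  hit <- get0(toupper(name), envir = env, inherits = FALSE)
  if (is.null(hit)) NA_character_ else hit
}

# German function names mapped to their English equivalents.
de_to_en <- build_lookup(c("SUMME", "WENN", "MITTELWERT"),
                         c("SUM", "IF", "AVERAGE"))
lookup_name(de_to_en, "summe")
```

Using emptyenv() as the parent keeps a lookup for a name such as sum from accidentally resolving to a base R function.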