
0.4.0

@unreadablewxy unreadablewxy released this 30 Jan 10:08
ec68bdc

A recent libmagic regression (detected on Gentoo and Arch) appears to be causing webm files to be incorrectly identified. If you have webm files in your mono-collection, now might be a good time to request a patrolling read against your by-id index.

We have received some complaints that the *nix binaries are built against WAY too new a glibc. They will now be built on the latest Debian release instead of bleeding-edge Gentoo.

Breaking Changes

  • Risk: moderate. Deprecated source_* parameters have been dropped
    • This affects qualifier expressions of all stages of the pipeline
    • This also affects transform argument generation
  • Risk: moderate. Store qualifiers and path generation no longer bind file_* attributes (except for file_extension)
    • Offering files to stores is a self-contained process. Hoppers can be configured to auto-invoke this process after certain files are ingested, but should not change said process. Conveying extra information when auto-invoked by hoppers runs contrary to this design
    • If we need per-file attributes, let's design them properly as opposed to hacking pieces onto two colocated features

New Features

  • Added inline named capture groups support for regex
    • Realized through the PCRE2 library
    • Yes these are still applied at a lower precedence to named constants
    • Yes this means we now support match specific group attributes
  • Regex qualifiers now support minimum match length thresholds
    • The new value format for the include config directive is PROPERTY /EXPRESSION/FLAGS THRESHOLD
    • e.g. require the expression to match at least 50% of the value: include = x /\d+/ 50%
    • e.g. require the expression to match at least 12 characters: include = x /\d+/ 12
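
The threshold semantics described above can be illustrated with a small sketch. This is not the project's implementation: it uses Python's re module as a stand-in for PCRE2, the function names parse_threshold and passes are hypothetical, and it assumes the length of the first match is what gets compared against the threshold.

```python
import re


def parse_threshold(text):
    """Split a THRESHOLD token into (kind, amount).

    A trailing '%' is read as a share of the value's length; anything
    else is read as an absolute character count. (Assumed reading of
    the two examples in the release notes.)
    """
    text = text.strip()
    if text.endswith("%"):
        return ("percent", float(text[:-1]))
    return ("chars", int(text))


def passes(pattern, value, threshold):
    """Return True when the first match of pattern in value is long enough."""
    match = re.search(pattern, value)
    if match is None:
        return False
    matched = match.end() - match.start()
    kind, amount = parse_threshold(threshold)
    if kind == "percent":
        # Compare matched / len(value) against amount / 100 without dividing.
        return len(value) > 0 and matched * 100 >= amount * len(value)
    return matched >= amount
```

Under these assumptions, passes(r"\d+", "12345abc", "50%") accepts the value because the digits cover 5 of 8 characters, while passes(r"\d+", "123", "12") rejects it because the match is shorter than 12 characters.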

Behavior Changes

  • Workflows resumed through WIP files now bypass hopper evaluation
    • WIP files now contain group attributes as well as workflow parameters, allowing manual touch ups
  • Store qualifiers and path generation now bind file_extension from the file identification process instead of copying it verbatim from the imported file's path
  • Order assignment now sorts all files by length then character codes
    • This ensures semantically correct order for variable-length numbers in file names: 0, 1, 2, 3, 10, 11 (without length factoring the order would be 0, 1, 10, 11, 2, 3)
    • Another happy coincidence is this tends to cluster together similarly named files

Performance

  • Removed extraneous memory allocations from INI parsing
  • Removed unnecessary memory allocations for attribute matching at the cost of a bit of short lived heap fragmentation
  • Time complexity of matching files has been improved from O(m log n) to O(m + n)

Bug Fixes

  • Reduced FFMPEG warning spam when dealing with JPEG files
    • A side effect of this change is that phash has started producing slightly different results
    • So do not be alarmed if you see a lot of phash corrections while patrolling by-id