@jbarth-ubhd
jbarth-ubhd / inspect-wordlist.pl
Last active February 14, 2024 04:12
inspect wordlist of tesseract traineddata file
#!/usr/bin/perl
use strict;
use utf8;
use warnings; no warnings "uninitialized";
use autodie qw(:all);
use Getopt::Long;
binmode STDOUT, ":utf8";
binmode STDERR, ":utf8";
use File::Temp qw/tempdir/;

Comparison of text image compression

I've compared mozjpeg, webp, jpeg2000 (openjpeg & a proprietary encoder), jpeg-xl and avif using a text-like image containing circles of different widths and tints.

See comparison-2022-11

The generated 920×920 px images (with different noise levels) are compressed with

/opt/mozjpeg/bin/cjpeg -quant-table 4 -q 75  
jbarth-ubhd / abbyy2page.pl
Created January 14, 2022 12:46
minimalistic ABBYY XML to PAGE XML
#!/usr/bin/perl
use strict;
use utf8;
use XML::LibXML;
use XML::Quote;
binmode STDOUT, ":utf8";
my $dom=XML::LibXML->load_xml(location=>$ARGV[0]);
my $root=$dom->documentElement;
#!/usr/bin/perl
use strict;
use utf8;
binmode(STDOUT, ":utf8");
binmode(STDERR, ":utf8");
use XML::LibXML;
use File::Slurp;
use XML::Quote;
use List::Util qw(min max);