ENCODE.DREAMHOSTERS.COM FORUM > Data Compression

11th June 2008, 10:24   #1
kaitz
Banned
Join Date: May 2008
Location: EE
Posts: 90
Paq8o10t

Text detection (partial UTF-8 support)
nestModel from paq8g
Modified contextModel2 like this:
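    // Model sets chosen by detected file type (excerpt from contextModel2)
    switch (filetype) {
    case TXTUTF8:          // UTF-8 text falls through and shares the text models
    case TEXT:
        sparseModel(m, ismatch, order);
        nestModel(m);      // nestModel taken from paq8g
        wordModel(m);      // word model changes based on paq8hp12
        indirectModel(m);
        dmcModel(m);
        break;
    case EXE:
        sparseModel(m, ismatch, order);
        indirectModel(m);
        dmcModel(m);
        exeModel(m);
        break;
    case BMPFILE1:         // none of the models below are added for this type
        break;
    default:               // all other data gets the general binary model set
        sparseModel(m, ismatch, order);
        distanceModel(m);
        picModel(m);
        recordModel(m);
        indirectModel(m);
        dmcModel(m);
        break;
    }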
Speed is better in most cases. Compression is the same on some data and worse on other data.

Word model modifications are based on paq8hp12.
Memory usage is increased.
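The post only says that the new text detection handles UTF-8 partially; the detector itself is not shown. As a rough illustration of what such a check can look like, here is a minimal C++ sketch, assuming a buffer-level test that returns one of the `TEXT`/`TXTUTF8` labels used in the switch above. The `guessTextType` function and `Guess` enum are hypothetical, not taken from paq8o10t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical labels mirroring the filetype names used in contextModel2.
enum class Guess { TEXT, TXTUTF8, BINARY };

// Sketch only, not the paq8o10t detector. A buffer counts as text when every
// byte is printable ASCII, tab/LF/CR, or part of a well-formed UTF-8 sequence.
// The UTF-8 check is deliberately partial: it validates lead and continuation
// bytes but does not reject overlong 3/4-byte forms or surrogates.
Guess guessTextType(const std::uint8_t* buf, std::size_t n) {
  assert(buf != nullptr || n == 0);
  bool sawMultiByte = false;
  std::size_t i = 0;
  while (i < n) {
    const std::uint8_t c = buf[i];
    if (c < 0x80) {
      // ASCII: control characters other than tab, LF and CR suggest binary data
      if ((c < 0x20 && c != '\t' && c != '\n' && c != '\r') || c == 0x7F)
        return Guess::BINARY;
      ++i;
      continue;
    }
    std::size_t len;  // sequence length implied by the lead byte
    if (c >= 0xC2 && c <= 0xDF) len = 2;
    else if (c >= 0xE0 && c <= 0xEF) len = 3;
    else if (c >= 0xF0 && c <= 0xF4) len = 4;
    else return Guess::BINARY;  // 0x80-0xC1 and 0xF5-0xFF never start a sequence
    if (i + len > n) break;     // sequence cut off by the buffer end: tolerate it
    for (std::size_t k = 1; k < len; ++k)
      if ((buf[i + k] & 0xC0) != 0x80)  // continuation bytes must be 10xxxxxx
        return Guess::BINARY;
    sawMultiByte = true;
    i += len;
  }
  return sawMultiByte ? Guess::TXTUTF8 : Guess::TEXT;
}
```

A compressor would normally run a test like this on a sample of the input rather than the whole file.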
Attached Files
File Type: zip paq8o10t.zip (110.0 KB, 374 views)

Last edited by kaitz : 11th June 2008 at 16:35.

11th June 2008, 11:47   #2
LovePimple
Moderator
Join Date: May 2008
Location: Tristan da Cunha
Posts: 611

Thanks kaitz!

Mirror: Download

Last edited by LovePimple : 11th June 2008 at 15:52.

11th June 2008, 13:27   #3
kaitz
Banned
Join Date: May 2008
Location: EE
Posts: 90

paq8o10t.exe -4 A10.jpg      842468 -> 640447   Time 27.69 sec,   used 130692920 MEM
paq8o9.exe   -4 A10.jpg      842468 -> 640447   Time 27.25 sec,   used 147485800 MEM
paq8o10t.exe -4 AcroRd32.exe 3870784 -> 942896  Time 357.25 sec,  used 135288361 MEM
paq8o9.exe   -4 AcroRd32.exe 3870784 -> 931760  Time 463.66 sec,  used 134210601 MEM
paq8o10t.exe -4 english.dic  4067439 -> 391480  Time 435.09 sec,  used 128946859 MEM
paq8o9.exe   -4 english.dic  4067439 -> 390358  Time 423.70 sec,  used 127873259 MEM
paq8o10t.exe -4 FlashMX.pdf  4526946 -> 3563114 Time 454.69 sec,  used 148559391 MEM
paq8o9.exe   -4 FlashMX.pdf  4526946 -> 3563934 Time 519.47 sec,  used 147485791 MEM
paq8o10t.exe -4 FP.LOG       20617071 -> 269979 Time 2309.97 sec, used 128946868 MEM
paq8o9.exe   -4 FP.LOG       20617071 -> 275173 Time 2354.98 sec, used 127873268 MEM
paq8o10t.exe -4 MSO97.DLL    3782416 -> 1340660 Time 350.36 sec,  used 149620515 MEM
paq8o9.exe   -4 MSO97.DLL    3782416 -> 1328473 Time 510.41 sec,  used 148546915 MEM
paq8o10t.exe -4 ohs.doc      4168192 -> 490089  Time 143.16 sec,  used 148559399 MEM
paq8o9.exe   -4 ohs.doc      4168192 -> 487629  Time 156.78 sec,  used 147485799 MEM
paq8o10t.exe -4 rafale.bmp   4149414 -> 551463  Time 59.91 sec,   used 116595869 MEM
paq8o9.exe   -4 rafale.bmp   4149414 -> 551466  Time 64.48 sec,   used 133388749 MEM
paq8o10t.exe -4 vcfiu.hlp    4121418 -> 413225  Time 373.06 sec,  used 128946863 MEM
paq8o9.exe   -4 vcfiu.hlp    4121418 -> 405263  Time 467.00 sec,  used 127873263 MEM
paq8o10t.exe -4 world95.txt  2988578 -> 367188  Time 368.76 sec,  used 128946859 MEM
paq8o9.exe   -4 world95.txt  2988578 -> 370766  Time 357.42 sec,  used 127873259 MEM
Encode's Compression Corpus (EncCC)
paq8o10t.exe -4 Doom3.exe   5427200 -> 1060739  Time 496.59 sec, used 130007983 MEM
paq8o9.exe   -4 Doom3.exe   5427200 -> 1040972  Time 631.25 sec, used 128934383 MEM
paq8o10t.exe -4 Reaktor.exe 14446592 -> 1212197 Time 1195.41 sec, used 130007978 MEM
paq8o9.exe   -4 Reaktor.exe 14446592 -> 1185980 Time 1495.48 sec, used 128934378 MEM
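To put the table in relative terms, the small C++ sketch below recomputes the size and time change of paq8o10t against paq8o9 for a few of the rows above. The numbers are copied from the table; the `Run` struct and function names are only for illustration.

```cpp
#include <cassert>
#include <cstdio>

// A few row pairs from the -4 benchmark above: sizes in bytes, times in seconds.
struct Run {
  const char* file;
  double newSize, oldSize;  // paq8o10t, paq8o9 compressed sizes
  double newTime, oldTime;  // paq8o10t, paq8o9 times
};

const Run kRuns[] = {
  {"AcroRd32.exe", 942896, 931760, 357.25, 463.66},
  {"FP.LOG", 269979, 275173, 2309.97, 2354.98},
  {"world95.txt", 367188, 370766, 368.76, 357.42},
  {"Reaktor.exe", 1212197, 1185980, 1195.41, 1495.48},
};

// Percent change of newValue relative to oldValue (negative = smaller/faster).
double relChangePct(double oldValue, double newValue) {
  assert(oldValue != 0);
  return 100.0 * (newValue - oldValue) / oldValue;
}

void printComparison() {
  for (const Run& r : kRuns)
    std::printf("%-13s size %+6.2f%%  time %+6.2f%%\n", r.file,
                relChangePct(r.oldSize, r.newSize),
                relChangePct(r.oldTime, r.newTime));
}
```

For these rows the executables come out roughly 1 to 2% larger but about 20% faster, while FP.LOG and world95.txt come out slightly smaller.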

Last edited by kaitz : 11th June 2008 at 16:32.

11th June 2008, 16:48   #4
encode
Administrator
Join Date: May 2008
Location: Moscow, Russia
Posts: 961

I think we should do some more advanced model-set switching, i.e. detect the file type dynamically and choose model sets accordingly. For example, as in PAQ6, we could check for a recent 0x00 (zero byte) or 0x20 (space character) to distinguish TEXT from BINARY data.
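A minimal C++ sketch of the recent-byte idea, assuming a per-byte detector that remembers where the last 0x00 and 0x20 were seen and reports "text" when a space is more recent than any zero byte and falls inside a fixed window. The class name and the 1024-byte default window are assumptions for illustration; this is not necessarily how PAQ6 does it.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of dynamic text/binary detection from recently seen bytes.
// Assumption: "text" means a space (0x20) occurred within the last `window`
// bytes and more recently than any zero byte (0x00).
class RecentByteDetector {
 public:
  explicit RecentByteDetector(std::uint64_t window = 1024) : window_(window) {
    assert(window > 0);
  }

  // Feed one byte of input, in coding order.
  void update(std::uint8_t c) {
    ++pos_;
    if (c == 0x00) lastZero_ = pos_;
    if (c == 0x20) lastSpace_ = pos_;
  }

  bool looksLikeText() const {
    return lastSpace_ != 0 && pos_ - lastSpace_ < window_ && lastSpace_ > lastZero_;
  }

 private:
  std::uint64_t window_;
  std::uint64_t pos_ = 0;        // number of bytes seen so far
  std::uint64_t lastZero_ = 0;   // 1-based position of last 0x00, 0 = never
  std::uint64_t lastSpace_ = 0;  // 1-based position of last 0x20, 0 = never
};

// Hypothetical use in a contextModel2-style dispatcher (model names from the
// first post; switching per byte is the proposal, not existing code):
//   if (detector.looksLikeText()) { nestModel(m); wordModel(m); }
//   else                          { distanceModel(m); recordModel(m); }
```

Because the decoder sees the same bytes in the same order, it can reproduce the same decision without any side information, which is what makes per-byte switching workable in a context-mixing compressor.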

11th June 2008, 18:34   #5
LovePimple
Moderator
Join Date: May 2008
Location: Tristan da Cunha
Posts: 611

Here's my own speed-optimised compile for Pentium Pro or later processors.

ENJOY!
Attached Files
File Type: zip paq8o10tlp.zip (63.8 KB, 324 views)

11th June 2008, 18:34   #6
kaitz
Banned
Join Date: May 2008
Location: EE
Posts: 90

paq8o10t.exe -4 Mech8.s3m 747600 -> 296673 Time 69.67 sec, 128946864 MEM
paq8o9.exe   -4 Mech8.s3m 747600 -> 295853 Time 83.09 sec, 127873264 MEM
paq8o10t.exe -4 PariahInterface.utx 24375895 -> 3814862 Time 2093.17 sec, 111080362 MEM
paq8o9.exe   -4 PariahInterface.utx 24375895 -> 3803631 Time 2650.56 sec, 127873242 MEM

11th June 2008, 19:23   #7
joerg
Member
Join Date: May 2008
Location: Germany
Posts: 92

Hi LovePimple,

unfortunately F-Secure AntiVirus & AntiSpy
detects your build "paq8o10tlp"
as W32/Suspicious U.gen (virus).

The original "paq8o10t" has no such effect.

Can you tell us what the secret of your "implementation"/"compile" is?

Which modifications have you made?

11th June 2008, 19:42   #8
kaitz
Banned
Join Date: May 2008
Location: EE
Posts: 90

joerg wrote:
Hi LovePimple,

unfortunately F-Secure AntiVirus & AntiSpy
detects your build "paq8o10tlp"
as W32/Suspicious U.gen (virus).

The original "paq8o10t" has no such effect.

Can you tell us what the secret of your "implementation"/"compile" is?

Which modifications have you made?
My Bitdefender did not detect anything.
W32/Suspicious U.gen (virus)

11th June 2008, 20:12   #9
LovePimple
Moderator
Join Date: May 2008
Location: Tristan da Cunha
Posts: 611

joerg wrote:
Hi LovePimple,

unfortunately F-Secure AntiVirus & AntiSpy
detects your build "paq8o10tlp"
as W32/Suspicious U.gen (virus).

The original "paq8o10t" has no such effect.

Can you tell us what the secret of your "implementation"/"compile" is?

Which modifications have you made?
It's a false positive. F-Secure AntiVirus & AntiSpy detects it as a Upack-compressed file but can't decompress it, so it cheats by reporting "W32/Suspicious U.gen (virus)".

See this thread for more compile info.
http://www.encode.ru/forum/showthread.php?t=65

11th June 2008, 23:54   #10
joerg
Member
Join Date: May 2008
Location: Germany
Posts: 92

Thank you, LovePimple.

If I understand correctly,
you don't know which compiler switches were used
for the resulting compile?

12th June 2008, 06:40   #11
LovePimple
Moderator
Join Date: May 2008
Location: Tristan da Cunha
Posts: 611

Correct!

12th June 2008, 13:11   #12
joerg
Member
Join Date: May 2008
Location: Germany
Posts: 92

Hi LovePimple,

can you please help me clear up some confusion
about compression levels and memory usage:

without compression: -0 = ???? MB

fast compression: -1 = 35 MB, -2 = 48 MB, -3 = 59 MB, -4 = 133 MB

standard mode: -5 = 233 MB

better compression: -6 = 435 MB, -7 = 837 MB, -8 = 1643 MB, -9 = ???? MB

a) Is this right?
b) How much memory is used for -9?
c) How much memory is used for -0?

12th June 2008, 19:01   #13
LovePimple
Moderator
Join Date: May 2008
Location: Tristan da Cunha
Posts: 611

a) Yes, but memory usage is slightly increased for this latest version.

b) The -9 option would need about 3290 MB.

c) About 0.5 MB.
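The -9 figure is consistent with the pattern in joerg's list, where each level from -5 upward roughly doubles the memory of the previous one. Assuming that pattern simply continues (an extrapolation, not a measurement):

```latex
\begin{aligned}
\frac{435}{233} &\approx 1.87, \quad \frac{837}{435} \approx 1.92, \quad \frac{1643}{837} \approx 1.96 \\
M_{-9} &\approx 2 \times M_{-8} = 2 \times 1643\ \text{MB} = 3286\ \text{MB} \approx 3290\ \text{MB}
\end{aligned}
```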

13th June 2008, 01:40   #14
Matt Mahoney
Expert
Join Date: May 2008
Location: Melbourne, Florida, USA
Posts: 291

AFAIK there is no -9 option.

Anyway, paq8o10t now has the best enwik8 compression for a non-dictionary based program (paq8hp* and durilca* use dictionaries).

http://cs.fit.edu/~mmahoney/compression/text.html#1323
http://cs.fit.edu/~mmahoney/compression/#paq

I didn't test enwik9. A test would take over 3 days. enwik8 ran overnight (8 hours to compress and decompress).
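The "over 3 days" figure follows from a simple scaling estimate, assuming time grows roughly linearly with input size (enwik8 is the first 10^8 bytes of an English Wikipedia dump, enwik9 the first 10^9):

```latex
t_{\text{enwik9}} \approx \frac{10^9\ \text{bytes}}{10^8\ \text{bytes}} \times 8\ \text{h} = 80\ \text{h} \approx 3.3\ \text{days}
```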

13th June 2008, 02:03   #15
LovePimple
Moderator
Join Date: May 2008
Location: Tristan da Cunha
Posts: 611

Matt Mahoney wrote:
AFAIK there is no -9 option.
That's what I thought until I tried the -9 option and it returned with an 'out of memory' error.

From the paq8o10t source code:
 
int main(int argc, char** argv) {
  bool pause=argc<=2;  // Pause when done?
  try {

    // Get option
    bool doExtract=false;  // -d option
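    // Accept a single option of the form -X (exactly one character after '-')
    if (argc>1 && argv[1][0]=='-' && argv[1][1] && !argv[1][2]) {
      // '9' passes this range check, so -9 is accepted even though,
      // per this thread, it needs about 3290 MB
      if (argv[1][1]>='0' && argv[1][1]<='9')
        level=argv[1][1]-'0';
      else if (argv[1][1]=='d')
        doExtract=true;
      else
        quit("Valid options are -0 through -9 or -d\n");
      --argc;  // consume the option so argv[1] is the first file name
      ++argv;
      pause=false;
    }
    // ... (remainder of main() not quoted)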
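The range check `argv[1][1]<='9'` is what lets -9 through. A self-contained restatement of that logic with the highest accepted level made a parameter makes the difference easy to test: with `maxLevel` 9 it accepts -9 like the fragment above, and with 8 it rejects it, which is one way to drop the option as LovePimple's later build does (his actual change is not shown in the thread). The function name and return codes are hypothetical.

```cpp
#include <cassert>

// Hypothetical restatement of the option check in main():
// returns the level for "-0" .. "-<maxLevel>", -1 for "-d", -2 otherwise.
int parseOption(const char* arg, int maxLevel) {
  assert(arg != nullptr && maxLevel >= 0 && maxLevel <= 9);
  // Same shape test as the original: '-' followed by exactly one character.
  if (arg[0] != '-' || !arg[1] || arg[2]) return -2;
  if (arg[1] >= '0' && arg[1] <= '0' + maxLevel) return arg[1] - '0';
  if (arg[1] == 'd') return -1;
  return -2;
}
```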

13th June 2008, 16:56   #16
kaitz
Banned
Join Date: May 2008
Location: EE
Posts: 90

BTW Matt, there is a mistake on your webpage in the History section:
 paq8o10z is by KZ, June 11, 2008. Compression .........
It should be t, not z.

15th June 2008, 04:23   #17
joerg
Member
Join Date: May 2008
Location: Germany
Posts: 92

@LovePimple
"That's what I thought until I tried the -9 option and
it returned with an 'out of memory' error."

Microsoft says:
Win32: The virtual address space of processes and applications
is still limited to 2 GB unless the /3GB switch is used in the Boot.ini file.

Maybe this is why the program reports "out of memory"?
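The numbers in this thread support that explanation. A small C++ sketch comparing the per-level figures quoted earlier against the 2 GB default and the 3 GB /3GB limit; the table and function name are illustrative, and the figures are the thread's approximations, not measurements of this build.

```cpp
#include <cassert>

// Approximate memory per level in MB, index = level, taken from this thread:
// joerg's list for -1..-8, LovePimple's estimates for -0 and -9.
const double kApproxMB[10] = {0.5, 35, 48, 59, 133, 233, 435, 837, 1643, 3290};

// 32-bit Windows gives a process 2 GB of user address space by default, or
// 3 GB with /3GB in boot.ini (only for executables linked large-address-aware).
// Passing this test is necessary but not sufficient: fragmentation and the
// program image itself also use address space.
bool mayFitInAddressSpace(int level, bool threeGBSwitch) {
  assert(level >= 0 && level <= 9);
  const double limitMB = threeGBSwitch ? 3072.0 : 2048.0;
  return kApproxMB[level] < limitMB;
}
```

Even with /3GB, the roughly 3290 MB needed by -9 exceeds 3072 MB, so an "out of memory" error on 32-bit Windows is expected, while -8 (about 1643 MB) can fit under the default 2 GB limit.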

The compression ratio is awesome:

an Oracle dump of 648,331,264 bytes
is compressed (with -7) to 9,714,384 bytes,
7zip compresses it (with -mx=9) to 35,151,362 bytes,
rings 1.5c compresses it to 26,580,783 bytes.

But paq8o10tlp needs 24 hours,
7zip needs 0.5 hours,
rings 1.5c needs 2 minutes.
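For comparison, the sizes above can be turned into compression ratios and bits per input byte (bpc). A short C++ sketch; the function names are illustrative and the sizes are joerg's.

```cpp
#include <cassert>
#include <cstdio>

// Compression ratio as original:compressed (66.7 means 66.7:1).
double compressionRatio(double original, double compressed) {
  assert(compressed > 0);
  return original / compressed;
}

// Bits of output per byte of input.
double bitsPerByte(double original, double compressed) {
  assert(original > 0);
  return 8.0 * compressed / original;
}

// joerg's Oracle dump results; sizes in bytes.
void reportOracleDump() {
  struct Result { const char* tool; double size; };
  const double original = 648331264.0;
  const Result results[] = {
    {"paq8o10tlp -7", 9714384.0},
    {"7zip -mx=9", 35151362.0},
    {"rings 1.5c", 26580783.0},
  };
  for (const Result& r : results)
    std::printf("%-14s %6.2f:1  %.3f bpc\n", r.tool,
                compressionRatio(original, r.size), bitsPerByte(original, r.size));
}
```

By this measure the paq8o10tlp output is about 2.7 times smaller than the rings 1.5c output and about 3.6 times smaller than the 7zip output.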

For me it runs on a Windows Server 2003 machine with 2x Xeon 2.8 GHz and 4 MB.

In practice I use 7zip because it has full directory support.

@kz
The resulting compression ratio is awesome.

I especially want to point out that this program
does not block the whole system,
which is wonderful:
it means I can work on the system
with another program with lower requirements at the same time.

1. Is there any chance to modify paq8o10t
to use two processors or two cores?
2. Is there any chance to modify paq8o10t
to compress a complete directory with subdirectories,
including storing the paths and filenames?

Best regards, Joerg

15th June 2008, 23:32   #18
LovePimple
Moderator
Join Date: May 2008
Location: Tristan da Cunha
Posts: 611

This build coaxes a little more speed from paq8o10t. I have also removed the obsolete -9 option from this release.

ENJOY!
Attached Files
File Type: rar paq8o10tlp2.rar (106.1 KB, 379 views)

15th June 2008, 23:45   #19
LovePimple
Moderator
Join Date: May 2008
Location: Tristan da Cunha
Posts: 611

Some timings from my AMD Sempron 2400+ machine:

paq8o10t -4 world95.txt
Time 988.53 sec, used 128946859 bytes of memory
text 2988578

paq8o10tlp -4 world95.txt
Time 923.33 sec, used 128946859 bytes of memory
text 2988578

paq8o10tlp2 -4 world95.txt
Time 891.58 sec, used 128946859 bytes of memory
text 2988578
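Relative to the plain paq8o10t build on the same machine, these timings work out to the following speedups (time ratios, so higher is better):

```latex
\frac{988.53}{923.33} \approx 1.071\ \text{(paq8o10tlp)}, \qquad
\frac{988.53}{891.58} \approx 1.109\ \text{(paq8o10tlp2)}, \qquad
1 - \frac{891.58}{988.53} \approx 9.8\%\ \text{less time}
```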

16th June 2008, 01:42   #20
Rugxulo
Member
Join Date: Jun 2008
Location: USA
Posts: 104

kaitz wrote:
BTW Matt, there is a mistake on your webpage in the History section:
 paq8o10z is by KZ, June 11, 2008. Compression .........
It should be t, not z.
He probably confused paq8o8z with your paq8o10t. (BTW, his whole site seems to be down ATM. Weird.)

16th June 2008, 01:43   #21
Rugxulo
Member
Join Date: Jun 2008
Location: USA
Posts: 104

LovePimple wrote:
This build coaxes a little more speed from paq8o10t. I have also removed the obsolete -9 option from this release.

ENJOY!
Dare I ask what compiler you used? It's neither MinGW nor OpenWatcom (unless you used some kind of external tool). BTW, MinGW builds seem to hate running on non-admin accounts (e.g. Vista, "tmpfile: access denied"). Bleh.

16th June 2008, 01:53   #22
Black_Fox
Tester
Join Date: May 2008
Location: Brno, Czechia
Posts: 255

Even though the main site is down, you can use the mirror (usually outdated, but not right now).

Also, I have tested PAQ8o10t - nice speedup.
__________________
I am... ߣαск_ƒo×... my discontinued benchmark, all non-copyrighted files from it can be downloaded here

Last edited by Black_Fox : 16th June 2008 at 02:03.

16th June 2008, 03:30   #23
LovePimple
Moderator
Join Date: May 2008
Location: Tristan da Cunha
Posts: 611

Thanks BF!

17th June 2008, 00:36   #24
Rugxulo
Member
Join Date: Jun 2008
Location: USA
Posts: 104

Black_Fox wrote:
Even though the main site is down, you can use the mirror (usually outdated, but not right now).
His main site is back up now.

BTW, no answer re: my compiler question, LovePimple? Also, you didn't mirror paq8o8z (bah). Anyways, don't forget that Geocities has an hourly bandwidth limit for downloads (5 MB, IIRC). You may wish to get a Google Pages site instead (100 MB vs. wimpy 15 MB storage, anyways).

17th June 2008, 02:14   #25
LovePimple
Moderator
Join Date: May 2008
Location: Tristan da Cunha
Posts: 611

Rugxulo wrote:
BTW, no answer re: my compiler question, LovePimple?
That's because this question has already been answered in my reply to joerg's post above. For the record, it's GCC 4.3.0.


Rugxulo wrote:
Also, you didn't mirror paq8o8z (bah).
I only mirror the latest versions. AFAIC paq8o8 is history.


Rugxulo wrote:
Anyways, don't forget that Geocities has an hourly bandwidth limit for downloads (5 MB, IIRC). You may wish to get a Google Pages site instead (100 MB vs. wimpy 15 MB storage, anyways).
I'm well aware of the bandwidth limit, but I wouldn't swap to Google Pages ATM because I have been using Geocities for many years, and it's always been a very reliable service. I will probably swap to GP sometime in the future.

17th June 2008, 19:52   #26
Matt Mahoney
Expert
Join Date: May 2008
Location: Melbourne, Florida, USA
Posts: 291

Rugxulo wrote:
He probably confused paq8o8z with your paq8o10t. (BTW, his whole site seems to be down ATM. Weird.)
OK, it is fixed now, and site is back up.

Also I ran some tests of durilca4linux_3 v3 with 2 GB. It still beats paq8hp12.

20th June 2008, 05:30   #27
Rugxulo
Member
Join Date: Jun 2008
Location: USA
Posts: 104

LovePimple wrote:
That's because this question has already been answered in my reply to joerg's post above. For the record, it's GCC 4.3.0.
Sorry, I didn't read the other thread.

I only mirror the latest versions. AFAIC paq8o8 is history.
And paq8o9 isn't? What about PKZIP 2.50/DOS? Bzip 1.0.4? Whatever, do what you want, it's your site. (BTW, there's a DOS/DJGPP port of p7zip, or you could run Win32's faster 7ZA under HXRT.)

I'm well aware of the bandwidth limit, but I wouldn't swap to Google Pages ATM because I have been using Geocities for many years, and it's always been a very reliable service. I will probably swap to GP sometime in the future.
They aren't mutually exclusive. I think you can have both. (Mirror that mirror!) BTW, Geocities has been owned by Yahoo! for quite a while, and I don't think they've increased the storage space in 10 years!

20th June 2008, 16:35   #28
LovePimple
Moderator
Join Date: May 2008
Location: Tristan da Cunha
Posts: 611

Rugxulo wrote:
And paq8o9 isn't? What about PKZIP 2.50/DOS? Bzip 1.0.4?
PKZIP v2.50 is the latest version for DOS. The other (older) files are there for a good reason, but I don't intend to send everyone to sleep with the explanation.


Rugxulo wrote:
Whatever, do what you want, it's your site.
Exactly!

29th June 2008, 03:27   #29
Rugxulo
Member
Join Date: Jun 2008
Location: USA
Posts: 104

Rugxulo wrote:
BTW, MinGW builds seem to hate running on non-admin accounts (e.g. Vista, "tmpfile: access denied"). Bleh.
Apparently, Microsoft's implementations of tmpfile() and tmpfile_s() (in MSVCRT.DLL?) both place their files in the root directory. You have to use something else, like tmpnam_s(), instead. (Everybody here using MinGW or similar should be aware of this.)
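One way around this, sketched in C++17 under the assumption that a named scratch file is acceptable: create the file yourself in the directory reported by `std::filesystem::temp_directory_path()` (on Windows typically the `GetTempPath` directory) instead of calling `tmpfile()`. This swaps in a different technique from the `tmpnam_s()` route suggested above, and `std::filesystem` postdates this thread; the `ScratchFile` type and helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <filesystem>
#include <random>
#include <string>
#include <system_error>

// Hypothetical stand-in for tmpfile(): a scratch file in the user's temp
// directory rather than wherever the C runtime decides to put it.
struct ScratchFile {
  std::FILE* fp = nullptr;
  std::filesystem::path path;
};

ScratchFile openScratchFile() {
  namespace fs = std::filesystem;
  ScratchFile sf;
  std::error_code ec;
  const fs::path dir = fs::temp_directory_path(ec);
  if (ec) return sf;  // no usable temp directory: fp stays null
  std::random_device rd;
  for (int attempt = 0; attempt < 16; ++attempt) {
    const fs::path candidate = dir / ("paq_scratch_" + std::to_string(rd()) + ".tmp");
    if (fs::exists(candidate, ec)) continue;
    // exists() followed by fopen() is racy; production code would use an
    // exclusive create (O_EXCL on POSIX, CREATE_NEW on Windows) instead.
    sf.fp = std::fopen(candidate.string().c_str(), "w+b");
    if (sf.fp) {
      sf.path = candidate;
      break;
    }
  }
  return sf;
}

// Unlike tmpfile(), nothing is deleted automatically: close, then remove.
void closeScratchFile(ScratchFile& sf) {
  assert(sf.fp == nullptr || !sf.path.empty());  // an open file always has a path
  if (sf.fp) std::fclose(sf.fp);
  sf.fp = nullptr;
  std::error_code ec;
  if (!sf.path.empty()) std::filesystem::remove(sf.path, ec);
  sf.path.clear();
}
```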

29th June 2008, 16:10   #30
LovePimple
Moderator
Join Date: May 2008
Location: Tristan da Cunha
Posts: 611

Thanks for the info.

All times are GMT +4.