|
|
#1 |
|
Banned
Join Date: May 2008
Location: EE
Posts: 90
|
text detection (utf-8 partial)
Nestmodel from paq8g. Modified contextModel2 like this:
switch (filetype)
{
  case TXTUTF8:
  case TEXT: {
    sparseModel(m,ismatch,order);
    nestModel(m);
    wordModel(m);
    indirectModel(m);
    dmcModel(m);
    break;
  }
  case EXE: {
    sparseModel(m,ismatch,order);
    indirectModel(m);
    dmcModel(m);
    exeModel(m);
    break;
  }
  case BMPFILE1: break;
  default: {
    sparseModel(m,ismatch,order);
    distanceModel(m);
    picModel(m);
    recordModel(m);
    indirectModel(m);
    dmcModel(m);
    break;
  }
}
Word model modifications are based on paq8hp12. Memory usage is increased.
Last edited by kaitz : 11th June 2008 at 16:35. |
|
|
|
|
|
#3 |
|
Banned
Join Date: May 2008
Location: EE
Posts: 90
|
paq8o10t.exe -4 A10.jpg 842468 -> 640447 Time 27.69 sec, used 130692920 MEM
paq8o9.exe -4 A10.jpg 842468 -> 640447 Time 27.25 sec, used 147485800 MEM
paq8o10t.exe -4 AcroRd32.exe 3870784 -> 942896 Time 357.25 sec, used 135288361 MEM
paq8o9.exe -4 AcroRd32.exe 3870784 -> 931760 Time 463.66 sec, used 134210601 MEM
paq8o10t.exe -4 english.dic 4067439 -> 391480 Time 435.09 sec, used 128946859 MEM
paq8o9.exe -4 english.dic 4067439 -> 390358 Time 423.70 sec, used 127873259 MEM
paq8o10t.exe -4 FlashMX.pdf 4526946 -> 3563114 Time 454.69 sec, used 148559391 MEM
paq8o9.exe -4 FlashMX.pdf 4526946 -> 3563934 Time 519.47 sec, used 147485791 MEM
paq8o10t.exe -4 FP.LOG 20617071 -> 269979 Time 2309.97 sec, used 128946868 MEM
paq8o9.exe -4 FP.LOG 20617071 -> 275173 Time 2354.98 sec, used 127873268 MEM
paq8o10t.exe -4 MSO97.DLL 3782416 -> 1340660 Time 350.36 sec, used 149620515 MEM
paq8o9.exe -4 MSO97.DLL 3782416 -> 1328473 Time 510.41 sec, used 148546915 MEM
paq8o10t.exe -4 ohs.doc 4168192 -> 490089 Time 143.16 sec, used 148559399 MEM
paq8o9.exe -4 ohs.doc 4168192 -> 487629 Time 156.78 sec, used 147485799 MEM
paq8o10t.exe -4 rafale.bmp 4149414 -> 551463 Time 59.91 sec, used 116595869 MEM
paq8o9.exe -4 rafale.bmp 4149414 -> 551466 Time 64.48 sec, used 133388749 MEM
paq8o10t.exe -4 vcfiu.hlp 4121418 -> 413225 Time 373.06 sec, used 128946863 MEM
paq8o9.exe -4 vcfiu.hlp 4121418 -> 405263 Time 467.00 sec, used 127873263 MEM
paq8o10t.exe -4 world95.txt 2988578 -> 367188 Time 368.76 sec, used 128946859 MEM
paq8o9.exe -4 world95.txt 2988578 -> 370766 Time 357.42 sec, used 127873259 MEM
paq8o10t.exe -4 Doom3.exe 5427200 -> 1060739 Time 496.59 sec, used 130007983 MEM
paq8o9.exe -4 Doom3.exe 5427200 -> 1040972 Time 631.25 sec, used 128934383 MEM
paq8o10t.exe -4 Reaktor.exe 14446592 -> 1212197 Time 1195.41 sec, used 130007978 MEM
paq8o9.exe -4 Reaktor.exe 14446592 -> 1185980 Time 1495.48 sec, used 128934378 MEM
Last edited by kaitz : 11th June 2008 at 16:32. |
|
|
|
|
|
#4 |
Administrator Join Date: May 2008
Location: Moscow, Russia
Posts: 961
|
I think we should do some advanced custom model-set switching, i.e. detecting the file type dynamically and choosing model sets accordingly. For example, as with PAQ6, we may check for a recent 0x00 (zero byte) or 0x20 (space character) to determine TEXT/BINARY data.
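A minimal sketch of the zero-byte/space check described above, assuming a window of recently seen bytes is available. `DataKind`, `classifyWindow` and the one-space-per-32-bytes threshold are made up for illustration; none of them come from PAQ6 or paq8.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical data classes for this sketch; not the paq8 Filetype enum.
enum class DataKind { Text, Binary };

// Classify a window of recently seen bytes.
// Assumed rule (illustrative only): the window counts as text when it
// contains no zero bytes and at least one space per 32 bytes.
DataKind classifyWindow(const std::uint8_t* buf, std::size_t n) {
    std::size_t zeros = 0, spaces = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (buf[i] == 0x00) ++zeros;        // zero byte hints at binary data
        else if (buf[i] == 0x20) ++spaces;  // space hints at text
    }
    if (n > 0 && zeros == 0 && spaces * 32 >= n) return DataKind::Text;
    return DataKind::Binary;
}
```

A compressor could call something like this every few KB and switch model sets when the result changes, though the decoder would have to make exactly the same decision from already decoded bytes.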
|
|
|
|
|
#5 |
|
Moderator Join Date: May 2008
Location: Tristan da Cunha
Posts: 611
|
Here's my own speed-optimised compile for Pentium Pro or later processors.
ENJOY! |
|
|
|
|
|
#6 |
|
Banned
Join Date: May 2008
Location: EE
Posts: 90
|
paq8o10t.exe -4 Mech8.s3m 747600 -> 296673 Time 69.67 sec, 128946864 MEM
paq8o9.exe -4 Mech8.s3m 747600 -> 295853 Time 83.09 sec, 127873264 MEM
paq8o10t.exe -4 PariahInterface.utx 24375895 -> 3814862 Time 2093.17 sec, 111080362 MEM
paq8o9.exe -4 PariahInterface.utx 24375895 -> 3803631 Time 2650.56 sec, 127873242 MEM |
|
|
|
|
|
#7 |
|
Member
Join Date: May 2008
Location: Germany
Posts: 92
|
Hi lovepimple,
unfortunately F-Secure AntiVirus & AntiSpy detects your implementation "paq8o10tlp" as W32/Suspicious U.gen (virus). The original "paq8o10t" has no such effect. Can you tell us what the secret of your "implementation"/"compile" is? Which modifications have you made? |
|
|
|
|
|
#8 | ||
|
Banned
Join Date: May 2008
Location: EE
Posts: 90
|
|
||
|
|
|
|
|
#9 | |
|
Moderator Join Date: May 2008
Location: Tristan da Cunha
Posts: 611
|
F-Secure AntiVirus & AntiSpy detects it as a Upack-compressed file but can't decompress it, so it cheats by reporting "W32/Suspicious U.gen (virus)".
See this thread for more compile info:
http://www.encode.ru/forum/showthread.php?t=65 |
|
|
|
|
|
|
#10 |
|
Member
Join Date: May 2008
Location: Germany
Posts: 92
|
Thank you, lovepimple.
If I understand correctly, you don't know which compiler switches were used for the resulting compile? |
|
|
|
|
|
#11 |
|
Moderator Join Date: May 2008
Location: Tristan da Cunha
Posts: 611
|
Correct!
|
|
|
|
|
#12 |
|
Member
Join Date: May 2008
Location: Germany
Posts: 92
|
Hi lovepimple,
can you please help me avoid confusion about compression level and memory usage:
without compression: -0 = ???? MB
fast compression: -1 = 35 MB, -2 = 48 MB, -3 = 59 MB, -4 = 133 MB
standard mode: -5 = 233 MB
better compression: -6 = 435 MB, -7 = 837 MB, -8 = 1643 MB, -9 = ???? MB
a) Is this right?
b) How much memory is used for -9?
c) How much memory is used for -0? |
|
|
|
|
|
#13 |
|
Moderator Join Date: May 2008
Location: Tristan da Cunha
Posts: 611
|
a) Yes, but memory usage is slightly increased for this latest version.
b) The -9 option would need about 3290 MB.
c) About 0.5 MB. |
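As a rough cross-check of these figures, the numbers quoted for -4 through -8 look like a fixed overhead plus a part that doubles with each level. The sketch below fits that assumed shape to the -7 and -8 values; `estimateMemoryMB` is a hypothetical helper and this is a curve fit to the posted numbers, not a formula taken from the paq8 source.

```cpp
// Rough memory model fitted to the quoted -7 and -8 figures (in MB).
// Assumption: usage = fixed overhead + a term that doubles per level.
// This is a fit to the numbers posted in this thread only.
double estimateMemoryMB(int level) {
    const double mem7 = 837.0, mem8 = 1643.0;  // figures quoted in the thread
    const double term8 = 2.0 * (mem8 - mem7);  // doubling term at level 8
    const double overhead = mem8 - term8;      // fixed part, 31 MB
    double term = term8;
    for (int l = 8; l < level; ++l) term *= 2.0;  // scale up for higher levels
    for (int l = 8; l > level; --l) term /= 2.0;  // scale down for lower levels
    return overhead + term;
}
```

With this fit, -9 comes out at 3255 MB, in the same range as the "about 3290 MB" above. The fit is poor below -4 (it gives about 81 MB for -3 against the quoted 59 MB), so it should not be trusted for the fast levels.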
|
|
|
|
|
#14 |
|
Expert Join Date: May 2008
Location: Melbourne, Florida, USA
Posts: 291
|
AFAIK there is no -9 option.
Anyway, paq8o10t now has the best enwik8 compression for a non-dictionary-based program (paq8hp* and durilca* use dictionaries).
http://cs.fit.edu/~mmahoney/compression/text.html#1323
http://cs.fit.edu/~mmahoney/compression/#paq
I didn't test enwik9. A test would take over 3 days. enwik8 ran overnight (8 hours to compress and decompress). |
|
|
|
|
|
#15 | |
|
Moderator Join Date: May 2008
Location: Tristan da Cunha
Posts: 611
|
From the paq8o10t source code:
int main(int argc, char** argv) {
  bool pause=argc<=2;  // Pause when done?
  try {
    // Get option
    bool doExtract=false;  // -d option
    if (argc>1 && argv[1][0]=='-' && argv[1][1] && !argv[1][2]) {
      if (argv[1][1]>='0' && argv[1][1]<='9')
        level=argv[1][1]-'0';
      else if (argv[1][1]=='d')
        doExtract=true;
      else
        quit("Valid options are -0 through -9 or -d\n");
      --argc;
      ++argv;
      pause=false;
    }
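To make the point of the excerpt easier to check in isolation, here is a minimal sketch of the same option test pulled out into a function. `Options` and `parseOption` are hypothetical names, and the default level is passed in as a parameter because the excerpt does not show it.

```cpp
// Result of parsing the first command-line argument (hypothetical struct).
struct Options {
    int level;       // compression level, 0..9
    bool doExtract;  // true for -d
    bool valid;      // false for an unrecognised one-letter option
    bool consumed;   // true if the argument was treated as an option
};

// Mirrors the test in the excerpt: '-' followed by exactly one character.
Options parseOption(const char* arg, int defaultLevel) {
    Options o{defaultLevel, false, true, false};
    if (arg && arg[0] == '-' && arg[1] && !arg[2]) {
        o.consumed = true;
        if (arg[1] >= '0' && arg[1] <= '9')
            o.level = arg[1] - '0';   // -0 through -9 are all accepted
        else if (arg[1] == 'd')
            o.doExtract = true;
        else
            o.valid = false;          // the excerpt calls quit() here
    }
    return o;
}
```

So the parser itself accepts -9; whether that level can actually allocate its memory is a separate question.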
|
|
|
|
|
|
|
#16 |
|
Banned
Join Date: May 2008
Location: EE
Posts: 90
|
BTW Matt, you have a mistake on your web page in the History section.
paq8o10z is by KZ, June 11, 2008. Compression ......... |
|
|
|
|
|
#17 |
|
Member
Join Date: May 2008
Location: Germany
Posts: 92
|
@lovepimple
"That's what I thought until I tried the -9 option and it returned with an 'out of memory' error."
Microsoft says: on Win32, the virtual address space of processes and applications is still limited to 2 GB unless the /3GB switch is used in the Boot.ini file. Maybe that is why the program displays "out of memory"?
The compression ratio is awesome:
an Oracle dump of 648.331.264 bytes is compressed (with -7) to 9.714.384 bytes
7zip compresses it (with -mx=9) to 35.151.362 bytes
rings 1.5c compresses it to 26.580.783 bytes
But paq8o10tlp needs 24 hours, 7zip needs 0,5 hour and rings 1.5c needs 2 minutes.
For me it runs on a Windows Server 2003 with 2x XEON 2,8 GHz and 4 MB.
In practice I use 7zip because it has full directory support.
@kz
The resulting compression ratio is awesome. I especially want to remark that this program does not block the whole system, which is wonderful: it means I can work on the system with another program with lower requirements at the same time.
1. Do we have any chance of modifying paq8o10t to use two processors or two cores?
2. Do we have any chance of modifying paq8o10t to compress a complete directory with subdirectories, including storing the paths and filenames?
Best regards
Joerg |
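The address-space argument can be checked with simple arithmetic, using the "about 3290 MB" estimate for -9 given earlier in the thread. The helper below is a hypothetical sketch; it ignores fragmentation and the space already taken by the executable, DLLs and stacks, so real limits are lower than the nominal ones.

```cpp
#include <cstdint>

// Nominal user-mode address space on 32-bit Windows, in MB.
constexpr std::uint64_t kDefaultUserSpaceMB = 2048;  // 2 GB default
constexpr std::uint64_t kWith3GBSwitchMB   = 3072;   // 3 GB with /3GB

// True if an allocation of needMB could fit in a user address space of
// spaceMB. Simplification: treats the whole space as free and contiguous.
constexpr bool fitsInAddressSpace(std::uint64_t needMB, std::uint64_t spaceMB) {
    return needMB < spaceMB;
}
```

Under these assumptions the -9 request does not fit in either case, since 3290 MB exceeds both 2048 MB and 3072 MB, while the 1643 MB quoted for -8 stays under the 2 GB default. Whether the paq8o10t build can use the extra space from /3GB at all is not stated in the thread.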
|
|
|
|
|
#18 |
|
Moderator Join Date: May 2008
Location: Tristan da Cunha
Posts: 611
|
This build coaxes a little more speed from paq8o10t. I have also removed the obsolete -9 option from this release.
ENJOY! |
|
|
|
|
|
#19 |
|
Moderator Join Date: May 2008
Location: Tristan da Cunha
Posts: 611
|
Some timings from my AMD Sempron 2400+ machine:
paq8o10t -4 world95.txt Time 988.53 sec, used 128946859 bytes of memory text 2988578
paq8o10tlp -4 world95.txt Time 923.33 sec, used 128946859 bytes of memory text 2988578
paq8o10tlp2 -4 world95.txt Time 891.58 sec, used 128946859 bytes of memory text 2988578 |
|
|
|
|
|
#20 | |
|
Member
Join Date: Jun 2008
Location: USA
Posts: 104
|
|
|
|
|
|
|
|
#21 | |
|
Member
Join Date: Jun 2008
Location: USA
Posts: 104
|
|
|
|
|
|
|
#22 |
|
Tester Join Date: May 2008
Location: Brno, Czechia
Posts: 255
|
Even though the main site is down, you can use the mirror (usually outdated, but not right now).
Also, I have tested PAQ8o10t - nice speedup.
__________________
I am... ߣαск_ƒo×... my discontinued benchmark, all non-copyrighted files from it can be downloaded here
Last edited by Black_Fox : 16th June 2008 at 02:03. |
|
|
|
|
|
#23 |
|
Moderator Join Date: May 2008
Location: Tristan da Cunha
Posts: 611
|
Thanks BF!
|
|
|
|
|
#24 | |
|
Member
Join Date: Jun 2008
Location: USA
Posts: 104
|
BTW, no answer re: my compiler question, LovePimple? Also, you didn't mirror paq8o8z (bah). Anyways, don't forget that Geocities has an hourly bandwidth limit for downloads (5 MB, IIRC). You may wish to get a Google Pages site instead (100 MB vs. wimpy 15 MB storage, anyways). |
|
|
|
|
|
|
#25 | |||
|
Moderator Join Date: May 2008
Location: Tristan da Cunha
Posts: 611
|
|||
|
|
|
|
|
#26 | |
|
Expert Join Date: May 2008
Location: Melbourne, Florida, USA
Posts: 291
|
Also I ran some tests of durilca4linux_3 v3 with 2 GB. It still beats paq8hp12. |
|
|
|
|
|
|
#27 | |||
|
Member
Join Date: Jun 2008
Location: USA
Posts: 104
|
|
|||
|
|
|
|
|
#28 | ||
|
Moderator Join Date: May 2008
Location: Tristan da Cunha
Posts: 611
|
||
|
|
|
|
|
#29 | |
|
Member
Join Date: Jun 2008
Location: USA
Posts: 104
|
|
|
|
|
|
|
|
#30 |
|
Moderator Join Date: May 2008
Location: Tristan da Cunha
Posts: 611
|
Thanks for the info.
|
|
|
|
|