Data Storage

LZ4 Compression Algorithm Gets Multi-Threaded Update (linuxiac.com) 44

Slashdot reader Seven Spirals brings news about the lossless compression algorithm LZ4: The already wonderful performance of the LZ4 compressor just got better with multi-threaded additions to it's codebase. In many cases, LZ4 can compress data faster than it can be written to disk giving this particular compressor some very special applications. The Linux kernel as well as filesystems like ZFS use LZ4 compression extensively. This makes LZ4 more comparable to the Zstd compression algorithm, which has had multi-threaded performance for a while, but cannot match the LZ4 compressor for speed, though it has some direct LZ4.
From Linuxiac.com: - On Windows 11, using an Intel 7840HS CPU, compression time has improved from 13.4 seconds to just 1.8 seconds — a 7.4 times speed increase.
- macOS users with the M1 Pro chip will see a reduction from 16.6 seconds to 2.55 seconds, a 6.5 times faster performance.
- For Linux users on an i7-9700k, the compression time has been reduced from 16.2 seconds to 3.05 seconds, achieving a 5.4 times speed boost...

The release supports lesser-known architectures such as LoongArch, RISC-V, and others, ensuring LZ4's portability across various platforms.
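
The quoted multipliers are simply the old time divided by the new time. A minimal sketch of that arithmetic, using only the timings quoted above (the dictionary labels and the function name are illustrative):

```python
# Timings in seconds as quoted above: (before, after) the multi-threaded update.
timings = {
    "7840HS / Windows 11": (13.4, 1.8),
    "M1 Pro / macOS": (16.6, 2.55),
    "i7-9700k / Linux": (16.2, 3.05),
}


def speedup(before: float, after: float) -> float:
    """Return how many times faster the new run is than the old one."""
    return before / after


for platform, (before, after) in timings.items():
    print(f"{platform}: {speedup(before, after):.2f}x")
```

Recomputed from these rounded timings, the third figure comes out slightly below the quoted 5.4, presumably because the published timings were themselves rounded.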

This discussion has been archived. No new comments can be posted.

  • its, not it's (Score:4, Informative)

    by Anonymous Coward on Saturday July 27, 2024 @08:08PM (#64660754)

    its codebase

    FTFY. It's just like yours, mine, theirs, etc. No apostrophe needed!

    • Who's code base? Apostrophes are correctly used to indicate possession.

      • In this case, you're saying, "Who is code base?" The form you're looking for is, "Whose code base?"
      • by cas2000 ( 148703 )

        > Apostrophes are correctly used to indicate possession.

        Sometimes yes, but not always.

        Apostrophes are used to indicate both contraction and possession - in that order of precedence.

        Example contractions:

        "it's" = "it is"
        "who's" = "who is"

        Example possessions:

        "its" indicates possession by "it"
        "whose" indicates possession by "who" (a previously mentioned owner)

        Using an apostrophe to indicate possession in these cases is an INCORRECT use of the apostrophe.

        • Do we really need to teach everybody on the internet the basics of grammar, one by one? Surely there must be a more efficient way than this!? Can't they go watch a YouTube or something?
  • TL;DR (Score:5, Funny)

    by Unnamed Chickenheart ( 882453 ) on Saturday July 27, 2024 @08:37PM (#64660788)

    Article too long. Can someone compress it for me?

    • ChatGPT, summarize the article and translate into Klingon.
      • Re: TL;DR (Score:5, Insightful)

        by crackerjack155 ( 1328815 ) on Saturday July 27, 2024 @09:54PM (#64660894)

        LZ4 to Daqtagh bejqu' muDaq Hoch paQDI'norgh
        I-ngaDHa'ghach yaH
        Slashdot qel Seven Spirals LZ4 Daqtagh ja':
        LZ4 mIvwa' bejqu' moHaq HIvje' vo' multi-Threaded che'laHghach pe'vIl 'IH je'

        'ejchugh, LZ4 DaqlIj bejqu' qaSpu'DI' vuDwI' qImHa'. wa'DIch Qapbe' 'oH bejqu' Haq. Linux qawHaq je' Qo'noSmey lo'chu' 'IH ZFS bejqu' LZ4 je' 'op vo' qo' mIwvetlh Daqtagh bIt.

        vo' Linuxiac.com:
        - Windows 11, Intel 7840HS CPU, bortaS 13.4 lup je' neH 1.8 lup che' 7.4 logh qet
        - macOS M1 Pro chip, lupvo' 16.6 neH 2.55 lup che' 6.5 logh puS
        - Linux lo', i7-9700k, lupvo' 16.2 neH 3.05 lup che' 5.4 logh puS...

        naDev lo'Ha'ghach yaHwIj lo' LaongArch, RISC-V je', Qapla' Daqtagh muDaq Hoch tlhoy'meH.

    • I think I got an old copy of ARJ sitting around here somewhere. Gotta minute to wait?
    • by ls671 ( 1122017 )

      I assume it's a copycat of pigz: pigz vs. gzip is the same as new LZ4 vs. old, I assume.
      https://linuxhandbook.com/pigz... [linuxhandbook.com]

      Be aware of default values and manage your threads and IO usage. /s

  • by Gavino ( 560149 ) on Saturday July 27, 2024 @09:25PM (#64660842)
    what does that even mean?
    • by sl3xd ( 111641 )

      I'm going to bet it's that: zstd has the capability to create/use lz4 (and gzip, xz, and lzma) archives. (zstd --format=lz4 foo.lz4 file1.txt file2.txt ...)

      No love to bz2 or brotli, though.

  • Loonarch? Don't you mean pirated MIPS64?

    • by sl3xd ( 111641 )

      I'm not sure I'd call it pirated, given the MIPS64 architecture owner open-sourced it (and released MIPS r6) in 2019. At least, probably not anymore (chronology probably matters, but...)

      Even the IP owner of MIPS has moved on to RISC-V.

      • by kriston ( 7886 )

        Did those MIPS patents expire yet or did the owners release them to the public domain?

  • by ElrondHubbard ( 13672 ) on Saturday July 27, 2024 @09:51PM (#64660882)
    A speed increase ranging from 5.4x to 7.4x would sound impressive, if only they said just what kind and quantity of data was being compressed. Was it English text? If so, how many characters? Was it a bitmap file? PCM audio? Whatwhatwhat? It says LZ4 prioritizes speed over compression, but I didn't learn much about the compression ratio or any time/ratio trade-off, either. How can anyone judge based on what's in this article?
    • by Dwedit ( 232252 ) on Saturday July 27, 2024 @10:48PM (#64660972) Homepage

      It looks exclusively for backreferences, that is, data which has previously appeared and is repeated. It does not do any entropy or Huffman encoding, does not do any audio sample or pixel prediction, or anything like that. It's backreferences only. LZ4 has a maximum distance of 64KB for its backreferences.
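
To illustrate what "backreferences only" means, here is a toy sketch of a greedy matcher and its decoder. It is not LZ4's actual block format or API: the token layout, the function names, and the single-entry hash table are assumptions made for illustration. Only the 64KB distance limit comes from the comment above; the 4-byte minimum match is how LZ4 is generally described.

```python
MAX_DISTANCE = 65535  # 16-bit offset, i.e. the 64KB limit mentioned above
MIN_MATCH = 4         # assumed minimum match length for this toy


def toy_compress(data: bytes) -> list:
    """Greedy matcher: emit literal byte values or (distance, length) tuples."""
    tokens = []
    last_seen = {}  # 4-byte sequence -> most recent position where it started
    i = 0
    while i < len(data):
        key = data[i:i + MIN_MATCH]
        candidate = last_seen.get(key)
        if len(key) == MIN_MATCH:
            last_seen[key] = i
        if candidate is not None and i - candidate <= MAX_DISTANCE:
            # Extend the match as far as the earlier data keeps agreeing.
            length = MIN_MATCH
            while i + length < len(data) and data[candidate + length] == data[i + length]:
                length += 1
            tokens.append((i - candidate, length))
            i += length
        else:
            tokens.append(data[i])  # no usable earlier copy: plain literal
            i += 1
    return tokens


def toy_decompress(tokens: list) -> bytes:
    """Rebuild the data by copying literals and replaying backreferences."""
    out = bytearray()
    for token in tokens:
        if isinstance(token, tuple):
            distance, length = token
            start = len(out) - distance
            # Byte-by-byte copy so a match may overlap its own output.
            for k in range(length):
                out.append(out[start + k])
        else:
            out.append(token)
    return bytes(out)
```

Repetitive input collapses into a handful of tokens, while input with no repeats within the window stays as literals, which is why this family of compressors is fast but limited in ratio.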

    • by thegarbz ( 1787294 ) on Sunday July 28, 2024 @01:56AM (#64661098)

      The kind of data being compressed isn't relevant for speed, due to the simplicity of the compressor. It is designed for on-the-fly compression and decompression, so its actual ability to compress data is very limited.

      The compression ratio is rarely above two even for highly compressible content such as text. Just for example using LZ4 to compress the contents of its own README.md file:
      Compressed 3058 bytes into 1769 bytes ==> 57.85% (compared to 1283 bytes for zip)

      vs compressing its own exe file:
      Compressed 882789 bytes into 462399 bytes ==> 52.38% (compared to 309687 bytes for zip)

      You're not using LZ4 if you want good compression, you are using LZ4 if you want compression which doesn't degrade performance during I/O activities, such as disk compression, RAM compression, etc.
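
The percentages above are compressed size over original size; the "ratio" is the inverse. A short sketch of that arithmetic using the byte counts from this comment (the function names are illustrative):

```python
def compressed_percent(original: int, compressed: int) -> float:
    """Compressed size as a percentage of the original size."""
    return 100.0 * compressed / original


def compression_ratio(original: int, compressed: int) -> float:
    """Original size divided by compressed size (higher is better)."""
    return original / compressed


# (original bytes, LZ4 bytes, zip bytes) as reported in the comment above.
samples = {
    "README.md": (3058, 1769, 1283),
    "exe": (882789, 462399, 309687),
}

for name, (original, lz4_size, zip_size) in samples.items():
    print(
        f"{name}: LZ4 {compressed_percent(original, lz4_size):.2f}% "
        f"(ratio {compression_ratio(original, lz4_size):.2f}), "
        f"zip {compressed_percent(original, zip_size):.2f}% "
        f"(ratio {compression_ratio(original, zip_size):.2f})"
    )
```

Both LZ4 results stay under a ratio of two, in line with the claim above, while zip exceeds two on the same files.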

      • If you think it's so simple, why don't you write an improved version? 50% compression is very good for a general-purpose lossless compressor. If you want high compression, you need data-specific algorithms, as one size does not fit all, and simple doesn't preclude a high compression ratio; e.g., run-length encoding can compress sparse data down to almost nothing.

        • If you think it's so simple, why don't you write an improved version?

          What the fuck are you talking about. Where did I say it was simple or that one size fits all (I specifically said the opposite). If you have voices in your head, please talk directly to them rather than posting pointless non-arguments on Slashdot.

        • 50% compression is very good for a general purpose lossless compressor.

          You can't make blanket statements like that without citing the data set being used because the compression factor depends on the input. Some data is entirely uncompressible (literally 0%). Some data is highly compressible (99.99999...%). Just depends.

          In this case, we can safely say that 50% is not that impressive, however, for the simple reason that the previous poster shared the performance of a standard zip (i.e. likely DEFLATE, a general purpose, lossless approach) and the fact that it significantly outp

          • by Viol8 ( 599362 )

            Web pages contain reams of HTML language tokens. Easy to compress those down to a few bits each.

            • by Dwedit ( 232252 )

              Brotli is specifically made for web content like HTML. Just look at the preset dictionary and you see not only words, but also lots of likely code fragments for HTML, XML, CSS, and JavaScript.

            • Yes. I know. Hence why I mentioned it.

    • by ls671 ( 1122017 )

      It's most use cases, since this is a multi-threaded version of the compression algorithm.

      I posted above:
      https://hardware.slashdot.org/... [slashdot.org]

      Use case example:
      I have a machine with 2 physical CPUs. Each CPU has 24 cores, for a total of 48 threads that can really physically run at the same time on the machine.

      So in theory, any multi-threaded compression could be 48 times faster by splitting the work between all threads available compared to its single-threaded version which would never use more than one core.

      But it's
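
The "split the work between threads" idea can be sketched as independent chunks compressed in parallel. This is not how the LZ4 tool itself is implemented; it uses Python's standard `zlib` as a stand-in compressor, and the chunk size and function names are assumptions for illustration.

```python
import os
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4 * 1024 * 1024  # hypothetical block size, chosen arbitrarily


def compress_parallel(data: bytes, workers=None, chunk_size: int = CHUNK_SIZE) -> list:
    """Split data into fixed-size chunks and compress each one independently."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers or os.cpu_count()) as pool:
        # pool.map preserves input order, so the blocks can be rejoined later.
        return list(pool.map(zlib.compress, chunks))


def decompress_chunks(blocks: list) -> bytes:
    """Decompress each block and concatenate them in their original order."""
    return b"".join(zlib.decompress(block) for block in blocks)
```

In practice the speedup falls short of the core count: reading input, writing output, and any serial bookkeeping are not parallelized, and independent chunks cannot reference data in their neighbours, which can cost some compression ratio.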

    • Most importantly, is it 3D video?
    • by edwdig ( 47888 )

      LZ4 is one of the algorithms that was common right before Zip mostly took over for general purpose compression. It's pretty similar to zip (deflate), good and bad on the same types of data zip is. LZ4 compressed files are a little bigger than zip/deflate compressed files, but they're faster to compress and decompress.

      LZ4 was really common on the older cartridge-based systems. The compression was good enough, and decompression that was faster than zip's was beneficial on those CPUs.

  • by Aryeh Goretsky ( 129230 ) on Saturday July 27, 2024 @10:47PM (#64660968) Homepage

    Hello,

    I was unfamiliar with the Intel 7840HS CPU mentioned in the article, and figured it was either some model for embedded systems, servers or other computers not generally used by the public.

    One quick search later, I found out it is an AMD CPU for laptops, specifically the AMD Ryzen 7 7840HS. Here are the specs for it: https://www.amd.com/en/product... [amd.com].

    The changelog for the LZ4 release gives more information about the speed improvements: https://github.com/lz4/lz4/rel... [github.com]. It does not mention the manufacturers of the CPUs used in benchmarking, which is probably why it was misidentified in the article.

    Regards,

    Aryeh Goretsky

    • Yeah, I run a Ryzen 7840HS on my main PC, and a Ryzen 7840U on my handheld (ROG Ally), so I chuckled immediately when I read "Intel 7840HS". It's quite clear that editors just slap "Intel" on any x86-64 CPU. The funny thing is that x86 used to be Intel, and all other x86 CPUs were "Intel-compatible". But x86-64 is AMD (Linux even calls it the AMD64 architecture), so all other x86-64 CPUs are "AMD-compatible" now.

      • Classic move by AMD to upstage Intel with a backwards-compatible x86 64-bit processor. Intel begrudgingly adopted it as their standard, too. It must have been humiliating for Intel especially after the massive failure of the 64-bit Itanium.
