
Version history for x264 full


Changes for rev. 744 - rev. 745

  • pic macros now keep track of which register holds the GOT, so variable access doesn't have to care



Changes for rev. 735 - rev. 736

  • intra_rd_refine in B-frames



Changes for rev. 721 - rev. 735

  • print average of macroblock QPs instead of frame's nominal QP
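
As a rough illustration of the difference, the reported figure becomes a mean over the per-macroblock QPs rather than the single QP the frame was assigned. The sketch below assumes the QPs are available as a plain array; the names average_mb_qp, mb_qp and mb_count are hypothetical and not taken from x264.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: mean QP over the macroblocks of one frame.
 * mb_qp and mb_count are illustrative names, not x264's own. */
double average_mb_qp(const int *mb_qp, size_t mb_count)
{
    assert(mb_qp != NULL && mb_count > 0);
    long sum = 0;
    for (size_t i = 0; i < mb_count; i++)
        sum += mb_qp[i];
    return (double)sum / (double)mb_count;
}
```

With per-macroblock QP variation, this average can differ from the frame's nominal QP, which is presumably why it is the more informative number to print.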



Changes for rev. 720 - rev. 721

  • change the meaning of --ref: it now selects DPB size (including B-frames), rather than L0 size (which B-frames are added to)



Changes for rev. 719 - rev. 720

  • add / fix support for FreeBSD, based on a patch by Igor Mozolevsky % igor A hybrid-lab P co P uk %



Changes for rev. 718 - rev. 719

  • shut up some valgrind warnings



Changes for rev. 715 - rev. 717

  • round esa range to a multiple of 4
  • convert absolute difference of sums from mmx to sse2
  • convert mv bits cost and ads threshold from C to sse2
  • convert bytemask-to-list from C to scalar asm
  • 1.6x faster me=esa (x86_64) or 1.3x faster (x86_32). (times consider only motion estimation. overall encode speedup may vary.)



Changes for rev. 714 - rev. 715

  • use the _WIN32 define instead of the __WIN32__ or WIN32 defines.
  • MSDN reference: http://msdn2.microsoft.com/en-us/library/b0084kay(VS.80).aspx
  • Patch by BugMaster %BugMaster A narod P ru%
  • Original thread:
  • date: Dec 27, 2007 3:18 AM
  • subject: [x264-devel] VS2008 compilation error (need of replacement __WIN32__ with _WIN32)
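
A minimal sketch of the pattern this change moves to: testing _WIN32, which Microsoft's compiler predefines, rather than macros that only some toolchains or build scripts set. The macro TARGET_IS_WINDOWS and the function is_windows_build are hypothetical names used only for illustration.

```c
#include <assert.h>

/* _WIN32 is predefined by Microsoft's compiler for both 32-bit and
 * 64-bit targets, so no header or -D flag is needed to test it. */
#ifdef _WIN32
#define TARGET_IS_WINDOWS 1
#else
#define TARGET_IS_WINDOWS 0
#endif

/* Hypothetical helper, present only to make the macro observable. */
int is_windows_build(void)
{
    return TARGET_IS_WINDOWS;
}
```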



Changes for rev. 713 - rev. 714

  • tweak x264_pixel_sad_x4_16x16_sse2 horizontal sum. 168 -> 166 cycles on core2.



Changes for rev. 711 - rev. 712

  • also test arch-specific x264_zigzag_* implementations in checkasm.c



Changes for rev. 709 - rev. 711

  • Add AltiVec implementation of predict_16x16_p()
  • over 4x faster than C version
  • Add AltiVec implementation of
  • x264_zigzag_scan_4x4_frame_altivec()
  • x264_zigzag_scan_4x4ac_frame_altivec()
  • x264_zigzag_scan_4x4_field_altivec()
  • x264_zigzag_scan_4x4ac_field_altivec()
  • each around 1.3 to 1.8x faster than C version
  • Patch by Noboru Asai % noboru P asai A gmail P com%



Changes for rev. 708 - rev. 709

  • revert the x86_32 part of r708. elf shared libraries aren't important enough to be worth the extra lines of code to check for nasm.



Changes for rev. 704 - rev. 708

  • faster removal of duplicate mv predictors
  • reduce the data type used in some tables. 16KB smaller exe.
  • check whether ld supports -Bsymbolic before using it
  • mark asm functions as hidden



Changes for rev. 702 - rev. 704

  • avoid a division in umh.
  • patch by Dark Shikari.
  • avoid a division in x264_mb_predict_mv_ref16x16.
  • patch by Dark Shikari.



Changes for rev. 701 - rev. 702

  • fix a memleak in h->mb.mvr



Changes for rev. 700 - rev. 701

  • fix compilation as a shared library on x86_64 (regression in r696)



Changes for rev. 699 - rev. 700

  • add support for x86_64 on Darwin9.0 (Mac OS X 10.5, aka Leopard)
  • Patch by Antoine Gerschenfeld %gerschen A clipper P ens P fr%



Changes for rev. 697 - rev. 699

  • Add AltiVec implementation of x264_pixel_ssd_8x8, 3x faster than C version
  • Overall speed-up: 0.7% with --bframes 3 --ref 5 -m 7 --b-rdo
  • Patch by Noboru Asai %noboru P asai A gmail P com%
  • cover some more options in fprofile. (esa, bime, cqm, nr, no-dct-decimate, trellis2)
  • previously, esa was slower with fprofile than without, since gcc thought it wasn't important. now esa benefits like anything else.



Changes for rev. 696 - rev. 697

  • limit mvs to [-512,511.75] instead of [-512,512]
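
The new upper bound is easier to see in quarter-pixel units, the precision H.264 motion vectors use: -512 pixels is -2048 and 511.75 pixels is 2047, so the range becomes [-2048, 2047]. The sketch below shows that arithmetic as a clamp; clamp_mv_qpel and the two macros are hypothetical names, not x264 identifiers.

```c
#include <assert.h>

#define MV_MIN_QPEL (-512 * 4)      /* -512 pixels   -> -2048 */
#define MV_MAX_QPEL (512 * 4 - 1)   /* 511.75 pixels ->  2047 */

/* Hypothetical helper: clamp one motion vector component, given in
 * quarter-pixel units, to the range [-512, 511.75] pixels. */
int clamp_mv_qpel(int mv)
{
    if (mv < MV_MIN_QPEL) return MV_MIN_QPEL;
    if (mv > MV_MAX_QPEL) return MV_MAX_QPEL;
    return mv;
}
```

The old upper bound of 512 pixels would be 2048 in these units, one step outside the range shown here.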



Changes for rev. 690 - rev. 692

  • Add AltiVec implementation of x264_hpel_filter. Provides a 10-11% overall speed-up with default encoding options
  • Patch by Noboru Asai %noboru P asai A gmail P com%
  • add AltiVec implementation of ssim_4x4x2_core, about 4x faster than C version.
  • Overall: 0.1-0.2% faster with default encoding settings
  • Patch by Noboru Asai %noboru P asai A gmail P com%



Changes for rev. 689 - rev. 690

  • cosmetics in dsp function selection



Changes for rev. 687 - rev. 688

  • cosmetics: use symbolic constants for frame padding radius



Changes for rev. 685 - rev. 686

  • cosmetics: use separate variables for frame width and stride



Changes for rev. 683 - rev. 685

  • Add AltiVec implementation of add4x4_idct, add8x8_idct, add16x16_idct, 3.2x faster on average
  • 1.05x faster overall with default encoding options
  • add AltiVec implementation of dequant_4x4 and dequant_8x8, 2.8x faster than C, 1.01x faster than previous revision with default encoding options
  • Patch by Noboru Asai % noboru DD asai AA gmail DD com %



Changes for rev. 682 - rev. 683

  • Add AltiVec implementation of quant_2x2_dc
  • fix AltiVec implementation of quant_(4x4|8x8)(|_dc) wrt current C implementation
  • Patch by Noboru Asai % noboru DD asai AA gmail DD com %



Changes for rev. 681 - rev. 682

  • fix a possible nondeterminism with me=umh + threads.



Changes for rev. 678 - rev. 680

  • don't overwrite pthread* namespace, because system headers might define those functions even if we don't want them
  • port sad_*_x3_sse2 to x86_64



Changes for rev. 676 - rev. 678

  • fix an arithmetic overflow in trellis at high qp.
  • faster 4x4 sad



Changes for rev. 675 - rev. 676

  • implement multithreaded me=esa



Changes for rev. 674 - rev. 675

  • fix some integer overflows. now vbv size can exceed 2 Gbit.
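
The 2 Gbit figure matches the limit of a signed 32-bit integer, which tops out at 2,147,483,647. A minimal sketch of the general fix, assuming the size is held in kbit and converted to bits: widen to 64 bits before multiplying. The function kbit_to_bits is hypothetical and only illustrates the technique.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a buffer size in kbit to bits.
 * The multiplication is done in 64 bits, so results above
 * INT32_MAX (about 2.147 Gbit) do not wrap around. */
int64_t kbit_to_bits(int kbit)
{
    assert(kbit >= 0);
    return (int64_t)kbit * 1000;
}
```

Casting one operand before the multiplication matters: casting the product afterwards would widen a value that had already overflowed.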



Changes for rev. 673 - rev. 674

  • allow --vbv-init to take absolute values (in kbit), in addition to the previous fractions of vbv-bufsize.
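
One way to accept both forms in a single option is to treat values above 1 as absolute and divide them by the buffer size. The sketch below assumes exactly that rule; vbv_init_fraction is a hypothetical name, and the clamping to [0, 1] is an assumption made for the illustration.

```c
#include <assert.h>

/* Hypothetical helper: normalise a --vbv-init style value to a fraction
 * of the buffer. Assumption: values above 1 are absolute (kbit), values
 * in [0, 1] are already fractions of bufsize_kbit. */
double vbv_init_fraction(double vbv_init, double bufsize_kbit)
{
    assert(bufsize_kbit > 0.0);
    double f = vbv_init > 1.0 ? vbv_init / bufsize_kbit : vbv_init;
    if (f < 0.0) f = 0.0;
    if (f > 1.0) f = 1.0;
    return f;
}
```

Under this rule, 0.5 and 1000 both mean a half-full buffer when the buffer size is 2000 kbit.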



Changes for rev. 672 - rev. 673

  • remove a bashism



Changes for rev. 671 - rev. 672

  • reorder headers so that largefile support is defined before the first copy of stdio



Changes for rev. 670 - rev. 671

  • regression in r669: broke saving of configure args if make has to re-run configure



Changes for rev. 669 - rev. 670

  • regression in r669: --enable-shared should imply --enable-pic on some archs.



Changes for rev. 667 - rev. 669

  • Update config.guess.
  • Add a --host flag to allow overriding config.guess; this is particularly useful with a 64-bit kernel running a 32-bit userland to build 32-bit apps.
  • Normalize any host triplet into a quadruplet via config.sub.
  • Move option parsing before any use of architecture information.



Changes for rev. 666 - rev. 667

  • mingw doesn't have strtok_r



Changes for rev. 663 - rev. 664

  • cosmetics



Changes for rev. 662 - rev. 663

  • limit vertical motion vectors to +/-512, since some decoders actually depend on that limit.



Changes for rev. 661 - rev. 662

  • Add vertical and horizontal luma deblocking accelerated with AltiVec, based on Graham Booker's code written for FFmpeg with slight modifications to re-use x264's macros



Changes for rev. 660 - rev. 661

  • cosmetics in cpu detection



Changes for rev. 658 - rev. 659

  • exempt 1080p from the non-mod16 warning.
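
A height of 1080 is not a multiple of 16 (1080 = 67 x 16 + 8), so a plain mod16 check would flag every 1080p encode. The sketch below shows one possible form of the check with the exemption; should_warn_non_mod16 is a hypothetical name, and exempting on height alone is an assumption.

```c
#include <assert.h>

/* Hypothetical check: returns 1 if a non-mod16 warning should be shown.
 * Assumption: a height of 1080 is the only exempted case. */
int should_warn_non_mod16(int width, int height)
{
    assert(width > 0 && height > 0);
    int non_mod16 = (width % 16 != 0) || (height % 16 != 0);
    return non_mod16 && height != 1080;
}
```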



Changes for rev. 657 - rev. 658

  • allow compiling without yasm/nasm on x86 and x86-64 platforms



Changes for rev. 655 - rev. 656

  • replace alloca with malloc everywhere. per manpage, use of alloca is discouraged. this may have a minor effect on the speed of ssim and esa, but that appears too small to measure.
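
The practical difference is in failure handling: malloc can report an oversized request by returning NULL, whereas alloca has no way to signal that the stack was exhausted. A minimal sketch of the malloc form, using a hypothetical reverse_bytes function unrelated to x264's own code:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: reverse n bytes via a scratch buffer.
 * The scratch buffer comes from malloc() rather than alloca(), so a
 * failed request is reported instead of overflowing the stack. */
int reverse_bytes(unsigned char *buf, size_t n)
{
    assert(buf != NULL && n > 0);
    unsigned char *tmp = malloc(n);
    if (tmp == NULL)
        return -1;              /* allocation failure is reportable */
    memcpy(tmp, buf, n);
    for (size_t i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];
    free(tmp);                  /* heap memory must be released explicitly */
    return 0;
}
```

The cost is an explicit free on every exit path, which is the extra work the heap version trades for that safety.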



Changes for rev. 654 - rev. 655

  • require a ratecontrol method to be specified, it no longer defaults to cqp=26.



Changes for rev. 653 - rev. 654

  • fix nnz computation in cavlc+8x8dct+deblock. (regression in r607)



Changes for rev. 651 - rev. 652

  • c89 compile fix



Changes for rev. 650 - rev. 651

  • cabac: use bytestream instead of bitstream.
  • 35% faster cabac, 20% faster overall lossless, ~1% faster overall at normal bitrates.



Changes for rev. 649 - rev. 650

  • remove the restriction on number of threads as a function of resolution (it was wrong anyway in the presence of B-frames), and raise the max number of threads in general (though more will have to be done before it can really scale to lots of cores).



Changes for rev. 648 - rev. 649

  • tweak ssse3 quant



Changes for rev. 645 - rev. 648

  • work around gcc's inability to align variables on the stack.
  • this crash was introduced in r642, but only because previous versions didn't use sse2 on the stack.
  • faster cabac rdo. up to 10% faster at q0, but negligible at normal bitrates.
  • change some tables from int to int8_t. 13KB smaller executable.
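
The size saving from narrowing a table is easy to check with sizeof. The sketch below stores the same 16 values twice, once as int and once as int8_t; the table names are hypothetical and the contents (a 4x4 zigzag order) are only an example of small values that fit in 8 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative 4x4 zigzag scan order; every value fits in 8 bits. */
const int    scan4x4_int[16]  = {0,1,4,8,5,2,3,6,9,12,13,10,7,11,14,15};
const int8_t scan4x4_int8[16] = {0,1,4,8,5,2,3,6,9,12,13,10,7,11,14,15};

/* Hypothetical helper: bytes saved by using the int8_t table. */
unsigned long scan4x4_bytes_saved(void)
{
    return (unsigned long)(sizeof(scan4x4_int) - sizeof(scan4x4_int8));
}
```

On a platform with 4-byte int, the narrow table takes 16 bytes instead of 64, and the same ratio applies to every table converted this way.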



Changes for rev. 644 - rev. 645

  • 32bit version of ssse3 satd.
  • switch default assembler to yasm. it will still fall back to nasm if you don't have yasm.



Changes for rev. 639 - rev. 644

  • simplify trellis
  • fix an arithmetic overflow in trellis with QP >= 42
  • 2x faster quant. 2% overall.
  • side effects:
  • not bit-identical to the previous algorithm.
  • while the new algorithm covers a wider range of cqms than the previous one did, I couldn't find a good way to fall back to a general version for the extreme cqms. so now it refuses to encode extreme cqms instead of just being slower.
  • lays a framework for custom deadzone matrices, though I didn't add an api.
  • when encoding with a cqm, probe_skip now also uses the cqm, instead of the flat matrix
  • cosmetics in asm macros



Changes for rev. 638 - rev. 639

  • use only c-style comments in public header (patch by Vincent Torres)



Changes for rev. 636 - rev. 638

  • Compile fix
  • in hpel search, merge two 16x16 mc calls into one 16x17. 15% faster hpel, .3% overall.



Changes for rev. 635 - rev. 636

  • remove private stuff from public headers. no more need for -D__X264__



Changes for rev. 634 - rev. 635

  • adjust bitstream buffer sizes for very large frames



Changes for rev. 628 - rev. 634

  • conflate HAVE_MMXEXT with HAVE_SSE2, since they were never used distinctly.
  • Made -DNEED_ALTIVEC unnecessary, thanks to Guillaume Poirier.
  • check x264_cpu_detect() before calling AltiVec functions.
  • ssse3 detection. x86_64 ssse3 satd and quant.
  • requires yasm >= 0.6.0
  • Use -maltivec when building dependencies, or <altivec.h> cannot be used.
  • Do not declare vectors in non-AltiVec files.
  • common/cpu.c: runtime AltiVec autodetection on Linux.
  • configure, Makefile: do not build the whole project with -maltivec because it generates AltiVec code in weird places.



Changes for rev. 627 - rev. 628

  • fix a small memleak.
  • patch by Limin Wang.



Changes for rev. 626 - rev. 627

  • compile fix for GCC-3.3 on OSX, based on a patch by Patrice Bensoussan % patrice P bensoussan A free P fr%
  • Note: regression tests still do not pass with GCC-3.3, but they never did as far as I can remember.



Changes for rev. 623 - rev. 624

  • add ability to generate doxygen documentation; make dox



Changes for rev. 622 - rev. 623

  • oops, scenecut detection failed to activate when using threads and not using B-frames



Changes for rev. 621 - rev. 622

  • extras/getopt.c was BSD licensed. replace with an LGPL version (from glibc).



Changes for rev. 620 - rev. 621

  • Fix build issues on Linux. Only gcc-4.x is supported, as on OSX.
  • Cleans up a few inconsistencies in the code too.



Changes for rev. 619 - rev. 620

  • tweak block_residual_write_cavlc.
  • up to 1% faster lossless, no difference at normal bitrates.



Changes for rev. 618 - rev. 619

  • don't assume int is exactly 4 bytes
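
A common instance of this kind of fix is replacing a hard-coded 4 with sizeof when computing byte counts. The sketch below is a generic illustration with a hypothetical clear_ints function; it does not reproduce the actual x264 change.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: zero `count` ints. Using sizeof(*dst) instead of
 * a literal 4 keeps the byte count right whatever the width of int. */
void clear_ints(int *dst, size_t count)
{
    assert(dst != NULL);
    memset(dst, 0, count * sizeof(*dst));
}
```

Where an exact width is required, the fixed-width types from stdint.h, such as int32_t, state that requirement directly.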



Changes for rev. 617 - rev. 618

  • make array_non_zero() compatible with -fstrict-aliasing
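
Under -fstrict-aliasing the compiler may assume that an int16_t array is never read through a pointer to a wider, unrelated type, so a "check several coefficients at once" shortcut written as a pointer cast is not safe. One standard way to keep the wide comparison is to copy each chunk with memcpy. The sketch below uses a hypothetical any_nonzero_16 function and is not x264's array_non_zero().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in: returns 1 if any of 16 int16_t coefficients is
 * non-zero. Each 8-byte chunk is copied into a uint64_t with memcpy
 * rather than read through a cast pointer, which keeps the access valid
 * under -fstrict-aliasing. */
int any_nonzero_16(const int16_t coef[16])
{
    assert(coef != NULL);
    for (int i = 0; i < 16; i += 4) {
        uint64_t chunk;
        memcpy(&chunk, &coef[i], sizeof(chunk));
        if (chunk != 0)
            return 1;
    }
    return 0;
}
```

Optimizing compilers typically turn a fixed-size memcpy like this into a single load, so the safe form need not cost anything.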



Changes for rev. 616 - rev. 617

  • Honor CFLAGS and LDFLAGS set by the user



Changes for rev. 615 - rev. 616

  • Check whether 'echo -n' works, otherwise try printf (fixes build on current OS X 10.5)



Changes for rev. 614 - full rev. 614

  • Check version of nasm on OS X / Intel



Changes for rev. 613 - rev. 614

  • wrong reference frames were used with refs>=14 + pyramid (regression in r607)



Changes for rev. 612 - rev. 613

  • enable thread synchronization primitives on linux too



Changes for rev. 611 - rev. 612

  • fix a crash with x264_encoder_headers() + threads



Changes for rev. 610 - rev. 611

  • don't skip autodetection on configure --enable-pthread



Changes for rev. 605 - rev. 606

  • Do not assume anything about sizeof(cpu_set_t).



Changes for rev. 603 - rev. 604

  • Add AltiVec implementations of add8x8_idct8, add16x16_idct8, sa8d_8x8 and sa8d_16x16
  • Note: doesn't take advantage of some possible aligned memory accesses, so there's still room for improvement



Changes for rev. 600 - rev. 601

  • Merges Guillaume Poirier's AltiVec changes:
  • Adds optimized quant and sub*dct8 routines
  • Faster sub*dct routines
  • ~8% overall speed-up with default settings


