Compressing (re-writing) videos for network sharing

Rihards Baumanis
Jul 22, 2019
7 min
Categories: Development

This would be a really cool article if I could replicate something as magical as Richard Hendricks' algorithm from HBO's Silicon Valley, but I'm just a regular pleb doing software development because I like to.

So, compressing videos. Right now the cameras in phones are insanely powerful: 4K video, insanely high bitrates, quality this, performance that, etc. But do we need all of that? Do all those cat and dog videos require 4K resolution? Does your latest Instagram story need to be the highest-quality 4 seconds of whatever Instagram stories tend to be about? Well, maybe, but I prefer to think that saving money on storage costs and network costs, and increasing the speed at which you can send and receive videos, is a more entertaining topic. For me at least.

The problem

To give a brief overview of what drove me to dive into playing around with video compression on mobile platforms:

#1 Recording videos in good / best quality

Pros: Looks good all the way;

Cons: Huge sizes, huge storage costs, the slow network connection will make you delete the app faster than a snap of your fingers.

#2 Recording videos in bad quality

Pros: Small size, good speed all round

Cons: No one will watch a 240p video, even on mobile. Swiped left.

Is there a middle ground? Well, maybe, but I didn’t manage to find it.


The solution

Middle-out compression algorithm? No. Just regular iOS stuff to compress videos with a really good size-to-quality ratio.

Let’s assume that the platform we’re building is mobile-based. So, users have videos from wherever — iOS phones, GoPros, Android phones, etc — and they want to share them through their phones to other users.

So we want to take the video, make it as eye-pleasing as possible on mobile devices and have it take as little space and network bandwidth as possible.

Easy, right?

Not really, but doable.

Disclaimer: I'm aware that it's also possible to skip the record-then-compress two-step and write a compressed file straight from the capture session, but I'm too lazy to obtain a PhD to work with AVAssetWriter, AVAssetReader, and all their cousins at that level. So, let's keep things relatively civil.

Code

I will be posting a demo project at the end of this article, so if you want to skip my courageous attempt at explaining this and that, go straight to the bottom.

The demo will contain a smidge of RxSwift code and some view handlers to make it usable, but that's not the focus here. The focus is the Compressor. Well, technically it's not compression; it's more like re-writing the original, after it has been recorded in very high quality, into a video that is friendlier to share with your server and other devices.

The two main players will be AVAssetReader and AVAssetWriter.

So, let's start by creating a simple class called Compressor and give it a couple of helper functions.

import UIKit
import RxSwift
import RxCocoa
import AVFoundation

final class Compressor {

    private let bag = DisposeBag()

    let videoCompressionProgress = BehaviorRelay<Double>(value: 0)
    let audioCompressionProgress = BehaviorRelay<Double>(value: 0)
    let compressionProgress = BehaviorRelay<Double>(value: 0)

    var assetWriter: AVAssetWriter?
    var assetReader: AVAssetReader?

    private static let videoQueue = "videoQueue"
    private static let audioQueue = "audioQueue"

    enum CompressionResult {
        case failed
        case success
    }

    init() {
        addHandlers()
    }

    private func addHandlers() {
    }

    private func getVideoSettings(for track: AVAssetTrack) -> [String: Any] {
    }

    private var getVideoReaderSettings: [String: Any] {
    }

    private func setupReader(for asset: AVAsset) -> AVAssetReader? {
    }

    private func setupWriter(to url: URL, with videoInput: AVAssetWriterInput, and audioInput: AVAssetWriterInput) -> AVAssetWriter? {
    }

    func compressFile(urlToCompress: URL, outputURL: URL, completion: @escaping ((CompressionResult) -> Void)) {
    }
}

As I said, the code will contain a smidge of RxSwift, because I have come to quite enjoy it.

So, to start we have a private DisposeBag. Just a reactive resource management bag.

After that we have two variables: one will track the progress of writing the audio track from the old file to the new file, and the other will do the same for the video track.

Then there's a third variable that combines both of them to indicate the total progress of the 'compression' (see the sketch just after this walkthrough).

Then we have our two main workhorses — reader and writer. Essentially their names imply their tasks — one will read data from the original file and the other will write the read values to the new file.

Then the queue names — as it turns out, video and audio tracks are read on separate queues, therefore we need two. We'll get to the actual queues later.
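The addHandlers stub from the skeleton is where those two per-track relays can be merged into the combined one. Here's a minimal sketch of how that could look, assuming a plain average of audio and video progress is a good enough approximation:

private func addHandlers() {
    // Merge the two per-track progress streams into one overall value.
    Observable
        .combineLatest(videoCompressionProgress, audioCompressionProgress) { ($0 + $1) / 2 }
        .bind(to: compressionProgress)
        .disposed(by: bag)
}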

We'll start filling up the functions step by step and I'll try to explain what goes where.

private func setupReader(for asset: AVAsset) -> AVAssetReader? {
    assetReader = try? AVAssetReader(asset: asset)
    return assetReader
}

private func setupWriter(to url: URL, with videoInput: AVAssetWriterInput, and audioInput: AVAssetWriterInput) -> AVAssetWriter? {
    assetWriter = try? AVAssetWriter(outputURL: url, fileType: AVFileType.mov)
    assetWriter?.shouldOptimizeForNetworkUse = true
    assetWriter?.add(videoInput)
    assetWriter?.add(audioInput)
    return assetWriter
}

This is how you set up the reader and writer.

The reader is very simple: you just initialize it with an AVAsset object, i.e. a video.

The writer, however, requires a bit more setup. It receives an AVAssetWriterInput for each of the video and audio tracks that are going to be written; we'll create those shortly. We also set shouldOptimizeForNetworkUse to true, which lays out the file so that playback can begin before the whole thing has been downloaded, since the goal here is to have videos that can be shared online.

Now, let's start setting up the compression function.

var audioFinished = false
var videoFinished = false

let asset = AVAsset(url: urlToCompress)
let duration = asset.duration
let durationTime = CMTimeGetSeconds(duration)

guard let reader = setupReader(for: asset),
    let videoTrack = asset.tracks(withMediaType: .video).first,
    let audioTrack = asset.tracks(withMediaType: .audio).first else {
        completion(.failed)
        return
}

let assetReaderVideoOutput = AVAssetReaderTrackOutput(track: videoTrack, outputSettings: getVideoReaderSettings)
let assetReaderAudioOutput = AVAssetReaderTrackOutput(track: audioTrack, outputSettings: nil)

assetReaderVideoOutput.alwaysCopiesSampleData = false
assetReaderAudioOutput.alwaysCopiesSampleData = false

for output in [assetReaderVideoOutput, assetReaderAudioOutput] {
    guard reader.canAdd(output) else {
        completion(.failed)
        return
    }
    reader.add(output)
}

This is the start of the compression function.

What we do here is set up two booleans at the start to indicate the completion of reading/writing on each of the queues. Both queues work independently, so it's necessary to monitor them separately.

After that, we create the AVAsset from the original URL and get its duration.

Then we create the reader and get video and audio tracks from the asset.

Disclaimer: Yes, I'm aware that there can be multiple video and audio tracks, and that a video can have an empty (non-existent) audio track. But let's assume that we're recording videos with sound.
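If you did want to handle silent videos, one possible tweak (my sketch, not part of the demo) is to make the audio track optional and short-circuit the audio side of the pipeline:

// Hypothetical variant: treat the audio track as optional instead of
// failing in the guard above.
let audioTrack = asset.tracks(withMediaType: .audio).first
if audioTrack == nil {
    // Nothing to copy on the audio side; mark it finished up front so
    // the closeWriter() check later only waits on the video queue.
    audioFinished = true
}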

Then we create the outputs from the previously created tracks.

For the video output settings we have a simple dictionary that looks like this and sets the pixel format.

private var getVideoReaderSettings: [String: Any] {
    return [
        kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32ARGB
    ]
}

Nothing particularly interesting about this, but only because my level of understanding of the reader settings is not the best. Anyway.

Both of the outputs have alwaysCopiesSampleData set to false. During my tests, this gave a slight performance boost, since the reader can hand over sample buffers without copying them.

And then we just add the created outputs to the reader.

After this, we move on to launching the read/write process.

let audioInput = AVAssetWriterInput(mediaType: .audio, outputSettings: nil)
let videoInput = AVAssetWriterInput(mediaType: .video, outputSettings: getVideoSettings(for: videoTrack))
videoInput.transform = videoTrack.preferredTransform

let videoInputQueue = DispatchQueue(label: Compressor.videoQueue)
let audioInputQueue = DispatchQueue(label: Compressor.audioQueue)

guard let writer = setupWriter(to: outputURL, with: videoInput, and: audioInput) else {
    completion(.failed)
    return
}

writer.startWriting()
reader.startReading()
writer.startSession(atSourceTime: CMTime.zero)

Now we create the inputs. These will be the things that make up our 'compressed' video.

The audio input is straightforward; we don't do any modification there. The real magic is in the output settings of the video writer input.

private func getVideoSettings(for track: AVAssetTrack) -> [String: Any] {
    // Cap the bitrate at 3.5 Mbps; keep the original if it's already lower.
    let bitrate = min(track.estimatedDataRate, 3500000)

    let horizontalSize = CGSize(width: 1280, height: 720)
    let verticalSize = CGSize(width: 720, height: 1280)
    let squareSize = CGSize(width: 720, height: 720)

    let newSize: CGSize
    let expectedLargest: CGFloat = 1280
    // Applying the preferred transform can produce a negative width or
    // height, which is what the sign checks below account for.
    let size = track.naturalSize.applying(track.preferredTransform)

    if expectedLargest < max(abs(size.width), abs(size.height)) {
        if size.height > size.width {
            newSize = size.width < 0 ? horizontalSize : verticalSize
        } else if size.width > size.height {
            newSize = size.height < 0 ? verticalSize : horizontalSize
        } else {
            newSize = squareSize
        }
    } else {
        // Already small enough; keep the original dimensions.
        newSize = track.naturalSize
    }

    return [
        AVVideoCodecKey: AVVideoCodecType.h264,
        AVVideoCompressionPropertiesKey: [AVVideoAverageBitRateKey: bitrate],
        AVVideoHeightKey: newSize.height,
        AVVideoWidthKey: newSize.width
    ]
}

So in these settings we set up the new video codec, bitrate, width, and height.

The codec is very straightforward: we just use H.264 (AVVideoCodecType.h264) for the video.

The bitrate is also relatively straightforward. It's a value with a direct impact on the size and quality of the video: higher bitrates give smooth, high-quality video, while lower values make it choppy and pixelated. Experiment with this value to see the impact.

I've found that a value of 3,500,000 bits per second (3.5 Mbps) makes for a pretty decent video with relatively small quality loss.
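To get a feel for what that number means in megabytes, here's a back-of-the-envelope estimate (the one-minute duration is just an example of mine):

// Rough size estimate: bits per second × seconds ÷ 8 = bytes.
let bitrate = 3_500_000.0               // 3.5 Mbps video bitrate
let durationInSeconds = 60.0            // a one-minute clip
let estimatedBytes = bitrate * durationInSeconds / 8
let estimatedMegabytes = estimatedBytes / 1_000_000
// ≈ 26 MB per minute of video, before the audio track is added.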

As the last part, we should also reduce the video frame size. 1280x720 is a perfectly good resolution for viewing on mobile devices, but setting it up properly runs into a problem.

After applying the video track's preferred transform to its natural size, something weird happens: I often got a size value with a negative width or height.


I've never seen this happen before, and maybe it's my mistake somewhere, but my fix was to manually check the size and the sign of its components, and then set the expected width and height for the video, as seen in the code sample above.
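For what it's worth, an alternative to checking signs case by case is to normalize the transformed size with abs() first; a small sketch of that idea:

// The natural size is stored in the file's recording orientation; applying
// the preferred transform rotates it into display orientation, which is
// where the negative components can appear.
let transformed = track.naturalSize.applying(track.preferredTransform)
let displaySize = CGSize(width: abs(transformed.width),
                         height: abs(transformed.height))
// After this, displaySize.height > displaySize.width reliably means portrait.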

After initialising the queues it's time to fire this puppy up.

let closeWriter: () -> Void = { [weak self] in
    // Only tear everything down once both queues have finished their track.
    guard audioFinished && videoFinished else { return }

    self?.assetWriter?.finishWriting(completionHandler: { [weak self] in
        self?.assetReader?.cancelReading()
        self?.assetReader = nil
        self?.assetWriter = nil
        completion(.success)
    })
}

audioInput.requestMediaDataWhenReady(on: audioInputQueue) { [weak self] in
    while audioInput.isReadyForMoreMediaData {
        guard let sample = assetReaderAudioOutput.copyNextSampleBuffer() else {
            // No more samples: mark this track as done and try to close.
            guard self?.assetWriter != nil,
                self?.assetWriter?.inputs.contains(audioInput) == true else { return }
            audioInput.markAsFinished()
            audioFinished = true
            closeWriter()
            break
        }

        let timeStamp = CMSampleBufferGetPresentationTimeStamp(sample)
        let timeSecond = CMTimeGetSeconds(timeStamp)
        let per = timeSecond / durationTime
        self?.audioCompressionProgress.accept(per)
        audioInput.append(sample)
    }
}

videoInput.requestMediaDataWhenReady(on: videoInputQueue) { [weak self] in
    while videoInput.isReadyForMoreMediaData {
        guard let sample = assetReaderVideoOutput.copyNextSampleBuffer() else {
            // No more samples: mark this track as done and try to close.
            guard self?.assetWriter != nil,
                self?.assetWriter?.inputs.contains(videoInput) == true else { return }
            videoInput.markAsFinished()
            videoFinished = true
            closeWriter()
            break
        }

        let timeStamp = CMSampleBufferGetPresentationTimeStamp(sample)
        let timeSecond = CMTimeGetSeconds(timeStamp)
        let per = timeSecond / durationTime
        self?.videoCompressionProgress.accept(per)
        videoInput.append(sample)
    }
}

Three blocks: two that run on their separate queues, requesting data through their AVAssetWriterInput objects, and one completion block that both of them trigger to finalize the process when everything is ready.
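Putting it all together, here's roughly how the class might be driven from the outside. The sourceURL, the output file name, and the bag are my own placeholders, and the Rx calls use RxSwift 5 syntax:

let compressor = Compressor()

// AVAssetWriter refuses to write over an existing file, so clear the target first.
let outputURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("compressed.mov")
try? FileManager.default.removeItem(at: outputURL)

compressor.compressionProgress
    .observeOn(MainScheduler.instance)
    .subscribe(onNext: { progress in
        print("Progress: \(Int(progress * 100))%")
    })
    .disposed(by: bag)

compressor.compressFile(urlToCompress: sourceURL, outputURL: outputURL) { result in
    switch result {
    case .success:
        print("Re-written file at \(outputURL)")
    case .failed:
        print("Compression failed")
    }
}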

Summary

  1. Reduce bitrate (if possible);
  2. Reduce video frame size (if possible);
  3. Optimize for web use;
  4. TODO (Optimise audio re-writing)

This isn't a compression algorithm; it just rewrites the video file in a more network-transfer-friendly way. But in the end, we get a really small performance cost and quite a high reduction in video size (somewhere between 50 and 70 percent, depending on the original video's complexity).
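If you want to check those numbers on your own videos, a quick way to compare the before and after file sizes (plain FileManager, nothing specific to the demo):

func fileSizeInMB(at url: URL) -> Double {
    // Read the size attribute from disk; fall back to 0 if anything fails.
    let attributes = try? FileManager.default.attributesOfItem(atPath: url.path)
    let bytes = (attributes?[.size] as? Int64) ?? 0
    return Double(bytes) / 1_000_000
}

print("Original: \(fileSizeInMB(at: urlToCompress)) MB")
print("Re-written: \(fileSizeInMB(at: outputURL)) MB")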


Feel free to check out the GitHub project and see what you think. Personally, I found it difficult to find something that works like this and gets results like this, so I hope it helps someone out.

Thanks for reading & happy coding.

—R



Twitter: @ChiliLabs

www.chililabs.io

