Jekyll
2022-03-03T21:13:29+00:00
https://flight.school/feed.xml
Flight School
Founded in 2018,
Flight School is an independently-published book series
for advanced Swift developers interested in
taking their skills to the next level.
Our mission is to write the kinds of programming books
we wish we had when we were first starting out:
material that connects computer science theory
with practical insights from experience working in the software industry.
All of that, plus a sprinkle of code samples and trivia
from the world of aviation.
It's an unconventional combination, but we think you're gonna love it.
Training a Text Classifier with Create ML and the Natural Language Framework
2018-06-13T00:00:00+00:00
https://flight.school/articles/classifying-programming-languages-with-createml
<p>Machine Learning can be difficult to get your head around as a programmer.
But aside from all the advanced mathematics and tooling,
perhaps most difficult of all is learning how to let go.</p>
<p>Many of us have spent years
honing our craft of writing code:
expressive, type-safe, unit-tested, refactored, clean, DRY code.
So it’s tough to hear that
pretty much anyone with enough data (and patience)
can use machine learning to solve hard problems
without understanding why or how it works.</p>
<p>But this isn’t news to you.
In fact, ML is at the top of your list of things to learn next!
You have <a href="https://www.coursera.org/learn/machine-learning" rel="noopener noreferrer">Andrew Ng’s course</a>
bookmarked in Safari
and a load of unread PDFs littering your Downloads folder.
All you need is a free weekend and…</p>
<p>If that strikes a bit close to home for you,
then you’ll love
<a href="https://developer.apple.com/documentation/create_ml" rel="noopener noreferrer">Create ML</a>.</p>
<p>Create ML is a new framework
that makes it easy to train machine learning models.
How easy?
Drag pictures of dogs into a “Dogs” folder
and pictures of cats into a “Cats” folder,
write a few lines of Swift code,
wait a couple of minutes to train the model,
and boom:
you have an image classifier that can tell you whether
a picture contains a cat or a dog.
You can even do this for text classification
or regression for data tables.</p>
<p>If you haven’t already,
go ahead and watch the
<a href="https://developer.apple.com/videos/play/wwdc2018/703/" rel="noopener noreferrer">Introducing Create ML</a>
session from this year’s WWDC.
It’s like magic.</p>
<p>Pulling files from labeled directories makes for a nice demo,
but what if your data set isn’t so nicely organized?
This article shows how you can use Create ML to train a text classifier
that predicts the programming language of unknown source code
by manually creating the corpus from a heterogeneous data set.</p>
<blockquote>
<p>You can find the complete training script and a demo playground
<a href="https://github.com/Flight-School/Programming-Language-Classifier" rel="noopener noreferrer">here</a>.</p>
</blockquote>
<hr>
<h2>
<a class="anchor" aria-hidden="true" id="acquiring-a-corpus-of-data" href="#acquiring-a-corpus-of-data">#</a>Acquiring a Corpus of Data</h2>
<p>First things first:
we need some examples of source code.</p>
<p>If you’re anything like the author,
you might have thought to use
<a href="https://github.com/search?o=desc&q=if+language%3Aswift&s=indexed&type=Code" rel="noopener noreferrer">GitHub search</a>
to find code by language.
And if so, you’d eventually realize that the
<a href="https://developer.github.com" rel="noopener noreferrer">GitHub API</a> (both the REST and GraphQL versions)
requires code search results to be scoped by user or project.
At that point,
you’d probably wonder why you decided you needed to make this yourself
before finding something off-the-shelf,
like
<a href="https://github.com/source-foundry/code-corpora" rel="noopener noreferrer">this project by Source Foundry</a>.</p>
<p>Our corpus includes labeled code samples in
C, C++, Go, Java, JavaScript, Objective-C, PHP, Python, Ruby, Rust, and Swift.
Each directory in the project root corresponds to a language
and contains flattened checkouts of
a handful of popular open source repositories for that language:</p>
<div class="language-terminal highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code><span class="gp">$</span> tree <span class="nt">-L</span> 2 code-corpora/swift
<span class="go">code-corpora/swift
├── alamofire
│   ├── Alamofire.h
│   ├── Alamofire.swift
│   ├── App<wbr></wbr>Delegate.swift
│   ├── Authentication<wbr></wbr>Tests.swift
</span><span class="gp">#</span> ...
</code></pre>
</div>
</div>
<p>Notice that Objective-C header in the Swift project, though.
We can’t rely entirely on the top-level directories
as labels for their contents,
because most projects include other auxiliary scripts and source files
(as well as README, LICENSE, and other repository miscellany).</p>
<p>Our training script uses the containing directory and file extension
to determine the correct label for each file in our corpus.
For example,
<code>.h</code> files in the <code>c</code> directory are labeled as C,
<code>.h</code> files in the <code>cc</code> directory are labeled as C++,
and any file with the <code>.go</code> extension is labeled as Go:</p>
<div class="language-swift highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code><span class="nf">switch</span> <span class="p">(</span><span class="n">directory</span><span class="p">,</span> <span class="n">file<wbr></wbr>Extension</span><span class="p">)</span> <span class="p">{</span>
<span class="k">case</span> ("c", "h"), (_, "c"): label = "C"
<span class="k">case</span> ("cc", "h"), (_, "cc"), (_, "cpp"): label = "C++"
<span class="k">case</span> (_, "go"): label = "Go"
<span class="c1">// ...</span>
<span class="k">default</span><span class="p">:</span>
<span class="c1">// Unknown, skip</span>
<span class="p">}</span>
</code></pre>
</div>
</div>
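<p>As a rough, self-contained sketch of this labeling rule
(not the project’s actual code),
the same pattern can be wrapped in a function that returns <code>nil</code> for files to skip.
The function name <code>label(for<wbr></wbr>Directory:file<wbr></wbr>Extension:)</code> is hypothetical,
and only the three languages shown above are handled:</p>
<div class="language-swift highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code>/// Returns a language label for a file, or nil if the file should be skipped.
/// Hypothetical helper for illustration; only C, C++, and Go are handled here.
func label(forDirectory directory: String, fileExtension: String) -&gt; String? {
    switch (directory, fileExtension) {
    case ("c", "h"), (_, "c"):
        return "C"
    case ("cc", "h"), (_, "cc"), (_, "cpp"):
        return "C++"
    case (_, "go"):
        return "Go"
    default:
        // Unknown combination (README, LICENSE, stray headers): skip it.
        return nil
    }
}

print(label(forDirectory: "c", fileExtension: "h") ?? "skipped")     // C
print(label(forDirectory: "cc", fileExtension: "h") ?? "skipped")    // C++
print(label(forDirectory: "swift", fileExtension: "h") ?? "skipped") // skipped
</code></pre>
</div>
</div>
<p>Matching on the directory and extension together is what lets
the stray Objective-C header in the Swift directory fall through to the default case
instead of being mislabeled.</p>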
<h2>
<a class="anchor" aria-hidden="true" id="training-the-model" href="#training-the-model">#</a>Training the Model</h2>
<p>To build our data table,
we recursively enumerate the contents of the corpus directory
and append the contents of each source file that we can identify:</p>
<div class="language-swift highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code><span class="k">var</span> <span class="nv">corpus</span><span class="p">:</span> <span class="p">[(</span><span class="nv">text</span><span class="p">:</span> <span class="kt">String</span><span class="p">,</span> <span class="nv">label</span><span class="p">:</span> <span class="kt">String</span><span class="p">)]</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">let</span> <span class="nv">enumerator</span> <span class="o">=</span> <span class="kt">File<wbr></wbr>Manager</span><span class="o">.</span><span class="k">default</span><span class="o">.</span><span class="nf">enumerator</span><span class="p">(</span>
<span class="nv">at</span><span class="p">:</span> <span class="n">corpus<wbr></wbr>URL</span><span class="p">,</span>
<span class="nv">including<wbr></wbr>Properties<wbr></wbr>For<wbr></wbr>Keys</span><span class="p">:</span> <span class="p">[</span><span class="o">.</span><span class="n">is<wbr></wbr>Directory<wbr></wbr>Key</span><span class="p">]</span>
<span class="p">)</span><span class="o">!</span>
<span class="k">for</span> <span class="k">case</span> let resource as URL in enumerator {
<span class="k">guard</span> <span class="o">!</span><span class="n">resource</span><span class="o">.</span><span class="n">has<wbr></wbr>Directory<wbr></wbr>Path</span><span class="p">,</span>
<span class="k">let</span> <span class="nv">language</span> <span class="o">=</span> <span class="kt">Programming<wbr></wbr>Language</span><span class="p">(</span><span class="nv">for</span><span class="p">:</span> <span class="n">resource</span><span class="p">,</span>
<span class="nv">at</span><span class="p">:</span> <span class="n">enumerator</span><span class="o">.</span><span class="n">level</span><span class="p">),</span>
<span class="k">let</span> <span class="nv">text</span> <span class="o">=</span> <span class="k">try</span><span class="p">?</span> <span class="kt">String</span><span class="p">(</span><span class="nv">contents<wbr></wbr>Of</span><span class="p">:</span> <span class="n">resource</span><span class="p">)</span>
<span class="k">else</span> <span class="p">{</span>
<span class="k">continue</span>
<span class="p">}</span>
<span class="n">corpus</span><span class="o">.</span><span class="nf">append</span><span class="p">((</span><span class="nv">text</span><span class="p">:</span> <span class="n">text</span><span class="p">,</span> <span class="nv">label</span><span class="p">:</span> <span class="n">language</span><span class="o">.</span><span class="n">raw<wbr></wbr>Value</span><span class="p">))</span>
<span class="p">}</span>
<span class="k">let</span> <span class="p">(</span><span class="nv">texts</span><span class="p">,</span> <span class="nv">labels</span><span class="p">):</span> <span class="p">([</span><span class="kt">String</span><span class="p">],</span> <span class="p">[</span><span class="kt">String</span><span class="p">])</span> <span class="o">=</span>
<span class="n">corpus</span><span class="o">.</span><span class="nf">reduce</span><span class="p">(</span><span class="nv">into</span><span class="p">:</span> <span class="p">([],</span> <span class="p">[]))</span> <span class="p">{</span> <span class="p">(</span><span class="n">columns</span><span class="p">,</span> <span class="n">row</span><span class="p">)</span> <span class="k">in</span>
<span class="n">columns</span><span class="o">.</span><span class="mi">0</span><span class="o">.</span><span class="nf">append</span><span class="p">(</span><span class="n">row</span><span class="o">.</span><span class="n">text</span><span class="p">)</span>
<span class="n">columns</span><span class="o">.</span><span class="mi">1</span><span class="o">.</span><span class="nf">append</span><span class="p">(</span><span class="n">row</span><span class="o">.</span><span class="n">label</span><span class="p">)</span>
<span class="p">}</span>
<span class="k">let</span> <span class="nv">data<wbr></wbr>Table</span> <span class="o">=</span>
<span class="k">try</span> <span class="kt">MLData<wbr></wbr>Table</span><span class="p">(</span><span class="nv">dictionary</span><span class="p">:</span> <span class="p">[</span><span class="s">"text"</span><span class="p">:</span> <span class="n">texts</span><span class="p">,</span> <span class="s">"label"</span><span class="p">:</span> <span class="n">labels</span><span class="p">])</span>
</code></pre>
</div>
</div>
<blockquote>
<p>Our original implementation
<a href="https://github.com/Flight-School/Programming-Language-Classifier/blob/7362a7368ed689d98fce7615b6f8ef7343aa9366/Trainer.playground/Contents.swift#L37-L38" rel="noopener noreferrer">appended <code>MLData<wbr></wbr>Table</code> objects</a>,
instead of initializing a single data table from an accumulated array.
We found this to have nonlinear performance characteristics,
which caused training to take closer to an hour instead of a few minutes.</p>
</blockquote>
<p>With our data table in hand,
we use the <code>random<wbr></wbr>Split(by:seed:)</code> method
to segment our training and testing data.
The former is used immediately,
passed into the <code>MLText<wbr></wbr>Classifier</code> initializer;
the latter will be used next to evaluate the model.</p>
<div class="language-swift highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code><span class="k">let</span> <span class="p">(</span><span class="nv">training<wbr></wbr>Data</span><span class="p">,</span> <span class="nv">testing<wbr></wbr>Data</span><span class="p">)</span> <span class="o">=</span>
<span class="n">data<wbr></wbr>Table</span><span class="o">.</span><span class="nf">random<wbr></wbr>Split</span><span class="p">(</span><span class="nv">by</span><span class="p">:</span> <span class="mf">0.8</span><span class="p">,</span> <span class="nv">seed</span><span class="p">:</span> <span class="mi">0</span><span class="p">)</span>
<span class="k">let</span> <span class="nv">classifier</span> <span class="o">=</span> <span class="k">try</span> <span class="kt">MLText<wbr></wbr>Classifier</span><span class="p">(</span><span class="nv">training<wbr></wbr>Data</span><span class="p">:</span> <span class="n">training<wbr></wbr>Data</span><span class="p">,</span>
<span class="nv">text<wbr></wbr>Column</span><span class="p">:</span> <span class="s">"text"</span><span class="p">,</span>
<span class="nv">label<wbr></wbr>Column</span><span class="p">:</span> <span class="s">"label"</span><span class="p">)</span>
</code></pre>
</div>
</div>
<p>Creating an <code>MLText<wbr></wbr>Classifier</code> object takes a while,
but you can track the progress by tailing STDOUT:</p>
<div class="highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code>Automatically generating validation set from 5% of the data.
Tokenizing data and extracting features
10% complete
20% complete
30% complete
40% complete
50% complete
60% complete
70% complete
80% complete
90% complete
100% complete
Starting Max<wbr></wbr>Ent training with 8584 samples
Iteration 1 training accuracy 0.285182
Iteration 2 training accuracy 0.946295
Iteration 3 training accuracy 0.988001
Iteration 4 training accuracy 0.997554
Iteration 5 training accuracy 0.998602
Iteration 6 training accuracy 0.999185
Iteration 7 training accuracy 0.999651
Iteration 8 training accuracy 0.999767
Finished Max<wbr></wbr>Ent training in 7.12 seconds
</code></pre>
</div>
</div>
<p>The resulting model seems large for what it can do,
weighing in at 3 MB.
However,
it’s able to classify a file in ~20 ms,
which should be fast enough for most use cases.</p>
<h2>
<a class="anchor" aria-hidden="true" id="evaluating-the-model" href="#evaluating-the-model">#</a>Evaluating the Model</h2>
<p>Let’s see how our classifier performs
by calling the <code>evaluation(on:)</code> method
and passing the <code>testing<wbr></wbr>Data</code> that we segmented before.</p>
<div class="language-swift highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code><span class="k">let</span> <span class="nv">evaluation</span> <span class="o">=</span> <span class="n">classifier</span><span class="o">.</span><span class="nf">evaluation</span><span class="p">(</span><span class="nv">on</span><span class="p">:</span> <span class="n">testing<wbr></wbr>Data</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="n">evaluation</span><span class="p">)</span>
</code></pre>
</div>
</div>
<h3>
<a class="anchor" aria-hidden="true" id="accuracy" href="#accuracy">#</a>Accuracy</h3>
<p>At the top of our evaluation,
we get a summary with
the number of examples,
the number of classes,
and the accuracy:</p>
<table>
<tbody>
<tr>
<td><strong>Number of Examples</strong></td>
<td>1138</td>
</tr>
<tr>
<td><strong>Number of Classes</strong></td>
<td>10</td>
</tr>
<tr>
<td><strong>Accuracy</strong></td>
<td>99.56%</td>
</tr>
</tbody>
</table>
<p>99.56% accuracy.
That’s good, right?
Let’s dig into the numbers to get a better understanding of how the model behaves.</p>
<hr>
<p>When you <code>print(_:)</code> an <code>MLClassifier<wbr></wbr>Metrics</code> object,
it shows a summary of the overall accuracy
as well as a confusion matrix and a precision / recall table.</p>
<h3>
<a class="anchor" aria-hidden="true" id="confusion-matrix" href="#confusion-matrix">#</a>Confusion Matrix</h3>
<p>A <em>confusion matrix</em> is a tool for visualizing the accuracy of predictions.
Each column shows the predicted classes,
and each row shows the actual class:</p>
<table>
<thead>
<tr>
<th></th>
<th>C</th>
<th>C++</th>
<th>Go</th>
<th>Java</th>
<th>JS</th>
<th>Obj-C</th>
<th>PHP</th>
<th>Ruby</th>
<th>Rust</th>
<th>Swift</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>C</strong></td>
<td>122</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td><strong>C++</strong></td>
<td>0</td>
<td>73</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>2</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td><strong>Go</strong></td>
<td>1</td>
<td>0</td>
<td>333</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td><strong>Java</strong></td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>137</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td><strong>JS</strong></td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>55</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td><strong>Obj-C</strong></td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>97</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td><strong>PHP</strong></td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>95</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td><strong>Ruby</strong></td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>136</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td><strong>Rust</strong></td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>73</td>
<td>0</td>
</tr>
<tr>
<td><strong>Swift</strong></td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>12</td>
</tr>
</tbody>
</table>
<p>A model with 100% accuracy would have nonzero values only along the diagonal,
where the predicted and actual classes match,
and zeroes everywhere else.
However, our accuracy isn’t perfect, so we have a few stray figures.
From the table,
we can see that Go was mistaken for C once
and C++ was incorrectly labeled as Objective-C twice.</p>
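<p>To make the row and column convention concrete,
here is a minimal sketch that tallies a confusion matrix by hand
and derives accuracy from its diagonal.
The <code>results</code> array is toy data invented for illustration,
and none of this uses the Create ML API:</p>
<div class="language-swift highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code>// Toy (actual, predicted) pairs, invented for illustration only.
let results: [(actual: String, predicted: String)] = [
    ("C", "C"), ("C", "C"),
    ("Go", "Go"), ("Go", "C"),
    ("C++", "C++"), ("C++", "Objective-C"),
]

// Tally a confusion matrix: the outer key is the actual class (row),
// the inner key is the predicted class (column).
var confusion: [String: [String: Int]] = [:]
for (actual, predicted) in results {
    confusion[actual, default: [:]][predicted, default: 0] += 1
}

// Correct predictions sit on the diagonal, where actual == predicted.
let correct = confusion.reduce(0) { sum, row in
    sum + (row.value[row.key] ?? 0)
}
let accuracy = Double(correct) / Double(results.count)

print(confusion["Go"]?["C"] ?? 0) // 1: one Go sample was mistaken for C
print(correct, accuracy)          // 4 of the 6 samples are on the diagonal
</code></pre>
</div>
</div>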
<h3>
<a class="anchor" aria-hidden="true" id="precision-and-recall" href="#precision-and-recall">#</a>Precision and Recall</h3>
<p>Another way of analyzing our results
is in terms of precision and recall.</p>
<table>
<thead>
<tr>
<th>Class</th>
<th>Precision (%)</th>
<th>Recall (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>C</strong></td>
<td>99.19</td>
<td>100.00</td>
</tr>
<tr>
<td><strong>C++</strong></td>
<td>100.00</td>
<td>97.33</td>
</tr>
<tr>
<td><strong>Go</strong></td>
<td>100.00</td>
<td>98.94</td>
</tr>
<tr>
<td><strong>Java</strong></td>
<td>100.00</td>
<td>100.00</td>
</tr>
<tr>
<td><strong>JavaScript</strong></td>
<td>100.00</td>
<td>100.00</td>
</tr>
<tr>
<td><strong>Objective-C</strong></td>
<td>98.91</td>
<td>100.00</td>
</tr>
<tr>
<td><strong>PHP</strong></td>
<td>100.00</td>
<td>100.00</td>
</tr>
<tr>
<td><strong>Ruby</strong></td>
<td>100.00</td>
<td>100.00</td>
</tr>
<tr>
<td><strong>Rust</strong></td>
<td>100.00</td>
<td>100.00</td>
</tr>
<tr>
<td><strong>Swift</strong></td>
<td>100.00</td>
<td>100.00</td>
</tr>
</tbody>
</table>
<p><em>Precision</em> measures the ability of the model
to identify only the relevant classification within a data set.
For example,
our model had perfect precision for C++
because it never misidentified any source files as being C++.
However, it had imperfect precision for C
because it incorrectly identified a Go file as being C.</p>
<!-- $$
\text{precision} = \frac{\text{true positives}}{\text{true positives} + \text{false positives}}
$$ -->
<p><em>Recall</em> measures the ability of a model
to identify all of the relevant classifications within a data set.
For example,
our model had perfect recall for C
because it correctly identified all of the C source code in the testing data,
and imperfect recall for C++
because it missed two C++ files in the testing data.</p>
<!-- $$
\text{recall} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}}
$$ -->
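<p>As a quick check of these definitions,
the sketch below recomputes precision and recall for C and C++
from the counts in the confusion matrix above.
The <code>Class<wbr></wbr>Counts</code> type is a hypothetical helper, not part of Create ML,
and it assumes each class has at least one prediction and one actual sample
(otherwise the divisions would be by zero):</p>
<div class="language-swift highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code>/// Counts for a single class, read off a confusion matrix.
/// Hypothetical helper for illustration.
struct ClassCounts {
    var truePositives: Int   // diagonal cell for the class
    var falsePositives: Int  // rest of the class's column
    var falseNegatives: Int  // rest of the class's row

    var precision: Double {
        return Double(truePositives) / Double(truePositives + falsePositives)
    }

    var recall: Double {
        return Double(truePositives) / Double(truePositives + falseNegatives)
    }
}

// C: 122 correct, plus 1 Go file wrongly predicted as C.
let c = ClassCounts(truePositives: 122, falsePositives: 1, falseNegatives: 0)
// C++: 73 correct, plus 2 C++ files wrongly predicted as Objective-C.
let cpp = ClassCounts(truePositives: 73, falsePositives: 0, falseNegatives: 2)

print(c.precision, c.recall)     // roughly 0.9919 and exactly 1.0
print(cpp.precision, cpp.recall) // exactly 1.0 and roughly 0.9733
</code></pre>
</div>
</div>
<p>These values line up with the 99.19% precision for C
and 97.33% recall for C++ reported in the table.</p>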
<h2>
<a class="anchor" aria-hidden="true" id="writing-the-model-to-disk" href="#writing-the-model-to-disk">#</a>Writing the Model to Disk</h2>
<p>So,
we have our classifier,
we’ve evaluated it
and found it to be satisfactory.
The only thing left to do is to write it to disk:</p>
<div class="language-swift highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code><span class="k">let</span> <span class="nv">model<wbr></wbr>Path</span> <span class="o">=</span> <span class="kt">URL</span><span class="p">(</span><span class="nv">file<wbr></wbr>URLWith<wbr></wbr>Path</span><span class="p">:</span> <span class="n">destination<wbr></wbr>Path</span><span class="p">)</span>
<span class="k">let</span> <span class="nv">metadata</span> <span class="o">=</span> <span class="kt">MLModel<wbr></wbr>Metadata</span><span class="p">(</span>
<span class="nv">author</span><span class="p">:</span> <span class="s">"Mattt"</span><span class="p">,</span>
<span class="nv">short<wbr></wbr>Description</span><span class="p">:</span> <span class="s">"A model trained to classify programming languages"</span><span class="p">,</span>
<span class="nv">version</span><span class="p">:</span> <span class="s">"1.0"</span>
<span class="p">)</span>
<span class="k">try</span> <span class="n">classifier</span><span class="o">.</span><span class="nf">write</span><span class="p">(</span><span class="nv">to</span><span class="p">:</span> <span class="n">model<wbr></wbr>Path</span><span class="p">,</span> <span class="nv">metadata</span><span class="p">:</span> <span class="n">metadata</span><span class="p">)</span>
</code></pre>
</div>
</div>
<p>All told,
training, evaluating, and writing the model took less than 5 minutes:</p>
<div class="language-terminal highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code><span class="gp">$</span> <span class="nb">time </span>swift ./Trainer.swift
<span class="go">281.84 real 275.51 user 5.60 sys
</span></code></pre>
</div>
</div>
<h2>
<a class="anchor" aria-hidden="true" id="testing-out-the-model-in-a-playground" href="#testing-out-the-model-in-a-playground">#</a>Testing Out the Model in a Playground</h2>
<p>In order to use our model from a Playground,
we need to compile it first.
For an iOS or Mac app,
Xcode would automatically generate a programmatic interface for us.
However, in a Playground,
we need to do this ourselves.</p>
<p>Call the <code>coremlc</code> tool using the <code>xcrun</code> command,
specifying the <code>compile</code> action on the <code>.mlmodel</code> file
and targeting the current directory for the output:</p>
<div class="language-terminal highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code><span class="gp">$</span> xcrun coremlc compile Programming<wbr></wbr>Language<wbr></wbr>Classifier.mlmodel <span class="nb">.</span>
</code></pre>
</div>
</div>
<p>Take the resulting <code>.mlmodelc</code> bundle
(it’ll look like a normal directory in Finder)
and move it into the Resources folder of your playground.
You can use this to initialize an <code>NLModel</code>
from the Natural Language framework:</p>
<div class="language-swift highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code><span class="k">let</span> <span class="nv">url</span> <span class="o">=</span> <span class="kt">Bundle</span><span class="o">.</span><span class="n">main</span><span class="o">.</span><span class="nf">url</span><span class="p">(</span>
<span class="nv">for<wbr></wbr>Resource</span><span class="p">:</span> <span class="s">"Programming<wbr></wbr>Language<wbr></wbr>Classifier"</span><span class="p">,</span>
<span class="nv">with<wbr></wbr>Extension</span><span class="p">:</span> <span class="s">"mlmodelc"</span>
<span class="p">)</span><span class="o">!</span>
<span class="k">let</span> <span class="nv">model</span> <span class="o">=</span> <span class="k">try!</span> <span class="kt">NLModel</span><span class="p">(</span><span class="nv">contents<wbr></wbr>Of</span><span class="p">:</span> <span class="n">url</span><span class="p">)</span>
</code></pre>
</div>
</div>
<p>Now you can call the <code>predicted<wbr></wbr>Label(for:)</code> method
to predict the programming language of a string containing code:</p>
<div class="language-swift highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code><span class="k">let</span> <span class="nv">code</span> <span class="o">=</span> <span class="s">"""
struct Plane: Codable {
var manufacturer: String
var model: String
var seats: Int
}
"""</span>
<span class="n">model</span><span class="o">.</span><span class="nf">predicted<wbr></wbr>Label</span><span class="p">(</span><span class="nv">for</span><span class="p">:</span> <span class="n">code</span><span class="p">)</span> <span class="c1">// Swift</span>
</code></pre>
</div>
</div>
<p>The <a href="https://github.com/Flight-School/Programming-Language-Classifier" rel="noopener noreferrer">sample code project</a>
for this post wraps this up with a fun drag-and-drop UI,
so you can easily test out your model
with whatever source files you have littering your Desktop.</p>
<p><img src="/assets/classifier-demo-screenshot-d1cb7d9462f6b2b36ae1445e786635540f3bdbae21303e76ea688f8647e0d7b5.png" alt="Screenshot of Classifier Example"></p>
<h2>
<a class="anchor" aria-hidden="true" id="conclusion" href="#conclusion">#</a>Conclusion</h2>
<p>There’s no way this actually works… right?
Source code isn’t like other kinds of text,
and weighing keywords and punctuation equally
with comments and variable names
is obviously a flawed approach.</p>
<p>The way classifiers work,
our model may well be fixating on irrelevant details
like license comments in the file header.
Heck, that 99% accuracy we saw
could be more a reflection of file similarity within the same project
than of the model’s actual predictive ability.</p>
<p><strong>All of that said,
it might just be good enough.</strong></p>
<p>Consider this:
in under an hour,
we went from nothing to a working solution
without any significant programming.
That’s pretty incredible.</p>
<p>Create ML is a powerful way to prototype new features quickly.
If a minimum-viable product is good enough, then your job is done.
Or if you need to go even further,
there are all kinds of optimizations to be had
in terms of model size, accuracy, and precision
by using something like TensorFlow or Turi Create.</p>
Mattt
Learn how to use Create ML to train a Natural Language model that predicts the programming language of code.
Flight School for Chinese Language Speakers
2018-05-17T00:00:00+00:00
https://flight.school/articles/flight-school-chinese-language
<p><img width="250" class="float-right" alt="Cover of the Chinese language edition: 使用 Swift Codable 进行高效的数据编解码 (Efficient Data Encoding and Decoding with Swift Codable)" src="/assets/flight-school-guide-to-codable-chinese-cover-a1bac4e4e20e1a2dc3786d505b2b923c1ce616004d4c035a39cf789cbd9f5327.jpg" integrity="sha256-obrE5OIOGi3DeG1QWyuSPBzmFgBNTANaOc94nL2fUyc=" crossorigin="anonymous"></p>
<p>Flight School is partnering with <a href="https://juejin.im" rel="noopener noreferrer">掘金 (<em>Juéjīn</em>)</a>,
one of the leading developer communities in China,
to publish our books for Chinese language speakers.</p>
<p>The Chinese language edition of <em>Flight School Guide to Swift Codable</em>
<a href="https://juejin.im/book/5ad19f07518825364001dd49" rel="noopener noreferrer">is now available</a>.
The book was translated by Croath Liu (<a href="https://github.com/croath" rel="noopener noreferrer">@croath</a>),
Flight School’s content lead for China.</p>
<p>Our mission is to provide thoughtful learning resources
for developers around the world,
and we can’t think of a better place to start than China.</p>
<p>As Tim Cook noted in
<a href="https://www.youtube.com/watch?v=_ng8xQ-SNGc#t=180" rel="noopener noreferrer">a recent interview</a>:</p>
<blockquote>
<p>China has extraordinary skills.
And the part that’s the most unknown is
<strong>there’s almost 2 million application developers in China
that write apps for the iOS App Store</strong>.
These are some of the most innovative mobile apps in the world,
and the entrepreneurs that run them
are some of the most inspiring and entrepreneurial in the world.</p>
<p><cite>Tim Cook</cite></p>
</blockquote>
<p>We couldn’t be more excited to connect with
the millions of app developers in China.</p>
<p>To kick things off,
Mattt and Croath will be attending the
<a href="https://gmtc.geekbang.org/" rel="noopener noreferrer">GMTC 2018</a> conference in Beijing
on June 21st and 22nd.
We invite you to reach out
via Twitter (<a href="https://twitter.com/flightdotschool" rel="noopener noreferrer">@flightdotschool</a>)
or <a href="mailto:info@flight.school">email</a>
if you’d like to meet up!</p>
<p>此致<br>
敬礼 (with best regards)</p>
Mattt
Flight School is partnering with Juejin to publish our books for Chinese language speakers.
Running Xcode Playgrounds on Travis CI
2018-05-16T00:00:00+00:00
https://flight.school/articles/running-xcode-playgrounds-on-travis-ci
<p>Xcode Playgrounds are a great way to share sample code.
They allow you to communicate ideas effectively
without getting bogged down in implementation details.
The question is:
how do you ensure that things continue to work
with each new version of Swift and platform SDKs?</p>
<p>This is something we’ve been thinking about
since the release of our <a href="/books/codable/" rel="noopener noreferrer">Guide to Swift Codable</a>.
We wanted to release the sample code Playgrounds
as open source on GitHub,
but not without some kind of testing strategy in place first
(with over a dozen Playgrounds in total,
doing this manually was out of the question).</p>
<p>As a baseline,
our goal was to create a continuous integration setup
that tested whether our code compiled and ran without any issues.
(From there,
we can progressively add more comprehensive tests
for expected output and behavior.)</p>
<hr>
<p>Each chapter directory contains one or more <code>.playground</code> bundles,
each of which contains a <code>Contents.swift</code> file
(this is what you first see when you open a Playground with Xcode)
as well as any auxiliary sources.</p>
<div class="language-terminal highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code><span class="gp">$</span> tree <span class="s2">"Chapter 2"</span>
<span class="go">Chapter\ 2
└── Flight\ Plan.playground
    ├── Contents.swift
    ├── Sources
    │   ├── Aircraft.swift
    │   ├── Flight<wbr></wbr>Plan.swift
    │   └── Flight<wbr></wbr>Rules.swift
    └── contents.xcplayground
</span></code></pre>
</div>
</div>
<h2>
<a class="anchor" aria-hidden="true" id="compiling-a-playground-from-scratch" href="#compiling-a-playground-from-scratch">#</a>Compiling a Playground from Scratch</h2>
<p>Without an Xcode project or Swift package manifest,
we can’t directly hook into familiar solutions
for testing apps or libraries.
However, we can reasonably approximate how Xcode builds playgrounds
by invoking <code>swiftc</code> directly
(well, almost directly:
we’ll call it via <code>xcrun</code>).</p>
<p>First,
we <code>cd</code> into the <code>.playground</code> bundle.</p>
<div class="language-terminal highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code><span class="gp">$</span> <span class="nb">cd</span> <span class="s2">"Chapter 1/Plane.playground"</span>
</code></pre>
</div>
</div>
<p>Next, we use <code>swiftc</code> to build an <code>Auxiliary<wbr></wbr>Sources</code> module
from the Swift files in the <code>Sources/</code> directory.</p>
<div class="language-terminal highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code><span class="gp">$</span> xcrun swiftc <span class="nt">-emit-library</span> <span class="se">\</span>
<span class="nt">-emit-module</span> <span class="nt">-module-name</span> Auxiliary<wbr></wbr>Sources <span class="se">\</span>
Sources/<span class="k">*</span>.swift
</code></pre>
</div>
</div>
<p>Some playgrounds depend on Playground-specific functionality in order to run.
We can use the <code>swiftc</code> commandâs <code>-emit-imported-modules</code> option
to detect whether <code>Playground<wbr></wbr>Support</code> is imported,
and only attempt to build and run the Playground if it isn’t.</p>
<div class="language-terminal highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code><span class="gp">$</span> <span class="k">if</span> <span class="o">!</span> xcrun swiftc <span class="nt">-emit-imported-modules</span> Contents.swift | <span class="se">\</span>
<span class="nb">grep</span> <span class="nt">-q</span> <span class="s2">"Playground<wbr></wbr>Support"</span><span class="p">;</span> <span class="se">\</span>
<span class="k">then</span> <span class="se">\</span>
... <span class="se">\</span>
<span class="k">fi</span><span class="p">;</span>
</code></pre>
</div>
</div>
<p>When running a Playground in Xcode,
the shared sources module is imported automatically.
We can add a missing <code>import</code> statement
by concatenating it with <code>Contents.swift</code>
and then writing the output to a new <code>main.swift</code> file.</p>
<div class="language-terminal highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code><span class="gp">$</span> <span class="nb">cat</span> <(<span class="nb">echo</span> <span class="s2">"import Auxiliary<wbr></wbr>Sources"</span>) <span class="se">\</span>
Contents.swift > main.swift <span class="o">&&</span> <span class="se">\</span>
xcrun <span class="nt">-sdk</span> <span class="s2">"</span><span class="k">${</span><span class="nv">SDK</span><span class="k">}</span><span class="s2">"</span> <span class="se">\</span>
swiftc <span class="nt">-target</span> <span class="s2">"</span><span class="k">${</span><span class="nv">TARGET</span><span class="k">}</span><span class="s2">"</span> <span class="nt">-emit-executable</span> <span class="se">\</span>
<span class="nt">-I</span> <span class="s2">"."</span> <span class="nt">-L</span> <span class="s2">"."</span> <span class="nt">-l<wbr></wbr>Auxiliary<wbr></wbr>Sources</span> <span class="se">\</span>
<span class="nt">-module-link-name</span> Auxiliary<wbr></wbr>Sources <span class="se">\</span>
<span class="nt">-o</span> Playground main.swift
</code></pre>
</div>
</div>
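<p>For readers who think more easily in Swift than in shell,
here is a minimal sketch of what the <code>cat</code> step produces.
The <code>make<wbr></wbr>Main<wbr></wbr>Source</code> helper is hypothetical
and is not part of the build script;
it only illustrates that <code>main.swift</code> is
the original <code>Contents.swift</code> with one extra line on top.</p>
<div class="language-swift highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code>import Foundation

// Hypothetical helper (not part of the original build script):
// returns the text of main.swift for a given Contents.swift.
func makeMainSource(contents: String,
                    moduleName: String = "AuxiliarySources") -> String {
    // `echo` ends the import line with a newline
    // before `cat` appends the playground contents.
    return "import \(moduleName)\n" + contents
}

// Made-up playground contents, for illustration only.
let contents = "let plane = Plane()\nprint(plane)\n"
print(makeMainSource(contents: contents))
</code></pre>
</div>
</div>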
<h2>
<a class="anchor" aria-hidden="true" id="building-across-all-playgrounds" href="#building-across-all-playgrounds">#</a>Building Across All Playgrounds</h2>
<p>We can get a list of all the <code>.playground</code> bundles
by running the following command from the root directory:</p>
<div class="language-terminal highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code><span class="gp">$</span> find <span class="nb">.</span> <span class="nt">-name</span> Chapter <span class="nt">-prune</span> <span class="nt">-o</span> <span class="nt">-name</span> <span class="s1">'*.playground'</span> <span class="nt">-print</span> | <span class="nb">sort</span>
<span class="go">./Chapter\ 1/Plane.playground
./Chapter\ 2/Flight Plan.playground
./Chapter\ 3/Any<wbr></wbr>Decodable.playground
./Chapter\ 3/Coordinates.playground
./Chapter\ 3/Economy<wbr></wbr>Seat.playground
./Chapter\ 3/Either<wbr></wbr>Bird<wbr></wbr>Or<wbr></wbr>Plane.playground
./Chapter\ 3/Fuel<wbr></wbr>Price.playground
./Chapter\ 3/Pixel.playground
./Chapter\ 3/Route.playground
./Chapter\ 4/Music Store.playground
./Chapter\ 5/In Flight Service.playground
./Chapter\ 6/Luggage Scanner.playground
./Chapter\ 7/Message<wbr></wbr>Pack<wbr></wbr>Encoder.playground
</span></code></pre>
</div>
</div>
<p>This list can be fed into the continuous integration system
as environment variables,
which are set on individual build jobs:</p>
<div class="language-terminal highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code><span class="gp">$</span> <span class="nb">export </span><span class="nv">PLAYGROUND_DIR</span><span class="o">=</span><span class="s2">"Chapter 1/Plane.playground"</span>
<span class="gp">$</span> <span class="nb">cd</span> <span class="s2">"</span><span class="k">${</span><span class="nv">PLAYGROUND_DIR</span><span class="k">}</span><span class="s2">"</span> <span class="o">&&</span> <span class="c"># compile playground</span>
</code></pre>
</div>
</div>
<h2>
<a class="anchor" aria-hidden="true" id="putting-it-all-together" href="#putting-it-all-together">#</a>Putting it All Together</h2>
<p>Now that we can compile Playground files locally
and know how to repeat the process across our sample code,
it’s time to write our <code>.travis.yml</code> file
and kick off a test build:</p>
<h3>
<a class="anchor" aria-hidden="true" id="travisyml" href="#travisyml">#</a>.travis.yml</h3>
<div class="language-yaml highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code><span class="na">language</span><span class="pi">:</span> <span class="s">swift</span>
<span class="na">osx_image</span><span class="pi">:</span> <span class="s">xcode9.3</span>
<span class="na">env</span><span class="pi">:</span>
  <span class="na">global</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="s">SDK=iphoneos</span>
    <span class="pi">-</span> <span class="s">TARGET=armv7-apple-ios10</span>
  <span class="na">matrix</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="s">PLAYGROUND_DIR="Chapter 1/Plane.playground"</span>
    <span class="pi">-</span> <span class="s">PLAYGROUND_DIR="Chapter 2/Flight Plan.playground"</span>
    <span class="pi">-</span> <span class="s">PLAYGROUND_DIR="Chapter 3/Any<wbr></wbr>Decodable.playground"</span>
    <span class="pi">-</span> <span class="s">PLAYGROUND_DIR="Chapter 3/Coordinates.playground"</span>
    <span class="pi">-</span> <span class="s">PLAYGROUND_DIR="Chapter 3/Economy<wbr></wbr>Seat.playground"</span>
    <span class="pi">-</span> <span class="s">PLAYGROUND_DIR="Chapter 3/Either<wbr></wbr>Bird<wbr></wbr>Or<wbr></wbr>Plane.playground"</span>
    <span class="pi">-</span> <span class="s">PLAYGROUND_DIR="Chapter 3/Fuel<wbr></wbr>Price.playground"</span>
    <span class="pi">-</span> <span class="s">PLAYGROUND_DIR="Chapter 3/Pixel.playground"</span>
    <span class="pi">-</span> <span class="s">PLAYGROUND_DIR="Chapter 3/Route.playground"</span>
    <span class="pi">-</span> <span class="s">PLAYGROUND_DIR="Chapter 4/Music Store.playground"</span>
    <span class="pi">-</span> <span class="s">PLAYGROUND_DIR="Chapter 5/In Flight Service.playground"</span>
    <span class="pi">-</span> <span class="s">PLAYGROUND_DIR="Chapter 6/Luggage Scanner.playground"</span>
    <span class="pi">-</span> <span class="s">PLAYGROUND_DIR="Chapter 7/Message<wbr></wbr>Pack<wbr></wbr>Encoder.playground"</span>
<span class="na">script</span><span class="pi">:</span> <span class="s">xcrun swift --version &&</span>
        <span class="s">cd "${PLAYGROUND_DIR}" &&</span>
        <span class="s">xcrun -sdk "${SDK}"</span>
          <span class="s">swiftc -target "${TARGET}"</span>
          <span class="s">-emit-library -emit-module -module-name Auxiliary<wbr></wbr>Sources</span>
          <span class="s">Sources/*.swift &&</span>
        <span class="s">if ! xcrun swiftc -emit-imported-modules Contents.swift |</span>
             <span class="s">grep -q "Playground<wbr></wbr>Support";</span>
        <span class="s">then</span>
          <span class="s">cat <(echo "import Auxiliary<wbr></wbr>Sources") Contents.swift > main.swift &&</span>
          <span class="s">xcrun -sdk "${SDK}"</span>
            <span class="s">swiftc -target "${TARGET}"</span>
            <span class="s">-I "." -L "." -l<wbr></wbr>Auxiliary<wbr></wbr>Sources -module-link-name Auxiliary<wbr></wbr>Sources</span>
            <span class="s">-o Playground main.swift;</span>
        <span class="s">fi</span>
</code></pre>
</div>
</div>
<p>Global environment variables are used
to declare constants for the SDK and target.
Together, they allow us to quickly tell (and later change)
what platform our builds are targeting.</p>
<p>Each Playground directory is specified as a separate environment variable,
forming a <a href="https://docs.travis-ci.com/user/customizing-the-build#Build-Matrix" rel="noopener noreferrer">Build Matrix</a>
with separate jobs for each one.</p>
<p><img src="/assets/travis-ci-playground-jobs-957a08d0dd11fcf2d89cbbc3fd8f8562e4408a60de13b5df54ccdb124dd87cdd.png" integrity="sha256-lXoI0N0R/PLYnLvD/Y+FYuRAimDeE7XfVMzbEk3YfN0=" crossorigin="anonymous"></p>
<p>Newlines and indentation help break up the long script command
into more meaningful chunks.
Everything works out in the end because
<a href="http://yaml.org/spec/1.2/spec.html#id2788859" rel="noopener noreferrer">YAML Plain Style</a>
helpfully folds each line break and its surrounding indentation into a single space when parsed.</p>
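<p>To make that concrete,
the following Swift sketch approximates what a YAML parser does
to a multi-line plain scalar.
It is a simplification rather than a YAML implementation:
the <code>fold<wbr></wbr>Plain<wbr></wbr>Scalar</code> function is a hypothetical name,
and it assumes the scalar contains no blank lines
(which YAML would preserve as line breaks).</p>
<div class="language-swift highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code>import Foundation

// Approximates YAML line folding for a plain multi-line scalar:
// each line is trimmed, and lines are joined by a single space.
// Simplification: blank lines are dropped instead of kept as newlines.
func foldPlainScalar(_ text: String) -> String {
    return text
        .components(separatedBy: "\n")
        .map { $0.trimmingCharacters(in: .whitespaces) }
        .filter { !$0.isEmpty }
        .joined(separator: " ")
}

// An excerpt of the script value, indentation included.
let script = """
xcrun swift --version &&
        cd "${PLAYGROUND_DIR}" &&
        xcrun -sdk "${SDK}"
          swiftc -target "${TARGET}"
"""

// Prints the excerpt as the single-line command the shell receives.
print(foldPlainScalar(script))
</code></pre>
</div>
</div>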
<hr>
<p>With CI all set up and ready to go,
we’re thrilled to share <a href="https://github.com/Flight-School/Guide-to-Codable-Sample-Code" rel="noopener noreferrer">the sample code for our first book</a>.</p>
<p>Feel free to Clone, Star, and Fork to your heart’s content!
And if you have any ideas for how to improve our setup even further,
please <a href="https://github.com/Flight-School/Guide-to-Codable-Sample-Code/issues/new" rel="noopener noreferrer">open an issue</a>
or <a href="https://twitter.com/flightdotschool" rel="noopener noreferrer">reach out via Twitter</a>.</p>
Mattt
How do you ensure that Playgrounds continue to work with each new version of Swift and platform SDKs?
DIY Codable Encoder / Decoder Kit
2018-05-07T00:00:00+00:00
https://flight.school/articles/codable-diy-kit
<p>In Swift 4,
a type that conforms to the <code>Codable</code> protocol
can be encoded to or decoded from representations
for any format that implements a corresponding <code>Encoder</code> or <code>Decoder</code> type.</p>
<p>At the time of its release,
the only reference implementations for these types
were the Foundation framework’s <code>JSONEncoder</code> / <code>JSONDecoder</code>
and <code>Property<wbr></wbr>List<wbr></wbr>Encoder</code> / <code>Property<wbr></wbr>List<wbr></wbr>Decoder</code>.
The <a href="https://github.com/apple/swift/blob/master/stdlib/public/SDK/Foundation/JSONEncoder.swift" rel="noopener noreferrer">implementation details</a>
of these types, however,
are obfuscated by translation logic from
<code>JSONSerialization</code> and <code>Property<wbr></wbr>List<wbr></wbr>Serialization</code>.</p>
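<p>As a quick illustration of that format independence,
the sketch below round-trips one value through both pairs of Foundation coders.
The <code>Plane</code> model and its values are assumptions made for this example;
only the encoder and decoder types come from Foundation.</p>
<div class="language-swift highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code>import Foundation

// Hypothetical model, used only for illustration.
struct Plane: Codable, Equatable {
    let manufacturer: String
    let model: String
    let seats: Int
}

let plane = Plane(manufacturer: "Cessna", model: "172 Skyhawk", seats: 4)

// Same value, two formats: only the encoder / decoder pair changes.
let json = try! JSONEncoder().encode(plane)
let fromJSON = try! JSONDecoder().decode(Plane.self, from: json)

let plist = try! PropertyListEncoder().encode(plane)
let fromPlist = try! PropertyListDecoder().decode(Plane.self, from: plist)

// Both round trips should reproduce the original value.
print(fromJSON == plane && fromPlist == plane)
</code></pre>
</div>
</div>
<p>A custom format would slot in the same way:
the model stays untouched,
and only the type doing the encoding or decoding is swapped.</p>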
<p>The <a href="https://github.com/Flight-School/Codable-DIY-Kit" rel="noopener noreferrer">DIY Codable Encoder / Decoder Kit</a>
repository on GitHub
makes it easier for developers
to create encoders and decoders for custom formats.
The template includes stubbed placeholders for the required types and methods
as well as simple tests for encoding and decoding <code>Codable</code> types.</p>
<p>Just do a find-and-replace for the format name
and rename a few files,
and you can get right into the nitty-gritty of your format’s
specific implementation details.</p>
<h3>
<a class="anchor" aria-hidden="true" id="encoder-structure" href="#encoder-structure">#</a>Encoder Structure</h3>
<div class="language-swift highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code><span class="kd">public</span> <span class="kd">class</span> <var class="placeholder">Format</var><span class="kt">Encoder</span> <span class="p">{</span>
    <span class="kd">public</span> <span class="kd">func</span> <span class="n">encode</span><span class="o"><</span><span class="kt">T</span><span class="o">></span><span class="p">(</span><span class="n">_</span> <span class="nv">value</span><span class="p">:</span> <span class="kt">T</span><span class="p">)</span> <span class="k">throws</span> <span class="o">-></span> <span class="kt">Data</span>
        <span class="k">where</span> <span class="kt">T</span> <span class="p">:</span> <span class="kt">Encodable</span>
<span class="p">}</span>

<span class="kd">class</span> <span class="n">_</span><var class="placeholder">Format</var><span class="kt">Encoder</span><span class="p">:</span> <span class="kt">Encoder</span> <span class="p">{</span>
    <span class="kd">class</span> <span class="kt">Single<wbr></wbr>Value<wbr></wbr>Container</span><span class="p">:</span> <span class="kt">Single<wbr></wbr>Value<wbr></wbr>Encoding<wbr></wbr>Container</span>
    <span class="kd">class</span> <span class="kt">Unkeyed<wbr></wbr>Container</span><span class="p">:</span> <span class="kt">Unkeyed<wbr></wbr>Encoding<wbr></wbr>Container</span>
    <span class="kd">class</span> <span class="kt">Keyed<wbr></wbr>Container</span><span class="o"><</span><span class="kt">Key</span><span class="o">></span><span class="p">:</span> <span class="kt">Keyed<wbr></wbr>Encoding<wbr></wbr>Container<wbr></wbr>Protocol</span>
        <span class="k">where</span> <span class="kt">Key</span><span class="p">:</span> <span class="kt">Coding<wbr></wbr>Key</span>
<span class="p">}</span>

<span class="kd">protocol</span> <var class="placeholder">Format</var><span class="kt">Encoding<wbr></wbr>Container</span><span class="p">:</span> <span class="kd">class</span> <span class="p">{}</span>
</code></pre>
</div>
</div>
<h3>
<a class="anchor" aria-hidden="true" id="decoder-structure" href="#decoder-structure">#</a>Decoder Structure</h3>
<div class="language-swift highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code><span class="kd">public</span> <span class="kd">class</span> <var class="placeholder">Format</var><span class="kt">Decoder</span> <span class="p">{</span>
    <span class="kd">public</span> <span class="kd">func</span> <span class="n">decode</span><span class="o"><</span><span class="kt">T</span><span class="o">></span><span class="p">(</span><span class="n">_</span> <span class="nv">type</span><span class="p">:</span> <span class="kt">T</span><span class="o">.</span><span class="k">Type</span><span class="p">,</span>
                          <span class="n">from</span> <span class="nv">data</span><span class="p">:</span> <span class="kt">Data</span><span class="p">)</span> <span class="k">throws</span> <span class="o">-></span> <span class="kt">T</span>
        <span class="k">where</span> <span class="kt">T</span> <span class="p">:</span> <span class="kt">Decodable</span>
<span class="p">}</span>

<span class="kd">final</span> <span class="kd">class</span> <span class="n">_</span><var class="placeholder">Format</var><span class="kt">Decoder</span><span class="p">:</span> <span class="kt">Decoder</span> <span class="p">{</span>
    <span class="kd">class</span> <span class="kt">Single<wbr></wbr>Value<wbr></wbr>Container</span><span class="p">:</span> <span class="kt">Single<wbr></wbr>Value<wbr></wbr>Decoding<wbr></wbr>Container</span>
    <span class="kd">class</span> <span class="kt">Unkeyed<wbr></wbr>Container</span><span class="p">:</span> <span class="kt">Unkeyed<wbr></wbr>Decoding<wbr></wbr>Container</span>
    <span class="kd">class</span> <span class="kt">Keyed<wbr></wbr>Container</span><span class="o"><</span><span class="kt">Key</span><span class="o">></span><span class="p">:</span> <span class="kt">Keyed<wbr></wbr>Decoding<wbr></wbr>Container<wbr></wbr>Protocol</span>
        <span class="k">where</span> <span class="kt">Key</span><span class="p">:</span> <span class="kt">Coding<wbr></wbr>Key</span>
<span class="p">}</span>

<span class="kd">protocol</span> <var class="placeholder">Format</var><span class="kt">Decoding<wbr></wbr>Container</span><span class="p">:</span> <span class="kd">class</span> <span class="p">{}</span>
</code></pre>
</div>
</div>
<hr>
<p>For an example of this template in action,
see this <a href="https://github.com/flight-school/messagepack" rel="noopener noreferrer"><code>Codable</code>-compatible encoder and decoder for the MessagePack format</a>.</p>
<p>We’d love to see what you make with this!
Please get in touch <a href="https://twitter.com/flightdotschool" rel="noopener noreferrer">via Twitter</a>
to share your custom encoder or decoder.</p>
Mattt
In Swift 4, a type that conforms to the Codable protocol can be encoded to or decoded from representations for any format that implements a corresponding Encoder or Decoder type.
Benchmarking Codable
2018-05-01T00:00:00+00:00
https://flight.school/articles/benchmarking-codable
<p>Swift Codable can automatically synthesize initializers that decode models from JSON.
But how does this generated code compare to what it replaces?</p>
<p>To find out, let’s benchmark the performance of
<a href="https://developer.apple.com/documentation/foundation/jsondecoder" rel="noopener noreferrer"><code>JSONDecoder</code></a>
against equivalent hand-written code that uses
<a href="https://developer.apple.com/documentation/foundation/jsonserialization" rel="noopener noreferrer"><code>JSONSerialization</code></a>
instead.</p>
<h2>
<a class="anchor" aria-hidden="true" id="defining-a-sample-model" href="#defining-a-sample-model">#</a>Defining a Sample Model</h2>
<p>In order to establish a baseline,
we’ll create a model that reasonably approximates
something that you’d expect to find in a typical app.</p>
<p>For this example,
we’ll use the following <code>Airport</code> model:</p>
<div class="language-swift highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code><span class="kd">struct</span> <span class="kt">Airport</span><span class="p">:</span> <span class="kt">Codable</span> <span class="p">{</span>
    <span class="k">let</span> <span class="nv">name</span><span class="p">:</span> <span class="kt">String</span>
    <span class="k">let</span> <span class="nv">iata</span><span class="p">:</span> <span class="kt">String</span>
    <span class="k">let</span> <span class="nv">icao</span><span class="p">:</span> <span class="kt">String</span>
    <span class="k">let</span> <span class="nv">coordinates</span><span class="p">:</span> <span class="p">[</span><span class="kt">Double</span><span class="p">]</span>

    <span class="kd">struct</span> <span class="kt">Runway</span><span class="p">:</span> <span class="kt">Codable</span> <span class="p">{</span>
        <span class="kd">enum</span> <span class="kt">Surface</span><span class="p">:</span> <span class="kt">String</span><span class="p">,</span> <span class="kt">Codable</span> <span class="p">{</span>
            <span class="k">case</span> rigid, flexible, gravel, sealed, unpaved, other
        <span class="p">}</span>

        <span class="k">let</span> <span class="nv">direction</span><span class="p">:</span> <span class="kt">String</span>
        <span class="k">let</span> <span class="nv">distance</span><span class="p">:</span> <span class="kt">Int</span>
        <span class="k">let</span> <span class="nv">surface</span><span class="p">:</span> <span class="kt">Surface</span>
    <span class="p">}</span>

    <span class="k">let</span> <span class="nv">runways</span><span class="p">:</span> <span class="p">[</span><span class="kt">Runway</span><span class="p">]</span>
<span class="p">}</span>
</code></pre>
</div>
</div>
<p>The <code>Airport</code> structure has <code>String</code> properties
for its name,
along with
three-letter <a href="https://en.wikipedia.org/wiki/IATA_airport_code" rel="noopener noreferrer">IATA</a>
and
four-letter <a href="https://en.wikipedia.org/wiki/ICAO_airport_code" rel="noopener noreferrer">ICAO</a>
airport codes.
The <code>Runway</code> type specifies a direction and distance,
as well as a surface defined by a nested <code>Surface</code> enumeration.
A set of coordinates is stored in a <code>[Double]</code> array,
though we could define a custom type for that instead.</p>
<p>Because <code>Airport</code> declares <code>Codable</code> conformance
and all of its stored properties have types that conform to <code>Codable</code>,
Swift automatically synthesizes the implementation for the
<a href="https://developer.apple.com/documentation/swift/decodable/2894081-init" rel="noopener noreferrer"><code>init(from:)</code></a>
initializer required by the
<a href="https://developer.apple.com/documentation/swift/decodable" rel="noopener noreferrer"><code>Decodable</code></a>
protocol
(the same goes for the
<a href="https://developer.apple.com/documentation/swift/encodable" rel="noopener noreferrer"><code>Encodable</code></a>
protocol and its
<a href="https://developer.apple.com/documentation/swift/encodable/2893603-encode" rel="noopener noreferrer"><code>encode(to:)</code></a>
method).</p>
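<p>To give a sense of what that synthesis saves us,
here is a hand-written approximation for the <code>Runway</code> type alone.
This is a sketch of roughly equivalent code,
not the compiler’s actual output:
it spells out the <code>Coding<wbr></wbr>Keys</code> enumeration and <code>init(from:)</code>
that we would otherwise get for free,
and declares <code>Runway</code> at the top level to keep the example self-contained.</p>
<div class="language-swift highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code>import Foundation

struct Runway: Decodable {
    enum Surface: String, Decodable {
        case rigid, flexible, gravel, sealed, unpaved, other
    }

    let direction: String
    let distance: Int
    let surface: Surface

    // Synthesized keys match the property names one-to-one.
    private enum CodingKeys: String, CodingKey {
        case direction, distance, surface
    }

    // Hand-written stand-in for the initializer the compiler would generate.
    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        self.direction = try container.decode(String.self, forKey: .direction)
        self.distance = try container.decode(Int.self, forKey: .distance)
        self.surface = try container.decode(Surface.self, forKey: .surface)
    }
}

// A single runway record in the same shape as the sample data.
let json = """
{"distance": 1829, "direction": "3/21", "surface": "flexible"}
""".data(using: .utf8)!

let runway = try! JSONDecoder().decode(Runway.self, from: json)
print(runway.direction, runway.distance, runway.surface)
</code></pre>
</div>
</div>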
<p><code>Airport</code> may not check all the boxes in terms of <code>Codable</code> functionality,
but it has sufficient complexity to extrapolate from our findings.
And whatever deficiencies this model has,
it more than makes up for them by having real-world data to test with.
As the saying goes, “Quantity has a quality all its own”.</p>
<p>By scraping Wikipedia’s
<a href="https://en.wikipedia.org/wiki/List_of_airports" rel="noopener noreferrer">“List of Airports” article</a>,
we were able to create a list of 7361 airports,
from which we generated JSON files with
<a href="https://github.com/Flight-School/CodablePerformance/blob/master/Tests/airports1.json" rel="noopener noreferrer">1</a>,
<a href="https://github.com/Flight-School/CodablePerformance/blob/master/Tests/airports10.json" rel="noopener noreferrer">10</a>,
<a href="https://github.com/Flight-School/CodablePerformance/blob/master/Tests/airports100.json" rel="noopener noreferrer">100</a>,
<a href="https://github.com/Flight-School/CodablePerformance/blob/master/Tests/airports1000.json" rel="noopener noreferrer">1000</a>, and
<a href="https://github.com/Flight-School/CodablePerformance/blob/master/Tests/airports10000.json" rel="noopener noreferrer">10000</a> objects
(some records were duplicated to fill in the gaps for that last one).</p>
<div class="language-json highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="s2">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Portland International Airport"</span><span class="p">,</span><span class="w">
  </span><span class="s2">"iata"</span><span class="p">:</span><span class="w"> </span><span class="s2">"PDX"</span><span class="p">,</span><span class="w">
  </span><span class="s2">"icao"</span><span class="p">:</span><span class="w"> </span><span class="s2">"KPDX"</span><span class="p">,</span><span class="w">
  </span><span class="s2">"coordinates"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="mf">-122.5975</span><span class="p">,</span><span class="w"> </span><span class="mf">45.5886111111111</span><span class="p">],</span><span class="w">
  </span><span class="s2">"runways"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="s2">"distance"</span><span class="p">:</span><span class="w"> </span><span class="mi">1829</span><span class="p">,</span><span class="w">
      </span><span class="s2">"direction"</span><span class="p">:</span><span class="w"> </span><span class="s2">"3/21"</span><span class="p">,</span><span class="w">
      </span><span class="s2">"surface"</span><span class="p">:</span><span class="w"> </span><span class="s2">"flexible"</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="err">//</span><span class="w"> </span><span class="err">...</span><span class="w">
  </span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre>
</div>
</div>
<p>Taking a look at the relative sizes of these data sets:</p>
<table>
<thead>
<tr>
<th>Count</th>
<th>Size</th>
<th>gzip Compressed Size</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>271 Bytes</td>
<td>193 Bytes</td>
</tr>
<tr>
<td>10</td>
<td>2.8 KB</td>
<td>703 Bytes</td>
</tr>
<tr>
<td>100</td>
<td>33.0 KB</td>
<td>4.7 KB</td>
</tr>
<tr>
<td>1000</td>
<td>328.0 KB</td>
<td>44.3 KB</td>
</tr>
<tr>
<td>10000</td>
<td>3.2 MB</td>
<td>477.4 KB</td>
</tr>
</tbody>
</table>
<p>Most apps don’t process more than tens of thousands of records at once,
so our benchmark should be fairly representative as far as sample sizes go.</p>
<h2>
<a class="anchor" aria-hidden="true" id="manually-implementing-a-json-initializer" href="#manually-implementing-a-json-initializer">#</a>Manually Implementing a JSON Initializer</h2>
<p>The conventional way to decode models from JSON without Codable
is to implement an initializer that takes a <code>[String: Any]</code> dictionary.
A hand-rolled implementation of this approach for <code>Airport</code>
weighs in at ~30 lines of code
(maybe 5 to 10 minutes to write from scratch):</p>
<div class="language-swift highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code><span class="kd">extension</span> <span class="kt">Airport</span> <span class="p">{</span>
    <span class="kd">public</span> <span class="nf">init</span><span class="p">(</span><span class="nv">json</span><span class="p">:</span> <span class="p">[</span><span class="kt">String</span><span class="p">:</span> <span class="kt">Any</span><span class="p">])</span> <span class="p">{</span>
        <span class="k">guard</span> <span class="k">let</span> <span class="nv">name</span> <span class="o">=</span> <span class="n">json</span><span class="p">[</span><span class="s">"name"</span><span class="p">]</span> <span class="k">as?</span> <span class="kt">String</span><span class="p">,</span>
              <span class="k">let</span> <span class="nv">iata</span> <span class="o">=</span> <span class="n">json</span><span class="p">[</span><span class="s">"iata"</span><span class="p">]</span> <span class="k">as?</span> <span class="kt">String</span><span class="p">,</span>
              <span class="k">let</span> <span class="nv">icao</span> <span class="o">=</span> <span class="n">json</span><span class="p">[</span><span class="s">"icao"</span><span class="p">]</span> <span class="k">as?</span> <span class="kt">String</span><span class="p">,</span>
              <span class="k">let</span> <span class="nv">coordinates</span> <span class="o">=</span> <span class="n">json</span><span class="p">[</span><span class="s">"coordinates"</span><span class="p">]</span> <span class="k">as?</span> <span class="p">[</span><span class="kt">Double</span><span class="p">],</span>
              <span class="k">let</span> <span class="nv">runways</span> <span class="o">=</span> <span class="n">json</span><span class="p">[</span><span class="s">"runways"</span><span class="p">]</span> <span class="k">as?</span> <span class="p">[[</span><span class="kt">String</span><span class="p">:</span> <span class="kt">Any</span><span class="p">]]</span>
        <span class="k">else</span> <span class="p">{</span>
            <span class="nf">fatal<wbr></wbr>Error</span><span class="p">(</span><span class="s">"Cannot initialize Airport from JSON"</span><span class="p">)</span>
        <span class="p">}</span>

        <span class="k">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>
        <span class="k">self</span><span class="o">.</span><span class="n">iata</span> <span class="o">=</span> <span class="n">iata</span>
        <span class="k">self</span><span class="o">.</span><span class="n">icao</span> <span class="o">=</span> <span class="n">icao</span>
        <span class="k">self</span><span class="o">.</span><span class="n">coordinates</span> <span class="o">=</span> <span class="n">coordinates</span>
        <span class="k">self</span><span class="o">.</span><span class="n">runways</span> <span class="o">=</span> <span class="n">runways</span><span class="o">.</span><span class="n">map</span> <span class="p">{</span> <span class="kt">Runway</span><span class="p">(</span><span class="nv">json</span><span class="p">:</span> <span class="nv">$0</span><span class="p">)</span> <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="kd">extension</span> <span class="kt">Airport</span><span class="o">.</span><span class="kt">Runway</span> <span class="p">{</span>
    <span class="kd">public</span> <span class="nf">init</span><span class="p">(</span><span class="nv">json</span><span class="p">:</span> <span class="p">[</span><span class="kt">String</span><span class="p">:</span> <span class="kt">Any</span><span class="p">])</span> <span class="p">{</span>
        <span class="k">guard</span> <span class="k">let</span> <span class="nv">direction</span> <span class="o">=</span> <span class="n">json</span><span class="p">[</span><span class="s">"direction"</span><span class="p">]</span> <span class="k">as?</span> <span class="kt">String</span><span class="p">,</span>
              <span class="k">let</span> <span class="nv">distance</span> <span class="o">=</span> <span class="n">json</span><span class="p">[</span><span class="s">"distance"</span><span class="p">]</span> <span class="k">as?</span> <span class="kt">Int</span><span class="p">,</span>
              <span class="k">let</span> <span class="nv">surface<wbr></wbr>Raw<wbr></wbr>Value</span> <span class="o">=</span> <span class="n">json</span><span class="p">[</span><span class="s">"surface"</span><span class="p">]</span> <span class="k">as?</span> <span class="kt">String</span><span class="p">,</span>
              <span class="k">let</span> <span class="nv">surface</span> <span class="o">=</span> <span class="kt">Surface</span><span class="p">(</span><span class="nv">raw<wbr></wbr>Value</span><span class="p">:</span> <span class="n">surface<wbr></wbr>Raw<wbr></wbr>Value</span><span class="p">)</span>
        <span class="k">else</span> <span class="p">{</span>
            <span class="nf">fatal<wbr></wbr>Error</span><span class="p">(</span><span class="s">"Cannot initialize Runway from JSON"</span><span class="p">)</span>
        <span class="p">}</span>

        <span class="k">self</span><span class="o">.</span><span class="n">direction</span> <span class="o">=</span> <span class="n">direction</span>
        <span class="k">self</span><span class="o">.</span><span class="n">distance</span> <span class="o">=</span> <span class="n">distance</span>
        <span class="k">self</span><span class="o">.</span><span class="n">surface</span> <span class="o">=</span> <span class="n">surface</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre>
</div>
</div>
<p>With effective use of <code>guard</code> statements and the <code>map(_:)</code> method,
this isn’t a particularly unappetizing chunk of boilerplate.
But it’s hard to compete with
the zero additional lines of code required of Codable.</p>
<h2>
<a class="anchor" aria-hidden="true" id="creating-performance-tests" href="#creating-performance-tests">#</a>Creating Performance Tests</h2>
<p>We use Xcode’s built-in testing framework,
<a href="https://developer.apple.com/documentation/xctest" rel="noopener noreferrer">XCTest</a>,
to measure the performance of each implementation.</p>
<p>In the setup for both tests (not shown here),
a count is specified and the corresponding data set is loaded.
Each test then decodes that data within a closure passed to the
<a href="https://developer.apple.com/documentation/xctest/xctestcase/1496290-measure" rel="noopener noreferrer"><code>measure(_:)</code></a> method:</p>
<div class="language-swift highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code><span class="kd">class</span> <span class="kt">Performance<wbr></wbr>Tests</span><span class="p">:</span> <span class="kt">XCTest<wbr></wbr>Case</span> <span class="p">{</span>
    <span class="c1">// Both properties are populated in setUp() (not shown here).</span>
    <span class="k">var</span> <span class="nv">data</span><span class="p">:</span> <span class="kt">Data</span><span class="o">!</span>
    <span class="k">var</span> <span class="nv">count</span><span class="p">:</span> <span class="kt">Int</span><span class="o">!</span>

    <span class="kd">func</span> <span class="nf">test<wbr></wbr>Performance<wbr></wbr>Codable</span><span class="p">()</span> <span class="p">{</span>
        <span class="k">self</span><span class="o">.</span><span class="n">measure</span> <span class="p">{</span>
            <span class="k">let</span> <span class="nv">decoder</span> <span class="o">=</span> <span class="kt">JSONDecoder</span><span class="p">()</span>
            <span class="k">let</span> <span class="nv">airports</span> <span class="o">=</span> <span class="k">try!</span> <span class="n">decoder</span><span class="o">.</span><span class="nf">decode</span><span class="p">([</span><span class="kt">Airport</span><span class="p">]</span><span class="o">.</span><span class="k">self</span><span class="p">,</span> <span class="nv">from</span><span class="p">:</span> <span class="n">data</span><span class="p">)</span>
            <span class="kt">XCTAssert<wbr></wbr>Equal</span><span class="p">(</span><span class="n">airports</span><span class="o">.</span><span class="n">count</span><span class="p">,</span> <span class="n">count</span><span class="p">)</span>
        <span class="p">}</span>
    <span class="p">}</span>

    <span class="kd">func</span> <span class="nf">test<wbr></wbr>Performance<wbr></wbr>JSONSerialization</span><span class="p">()</span> <span class="p">{</span>
        <span class="k">self</span><span class="o">.</span><span class="n">measure</span> <span class="p">{</span>
            <span class="k">let</span> <span class="nv">json</span> <span class="o">=</span> <span class="k">try!</span> <span class="kt">JSONSerialization</span><span class="o">.</span><span class="nf">json<wbr></wbr>Object</span><span class="p">(</span><span class="nv">with</span><span class="p">:</span> <span class="n">data</span><span class="p">,</span> <span class="nv">options</span><span class="p">:</span> <span class="p">[])</span> <span class="k">as!</span> <span class="p">[[</span><span class="kt">String</span><span class="p">:</span> <span class="kt">Any</span><span class="p">]]</span>
            <span class="k">let</span> <span class="nv">airports</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="n">map</span> <span class="p">{</span> <span class="kt">Airport</span><span class="p">(</span><span class="nv">json</span><span class="p">:</span> <span class="nv">$0</span><span class="p">)</span> <span class="p">}</span>
            <span class="kt">XCTAssert<wbr></wbr>Equal</span><span class="p">(</span><span class="n">airports</span><span class="o">.</span><span class="n">count</span><span class="p">,</span> <span class="n">count</span><span class="p">)</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre>
</div>
</div>
<h2>
<a class="anchor" aria-hidden="true" id="benchmarking-execution-time" href="#benchmarking-execution-time">#</a>Benchmarking Execution Time</h2>
<p>You can download the Xcode project used to produce these results
<a href="https://github.com/Flight-School/CodablePerformance/" rel="noopener noreferrer">on GitHub</a>.</p>
<hr>
<p>Because <code>JSONDecoder</code> uses <code>JSONSerialization</code>
<a href="https://github.com/apple/swift/blob/d93e0dfa01ddd897ba733b6a2d43b05e2f0073f9/stdlib/public/SDK/Foundation/JSONEncoder.swift#L1105" rel="noopener noreferrer">under the hood</a>,
we should expect the performance characteristics to be similar.
And indeed that’s what we see here:</p>
<figure>
<figcaption>
<p>Wall Clock Time (Smaller is Better)</p>
</figcaption>
<p><img src="/assets/json-wall-clock-time-chart-1bf9f6bf057c4ec27915029daf6cd5a0c69e5c094736ad6063e883d57794653b.svg" integrity="sha256-G/n2vwV8TsJ5FQKdr2zVoMaeXAlHNq1gY+iD1XeUZTs=" crossorigin="anonymous"></p>
<table>
<thead>
<tr>
<th>Count</th>
<th>JSONSerialization</th>
<th>Codable</th>
<th>Δ</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0.5 ms</td>
<td>0.8 ms</td>
<td>+0.3 ms</td>
</tr>
<tr>
<td>10</td>
<td>1 ms</td>
<td>4 ms</td>
<td>+3 ms</td>
</tr>
<tr>
<td>100</td>
<td>3 ms</td>
<td>8 ms</td>
<td>+5 ms</td>
</tr>
<tr>
<td>1000</td>
<td>30 ms</td>
<td>51 ms</td>
<td>+21 ms</td>
</tr>
<tr>
<td>10000</td>
<td>382 ms</td>
<td>603 ms</td>
<td>+221 ms</td>
</tr>
</tbody>
</table>
<p><small>
Swift 4.1, Xcode 9.3 (9E145), iPhone X Simulator <br>
2017 MacBook Pro, 2.9 GHz Intel Core i7, 16 GB 2133 MHz LPDDR3
</small></p>
</figure>
<p><strong>On average,
Codable with <code>JSONDecoder</code> is about half as fast as
the equivalent implementation with <code>JSONSerialization</code>.</strong></p>
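<p>As a quick check of that figure,
here is a minimal sketch that computes the slowdown for each row,
using only the timings from the table above.
The tuple labels and variable names are ours, chosen for illustration.</p>
<div class="language-swift highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code>import Foundation

// Timings copied from the table above, in milliseconds.
let timings: [(count: Int, jsonSerialization: Double, codable: Double)] = [
    (1, 0.5, 0.8),
    (10, 1, 4),
    (100, 3, 8),
    (1000, 30, 51),
    (10000, 382, 603),
]

// How many times longer Codable took than JSONSerialization at each size.
let ratios = timings.map { $0.codable / $0.jsonSerialization }
let meanRatio = ratios.reduce(0, +) / Double(ratios.count)

for (row, ratio) in zip(timings, ratios) {
    let formatted = String(format: "%.1f", ratio)
    print("\(row.count) records: \(formatted)x")
}
print("mean: " + String(format: "%.1f", meanRatio) + "x")
</code></pre>
</div>
</div>
<p>The per-row ratios range from about 1.6x to 4x,
with a mean of roughly 2.3x.</p>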
<p>But does this mean that we shouldn’t use Codable?
Probably not.</p>
<p>A 2x speedup factor may seem significant,
but measured in absolute time difference,
the savings are unlikely to be appreciable under most circumstances.
Besides, performance is only one consideration in making a successful app.</p>
<hr>
<p>If you have a codebase that uses <code>JSONSerialization</code>,
whether directly or through a third-party framework,
you might add benchmarks to see how Codable performs
against your existing implementations.
If performance is acceptable,
you could then proceed to build new functionality with <code>Codable</code>
before eventually transitioning existing code over.</p>
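<p>For a rough first comparison outside of a test target,
something like the following sketch might be enough.
It makes several assumptions:
the <code>Airport</code> model here is a hypothetical two-field stand-in,
the data is generated rather than loaded from a file,
and <code>averageMilliseconds</code> is a helper of our own
that times with <code>Date</code>,
which is cruder than XCTest’s <code>measure(_:)</code>.</p>
<div class="language-swift highlighter-rouge">
<div class="highlight">
<pre class="highlight"><code>import Foundation

// Hypothetical model, for illustration only.
struct Airport: Codable {
    let name: String
    let iata: String

    // Manual initializer of the kind typically paired with JSONSerialization.
    init?(json: [String: Any]) {
        guard let name = json["name"] as? String,
              let iata = json["iata"] as? String else { return nil }
        self.name = name
        self.iata = iata
    }
}

// Generate a synthetic data set (an assumption; substitute your own payload).
let count = 1000
let records = (1...count).map { ["name": "Airport \($0)", "iata": "A\($0)"] }
let data = try! JSONSerialization.data(withJSONObject: records, options: [])

// Returns the mean wall clock time, in milliseconds, of running `work`.
func averageMilliseconds(iterations: Int = 10, _ work: () -&gt; Void) -&gt; Double {
    let start = Date()
    for _ in 1...iterations { work() }
    return Date().timeIntervalSince(start) * 1000 / Double(iterations)
}

let codableTime = averageMilliseconds {
    let airports = try! JSONDecoder().decode([Airport].self, from: data)
    precondition(airports.count == count)
}

let serializationTime = averageMilliseconds {
    let json = try! JSONSerialization.jsonObject(with: data, options: []) as! [[String: Any]]
    let airports = json.compactMap { Airport(json: $0) }
    precondition(airports.count == count)
}

print("Codable: \(codableTime) ms")
print("JSONSerialization: \(serializationTime) ms")
</code></pre>
</div>
</div>
<p>Numbers from a script like this vary from run to run and between build configurations,
so treat them as a rough guide rather than a replacement for proper performance tests.</p>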
<p>Ultimately, every project is different,
and it’s up to you to determine what’s right for you.</p>
<p>Codable isn’t a silver bullet,
but it’s good enough that we should consider it to be our new default.
Unless you have a specific reason to use <code>JSONSerialization</code>,
Codable is an excellent choice for working with data representations.</p>MatttSwift Codable can automatically synthesize initializers that decode models from JSON. But how does this generated code compare to what it replaces?