Base64:

Base64 encoding is an algorithm that converts a byte stream into a stream of printable characters. Binary-to-text encodings such as Base64 exist because of the need to send a stream of bytes over a communication medium that does not allow binary data, only text.

What is Base64?

Base64 is a mechanism that allows binary data to be represented and transferred over media that only allow printable characters. To represent its data, Base64 uses 64 characters that are available in most text encoding schemes. It is the most common of the “base encodings”, with Base16 and Base32 being the others known to be in use.

7-bit ASCII, by comparison, defines 128 characters: 95 printable ones (94 graphic characters plus the space) and 33 non-printable control characters. 64 is the largest power of 2 that can be represented using only printable characters, and the characters chosen are common to most existing character encodings, most notably ASCII.
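The character counts behind that choice can be checked with a few lines of Python. This is only an illustrative sketch: it counts the space as a printable character, and the variable names are made up for the example.

```python
# Count the printable characters in 7-bit ASCII (space through tilde).
printable = [chr(code) for code in range(128) if chr(code).isprintable()]
control = 128 - len(printable)

# Find the largest power of two that does not exceed the printable count.
alphabet_size = 1
while alphabet_size * 2 <= len(printable):
    alphabet_size *= 2

print(len(printable), control, alphabet_size)
```

With 95 printable characters available, 128 symbols would not fit, so 64 is the largest usable power of two.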

Base64 Encoding

The process of encoding the input stream is fairly simple.

  1. The octet stream is read from left to right.
  2. Three 8-bit groups from the input are concatenated into one 24-bit group.
  3. This 24-bit group is then treated as four concatenated 6-bit groups (a short final group is padded on the right with zero bits). The grouping into 6 bits is used for the simple reason that 6 bits give 64 possible values, which is exactly enough to cover the chosen set of printable characters.
  4. Each of these four groups is then encoded as one character.

Special handling is needed when the last part of the input stream comprises fewer than 24 bits.

Let’s assume that we want to use the Base64 alphabet to encode the string ‘ORACLE’.

An input stream of 8-bit bytes is processed by the encoding algorithm. The stream is assumed to be ordered with the most significant bit first: the first bit in the first byte is the high-order bit, the eighth bit in that byte is the low-order bit, and so on.

These bytes are grouped from left to right into 24-bit groups. Each group is treated as four concatenated 6-bit groups. Every 6-bit group is used as an index into an array of 64 printable characters, and the character at that index is written to the output.
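The ‘ORACLE’ example can be traced step by step with a short Python sketch. The names ALPHABET, groups and indexes are chosen for this illustration only. Because the input is exactly six bytes (48 bits), no padding is needed here.

```python
import base64

# The 64-character Base64 alphabet, indexed 0..63.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

data = "ORACLE".encode("ascii")

# Write every byte as 8 bits, most significant bit first.
bits = "".join(f"{byte:08b}" for byte in data)

# Split the bit string into 6-bit groups and look each one up.
groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]
indexes = [int(group, 2) for group in groups]
encoded = "".join(ALPHABET[index] for index in indexes)

print(groups)
print(indexes)   # [19, 53, 9, 1, 16, 52, 49, 5]
print(encoded)   # T1JBQ0xF
```

The result matches what Python's standard base64 module produces for the same input.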

Algorithm of Base64 Encoding

The Base64 encoder uses only 64 characters from the ASCII character set (A-Z, a-z, 0-9, +, /).

It is worth noting that the Base64 encoder needs only 6 bits to represent every character in the table. Input bytes use 8 bits, so 24 bits is the smallest number of bits that can be encoded without leftover bits (24 is the least common multiple of 6 and 8, the smallest number that both divide without a remainder).
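The arithmetic can be confirmed with a small sketch. The constant names below are illustrative only.

```python
from math import gcd

BITS_PER_BYTE = 8   # size of one input byte
BITS_PER_CHAR = 6   # bits carried by one Base64 character

# Least common multiple of 6 and 8: the smallest group with no leftover bits.
group_bits = BITS_PER_BYTE * BITS_PER_CHAR // gcd(BITS_PER_BYTE, BITS_PER_CHAR)
bytes_per_group = group_bits // BITS_PER_BYTE
chars_per_group = group_bits // BITS_PER_CHAR

print(group_bits, bytes_per_group, chars_per_group)   # 24 3 4
```

Every 3 input bytes therefore become exactly 4 output characters.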

The Data Encoding Algorithm in Base64 format:

  1. Get the byte representation of the data and write it out as a string of bits.
  2. Take the first six bits from the bit string, convert them to decimal, look up the corresponding character in the Base64 character set table, and append that character to the encoded string.
  3. Repeat this step for each following group of six bits until you reach the last group. If the last group has fewer than six bits, pad it on the right with zeros to six bits and look up the corresponding character in the table.

This is sufficient to encode our data in Base64 format, but not enough for the receiving application to decode it reliably. Base64 decoding is outside the scope of this tutorial, but the additional rules needed for easy decoding are as follows:

Calculate the length of the zero-padded bit string and take the remainder after dividing it by 24:

  1. We add “=” to the encoded string if the remainder is 18, which means that the last group held two input bytes and is 6 bits short of a full 24-bit group.
  2. We add “==” to the encoded string if the remainder is 12, which means that the last group held one input byte and is 12 bits short.
  3. We do not add anything to the encoded string if the remainder is 0, because the input filled every 24-bit group completely. A remainder of 6 cannot occur, since even a single byte needs two 6-bit groups.
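The encoding steps and padding rules above can be put together in a short Python sketch. It deliberately works on a string of '0' and '1' characters to mirror the description, which is easy to follow but not efficient. The function name encode_base64 is hypothetical; real code would normally call the standard base64 module instead.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_base64(data: bytes) -> str:
    """Encode bytes as Base64 by following the textual algorithm literally."""
    # Step 1: the byte representation as a bit string.
    bits = "".join(f"{byte:08b}" for byte in data)

    # Step 3 (last group): pad on the right with zeros to a multiple of 6.
    if len(bits) % 6:
        bits += "0" * (6 - len(bits) % 6)

    # Steps 2 and 3: map every 6-bit group to its table character.
    encoded = "".join(
        ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)
    )

    # Padding rules: remainder of the padded bit length modulo 24.
    remainder = len(bits) % 24
    if remainder == 18:      # two input bytes in the last group
        encoded += "="
    elif remainder == 12:    # one input byte in the last group
        encoded += "=="
    return encoded

print(encode_base64(b"ORACLE"))   # T1JBQ0xF
print(encode_base64(b"ORACL"))    # ends with "="
print(encode_base64(b"ORAC"))     # ends with "=="
```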

Uses of Base64 Encoding

In the Base64 spec, you will not find any mention of HTML. Instead, the authors simply note that Base64 encoding is used in environments where the storage or transmission of data is restricted to ASCII characters, possibly for legacy reasons. That description more or less sums up the browser and its heavy consumption of HTML, JSON, CSS and JavaScript.

This text is increasingly being encoded using UTF-8, a superset of ASCII. Base64 encoding finds several niche applications in this text-heavy ecosystem.

Data URLs

The scheme is the first component of a URL. It is the prefix string that comes before the first colon. The scheme tells the client how to retrieve the resource and which protocol to use. The scheme prefix also makes URLs extensible and suitable for future protocols: if a new protocol comes along, we can create a new URL scheme for it and still identify resources by URL.

One such extension, which we saw in the image encoded in the introduction, is the data scheme. This scheme tells clients, “The data for my resource is located in the rest of this URL string right here.”
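As a rough illustration, a data URL can be assembled from a media type and a Base64 payload. The helper name make_data_url and the sample payload are assumptions made for this sketch.

```python
import base64

def make_data_url(payload: bytes, media_type: str = "text/plain") -> str:
    """Build a data URL that carries the payload inline as Base64."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{media_type};base64,{encoded}"

url = make_data_url(b"Hello")
print(url)   # data:text/plain;base64,SGVsbG8=
```

For an image, the same helper would be called with the file's bytes and a media type such as image/png.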

Source Maps

Another common but less obvious use of Base64 encoding is in source maps. Here Base64 is used in the mappings field, which consists of comma- and semicolon-delimited segments. Each segment is the Base64 encoding of a series of integers stored as variable-length quantities (VLQ).

Images and source maps are only two of the areas where Base64 encoding is used.
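The sketch below shows one way a single integer can be written as a Base64 variable-length quantity, following the scheme commonly used by source maps: the sign goes in the lowest bit, and each Base64 character carries 5 data bits plus a continuation bit. The function name encode_vlq is hypothetical, and this is an illustration rather than a complete source map encoder.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_vlq(value: int) -> str:
    """Encode one signed integer as a Base64 VLQ string."""
    # Move the sign into the least significant bit.
    vlq = (abs(value) << 1) | (1 if value < 0 else 0)
    chars = []
    while True:
        digit = vlq & 0b11111        # low 5 bits of what is left
        vlq >>= 5
        if vlq:
            digit |= 0b100000        # continuation bit: more digits follow
        chars.append(ALPHABET[digit])
        if not vlq:
            break
    return "".join(chars)

print(encode_vlq(0), encode_vlq(1), encode_vlq(-1), encode_vlq(16))   # A C D gB
```

Small values fit in one character, which is why mappings strings are dominated by short runs such as AAAA.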

Table of Base64 Encoding

VALUE BASE64 CHARACTER
0 A
1 B
2 C
3 D
4 E
5 F
6 G
7 H
8 I
9 J
10 K
11 L
12 M
13 N
14 O
15 P
16 Q
17 R
18 S
19 T
20 U
21 V
22 W
23 X
24 Y
25 Z
26 a
27 b
28 c
29 d
30 e
31 f
32 g
33 h
34 i
35 j
36 k
37 l
38 m
39 n
40 o
41 p
42 q
43 r
44 s
45 t
46 u
47 v
48 w
49 x
50 y
51 z
52 0
53 1
54 2
55 3
56 4
57 5
58 6
59 7
60 8
61 9
62 +
63 /
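The table follows a simple pattern, so it does not have to be typed out by hand. A minimal sketch using Python's string constants (the names ALPHABET and table are illustrative):

```python
import string

# Uppercase letters, lowercase letters, digits, then "+" and "/".
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# Map each 6-bit value to its Base64 character.
table = {value: char for value, char in enumerate(ALPHABET)}

for value, char in table.items():
    print(value, char)
```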

Applications of Base64 Encoding

In addition to being used as a method for content encoding within MIME, Base64 has been used for other purposes too.

Data obfuscation

For example, Base64 is used for simple obfuscation of data exchanged between applications. Of course, any Base64-encoded string can be trivially decoded to obtain the original bytes. It is therefore no replacement for a proper encryption method.

Handling of binary content in Web Services

Base64 can also be used for sending and receiving binary content in web service messages. Note that, because the Base64 transformation inflates the data by roughly a third (every 3 bytes become 4 characters), this is not an efficient approach for large payloads.

For such use cases, sending the payload as an attachment using SOAP with Attachments or the Message Transmission Optimization Mechanism (MTOM) is recommended.
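The size increase is easy to estimate, since every 3 input bytes (rounded up) become 4 output characters. A small sketch, with a hypothetical helper name and an arbitrary payload size:

```python
import base64

def encoded_length(n_bytes: int) -> int:
    """Length of the padded Base64 encoding of n_bytes bytes."""
    return 4 * ((n_bytes + 2) // 3)

payload = bytes(300_000)                  # 300,000 zero bytes as a stand-in
encoded = base64.b64encode(payload)
overhead = len(encoded) / len(payload) - 1

print(len(encoded), f"{overhead:.1%}")    # 400000 33.3%
```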

XML and Base64

XML documents may carry binary content as well. Within any XML 1.0 text, binary data can be encoded with Base64 and included inline. Character data inside XML entities is drawn from the Unicode repertoire.

An XML document, or any entity within it, may declare its character encoding in an encoding declaration. This is generally specified once, in the XML declaration, for the whole document.

The character encodings supported by an XML processor may differ, but XML 1.0 compliance requires processors to accept documents encoded in the Unicode transformation formats UTF-8 and UTF-16.

Base64 and XML Schema

The XML Schema datatype library defines a primitive datatype for binary data whose lexical representation is Base64-encoded text. It is called base64Binary. This makes it possible to describe binary element content. base64Binary is the datatype used in BPEL PM to describe opaque content within your messages.

You may have seen the use of this datatype when designing an interactive adapter.

The product provides a Base64Encoder utility for Base64-encoding content, whether text, XML or otherwise. There are no XPath extension functions that allow documents to be Base64-encoded.

To achieve the desired results, the utility can be used from inside a Java embedding operation.

Variants of Base64 Encoding

Many Base64 variants have been devised. Some variants require the encoded output stream to be split into multiple lines, each no longer than a certain length limit and separated from the next by a line separator. I will describe the three variants supported by the Base64 API of Java 8.

Basic

A Base64 variant known as Basic is described in RFC 4648. For encoding and decoding, this variant uses the Base64 alphabet presented in Table 1 of RFC 4648 and RFC 2045. The encoder writes the encoded output as a single line; no line separators are added. The decoder rejects any encoding that contains characters outside the Base64 alphabet. Note that it is possible to override these and other provisions.

MIME

A Base64 variant known as MIME is described in RFC 2045. This variant uses RFC 2045’s Base64 alphabet for encoding and decoding. The encoded output stream is organized into lines not exceeding 76 characters; each line is separated by a line separator from the next line. During decoding, all of the line separators or other characters not found in the Base64 alphabet are ignored.

URL and Filename Safe

A Base64 variant known as URL and Filename Safe is defined in RFC 4648. This variant uses the URL and filename safe alphabet from Table 2 of RFC 4648 for encoding and decoding, in which - and _ take the place of + and /. No line separators are added to the output. The decoder rejects any encoding that contains characters outside this alphabet.

Base64 encoding is useful for sending long binary data in HTTP GET requests. The idea is to encode this data and then append it to the HTTP GET URL. If the Basic or MIME variant were used, any + or / characters in the encoded data would have to be percent-encoded as hexadecimal sequences, which would make the resulting URL string considerably longer.
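The differences between the three variants can be seen with Python's standard base64 module, used here as a stand-in for the Java 8 API discussed above. Note one assumption to be aware of: base64.encodebytes separates lines with "\n", whereas RFC 2045 specifies CRLF line breaks.

```python
import base64

# Two bytes chosen so that the output contains both "+" and "/".
raw = bytes([0xFB, 0xFF])

basic = base64.b64encode(raw)               # standard alphabet, single line
url_safe = base64.urlsafe_b64encode(raw)    # "-" and "_" replace "+" and "/"

# MIME-style output: 100 bytes encode to 136 characters, split across lines.
mime = base64.encodebytes(bytes(100))
mime_lines = mime.splitlines()

print(basic, url_safe)                       # b'+/8=' b'-_8='
print([len(line) for line in mime_lines])    # [76, 60]
```

The URL-safe result can be placed in a query string as it stands, while the Basic result would need its + and / characters escaped first.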

Disadvantages of Base64 Encoding

By now it is pretty clear that Base64 increases file size in a way that Gzip can’t really help us with, but that’s just a small part of the issue. A single image can weigh well above 232K.

About Images

To understand how bad Base64 is, we first need to understand how good images are.

Yes, images are a problem. They’re the number one contributor to page bloat, in fact. Images make up about 1623K (or 65.46 percent) of the average web page as of 2 December 2016. By contrast, that makes our 232K stylesheet seem like a drop in the ocean.

A browser will begin rendering a page irrespective of whether or not the images have arrived. Heck, even when images never arrive at all, a browser will render a page! Images are not critical resources, so they are not a bottleneck, although they make up an excessive number of bytes over the wire.

Concerning Fonts

So far, I’ve only discussed images, but fonts are almost exactly the same, except for some nuance in how browsers handle the Flash of Unstyled/Invisible Text (FOUT or FOIT). In this project, fonts account for 166K of uncompressed CSS (124K Gzipped; there’s that horrible compression delta again).

For up to 3 seconds, Chrome and Firefox display no text at all. If the web font arrives within those three seconds, the text swaps from invisible to your custom font. If after 3 seconds the font still hasn’t arrived, the text swaps from invisible to whatever fallback(s) you’ve specified.

About Caching

Base64 also affects our ability to use more advanced caching strategies: by coupling our fonts, images, and styles together, we make them all subject to the same caching rules. This means that if we change only one hex value anywhere in our CSS, a change that could amount to as little as six bytes of new data, users have to download hundreds of kilobytes of styles, images, and fonts all over again.

Base64 encoding means that we cannot cache things individually according to their rate of change, and it also means that when one thing changes, unrelated things have to be downloaded and cached again. This is a lose-lose situation.

Frequently Asked Questions (FAQs)

Here are some frequently asked questions about Base64 encoding.

What’s the point of Base64 encoding?

Base64 is a way to encode binary data into an ASCII character set recognized by almost any computer system, so that the data can be transmitted without loss or alteration of the content itself. Mail systems, for example, are unable to deal with binary data because they expect (textual) ASCII data.

How do you read Base64?

In programming, Base64 is a group of binary-to-text encoding schemes that represent binary data (more precisely, a sequence of 8-bit bytes) in an ASCII string format by converting it into a radix-64 representation. The term Base64 originates from a particular MIME content transfer encoding.

Why does Base64 end with ==?

The final sequence ‘==’ indicates that the last group contained only one byte, and ‘=’ indicates that it contained two bytes. This is, therefore, a form of padding. The Base64-encoded string must be padded to a multiple of 4 characters in length so that it can be decoded correctly.
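A quick way to see the padding in action is to encode inputs of one, two and three bytes with Python's standard base64 module. The sample inputs are arbitrary.

```python
import base64

# Encode inputs whose last group holds one, two and three bytes.
examples = {
    data: base64.b64encode(data).decode("ascii")
    for data in (b"a", b"ab", b"abc")
}

for data, encoded in examples.items():
    print(data, encoded, len(encoded))
```

Each result is four characters long: one input byte gives ‘YQ==’, two give ‘YWI=’, and three give ‘YWJj’ with no padding at all.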

Conclusion

I hope you found this article interesting and learned a lot about Base64 encoding. It covered the basics of Base64, its encoding method and algorithm, and its applications.