1. About the code
Inside a computer, everything is represented only by whether electricity flows (1) or does not flow (0), which makes it awkward for people to express the number systems and languages they use in everyday life. To bridge this gap, people agree that specific binary sequences carry specific meanings. Such an agreed mapping is called a code.
Although there is no official classification, codes fall into two broad categories: codes for reducing or detecting errors in computer hardware, and codes that make it convenient to write numbers and characters. There are many more codes than the ones below, but here we look at only a few of the codes designed for human convenience.
2. 8421 code
Humans use decimal numbers by default, while computers use binary numbers by default, so raw binary values are hard for people to read. Conversion is especially inconvenient when the number of decimal digits grows, or when a value is written with decimal digits but does not follow the decimal system, such as a date or time.
For this reason, 4 binary digits are allocated to each decimal digit, and only the 10 fixed patterns corresponding to 0₁₀ through 9₁₀ are used. The other patterns that 4 binary digits could express are left unused. Such a code is called BCD (Binary Coded Decimal). BCD variants include the 8421 code, excess-3 code, 2421 code, 5421 code, and 51111 code (5 binary digits). Of these, the 8421 code is the most representative.
The 8421 code gives the weights 8, 4, 2, and 1 to the 4 binary digits, so each decimal digit maps to the same pattern as its ordinary binary value. Converting the decimal number 127₁₀ to the 8421 code gives the following.
Decimal | 1 | 2 | 7
Weight | 8 4 2 1 | 8 4 2 1 | 8 4 2 1
8421 code | 0 0 0 1 | 0 0 1 0 | 0 1 1 1
Each decimal digit is assigned 4 binary digits, and the bits are set so that the sum of the weights of the 1 bits equals the decimal digit. Since only the decimal digits 0₁₀ to 9₁₀ are needed, only the binary patterns 0000₂ to 1001₂ are used. The remaining patterns 1010₂ to 1111₂ are not used.
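The digit-by-digit mapping described above can be sketched in a few lines of Python. This is only an illustration: the function names `to_8421` and `from_8421` are made up for this example, and the sketch handles non-negative integers only.

```python
def to_8421(n: int) -> str:
    """Encode a non-negative integer as 8421 code, one 4-bit group per decimal digit."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    # Each decimal digit is written as its own 4-bit binary value.
    return " ".join(format(int(digit), "04b") for digit in str(n))


def from_8421(code: str) -> int:
    """Decode space-separated 4-bit groups back into an integer."""
    digits = []
    for group in code.split():
        value = int(group, 2)
        if value > 9:
            # 1010 to 1111 are the six unused patterns.
            raise ValueError(f"{group} is not a valid 8421 digit")
        digits.append(str(value))
    return int("".join(digits))


print(to_8421(127))                  # 0001 0010 0111
print(from_8421("0001 0010 0111"))   # 127
```

Passing a group such as `1010` to `from_8421` raises an error, which mirrors the rule that only 0000₂ to 1001₂ are valid.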
The 8421 code and ordinary binary also differ in length. For example, 127₁₀ fits in 7 binary digits (1111111₂), but the 8421 code allocates 12 digits for it, so some space is wasted. In addition, 6 patterns are skipped between 9₁₀ and 10₁₀, so be careful when performing arithmetic operations on 8421 codes.
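To see why the six skipped patterns matter for arithmetic, here is a minimal sketch of 8421 addition. It uses the common "add 6" correction: whenever a digit sum exceeds 9, adding 6 jumps over the unused patterns and produces a carry. The helper names `add_digit` and `add_8421` are hypothetical and chosen only for this example.

```python
def add_digit(a: int, b: int, carry_in: int = 0) -> tuple[int, int]:
    """Add two 8421 digits (0-9) and return (result_digit, carry_out)."""
    total = a + b + carry_in
    if total > 9:
        total += 6  # skip the six unused patterns 1010 to 1111
    return total & 0xF, total >> 4  # low 4 bits are the digit, the rest is the carry


def add_8421(x: int, y: int) -> str:
    """Add two non-negative integers digit by digit and return the 8421 code."""
    xs = [int(d) for d in reversed(str(x))]  # least significant digit first
    ys = [int(d) for d in reversed(str(y))]
    result, carry = [], 0
    for i in range(max(len(xs), len(ys))):
        a = xs[i] if i < len(xs) else 0
        b = ys[i] if i < len(ys) else 0
        digit, carry = add_digit(a, b, carry)
        result.append(digit)
    if carry:
        result.append(carry)
    return " ".join(format(d, "04b") for d in reversed(result))


print(add_8421(127, 5))  # 0001 0011 0010  (132)
print(add_8421(99, 1))   # 0001 0000 0000  (100)
```

Without the correction, 7 + 5 would give 1100₂, which is one of the unused patterns rather than a valid digit.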
3. ASCII
Numbers map naturally onto binary, but computers have no built-in concept of characters, so words and sentences can be expressed only by assigning a code to every character of a writing system. In the United States, a code first standardized in 1963 assigns 7 binary digits to punctuation marks, digits, and the uppercase and lowercase Roman (Latin) letters. This is called ASCII (American Standard Code for Information Interchange). It is used all over the world as a standard for information interchange, often with 1 parity digit for error checking added to the 7 ASCII digits.
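As an illustration of the parity digit, the sketch below assumes even parity stored in the eighth (most significant) bit. The text does not say which parity convention or bit position is meant, so both are assumptions, and the function names are made up for this example.

```python
def with_even_parity(ch: str) -> int:
    """Return the 8-bit value: 7 ASCII bits plus an even-parity bit on top."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("not a 7-bit ASCII character")
    # The parity bit is 1 when the 7 data bits contain an odd number of 1s,
    # so the total number of 1s in the 8 bits is always even.
    parity = bin(code).count("1") % 2
    return (parity << 7) | code


def parity_ok(byte: int) -> bool:
    """Check that the 8-bit value contains an even number of 1 bits."""
    return bin(byte & 0xFF).count("1") % 2 == 0


sent = with_even_parity("C")      # 'C' is 1000011 (three 1s), so the parity bit is 1
print(format(sent, "08b"))        # 11000011
print(parity_ok(sent))            # True
print(parity_ok(sent ^ 0b1))      # False: a single flipped bit is detected
```

A single parity bit detects any odd number of flipped bits but cannot tell which bit changed.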
Besides ASCII, there is EBCDIC (Extended Binary Coded Decimal Interchange Code), which extends BCD to express characters. It is an 8-digit code, but it contains more obscure symbols and is less convenient to use than ASCII, so it is not used as widely as ASCII.
The table below is the standard ASCII correspondence table. (The corresponding binary values are long, so they are shown in hexadecimal instead.)
Dec | Hex | Char | Dec | Hex | Char | Dec | Hex | Char | Dec | Hex | Char
0 | 00 | NUL | 32 | 20 | Space | 64 | 40 | @ | 96 | 60 | `
1 | 01 | SOH | 33 | 21 | ! | 65 | 41 | A | 97 | 61 | a
2 | 02 | STX | 34 | 22 | " | 66 | 42 | B | 98 | 62 | b
3 | 03 | ETX | 35 | 23 | # | 67 | 43 | C | 99 | 63 | c
4 | 04 | EOT | 36 | 24 | $ | 68 | 44 | D | 100 | 64 | d
5 | 05 | ENQ | 37 | 25 | % | 69 | 45 | E | 101 | 65 | e
6 | 06 | ACK | 38 | 26 | & | 70 | 46 | F | 102 | 66 | f
7 | 07 | BEL | 39 | 27 | ' | 71 | 47 | G | 103 | 67 | g
8 | 08 | BS | 40 | 28 | ( | 72 | 48 | H | 104 | 68 | h
9 | 09 | TAB | 41 | 29 | ) | 73 | 49 | I | 105 | 69 | i
10 | 0A | LF | 42 | 2A | * | 74 | 4A | J | 106 | 6A | j
11 | 0B | VT | 43 | 2B | + | 75 | 4B | K | 107 | 6B | k
12 | 0C | FF | 44 | 2C | , | 76 | 4C | L | 108 | 6C | l
13 | 0D | CR | 45 | 2D | - | 77 | 4D | M | 109 | 6D | m
14 | 0E | SO | 46 | 2E | . | 78 | 4E | N | 110 | 6E | n
15 | 0F | SI | 47 | 2F | / | 79 | 4F | O | 111 | 6F | o
16 | 10 | DLE | 48 | 30 | 0 | 80 | 50 | P | 112 | 70 | p
17 | 11 | DC1 | 49 | 31 | 1 | 81 | 51 | Q | 113 | 71 | q
18 | 12 | DC2 | 50 | 32 | 2 | 82 | 52 | R | 114 | 72 | r
19 | 13 | DC3 | 51 | 33 | 3 | 83 | 53 | S | 115 | 73 | s
20 | 14 | DC4 | 52 | 34 | 4 | 84 | 54 | T | 116 | 74 | t
21 | 15 | NAK | 53 | 35 | 5 | 85 | 55 | U | 117 | 75 | u
22 | 16 | SYN | 54 | 36 | 6 | 86 | 56 | V | 118 | 76 | v
23 | 17 | ETB | 55 | 37 | 7 | 87 | 57 | W | 119 | 77 | w
24 | 18 | CAN | 56 | 38 | 8 | 88 | 58 | X | 120 | 78 | x
25 | 19 | EM | 57 | 39 | 9 | 89 | 59 | Y | 121 | 79 | y
26 | 1A | SUB | 58 | 3A | : | 90 | 5A | Z | 122 | 7A | z
27 | 1B | ESC | 59 | 3B | ; | 91 | 5B | [ | 123 | 7B | {
28 | 1C | FS | 60 | 3C | < | 92 | 5C | \ | 124 | 7C | |
29 | 1D | GS | 61 | 3D | = | 93 | 5D | ] | 125 | 7D | }
30 | 1E | RS | 62 | 3E | > | 94 | 5E | ^ | 126 | 7E | ~
31 | 1F | US | 63 | 3F | ? | 95 | 5F | _ | 127 | 7F | DEL
In this table, 0₁₀ (00₁₆) to 31₁₀ (1F₁₆) and 127₁₀ (7F₁₆) are called control characters and do not represent printable symbols. The remaining codes consist of the 10 digits starting at 48₁₀ (30₁₆), the 26 uppercase letters starting at 65₁₀ (41₁₆), the 26 lowercase letters starting at 97₁₀ (61₁₆), and the space and punctuation marks. Using these letters, digits, and symbols, it is possible to express words, sentences, and numbers.
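The grouping of the 128 codes can be checked with Python's built-in `ord` and `chr`. The sketch below is only an illustration. The `classify` function and its category labels are made up for this example, and the space character is counted together with the punctuation marks.

```python
from collections import Counter


def classify(code: int) -> str:
    """Return the group that a 7-bit ASCII code belongs to."""
    if not 0 <= code <= 127:
        raise ValueError("ASCII codes run from 0 to 127")
    if code <= 31 or code == 127:
        return "control"
    ch = chr(code)
    if "0" <= ch <= "9":
        return "digit"
    if "A" <= ch <= "Z":
        return "uppercase"
    if "a" <= ch <= "z":
        return "lowercase"
    return "punctuation"  # includes the space at code 32


print(ord("A"), hex(ord("A")))  # 65 0x41
print(chr(97))                  # a

counts = Counter(classify(code) for code in range(128))
print(counts["control"], counts["digit"], counts["uppercase"],
      counts["lowercase"], counts["punctuation"])  # 33 10 26 26 33
```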
4. Unicode
As ASCII came into use worldwide, each country assigned its own characters to codes by adding digits to ASCII as needed (128₁₀ (80₁₆) and above). However, character sets created independently in this way are not compatible with one another when text is exchanged globally. To solve this, a coding system meant to handle all the characters of the world was created and published in 1991: Unicode (Unique, Universal, and Uniform character encoding).
The purpose of the code can be seen from its name. Characters, historical as well as modern, are continuously being added with the goal of fully expressing all writing systems. Where ASCII has 128 characters in 7 binary digits, Unicode uses up to 21 binary digits, and the number of possible codes exceeds 1 million. Even after allocating the characters in use today there is room to spare, so pictographs, emoticons, and game symbols have been added as well.
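The sizes mentioned above can be checked with simple arithmetic. One detail in this sketch does not come from the text: the highest Unicode code point is U+10FFFF, so the number of code points is smaller than the full range that 21 binary digits could hold.

```python
ascii_codes = 2 ** 7                  # 7 binary digits
max_with_21_bits = 2 ** 21            # everything 21 binary digits could express
unicode_code_points = 0x10FFFF + 1    # code points U+0000 through U+10FFFF

print(ascii_codes)                    # 128
print(max_with_21_bits)               # 2097152
print(unicode_code_points)            # 1114112
print((0x10FFFF).bit_length())        # 21
```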
In the Unicode system, each character is written as the prefix U+ followed by 4 to 6 hexadecimal digits. Here are a few ranges that are actually used:
From U+0000 to U+007F (128 characters): Basic Latin (same content as ASCII)
From U+0250 to U+02AF (96 characters): IPA Extensions
From U+0370 to U+03FF (135 characters): Greek
From U+3131 to U+318E (94 characters): Hangul Compatibility Jamo
From U+AC00 to U+D7A3 (11172 characters): Hangul Syllables
From U+1F300 to U+1F5FF (768 characters): Miscellaneous Symbols and Pictographs
From U+1F600 to U+1F64F (80 characters): Emoticons
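The U+ notation and the size of a range can be reproduced with a short Python sketch. The helper names `u_plus` and `range_size` are hypothetical. Note that `range_size` counts every code point between the two endpoints, which matches the number of characters only when the whole range is assigned.

```python
def u_plus(ch: str) -> str:
    """Format a single character as U+ followed by at least 4 hex digits."""
    return f"U+{ord(ch):04X}"


def range_size(first: int, last: int) -> int:
    """Number of code points from first to last, inclusive."""
    return last - first + 1


print(u_plus("A"))            # U+0041
print(u_plus("\uAC00"))       # U+AC00, the first Hangul syllable
print(u_plus("\U0001F600"))   # U+1F600, the first code point of Emoticons

print(range_size(0x0250, 0x02AF))     # 96
print(range_size(0xAC00, 0xD7A3))     # 11172
print(range_size(0x1F300, 0x1F5FF))   # 768
print(range_size(0x1F600, 0x1F64F))   # 80
```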
Characters written with 4 hexadecimal digits belong to the Basic Multilingual Plane (BMP, U+0000~U+FFFF), and most of the characters used today fall into this plane. Beyond it are the Supplementary Multilingual Plane (SMP, U+10000~U+1FFFF), the Supplementary Ideographic Plane (SIP, U+20000~U+2FFFF), and the Tertiary Ideographic Plane (TIP, U+30000~U+3FFFF). The supplementary planes are used less often than the BMP, and the ideographic planes are mostly assigned to Chinese characters.
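Because each plane covers 65536 (10000₁₆) code points, the plane number is simply the code point with its lowest 16 binary digits dropped. The following is a minimal sketch of that idea. The names `PLANES`, `plane_of`, and `describe` are made up for this example, and only the four planes named above are labeled.

```python
PLANES = {
    0: "Basic Multilingual Plane (BMP)",
    1: "Supplementary Multilingual Plane (SMP)",
    2: "Supplementary Ideographic Plane (SIP)",
    3: "Tertiary Ideographic Plane (TIP)",
}


def plane_of(code_point: int) -> int:
    """Return the plane number of a code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    return code_point >> 16  # drop the lowest 16 binary digits


def describe(ch: str) -> str:
    """Show a character's U+ notation together with its plane."""
    code_point = ord(ch)
    plane = plane_of(code_point)
    return f"U+{code_point:04X} is in plane {plane}: {PLANES.get(plane, 'another plane')}"


print(describe("\uAC00"))      # U+AC00 is in plane 0: Basic Multilingual Plane (BMP)
print(describe("\U0001F600"))  # U+1F600 is in plane 1: Supplementary Multilingual Plane (SMP)
```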
If you visit https://www.unicode.org/roadmaps/bmp/index.html, you can check how the characters of the BMP are allocated. With Unicode, all characters can be read and written in the same way anywhere in the world just by passing information arranged in binary.
5. Conclusion
It is enough to know which kinds of codes exist. When you need the details, find a table on the Internet and use it.