Monday, June 7

Upcoming on the Meta-Plane.



Don't be surprised if posts follow hot and heavy, so to speak, for the next few days - right now I'm ripe with ideas. Of course, I'm not promising anything, because sometimes even the best ideas stall out when the magnitude of actually committing them to writing dawns on the author! I like to be fairly thorough when I explain something, or do anything else for that matter, and while I realize this is a blog and not a collection of PhD dissertations, I still have to fight the tendency to keep going on and on about my topic - a 'feature' of my character you might have noticed in some of my existing posts, and especially in some of my comments. I'm probably the only person on this blog who routinely curses Blogger's comment limit of 4096 characters. What do I do about it? I just post multiple comments in succession, of course!

4096 is 2^12, by the way, and if each character is stored as a single byte (which consists of 8 bits, but never mind that), then a full comment will occupy roughly 4k (4,096 bytes) of memory on Google's storage farms. That's not much in this day and age of megabytes, gigabytes, and terabytes, but obviously the programmers have decided that 4 kilobytes is enough for a substantive comment, especially when you add up all the users on Blogger and realize that they're getting this online storage for free. And to top it off, with Unicode becoming the dominant text standard over ASCII due to its ability to handle much larger character sets - letters and glyphs from non-Latin languages - the storage situation is probably even more of a concern. ASCII is an 8-bit system like that described above, where each character (a letter, number, punctuation mark, or even a space) takes 1 byte to store.
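
(If you want to check that math yourself, here's a tiny Python sketch - nothing official, just me playing with the numbers; the string of x's is obviously a stand-in for a real comment:)

    comment_limit = 2 ** 12
    print(comment_limit)                   # 4096 characters

    # Under the one-byte-per-character assumption, a maxed-out comment is about 4 KB:
    comment = "x" * comment_limit          # a stand-in for a full-length comment
    print(len(comment.encode("ascii")))    # 4096 bytes, i.e. roughly 4 kilobytes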

The problem with ASCII is that it was developed long ago, when computers largely lived in universities and large corporations, and mostly American ones at that. Even French, German, Italian, Spanish, and so on could be handled quite easily, since they all use the same Latin alphabet and Arabic numerals more or less, the odd accented or umlauted or circumspected character (to get very loose with my use of the English language) notwithstanding. ASCII still provided enough "headroom" to incorporate all these characters. One byte consists of eight bits, each of which can be a 1 or a 0 (on or off, or whatever metaphor you like). From basic math we know that a string of 8 bits yields a total number of possible combinations of 2 (the number of possible states of each bit - 1 or 0) raised to the power of the number of bits in a byte. A byte can therefore "hold" 2^8, or 256, discrete values. For a long time this was enough for the Latin alphabet, punctuation, numbers, and even some Greek characters, arrows, wingdings, and so on. For the original (non-extended) ASCII set see here.
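
(Same powers-of-two idea, sketched in a few lines of Python purely for illustration - the é is there to show a character that doesn't fit in the original 7-bit set:)

    # How many distinct values do n bits give you? Two raised to the n.
    for bits in (7, 8, 16, 32):
        print(f"{bits:>2} bits -> {2 ** bits:,} possible values")

    # And here's what a single character looks like written out as raw bits:
    for ch in "Aé":
        print(ch, format(ord(ch), "08b"))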

On the other hand, 256 is not nearly enough characters when you start adding in the non-Latin languages of the world. So a standard called Unicode was created, which has now been through many iterations and revisions, and which took a long time to replace ASCII, but it finally has done so to a large extent (thanks mostly to the Web). So how were programmers going to make room for all those extra characters? Pretty straightforwardly, it turns out. They allocated TWO BYTES for each character rather than one, thereby doubling the amount of space each one takes in the computer's memory or on the hard drive, but exponentially growing the number of possible combinations a 16-bit "word" (to use the jargon) could accommodate. Rather than 256, Unicode's 16 bits in essence created 2^16 little labeled cubby holes. 2^16 is 65,536! The problem seemed solved. And indeed, Unicode is what the vast majority of OSs, browsers, web sites, and applications use these days. You can represent a 256-fold increase in the number of discrete symbols while only doubling the memory footprint.
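
(Here's that two-bytes-per-character idea in Python. Real-world Unicode is messier than this - there are several encodings in use - so treat it as a sketch of the original 16-bit design, not gospel:)

    print(f"{2 ** 16:,} little labeled cubby holes")    # 65,536

    # Latin, accented, Cyrillic, and Chinese characters all fit in two bytes apiece:
    for ch in ("A", "é", "Ж", "水"):
        encoded = ch.encode("utf-16-le")    # little-endian UTF-16, no byte-order mark
        print(ch, len(encoded), "bytes:", encoded.hex())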

So I imagine that the 4096 characters Blogger allows us actually occupy about 8k on disk. Again, a very small size these days, but then, in case you haven't noticed, there are a hell of a lot of people in the world. At some point a revised standard using 32 bits (4 bytes) may have to be adopted, which would again double the memory requirement to store each character but would allow for literally billions of letters, numbers, punctuation marks, wingdings, inflections, complex Chinese ideograms, glyphs, typesetter's marks, flourishes, ligatures, and almost anything else you can imagine. This is a very simplified explanation of how character encoding works in a computer; wars have nearly been fought over proposed standards, as well as over which character actually fills each little empty box in a given scheme. There are multiple ASCII standards, multiple Unicode standards, and other standards you'll probably never hear about.
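
(A 32-bit scheme actually already exists as an encoding called UTF-32, even if almost nothing stores text that way; here's a quick sketch of what it costs per character:)

    print(f"{2 ** 32:,} possible values")    # over four billion

    # Every character, common or exotic, takes four bytes in UTF-32:
    for ch in ("A", "水", "𝄞"):              # the treble clef lives outside the 16-bit range
        print(ch, len(ch.encode("utf-32-le")), "bytes")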

The ENIAC - often called the first true electronic computer.

In summary, I wish Blogger/Google would double the maximum comment length to 8,192 characters, allowing many, many more words to be written while only doubling the size on their end. But hey, it's free, so we can't really complain too vociferously. Or we can, but we'll be ignored or labeled as ingrates.

Just for fun, you hear a lot about bit-depth when it comes to color too: old systems like the Apple II and the Atari were 8-bit machines (and 8-bit color works out to 256 colors), the Sega Genesis generation was 16-bit (16-bit color gives you 65,536 hues), and for more than a decade now the standard on both PCs (a Mac is a PC, by the way, in the sense that it's a Personal Computer; it's not a Windows PC though, which causes much confusion among people who were born without a brain) and modern gaming consoles has been 32-bit color, which allows for a palette of literally millions of colors and is responsible for the astounding realism in games and CAD and animation you see these days. Will 64-bit color ever be needed? Apple and Adobe think so, as they're both building support for that standard into their products even now.
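
(It's the same powers-of-two arithmetic as with text - a quick Python sketch. One footnote: "32-bit color" on PCs is really 24 bits of color, 8 each for red, green, and blue, plus 8 bits of alpha/transparency:)

    for depth in (8, 16, 24, 32):
        print(f"{depth}-bit color -> {2 ** depth:,} possible values")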

Generally speaking, 8-bit is dead. 16-bit continues to be valuable for graphics apps, photography and digital editing, and for the large character sets required to meet the world's needs with one standard. 32-bit color is also important for gaming, photography, scanning, and editing (and for Hollywood CGI effects); but if we go any further than that, current display technology (i.e., your monitor or HD television) will be unable to render so many colors distinctly, decoding video from a Blu-ray disc will take forever and require more horsepower for little to no noticeable gain, and video cards for PCs will render any game that dared to use that many colors as a slideshow... in other words, unplayable.

Music in digital form is stored the same way as graphics and text, which is why you hear about different "bitrates" when you go to pirate music - you soon find out that higher bitrates give you better sound, but at some point the returns begin to diminish (asymptotically toward lossless fidelity, if you want to know), while file sizes increase linearly and start to eat up more space on your iPod than you'd like. Music works a bit differently (mostly it's the nomenclature and not any real physical difference) from graphics or text, but it's always going to be a trade-off in terms of quality vs. size. When most people were on dial-up, MP3 technology was in its youth, and hard drive space was at a premium, you saw lots of music "on the web" encoded at 128 kbit/s or even 96 kbit/s or 48 kbit/s. It sounded a little tinny, but that was the situation. With much more bandwidth available to people generally these days, along with bigger hard drives and more efficient compression schemes than MP3 (e.g., AAC and Ogg Vorbis), most people are shifting to ripping or downloading 160 kbit/s, 192 kbit/s, or even 320 kbit/s music. 320 kbit/s is the last stop on the quality train before you enter Lossless Station, where nothing is thrown away at all; raw CD audio works out to about 1,411 kbit/s.
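
(Where that 1,411 figure comes from, plus rough file sizes for a four-minute song at the common bitrates - back-of-the-envelope Python again, nothing authoritative:)

    sample_rate = 44_100    # samples per second on an audio CD
    bit_depth = 16          # bits per sample
    channels = 2            # stereo

    cd_bitrate = sample_rate * bit_depth * channels    # bits per second
    print(cd_bitrate / 1000, "kbit/s")                 # 1411.2

    song_seconds = 4 * 60
    for kbps in (48, 96, 128, 192, 320):
        megabytes = kbps * 1000 * song_seconds / 8 / 1_000_000
        print(f"{kbps:>3} kbit/s -> about {megabytes:.1f} MB for four minutes")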

So you can see that the bitrates 128, 160, and 192 kbit/s represent compression ratios of approximately 11:1, 9:1, and 7:1 respectively - still a huge space saving over lossless formats like Apple Lossless (ALAC), FLAC, and WAV.
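
(The division, if you want to check me:)

    cd_kbps = 1411.2
    for kbps in (128, 160, 192):
        print(f"{kbps} kbit/s -> roughly {cd_kbps / kbps:.0f}:1 versus raw CD audio")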

Graphics compression formats like JPG, GIF, and PNG, and video compression schemes like MPEG-2, MPEG-4 (DivX), and H.264, follow the same logic as music: with the lossy ones, the more you compress, the smaller the file but the worse the quality. There's no free lunch - just ask Rudolf Clausius.
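
(If you want to watch the trade-off yourself, here's a little sketch using Pillow, a third-party Python imaging library - it assumes Pillow is installed and that "photo.jpg" is some image you actually have lying around; the filename is just a placeholder:)

    import io
    from PIL import Image

    img = Image.open("photo.jpg").convert("RGB")    # placeholder filename
    for quality in (95, 75, 50, 25):
        buffer = io.BytesIO()
        img.save(buffer, format="JPEG", quality=quality)    # lower quality = more compression
        print(f"quality {quality:>2} -> {len(buffer.getvalue()):,} bytes")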

~

This isn't what I really wanted to say, though.

Actually I wanted to let you know what the next week or so on the 'Plane will be like. I'm working on some educational pieces, musings, and poetry (and hopefully some photos of my own) with some solid substance, but that stuff takes inspiration and writes itself when it's ready.

In the meantime I may post an inordinate number of quizzes, surveys, polls, and other interesting but not-too-brain-taxing things, to keep the blog moving while I hopefully complete some new changes to the structure of the blog itself, and also finish up some of the higher-quality, original work I mentioned. So if you like polls and quizzes, be sure to check back often! If not, then don't lose faith - more prose and poetry of my own creation will come soon... Also, I may post short little questions to YOU in an attempt to start interesting conversations. And of course, if anyone has a question for the Oracle, or just a subject they find interesting, by all means submit it to my gmail address, and I'll make sure it sees the light of day.

Lol... I was just thinking about what a typical post this is for this site. If I ever become a "real" author, I'm gonna need one hell of an editor to cut all my tangents and irrelevant babblings. I can't help it - midway through a sentence, something about what I'm typing will occur to me, like, say, the difference between a hyphen, an en-dash, and an em-dash, and I just start in on it and usually lose the main topic, or come back to it at the end. *sigh*

3 comments:

billybytedoc said...

I may be the only reader who made it all the way to the bottom and understood it all (laughing).

Interesting comment: the true ASCII set is a 7-bit set with 128 characters. The 8th bit was used for parity checking, I think. There is an "extended" ASCII of 256 chars which uses all 8 bits.

Unicode rules.

Please no 32 bits. hehe

Metamatician said...

Oh yeah! 7+1, that's right. Thanks for the reminder. I'll update the text.

It was the extended ASCII set I was thinking of then. Long time ago!

An Gabhar Ban said...

I'll sheepishly raise my hand and say: "I read it all but I'm not sure it sunk in." Might take another reading or two. :D
