Use C# to process bit stream-based data
0x00 Background
Recently I needed to process some bit stream-based data. Computers generally handle data in bytes (8 bits), and BinaryReader is no different: even reading a bool consumes a whole byte. With a few methods from the C# base class library, though, data can also be read bit by bit. After finishing the task I found bit-level data quite interesting, so I also tried re-encoding common ASCII characters with 7-bit and 6-bit encodings. I am writing it up as a blog post, partly as a record for myself and partly in the hope that it helps readers with similar needs.
0x01 Reading of bit stream data
Suppose we have a byte b = 35 and need to read its high 4 bits and its low 4 bits as two separate numbers. How do we do that? The base class library has no ready-made method for this, but it can be done in two steps with a binary string.
1. First represent b as the binary string 00100011
2. Then convert the first 4 characters and the last 4 characters into numbers. The core method is:
    Convert.ToInt32("0010", 2);
This gives us bit-level reading.
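The two steps above can be sketched as follows. This is a minimal illustration, not code from the article: the class name NibbleSketch and both method names are made up here, and the bitwise variant is added only to show that it yields the same numbers without going through a string.

```csharp
using System;

// Hypothetical helper illustrating the two-step read; not part of the article's code.
public static class NibbleSketch
{
    // Step 1: byte -> 8-character binary string. Step 2: parse each half as base 2.
    public static int[] SplitViaString(byte b)
    {
        string bin = Convert.ToString(b, 2).PadLeft(8, '0');
        int high = Convert.ToInt32(bin.Substring(0, 4), 2);
        int low = Convert.ToInt32(bin.Substring(4, 4), 2);
        return new[] { high, low };
    }

    // Alternative for comparison: shift and mask, no string involved.
    public static int[] SplitViaBits(byte b)
    {
        return new[] { b >> 4, b & 0x0F };
    }
}
```

For b = 35 (00100011) both methods return 2 for the high half (0010) and 3 for the low half (0011).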
There are several ways to do the first step, converting a byte into a binary string:
1. The simplest is Convert.ToString(b, 2). It omits leading zeros, so if the result is shorter than 8 characters, pad the high bits with 0.
2. AND the byte with 1, 2, 4, 8 ... 128 in turn, extracting the bits from low to high.
3. AND the byte with 128, then shift the byte left by one and AND it with 128 again, extracting the bits from high to low.
The first method generates a large number of string objects. I did not find much difference between the second and third methods and chose the third purely by feel. The code is as follows:
    public static char[] ByteToBinString(byte b)
    {
        var result = new char[8];
        for (int i = 0; i < 8; i++)
        {
            // Test the highest bit, then shift the next bit into its place.
            var temp = b & 128;
            result[i] = temp == 0 ? '0' : '1';
            b = (byte)(b << 1);
        }
        return result;
    }
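For comparison, the first two approaches from the list could look roughly like this. It is only a sketch: BinStringSketch, ViaConvert and ViaMasks are names invented for illustration and do not appear in the article.

```csharp
using System;

// Hypothetical sketch of approaches 1 and 2; names are illustrative only.
public static class BinStringSketch
{
    // Approach 1: Convert.ToString(b, 2) drops leading zeros, so pad to 8 characters.
    public static string ViaConvert(byte b)
    {
        return Convert.ToString(b, 2).PadLeft(8, '0');
    }

    // Approach 2: AND with the masks 1, 2, 4, ..., 128, filling from the low bit up.
    public static string ViaMasks(byte b)
    {
        var result = new char[8];
        for (int i = 0; i < 8; i++)
        {
            int mask = 1 << i;
            // Bit i (counted from the low end) belongs at string index 7 - i.
            result[7 - i] = (b & mask) == 0 ? '0' : '1';
        }
        return new string(result);
    }
}
```

Both methods return "00100011" for the byte 35, the same string the shift-based version produces.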
To convert a byte[] into a binary string, you can do this:
    public static string BytesToBinString(byte[] data)
    {
        var binString = new StringBuilder(data.Length * 8);
        for (int i = 0; i < data.Length; i++)
        {
            binString.Append(ByteToBinString(data[i]));
        }
        return binString.ToString();
    }
This way, once the byte[] data is obtained, it can be converted into a binary string and kept. Data is then read from the binary string by bit offset and bit length and converted to bool, Int16, Int32, and so on. Based on this idea we can write a BitReader class that stores the binary string in a StringBuilder and provides Read methods for reading data from it.
To make stream-style processing easier, a Position property records the current offset. Read methods that read from the current position also advance Position. For example, ReadInt16 reads 16 bits starting at Position, converts them to an Int16, returns it, and moves Position forward by 16 bits. The rule is: when a read specifies its own starting offset, Position does not move; when a read starts from the current Position, Position moves. Part of the BitReader class is shown below:
    public class BitReader
    {
        public readonly StringBuilder BinString;
        public int Position { get; set; }

        public BitReader(byte[] data)
        {
            BinString = new StringBuilder(data.Length * 8);
            for (int i = 0; i < data.Length; i++)
            {
                BinString.Append(ByteToBinString(data[i]));
            }
            Position = 0;
        }

        // Reads 8 bits at the given bit offset; Position does not move.
        public byte ReadByte(int offset)
        {
            var bin = BinString.ToString(offset, 8);
            return Convert.ToByte(bin, 2);
        }

        // Reads 8 bits at the current Position and advances Position.
        public byte ReadByte()
        {
            var result = ReadByte(Position);
            Position += 8;
            return result;
        }

        // Reads bitLength bits at the given bit offset; Position does not move.
        public int ReadInt(int offset, int bitLength)
        {
            var bin = BinString.ToString(offset, bitLength);
            return Convert.ToInt32(bin, 2);
        }

        // Reads bitLength bits at the current Position and advances Position.
        public int ReadInt(int bitLength)
        {
            var result = ReadInt(Position, bitLength);
            Position += bitLength;
            return result;
        }

        public static char[] ByteToBinString(byte b)
        {
            var result = new char[8];
            for (int i = 0; i < 8; i++)
            {
                var temp = b & 128;
                result[i] = temp == 0 ? '0' : '1';
                b = (byte)(b << 1);
            }
            return result;
        }
    }
Suppose byte[] buff = {35,12}. Reading it bit by bit with BitReader can look like this:
    var reader = new BitReader(buff);  // the binary string is 0010001100001100
    var num1 = reader.ReadInt(4);      // reads 4 bits from the current Position as an int; Position advances 4 bits; result is 2, Position = 4
    var num2 = reader.ReadInt(5, 6);   // reads 6 bits starting at bit offset 5 as an int; Position does not move; result is 24 (011000), Position = 4
    var b = reader.ReadBool();         // reads 1 bit from the current Position as a bool; Position advances 1 bit; result is False, Position = 5
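The example calls ReadBool, and the decoders later in the post use a Remain property, neither of which is included in the partial listing. The sketch below shows one way they might be written. It is an assumption rather than the author's code: the class is named BitReaderSketch to keep it separate, and Remain is taken to mean the number of unread bits after Position.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for the article's BitReader, reduced to what the example needs.
public class BitReaderSketch
{
    private readonly StringBuilder _bits;
    public int Position { get; set; }

    public BitReaderSketch(byte[] data)
    {
        _bits = new StringBuilder(data.Length * 8);
        foreach (byte b in data)
        {
            _bits.Append(Convert.ToString(b, 2).PadLeft(8, '0'));
        }
    }

    // Assumed meaning of Remain: number of bits not yet consumed.
    public int Remain
    {
        get { return _bits.Length - Position; }
    }

    // Reads at an explicit offset; Position does not move.
    public int ReadInt(int offset, int bitLength)
    {
        return Convert.ToInt32(_bits.ToString(offset, bitLength), 2);
    }

    // Reads at the current Position and advances it.
    public int ReadInt(int bitLength)
    {
        int result = ReadInt(Position, bitLength);
        Position += bitLength;
        return result;
    }

    // A single bit at an explicit offset; Position does not move.
    public bool ReadBool(int offset)
    {
        return _bits[offset] == '1';
    }

    // A single bit at the current Position; Position advances by 1.
    public bool ReadBool()
    {
        bool result = ReadBool(Position);
        Position += 1;
        return result;
    }
}
```

Run against {35, 12}, this sketch gives 2 for ReadInt(4), 24 for ReadInt(5, 6) and false for ReadBool(), leaving Position at 5 and Remain at 11.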
0x02 Writing of bit stream data
Writing data to a bit stream is the reverse process. We implement it with a BitWriter class, which holds a StringBuilder for the binary string. When writing a value, you pass in the value and the number of bits to use for it. After writing is finished, the binary string held in the StringBuilder is padded to a multiple of 8 bits, converted to a byte[] 8 bits at a time, and returned. The core of BitWriter is as follows:
    public class BitWriter
    {
        public readonly StringBuilder BinString;

        public BitWriter()
        {
            BinString = new StringBuilder();
        }

        public BitWriter(int bitLength)
        {
            // Round the capacity up to a whole number of bytes.
            var add = (8 - bitLength % 8) % 8;
            BinString = new StringBuilder(bitLength + add);
        }

        public void WriteByte(byte b, int bitLength = 8)
        {
            var bin = Convert.ToString(b, 2);
            AppendBinString(bin, bitLength);
        }

        public void WriteInt(int i, int bitLength)
        {
            var bin = Convert.ToString(i, 2);
            AppendBinString(bin, bitLength);
        }

        public void WriteChar7(char c)
        {
            var b = Convert.ToByte(c);
            var bin = Convert.ToString(b, 2);
            AppendBinString(bin, 7);
        }

        public byte[] GetBytes()
        {
            Check8();
            var len = BinString.Length / 8;
            var result = new byte[len];
            for (int i = 0; i < len; i++)
            {
                var bits = BinString.ToString(i * 8, 8);
                result[i] = Convert.ToByte(bits, 2);
            }
            return result;
        }

        public string GetBinString()
        {
            Check8();
            return BinString.ToString();
        }

        // Left-pads bin with zeros to bitLength and appends it.
        private void AppendBinString(string bin, int bitLength)
        {
            if (bin.Length > bitLength)
                throw new Exception("len is too short");
            var add = bitLength - bin.Length;
            for (int i = 0; i < add; i++)
            {
                BinString.Append('0');
            }
            BinString.Append(bin);
        }

        // Pads with trailing zeros to the next byte boundary.
        // The outer % 8 avoids appending a full extra byte when the length is already aligned.
        private void Check8()
        {
            var add = (8 - BinString.Length % 8) % 8;
            for (int i = 0; i < add; i++)
            {
                BinString.Append("0");
            }
        }
    }
Here is a simple example:
    var writer = new BitWriter();
    writer.WriteInt(12, 5);           // writes 12 using 5 bits; the binary string is now 01100
    writer.WriteInt(8, 16);           // writes 8 using 16 bits; the binary string is now 011000000000000001000
    var result = writer.GetBytes();   // aligned to 8 bits: 011000000000000001000000
                                      // the returned result is [96,0,64]
0x03 7-bit character encoding
The ASCII characters we commonly use are stored in 8 bits each, but ASCII only defines 7 bits and the highest bit is always 0. For an English article we can therefore re-encode with 7 bits per character without losing information. The encoding process takes the characters of the article in order, writes each one with 7 bits using BitWriter, and finally obtains the newly encoded byte[]. To be able to read the data back correctly, we stipulate that an 8-bit value of 2 marks the start of the data, and that the following 16 bits hold the number of characters that follow. The code is as follows:
    public byte[] Encode(string text)
    {
        // 24 header bits: 8-bit start marker + 16-bit character count.
        var len = text.Length * 7 + 24;
        var writer = new BitWriter(len);
        writer.WriteByte(2);
        writer.WriteInt(text.Length, 16);
        for (int i = 0; i < text.Length; i++)
        {
            var b = Convert.ToByte(text[i]);
            writer.WriteByte(b, 7);
        }
        return writer.GetBytes();
    }
When reading data, we first look for the start identifier, then read out the number of characters, and read the characters in sequence according to the number of characters. The code is as follows:
    public string Decode(byte[] data)
    {
        var reader = new BitReader(data);
        // Skip bytes until the start marker (2) is found.
        while (reader.Remain > 8)
        {
            var start = reader.ReadByte();
            if (start == 2)
                break;
        }
        var len = reader.ReadInt(16);
        var result = new StringBuilder(len);
        for (int i = 0; i < len; i++)
        {
            var b = reader.ReadInt(7);
            var ch = Convert.ToChar(b);
            result.Append(ch);
        }
        return result.ToString();
    }
Because of the data header, encoding only a few characters actually makes the data longer. The more characters there are, however, the more is saved by encoding.
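The trade-off can be checked with a little arithmetic. The sketch below assumes the 24-bit header described above (8-bit start marker plus 16-bit count) and padding only up to the next byte boundary. SizeSketch and EncodedBytes are illustrative names, not part of the article's code.

```csharp
// Hypothetical size calculator for the header-plus-payload format described above.
public static class SizeSketch
{
    // Number of bytes needed for charCount characters at bitsPerChar bits each,
    // plus the 24-bit header, rounded up to a whole byte.
    public static int EncodedBytes(int charCount, int bitsPerChar)
    {
        int bits = 24 + charCount * bitsPerChar;
        return (bits + 7) / 8;
    }
}
```

Under these assumptions, 5 characters take 8 bytes instead of 5, while 100 characters take 91 bytes instead of 100. The 7-bit form only becomes strictly shorter than plain 8-bit storage from 32 characters on.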
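0x04 6-bit character encoding
From the standpoint of saving data, the number of bits per character can be reduced further if some loss of information is acceptable, for example losing letter case. 26 letters + 10 digits + some symbols fit into 6 bits (64 values). This encoding can no longer use the ASCII mapping, though. We can define our own mapping, for example mapping 0-9 to the ten digits and so on, or use a custom dictionary, the legendary codebook. Anyone who watches Chinese spy dramas will know the codebook: it is simply a dictionary that remaps characters so the plaintext can be recovered. It amounts to a simple monoalphabetic substitution, so its encryption strength is very low, and with enough data samples it is easily broken by statistics. Below we try re-encoding with 6 bits based on a custom dictionary.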
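The post does not show the dictionary itself, so here is one possible layout as a sketch. Everything in it is an assumption: the class name DictSketch, the order of the 64 characters, and the choice of symbols. The only detail taken from the post is that index 10 holds '*', which the encoder uses as the fallback for unknown characters. A Dictionary lookup replaces the linear scan.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical 64-entry dictionary; the layout is illustrative, not the author's.
public static class DictSketch
{
    // Indices 0-9: digits, 10: '*', 11-36: A-Z, 37-63: space and punctuation.
    public static readonly char[] Custom =
        ("0123456789" + "*" + "ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
         " .,;:!?'\"-()[]/+=<>@#$%&_~^").ToCharArray();

    private static readonly Dictionary<char, int> Index = BuildIndex();

    private static Dictionary<char, int> BuildIndex()
    {
        // 6 bits can address exactly 64 entries.
        if (Custom.Length != 64)
            throw new InvalidOperationException("Dictionary must hold 64 characters.");
        var map = new Dictionary<char, int>();
        for (int i = 0; i < Custom.Length; i++)
        {
            map[Custom[i]] = i;
        }
        return map;
    }

    // Returns the 6-bit code for c, ignoring case; unknown characters map to '*' (10).
    public static int GetChar6Index(char c)
    {
        int i;
        return Index.TryGetValue(char.ToUpperInvariant(c), out i) ? i : 10;
    }
}
```

With this layout 'A' and 'a' both map to 11, and a line break, which is not in the dictionary, maps to 10 and decodes as '*'.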
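Encoding process:
Write the message header just as in the 7-bit encoding, then take the characters of the text in order, look up each one's number in the dictionary, and write that number into BitWriter using 6 bits.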
    public byte[] Encode(string text)
    {
        text = text.ToUpper();
        var len = text.Length * 6 + 24;
        var writer = new BitWriter(len);
        writer.WriteByte(2);
        writer.WriteInt(text.Length, 16);
        for (int i = 0; i < text.Length; i++)
        {
            var index = GetChar6Index(text[i]);
            writer.WriteInt(index, 6);
        }
        return writer.GetBytes();
    }

    private int GetChar6Index(char c)
    {
        for (int i = 0; i < 64; i++)
        {
            if (Dict.Custom[i] == c)
                return i;
        }
        return 10; // not found: fall back to '*'
    }
Decoding process:
Decoding is simple too: find the message header, read the data 6 bits at a time, and look up the corresponding character in the dictionary:
    public string Decode(byte[] data)
    {
        var reader = new BitReader(data);
        // Skip bytes until the start marker (2) is found.
        while (reader.Remain > 8)
        {
            var start = reader.ReadByte();
            if (start == 2)
                break;
        }
        var len = reader.ReadInt(16);
        var result = new StringBuilder(len);
        for (int i = 0; i < len; i++)
        {
            var index = reader.ReadInt(6);
            var ch = Dict.Custom[index];
            result.Append(ch);
        }
        return result.ToString();
    }
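The same text becomes shorter still when encoded with the 6-bit custom dictionary, but letter case, line breaks and other formatting are lost.
From an encryption standpoint, you could set up N custom dictionaries (say 10) and use M bits in the message header (for example 4 bits) to indicate which dictionary was used. Each encoding then picks a dictionary at random, decoding selects the matching dictionary from the 4-bit value, and replacing the dictionaries periodically makes the scheme harder to break. Interested readers can try it themselves.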
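A rough sketch of the multi-dictionary idea follows. It is illustrative only and makes several assumptions not found in the post: the names MultiDictSketch, GetDict, EncodeBits and DecodeBits are invented, the dictionaries are simple rotations of one base alphabet (a placeholder that is no stronger than a Caesar shift), and the start marker and character count are left out so that only the 4-bit dictionary id and the 6-bit codes remain.

```csharp
using System;
using System.Text;

// Hypothetical sketch: a 4-bit dictionary id followed by 6-bit character codes.
public static class MultiDictSketch
{
    // Assumed 64-character base alphabet; '*' is the fallback character.
    private const string Base =
        "0123456789*ABCDEFGHIJKLMNOPQRSTUVWXYZ .,;:!?'\"-()[]/+=<>@#$%&_~^";

    // Placeholder scheme: dictionary k is the base alphabet rotated left by k.
    public static string GetDict(int k)
    {
        return Base.Substring(k) + Base.Substring(0, k);
    }

    // dictId must fit in 4 bits (0-15). Returns the bit string, unpadded.
    public static string EncodeBits(string text, int dictId)
    {
        string dict = GetDict(dictId);
        var bits = new StringBuilder();
        bits.Append(Convert.ToString(dictId, 2).PadLeft(4, '0'));
        foreach (char c in text.ToUpperInvariant())
        {
            int index = dict.IndexOf(c);
            if (index < 0) index = dict.IndexOf('*'); // unknown character
            bits.Append(Convert.ToString(index, 2).PadLeft(6, '0'));
        }
        return bits.ToString();
    }

    // Reads the 4-bit dictionary id, then decodes 6 bits at a time.
    public static string DecodeBits(string bits)
    {
        int dictId = Convert.ToInt32(bits.Substring(0, 4), 2);
        string dict = GetDict(dictId);
        var text = new StringBuilder();
        for (int pos = 4; pos + 6 <= bits.Length; pos += 6)
        {
            text.Append(dict[Convert.ToInt32(bits.Substring(pos, 6), 2)]);
        }
        return text.ToString();
    }
}
```

Encoding "Hello 42" with any dictionary id from 0 to 15 and decoding the result gives back "HELLO 42". In a real variant the dictionaries would be independent random permutations shared between both sides, rather than rotations.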
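0x05 Final words
The above is what I learned from processing bit stream data. It is just one approach that I came up with myself, and it met my needs. If there is a more efficient or more reasonable approach, I would be glad to hear about it. The encoding and decoding examples were written for fun and will probably never see real use. After all, bandwidth is plentiful these days, and there are many far more reliable ways to encrypt data.
Sample code: https://github.com/durow/TestArea/tree/master/BitStream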
