字符编码检测

一、题目
猜测输入文本的编码格式UTF-8 UTF-16 GBK 三选一，以上都不是则为OTHER
要求：不能使用语言提供的转码功能
输入：不同编码格式的文本
输出：UTF-8、UTF-16、GBK、OTHER

二、注意事项
1.测试用例不含BOM头部，其中UTF-16编码采用小端存储
2.由于牛客网字符编码有兼容性问题，故对输入文本进行再编码预处理，并规定前三个字符为文本长度。例如，文本内容（UTF8编码）为123的16进制表示形式为313233，长度为6位，故进行再编码处理后为006313233，并将其以字符串形式保存为UTF8编码格式的文本
3.预处理函数
1.C语言：
//预处理还原函数
void hexStringToIntArray(const unsigned char* buff,unsigned char* retBuff,long *retSize){
//由于牛客网字符编码兼容性有问题，故将输入数据再编码为了16进制格式,并约定前三个数字为文本长度，这里需要将再编码的16进制还原为原始的二进制数据
unsigned int xbuffLength = 0;

//编码预处理的时候把‘\n’去掉了，所以约定前三位为文本长度，这里取前三位,获取文本长度
unsigned char temp[4];
temp[0]=buff[0];
temp[1]=buff[1];
temp[2]=buff[2];
temp[3]='\0';
sscanf(temp, "%x", &xbuffLength);

//还原
for (unsigned int i = 3; i < (xbuffLength+3); ++i){
if ((i-3) % 2 != 0) {
unsigned int a;
unsigned char temp[3];
temp[0]=buff[i-1];
temp[1]=buff[i];
temp[2]='\0';
sscanf(temp, "%x", &a);
retBuff[(*retSize)++] = a;
}
}

}
使用实例：
int main() {
unsigned char buff[2048];
fscanf(stdin, "%s",buff);

unsigned char xbuff[2048];
long length = 0;

//将预处理的编码还原回去
hexStringToIntArray(buff, xbuff, &length);

//考生需要实现的函数
char *str = verifyStringEncode(xbuff,length);
printf("%s\n", str);
return 0;
}
2.Java语言：
十六进制字符串转short []的Java代码：
static short[] hexStringToShortArray(String s) {
short[] data = new short[s.length() / 2];
int i = 0;
int k = 0;
while (i < s.length()) {
try {
char c1 = s.charAt(i);
char c2 = '\0';
if (i + 1 < s.length()) {
c2 = s.charAt(i + 1);
}

String temp = c1 + "";
if (c2 != '\0') {
temp = c1 + "" + c2;
}
data[k] = Short.parseShort(temp, 16);
} catch (NumberFormatException e) {
return null;
}
i = i + 2;
k++;
}
return data;
}
三、编码格式介绍
UTF-8
UTF-8是在互联网上使用最广的一种Unicode的实现方式，是Unicode的实现方式之一。UTF-8 最大的一个特点，就是它是一种变长的编码方式。它可以使用1~6个字节表示一个符号，根据不同的符号而变化字节长度。
UTF-8 的编码规则很简单，只有二条：
1）对于单字节的符号，字节的第一位设为0，后面7位为这个符号的 Unicode 码。因此对于英语字母，UTF-8 编码和 ASCII 码是相同的。
2）对于n字节的符号（n > 1），第一个字节的前n位都设为1，第n + 1位设为0，后面字节的前两位一律设为10。剩下的没有提及的二进制位，全部为这个符号的 Unicode 码。
下表总结了编码规则，字母x表示可用编码的位。
字节数 | UTF-8编码方式
| （二进制）
----------------------+---------------------------------------------
1字节 | 0xxxxxxx
2字节 | 110xxxxx 10xxxxxx
3字节 | 1110xxxx 10xxxxxx 10xxxxxx
4字节 | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
5字节 | 111110xx 10xxxxxx 10xxxxxx 10xxxxxx
6字节 | 1111110x 10xxxxxx 10xxxxxx 10xxxxxx
跟据上表，解读 UTF-8 编码非常简单。如果一个字节的第一位是0，则这个字节单独就是一个字符；如果第一位是1，则连续有多少个1，就表示当前字符占用多少个字节。此外，UTF-8编码的文本文档，有的带有BOM (Byte Order Mark, 字节序标志)，即0xEF, 0xBB, 0xBF，有的没有。

GBK
GB2312是1981年开始实施的一套汉字处理的编码方案，GB是“基本”的意思，GB2312是对ASCII进行了扩展。GB2312编码共收录汉字6763个，其中一级汉字3755个，二级汉字3008个，一级汉字比二级汉字更常用。同时，GB2312编码收录了包括拉丁字母、希腊字母、日文平假名及片假名字母、俄语西里尔字母在内的682个全角字符。GB2312编码范围：A1A1－FEFE，并对所收录字符进行了“分区”处理，共94个区，每区含有94个位，共8836个码位，编码是从0xA1开始1区，到94区结束，这种表示方式也称为区位码。GB2312规定对ASCII采用单字节表示，对收录的汉字及符号采用两个字节表示，第一个字节为“高字节”，对应94个区；第二个字节为“低字节”，对应94个位。所以它的区位码范围是：0101－9494。区号和位号分别加上0xA0就是GB2312编码。
GBK是对GB2312的扩充，扩展到总体编码范围为8140-FEFE，首字节在81-FE 之间，尾字节在40-FE 之间，剔除 xx7F一条线。总计23940 个码位。
1.GBK用一个字节表示一个英文字符和一些基本符号和半角符号，用两个字节表示一个汉字和全角符号和一些我们日常使用的符号。
2.GBK利用了ASCII的127个字符之后空余的部分。
3.数值小于127的字节表示ASCII中原有字符，两个连续数值都大于127的字节表示一个汉字字符。
4.使用GBK编码，当读取到一个数值上小于127的字节时当作一个ASCII中原有的字符处理。读到一个数值大于127的字节时必定会继续读取下一个字节，下一个字节的数值无需大于127，将两个字节一起组合形成一个汉字字符。

UTF-16
UTF-16也是Unicode的实现方式之一，但UTF-16和UTF-8是彻底不同的两种Unicode实现形式。Unicode是一个符号集，它对世界上大部分的文字系统进行了整理、编码，使得电脑可以用更为简单的方式来呈现和处理文字。解决传统的字符编码方案的局限。Unicode一开始由两个字节表示（usc-2），后来扩充到了四个字节（usc-4）。usc-4的格式定义为第一个字节的首位固定为0，后面的7位表示128个组，每个组又有256个面，每个面可以表示65536个字符。其总表示的范围为00000000 - 7FFFFFFF。0组0面的字符集合被称为BMP（Basic Multilingual Plane）也被称为基本多文种平面，对应于usc-2。其中平面15和平面16上只是定义了两个各占65534个码位的专用区（Private Use Area），分别是0xF0000-0xFFFFD和0x100000-0x10FFFD。所谓专用区，就是保留给大家放自定义字符的区域，可以简写为PUA。平面0（基本多文种平面）也有一个专用区：0xE000-0xF8FF，有6400个码位。平面0的0xD800-0xDFFF，共2048个码位，是一个被称作代理区（Surrogate）的特殊区域。代理区的目的用两个UTF-16字符表示BMP以外的字符。
UTF-16编码涉及大小端问题，通常情况下,UTF-16的文本会通过在文件开头以名为BOM（Byte Order Mark）的字符来表明文件是Big Endian（大端）还是Little Endian（小段）。小端BOM为0xFF, 0xFE，大端BOM为0xFE, 0xFF，但携带BOM并不是强制要求。UTF-16 编码以16位无符号整数为单位进行编码，其中汉字的编码区间为0x4E00–0x9FA5。此外，UTF-16通过基本多文种平面的代理区的编码空间0xD800-0xDFFF来对0x10000到0x0FFFF（即辅助平面）进行字符映射。

Java 解法, 执行用时: 42ms, 内存消耗: 11380KB, 提交时间: 2020-10-31

import java.util.*;

public class Main{
    static boolean isAscii = true;  //如果全是首位为0的字节，无法判断是UTF-8，还是GBK，还是UTF-16；，所以输出OTHER
    public static void main(String[] args){
        Scanner in = new Scanner(System.in);
        String str = in.next();
        str = str.substring(3);  //str字符串前三位代表的是输入字符串长度，但总比实际长度大1，
                                 //不知道怎么回事，猜测是牛客转码把结尾的'\n'也算进去了，不管怎样决定不用前三位
        short[] code = hexStringToShortArray(str);
        String ans = "";
        if(isUTF8(code)) ans = "UTF-8";
        else if(isAscii) ans = "OTHER";
        else if(isGBK(code)) ans = "GBK";
        else if(isUTF16(code)) ans = "UTF-16";
        else ans = "OTHER";
        System.out.println(ans);
    }
    
    /*
    UTF-16是以16位无符号整数为单位进行编码的，而这里一个short存8位，所以code数组的元素数必定要是2的倍数
    当UTF-16在BMP内时占两个字节，这两个字节是任意的没有要求的。而当UTF-16在BMP外，就是四个字节。因为题目说采用小端存储，
    所以低位在前，高位在后。所以UTF-16四个字节时低位两个字节的前6位是110111，高位两个字节的前6位是110110.
    但是两个字节时也有可能是上述两个前缀，这个冲突问题，呃，我也不知道...
    这道题目在这里并没怎么提到知识，应该考虑倍数问题即可
    */
    public static boolean isUTF16(short[] code){  
       if(code.length % 2 == 0) return true;
        return false;
    }
    
    /*
    1、数值小于127的字节表示ASCII中原有字符，此时GBK占用一个字节
    2、当数值大于127时，就需要读取下一个字节，此时GBK占用两个字节，高位的范围是81-fe，低位的范围是40-fe，但是不能是xx7f
    3、当数值等于127时，题目没说，索性认为不是GBK
    */
    public static boolean isGBK(short[] code){
        for(int i = 0; i < code.length; i++){
            if((int)code[i] > 127){
                if(i + 1 >= code.length) return false;
                if((int)code[i] < 8 * 16 + 1 || (int)code[i] > 15 * 16 + 14)
                    return false;
                i++;
                if((int)code[i] < 4* 16 + 0 || (int)code[i] > 15 * 16 + 14 || (int)code[i] == 7 * 16 + 15)
                    return false;
            }
            else if((int)code[i] == 127) return false;
        }
        return true;
    }
    /*
    UTF-8正如题目所总结的规律去判断，这里顺便判断是否全是以0为开头的字节isAscii
    */
    public static boolean isUTF8(short[] code){
        for(int i = 0; i < code.length; i++){
            String str = Integer.toBinaryString((int)code[i]);
            while(str.length() < 8) str = 0 + str;
            if(str.charAt(0) != '0'){
                isAscii = false;
                int j = 0;
                while(j < 8 && str.charAt(j) != '0') j++;
                if(j >= 7 || j == 1) return false;
                j--;
                for(int k = i + 1; k <= code.length; k++){
                    if(k == code.length) return false;
                    String temp = Integer.toBinaryString((int)code[k]);
                    while(temp.length() < 8) temp = 0 + temp;
                    if(temp.charAt(0) == '1' && temp.charAt(1) == '0')
                        j--;
                    else return false;
                    if(j == 0){
                        i = k;
                        break;
                    }
                }
            }
        }
        if(isAscii) return false;
        else return true;
    }
    
    public static short[] hexStringToShortArray(String s) {
        short[] data = new short[s.length() / 2];
        int i = 0;
        int k = 0;
        while (i < s.length()) {
            try {
                char c1 = s.charAt(i);
                char c2 = '\0';
                if (i + 1 < s.length()) {
                    c2 = s.charAt(i + 1);
                }
                String temp = c1 + "";
                if (c2 != '\0') {
                    temp = c1 + "" + c2;
                }
                data[k] = Short.parseShort(temp, 16);
            } catch (NumberFormatException e) {
                return null;
            }
            i = i + 2;
            k++;
        }
        return data;
     }
}

Java 解法, 执行用时: 43ms, 内存消耗: 12616KB, 提交时间: 2020-07-30

import java.util.*;
           
public class Main{
    static boolean isAscii = true;  //如果全是首位为0的字节，无法判断是UTF-8，还是GBK，还是UTF-16；，所以输出OTHER
    public static void main(String[] args){
        Scanner in = new Scanner(System.in);
        String str = in.next();
        str = str.substring(3);
        short[] code = hexStringToShortArray(str);
        String ans = "";
        if(isUTF8(code)) ans = "UTF-8";
        else if(isAscii) ans = "OTHER";
        else if(isGBK(code)) ans = "GBK";
        else if(isUTF16(code)) ans = "UTF-16";
        else ans = "OTHER";
        System.out.println(ans);
    }
               
    
    public static boolean isUTF16(short[] code){
       if(code.length % 2 == 0) return true;
        return false;
    }
               
    
    public static boolean isGBK(short[] code){
        for(int i = 0; i < code.length; i++){
            if((int)code[i] > 127){
                if(i + 1 >= code.length) return false;
                if((int)code[i] < 8 * 16 + 1 || (int)code[i] > 15 * 16 + 14)
                    return false;
                i++;
                if((int)code[i] < 4* 16 + 0 || (int)code[i] > 15 * 16 + 14 || (int)code[i] == 7 * 16 + 15)
                    return false;
            }
            else if((int)code[i] == 127) return false;
        }
        return true;
    }
    
    public static boolean isUTF8(short[] code){
        for(int i = 0; i < code.length; i++){
            String str = Integer.toBinaryString((int)code[i]);
            while(str.length() < 8) str = 0 + str;
            if(str.charAt(0) != '0'){
                isAscii = false;
                int j = 0;
                while(j < 8 && str.charAt(j) != '0') j++;
                if(j >= 7 || j == 1) return false;
                j--;
                for(int k = i + 1; k <= code.length; k++){
                    if(k == code.length) return false;
                    String temp = Integer.toBinaryString((int)code[k]);
                    while(temp.length() < 8) temp = 0 + temp;
                    if(temp.charAt(0) == '1' && temp.charAt(1) == '0')
                        j--;
                    else return false;
                    if(j == 0){
                        i = k;
                        break;
                    }
                }
            }
        }
        if(isAscii) return false;
        else return true;
    }
               
    public static short[] hexStringToShortArray(String s) {
        short[] data = new short[s.length() / 2];
        int i = 0;
        int k = 0;
        while (i < s.length()) {
            try {
                char c1 = s.charAt(i);
                char c2 = '\0';
                if (i + 1 < s.length()) {
                    c2 = s.charAt(i + 1);
                }
                String temp = c1 + "";
                if (c2 != '\0') {
                    temp = c1 + "" + c2;
                }
                data[k] = Short.parseShort(temp, 16);
            } catch (NumberFormatException e) {
                return null;
            }
            i = i + 2;
            k++;
        }
        return data;
     }
}

Java 解法, 执行用时: 47ms, 内存消耗: 11740KB, 提交时间: 2020-09-13

import java.util.*;
           
public class Main{
    static boolean isAscii = true;  //如果全是首位为0的字节，无法判断是UTF-8，还是GBK，还是UTF-16；，所以输出OTHER
    public static void main(String[] args){
        Scanner in = new Scanner(System.in);
        String str = in.next();
        str = str.substring(3);
        short[] code = hexStringToShortArray(str);
        String ans = "";
        if(isUTF8(code)) ans = "UTF-8";
        else if(isAscii) ans = "OTHER";
        else if(isGBK(code)) ans = "GBK";
        else if(isUTF16(code)) ans = "UTF-16";
        else ans = "OTHER";
        System.out.println(ans);
    }
               
    
    public static boolean isUTF16(short[] code){
       if(code.length % 2 == 0) return true;
        return false;
    }
               
    
    public static boolean isGBK(short[] code){
        for(int i = 0; i < code.length; i++){
            if((int)code[i] > 127){
                if(i + 1 >= code.length) return false;
                if((int)code[i] < 8 * 16 + 1 || (int)code[i] > 15 * 16 + 14)
                    return false;
                i++;
                if((int)code[i] < 4* 16 + 0 || (int)code[i] > 15 * 16 + 14 || (int)code[i] == 7 * 16 + 15)
                    return false;
            }
            else if((int)code[i] == 127) return false;
        }
        return true;
    }
    
    public static boolean isUTF8(short[] code){
        for(int i = 0; i < code.length; i++){
            String str = Integer.toBinaryString((int)code[i]);
            while(str.length() < 8) str = 0 + str;
            if(str.charAt(0) != '0'){
                isAscii = false;
                int j = 0;
                while(j < 8 && str.charAt(j) != '0') j++;
                if(j >= 7 || j == 1) return false;
                j--;
                for(int k = i + 1; k <= code.length; k++){
                    if(k == code.length) return false;
                    String temp = Integer.toBinaryString((int)code[k]);
                    while(temp.length() < 8) temp = 0 + temp;
                    if(temp.charAt(0) == '1' && temp.charAt(1) == '0')
                        j--;
                    else return false;
                    if(j == 0){
                        i = k;
                        break;
                    }
                }
            }
        }
        if(isAscii) return false;
        else return true;
    }
               
    public static short[] hexStringToShortArray(String s) {
        short[] data = new short[s.length() / 2];
        int i = 0;
        int k = 0;
        while (i < s.length()) {
            try {
                char c1 = s.charAt(i);
                char c2 = '\0';
                if (i + 1 < s.length()) {
                    c2 = s.charAt(i + 1);
                }
                String temp = c1 + "";
                if (c2 != '\0') {
                    temp = c1 + "" + c2;
                }
                data[k] = Short.parseShort(temp, 16);
            } catch (NumberFormatException e) {
                return null;
            }
            i = i + 2;
            k++;
        }
        return data;
     }
}

Java 解法, 执行用时: 48ms, 内存消耗: 12508KB, 提交时间: 2020-10-31

import java.util.*;

public class Main{
    static boolean isAscii = true;  //如果全是首位为0的字节，无法判断是UTF-8，还是GBK，还是UTF-16；，所以输出OTHER
    public static void main(String[] args){
        Scanner in = new Scanner(System.in);
        String str = in.next();
        str = str.substring(3);  //str字符串前三位代表的是输入字符串长度，但总比实际长度大1，
                                 //不知道怎么回事，猜测是牛客转码把结尾的'\n'也算进去了，不管怎样决定不用前三位
        short[] code = hexStringToShortArray(str);
        String ans = "";
        if(isUTF8(code)) ans = "UTF-8";
        else if(isAscii) ans = "OTHER";
        else if(isGBK(code)) ans = "GBK";
        else if(isUTF16(code)) ans = "UTF-16";
        else ans = "OTHER";
        System.out.println(ans);
    }
    
    /*
    UTF-16是以16位无符号整数为单位进行编码的，而这里一个short存8位，所以code数组的元素数必定要是2的倍数
    当UTF-16在BMP内时占两个字节，这两个字节是任意的没有要求的。而当UTF-16在BMP外，就是四个字节。因为题目说采用小端存储，
    所以低位在前，高位在后。所以UTF-16四个字节时低位两个字节的前6位是110111，高位两个字节的前6位是110110.
    但是两个字节时也有可能是上述两个前缀，这个冲突问题，呃，我也不知道...
    这道题目在这里并没怎么提到知识，应该考虑倍数问题即可
    */
    public static boolean isUTF16(short[] code){  
       if(code.length % 2 == 0) return true;
        return false;
    }
    
    /*
    1、数值小于127的字节表示ASCII中原有字符，此时GBK占用一个字节
    2、当数值大于127时，就需要读取下一个字节，此时GBK占用两个字节，高位的范围是81-fe，低位的范围是40-fe，但是不能是xx7f
    3、当数值等于127时，题目没说，索性认为不是GBK
    */
    public static boolean isGBK(short[] code){
        for(int i = 0; i < code.length; i++){
            if((int)code[i] > 127){
                if(i + 1 >= code.length) return false;
                if((int)code[i] < 8 * 16 + 1 || (int)code[i] > 15 * 16 + 14)
                    return false;
                i++;
                if((int)code[i] < 4* 16 + 0 || (int)code[i] > 15 * 16 + 14 || (int)code[i] == 7 * 16 + 15)
                    return false;
            }
            else if((int)code[i] == 127) return false;
        }
        return true;
    }
    /*
    UTF-8正如题目所总结的规律去判断，这里顺便判断是否全是以0为开头的字节isAscii
    */
    public static boolean isUTF8(short[] code){
        for(int i = 0; i < code.length; i++){
            String str = Integer.toBinaryString((int)code[i]);
            while(str.length() < 8) str = 0 + str;
            if(str.charAt(0) != '0'){
                isAscii = false;
                int j = 0;
                while(j < 8 && str.charAt(j) != '0') j++;
                if(j >= 7 || j == 1) return false;
                j--;
                for(int k = i + 1; k <= code.length; k++){
                    if(k == code.length) return false;
                    String temp = Integer.toBinaryString((int)code[k]);
                    while(temp.length() < 8) temp = 0 + temp;
                    if(temp.charAt(0) == '1' && temp.charAt(1) == '0')
                        j--;
                    else return false;
                    if(j == 0){
                        i = k;
                        break;
                    }
                }
            }
        }
        if(isAscii) return false;
        else return true;
    }
    
    public static short[] hexStringToShortArray(String s) {
        short[] data = new short[s.length() / 2];
        int i = 0;
        int k = 0;
        while (i < s.length()) {
            try {
                char c1 = s.charAt(i);
                char c2 = '\0';
                if (i + 1 < s.length()) {
                    c2 = s.charAt(i + 1);
                }
                String temp = c1 + "";
                if (c2 != '\0') {
                    temp = c1 + "" + c2;
                }
                data[k] = Short.parseShort(temp, 16);
            } catch (NumberFormatException e) {
                return null;
            }
            i = i + 2;
            k++;
        }
        return data;
     }
}

Java 解法, 执行用时: 52ms, 内存消耗: 12648KB, 提交时间: 2020-11-01

import java.util.*;

public class Main{
    static boolean isAscii = true;  //如果全是首位为0的字节，无法判断是UTF-8，还是GBK，还是UTF-16；，所以输出OTHER
    public static void main(String[] args){
        Scanner in = new Scanner(System.in);
        String str = in.next();
        str = str.substring(3);  //str字符串前三位代表的是输入字符串长度，但总比实际长度大1，
                                 //不知道怎么回事，猜测是牛客转码把结尾的'\n'也算进去了，不管怎样决定不用前三位
        short[] code = hexStringToShortArray(str);
        String ans = "";
        if(isUTF8(code)) ans = "UTF-8";
        else if(isAscii) ans = "OTHER";
        else if(isGBK(code)) ans = "GBK";
        else if(isUTF16(code)) ans = "UTF-16";
        else ans = "OTHER";
        System.out.println(ans);
    }
    
    /*
    UTF-16是以16位无符号整数为单位进行编码的，而这里一个short存8位，所以code数组的元素数必定要是2的倍数
    当UTF-16在BMP内时占两个字节，这两个字节是任意的没有要求的。而当UTF-16在BMP外，就是四个字节。因为题目说采用小端存储，
    所以低位在前，高位在后。所以UTF-16四个字节时低位两个字节的前6位是110111，高位两个字节的前6位是110110.
    但是两个字节时也有可能是上述两个前缀，这个冲突问题，呃，我也不知道...
    这道题目在这里并没怎么提到知识，应该考虑倍数问题即可
    */
    public static boolean isUTF16(short[] code){  
       if(code.length % 2 == 0) return true;
        return false;
    }
    
    /*
    1、数值小于127的字节表示ASCII中原有字符，此时GBK占用一个字节
    2、当数值大于127时，就需要读取下一个字节，此时GBK占用两个字节，高位的范围是81-fe，低位的范围是40-fe，但是不能是xx7f
    3、当数值等于127时，题目没说，索性认为不是GBK
    */
    public static boolean isGBK(short[] code){
        for(int i = 0; i < code.length; i++){
            if((int)code[i] > 127){
                if(i + 1 >= code.length) return false;
                if((int)code[i] < 8 * 16 + 1 || (int)code[i] > 15 * 16 + 14)
                    return false;
                i++;
                if((int)code[i] < 4* 16 + 0 || (int)code[i] > 15 * 16 + 14 || (int)code[i] == 7 * 16 + 15)
                    return false;
            }
            else if((int)code[i] == 127) return false;
        }
        return true;
    }
    /*
    UTF-8正如题目所总结的规律去判断，这里顺便判断是否全是以0为开头的字节isAscii
    */
    public static boolean isUTF8(short[] code){
        for(int i = 0; i < code.length; i++){
            String str = Integer.toBinaryString((int)code[i]);
            while(str.length() < 8) str = 0 + str;
            if(str.charAt(0) != '0'){
                isAscii = false;
                int j = 0;
                while(j < 8 && str.charAt(j) != '0') j++;
                if(j >= 7 || j == 1) return false;
                j--;
                for(int k = i + 1; k <= code.length; k++){
                    if(k == code.length) return false;
                    String temp = Integer.toBinaryString((int)code[k]);
                    while(temp.length() < 8) temp = 0 + temp;
                    if(temp.charAt(0) == '1' && temp.charAt(1) == '0')
                        j--;
                    else return false;
                    if(j == 0){
                        i = k;
                        break;
                    }
                }
            }
        }
        if(isAscii) return false;
        else return true;
    }
    
    public static short[] hexStringToShortArray(String s) {
        short[] data = new short[s.length() / 2];
        int i = 0;
        int k = 0;
        while (i < s.length()) {
            try {
                char c1 = s.charAt(i);
                char c2 = '\0';
                if (i + 1 < s.length()) {
                    c2 = s.charAt(i + 1);
                }
                String temp = c1 + "";
                if (c2 != '\0') {
                    temp = c1 + "" + c2;
                }
                data[k] = Short.parseShort(temp, 16);
            } catch (NumberFormatException e) {
                return null;
            }
            i = i + 2;
            k++;
        }
        return data;
     }
}

详情