I'm trying to parse a file. When I reached one particular structure, I found that LLVM defines it as follows:
```C++
struct ModInfo {
    uint32_t Unused1;
    struct SectionContribEntry {
        uint16_t Section;
        char Padding1[2];
        int32_t Offset;
        int32_t Size;
        uint32_t Characteristics;
        uint16_t ModuleIndex;
        char Padding2[2];
        uint32_t DataCrc;
        uint32_t RelocCrc;
    } SectionContr;
    uint16_t Flags;
    uint16_t ModuleSymStream;
    uint32_t SymByteSize;
    uint32_t C11ByteSize;
    uint32_t C13ByteSize;
    uint16_t SourceFileCount;
    char Padding[2];
    uint32_t Unused2;
    uint32_t SourceFileNameIndex;
    uint32_t PdbFilePathNameIndex;
    char ModuleName[];
    char ObjFileName[];
};
```
The struct comes from this page: https://llvm.org/docs/PDB/DbiStream.html#module-info-substream
Note: the struct comes from LLVM, but Microsoft defined and published the file format itself. LLVM redefined the struct in order to implement that format.
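Only the two trailing names are variable-length. One way to get a definition that compiles is to declare just the fixed-size prefix as an ordinary struct and handle the names separately. The sketch below is minimal. `ModInfoHeader` and `kModuleNameOffset` are hypothetical names. The fields are copied from the LLVM definition above, minus `ModuleName` and `ObjFileName`.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical: the fixed-size prefix of one module-info record.
// Fields follow the LLVM ModInfo definition, minus ModuleName/ObjFileName.
struct SectionContribEntry {
    uint16_t Section;
    char Padding1[2];
    int32_t Offset;
    int32_t Size;
    uint32_t Characteristics;
    uint16_t ModuleIndex;
    char Padding2[2];
    uint32_t DataCrc;
    uint32_t RelocCrc;
};

struct ModInfoHeader {
    uint32_t Unused1;
    SectionContribEntry SectionContr;
    uint16_t Flags;
    uint16_t ModuleSymStream;
    uint32_t SymByteSize;
    uint32_t C11ByteSize;
    uint32_t C13ByteSize;
    uint16_t SourceFileCount;
    char Padding[2];
    uint32_t Unused2;
    uint32_t SourceFileNameIndex;
    uint32_t PdbFilePathNameIndex;
};

// The explicit Padding fields keep every member naturally aligned, so no
// implicit padding is expected; these checks break the build if there is any.
static_assert(sizeof(SectionContribEntry) == 28, "unexpected padding");
static_assert(offsetof(ModInfoHeader, Flags) == 32, "unexpected padding");
static_assert(sizeof(ModInfoHeader) == 64, "unexpected padding");

// ModuleName starts right after the fixed part. ObjFileName follows
// ModuleName's NUL, so its position is only known at run time.
constexpr std::size_t kModuleNameOffset = sizeof(ModInfoHeader);
```

On a little-endian host such as x86/x64, the first 64 bytes of a record can be copied into a `ModInfoHeader` with `std::memcpy`. The names are then read from offset `kModuleNameOffset` onward.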
Here is the data I want to parse. It should be read with the struct above, starting at 0x02CBE080:
```
0x02CBE080 00 00 00 00 01 00 00 00 20 ca 18 00 2e 00 00 00 20 20 50 60 00 00 00 ........ ?...... P`...
0x02CBE097 00 00 00 00 00 00 00 00 00 00 00 ff ff 00 00 00 00 00 00 00 00 00 00 .......................
0x02CBE0AE 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 64 3a 5c 6f 73 ..................d:\os
0x02CBE0C5 5c 6f 62 6a 5c 61 6d 64 36 34 66 72 65 5c 6d 69 6e 6b 65 72 6e 65 6c \obj\amd64fre\minkernel
0x02CBE0DC 5c 74 6f 6f 6c 73 5c 67 73 5f 73 75 70 70 6f 72 74 5c 6b 6d 6f 64 65 \tools\gs_support\kmode
0x02CBE0F3 66 61 73 74 66 61 69 6c 5c 6d 70 5c 6f 62 6a 66 72 65 5c 61 6d 64 36 fastfail\mp\objfre\amd6
0x02CBE10A 34 5c 61 6d 64 73 65 63 67 73 2e 6f 62 6a 00 64 3a 5c 6f 73 5c 6f 62 4\amdsecgs.obj.d:\os\ob
0x02CBE121 6a 5c 61 6d 64 36 34 66 72 65 5c 6d 69 6e 6b 65 72 6e 65 6c 5c 74 6f j\amd64fre\minkernel\to
0x02CBE138 6f 6c 73 5c 67 73 5f 73 75 70 70 6f 72 74 5c 6b 6d 6f 64 65 66 61 73 ols\gs_support\kmodefas
0x02CBE14F 74 66 61 69 6c 5c 6d 70 5c 6f 62 6a 66 72 65 5c 61 6d 64 36 34 5c 61 tfail\mp\objfre\amd64\a
0x02CBE166 6d 64 73 65 63 67 73 2e 6f 62 6a 00 00 00 00 00 00 00 ff ff 00 00 00 mdsecgs.obj............
0x02CBE17D 00 00 00 ff ff ff ff 00 00 00 00 ff ff 00 00 00 00 00 00 00 00 00 00 .......................
```
In theory, once parsed, ModuleName should point to 0x02CBE0C0 and ObjFileName to 0x02CBE119.
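The two expected addresses follow from offset arithmetic. The fixed-size fields occupy 64 (0x40) bytes, and ObjFileName starts right after ModuleName's terminating NUL. In the dump, ModuleName is 88 characters long: it runs from 0x02CBE0C0 through 0x02CBE117, with the NUL at 0x02CBE118. Below is a small compile-time check; the helper names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helpers: addresses of the two trailing strings of one record.
constexpr uint32_t kFixedPartSize = 0x40;  // 64 bytes of fixed fields

constexpr uint32_t moduleNameAddr(uint32_t recordBase) {
    return recordBase + kFixedPartSize;
}

// moduleNameLen excludes the terminating NUL, hence the + 1.
constexpr uint32_t objFileNameAddr(uint32_t recordBase, uint32_t moduleNameLen) {
    return moduleNameAddr(recordBase) + moduleNameLen + 1;
}

static_assert(moduleNameAddr(0x02CBE080) == 0x02CBE0C0, "ModuleName");
static_assert(objFileNameAddr(0x02CBE080, 88) == 0x02CBE119, "ObjFileName");
```

Because the second address depends on a string length, no single C++ struct declaration can express it. It has to be computed while parsing.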
I put this struct definition in a header file in my source code and compiled it with MSVC, but it does not compile.
If I delete ``char ObjFileName[];``, the code compiles and ModuleName points exactly to 0x02CBE0C0. However, that still doesn't give me the parse result I want.
Here is my test code in VS2017:
```C++
#include <cstdint>
#include <iostream>

struct ModInfo {
    uint32_t Unused1;
    struct SectionContribEntry {
        uint16_t Section;
        char Padding1[2];
        int32_t Offset;
        int32_t Size;
        uint32_t Characteristics;
        uint16_t ModuleIndex;
        char Padding2[2];
        uint32_t DataCrc;
        uint32_t RelocCrc;
    } SectionContr;
    uint16_t Flags;
    uint16_t ModuleSymStream;
    uint32_t SymByteSize;
    uint32_t C11ByteSize;
    uint32_t C13ByteSize;
    uint16_t SourceFileCount;
    char Padding[2];
    uint32_t Unused2;
    uint32_t SourceFileNameIndex;
    uint32_t PdbFilePathNameIndex;
    // Zero-sized trailing array: a non-standard extension (MSVC warning C4200).
    // It must be the last member, so ObjFileName cannot be declared after it.
    char ModuleName[];
};

// unsigned char: values such as 0xca and 0xff do not fit in a signed char and
// would be narrowing conversions in brace initialization.
unsigned char data[] = {
    0x00,0x00,0x00,0x00,0x01,0x00,0x00,0x00,0x20,0xca,0x18,0x00,0x2e,0x00,0x00,0x00,0x20,0x20,0x50,0x60,0x00,0x00,0x00,
    0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0xff,0xff,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
    0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x64,0x3a,0x5c,0x6f,0x73,
    0x5c,0x6f,0x62,0x6a,0x5c,0x61,0x6d,0x64,0x36,0x34,0x66,0x72,0x65,0x5c,0x6d,0x69,0x6e,0x6b,0x65,0x72,0x6e,0x65,0x6c,
    0x5c,0x74,0x6f,0x6f,0x6c,0x73,0x5c,0x67,0x73,0x5f,0x73,0x75,0x70,0x70,0x6f,0x72,0x74,0x5c,0x6b,0x6d,0x6f,0x64,0x65,
    0x66,0x61,0x73,0x74,0x66,0x61,0x69,0x6c,0x5c,0x6d,0x70,0x5c,0x6f,0x62,0x6a,0x66,0x72,0x65,0x5c,0x61,0x6d,0x64,0x36,
    0x34,0x5c,0x61,0x6d,0x64,0x73,0x65,0x63,0x67,0x73,0x2e,0x6f,0x62,0x6a,0x00,0x64,0x3a,0x5c,0x6f,0x73,0x5c,0x6f,0x62,
    0x6a,0x5c,0x61,0x6d,0x64,0x36,0x34,0x66,0x72,0x65,0x5c,0x6d,0x69,0x6e,0x6b,0x65,0x72,0x6e,0x65,0x6c,0x5c,0x74,0x6f,
    0x6f,0x6c,0x73,0x5c,0x67,0x73,0x5f,0x73,0x75,0x70,0x70,0x6f,0x72,0x74,0x5c,0x6b,0x6d,0x6f,0x64,0x65,0x66,0x61,0x73,
    0x74,0x66,0x61,0x69,0x6c,0x5c,0x6d,0x70,0x5c,0x6f,0x62,0x6a,0x66,0x72,0x65,0x5c,0x61,0x6d,0x64,0x36,0x34,0x5c,0x61,
    0x6d,0x64,0x73,0x65,0x63,0x67,0x73,0x2e,0x6f,0x62,0x6a,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0xff,0xff,0x00,0x00,0x00,
    0x00,0x00,0x00,0xff,0xff,0xff,0xff,0x00,0x00,0x00,0x00,0xff,0xff,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
};

int main()
{
    // Reinterpreting the byte buffer as a ModInfo works with MSVC on x86/x64,
    // but is formally undefined behavior (alignment and strict aliasing).
    ModInfo *mi = reinterpret_cast<ModInfo*>(data);
    std::cout << "&(mi->ModuleName)=" << &(mi->ModuleName)
              << " mi->ModuleName=" << mi->ModuleName << std::endl;
    std::cout << "Hello World!\n";
}
```
The result is shown in this screenshot:
https://bbs.byr.cn/att/CPP/0/100003/5172
My question: how should I define this struct for the MSVC compiler so that, when it is mapped onto the data, it gives the theoretical result above?
This is a mirrored post. Source: BYR Forum (北邮人论坛) / cpp / #100003, synced on 2020/5/19.
Posted by the CPP bot
How to parse a struct in C++
wf751620780
2020/5/19 · mirror-synced · 7 replies
7 replies
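Looking at it again, the compile error is probably because ModuleName in your struct is a char array of unspecified length. No field can follow such a member, because the compiler would have no way to know the offsets of the fields after it.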
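A field that follows a variable-length array has no fixed offset. A common workaround is to end the struct at the last fixed-size field and read the strings with a cursor. The sketch below is minimal; the function name `readCString` is hypothetical. It also refuses to run past the end of the buffer if the NUL terminator is missing.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: reads a NUL-terminated string starting at `pos` into
// `out` and moves `pos` past the terminator. Returns false (leaving `pos`
// unchanged) if the buffer ends before a NUL is found, e.g. a truncated record.
bool readCString(const uint8_t* buf, std::size_t size, std::size_t& pos,
                 std::string& out) {
    for (std::size_t i = pos; i < size; ++i) {
        if (buf[i] == 0) {
            out.assign(reinterpret_cast<const char*>(buf + pos), i - pos);
            pos = i + 1;  // skip the terminator
            return true;
        }
    }
    return false;
}
```

Start with `pos` at 64, the end of the fixed fields. The first call then yields ModuleName and the second yields ObjFileName.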
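First, give the full file path a maximum length in the struct definition (usually 255). Then use the same interface for both reading and writing memory, so the layout stays consistent.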
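Binary files can rarely be mapped directly onto a struct, because of struct padding, byte order, and similar issues. Whether you use C or not has nothing to do with it.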
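On the byte-order point: PDB files store integers in little-endian order. Decoding each value from individual bytes gives the same result on any host and avoids alignment problems. The sketch below is minimal, and the function names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers: assemble little-endian integers byte by byte, so the
// result does not depend on the host's byte order or on pointer alignment.
inline uint16_t readU16LE(const uint8_t* p) {
    return static_cast<uint16_t>(p[0] | (p[1] << 8));
}

inline uint32_t readU32LE(const uint8_t* p) {
    return static_cast<uint32_t>(p[0]) |
           (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16) |
           (static_cast<uint32_t>(p[3]) << 24);
}

inline int32_t readI32LE(const uint8_t* p) {
    // int32_t is two's complement, so copying the bits gives the signed value.
    const uint32_t u = readU32LE(p);
    int32_t v;
    std::memcpy(&v, &u, sizeof v);
    return v;
}
```

For example, the bytes `20 ca 18 00` at offset 8 of the dump decode to SectionContr.Offset = 0x0018CA20.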
In your case, you have no choice but to map the data onto the struct field by field.
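Concretely, field-by-field parsing could look like the sketch below. It decodes the 64-byte fixed part with byte-order-independent readers, then reads the two NUL-terminated names. The sample array is the 276 bytes from the question. `ParsedModInfo` and `parseModInfo` are hypothetical names, and only a few header fields are decoded, for brevity. The sketch assumes records are padded to a multiple of 4 bytes, as LLVM's PDB reader does.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// The 276 bytes from the question's dump (0x02CBE080 onward).
static const uint8_t kSample[] = {
    0x00,0x00,0x00,0x00,0x01,0x00,0x00,0x00,0x20,0xca,0x18,0x00,0x2e,0x00,0x00,0x00,0x20,0x20,0x50,0x60,0x00,0x00,0x00,
    0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0xff,0xff,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
    0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x64,0x3a,0x5c,0x6f,0x73,
    0x5c,0x6f,0x62,0x6a,0x5c,0x61,0x6d,0x64,0x36,0x34,0x66,0x72,0x65,0x5c,0x6d,0x69,0x6e,0x6b,0x65,0x72,0x6e,0x65,0x6c,
    0x5c,0x74,0x6f,0x6f,0x6c,0x73,0x5c,0x67,0x73,0x5f,0x73,0x75,0x70,0x70,0x6f,0x72,0x74,0x5c,0x6b,0x6d,0x6f,0x64,0x65,
    0x66,0x61,0x73,0x74,0x66,0x61,0x69,0x6c,0x5c,0x6d,0x70,0x5c,0x6f,0x62,0x6a,0x66,0x72,0x65,0x5c,0x61,0x6d,0x64,0x36,
    0x34,0x5c,0x61,0x6d,0x64,0x73,0x65,0x63,0x67,0x73,0x2e,0x6f,0x62,0x6a,0x00,0x64,0x3a,0x5c,0x6f,0x73,0x5c,0x6f,0x62,
    0x6a,0x5c,0x61,0x6d,0x64,0x36,0x34,0x66,0x72,0x65,0x5c,0x6d,0x69,0x6e,0x6b,0x65,0x72,0x6e,0x65,0x6c,0x5c,0x74,0x6f,
    0x6f,0x6c,0x73,0x5c,0x67,0x73,0x5f,0x73,0x75,0x70,0x70,0x6f,0x72,0x74,0x5c,0x6b,0x6d,0x6f,0x64,0x65,0x66,0x61,0x73,
    0x74,0x66,0x61,0x69,0x6c,0x5c,0x6d,0x70,0x5c,0x6f,0x62,0x6a,0x66,0x72,0x65,0x5c,0x61,0x6d,0x64,0x36,0x34,0x5c,0x61,
    0x6d,0x64,0x73,0x65,0x63,0x67,0x73,0x2e,0x6f,0x62,0x6a,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0xff,0xff,0x00,0x00,0x00,
    0x00,0x00,0x00,0xff,0xff,0xff,0xff,0x00,0x00,0x00,0x00,0xff,0xff,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
};

// Hypothetical result type: a few header fields plus the two trailing names.
struct ParsedModInfo {
    uint16_t Section = 0;
    int32_t Offset = 0;
    int32_t Size = 0;
    uint16_t ModuleSymStream = 0;
    std::string ModuleName;
    std::string ObjFileName;
};

// Little-endian decoders: independent of host byte order and alignment.
static uint16_t u16le(const uint8_t* p) {
    return static_cast<uint16_t>(p[0] | (p[1] << 8));
}
static uint32_t u32le(const uint8_t* p) {
    return uint32_t(p[0]) | (uint32_t(p[1]) << 8) | (uint32_t(p[2]) << 16) |
           (uint32_t(p[3]) << 24);
}

// Reads a NUL-terminated string at pos; returns false if the buffer ends first.
static bool readCStr(const uint8_t* b, std::size_t n, std::size_t& pos,
                     std::string& out) {
    for (std::size_t i = pos; i < n; ++i) {
        if (b[i] == 0) {
            out.assign(reinterpret_cast<const char*>(b + pos), i - pos);
            pos = i + 1;
            return true;
        }
    }
    return false;
}

// Parses one record at pos (field offsets follow the LLVM ModInfo layout).
// On success, moves pos to the next record, assuming 4-byte record padding.
bool parseModInfo(const uint8_t* b, std::size_t n, std::size_t& pos,
                  ParsedModInfo& m) {
    const std::size_t start = pos;
    if (n < 64 || start > n - 64) return false;  // 64-byte fixed part
    const uint8_t* h = b + start;
    m.Section = u16le(h + 4);  // SectionContr.Section
    // uint32 -> int32 wraps to two's complement on mainstream compilers
    // (guaranteed since C++20).
    m.Offset = static_cast<int32_t>(u32le(h + 8));  // SectionContr.Offset
    m.Size = static_cast<int32_t>(u32le(h + 12));   // SectionContr.Size
    m.ModuleSymStream = u16le(h + 34);
    std::size_t p = start + 64;
    if (!readCStr(b, n, p, m.ModuleName) || !readCStr(b, n, p, m.ObjFileName))
        return false;
    pos = start + ((p - start + 3) & ~std::size_t(3));  // round up to 4 bytes
    return true;
}
```

Call it in a loop until `pos` reaches the end of the module-info substream to walk the records one by one. With the sample data, the first call yields the two 88-character paths and moves `pos` to 244 (0x02CBE174). The following record is truncated in the sample, so a second call returns false.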
Thank you all for the generous explanations.
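This is related to the ABI, and the struct simply cannot be written this way. The struct in LLVM's documentation is only a schematic of the file format's layout. An actual implementation needs extra work beyond that schematic struct to parse the file.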
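One part of that extra work is locating the next record. Assume each record is padded to a multiple of 4 bytes, as LLVM's PDB reader does. Then the record length can be computed from the two name lengths. In the dump, ObjFileName's terminating NUL is at 0x02CBE171, so the record is 242 bytes long. Padding brings it to 244, which puts the next record at 0x02CBE174. Below is a sketch under that assumption; the names are hypothetical.

```cpp
#include <cstddef>

// Hypothetical helpers. Assumes, as LLVM's PDB reader does, that each
// module-info record is padded to a multiple of 4 bytes.
constexpr std::size_t kFixedPartSize = 64;

constexpr std::size_t alignTo4(std::size_t n) {
    return (n + 3) & ~std::size_t(3);
}

// Name lengths exclude the terminating NULs, which are added here.
constexpr std::size_t modInfoRecordLength(std::size_t moduleNameLen,
                                          std::size_t objFileNameLen) {
    return alignTo4(kFixedPartSize + moduleNameLen + 1 + objFileNameLen + 1);
}

// The record in the question: two 88-character names -> 242 bytes -> 244.
static_assert(modInfoRecordLength(88, 88) == 244, "next record at +0xF4");
```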
I have finished parsing this part of the file by analyzing it manually.
Thanks for the answers, everyone!
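This is the same reason RPC libraries need all kinds of serialization steps, and the same reason serialization libraries exist. I don't understand what the poster above means by ABI. As I see it, this kind of parsing can go wrong for many reasons. For example, the compiler may pad the struct so that every field meets some alignment requirement, or it may reorder the variables in the struct. This probably differs between compilers. It might even change between versions of the same compiler, or behave differently at different optimization levels.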
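On the reordering point: C and C++ compilers may insert padding between members, but they keep members in declaration order. In C++ this holds at least among members with the same access control. The layout is also fixed by the ABI, not by the optimization level. That is why LLVM's definition spells out `Padding1` and `Padding2`. The sketch below assumes a mainstream ABI (MSVC x86/x64 or x86-64 System V) where `int32_t` is 4-byte aligned. The struct names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical structs. Without the explicit filler, the compiler pads after
// Section so that Offset is 4-byte aligned (on the ABIs named above).
struct ImplicitPad {
    uint16_t Section;
    int32_t Offset;
};

// LLVM-style: the filler is spelled out, so the layout is visible in source.
struct ExplicitPad {
    uint16_t Section;
    char Padding1[2];
    int32_t Offset;
};

// ABI-dependent checks: the build fails if the layout differs.
static_assert(offsetof(ImplicitPad, Offset) == 4, "hidden 2-byte gap");
static_assert(sizeof(ImplicitPad) == sizeof(ExplicitPad), "same size");
// Members stay in declaration order: Section always precedes Offset.
static_assert(offsetof(ImplicitPad, Section) < offsetof(ImplicitPad, Offset),
              "declaration order");
```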
`char ModuleName[0]; // non-standard C/C++ extension`