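Erlang R17 has been out for a while, and our company planned to migrate its project from an older Erlang release to R17. The migration ran into quite a few problems, one of which concerns Chinese text.
The problem is easy to reproduce: create a file t.erl, save it as UTF-8 without a BOM, then compile and run it first under an older release (Eshell V5.9.1) and then under R17 (Eshell V6.0):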
- -module(t).
- -export([test/0]).
- test() ->
-     ["我", <<"我">>].
- Eshell V5.9.1 (abort with ^G)
- 1> c(t).
- {ok,t}
- 2> t:test().
- [[230,136,145],<<230,136,145>>]
- Eshell V6.0 (abort with ^G)
- 1> c(t).
- {ok,t}
- 2> t:test().
- [[25105],<<17>>]
The Erlang documentation describes the change: "In Erlang/OTP 17.0, the encoding default for Erlang source files was switched to UTF-8 and in Erlang/OTP 18.0 Erlang will support atoms in the full Unicode range, meaning full Unicode function and module names."
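To make the R17 result concrete: once the source is decoded as UTF-8, "我" is the single code point 25105 (16#6211), and a plain binary segment keeps only the low 8 bits of an integer, which is where <<17>> comes from. The following is a minimal sketch, using a hypothetical module name utf8_demo and writing the code point numerically so the example does not depend on the encoding of its own source file:

```erlang
-module(utf8_demo).
-export([run/0]).

%% Shows three ways the code point for "我" can end up in a binary.
run() ->
    CodePoint = 16#6211,                                % 25105
    %% Proper UTF-8 encoding of the code point: three bytes.
    Utf8 = unicode:characters_to_binary([CodePoint]),
    %% A default integer segment is 8 bits wide, so the value is truncated.
    Truncated = <<CodePoint>>,
    %% The utf8 segment type encodes the code point instead of truncating it.
    Utf8Segment = <<CodePoint/utf8>>,
    {Utf8, Truncated, Utf8Segment}.
```

Calling utf8_demo:run() returns {<<230,136,145>>,<<17>>,<<230,136,145>>}. The first element is what the old release stored for <<"我">>, and the second is what R17 produces when the literal has no /utf8 type.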
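To make R17 compile a UTF-8-without-BOM source file the way older releases did, add "%% coding: latin-1" at the top of the file. The compiler then reads the file byte by byte as Latin-1, so string literals keep their raw UTF-8 bytes. The code becomes: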
- %% coding: latin-1
- -module(t).
- -export([test/0]).
- test() ->
-     ["我", <<"我">>].
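When migrating many files, it helps to know which ones still lack a coding comment. A minimal sketch, assuming a hypothetical module name enc_check and relying on epp:read_encoding/1, which returns none when a file has no coding comment:

```erlang
-module(enc_check).
-export([missing_coding/1]).

%% Returns the .erl files directly under Dir that have no
%% "coding:" comment and would therefore be read as UTF-8 by R17.
missing_coding(Dir) ->
    Files = filelib:wildcard(filename:join(Dir, "*.erl")),
    [F || F <- Files, epp:read_encoding(F) =:= none].
```

For example, enc_check:missing_coding("src") lists the files under src that would need the comment added.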
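The most frustrating part is that Erlang provides no startup flag to bring back the old latin-1 behaviour. I tried erl +pc latin1 and it did not help. I first wondered whether this was a bug, but +pc only sets which character range is treated as printable when terms are printed. It does not affect how source files are decoded.
So here I reimplement the compile step on top of Erlang's own modules. The code is as follows: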
- -module(test).
- -compile(export_all).
- compile(FileName) ->
-     compile(FileName, [verbose, report_errors, report_warnings]).
- compile(FileName, Options) ->
-     %% Parse with latin1 as the default encoding, used when the file has no coding comment.
-     {ok, Forms} = epp:parse_file(FileName, [{default_encoding, latin1}]),
-     {ok, Mod, Code} = compile:forms(Forms, Options),
-     %% Load the freshly compiled module into the running node.
-     code:load_binary(Mod, FileName, Code),
-     %% Write Mod.beam into the current directory, joining the path with a separator.
-     {ok, Cwd} = file:get_cwd(),
-     BeamFile = filename:join(Cwd, atom_to_list(Mod) ++ ".beam"),
-     file:write_file(BeamFile, Code).
- 14> c(test).
- {ok,test}
- 15> test:compile("t.erl").
- ok
- 16> t:test().
- [[230,136,145],<<230,136,145>>]
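The same approach can be extended to a whole directory. The following is only a sketch under stated assumptions: the module name latin1_make and the function compile_dir/1 are hypothetical, the .beam files are written next to the sources, and failures are returned as tuples rather than crashing on a bad match:

```erlang
-module(latin1_make).
-export([compile_dir/1]).

%% Compiles every .erl file directly under Dir, reading files without a
%% coding comment as latin1. Returns [{File, ok | {error, Reason}}].
compile_dir(Dir) ->
    Files = filelib:wildcard(filename:join(Dir, "*.erl")),
    [{F, compile_file(F, Dir)} || F <- Files].

compile_file(File, OutDir) ->
    case epp:parse_file(File, [{default_encoding, latin1}]) of
        {ok, Forms} ->
            case compile:forms(Forms, [report_errors, report_warnings]) of
                {ok, Mod, Code} ->
                    Beam = filename:join(OutDir, atom_to_list(Mod) ++ ".beam"),
                    file:write_file(Beam, Code);
                Other ->
                    %% compile:forms/2 returns the atom error here;
                    %% anything unexpected is passed back to the caller.
                    {error, Other}
            end;
        {error, Reason} ->
            %% The source file could not be opened.
            {error, Reason}
    end.
```

Unlike the single-file version, this sketch does not load the compiled modules. It only writes the .beam files and reports the result per source file.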
- %% A copy of file:consult/1 that reads terms with latin1 as the default encoding,
- %% used when the file has no coding comment.
- consult(File) ->
-     case file:open(File, [read]) of
-         {ok, Fd} ->
-             R = consult_stream(Fd),
-             _ = file:close(Fd),
-             R;
-         Error ->
-             Error
-     end.
- consult_stream(Fd) ->
-     _ = epp:set_encoding(Fd, latin1),
-     consult_stream(Fd, 1, []).
- consult_stream(Fd, Line, Acc) ->
-     case io:read(Fd, '', Line) of
-         {ok,Term,EndLine} ->
-             consult_stream(Fd, EndLine, [Term|Acc]);
-         {error,Error,_Line} ->
-             {error,Error};
-         {eof,_Line} ->
-             {ok,lists:reverse(Acc)}
-     end.
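Strings obtained in latin-1 mode, whether compiled or consulted, are lists of UTF-8 bytes such as [230,136,145] rather than lists of code points. If some part of the program needs real Unicode characters, the bytes can be decoded explicitly. A minimal sketch, assuming a hypothetical module name bytes_demo:

```erlang
-module(bytes_demo).
-export([to_codepoints/1]).

%% Str is a list of UTF-8 bytes, for example [230,136,145].
%% Returns the corresponding list of Unicode code points. If the bytes
%% are not valid UTF-8, unicode:characters_to_list/2 returns an
%% {error, _, _} or {incomplete, _, _} tuple, which is passed through.
%% list_to_binary/1 fails with badarg if any element is above 255.
to_codepoints(Str) ->
    unicode:characters_to_list(list_to_binary(Str), utf8).
```

For the string from t.erl, bytes_demo:to_codepoints([230,136,145]) returns [25105], the same value R17 produces when it reads the file as UTF-8.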
Reference: http://blog.csdn.net/mycwq/article/details/40718281