Reading a text file losslessly in Java

hw1287789687

Blog categories: Java, Java Web

Tags: reading files, lossless reading, charset, reading text files, ByteArrayOutputStream

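How can Java read a text file losslessly, so that the resulting String matches the file's content exactly?

The following approach is lossy, because readLine() drops the line breaks: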

@Deprecated
	public static String getFullContent(File file, String charset) {
		BufferedReader reader = null;
		if (!file.exists()) {
			System.out.println("getFullContent: file(" + file.getAbsolutePath()
					+ ") does not exist.");
			return null;
		}
		if (charset == null) {
			charset = SystemHWUtil.CHARSET_ISO88591;
		}
		try {
			reader = getBufferReaderFromFile(file, charset);
			return getFullContent(reader);
		} catch (FileNotFoundException e1) {
			e1.printStackTrace();
		} finally {
			if (null != reader) {
				try {
					reader.close();
				} catch (IOException e) {
					e.printStackTrace();
				}
			}
		}
		return null;
	}

public static BufferedReader getBufferReaderFromFile(File file,
			String charset) throws FileNotFoundException {
		InputStream ss = new FileInputStream(file);
		InputStreamReader ireader;
		BufferedReader reader = null;
		try {
			if (charset == null) {
				ireader = new InputStreamReader(ss,
						SystemHWUtil.CHARSET_ISO88591);
			} else {
				ireader = new InputStreamReader(ss, charset);
			}
			reader = new BufferedReader(ireader);
		} catch (UnsupportedEncodingException e) {
			e.printStackTrace();
		}

		return reader;
	}

/**
	 * Reads the full content; the reader is closed before returning.
	 * 
	 * @param reader
	 * @return
	 */
	@Deprecated
	public static String getFullContent(BufferedReader reader) {
		StringBuilder sb = new StringBuilder();
		String readedLine = null;
		try {
			while ((readedLine = reader.readLine()) != null) {
				sb.append(readedLine);
				sb.append(SystemHWUtil.CRLF);
			}
		} catch (IOException e) {
			e.printStackTrace();
		} finally {
			try {
				reader.close();
			} catch (IOException e) {
				e.printStackTrace();
			}
		}
		String content = sb.toString();
		int length_CRLF = SystemHWUtil.CRLF.length();
		if (content.length() <= length_CRLF) {
			return content;
		}
		return content.substring(0, content.length() - length_CRLF);//
	}

Test:

@Test
	public void test_getFullContent(){
		String filepath="D:\\bin\\config\\conf_passwd.properties";
		// getFullContent(File, String) opens and closes the file itself
		// and handles its own exceptions, so no extra stream is needed
		System.out.print(FileUtils.getFullContent(new File(filepath), "UTF-8"));
	}
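
To see the loss concretely, here is a minimal sketch that applies the same readLine-and-append logic to an in-memory string instead of a file. The class name ReadLineLossDemo is made up for this illustration, and the value "\r\n" for SystemHWUtil.CRLF is an assumption, since the post does not show that constant.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical demo class, not part of the attached FileUtils.
final class ReadLineLossDemo {

    // Assumed value of SystemHWUtil.CRLF (the post does not show it).
    private static final String CRLF = "\r\n";

    // Same steps as the deprecated getFullContent(BufferedReader):
    // readLine() strips each line terminator, CRLF is appended instead,
    // and the last CRLF is cut off at the end.
    static String viaReadLine(String original) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new StringReader(original))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append(CRLF);
            }
        }
        String content = sb.toString();
        if (content.length() <= CRLF.length()) {
            return content;
        }
        return content.substring(0, content.length() - CRLF.length());
    }

    public static void main(String[] args) throws IOException {
        String unixText = "a\nb\n";
        // "\n" became "\r\n" and the trailing line break is gone
        System.out.println(unixText.equals(viaReadLine(unixText))); // prints false
    }
}
```

Under that assumption, the text survives only when it already uses CRLF between lines and has no trailing line break; any other input comes back changed.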

Here are three ways to read the file losslessly.

Method 1: use InputStreamReader with an explicit charset

/***
	 * Reads a text file losslessly using the given character encoding.
	 * 
	 * @param in
	 *            : input stream; closed by this method
	 * @param charset
	 *            : character encoding
	 * @return
	 * @throws IOException
	 */
	public static String getFullContent3(InputStream in, String charset)
			throws IOException {
		StringBuilder sbuffer = new StringBuilder();
		// decode the bytes with the given character encoding
		InputStreamReader inReader = new InputStreamReader(in, charset);
		try {
			char[] ch = new char[SystemHWUtil.BUFF_SIZE_1024];
			int readCount = 0;
			// read(char[]) returns every character, line breaks included
			while ((readCount = inReader.read(ch)) != -1) {
				sbuffer.append(ch, 0, readCount);
			}
		} finally {
			// closing the reader also closes the underlying stream
			inReader.close();
		}
		return sbuffer.toString();
	}

Test:

@Test
	public void test_getFullContent3(){
		String filepath="D:\\bin\\config\\conf_passwd.properties";
		try {
			InputStream in =new FileInputStream(filepath);
			System.out.print(FileUtils.getFullContent3(in, "UTF-8"));
		} catch (FileNotFoundException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
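
The test above only prints the content. A round-trip check makes the lossless claim of method 1 easier to verify. The sketch below is a standalone variant, not the FileUtils code: the class name ReaderRoundTrip is hypothetical, and it uses try-with-resources (Java 7+) and a Charset object instead of a charset name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the attached FileUtils.
final class ReaderRoundTrip {

    // Reads every character from the stream, line terminators included.
    // Closing the reader also closes the underlying stream.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // mixed line endings, a blank line, and two non-ASCII (CJK) characters
        String original = "first\nsecond\r\n\r\n\u65e0\u635f\n";
        byte[] bytes = original.getBytes(StandardCharsets.UTF_8);
        String read = readAll(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        System.out.println(original.equals(read)); // prints true
    }
}
```

The round trip only holds when the bytes are valid in the chosen charset; InputStreamReader replaces malformed input with a substitute character rather than failing.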

Method 2: read the bytes into a byte array first, then use the String constructor

public static String getFullContent4(InputStream in, String charset) throws IOException{
		byte[]bytes=FileUtils.readBytes3(in);
		return new String(bytes,charset);
	}

/***
	 * Has been tested
	 * 
	 * @param in
	 * @return
	 * @throws IOException
	 */
	public static byte[] readBytes3(InputStream in) throws IOException {
		BufferedInputStream bufin = new BufferedInputStream(in);
		int buffSize = BUFFSIZE_1024;
		ByteArrayOutputStream out = new ByteArrayOutputStream(buffSize);

		// System.out.println("Available bytes:" + in.available());

		byte[] temp = new byte[buffSize];
		int size = 0;
		while ((size = bufin.read(temp)) != -1) {
			out.write(temp, 0, size);
		}
		bufin.close();
		in.close();
		byte[] content = out.toByteArray();
		out.flush();
		out.close();
		return content;
	}
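
When the source is a file rather than an arbitrary stream, the same bytes-then-decode idea can be sketched with java.nio.file.Files.readAllBytes (Java 7+), which stands in for readBytes3 here. This is an alternative technique, not the post's method; the class name ReadAllBytesSketch and the temp-file contents are made up for the demonstration.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of the attached FileUtils.
final class ReadAllBytesSketch {

    // Reads the whole file into memory, then decodes it in one step.
    static String readText(Path file, Charset charset) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        return new String(bytes, charset);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("lossless", ".txt");
        try {
            String original = "key=value\r\n# comment\n\n";
            Files.write(tmp, original.getBytes(StandardCharsets.UTF_8));
            // true when the decoded text matches what was written
            System.out.println(original.equals(readText(tmp, StandardCharsets.UTF_8)));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Like method 2, this holds the entire file in memory, so it suits configuration-sized files better than very large ones.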

Method 3: grow the byte array by hand with System.arraycopy. The extra copying makes it less efficient (not recommended)

public static String getFullContent2(InputStream in, String charset)
			throws IOException {
		int step = BUFFSIZE_1024;
		BufferedInputStream bis = new BufferedInputStream(in);

		// Data's byte array
		byte[] receData = new byte[step];

		// data length read from the stream
		int readLength = 0;

		// data Array offset
		int offset = 0;

		// Data array length
		int byteLength = step;

		while ((readLength = bis.read(receData, offset, byteLength - offset)) != -1) {
			// Calculate the current length of the data
			offset += readLength;
			// Determine whether you need to copy data , when the remaining
			// space is less than step / 2, copy the data
			if (byteLength - offset <= step / 2) {
				byte[] tempData = new byte[receData.length + step];
				System.arraycopy(receData, 0, tempData, 0, offset);
				receData = tempData;
				byteLength = receData.length;
			}
		}

		return new String(receData, 0, offset, charset);
	}
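
Much of the copying cost in method 3 comes from growing the array by a fixed step: every growth copies everything read so far, so the total bytes copied grows roughly quadratically with file size. One common way to reduce that, shown here only as a sketch, is to double the array when it fills up. The class name DoublingBufferSketch and the initial size of 1024 are assumptions for illustration; unlike getFullContent2, this variant also closes the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the attached FileUtils.
final class DoublingBufferSketch {

    static String readAll(InputStream in, Charset charset) throws IOException {
        byte[] data = new byte[1024]; // assumed initial capacity
        int offset = 0;               // number of bytes stored so far
        try (InputStream src = in) {  // closes the stream when done
            int n;
            while ((n = src.read(data, offset, data.length - offset)) != -1) {
                offset += n;
                if (offset == data.length) {
                    // array is full: double it, so the total copying stays
                    // proportional to the final size
                    data = Arrays.copyOf(data, data.length * 2);
                }
            }
        }
        // decode only the bytes that were actually read
        return new String(data, 0, offset, charset);
    }

    public static void main(String[] args) throws IOException {
        String original = "line one\r\nline two\n";
        byte[] bytes = original.getBytes(StandardCharsets.UTF_8);
        String read = readAll(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        System.out.println(original.equals(read));
    }
}
```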

Summary: methods 1 and 2 are recommended.

The methods above can be found in the com.io.hw.file.util.FileUtils class in the attachment.

io0007-find_progess-0.0.6-SNAPSHOT-sources.jar (159.6 KB)

2013-12-19 10:42
Comments (12)
Category: Programming languages

#12 hw1287789687 2013-12-21

在世界的中心呼喚愛 wrote:

Got it. The author must mean that reading bytes is lossless!!

Yes, that is exactly what I mean.

#11 在世界的中心呼喚愛 2013-12-21

Got it. The author must mean that reading bytes is lossless!!

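#10 在世界的中心呼喚愛 2013-12-21

hw1287789687 wrote:

Grumpy wrote:

This is too deep for me: readLine() turns out to be lossy... So if I use read(char[]), which reads out every character, that should count as lossless, right?

With readLine() you end up with one line break too few or one too many (you cannot control it precisely), so there is no guarantee that what you read is exactly the same as the original text file (equals may return false).
read(char[]) does achieve lossless reading.

The API docs say it clearly: readLine reads a line.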

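#9 Grumpy 2013-12-20

hw1287789687 wrote:

Grumpy wrote:

This is too deep for me: readLine() turns out to be lossy... So if I use read(char[]), which reads out every character, that should count as lossless, right?

If Reader.read(char[]) is lossless, then aren't all three of your methods redundant? And if I want the data without missing a single byte, InputStream.read(byte[]) guarantees that not one byte is lost, which is safer. Line breaks are generally used to separate paragraphs, and readLine saves you a parsing step, yet it gets called lossy...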

#8 hw1287789687 2013-12-20

Grumpy wrote:

This is too deep for me: readLine() turns out to be lossy... So if I use read(char[]), which reads out every character, that should count as lossless, right?

#7 Grumpy 2013-12-20

This is too deep for me: readLine() turns out to be lossy... So if I use read(char[]), which reads out every character, that should count as lossless, right?

#6 hjz1034979852 2013-12-20

What does "lossless" even mean?

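#5 hw1287789687 2013-12-20

canghailan wrote:

I don't quite see what lossy and lossless mean here. Does lossy refer to the line breaks that BufferedReader.readLine() drops? The file size is known, so the size of the byte buffer can be determined up front; there is no need for a fixed-size buffer. With NIO it would be more convenient.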

public static String readText(File file, String charset) throws IOException {
	FileChannel fileChannel = null;
	try {
		fileChannel = new FileInputStream(file).getChannel();
		ByteBuffer byteBuffer = fileChannel.map(MapMode.READ_ONLY, 0,
				fileChannel.size());
		CharBuffer charBuffer = Charset.forName(charset).decode(byteBuffer);
		return charBuffer.toString();
	} finally {
		if (fileChannel != null) {
			fileChannel.close();
		}
	}
}

Comment #1 is right. Lossy refers to the line breaks dropped by BufferedReader.readLine().

#4 w156445045 2013-12-19

I don't understand it, but it sounds impressive.

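#3 Grumpy 2013-12-19

I don't understand what the OP is saying.
Whether you can read a String out of a file depends on the type of data stored in it. If it stores character data, you can read strings directly with BufferedReader.readLine(). If it stores binary data, it won't work whether you read with a Reader or read a byte array from an InputStream and convert it to a String; even if you get something out, it is garbage.
Also, the encoding has nothing to do with reading the file, only with displaying it. A wrong encoding shows up as garbled text, and that is all.
You can transcode by calling String.getBytes(String encoding) and then new String(byte[] data, int start, int length, final String encoding).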

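#2 canghailan 2013-12-19

Actually, method 3 is more efficient than method 2:
1. ByteArrayOutputStream.write calls System.arraycopy, which is one copy.
2. ByteArrayOutputStream also calls System.arraycopy each time it grows automatically.
3. ByteArrayOutputStream.toByteArray calls System.arraycopy yet again.
4. ByteArrayOutputStream carries unnecessary synchronization and index-check overhead.

Method 2 copies 2 times in the best case and 3+n times in the worst case.
Method 3 copies 1 time in the best case and 2+n times in the worst case.
Here n is the number of times the buffer grows.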

#1 canghailan 2013-12-19

public static String readText(File file, String charset) throws IOException {
	FileChannel fileChannel = null;
	try {
		fileChannel = new FileInputStream(file).getChannel();
		ByteBuffer byteBuffer = fileChannel.map(MapMode.READ_ONLY, 0,
				fileChannel.size());
		CharBuffer charBuffer = Charset.forName(charset).decode(byteBuffer);
		return charBuffer.toString();
	} finally {
		if (fileChannel != null) {
			fileChannel.close();
		}
	}
}
