In the last few months, I've heard a person say this multiple times: "business runs on de-normalized data".

The context of this statement has always been the discussion of database design and implementation... and I continue to wonder what experience leads this person to make such a statement. In my world - the business of developing software to run a business or automate a portion of a business - this is very far from the truth. My response to this statement is (and yes, you can quote me on this - please do):

"Management reports on de-normalized data, but operations runs entirely on well-normalized data."

I'm not going to make any ignorant or naive claims about de-normalized data having no place in business. There certainly is a lot of business value in de-normalized data. That's why we have data warehousing, OLAP cubes, and other reporting database structures (including Views, Stored Procedures, etc. that will de-normalize data for live reporting). When it comes to the day to day business, though - the people on the floor doing the low level business work - well normalized data is an absolute must.

From Wikipedia's entry on Database Normalization:

"Database normalization, sometimes referred to as canonical synthesis, is a technique for designing relational database tables to minimize duplication of information and, in so doing, to safeguard the database against certain types of logical or structural problems, namely data anomalies. For example, when multiple instances of a given piece of information occur in a table, the possibility exists that these instances will not be kept consistent when the data within the table is updated, leading to a loss of data integrity. A table that is sufficiently normalized is less vulnerable to problems of this kind, because its structure reflects the basic assumptions for when multiple instances of the same information should be represented by a single instance only."

I'm certainly not a DBA and I'm not a real database guru.
I do have more than 10 years of experience working with various database systems (SQL Server, Access, Oracle, SQLite, MySQL, DB2/DB400, XML and flat files, etc.) in various business scenarios (manufacturing, engineering, business process automation, e-commerce/e-business, enterprise integration, etc.) and I have a pretty high opinion of my relational modeling capabilities... and it evokes a sense of disbelief and shock when I see poorly normalized schemas or hear statements like this being made. Please, please, PLEASE take the time to understand what well normalized database design is and why it's necessary for flexible, maintainable software.
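To make the "data anomaly" from the Wikipedia quote concrete, here's a minimal in-memory sketch. It is not a database and not code from any real project; the class names (FlatOrderRow, Customer, CustomerOrder, UpdateAnomalyDemo) and the sample data are all made up for illustration. The de-normalized shape copies the customer's address onto every order row, so updating one row leaves the rows disagreeing; the normalized shape stores the address once.

```csharp
using System.Collections.Generic;
using System.Linq;

// De-normalized shape (hypothetical): the address is copied onto every order row.
public class FlatOrderRow
{
    public int OrderId;
    public string CustomerName;
    public string CustomerAddress;
}

// Normalized shape (hypothetical): the address is stored once; orders refer to it.
public class Customer
{
    public string Name;
    public string Address;
}

public class CustomerOrder
{
    public int OrderId;
    public Customer Customer;
}

public static class UpdateAnomalyDemo
{
    // Updates the address on only one of two flat rows, then counts how many
    // different addresses the "same" customer now has.
    public static int AddressesSeenInFlatRows()
    {
        var rows = new List<FlatOrderRow>
        {
            new FlatOrderRow { OrderId = 1, CustomerName = "Acme", CustomerAddress = "1 Old Road" },
            new FlatOrderRow { OrderId = 2, CustomerName = "Acme", CustomerAddress = "1 Old Road" }
        };
        rows[0].CustomerAddress = "2 New Street"; // the second copy is forgotten
        return rows.Select(r => r.CustomerAddress).Distinct().Count();
    }

    // Updates the single stored address, then counts addresses seen through the orders.
    public static int AddressesSeenInNormalizedOrders()
    {
        var acme = new Customer { Name = "Acme", Address = "1 Old Road" };
        var orders = new List<CustomerOrder>
        {
            new CustomerOrder { OrderId = 1, Customer = acme },
            new CustomerOrder { OrderId = 2, Customer = acme }
        };
        acme.Address = "2 New Street"; // one place to change
        return orders.Select(o => o.Customer.Address).Distinct().Count();
    }
}
```

In the flat version the customer ends up with two different addresses; in the normalized version there is only ever one. That inconsistency is exactly what the people doing the day to day operational work can't afford.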
This might be common knowledge for those that use NHibernate - but it's new to me, since I am not an NHibernate power-user, yet. Given this object model:

    using System;
    using System.Collections.Generic;
    using Iesi.Collections.Generic; // HashedSet<T>

    public class Invoice
    {
        private int _id = 0;
        private string _invoiceNumber = string.Empty;
        private DateTime _invoiceDueDate = DateTime.MinValue;
        private ICollection<InvoiceDetail> _invoiceDetails = new HashedSet<InvoiceDetail>();

        public int Id { get { return _id; } }
        public string InvoiceNumber { get { return _invoiceNumber; } set { _invoiceNumber = value; } }
        public DateTime InvoiceDueDate { get { return _invoiceDueDate; } set { _invoiceDueDate = value; } }
        public ICollection<InvoiceDetail> InvoiceDetails { get { return _invoiceDetails; } }

        // Details are only created through the owning invoice.
        public InvoiceDetail CreateDetail()
        {
            InvoiceDetail detail = new InvoiceDetail();
            InvoiceDetails.Add(detail);
            return detail;
        }
    }

    public class InvoiceDetail
    {
        internal InvoiceDetail() { }

        private int _id = 0;
        private string _productName = string.Empty;
        private decimal _productCost = 0;
        private int _productQuantity = 0;

        public int Id { get { return _id; } }
        public string ProductName { get { return _productName; } set { _productName = value; } }
        public decimal ProductCost { get { return _productCost; } set { _productCost = value; } }
        public int ProductQuantity { get { return _productQuantity; } set { _productQuantity = value; } }
    }
We can load any invoice that has an Invoice Detail with a specific Product Name, using this NHibernate code:
    invoices = Session.CreateCriteria(typeof(Invoice))
        .CreateCriteria("InvoiceDetails")
        .Add(Expression.Eq("ProductName", productName))
        .List<Invoice>();
Additionally, we can add paging to the result set, by including the SetFirstResult and SetMaxResults calls in the Criteria.

    invoices = Session.CreateCriteria(typeof(Invoice))
        .CreateCriteria("InvoiceDetails")
        .Add(Expression.Eq("ProductName", productName))
        .SetFirstResult(pageSize * pageNumber)
        .SetMaxResults(pageSize)
        .List<Invoice>();

Notice the use of pageSize and pageNumber to set the First Result value - this treats pageNumber as a zero-based index of the page that we want to view. SetMaxResults takes the number of rows to return, not the index of the last row, so it is simply pageSize. For example, if our page size is 10 and we are on page zero, then the FirstResult is 0 and MaxResults is 10: rows 0 through 9 = 10 results. If we are on page 3, then the FirstResult is 30 and we get rows 30 through 39.
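The paging arithmetic is easy to get off by one, so here's a small sketch of it that runs without NHibernate. It uses LINQ's Skip and Take over an in-memory sequence as a stand-in for SetFirstResult and SetMaxResults; the Paging class and its method names are mine, not part of NHibernate. The key assumption is that the second value is a count of rows, not the index of the last row.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: an in-memory analogue of the paged Criteria query.
public static class Paging
{
    // pageNumber is zero-based; the result is the index of the first row on that page.
    public static int FirstResult(int pageNumber, int pageSize)
    {
        return pageNumber * pageSize;
    }

    // Skip plays the role of SetFirstResult, Take plays the role of SetMaxResults.
    public static List<T> GetPage<T>(IEnumerable<T> source, int pageNumber, int pageSize)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (pageNumber < 0) throw new ArgumentOutOfRangeException("pageNumber");
        if (pageSize <= 0) throw new ArgumentOutOfRangeException("pageSize");

        return source
            .Skip(FirstResult(pageNumber, pageSize))
            .Take(pageSize) // a count of rows, not the index of the last row
            .ToList();
    }
}
```

Calling Paging.GetPage(Enumerable.Range(0, 100), 3, 10) gives the ten values 30 through 39, which is the same window the Criteria query should produce for page 3.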
Works pretty well... now, let's just hope that the actual underlying database implementation of this can take advantage of each DBMS' optimizations for doing this.
If you are considering the use of NHibernate, or are already using NHibernate, be sure that you always override the .Equals and .GetHashCode methods of your entities. NHibernate makes use of these methods extensively and you are likely to have strange issues if you don't override both of these correctly. Unfortunately, GetHashCode is one of those areas that is difficult to get right; but it needs to be done anyways. For more information, see: And one specific note on .Equals: be sure to check ReferenceEquals as the last resort, if no other comparison is possible. I've learned these lessons the hard way; hopefully you won't have to.
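Here's a minimal sketch of one way to override both methods consistently. It assumes the entity has an immutable business key (the Sku on this made-up Product class); that's an assumption for the example, not something NHibernate requires, and your entities may need a different identity rule. Note that this sketch checks ReferenceEquals first as a cheap shortcut, which is a different ordering from the "last resort" advice above.

```csharp
using System;

// Hypothetical entity, identified by an immutable business key.
public class Product
{
    private readonly string _sku;

    public Product(string sku)
    {
        if (sku == null) throw new ArgumentNullException("sku");
        _sku = sku;
    }

    public string Sku { get { return _sku; } }

    public override bool Equals(object obj)
    {
        // Same instance: nothing else to compare.
        if (ReferenceEquals(this, obj)) return true;

        // "as" rather than a GetType() comparison, so subclasses still match.
        Product other = obj as Product;
        if (ReferenceEquals(other, null)) return false;

        return string.Equals(_sku, other._sku, StringComparison.Ordinal);
    }

    // Must be built from the same data as Equals, and must not change
    // while the object sits in a hashed collection.
    public override int GetHashCode()
    {
        return StringComparer.Ordinal.GetHashCode(_sku);
    }
}
```

The design choice that matters is that Equals and GetHashCode use the same field and that the field never changes, so two objects that are equal always land in the same hash bucket.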
In WCF, when using NetDataContractSerializer to enable .NET Remoting, DataContract objects are still serialized. Only ServiceContract objects are marshaled ByRef. The same setup is possible in native .NET Remoting, as well. It seems more likely to happen in WCF, though. Based on my experience, Remoting is usually all MarshalByRef or all Serialized - but that's just my experience. Either way, if you are serializing your object model, you need to be careful.

Parent <-> Child Bidirectional Relationships

If you have a Parent<->Child bidirectional relationship in your DataContract objects, when you send the DataContract objects across WCF, they will be serialized/deserialized and the resulting hierarchy of objects will be: Parent->Child->CopyOfParent. This causes problems when using NHibernate to auto save/load your object tree. For example, this object hierarchy:

    Parent
    |--Child
    |  |--Parent (reference to actual Parent)

Will end up looking like this, after being serialized / deserialized:

    Parent
    |--Child
    |  |--CopyOfParent (new object, independent of actual Parent)

To fix this, you'll have to manually re-build your references, after the objects are deserialized:

    foreach (Child child in parent.Children)
    {
        child.Parent = parent;
    }
If you don't rebuild the hierarchy references like this, NHibernate will not save or update your child objects correctly. You will either get "Transient Instance" exceptions or you will end up with orphaned children records because they will not have their parent id set correctly.
Here's an example of what one of my data access methods looks like, in an app that remotes the data access layers via WCF (and hides most of the NHibernate code in a base class):
    public void Save(FooBar fooBar)
    {
        try
        {
            // Re-point each child at the deserialized parent before saving.
            foreach (Widget widget in fooBar.Widgets)
            {
                widget.FooBar = fooBar;
            }

            OpenSession();
            BeginTransaction();
            Session.SaveOrUpdate(fooBar);
            CommitTransaction();
        }
        catch
        {
            RollbackTransaction();
            throw;
        }
        finally
        {
            CloseSession();
        }
    }
Extended Problem: Multi-Parented Children
Although I haven't run into this situation yet, I am assuming that the same problem will occur if you have multiple parents pointing to the same child. For example, if you have:
    Parent
    |--Child1
    |  |--GrandChild1 (same reference as Child2's GrandChild)
    |--Child2
    |  |--GrandChild1 (same reference as Child1's GrandChild)
When you send this structure across WCF as a set of DataContract objects, I imagine that you will end up with this:
    Parent
    |--Child1
    |  |--GrandChild1 (duplicate of Child2's GrandChild)
    |--Child2
    |  |--CopyOfGrandChild1 (duplicate of Child1's GrandChild)
If your intention is to have Child1 and Child2 reference the same record in the database, you will need to reconstruct the Child1 and Child2 references to GrandChild1, ensuring that both point to the same object. I can see the basic code as traversing the children and comparing each of the grandchildren's values, then picking one winner between the same values and resetting the references on the rest of them. Unfortunately, I think the solution for this scenario would likely be unique to each situation, due to the complexity of picking the correct reference.
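As a rough sketch of the "pick one winner" idea, and only under some loud assumptions: the grandchildren carry a database Id, two objects with the same non-zero Id are meant to be the same record, and the first one encountered wins. The Parent, Child, GrandChild and ReferenceFixer classes here are stand-ins I made up for the example; a real model may well need a smarter rule for choosing the winner.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the deserialized DataContract objects.
public class GrandChild
{
    public int Id;      // assumed: database identity, 0 means "not saved yet"
    public string Name;
}

public class Child
{
    public GrandChild GrandChild;
}

public class Parent
{
    public List<Child> Children = new List<Child>();
}

public static class ReferenceFixer
{
    // Re-points every child at a single shared GrandChild instance per Id.
    // The first instance seen for an Id is kept; later copies are dropped.
    public static void UnifyGrandChildren(Parent parent)
    {
        if (parent == null) throw new ArgumentNullException("parent");

        var winners = new Dictionary<int, GrandChild>();
        foreach (Child child in parent.Children)
        {
            GrandChild current = child.GrandChild;

            // Unsaved objects (Id 0) have no identity to match on, so leave them alone.
            if (current == null || current.Id == 0) continue;

            GrandChild winner;
            if (winners.TryGetValue(current.Id, out winner))
                child.GrandChild = winner;
            else
                winners.Add(current.Id, current);
        }
    }
}
```

Calling ReferenceFixer.UnifyGrandChildren(parent) right after deserialization, before handing the tree to the session, would be the natural place for this in the Save method pattern shown earlier.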
Has anyone run into this situation? If so, how did you solve it?
A few people have asked what my blog setup is, in respect to screen shots and code samples, so here's the answer for all to see.

Blog system: dasBlog

I love the simplicity of dasBlog. No extra fluff, no massive database or configuration system. It's just a blog with the features that I want, and it stores all of its settings and content in XML files. Be sure to change the web config so that the trust level is "full" and not "medium". Otherwise the blog posting API won't allow you to upload images and other attachments.

Web Host: WebHost4Life

A good .NET hosting company. I have the $10/month account type. There are plenty of other (and probably better) .NET hosting companies out there.

Screen Shots: a custom app I wrote 4 years ago, called ScRap.NET

It's easy to use. Just hit the "PrtScn" (print screen) button on your keyboard, then click-n-drag with your mouse. The gray box that draws on your screen is where the screen shot will come from. There are professional options for this, though - such as SnagIt. The reality of it is that it doesn't matter what you use to create the screen shots. The reason mine look like they do is because of my post authoring tool.

Post Authoring Tool: Windows Live Writer

An off-line blog editing tool (which I am using to write this post). It supports most of the blogs out there - the major ones anyways; and has lots of nice little features to make post writing simple. It has its limitations (not all HTML is supported or easy to do) but if you can live with the formatting limitations, it produces some high quality work in an offline, "draft"-able manner. It supports screen shots, file attachments and some other cool stuff. Check out the plugins for it, to add a lot more fun stuff.

Code Format Tool: Code Snippet Plugin

All the beauty of my code examples comes from this plugin. Just paste some text into the plugin and set your formatting options.

And that's it... a pretty simple setup that does exactly what I want in my blog.
A coworker and I often have conversations about Unit Testing vs. Test Driven Development. Generally speaking, we agree - there are some semantic or mechanical differences in what we're saying, but nothing major and we usually work that out through the conversations, defining what we are saying. Recently he asked if I ever allow myself to write any code without unit tests, or write code before unit testing it. My initial answer was no, not surprisingly. However, after discussing the question and its implications further, he brought up a good point and a scenario where I highly encourage writing code without tests: Prototyping (or Spiking, in Agile terms).

I've posted in the past about how I believe that Prototyping A Process is important in software development, so I won't completely re-hash that. Although the language that I use to describe prototyping may be evolving, the core concepts and process are still in place (the spiking concept is the same as what I called Prototyping). Here's what ExtremeProgramming.com has to say about Spiking:

"Create spike solutions to figure out answers to tough technical or design problems. A spike solution is a very simple program to explore potential solutions. Build a system which only addresses the problem under examination and ignore all other concerns. Most spikes are not good enough to keep, so expect to throw it away. The goal is reducing the risk of a technical problem or increase the reliability of a user story's estimate. When a technical difficulty threatens to hold up the system's development put a pair of developers on the problem for a week or two and reduce the potential risk."

This may seem counter to the creed of writing unit tests first and even counter to the creed of not coding for the future. There is a key element in this description, which I believe is not emphasized nearly enough. The code in your spike IS throw-away code.
DO NOT copy and paste even one line of code from the spike into the production code.

"Copy and paste is a design error." - David Parnas

When you understand the process, technology or whatever it is that you are learning well enough, you must step back from that solution and back into your actual project. Then, you continue the test-first process of Test Driven Development - you write your tests for the area that you are covering and then you write the implementation code using the spike as a read-only reference.

So, yes - there is a time and place for writing code without any unit tests; production code is never that place, though.
A lot of people ask these questions when they first start unit testing:
- How many unit tests is too many?
- Do I need to cover every property, every individual method, every object, every ???
The goal of unit testing is to provide 100% test coverage. The reality of unit testing is that you want 95% or more test coverage. There are occasions when unit testing that one last line of code is horrendously repetitious or you miss something or accidentally couple something too tightly. But wait... there's more... and those seem like lousy excuses that lead to allowing bad design in your code.

Ultra-Fine Granularity is Horrible

If you are writing your unit tests after you write your production code, or if you are writing your unit tests first but are simply going through the mechanical process switch and it doesn't really matter if you write your tests first or not, then the answer is horrible. You'll end up unit testing way more than you need to.

For example, I wrote a login screen last year. This login screen has three fields and two buttons on it: Username, Password, a drop list of locations assigned to the username, a Login button and a Cancel button. How many unit tests do you think should be written for this? ... I wrote 27 unit tests to cover every possible edge case in the presenter that controlled this view. What a giant horrible mess - changing anything in that login screen was almost as bad as not having it unit tested at all (well ok... nothing is that bad).

I ended up unit testing setting an individual property, and then checking to make sure that property was stored correctly. I unit tested individual method calls with only the username set, or only the password set, or only the location set, or only whatever combination of those set. I unit tested loading the list of locations for the username, and ensuring that the location selected is valid for the user. I unit tested what would happen if an invalid location was selected or a null location was selected...
every possible edge case was unit tested and it drove bad design into the application because no one wanted to go through the pain of having to change all of those unit tests at that level of granularity.

Step Up To The API

Just unit testing your code is a great way to ensure that you are writing way more unit tests than you need. Chances are, the code you are writing is not very cohesive and you will end up unit testing the read and write of individual properties rather than just unit testing the business value (process) that actually reads / writes the individual properties. That is to say, your unit tests should be written at one or two steps above ultra-fine granularity. Don't test the individual properties; test the API that you want to call, the one that has business value.

So, how do you account for 100% code coverage if you are not unit testing the properties and all of the edge cases?

Never write code that you don't need, right now. If you are writing a unit test and the test or the implementation needs a property, then you create that property for that unit test at that time. This does not mean that you write a bunch of get / set property unit tests, just so you can unit test the properties. This means that you specify the business value API in your unit test, and by virtue of having business value, you will likely have various properties associated with the classes in that API. The same is true for edge cases - if the business value of the unit test does not handle the edge cases, then there are no edge cases. Only when you have business value specifying an edge case, do you need to write a unit test for the edge case and possibly modify code to handle the edge case.

Ok, then what happens if your code changes and you don't call that property in the original unit test, anymore?

Never leave dead code in your system. Ever. Period. End of discussion.
If you change your unit tests because the design of the object(s) change, and you are no longer using a property - delete the property! If you delete it and you find that you can't compile the code any longer because other parts of the system need that property, then you need to evaluate whether or not that property is really providing value to those other places vs. changing those other places to match the new design.

Test First vs. Test After

A big part of figuring out how many unit tests you need is understanding the functionality of the system. You should be writing a unit test for every functional point of the code, achieving 100% code coverage. The problem with the original question of how many unit tests to write, though, is that there is a hidden assumption in that question: "I wrote my code, now how many tests do I need, to cover it correctly?" This question is an underlying problem in Unit Testing and simple Test First development.

If you are just unit testing your existing code or only going through the mechanical process switch of writing a unit test first, but not really using the test to drive your design, then you are likely not going to see some of the major benefits of Test Driven DESIGN / Development: not writing code you don't need, and creating the API that you want to call instead of the API coming together haphazardly as a by-product of writing code first.

When you take the step up to unit testing the API, it becomes more apparent that you really want to specify the API before you write it. If you specify the API before you write it, then you are one step closer to true Test Driven Development. Don't expect the test to design your code for you. Use the test to flesh out your design before you write any code.
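To show what "stepping up to the API" might look like for the login screen, here's a small sketch. Everything in it is hypothetical: the LoginPresenter, IAuthenticationService, the fake service and the sample credentials are invented for this example and are not the code from the application I described. It also uses a tiny hand-rolled Check method instead of a real test framework, purely so the sketch stands on its own.

```csharp
using System;

// Hypothetical collaborator that the presenter needs in order to do its job.
public interface IAuthenticationService
{
    bool IsValid(string username, string password, string location);
}

// Hypothetical presenter: the only public surface is the business-level call.
public class LoginPresenter
{
    private readonly IAuthenticationService _authentication;
    private bool _isLoggedIn;

    public LoginPresenter(IAuthenticationService authentication)
    {
        if (authentication == null) throw new ArgumentNullException("authentication");
        _authentication = authentication;
    }

    public bool IsLoggedIn { get { return _isLoggedIn; } }

    public bool Login(string username, string password, string location)
    {
        _isLoggedIn = _authentication.IsValid(username, password, location);
        return _isLoggedIn;
    }
}

// Fake used by the tests: accepts exactly one made-up user at one made-up location.
public class FakeAuthenticationService : IAuthenticationService
{
    public bool IsValid(string username, string password, string location)
    {
        return username == "alice" && password == "secret" && location == "Warehouse";
    }
}

// Two tests at the level of business value, instead of one test per property.
public static class LoginPresenterTests
{
    public static void Valid_credentials_for_an_assigned_location_log_the_user_in()
    {
        var presenter = new LoginPresenter(new FakeAuthenticationService());
        bool result = presenter.Login("alice", "secret", "Warehouse");
        Check(result && presenter.IsLoggedIn, "expected a successful login");
    }

    public static void A_location_not_assigned_to_the_user_is_rejected()
    {
        var presenter = new LoginPresenter(new FakeAuthenticationService());
        bool result = presenter.Login("alice", "secret", "Head Office");
        Check(!result && !presenter.IsLoggedIn, "expected the login to be rejected");
    }

    private static void Check(bool condition, string message)
    {
        if (!condition) throw new InvalidOperationException(message);
    }
}
```

Notice that IsLoggedIn only exists because a test about logging in needed it. There is no test that sets a property and reads it back, and changing how the presenter stores its state would not break either test.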
Test Driven DESIGN / Development

Would you rather:

1. Write 50+ lines of code into your model, then write a unit test that shows an ugly API causing you to go back to the code and re-write it in the hopes that it will produce a better API, most likely repeating this process once or twice until you get frustrated with changing your code because it takes so long, or
2. Write 5 lines of unit test code, specifying the API that you want, realizing that it's not going to work and changing 2 or 3 lines of that test, going through this cycle 5 or 6 times until you have the API that you really do want to call; then implementing the API in the 50+ lines of code and being done with it

I'll take #2. I don't like rewriting large chunks of code. Rewriting 2 or 3 lines of code is easy - I'll do that any minute of any day. Chances are, if you are willing to write the correct number of unit tests by specifying the higher level API in your unit tests, you will gravitate toward designing your API in your unit tests.

TDD Misconception:

TDD is NOT a design tool. It is not "the answer". It will not design your application for you. It will not solve your problems for you. If you don't know how to design software, then you need to get some training on design patterns, loose coupling through single responsibility and separation of concerns, and various other core foundations of good Object Oriented Development. In reality, Test Driven Development is just an easier way of saying this: "Design your API in the context of a unit test, so that you have your API implementation covered by unit tests before you even write the implementation."

Conclusions:

In the end, we can answer the original questions from this post by re-stating Test Driven Development as a software development guideline: "Design via code, unit testing 100% as you go."
A coworker and I ran into a problem yesterday - we were trying to re-use an assembly from a WinForms app in a WebService, and ran into this code:

    Directory.SetCurrentDirectory(Application.StartupPath);

The problem with this is that it uses the Application static class, which is part of System.Windows.Forms - and we're in a web service now, not a WinForms app. So, after some headache and thought, we tried to use this:
Assembly.GetExecutingAssembly().Location
That doesn't work well, either, because in the web, it gives you the ShadowCopy location of the assembly, not the original location of the assemblies. A little more thought, and a few hours later, we finally came up with this:
    private static string GetBinFolder()
    {
        AppDomain appDomain = AppDomain.CurrentDomain;
        string binFolder;

        if (appDomain.RelativeSearchPath != null && appDomain.RelativeSearchPath != string.Empty)
            binFolder = Path.Combine(appDomain.BaseDirectory, appDomain.RelativeSearchPath);
        else
            binFolder = appDomain.BaseDirectory;

        return binFolder;
    }
The AppDomain.BaseDirectory will give you the root folder that the application is being run from - no matter what type of app you are in, Windows or web. This is perfect for Windows because it alone gives us the folder that the code is running from and lets us find the assembly we need. The RelativeSearchPath is important for the web - it gives us the "bin" folder where our assemblies live. So we do a simple check to see if there is a relative search path (it returns null in a standard WinForms app) and combine the two if there is; otherwise we just use the base directory. We now have the folder that the assemblies are located in, so we can call:
Directory.SetCurrentDirectory(GetBinFolder());
...
Of course, this problem could have been avoided if there was proper Inversion of Control in the code... don't have time to introduce it right now, but at least we removed some code duplication by creating a single GetBinFolder() method.
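For what it's worth, here's a rough sketch of what that Inversion of Control might look like. None of these types exist in our code base; IAssemblyFolderProvider, AppDomainFolderProvider, FixedFolderProvider and PluginLocator are names I made up for the illustration. The idea is that the code needing the folder asks an injected provider for it, rather than reaching for Application.StartupPath or AppDomain directly.

```csharp
using System;
using System.IO;

// Hypothetical abstraction: "where do my assemblies live?"
public interface IAssemblyFolderProvider
{
    string GetBinFolder();
}

// Same logic as GetBinFolder(), wrapped behind the interface.
public class AppDomainFolderProvider : IAssemblyFolderProvider
{
    public string GetBinFolder()
    {
        AppDomain appDomain = AppDomain.CurrentDomain;
        string relative = appDomain.RelativeSearchPath;

        return string.IsNullOrEmpty(relative)
            ? appDomain.BaseDirectory
            : Path.Combine(appDomain.BaseDirectory, relative);
    }
}

// Handy for tests, or for hosts that already know the folder.
public class FixedFolderProvider : IAssemblyFolderProvider
{
    private readonly string _folder;

    public FixedFolderProvider(string folder)
    {
        if (folder == null) throw new ArgumentNullException("folder");
        _folder = folder;
    }

    public string GetBinFolder() { return _folder; }
}

// Hypothetical consumer: it never knows whether it runs in WinForms or on the web.
public class PluginLocator
{
    private readonly IAssemblyFolderProvider _folders;

    public PluginLocator(IAssemblyFolderProvider folders)
    {
        if (folders == null) throw new ArgumentNullException("folders");
        _folders = folders;
    }

    public string ResolvePath(string fileName)
    {
        return Path.Combine(_folders.GetBinFolder(), fileName);
    }
}
```

The WinForms app and the web service would each hand the consumer whichever provider suits them, and the shared assembly would no longer need to change the current directory at all.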
As a kid, I was never part of the boy-scouts or anything; but my family and I went camping a lot, and I went camping with my youth group on several occasions. I remember hearing my parents and the various youth leaders talking about how we should always leave the camp site cleaner than we found it. I always thought this was annoying - why should I clean up someone else's mess? If I clean up my own mess, isn't that good enough?

Yesterday, while helping a coworker fix some bugs in an application that I wrote around a year ago, I was suggesting ways to improve various parts of the code; move this property to a parameter of that method, make this method private and only call it from here in the owning class, and items as simple as making an if-then statement easier to read. After a few of these suggestions, he asked me if I always clean up the code that I'm working with, even if the bug is not directly related to the code that we are cleaning up. My answer was emphatically, "yes".

If I'm reading code, trying to find a bug, and I'm having a hard time understanding what's going on with the code, then it becomes much more difficult to find the actual bug. Even if this code does not end up being part of the bug I was looking for, by cleaning up the code I am making it more likely that I will be able to understand what this code is doing the next time I have to look at it.

Here's my basic perspective that drives all of this: if you have a hard time reading the code and seeing what it is doing, chances are, you or someone you know will have to debug that code at some point. I don't want to debug hard to read code - that's annoying, at best. I want to debug code that is easy to read and easy to understand. And I certainly don't want to make any of my coworkers debug hard to read code. I try not to torture coworkers like that. So, if my motivation is to not debug hard to read code, then doesn't it make sense that I would want to clean up that ugly code? It makes sense to me...
What does this really come down to, then? Two things:
- Leave the code cleaner than when you arrived, by
- Micro-refactoring - make that one line of code easier to read
(By the way - there's no such thing as "micro-refactoring". Refactoring, by definition, is exactly what I described above. Stop trying to change the architecture and learn to change that one line of ugly code. By doing this, you'll find that the architecture does change, because you clean up more than you realize and change becomes natural.)
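Here's the kind of one-line cleanup I mean, as a before and after. The Bill class and the reminder rule are invented for this example; they aren't from the application my coworker and I were debugging. Both methods return the same answer for the same input, and the second is the one I'd rather read while hunting for a bug.

```csharp
using System;

// Hypothetical data for the example.
public class Bill
{
    public decimal Total;
    public bool IsPaid;
    public DateTime DueDate;
}

public static class ReminderRules
{
    // Before: correct, but you have to untangle the negations to see the rule.
    public static bool NeedsReminderBefore(Bill b, DateTime today)
    {
        if (!(b.IsPaid == true) && !(b.DueDate >= today) && b.Total > 0)
            return true;
        else
            return false;
    }

    // After: same behavior, with each part of the condition given a name.
    public static bool NeedsReminder(Bill bill, DateTime today)
    {
        bool isUnpaid = !bill.IsPaid;
        bool isOverdue = bill.DueDate < today;
        bool hasBalance = bill.Total > 0;

        return isUnpaid && isOverdue && hasBalance;
    }
}
```

Nothing about the design changed here; one condition just became readable. That's the whole point.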
Have you ever:
- had a problem that you were having a hard time solving?
- been in need of a design idea for a particular situation, and you don't know where to start?
- solved a problem that was nagging you for a while?
- come up with a good design for a common situation?
- written some code that you wanted to keep around, to remind yourself how you did something?
- wanted to find some code examples on how to do something with a specific technology?
- wanted to know how to do something for a specific project?
- wanted to share your knowledge on how to use a specific technology a specific way?
- wanted to learn how to use a specific feature of a project?
- wanted the world to know your opinion of a piece of software or technology, be it good or bad?
If you can answer "yes" to any one of these questions - you should be blogging. If you can answer "yes" to more than one of these questions and you are not blogging, then shame on you! Start blogging today!

Don't think your opinion matters, or that you have anything worth saying? Stop fooling yourself. If you write code, you have opinions and preferences. If you have opinions and preferences, they are worth sharing. It's not possible to write code without opinions. Software development is not a mechanical process like building a house or a car - you can't sic a robot on a keyboard and write a functional piece of software.

The worst case scenario: If you post code examples on issues that you have solved, you will have a history of code you have written and issues you have solved. You'll be able to go back to this history and re-use existing knowledge, rather than having to think through the problem again.

The best case scenario: If you post your code examples, your thoughts on software development, your opinions and preferences, chances are that someone else in this wide world of ours has the same opinion or has had the same issues and will find the information you provide useful.

Why should you blog? Because you're a person with ideas worth listening to.

... Get started now. Register a domain name and buy some web hosting so you can have a blog that is accessible to the world. I use dasBlog and WebHost4Life. There are thousands of options out there - find the one that works for you.